Data Visualization Using Python Using Various Modules

Are you looking for a complete guide to data visualization in Python? In this article, we will explore several ways to visualize data using Python and its modules.

Data visualization is the process of using visual representations of our data to identify trends and relationships. We can use a variety of Python data visualization libraries, like Matplotlib, Seaborn, Plotly, etc., to do data visualization. In this article, we will discuss different techniques to visualize data using various Python modules. Moreover, we will also explain what those visualizations show about the dataset.

You may also be interested in heatmaps, hexagons on a Google map, and 3D plots using Python.

Data Visualization in Python

A huge amount of data is produced every day in the modern world. In its raw format, that data can be challenging to examine for specific trends or patterns, and data visualization addresses this issue. By providing a clear, organized pictorial depiction of the data, it makes the data simpler to comprehend, observe, and analyze. Python, one of the most popular languages for machine learning and data science, has many modules for visualizing data in different ways, and they are easy to learn.

A common question is: why use Python to visualize data? We will go through some of the important points that make Python one of the best languages for data visualization.

  • Python is an open-source language and has a large number of developers who keep it up to date.
  • Python is easy to learn as compared to many other programming languages.
  • Python has broad database connectivity. The ability to connect to almost any file format or database system is a major benefit of Python.
  • The next important point is scalability. Python tooling is widely used with large datasets, so it remains a practical choice when your organization works with a lot of data.
  • One of the main reasons to choose Python for visualizing your data is its large number of visualization libraries, some of which we will discuss here.

Top 10 Popular Python Visualization Libraries

As we already discussed, Python has a large number of visualization libraries. Here we will look at 10 popular Python libraries for visualization.

  1. Matplotlib: Matplotlib is a Python plotting library that allows you to construct static, animated, and interactive visualizations. It is built on NumPy, Python's numerical computing library. Although it is well over a decade old, it is still one of the most popular plotting libraries in the Python world.
  2. Seaborn: Seaborn is a Python library for statistical graphics, built on top of Matplotlib. It offers high-level tools for producing statistical visualizations that are both aesthetically pleasing and informative. Data scientists often use Seaborn for publications and practical demonstrations.
  3. Ggplot: Python ggplot is a plotting library that is based on the ggplot2 library for R programming. The "gg" in ggplot stands for Grammar of Graphics, and creating graphs with it is similar to writing sentences with proper grammar.
  4. Bokeh: Bokeh is a Python library for creating interactive visualizations for modern web browsers. It helps you build beautiful graphics, ranging from simple plots to complex dashboards with streaming datasets.
  5. Pygal: Pygal is a Python module that creates SVG (Scalable Vector Graphics) graphs/charts in a variety of styles. Pygal is highly customizable, yet also extremely simple to use, which is a rare combination.
  6. Plotly: Plotly is a free, open-source visualization library that is used to create interactive data visualizations.
  7. Geoplotlib: Geoplotlib is a powerful API that can be used for various types of map representations, such as Voronoi tessellation, Delaunay triangulation, markers, and so on.
  8. Folium: Folium is a powerful Python library that helps you create several types of Leaflet maps. By default, Folium creates a map in a separate HTML file.
  9. Gleam: Gleam is a Python library, inspired by R's Shiny package, that turns analyses into interactive web apps using only Python scripts. (It should not be confused with the unrelated Gleam programming language.)
  10. Altair: Altair is a declarative statistical visualization library for Python, based on Vega and Vega-Lite.

Basic Data Visualization Using Python

Now, we will move to the practical part and visualize the dataset using various plots in Python. For simplicity, we will use only three Python libraries.

  • matplotlib
  • seaborn
  • pandas

So, before proceeding with this article, make sure that you have already installed these modules on your system. You can use the pip command to install these modules on your system. Also, you can get access to the source code from my GitHub account.
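With pip, the usual command is `pip install matplotlib seaborn pandas scikit-learn`. As a quick check before running the examples, the following minimal sketch reports which modules cannot be imported. The helper name `missing_modules` is our own and is not part of any library.

```python
# Check which of the libraries used in this article can be imported.
from importlib.util import find_spec

# Import names; note that scikit-learn is imported as "sklearn".
required = ["matplotlib", "seaborn", "pandas", "numpy", "sklearn"]


def missing_modules(names):
    """Return the names from `names` that cannot be found on this system."""
    # find_spec returns None when a top-level module is not installed
    return [name for name in names if find_spec(name) is None]


missing = missing_modules(required)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are available.")
```

If any module is reported as missing, install it with pip and run the check again.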

For demonstration purposes, we will use one of the most popular datasets in machine learning, the iris dataset. Let us first load the dataset from the sklearn module (the scikit-learn package, which also needs to be installed).

# importing required modules
import pandas as pd
from sklearn import datasets
import numpy as np

# importing the dataset
iris = datasets.load_iris()

dataset = pd.DataFrame(data= np.c_[iris['data'], iris['target']],
                     columns= iris['feature_names'] + ['target'])

# heading of the dataset
dataset.head()

Output:

[Figure: the first rows of the iris DataFrame]
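The target column stores each flower class as a number (0, 1, or 2). If readable names are preferred, they can be taken from the target_names entry of the loaded dataset. The sketch below rebuilds the DataFrame so that it runs on its own; the extra "species" column is our addition for illustration and is not used in the rest of the article.

```python
import pandas as pd
from sklearn import datasets

iris = datasets.load_iris()

# Rebuild the DataFrame: four feature columns plus a numeric target column.
dataset = pd.DataFrame(iris["data"], columns=iris["feature_names"])
dataset["target"] = iris["target"].astype(float)

# Map the class codes 0, 1, 2 to their species names.
# The "species" column is an illustrative addition.
code_to_name = {code: str(name) for code, name in enumerate(iris["target_names"])}
dataset["species"] = dataset["target"].astype(int).map(code_to_name)

print(dataset.shape)
print(dataset["species"].value_counts())
```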

Now, we will use various visualization plots to plot the above dataset.

Scatter Plot Using Python

Plotting scatter plots in matplotlib is really easy. We will use the scatter() function and provide the data. This section focuses only on 2D plots; if you want to visualize your dataset in 3D, you can go to 3D visualizations using Python. So, let us first visualize the sepal length and sepal width using a scatter plot.

# importing the module
import matplotlib.pyplot as plt

# dataset for axes
x_axis = dataset['sepal length (cm)']
y_axis = dataset['sepal width (cm)']

# fixing the size of plot (plt.figure, in lowercase, creates the current figure)
plt.figure(figsize=(10, 8))

# plotting simple scattered plot
plt.scatter(x_axis, y_axis)
plt.show()

Output:

[Figure: matplotlib scatter plot of sepal length against sepal width]

As you can see, we have visualized the data about sepals in a scatter plot. But we still don’t know which data point belongs to which flower class. So, in order to get more information from the scatter plot, we will use the seaborn library to plot each class in a different color.

# importing the module
import seaborn as sns

# plotting the scatter plot in seaborn
sns.scatterplot(x=x_axis, y=y_axis, hue=dataset.target, s=70)

Output:

[Figure: seaborn scatter plot of sepal length against sepal width, colored by target class]

As you can see, this time the scatter plot shows more information and plots the sepal sizes of each of the flowers in different colors.

Another way to visualize the same information is to create a subplot with the matplotlib module and color each data point by its class.

# setting the color of the plot
colors = {0.0:'b', 1.0:'g', 2.0:'r'}

# creating the subplots
fig, ax = plt.subplots()

# plot each data-point
for i in range(len(dataset['sepal length (cm)'])):
    ax.scatter(dataset['sepal length (cm)'][i], dataset['sepal width (cm)'][i],color=colors[dataset['target'][i]])

Output:

[Figure: matplotlib scatter plot of sepal length against sepal width, colored by target class]

This plot shows the same information as the seaborn plot before it.
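The loop above calls scatter() once for every data point. A variation, sketched below, calls it once per class instead, which also makes it easy to add a legend and axis labels. It assumes scikit-learn 0.23 or newer for the as_frame option, and the variable names are ours.

```python
import matplotlib.pyplot as plt
from sklearn import datasets

iris = datasets.load_iris(as_frame=True)
data = iris.frame  # four feature columns plus an integer "target" column

colors = {0: "b", 1: "g", 2: "r"}

fig, ax = plt.subplots(figsize=(10, 8))

# One scatter call per class instead of one call per data point.
for code, group in data.groupby("target"):
    ax.scatter(
        group["sepal length (cm)"],
        group["sepal width (cm)"],
        color=colors[int(code)],
        label=str(iris.target_names[int(code)]),
    )

ax.set_xlabel("sepal length (cm)")
ax.set_ylabel("sepal width (cm)")
ax.legend(title="species")
# When running as a script, call plt.show() to display the figure.
```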

Line Charts Using Python

A line chart is simply a plot that shows the data points connected by a line. Usually, it is very useful for time-series datasets. We will first plot simple line charts of the dataset using matplotlib.

# dropping the target column
columns = dataset.columns.drop(['target'])


# create x data
x_data = range(0, dataset.shape[0])


# create figure and axis
fig, ax = plt.subplots()


# plot each column
for column in columns:
    ax.plot(x_data, dataset[column])

Output:

[Figure: matplotlib line chart of the four iris feature columns]

To make the chart more informative, we can also add markers that represent the actual data points.

# dropping the target column
columns = dataset.columns.drop(['target'])


# create x data
x_data = range(0, dataset.shape[0])


# create figure and axis
fig, ax = plt.subplots()


# plot each column with markers
for column in columns:
    ax.plot(x_data, dataset[column], marker='o')

Output:

[Figure: line chart of the four iris feature columns with markers]

The dots in the above plot show the actual data points.
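The line charts above do not say which line belongs to which column. A minimal sketch of one way to add that, assuming scikit-learn 0.23 or newer for the as_frame option, is to pass a label to each plot() call and then draw a legend. The axis label texts are our own choice.

```python
import matplotlib.pyplot as plt
from sklearn import datasets

# The four measurement columns of the iris dataset as a DataFrame.
features = datasets.load_iris(as_frame=True).data

fig, ax = plt.subplots()

# Passing label= lets ax.legend() identify each line.
for column in features.columns:
    ax.plot(features.index, features[column], marker="o", markersize=3, label=column)

ax.set_xlabel("sample index")
ax.set_ylabel("measurement (cm)")
ax.legend()
# When running as a script, call plt.show() to display the figure.
```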

Histogram Plots Using Python

A histogram is a plot consisting of rectangles whose area is proportional to the frequency of a variable and whose width is equal to the class interval. Histograms can be plotted with various Python modules, but here we will use the pandas module to plot histograms of our dataset. Let us first plot the histogram of the sepal length.

# importing pandas module
import pandas as pd

# plotting the histogram of sepal length
dataset['sepal length (cm)'].plot.hist()

Output:

[Figure: pandas histogram of sepal length]

We can also plot multiple histogram plots using the pandas module.

# plotting multiple histogram
dataset.drop('target', axis=1).plot.hist(subplots=True,
                                         layout=(2,2), 
                                         figsize=(10, 10),
                                         bins=20)

Output:

[Figure: histograms of the four iris feature columns]

Let us also plot a histogram of the dataset using the seaborn module. We will also draw the kernel density estimate (KDE) curve over the histogram using seaborn.

# plotting histogram with a KDE curve in seaborn
# (histplot replaces distplot, which is deprecated in recent seaborn versions)
sns.histplot(dataset['sepal width (cm)'],
             bins=10,
             kde=True)

Output:

[Figure: seaborn histogram of sepal width with a density curve]

This suggests that the sepal width is approximately normally distributed.
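To see the numbers behind a histogram, the bin counts can also be computed directly with NumPy's histogram() function. This is a minimal sketch, assuming scikit-learn 0.23 or newer for the as_frame option; the variable names are ours.

```python
import numpy as np
from sklearn import datasets

sepal_width = datasets.load_iris(as_frame=True).data["sepal width (cm)"]

# Ten equal-width bins, the same bin count as in the seaborn example above.
counts, edges = np.histogram(sepal_width, bins=10)

# Each bar of the histogram covers one [left, right) interval
# (the last interval also includes its right edge).
for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left:.2f} - {right:.2f}: {count}")
```

The counts always add up to the number of rows, because every value falls into exactly one bin.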

Box Plots Using Python

A box plot is a simple way of summarizing the distribution of data. A rectangle (the box) is drawn from the first quartile to the third quartile, with a line inside it to indicate the median value. Lines called whiskers extend from the box toward the smallest and largest values; by default in matplotlib and seaborn, each whisker stops at the most extreme data point that lies within 1.5 times the interquartile range of the box.
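The quantities that a box plot draws can be computed directly, which helps when reading the plots that follow. The sketch below uses the common 1.5 × IQR rule for flagging outliers, which is the default in both matplotlib and seaborn. It assumes scikit-learn 0.23 or newer for the as_frame option, and the variable names are ours.

```python
import numpy as np
from sklearn import datasets

sepal_width = datasets.load_iris(as_frame=True).data["sepal width (cm)"]

# The box spans the first to the third quartile; the inner line is the median.
q1, median, q3 = np.percentile(sepal_width, [25, 50, 75])
iqr = q3 - q1  # interquartile range, the height of the box

# Points farther than 1.5 * IQR from the box are drawn as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = sepal_width[(sepal_width < lower_fence) | (sepal_width > upper_fence)]

print(f"Q1={q1:.2f}, median={median:.2f}, Q3={q3:.2f}, IQR={iqr:.2f}")
print("Outliers:", sorted(outliers.tolist()))
```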

Let us first plot the box plot using the Seaborn module.

# plotting the boxplots (recent seaborn versions require x and y as keyword arguments)
sns.boxplot(x='target', y='sepal width (cm)', data=dataset)

Output:

[Figure: seaborn box plots of sepal width for each target class]

The individual points drawn beyond the whiskers represent the outliers. We can also plot the box plot for each of the independent variables by passing the whole dataset.

# box plot of every feature column using seaborn
sns.boxplot(data=dataset.drop('target', axis=1))

Output:

[Figure: seaborn box plots of the four iris feature columns]

As you can see, we have successfully plotted the box plots for each of the input attributes. In a similar way, plotting a box plot is also very easy using matplotlib module.

# fixing the size
fig = plt.figure(figsize =(8, 6))
 
# Creating plot
plt.boxplot(dataset.drop('target', axis=1))
 
# displaying the plot
plt.show()

Output:

[Figure: matplotlib box plots of the four iris feature columns]

The orange line in each box shows the median of that column, not the mean.

Faceting Using Python

Faceting is the act of breaking data variables up across multiple subplots and combining those subplots into a single figure. It is very useful when we want to explore our dataset.

For example, we will plot the length of the sepals of each of the flowers separately using the Seaborn module.

# creating one facet (subplot) per target class
target_class = sns.FacetGrid(dataset, col='target')

# plotting the density of sepal length in each facet
g = target_class.map(sns.kdeplot, 'sepal length (cm)')

Output:

[Figure: density plots of sepal length, one panel per target class]

As you can see we have plotted the density of each of the flowers separately according to the length of the sepals.
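To show what faceting does under the hood, the same idea can be sketched with plain matplotlib: one subplot per class, with shared axes so that the panels are directly comparable. This version draws histograms rather than density curves to keep it short. It assumes scikit-learn 0.23 or newer for the as_frame option, and the variable names are ours.

```python
import matplotlib.pyplot as plt
from sklearn import datasets

iris = datasets.load_iris(as_frame=True)
data = iris.frame  # four feature columns plus an integer "target" column

# One subplot per class, sharing both axes so the panels are comparable.
fig, axes = plt.subplots(1, 3, figsize=(12, 4), sharex=True, sharey=True)

for ax, (code, group) in zip(axes, data.groupby("target")):
    ax.hist(group["sepal length (cm)"], bins=10)
    ax.set_title(str(iris.target_names[int(code)]))
    ax.set_xlabel("sepal length (cm)")

axes[0].set_ylabel("count")
# When running as a script, call plt.show() to display the figure.
```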

Pair Plots Using Python

A pair plot shows pairwise relationships in a dataset. The pairplot() function creates a grid of axes such that each variable in the data is shared on the y-axis across a single row and on the x-axis across a single column.

Let us now use the seaborn module to plot pair plots of the iris dataset.

# pair plots of the feature columns
sns.pairplot(dataset.drop('target', axis=1))

Output:

[Figure: seaborn pair plot of the four iris feature columns]

As you can see, the above plot shows the relationship of each of the variables with another one.
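A pair plot shows these relationships visually. As a numeric companion, the pairwise Pearson correlations can be computed with the pandas corr() method. This is a minimal sketch, assuming scikit-learn 0.23 or newer for the as_frame option; the variable names are ours.

```python
from sklearn import datasets

# The four measurement columns of the iris dataset as a DataFrame.
features = datasets.load_iris(as_frame=True).data

# Pearson correlation between every pair of feature columns.
corr = features.corr()
print(corr.round(2))
```

Values close to 1 or -1 correspond to panels in the pair plot where the points fall near a straight line, as they do for petal length and petal width.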

Summary

Data visualization using Python allows business users to gain insight into their vast amounts of data. It benefits them to recognize new patterns and errors in the data. Making sense of these patterns helps the users pay attention to areas that indicate red flags or progress. In this article, we discussed the basic data visualization process using Python programming language.
