Advanced Data Visualization in Python - Building Data visualization dashboards using Python libraries | EduGrad

Introduction to Data Visualization tools

Data visualization is one of the key components of any analytics project. An end-to-end analytics use case involves ideation, requirement gathering, getting the raw data, analyzing the data, building a predictive model, deploying the model, and communicating the end result to the business.

Throughout this process, analyzing the data and communicating the results to the business both require visualizing the raw data and understanding the inter-linked relations among the features. Python is a popular language for this work: pandas and NumPy handle the data, while Matplotlib and Seaborn (along with the plotting helpers built into pandas) are used to visualize it.

We have another detailed tutorial, covering the Data Visualization libraries in Python.


Below are some data visualization examples using Python on real data.

 

Data Visualization Projects in Python

Example 1:

Data visualization dataset: Iris Dataset

# Importing the necessary libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set(style="white", color_codes=True)
# Jupyter notebook magic; leave this line out of a plain Python script
%matplotlib inline

After all the libraries are imported, we load the data using the read_csv function of pandas and store it in a DataFrame.

df = pd.read_csv("./iris.csv")

To understand the structure of the data, the .head() method in pandas is used. It displays the first five rows.

df.head()

Data Visualization in Python - Reading Data from CSV | EduGrad
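For readers who do not have iris.csv at hand, here is a minimal sketch of the same two steps on an in-memory sample. The three rows are made up for illustration, and the column names (including PetalWidthCm, which this post never plots) are assumed to match the file.

```python
import io

import pandas as pd

# Hypothetical three-row sample; the column names are assumed to match iris.csv.
csv_text = """Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species
1,5.1,3.5,1.4,0.2,Iris-setosa
2,7.0,3.2,4.7,1.4,Iris-versicolor
3,6.3,3.3,6.0,2.5,Iris-virginica
"""

# read_csv accepts a file path or any file-like object.
sample = pd.read_csv(io.StringIO(csv_text))

# head(n) returns the first n rows; without an argument it returns five.
first_rows = sample.head(2)
print(first_rows)
print(sample.dtypes)
```

Printing the dtypes next to the first rows is a quick way to confirm that the measurements were parsed as numbers and not as text.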

The pandas library has a .plot() method which is handy for quick visual analysis. The scatter plot of sepal length against sepal width is displayed below.

df.plot(kind="scatter", x="SepalLengthCm", y="SepalWidthCm")

Data Visualization in Python - Scatter plot of iris dataset | EduGrad

Seaborn can be used to generate similar plots. Its jointplot shows the bivariate scatter plot together with the univariate histogram of each variable.

# height replaces the size argument used by seaborn versions before 0.9
sns.jointplot(x="SepalLengthCm", y="SepalWidthCm", data=df, height=5)

Data Visualization in Python - Univariate histogram and bivariate scatter plot using jointplot in seaborn | EduGrad

The plots so far do not show which species each plant belongs to. FacetGrid in Seaborn can be used for this: it colors the scatter plot by species.

sns.FacetGrid(df, hue="Species", height=5) \
   .map(plt.scatter, "SepalLengthCm", "SepalWidthCm") \
   .add_legend()

Data Visualization in Python - scatterplot color of species using FacetGrid in seaborn | EduGrad

A boxplot in Seaborn summarizes the distribution of an individual feature, here the petal length of each species.

sns.boxplot(x="Species", y="PetalLengthCm", data=df)

Building interactive dashboards in Python - Creating a box plot in Seaborn | EduGrad
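To make the box plot easier to read, here is a rough sketch of the numbers it draws: the quartiles, the interquartile range (IQR), and the points beyond the usual 1.5 × IQR whisker limit. The petal lengths are hypothetical and are not taken from the Iris file.

```python
import numpy as np

# Hypothetical petal lengths (cm) for one species.
petal_length = np.array([1.3, 1.4, 1.4, 1.5, 1.5, 1.6, 1.7, 2.4])

# The box spans the first to the third quartile, with a line at the median.
q1, median, q3 = np.percentile(petal_length, [25, 50, 75])
iqr = q3 - q1

# With the default whisker setting of 1.5, points outside these limits
# are drawn individually as outliers.
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr
outliers = petal_length[(petal_length < lower_limit) | (petal_length > upper_limit)]

print("Q1, median, Q3:", q1, median, q3)
print("outliers:", outliers)
```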

A layer of individual points is added to this plot using the strip plot in Seaborn. To avoid all points falling on a single vertical line, jitter=True is used.

ax = sns.boxplot(x="Species", y="PetalLengthCm", data=df)
ax = sns.stripplot(x="Species", y="PetalLengthCm", data=df, jitter=True, 
edgecolor="gray")

Advanced Data Visualization tools in Python - Strip plot in Seaborn | EduGrad
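The idea behind jitter can be sketched with NumPy alone: every point is shifted sideways by a small random amount so that overlapping points separate. The width of 0.1 is an assumption for this sketch and is not necessarily the amount Seaborn applies.

```python
import numpy as np

rng = np.random.default_rng(0)

# Three categories at x positions 0, 1 and 2, with ten hypothetical points each.
positions = np.repeat([0, 1, 2], 10)

# Jitter: add a small uniform random offset to every x position.
jitter_width = 0.1  # assumed half-width, chosen for illustration
jittered = positions + rng.uniform(-jitter_width, jitter_width, size=positions.size)

print(positions[:5])
print(jittered[:5].round(3))
```

Only the horizontal position changes. The measured values on the y axis stay untouched, so the plot remains faithful to the data.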

The violin plot combines the benefits of the previous two plots: it keeps the summary of a boxplot and also shows where the points are concentrated.

sns.violinplot(x="Species", y="PetalLengthCm", data=df)

Data visualization examples in Python - Violin plots in Seaborn | EduGrad

A Kernel Density Estimation (KDE) plot is used to look at the distribution of a single feature by plotting its kernel density estimate.

sns.FacetGrid(df, hue="Species", height=6) \
   .map(sns.kdeplot, "PetalLengthCm") \
   .add_legend()

Data visualization techniques in Python - Creating Kernel Density Estimation plot | EduGrad
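A kernel density estimate can be sketched by hand: place one small Gaussian bump on every sample and average them. The function name, the sample values and the bandwidth of 0.1 below are all assumptions for illustration. Seaborn chooses its own bandwidth automatically.

```python
import numpy as np

def gaussian_kde(samples, grid, bandwidth):
    """Average of Gaussian bumps, one centered on each sample."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # z[i, j]: distance from grid point i to sample j, in units of the bandwidth
    z = (grid[:, None] - samples[None, :]) / bandwidth
    bumps = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    return bumps.mean(axis=1) / bandwidth

petal_length = np.array([1.3, 1.4, 1.4, 1.5, 1.5, 1.6, 1.7])  # hypothetical values
grid = np.linspace(0.0, 3.0, 301)
density = gaussian_kde(petal_length, grid, bandwidth=0.1)

# A density should integrate to about 1 when the grid covers the data.
step = grid[1] - grid[0]
area = density.sum() * step
peak = grid[density.argmax()]
print("area under the curve:", round(area, 4))
print("location of the peak:", round(peak, 2))
```

A smaller bandwidth gives a bumpier curve that follows individual samples, while a larger one smooths the details away.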

To show the bivariate relation between each pair of features, the pair plot is used in Seaborn. In the plots below, the Iris-setosa species is separated from the other two species.

sns.pairplot(df.drop("Id", axis=1), hue="Species", height=3)

Data Visualization dashboards in Python - Pair plot in Seaborn | EduGrad

The diagonal elements of a pair plot can be drawn as kernel density estimates instead of histograms by setting diag_kind="kde".

sns.pairplot(df.drop("Id", axis=1), hue="Species", height=3, diag_kind="kde")

Creating a pair plot using kernel density estimation | EduGrad

So far, we have covered some of the visualizations available in Seaborn. Now let's explore some with the pandas library as well. Below is a boxplot using pandas.

df.drop("Id", axis=1).boxplot(by="Species", figsize=(12, 6))

Data Visualization in Python - Creating a boxplot in Pandas | EduGrad

The next plot shows Andrews curves, which use the attributes of each sample as the coefficients of a Fourier series.

from pandas.plotting import andrews_curves
andrews_curves(df.drop("Id", axis=1), "Species")

 

Data Visualization techniques in Python - Creating Andrews curve in Matplotlib | Data visualization tutorial
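The curve drawn for a single sample can be sketched directly from the Fourier series f(t) = x1/√2 + x2·sin(t) + x3·cos(t) + x4·sin(2t) + x5·cos(2t) + ..., evaluated for t between -π and π. The function name and the sample values below are hypothetical.

```python
import numpy as np

def andrews_curve(row, t):
    """Evaluate x1/sqrt(2) + x2*sin(t) + x3*cos(t) + x4*sin(2t) + x5*cos(2t) + ..."""
    row = np.asarray(row, dtype=float)
    t = np.asarray(t, dtype=float)
    result = np.full_like(t, row[0] / np.sqrt(2))
    for k, coefficient in enumerate(row[1:], start=1):
        harmonic = (k + 1) // 2                  # 1, 1, 2, 2, 3, 3, ...
        wave = np.sin if k % 2 == 1 else np.cos  # sine first, then cosine
        result = result + coefficient * wave(harmonic * t)
    return result

row = [5.1, 3.5, 1.4, 0.2]  # one hypothetical sample with four features
t = np.linspace(-np.pi, np.pi, 200)
curve = andrews_curve(row, t)
print(curve[:3].round(3))
```

Samples with similar feature values produce curves that stay close together, which is why the species tend to show up as bundles of curves.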

Parallel coordinates are another multivariate data visualization technique in pandas. Each feature gets its own vertical axis, and each sample is drawn as a line connecting its values across those axes.

from pandas.plotting import parallel_coordinates
parallel_coordinates(df.drop("Id", axis=1), "Species")

Multivariate visualization technique in pandas - Parallel coordinates | Data Visualization tutorial
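To see what the helper does, the same kind of plot can be sketched with plain Matplotlib: one line per sample, one x position per feature. The three rows, the PetalWidthCm column and the color mapping are hypothetical, and the Agg backend is selected only so that the sketch also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen; not needed inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical rows, one per species.
samples = pd.DataFrame({
    "SepalLengthCm": [5.1, 7.0, 6.3],
    "SepalWidthCm": [3.5, 3.2, 3.3],
    "PetalLengthCm": [1.4, 4.7, 6.0],
    "PetalWidthCm": [0.2, 1.4, 2.5],
    "Species": ["Iris-setosa", "Iris-versicolor", "Iris-virginica"],
})
features = ["SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm"]
colors = {"Iris-setosa": "tab:blue",
          "Iris-versicolor": "tab:orange",
          "Iris-virginica": "tab:green"}

fig, ax = plt.subplots()
x_positions = list(range(len(features)))
for _, row in samples.iterrows():
    # One line per sample, connecting its value on every feature axis.
    ax.plot(x_positions, row[features].to_numpy(dtype=float),
            color=colors[row["Species"]], label=row["Species"])
ax.set_xticks(x_positions)
ax.set_xticklabels(features)
ax.legend()
```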

Radviz is another data visualization technique in pandas used for multivariate plotting. Each feature is placed as an anchor point on a 2D plane, and each sample is positioned as if it were attached to every anchor by a spring whose strength is weighted by the sample's value for that feature.

from pandas.plotting import radviz
radviz(df.drop("Id", axis=1), "Species")

Data visualization technique in pandas - Radviz | Data Visualization in Pandas
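The spring analogy reduces to a weighted average, which can be sketched with NumPy. The function name and the tiny data matrix are hypothetical. The sketch assumes that every feature has more than one distinct value and that no sample sits at the minimum of every feature, since either case would lead to a division by zero.

```python
import numpy as np

def radviz_positions(values):
    """Project the rows of `values` (samples x features) onto the 2D plane."""
    values = np.asarray(values, dtype=float)
    # Scale every feature to [0, 1] so that no feature dominates through its units.
    lowest, highest = values.min(axis=0), values.max(axis=0)
    weights = (values - lowest) / (highest - lowest)
    # One anchor per feature, evenly spaced on the unit circle.
    n_features = values.shape[1]
    angles = 2 * np.pi * np.arange(n_features) / n_features
    anchors = np.column_stack([np.cos(angles), np.sin(angles)])
    # Spring equilibrium: the weighted average of the anchors.
    return weights @ anchors / weights.sum(axis=1, keepdims=True)

# Hypothetical data: three samples, three features.
values = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0],
                   [1.0, 1.0, 1.0]])
positions = radviz_positions(values)
print(positions.round(3))
```

The first sample is pulled only by the first feature and lands on that anchor, while the third sample is pulled equally by all three and ends up in the center.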

These were some of the data visualization techniques applied to the Iris dataset.

Example 2:

Data Visualization dataset: San Francisco Salaries

The very first step is to read the data.

salaries = pd.read_csv('./Salaries.csv')

The .info() method in pandas lists the columns present, along with their types and non-null counts.

salaries.info()

Checking the columns present using info() function in pandas

Converting the pay columns to numeric. With errors='coerce', any value that cannot be parsed as a number becomes NaN.

for col in ['BasePay', 'OvertimePay', 'OtherPay', 'Benefits']:
    salaries[col] = pd.to_numeric(salaries[col], errors='coerce')
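A small sketch of what errors='coerce' does, on a made-up column of strings. The values are hypothetical and only illustrate the behavior.

```python
import pandas as pd

# Hypothetical raw values stored as Python strings.
raw = pd.Series(["1200.50", "0", "Not Provided"], dtype=object)

# Parsable strings become floats; anything else becomes NaN.
pay = pd.to_numeric(raw, errors="coerce")
print(pay)
print("missing after conversion:", int(pay.isna().sum()))
```

Counting the NaN values after the conversion is a cheap check on how many entries could not be parsed.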

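All the pay columns are plotted in one figure. They are first selected by position, from the fourth column up to (but not including) the Year column.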

pay_columns = salaries.columns[3:salaries.columns.get_loc('Year')]
pay_columns

Pay columns in Pandas Data frame Index | Data visualization in Pandas
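The slicing step can be tried on its own with a made-up column index. The column layout below is an assumption for illustration, and the real file may differ.

```python
import pandas as pd

# Hypothetical column layout.
columns = pd.Index(["Id", "EmployeeName", "JobTitle",
                    "BasePay", "OvertimePay", "OtherPay",
                    "Year", "Notes"])

# get_loc returns the integer position of a label.
stop = columns.get_loc("Year")

# Positions 3 up to, but not including, the position of "Year".
selected = columns[3:stop]
print(stop)
print(list(selected))
```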

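The histograms are arranged in a 2×3 figure. The zip idiom below is useful for grouping elements: it splits the pay columns into tuples of three, one tuple per row of the figure.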

pays_arrangement = list(zip(*(iter(pay_columns),) * 3))
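The grouping idiom is easier to follow on a short list. The labels here are hypothetical.

```python
names = ["a", "b", "c", "d", "e", "f", "g"]  # hypothetical labels

# (iter(names),) * 3 repeats the SAME iterator three times, so zip pulls
# three consecutive items from it for every tuple it builds.
groups = list(zip(*(iter(names),) * 3))
print(groups)  # [('a', 'b', 'c'), ('d', 'e', 'f')]
```

Note that zip stops as soon as the iterator runs out, so a leftover element that does not fill a complete group (here "g") is silently dropped.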

The plt.subplots command gives a figure and a 2×3 array of axes.

fig, axes = plt.subplots(2,3)
for i in range(len(pays_arrangement)):
    for j in range(len(pays_arrangement[i])):

        # pass in axes to pandas hist
        salaries[pays_arrangement[i][j]].hist(ax=axes[i,j])

        # axis objects have a lot of methods for customizing the look of a plot
        axes[i,j].set_title(pays_arrangement[i][j])
plt.show()

Building subplots in Pandas | Data Visualization tutorial
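The array of axes returned by plt.subplots can be inspected with random data. This sketch uses made-up values and the off-screen Agg backend, and it walks over the axes with axes.flat as an alternative to the nested loops.

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen; not needed inside a notebook
import matplotlib.pyplot as plt
import numpy as np

fig, axes = plt.subplots(2, 3)
print(axes.shape)  # (2, 3)

rng = np.random.default_rng(0)

# axes.flat visits the six axes row by row.
for index, ax in enumerate(axes.flat):
    ax.hist(rng.normal(size=100))  # hypothetical data
    ax.set_title(f"panel {index}")
```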

To make the plot more readable, the figure height, the figure width, and the spacing between subplots can be adjusted.

fig, axes = plt.subplots(2,3)

# set the figure height and width
fig.set_figheight(5)
fig.set_figwidth(12)

for i in range(len(pays_arrangement)):
    for j in range(len(pays_arrangement[i])):
        # pass in axes to pandas hist
        salaries[pays_arrangement[i][j]].hist(ax=axes[i,j])
        axes[i,j].set_title(pays_arrangement[i][j])
        
# vertical gap between the two rows, as a fraction of the average axes height
plt.subplots_adjust(hspace=1)
# horizontal gap between the columns, as a fraction of the average axes width
plt.subplots_adjust(wspace=1)
plt.show()

Implementing figure height in subplots in Pandas | Data Visualization tutorials

On top of this, the tick labels can be rotated.

# and here is a cleaner version using tick rotation and plot spacing
fig, axes = plt.subplots(2,3)

# set the figure height and width
fig.set_figheight(5)
fig.set_figwidth(12)

for i in range(len(pays_arrangement)):
    for j in range(len(pays_arrangement[i])):
        salaries[pays_arrangement[i][j]].hist(ax=axes[i,j])
        axes[i,j].set_title(pays_arrangement[i][j])
        
        # set xticks with these labels,
        axes[i,j].set_xticklabels(labels=axes[i,j].get_xticks(), 
                                  # with this rotation
                                  rotation=30)
        
plt.subplots_adjust(hspace=1)
plt.subplots_adjust(wspace=1)
plt.show()

Subplots in Pandas | Data Visualization techniques |
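As an alternative sketch, Matplotlib's tick_params can rotate the labels without rewriting their text. The pay values are hypothetical, and the Agg backend is used only so that the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen; not needed inside a notebook
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.hist([1000, 2000, 2000, 3000, 150000])  # hypothetical pay values

# Rotate the x tick labels and leave their text and positions to Matplotlib.
ax.tick_params(axis="x", labelrotation=30)

fig.canvas.draw()
rotations = [label.get_rotation() for label in ax.get_xticklabels()]
print(rotations)
```

Because the labels are not replaced by a fixed list, they stay in sync with the ticks if the axis limits change later.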

Many more plot types are available in the official documentation of Matplotlib and Seaborn. Another widely used library is Plotly, which makes interactive, browser-friendly plots.

Conclusion

In this blog, we covered some of the data visualization techniques that can be performed using Python. EduGrad has a rich set of courses on Python, data visualization, data science, and related topics that can help build the skills needed in the industry.


 

 
