How to Drop Missing Values in Pandas: The dropna() Method

Pandas is an open-source Python library used to manipulate, visualize, and preprocess datasets. Its rich set of methods has made it very popular among data analysts. One of the first things to handle after reading data is missing values, and one of the simplest ways to deal with them is to drop them. Here we will learn how to drop missing values from a pandas DataFrame using the dropna() method, which by default drops every row that contains a null value.

How to Drop Missing Values in Pandas DataFrame?

There are many ways to handle missing values in a pandas DataFrame. One of the simplest is to drop them with the dropna() method. Note that dropna() does not remove individual cells: it removes the whole row or the whole column that contains a missing value. The axis parameter chooses between the two, and it defaults to rows.

The basic syntax of the dropna() method in pandas is given below:

df.dropna()

The dropna() method takes various parameters which help us to drop the rows, columns, or all missing values from the dataset.
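As a rough illustration of those parameters, the sketch below applies how, thresh, and subset to a small toy DataFrame. The toy data and variable names are assumptions made for this example and are not part of the house price dataset.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical, only for illustration)
toy = pd.DataFrame({
    "a": [1.0, np.nan, np.nan, 4.0],
    "b": [1.0, 2.0, np.nan, np.nan],
    "c": [1.0, 2.0, np.nan, np.nan],
})

# Default (how="any"): drop a row if at least one value is missing
any_missing = toy.dropna()

# how="all": drop a row only if every value in it is missing
all_missing = toy.dropna(how="all")

# thresh=2: keep only rows that have at least 2 non-missing values
at_least_two = toy.dropna(thresh=2)

# subset=["a"]: only look at column "a" when deciding which rows to drop
only_a = toy.dropna(subset=["a"])

print(list(any_missing.index))
print(list(all_missing.index))
print(list(at_least_two.index))
print(list(only_a.index))
```

These options are useful when dropping every row with any missing value would throw away too much data.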

Let us now jump into the implementation of the dropna() method and see how to drop missing values in Pandas.

Loading Dataset in Pandas

Pandas provides many useful methods to read and load datasets. In this case, we will load a CSV file that contains a dataset of house prices in Dushanbe city.

# import pandas 
import pandas as pd

data = pd.read_csv("house.csv")

data.head()

Output:

[Image: output of data.head()]

As you can see, there are many null values showing in the dataset already. We will now drop them from our dataset.
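To see where these null values come from, here is a minimal sketch that reads a tiny in-memory CSV instead of house.csv. The column names match the dataset, but the two rows are made-up values used only for illustration. Empty fields in the CSV are read as NaN.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for house.csv; the values are made up
csv_text = """number_of_rooms,floor,area,latitude,longitude,price
2,3,58.0,38.56,68.79,45000
3,1,72.5,,,61000
"""

# Empty fields (the two consecutive commas) become NaN in the DataFrame
sample = pd.read_csv(io.StringIO(csv_text))

print(sample)
print(sample.isnull().sum())
```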

How to Find the Total Number of Missing Values in Each Column?

Pandas provides many useful methods for data analysis. We can easily find the total number of missing values in each column by combining the isnull() and sum() methods. The isnull() method returns a DataFrame of the same shape filled with boolean values, and sum() adds them up, counting True as 1 and False as 0.

Let us first call the isnull() method on its own and see how it marks the missing values:

# flag missing values (True where a value is missing)
data.isnull()

Output:

number_of_rooms	floor	area	latitude	longitude	price
0	False	False	False	False	False	False
1	False	False	False	False	False	False
2	False	False	False	True	True	False
3	False	False	False	False	False	False
4	False	False	False	False	False	False
...	...	...	...	...	...	...
5574	False	False	False	True	True	False
5575	False	False	False	True	True	False
5576	False	False	False	True	True	False
5577	False	False	False	True	True	False
5578	False	False	False	True	True	False

Notice that every missing value is marked as True. Chaining sum() onto isnull() adds up these True values column by column, which gives the total number of missing values in each column.

# total number of missing values
data.isnull().sum()

Output:

number_of_rooms       0
floor                 0
area                  0
latitude           1849
longitude          1849
price                 0
dtype: int64

As you can see, two columns, latitude and longitude, have 1849 missing values each. Let us now drop these values.
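Before dropping anything, it can also help to look at the share of missing values rather than the raw count. The sketch below, which uses a made-up toy DataFrame rather than the house price data, shows one possible way to do this with isnull().mean().

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical values, only for illustration)
toy = pd.DataFrame({
    "area": [50.0, 60.0, 70.0, 80.0],
    "latitude": [38.5, np.nan, np.nan, 38.6],
})

# sum() counts True as 1, so this is the number of missing values per column
missing_count = toy.isnull().sum()

# mean() of booleans gives the fraction of missing values per column
missing_share = toy.isnull().mean()

print(missing_count)
print(missing_share)
```

A column with a large share of missing values may be a better candidate for dropping than the rows that contain them.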

Drop Missing Values in Pandas

First, we will drop every row that contains a missing value, which is what the dropna() method does with axis=0.

# work on a copy so the original data stays untouched
df = data.copy()

# drop every row that contains a missing value
df.dropna(axis=0, inplace=True)

# total missing values per column
df.isnull().sum()

Output:

number_of_rooms    0
floor              0
area               0
latitude           0
longitude          0
price              0
dtype: int64

  • axis=0: Drop each row that contains a missing value. This is the default.
  • inplace=True: Modify df directly instead of returning a new DataFrame.
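The difference that inplace makes can be sketched on a small toy DataFrame. The data and variable names below are assumptions made for this example.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical, only for illustration)
toy = pd.DataFrame({"x": [1.0, np.nan, 3.0], "y": [4.0, 5.0, 6.0]})
rows_before = len(toy)

# Without inplace, dropna() returns a new DataFrame and leaves toy unchanged
cleaned = toy.dropna(axis=0)
rows_after_default = len(toy)

# With inplace=True, toy itself is modified
toy.dropna(axis=0, inplace=True)
rows_after_inplace = len(toy)

print(rows_before, len(cleaned), rows_after_default, rows_after_inplace)
```

Assigning the result to a new variable, as in the first call, is often easier to follow because the original DataFrame stays available for comparison.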

If you want to drop the columns that contain null values instead, change axis to 1 as shown below:

# work on a copy so the original data stays untouched
df = data.copy()

# drop every column that contains a missing value
df.dropna(axis=1, inplace=True)

# total missing values per column
df.isnull().sum()

Output:

number_of_rooms    0
floor              0
area               0
price              0
dtype: int64

As you can see, the columns that had null values (latitude and longitude) are no longer part of df, because we dropped them in place. The original data is unchanged, since we worked on a copy.
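Dropping rows and dropping columns lose different information, so it can be worth comparing the resulting shapes before choosing one. The following is a minimal sketch on made-up toy data, not the house price dataset.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical values, only for illustration)
toy = pd.DataFrame({
    "area": [50.0, 60.0, 70.0, 80.0],
    "latitude": [38.5, np.nan, np.nan, 38.6],
    "price": [40000, 52000, 61000, 75000],
})

rows_dropped = toy.dropna(axis=0)  # fewer rows, all columns kept
cols_dropped = toy.dropna(axis=1)  # all rows kept, fewer columns

# prints: (4, 3) (2, 3) (4, 2)
print(toy.shape, rows_dropped.shape, cols_dropped.shape)
```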

Conclusion

The DataFrame.dropna() method is used to drop missing values from a DataFrame in pandas. It can drop either the rows or the columns that contain missing values, depending on the axis parameter. In this article, we used an example dataset to see how to drop missing values from a pandas DataFrame.
