The Sklearn LabelEncoder converts categorical values to numeric values so that machine learning models can work with the data and find hidden patterns. It is one of several ways to encode categorical data. In this short article, we will learn how the Sklearn LabelEncoder works through various examples. We will also compare the Sklearn LabelEncoder with the Sklearn one-hot encoder.
What Is the Sklearn Module?
Sklearn, also known as Scikit-learn, is one of the most widely used libraries for machine learning in Python. It contains efficient tools for machine learning and statistical modeling, including classification (KNN, SVM, decision trees), regression (linear regression, random forest), clustering (k-means), anomaly detection (isolation forest), and dimensionality reduction (PCA). It is built on Python's numerical and scientific libraries, NumPy and SciPy.
More importantly, it provides various utilities for preparing data, including data splitting, data encoding, and more. In this article, we will focus on only one encoding method, which is label encoding.
How Does Sklearn LabelEncoder Work?
The Sklearn LabelEncoder converts labels or categories into integers so that the machine learning model can work with the dataset. Each distinct category is assigned its own integer between 0 and n_classes - 1, and every occurrence of that category is replaced by the same value.
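The mapping described above can be sketched on a small list. This is a minimal illustration, not the article's dataset: the category values in `statuses` are hypothetical, and the variable names are chosen only for this example.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical category values, used only for illustration
statuses = ["Single", "Married", "Divorced", "Married", "Single"]

encoder = LabelEncoder()
encoded = encoder.fit_transform(statuses)

# classes_ holds the sorted unique categories; a category's position is its code
mapping = {str(label): code for code, label in enumerate(encoder.classes_)}
print(mapping)           # {'Divorced': 0, 'Married': 1, 'Single': 2}
print(encoded.tolist())  # [2, 1, 0, 1, 2]

# inverse_transform maps the integers back to the original categories
decoded = encoder.inverse_transform(encoded)
print(decoded.tolist())
```

Because the fitted encoder stores its categories in `classes_`, the same object can be reused later to encode new data consistently or to decode predictions.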
Examples of Sklearn LabelEncoder
Now, we will work through two examples of the Sklearn LabelEncoder. Here is what we are going to do in this section:
- Sklearn label encoding on one column
- Sklearn label encoding on multiple columns
Example 1: Sklearn Label Encoding on One Column
Let us first import the dataset and then use the sklearn label encoding to convert categorical values to numeric ones.
    # importing pandas
    import pandas as pd

    # importing dataset
    data = pd.read_excel('Label_Encoding.xlsx')

    # heading of data
    data.head()
The data contains categorical values. Now, we will use the Sklearn LabelEncoder to convert these values into numeric values.
    # import the sklearn preprocessing module
    from sklearn import preprocessing

    # initializing sklearn labelencoder
    label_encoder = preprocessing.LabelEncoder()

    # encoding the Marrige_Status column
    data['Marrige_Status'] = label_encoder.fit_transform(data['Marrige_Status'])

    # printing the unique encoded values
    data['Marrige_Status'].unique()
As you can see, there are only numeric values in the output column.
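The integers produced by label encoding carry an implied order (0 < 1 < 2) that the categories themselves may not have. One-hot encoding avoids this by creating a separate 0/1 column for each category. The sketch below compares the two on a small hypothetical column; the values are made up for illustration and are not taken from the Excel file used above.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Hypothetical stand-in for the article's column; the values are assumptions
toy = pd.DataFrame({"Marrige_Status": ["Single", "Married", "Divorced", "Married"]})

# Label encoding: a single column of integers
label_codes = LabelEncoder().fit_transform(toy["Marrige_Status"])
print(label_codes.tolist())  # [2, 1, 0, 1]

# One-hot encoding: one 0/1 column per category (expects 2D input)
one_hot_encoder = OneHotEncoder()
one_hot = one_hot_encoder.fit_transform(toy[["Marrige_Status"]]).toarray()
print(one_hot_encoder.categories_[0].tolist())  # ['Divorced', 'Married', 'Single']
print(one_hot.tolist())
```

Label encoding keeps the data compact, while one-hot encoding adds one column per category but does not suggest that one category is "greater" than another. Which one fits better depends on the model and on whether the categories have a natural order.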
Example 2: Sklearn Label Encoding Multiple Columns
Encoding multiple columns in Sklearn is very similar to encoding a single column. We just need to specify the names of all the columns.
    # importing the dataset
    df = pd.read_csv("Placement_Data_Full_Class.csv")

    # data frame
    df.head()
As you can see, there are many columns with categorical values. Let us now apply the Sklearn labelencoder to convert the data into numeric values.
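    # sklearn preprocessing module (already imported in Example 1)
    from sklearn import preprocessing

    # multiple columns
    cols = ['workex', 'status', 'hsc_s', 'degree_t']

    # sklearn labelencoder, fitted on each column separately
    df[cols] = df[cols].apply(preprocessing.LabelEncoder().fit_transform)

    # print
    df.head()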
As you can see, we get the encoded dataset.
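One limitation of the apply call above is that no fitted encoder is kept per column, so the integer codes cannot easily be mapped back to the original categories. A possible alternative, sketched here on a small hypothetical frame (the values in `df_demo` are made up and only borrow two column names from the dataset), is to store one encoder per column in a dictionary.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical stand-in for the placement dataset; the values are assumptions
df_demo = pd.DataFrame({
    "workex": ["No", "Yes", "No"],
    "status": ["Placed", "Not Placed", "Placed"],
})
cols = ["workex", "status"]

# Fit and store a separate encoder for each column
encoders = {}
for col in cols:
    encoders[col] = LabelEncoder()
    df_demo[col] = encoders[col].fit_transform(df_demo[col])

print(df_demo["status"].tolist())  # [1, 0, 1]

# Each stored encoder can decode its own column
restored = encoders["status"].inverse_transform(df_demo["status"])
print(restored.tolist())
```

Keeping the encoders around also makes it possible to apply the same mapping to new data with `transform` instead of refitting.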
The Sklearn LabelEncoder assigns each category an integer based on the sorted order of the categories, which is alphabetical for strings. In this short article, we learned how to use the Sklearn LabelEncoder to convert categorical values to numeric ones. If you have any specific questions about label encoding in Sklearn, please let us know in the comments.