Index of DataFrame in Pandas is like an address, that’s how any data point across the DataFrame or series can be accessed. This is similar to an index that you would see at the end of a book that helps you find content faster.
The DataFrame in Python is labeled as two-dimensional data structures and comprises the main three components – Index, Columns and Data.
Indexing in Pandas helps in selecting particular rows and column of data from a DataFrame. While working with large DataFrames, indexes are quite helpful.
Setting the Index
Let’s understand the index in Pandas by creating a DataFrame.
#importing pandas package
import pandas as pd
# making data frame from dictionary
dict= {'Name':["Joe", "Parth", "Sunny", "Lauren", "Lara"],
'Dept': ["Manager", "Dev", "Admin", "Manager", "Admin"],
'ID':[90, 40, 80, 98, 50],
'Salary':[90000, 65000, 50000, 92000, 50000]}
#Converting dataset dict to a DataFrame
df = pd.DataFrame(dict)
df
#Output
Name Dept ID Salary
0 Joe Manager 90 90000
1 Parth Dev 40 65000
2 Sunny Admin 80 50000
3 Lauren Manager 98 92000
4 Lara Admin 50 50000
The left most column in the DataFrame is auto-generated index column by Pandas to optimize the dataset.
Setting index using DataFrame.index
The auto-generated index starts from 0 but it can manipulate by using DataFrame.index property.
#Manipulating index value from 0 to 1
df.index = [i for i in range(1, len(df.values)+1)]
df
#Output
Name Dept ID Salary
1 Joe Manager 90 90000
2 Parth Dev 40 65000
3 Sunny Admin 80 50000
4 Lauren Manager 98 92000
5 Lara Admin 50 50000
Here, len(df.values) returns the number of rows in current DataFrame.
Instead of auto-generated index, let’s set the ‘ID’ column of DataFrame as an index field.
#Setting ID as an index
df.index = df['ID']
df
#Output
Name Dept ID Salary
ID
90 Joe Manager 90 90000
40 Parth Dev 40 65000
80 Sunny Admin 80 50000
98 Lauren Manager 98 92000
50 Lara Admin 50 50000
Now, the index column name is shown as ID. But as you see we have a duplication, Index and Column[‘ID’] are identical. To solve this, let’s drop the column ID.
#Dropping ID Column
df = df.drop(['ID'], axis = 1)
df
#Output
Name Dept Salary
ID
90 Joe Manager 90000
40 Parth Dev 65000
80 Sunny Admin 50000
98 Lauren Manager 92000
50 Lara Admin 50000
Setting index using set_index() Function
Now, we will set an index for the Python DataFrame using the set_index() method.
set_index() is a built-in method that is used to set the List in Series or DataFrame as an index of a Data Frame. It sets the DataFrame index using existing columns. It can replace the existing index or expand on it.
Syntax
DataFrame.set_index(keys, drop=True, append=False, inplace=False, verify_integrity=False)
Parameters
- keys: Column name or list of a column name.
- drop: It’s a Boolean value which drops the column used for the index if set True.
- append: It appends the column to the existing index column if True.
- inplace: It makes the changes in the DataFrame if True.
- verify_integrity: It checks the new index column for duplicates if True.
import pandas as pd
dict= {'Name':["Joe", "Parth", "Sunny", "Lauren", "Lara"],
'Dept': ["Manager", "Dev", "Admin", "Manager", "Admin"],
'ID':[90, 40, 80, 98, 50],
'Salary':[90000, 65000, 50000, 92000, 50000]}
#Converting dataset dict to a DataFrame
df = pd.DataFrame(dict)
#Setting index as ID
df.set_index('ID', inplace = True, drop = True)
df
#Output
Name Dept Salary
ID
90 Joe Manager 90000
40 Parth Dev 65000
80 Sunny Admin 50000
98 Lauren Manager 92000
50 Lara Admin 50000
In the above code, as inplace parameter is True, current DataFrame is modified. When inplace is set to False, current DataFrame remains unaltered and instead modified DataFrame is returned. The drop = False parameter will keep the ID column in the list, by default the column to be indexed will be drop (drop = True)
MultiIndexing
We can set more than one column can be set as Index. Let’s set columns ‘Dept’ and ‘ID’ as Index.
import pandas as pd
dict= {'Name':["Joe", "Parth", "Sunny", "Lauren", "Lara"],
'Dept': ["Manager", "Dev", "Admin", "Manager", "Admin"],
'ID':[90, 40, 80, 98, 50],
'Salary':[90000, 65000, 50000, 92000, 50000]}
#Converting dataset dict to a DataFrame
df = pd.DataFrame(dict)
#Setting index as ID and Dept
df.set_index(['Dept','ID'], inplace = True, append = True, drop = False )
df
#Output
Name Dept ID Salary
Dept ID
0 Manager 90 Joe Manager 90 90000
1 Dev 40 Parth Dev 40 65000
2 Admin 80 Sunny Admin 80 50000
3 Manager 98 Lauren Manager 98 92000
4 Admin 50 Lara Admin 50 50000
In the above code, drop parameter is used to drop the column and append parameter is used to append passed columns to the already existing index column.
Resetting the Index
Resetting is useful to reset the index of a DataFrame.
The index of DataFrame and Series can be reassigned to the row number starting from 0 using index_reset() function. You may need to re-index in certain sitiations like order of the rows changes after sorting or missing row number after deleting the row.
It also removes the current index and set it as a new column of the DateFrame.
Syntax of reset_index()
DataFrame.reset_index(level=None, drop=False, inplace=False, col_level=0, col_fill=”)
Parameter
level: int, string or a list to select and remove passed column from index.
drop: Boolean value, Adds the replaced index column to the data if False.
inplace: Boolean value, make changes in the original data frame itself if True.
col_level: Select in which column level to insert the labels.
col_fill: Object, to determine how the other levels are named.
import pandas as pd
dict= {'Name':["Joe", "Parth", "Sunny", "Lauren", "Lara"],
'Dept': ["Manager", "Dev", "Admin", "Manager", "Admin"],
'ID':[90, 40, 80, 98, 50],
'Salary':[90000, 65000, 50000, 92000, 50000]}
#Converting dataset dict to a DataFrame
df = pd.DataFrame(dict)
#Setting index as ID
df.set_index(['Dept','ID'], inplace = True, append = True, drop = False )
df
#Before reset_index
Name Dept ID Salary
Dept ID
0 Manager 90 Joe Manager 90 90000
1 Dev 40 Parth Dev 40 65000
2 Admin 80 Sunny Admin 80 50000
3 Manager 98 Lauren Manager 98 92000
4 Admin 50 Lara Admin 50 50000
#Remove index with default index
df.reset_index(drop = True, inplace = True)
df
#Output after reset index
Name Dept ID Salary
0 Joe Manager 90 90000
1 Parth Dev 40 65000
2 Sunny Admin 80 50000
3 Lauren Manager 98 92000
4 Lara Admin 50 50000
In the above example, Index is set to default and Department & ID has been reset to a column.
Sort Index
The sort_index() function is used to sort DataFrame/Series by index labels.
Syntax
DataFrame.sort_index(axis=0, level=None, ascending=True, inplace=False, kind=’quicksort’, na_position=’last’, sort_remaining=True, by=None)
Parameter
axis: index, columns to direct sorting
level: if not None, sort on values in specified index level(s)
ascending: Sort ascending vs. descending
inplace: if True, perform operation in-place
kind: {‘quicksort’, ‘mergesort’, ‘heapsort’}, default ‘quicksort’. Choice of sorting algorithm. However, it is only applied when sorting on a single column or label of a DataFrame.
na_position: [{‘first’, ‘last’}, default ‘last’] First puts NaNs at the beginning, last puts NaNs at the end. Not implemented for MultiIndex.
sort_remaining: If true and sorting by level and index is multilevel, sort by other levels too (in order) after sorting by specified level
import pandas as pd
dict= {'Name':["Joe", "Parth", "Sunny", "Lauren", "Lara"],
'Dept': ["Manager", "Dev", "Admin", "Manager", "Admin"],
'ID':[90, 40, 80, 98, 50],
'Salary':[90000, 65000, 50000, 92000, 50000]}
#Converting dataset dict to a DataFrame
df = pd.DataFrame(dict)
#Sort the DataFrame by row labels
df.sort_index(axis = 0)
#Output
Name Dept ID Salary
0 Joe Manager 90 90000
1 Parth Dev 40 65000
2 Sunny Admin 80 50000
3 Lauren Manager 98 92000
4 Lara Admin 50 50000
#Sort the DataFrame by descending order
df.sort_index(ascending = False)
#Output
Name Dept ID Salary
4 Lara Admin 50 50000
3 Lauren Manager 98 92000
2 Sunny Admin 80 50000
1 Parth Dev 40 65000
0 Joe Manager 90 90000
#Sort the DataFrame by column labels
df.sort_index(axis = 1)
#Output
Dept ID Name Salary
0 Manager 90 Joe 90000
1 Dev 40 Parth 65000
2 Admin 80 Sunny 50000
3 Manager 98 Lauren 92000
4 Admin 50 Lara 50000
Index are sorted in ascending order by default. However, you can also sort Index in descending order by setting the argument ascending to False.
Summary
In this post we have covered various features of indexing like setting an index, resetting an index and sorting an index.
C P Gupta is a YouTuber and Blogger. He is expert in Microsoft Word, Excel and PowerPoint. His YouTube channel @pickupbrain is very popular and has crossed 9.9 Million Views.