Pandas is an open-source library built on top of NumPy. It is popular mainly because it makes importing and analyzing data much easier. Pandas is also fast and gives users both high performance and high productivity.
It provides many functions and methods to expedite the data analysis process.
In this post, I will explain nine pandas DataFrame attributes and methods with examples. Some of them are so common that I'm sure you have used them before, and some might be new to you. All of them will add value to your data analysis process.
Shape Function
Although it is often called a function, shape is an attribute, so it is accessed without parentheses. It returns the number of rows and the number of columns of the DataFrame as a tuple (rows, columns).
#initialize a dataframe
import pandas as pd
df = pd.read_csv('CompleteCricketData.csv')
#getting dataframe shape
df.shape
#Output
(92852, 17)
#Create an empty df
empty_df = pd.DataFrame()
#Shape of Empty DataFrame
empty_df.shape
#Output
(0, 0)
We can also get just the number of rows or just the number of columns by indexing into the shape tuple.
#Number of rows through index
df.shape[0]
#Number of columns through index
df.shape[1]
#Output
92852
17
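Because shape is an ordinary tuple, it can also be unpacked into two variables in one step. Here is a minimal sketch of that idea. The small `demo` DataFrame and its column names are made up for illustration and are not part of the cricket dataset used above.

```python
import pandas as pd

# Small illustrative frame (hypothetical data, not the cricket CSV)
demo = pd.DataFrame({"player": ["A", "B", "C"], "runs": [10, 25, 7]})

# shape is an attribute, so there are no parentheses
n_rows, n_cols = demo.shape
print(n_rows, n_cols)  # 3 2

# len() of a DataFrame gives the same row count as shape[0]
print(len(demo) == n_rows)  # True
```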
Size Function
The size attribute returns the number of elements in the DataFrame, which is simply the number of rows times the number of columns.
# Getting size of the dataframe
df.size
#Output
1578484
#Create an empty df
empty_df = pd.DataFrame()
#Size of Empty DataFrame
empty_df.size
#Output
0
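To make the "rows times columns" relationship concrete, here is a short sketch that checks it directly. The `demo` DataFrame is a made-up example. Note that size counts every cell, so missing values are included in the total.

```python
import pandas as pd

# Hypothetical 3 x 2 frame for illustration
demo = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, None, 6.0]})

n_rows, n_cols = demo.shape
total = demo.size

# size equals rows * columns, and the missing cell is still counted
print(total, n_rows * n_cols)  # 6 6

# For a Series, size is just the number of elements
series_size = demo["a"].size
print(series_size)  # 3
```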
ndim Function
The ndim attribute returns an int representing the number of axes (array dimensions). Like shape and size, it is accessed without parentheses.
It returns 1 for a Series and 2 for a DataFrame.
#Creating a dataframe
import pandas as pd
data1 = {'Name': ['Sameer', 'Jay'], 'Salary': [10000, 15000]}
example1 =pd.DataFrame(data1)
#using ndim function
example1.ndim
#Output
2
#Creating a series
import pandas as pd
example2 = pd.Series({'Sameer': 20000, 'Jay': 30000})
#using ndim function
example2.ndim
#Output
1
This attribute is very useful when you are working with multiple datasets and you don't know which one is a Series and which one is a DataFrame. Checking ndim is much cheaper than printing the entire dataset to find out.
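As a rough sketch of that use case, the helper below tells the two kinds of object apart using only ndim. The function name `describe_kind` is my own invention for this example and is not part of pandas.

```python
import pandas as pd


def describe_kind(obj):
    """Return 'Series' or 'DataFrame' based on ndim (hypothetical helper)."""
    if obj.ndim == 1:
        return "Series"
    if obj.ndim == 2:
        return "DataFrame"
    raise ValueError(f"unexpected number of dimensions: {obj.ndim}")


frame = pd.DataFrame({"Name": ["Sameer", "Jay"], "Salary": [10000, 15000]})
column = frame["Salary"]  # selecting one column gives a Series

print(describe_kind(frame))   # DataFrame
print(describe_kind(column))  # Series
```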
dtypes Function
The dtypes attribute returns the data type of each column in the DataFrame.
#creating a dataframe
import pandas as pd
dict1 = {'Name':['Nike','Sameer','Olivia','Allison','Mike'],
'Gender':["M","M","F","F","M"],
'Salary':[60000,45000,50000,70000,35000],
'Age':[43,28,36,55,32],
'Weight':[55.8, 78.5, 49.9, 65.3, 70.1]}
ex1 = pd.DataFrame(dict1)
#Finding datatypes of all columns
ex1.dtypes
#Output
Name object
Gender object
Salary int64
Age int64
Weight float64
dtype: object
#Finding datatype of 'Salary' Column
ex1['Salary'].dtypes
#Output
dtype('int64')
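The result of dtypes on a DataFrame is itself a Series, with column names as the index and dtypes as the values, so it can be looped over like any other Series. The sketch below uses that to collect the names of the float columns. The `demo` DataFrame is a cut-down, made-up version of the one above.

```python
import pandas as pd

demo = pd.DataFrame({
    "Name": ["Nike", "Sameer"],
    "Salary": [60000, 45000],
    "Weight": [55.8, 78.5],
})

# dtypes is a Series: index = column names, values = dtype objects
col_types = demo.dtypes

# Collect the names of the columns whose dtype is a float type
float_cols = [
    name for name, dtype in col_types.items()
    if pd.api.types.is_float_dtype(dtype)
]
print(float_cols)  # ['Weight']
```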
select_dtypes Function
The select_dtypes() method returns a subset of the DataFrame's columns based on the column dtypes. It lets you include or exclude certain data types through the include and exclude parameters.
Syntax: df.select_dtypes(include=None, exclude=None)
Parameters: include and exclude each take a dtype, a string, or a list of these, naming the types to be included or excluded. At least one of these parameters must be supplied.
ValueError:
- If both of include and exclude are empty
- If include and exclude have overlapping elements
- If any kind of string dtype is passed in.
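The first two error conditions are easy to reproduce. The sketch below triggers each one on a small made-up DataFrame and records which ones raised ValueError. I have left out the string dtype case because the exact behavior there may differ between pandas versions.

```python
import pandas as pd

demo = pd.DataFrame({"Salary": [60000, 45000], "Weight": [55.8, 78.5]})

errors = []

# Case 1: neither include nor exclude is supplied
try:
    demo.select_dtypes()
except ValueError:
    errors.append("empty")

# Case 2: the same dtype appears in both include and exclude
try:
    demo.select_dtypes(include="int64", exclude="int64")
except ValueError:
    errors.append("overlap")

print(errors)  # ['empty', 'overlap']
```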
#initialize a dataframe
import pandas as pd
dict1 = {'Name':['Nike','Sameer','Olivia','Allison','Mike'],
'Gender':["M","M","F","F","M"],
'Salary':[60000,45000,50000,70000,35000],
'Age':[43,28,36,55,32],
"Weight":[55.8,78.5,49.9,65.3,70.1]}
ex1 = pd.DataFrame(dict1)
# Select all columns having integer datatype
ex1.select_dtypes(include ='int64')
#Output
Salary Age
0 60000 43
1 45000 28
2 50000 36
3 70000 55
4 35000 32
# Select all columns having float datatype
ex1.select_dtypes(include ='float64')
#Output
Weight
0 55.8
1 78.5
2 49.9
3 65.3
4 70.1
#Select all columns except integer based
ex1.select_dtypes(exclude ='int64')
#Output
Name Gender Weight
0 Nike M 55.8
1 Sameer M 78.5
2 Olivia F 49.9
3 Allison F 65.3
4 Mike M 70.1
# Including float and excluding integer based columns
ex1.select_dtypes(include ='float',exclude ='int')
#Output
Weight
0 55.8
1 78.5
2 49.9
3 65.3
4 70.1
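Instead of naming int64 and float64 separately, select_dtypes also accepts the string 'number' to mean all numeric types, and it accepts a list when you want several dtypes at once. Here is a small sketch on a made-up DataFrame:

```python
import pandas as pd

demo = pd.DataFrame({
    "Name": ["Nike", "Sameer"],
    "Salary": [60000, 45000],
    "Weight": [55.8, 78.5],
})

# 'number' matches every numeric dtype (integers and floats alike)
numeric = demo.select_dtypes(include="number")
print(list(numeric.columns))  # ['Salary', 'Weight']

# Excluding 'number' leaves only the non-numeric columns
non_numeric = demo.select_dtypes(exclude="number")
print(list(non_numeric.columns))  # ['Name']

# A list selects several dtypes in one call
listed = demo.select_dtypes(include=["int64", "float64"])
print(list(listed.columns))  # ['Salary', 'Weight']
```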
values Function
The values attribute returns a NumPy representation of the DataFrame. It returns only the values in the DataFrame; the axes labels are removed.
#Getting values of dataframe
ex1.values
#Output
array([['Nike', 'M', 60000, 43, 55.8],
['Sameer', 'M', 45000, 28, 78.5],
['Olivia', 'F', 50000, 36, 49.9],
['Allison', 'F', 70000, 55, 65.3],
['Mike', 'M', 35000, 32, 70.1]], dtype=object)
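The pandas documentation recommends the to_numpy() method as the preferred way to get the same array. The sketch below compares the two on a made-up, numeric-only DataFrame. It also shows that when columns have different numeric types, the resulting array uses one common dtype.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one integer and one float column
demo = pd.DataFrame({"Age": [43, 28], "Weight": [55.8, 78.5]})

arr = demo.to_numpy()

# Both approaches give the same array, without row or column labels
print(np.array_equal(arr, demo.values))  # True
print(arr.shape)  # (2, 2)

# The int and float columns are combined into a single float64 array
print(arr.dtype)  # float64
```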
Axes Function
The axes attribute returns a list whose only two members are the row axis labels and the column axis labels of the DataFrame, in that order.
#Using axes
ex1.axes
#Output
[RangeIndex(start=0, stop=5, step=1),
Index(['Name', 'Gender', 'Salary', 'Age', 'Weight'], dtype='object')]
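Since the list always has exactly two members for a DataFrame, it can be unpacked into row labels and column labels. This sketch, on a made-up DataFrame, shows that they match what the index and columns attributes return.

```python
import pandas as pd

demo = pd.DataFrame({"Name": ["Nike", "Sameer"], "Age": [43, 28]})

# axes[0] holds the row labels, axes[1] holds the column labels
row_labels, col_labels = demo.axes

print(row_labels.equals(demo.index))    # True
print(col_labels.equals(demo.columns))  # True
print(list(col_labels))                 # ['Name', 'Age']
```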
Empty Function
The empty attribute shows whether a DataFrame is empty or not. It returns a boolean and is accessed without parentheses.
If it returns True, the DataFrame is entirely empty (no items), meaning at least one of its axes has length 0. If it returns False, the DataFrame is not empty.
#creating a dataframe
import pandas as pd
dict1 = {'Name':['Nike','Sameer','Olivia','Allison','Mike'],
'Gender':["M","M","F","F","M"],
'Salary':[60000,45000,50000,70000,35000],
'Age':[43,28,36,55,32],
'Weight':[55.8, 78.5, 49.9, 65.3, 70.1]}
ex1 = pd.DataFrame(dict1)
#Checking the dataframe
ex1.empty
#Output
False
#Creating an empty dataframe
empty_df = pd.DataFrame()
#Checking the dataframe
empty_df.empty
#Output
True
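One case that often surprises people: a DataFrame that contains only NaN values is still not considered empty, because its axes have a non-zero length. The sketch below uses made-up frames to show this, together with a frame that has column names but no rows.

```python
import numpy as np
import pandas as pd

# A frame holding nothing but a missing value still has one row
nan_df = pd.DataFrame({"A": [np.nan]})
print(nan_df.empty)  # False

# After dropping the NaN row there are no rows left
print(nan_df.dropna().empty)  # True

# Columns without any rows also count as empty
cols_only = pd.DataFrame(columns=["A", "B"])
print(cols_only.empty)  # True
```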
Transpose Function
The T attribute, which is shorthand for the transpose() method, converts rows into columns and columns into rows.
#Converting row into column and column into rows
ex1.T
#Output
0 1 2 3 4
Name Nike Sameer Olivia Allison Mike
Gender M M F F M
Salary 60000 45000 50000 70000 35000
Age 43 28 36 55 32
Weight 55.8 78.5 49.9 65.3 70.1
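A quick way to confirm what the transpose did is to compare shapes and labels before and after. Here is a small sketch on a made-up DataFrame. Keep in mind that when the original columns have different types, the transposed columns generally end up with the generic object dtype.

```python
import pandas as pd

demo = pd.DataFrame({
    "Name": ["Nike", "Sameer", "Olivia"],
    "Age": [43, 28, 36],
})

flipped = demo.T

# The shape is reversed: 3 rows x 2 columns becomes 2 rows x 3 columns
print(demo.shape, flipped.shape)  # (3, 2) (2, 3)

# The old column names become the new row labels
print(list(flipped.index))  # ['Name', 'Age']

# Transposing twice brings back the original layout
print(flipped.T.shape == demo.shape)  # True
```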
In this tutorial, we learned how to use shape, size, ndim, dtypes, select_dtypes, values, axes, empty and T (transpose) on a DataFrame.
C P Gupta is a YouTuber and blogger. He is an expert in Microsoft Word, Excel and PowerPoint. His YouTube channel @pickupbrain is very popular and has crossed 9.9 million views.