PickupBrain

Commonly Used Functions in Pandas

Pandas is an open-source library built on top of NumPy. It is popular mainly because it makes importing and analyzing data much easier. Pandas is also fast, offering high performance and productivity for its users.

It provides many functions and methods to expedite the data analysis process.

In this post, I will explain nine pandas functions and attributes with examples. Some of them are so common that I'm sure you have used them before, and some might be new to you. Either way, all of them will add value to your data analysis process.

Shape Function

shape is an attribute (used without parentheses) that gives the number of rows and columns of a DataFrame. It returns a tuple of the form (rows, columns).

#initialize a dataframe
import pandas as pd
df = pd.read_csv('CompleteCricketData.csv')

#getting dataframe shape
df.shape

#Output
(92852, 17)

#Create an empty df
empty_df = pd.DataFrame()

#Shape of Empty DataFrame
empty_df.shape

#Output
(0, 0)

We can also get just the number of rows or just the number of columns by indexing into the shape tuple.

#Number of rows through index
df.shape[0]

#Number of columns through index
df.shape[1]

#Output
92852
17
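Since shape is an ordinary tuple, it can also be unpacked in one step. Here is a minimal sketch of that idea; the small frame and its column names are made up for illustration and are not taken from the cricket CSV.

```python
import pandas as pd

# Small illustrative frame (hypothetical data, not the cricket dataset)
df_small = pd.DataFrame({"runs": [45, 102, 7], "wickets": [0, 1, 3]})

# Unpack the (rows, columns) tuple into two variables
n_rows, n_cols = df_small.shape
print(n_rows, n_cols)

# len() on a DataFrame also gives the number of rows
print(len(df_small) == n_rows)
```

Unpacking reads a little cleaner than shape[0] and shape[1] when you need both numbers.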


Size Function

The size of a DataFrame is the number of elements in it, which is simply the number of rows times the number of columns.

# Getting size of the dataframe
df.size

#Output
1578484

#Create an empty df
empty_df = pd.DataFrame()

#Size of Empty DataFrame
empty_df.size

#Output
0
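To make the "rows times columns" relationship concrete, here is a small sketch on made-up data. The frame and its column names are assumptions for illustration only. It also shows that size counts missing values, while count() does not.

```python
import pandas as pd

# Hypothetical frame with one missing value in column "b"
frame = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, None, 6.0]})

rows, cols = frame.shape

# size is rows * columns, and missing values are included in the count
print(frame.size == rows * cols)

# count() only counts non-missing cells, so the total is smaller here
non_missing = int(frame.count().sum())
print(frame.size, non_missing)

# For a Series, size is simply the number of elements
print(frame["a"].size)
```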

ndim Function

ndim is an attribute that returns an int representing the number of axes (array dimensions).

It returns 1 for a Series and 2 for a DataFrame.

#Creating a dataframe 
import pandas as pd
data1 = {'Name': ['Sameer', 'Jay'], 'Salary': [10000, 15000]}
example1 =pd.DataFrame(data1)

#using ndim function
example1.ndim

#Output
>>>2

#Creating a series
import pandas as pd
example2 = pd.Series({'Sameer': 20000, 'Jay': 30000})

#using ndim function
example2.ndim

#Output
>>>1

This attribute is useful when you are working with multiple datasets and don't know which ones are Series and which are DataFrames. Checking ndim tells you without having to print the entire dataset.
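One way to use ndim for this is a small helper that branches on it. This is only a sketch: describe_kind is a hypothetical name I made up, not a pandas function.

```python
import pandas as pd

def describe_kind(obj):
    """Return 'Series' for 1-D pandas objects and 'DataFrame' for 2-D ones."""
    if obj.ndim == 1:
        return "Series"
    if obj.ndim == 2:
        return "DataFrame"
    raise ValueError(f"unexpected number of dimensions: {obj.ndim}")

salaries = pd.Series({"Sameer": 20000, "Jay": 30000})
staff = pd.DataFrame({"Name": ["Sameer", "Jay"], "Salary": [10000, 15000]})

print(describe_kind(salaries))
print(describe_kind(staff))

# Selecting a single column of a DataFrame gives back a Series
print(describe_kind(staff["Salary"]))
```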

dtypes Function

The dtypes attribute returns the data type of each column in the DataFrame.

#creating a dataframe
import pandas as pd
dict1 = {'Name':['Nike','Sameer','Olivia','Allison','Mike'],
     'Gender':["M","M","F","F","M"],      
    'Salary':[60000,45000,50000,70000,35000],
    'Age':[43,28,36,55,32],
    'Weight':[55.8, 78.5, 49.9, 65.3, 70.1]}
ex1 = pd.DataFrame(dict1)

#Finding datatypes of all column
ex1.dtypes

#Output
Name       object
Gender     object
Salary      int64
Age         int64
Weight    float64
dtype: object


#Finding datatype of 'Salary' Column
ex1['Salary'].dtypes

#Output
dtype('int64')
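Because dtypes is itself a Series (column name mapped to data type), you can loop over it. The sketch below collects the numeric column names using is_numeric_dtype from pandas.api.types; the sample frame is a shortened, made-up version of the one above.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Shortened illustrative frame
sample = pd.DataFrame({
    "Name": ["Nike", "Sameer"],
    "Salary": [60000, 45000],
    "Weight": [55.8, 78.5],
})

# dtypes.items() yields (column name, dtype) pairs
numeric_cols = [col for col, dt in sample.dtypes.items() if is_numeric_dtype(dt)]
print(numeric_cols)
```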

select_dtypes Function

The select_dtypes() function returns a subset of the DataFrame's columns based on their dtypes. You choose which data types to keep or drop with the include and exclude parameters.

Syntax: df.select_dtypes(include=None, exclude=None)

Parameters: include and exclude each take a dtype, a string, or a list of them. At least one of the two parameters must be supplied.

Raises ValueError:

  • If both include and exclude are empty
  • If include and exclude have overlapping elements
  • If any kind of string dtype is passed in

#initialize a dataframe
import pandas as pd
dict1 = {'Name':['Nike','Sameer','Olivia','Allison','Mike'],
     'Gender':["M","M","F","F","M"],      
    'Salary':[60000,45000,50000,70000,35000],
    'Age':[43,28,36,55,32],
    "Weight":[55.8,78.5,49.9,65.3,70.1]}
ex1 = pd.DataFrame(dict1)

# Select all columns having integer datatype 
ex1.select_dtypes(include ='int64')

#Output
	Salary	Age
0	60000	43
1	45000	28
2	50000	36
3	70000	55
4	35000	32

# Select all columns having float datatype 
ex1.select_dtypes(include ='float64')

#Output
	Weight
0	55.8
1	78.5
2	49.9
3	65.3
4	70.1

#Select all columns except integer based 
ex1.select_dtypes(exclude ='int64')

#Output
        Name  Gender   Weight
0	Nike	M	55.8
1	Sameer	M	78.5
2	Olivia	F	49.9
3	Allison	F	65.3
4	Mike	M	70.1

# Including float and excluding integer based columns
ex1.select_dtypes(include ='float',exclude ='int')

#Output
	Weight
0	55.8
1	78.5
2	49.9
3	65.3
4	70.1
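Two more variations are worth a quick sketch: passing "number" to pick up every numeric column at once, and passing a list of dtypes. The last part shows the ValueError raised when neither include nor exclude is given. The frame is a shortened, made-up version of the one above.

```python
import pandas as pd

people = pd.DataFrame({
    "Name": ["Nike", "Sameer", "Olivia"],
    "Salary": [60000, 45000, 50000],
    "Weight": [55.8, 78.5, 49.9],
})

# "number" matches all numeric dtypes (integers and floats alike)
numeric = people.select_dtypes(include="number")
print(list(numeric.columns))

# A list of dtypes can be passed as well
both = people.select_dtypes(include=["int64", "float64"])
print(list(both.columns))

# Calling it with neither include nor exclude raises ValueError
error_raised = False
try:
    people.select_dtypes()
except ValueError as err:
    error_raised = True
    print("ValueError:", err)
```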

values Function

The values attribute returns a NumPy representation of the DataFrame. Only the values are returned; the axis labels are dropped.

#Getting values of dataframe
ex1.values

#Output
array([['Nike', 'M', 60000, 43, 55.8],
       ['Sameer', 'M', 45000, 28, 78.5],
       ['Olivia', 'F', 50000, 36, 49.9],
       ['Allison', 'F', 70000, 55, 65.3],
       ['Mike', 'M', 35000, 32, 70.1]], dtype=object)
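The pandas documentation recommends the to_numpy() method for the same job. Here is a small sketch on a made-up numeric-only frame. Note that when columns have different numeric types, the resulting array uses a common type that can hold all of them.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric-only frame: one integer column, one float column
scores = pd.DataFrame({"Age": [43, 28], "Weight": [55.8, 78.5]})

arr = scores.to_numpy()

# The result is a plain NumPy array with the same shape as the frame
print(type(arr), arr.shape)

# Integers and floats are combined into a single float64 array
print(arr.dtype)
```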

Axes Function

The axes attribute returns a list whose only members are the row axis labels and the column axis labels of the DataFrame, in that order.

#Using axes
ex1.axes

#Output
[RangeIndex(start=0, stop=5, step=1),
 Index(['Name', 'Gender', 'Salary', 'Age', 'Weight'], dtype='object')]
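The two members of that list are the same objects you get from the index and columns attributes. A minimal sketch on a made-up frame:

```python
import pandas as pd

frame = pd.DataFrame({"Name": ["Nike", "Sameer"], "Age": [43, 28]})

# axes is a two-element list: row labels first, column labels second
row_labels, col_labels = frame.axes

print(row_labels.equals(frame.index))
print(col_labels.equals(frame.columns))
print(list(col_labels))
```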

Empty Function

The empty attribute indicates whether a DataFrame is empty. It returns a boolean.

True means the DataFrame is entirely empty (no items), that is, at least one of its axes has length 0. False means the DataFrame contains at least one item.

#creating a dataframe
import pandas as pd
dict1 = {'Name':['Nike','Sameer','Olivia','Allison','Mike'],
     'Gender':["M","M","F","F","M"],      
    'Salary':[60000,45000,50000,70000,35000],
    'Age':[43,28,36,55,32],
    'Weight':[55.8, 78.5, 49.9, 65.3, 70.1]}
ex1 = pd.DataFrame(dict1)

#Checking the dataframe
ex1.empty

#Output
False

#Creating an empty dataframe
empty_df = pd.DataFrame()

#Checking the dataframe
empty_df.empty

#Output
True
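Two edge cases are easy to get wrong, so here is a short sketch of them on made-up frames. A DataFrame that holds only NaN still counts as non-empty, and a DataFrame with column names but no rows counts as empty.

```python
import numpy as np
import pandas as pd

# A frame containing only NaN is NOT empty: it still has one row
only_nan = pd.DataFrame({"A": [np.nan]})
print(only_nan.empty)

# After dropping the NaN row, no rows remain, so it is empty
print(only_nan.dropna().empty)

# Columns are defined but there are no rows: also empty
no_rows = pd.DataFrame(columns=["Name", "Salary"])
print(no_rows.empty)
```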

Transpose Function

The T attribute (shorthand for the transpose() method) swaps the axes of a DataFrame, turning rows into columns and columns into rows.

#Converting row into column and column into rows
ex1.T

#Output

        0	1	2	3	4
Name	Nike	Sameer	Olivia	Allison	Mike
Gender	M	M	F	F	M
Salary	60000	45000	50000	70000	35000
Age	43	28	36	55	32
Weight	55.8	78.5	49.9	65.3	70.1
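As a quick check of what transposing does to the shape and labels, here is a sketch on a made-up, all-integer frame. Transposing twice gives back the original frame in this case.

```python
import pandas as pd

nums = pd.DataFrame({"x": [1, 2, 3], "y": [4, 5, 6]})

flipped = nums.T

# A (3, 2) frame becomes (2, 3), and the old column names become row labels
print(nums.shape, flipped.shape)
print(list(flipped.index))

# .T and .transpose() produce the same result
print(flipped.equals(nums.transpose()))

# Transposing twice returns the original frame for this all-integer data
print(nums.T.T.equals(nums))
```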

In this tutorial, we learned how to use shape, size, ndim, dtypes, select_dtypes, values, axes, empty and transpose on a DataFrame.
