Dataframe in Python Class 12 Notes
DataFrame
A DataFrame is a two-dimensional labelled data structure, similar to a spreadsheet or a MySQL table. It contains rows and columns, and therefore has both a row index and a column index. Each column can hold a different type of value, such as numeric, string, boolean, etc.
NOTE: Number of rows and columns can be increased or decreased in DataFrame.
How to create DataFrame in Python?
There are many ways to create a DataFrame in Python. Let us discuss a few of them.
1. Creation of an empty DataFrame:
Code to create an empty DataFrame is given below
import pandas as pd
DF = pd.DataFrame( )
print(DF)
OUTPUT:
Empty DataFrame
Columns: []
Index: []
2. Creation of DataFrame from numpy arrays:
Let us create a DataFrame from NumPy arrays.
import numpy as np
import pandas as pd
ar1 = np.array([1, 2, 3, 4])        # First array, containing 4 integers
ar2 = np.array([10, 20, 30, 40])    # Second array, containing 4 integers
ar3 = np.array([-23, -43, 67, 90])  # Third array, containing 4 integers
# Create a DataFrame using the first array only and observe the output
DF = pd.DataFrame(ar1)
print(DF)
OUTPUT:
   0
0  1
1  2
2  3
3  4
import numpy as np
import pandas as pd
ar1 = np.array([1, 2, 3, 4])        # First array, containing 4 integers
ar2 = np.array([10, 20, 30, 40])    # Second array, containing 4 integers
ar3 = np.array([-23, -43, 67, 90])  # Third array, containing 4 integers
# Create a DataFrame using the first and second arrays only and observe the output
DF = pd.DataFrame([ar1, ar2])
print(DF)
OUTPUT:
    0   1   2   3
0   1   2   3   4
1  10  20  30  40
import numpy as np
import pandas as pd
ar1 = np.array([1, 2, 3, 4])        # First array, containing 4 integers
ar2 = np.array([10, 20, 30, 40])    # Second array, containing 4 integers
ar3 = np.array([-23, -43, 67, 90])  # Third array, containing 4 integers
# Create a DataFrame using all three arrays and observe the output
DF = pd.DataFrame([ar1, ar2, ar3])
print(DF)
OUTPUT:
    0   1   2   3
0   1   2   3   4
1  10  20  30  40
2 -23 -43  67  90
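In the outputs above, each array passed inside a list becomes one row. If each array should become a column instead, one option is to pass a dictionary of arrays. The sketch below illustrates the difference; the column labels 'A' and 'B' are hypothetical names chosen only for this illustration.

```python
import numpy as np
import pandas as pd

ar1 = np.array([1, 2, 3, 4])
ar2 = np.array([10, 20, 30, 40])

# A list of arrays gives one ROW per array ...
by_row = pd.DataFrame([ar1, ar2])

# ... while a dictionary of arrays gives one COLUMN per array.
# 'A' and 'B' are hypothetical labels, not names from the notes.
by_col = pd.DataFrame({'A': ar1, 'B': ar2})

print(by_row.shape)  # (2, 4)
print(by_col.shape)  # (4, 2)
```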
3. Creation of DataFrame from Lists: We can create a DataFrame from a list by passing the list to the DataFrame( ) function. Each element of a simple list becomes one row of a single column, whose default label is 0. For example:
Practical 1: To create dataframe from simple list.
import pandas as pd
df = pd.DataFrame([11, 22, 33, 44, 55])
print(df)
OUTPUT:
    0
0  11
1  22
2  33
3  44
4  55
Practical 2: To create dataframe from simple list by passing appropriate column heading and row index.
import pandas as pd
df = pd.DataFrame([11, 22, 33, 44, 55],
                  index=['R1', 'R2', 'R3', 'R4', 'R5'],
                  columns=['C1'])
print(df)
OUTPUT:
    C1
R1  11
R2  22
R3  33
R4  44
R5  55
Practical 3: To create dataframe from nested list.
import pandas as pd
df = pd.DataFrame([[21, 'X', 'A'], [32, 'IX', 'B'], [23, 'X', 'A'], [12, 'XI', 'A']])
print(df)
OUTPUT:
    0   1  2
0  21   X  A
1  32  IX  B
2  23   X  A
3  12  XI  A
Practical 4: To create dataframe from nested list by passing appropriate column heading and row index.
import pandas as pd
df = pd.DataFrame([[21, 'X', 'A'], [32, 'IX', 'B'], [23, 'X', 'A'], [12, 'XI', 'A']],
                  index=['Rec1', 'Rec2', 'Rec3', 'Rec4'],
                  columns=["Rno", "Class", "Sec"])
print(df)
OUTPUT:
      Rno Class Sec
Rec1   21     X   A
Rec2   32    IX   B
Rec3   23     X   A
Rec4   12    XI   A
4. Creation of DataFrame from Dictionary of lists: We can create a DataFrame from a dictionary of lists, as shown below.
Practical 1: To create dataframe using dictionaries of list.
import pandas as pd
df = pd.DataFrame({'Rno': [21, 28, 31],
                   'Class': ['IX', 'X', 'XI'],
                   'Sec': ['B', 'A', 'C']})
print(df)
OUTPUT:
   Rno Class Sec
0   21    IX   B
1   28     X   A
2   31    XI   C
Practical 2: To create dataframe using dictionaries of list with appropriate row index.
import pandas as pd
df = pd.DataFrame({'B_id': ['B1', 'B8', 'B5'],
                   'Sub': ['Hindi', 'Math', 'Science'],
                   'Cost': [450, 520, 400]},
                  index=['R1', 'R2', 'R3'])
print(df)
OUTPUT:
   B_id      Sub  Cost
R1   B1    Hindi   450
R2   B8     Math   520
R3   B5  Science   400
Note: Dictionary keys become column labels by default in a DataFrame, and each list supplies the values of its column, one value per row.
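Because every list in the dictionary fills one column, all the lists need the same number of values. The following is a minimal sketch of what happens otherwise; the variable names data and created are hypothetical helpers for this illustration, and the exact wording of the error message may differ between pandas versions.

```python
import pandas as pd

# 'Class' has only two values while 'Rno' has three
data = {'Rno': [21, 28, 31], 'Class': ['IX', 'X']}

try:
    pd.DataFrame(data)
    created = True
except ValueError as err:
    # pandas refuses to build columns of unequal length
    created = False
    print("Could not create DataFrame:", err)
```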
5. Creation of DataFrame from List of Dictionaries: We can create a DataFrame from a list of dictionaries. For example:
import pandas as pd
df = pd.DataFrame([{'Ram': 25, 'Anil': 29, 'Simple': 28},
                   {'Ram': 21, 'Anil': 25, 'Simple': 23},
                   {'Ram': 23, 'Anil': 18, 'Simple': 26}],
                  index=['Term1', 'Term2', 'Term3'])
print(df)
OUTPUT:
       Ram  Anil  Simple
Term1   25    29      28
Term2   21    25      23
Term3   23    18      26
Here, the keys of the dictionaries are taken as column labels, and each dictionary supplies the values of one row. There will be as many rows as the number of dictionaries present in the list.
NOTE: NaN (Not a Number) is inserted if a corresponding value for a column is missing, as shown in the following example. A column that contains NaN is stored as floating point, so its other values are displayed with a decimal point.
import pandas as pd
df = pd.DataFrame([{'Ram': 25, 'Anil': 29, 'Simple': 28},
                   {'Ram': 21, 'Anil': 25, 'Simple': 23},
                   {'Ram': 23, 'Anil': 18}],
                  index=['Term1', 'Term2', 'Term3'])
print(df)
OUTPUT:
       Ram  Anil  Simple
Term1   25    29    28.0
Term2   21    25    23.0
Term3   23    18     NaN
6. Creation of DataFrame from Series: We can create a DataFrame from a single Series or from multiple Series. For example:
Example 1: Creation of DataFrame from Single Series.
import pandas as pd
S1 = pd.Series([10, 20, 30, 40])
S2 = pd.Series([11, 22, 33, 44])
S3 = pd.Series([34, 44, 54, 24])
df = pd.DataFrame(S1)
print(df)
OUTPUT:
    0
0  10
1  20
2  30
3  40
Here, the DataFrame has as many rows as there are elements in the series, but only one column.
Example 2: Creation of DataFrame from two Series.
import pandas as pd
S1 = pd.Series([10, 20, 30, 40])
S2 = pd.Series([11, 22, 33, 44])
S3 = pd.Series([34, 44, 54, 24])
df = pd.DataFrame([S1, S2], index=['R1', 'R2'])
print(df)
OUTPUT:
     0   1   2   3
R1  10  20  30  40
R2  11  22  33  44
Example 3: Creation of DataFrame from three Series.
import pandas as pd
S1 = pd.Series([10, 20, 30, 40])
S2 = pd.Series([11, 22, 33, 44])
S3 = pd.Series([34, 44, 54, 24])
df = pd.DataFrame([S1, S2, S3], index=['R1', 'R2', 'R3'])
print(df)
OUTPUT:
     0   1   2   3
R1  10  20  30  40
R2  11  22  33  44
R3  34  44  54  24
To create a DataFrame using more than one series, we need to pass the series in a list, as shown above.
NOTE: If a particular series does not have a corresponding value for a label, NaN is inserted in the DataFrame column. For example:
import pandas as pd
S1 = pd.Series([10, 20, 30, 40])
S2 = pd.Series([11, 22, 33, 44])
S3 = pd.Series([34, 44, 54])      # one value fewer than S1 and S2
df = pd.DataFrame([S1, S2, S3], index=['R1', 'R2', 'R3'])
print(df)
OUTPUT:
       0     1     2     3
R1  10.0  20.0  30.0  40.0
R2  11.0  22.0  33.0  44.0
R3  34.0  44.0  54.0   NaN
Operations on rows and columns in DataFrames
We can perform the following basic operations on the rows and columns of a DataFrame:
1. Adding a New Column to a DataFrame:
We can easily add a new column to a DataFrame. Let's see the example given below.
import pandas as pd
df = pd.DataFrame([{'Ram': 25, 'Anil': 29, 'Simple': 28},
                   {'Ram': 21, 'Anil': 25, 'Simple': 23},
                   {'Ram': 23, 'Anil': 18, 'Simple': 26}],
                  index=['R1', 'R2', 'R3'])
print(df)
df['Amit'] = [18, 22, 25]    # Adding a column to the DataFrame
print(df)
df['Parth'] = [28, 12, 30]   # Adding another column to the DataFrame
print(df)
OUTPUT:
    Ram  Anil  Simple
R1   25    29      28
R2   21    25      23
R3   23    18      26
    Ram  Anil  Simple  Amit
R1   25    29      28    18
R2   21    25      23    22
R3   23    18      26    25
    Ram  Anil  Simple  Amit  Parth
R1   25    29      28    18     28
R2   21    25      23    22     12
R3   23    18      26    25     30
NOTE: If we try to add a column with fewer or more values than the number of rows in the DataFrame, it results in a ValueError, with the error message: ValueError: Length of values does not match length of index. For example:
import pandas as pd
df = pd.DataFrame([{'Ram': 25, 'Anil': 29, 'Simple': 28},
                   {'Ram': 21, 'Anil': 25, 'Simple': 23},
                   {'Ram': 23, 'Anil': 18, 'Simple': 26}],
                  index=['R1', 'R2', 'R3'])
print(df)
df['Amit'] = [18, 22]   # Only 2 values for 3 rows
print(df)
OUTPUT:
ValueError: Length of values does not match length of index
2. Adding a New Row to a DataFrame:
We can add a new row to a DataFrame using DataFrame.loc[ ]. Let's see the example given below.
import pandas as pd
df = pd.DataFrame([{'Ram': 25, 'Anil': 29, 'Simple': 28},
                   {'Ram': 21, 'Anil': 25, 'Simple': 23},
                   {'Ram': 23, 'Anil': 18, 'Simple': 26}],
                  index=['R1', 'R2', 'R3'])
print(df)
df.loc['R4'] = [12, 22, 10]   # Adding a new row
print(df)
OUTPUT:
    Ram  Anil  Simple
R1   25    29      28
R2   21    25      23
R3   23    18      26
    Ram  Anil  Simple
R1   25    29      28
R2   21    25      23
R3   23    18      26
R4   12    22      10
NOTE: If we try to add a row with fewer or more values than the number of columns in the DataFrame, it results in a ValueError, with the error message: ValueError: cannot set a row with mismatched columns. For example:
import pandas as pd
df = pd.DataFrame([{'Ram': 25, 'Anil': 29, 'Simple': 28},
                   {'Ram': 21, 'Anil': 25, 'Simple': 23},
                   {'Ram': 23, 'Anil': 18, 'Simple': 26}],
                  index=['R1', 'R2', 'R3'])
print(df)
df.loc['R4'] = [12, 22]   # Adding a new row with too few values
print(df)
OUTPUT:
ValueError: cannot set a row with mismatched columns
3. Deleting a Row from a DataFrame:
We can use the DataFrame.drop() method to delete rows. To delete a row, the parameter axis is assigned the value 0. Let's see the examples given below.
Example 1: To delete a single row from a DataFrame.
import pandas as pd
df = pd.DataFrame([{'Ram': 25, 'Anil': 29, 'Simple': 28},
                   {'Ram': 21, 'Anil': 25, 'Simple': 23},
                   {'Ram': 23, 'Anil': 18, 'Simple': 26}],
                  index=['R1', 'R2', 'R3'])
print(df)
print("----------------------------------------------------")
df = df.drop('R2', axis=0)   # Deleting a row from the DataFrame
print(df)
OUTPUT:
    Ram  Anil  Simple
R1   25    29      28
R2   21    25      23
R3   23    18      26
----------------------------------------------------
    Ram  Anil  Simple
R1   25    29      28
R3   23    18      26
Example 2: To delete multiple rows from a DataFrame.
import pandas as pd
df = pd.DataFrame({'Ram': [25, 21, 23],
                   'Anil': [29, 25, 18],
                   'Simple': [28, 23, 26]},
                  index=['R1', 'R2', 'R3'])
print(df)
print("----------------------------------------------------")
df = df.drop(['R2', 'R1'], axis=0)   # Deleting multiple rows from the DataFrame
print(df)
OUTPUT:
    Ram  Anil  Simple
R1   25    29      28
R2   21    25      23
R3   23    18      26
----------------------------------------------------
    Ram  Anil  Simple
R3   23    18      26
4. Deleting a Column from a DataFrame:
We can delete columns from a DataFrame by using the following methods:
1. pop( ): This method deletes a column from the DataFrame and also returns the values of the deleted column as a Series. For example:
import pandas as pd
df = pd.DataFrame({'Ram': [25, 21, 23],
                   'Anil': [29, 25, 18],
                   'Simple': [28, 23, 26]},
                  index=['R1', 'R2', 'R3'])
print(df.pop('Simple'))   # Deleting a particular column and returning its values
print("----------------------------------------------------")
print(df)
OUTPUT:
R1    28
R2    23
R3    26
Name: Simple, dtype: int64
----------------------------------------------------
    Ram  Anil
R1   25    29
R2   21    25
R3   23    18
2. drop( ): This method deletes the entire column from a DataFrame. To delete a column, the parameter axis is assigned the value 1. Let's see the examples given below.
import pandas as pd
df = pd.DataFrame({'Ram': [25, 21, 23],
                   'Anil': [29, 25, 18],
                   'Simple': [28, 23, 26]},
                  index=['R1', 'R2', 'R3'])
print(df)
print("----------------------------------------------------")
df = df.drop('Simple', axis=1)   # Deleting a column from the DataFrame
print(df)
OUTPUT:
    Ram  Anil  Simple
R1   25    29      28
R2   21    25      23
R3   23    18      26
----------------------------------------------------
    Ram  Anil
R1   25    29
R2   21    25
R3   23    18
To delete multiple columns
import pandas as pd
df = pd.DataFrame({'Ram': [25, 21, 23],
                   'Anil': [29, 25, 18],
                   'Simple': [28, 23, 26]},
                  index=['R1', 'R2', 'R3'])
print(df)
print("----------------------------------------------------")
df = df.drop(['Simple', 'Ram'], axis=1)   # Deleting multiple columns
print(df)
OUTPUT:
    Ram  Anil  Simple
R1   25    29      28
R2   21    25      23
R3   23    18      26
----------------------------------------------------
    Anil
R1    29
R2    25
R3    18
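As an alternative to the axis parameter, drop( ) also accepts the keyword arguments index and columns, which some readers find easier to remember. The sketch below reuses the same sample data; the result names no_r2 and only_anil are hypothetical and chosen only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({'Ram': [25, 21, 23],
                   'Anil': [29, 25, 18],
                   'Simple': [28, 23, 26]},
                  index=['R1', 'R2', 'R3'])

# Same effect as df.drop('R2', axis=0)
no_r2 = df.drop(index='R2')

# Same effect as df.drop(['Simple', 'Ram'], axis=1)
only_anil = df.drop(columns=['Simple', 'Ram'])

print(no_r2)
print(only_anil)
```

In both forms drop( ) returns a new DataFrame and leaves df itself unchanged, which is why the examples above assign the result back to df.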
5. Renaming Row Labels of a DataFrame :
We can change the labels of rows in a DataFrame using the DataFrame.rename() method. For example, to rename the row label R1 to Maths, we can write the following code.
Example 1: To change row index ‘R1’ to ‘Maths’
import pandas as pd
df = pd.DataFrame([[25, 29, 28], [21, 25, 23], [23, 18, 26]],
                  index=['R1', 'R2', 'R3'],
                  columns=['Ram', 'Anil', 'Simple'])
print(df)
df = df.rename({'R1': 'Maths'})   # Statement to change 'R1' to 'Maths'
print(df)
OUTPUT:
    Ram  Anil  Simple
R1   25    29      28
R2   21    25      23
R3   23    18      26
       Ram  Anil  Simple
Maths   25    29      28
R2      21    25      23
R3      23    18      26
Example 2: To change row index ‘R1’ to ‘Maths’, ‘R2’ to ‘Science’ and ‘R3’ to ‘English’
import pandas as pd
df = pd.DataFrame([[25, 29, 28], [21, 25, 23], [23, 18, 26]],
                  index=['R1', 'R2', 'R3'],
                  columns=['Ram', 'Anil', 'Simple'])
print(df)
df = df.rename({'R1': 'Maths', 'R2': 'Science', 'R3': 'English'}, axis='index')
print("-----------------------------------------------------")
print(df)
OUTPUT:
    Ram  Anil  Simple
R1   25    29      28
R2   21    25      23
R3   23    18      26
-----------------------------------------------------
         Ram  Anil  Simple
Maths     25    29      28
Science   21    25      23
English   23    18      26
NOTE: The parameter axis='index' specifies that the row labels are to be changed. We can also skip it, because by default the rename() function changes the row labels.
6. Renaming Column Labels of a DataFrame :
To alter the column names of a DataFrame we can use the rename() method, as shown below. The parameter axis='columns' implies that we want to change the column labels:
Example 1: To change the column heading from ‘Ram’ to ‘Ravi’
import pandas as pd
df = pd.DataFrame([[25, 29, 28], [21, 25, 23], [23, 18, 26]],
                  index=['R1', 'R2', 'R3'],
                  columns=['Ram', 'Anil', 'Simple'])
print(df)
df = df.rename({'Ram': 'Ravi'}, axis='columns')
print("-----------------------------------------------------")
print(df)
OUTPUT:
    Ram  Anil  Simple
R1   25    29      28
R2   21    25      23
R3   23    18      26
-----------------------------------------------------
    Ravi  Anil  Simple
R1    25    29      28
R2    21    25      23
R3    23    18      26
Example 2: To change the column heading from ‘Ram’ to ‘Ravi’ and from ‘Simple’ to ‘Sumit’
import pandas as pd
df = pd.DataFrame([[25, 29, 28], [21, 25, 23], [23, 18, 26]],
                  index=['R1', 'R2', 'R3'],
                  columns=['Ram', 'Anil', 'Simple'])
print(df)
df = df.rename({'Ram': 'Ravi', 'Simple': 'Sumit'}, axis='columns')
print("-----------------------------------------------------")
print(df)
OUTPUT:
    Ram  Anil  Simple
R1   25    29      28
R2   21    25      23
R3   23    18      26
-----------------------------------------------------
    Ravi  Anil  Sumit
R1    25    29     28
R2    21    25     23
R3    23    18     26
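The rename() method also accepts the keyword arguments index and columns, so row and column labels can be changed in a single call without the axis parameter. A short sketch under that assumption follows; the variable name renamed is hypothetical.

```python
import pandas as pd

df = pd.DataFrame([[25, 29, 28], [21, 25, 23], [23, 18, 26]],
                  index=['R1', 'R2', 'R3'],
                  columns=['Ram', 'Anil', 'Simple'])

# Rename one row label and two column labels in one call
renamed = df.rename(index={'R1': 'Maths'},
                    columns={'Ram': 'Ravi', 'Simple': 'Sumit'})
print(renamed)
```

Labels that do not appear in the mapping (here R2, R3 and Anil) are left as they are.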
Accessing DataFrame Elements through Indexing
Data elements in a DataFrame can be accessed using indexing. There are two ways of indexing DataFrames:
1. Label based indexing
There are several methods in Pandas to implement label based indexing. DataFrame.loc[ ] is an important method that is used for label based indexing with DataFrames.
Example 1: To display single row from a dataframe using loc( ) method.
import pandas as pd
df = pd.DataFrame([[25, 29, 28], [21, 25, 23], [23, 18, 26]],
                  index=['R1', 'R2', 'R3'],
                  columns=['Ram', 'Anil', 'Simple'])
print(df)
print("---------------------------------------------------")
print(df.loc['R2'])   # Row label indexing
OUTPUT:
    Ram  Anil  Simple
R1   25    29      28
R2   21    25      23
R3   23    18      26
---------------------------------------------------
Ram       21
Anil      25
Simple    23
Name: R2, dtype: int64
Example 2: To display multiple rows from a dataframe.
import pandas as pd
df = pd.DataFrame([[25, 29, 28], [21, 25, 23], [23, 18, 26]],
                  index=['R1', 'R2', 'R3'],
                  columns=['Ram', 'Anil', 'Simple'])
print(df)
print("---------------------------------------------------")
print(df.loc[['R1', 'R3']])   # Multiple rows from the DataFrame
OUTPUT:
    Ram  Anil  Simple
R1   25    29      28
R2   21    25      23
R3   23    18      26
---------------------------------------------------
    Ram  Anil  Simple
R1   25    29      28
R3   23    18      26
Example 3: To display the values of single column label without using loc( ) method.
import pandas as pd
df = pd.DataFrame([[25, 29, 28], [21, 25, 23], [23, 18, 26]],
                  index=['R1', 'R2', 'R3'],
                  columns=['Ram', 'Anil', 'Simple'])
print(df)
print("---------------------------------------------------")
print(df['Ram'])   # Column label indexing
OUTPUT:
    Ram  Anil  Simple
R1   25    29      28
R2   21    25      23
R3   23    18      26
---------------------------------------------------
R1    25
R2    21
R3    23
Name: Ram, dtype: int64
Example 4: To display the values of multiple columns from dataframe without using loc( ) method.
import pandas as pd
df = pd.DataFrame([[25, 29, 28], [21, 25, 23], [23, 18, 26]],
                  index=['R1', 'R2', 'R3'],
                  columns=['Ram', 'Anil', 'Simple'])
print(df)
print("---------------------------------------------------")
print(df[['Ram', 'Anil']])   # Multiple column label indexing
OUTPUT:
    Ram  Anil  Simple
R1   25    29      28
R2   21    25      23
R3   23    18      26
---------------------------------------------------
    Ram  Anil
R1   25    29
R2   21    25
R3   23    18
Example 5: To display the values of single column label using loc( ) method.
import pandas as pd
df = pd.DataFrame([[25, 29, 28], [21, 25, 23], [23, 18, 26]],
                  index=['R1', 'R2', 'R3'],
                  columns=['Ram', 'Anil', 'Simple'])
print(df)
print("---------------------------------------------------")
print(df.loc[:, 'Ram'])   # Column label indexing using loc
OUTPUT:
    Ram  Anil  Simple
R1   25    29      28
R2   21    25      23
R3   23    18      26
---------------------------------------------------
R1    25
R2    21
R3    23
Name: Ram, dtype: int64
Example 6: To display the values of multiple columns from dataframe using loc( ) method.
import pandas as pd
df = pd.DataFrame([[25, 29, 28], [21, 25, 23], [23, 18, 26]],
                  index=['R1', 'R2', 'R3'],
                  columns=['Ram', 'Anil', 'Simple'])
print(df)
print("---------------------------------------------------")
print(df.loc[:, 'Ram':'Anil'])   # Multiple column label indexing
OUTPUT:
    Ram  Anil  Simple
R1   25    29      28
R2   21    25      23
R3   23    18      26
---------------------------------------------------
    Ram  Anil
R1   25    29
R2   21    25
R3   23    18
To access rows or columns by their integer position instead of by their label, DataFrame.iloc[ ] is used. Positions start from 0.
Example 7: To display first column from a dataframe
import pandas as pd
df = pd.DataFrame([[25, 29, 28], [21, 25, 23], [23, 18, 26]],
                  index=['R1', 'R2', 'R3'],
                  columns=['Ram', 'Anil', 'Simple'])
print(df)
print("---------------------------------------------------")
print(df.iloc[:, 0:1])
OUTPUT:
    Ram  Anil  Simple
R1   25    29      28
R2   21    25      23
R3   23    18      26
---------------------------------------------------
    Ram
R1   25
R2   21
R3   23
Example 8: To display first and second column from a dataframe
import pandas as pd
df = pd.DataFrame([[25, 29, 28], [21, 25, 23], [23, 18, 26]],
                  index=['R1', 'R2', 'R3'],
                  columns=['Ram', 'Anil', 'Simple'])
print(df)
print("---------------------------------------------------")
print(df.iloc[:, 0:2])   # print(df.iloc[:, [0, 1]]) can also be used
OUTPUT:
    Ram  Anil  Simple
R1   25    29      28
R2   21    25      23
R3   23    18      26
---------------------------------------------------
    Ram  Anil
R1   25    29
R2   21    25
R3   23    18
Example 9: To display only second row from a dataframe
import pandas as pd
df = pd.DataFrame([[25, 29, 28], [21, 25, 23], [23, 18, 26]],
                  index=['R1', 'R2', 'R3'],
                  columns=['Ram', 'Anil', 'Simple'])
print(df)
print("---------------------------------------------------")
print(df.iloc[1:2])
OUTPUT:
    Ram  Anil  Simple
R1   25    29      28
R2   21    25      23
R3   23    18      26
---------------------------------------------------
    Ram  Anil  Simple
R2   21    25      23
Example 10: To display first and second row from a dataframe
import pandas as pd
df = pd.DataFrame([[25, 29, 28], [21, 25, 23], [23, 18, 26]],
                  index=['R1', 'R2', 'R3'],
                  columns=['Ram', 'Anil', 'Simple'])
print(df)
print("---------------------------------------------------")
print(df.iloc[0:2])   # print(df.iloc[[0, 1]]) can also be used
OUTPUT:
    Ram  Anil  Simple
R1   25    29      28
R2   21    25      23
R3   23    18      26
---------------------------------------------------
    Ram  Anil  Simple
R1   25    29      28
R2   21    25      23
Example 11: To display the first, second and fourth rows from a dataframe.
import pandas as pd
df = pd.DataFrame([[25, 29, 28, 17], [21, 25, 23, 20], [23, 18, 26, 23], [20, 18, 30, 15]],
                  index=['R1', 'R2', 'R3', 'R4'],
                  columns=['Ram', 'Anil', 'Simple', 'Anuj'])
print(df)
print("---------------------------------------------------")
print(df.loc[['R1', 'R2', 'R4']])
# print(df.iloc[[0, 1, 3]]) or print(df.loc[[True, True, False, True]]) can also be used
OUTPUT:
    Ram  Anil  Simple  Anuj
R1   25    29      28    17
R2   21    25      23    20
R3   23    18      26    23
R4   20    18      30    15
---------------------------------------------------
    Ram  Anil  Simple  Anuj
R1   25    29      28    17
R2   21    25      23    20
R4   20    18      30    15
Example 12: To display marks of subject Math, English and Science of ‘Anil’ from a dataframe.
import pandas as pd
df = pd.DataFrame([[25, 29, 28, 17], [21, 25, 23, 20], [23, 18, 26, 23], [20, 18, 30, 15]],
                  index=['Math', 'English', 'Science', 'Hindi'],
                  columns=['Ram', 'Anil', 'Simple', 'Anuj'])
print(df)
print("---------------------------------------------------")
print(df.loc['Math':'Science', 'Anil'])
OUTPUT:
         Ram  Anil  Simple  Anuj
Math      25    29      28    17
English   21    25      23    20
Science   23    18      26    23
Hindi     20    18      30    15
---------------------------------------------------
Math       29
English    25
Science    18
Name: Anil, dtype: int64
Example 13: To display marks of subject Math, English and Science of ‘Ram’ and ‘Anil’ from a dataframe.
import pandas as pd
df = pd.DataFrame([[25, 29, 28, 17], [21, 25, 23, 20], [23, 18, 26, 23], [20, 18, 30, 15]],
                  index=['Math', 'English', 'Science', 'Hindi'],
                  columns=['Ram', 'Anil', 'Simple', 'Anuj'])
print(df)
print("---------------------------------------------------")
print(df.loc['Math':'Science', 'Ram':'Anil'])
OUTPUT:
         Ram  Anil  Simple  Anuj
Math      25    29      28    17
English   21    25      23    20
Science   23    18      26    23
Hindi     20    18      30    15
---------------------------------------------------
         Ram  Anil
Math      25    29
English   21    25
Science   23    18
Example 14: To display marks of subject Math, English and Science of ‘Ram’, ‘Anil’ and ‘Anuj’ from a dataframe.
import pandas as pd
df = pd.DataFrame([[25, 29, 28, 17], [21, 25, 23, 20], [23, 18, 26, 23], [20, 18, 30, 15]],
                  index=['Math', 'English', 'Science', 'Hindi'],
                  columns=['Ram', 'Anil', 'Simple', 'Anuj'])
print(df)
print("---------------------------------------------------")
print(df.loc['Math':'Science', ['Ram', 'Anil', 'Anuj']])
OUTPUT:
         Ram  Anil  Simple  Anuj
Math      25    29      28    17
English   21    25      23    20
Science   23    18      26    23
Hindi     20    18      30    15
---------------------------------------------------
         Ram  Anil  Anuj
Math      25    29    17
English   21    25    20
Science   23    18    23
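One detail worth noticing in the examples above is that a label slice with loc includes its end label, whereas a positional slice with iloc stops before its end position. The sketch below contrasts the two on the same data; the variable names by_label and by_position are hypothetical.

```python
import pandas as pd

df = pd.DataFrame([[25, 29, 28, 17], [21, 25, 23, 20], [23, 18, 26, 23], [20, 18, 30, 15]],
                  index=['Math', 'English', 'Science', 'Hindi'],
                  columns=['Ram', 'Anil', 'Simple', 'Anuj'])

# Label slice: 'Science' (the end label) IS included -> 3 rows
by_label = df.loc['Math':'Science']

# Positional slice: position 2 (the end) is NOT included -> 2 rows
by_position = df.iloc[0:2]

print(len(by_label))     # 3
print(len(by_position))  # 2
```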
2. Boolean indexing
In Boolean indexing, we can select data based on the actual values in the DataFrame rather than on their row/column labels. A condition applied to a row or column produces a Series of True/False values, one for each element tested.
Example 1: To check who scored more than 25 marks in Math
import pandas as pd
df = pd.DataFrame([[25, 29, 28], [21, 25, 23], [23, 18, 26]],
                  index=['Math', 'English', 'Science'],
                  columns=['Ram', 'Anil', 'Simple'])
print(df)
print("---------------------------------------------------")
print(df.loc['Math'] > 25)
OUTPUT:
         Ram  Anil  Simple
Math      25    29      28
English   21    25      23
Science   23    18      26
---------------------------------------------------
Ram       False
Anil       True
Simple     True
Name: Math, dtype: bool
Example 2: To check in which subjects ‘Anil’ has scored more than 25
import pandas as pd
df = pd.DataFrame([[25, 29, 28], [21, 25, 23], [23, 18, 26]],
                  index=['Math', 'English', 'Science'],
                  columns=['Ram', 'Anil', 'Simple'])
print(df)
print("---------------------------------------------------")
print(df.loc[:, 'Anil'] > 25)
OUTPUT:
         Ram  Anil  Simple
Math      25    29      28
English   21    25      23
Science   23    18      26
---------------------------------------------------
Math        True
English    False
Science    False
Name: Anil, dtype: bool
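The two examples above only print the True/False values. To actually select the matching data, the Boolean Series can be placed inside square brackets. This is a minimal sketch; the threshold of 20 and the names mask, passed and high_math are hypothetical choices for this illustration.

```python
import pandas as pd

df = pd.DataFrame([[25, 29, 28], [21, 25, 23], [23, 18, 26]],
                  index=['Math', 'English', 'Science'],
                  columns=['Ram', 'Anil', 'Simple'])

# Rows (subjects) in which Anil scored more than 20
mask = df['Anil'] > 20
passed = df[mask]
print(passed)

# Students who scored more than 25 in Math
math = df.loc['Math']
high_math = math[math > 25]
print(high_math)
```

Only the rows or elements whose mask value is True are kept; here passed holds the Math and English rows, and high_math holds the marks of Anil and Simple.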
Merging of DataFrames
We can use the pandas.concat() function to merge two DataFrames. It places the rows of the second DataFrame after the rows of the first DataFrame. Columns not present in the first DataFrame are added as new columns, and NaN is filled in wherever a value is missing. Older versions of pandas also provided the DataFrame.append() method for this purpose (df = df.append(df1)), but it was deprecated in pandas 1.4 and removed in pandas 2.0, so pandas.concat() should be used instead. For example:
import pandas as pd
df = pd.DataFrame([[25, 29, 28, 17], [21, 25, 23, 20], [23, 18, 26, 23],
                   [20, 18, 30, 15], [12, 15, 20, 3], [23, 12, 16, 30]],
                  index=['R1', 'R2', 'R3', 'R4', 'R5', 'R6'],
                  columns=['Ram', 'Anil', 'Simple', 'Anuj'])
print(df)
print("-------------------------------------------------")
df1 = pd.DataFrame([[10, 12, 8, 7], [1, 5, 3, 2], [2, 1, 2, 2], [0, 1, 3, 5]],
                   index=['R1', 'R2', 'R5', 'R6'],
                   columns=['Ram', 'Anil', 'Ravi', 'Ashish'])
print(df1)
print("-------------------------------------------------")
df = pd.concat([df, df1])   # Merging two DataFrames
print(df)
OUTPUT:
    Ram  Anil  Simple  Anuj
R1   25    29      28    17
R2   21    25      23    20
R3   23    18      26    23
R4   20    18      30    15
R5   12    15      20     3
R6   23    12      16    30
-------------------------------------------------
    Ram  Anil  Ravi  Ashish
R1   10    12     8       7
R2    1     5     3       2
R5    2     1     2       2
R6    0     1     3       5
-------------------------------------------------
    Ram  Anil  Simple  Anuj  Ravi  Ashish
R1   25    29    28.0  17.0   NaN     NaN
R2   21    25    23.0  20.0   NaN     NaN
R3   23    18    26.0  23.0   NaN     NaN
R4   20    18    30.0  15.0   NaN     NaN
R5   12    15    20.0   3.0   NaN     NaN
R6   23    12    16.0  30.0   NaN     NaN
R1   10    12     NaN   NaN   8.0     7.0
R2    1     5     NaN   NaN   3.0     2.0
R5    2     1     NaN   NaN   2.0     2.0
R6    0     1     NaN   NaN   3.0     5.0
To make the column labels appear in sorted order, we can set the parameter sort=True. For example, replacing the merging statement in the above code with
df = pd.concat([df, df1], sort=True)
print(df)
will give the following output:
    Anil  Anuj  Ashish  Ram  Ravi  Simple
R1    29  17.0     NaN   25   NaN    28.0
R2    25  20.0     NaN   21   NaN    23.0
R3    18  23.0     NaN   23   NaN    26.0
R4    18  15.0     NaN   20   NaN    30.0
R5    15   3.0     NaN   12   NaN    20.0
R6    12  30.0     NaN   23   NaN    16.0
R1    12   NaN     7.0   10   8.0     NaN
R2     5   NaN     2.0    1   3.0     NaN
R5     1   NaN     2.0    2   2.0     NaN
R6     1   NaN     5.0    0   3.0     NaN
NOTE: Observe that the column names are now arranged alphabetically.
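In the merged output, row labels such as R1 appear twice because the original row labels of both DataFrames are kept. The sketch below uses pandas.concat(), which is available in current pandas releases, on two small hypothetical DataFrames to show this effect and how the ignore_index=True parameter renumbers the rows instead. The names merged and renumbered are hypothetical.

```python
import pandas as pd

df = pd.DataFrame([[25, 29], [21, 25]], index=['R1', 'R2'], columns=['Ram', 'Anil'])
df1 = pd.DataFrame([[10, 8]], index=['R1'], columns=['Ram', 'Ravi'])

# Original row labels are kept, so 'R1' now refers to two rows
merged = pd.concat([df, df1])
print(merged.loc['R1'])

# ignore_index=True discards the old labels and numbers the rows 0, 1, 2, ...
renumbered = pd.concat([df, df1], ignore_index=True)
print(renumbered)
```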
Attributes of DataFrames
Like Series, a DataFrame has certain properties, called attributes, that we can access. Some attributes of a Pandas DataFrame are:
1. DataFrame.index: This attribute gives all the row labels of the dataframe.
2. DataFrame.columns: This attribute gives all the column labels of the dataframe.
3. DataFrame.dtypes: This attribute gives the data type of each column in the dataframe.
4. DataFrame.shape: This attribute gives a tuple representing the dimensions of the dataframe, that is, the number of rows and the number of columns.
5. DataFrame.size: This attribute gives the total number of values in the dataframe.
6. DataFrame.T: This attribute transposes the DataFrame, that is, the row labels and column labels of the DataFrame replace each other's position.
7. DataFrame.values: This attribute gives a NumPy ndarray having all the values in the DataFrame, without the axes labels.
8. DataFrame.empty: This attribute returns the value True if the DataFrame is empty and False otherwise.
import pandas as pd
df = pd.DataFrame([[25, 29, 28, 17], [21, 25, 23, 20], [23, 18, 26, 23], [20, 18, 30, 15]],
                  index=['R1', 'R2', 'R3', 'R4'],
                  columns=['Ram', 'Anil', 'Simple', 'Anuj'])
print(df)
print("---------------------------------------------------")
print(df.index)
print("---------------------------------------------------")
print(df.columns)
print("---------------------------------------------------")
print(df.dtypes)
print("---------------------------------------------------")
print(df.shape)
print("---------------------------------------------------")
print(df.size)
print("---------------------------------------------------")
print(df.T)
print("---------------------------------------------------")
print(df.values)
print("---------------------------------------------------")
print(df.empty)
OUTPUT:
    Ram  Anil  Simple  Anuj
R1   25    29      28    17
R2   21    25      23    20
R3   23    18      26    23
R4   20    18      30    15
---------------------------------------------------
Index(['R1', 'R2', 'R3', 'R4'], dtype='object')
---------------------------------------------------
Index(['Ram', 'Anil', 'Simple', 'Anuj'], dtype='object')
---------------------------------------------------
Ram       int64
Anil      int64
Simple    int64
Anuj      int64
dtype: object
---------------------------------------------------
(4, 4)
---------------------------------------------------
16
---------------------------------------------------
        R1  R2  R3  R4
Ram     25  21  23  20
Anil    29  25  18  18
Simple  28  23  26  30
Anuj    17  20  23  15
---------------------------------------------------
[[25 29 28 17]
 [21 25 23 20]
 [23 18 26 23]
 [20 18 30 15]]
---------------------------------------------------
False
Methods of DataFrames
1. head( ): This method displays the first n rows of the DataFrame. If the parameter n is not specified, it gives the first 5 rows of the DataFrame by default. For example:
import pandas as pd
df = pd.DataFrame([[25, 29, 28, 17], [21, 25, 23, 20], [23, 18, 26, 23],
                   [20, 18, 30, 15], [12, 15, 20, 3], [23, 12, 16, 30]],
                  index=['R1', 'R2', 'R3', 'R4', 'R5', 'R6'],
                  columns=['Ram', 'Anil', 'Simple', 'Anuj'])
print(df)
print("---------------------------------------------------")
print(df.head(2))   # Display first two rows
print("---------------------------------------------------")
print(df.head(1))   # Display only first row
print("---------------------------------------------------")
print(df.head())    # Display first five rows as value of n is not specified
print("---------------------------------------------------")
OUTPUT:
    Ram  Anil  Simple  Anuj
R1   25    29      28    17
R2   21    25      23    20
R3   23    18      26    23
R4   20    18      30    15
R5   12    15      20     3
R6   23    12      16    30
---------------------------------------------------
    Ram  Anil  Simple  Anuj
R1   25    29      28    17
R2   21    25      23    20
---------------------------------------------------
    Ram  Anil  Simple  Anuj
R1   25    29      28    17
---------------------------------------------------
    Ram  Anil  Simple  Anuj
R1   25    29      28    17
R2   21    25      23    20
R3   23    18      26    23
R4   20    18      30    15
R5   12    15      20     3
---------------------------------------------------
2. tail( ): This method displays the last n rows of the DataFrame. If the parameter n is not specified, it gives the last 5 rows of the DataFrame by default. For example:
import pandas as pd
df = pd.DataFrame([[25, 29, 28, 17], [21, 25, 23, 20], [23, 18, 26, 23],
                   [20, 18, 30, 15], [12, 15, 20, 3], [23, 12, 16, 30]],
                  index=['R1', 'R2', 'R3', 'R4', 'R5', 'R6'],
                  columns=['Ram', 'Anil', 'Simple', 'Anuj'])
print(df)
print("---------------------------------------------------")
print(df.tail(2))   # Display last two rows
print("---------------------------------------------------")
print(df.tail(3))   # Display last three rows
print("---------------------------------------------------")
print(df.tail())    # Display last five rows as value of n is not specified
print("---------------------------------------------------")
OUTPUT:
    Ram  Anil  Simple  Anuj
R1   25    29      28    17
R2   21    25      23    20
R3   23    18      26    23
R4   20    18      30    15
R5   12    15      20     3
R6   23    12      16    30
---------------------------------------------------
    Ram  Anil  Simple  Anuj
R5   12    15      20     3
R6   23    12      16    30
---------------------------------------------------
    Ram  Anil  Simple  Anuj
R4   20    18      30    15
R5   12    15      20     3
R6   23    12      16    30
---------------------------------------------------
    Ram  Anil  Simple  Anuj
R2   21    25      23    20
R3   23    18      26    23
R4   20    18      30    15
R5   12    15      20     3
R6   23    12      16    30
---------------------------------------------------
Importing a CSV file to a DataFrame
To practise the code, create the following csv file using a spreadsheet and save it on your computer with the name "data.csv". (Save the file in the folder from which you run your Python program, or give the complete path of the file in the code.)
Rollno  Name    Class  Sec
1       Anil    X      A
2       Anuj    XI     B
3       Ravi    XII    B
4       Ananya  VI     A
5       Sumit   VI     C
6       Deepak  VIII   D
7       Parth   X      A
We can load the data from the data.csv file into a DataFrame, say “stud” using Pandas read_csv() function as shown below:
import pandas as pd
stud = pd.read_csv("data.csv", sep=",", header=0)
print(stud)
OUTPUT:
   Rollno    Name Class Sec
0       1    Anil     X   A
1       2    Anuj    XI   B
2       3    Ravi   XII   B
3       4  Ananya    VI   A
4       5   Sumit    VI   C
5       6  Deepak  VIII   D
6       7   Parth     X   A
Line by Line Explanation of above code
- The first parameter to the read_csv() is the name of the csv file along with its path.
- The parameter sep specifies whether the values are separated by comma, semicolon, tab, or any other character. The default value for sep is a comma.
- header=0 implies that column names are taken from the first line of the file. If neither header nor names is given, read_csv() behaves the same way, because the default header='infer' then uses the first line as the column names.
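We can explicitly specify column names using the parameter names while creating the DataFrame using the read_csv() function. Because header=0 is also given, the column names in the first line of the file are replaced by the names we supply. For example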
import pandas as pd
m = pd.read_csv("data.csv", sep=",", header=0,
                names=['Rno', 'S_Name', 'S_Class', 'Section'])
print(m)
OUTPUT:
   Rno  S_Name S_Class Section
0    1    Anil       X       A
1    2    Anuj      XI       B
2    3    Ravi     XII       B
3    4  Ananya      VI       A
4    5   Sumit      VI       C
5    6  Deepak    VIII       D
6    7   Parth       X       A
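The parameter names is also useful when the file has no line of column names at all. The following sketch assumes such a file (held in a string through io.StringIO, for illustration only) and uses header=None so that the first line is treated as data.

```python
import io
import pandas as pd

# Hypothetical data that has no header line
no_header = "1,Anil,X,A\n2,Anuj,XI,B\n"

# header=None: no line is used as a header, so names supplies all column labels
m1 = pd.read_csv(io.StringIO(no_header), header=None,
                 names=["Rno", "S_Name", "S_Class", "Section"])
print(m1)

# Hypothetical data that does have a header line
with_header = "Rollno,Name,Class,Sec\n1,Anil,X,A\n2,Anuj,XI,B\n"

# header=0 together with names: the first line is dropped and replaced by names
m2 = pd.read_csv(io.StringIO(with_header), header=0,
                 names=["Rno", "S_Name", "S_Class", "Section"])
print(m2)
```

Both calls give a DataFrame with two rows and the same four column labels.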
Exporting a Dataframe to a CSV file
We can use the to_csv() function to save a DataFrame to a csv file. Suppose we have a DataFrame named “df_stud” that contains the following data.
    Ram  Anil  Simple  Anuj
R1   25    29      28    17
R2   21    25      23    20
R3   23    18      26    23
R4   20    18      30    15
R5   12    15      20     3
R6   23    12      16    30
We want to store the data of “df_stud” in a csv file named “data.csv”. For this we will write the following code. The path is written as a raw string (r'...') so that the backslashes are not treated as escape characters, and sep is a single character.
df_stud.to_csv(r'C:\Users\abc\Desktop\data.csv', sep=',')   #path will be according to your choice
The above code will create a file “data.csv” on the desktop. When we open this file in any text editor or a spreadsheet, we will find the above data along with the row labels and the column headers, separated by commas.
In case we do not want the column names to be saved to the file we may use the parameter header=False.
Another parameter index=False is used when we do not want the row labels to be written to the file on disk. For example:
df_stud.to_csv(r'C:\Users\abc\Desktop\data.csv', sep=',', header=False, index=False)
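To see what these parameters change, to_csv() can be called without a file path, in which case it returns the csv text as a string. The sketch below uses a smaller, made-up DataFrame (two rows and two columns taken from the data above) so that the result is easy to read.

```python
import pandas as pd

# A small DataFrame used only for this illustration
df_stud = pd.DataFrame([[25, 29], [21, 25]],
                       index=["R1", "R2"], columns=["Ram", "Anil"])

# No path given: the csv text is returned as a string instead of being written to disk
full = df_stud.to_csv()
print(full.splitlines())

# header=False drops the column names, index=False drops the row labels
bare = df_stud.to_csv(header=False, index=False)
print(bare.splitlines())
```

The first list starts with the header line ",Ram,Anil" (the empty first field is the place of the row labels), while the second list contains only the data values.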
Difference between Pandas Series and NumPy Arrays
Pandas Series | NumPy Arrays |
In a Series we can define our own labelled index to access the elements. The labels can be numbers or letters. | NumPy arrays are accessed by their integer position using numbers only. |
The elements can be indexed in descending order also. | The indexing starts with zero for the first element and the index is fixed. |
If two series are not aligned, NaN or missing values are generated. | Arrays are combined by position, not by label, so no NaN values are generated through alignment; arrays with incompatible shapes raise an error instead. |
Series require more memory. | NumPy arrays occupy less memory. |
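The difference in alignment can be seen in a short example. The sketch below uses made-up values: two Series whose labels only partly match, and two NumPy arrays of the same length.

```python
import numpy as np
import pandas as pd

# Two Series whose index labels only partly overlap
s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = pd.Series([10, 20, 30], index=["b", "c", "d"])

# Addition matches values by label; labels present in only one Series give NaN
total = s1 + s2
print(total)

# NumPy arrays have no labels; values are added position by position
a1 = np.array([1, 2, 3])
a2 = np.array([10, 20, 30])
print(a1 + a2)   # [11 22 33]
```

In the Series result, the labels "b" and "c" hold 12 and 23, while "a" and "d" hold NaN because each of them is missing from one of the two Series.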
SUMMARY
1. A DataFrame is a two-dimensional labeled data structure like a spreadsheet. It contains rows and columns and therefore has both a row and column index.
2. When using a dictionary to create a DataFrame, keys of the Dictionary become the column labels of the DataFrame. A DataFrame can be thought of as a dictionary of lists/ Series (all Series/columns sharing the same index label for a row).
3. Data can be loaded in a DataFrame from a file on the disk by using Pandas read_csv function.
4. Data in a DataFrame can be written to a text file on disk by using the pandas.DataFrame.to_csv() function.
5. DataFrame.T gives the transpose of a DataFrame.
6. Pandas has a number of methods that support label based indexing, but every label asked for must be in the index, or a KeyError will be raised.
7. DataFrame.loc[ ] is used for label based indexing of rows (and, optionally, columns) in DataFrames.
8. The DataFrame.append() method was used to add the rows of one DataFrame to another. It has been removed in pandas 2.0; in current versions pandas.concat() is used instead.
9. Pandas supports non-unique index values. Only if a particular operation that does not support duplicate index values is attempted, an exception is raised at that time.
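Some of the points above can be tried together in a few lines. The sketch below uses a small made-up DataFrame. It joins two DataFrames with pd.concat(), which is used here instead of DataFrame.append() because append() is no longer available in pandas 2.0 and later.

```python
import pandas as pd

# Keys of the dictionary become the column labels
df = pd.DataFrame({"Ram": [25, 21], "Anil": [29, 25]}, index=["R1", "R2"])

# DataFrame.T swaps rows and columns
print(df.T)

# loc[] selects by label; here the row labelled "R1"
print(df.loc["R1"])

# Asking for a label that is not in the index raises KeyError
try:
    df.loc["R9"]
except KeyError:
    print("R9 is not a row label")

# Rows of a second DataFrame are added below the first one
more = pd.DataFrame({"Ram": [23], "Anil": [18]}, index=["R3"])
combined = pd.concat([df, more])
print(combined)
```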
Disclaimer: I have tried to give you simple notes on “Dataframe in Python Pandas”, but if you feel that there are mistakes in the code or explanation given above, you can contact me directly at csiplearninghub@gmail.com. The reference for these notes is the NCERT book.