Best Dataframe in Python Class 12 Simple Notes


Dataframe in Python Class 12 Notes


DataFrame

A DataFrame is a two-dimensional labelled data structure, similar to a spreadsheet or a MySQL table. It contains rows and columns, and therefore has both a row index and a column index. Each column can hold a different type of value, such as numeric, string, boolean, etc.

NOTE: The number of rows and columns in a DataFrame can be increased or decreased.
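The two points above can be checked with a small sketch. The column names and values used here (Name, Marks, Passed, Grade) are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical sample data: three columns holding three different value types.
df = pd.DataFrame({
    'Name': ['Asha', 'Ravi'],    # strings
    'Marks': [91.5, 78.0],       # floating-point numbers
    'Passed': [True, True],      # booleans
})
print(df.dtypes)   # one data type per column
print(df.shape)    # (2, 3): 2 rows and 3 columns

df['Grade'] = ['A', 'B']         # grow: add a column
print(df.shape)                  # (2, 4)

df = df.drop('Passed', axis=1)   # shrink: remove a column
print(df.shape)                  # (2, 3)
```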

How to create DataFrame in Python?

There are many ways to create a DataFrame in Python. Let us discuss a few of them.

1. Creation of an empty DataFrame:

Code to create an empty DataFrame is given below

import pandas as pd
DF = pd.DataFrame( )
print(DF)

OUTPUT:
Empty DataFrame
Columns: []
Index: []

2. Creation of DataFrame from numpy arrays:

Let us create a DataFrame from NumPy arrays.

import numpy as np
import pandas as pd
ar1 = np.array([1, 2, 3, 4]) #First array created containing 4 integers
ar2 = np.array([10, 20, 30, 40]) #Second array created containing 4 integers
ar3 = np.array([-23, -43, 67, 90]) #Third array created containing 4 integers

#Let us create a DataFrame using the first array only and observe the output
DF = pd.DataFrame(ar1)
print(DF)

OUTPUT:

   0
0  1
1  2
2  3
3  4


import numpy as np
import pandas as pd
ar1 = np.array([1, 2, 3, 4]) #First array created containing 4 integers
ar2 = np.array([10, 20, 30, 40]) #Second array created containing 4 integers
ar3 = np.array([-23, -43, 67, 90]) #Third array created containing 4 integers

#Let us create a DataFrame using the first and second arrays only and observe the output

DF = pd.DataFrame([ar1,ar2]) #Creating dataframe using first and second array
print(DF)

OUTPUT:

    0   1   2   3
0   1   2   3   4
1  10  20  30  40


import numpy as np
import pandas as pd
ar1 = np.array([1, 2, 3, 4]) #First array created containing 4 integers
ar2 = np.array([10, 20, 30, 40]) #Second array created containing 4 integers
ar3 = np.array([-23, -43, 67, 90]) #Third array created containing 4 integers

#Let us create a DataFrame using all three arrays and observe the output

DF = pd.DataFrame([ar1, ar2, ar3]) #Creating dataframe using all three arrays
print(DF)

OUTPUT:

    0   1   2   3
0   1   2   3   4
1  10  20  30  40
2 -23 -43  67  90
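In the outputs above, every array becomes one row of the DataFrame. If each array should become a column instead, one possible approach is to pass a dictionary of arrays. This is only a sketch, and the column labels A, B and C are hypothetical.

```python
import numpy as np
import pandas as pd

ar1 = np.array([1, 2, 3, 4])
ar2 = np.array([10, 20, 30, 40])
ar3 = np.array([-23, -43, 67, 90])

# A list of arrays: every array is laid out as one row.
by_rows = pd.DataFrame([ar1, ar2, ar3])
print(by_rows.shape)   # (3, 4)

# A dictionary of arrays: every array is laid out as one column.
by_cols = pd.DataFrame({'A': ar1, 'B': ar2, 'C': ar3})
print(by_cols.shape)   # (4, 3)
```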

3. Creation of DataFrame from Lists: We can create a DataFrame from a list by passing the list to the DataFrame( ) function. Each element of a simple list is displayed as a row of a single column, and the default label of that column is 0. For example:

Practical 1: To create dataframe from simple list.

import pandas as pd
df = pd.DataFrame([11, 22, 33, 44, 55])
print(df)

OUTPUT:

      0
0  11
1  22
2  33
3  44
4  55

Practical 2: To create dataframe from simple list by passing appropriate column heading and row index.

import pandas as pd
df = pd.DataFrame([11, 22, 33, 44, 55], index=['R1', 'R2','R3','R4','R5'], columns=['C1'])
print(df)

OUTPUT:

      C1
R1  11
R2  22
R3  33
R4  44
R5  55

Practical 3: To create dataframe from nested list.

import pandas as pd
df = pd.DataFrame([[21, 'X', 'A'], [32, 'IX', 'B'], [23, 'X', 'A'], [12, 'XI','A']]) 
print(df)


OUTPUT:


      0   1  2
0  21   X  A
1  32  IX  B
2  23   X  A
3  12  XI  A


Practical 4: To create dataframe from nested list by passing appropriate column heading and row index.

import pandas as pd
df = pd.DataFrame([[21, 'X', 'A'], [32, 'IX', 'B'], [23, 'X', 'A'],[12, 'XI','A']], index= ['Rec1', 'Rec2', 'Rec3', 'Rec4'], columns = ["Rno", "Class", "Sec"]) 
print(df)

OUTPUT:

          Rno Class Sec
Rec1   21     X   A
Rec2   32    IX   B
Rec3   23     X   A
Rec4   12    XI   A

4. Creation of DataFrame from Dictionary of lists: We can create a DataFrame from a dictionary of lists, as shown below. For example:

Practical 1: To create dataframe using dictionaries of list.

import pandas as pd
df = pd.DataFrame({'Rno' : [21, 28, 31], 'Class' : ['IX', 'X', 'XI'], 'Sec' : ['B', 'A','C']}) 
print(df)


OUTPUT:


   Rno Class Sec
0   21    IX   B
1   28     X   A
2   31    XI   C

Practical 2: To create dataframe using dictionaries of list with appropriate row index.

import pandas as pd
df = pd.DataFrame({'B_id' : ['B1', 'B8', 'B5'], 'Sub' : ['Hindi', 'Math', 'Science'], 'Cost' : [450, 520, 400]}, index=['R1', 'R2', 'R3']) 
print(df)


OUTPUT:


      B_id     Sub    Cost
R1   B1    Hindi     450
R2   B8     Math    520
R3   B5  Science   400

Note: Dictionary keys become column labels by default in a DataFrame, and each list supplies the values of its column, one value per row.
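Because every list fills one column, all the lists must have the same length. The following sketch reuses the Rno/Class/Sec data from Practical 1 and then tries a dictionary with lists of unequal length, which pandas rejects with a ValueError. The variable name failed is only a helper for this illustration.

```python
import pandas as pd

data = {'Rno': [21, 28, 31], 'Class': ['IX', 'X', 'XI'], 'Sec': ['B', 'A', 'C']}
df = pd.DataFrame(data)
print(list(df.columns))     # the dictionary keys
print(df['Rno'].tolist())   # the list stored under the key 'Rno'

# Lists of different lengths cannot be lined up into columns.
try:
    pd.DataFrame({'Rno': [21, 28, 31], 'Sec': ['B', 'A']})
    failed = False
except ValueError as err:
    failed = True
    print('ValueError:', err)
```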

5. Creation of DataFrame from List of Dictionaries: We can create a DataFrame from a list of dictionaries. For example:

import pandas as pd
df = pd.DataFrame([{'Ram' : 25, 'Anil' : 29, 'Simple' : 28}, {'Ram' : 21, 'Anil' : 25, 'Simple':23}, {'Ram' : 23, 'Anil' : 18, 'Simple' : 26}], index=['Term1', 'Term2', 'Term3']) 
print(df)

OUTPUT:

             Ram  Anil    Simple
Term1   25      29      28
Term2   21      25      23
Term3   23      18      26

Here, the keys of the dictionaries are taken as column labels, and the values in each dictionary form one row. There will be as many rows as the number of dictionaries present in the list.

NOTE: NaN (Not a Number) is inserted if a corresponding value for a column is missing as shown in the following example.

import pandas as pd
df = pd.DataFrame([{'Ram' : 25, 'Anil' : 29, 'Simple' : 28}, {'Ram' : 21, 'Anil' : 25, 'Simple':23}, {'Ram' : 23, 'Anil' : 18}], index=['Term1', 'Term2', 'Term3']) 
print(df)

OUTPUT:

       Ram  Anil  Simple
Term1   25    29    28.0
Term2   21    25    23.0
Term3   23    18     NaN
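A short sketch, using the same data as above, shows how the missing value can be located with isna() and why the whole Simple column is stored as floating-point numbers: NaN is itself a float value.

```python
import pandas as pd

df = pd.DataFrame(
    [{'Ram': 25, 'Anil': 29, 'Simple': 28},
     {'Ram': 21, 'Anil': 25, 'Simple': 23},
     {'Ram': 23, 'Anil': 18}],              # 'Simple' is missing here
    index=['Term1', 'Term2', 'Term3'])

print(df.isna())                   # True marks the missing cell
print(df['Simple'].isna().sum())   # number of missing values in the column
print(df.dtypes)                   # 'Simple' is float64, the other columns stay int64
```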


6. Creation of DataFrame from Series: We can create a DataFrame from a single Series or from multiple Series. For example:

Example 1: Creation of DataFrame from Single Series.

import pandas as pd
S1 = pd.Series([10, 20, 30, 40])
S2 = pd.Series([11, 22, 33, 44])
S3 = pd.Series([34, 44, 54, 24])
df = pd.DataFrame(S1) 
print(df)


OUTPUT:

      0
0  10
1  20
2  30
3  40

Here, the DataFrame has as many rows as there are elements in the Series, but only one column.

Example 2: Creation of DataFrame from two Series.

import pandas as pd
S1 = pd.Series([10, 20, 30, 40])
S2 = pd.Series([11, 22, 33, 44])
S3 = pd.Series([34, 44, 54, 24])
df = pd.DataFrame([S1, S2], index = ['R1', 'R2']) 
print(df)

OUTPUT:

        0    1    2    3
R1  10  20  30  40
R2  11  22  33  44

Example 3: Creation of DataFrame from three Series.

import pandas as pd
S1 = pd.Series([10, 20, 30, 40])
S2 = pd.Series([11, 22, 33, 44])
S3 = pd.Series([34, 44, 54, 24])
df = pd.DataFrame([S1, S2, S3],index = ['R1', 'R2', 'R3'])
print(df)

OUTPUT:

        0    1    2    3
R1  10  20  30  40
R2  11  22  33  44
R3  34  44  54  24

To create a DataFrame from more than one Series, we pass the Series together in a list, as shown above.

NOTE: If a particular Series does not have a corresponding value for a label, NaN is inserted in that DataFrame column. For example:

import pandas as pd
S1 = pd.Series([10, 20, 30, 40])
S2 = pd.Series([11, 22, 33, 44])
S3 = pd.Series([34, 44, 54])
df = pd.DataFrame([S1, S2, S3],index = ['R1', 'R2', 'R3'])
print(df)

OUTPUT:

           0      1       2       3
R1  10.0  20.0  30.0  40.0
R2  11.0  22.0  33.0  44.0
R3  34.0  44.0  54.0   NaN
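The values are matched by label, not by position. The sketch below makes this visible with two Series that carry their own labels; the labels 'a' to 'd' are hypothetical and used only for illustration.

```python
import pandas as pd

# Hypothetical labelled Series: they share the labels 'b' and 'c' only.
s1 = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
s2 = pd.Series([11, 22, 33], index=['b', 'c', 'd'])

df = pd.DataFrame([s1, s2], index=['R1', 'R2'])
print(df)   # R1 has NaN under 'd', R2 has NaN under 'a'
```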


Operations on rows and columns in DataFrames

We can perform some basic operations on rows and columns of a DataFrame like

1. Adding a New Column to a DataFrame:

We can easily add a new column to a DataFrame. Let's see the example given below.

import pandas as pd
df = pd.DataFrame([{'Ram':25, 'Anil':29, 'Simple':28}, {'Ram':21, 'Anil':25, 'Simple':23},{'Ram':23, 'Anil':18, 'Simple':26}],index=['R1','R2','R3']) 
print(df)
df['Amit']=[18, 22, 25] #Adding column to DataFrame
print(df)
df['Parth']=[28, 12, 30]  #Adding column to DataFrame
print(df)

OUTPUT:

       Ram  Anil  Simple
R1   25      29      28
R2   21      25      23
R3   23      18      26
      Ram  Anil  Simple  Amit
R1   25     29      28    18
R2   21     25      23    22
R3   23     18      26    25
      Ram  Anil  Simple  Amit  Parth
R1   25     29      28    18     28
R2   21     25      23    22     12
R3   23     18      26    25     30

NOTE: If we try to add a column with fewer or more values than the number of rows in the DataFrame, it results in a ValueError, with the error message: ValueError: Length of values does not match length of index (recent pandas versions also show the two lengths in the message). For example:

import pandas as pd
df = pd.DataFrame([{'Ram':25, 'Anil':29, 'Simple':28}, {'Ram':21, 'Anil':25, 'Simple':23},{'Ram':23, 'Anil':18, 'Simple':26}],index=['R1','R2','R3'])
print(df)
df['Amit']=[18, 22]
print(df)

OUTPUT:

ValueError: Length of values does not match length of index
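One way to avoid this error is to compare the number of values with the number of rows before assigning. The sketch below uses a shortened version of the marks data; the names new_values and Bonus are hypothetical. It also shows that a single value is accepted, because pandas repeats it for every row.

```python
import pandas as pd

df = pd.DataFrame({'Ram': [25, 21, 23], 'Anil': [29, 25, 18]},
                  index=['R1', 'R2', 'R3'])

new_values = [18, 22]   # only two values for three rows
if len(new_values) == len(df):
    df['Amit'] = new_values
else:
    print('Skipped: expected', len(df), 'values, got', len(new_values))

# A single value is repeated for every row instead of raising an error.
df['Bonus'] = 5
print(df)
```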

2. Adding a New Row to a DataFrame:

We can add a new row to a DataFrame using DataFrame.loc[ ]. Let's see the example given below.

import pandas as pd
df = pd.DataFrame([{'Ram':25, 'Anil':29, 'Simple':28}, {'Ram':21, 'Anil':25, 'Simple':23}, {'Ram':23, 'Anil':18, 'Simple':26}], index=['R1', 'R2', 'R3']) 
print(df)
df.loc['R4']=[12, 22, 10] #Adding new row
print(df)

OUTPUT:

      Ram  Anil  Simple
R1   25    29      28
R2   21    25      23
R3   23    18      26
      Ram  Anil  Simple
R1   25    29      28
R2   21    25      23
R3   23    18      26
R4   12    22      10

NOTE: If we try to add a row with fewer or more values than the number of columns in the DataFrame, it results in a ValueError, with the error message: ValueError: cannot set a row with mismatched columns. For example:

import pandas as pd
df = pd.DataFrame([{'Ram':25, 'Anil':29, 'Simple':28}, {'Ram':21, 'Anil':25, 'Simple':23}, {'Ram':23, 'Anil':18, 'Simple':26}], index=['R1', 'R2', 'R3']) 
print(df)
df.loc['R4']=[12, 22] #Adding new row with less number of values
print(df)

OUTPUT:

ValueError: cannot set a row with mismatched columns
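It is also worth noting what happens when the row label already exists. In the following sketch (with shortened, illustrative data), assigning to a new label adds a row, while assigning to an existing label replaces the values of that row.

```python
import pandas as pd

df = pd.DataFrame({'Ram': [25, 21], 'Anil': [29, 25]}, index=['R1', 'R2'])

df.loc['R3'] = [23, 18]   # 'R3' is a new label, so a row is added
df.loc['R1'] = [30, 31]   # 'R1' already exists, so its values are replaced
print(df)
print(len(df))            # 3 rows, not 4
```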

3. Deleting a Row from a DataFrame:

We can use the DataFrame.drop() method to delete rows. To delete a row, the parameter axis is assigned the value 0. Let's see the examples given below.

Example 1: To delete a single row from a Dataframe.

import pandas as pd
df = pd.DataFrame([{'Ram':25, 'Anil':29, 'Simple':28}, {'Ram':21, 'Anil':25, 'Simple':23},{'Ram':23, 'Anil':18, 'Simple':26}],index=['R1', 'R2', 'R3'])
print(df)
print("----------------------------------------------------")
df=df.drop('R2', axis = 0) #Deleting a row from dataframe
print(df)


OUTPUT:


      Ram  Anil  Simple
R1   25    29      28
R2   21    25      23
R3   23    18      26
----------------------------------------------------
      Ram  Anil  Simple
R1   25    29      28
R3   23    18      26

Example 2: To delete multiple rows from a Dataframe.

import pandas as pd
df = pd.DataFrame({'Ram' : [25, 21, 23], 'Anil' : [29, 25, 18], 'Simple' : [28, 23, 26]}, index=['R1', 'R2', 'R3'])
print(df)
print("----------------------------------------------------")
df=df.drop(['R2', 'R1'], axis = 0) #deleting multiple rows from dataframe
print(df)


OUTPUT:

      Ram  Anil  Simple
R1   25    29      28
R2   21    25      23
R3   23    18      26
----------------------------------------------------
      Ram  Anil  Simple
R3   23    18      26
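The examples above write df = df.drop(...) because drop() returns a new DataFrame and leaves the original one unchanged. A minimal sketch with shortened data, where the name smaller is hypothetical:

```python
import pandas as pd

df = pd.DataFrame({'Ram': [25, 21, 23], 'Anil': [29, 25, 18]},
                  index=['R1', 'R2', 'R3'])

smaller = df.drop('R2', axis=0)   # the result is stored under a new name
print(list(smaller.index))        # ['R1', 'R3']
print(list(df.index))             # ['R1', 'R2', 'R3'], df itself is unchanged
```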

4. Deleting a Column from a DataFrame:

We can delete the columns from a dataframe by using the following methods

1. pop( ): This method deletes a column from a DataFrame and also returns the values of the deleted column. For example:

import pandas as pd
df = pd.DataFrame({'Ram': [25, 21, 23], 'Anil':[29, 25, 18], 'Simple':[28, 23, 26]},index=['R1', 'R2', 'R3'])
print(df.pop('Simple')) #Deleting a particular Column and returning the value.
print("----------------------------------------------------")
print(df)

OUTPUT:

R1    28
R2    23
R3    26
Name: Simple, dtype: int64
----------------------------------------------------
       Ram  Anil
R1   25    29
R2   21    25
R3   23    18

2. drop( ): This method deletes the entire column from a DataFrame. To delete a column, the parameter axis is assigned the value 1. Let's see the examples given below.

import pandas as pd
df = pd.DataFrame({'Ram': [25, 21, 23], 'Anil':[29, 25, 18], 'Simple':[28, 23, 26]},index=['R1', 'R2', 'R3'])
print(df)
print("----------------------------------------------------")
df=df.drop('Simple', axis=1) #Deleting column from dataframe
print(df)

OUTPUT:

      Ram  Anil  Simple
R1   25    29      28
R2   21    25      23
R3   23    18      26
----------------------------------------------------
      Ram  Anil
R1   25    29
R2   21    25
R3   23    18

To delete multiple columns

import pandas as pd
df = pd.DataFrame({'Ram': [25, 21, 23], 'Anil':[29, 25, 18], 'Simple':[28, 23, 26]},index=['R1', 'R2', 'R3'])
print(df)
print("----------------------------------------------------")
df=df.drop(['Simple', 'Ram'], axis=1) #deleting multiple columns
print(df)


OUTPUT:


      Ram  Anil  Simple
R1   25    29      28
R2   21    25      23
R3   23    18      26
----------------------------------------------------
       Anil
R1    29
R2    25
R3    18

5. Renaming Row Labels of a DataFrame :

We can change the labels of rows in a DataFrame using the DataFrame.rename() method. For example, to rename the row label R1 to Maths, we can write the following code.

Example 1: To change row index ‘R1’ to ‘Maths’

import pandas as pd
df = pd.DataFrame([[25, 29, 28], [21, 25, 23], [23, 18, 26]], index = ['R1', 'R2', 'R3'], columns = ['Ram', 'Anil', 'Simple'])
print(df)
df=df.rename({'R1' : 'Maths'}) #Statement to change 'R1' to 'Maths'
print(df)


OUTPUT:

    Ram  Anil  Simple
R1   25    29      28
R2   21    25      23
R3   23    18      26
       Ram  Anil  Simple
Maths   25    29      28
R2      21    25      23
R3      23    18      26

Example 2: To change row index ‘R1’ to ‘Maths’, ‘R2’ to ‘Science’ and ‘R3’ to ‘English’

import pandas as pd
df = pd.DataFrame([[25, 29, 28],[21,25,23],[23, 18,26]],index=['R1','R2','R3'], columns = ['Ram', 'Anil', 'Simple'])
print(df)
df=df.rename({'R1' : 'Maths', 'R2' : 'Science', 'R3' : 'English'}, axis = 'index')
print("-----------------------------------------------------")
print(df)


OUTPUT:


      Ram  Anil  Simple
R1   25    29      28
R2   21    25      23
R3   23    18      26

-----------------------------------------------------
         Ram  Anil  Simple
Maths     25    29      28
Science   21    25      23
English   23    18      26

NOTE: The parameter axis='index' specifies that the row labels are to be changed. We can also omit it, because by default the rename() method changes the row labels.

6. Renaming Column Labels of a DataFrame :

To alter the column names of a DataFrame we can use the rename() method, as shown below. The parameter axis='columns' implies we want to change the column labels:

Example 1: To change the column heading from ‘Ram’ to ‘Ravi’

import pandas as pd
df = pd.DataFrame([[25, 29, 28], [21, 25, 23], [23, 18, 26]],index=['R1', 'R2', 'R3'], columns = ['Ram', 'Anil', 'Simple'])
print(df)
df=df.rename({'Ram' : 'Ravi'}, axis = 'columns')
print("-----------------------------------------------------")
print(df)

OUTPUT:

      Ram  Anil  Simple
R1   25    29      28
R2   21    25      23
R3   23    18      26
-----------------------------------------------------
       Ravi  Anil  Simple
R1    25    29      28
R2    21    25      23
R3    23    18      26

Example 2: To change the column heading from ‘Ram’ to ‘Ravi’ and from ‘Simple’ to ‘Sumit’

import pandas as pd
df = pd.DataFrame([[25, 29, 28], [21, 25, 23], [23, 18, 26]],index=['R1', 'R2', 'R3'], columns = ['Ram', 'Anil', 'Simple'])
print(df)
df=df.rename({'Ram' : 'Ravi', 'Simple' : 'Sumit'}, axis = 'columns')
print("-----------------------------------------------------")
print(df)

OUTPUT:

      Ram  Anil  Simple
R1   25    29      28
R2   21    25      23
R3   23    18      26
-----------------------------------------------------
       Ravi  Anil  Sumit
R1    25    29      28
R2    21    25      23
R3    23    18      26
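Row labels and column labels can also be renamed in a single call. The sketch below uses the index and columns keyword parameters of rename() in place of the axis parameter; the name renamed is hypothetical.

```python
import pandas as pd

df = pd.DataFrame([[25, 29, 28], [21, 25, 23], [23, 18, 26]],
                  index=['R1', 'R2', 'R3'],
                  columns=['Ram', 'Anil', 'Simple'])

# index= maps old row labels to new ones, columns= does the same for column labels.
renamed = df.rename(index={'R1': 'Maths'}, columns={'Ram': 'Ravi'})
print(renamed)
print(list(df.index))   # the original DataFrame keeps its labels
```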

Accessing DataFrames Element through Indexing

Data elements in a DataFrame can be accessed using indexing. There are two ways of indexing DataFrames:

1. Label based indexing

There are several ways in pandas to implement label based indexing. DataFrame.loc[ ] is an important indexer that is used for label based indexing with DataFrames. Note that it is written with square brackets.

Example 1: To display single row from a dataframe using loc[ ].

import pandas as pd
df = pd.DataFrame([[25, 29, 28],[21,25,23],[23, 18,26]],index=['R1','R2','R3'], columns = ['Ram', 'Anil', 'Simple'])
print(df)
print("---------------------------------------------------")
print(df.loc['R2']) #row label indexing


OUTPUT:


      Ram  Anil  Simple
R1   25    29      28
R2   21    25      23
R3   23    18      26
---------------------------------------------------
Ram        21
Anil         25
Simple    23
Name: R2, dtype: int64

Example 2: To display multiple rows from a dataframe.

import pandas as pd
df = pd.DataFrame([[25, 29, 28], [21, 25, 23], [23, 18, 26]], index=['R1', 'R2', 'R3'], columns = ['Ram', 'Anil', 'Simple'])
print(df)
print("---------------------------------------------------")
print(df.loc[['R1', 'R3']]) #Multiple rows from dataframe

OUTPUT:

      Ram  Anil  Simple
R1   25    29      28
R2   21    25      23
R3   23    18      26
---------------------------------------------------
      Ram  Anil  Simple
R1   25    29      28
R3   23    18      26

Example 3: To display the values of single column label without using loc[ ].

import pandas as pd
df = pd.DataFrame([[25, 29, 28],[21,25,23],[23, 18,26]],index=['R1','R2','R3'], columns = ['Ram', 'Anil', 'Simple'])
print(df)
print("---------------------------------------------------")
print(df['Ram']) #Column label indexing


OUTPUT:

       Ram  Anil  Simple
R1   25    29      28
R2   21    25      23
R3   23    18      26
---------------------------------------------------
R1    25
R2    21
R3    23
Name: Ram, dtype: int64

Example 4: To display the values of multiple columns from dataframe without using loc[ ].

import pandas as pd
df = pd.DataFrame([[25, 29, 28],[21,25,23],[23, 18,26]],index=['R1','R2','R3'], columns = ['Ram', 'Anil', 'Simple'])
print(df)
print("---------------------------------------------------")
print(df[['Ram', 'Anil']]) #Multiple Column label indexing


OUTPUT:

       Ram  Anil  Simple
R1   25    29      28
R2   21    25      23
R3   23    18      26
---------------------------------------------------
       Ram  Anil
R1   25    29
R2   21    25
R3   23    18

Example 5: To display the values of single column label using loc[ ].

import pandas as pd
df = pd.DataFrame([[25, 29, 28],[21,25,23],[23, 18,26]],index=['R1','R2','R3'], columns = ['Ram', 'Anil', 'Simple'])
print(df)
print("---------------------------------------------------")
print(df.loc[: , 'Ram']) #Column label indexing using loc( )


OUTPUT:

       Ram  Anil  Simple
R1   25    29      28
R2   21    25      23
R3   23    18      26
---------------------------------------------------
R1    25
R2    21
R3    23
Name: Ram, dtype: int64

Example 6: To display the values of multiple columns from dataframe using loc[ ].

import pandas as pd
df = pd.DataFrame([[25, 29, 28],[21,25,23],[23, 18,26]],index=['R1','R2','R3'], columns = ['Ram', 'Anil', 'Simple'])
print(df)
print("---------------------------------------------------")
print(df.loc[:, 'Ram' : 'Anil']) #Multiple Column label indexing


OUTPUT:

       Ram  Anil  Simple
R1   25    29      28
R2   21    25      23
R3   23    18      26
---------------------------------------------------
       Ram  Anil
R1   25    29
R2   21    25
R3   23    18

To access or display rows or columns of a DataFrame by position instead of by label, we use DataFrame.iloc[ ]. In a position slice such as 0 : 2, the stop position is not included.

Example 7: To display first column from a dataframe

import pandas as pd
df = pd.DataFrame([[25, 29, 28], [21, 25, 23], [23, 18, 26]],index=['R1', 'R2', 'R3'], columns = ['Ram', 'Anil', 'Simple'])
print(df)
print("---------------------------------------------------")
print(df.iloc[:, 0 : 1]) 

OUTPUT:

      Ram  Anil  Simple
R1   25    29      28
R2   21    25      23
R3   23    18      26
---------------------------------------------------
       Ram
R1   25
R2   21
R3   23

Example 8: To display first and second column from a dataframe

import pandas as pd
df = pd.DataFrame([[25, 29, 28], [21, 25, 23],[23, 18, 26]], index=['R1', 'R2', 'R3'], columns = ['Ram', 'Anil', 'Simple'])
print(df)
print("---------------------------------------------------")
print(df.iloc[:, 0 : 2]) # print(df.iloc[:, [0,1]]) can also be used

OUTPUT:

      Ram  Anil  Simple
R1   25    29      28
R2   21    25      23
R3   23    18      26
---------------------------------------------------
      Ram  Anil
R1   25    29
R2   21    25
R3   23    18

Example 9: To display only second row from a dataframe

import pandas as pd
df = pd.DataFrame([[25, 29, 28], [21, 25, 23], [23, 18, 26]], index=['R1','R2','R3'], columns = ['Ram', 'Anil', 'Simple'])
print(df)
print("---------------------------------------------------")
print(df.iloc[1 : 2])

OUTPUT:

      Ram  Anil  Simple
R1   25    29      28
R2   21    25      23
R3   23    18      26
---------------------------------------------------
      Ram  Anil  Simple
R2   21    25      23

Example 10: To display first and second row from a dataframe

import pandas as pd
df = pd.DataFrame([[25, 29, 28], [21, 25, 23], [23, 18, 26]], index=['R1','R2','R3'], columns = ['Ram', 'Anil', 'Simple'])
print(df)
print("---------------------------------------------------")
print(df.iloc[0:2]) # print(df.iloc[[0,1]]) can also be used


OUTPUT:

      Ram  Anil  Simple
R1   25    29      28
R2   21    25      23
R3   23    18      26
---------------------------------------------------
      Ram  Anil  Simple
R1   25    29      28
R2   21    25      23

Example 11: To display first, second and fourth row from a dataframe.

import pandas as pd
df = pd.DataFrame([[25, 29, 28, 17], [21, 25, 23, 20], [23, 18, 26, 23],[20, 18, 30, 15]], index=['R1', 'R2', 'R3', 'R4'], columns = ['Ram', 'Anil', 'Simple', 'Anuj'])
print(df)
print("---------------------------------------------------")
print(df.loc[['R1', 'R2', 'R4']]) # print(df.iloc[[0, 1, 3]])  or print(df.loc[[True,True, False, True]]) can also be used  

OUTPUT:

    Ram  Anil  Simple  Anuj
R1   25    29      28    17
R2   21    25      23    20
R3   23    18      26    23
R4   20    18      30    15
---------------------------------------------------
    Ram  Anil  Simple  Anuj
R1   25    29      28    17
R2   21    25      23    20
R4   20    18      30    15

Example 12: To display marks of subject Math, English and Science of ‘Anil’ from a dataframe.

import pandas as pd
df = pd.DataFrame([[25, 29, 28, 17], [21, 25, 23, 20], [23, 18, 26, 23],[20, 18, 30, 15]], index=['Math', 'English', 'Science', 'Hindi'], columns = ['Ram', 'Anil', 'Simple', 'Anuj'])
print(df)
print("---------------------------------------------------")
print(df.loc['Math' : 'Science', 'Anil'])

OUTPUT:

             Ram  Anil  Simple  Anuj
Math       25    29      28    17
English    21    25      23    20
Science    23    18      26    23
Hindi       20    18      30    15
---------------------------------------------------
Math        29
English     25
Science    18
Name: Anil, dtype: int64

Example 13: To display marks of subject Math, English and Science of ‘Ram’ and ‘Anil’ from a dataframe.

import pandas as pd
df = pd.DataFrame([[25, 29, 28, 17], [21, 25, 23, 20], [23, 18, 26, 23],[20, 18, 30, 15]], index=['Math', 'English', 'Science', 'Hindi'], columns = ['Ram', 'Anil', 'Simple', 'Anuj'])
print(df)
print("---------------------------------------------------")
print(df.loc['Math' : 'Science','Ram' : 'Anil'])

OUTPUT:

            Ram  Anil  Simple  Anuj
Math      25    29      28    17
English   21    25      23    20
Science   23    18      26    23
Hindi      20    18      30    15
---------------------------------------------------
              Ram  Anil
Math       25    29
English    21    25
Science   23    18

Example 14: To display marks of subject Math, English and Science of ‘Ram’, ‘Anil’ and ‘Anuj’ from a dataframe.

import pandas as pd
df = pd.DataFrame([[25, 29, 28, 17], [21, 25, 23, 20], [23, 18, 26, 23],[20, 18, 30, 15]], index=['Math', 'English', 'Science', 'Hindi'], columns = ['Ram', 'Anil', 'Simple', 'Anuj'])
print(df)
print("---------------------------------------------------")
print(df.loc['Math' : 'Science', ['Ram', 'Anil', 'Anuj']])

OUTPUT:

             Ram  Anil  Simple  Anuj
Math       25     29      28       17
English    21     25      23       20
Science   23     18       26      23
Hindi      20     18       30       15
---------------------------------------------------
            Ram  Anil  Anuj
Math      25    29    17
English   21    25    20
Science   23    18    23
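The examples above use two kinds of slices. A label slice with loc[ ] includes both end labels, while a position slice with iloc[ ] leaves out the stop position. The following sketch selects the same block of marks both ways; the names by_label and by_position are hypothetical.

```python
import pandas as pd

df = pd.DataFrame([[25, 29, 28, 17], [21, 25, 23, 20],
                   [23, 18, 26, 23], [20, 18, 30, 15]],
                  index=['Math', 'English', 'Science', 'Hindi'],
                  columns=['Ram', 'Anil', 'Simple', 'Anuj'])

by_label = df.loc['Math':'Science', 'Ram':'Anil']   # both end labels are included
by_position = df.iloc[0:3, 0:2]                     # stop positions 3 and 2 are excluded

print(by_label)
print(by_label.equals(by_position))   # True
```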

2. Boolean indexing

In Boolean indexing, we can select the data based on the actual values in the DataFrame rather than their row/column labels. We can use conditions on column names to filter data values.

Example 1: Who scored more than 25 marks in Math

import pandas as pd
df = pd.DataFrame([[25, 29, 28], [21, 25, 23], [23, 18, 26]], index=['Math','English','Science'], columns = ['Ram', 'Anil', 'Simple'])
print(df)
print("---------------------------------------------------")
print(df.loc['Math']>25)


OUTPUT:

             Ram  Anil  Simple
Math       25    29      28
English    21    25      23
Science   23    18      26
---------------------------------------------------
Ram          False
Anil          True
Simple     True
Name: Math, dtype: bool

Example 2: To check in which subjects ‘Anil’ has scored more than 25

import pandas as pd
df = pd.DataFrame([[25, 29, 28], [21, 25, 23], [23, 18, 26]], index=['Math','English','Science'], columns = ['Ram', 'Anil', 'Simple'])
print(df)
print("---------------------------------------------------")
print(df.loc[:,'Anil']>25)

OUTPUT:

             Ram  Anil  Simple
Math       25    29      28
English    21    25      23
Science   23    18      26
---------------------------------------------------
Math        True
English     False
Science    False
Name: Anil, dtype: bool
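The two examples above print only the True/False results. To keep just the matching data, the Boolean result can be passed back as an index. A minimal sketch with the same marks; the names mask and math_marks are hypothetical.

```python
import pandas as pd

df = pd.DataFrame([[25, 29, 28], [21, 25, 23], [23, 18, 26]],
                  index=['Math', 'English', 'Science'],
                  columns=['Ram', 'Anil', 'Simple'])

# Rows (subjects) in which Anil scored more than 25.
mask = df['Anil'] > 25
print(df.loc[mask])

# Students who scored more than 25 in Math.
math_marks = df.loc['Math']
print(math_marks[math_marks > 25])
```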

Merging of DataFrames

We can merge two DataFrames by appending the rows of the second DataFrame at the end of the first DataFrame. Columns not present in the first DataFrame are added as new columns. Older versions of pandas provided the DataFrame.append() method for this, but it was deprecated in pandas 1.4 and removed in pandas 2.0, so the example below uses pandas.concat(), which gives the same result. For example:

import pandas as pd
df = pd.DataFrame([[25, 29, 28, 17], [21, 25, 23, 20], [23, 18, 26, 23],[20, 18, 30, 15], [12, 15, 20, 3], [23, 12, 16, 30]], index=['R1', 'R2', 'R3', 'R4', 'R5', 'R6'], columns = ['Ram', 'Anil', 'Simple', 'Anuj'])
print(df)
print("-------------------------------------------------")
df1 = pd.DataFrame([[10, 12, 8, 7], [1, 5, 3, 2], [2, 1, 2, 2],[0, 1, 3, 5]], index=['R1', 'R2', 'R5', 'R6'], columns = ['Ram', 'Anil', 'Ravi', 'Ashish'])
print(df1)
print("-------------------------------------------------")
df = pd.concat([df, df1]) #merging two data frames (older pandas: df.append(df1))
print(df)

OUTPUT:

    Ram  Anil  Simple  Anuj
R1   25    29      28    17
R2   21    25      23    20
R3   23    18      26    23
R4   20    18      30    15
R5   12    15      20     3
R6   23    12      16    30
-------------------------------------------------
     Ram  Anil  Ravi  Ashish
R1   10    12     8       7
R2    1     5       3       2
R5    2     1       2       2
R6    0     1       3       5
-------------------------------------------------
    Ram  Anil  Simple  Anuj  Ravi    Ashish
R1   25    29    28.0    17.0   NaN     NaN
R2   21    25    23.0    20.0   NaN     NaN
R3   23    18    26.0    23.0   NaN     NaN
R4   20    18    30.0    15.0   NaN     NaN
R5   12    15    20.0    3.0     NaN     NaN
R6   23    12    16.0    30.0   NaN     NaN
R1   10    12    NaN   NaN     8.0        7.0
R2    1     5      NaN   NaN     3.0        2.0
R5    2     1      NaN   NaN     2.0        2.0
R6    0     1      NaN   NaN     3.0        5.0

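To make the column labels appear in sorted order, we can set the parameter sort=True. For example, starting again from the original df and df1 (before the merge above):

df = pd.concat([df, df1], sort=True) #older pandas: df.append(df1, sort=True)
print(df)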

The output of above code will be

       Anil  Anuj  Ashish  Ram  Ravi  Simple
R1    29  17.0     NaN   25   NaN      28.0
R2    25  20.0     NaN   21   NaN      23.0
R3    18  23.0     NaN   23   NaN      26.0
R4    18  15.0     NaN   20   NaN      30.0
R5    15   3.0     NaN   12    NaN      20.0
R6    12  30.0     NaN   23   NaN      16.0
R1    12   NaN     7.0   10     8.0       NaN
R2     5   NaN     2.0    1       3.0       NaN
R5     1   NaN     2.0    2       2.0       NaN
R6     1   NaN     5.0    0       3.0       NaN

NOTE: Observe the column names which are alphabetically arranged
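The merged outputs above contain the row labels R1, R2, R5 and R6 twice. If unique row labels are wanted, one possible approach is to call pandas.concat() with ignore_index=True, which numbers the rows afresh from 0. The two DataFrames in this sketch are shortened, hypothetical versions of the ones used above.

```python
import pandas as pd

df = pd.DataFrame([[25, 28], [21, 23]],
                  index=['R1', 'R2'], columns=['Ram', 'Simple'])
df1 = pd.DataFrame([[10, 8]],
                   index=['R1'], columns=['Ram', 'Ravi'])

# ignore_index=True discards the old row labels and uses 0, 1, 2, ... instead.
merged = pd.concat([df, df1], ignore_index=True)
print(merged)
```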

Attributes of DataFrames

Like Series, a DataFrame has certain properties called attributes. Some attributes of a pandas DataFrame are:

1. DataFrame.index: This attribute gives all the row labels of the DataFrame.

2. DataFrame.columns: This attribute gives all the column labels of the DataFrame.

3. DataFrame.dtypes: This attribute gives the data type of each column in the DataFrame.

4. DataFrame.shape: This attribute gives a tuple representing the dimensions of the DataFrame, that is, (number of rows, number of columns).

5. DataFrame.size: This attribute gives the total number of values in the DataFrame (number of rows multiplied by number of columns).

6. DataFrame.T: This attribute transposes the DataFrame, which means the row labels and the column labels swap places.

7. DataFrame.values: This attribute gives a NumPy ndarray having all the values in the DataFrame, without the axes labels.

8. DataFrame.empty: This attribute gives True if the DataFrame is empty and False otherwise.

import pandas as pd
df = pd.DataFrame([[25, 29, 28, 17], [21, 25, 23, 20], [23, 18, 26, 23],[20, 18, 30, 15]], index=['R1', 'R2', 'R3', 'R4'], columns = ['Ram', 'Anil', 'Simple', 'Anuj'])
print(df)
print("---------------------------------------------------")
print(df.index)
print("---------------------------------------------------")
print(df.columns)
print("---------------------------------------------------")
print(df.dtypes)
print("---------------------------------------------------")
print(df.shape)
print("---------------------------------------------------")
print(df.size)
print("---------------------------------------------------")
print(df.T)
print("---------------------------------------------------")
print(df.values)
print("---------------------------------------------------")
print(df.empty)

OUTPUT:

    Ram  Anil  Simple  Anuj
R1   25    29      28    17
R2   21    25      23    20
R3   23    18      26    23
R4   20    18      30    15
---------------------------------------------------
Index(['R1', 'R2', 'R3', 'R4'], dtype='object')
---------------------------------------------------
Index(['Ram', 'Anil', 'Simple', 'Anuj'], dtype='object')
---------------------------------------------------
Ram       int64
Anil      int64
Simple    int64
Anuj      int64
dtype: object
---------------------------------------------------
(4, 4)
---------------------------------------------------
16
---------------------------------------------------
             R1  R2  R3  R4
Ram      25  21  23  20
Anil       29  25  18  18
Simple  28  23  26  30
Anuj     17  20  23  15
---------------------------------------------------
[[25 29 28 17]
 [21 25 23 20]
 [23 18 26 23]
 [20 18 30 15]]
---------------------------------------------------
False
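A few of these attributes are related to each other, which the following sketch checks on a smaller, illustrative DataFrame: size equals rows multiplied by columns, T swaps the two dimensions, and empty is True for a DataFrame without any data.

```python
import pandas as pd

df = pd.DataFrame([[25, 29, 28], [21, 25, 23]],
                  index=['R1', 'R2'],
                  columns=['Ram', 'Anil', 'Simple'])

rows, cols = df.shape           # shape is a (rows, columns) tuple
print(df.size == rows * cols)   # True: 6 values in total
print(df.T.shape)               # (3, 2): rows and columns swapped
print(pd.DataFrame().empty)     # True: a DataFrame created without data
```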

Methods of DataFrames

1. head( ): This method displays the first n rows of the DataFrame. If the parameter n is not specified, it gives the first 5 rows of the DataFrame by default. For example:

import pandas as pd
df = pd.DataFrame([[25, 29, 28, 17], [21, 25, 23, 20], [23, 18, 26, 23], [20, 18, 30, 15], [12, 15, 20, 3], [23, 12, 16, 30]], index=['R1', 'R2', 'R3', 'R4', 'R5', 'R6'], columns = ['Ram', 'Anil', 'Simple', 'Anuj'])
print(df)
print("---------------------------------------------------")
print(df.head(2)) #display first two rows
print("---------------------------------------------------")
print(df.head(1)) #display only first row
print("---------------------------------------------------")
print(df.head()) #display first five rows as value of n not specified.
print("---------------------------------------------------")


OUTPUT:

    Ram  Anil  Simple  Anuj
R1   25    29      28    17
R2   21    25      23    20
R3   23    18      26    23
R4   20    18      30    15
R5   12    15      20     3
R6   23    12      16    30
---------------------------------------------------
    Ram  Anil  Simple  Anuj
R1   25    29      28    17
R2   21    25      23    20
---------------------------------------------------
    Ram  Anil  Simple  Anuj
R1   25    29      28    17
---------------------------------------------------
    Ram  Anil  Simple  Anuj
R1   25    29      28    17
R2   21    25      23    20
R3   23    18      26    23
R4   20    18      30    15
R5   12    15      20     3
---------------------------------------------------

2. tail( ): This method displays the last n rows of the DataFrame. If the parameter n is not specified, it returns the last 5 rows by default. For example:

import pandas as pd
df = pd.DataFrame([[25, 29, 28, 17], [21, 25, 23, 20], [23, 18, 26, 23],[20, 18, 30, 15], [12, 15, 20, 3], [23, 12, 16, 30]], index=['R1', 'R2', 'R3', 'R4', 'R5', 'R6'], columns = ['Ram', 'Anil', 'Simple', 'Anuj'])
print(df)
print("---------------------------------------------------")
print(df.tail(2)) #display last two rows
print("---------------------------------------------------")
print(df.tail(3)) #display last three rows
print("---------------------------------------------------")
print(df.tail()) #display last five rows as value of n not specified.
print("---------------------------------------------------")

OUTPUT:

    Ram  Anil  Simple  Anuj
R1   25    29      28    17
R2   21    25      23    20
R3   23    18      26    23
R4   20    18      30    15
R5   12    15      20     3
R6   23    12      16    30
---------------------------------------------------
    Ram  Anil  Simple  Anuj
R5   12    15      20     3
R6   23    12      16    30
---------------------------------------------------
    Ram  Anil  Simple  Anuj
R4   20    18      30    15
R5   12    15      20     3
R6   23    12      16    30
---------------------------------------------------
    Ram  Anil  Simple  Anuj
R2   21    25      23    20
R3   23    18      26    23
R4   20    18      30    15
R5   12    15      20     3
R6   23    12      16    30
---------------------------------------------------
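
Two more cases of head( ) and tail( ) are worth trying: a value of n larger than the number of rows, and a negative value of n. The sketch below uses a smaller DataFrame with only two columns (an assumption made to keep the output short).

```python
import pandas as pd

df = pd.DataFrame(
    [[25, 29], [21, 25], [23, 18], [20, 18], [12, 15], [23, 12]],
    index=['R1', 'R2', 'R3', 'R4', 'R5', 'R6'],
    columns=['Ram', 'Anil'])

# n larger than the number of rows: all rows are returned, no error.
print(len(df.head(10)))            # 6

# Negative n with head(): every row EXCEPT the last 2 rows.
print(list(df.head(-2).index))     # ['R1', 'R2', 'R3', 'R4']

# Negative n with tail(): every row EXCEPT the first 2 rows.
print(list(df.tail(-2).index))     # ['R3', 'R4', 'R5', 'R6']
```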

Importing a CSV file to a DataFrame

To practise the code, create the following csv file using a spreadsheet and save it on your computer with the name “data.csv”. (Save the file in the folder from which you run your Python program, or give the complete path in the code.)

Rollno  Name    Class  Sec
1       Anil    X      A
2       Anuj    XI     B
3       Ravi    XII    B
4       Ananya  VI     A
5       Sumit   VI     C
6       Deepak  VIII   D
7       Parth   X      A

We can load the data from the data.csv file into a DataFrame, say “stud” using Pandas read_csv() function as shown below:

import pandas as pd
stud = pd.read_csv("data.csv", sep=",",  header=0)
print(stud)

OUTPUT:

   Rollno    Name  Class  Sec
0       1    Anil      X    A
1       2    Anuj     XI    B
2       3    Ravi    XII    B
3       4  Ananya     VI    A
4       5   Sumit     VI    C
5       6  Deepak   VIII    D
6       7   Parth      X    A

Line by Line Explanation of above code

  1. The first parameter to the read_csv() is the name of the csv file along with its path.
  2. The parameter sep specifies whether the values are separated by a comma, semicolon, tab, or any other character. The default value for sep is a comma.
  3. header=0 implies that column names are taken from the first line of the file. This is also what read_csv() does by default when the parameter names is not given.
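
The effect of sep can be tried without creating any file. In the sketch below the csv text is kept in a string and passed through io.StringIO (the sample text and the variable names are assumptions made for illustration).

```python
import io
import pandas as pd

# Same kind of data as data.csv, but separated by semicolons.
text = "Rollno;Name;Class;Sec\n1;Anil;X;A\n2;Anuj;XI;B\n"

# sep=";" tells read_csv() how the values are separated.
stud = pd.read_csv(io.StringIO(text), sep=";", header=0)
print(stud)
print(stud.shape)    # (2, 4)

# With the default separator (a comma) each line lands in ONE column.
wrong = pd.read_csv(io.StringIO(text))
print(wrong.shape)   # (2, 1)
```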

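We can explicitly specify column names using the parameter names while creating the DataFrame using the read_csv() function. Because header=0 is also given, the column names in the first line of the file are replaced by the new names. For example: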

import pandas as pd
m = pd.read_csv("data.csv", sep=",",  header=0, names=['Rno', 'S_Name', 'S_Class', 'Section'])
print(m)

OUTPUT:

   Rno  S_Name  S_Class  Section
0    1    Anil        X        A
1    2    Anuj       XI        B
2    3    Ravi      XII        B
3    4  Ananya       VI        A
4    5   Sumit       VI        C
5    6  Deepak     VIII        D
6    7   Parth        X        A
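
It helps to see what happens when names is given without header=0. In that case pandas assumes the file has no header line, so the old header is read as a row of data. A small sketch, using a shortened copy of the data kept in a string (an assumption made for illustration):

```python
import io
import pandas as pd

text = "Rollno,Name,Class,Sec\n1,Anil,X,A\n2,Anuj,XI,B\n"
new_names = ['Rno', 'S_Name', 'S_Class', 'Section']

# header=0 with names: the first line of the file is replaced by new_names.
m = pd.read_csv(io.StringIO(text), header=0, names=new_names)
print(m)
print(m.shape)   # (2, 4)

# names alone: the old header line becomes the first row of data.
n = pd.read_csv(io.StringIO(text), names=new_names)
print(n)
print(n.shape)   # (3, 4)
```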

Exporting a Dataframe to a CSV file

We can use the to_csv() function to save a DataFrame to a csv file. Suppose we have a DataFrame named “df_stud” that contains the following data:

    Ram  Anil  Simple  Anuj
R1   25    29      28    17
R2   21    25      23    20
R3   23    18      26    23
R4   20    18      30    15
R5   12    15      20     3
R6   23    12      16    30

We want to store the data of “df_stud” in a csv file named “data.csv”. For this we will write the following code:

# The r before the quotes keeps the backslashes of the Windows path as they are.
df_stud.to_csv(r'C:\Users\abc\Desktop\data.csv', sep=',')  # path will be according to your choice

The above code will create a file “data.csv” on the desktop. When we open this file in any text editor or a spreadsheet, we will find the above data along with the row labels and the column headers, separated by comma.

In case we do not want the column names to be saved to the file we may use the parameter header=False.
Another parameter index=False is used when we do not want the row labels to be written to the file on disk. For example:

df_stud.to_csv(r'C:\Users\abc\Desktop\data.csv', sep=',', header=False, index=False)
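
To see exactly what to_csv() writes, we can call it without a path; it then returns the csv text as a string. The sketch below uses a smaller df_stud and a file named df_stud_demo.csv in the temporary folder of the computer (both are assumptions made for illustration).

```python
import os
import tempfile
import pandas as pd

df_stud = pd.DataFrame([[25, 29], [21, 25]],
                       index=['R1', 'R2'], columns=['Ram', 'Anil'])

# Without a path, to_csv() returns the text instead of writing a file.
full_text = df_stud.to_csv()
print(full_text)      # row labels and column headers are included

# header=False and index=False leave only the values.
data_only = df_stud.to_csv(header=False, index=False)
print(data_only)

# Writing to a real file and reading it back.
path = os.path.join(tempfile.gettempdir(), "df_stud_demo.csv")
df_stud.to_csv(path, sep=",")
back = pd.read_csv(path, index_col=0)   # first column holds the row labels
print(back)
```

Here index_col=0 tells read_csv() to use the first column of the file as the row labels, so the DataFrame read back looks like the one that was saved.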

Difference between Pandas Series and NumPy Arrays

Pandas Series | NumPy Arrays
In a Series we can define our own labelled index to access the elements. The labels can be numbers or letters. | NumPy arrays are accessed by their integer position, using numbers only.
The elements can be indexed in descending order also. | The indexing starts with zero for the first element and the index is fixed.
If two Series are not aligned, NaN or missing values are generated. | There is no alignment on labels; operations work position by position, so no NaN values are generated in this way.
Series require more memory. | NumPy arrays occupy less memory.
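
The difference in alignment can be seen in a small sketch (the values and labels below are made up for illustration). Two Series are added label by label, while two NumPy arrays are added position by position.

```python
import numpy as np
import pandas as pd

s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([10, 20, 30], index=['b', 'c', 'd'])

# Labels 'a' and 'd' are present in only one Series, so they give NaN.
total = s1 + s2
print(total)

a1 = np.array([1, 2, 3])
a2 = np.array([10, 20, 30])

# Arrays have no labels: first + first, second + second, and so on.
print(a1 + a2)   # [11 22 33]
```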

SUMMARY

1. A DataFrame is a two-dimensional labeled data structure like a spreadsheet. It contains rows and columns and therefore has both a row and column index.

2. When using a dictionary to create a DataFrame, keys of the Dictionary become the column labels of the DataFrame. A DataFrame can be thought of as a dictionary of lists/ Series (all Series/columns sharing the same index label for a row).

3. Data can be loaded in a DataFrame from a file on the disk by using Pandas read_csv function.

4. Data in a DataFrame can be written to a text file on disk by using the pandas.DataFrame.to_csv() function.

5. DataFrame.T gives the transpose of a DataFrame. 

6. Pandas has a number of methods that support label based indexing, but every label asked for must be in the index, or a KeyError will be raised.

7. DataFrame.loc[ ] is used for label based indexing of rows in DataFrames.

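8. The pandas.DataFrame.append() method was used to add the rows of one DataFrame to another. It has been removed from pandas 2.0 onwards; in newer versions the pandas.concat() function is used instead.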

9. Pandas supports non-unique index values. Only if a particular operation that does not support duplicate index values is attempted, an exception is raised at that time.
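
Some of the points above can be checked with a short sketch. The small DataFrames and variable names here are assumptions made for illustration, and pd.concat() is used for point 8 because DataFrame.append() is not available in pandas 2.0 and later.

```python
import pandas as pd

# Point 2: keys of the dictionary become the column labels.
df = pd.DataFrame({'Ram': [25, 21], 'Anil': [29, 25]}, index=['R1', 'R2'])
print(list(df.columns))          # ['Ram', 'Anil']

# Point 7: loc[] selects by label.
print(df.loc['R1', 'Anil'])      # 29

# Point 6: a label that is not in the index raises KeyError.
key_missing = False
try:
    df.loc['R9']
except KeyError:
    key_missing = True
print(key_missing)               # True

# Point 8: joining the rows of two DataFrames with pd.concat().
more = pd.DataFrame({'Ram': [23], 'Anil': [18]}, index=['R1'])
both = pd.concat([df, more])
print(both)

# Point 9: the label 'R1' now appears twice, and that is allowed.
print(both.index.is_unique)      # False
print(both.loc['R1'])            # both rows labelled 'R1'

# reindex() does not support duplicate labels, so it raises an error.
reindex_failed = False
try:
    both.reindex(['R1', 'R2'])
except ValueError:
    reindex_failed = True
print(reindex_failed)            # True
```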



Disclaimer: I have tried to give you simple notes on “Dataframe in Python Pandas”. If you feel that there are mistakes in the code or explanation given above, you can contact me directly at csiplearninghub@gmail.com. The reference for these notes is the NCERT book.

