Data handling using Pandas Series Class 12 Notes
Data handling using Pandas Series
Introduction to Python Libraries
NumPy, Pandas and Matplotlib are three Python libraries that are widely used for scientific and analytical work. These libraries allow us to manipulate, transform and visualise data easily and efficiently.
NumPy, which stands for ‘Numerical Python’, is a library that can be used for numerical data analysis and scientific computing.
Pandas, whose name comes from ‘Panel Data’, is a high-level data manipulation tool used for analysing data. It is built on packages like NumPy and Matplotlib and gives us a single, convenient place to do most of our data analysis and visualisation work. Pandas has traditionally been described as having three important data structures (Panel has been removed from recent versions of Pandas, so only Series and DataFrame are in current use):
- Series
- DataFrame
- Panel
The Matplotlib library in Python is used for plotting graphs and visualisation. Using Matplotlib, we can generate publication quality plots, histograms, bar charts etc.
Differences between Pandas and NumPy
Pandas | NumPy |
It can create Series and DataFrame | It creates arrays |
A Pandas DataFrame can have different data types (float, int, string, datetime, etc.). | A NumPy array requires homogeneous data. |
Pandas is used when data is in tabular format | NumPy is used for numeric array based data |
Pandas is used for data analysis and visualizations. | NumPy is used for numerical calculations. |
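The difference in data types listed in the table can be seen in a small sketch. The variable names (arr, df) and the sample values are illustrative assumptions, not part of the original notes.

```python
import numpy as np
import pandas as pd

# A NumPy array holds a single data type: mixing int and float
# values converts every element to float.
arr = np.array([1, 2.5, 3])
print(arr.dtype)

# A DataFrame can hold a different data type in each column.
df = pd.DataFrame({
    "roll_no": [1, 2],          # integers
    "marks": [88.5, 92.0],      # floats
    "name": ["Asha", "Ravi"],   # strings
})
print(df.dtypes)
```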
Installing Pandas
The command used to install Pandas is given below:
pip install pandas
NOTE : Pandas can be installed only when Python is already installed on that system.
Data Structures in Pandas
A data structure is a collection of data values and the operations that can be applied to that data. Two commonly used data structures in Pandas are:
- Series
- DataFrame
Series:
A Series is a one-dimensional array containing a sequence of values of any data type (int, float, list, string, etc.). By default, a Series has numeric data labels starting from zero. The data label associated with a particular value is called its index. We can also use values of other data types as the index. An example of a Series containing names of cities is given below:
Index | Data |
0 | Delhi |
1 | Faridabad |
2 | Jaipur |
3 | Mumbai |
4 | Bangalore |
Here the index value is numeric.
An example of a Series containing names of fruits is given below:
Index | Data |
0 | Mango |
1 | Guava |
2 | Banana |
3 | Grapes |
4 | Watermelon |
Here the index value is numeric.
An example of a Series containing month names as index and number of days as data is given below:
Index | Data |
Jan | 31 |
Feb | 28 |
Mar | 31 |
April | 30 |
May | 31 |
Here the index value is a string.
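As a preview, the month table above could be built as a Series in the following way. This is a minimal sketch; the variable name days is an assumption, and the ways of creating a Series are covered in the sections that follow.

```python
import pandas as pd

# Data values are the number of days; the index holds the month names.
days = pd.Series([31, 28, 31, 30, 31],
                 index=["Jan", "Feb", "Mar", "April", "May"])
print(days)

# A value is looked up through its index label.
print(days["Feb"])
```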
How to create Series:
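A Series in Pandas is created using the Series( ) constructor. There are different ways in which a Series can be created.
1. Creation of empty Series
>>> import pandas as pd
>>> s1 = pd.Series()
>>> s1
Series([], dtype: float64)
NOTE : Older versions of Pandas report float64 for an empty Series, as shown here. Pandas 2.0 and later report dtype: object instead.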
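Because the data type of an empty Series depends on the Pandas version, one option is to state the dtype explicitly. The sketch below assumes the names s_float and s_obj purely for illustration.

```python
import pandas as pd

# Passing dtype makes the result the same on every Pandas version.
s_float = pd.Series(dtype="float64")
s_obj = pd.Series(dtype="object")

print(s_float.dtype, s_float.empty)
print(s_obj.dtype, s_obj.empty)
```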
2. Creation of Series using List
A Series can be created using a list, as shown in the example below:
>>> import pandas as pd    # import Pandas with alias pd
>>> s1 = pd.Series([2, 4, 8, 12, 14, 20])    # create a Series using a list
>>> print(s1)    # display the Series
OUTPUT
0     2
1     4
2     8
3    12
4    14
5    20
dtype: int64
Observe that output is shown in two columns – the index is on the left and the data value is on the right.
We can also assign user-defined labels to the index and use them to access elements of a Series. The following example has a numeric index in random order:
>>> series2 = pd.Series(["Raman", "Rosy", "Ram"], index=[1, 7, 9])
>>> print(series2)    # display the Series
OUTPUT
1    Raman
7     Rosy
9      Ram
dtype: object
Here, the data values Raman, Rosy and Ram have index values 1, 7 and 9 respectively.
We can also use letters or strings as indices, for example:
>>> import pandas as pd
>>> S2 = pd.Series([2, 3, 4], index=["Feb", "Mar", "Apr"])
>>> print(S2)    # display the Series
OUTPUT
Feb    2
Mar    3
Apr    4
dtype: int64
Here, the data values 2, 3 and 4 have index values Feb, Mar and Apr respectively.
3. Creation of Series using NumPy Arrays
We can create a series from a one-dimensional (1D) NumPy array, as shown below:
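>>> import numpy as np    # import NumPy with alias np
>>> import pandas as pd
>>> a1 = np.array([6, 4, 8, 9])
>>> s3 = pd.Series(a1)
>>> print(s3)
Output:
0    6
1    4
2    8
3    9
dtype: int64
NOTE : On some platforms (for example, Windows with older NumPy versions) the dtype is shown as int32 instead of int64.
We can also use letters or strings as indices, for example: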
>>> import numpy as np # import NumPy with alias np
>>> import pandas as pd
>>> a1 = np.array([6, 4, 8, 9])
>>> s3 = pd.Series(a1, index = ['a', 'b', 'c', 'd'])
>>> print(s3)
Output:
a 6
b 4
c 8
d 9
dtype: int64
When index labels are passed with the array, the length of the index and the length of the array must be the same; otherwise a ValueError is raised, as shown below:
>>> import numpy as np # import NumPy with alias np
>>> import pandas as pd
>>> a1 = np.array([6, 4, 8, 9])
>>> s3 = pd.Series(a1, index = ['a', 'b', 'c', 'd', 'e'])
>>> print(s3)
OUTPUT:
ValueError: Length of values (4) does not match length of index (5)
4. Creation of Series from Dictionary:
When a Series is created from a dictionary, the keys of the dictionary become the index of the Series, so there is no need to declare the index as a separate list. Let us do some practicals.
Practical 1: Pass the dictionary to the method Series()
import pandas as pd
S2 = pd.Series({2: "Feb", 3: "Mar", 4: "Apr"})
print(S2)    # display the Series
OUTPUT:
2    Feb
3    Mar
4    Apr
dtype: object
NOTE: In the above example, you can see that the keys of the dictionary become the index of the Series.
Practical 2: Store the dictionary in a variable and pass the variable to the method Series()
import pandas as pd
d = {"One": 1, "Two": 2, "Three": 3, "Four": 4}
S2 = pd.Series(d)
print(S2)    # display the Series
OUTPUT:
One      1
Two      2
Three    3
Four     4
dtype: int64
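Practical 3: Let's try to pass an index while creating a Series from a dictionary
import pandas as pd
d = {"One": 1, "Two": 2, "Three": 3, "Four": 4}
S2 = pd.Series(d, index=["A", "B", "C", "D"])
print(S2)
OUTPUT:
A   NaN
B   NaN
C   NaN
D   NaN
dtype: float64
Here the index labels A, B, C and D do not match any key of the dictionary, so every value is NaN (missing) and the dtype becomes float64.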
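The matching between the index labels and the dictionary keys can be seen more clearly when only some labels match. The following is a minimal sketch; the label "Five" and the variable name s are assumptions chosen for illustration.

```python
import pandas as pd

d = {"One": 1, "Two": 2, "Three": 3, "Four": 4}

# "Two" and "Four" are keys of d, so their values are picked up.
# "Five" is not a key of d, so its value is NaN.
s = pd.Series(d, index=["Two", "Four", "Five"])
print(s)
```

Only the labels listed in index appear in the result, in the order given, and the dtype becomes float64 because NaN is a floating-point value.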
5. Creation of Series using mathematical expressions:
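In the following examples, the expression [d]*2 is list repetition: it builds a list containing two copies of the list d. The number of items in the data must match the length of the index, so the first example fails and the second one works.
import pandas as pd
d = [12, 13, 14, 15]
S2 = pd.Series(data=[d]*2, index=d)
print(S2)    # display the Series
OUTPUT
ValueError: Length of values (2) does not match length of index (4)
import pandas as pd
d = [12, 13, 14, 15]
S2 = pd.Series(data=[d]*4, index=d)
print(S2)    # display the Series
OUTPUT
12    [12, 13, 14, 15]
13    [12, 13, 14, 15]
14    [12, 13, 14, 15]
15    [12, 13, 14, 15]
dtype: object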
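For comparison, the sketch below contrasts list repetition with element-wise arithmetic on a Series. The names repeated, s and doubled are assumptions introduced only for this illustration.

```python
import pandas as pd

d = [12, 13, 14, 15]

# On a Python list, * repeats the list.
repeated = d * 2
print(repeated)

# On a Series, * multiplies every element.
s = pd.Series(d, index=d)
doubled = s * 2
print(doubled)
```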
6. Creation of Series using String:
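The strings must be passed together as a single list. Writing pd.Series('a', 'b', 'c') does not work, because the second and third arguments are taken as the index and the dtype.
Practical 1:
import pandas as pd
S2 = pd.Series(['a', 'b', 'c'])
print(S2)    # display the Series
OUTPUT
0    a
1    b
2    c
dtype: object
Practical 2:
import pandas as pd
S2 = pd.Series(['anil', 'bhuvan', 'ravi'])
print(S2)    # display the Series
OUTPUT
0      anil
1    bhuvan
2      ravi
dtype: object
Practical 3:
import pandas as pd
S2 = pd.Series(['anil', 'bhuvan', 'ravi'], index=[1, 4, 7])
print(S2)    # display the Series
OUTPUT
1      anil
4    bhuvan
7      ravi
dtype: object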
How to modify the index value of the existing Series:
We can change the existing index values of a Series by assigning a new list to its index attribute.
import pandas as pd
seriesCapCntry = pd.Series(['NewDelhi', 'WashingtonDC', 'London', 'Paris'], index=['India', 'USA', 'UK', 'France'])
print(seriesCapCntry)
seriesCapCntry.index = [10, 20, 30, 40]    # this statement changes the index of the Series
print(seriesCapCntry)
OUTPUT
India         NewDelhi
USA       WashingtonDC
UK              London
France           Paris
dtype: object
10        NewDelhi
20    WashingtonDC
30          London
40           Paris
dtype: object
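The new index must have exactly as many labels as the Series has values. The sketch below shows what happens otherwise; the variable names caps and message are assumptions used for illustration.

```python
import pandas as pd

caps = pd.Series(["NewDelhi", "WashingtonDC", "London", "Paris"],
                 index=["India", "USA", "UK", "France"])

message = None
try:
    # Only three labels are given for four values.
    caps.index = [10, 20, 30]
except ValueError as err:
    message = str(err)
    print("ValueError:", message)

# The Series still has its four values.
print(len(caps))
```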
How to access elements of the Series:
There are two common ways for accessing the elements of a series: Indexing and Slicing.
1. Indexing :
Indexing in Series is used to access elements in a series. Indexes are of two types:
- Positional index
- Labelled index.
A positional index takes an integer value that corresponds to its position in the series, starting from 0, whereas a labelled index takes any user-defined label as index.
Let's do some practicals of accessing elements of a Series using a positional index.
Practical 1: Accessing a single element from the series.
import pandas as pd
d = [31, 15, 17, 20]
S2 = pd.Series(d)
print(S2[2])    # display the third value of the series using its index value
print(S2[0])    # display the first value of the series using its index value
OUTPUT:
17
31
Practical 2: What happens if we type the wrong Series name to access an element.
import pandas as pd
d = [31, 15, 17, 20]
S2 = pd.Series(d)
print(s2[2])    # wrong Series name (s2 instead of S2)
OUTPUT:
NameError: name 's2' is not defined
Practical 3: What happens if we give a wrong index to access an element.
import pandas as pd
d = [31, 15, 17, 20]
S2 = pd.Series(d)
print(S2[5])    # wrong index
OUTPUT:
KeyError: 5
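Practical 4: What happens if we give a negative index to access an element.
import pandas as pd
d = [31, 15, 17, 20]
S2 = pd.Series(d)
print(S2[-1])    # negative index
OUTPUT:
KeyError: -1
With the default numeric index, -1 is looked up as a label, and no such label exists. When the Series has a non-numeric index, an integer inside the square brackets is treated as a position, so the negative index works, as shown below:
import pandas as pd
d = [1, 2, 3]
S2 = pd.Series(d, index=["One", "Two", "Three"])
print(S2[-1])
OUTPUT:
3
NOTE : Recent versions of Pandas deprecate treating an integer as a position in this way. Writing S2.iloc[-1] gives the same result without relying on it.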
Practical 5: What happens if we give a negative index (enclosed in square brackets) to access an element.
import pandas as pd
d = [1, 2, 3]
S2 = pd.Series(d, index=["One", "Two", "Three"])
print(S2[[-1]])
OUTPUT:
Three    3
dtype: int64
Let's do some practicals of accessing elements of a Series using a labelled index.
In the following example, the value NewDelhi is displayed for the labelled index India.
import pandas as pd
seriesCapCntry = pd.Series(['NewDelhi', 'WashingtonDC', 'London', 'Paris'], index=['India', 'USA', 'UK', 'France'])
print(seriesCapCntry['India'])    # using labelled index
OUTPUT:
NewDelhi
We can also access an element of the series using the positional index:
import pandas as pd
seriesCapCntry = pd.Series(['NewDelhi', 'WashingtonDC', 'London', 'Paris'], index=['India', 'USA', 'UK', 'France'])
print(seriesCapCntry[1])
OUTPUT:
WashingtonDC
More than one element of a series can be accessed using a list of positional integers.
import pandas as pd
seriesCapCntry = pd.Series(['NewDelhi', 'WashingtonDC', 'London', 'Paris'], index=['India', 'USA', 'UK', 'France'])
print(seriesCapCntry[[1, 2]])
OUTPUT:
USA    WashingtonDC
UK           London
dtype: object
More than one element of a series can also be accessed using a list of index labels as shown in the following examples:
import pandas as pd
seriesCapCntry = pd.Series(['NewDelhi', 'WashingtonDC', 'London', 'Paris'], index=['India', 'USA', 'UK', 'France'])
print(seriesCapCntry[['India', 'UK']])    # accessing multiple elements using index labels
OUTPUT:
India    NewDelhi
UK         London
dtype: object
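Pandas also provides the .loc and .iloc accessors, which state explicitly whether a label or a position is meant. They are not used in the practicals above; the sketch below is an optional alternative, and the variable name caps is an assumption.

```python
import pandas as pd

caps = pd.Series(["NewDelhi", "WashingtonDC", "London", "Paris"],
                 index=["India", "USA", "UK", "France"])

# .loc always works with index labels.
print(caps.loc["India"])
print(caps.loc[["India", "UK"]])

# .iloc always works with integer positions (negative values count from the end).
print(caps.iloc[1])
print(caps.iloc[-1])
print(caps.iloc[[1, 2]])
```

Using .loc and .iloc avoids any doubt about whether an integer such as 1 or -1 is read as a label or as a position.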
2. Slicing :
Sometimes, we may need to extract a part of a series. This can be done through slicing, which is similar to slicing used with a list. When we use positional indices for slicing, the value at the end index position is excluded. For example:
import pandas as pd
seriesCapCntry = pd.Series(['NewDelhi', 'WashingtonDC', 'London', 'Paris'], index=['India', 'USA', 'UK', 'France'])
print(seriesCapCntry[0:2])    # here the values at index 0 and 1 will be extracted
OUTPUT:
India        NewDelhi
USA      WashingtonDC
dtype: object
Let us take another example:
import pandas as pd
seriesCapCntry = pd.Series(['NewDelhi', 'WashingtonDC', 'London', 'Paris'], index=['India', 'USA', 'UK', 'France'])
print(seriesCapCntry[0:1])    # here the value at index 0 will be extracted
OUTPUT:
India    NewDelhi
dtype: object
Negative positions with a negative step extract values starting from the end of the series:
import pandas as pd
seriesCapCntry = pd.Series(['NewDelhi', 'WashingtonDC', 'London', 'Paris'], index=['India', 'USA', 'UK', 'France'])
print(seriesCapCntry[-1 : -3 : -1])    # here the last two values will be extracted in reverse order
OUTPUT:
France     Paris
UK        London
dtype: object
We can also get the series in reverse order, for example:
import pandas as pd
seriesCapCntry = pd.Series(['NewDelhi', 'WashingtonDC', 'London', 'Paris'], index=['India', 'USA', 'UK', 'France'])
print(seriesCapCntry[: : -1])    # here the series will be extracted in reverse order
OUTPUT:
France           Paris
UK              London
USA       WashingtonDC
India         NewDelhi
dtype: object
If labelled indexes are used for slicing, then value at the end index label is also included in the output, for example:
Practical 1:
import pandas as pd
S2 = pd.Series([1, 2, 3, 4, 5], index=["One", "Two", "Three", "Four", "Five"])
print(S2["Two" : "Five"])
OUTPUT:
Two      2
Three    3
Four     4
Five     5
dtype: int64
Practical 2:
import pandas as pd
S2 = pd.Series([1, 2, 3, 4, 5], index=["One", "Two", "Three", "Four", "Five"])
print(S2["One" : "Three"])
OUTPUT:
One      1
Two      2
Three    3
dtype: int64
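The two slicing rules (end position excluded, end label included) can be placed side by side. This is a minimal sketch using the explicit .iloc and .loc accessors, which are an addition to the notes; the names by_position and by_label are assumptions.

```python
import pandas as pd

S2 = pd.Series([1, 2, 3, 4, 5],
               index=["One", "Two", "Three", "Four", "Five"])

# Positional slice: starts at position 1 and stops before position 3.
by_position = S2.iloc[1:3]
print(by_position)

# Label slice: starts at "Two" and includes the end label "Four".
by_label = S2.loc["Two":"Four"]
print(by_label)
```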
How to modify the elements of the Series:
We can modify the values of series elements by assigning the value to the keys of the series as shown in the following example:
Example 1:
import pandas as pd
S2 = pd.Series([1, 2, 3, 4, 5], index=["One", "Two", "Three", "Four", "Five"])
S2["Two"] = 22
print(S2)
OUTPUT:
One       1
Two      22
Three     3
Four      4
Five      5
dtype: int64
Example 2:
import pandas as pd
S2 = pd.Series([1, 2, 3, 4, 5], index=["One", "Two", "Three", "Four", "Five"])
S2[["Two", "Three"]] = 22    # the labels are passed as a list inside the square brackets
print(S2)
OUTPUT:
One       1
Two      22
Three    22
Four      4
Five      5
dtype: int64
Example 3:
import pandas as pd
S2 = pd.Series([1, 2, 3, 4, 5], index=["One", "Two", "Three", "Four", "Five"])
S2[1 : 4] = 22    # we can use slicing to modify the values
print(S2)
OUTPUT:
One       1
Two      22
Three    22
Four     22
Five      5
dtype: int64
Observe that updating the values in a series using positional slicing also excludes the value at the end index position. However, when slicing is done using labels, the value at the end index label is changed as well.
Example 4:
import pandas as pd
S2 = pd.Series([1, 2, 3, 4, 5], index=["One", "Two", "Three", "Four", "Five"])
S2["One" : "Four"] = 22
print(S2)
OUTPUT:
One      22
Two      22
Three    22
Four     22
Five      5
dtype: int64
Practice Exercise :
Q1. Consider the following Series and write the output of the following:
import pandas as pd
S = pd.Series([1, 2, 3, 4, 5], index=["One", "Two", "Three", "Four", "Five"])
a. print(S["Two"])
b. print(S[2])
c. print(S[0])
d. print(S[0 : 2])
e. print(S["Two" : "Three"])
f. print(S[: : -1])
g. print(S[1 : 4])
h. print(S[["Two", "Four"]])
i. print(S["Two" : "Four"])
j. print(S[-1])
SOLUTIONS
a. 2
b. 3
c. 1
d.
One    1
Two    2
dtype: int64
e.
Two      2
Three    3
dtype: int64
f.
Five     5
Four     4
Three    3
Two      2
One      1
dtype: int64
g.
Two      2
Three    3
Four     4
dtype: int64
h.
Two     2
Four    4
dtype: int64
i.
Two      2
Three    3
Four     4
dtype: int64
j. 5
Attributes of Series:
We can access various properties of a series by using its attributes with the series name. Syntax of using attribute is given below
<Series Name>.<Attribute Name>
A few attributes of a Pandas Series are given in the following table:
Attribute Name | Purpose |
name | Assigns or returns the name of the Series. |
index.name | Assigns or returns the name of the index of the Series. |
values | Returns all the values of the Series as a NumPy array. |
size | Returns the number of values in the Series. |
empty | Returns True if the Series is empty, and False otherwise. |
index | Returns the index of the Series. |
hasnans | Returns True if the Series has any NaN values. |
Practice Exercise of Series Attributes
Example 1: Demonstration of “name” attribute
import pandas as pd
S = pd.Series([1, 2, 3, 4, 5], index=["One", "Two", "Three", "Four", "Five"])
print(S)
print("------------------------------------------")
S.name = "Sample"
print(S)
OUTPUT
One      1
Two      2
Three    3
Four     4
Five     5
dtype: int64
------------------------------------------
One      1
Two      2
Three    3
Four     4
Five     5
Name: Sample, dtype: int64
Example 2: Demonstration of “index.name” attribute
import pandas as pd
S = pd.Series([1, 2, 3, 4, 5], index=["One", "Two", "Three", "Four", "Five"])
print(S)
print("------------------------------------------")
S.index.name = "Number"
print(S)
OUTPUT:
One      1
Two      2
Three    3
Four     4
Five     5
dtype: int64
------------------------------------------
Number
One      1
Two      2
Three    3
Four     4
Five     5
dtype: int64
Example 3: Demonstration of “values”, “size” and “empty” attribute
import pandas as pd
S = pd.Series([1, 2, 3, 4, 5], index=["One", "Two", "Three", "Four", "Five"])
print(S)
print("------------------------------------------")
print(S.values)
print("------------------------------------------")
print(S.size)
print("------------------------------------------")
print(S.empty)
OUTPUT:
One      1
Two      2
Three    3
Four     4
Five     5
dtype: int64
------------------------------------------
[1 2 3 4 5]
------------------------------------------
5
------------------------------------------
False
Example 4: Demonstration of “index” and “hasnans” attribute
import pandas as pd
S = pd.Series([1, 2, 3, 4, 5], index=["One", "Two", "Three", "Four", "Five"])
print(S)
print("------------------------------------------")
print(S.index)
print("------------------------------------------")
print(S.hasnans)
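The hasnans attribute is easier to understand on a Series that actually contains a missing value. The sketch below assumes a small three-element Series with one NaN, which is not part of the original example.

```python
import numpy as np
import pandas as pd

complete = pd.Series([1, 2, 3], index=["One", "Two", "Three"])
with_gap = pd.Series([1, np.nan, 3], index=["One", "Two", "Three"])

# index returns the labels; here they are converted to a plain list.
print(list(with_gap.index))

# hasnans is False when every value is present and True otherwise.
print(complete.hasnans)
print(with_gap.hasnans)
```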
Methods of Series:
In this section, we are going to discuss methods available for Pandas Series. Let us consider the following Series.
import pandas as pd
S = pd.Series([1, 2, 3, 4, 5, 6, 7], index=["One", "Two", "Three", "Four", "Five", "Six", "Seven"])
print(S)
1. head(n): This method returns the first n members of the series. If the value for n is not passed, then by default the first five members are returned. For example:
import pandas as pd
S = pd.Series([1, 2, 3, 4, 5, 6, 7], index=["One", "Two", "Three", "Four", "Five", "Six", "Seven"])
print(S.head(3))    # display first three members of the Series
print("------------------------------------------------")
print(S.head())    # display first five members of the Series as no argument is passed
OUTPUT:
One      1
Two      2
Three    3
dtype: int64
------------------------------------------------
One      1
Two      2
Three    3
Four     4
Five     5
dtype: int64
2. tail(n): This method returns the last n members of the series. If the value for n is not passed, then by default the last five members are returned. For example:
import pandas as pd
S = pd.Series([1, 2, 3, 4, 5, 6, 7], index=["One", "Two", "Three", "Four", "Five", "Six", "Seven"])
print(S.tail(4))    # display last four members of the Series
print("------------------------------------------------")
print(S.tail())    # display last five members of the Series as no argument is passed
OUTPUT:
Four     4
Five     5
Six      6
Seven    7
dtype: int64
------------------------------------------------
Three    3
Four     4
Five     5
Six      6
Seven    7
dtype: int64
3. count( ): This method returns the number of non-NaN values in the Series. For example:
import pandas as pd
S = pd.Series([1, 2, 3, 4, 5, 6, 7], index=["One", "Two", "Three", "Four", "Five", "Six", "Seven"])
print(S.count())
OUTPUT:
7
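The difference between count( ) and the size attribute only shows up when the Series has missing values. The following sketch assumes a Series with two NaN entries, chosen for illustration.

```python
import numpy as np
import pandas as pd

S = pd.Series([1, 2, np.nan, 4, np.nan])

# size counts every element, including NaN.
print(S.size)

# count() skips the NaN values.
print(S.count())
```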
Accessing values of Series using conditions:
We can display particular values of a Series using conditions, for example:
import pandas as pd
S = pd.Series([1, 2, 3, 4, 5, 6, 7], index=["One", "Two", "Three", "Four", "Five", "Six", "Seven"])
print([S>5])    # shows True for those values of the Series which satisfy the condition
print("----------------------------------------")
print(S[S>5])    # returns those values of the Series which satisfy the condition
OUTPUT:
[One      False
Two      False
Three    False
Four     False
Five     False
Six       True
Seven     True
dtype: bool]
----------------------------------------
Six      6
Seven    7
dtype: int64
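Conditions can also be combined. The sketch below is an extension of the example above and assumes the names middle and ends; note that each condition needs its own parentheses, and that & (and) and | (or) are used instead of the Python keywords.

```python
import pandas as pd

S = pd.Series([1, 2, 3, 4, 5, 6, 7],
              index=["One", "Two", "Three", "Four", "Five", "Six", "Seven"])

# Values greater than 2 AND less than 6.
middle = S[(S > 2) & (S < 6)]
print(middle)

# Values less than 2 OR greater than 6.
ends = S[(S < 2) | (S > 6)]
print(ends)
```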
Deleting elements from Series :
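We can delete elements from a Series using the drop( ) method. To delete a particular element, we have to pass the index of the element to be deleted. By default, drop( ) returns a new Series and leaves the original unchanged; with inplace = True, the original Series itself is modified.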
Syntax of drop( ) method:
<Series name>.drop(index, inplace = True/False)
Example 1: To delete a particular element from Series
import pandas as pd
S = pd.Series([1, 2, 3, 4, 5, 6, 7], index=["One", "Two", "Three", "Four", "Five", "Six", "Seven"])
print(S)
print("-------------------------------------------")
print(S.drop("Four"))    # this statement will delete only one element
OUTPUT:
One      1
Two      2
Three    3
Four     4
Five     5
Six      6
Seven    7
dtype: int64
-------------------------------------------
One      1
Two      2
Three    3
Five     5
Six      6
Seven    7
dtype: int64
Example 2: To delete more than one element from Series.
import pandas as pd
S = pd.Series([1, 2, 3, 4, 5, 6, 7], index=["One", "Two", "Three", "Four", "Five", "Six", "Seven"])
print(S)
print("-------------------------------------------")
print(S.drop(["Four", "Five"]))    # this statement will delete two elements
OUTPUT:
One      1
Two      2
Three    3
Four     4
Five     5
Six      6
Seven    7
dtype: int64
-------------------------------------------
One      1
Two      2
Three    3
Six      6
Seven    7
dtype: int64
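The effect of the inplace argument from the syntax above can be checked with a short sketch. The variable names smaller and result are assumptions introduced for this illustration.

```python
import pandas as pd

S = pd.Series([1, 2, 3, 4, 5, 6, 7],
              index=["One", "Two", "Three", "Four", "Five", "Six", "Seven"])

# Without inplace, drop() returns a new Series and S keeps all seven values.
smaller = S.drop("Four")
print(len(S), len(smaller))

# With inplace=True, S itself is changed and drop() returns None.
result = S.drop(["Four", "Five"], inplace=True)
print(result)
print(len(S))
```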
Disclaimer : I have tried to give you simple notes on "Data handling using Pandas Series". If you feel that there are mistakes in the code or explanation given above, you can contact me directly at csiplearninghub@gmail.com. The reference for these notes is the NCERT book.