Python Pandas Advanced Data Exploration ##Feed 10

 



We have already learnt the basics of using pandas for data exploration.

Let's move on to more topics in data exploration.
Here we will use an Excel file containing student marks for our data exploration.

Our first step is to import all the libraries required for the analysis.
>> import numpy as np
>> import pandas as pd

The second step is to import the data from the Excel sheet into Python.

>> Data= pd.read_excel(r'F:/Python/Student_Marks.xlsx')

The third step is to check the data. The first 5 rows of data can be displayed using the head() function.

>> Data.head()
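The Student_Marks.xlsx file is not available here, so the sketch below builds a small hypothetical marks table in memory (the names and marks are made up, not the author's data) to show what head() returns: the first 5 rows by default, or the first n rows when n is passed.

```python
import pandas as pd

# Hypothetical stand-in for Student_Marks.xlsx (made-up names and marks)
marks = pd.DataFrame({
    "Name": ["Sandy", "Hari", "Rani", "Sita", "rosy", "Devi"],
    "Maths": [78, 65, 90, 55, 82, 70],
    "Science": [81, 72, 85, 60, 79, 75],
})

first_five = marks.head()   # default n=5
first_two = marks.head(2)   # only the first 2 rows

print(first_five)
print(first_two)
```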



Data Exploration:

1. Displaying 5 random rows of data

Syntax:

dataframevariable.sample(5)

Here the data frame variable is Data. We display 5 randomly chosen rows from our data frame.

>> Data.sample(5)
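Because sample() picks different rows on every call, it can help to fix the random seed while experimenting. A minimal sketch on a hypothetical made-up marks table: random_state makes the selection repeatable, and frac picks a fraction of the rows instead of a fixed count.

```python
import pandas as pd

# Hypothetical marks table (made-up values)
marks = pd.DataFrame({
    "Name": ["Sandy", "Hari", "Rani", "Sita", "rosy", "Devi"],
    "Maths": [78, 65, 90, 55, 82, 70],
})

# random_state fixes the seed, so the same 5 rows come back on every run
random_rows = marks.sample(5, random_state=42)
print(random_rows)

# frac=0.5 selects half of the rows (3 of 6 here)
half = marks.sample(frac=0.5, random_state=1)
print(half)
```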




If you have not gone through our previous post on the basic understanding of data and statistics, please go through it first.

2. Adding a new column to a data frame

Syntax:

dataframevariable['New column name'] = values to be assigned

From now onwards, dataframevariable is denoted by df. In the example we are discussing, the data frame variable is Data.

Ex: Create a new column named Roll No.

>>Data['Roll No']=np.arange(701,721,1)

>>Data

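We can see a new column called 'Roll No' at the end of the data frame. The roll numbers start at 701 and end at 720, because np.arange() excludes the stop value 721. That gives 20 values, one for each of the 20 students in the marks list; the number of values must match the number of rows.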
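A minimal sketch on a hypothetical 4-row table (made-up data): deriving the stop value of np.arange() from len() keeps the number of roll numbers equal to the number of rows, and an array of the wrong length is rejected with a ValueError.

```python
import numpy as np
import pandas as pd

# Hypothetical marks table (made-up values)
marks = pd.DataFrame({
    "Name": ["Sandy", "Hari", "Rani", "Sita"],
    "Maths": [78, 65, 90, 55],
})

# np.arange excludes the stop value, so 701 + len(marks) gives one number per row
marks["Roll No"] = np.arange(701, 701 + len(marks))
print(marks)

# 9 values for 4 rows: pandas refuses the assignment
raised = False
try:
    marks["Bad"] = np.arange(701, 710)
except ValueError as err:
    raised = True
    print("ValueError:", err)
```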

3. Set an Index

Set an existing column as the index of the data frame:

Syntax:

df.set_index('column name', inplace=True)

Ex: Set the Roll No. column as an index of the data frame.

>>Data.set_index('Roll No', inplace=True)

>>Data


Now the Roll No. column is set as the index. If inplace=True is not passed, set_index() does not modify Data; it returns a new data frame with Roll No. as the index, and that result is lost unless it is assigned to a variable.
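The difference between the two forms can be seen on a small hypothetical table (made-up data). This sketch shows that set_index() without inplace=True leaves the original untouched, and that assigning the result back is an equivalent way to make the change stick.

```python
import pandas as pd

# Hypothetical marks table (made-up values)
marks = pd.DataFrame({
    "Roll No": [701, 702, 703],
    "Name": ["Sandy", "Hari", "Rani"],
    "Maths": [78, 65, 90],
})

# Without inplace=True, a new data frame is returned; marks is unchanged
indexed = marks.set_index("Roll No")
print(indexed.index.tolist())   # [701, 702, 703]
print(marks.columns.tolist())   # ['Roll No', 'Name', 'Maths']

# Assigning the result back has the same effect as inplace=True
marks = marks.set_index("Roll No")
print(marks.loc[702, "Name"])   # Hari
```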

4. Reset an index

Syntax: 

df.reset_index(inplace=True)

Ex: Reset the index of the data frame (remove Roll No. as the index of the data frame).

>>Data.reset_index(inplace=True)

Now the index values are from 0 to 19 and Roll No. is just a column in the data frame.
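A minimal sketch on a hypothetical table indexed by Roll No (made-up data): reset_index() moves the old index back in as the first column, while the drop=True option discards it instead.

```python
import pandas as pd

# Hypothetical marks table with Roll No as the index (made-up values)
marks = pd.DataFrame(
    {"Name": ["Sandy", "Hari", "Rani"], "Maths": [78, 65, 90]},
    index=pd.Index([701, 702, 703], name="Roll No"),
)

# The old index becomes the first column; a new 0-based index is created
restored = marks.reset_index()
print(restored.columns.tolist())   # ['Roll No', 'Name', 'Maths']
print(restored.index.tolist())     # [0, 1, 2]

# drop=True throws the old index away instead of keeping it as a column
discarded = marks.reset_index(drop=True)
print(discarded.columns.tolist())  # ['Name', 'Maths']
```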


5. Remove a column from dataframe

Syntax:

df.drop('column name', axis=1, inplace=True)

Ex: Remove English marks of all the students from the marks list.

>>Data.drop('English', axis=1, inplace=True)



Here axis=1 refers to the columns of the data frame. Observe that there is no longer an English column. If inplace=True is not provided, drop() returns a new data frame and the original data frame is not updated.
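pandas also accepts a columns= keyword in drop() as an alternative to axis=1. A minimal sketch on a hypothetical table (made-up data) comparing it with the in-place form used above.

```python
import pandas as pd

# Hypothetical marks table (made-up values)
marks = pd.DataFrame({
    "Name": ["Sandy", "Hari"],
    "Maths": [78, 65],
    "English": [70, 80],
})

# columns= names what to drop without needing axis=1; a new frame is returned
without_english = marks.drop(columns="English")
print(without_english.columns.tolist())  # ['Name', 'Maths']
print(marks.columns.tolist())            # ['Name', 'Maths', 'English'] (unchanged)

# The axis=1 form from the post, applied in place
marks.drop("English", axis=1, inplace=True)
print(marks.columns.tolist())            # ['Name', 'Maths']
```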

6. Accessing rows from a data frame

Rows of the data frame can be accessed using an index.

Let's check the index of the dataframe.

>> Data.index

Output:

RangeIndex(start=0, stop=20, step=1)

Since we have not specified any index, pandas uses a default RangeIndex whose values start from 0.

a) Accessing a row using index position from a data frame in python. 

Syntax:

df.iloc[index position]

Ex: Access the row at index position 2.

>>Data.iloc[2]


b) Accessing multiple continuous rows from a data frame in python

Syntax:

df.iloc[start_index position : end_index position]

Ex: Access the rows at index positions 2 to 7. The end position 8 is not included in the result.
>> Data.iloc[2:8]
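A minimal sketch on a hypothetical 10-row table with the default RangeIndex (made-up marks): a single position returns the row as a Series, and a position slice stops before the end position.

```python
import pandas as pd

# Hypothetical table with a default RangeIndex 0..9 (made-up marks)
marks = pd.DataFrame({"Maths": list(range(50, 60))})
print(marks.index)          # RangeIndex(start=0, stop=10, step=1)

row = marks.iloc[2]         # one row, returned as a Series
print(row["Maths"])         # 52

rows = marks.iloc[2:8]      # positions 2..7; position 8 is excluded
print(rows.index.tolist())  # [2, 3, 4, 5, 6, 7]
```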

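c) Accessing a row from a data frame using index value.

Selecting by a label such as 'Sandy' works only when the student names are the index of the data frame, for example after setting the column that holds the names as the index with set_index().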

Syntax: 

df.loc[index value]

Ex: Display all subject marks of Sandy from the Marks list dataframe.

>>Data.loc['Sandy']


    

d) Accessing multiple continuous rows from a data frame in python using index.

Syntax:

df.loc[start_index value : stop_index value]

Unlike iloc, loc includes the stop label in the result.

Ex: Display all subject marks of Sandy, Hari, Rani and Sita from the marks list data frame.

>> Data.loc['Sandy': 'Sita']
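A minimal sketch, assuming the student names are the index (the marks are made up): a single label returns that student's row as a Series, and a label slice includes both the start and the stop label.

```python
import pandas as pd

# Hypothetical marks with student names as the index (made-up values)
marks = pd.DataFrame(
    {"Maths": [78, 65, 90, 55, 82], "Science": [81, 72, 85, 60, 79]},
    index=["Sandy", "Hari", "Rani", "Sita", "rosy"],
)

print(marks.loc["Sandy"])    # all marks of Sandy, as a Series

# Label slices include BOTH endpoints, unlike iloc position slices
block = marks.loc["Sandy":"Sita"]
print(block.index.tolist())  # ['Sandy', 'Hari', 'Rani', 'Sita']
```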



e) Accessing multiple rows which are not continuous from a data frame in python.

Syntax:

df.loc[[row index value1, row index value2, row index value3, ...]]

Ex: Get the marks of Sandy, Hari, Sita and rosy.

>>Data.loc[['Sandy','Hari','Sita','rosy']]
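Index labels are matched exactly, including case, which matters here because some names in the marks list are lowercase. A minimal sketch on a hypothetical name-indexed table (made-up marks): a list of labels picks rows in the given order, and a list containing a label that does not exist raises a KeyError.

```python
import pandas as pd

# Hypothetical marks with student names as the index (made-up values)
marks = pd.DataFrame(
    {"Maths": [78, 65, 90, 55, 82]},
    index=["Sandy", "Hari", "Rani", "Sita", "rosy"],
)

picked = marks.loc[["Sandy", "Hari", "Sita", "rosy"]]
print(picked.index.tolist())  # ['Sandy', 'Hari', 'Sita', 'rosy']

# 'Rosy' is not the same label as 'rosy', so this raises KeyError
missing_raised = False
try:
    marks.loc[["Sandy", "Rosy"]]
except KeyError as err:
    missing_raised = True
    print("KeyError:", err)
```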



7. Adding a new row to the data frame

Syntax:

df.loc['new_row_index value'] = values to be assigned

Ex: Add a new student with Roll No. 721 and Maths, Science, Social and English marks of 85, 96, 74 and 86.

>>Data.loc['New student']=[721,85,96,74,86]
>>Data


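A new row with the index label 'New student' is created in the data frame. The list must contain exactly one value per column, in the same order as the columns; otherwise pandas raises a ValueError. Whenever a new row or a new column is added this way, it is placed at the end of the rows or columns respectively.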
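A minimal sketch on a hypothetical name-indexed table with three columns (made-up data): the new row is appended at the bottom when the list length matches the columns, and rejected when it does not.

```python
import pandas as pd

# Hypothetical marks table with names as the index (made-up values)
marks = pd.DataFrame(
    {"Roll No": [701, 702], "Maths": [78, 65], "Science": [81, 72]},
    index=["Sandy", "Hari"],
)

# One value per column, in column order: Roll No, Maths, Science
marks.loc["New student"] = [721, 85, 96]
print(marks.index.tolist())  # ['Sandy', 'Hari', 'New student']

# Two values for three columns: pandas raises ValueError
raised = False
try:
    marks.loc["Another"] = [722, 90]
except ValueError as err:
    raised = True
    print("ValueError:", err)
```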

8. Remove a single row from the data frame

Syntax:

df.drop(row_index value, axis=0, inplace=True)

Ex: Remove the row of New student from the marks list.
>>Data.drop('New student', axis=0, inplace=True)
>>Data
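As with columns, drop() also accepts an index= keyword as an alternative to axis=0. A minimal sketch on a hypothetical table (made-up data), returning a new frame rather than modifying in place.

```python
import pandas as pd

# Hypothetical marks table with names as the index (made-up values)
marks = pd.DataFrame({"Maths": [78, 65, 85]}, index=["Sandy", "Hari", "New student"])

# index= names the row to remove without needing axis=0
trimmed = marks.drop(index="New student")
print(trimmed.index.tolist())  # ['Sandy', 'Hari']
print(marks.index.tolist())    # ['Sandy', 'Hari', 'New student'] (unchanged)
```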

9. Remove multiple rows from the data frame

Syntax:

df.drop([row_index value1, row_index value2, ...], axis=0, inplace=True)

Ex: Remove the marks of srinu, pallavi and rosy from the data frame.

>> Data.drop(['srinu','pallavi','rosy'], axis=0, inplace=True)

>> Data
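By default, drop() raises a KeyError if any label in the list is not in the index, for example when a row was already removed earlier. A minimal sketch on a hypothetical table (made-up data) showing the errors="ignore" option, which skips missing labels instead.

```python
import pandas as pd

# Hypothetical marks table with names as the index (made-up values)
marks = pd.DataFrame(
    {"Maths": [78, 65, 90, 55, 82]},
    index=["Sandy", "srinu", "pallavi", "rosy", "Hari"],
)

remaining = marks.drop(["srinu", "pallavi", "rosy"], axis=0)
print(remaining.index.tolist())  # ['Sandy', 'Hari']

# 'not_a_student' is not in the index; errors="ignore" skips it
safe = marks.drop(["srinu", "not_a_student"], axis=0, errors="ignore")
print(safe.index.tolist())       # ['Sandy', 'pallavi', 'rosy', 'Hari']
```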


10. Accessing a particular value from the data frame in python. 

a) Using row_index value and column name:

df.loc[row, column]

b) Using index position of rows and columns

df.iloc[row_index, column_index]

Ex: Marks obtained by Devi in Science.
>> Data.loc['Devi','Science']
Output:
75
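The label form and the position form select the same cell once the positions are known. A minimal sketch on a hypothetical table (made-up marks) that converts labels to positions with get_loc() and also shows .at and .iat, the scalar-only counterparts of .loc and .iloc.

```python
import pandas as pd

# Hypothetical marks table with names as the index (made-up values)
marks = pd.DataFrame(
    {"Maths": [78, 64], "Science": [81, 75]},
    index=["Sandy", "Devi"],
)

by_label = marks.loc["Devi", "Science"]      # row label, column name
row_pos = marks.index.get_loc("Devi")        # 1
col_pos = marks.columns.get_loc("Science")   # 1
by_position = marks.iloc[row_pos, col_pos]   # same cell, by position

# .at and .iat access a single value by label and by position
print(by_label, by_position, marks.at["Devi", "Science"], marks.iat[1, 1])
```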

11. Accessing multiple values from the data frame in python. 

a) Using row_index value and column name:

df.loc[[row1, row2, row3, ...], [column1, column2, column3, ...]]

Ex: Get the Science and English marks of Devi and vidya.
>> Data.loc[['Devi', 'vidya'],['Science', 'English']]
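The same block of cells can be selected by position with iloc and lists of positions. A minimal sketch on a hypothetical table (made-up marks) confirming that both selections match.

```python
import pandas as pd

# Hypothetical marks table with names as the index (made-up values)
marks = pd.DataFrame(
    {"Maths": [78, 64, 88], "Science": [81, 75, 90], "English": [70, 68, 84]},
    index=["Sandy", "Devi", "vidya"],
)

subset = marks.loc[["Devi", "vidya"], ["Science", "English"]]
print(subset)

# Same cells by position: rows 1 and 2, columns 1 and 2
same = marks.iloc[[1, 2], [1, 2]]
print(subset.equals(same))  # True
```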

12. Accessing only a few rows of a particular column

Syntax:

df['column name'][start_row_index : end_row_index]

Ex: Get the Science marks of the first 5 students in the data frame.
>>Data['Science'][0:5]

Ex: Get the Science marks of the last 5 students, selecting them by student name.
>>Data['Science']['chitra': 'priya']
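Writing .iloc or .loc explicitly makes it clear whether positions or labels are meant, which the plain [] form leaves implicit. A minimal sketch on a hypothetical name-indexed table (made-up marks); tail(5) is shown as another way to take the last 5 rows without knowing the names.

```python
import pandas as pd

# Hypothetical marks with student names as the index (made-up values)
marks = pd.DataFrame(
    {"Science": [81, 72, 85, 60, 79, 75, 91]},
    index=["Sandy", "Hari", "Rani", "Sita", "chitra", "Devi", "priya"],
)

first_five = marks["Science"].iloc[0:5]           # by position, end excluded
by_name = marks.loc["chitra":"priya", "Science"]  # by label, both ends included
last_five = marks["Science"].tail(5)              # last 5 rows by position

print(first_five.index.tolist())  # ['Sandy', 'Hari', 'Rani', 'Sita', 'chitra']
print(by_name.index.tolist())     # ['chitra', 'Devi', 'priya']
print(last_five.index.tolist())   # ['Rani', 'Sita', 'chitra', 'Devi', 'priya']
```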



Happy learning...
😊😊

