We've already learnt the basics of using pandas for data exploration.
Here we will use an Excel file containing student marks for our data exploration.
Our first step is to import all the libraries required for the analysis.
>> import pandas as pd
>> import numpy as np
The second step is to import the data from the Excel sheet into Python.
>> Data = pd.read_excel(r'F:/Python/Student_Marks.xlsx')
The third step is to check the data. The first five rows can be displayed using the head() function.
>> Data.head()
Output:
1. Displaying any 5 rows of data
Syntax:
dataframevariable.sample(5)
Here the data frame variable is Data. We are displaying 5 randomly chosen rows from our data frame.
>> Data.sample(5)
Output:
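Since the marks workbook itself is not reproduced here, the sketch below uses a small stand-in data frame to show how head() and sample() differ. The frame name marks, the column names and all the marks are made up for illustration. The random_state argument is optional and only makes the random pick repeatable.

```python
import pandas as pd

# Hypothetical stand-in for the student marks sheet; all marks are invented.
marks = pd.DataFrame({
    "Name": ["Sandy", "Hari", "Rani", "Sita", "rosy", "srinu", "pallavi"],
    "Maths": [78, 65, 92, 54, 88, 71, 69],
    "Science": [82, 70, 95, 60, 79, 68, 74],
})

first_rows = marks.head()                      # first 5 rows, in original order
random_rows = marks.sample(5)                  # 5 rows picked at random, no repeats
repeatable = marks.sample(5, random_state=0)   # same 5 rows on every run

print(first_rows)
print(random_rows)
```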
2. Adding a new column to data frame
Syntax:
dataframevariable['New column name']= Values need to be assigned
From now on, dataframevariable is denoted by df. In the example we are discussing, the data frame variable is Data.
Ex: Create a new column with name Roll No.
>>Data['Roll No']=np.arange(701,721,1)
>>Data
We can see a new column called 'Roll No' at the end of the data frame. The Roll No. starts at 701 and ends at 720 as there were 20 student names in the marks list.
3. Set an Index
Set an existing column as an index for the data frame:
Syntax:
df.set_index('column name', inplace=True)
Ex: Set the Roll No. column as an index of the data frame.
>>Data.set_index('Roll No', inplace=True)
>>Data
Now the Roll No. column is set as the index. If inplace=True is not given inside the brackets of set_index(), the call returns a new data frame with Roll No. as the index and leaves Data itself unchanged.
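The effect of inplace=True can be checked on a small example. This is only a sketch: the three-student frame and its marks are hypothetical, and df stands in for Data.

```python
import numpy as np
import pandas as pd

# Hypothetical three-student frame; marks are invented.
df = pd.DataFrame({"Name": ["Sandy", "Hari", "Rani"], "Maths": [85, 72, 90]})
df["Roll No"] = np.arange(701, 704)      # adds 701, 702, 703 as a new last column

indexed = df.set_index("Roll No")        # returns a new data frame
still_a_column = "Roll No" in df.columns # True: df itself was not changed

df.set_index("Roll No", inplace=True)    # changes df directly and returns None
print(df)
```

If you prefer not to use inplace=True, assigning the result back (df = df.set_index('Roll No')) has the same effect.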
4. Reset an index
Syntax:
df.reset_index(inplace=True)
Ex: Reset the index of the data frame (remove Roll No. as the index of the data frame).
>>Data.reset_index(inplace=True)
Now the index values are from 0 to 19 and Roll No. is just a column in the data frame.
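As a small sketch of what reset_index() does with the old index, the example below uses a hypothetical frame indexed by Roll No. The drop=True option is not used in this article, but it is a standard argument of reset_index() and is shown for comparison.

```python
import pandas as pd

# Hypothetical frame indexed by roll number; marks are invented.
df = pd.DataFrame(
    {"Maths": [85, 72, 90]},
    index=pd.Index([701, 702, 703], name="Roll No"),
)

restored = df.reset_index()            # 'Roll No' becomes the first column again
discarded = df.reset_index(drop=True)  # old index is thrown away instead

print(restored)
print(discarded)
```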
5. Remove a column from dataframe
Syntax:
df.drop('column name', axis=1, inplace=True)
Ex: Remove English marks of all the students from the marks list.
>>Data.drop('English', axis=1, inplace=True)
Here axis=1 indicates columns in the data frame. Observe that the English marks column is gone. If inplace=True is not provided, drop() returns a new data frame and Data itself is not changed.
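A minimal sketch of column removal on a made-up frame is given below. The columns= keyword is an alternative spelling of axis=1 that pandas also accepts. The marks are invented.

```python
import pandas as pd

# Hypothetical marks; values are invented.
df = pd.DataFrame({"Maths": [85, 72], "English": [64, 81], "Science": [90, 77]})

without_english = df.drop("English", axis=1)  # new frame, df still has English
same_result = df.drop(columns="English")      # equivalent to axis=1

df.drop("English", axis=1, inplace=True)      # now df itself loses the column
print(df.columns.tolist())
```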
6. Accessing rows from a data frame
Rows of the data frame can be accessed using an index.
Let's check the index of the dataframe.
>> Data.index
Output:
RangeIndex(start=0, stop=20, step=1)
Here we have not specified any index, so by default the index values start from 0.
a) Accessing a row using index position from a data frame in python.
Syntax:
df.iloc[index position]
Ex: Accessing the row at index position 2.
>>Data.iloc[2]
Output:
b) Accessing multiple continuous rows from a data frame in python
Syntax:
df.iloc[start_index position : end_index position]
The row at end_index position is not included in the result.
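Position-based access can be tried on a small frame as sketched below. The names are borrowed from the examples in this article and the marks are invented. Note that iloc counts positions from 0 and leaves out the end position of a slice.

```python
import pandas as pd

# Hypothetical marks; values are invented.
df = pd.DataFrame({
    "Name": ["Sandy", "Hari", "Rani", "Sita", "rosy"],
    "Maths": [85, 72, 90, 66, 58],
})

row = df.iloc[2]      # single row at position 2, returned as a Series
block = df.iloc[1:4]  # rows at positions 1, 2 and 3 (position 4 is excluded)

print(row)
print(block)
```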
c) Accessing a row from a data frame using index value.
Syntax:
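df.loc[index value]
Ex: Display all subject marks of Sandy from the Marks list dataframe. This example assumes that the column holding the student names has already been set as the index using set_index().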
>>Data.loc['Sandy']
Output:
d) Accessing multiple continuous rows from a data frame in python using index values.
Syntax:
df.loc[start_index value : stop_index value]
Unlike iloc, both the start and the stop index values are included in the result.
Ex: Display all subject marks of Sandy, Hari, Rani, Sita from the Marks list dataframe.
>> Data.loc['Sandy': 'Sita']
Output:
e) Accessing multiple rows which are not continuous from a data frame in python.
Syntax:
df.loc[[row index value1, row index value2, row index value3, ....]]
Ex: Get marks of Sandy, Hari, Sita, rosy.
>>Data.loc[['Sandy','Hari','Sita','rosy']]
Output:
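The three label-based forms above can be compared side by side. The sketch below assumes a hypothetical column called Name that is set as the index. The real column name in the workbook may differ, and the marks are invented.

```python
import pandas as pd

# Hypothetical frame; 'Name' as the index column is an assumption.
df = pd.DataFrame({
    "Name": ["Sandy", "Hari", "Rani", "Sita", "rosy"],
    "Maths": [85, 72, 90, 66, 58],
    "Science": [90, 77, 81, 69, 73],
}).set_index("Name")

one = df.loc["Sandy"]                      # single row, returned as a Series
span = df.loc["Sandy":"Sita"]              # label slice, both ends included
picked = df.loc[["Sandy", "Sita", "rosy"]] # any rows, in the order listed

print(span)
print(picked)
```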
7. Adding a new row in the data frame
Syntax:
df.loc['new_row_index value'] = Values need to be assigned
Ex: Add a new student with Roll No. 721 and with Maths, Science, Social and English marks of 85, 96, 74 and 86.
>>Data
A new row with the student name 'New student' is created in the dataframe. A new row is always added after the last row, and a new column is always added after the last column.
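One way to add such a row is sketched below on a made-up frame. The column order used here is an assumption and may not match the actual workbook. The list on the right-hand side must supply one value per column, in column order.

```python
import pandas as pd

# Hypothetical frame; the column order is an assumption, existing marks are invented.
df = pd.DataFrame(
    {"Maths": [78, 64], "Science": [81, 70], "Social": [69, 88],
     "English": [74, 59], "Roll No": [701, 702]},
    index=["Sandy", "Hari"],
)

# 'New student' is not in the index yet, so loc appends it as a new last row.
df.loc["New student"] = [85, 96, 74, 86, 721]

# Assigning to a label that already exists overwrites that row instead.
df.loc["Hari"] = [65, 71, 88, 60, 702]

print(df)
```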
8. Remove a single row from the data frame
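Syntax:
df.drop('row index value', axis=0, inplace=True)
9. Remove multiple rows from the data frame
Syntax:
df.drop(['row index value1', 'row index value2', ...], axis=0, inplace=True)
Ex: Remove the marks of srinu, pallavi and rosy from the data frame.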
>> Data.drop(['srinu','pallavi','rosy'], axis=0, inplace=True)
>> Data
Output:
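Row removal works the same way as column removal, with axis=0 pointing at the rows. A minimal sketch on a made-up frame indexed by student name is given below. The index= keyword is an alternative to axis=0 that pandas also accepts.

```python
import pandas as pd

# Hypothetical frame indexed by student name; marks are invented.
df = pd.DataFrame(
    {"Maths": [85, 72, 90, 66]},
    index=["Sandy", "srinu", "pallavi", "rosy"],
)

one_less = df.drop("srinu", axis=0)                           # single row
several_less = df.drop(["srinu", "pallavi", "rosy"], axis=0)  # multiple rows
same_result = df.drop(index=["srinu", "pallavi", "rosy"])     # equivalent form

print(several_less)
```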
10. Accessing a particular value from the data frame in python.
a) Using row_index value and column name:
df.loc[row, column]
b) Using index position of rows and columns
df.iloc[row position, column position]
11. Accessing multiple values from the data frame in python.
a) Using row_index value and column name:
df.loc[[row1, row2, row3, ...], [column1, column2, column3, ...]]
>>Data['Science']['chitra': 'priya']
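The forms in sections 10 and 11 can be tried together on a small frame. This is a sketch with invented marks. The last line gives the same values as selecting the Science column first and then slicing by name, but does it in a single loc call.

```python
import pandas as pd

# Hypothetical frame indexed by student name; marks are invented.
df = pd.DataFrame(
    {"Maths": [85, 72, 90], "Science": [90, 77, 81]},
    index=["chitra", "priya", "Sandy"],
)

single = df.loc["priya", "Science"]   # one value by row label and column name
by_position = df.iloc[1, 1]           # the same cell by row and column position

# Several rows and several columns at once, selected by label.
subset = df.loc[["chitra", "Sandy"], ["Maths", "Science"]]

# A continuous run of rows for one column; both end labels are included.
science_slice = df.loc["chitra":"priya", "Science"]

print(subset)
print(science_slice)
```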