PYTHON PANDAS
INTRODUCTION:
- Pandas is one of the most popular Python libraries for Data Science, Machine Learning and Artificial Intelligence.
- It is extensively used to read and process data (especially tabular data).
1)To use pandas library, we have to import it.
Syntax:
import pandas as pd
In the above syntax we use 'as' to give the imported library an alias, 'pd', so that we can type the short alias instead of the full library name.
2)To read a CSV (Comma-Separated Values) file.
Syntax:
df=pd.read_csv('filename.csv')
Example:
df=pd.read_csv('hyd_weather.csv')
3)To get the maximum value in a column of a table.
Syntax:
df['column_name'].max()
Example:
df['humidity'].max()
Similarly, we can use other aggregate functions such as min() and sum().
We can also use describe() to get summary statistics such as count, mean and std.
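The three steps above can be tried together in a small sketch. The column names and values below are made up for illustration and only stand in for a real file such as 'hyd_weather.csv'; the text is read from memory through io.StringIO so that no file is needed.

```python
import io

import pandas as pd

# Hypothetical sample data (assumption): stands in for a file like 'hyd_weather.csv'.
csv_text = """day,temperature,humidity
1/1/2019,32,60
1/2/2019,35,45
1/3/2019,28,80
"""

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

max_humidity = df['humidity'].max()    # largest value in the column
min_humidity = df['humidity'].min()    # smallest value in the column
total_humidity = df['humidity'].sum()  # sum of all values in the column

print(max_humidity, min_humidity, total_humidity)
print(df['humidity'].describe())       # count, mean, std, min, quartiles, max
```

With a real file, the only change would be passing the file name to read_csv instead of the StringIO object.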
DATAFRAME BASICS:
A DataFrame is like a table or an Excel spreadsheet and is extensively used in Data Science.
1)To construct a data frame
Syntax:
rows=[(r1c1,r1c2,r1c3),(r2c1,r2c2,r2c3),...]
dataframe_name=pd.DataFrame(rows,columns=['column1_name','column2_name','column3_name'])
Example:
jan_data=[('1/1/2019',32,6,'Rain'),('1/2/2019',35,7,'Sunny')]
df=pd.DataFrame(jan_data,columns=['day','temperature','windspeed','event'])
print(df)
output:
        day  temperature  windspeed  event
0  1/1/2019           32          6   Rain
1  1/2/2019           35          7  Sunny
2)df.shape--->returns the number of rows and columns as a tuple.
Example:
df.shape
output:
(2, 4)
3)df.head()--->returns the first five rows by default.
df.head(1)--->returns only the first row.
Example:
row1=df.head(1)
print(row1)
output:
        day  temperature  windspeed  event
0  1/1/2019           32          6   Rain
4)df.tail()--->returns the last five rows by default.
df.tail(1)--->returns only the last row.
Example:
rowl=df.tail(1)
print(rowl)
output:
        day  temperature  windspeed  event
1  1/2/2019           35          7  Sunny
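With only two rows it is hard to see the "five rows by default" behaviour, so here is a small sketch on a longer table. The seven-row data frame and its 'value' column are made up for illustration.

```python
import pandas as pd

# Hypothetical seven-row table (assumption), long enough to show the default of five rows.
df = pd.DataFrame({'value': [10, 20, 30, 40, 50, 60, 70]})

rows, cols = df.shape   # (number of rows, number of columns)
first_five = df.head()  # rows 0 to 4
last_five = df.tail()   # rows 2 to 6
first_one = df.head(1)  # only row 0
last_one = df.tail(1)   # only row 6

print(rows, cols)
print(first_five)
print(last_five)
```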
5)Slicing:
Syntax:
df[start_row:last_row]--->returns rows from start_row up to, but not including, last_row.
Example:
print(df[:1])
output:
        day  temperature  windspeed  event
0  1/1/2019           32          6   Rain
6)df.columns--->returns the column names of the table.
Example:
print(df.columns)
output:
Index(['day', 'temperature', 'windspeed', 'event'], dtype='object')
7)To print particular column data.
Syntax:
df.column_name
Example:
print(df.day)
output:
0    1/1/2019
1    1/2/2019
Name: day, dtype: object
OR
Syntax:
df['column_name']
Example:
print(df['day'])
output:
0    1/1/2019
1    1/2/2019
Name: day, dtype: object
8)To print two or more columns:
Syntax:
df[['column1_name','column2_name',...]]
Example:
print(df[['day','temperature']])
output:
        day  temperature
0  1/1/2019           32
1  1/2/2019           35
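The slicing and column-selection forms above can be compared side by side in one short sketch. It reuses the January data from this section; the variable names on the left are only illustrative.

```python
import pandas as pd

jan_data = [('1/1/2019', 32, 6, 'Rain'), ('1/2/2019', 35, 7, 'Sunny')]
df = pd.DataFrame(jan_data, columns=['day', 'temperature', 'windspeed', 'event'])

first_row = df[:1]                   # row slice: row 0 up to, but not including, row 1
column_names = list(df.columns)      # column labels as a plain Python list
day_by_attribute = df.day            # attribute form, returns a Series
day_by_key = df['day']               # bracket form, returns the same Series
subset = df[['day', 'temperature']]  # a list of names returns a DataFrame

print(first_row)
print(column_names)
print(day_by_attribute.equals(day_by_key))
print(subset)
```

The bracket form is the safer habit: it also works when a column name contains spaces or clashes with an existing DataFrame attribute, where the attribute form does not.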
KEY OPERATIONS ON DATA FRAMES
To read excel files
Syntax:
df=pd.read_excel('filename.xlsx')
Example:
import pandas as pd
df=pd.read_excel('weather_data.xlsx')
print(df)
output:
        day  temperature  windspeed  event
0  1/1/2017           32          6   Rain
1  1/2/2017           35          7  Sunny
2  1/3/2017           28          2   Snow
3  1/4/2017           24          7   Snow
4  1/5/2017           32          4   Rain
5  1/6/2017           31          2  Sunny
To store data in an Excel file.
Syntax:
df.to_excel('filename.xlsx',sheet_name='hyd_weather_data')
Example:
df.to_excel('new2.xlsx',sheet_name='weather_data')
GROUPBY
g=df.groupby('column_name')--->groups the rows that share the same value in the given column.
g.get_group('element_name')--->returns the rows that belong to the group for that value.
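As a rough sketch of how the two calls fit together, the weather table from the Excel example can be grouped by its 'event' column. The data is typed in directly here instead of being read from 'weather_data.xlsx', and the mean-temperature step is an extra illustration that the text above does not cover.

```python
import pandas as pd

# Same rows as the Excel example above, typed in so that no file is needed.
df = pd.DataFrame({
    'day': ['1/1/2017', '1/2/2017', '1/3/2017', '1/4/2017', '1/5/2017', '1/6/2017'],
    'temperature': [32, 35, 28, 24, 32, 31],
    'windspeed': [6, 7, 2, 7, 4, 2],
    'event': ['Rain', 'Sunny', 'Snow', 'Snow', 'Rain', 'Sunny'],
})

g = df.groupby('event')            # one group per distinct event
snow_rows = g.get_group('Snow')    # only the rows where event is 'Snow'
mean_temp = g['temperature'].mean()  # average temperature per event

print(snow_rows)
print(mean_temp)
```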
CONCATENATE DATAFRAMES
df=pd.concat([dataframe1,dataframe2])
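A minimal sketch of concatenation, assuming two small tables with the same columns. The February values are made up for illustration.

```python
import pandas as pd

jan = pd.DataFrame({'day': ['1/1/2019', '1/2/2019'], 'temperature': [32, 35]})
# Hypothetical February rows (assumption), with the same columns as jan.
feb = pd.DataFrame({'day': ['2/1/2019', '2/2/2019'], 'temperature': [30, 29]})

stacked = pd.concat([jan, feb])                        # keeps original labels: 0, 1, 0, 1
renumbered = pd.concat([jan, feb], ignore_index=True)  # new labels: 0, 1, 2, 3

print(stacked)
print(renumbered)
```

By default the rows are stacked one table under the other and each table keeps its own index labels, so ignore_index=True is useful when a clean running index is wanted.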
MERGE DATAFRAMES
Merging data frames is similar to a JOIN operation in SQL.
df=pd.merge(dataframe1,dataframe2, on='column_name')
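To make the SQL comparison concrete, here is a small sketch with two made-up tables that share a 'city' column. All names and values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical tables (assumption) sharing the key column 'city'.
temperature_df = pd.DataFrame({
    'city': ['hyderabad', 'mumbai', 'delhi'],
    'temperature': [32, 35, 28],
})
humidity_df = pd.DataFrame({
    'city': ['hyderabad', 'mumbai', 'chennai'],
    'humidity': [60, 80, 75],
})

# Default is an inner join: only cities present in both tables are kept.
inner = pd.merge(temperature_df, humidity_df, on='city')

# how='outer' keeps every city and fills the missing values with NaN.
outer = pd.merge(temperature_df, humidity_df, on='city', how='outer')

print(inner)
print(outer)
```

The how parameter also accepts 'left' and 'right', which correspond to LEFT JOIN and RIGHT JOIN in SQL.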
NUMERICAL INDEXING
To provide custom indexing
Syntax:
df=pd.DataFrame({'column_name':[list]},index=[list])
Example:
df=pd.DataFrame({'value':[0,1,2,3,4,5,6,7,8,9]},index=['a','b','c','d','e','f','g','h','i','j'])
print(df)
output:
   value
a      0
b      1
c      2
d      3
e      4
f      5
g      6
h      7
i      8
j      9
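In the following, s is a column (Series) such as df['value']; loc and iloc work the same way on a whole DataFrame.
- s.loc[index]--->returns the value at the given index label.
- s.iloc[row]--->returns the value at the given row position (counting from 0).
- s.loc[start_index:end_index]--->returns the values from the start label to the end label, both included.
- s.iloc[start_row:end_row]--->returns the values from the start position up to, but not including, the end position.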
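The difference between label-based and position-based access is easiest to see on the lettered index from the example above. In this sketch, s is assumed to be the 'value' column of that data frame.

```python
import pandas as pd

df = pd.DataFrame({'value': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]},
                  index=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'])
s = df['value']  # assumption: s is the 'value' column as a Series

by_label = s.loc['c']           # value stored under the label 'c'
by_position = s.iloc[2]         # value in the third row (positions start at 0)
label_slice = s.loc['b':'d']    # labels b, c and d: the end label is included
position_slice = s.iloc[1:3]    # positions 1 and 2: the end position is excluded

print(by_label, by_position)
print(label_slice.tolist())
print(position_slice.tolist())
```

Both slices start at the same row, but the loc slice returns three values while the iloc slice returns two, because only iloc follows the usual Python rule of excluding the end.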
Note: In all the above examples, pd is the alias for pandas.