PANDAS

PYTHON PANDAS

INTRODUCTION:

  • Pandas is one of the most popular Python libraries for Data Science, Machine Learning and Artificial Intelligence.
  • It is extensively used to read and process data (especially tabular data).

1)To use the pandas library, we first have to import it.
Syntax:
             import pandas as pd
In the above syntax, 'as' gives an alias to the imported library. The alias here is 'pd', so we can type 'pd' instead of the full library name.

2)To read a CSV (Comma-Separated Values) file.
Syntax:
            df=pd.read_csv('filename.csv')
Example:
           df=pd.read_csv('hyd_weather.csv')
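
To try read_csv without a file on disk, the sketch below reads the same kind of data from an in-memory string. The column names and values are made up for illustration; they are not the contents of the real 'hyd_weather.csv'.

```python
import io
import pandas as pd

# Hypothetical CSV text standing in for a file such as 'hyd_weather.csv'
csv_text = """day,temperature,humidity
1/1/2019,32,70
1/2/2019,35,65
1/3/2019,28,80
"""

# read_csv accepts a file path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```

With a real file, passing the file name as in the example above works the same way.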

3)To get the maximum value in a column of a table.
Syntax:
            df['column_name'].max()
Example:
            df['humidity'].max()
Similarly, we can use other aggregate functions such as min(), sum(), etc.
We can also use describe() to get summary statistics such as count, mean, std, min and max.
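
The aggregate functions mentioned above can be tried on a small column. This is a minimal sketch; the humidity values are invented for the example.

```python
import pandas as pd

# Hypothetical humidity readings, used only for illustration
df = pd.DataFrame({'humidity': [70, 65, 80, 75]})

highest = df['humidity'].max()    # largest value in the column
lowest = df['humidity'].min()     # smallest value in the column
total = df['humidity'].sum()      # sum of all values
average = df['humidity'].mean()   # arithmetic mean

# describe() returns count, mean, std, min, quartiles and max in one call
summary = df['humidity'].describe()

print(highest, lowest, total, average)
print(summary)
```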

DATAFRAME BASICS:

A DataFrame is like a table or an Excel spreadsheet and is extensively used in Data Science.

1)To construct a DataFrame from a list of row tuples
Syntax:
            rows=[(r1c1,r1c2,r1c3),(r2c1,r2c2,r2c3),...]
            df=pd.DataFrame(rows,columns=['c1','c2','c3'])
Example:
jan_data=[('1/1/2019',32,6,'Rain'),
         ('1/2/2019',35,7,'Sunny')
         ]
df=pd.DataFrame(jan_data,columns=['day','temperature','windspeed','event'])
print(df)
output:
        day  temperature  windspeed  event
0  1/1/2019           32          6   Rain
1  1/2/2019           35          7  Sunny

2)df.shape--->returns a tuple of (number of rows, number of columns).
Example:
df.shape
output:
(2, 4) 
3)df.head()--->returns the first five rows by default.
df.head(1)--->returns only the first row.
Example:
row1=df.head(1)
print(row1)
output:
        day  temperature  windspeed  event
0  1/1/2019           32          6   Rain
4)df.tail()--->returns the last five rows by default.
df.tail(1)--->returns only the last row.
Example:
last_row=df.tail(1)
print(last_row)
output:
        day  temperature  windspeed  event
1  1/2/2019           35          7  Sunny

5)Slicing:

Syntax:
            df[start_row:end_row]
Rows are selected by position; start_row is included and end_row is excluded.
Example:
print(df[:1])
output:
        day  temperature  windspeed  event
0  1/1/2019           32          6   Rain
6)df.columns--->returns the column names of the table.
Example:
print(df.columns)
output:
Index(['day', 'temperature', 'windspeed', 'event'], dtype='object') 
7)To print particular column data.
Syntax:
         df.column_name
Example:
print(df.day)
output:
0    1/1/2019
1    1/2/2019
Name: day, dtype: object 
                                             OR
Syntax:
         df['column_name']
Example:
print(df['day'])
output:
0    1/1/2019
1    1/2/2019
Name: day, dtype: object
8)To print two or more columns:
Syntax:
          df[['column1_name','column2_name',...]]
Example:
print(df[['day','temperature']])
output:
        day  temperature
0  1/1/2019           32
1  1/2/2019           35

KEY OPERATIONS ON DATA FRAMES 

To read Excel files
Syntax:
           df=pd.read_excel('filename.xlsx')
Example:
import pandas as pd
df=pd.read_excel('weather_data.xlsx')
print(df)
output:
       day  temperature  windspeed  event
0  1/1/2017           32          6   Rain
1  1/2/2017           35          7  Sunny
2  1/3/2017           28          2   Snow
3  1/4/2017           24          7   Snow
4  1/5/2017           32          4   Rain 
5  1/6/2017           31          2  Sunny 

To store data in an Excel file.
Syntax:
      df.to_excel('filename.xlsx',sheet_name='sheet_name')
Example:
df.to_excel('new2.xlsx',sheet_name='weather_data')

GROUPBY

g=df.groupby('column_name')--->groups the rows that share the same value in the given column.

g.get_group('value')--->returns the rows that belong to the group for that value.
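
As a minimal sketch of the two calls above, the example below groups a small table by city. The 'city' column and its values are assumptions made for illustration only.

```python
import pandas as pd

# Hypothetical weather readings for two cities
df = pd.DataFrame({
    'city': ['hyderabad', 'mumbai', 'hyderabad', 'mumbai'],
    'temperature': [32, 30, 35, 28],
})

# Group rows that share the same value in the 'city' column
g = df.groupby('city')

# All rows that belong to the 'hyderabad' group
hyd = g.get_group('hyderabad')
print(hyd)

# Aggregate functions can be applied per group
max_temp = g['temperature'].max()
print(max_temp)
```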

CONCATENATE DATAFRAMES

pd.concat stacks the rows of the given DataFrames one below the other.

df=pd.concat([dataframe1,dataframe2])
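
A short sketch of concatenation, using two made-up DataFrames (the names india and us and their values are hypothetical). Note that the original row labels are kept unless ignore_index=True is passed.

```python
import pandas as pd

# Two hypothetical DataFrames with the same columns
india = pd.DataFrame({'city': ['hyderabad', 'mumbai'], 'temperature': [32, 30]})
us = pd.DataFrame({'city': ['new york', 'chicago'], 'temperature': [21, 14]})

# Rows of 'us' are placed below the rows of 'india'; index is 0, 1, 0, 1
df = pd.concat([india, us])
print(df)

# ignore_index=True renumbers the rows 0, 1, 2, 3
df_reset = pd.concat([india, us], ignore_index=True)
print(df_reset)
```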

MERGE DATAFRAMES

Merging DataFrames is similar to the JOIN operation in SQL. By default, pd.merge performs an inner join on the given column.

df=pd.merge(dataframe1,dataframe2, on='column_name')
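
The sketch below merges two small DataFrames on a shared 'city' column. The data and variable names are assumptions for illustration. The how parameter selects the join type, much like INNER JOIN and FULL OUTER JOIN in SQL.

```python
import pandas as pd

# Hypothetical tables that share the 'city' column
temperature_df = pd.DataFrame({
    'city': ['hyderabad', 'mumbai', 'delhi'],
    'temperature': [32, 30, 25],
})
humidity_df = pd.DataFrame({
    'city': ['hyderabad', 'mumbai', 'chennai'],
    'humidity': [70, 85, 90],
})

# Default is how='inner': only cities present in both tables are kept
inner = pd.merge(temperature_df, humidity_df, on='city')
print(inner)

# how='outer' keeps every city; missing values are filled with NaN
outer = pd.merge(temperature_df, humidity_df, on='city', how='outer')
print(outer)
```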

NUMERICAL INDEXING

To provide custom indexing
Syntax:
          df=pd.DataFrame({'column_name':[values]},index=[labels])
Example:
df=pd.DataFrame({'value':[0,1,2,3,4,5,6,7,8,9]},index=['a','b','c','d','e','f','g','h','i','j'])
print(df)
output:
   value
a      0
b      1
c      2
d      3
e      4
f      5
g      6
h      7
i      8 
j      9 

  1. df.loc[label]--->returns the row with the given index label, e.g. df.loc['c'].
  2. df.iloc[position]--->returns the row at the given integer position (counting from 0), e.g. df.iloc[2].
  3. df.loc[start_label:end_label]--->returns the rows from the start label to the end label, both included.
  4. df.iloc[start_position:end_position]--->returns the rows from the start position up to, but not including, the end position.
 Note: In all the above examples, pd is the alias for pandas.
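
The difference between label-based and position-based access can be sketched with the same letter-indexed DataFrame as in the example above. The variable names are chosen only for illustration.

```python
import pandas as pd

# Same data as the example above: values 0..9 labelled 'a'..'j'
df = pd.DataFrame({'value': list(range(10))}, index=list('abcdefghij'))

by_label = df.loc['c']           # row with label 'c'
by_position = df.iloc[2]         # third row (positions start at 0)

label_slice = df.loc['b':'d']    # labels b, c, d (end label included)
position_slice = df.iloc[1:3]    # positions 1 and 2 (end position excluded)

print(by_label['value'], by_position['value'])
print(label_slice)
print(position_slice)
```

The two slices look alike but return different numbers of rows, because loc includes the end label while iloc excludes the end position.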




