Excel to Pandas Cheat Sheet
Translating Common Excel Functions to Python Pandas Methods.
This is my cheat sheet for commonly used Excel functions roughly translated to their Python Pandas equivalents.
Over the last 3 years I have increasingly found myself booting up a Jupyter Notebook to perform data analysis using Python Pandas rather than relaying on Excel. A few factors play into that:
- speed when performing data analysis tasks: what one loses in GUI ability one can often make up for in speed of executing extractions, transformations, filters, calculations, etc.
- repeatability: one can easily repeat past extraction, transformation, and analysis tasks
- ability to handle large data sets: the reality of big data can sometimes mean Excel is not able to handle efficiently without slowing one’s system a halt
This cheat sheet will be updated as I continue to advance my Pandas tool belt.
It will assume some baseline familiarity with Pandas.
The breadcrumbs here will allow the reader to Google the method required to replace an Excel functionality more efficiently.
Excel to Pandas Cheat Sheet
Sorting Rows by Column Values [Excel: Control + Shift + L]
df = df.sort_values(['Col1_Name', 'Col2_Name', 'Col3_Name'], ascending=[False, False, True])
This function will sort the rows of a DataFrame on the Col1_Name, then Col2_Name, and finally on Col3_Name. The option ascending argument as False will tell Pandas to sort the column values respective to their list index positions largest (top rows) to smallest (bottom rows). For example, Col1_name will be sorted to have the values in Ascending=False (descending) order, Col2_name will be sorted to have the values in Ascending=False (descending) order, and Col3_name will be sorted to have the values in Ascending=True (ascending) order.
Delete Column [Excel: Control + -]
df.drop(columns=['column_to_delete_name','second_col_to_del_name], inplace=True)
VLOOKUP
df.merge(df_2, on='key_column_name', how='left)
–> where key_column_name is the column that contains keys or lookup values and df_2 is table array equivalent
–> Typically, my df2 is a three column df consisting of index, key_column, and a column with values to return as if it were a vlookup. You can also merge all and drop unneeded columns or filter with making the merged call
new_df = df.merge(df_2, on='key_column_name', how='left)['key_column_name, 'column_X', 'column_y']
SUMIFS
COUNTIFS
Count
df['col_name'].value_counts()
or
len(df)
Pivot Tables
CONCAT, LEFT, and RIGHT (for string values)
Rename a Column
df.rename(columns = {'old_name':'new_name'})
IF statements and conditional formulas
Applying logic to a column (i.e. if column 0 or 1, return True/False), first create the function –
def function_name('column'):
return column * 2
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Example TEST TEST Document</title>
</head>
<body>
<p>Test</p>
</body>
</html>
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Example HTML5 Document</title>
</head>
<body>
<p>Test</p>
</body>
</html>
|
|
Now apply to the specified column–
df['column_to_apply_func_to'].apply(function_name)
Applying logic to a row (i.e. add three columns together, return result), first create the function–
def function_name('row'):
# each row is passed to the function as a series
variable_1 = row[0]
variable_2 = row[1]
variable_3 = row[2]
new_var = variable_1 + variable_2 + variable_3
return new_var
Now apply to the df for all rows – # if we want to save the returned values, df[‘new_var] will store those returned values as a new column in our df # axis is “columns” because we are moving across the columns left to right, this is kinda strange at first
df['new_var'] = df.apply(function_name, axis="columns")
Append
Graphs
Vlookup with two variables
Check if date is the last day of a month
# does not differentiate if the date is a business day or note
def check_if_last_day_of_week(date):
import datetime
import calendar
# calendar.monthrange return a tuple (weekday of first day of the
# month, number
# of days in month)
last_day_of_month = calendar.monthrange(date.year, date.month)[1]
# here i check if date is last day of month
if date == datetime.date(date.year, date.month, last_day_of_month):
return True
return False
Thank you Stack Overflow
Check if last business day of month
def check_if_last_biz_day_of_month(date):
offset = BMonthEnd()
#Last day of current month
try:
last_biz_day = offset.rollforward(date)
if date == last_biz_day:
return True
else:
return False
except:
return "ERROR"
Thank you Stack Overflow