Excel to Pandas Cheat Sheet

Translating Common Excel Functions to Python Pandas Methods.

This is my cheat sheet for commonly used Excel functions roughly translated to their Python Pandas equivalents.

Over the last 3 years I have increasingly found myself booting up a Jupyter Notebook to perform data analysis using Python Pandas rather than relaying on Excel. A few factors play into that:

  • speed when performing data analysis tasks: what one loses in GUI ability one can often make up for in speed of executing extractions, transformations, filters, calculations, etc.
  • repeatability: one can easily repeat past extraction, transformation, and analysis tasks
  • ability to handle large data sets: the reality of big data can sometimes mean Excel is not able to handle efficiently without slowing one’s system a halt

This cheat sheet will be updated as I continue to advance my Pandas tool belt.

It will assume some baseline familiarity with Pandas.

The breadcrumbs here will allow the reader to Google the method required to replace an Excel functionality more efficiently.

Excel to Pandas Cheat Sheet

Sorting Rows by Column Values [Excel: Control + Shift + L]

df = df.sort_values(['Col1_Name', 'Col2_Name', 'Col3_Name'], ascending=[False, False, True])

This function will sort the rows of a DataFrame on the Col1_Name, then Col2_Name, and finally on Col3_Name. The option ascending argument as False will tell Pandas to sort the column values respective to their list index positions largest (top rows) to smallest (bottom rows). For example, Col1_name will be sorted to have the values in Ascending=False (descending) order, Col2_name will be sorted to have the values in Ascending=False (descending) order, and Col3_name will be sorted to have the values in Ascending=True (ascending) order.

Delete Column [Excel: Control + -]

df.drop(columns=['column_to_delete_name','second_col_to_del_name], inplace=True)

VLOOKUP

df.merge(df_2, on='key_column_name', how='left)

–> where key_column_name is the column that contains keys or lookup values and df_2 is table array equivalent

–> Typically, my df2 is a three column df consisting of index, key_column, and a column with values to return as if it were a vlookup. You can also merge all and drop unneeded columns or filter with making the merged call

new_df = df.merge(df_2, on='key_column_name', how='left)['key_column_name, 'column_X', 'column_y']

SUMIFS

COUNTIFS

Count

df['col_name'].value_counts()

or

len(df)

Pivot Tables

CONCAT, LEFT, and RIGHT (for string values)

Rename a Column

df.rename(columns = {'old_name':'new_name'})

IF statements and conditional formulas

Applying logic to a column (i.e. if column 0 or 1, return True/False), first create the function –

def function_name('column'):
	return column * 2 
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <title>Example TEST TEST Document</title>
</head>
<body>
  <p>Test</p>
</body>
</html>
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <title>Example HTML5 Document</title>
</head>
<body>
  <p>Test</p>
</body>
</html>
199
200
def function_name('column'):
	return column * 2 

Now apply to the specified column–

df['column_to_apply_func_to'].apply(function_name)

Applying logic to a row (i.e. add three columns together, return result), first create the function–

def function_name('row'):
	# each row is passed to the function as a series 
	variable_1 = row[0] 
	variable_2 = row[1]
	variable_3 = row[2]

	new_var = variable_1 + variable_2 + variable_3

	return new_var 

Now apply to the df for all rows – # if we want to save the returned values, df[‘new_var] will store those returned values as a new column in our df # axis is “columns” because we are moving across the columns left to right, this is kinda strange at first

df['new_var'] = df.apply(function_name, axis="columns")

Append

Graphs

Vlookup with two variables

Check if date is the last day of a month

# does not differentiate if the date is a business day or note 

def check_if_last_day_of_week(date):
    import datetime
    import calendar
    #  calendar.monthrange return a tuple (weekday of first day of the 
    #  month, number  
    #  of days in month)
    last_day_of_month = calendar.monthrange(date.year, date.month)[1]
    # here i check if date is last day of month
    if date == datetime.date(date.year, date.month, last_day_of_month):
        return True
return False

Thank you Stack Overflow

Check if last business day of month

  def check_if_last_biz_day_of_month(date):

    offset = BMonthEnd()

    #Last day of current month
    
    try:
        last_biz_day = offset.rollforward(date)
        
        if date == last_biz_day:
            return True
        else:
            return False
    except:
    return "ERROR"

Thank you Stack Overflow