Most of the code I google turns up purple (already-visited) links when the page loads, so I figured I'd just make a page where I can bookmark the useful code snippets I use all the time.
# Read a .csv with a different delimiter
import pandas as pd

# engine='python' is needed because the default C engine doesn't support multi-character delimiters
df = pd.read_csv('../output.csv', delimiter=';;', engine='python')
# Combining dataframes when the columns are the same (from many days of scraping, for instance)
import glob
import os

import pandas as pd

def get_data():
    path = ''
    all_files = glob.glob(os.path.join(path, "../data/*.csv"))
    li = []
    for filename in all_files:
        df = pd.read_csv(filename, delimiter=';;', engine='python')
        li.append(df)
    df = pd.concat(li, axis=0)  # stack all the daily files into one dataframe
    df = df.reset_index()
    return df
For string functions on pandas dataframes, Tutorialspoint has a useful article.
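A quick sketch of the .str accessor, which covers most of what I need (the column name here is made up):

import pandas as pd

df = pd.DataFrame({'job_title': [' Data Scientist ', 'economist', 'Research ANALYST']})

df['job_title'] = df['job_title'].str.strip()    # trim leading/trailing whitespace
df['job_title'] = df['job_title'].str.lower()    # lowercase everything
analyst_rows = df[df['job_title'].str.contains('analyst')]    # filter rows by substring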
import pandas_gbq

def read_query(query):
    return pandas_gbq.read_gbq(query)

sql_query = f"""
SELECT
FROM
WHERE
"""

read_query(sql_query) # returns a dataframe
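Note that read_gbq also accepts the GCP project explicitly, and it should prompt for Google authentication the first time if no credentials are cached. The project id below is just a placeholder:

import pandas_gbq

df = pandas_gbq.read_gbq("SELECT 1 AS one", project_id="my-project-id")  # placeholder project id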
To pull data from a Google Sheet, you need to set up the Google Drive API.
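One way to do it is with gspread and a service account; this is just a sketch, and the credentials file and sheet name below are placeholders:

import gspread
import pandas as pd

# placeholders: point these at your own service-account key and sheet
gc = gspread.service_account(filename="service_account.json")
sheet = gc.open("my_scraped_data").sheet1
df = pd.DataFrame(sheet.get_all_records())  # one dict per row -> dataframe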
Note: if you are able to run a nontrivial python script using cron and a .sh file in less than 4 hours, you can rest assured that you are faster than me xD
Cron jobs let you schedule scripts to run on your machine at set intervals. Some important lessons I’ve learned when using them:
---- bash / zshell ----
touch test.sh # Creates the file "test.sh"
chmod +x test.sh # Makes "test.sh" executable
./test.sh # Executes the script "test.sh"
---- test.sh ----
#!/bin/bash
echo "========================================================================================"
date # timestamp each run in the log
. ~/.bash_profile # cron runs with a minimal environment, so load the usual PATH etc.
source env/bin/activate # activate the project's virtual environment
python3 test.py
---- test.py ----
from datetime import datetime
time = datetime.now().strftime("%m-%d_%H:%M%p") # gets the time and formats it (strftime == "string format time")
print(f"*** Testing Cron Job - {time} ***") # Prints to console, which cron redirects to the log file
---- bash / zshell ----
crontab -e # edit the crontab (opens in the vim editor)
* * * * * cd /Users/jakeanderson/research/usa_jobs/ && ./test.sh >> tmp/test.log 2>&1
#* * * * * cd /Users/jakeanderson/research/usa_jobs/ && ./usa_jobs.sh >> tmp/usa_jobs.log 2>&1
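# The five schedule fields are: minute hour day-of-month month day-of-week, followed by the command.
# "* * * * *" runs every minute; something like "0 9 * * 1-5" would run at 9:00 on weekdays.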
---- bash / zshell ----
python3 -m venv env # make a new virtual environment
source env/bin/activate # activate the new virtual environment
pipreqs . # create the requirements.txt file (pipreqs itself needs to be installed, e.g. pip install pipreqs)
pip install -r requirements.txt # install all dependencies in virtual environment
deactivate # Exit the virtual environment
The Python interpreter might also be looking for packages somewhere other than where you expect:
pip install pandas
Requirement already satisfied: pandas in /Users/jakeanderson/.pyenv/versions/3.7.6/lib/python3.7/site-packages (0.24.2)
So I need to make sure my cron job is using the correct site-packages, located in /Users/jakeanderson/.pyenv/versions/3.7.6/lib/python3.7/site-packages.
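A quick way to check which interpreter and site-packages a script is actually using is to drop this at the top of the script the cron job runs:

import sys

print(sys.executable)  # the Python binary actually running this script
print(sys.path)        # the directories searched for packages, including site-packages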
You can generate the requirements for a project in a single command using pipreqs!
pipreqs .
INFO: Successfully saved requirements file in ./requirements.txt
Replaces all instances of “foo” with “bar” in every file in the current directory and all subdirectories. The (D.) glob qualifier is zsh-specific (plain files only, dotfiles included), and note that macOS's built-in BSD sed wants -i '' rather than the GNU-style -i --.
sed -i -- 's/foo/bar/g' **/*(D.)
"/System/Library/Automator/Combine PDF Pages.action/Contents/Resources/join.py" -o PATH/TO/YOUR/MERGED/FILE.pdf /PATH/TO/ORIGINAL/1.pdf /PATH/TO/ANOTHER/2.pdf /PATH/TO/A/WHOLE/DIR/*.pdf
# I use Sublime Text, so this alias opens any file in Sublime
alias sublime="open -a /Applications/Sublime\ Text.app"
# tublime means "touch and sublime", for files I want to open in Sublime that don't exist yet
function tublime { touch "$@"; sublime "$@"; }
# opens the zshell config file
alias zshconfig="sublime ~/.zshrc"
# Searches all files in the current directory (and sub-directories) for the given text
function grepme { grep -rnw . -e "$@"; }
# I keep my notes about my job @ Moloco in a folder and want to open them all at once
alias notes='cd ~jakeanderson/Desktop/moloco_notes && sublime *.md'
# I use the mlh zshell theme from oh-my-zsh
ZSH_THEME="mlh"