August 16, 2024
Switching between applications is one of the pitfalls of efficiency. Remaining in the same UI but benefitting from the efficiency of different tools is the dream environment for analysts.
This blog post will guide you on how to create an environment that combines the efficiency of SQL databases (Snowflake), python analytical and visualization libraries and github’s code sourcing.
During this setup we will be using 1Password to connect retrieve sensitive information. If you do not have a password manager you may skip these sections
| Application | Link |Comment |
|—|—|—|
| VSCode | website | Free |
| Conda | website | Free |
| 1Password| website | Paid application |
1) Download the following Extensions:
After you download Conda:
analytics: conda create -n analytics python=3.6
conda activate analytics
conda install ipykernel
conda install -c conda-forge ipykernel
python -m ipykernel install --user --name=myenv
Now that we have set up the applications, it’s time to bring everything together in VSCode.
1) Follow the steps on the documentation to configure 1Password connection to:
2) Create a record in 1Password with your Snowflake connexion variables.
In the notebook:
## Initializing notebook
### Importing modules
#### data
import pandas as pd
import numpy as np
#### snowflake connector/query
import snowflake
import snowflake.connector
from snowflake.connector.pandas_tools import write_pandas
import snowflake.snowpark as snowpark
from snowflake.snowpark import Session
#### ini values
d={}
## Retrieving credentials
def from_1password(item):
'''
this function converts onePassword output into dictionary.
:param item= a Slist obtained through 1PasswordCLI command.Example: item=!op item get item-name
:output= dictionary with fields found in item SList
'''
items=[]
credentials={}
for item,line in enumerate(item):
try:
key=line.split(':')[0].strip().lstrip()
value=line.split(':')[1].strip()
except:
pass
credentials.update({key:value})
return credentials
credentials=!op item get my_snowflake_credentials
snow_credentials=from_1password(credentials)
Choose one of the following options:
You will need your default values, regardless of if you using a password manager such as 1password or storing them in a file.
1) If your snowflake instance uses authentication(recommended), add this code in your cell:
def start_snowflake_connection(credentials):
"""
this function establishes a snowflake connection ir order to later executes a query with SQL commands inside jupyter. It assumes snowflake credentials are stored in the same folder
in dictionary type variable.
This module requires snowflake connector to be installed.
Please view Snowflake documentation for details on downloading the right package:
https://docs.snowflake.com/en/user-guide/python-connector-install.html.
Ex:
!{sys.executable} -m pip install pip install -r https://raw.githubusercontent.com/snowflakedb/snowflake-connector-python/v2.7.11/tested_requirements/requirements_38.reqs
:param snow_credentials: dictionary with credentials
:returns: DataFrame styled query result
"""
import snowflake
import snowflake.connector
# fire up an instance of a snowflake connection. this will open a browser window for the authenticator
ctx= snowflake.connector.connect (
account = credentials['account_name'],
role = credentials['default_role'],
user = credentials['username'],
database = credentials['default_database'],
schema = credentials['default_schema'],
warehouse= credentials['default_warehouse'],
authenticator='externalbrowser'
)
cs = ctx.cursor()
clear_output(wait=True)
return cs
2) If you don’t use a password manager, here is an example of how to connect using a file. The example will allow you to have the credentials into a python variable.
Once you have read the credentials, you can also use this code:
def query_snowflake_passwd(query, credentials):
'''
:param query: SQL commands to execute
:param credentials: dictionary with account, user, password and default database, role, schema and warehouse
:returns: DataFrame styled query result
'''
import snowflake
import snowflake.connector
# fire up an instance of a snowflake connection
ctx= snowflake.connector.connect (
account = credentials['account_name'],
role = credentials['default_role'],
user = credentials['username'],
password = credentials['password'],
database = credentials['default_database'],
schema = credentials['default_schema'],
warehouse= credentials['default_warehouse'],
)
try:
cs = ctx.cursor()
return return cs
except Exception as e:
raise e
#build your query
d.update({'table_name': 'DATABASE.SCHEMA.TABLE_NAME'})
q = '''\
SELECT
*
FROM
{table_name}
LIMIT 100
;\
'''.format(**d)
#query snowflake
cs.execute(q)
df=pd.DataFrame(cs.fetchall(), columns=pd.DataFrame(cs.description)['name'])
#display result
with pd.option_context('display.max_columns', 500, 'display.min_rows',100):
display(df.head())
VSCode has native connection to Github so as long as you have connected your location to a repository you can select the “Source Control” tab on the left panel and commit the changes at the end of your session.