The Data Cloud World Tour is a series of Snowflake events across the globe that aim to share Snowflake’s latest innovations and customer use cases. This year, I attended the Data Cloud World Tour in Dubai, and because my company, In516ht, was one of the sponsors, I was asked to deliver the hands-on workshop.
The workshop was based on the Getting Started with Data Engineering and ML using Snowpark for Python quickstart. Attendees were given instructions to set up their laptops with the prerequisites: Python with a few additional packages (snowflake-snowpark-python, snowflake-ml-python, pandas), a git client to clone the starter repository, and a free trial Snowflake account.
The first part of the workshop covered Data Engineering with the following lessons:
- Configure the connection parameters and establish a connection from Snowpark Python to Snowflake
- Retrieve data from the pre-loaded Snowflake tables into Snowpark data frames
- Perform exploratory data analysis on the data frames
- Use data frame methods to query and join data frames into new data structures, then save the results to Snowflake tables
- Automate the data pipeline by running the transformations on a schedule with Snowflake tasks (a condensed sketch of these steps follows this list)
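To make that concrete, here is a condensed sketch of those steps. It assumes a local connection.json file holding the connection parameters, and the table, column, and stored procedure names (CAMPAIGN_SPEND, MONTHLY_REVENUE, TRANSFORM_SPEND_SP) are illustrative; the quickstart's own code differs in the details:

```python
import json

from snowflake.snowpark import Session
from snowflake.snowpark.functions import col, month, sum as sum_, year

# Connection parameters (account, user, password, role, warehouse,
# database, schema) kept in a local JSON file -- the file name is an assumption.
connection_parameters = json.load(open("connection.json"))
session = Session.builder.configs(connection_parameters).create()

# Retrieve the pre-loaded tables into Snowpark data frames
# (table and column names here are illustrative).
spend_df = session.table("CAMPAIGN_SPEND")
revenue_df = session.table("MONTHLY_REVENUE")

# Quick exploratory look at the data.
spend_df.show()

# Aggregate total spend per year, month, and channel.
spend_per_channel = (
    spend_df
    .with_column("YEAR", year(col("DATE")))
    .with_column("MONTH", month(col("DATE")))
    .group_by("YEAR", "MONTH", "CHANNEL")
    .agg(sum_("TOTAL_COST").alias("TOTAL_COST"))
)

# Pivot the channels into one column per channel and rename the columns.
spend_per_month = (
    spend_per_channel
    .pivot("CHANNEL", ["search_engine", "social_media", "video", "email"])
    .sum("TOTAL_COST")
    .to_df("YEAR", "MONTH", "SEARCH_ENGINE", "SOCIAL_MEDIA", "VIDEO", "EMAIL")
)

# Join spend with revenue and persist the result as a new table.
joined_df = spend_per_month.join(revenue_df, ["YEAR", "MONTH"])
joined_df.write.mode("overwrite").save_as_table("SPEND_AND_REVENUE_PER_MONTH")

# Automate the pipeline with a Snowflake task; the transformations are
# assumed to be wrapped in a stored procedure named TRANSFORM_SPEND_SP.
session.sql("""
    CREATE OR REPLACE TASK monthly_spend_task
      WAREHOUSE = COMPUTE_WH
      SCHEDULE = 'USING CRON 0 3 1 * * UTC'
    AS CALL TRANSFORM_SPEND_SP()
""").collect()
session.sql("ALTER TASK monthly_spend_task RESUME").collect()
```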
With the data transformations from the first part in place, we were ready to use the transformed data in the second part of the workshop, which covered Machine Learning with the following lessons:
- Establish a connection from Snowpark Python to Snowflake just like before
- Retrieve the features and target variables from Snowflake tables into Snowpark data frames
- Perform some more feature engineering to prepare features for model training
- Train a machine learning model using Snowpark ML
- Create scalar and batch user-defined functions (UDFs) that perform inference on new data points using the trained machine learning model (see the sketch after this list)
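A minimal sketch of these steps, assuming the SPEND_AND_REVENUE_PER_MONTH table from the first part, a single REVENUE target column, and a hypothetical @ml_models stage for the permanent UDF. Only the scalar UDF is shown; the batch variant would use a vectorized (pandas) UDF instead:

```python
from snowflake.ml.modeling.linear_model import LinearRegression
from snowflake.snowpark.functions import udf

# session is the Snowpark session created in the first part.
df = session.table("SPEND_AND_REVENUE_PER_MONTH")
train_df, test_df = df.random_split(weights=[0.8, 0.2], seed=42)

# One feature per advertising channel (illustrative names).
feature_cols = ["SEARCH_ENGINE", "SOCIAL_MEDIA", "VIDEO", "EMAIL"]

# Snowpark ML estimator -- training runs inside Snowflake.
model = LinearRegression(
    input_cols=feature_cols,
    label_cols=["REVENUE"],
    output_cols=["PREDICTED_REVENUE"],
)
model.fit(train_df)
model.predict(test_df).show()

# Pull out the fitted scikit-learn model so the UDF can carry it along.
sk_model = model.to_sklearn()

# Scalar UDF: predicts revenue for a single set of budget values.
# The @ml_models stage is a hypothetical stage for the permanent UDF.
@udf(name="PREDICT_ROI", is_permanent=True, stage_location="@ml_models",
     replace=True, session=session, packages=["scikit-learn", "numpy"])
def predict_roi(search: float, social: float, video: float, email: float) -> float:
    # sk_model is captured from the enclosing scope and serialized with the UDF.
    return float(sk_model.predict([[search, social, video, email]])[0])
```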
Finally, with the inference UDFs in place, we proceeded to the third and most exciting part of the workshop! This part demonstrated an interactive Streamlit application with the following lessons:
- Create a new Streamlit application in Snowsight
- Add the Streamlit code to the application
- Execute the application, where the user can adjust the advertising budget sliders to see the predicted ROI based on the chosen values
- Save the predicted ROI back to a table in Snowflake by clicking the Save button (a minimal sketch of the app follows this list)
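And a minimal sketch of such an app, assuming the PREDICT_ROI UDF from the second part and a hypothetical BUDGET_ALLOCATIONS table for the saved results; inside a Snowsight Streamlit app, the active session is already available:

```python
import streamlit as st
from snowflake.snowpark.context import get_active_session

# Inside a Snowsight Streamlit app a session already exists.
session = get_active_session()

st.title("Advertising budget ROI prediction")

# One budget slider per channel (ranges and defaults are illustrative).
search = st.slider("Search engine budget", 0, 100, 35)
social = st.slider("Social media budget", 0, 100, 20)
video = st.slider("Video budget", 0, 100, 35)
email = st.slider("Email budget", 0, 100, 10)

# Call the inference UDF from the second part with the chosen values.
row = session.sql(
    f"SELECT PREDICT_ROI({search}, {social}, {video}, {email}) AS ROI"
).collect()[0]
st.metric("Predicted ROI", f"{row['ROI']:,.2f}")

# Persist the chosen budgets and the prediction (table name is hypothetical).
if st.button("Save"):
    session.sql(
        "INSERT INTO BUDGET_ALLOCATIONS "
        f"VALUES ({search}, {social}, {video}, {email}, {row['ROI']})"
    ).collect()
    st.success("Saved to BUDGET_ALLOCATIONS")
```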
Fortunately, the workshop went smoothly without any technical glitches. The attendees followed along, engaged with the material, and asked clarifying questions, which is always a good sign that they understood the content. We covered quite a lot of material in the allotted 90 minutes, illustrating the complete end-to-end process: getting data from the source, transforming it in Snowpark, preparing it for machine learning, training a machine learning model, and performing inference on user-provided input through a Streamlit user interface.