Speaking at Snowflake Summit 2024

Once again I attended the Snowflake Summit, this year in San Francisco (the previous two years, the Summit was held in Las Vegas). The climate in San Francisco is much more pleasant than the desert heat of Las Vegas, and the venue was the Moscone Center, which was large enough to comfortably accommodate the huge number of attendees. Snowflake is popular, even with students from nearby universities, who dropped by on the last day, which was focused on Builders with Snowflake.

It was so much fun catching up with my fellow Superheroes again. I enjoyed hanging around the Snowflake Community space, chatting with attendees and sharing information about Snowflake. I also spent some time helping out at the FrostyFriday booth, where participants solved Snowflake challenges for prizes.

Snowflake ML Functions

The highlight of my Summit was my presentation about Snowflake Forecasting ML Functions. I was thrilled that the session was well attended and that the audience was engaged, asking questions and wanting...
Another Certification Achieved: Snowflake Advanced Data Engineer

The Snowflake SnowPro Advanced Data Engineer certification is considered tough. After taking the exam, I understand where the "tough" comes from. It's less about the difficulty of the exam than about the sheer range of topics covered.

Sometimes, it's difficult to judge what the responsibilities of a data engineer are. At one extreme, a data engineer is nothing but a developer who receives the requirements and implements them in the pipeline. At the other extreme, a data engineer is expected to understand the platform, configure it, and design the security, architecture, and automation, while also performing data analysis. In the real world, data engineers usually fall somewhere between the two extremes.

The Advanced Data Engineer exam tests the full spectrum, which covers:

- Data Movement: ingest data from various formats and load it into Snowflake, design data pipelines, build data sharing solutions
- Performance Optimization: configure the pipelines for the best performance and troubleshoot queries that perform poorly
- Storage and Data...
Snowflake Data Engineering

Snowflake Data Engineering is my latest book in the making. It's available in the Manning Early Access Program (MEAP), where you get access to new chapters as I write them.

In this book, you'll learn how to build data pipelines that ingest data from source systems and store the data in Snowflake. The chapters are organized so that you start by building a simple, basic data pipeline and add increasingly complex functionality. Topics covered include ingesting structured and semi-structured data, setting up continuous ingestion with Snowpipe, transforming the data in Snowpark, augmenting your data with generative AI, optimizing performance and cost, designing robust data pipelines, incorporating CI/CD, and much more.

Currently available chapters:

- Data Engineering with Snowflake
- Creating your First Data Pipeline
- Best Practices for Data Staging
- Transforming Data
- Continuous Data Ingestion
- Executing Code Natively with Snowpark

Stay tuned for more information and updates as new chapters become available. Here is the link to the MEAP: https://www.manning.com/books/snowflake-data-engineering...
Getting Started with Data Engineering and ML using Snowpark for Python

The Data Cloud World Tour is a series of Snowflake events across the globe that aim to share Snowflake’s latest innovations and customer use cases. This year, I attended the Data Cloud World Tour in Dubai, and because my company In516ht was one of the sponsors, I was asked to deliver the hands-on workshop.

The workshop was based on the Getting Started with Data Engineering and ML using Snowpark for Python quickstart. Attendees were given instructions to set up their laptops with the prerequisite software, including Python with some additional packages (snowflake-snowpark-python, snowflake-ml-python, pandas), a git client to clone the starter repository, and a free trial Snowflake account.

The first part of the workshop covered Data Engineering with the following lessons:

- Configure the connection parameters and establish a connection from Snowpark Python to Snowflake
- Retrieve data from Snowflake tables that were set up initially into Snowpark data frames
- Perform exploratory data analysis on the data frames
- Use data frame methods to query...
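The first two lessons above (configuring connection parameters, connecting from Snowpark Python, and reading a table into a data frame) might look roughly like the sketch below. This is not the quickstart's own code: the `SNOWFLAKE_*` environment variable names, the helper functions, and password-based login are assumptions made for illustration, and the table name passed to `preview_table` is whatever table exists in your account.

```python
import os


def build_connection_parameters(env=os.environ):
    """Collect Snowpark connection parameters from environment variables.

    The SNOWFLAKE_* variable names are a convention chosen for this sketch,
    not something the quickstart prescribes.
    """
    required = ("account", "user", "password")
    optional = ("role", "warehouse", "database", "schema")

    params = {}
    for key in required + optional:
        value = env.get(f"SNOWFLAKE_{key.upper()}")
        if value:  # skip variables that are unset or empty
            params[key] = value

    missing = [key for key in required if key not in params]
    if missing:
        raise ValueError(f"Missing connection parameters: {', '.join(missing)}")
    return params


def create_session(params):
    """Open a Snowpark session (requires snowflake-snowpark-python)."""
    # Imported lazily so the helper above works without the package installed.
    from snowflake.snowpark import Session

    return Session.builder.configs(params).create()


def preview_table(params, table_name):
    """Connect, load a table into a Snowpark data frame, and print a few rows."""
    session = create_session(params)
    try:
        df = session.table(table_name)  # lazily evaluated Snowpark data frame
        df.show()                       # runs the query and prints the first rows
    finally:
        session.close()
```

Keeping credentials in environment variables rather than in the notebook or script is one way to avoid committing them to the cloned starter repository; a configuration file kept out of version control would work just as well.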
Snowflake Summit 2023

This was my second consecutive year at the Snowflake Summit in Las Vegas. While the hot topic last year was data mesh and all sessions about data mesh sold out, this year data mesh was like last year's snow. Now the running theme is Generative AI and LLMs. The good news is that attendees were able to pre-book sessions, and many sessions were also recorded, so I didn't miss any of the buzz around these topics.

What an exciting Summit it was! So many announcements! Here are some of my favorites:

- Dynamic Tables. No more streams and tasks. Just write your SQL and Snowflake takes care of the rest, in some ways similar to materialized views, but with fewer restrictions on the types of queries you can use.
- Native Applications. Write your application and distribute it via the Snowflake Marketplace. I built a simple app and wrote a blog post about it: Maintaining a Mapping Table with a Snowflake Native App. ...
Another Certification Added to my Stash: SnowPro Advanced Data Analyst

The latest Snowflake advanced certification that was just released is SnowPro Advanced: Data Analyst. Out of all the advanced certifications offered by Snowflake, this one is closest to my professional experience, and that's why I decided to tackle it as my first advanced Snowflake certification.

Some topics on the exam were relatively easy for me since I have been doing data analysis for decades and SQL is second nature to me. I was able to answer SQL questions without much prior preparation. However, some topics on the exam are very Snowflake-specific and required careful review and study time. These include:

- Snowsight dashboards. I haven't been using them much, but I had to learn them for the exam. Snowsight dashboards can't compete on features and functionality with more mature tools such as Power BI, Tableau, or Cognos, to name a few. But what they offer in their simplicity is the possibility to quickly visualize and summarize data, either for...
Snowpark for Python First Impressions

Just back from Snowflake Summit 2022 held in Las Vegas. There were so many new announcements about upcoming features in Snowflake that it's hard to keep track. The topic I'm most excited about is Snowpark for Python. As soon as I got back from Las Vegas, I started digging into the details because I had a use case waiting to be tested. I described my approach and summarized my first impressions in a blog post on Medium....
What the Snowflake Community Means to Me—and My Career

I was recently interviewed for the Snowflake blog, where I discussed the benefits of being an active member of the Snowflake Community, how the community has helped me grow into my role as a Snowflake Data Superhero, and how it can be of use to anyone looking for hands-on Snowflake resources. On the usefulness of the Snowflake Community, I was quoted as saying: “Snowflake offers a lot of great technical documentation, but it’s useful that there’s so much material out there from the Snowflake community around personal case studies and how other people are implementing specific features.” Read the full article here....
Snowflake and Data Mesh

More than ever, the ability to use data for decision-making is critical to company success. Despite knowing this, companies are still not fully empowering their employees with easy access to the data they need. According to Zhamak Dehghani, the founder of Data Mesh, we must start thinking outside of the box because the traditional approach to managing and collecting data is no longer sufficient.

For decades, there has been a divide between operational and analytical data, with ETL as the intermediary process to get data from operational systems into the analytical data warehouse. ETL, which has always been primarily in the hands of IT developers, is perceived as a bottleneck to delivering timely analytical data. Furthermore, dimensional data models are not well suited for the machine learning models that have become essential.

To overcome this, the data lake emerged around 2010. The idea of the data lake is to store vast amounts of semi-structured data in object stores to allow various consumers...