Snowflake Data Engineering

Snowflake Data Engineering is my latest book in the making. It's available in the Manning Early Access Program (MEAP), where you get access to new chapters as I write them. In this book, you'll learn how to build data pipelines that ingest data from source systems and store it in Snowflake. The chapters are organized so that you start by building a simple data pipeline and then add increasingly complex functionality. Topics covered include ingesting structured and semi-structured data, setting up continuous ingestion with Snowpipe, transforming the data in Snowpark, augmenting your data with generative AI, optimizing performance and cost, designing robust data pipelines, incorporating CI/CD, and much more.

Currently available chapters:
- Data Engineering with Snowflake
- Creating your First Data Pipeline
- Best Practices for Data Staging
- Transforming Data
- Continuous Data Ingestion
- Executing Code Natively with Snowpark

Stay tuned for more information and updates as new chapters become available. Here is the link to the MEAP: https://www.manning.com/books/snowflake-data-engineering...

Getting Started with Data Engineering and ML using Snowpark for Python

The Data Cloud World Tour is a series of Snowflake events across the globe that share Snowflake’s latest innovations and customer use cases. This year, I attended the Data Cloud World Tour in Dubai, and because my company In516ht was one of the sponsors, I was asked to deliver the hands-on workshop. The workshop was based on the Getting Started with Data Engineering and ML using Snowpark for Python quickstart. Attendees were given instructions to set up their laptops with the prerequisite software: Python with some additional packages (snowflake-snowpark-python, snowflake-ml-python, pandas), a git client to clone the starter repository, and a free trial Snowflake account.

The first part of the workshop covered Data Engineering with the following lessons:
- Configure the connection parameters and establish a connection from Snowpark Python to Snowflake
- Retrieve data from Snowflake tables that were set up initially into Snowpark data frames
- Perform exploratory data analysis on the data frames
- Use data frame methods to query...

Snowflake Summit 2023

This was my second consecutive year at the Snowflake Summit in Las Vegas. While the hot topic last year was data mesh and all sessions about data mesh sold out, this year data mesh was like last year's snow. Now the running theme is Generative AI and LLMs. The good news is that attendees were able to pre-book sessions, and many sessions were also recorded, so I didn't miss any of the buzz around these topics. What an exciting Summit it was! So many announcements! Here are some of my favorites:
- Dynamic Tables. No more streams and tasks. Just write your SQL and Snowflake takes care of the rest, in some ways similar to materialized views, but with fewer restrictions on the types of queries you can use.
- Native Applications. Write your application and distribute it via the Snowflake Marketplace. I built a simple app and wrote a blog post about it: Maintaining a Mapping Table with a Snowflake Native App. ...

Another Certification Added to my Stash: SnowPro Advanced Data Analyst

The most recently released Snowflake advanced certification is SnowPro Advanced: Data Analyst. Of all the advanced certifications offered by Snowflake, this one is closest to my professional experience, which is why I decided to tackle it as my first advanced Snowflake certification. Some topics on the exam were relatively easy for me since I have been doing data analysis for decades and SQL is second nature to me. I was able to answer the SQL questions without much preparation. However, some topics on the exam are very Snowflake-specific and required careful review and study time. These include: Snowsight dashboards. I haven't been using them much, but I had to learn them for the exam. Snowsight dashboards can't compete on features and functionality with more mature tools such as Power BI, Tableau, or Cognos, to name a few. But what they offer in their simplicity is the possibility to quickly visualize and summarize data, either for...

Snowpark for Python First Impressions

I'm just back from Snowflake Summit 2022, held in Las Vegas. There were so many announcements about upcoming Snowflake features that it's hard to keep track. The topic I'm most excited about is Snowpark for Python. As soon as I got back from Las Vegas, I started digging into the details because I had a use case waiting to be tested. I described my approach and summarized my first impressions in a blog post on Medium....

What the Snowflake Community Means to Me—and My Career

I was recently interviewed for the Snowflake blog, where I discussed the benefits of being an active member of the Snowflake Community, how the community has helped me grow into my role as a Snowflake Data Superhero, and how it can be of use to anyone looking for hands-on Snowflake resources. On the usefulness of the Snowflake community, I was quoted as saying, “Snowflake offers a lot of great technical documentation, but it’s useful that there’s so much material out there from the Snowflake community around personal case studies and how other people are implementing specific features.” Read the full article here....

Snowflake and Data Mesh

More than ever, the ability to use data for decision-making is critical to company success. Despite this knowledge, companies are still not fully empowering their employees with easy access to the data they need. According to Zhamak Dehghani, the founder of Data Mesh, we must start thinking outside the box because the traditional approach to managing and collecting data is no longer sufficient. For decades, there has been a divide between operational and analytical data, with ETL as the intermediary process that moves data from operational systems into the analytical data warehouse. ETL, which has always been primarily in the hands of IT developers, is perceived as a bottleneck to delivering timely analytical data. Furthermore, dimensional data models are not well suited to the machine learning models that have become essential. To overcome this, the data lake emerged around 2010. The idea of the data lake is to store vast amounts of semi-structured data in object stores to allow various consumers...

Snowflake account administration best practices

Although they are well documented, it doesn’t hurt to review a few of the basic Snowflake account administration best practices, as I often see in practice that they are not followed consistently. In my experience, many IT administrators are so pressed for time that they don’t bother to create security roles and policies. To save time, they just use the ACCOUNTADMIN role on Snowflake (or an equivalent all-powerful admin role on other systems) because it allows them to do anything and everything without having to think about security best practices.

Using an admin role for day-to-day tasks is a bad idea

Consider this true story. The administrator tries some functionality, and it works. The user (who doesn’t have admin privileges) tries, and it doesn’t work. The administrator doesn’t believe the user when they report that the functionality is not working, dismissing them as incompetent. This goes back and forth a few times, until someone gives...

I’m a Snowflake Data Superhero!

I’m so excited about becoming a Snowflake Data Superhero! I first learned about the Superhero program from Kent Graziano’s blog more than a year ago, and I immediately thought it would be cool to be a member of this group. At the time, I was still very much a beginner with Snowflake, and I knew I had to become more involved if I wanted to join the Superhero program. Compared with other popular on-prem data platforms that have been on the market for decades, Snowflake as a cloud data platform is relatively young, which means that there still aren’t vast knowledge bases available to users and developers of the platform. There aren’t many experts out there answering questions, writing blogs, or creating tutorials and YouTube videos. This is where Snowflake Data Superheroes come in. We (yes, I can use the pronoun “we” now since I have been officially welcomed to the club) are here to fill the gap. I have been...