Snowflake and Data Mesh

More than ever, the ability to use data for decision-making is critical to company success. Despite this knowledge, companies are still not fully empowering their employees with easy access to the data they need. According to Zhamak Dehghani, the creator of the Data Mesh concept, we must start thinking outside the box because the traditional approach to collecting and managing data is no longer sufficient. For decades, there has been a divide between operational and analytical data, with ETL as the intermediary process that gets data from operational systems into the analytical data warehouse. ETL, which has always been primarily in the hands of IT developers, is perceived as a bottleneck to delivering timely analytical data. Furthermore, dimensional data models are not well suited to the machine learning models that have become essential. To overcome this, the data lake emerged around 2010. The idea of the data lake is to store vast amounts of semi-structured data in object stores to allow various consumers...
Read More
Snowflake account administration best practices

Although they are well documented, it shouldn’t hurt to review a few of the basic Snowflake account administration best practices, as I often see in practice that they are not followed consistently. In my experience, many IT administrators are so pressed for time that they don’t bother to create security roles and policies to be followed. To make it quick, they just use the ACCOUNTADMIN role on Snowflake (or an equivalent all-powerful admin role on other systems) because it allows them to do anything and everything without having to think about security best practices. Using an admin role for day-to-day tasks is a bad idea. Consider this true story. The administrator tries some functionality, and it works. The user (who doesn’t have admin privileges) tries, and it doesn’t work. The administrator doesn’t believe the user when they report that the functionality is not working, dismissing them as incompetent. This goes back and forth a few times, until someone gives...
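
A minimal sketch of the alternative, with hypothetical role and user names: ACCOUNTADMIN is used once to set up a restricted administration role, and day-to-day work then happens under that role, so any privilege gaps surface immediately instead of only when a regular user hits them.

    -- One-time setup under ACCOUNTADMIN (role and user names are hypothetical)
    USE ROLE ACCOUNTADMIN;
    CREATE ROLE IF NOT EXISTS dba_role COMMENT = 'Day-to-day administration';
    GRANT CREATE DATABASE ON ACCOUNT TO ROLE dba_role;
    GRANT ROLE dba_role TO USER alice;

    -- From now on, work as the restricted role, not as ACCOUNTADMIN
    USE ROLE dba_role;
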
Read More
I’m a Snowflake Data Superhero!

I’m so excited about becoming a Snowflake Data Superhero! I first learned about the Superhero program from Kent Graziano’s blog more than a year ago, and I immediately thought it would be cool to be a member of this group. At the time, I was still very much a beginner with Snowflake, and I knew I had to become more involved if I wanted to join the Superhero program. Compared with other popular on-prem data platforms that have been on the market for decades, Snowflake as a cloud data platform is relatively young, which means that there still aren’t vast knowledge bases of information available to users and developers of the platform. There aren’t many experts out there answering questions, writing blogs, or creating tutorials and YouTube videos. This is where Snowflake Data Superheroes come in. We (yes, I can use the pronoun “we” now since I have been officially welcomed to the club) are here to fill the gap. I have been...
Read More
Snowflake Role Based Access Control (RBAC)

Snowflake recommends that roles be used to authorize access to database objects instead of granting access directly to individual users. Roles may be granted to other roles, which enables the Snowflake administrator to create access control hierarchies that act as building blocks for an overall access control strategy. There is some excellent information out there about Snowflake Role-Based Access Control, or RBAC for short, that can be used as a starting point to learn the basics, such as this document on Snowflake Community and a series of posts written by John Ryan here, here and here. In this post I want to summarize the concepts of role-based access control and then point out some additional considerations when implementing it. Here is a summary of best practices when setting up RBAC: Define a set of functional roles that will be granted to users according to how they will be using the database, for example...
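
To illustrate the hierarchy idea with a minimal sketch (all role, database, and user names here are hypothetical): privileges are granted to an access role, the access role is granted to a functional role, and the functional role is granted to users.

    -- Access role: owns the privileges on the objects
    CREATE ROLE sales_read;
    GRANT USAGE ON DATABASE sales TO ROLE sales_read;
    GRANT USAGE ON SCHEMA sales.public TO ROLE sales_read;
    GRANT SELECT ON ALL TABLES IN SCHEMA sales.public TO ROLE sales_read;

    -- Functional role: describes what the user does, built from access roles
    CREATE ROLE analyst;
    GRANT ROLE sales_read TO ROLE analyst;

    -- Finally, grant the functional role to a user
    GRANT ROLE analyst TO USER some_user;
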
Read More
What is so great about cloning in Snowflake?

When companies move their infrastructure to the cloud, for example to Snowflake, where resources are billed according to their actual use, they become more conscious about the costs of storage and compute consumption. Those who are new to the cloud are often concerned about how much it will cost when they run a query or when they create a table instead of a view. It takes time to get used to this type of billing and to understand the average consumption and the expected cost over time, so that it doesn’t become a day-to-day worry. Features that save costs are welcome in the cloud. One such feature in Snowflake, not commonly available in other databases, is zero-copy cloning. This feature creates a copy of a database, a schema, or a table. The cloned object incurs no additional data storage costs because it shares all the existing micro-partitions of the original object at the time it was cloned....
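
A minimal sketch of what cloning looks like in practice (the object names are hypothetical): the clone is created almost instantly because no data is physically copied, and storage is only consumed later for micro-partitions that diverge from the original.

    -- Clone a single table; micro-partitions are shared, not copied
    CREATE TABLE orders_dev CLONE orders;

    -- Cloning also works at the schema and database level
    CREATE SCHEMA reporting_dev CLONE reporting;
    CREATE DATABASE analytics_dev CLONE analytics;
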
Read More
How to query Slowly Changing Dimensions Type 2

Data warehouses use timestamps to keep track of data that changes over time. A common term used to describe implementations of historically changing data is slowly changing dimensions. There are several types of slowly changing dimensions, explained in various sources (examples here and here). In this post I will focus on how to query slowly changing dimensions type 2, or SCD2 for short. The reason I decided to write this post is that there is very little information out there on how to query SCD2. When I Google it, I get tons of links explaining how to insert data into SCD2 tables, but hardly any information on how to query them. In my experience, developers who work on ETL are usually well versed in using the ETL tools and their built-in functionality for inserting SCD2 data. But querying data from SCD2 tables is often misunderstood or not understood at all. To many developers and business...
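
As a minimal sketch of the querying pattern, assuming a hypothetical dim_customer table where each row version carries valid_from and valid_to timestamps and the current version has valid_to set to NULL:

    -- Current version of each customer
    SELECT *
    FROM dim_customer
    WHERE valid_to IS NULL;

    -- Version that was in effect on a given date (point-in-time query)
    SELECT *
    FROM dim_customer
    WHERE valid_from <= '2021-06-30'
      AND (valid_to > '2021-06-30' OR valid_to IS NULL);

Conventions vary; some implementations use a far-future end date such as 9999-12-31 instead of NULL, in which case the IS NULL checks become simple comparisons.
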
Read More
A few cool SQL features in Snowflake (that are better than in Oracle)

As an Oracle consultant with many years of experience building data warehouses and analytics solutions on the Oracle technology stack, I was naturally skeptical when I was asked to migrate an existing Oracle solution to Snowflake. Just like with any new technology, switching to Snowflake was a learning experience at first. But not to worry, I became a fan of Snowflake soon enough. There are so many cool features in Snowflake, some of them better than in Oracle, that it wasn’t difficult to make the switch. I’m thrilled with the ease of use of everything in Snowflake. All you have to do is create an account and you are ready to go. There is no installation, no configuration, no figuring out how to get started, as a web user interface is ready and waiting for you when you sign up for your account. The SQL syntax in Snowflake is similar enough to Oracle that I...
Read More
SnowPro Core Certification

I'm thrilled that I passed the SnowPro Core Certification exam! It was challenging because to be successful, exam takers must answer at least 80% of the questions correctly, which is quite high compared with other industry exams (edit October 2021: the passing score has since been lowered). Before taking the exam, I wanted to be sure that I was well prepared so that there would be no surprises. In this post I will share how I prepared for the exam and some thoughts about the experience in general. Preparing for the exam: the first step when taking any certification exam is to review the exam contents and understand what is covered, what types of questions there will be, how many questions there are, and how much time is allowed. Then study time begins. At the time of taking the exam I had just under a year of experience with Snowflake. Some of this experience was doing exercises on my own and...
Read More
Workshop: Agile planning in practice (Together in Excellence 2020 conference)

One of the myths about agile is that there is no planning. This myth probably stems from comparison with waterfall approaches, where detailed up-front planning is the norm. In agile, there is still a need to plan over a long period of time, commit to a completion date, plan resources, and align the product to a strategic vision. But instead of detailed planning at the beginning, agile planning is spread throughout the entire development process, and it involves all team members, not just one individual designated as the project manager. Agile planning and estimation break down development into small units that can deliver value to a customer. Teams plan for what they can accomplish to satisfy a customer in a short period of time. Agile planning takes place, for example, during daily standups, sprint planning meetings, and release planning. In this workshop, delivered at the Together in Excellence 2020 conference, we covered topics related to agile planning...
Read More
Hadoop vs. relational databases

People with limited knowledge of Hadoop sometimes ask me why we need a new data storage technology. Why not stay with tried and tested relational database technology? Why not indeed? In this post I will discuss the main differences between Hadoop and relational databases and some reasons why we would use one versus the other. Hadoop is technically not a database, so when we compare it to relational databases it appears as if we are comparing apples to oranges. But Hadoop is in fact used to store data sets across a cluster of computers, although it behaves like a distributed file system rather than a database. It is designed to store very large files and is fault-tolerant because it replicates blocks of data within the cluster. From the point of view of being able to store large volumes of data, we can therefore still compare it to relational databases. I am by no means suggesting that we have to use Hadoop rather than traditional databases because...
Read More