The Snowflake SnowPro Advanced Data Engineer certification is considered tough. After taking the exam, I understand where the “tough” comes from. It’s less about the difficulty of the individual questions than about the sheer breadth of topics covered.
Sometimes it’s difficult to judge what the responsibilities of a data engineer are. At one extreme, a data engineer is nothing more than a developer who receives requirements and implements them in the pipeline. At the other, a data engineer is expected to understand the platform, configure it, design the security, architecture, and automation, and perform data analysis along the way. In the real world, data engineers usually fall somewhere between these two extremes. The Advanced Data Engineer exam tests the full spectrum, which covers:
- Data Movement: ingest data in various formats and load it into Snowflake, design data pipelines, build data sharing solutions
- Performance Optimization: configure pipelines for the best performance and troubleshoot queries that perform poorly
- Storage and Data Protection: understand data recovery features such as Time Travel, implement data clustering, and analyze micro-partitions
- Security: understand security principles, system-defined roles, and data governance
- Data Transformation: implement user-defined functions, external functions, and stored procedures, and transform semi-structured data
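To give a flavor of the transformation and data protection topics, here is a minimal sketch of querying semi-structured data and reading a table with Time Travel. The table and column names (`raw_events`, `payload`) are hypothetical:

```sql
-- Hypothetical table RAW_EVENTS with a VARIANT column PAYLOAD
-- Flatten a JSON array of items and cast its fields to typed columns
SELECT
    f.value:sku::STRING AS sku,
    f.value:qty::NUMBER AS qty
FROM raw_events,
     LATERAL FLATTEN(input => payload:items) f;

-- Time Travel: read the table as it looked one hour (3600 seconds) ago
SELECT COUNT(*)
FROM raw_events AT (OFFSET => -3600);
```

The exam expects you to be comfortable with this style of query as well as with the retention settings that make Time Travel possible.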
While all of these topics are undoubtedly valuable for a data engineer to know, understand, and use, I still argue that not all data engineers are – nor should be – responsible for all of them. For example, data transformation and working with semi-structured data are more in the domain of data analysts, who can help data engineers write the transformation queries that are added to the pipeline. Data sharing is typically out of the hands of data engineers because it requires additional privileges and configuration, which are usually handled by Snowflake administrators. Security is always a hot topic.
Of course, data engineers must know how to connect to Snowflake securely and how to ensure the data is accessible only to authorized users through role-based access control. But the fine-tuning of access control, such as row access policies and ensuring data compliance via masking policies, is more often in the hands of solution architects who gather the requirements, discuss the best approach with the business users, and design a sound solution that the data engineers then implement. In the real world, data engineers usually look after the technical aspects of designing and implementing data pipelines. Any additional requirements that are not defined in enough detail, such as data governance design, often result in half-baked solutions that are haphazardly put together to meet implementation deadlines without an in-depth understanding of the big picture.
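To illustrate the kind of fine-tuning I mean, here is a sketch of a Snowflake masking policy. The role name `PII_READER` and the `customers` table are hypothetical:

```sql
-- Mask email addresses for everyone except a privileged role
CREATE MASKING POLICY email_mask AS (val STRING) RETURNS STRING ->
    CASE
        WHEN CURRENT_ROLE() IN ('PII_READER')  -- hypothetical role
            THEN val
        ELSE '***MASKED***'
    END;

-- Attach the policy to a column of a hypothetical CUSTOMERS table
ALTER TABLE customers MODIFY COLUMN email SET MASKING POLICY email_mask;
```

The SQL itself is simple; the hard part is deciding which roles see which data, and that design work usually sits with the architect rather than the engineer who runs the statements.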
I usually work as a system architect, so I didn’t mind having the full spectrum of topics covered on the data engineer exam. But thinking of the typical data engineer role I encounter on projects, I don’t see data engineers having to master all of the above.