DATA_FAIR, a Data Engineering and Data Science Conference

I spent an incredible day at DATA_FAIR, a conference dedicated to fostering an inclusive environment for knowledge exchange, networking and upskilling in data engineering and data science. It was a day packed with learning from my peers, meeting like-minded people and exchanging experiences. The conference focused on practical applications of data engineering technologies and on current and emerging trends in ML and AI, followed by a round-table discussion about ethical data engineering. My contribution was a talk titled "The Role of a Data Engineer in a Data Mesh Architecture". I explained the traditional data warehousing architecture and its challenges, which include long time to delivery, low flexibility, and dependence on the IT department for implementation. Because the ability to use data for decision-making is critical to a company's success, companies should empower their employees with easy access to the data they need. According to Zhamak Dehghani, the founder of Data Mesh, we must start thinking outside...

Getting Started with Data Engineering and ML using Snowpark for Python

The Data Cloud World Tour is a series of Snowflake events across the globe that aim to share Snowflake's latest innovations and customer use cases. This year, I attended the Data Cloud World Tour in Dubai and, because my company In516ht was one of the sponsors, I was asked to deliver the hands-on workshop. The workshop was based on the Getting Started with Data Engineering and ML using Snowpark for Python quickstart. Attendees were given instructions to set up their laptops with the prerequisite software, including Python with some additional packages (snowflake-snowpark-python, snowflake-ml-python, pandas), a git client to clone the starter repository and a free trial Snowflake account. The first part of the workshop covered Data Engineering with the following lessons:
- Configure the connection parameters and establish a connection from Snowpark Python to Snowflake
- Retrieve data from Snowflake tables that were set up initially into Snowpark data frames
- Perform exploratory data analysis on the data frames
- Use data frame methods to query...
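The connection step from the first lesson can be sketched as below. All parameter values are placeholders, not a real account, and the Snowpark calls are commented out since they require the snowflake-snowpark-python package and a live Snowflake account:

```python
# Sketch of configuring a Snowpark session (placeholder values, not a real account).
connection_parameters = {
    "account": "<account_identifier>",
    "user": "<user>",
    "password": "<password>",
    "role": "<role>",            # e.g. SYSADMIN
    "warehouse": "<warehouse>",
    "database": "<database>",
    "schema": "<schema>",
}

# With snowflake-snowpark-python installed and real credentials:
# from snowflake.snowpark import Session
# session = Session.builder.configs(connection_parameters).create()
# df = session.table("MY_TABLE")        # lazy Snowpark data frame
# df.filter(df["AMOUNT"] > 100).show()  # executed in Snowflake as SQL

# Sanity check that no required parameter is missing before connecting
required = {"account", "user", "password", "warehouse", "database", "schema"}
missing = required - connection_parameters.keys()
print(sorted(missing))  # → []
```

Note that Snowpark data frames are lazy: the filter above is only translated to SQL and pushed down to Snowflake when an action such as `show()` is called.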

Artificial Intelligence for Project Managers

With the current wave of Generative AI opportunities permeating all aspects of work and personal life, I grabbed the chance to enroll in PMI's Generative AI Overview for Project Managers course. According to the course introduction, Generative AI will impact project management in various ways, including improving project delivery success rates, benefits realization, the societal impact of projects with global influence, and career advancement for project managers. The World Economic Forum predicts that 75% of companies might adopt some form of AI technologies by 2027. Time for project managers to get ready. The course illustrates many practical ways that Generative AI tools, such as ChatGPT, Bard, and other emerging tools, can help project managers in various scenarios, such as:
- cost-benefit analysis
- developing a business case justification
- creating a project charter
- calculating earned value
- creating agile user stories
- preparing talking points to communicate with difficult stakeholders
- giving advice on how to communicate more empathically
It can also help project managers with repetitive tasks which...
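As an aside on the earned-value item in the list above, the standard earned value management formulas (CV = EV − AC, SV = EV − PV, CPI = EV/AC, SPI = EV/PV) fit in a few lines of Python; the figures below are made-up example numbers, not taken from the course:

```python
def earned_value_metrics(pv: float, ev: float, ac: float) -> dict:
    """Standard earned value management formulas.

    pv: planned value, ev: earned value, ac: actual cost.
    """
    return {
        "cost_variance": ev - ac,      # CV > 0 means under budget
        "schedule_variance": ev - pv,  # SV > 0 means ahead of schedule
        "cpi": ev / ac,                # cost performance index
        "spi": ev / pv,                # schedule performance index
    }

# Made-up example: planned 100k, earned 80k, spent 90k so far
m = earned_value_metrics(pv=100_000, ev=80_000, ac=90_000)
print(round(m["cpi"], 3), m["spi"])  # → 0.889 0.8
```

A CPI below 1 (over budget) and an SPI below 1 (behind schedule), as in this example, are exactly the signals a project manager would ask a Generative AI assistant to explain to stakeholders.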

Snowflake Role Based Access Control (RBAC)

Snowflake recommends that roles are used for authorizing access to database objects instead of allowing direct access to objects by individual users. Roles may be granted to other roles, and this enables the Snowflake administrator to create access control hierarchies that act as building blocks for an overall access control strategy. There is some excellent information out there about Snowflake Role-Based Access Control, or RBAC for short, that can be used as a starting point to learn the basics, such as this document on Snowflake Community and a series of posts written by John Ryan here, here and here. In this post I want to summarize the concepts of role-based access control and then point out some additional considerations when implementing it. Here is a summary of best practices when setting up role-based access control (RBAC):
- Define a set of functional roles that will be granted to users according to how they will be using the database, for example...
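To make the functional-role idea concrete, here is a small sketch that composes the Snowflake GRANT statements for a hypothetical read-only functional role; the role, database, schema and warehouse names are made up for illustration:

```python
def functional_role_grants(role: str, database: str, schema: str, warehouse: str) -> list:
    """Compose Snowflake SQL for a read-only functional role (hypothetical names)."""
    return [
        f"CREATE ROLE IF NOT EXISTS {role}",
        f"GRANT USAGE ON WAREHOUSE {warehouse} TO ROLE {role}",
        f"GRANT USAGE ON DATABASE {database} TO ROLE {role}",
        f"GRANT USAGE ON SCHEMA {database}.{schema} TO ROLE {role}",
        f"GRANT SELECT ON ALL TABLES IN SCHEMA {database}.{schema} TO ROLE {role}",
        # Granting the functional role up to SYSADMIN keeps the role hierarchy connected
        f"GRANT ROLE {role} TO ROLE SYSADMIN",
    ]

for stmt in functional_role_grants("ANALYST", "SALES_DB", "PUBLIC", "QUERY_WH"):
    print(stmt + ";")
```

The last statement reflects the hierarchy principle from the summary: every custom role should ultimately roll up to SYSADMIN so that no objects become orphaned from the administrative role tree.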

SnowPro Core Certification

I'm thrilled that I passed the SnowPro Core Certification exam! It was challenging because, to be successful, exam takers must answer at least 80% of the questions correctly, which is quite high compared to other industry exams (edit October 2021: the passing score has since been lowered). Before taking the exam, I wanted to be sure that I was well prepared so that there would be no surprises. In this post I will share how I prepared for the exam and some thoughts about the experience in general. Preparing for the exam The first step when taking any certification exam is to review the exam contents and understand what is covered on the exam, what types of questions there will be, how many questions there are and how long the exam takes. Then study time begins. At the time of taking the exam I had just under a year of experience with Snowflake. Some of this experience was doing exercises on my own and...

Hadoop vs. relational databases

People with limited knowledge of Hadoop sometimes ask me why we need a new data storage technology. Why not stay with tried and tested relational database technology? Why not indeed? In this post I will discuss the main differences between Hadoop and relational databases and some reasons why we would use one versus the other. Hadoop is technically not a database, so when we compare it to relational databases it appears as if we are comparing apples to oranges. But Hadoop is actually used to store data sets across a cluster of computers, behaving like a distributed file system. It is designed to store very large files and is fault-tolerant, replicating blocks of data within the cluster. From the point of view of being able to store large volumes of data, we can thus continue to compare it to relational databases. I am by no means suggesting that we have to use Hadoop rather than traditional databases because...
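To put the "very large files stored as replicated blocks" idea into numbers, here is a back-of-the-envelope sketch using HDFS's common defaults of a 128 MB block size and a replication factor of 3 (both are configurable, so treat these as illustrative assumptions):

```python
import math

def hdfs_footprint(file_size_gb: float, block_mb: int = 128, replication: int = 3):
    """Rough storage arithmetic for a file stored in HDFS-style replicated blocks."""
    # Number of blocks the file is split into (last block may be partial)
    blocks = math.ceil(file_size_gb * 1024 / block_mb)
    # Raw cluster capacity consumed, counting every replica
    raw_gb = file_size_gb * replication
    return blocks, raw_gb

blocks, raw_gb = hdfs_footprint(10)  # a 10 GB file
print(blocks, raw_gb)  # → 80 30
```

The arithmetic shows both sides of the trade-off: a single 10 GB file becomes 80 independently distributable blocks, but triple replication means the cluster spends 30 GB of raw capacity to keep it fault-tolerant.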

Main components of chatbot development

Artificial Intelligence is becoming ubiquitous and nowadays anyone can get their hands on natural language processing technologies. One example of an application of natural language processing is a chatbot that provides customer support or augments call centers by supplying computer-generated responses to customer questions. Building a chatbot that provides customer support on a website is technologically quite feasible. However, compared with typical software application development, chatbot development differs in major ways. The key aspect of chatbot development is natural language understanding, for which we can't provide completely detailed specifications up front. Natural language means that customers may pose questions in many different forms, not all of which can be planned for ahead of time. How do we begin to develop a chatbot? Below I have listed the main components and addressed some challenges to overcome when building chatbots.

Define the purpose of the chatbot

Chatbots come in different types, depending on which target group of users they address and which business problem they are meant...
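As a toy illustration of why natural language understanding resists up-front specification, here is a deliberately naive keyword-based intent matcher; real chatbot platforms use trained NLU models instead, and the intents and phrasings below are made up:

```python
# Deliberately naive intent matching: users phrase the same intent in endless
# ways, which is why real chatbots rely on trained NLU models instead.
INTENTS = {
    "order_status": {"order", "package", "delivery", "shipped"},
    "refund": {"refund", "money", "return"},
}

def match_intent(utterance: str) -> str:
    words = set(utterance.lower().replace("?", "").split())
    # Pick the intent sharing the most keywords with the utterance
    best, overlap = "fallback", 0
    for intent, keywords in INTENTS.items():
        n = len(words & keywords)
        if n > overlap:
            best, overlap = intent, n
    return best

print(match_intent("Where is my package?"))           # → order_status
print(match_intent("I want my money back"))           # → refund
print(match_intent("Can I speak to a human being?"))  # → fallback
```

The third example shows the core difficulty: any phrasing the keyword lists did not anticipate falls through to the fallback, which is exactly the gap that statistical language understanding is meant to close.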

Coursera IBM Data Science Professional Certificate

Recently I came across the IBM Data Science Professional Certificate set of courses on Coursera and wanted to brush up on my data science knowledge. I had taken a similar series of courses some six years ago, and the first thing I noticed this time is how much the field has advanced. Six years ago, the language of choice was R, which I never really embraced, mostly because I saw it as an archaic language that has no business in the 21st century. I'm really happy that nowadays python appears to be the language of choice. I love python, and this is definitely one of the reasons why I enjoyed this set of courses. The next topic is Pandas. I hate dataframes. As a SQL person, I find it frustrating that I can't simply select, group by and join tables using syntax that comes off the top of my head in seconds. With dataframes, I struggle to do even...
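The select/group-by/join mental model described above can be shown with Python's standard-library sqlite3 module; the tables and values are made-up examples, and the rough pandas equivalent is noted in comments for comparison:

```python
import sqlite3

# Tiny in-memory example tables (made-up data)
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE orders (customer_id INT, amount REAL);
    CREATE TABLE customers (customer_id INT, name TEXT);
    INSERT INTO orders VALUES (1, 10.0), (1, 20.0), (2, 5.0);
    INSERT INTO customers VALUES (1, 'Ana'), (2, 'Bo');
""")

# The select / group by / join that comes off the top of one's head in seconds:
rows = con.execute("""
    SELECT c.name, SUM(o.amount) AS total
    FROM orders o
    JOIN customers c ON c.customer_id = o.customer_id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()
print(rows)  # → [('Ana', 30.0), ('Bo', 5.0)]

# Rough pandas equivalent of the same query:
#   orders.merge(customers, on="customer_id") \
#         .groupby("name")["amount"].sum()
```

Both routes produce the same aggregation; the difference is purely which syntax comes to mind first.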

Promises and challenges of artificial intelligence

Although artificial intelligence has been around for many decades, it has emerged as the next big thing only in recent years. Everyone is talking about it, but in my view not everyone knows precisely what it is and what it is used for. I hear the term artificial intelligence used in many different contexts, anywhere from simple data analysis to full-fledged robots with the potential to conquer the Earth. Let's look further into what artificial intelligence is, how it is applied, how mature it is, and what its promises and challenges are. In science fiction, artificial intelligence is often portrayed as computers or robots with human-like characteristics, sometimes also in human form, but not necessarily. A machine that thinks like a human can be a representative example of artificial intelligence. It reminds me of HAL, the computer in 2001: A Space Odyssey, a prime example of artificial intelligence that is willing to exert extreme measures to protect its own existence. But generally, artificial...

Moving data into production at the click of a button

Alongside DevOps, which ideally means deploying code into production at the click of a button through a well-defined process, data professionals are thinking about a similar concept for moving data into production. If a business user wants new data in an analytical application, can we deliver it at the click of a button? Despite new technology that can handle vast volumes of data at lightning speed, organizations are still struggling to implement analytics in a timely manner, with good quality data, before the results become obsolete in the constantly evolving environment.

Roadblocks

There are misconceptions about what it takes to capture, store, organize and analyze data and eventually get it into a productive environment. While tool vendors want you to believe that it all happens at the click of a button, the reality is that data typically can't be used without prior knowledge and understanding of what it represents. Data often has errors that must be recognized and dealt with,...
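The point about errors that must be recognized can be illustrated with a minimal validation pass; the rules and records below are made-up examples of the kind of checks a data pipeline would run before promoting data to production:

```python
def validate(records):
    """Return records that fail basic quality rules (illustrative rules only)."""
    bad = []
    for r in records:
        if r.get("customer_id") is None:
            bad.append((r, "missing customer_id"))
        elif r.get("amount", 0) < 0:
            bad.append((r, "negative amount"))
    return bad

rows = [
    {"customer_id": 1, "amount": 25.0},
    {"customer_id": None, "amount": 10.0},
    {"customer_id": 3, "amount": -5.0},
]
for record, reason in validate(rows):
    print(reason)
# → missing customer_id
# → negative amount
```

Even this toy check depends on knowing what the data represents: deciding that a negative amount is an error, rather than, say, a legitimate refund, requires exactly the prior understanding that no one-click tool can supply.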