Data warehouses use timestamps to keep track of data that is changing over time. A common term that is used to describe implementations of historically changing data is slowly changing dimensions. There are several types of slowly changing dimensions, explained in various sources (examples here and here). In this post I will focus on how to query slowly changing dimensions type 2 or SCD2 for short.
The reason I decided to write this post is that there is very little information out there on how to query SCD2. When I Google it, I get tons of links explaining how to insert data into SCD2 tables. But there is hardly any information on how to query it.
In my experience, developers who are working on ETL are usually well versed in using the ETL tools and the built-in functionalities for inserting SCD2 type data. But querying data from SCD2 tables is often misunderstood or not understood at all. To many developers and business...
A widespread reason why projects don’t meet expectations is that they fulfill specifications to the dot but neglect to consider what the users really wanted. IT professionals typically argue that the application that they delivered does exactly what the requirements say. Unfortunately, requirements are sometimes ambiguous, incomplete, or just plain wrong. Sure, it is not IT’s job to correct wrong requirements, but a little more flexibility on both sides, business and IT, could avoid many misunderstandings and unmet expectations.
Getting data in is disconnected from getting data out
In my data warehousing experience, I frequently come across the understanding that data integration or getting data into the data warehouse is completely unrelated to business intelligence or getting data out of the data warehouse in the form of querying or reports. This is similar to implementing projects according to specifications (getting data in) but neglecting a deep understanding of what the users actually wanted (getting data out).
Technically, we might use different tools and...
Has this happened to you before? You spend months designing, building, testing, and delivering a data warehouse solution. You feel a sense of accomplishment, not just because you delivered on time and on budget, but because you delivered something that was needed and that is now being used and appreciated by the end users.
And then some vendor walks in and laughs at the amount of time you spent developing your solution. They say to management: why don’t you just buy our tool, and you could click-click-click have your solution ready in 15 minutes! Sometimes I want to grab such vendors by the neck and shake them: do you really know what you are talking about?
Before expanding on this topic, let’s be realistic: today’s data warehouses are not keeping up with the big data explosion. It takes too long to deliver reports to the business users who may have lost interest in the time it took from initial enthusiasm until they...
Recent references to business intelligence being on the way out (examples here and here) may be a result of misinterpreting what Gartner said about business intelligence competency centers being dead.
Probably everyone agrees that business intelligence itself is not on the way out. It is evolving from the traditional data warehouse and single version of truth to a more self-service, distributed, in the cloud format.
We can rest assured that there will continue to be a need for data analysis and reporting. Business users will still want their financials and sales figures and market share in a spreadsheet despite new trends in self-service data access. They will not all learn to become their own data scientists. Many business users are not tech-savvy enough to be able to get their own data from various sources so they will still require support from business intelligence professionals.
The new BI
To fulfill modern requirements and ways that business wants to exploit data, BI will shift from building a single...
Agile software development is most often associated with small companies and small development teams. The very fact that agile teams should have no more than about a dozen members sets limitations that lead us to question the possibility of doing large enterprise data warehousing projects in an agile manner. Despite the limitations, agile can nevertheless be done in large projects. It may require more organizational change than in small companies that work on smaller projects and must coordinate more than one agile development team. -- Article published in MonitorPro magazine, 03/2016, p. 22-23...
In data warehousing and business intelligence implementations we usually start by choosing between two most popular approaches. One of them is the enterprise normalized data warehouse approach as defined by Bill Inmon, the father of data warehousing. The second approach is a collection of dimensional data marts based on a common bus architecture as popularized by Ralph Kimball. In addition to these two we can always choose other approaches, such as a combination of the above or something completely different. An example of a different approach is the Data Vault. -- Article published in MonitorPro magazine, IV. 2015, p. 28-29...
Loading data into data warehouses, also known as the ETL process, is an established way of taking data from the source systems and bringing it into the data warehouse. The process consists of three steps: Extract data from the source systems, Transform the data so that it conforms to the data warehousing environment and finally Load it. With the proliferation of big data and Hadoop as the underlying technological platform we may have to rethink traditional approaches to loading data. The ETL process may not be the best or most efficient way of loading big data. -- Article published in MonitorPro magazine, 01/15, p. 28-29...
The book Data Smart: Using Data Science to Transform Information into Insight by John W. Foreman is one of those fantastic books that upon reading it I kept on asking myself why I haven't come across it sooner. The author makes everything about data science appear less mysterious and so much clearer.
On the one hand, the book introduces real life case studies of data science problems that can be solved using algorithms such as k-means clustering, regression, network clustering, optimization methods, ensemble models, prediction and the like. On the other hand, each of these case studies is implemented in Excel. Yes, that's correct, data science can be done in Excel if we really want. It's the perfect tool to use for case studies because everyone knows Excel and thus the algorithms can be explained without the added complexity of having to learn a data mining technology, such as R for example.
We probably wouldn't use an Excel spreadsheet to process huge volumes...
According to many sources, agile data warehousing is the way to go in the future.
Recently, Larissa Moss spoke about this topic at the TDWI conference in Munich, Germany in her session Extreme Scoping: Agile approach for enterprise-class EDW/BI. The main point of her session was the importance of the availability of data that is clearly understood and correctly interpreted by the business users. Based on my experience building data warehouses and business intelligence solutions I couldn't agree more. Good data is what makes or breaks a BI solution, regardless of how many features and functions we adorn it with.
How does agile fit in all this? It enables us to get the data in and out quickly, so that business users have time to validate the results even before the complete solution is built. This is how data warehousing should always have been done. We must make certain that the data we are using is right before we go any further.
Boris Evelson, analyst...
Dealing with today’s huge amounts of data requires analytics. IDC’s recent Big Data and Business Analytics forum was all about connecting data and analytics. The organizers of the event addressed questions such as: what are the forces behind the explosion within the digital universe? Technology has certainly helped, by driving down the cost of creating, capturing, managing, and storing information to a sixth of what it was in 2005. The real impetus, nonetheless, is financial. Since 2005, investments in the digital universe among enterprises have expanded some 50% to $4 trillion – money spent on hardware, software, services, and staff, ultimately, to generate revenue.
I was one of the speakers at the event. My presentation Quick results in the conversion of large amounts of data into business information was about agile approaches in the implementation of solutions for data analysis, business solutions in real time and predicting the likelihood of business events. The magazine MonitorPro wrote about me in the February 2014...