A widespread reason why projects don’t meet expectations is that they fulfill specifications to the dot but neglect to consider what the users really wanted. IT professionals typically argue that the application that they delivered does exactly what the requirements say. Unfortunately, requirements are sometimes ambiguous, incomplete, or just plain wrong. Sure, it is not IT’s job to correct wrong requirements, but a little more flexibility on both sides, business and IT, could avoid many misunderstandings and unmet expectations.
Getting data in is disconnected from getting data out
In my data warehousing experience, I frequently come across the understanding that data integration or getting data into the data warehouse is completely unrelated to business intelligence or getting data out of the data warehouse in the form of querying or reports. This is similar to implementing projects according to specifications (getting data in) but neglecting a deep understanding of what the users actually wanted (getting data out).
Technically, we might use different tools and technologies to implement each of the two components of a data warehouse. Typically this would require different skills and probably different individuals who work on each of the components. But overall, data should flow seamlessly from the source system through the data warehouse layers until it reaches the end users. Most often, when there is a discrepancy in a report or some data looks suspicious to the end users, they would trace the data flow back through the layers until they find the reason for the problem. The flow should work both ways, from the source to the target as well as from the target to the source.
Developers too frequently just develop, but they might not bother to check for example if their data conversion is meaningful. They usually know how to insert data to create slowly changing dimensions type 2 (the internet is full of tutorials on how to do that in various ETL tools), but they don’t understand how to query this data to find out how it was changing through history (there are so few resources out there that teach how to query historical data that I decided to write my own mini tutorial here). Sadly, developers often code ETL procedures according to data mapping without considering technical optimization. This is not always the best course of action, especially because data mapping is usually done by business analysts who understand data but they do not necessarily know how to write efficient queries.
On the other hand, business intelligence developers know how to query the data and produce reports, but they don’t necessarily know how this data flows in, so we have situations where values in the data are unexpected, but they can’t explain why this is so. When reports that they produce show results that are not correct, it decreases trust from the business user community. Unfortunately, business intelligence developers often do not know how to trace the error back to the source system.
This is where we most often run into problems: ETL developers don’t understand the meaning of the data that they are populating while business intelligence developers don’t know how the data flows through the system. And we have the same story all over again: reports from the data warehouse don’t meet expectations.
What can we do
What can we do to bring both sides of the story closer together and to maximize the output? An obvious first course of action would be to allow all developers, from the ETL and the business intelligence team to work closely together. Instead of having ETL developers first populate a large number of data warehouse tables and only after that is done, having business intelligence developers create a large number of reports, both groups should work closely together by populating a few tables first and creating a few reports and jointly figuring out whether the reports show satisfactory results.
At the same time, we should get the business users involved to validate those reports. Having business on board from the beginning also helps to manage expectations and to catch and fix problems early, while at the same time building trust and a sense of joint accomplishment by involving all teams to work towards a common goal.