A different way of data warehouse data modeling: Data Vault

In data warehousing and business intelligence implementations, we usually start by choosing between the two most popular approaches. One is the enterprise normalized data warehouse as defined by Bill Inmon, the father of data warehousing. The other is a collection of dimensional data marts based on a common bus architecture, as popularized by Ralph Kimball. Beyond these two, we can always choose another approach, such as a combination of the two or something completely different. One example of a different approach is the Data Vault. -- Article published in MonitorPro magazine, IV. 2015, p. 28-29...

Big data is changing our approach to ETL

Loading data into a data warehouse, also known as the ETL process, is the established way of taking data from source systems and bringing it into the warehouse. The process consists of three steps: Extract the data from the source systems, Transform it so that it conforms to the data warehousing environment, and finally Load it. With the proliferation of big data, and Hadoop as the underlying technology platform, we may have to rethink traditional approaches to loading data. The ETL process may not be the best or most efficient way of loading big data. -- Article published in MonitorPro magazine, 01/15, p. 28-29...
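The three steps can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not how any particular ETL tool works: the sample data, the `sales` table and the function names are all hypothetical, and an in-memory SQLite database stands in for the data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical export from a source system (assumption for illustration).
SOURCE_CSV = """customer,amount,currency
 alice ,10.50,eur
BOB,3.25,EUR
"""


def extract(text):
    """Extract: read the raw rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: conform names, types and codes to the warehouse conventions."""
    return [
        (row["customer"].strip().title(), float(row["amount"]), row["currency"].upper())
        for row in rows
    ]


def load(conn, records):
    """Load: write the conformed records into the (hypothetical) target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL, currency TEXT)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", records)
    conn.commit()


def run_etl(conn, text):
    """Run the three steps in order and return what ended up in the target table."""
    load(conn, transform(extract(text)))
    return conn.execute(
        "SELECT customer, amount, currency FROM sales ORDER BY customer"
    ).fetchall()


if __name__ == "__main__":
    warehouse = sqlite3.connect(":memory:")  # stand-in for the data warehouse
    for row in run_etl(warehouse, SOURCE_CSV):
        print(row)
```

Note that the transformation happens before anything reaches the target. That ordering is exactly what may need rethinking once data volumes grow.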

Alternatives to MapReduce

When we talk about Big Data, we nearly always think of Hadoop as the platform for storing huge amounts of data in a distributed environment. We often include MapReduce as the programming model for processing large data sets. Although it has been known for some time that MapReduce has limitations and is not suited to every type of data processing, until recently it was the only programming model available. Alternatives are now emerging that call its future into question. -- Article published in MonitorPro magazine, 02/2015, p. 24-25...
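For readers unfamiliar with the programming model, here is a minimal single-process sketch of the map, shuffle and reduce phases, using the classic word-count example. It does not use Hadoop or any of its APIs; the function names are hypothetical, and a real cluster would distribute each phase across many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would do."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run the three phases in sequence over a list of documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big ETL"]))
```

Every computation has to be forced into this map-then-reduce shape, which hints at why the model does not fit all types of processing.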

Applying agile in traditional environments

A large number of companies are starting to implement at least a few of their projects using agile approaches. They do so because it is becoming known that agile projects are more successful and leave their stakeholders more satisfied. The risk is that a company starts implementing agile before it understands exactly how agile changes the way it must think about projects, and then expects success to follow automatically, as if by magic. -- Article published in MonitorPro magazine, 05/14, p. 26-27...

Yes, you can do data science in Excel

The book Data Smart: Using Data Science to Transform Information into Insight by John W. Foreman is one of those fantastic books that had me asking, as I read it, why I hadn't come across it sooner. The author makes everything about data science appear less mysterious and much clearer. On the one hand, the book introduces real-life case studies of data science problems that can be solved using algorithms such as k-means clustering, regression, network clustering, optimization methods, ensemble models, prediction and the like. On the other hand, each of these case studies is implemented in Excel. Yes, that's correct: data science can be done in Excel if we really want to. It's the perfect tool for case studies because everyone knows Excel, so the algorithms can be explained without the added complexity of having to learn a data mining technology such as R. We probably wouldn't use an Excel spreadsheet to process huge volumes...

Big data for small business

Article published in MonitorPro 04/14, p. 39-40. Do small businesses have big data and, if so, how can they take advantage of big data analytics? The perception that small businesses have too little data, that they lack the time or interest to analyze it, or that analytics is too expensive is simply not true. They can still gain valuable insights from their data; they just have to apply analytics at the right scope. Big data is relative. What was perceived as big data some years ago is not big data today. Data volumes and storage capacity are constantly increasing, so what we consider big data today may not be big data tomorrow. Big data can be thought of as data that reaches the limits of the available storage capacity. If a small business has so much data that it fills up its disk space, this could be perceived as big data within its scope.

Big data and analytics

We often interchange the terms big...

We have data: now what?

"Big data" generally refers to enormous volumes of data stored in distributed NoSQL databases on massive servers with parallel processing. As a result, many people believe that doing advanced analytics or data science amounts to buying the required technology. While we do need technology to analyze huge amounts of data, we also have to understand the content of our data and know which questions we want it to answer. -- Article published in MonitorPro 04/14, p. 36-38...

The future of data warehousing is agile

According to many sources, agile data warehousing is the way to go in the future. Recently, Larissa Moss spoke about this topic at the TDWI conference in Munich, Germany, in her session Extreme Scoping: Agile approach for enterprise-class EDW/BI. The main point of her session was the importance of making available data that business users clearly understand and correctly interpret. Based on my experience building data warehouses and business intelligence solutions, I couldn't agree more. Good data is what makes or breaks a BI solution, regardless of how many features and functions we adorn it with. How does agile fit into all this? It enables us to get the data in and out quickly, so that business users have time to validate the results even before the complete solution is built. This is how data warehousing should always have been done. We must make certain that the data we are using is right before we go any further. Boris Evelson, analyst...

Software Extension to the PMBOK guide

PMI continues its annual publication of extensions to the Project Management Body of Knowledge (PMBOK). This year it finally published the long-awaited Software Extension to the PMBOK Guide Fifth Edition. The extension includes widely accepted practices for managing software projects, and its structure is the same as that of the PMBOK guide. Each chapter includes the extensions that apply to software projects. Some chapters have few or no extensions, for example change management, which in software projects doesn't differ from the widely accepted change management described in the PMBOK. We find more extensions in the areas of risk management, communicating with stakeholders, and monitoring and controlling. We know that software development presents numerous risks and that these types of projects are often not successful. We should be devoting much more time and attention to risk management and relevant communication when managing IT projects. PMI does not take sides for either traditional or agile project...

Practical Scrum advice

Mitch Lacey's book "The Scrum Field Guide: Practical Advice for Your First Year" is an extremely useful book that helps new Scrum practitioners take the first step from learning about Scrum to actually doing it. The book does not explain the basics of Scrum and its artifacts, because it assumes that the reader is already well informed and wants to go from theory to practice. Each of the 30 chapters jumps right in and gives practical advice that can be used in a real Scrum project. Most of the chapters are structured similarly, opening with a real-life story as an example. Some of the topics covered in the chapters include: getting people on board, optimizing team performance, determining team velocity, implementing team roles, establishing core hours, presenting the case for a full-time Scrum Master, establishing good engineering practices, understanding when we are done, release planning, decomposing user stories,...