Analytics in the Lean Startup movement and beyond

The book Lean Analytics by Alistair Croll and Benjamin Yoskovitz builds on the Lean Startup movement. The authors say that we live in a digital world where we can build something cheaply, measure its effect, and learn from it to build something better next time. Lean Analytics is used to measure that progress, helping you ask the most important questions and get clear answers quickly. In the book, the authors show you how to figure out your business model and your stage of growth. They further explain how to find the One Metric That Matters to you right now, and how to draw a line in the sand so you know where you stand. Analytics is about tracking the metrics that are critical to your business, such as where your revenue comes from, how much you are spending, how many customers you have, and so on. In a startup, you don’t always know which metrics are key, because you...
Learning MapReduce

Chances are that when you started learning MapReduce, the first example you saw was counting how many times a word appears in a given text or set of texts. This example is sometimes referred to as the “Hello, world!” of MapReduce. It is straightforward enough, and it nicely explains how MapReduce works. But once you’ve got “Hello, world!” out of the way, what next? How do you become comfortable applying the principles of MapReduce in real-world situations? For me, the book MapReduce Design Patterns by Donald Miner and Adam Shook was the next step. The authors of the book state that “…motivation for us to write this book was to fill a missing gap we saw in a lot of new MapReduce developers. They had learned how to use the system, got comfortable with writing MapReduce, but were lacking the experience to understand how to do things right or well.” They further explain that the intent of this book...
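For readers who have not seen it, the word count example mentioned above can be sketched in plain Python. This is only a single-process illustration of the map, shuffle, and reduce steps, not code from the book and not how a Hadoop job is actually written; the function names (map_phase, shuffle, reduce_phase, word_count) are made up for this sketch.

```python
from collections import defaultdict


def map_phase(text):
    """Map step: emit a (word, 1) pair for every word in one text."""
    for token in text.lower().split():
        # Strip common punctuation so "Hello," and "hello" count as one word.
        yield token.strip('.,!?;:"\''), 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key (the word)."""
    groups = defaultdict(list)
    for word, count in pairs:
        if word:  # skip tokens that were punctuation only
            groups[word].append(count)
    return groups


def reduce_phase(groups):
    """Reduce step: sum the grouped values for each word."""
    return {word: sum(counts) for word, counts in groups.items()}


def word_count(texts):
    """Run the three steps over a collection of texts (hypothetical helper)."""
    pairs = (pair for text in texts for pair in map_phase(text))
    return reduce_phase(shuffle(pairs))


counts = word_count(["Hello, world!", "hello again"])
print(counts)
```

In a real cluster the map and reduce functions run in parallel on different machines and the framework performs the shuffle; here everything runs in one process purely to show the shape of the computation.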
Can you really build a data warehouse in 15 minutes?

Has this happened to you before? You spend months designing, building, testing, and delivering a data warehouse solution. You feel a sense of accomplishment, not just because you delivered on time and on budget, but because you delivered something that was needed and that is now being used and appreciated by the end users. And then some vendor walks in and laughs at the amount of time you spent developing your solution. They say to management: why don’t you just buy our tool, and you could click-click-click have your solution ready in 15 minutes! Sometimes I want to grab such vendors by the neck and shake them: do you really know what you are talking about? Before expanding on this topic, let’s be realistic: today’s data warehouses are not keeping up with the big data explosion. It takes too long to deliver reports to the business users who may have lost interest in the time it took from initial enthusiasm until they...
Business intelligence is not on the way out … but ETL may be

Recent references to business intelligence being on the way out (examples here and here) may be a result of misinterpreting what Gartner said about business intelligence competency centers being dead. Probably everyone agrees that business intelligence itself is not on the way out. It is evolving from the traditional data warehouse and single version of truth to a more self-service, distributed, in-the-cloud format. We can rest assured that there will continue to be a need for data analysis and reporting. Business users will still want their financials, sales figures, and market share in a spreadsheet despite new trends in self-service data access. They will not all learn to become their own data scientists. Many business users are not tech-savvy enough to get their own data from various sources, so they will still require support from business intelligence professionals.

The new BI

To fulfill modern requirements and the ways in which businesses want to exploit data, BI will shift from building a single...
Big data and personalization: where do we draw the line?

As a private citizen, I am ever more annoyed with the rising amount of advertising that hits me everywhere: on the Internet, in magazines and newspapers, on billboards, in public transportation, even in public toilets. It is especially annoying when I am bombarded with ads for stuff that I don’t want because the advertisers missed the boat on personalization. On the other hand, as a professional, my job is to analyze data to come up with insights about behavior and trends, and to provide recommendations that can be packaged into advertising. And so I am conflicted: am I really doing to others what I don’t want done to me?

Personalizing ads

It frightens me how much personal information the advertisers have about us. I came across one example in the book Thank You for Being Late: An Optimist's Guide to Thriving in the Age of Accelerations. The author, Thomas L. Friedman, explains that a lot of people may not realize...
Beyond dimensional modeling: what lies ahead?

Ever since Ralph Kimball, the guru of dimensional modeling, announced his retirement, it has felt like the end of an era. He was among the first data warehousing/business intelligence pioneers some 20 years ago, and although there has been much advancement in the field since then, his dimensional modeling principles are still strongly rooted and widely used today. Dimensional modeling is easy to understand because it clearly represents the measures that are used in business and the dimensions by which we analyze them. The dimensional data model can be shared with business users, which allows better alignment between the technical implementation and the intended use. However, dimensional modeling is best suited for relational or OLAP databases. In order to keep up with big data trends, we should examine how to extend dimensional modeling to fit them. This article was first published in MonitorPro magazine, VI. 2015, p. 34-35 ...
Big data is changing our approach to ETL

Loading data into data warehouses, also known as the ETL process, is an established way of taking data from the source systems and bringing it into the data warehouse. The process consists of three steps: Extract data from the source systems, Transform the data so that it conforms to the data warehousing environment and finally Load it. With the proliferation of big data and Hadoop as the underlying technological platform we may have to rethink traditional approaches to loading data. The ETL process may not be the best or most efficient way of loading big data. -- Article published in MonitorPro magazine, 01/15, p. 28-29...
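The three steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the approach from the article: the sample CSV data, the fact_orders table, and the extract, transform, and load function names are all hypothetical, and an in-memory SQLite database stands in for the data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source data; a real source system would be a file, API, or database.
SOURCE_CSV = "order_id,amount,country\n1, 19.90 ,si\n2,5.00,DE\n"


def extract(raw):
    """Extract: read raw rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(raw)))


def transform(rows):
    """Transform: cast types and normalize values to fit the target table."""
    return [
        (int(r["order_id"]), float(r["amount"].strip()), r["country"].strip().upper())
        for r in rows
    ]


def load(rows, conn):
    """Load: write the conformed rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_orders "
        "(order_id INTEGER PRIMARY KEY, amount REAL, country TEXT)"
    )
    conn.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", rows)
    conn.commit()


# SQLite in memory stands in for the data warehouse in this sketch.
conn = sqlite3.connect(":memory:")
load(transform(extract(SOURCE_CSV)), conn)
print(conn.execute("SELECT * FROM fact_orders ORDER BY order_id").fetchall())
```

The point of the sketch is the ordering: the data is conformed before it reaches the target, which is exactly the step that may need rethinking when the volumes are large.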
Alternatives to MapReduce

When we talk about Big Data, we nearly always think of Hadoop as the platform for storing huge amounts of data in a distributed environment. We often include MapReduce as the programming model for processing large data sets. Although it has been known for some time that MapReduce has limitations and is thus not applicable to all types of data processing, it was the only available programming model until recently. Alternatives to MapReduce are now emerging that challenge its future. -- Article published in MonitorPro magazine, 02/2015, p. 24-25...
Yes, you can do data science in Excel

The book Data Smart: Using Data Science to Transform Information into Insight by John W. Foreman is one of those fantastic books that, while reading it, made me keep asking myself why I hadn't come across it sooner. The author makes everything about data science appear less mysterious and so much clearer. On the one hand, the book introduces real-life case studies of data science problems that can be solved using algorithms such as k-means clustering, regression, network clustering, optimization methods, ensemble models, prediction and the like. On the other hand, each of these case studies is implemented in Excel. Yes, that's correct, data science can be done in Excel if we really want. It's the perfect tool to use for case studies because everyone knows Excel, and thus the algorithms can be explained without the added complexity of having to learn a data mining technology, such as R. We probably wouldn't use an Excel spreadsheet to process huge volumes...
Big data for small business

Article published in MonitorPro 04/14, p. 39-40. Do small businesses have big data and if so, how can they take advantage of big data analytics? The perception that small businesses have too little data, that there is no time or interest to perform data analytics, or that it is too expensive is simply not true. They can still gain valuable insights from data; they just have to apply analytics in the right scope. Big data is relative. What was perceived as big data some years ago is not big data today. Data volumes and storage capacity are constantly increasing. What we consider big data today may not be big data tomorrow. Big data may be considered as data that reaches the limits of the technical storage capacity that is available. If a small business has so much data that it fills up its disk space, this could be perceived as big data within their scope.

Big data and analytics

We often interchange the terms big...