Recent references to business intelligence being on the way out (examples here and here) may be a result of misinterpreting what Gartner said about business intelligence competency centers being dead.
Most would agree that business intelligence itself is not on the way out. It is evolving from the traditional data warehouse and its single version of the truth toward a more self-service, distributed, cloud-based model.
We can rest assured that there will continue to be a need for data analysis and reporting. Business users will still want their financials, sales figures, and market share in a spreadsheet, despite new trends in self-service data access. They will not all become their own data scientists. Many business users are not tech-savvy enough to pull their own data from various sources, so they will still require support from business intelligence professionals.
The new BI
To meet modern requirements and the ways the business wants to exploit data, BI will shift from building a single version of the truth toward supporting business users in making their own data-driven decisions. The BI team will produce governed data sets for business users to consume. There will be fewer predefined reports and dashboards built by the BI team, because users want the flexibility to build their own customized reports; this is also more efficient, since they already know their own requirements.
The shift in BI is also toward more real-time analytics. Everyone wants their data immediately, and their reports built on up-to-date figures. No one has the patience to wait for the data warehouse to be loaded in batch overnight.
In traditional data warehouse environments, time-to-value is too long. Defining requirements, integrating data from various sources, then developing, testing and productionizing a data warehouse solution typically takes months. In today's agile business environments, months is too long. However, there are still areas of the business where a data warehouse will be required, for example finance or regulatory reporting, where accuracy and auditability are essential.
While technology is allowing more and more business users to simply connect their Excel spreadsheet to a data source and do their own data analyses to gain quick insights, this type of analysis is not production-ready: the analyses are neither repeatable nor easily shared among users or across departments. Combining data from multiple sources, creating a data model and authoring reports from it still require specialized BI skillsets.
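As a rough illustration, here is a minimal Python sketch (the file names, database and columns are all hypothetical) of the kind of multi-source combination that goes beyond pointing a spreadsheet at a single source:

```python
# A minimal sketch (file names, table and columns are hypothetical) of combining
# two sources into one reproducible, shareable data set -- the step that still
# requires BI skills rather than a spreadsheet connection.
import sqlite3
import pandas as pd

# Source 1: a CSV extract from the sales system (hypothetical file).
sales = pd.read_csv("sales_extract.csv")

# Source 2: customer master data from an operational database (hypothetical).
with sqlite3.connect("crm.db") as conn:
    customers = pd.read_sql("SELECT customer_id, region FROM customers", conn)

# Conform the join keys, which rarely match cleanly across systems.
sales["customer_id"] = sales["customer_id"].astype(str).str.strip()
customers["customer_id"] = customers["customer_id"].astype(str).str.strip()

# Join into a simple analytical model and persist it as a governed data set,
# so the analysis is repeatable and can be shared across departments.
model = sales.merge(customers, on="customer_id", how="left")
model.to_parquet("governed/sales_by_region.parquet")
```

Because the logic lives in code rather than in one person's spreadsheet, the same result can be reproduced on demand and versioned like any other software artifact.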
Instead of monolithic data warehouses, we can expect BI to evolve into a supporting role, giving users more self-service and faster results.
The emergence of data engineering
ETL, however, is a different story. Typical ETL tools are cumbersome, complex products that make it difficult to add functionality for the complex transformations that go beyond "click the source column and connect it to the target column".
The data engineer is an emerging role within the big data realm. When dealing with millions and millions of records, you can't simply use an ETL tool to point and click your way through moving data. Specialized skills and custom coding are required, in combination with the newest big data technologies. ETL tools, with their graphical interfaces, are limited in the data transformation functionality they provide; the abstractions needed cannot be expressed intuitively in those tools. Code is the best abstraction for software, especially because new data sources such as NoSQL databases require specialized interfaces to access and manipulate data, and new programming models such as MapReduce and Spark require custom coding with little room for ETL-style drag and drop.
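As a concrete example, here is a minimal PySpark sketch (the paths and log format are hypothetical) of the kind of custom parsing and aggregation that is awkward to express in a visual ETL tool:

```python
# A minimal PySpark sketch (hypothetical paths and log format) of custom
# transformation logic that does not map onto drag-and-drop ETL.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("custom-etl-sketch").getOrCreate()

# Read raw, semi-structured web server logs as plain text.
logs = spark.read.text("hdfs:///raw/access_logs/*.log")  # hypothetical path

# Custom parsing: extract fields with regexes -- the step where visual tools
# struggle as soon as the format drifts between files.
parsed = logs.select(
    F.regexp_extract("value", r"^(\S+)", 1).alias("ip"),
    F.regexp_extract("value", r"\[([^\]]+)\]", 1).alias("ts"),
    F.regexp_extract("value", r"\"\S+ (\S+)", 1).alias("url"),
)

# Aggregate page hits per URL and write a governed data set for analysts.
hits = parsed.groupBy("url").agg(F.count("*").alias("hits"))
hits.write.mode("overwrite").parquet("hdfs:///curated/page_hits")  # hypothetical
```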
Data engineering combines business intelligence and data warehousing skills with a stronger emphasis on software engineering. An understanding of distributed systems is a must, especially when working with the Hadoop ecosystem and real-time processing.
New ETL skills
Some of the new skills data engineers will need, which are not easily covered by ETL tools, include:
- Ingesting big data sources: web scraping, reading huge log files, fetching data from APIs, etc. (see the sketch after this list)
- Processing real-time data pipelines: data is constantly changing, so instead of producing static reports, the data is analyzed continuously for behavior, trends, etc.
- Metadata management: cataloging data from various sources, especially from unstructured data lakes, so that users can correctly interpret what they are analyzing
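To make the first item concrete, here is a minimal Python sketch of API ingestion with pagination; the endpoint, parameters and token are hypothetical, but the pattern of looping until the source is exhausted is exactly the kind of logic a point-and-click ETL tool rarely handles well:

```python
# A minimal sketch of paginated API ingestion (endpoint, parameters and
# token are hypothetical).
import requests

BASE_URL = "https://api.example.com/v1/orders"  # hypothetical endpoint
HEADERS = {"Authorization": "Bearer <token>"}   # hypothetical auth scheme

def fetch_all(page_size: int = 500):
    """Yield records page by page until the API returns an empty batch."""
    page = 1
    while True:
        resp = requests.get(
            BASE_URL,
            headers=HEADERS,
            params={"page": page, "per_page": page_size},
            timeout=30,
        )
        resp.raise_for_status()
        batch = resp.json()
        if not batch:
            break
        yield from batch
        page += 1

if __name__ == "__main__":
    records = list(fetch_all())
    print(f"Ingested {len(records)} records")
```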
We do not expect ETL to be phased out overnight. There has been too much investment in ETL tools and skills to simply throw them away. But data volumes will continue to grow, and new data sources will keep appearing on a regular basis. Beyond volume, data complexity is also increasing, requiring more and more effort to interpret the data correctly. There will be an increased need for data engineers to support the technical infrastructure, and at the same time less opportunity to use ETL tools as we know them.