Through education and consulting work, Kimball Group has been exposed to hundreds of successful data warehouses. Careful study of these successes has revealed a set of extract, transformation, and load (ETL) best practices. We first described these best practices in an Intelligent Enterprise column three years ago (see “The 38 Subsystems of ETL” ). Since then we have continued to refine the practices based on client experiences, feedback from students and continued research. As a result, we have carefully restructured these best practices into 34 subsystems that represent the key ETL architecture components required in almost every dimensional data warehouse environment. No wonder the ETL system takes such a large percentage of data warehouse and BI project resources!
The good news is that if you study these 34 subsystems, you’ll recognize almost all of them and will be on the way to leveraging your experience as you build your ETL system. While we understand and accept the industry’s accepted acronym, the “ETL” process really has four major components: Extracting, Cleaning and Conforming, Delivering and Managing. Each of these components and all 34 subsystems contained therein are explained below.