Dimensions implement the user interface (UI) in a business intelligence (BI) tool. In a dimensional data warehouse/business intelligence (DW/BI) system, the textual descriptors of all the data warehouse entities, like customer, product, location and time, reside in dimension tables. My two previous columns carefully described three major types of dimensions according to how the DW/BI system responds to their slowly changing characteristics. But why all this fuss about dimensions? After all, they are the smallest tables in the data warehouse, and the real “meat” is the set of numeric measurements in the fact tables. That argument, however, misses the point that the DW/BI system is always accessed through the dimensions. The dimensions are the gatekeepers, the entry points, the labels, the groupings, the drill-down paths and, ultimately, the texture of the DW/BI system. The actual content of the dimensions determines what is shown on the screen of a BI tool and what UI gestures are possible. That is why we say that the dimensions implement the UI.
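To make that concrete, here is a minimal sketch in pandas (the table and column names are made up for illustration) of how a dimension's textual content becomes the filter choices and row labels a BI screen can offer:

```python
import pandas as pd

# Hypothetical product dimension: its textual attributes are what the UI can show.
product_dim = pd.DataFrame({
    "product_key": [1, 2, 3],
    "product_name": ["Widget A", "Widget B", "Gadget C"],
    "category": ["Widgets", "Widgets", "Gadgets"],
})

# Hypothetical fact table: just keys and numeric measurements.
sales_fact = pd.DataFrame({
    "product_key": [1, 1, 2, 3],
    "sales_amount": [10.0, 12.5, 7.0, 20.0],
})

# The filter choices a BI tool can offer come straight from the dimension's content.
print(sorted(product_dim["category"].unique()))   # ['Gadgets', 'Widgets']

# Row labels and groupings on a report are dimension attributes, not fact columns.
report = (sales_fact.merge(product_dim, on="product_key")
                    .groupby("category")["sales_amount"].sum())
print(report)
```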
Maintaining Dimension Hierarchies
Dimensions are the key to navigating the data warehouse/business intelligence (DW/BI) system, and hierarchies are the key to navigating dimensions. Any time a business user talks about wanting to drill up, down or into the data, they are implicitly referring to a dimension hierarchy. For those drill paths to work properly, and for a large DW/BI system to perform well, the hierarchies must be correctly designed, cleaned, and maintained.
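As a rough illustration (hypothetical tables and columns, pandas again), drilling up or down is just summarizing the same facts at a different level of the dimension hierarchy:

```python
import pandas as pd

# Hypothetical product dimension carrying a three-level hierarchy.
product_dim = pd.DataFrame({
    "product_key":  [1, 2, 3, 4],
    "category":     ["Clothing", "Clothing", "Electronics", "Electronics"],
    "subcategory":  ["Shirts", "Pants", "Audio", "Audio"],
    "product_name": ["Tee", "Chino", "Earbuds", "Speaker"],
})

sales_fact = pd.DataFrame({
    "product_key":  [1, 2, 3, 4, 3],
    "sales_amount": [25.0, 40.0, 60.0, 120.0, 55.0],
})

joined = sales_fact.merge(product_dim, on="product_key")

# Drill up/down = summarize the same facts at a different hierarchy level.
by_category = joined.groupby("category")["sales_amount"].sum()
by_subcategory = joined.groupby(["category", "subcategory"])["sales_amount"].sum()
print(by_category)
print(by_subcategory)
```

If the hierarchy columns are inconsistent from row to row, the levels stop reconciling, which is exactly why those columns have to be designed, cleaned, and maintained.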
IntelligentEnterprise : Kimball University: Maintaining Dimension Hierarchies.
Behind every success or failure are people. People are the only differentiators. Every data warehousing (DW) and business intelligence (BI) project, whether successful or not, teaches us something, and it is generally on failures that we build our new successes. That said, you don't have to fail yourself in order to learn; you can also learn from others' failures, 10 of which are discussed here.
IntelligentEnterprise : Kimball University: Eight Recommendations for International Data Quality
Language, culture, and country-by-country compliance and privacy requirements are just a few of the tough data quality problems global organizations must solve. Start by addressing data accuracy at the source and adopting an MDM strategy, then follow these six other best-practice approaches.
Microsoft, IBM, Oracle and Sun are now fueling the growing fire around the database-as-a-service and cloud database markets, but what’s the difference between these offerings and what’s the appeal? Database guru Don Feinberg defines terms and raises important questions about reliability and security.
Should You Use An ETL Tool?
IntelligentEnterprise : Kimball University: Should You Use An ETL Tool?
You can still hand-code an extract, transform and load system, but in most cases the self-documentation, structured development path and extensibility of an ETL tool are well worth the cost. Here's a close look at the pros and cons of buying rather than building.
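For a sense of what the hand-coded route involves, here is a deliberately tiny extract-transform-load sketch in Python (the file and column names are invented); everything an ETL tool provides around such code (documentation, lineage, logging, restartability) is what you end up building yourself.

```python
import csv

# Extract: read raw rows from a hypothetical source extract file.
with open("orders_extract.csv", newline="") as src:
    rows = list(csv.DictReader(src))

# Transform: clean and conform a couple of columns.
for row in rows:
    row["customer_name"] = row["customer_name"].strip().title()
    row["order_amount"] = round(float(row["order_amount"]), 2)

# Load: write the conformed rows to a hypothetical staging file
# (a real system would load a warehouse table instead).
with open("orders_staged.csv", "w", newline="") as dst:
    writer = csv.DictWriter(dst, fieldnames=rows[0].keys())
    writer.writeheader()
    writer.writerows(rows)
```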
Data Fingerprinting as a General Refresh Solution:
Refreshing data warehouses can be a challenge. Truncating and reloading tables is time-consuming and wastes I/O, but common incremental refresh techniques have their problems, too. Using date-time stamps to capture row changes, for example, can turn into a major software project and puts additional processing on source systems. Log-based replication is another possibility, but it can be tricky to set up and monitor. And while third-party tools eliminate the need for custom code, they also cost money.
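The fingerprinting idea, very roughly, is to hash each source row and compare the hashes against those captured at the last load, so new and changed rows can be found without date-time stamps or log readers. Here is a minimal Python sketch under those assumptions, not the article's exact implementation:

```python
import hashlib

def fingerprint(row: dict) -> str:
    """Hash a row's values into a short fingerprint (column order fixed by sorting keys)."""
    payload = "|".join(f"{k}={row[k]}" for k in sorted(row))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

# Hypothetical fingerprints captured at the previous load, keyed by business key.
previous = {"C100": fingerprint({"id": "C100", "city": "Austin"}),
            "C200": fingerprint({"id": "C200", "city": "Boise"})}

# Today's source rows (made-up data).
current_rows = [{"id": "C100", "city": "Austin"},   # unchanged -> skip
                {"id": "C200", "city": "Denver"},   # changed   -> update
                {"id": "C300", "city": "Fargo"}]    # new       -> insert

for row in current_rows:
    fp = fingerprint(row)
    if row["id"] not in previous:
        print("insert", row["id"])
    elif previous[row["id"]] != fp:
        print("update", row["id"])
    # Equal fingerprints mean the row is unchanged and can be skipped.
```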
Data warehouse appliances require specialized hardware. Fiction. Most contenders other than Teradata and Netezza (for example, DATAllegro, Vertica, ParAccel, Greenplum, and Infobright) offer Type 2 (software-only) appliances. (Dataupia is another exception.)
IntelligentEnterprise : Kimball University: Keep to the Grain in Dimensional Modeling:
When developing fact tables, don't start with aggregated data. To avoid “mixed granularity” woes, including bad and overlapping data, stick to rich, expressive, atomic-level data that's closely connected to the original source and collection process.
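Here is a tiny, made-up pandas illustration of the point: atomic-grain rows roll up cleanly to any level the business asks for, but as soon as a pre-aggregated row is mixed in, further summaries double-count.

```python
import pandas as pd

# Atomic grain: one row per order line (hypothetical data).
atomic = pd.DataFrame({
    "order_date": ["2024-01-01", "2024-01-01", "2024-01-02"],
    "product":    ["Tee", "Chino", "Tee"],
    "amount":     [25.0, 40.0, 25.0],
})

# Atomic rows roll up cleanly to any level.
print(atomic.groupby("order_date")["amount"].sum())   # 65.0 and 25.0
print(atomic["amount"].sum())                          # 90.0

# Mixing in a pre-aggregated "daily total" row breaks the grain...
mixed = pd.concat([atomic, pd.DataFrame({
    "order_date": ["2024-01-01"], "product": ["ALL"], "amount": [65.0],
})], ignore_index=True)

# ...and every summary now double-counts the aggregated day.
print(mixed["amount"].sum())                           # 155.0, not 90.0
```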