Database archiving is becoming an important new topic for data managers. The need for this function has surfaced at most IT organizations, and the problems it addresses keep growing. These problems include challenges with data retention requirements, application renovations and e-discovery. Most IT data managers recognize the problems, but many do not yet see database archiving as a solution. This will change as the technology matures and spreads.
Category: Data Warehouse
Alternative 5: The PathString Attribute
By now you’re probably desperate for a recommendation. Two years ago, a clever student in a Kimball University modeling class described an approach that allows complex ragged hierarchies to be modeled without using a bridge table. Furthermore, this approach avoids the Type 2 SCD explosion described in Alternative #1, and it works equally well in both OLAP and ROLAP environments.
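The excerpt above does not spell out how the PathString is encoded, so what follows is only a minimal sketch under an assumed encoding: each node stores its parent's string plus one letter for its position among its siblings, so every descendant of a node shares that node's string as a prefix. The names `Node`, `assign_path_strings` and `descendants` are hypothetical and do not come from the original article.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One member of a ragged hierarchy (hypothetical structure)."""
    name: str
    children: list = field(default_factory=list)


def assign_path_strings(root):
    """Map each node name to a path string.

    Assumed encoding: the root gets "A"; each child gets its parent's
    string plus one letter for its sibling position. This simple
    version supports at most 26 children per node.
    """
    paths = {}

    def walk(node, prefix):
        paths[node.name] = prefix
        for i, child in enumerate(node.children):
            walk(child, prefix + chr(ord("A") + i))

    walk(root, "A")
    return paths


def descendants(paths, name):
    """Return all nodes below `name`, found by a plain prefix match."""
    prefix = paths[name]
    return sorted(n for n, p in paths.items()
                  if p.startswith(prefix) and p != prefix)


# A ragged tree: branches have different depths.
corp = Node("Corp", [
    Node("East", [Node("Boston"), Node("NYC", [Node("Brooklyn")])]),
    Node("West"),
])
paths = assign_path_strings(corp)
east_subtree = descendants(paths, "East")
```

Stored as an ordinary dimension attribute, such a string would let a query select a whole subtree with a prefix match (for example a SQL `LIKE 'AA%'` filter), with no bridge table involved, which is the property the excerpt highlights.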
A student attending one of Kimball Group’s recent onsite dimensional modeling classes asked me for a list of “Kimball’s Commandments” for dimensional modeling. We’ll refrain from using religious terminology, but let’s just say the following are not-to-be-broken rules together with less stringent rule-of-thumb recommendations.
As our data warehousing process grows and the workflows get more complex, we’ve revisited the question of what tools to use in this process. Out of curiosity, I had a look at basing such a process on Hadoop/Hive for scalability reasons, but the lack of mature tools and the sacrifices in efficiency that would entail meant we’re better off using something else as long as a distributed processing platform is not the only thing that can get the job done. I’m also curious about the transition to continuous integration, a model I noticed showing up a couple of years ago and now getting some air under its wings as CEP, IBM’s InfoSphere Streams, and other similar approaches. Still, I think I’ll continue to rely on something else for a while and see how things shake out. Continuous integration clearly is the future, but there are many ways to get there.
via What we’re looking for in a data integration tool – Fishpool.
Simple Steps to Sustainable ETL
Notwithstanding the claims of some DW appliance vendors that you just “wheel it in and slap your data on (just load, and don’t worry where it is) and then you do the analytics”, a lot of organisations go along the long-established route of regular batch data loads to a data warehouse. These traditional data warehouses often have long life spans; they run day in, day out for many years, and in my opinion to do that you need sustainable, supportable ETL code.
via Rittman Mead Consulting » Blog Archive » Simple Steps to Sustainable ETL.
New data sources and BI delivery modes make it that much harder for EDW initiatives to succeed. Here are eight recommendations for controlling project costs and reducing risks.
Materialized Views and Partitioning are two key Oracle features when working with data warehouses. Using them together, though, can sometimes cause unexpected problems when you need to refresh the materialized views, as we found on a recent project. Here’s what happened, reproduced using the SH Sample Schema.
via Rittman Mead Consulting » Blog Archive » Materialized View and Partitioning “Gotchas”.
Fact Tables
Fact tables are the foundation of the data warehouse. They contain the fundamental measurements of the enterprise, and they are the ultimate target of most data warehouse queries. Perhaps you are wondering why it took me so long to get to fact tables in DM Review? Well, there is no point in hoisting fact tables up the flagpole unless they have been chosen to reflect urgent business priorities, have been carefully quality assured and are surrounded by dimensions that provide a wealth of entry points for constraining and grouping. Now that we have paved the way for fact tables, let’s see how to build them and use them.
via Fact Tables.
Data stewards are the liaisons between business users and the data warehouse team, and they ensure consistent, accurate, well-documented and timely insight on resources and requirements.