Data Warehouse


DBAs facing the problem of corporate data explosion have an excellent new tool to help them in the MySQL 5.0 Archive storage engine. Whether it’s a data warehousing, data archiving, or data auditing situation, MySQL Archive tables can be just what the doctor ordered when it comes to maintaining large amounts of standard or sensitive information, while keeping storage costs at a bare-bones minimum.

via MySQL :: The MySQL 5.0 Archive Storage Engine.
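Most of those savings come from compression: MySQL's documentation describes the ARCHIVE engine as compressing rows with zlib as they are inserted. The sketch below does not use MySQL at all. It only illustrates, with Python's standard `zlib` module, why repetitive history rows shrink so much. The row layout and the `make_history_rows` helper are hypothetical stand-ins for a real history table.

```python
import zlib


def make_history_rows(n: int) -> bytes:
    """Build n hypothetical CSV-style audit rows with lots of repetition."""
    rows = []
    for i in range(n):
        # Dates, customer ids and status values repeat, as they do in history tables.
        rows.append(f"{i},2007-01-{(i % 28) + 1:02d},customer_{i % 50},ORDER_PLACED,OK\n")
    return "".join(rows).encode("utf-8")


def compression_ratio(raw: bytes, level: int = 6) -> float:
    """Return the raw size divided by the zlib-compressed size."""
    compressed = zlib.compress(raw, level)
    # zlib is lossless, so decompressing must give back the original bytes.
    assert zlib.decompress(compressed) == raw
    return len(raw) / len(compressed)


if __name__ == "__main__":
    raw = make_history_rows(10_000)
    print(f"raw bytes: {len(raw)}")
    print(f"compression ratio: {compression_ratio(raw):.1f}x")
```

The on-disk ratio for a real ARCHIVE table depends on the data and on the engine's own buffering, so treat the number this prints as a rough indication rather than a prediction.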

Should probably investigate using ARCHIVE storage for the multi-year history tables.

Whether you are developing a new dimensional data warehouse or replacing an existing environment, the ETL (extract, transform, load) implementation effort is inevitably on the critical path. Difficult data sources, unclear requirements, data quality problems, changing scope, and other unforeseen problems often conspire to put the squeeze on the ETL development team. It simply may not be possible to fully deliver on the project team’s original commitments; compromises will need to be made. In the end, these compromises, if not carefully considered, may create long-term headaches.

via IntelligentEnterprise : Kimball University: Three ETL Compromises to Avoid (printable version).

What does it take to develop a robust dimensional model? Here's how to get from requirements-gathering to final approval in a process that will ferret out the good, bad and ugly realities of your source data and help you avoid surprises, delays and cost overruns.

via IntelligentEnterprise : Kimball University: Practical Steps for Designing a Dimensional Model (printable version).


How do you deal with changing dimensions? Hybrid approaches fill gaps left by the three fundamental techniques

via IntelligentEnterprise : Slowly Changing Dimensions Are Not Always as Easy as 1, 2, 3 (printable version).
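As a refresher on the three basic techniques: Type 1 overwrites the attribute, Type 2 adds a new row under a new surrogate key, and Type 3 adds a column holding the prior value. One commonly described hybrid keeps Type 2 history rows while also overwriting a "current value" column on every version of the member, Type 1 style. The sketch below assumes that hybrid and a hypothetical customer dimension with a `region` attribute; the class, field, and function names are illustrative and are not taken from the article.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class CustomerRow:
    surrogate_key: int            # warehouse-assigned key, one per version
    customer_id: str              # natural (business) key from the source
    region: str                   # Type 2: value in effect for this version
    current_region: str           # Type 1 overlay: latest value on every version
    effective_date: date
    expiry_date: Optional[date]   # None while the version is still current
    is_current: bool


def apply_region_change(dim: List[CustomerRow], customer_id: str,
                        new_region: str, as_of: date) -> None:
    """Apply a region change using a Type 2 + Type 1 overlay hybrid.

    Assumes the customer already exists and has exactly one current row.
    """
    versions = [r for r in dim if r.customer_id == customer_id]
    current = next(r for r in versions if r.is_current)
    if current.region == new_region:
        return  # no change, so no new version
    # Type 2: close out the current version...
    current.expiry_date = as_of
    current.is_current = False
    # ...and add a new version under a fresh surrogate key.
    new_key = max(r.surrogate_key for r in dim) + 1
    dim.append(CustomerRow(new_key, customer_id, new_region, new_region,
                           as_of, None, True))
    # Type 1 overlay: older versions now also report the latest region.
    for r in versions:
        r.current_region = new_region


if __name__ == "__main__":
    dim = [CustomerRow(1, "C042", "West", "West", date(2006, 1, 1), None, True)]
    apply_region_change(dim, "C042", "East", date(2007, 3, 1))
    for row in dim:
        print(row)
```

Keeping both `region` and `current_region` lets a report group facts either by the region in effect when the transaction happened or by today's region, without a second join.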


Follow the rules to ensure granular data, flexibility and a future-proofed information resource. Break the rules and you’ll confuse users and run into data warehousing brick walls.

via IntelligentEnterprise : Kimball University: The 10 Essential Rules of Dimensional Modeling (printable version).


This article describes six key decisions that must be made while crafting the ETL architecture. These decisions have significant impacts on the upfront and ongoing cost and complexity of the ETL solution and, ultimately, on the success of the overall BI/DW solution. Read on for Kimball Group’s advice on making the right choices.

via IntelligentEnterprise : Kimball University: Six Key Decisions for ETL Architectures.


Database archiving is becoming an important new topic for data managers. The need for this function has surfaced at most IT organizations, and the problems it addresses are only getting bigger. These problems include challenges with data retention requirements, application renovations and e-discovery. Most IT data managers recognize the problems, but many do not see database archiving as a solution. This will change as the technology matures and spreads.

via Database Archiving Basics.


Alternative 5: The PathString Attribute

By now you’re probably desperate for a recommendation. Two years ago, a clever student in a Kimball University modeling class described an approach that allows complex ragged hierarchies to be modeled without using a bridge table. Furthermore, this approach avoids the Type 2 SCD explosion described in Alternative #1, and it works equally well in both OLAP and ROLAP environments.

via IntelligentEnterprise : Kimball University: Five Alternatives for Better Employee Dimension Modeling (printable version).
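The excerpt doesn't spell out the encoding, so the following is a minimal sketch under an assumption: each node's PathString is its parent's PathString plus one letter (A, B, C, ...) marking its position among its siblings. With that encoding, "everyone under this manager" becomes a prefix test (in SQL, `PathString LIKE 'AB%'`) instead of a join through a bridge table, and branches of different depth need no special handling. The `org` data and all function names are hypothetical.

```python
import string
from typing import Dict, List, Optional, Tuple


def _lettered(nodes: List[str]) -> List[Tuple[str, str]]:
    """Pair sorted sibling names with A, B, C, ... (this sketch allows <= 26)."""
    kids = sorted(nodes)
    if len(kids) > len(string.ascii_uppercase):
        raise ValueError("this sketch supports at most 26 siblings per node")
    return list(zip(string.ascii_uppercase, kids))


def build_pathstrings(parent_of: Dict[str, Optional[str]]) -> Dict[str, str]:
    """Map each node to its PathString; top-level nodes have parent None."""
    children: Dict[Optional[str], List[str]] = {}
    for node, parent in parent_of.items():
        children.setdefault(parent, []).append(node)

    paths: Dict[str, str] = {}

    def visit(node: str, path: str) -> None:
        paths[node] = path
        # Each child extends its parent's path by exactly one letter.
        for letter, child in _lettered(children.get(node, [])):
            visit(child, path + letter)

    for letter, root in _lettered(children.get(None, [])):
        visit(root, letter)
    return paths


def subordinates(paths: Dict[str, str], manager: str) -> List[str]:
    """All nodes below `manager`: the prefix test SQL would write as LIKE 'AB%'."""
    prefix = paths[manager]
    return sorted(n for n, p in paths.items() if p.startswith(prefix) and n != manager)


if __name__ == "__main__":
    # Hypothetical ragged org chart: branches end at different depths.
    org = {"ceo": None, "cfo": "ceo", "cto": "ceo",
           "dev1": "cto", "dev2": "cto", "intern": "dev1"}
    paths = build_pathstrings(org)
    print(paths)
    print(subordinates(paths, "cto"))
```

One trade-off to keep in mind with this particular encoding: the strings depend on sibling order, so reorganizing a branch rewrites the PathStrings of everyone beneath it.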

A student attending one of Kimball Group’s recent onsite dimensional modeling classes asked me for a list of “Kimball’s Commandments” for dimensional modeling. We’ll refrain from using religious terminology, but let’s just say the following are not-to-be-broken rules together with less stringent rule-of-thumb recommendations.

via Kimball University: The 10 Essential Rules of Dimensional Modeling (Intelligent Enterprise: Better Insight for Business Decisions).

As our data warehousing process grows and the workflows get more complex, we’ve revisited the question of what tools to use in this process. Out of curiosity, I had a look at basing such a process on Hadoop/Hive for scalability reasons, but the lack of mature tools and the sacrifices in efficiency that would entail meant we’re better off using something else unless a distributed processing platform is the only thing that can get the job done. I’m also curious about the transition to continuous integration, a model I noticed showing up a couple of years ago that is now getting some air under its wings as CEP, IBM’s InfoSphere Streams, and other similar approaches. Still, I think I’ll continue to rely on something else for a while and see how things shake out. Continuous integration clearly is the future, but there are many ways to get there.

via What we’re looking for in a data integration tool – Fishpool.
