As our data warehousing process grows and the workflows get more complex, we’ve revisited the question of what tools to use in this process. Out of curiosity, I had a look at basing such a process on Hadoop/Hive for scalability reasons, but the lack of mature tools and the sacrifices in efficiency that would entail mean we’re better off using something else unless a distributed processing platform is the only thing that can get the job done. I’m also curious about the transition to continuous integration, a model I noticed showing up a couple of years ago and now getting some air under its wings in the form of CEP (complex event processing), IBM’s InfoSphere Streams, and other similar approaches. Still, I think I’ll continue to rely on something else for a while and see how things shake out. Continuous integration clearly is the future, but there are many ways to get there.
via What we’re looking for in a data integration tool – Fishpool.