Sheeri Kritzer » Blog Archive » Data Warehousing Tips and Tricks – The MySQL She-BA
It’s not easy to do a DW in MySQL — but it’s not impossible either. Easier to go to Teradata than to write your own.
I need to learn more about MERGE TABLES and INSERT … ON DUPLICATE KEY UPDATE
3 replies on “Data Warehousing Tips and Tricks”
The manual pages for the MERGE storage engine:
http://dev.mysql.com/doc/refman/5.0/en/merge-storage-engine.html
and
http://dev.mysql.com/doc/refman/5.0/en/merge-table-problems.html
MySQL Forums for the MERGE talbe are at:
http://forums.mysql.com/list.php?93
In a nutshell, a MERGE table is really a set of pointers to similarly-schema’d MyISAM tables. So if you have the same table schema multiple times (ie, partition per day, so you have tables named 2007_04_27_Sales, 2007_04_26_Sales, etc) you’d use a MERGE table to link them all together and then you can run a query on the MERGE table and it will query all the tables that the MERGE table points to.
As for INSERT . . . ON DUPLICATE KEY UPDATE —
MySQL gives many ways to deal with INSERTs and unique/primary keys. If you do an INSERT and the primary key you are trying to insert is already in the table, MySQL will give an error. Ways to deal with this:
1) Try & catch errors in the application code.
2) Use INSERT IGNORE INTO . . . this will insert a new record if a record with the key does not exist. If it does exist, nothing happens.
3) Use REPLACE INTO . . .this will insert a new record if a record with the key does not exist. If a record does exist, MySQL will *delete* the record and then INSERT your record. This can cause problems when you just want to update part of a row, and not insert the whole row again. And it changes timestamps and auto-increment numbers, which may not be a desired result.
4) Use INSERT . . .ON DUPLICATE KEY UPDATE. The syntax is the regular INSERT statement, and at the end add ON DUPLICATE KEY UPDATE [expression]. For instance,
INSERT INTO tbl (id,name,thing) VALUES (154,’sheeri’,’book’) ON DUPLICATE KEY UPDATE thing=’book’;
and what makes it easier, if you have variables or whatever in your VALUES, you can actually set the update statement to say “just use the value I wanted to insert, OK?” as in the following:
INSERT INTO tbl (id,name,thing) VALUES (154,’sheeri’,’book’) ON DUPLICATE KEY UPDATE thing=VALUES(thing);
Manual page:
http://dev.mysql.com/doc/refman/5.0/en/insert-on-duplicate.html
Hope this helps!
[…] and you shall receive: http://face.centosprime.com/rdb-w/?p=68 linked to my previous post on the Data Warehousing Tips and Tricks session […]
THANKS!
I can go concentrate on how best to use these.
Especially happy to know about REPLACE INTO. A “nice” way to do an UPSERT for my example cases. Now If I only knew when a source system deleted a row.
Assuming a table with 100 columns, 500,000 rows.
Test REPLACE INTO against INSERT … ON DUPLICATE UPDATE.
Assume all 500,000 rows get “modified” every time.
compare to TRUNCATE and LOAD DATA which solves the immediate problem of “what got deleted”.