Bob, if "real time" means "up to a few minutes is acceptable", then I'd recommend you use Storm to do any pre-load processing and write the result to a text/CSV/etc. file in a directory. Then use a separate utility (most databases have something that does this) to load data from the files you create into the database.
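To make the file-drop idea concrete, here is a minimal sketch of the writing side. It is written in Python rather than as a Storm bolt purely for illustration, and the class name `CsvBatchWriter`, the spool directory, and the batch thresholds are all hypothetical. The one detail worth copying is that each batch is written to a temporary file and then renamed, so the loader never picks up a half-written file.

```python
import csv
import os
import tempfile
import time


class CsvBatchWriter:
    """Buffers processed records and flushes them as CSV files into a spool directory.

    Hypothetical helper: the name, defaults, and file naming are illustrative only.
    """

    def __init__(self, spool_dir, max_rows=1000, max_age_seconds=60.0):
        self.spool_dir = spool_dir
        self.max_rows = max_rows
        self.max_age_seconds = max_age_seconds
        self._rows = []
        self._first_row_time = None
        self._sequence = 0
        os.makedirs(spool_dir, exist_ok=True)

    def add(self, row):
        """Buffer one record. Returns the path of a flushed file, or None."""
        if not self._rows:
            self._first_row_time = time.monotonic()
        self._rows.append(row)
        too_many = len(self._rows) >= self.max_rows
        too_old = time.monotonic() - self._first_row_time >= self.max_age_seconds
        # The age check only runs when a record arrives; a real topology
        # would also flush on a timer so a quiet stream does not stall.
        if too_many or too_old:
            return self.flush()
        return None

    def flush(self):
        """Write the buffered rows to a new CSV file. Returns its path, or None if empty."""
        if not self._rows:
            return None
        # Write to a temporary file first, then rename it into place.
        fd, tmp_path = tempfile.mkstemp(dir=self.spool_dir, suffix=".tmp")
        with os.fdopen(fd, "w", newline="") as handle:
            csv.writer(handle).writerows(self._rows)
        self._sequence += 1
        name = "batch-%d-%06d.csv" % (int(time.time()), self._sequence)
        final_path = os.path.join(self.spool_dir, name)
        os.replace(tmp_path, final_path)  # atomic when on the same filesystem
        self._rows = []
        self._first_row_time = None
        return final_path
```

The loading side would then be whatever bulk utility your database ships with (for example PostgreSQL's COPY, MySQL's LOAD DATA INFILE, SQL Server's bcp, or Oracle's SQL*Loader), pointed at the `*.csv` files in that directory and run on a schedule.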
This sounds slower, but remember that establishing a connection to a database to run a SQL INSERT has noticeable latency. It's also true that each connection (usually) takes a port/socket and memory, and is often a separate OS task, so you are consuming resources that you would probably want Storm using.

There are other solutions for something closer to real time, but they require an in-memory database or "fun with caching", which will require specialized expertise.

HTH

________________________________
From: Adaryl "Bob" Wakefield, MBA [[email protected]]
Sent: Friday, March 06, 2015 7:54 PM
To: [email protected]
Subject: real time warehouse loads

I’m looking at Storm as a method to load data warehouses in real time. I am not that familiar with Java. I’m curious about the actual mechanism to load records into tables. Is it just a matter of feeding the final result of processing into an INSERT INTO SQL statement, or is it more complicated than that? It seems to me that hammering the database with SQL statements of real time data is a bit inefficient.

Adaryl "Bob" Wakefield, MBA
Principal
Mass Street Analytics, LLC
913.938.6685
www.linkedin.com/in/bobwakefieldmba
Twitter: @BobLovesData
