I've seen this: http://storm.apache.org/releases/0.10.0/Understanding-the-parallelism-of-a-Storm-topology.html but it doesn't explain how workers coordinate with each other, so I'm asking for a bit of clarity.
I'm considering a situation where I have 2 million rows in MySQL or MongoDB:

1. I want a Spout to read the first 1000 rows and send the processed output to a Bolt. This happens in Worker 1.
2. I want a different instance of the same Spout class to read the next 1000 rows, in parallel with the Spout in step 1, and send its processed output to an instance of the same Bolt class used in step 1. This happens in Worker 2.
3. Same as 1 and 2, but it happens in Worker 3.
4. I might set up 10 workers like this.
5. When all the Bolts in those workers are finished, they send their outputs to a single Bolt in Worker 11.
6. The Bolt in Worker 11 writes the processed values to a new MySQL table.

*My confusion here is in how to make the database iteration happen batch by batch, in parallel.* Presumably the database connection would have to be made in some static class outside the workers, but if the workers are started with just "conf.setNumWorkers(2);", how do I tell each worker to iterate over a different range of rows in the database? Assume that the workers are running on different machines.

--
Regards,
Navin
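P.S. To make the question concrete, here is roughly the striping scheme I imagine each spout instance using. This is only my own sketch: `BatchPartitioner`, `offsetFor`, and `BATCH_SIZE` are names I made up for illustration; in a real spout I assume the task index and task count would come from Storm's `TopologyContext` in `open()` (via `getThisTaskIndex()` and `getComponentTasks(...)`), and each task would then issue its own `SELECT ... LIMIT 1000 OFFSET <offset>` queries.

```java
// Hypothetical sketch of per-task batch striping, NOT a working Storm spout.
// In a real spout, inside open(Map conf, TopologyContext context, ...):
//   int taskIndex = context.getThisTaskIndex();
//   int numTasks  = context.getComponentTasks(context.getThisComponentId()).size();
public class BatchPartitioner {
    static final int BATCH_SIZE = 1000;

    // OFFSET of the nth batch fetched by a given task, striped across all tasks:
    // batches are dealt out round-robin, so no two tasks ever read the same rows.
    static long offsetFor(int taskIndex, int numTasks, long batchNumber) {
        return (batchNumber * numTasks + taskIndex) * (long) BATCH_SIZE;
    }

    public static void main(String[] args) {
        // With 10 spout tasks: task 0 reads rows 0-999, then 10000-10999, ...
        // task 1 reads rows 1000-1999, then 11000-11999, ... and so on.
        for (int task = 0; task < 3; task++) {
            System.out.println("task " + task + " first offsets: "
                + offsetFor(task, 10, 0) + ", " + offsetFor(task, 10, 1));
        }
    }
}
```

Is something along these lines how people normally split a table across parallel spout tasks, or is there a built-in mechanism for it?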
