Hey,

One way I handle a similar problem: if there is only one worker slot per VM, then based on the hostname/host IP I force each worker to fetch a different set of rows from the database. Another option, with a different setup, is to use HDFS in place of MySQL.

e.g.

    // Give each host its own id range based on its hostname
    if (InetAddress.getLocalHost().getHostName().equals("anshuStormSCsup1")) {
        msgId = (long) (1 * Math.pow(10, 12) + r.nextInt(10));
    }
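To make the hostname trick concrete, here is a minimal sketch of mapping each VM's hostname to the row offset its spout should read. The host names, the table layout, and the 1000-row batch size are all made up for illustration; adapt them to your cluster:

```java
import java.net.InetAddress;
import java.util.HashMap;
import java.util.Map;

public class HostPartition {
    // Hypothetical mapping from hostname to the starting row offset
    // that worker owns; fill in your actual host names.
    static final Map<String, Long> OFFSETS = new HashMap<>();
    static {
        OFFSETS.put("worker-host-1", 0L);
        OFFSETS.put("worker-host-2", 1000L);
        OFFSETS.put("worker-host-3", 2000L);
    }

    // Return the row offset assigned to the given host.
    static long offsetFor(String hostname) {
        Long off = OFFSETS.get(hostname);
        if (off == null) {
            throw new IllegalArgumentException("Unknown host: " + hostname);
        }
        return off;
    }

    public static void main(String[] args) throws Exception {
        // Each spout instance resolves its own hostname at open() time...
        String host = InetAddress.getLocalHost().getHostName();
        System.out.println("running on host: " + host);
        // ...and would then fetch only its slice, e.g. with a query like:
        // SELECT * FROM rows ORDER BY id LIMIT 1000 OFFSET <offsetFor(host)>
    }
}
```

Note that OFFSET-based pagination scans past skipped rows, so for millions of rows a keyed range (WHERE id BETWEEN ...) is usually cheaper.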




On Tue, Apr 19, 2016 at 10:02 AM, Navin Ipe <[email protected]
> wrote:

> I've seen this:
> http://storm.apache.org/releases/0.10.0/Understanding-the-parallelism-of-a-Storm-topology.html
> but it doesn't explain how workers coordinate with each other, so
> requesting a bit of clarity.
>
> I'm considering a situation where I have 2 million rows in MySQL or
> MongoDB.
>
> 1. I want to use a Spout to read the first 1000 rows and send the
> processed output to a Bolt. This happens in Worker1.
> 2. I want a different instance of the same Spout class to read the next
> 1000 rows in parallel with the working of the Spout of 1, then send the
> processed output to an instance of the same Bolt used in 1. This happens in
> Worker2.
> 3. Same as 1 and 2, but it happens in Worker 3.
> 4. I might setup 10 workers like this.
> 5. When all the Bolts in the workers are finished, they send their outputs
> to a single Bolt in Worker 11.
> 6. The Bolt in Worker 11 writes the processed value to a new MySQL table.
>
> *My confusion here is in how to make the database iterations happen batch
> by batch, parallelly*. Obviously the database connection would have to be
> made in some static class outside the workers, but if workers are started
> with just "conf.setNumWorkers(2);", then how do I tell the workers to
> iterate different rows of the database? Assuming that the workers are
> running in different machines.
>
> --
> Regards,
> Navin
>



-- 
Thanks & Regards,
Anshu Shukla
