As you mentioned already: Storm is designed to run topologies forever ;) If your data is finite, why not use a batch processor instead?
As a workaround, you can embed "control messages" in your stream (or use
an additional stream for them).
If you want a topology to shut down itself, you could use
`NimbusClient.getConfiguredClient(conf).getClient().killTopology(name);`
in your spout/bolt code.
Something like:
- Spout emits all data tuples
- Spout emits a special "end of stream" control tuple
- Bolt1 processes everything
- Bolt1 forwards the "end of stream" control tuple once it has received it
- Bolt2 processes everything
- Bolt2 receives the "end of stream" control tuple => flushes to DB => kills
topology
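To make the bookkeeping in the last step concrete, here is a minimal, Storm-free sketch (class and method names are made up for illustration, not Storm API). One subtlety: with several Bolt-A tasks upstream and fields grouping, every Bolt-A task can feed every Bolt-B task, so each Bolt-B task must wait until it has seen one control tuple from *every* upstream task before it flushes:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative helper a Bolt-B task could own: buffers data tuples and
// counts "end of stream" markers from its upstream Bolt-A tasks.
class EndOfStreamTracker {
    private final int upstreamTasks;   // number of Bolt-A tasks feeding this Bolt-B task
    private int markersSeen = 0;
    private final List<String> buffer = new ArrayList<>();

    EndOfStreamTracker(int upstreamTasks) {
        this.upstreamTasks = upstreamTasks;
    }

    // Called for a regular data tuple: just accumulate it.
    void onDataTuple(String value) {
        buffer.add(value);
    }

    // Called for an "end of stream" control tuple; returns true only once
    // every upstream task has signaled end-of-stream.
    boolean onControlMarker() {
        markersSeen++;
        return markersSeen == upstreamTasks;
    }

    // In a real bolt this is where you would write to SQL (and afterwards
    // kill the topology); here it just hands back and clears the buffer.
    List<String> flush() {
        List<String> out = new ArrayList<>(buffer);
        buffer.clear();
        return out;
    }
}
```

In a real bolt you could get the upstream task count from `TopologyContext.getComponentTasks(...)` in `prepare()`, but that wiring is omitted here.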
But I guess this is kind of a weird pattern.
-Matthias
On 05/05/2016 06:13 AM, Navin Ipe wrote:
> Hi,
>
> I know Storm is designed to run forever. I also know about Trident's
> technique of aggregation. But shouldn't Storm have a way to let bolts
> know that a certain bunch of processing has been completed?
>
> Consider this topology:
> Spout------>Bolt-A------>Bolt-B
>   |            |--->Bolt-B
>   |            \--->Bolt-B
>   |--->Bolt-A------>Bolt-B
>   |            |--->Bolt-B
>   |            \--->Bolt-B
>   \--->Bolt-A------>Bolt-B
>                |--->Bolt-B
>                \--->Bolt-B
>
> * From Bolt-A to Bolt-B, it is a FieldsGrouping.
> * Spout emits only a few tuples and then stops emitting.
> * Bolt A takes those tuples and generates millions of tuples.
>
>
> *Bolt-B accumulates tuples that Bolt A sends and needs to know when
> Spout finished emitting. Only then can Bolt-B start writing to SQL.*
>
> *Questions:*
> 1. How can all Bolts B be notified that it is time to write to SQL?
> 2. After all Bolts B have written to SQL, how to know that all Bolts B
> have completed writing?
> 3. How to stop the topology? I know of
> localCluster.killTopology("HelloStorm"), but shouldn't there be a way to
> do it from the Bolt?
>
> --
> Regards,
> Navin
