There's no total order across a distributed system, since streams are processed in parallel. The ordering guarantee Storm does provide is per-link: tuples sent from one task to another are received in the order they were sent.
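To make the failure-replay behavior described below concrete, here is a minimal, self-contained Java sketch. These are not the real KestrelSpout/KafkaSpout APIs; the class and method names are invented purely to model the two replay styles.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the two replay styles discussed below. NOT real Storm APIs:
// every name here is made up to illustrate the semantics only.
public class ReplaySketch {

    // KestrelSpout-style: each message is tracked individually, so only
    // the failed tuple is put back on the queue for replay.
    static List<String> perMessageReplay(List<String> emitted, String failed) {
        List<String> replay = new ArrayList<>();
        replay.add(failed);
        return replay;
    }

    // KafkaSpout-style: position is tracked as an offset into the log, so
    // a failure rewinds to the failed tuple and re-reads everything from
    // there, including tuples that had already succeeded.
    static List<String> offsetReplay(List<String> emitted, String failed) {
        int offset = emitted.indexOf(failed);
        return new ArrayList<>(emitted.subList(offset, emitted.size()));
    }

    public static void main(String[] args) {
        List<String> emitted = List.of("A", "B", "C", "D", "E");
        System.out.println(perMessageReplay(emitted, "C")); // [C]
        System.out.println(offsetReplay(emitted, "C"));     // [C, D, E]
    }
}
```

The difference matters for ordering: with offset-based replay, downstream bolts can see already-processed tuples again, so any order-sensitive logic has to tolerate re-delivery.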
Another part of your question is what ordering guarantees you get during failures. With regular Storm, what gets re-emitted when a tuple fails is up to the spout. Suppose, for example, your spout emits tuples A, B, C, D, E and tuple C fails. A spout like KestrelSpout would re-emit only tuple C. KafkaSpout, on the other hand, would also re-emit all tuples after the failed tuple: it would re-emit C, D, and E, even if D and E were successfully processed.

Trident provides stronger ordering guarantees, as it imposes a total order on the commit phases of batches. So if a batch fails to commit, it will be retried indefinitely until it succeeds. See http://storm.apache.org/documentation/Trident-state.html and http://storm.apache.org/documentation/Trident-spouts.html for more info on this.

On Wed, Jan 21, 2015 at 2:34 PM, Shawn Bonnin <[email protected]> wrote:

> Trying to look for patterns in the input stream based on the arrival
> sequence. We can use something like Kafka on the input to guarantee order,
> but once the tuples enter the topology, how can we make sure that they are
> processed in the same order as they arrived on Kafka?
>
> On Wed, Jan 21, 2015 at 11:30 AM, Naresh Kosgi <[email protected]> wrote:
>
>> Also, more information about why you need a certain order for processing
>> would help in recommending how to approach the problem.
>>
>> On Wed, Jan 21, 2015 at 2:28 PM, Naresh Kosgi <[email protected]> wrote:
>>
>>> Storm as a framework does not guarantee order. You will have to code it
>>> yourself if you would like your tuples processed in a certain order.
>>>
>>> On Wed, Jan 21, 2015 at 2:24 PM, Shawn Bonnin <[email protected]> wrote:
>>>
>>>> Resending...
>>>>
>>>> Our use case requires the tuples be processed in order across failures.
>>>>
>>>> So we have Spout A sending data to Bolts B & C, and Bolt D is the last
>>>> bolt that aggregates data from B & C and writes to a database.
>>>>
>>>> We want to make sure that when we use tuple-at-a-time processing or
>>>> the Trident API, the data always gets processed in the same order as it
>>>> was read by our spout. Given that between Bolts B & C there would be
>>>> parallelism and intermittent failures, my question is the following:
>>>>
>>>> How does Storm guarantee processing order of tuples?
>>>>
>>>> Thanks in advance!

--
Twitter: @nathanmarz
http://nathanmarz.com
