There's no such thing as a total order in a distributed system, as streams
are processed in parallel. The ordering guarantee Storm provides is that
tuples sent between tasks are received in the order they were sent.
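To make that concrete, here's a small Python illustration (not Storm code, just a model) of why per-link FIFO ordering doesn't add up to a total order: a downstream task consuming from two upstream tasks can observe any interleaving that preserves each sender's order.

```python
# Illustration only: enumerate the arrival orders a receiver could see
# when merging two FIFO streams. Each sender's order is preserved, but
# the interleaving between senders is arbitrary.

def order_preserving_merges(xs, ys):
    """All interleavings of xs and ys that keep each list's own order."""
    if not xs:
        return [list(ys)]
    if not ys:
        return [list(xs)]
    return ([[xs[0]] + m for m in order_preserving_merges(xs[1:], ys)] +
            [[ys[0]] + m for m in order_preserving_merges(xs, ys[1:])])

task1 = ["a1", "a2"]  # tuples from upstream task 1, in send order
task2 = ["b1", "b2"]  # tuples from upstream task 2, in send order

merges = order_preserving_merges(task1, task2)
print(len(merges))  # 6 possible arrival orders at the receiver
```

Every one of those merges keeps a1 before a2 and b1 before b2, yet there are six distinct arrival orders, so the receiver cannot assume any global ordering.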

Another part of your question is what kind of ordering guarantees you get
during failures. With regular Storm, when a tuple fails, it is up to the
spout to determine what to re-emit. Suppose, for example, your spout emits
tuples A, B, C, D, E and tuple C fails. A spout like KestrelSpout would
re-emit only tuple C. KafkaSpout, on the other hand, would also re-emit all
tuples after the failed tuple. So it would re-emit C, D, and E, even if D
and E were successfully processed.
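As a rough sketch of those two replay strategies (plain Python, a simplified model of the behavior described above, not the actual spout implementations):

```python
# Simplified model of the two replay strategies: per-message acking
# (KestrelSpout-style) vs. offset-based replay (KafkaSpout-style).

def kestrel_style_replay(emitted, failed):
    """Re-emit only the tuples that failed (per-message acking)."""
    return [t for t in emitted if t in failed]

def kafka_style_replay(emitted, failed):
    """Re-emit from the earliest failed tuple onward (offset-based)."""
    first_failed = min(emitted.index(t) for t in failed)
    return emitted[first_failed:]

emitted = ["A", "B", "C", "D", "E"]
print(kestrel_style_replay(emitted, {"C"}))  # ['C']
print(kafka_style_replay(emitted, {"C"}))    # ['C', 'D', 'E']
```

The offset-based strategy re-emits D and E even though they succeeded, which is why downstream code consuming from a Kafka-backed spout must tolerate duplicates.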

Trident provides stronger ordering guarantees: there is a total ordering
among the commit phases for batches. So if a batch fails to commit, it will
be retried until it succeeds. See
http://storm.apache.org/documentation/Trident-state.html and
http://storm.apache.org/documentation/Trident-spouts.html for more info on
this.
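The key idea behind Trident's transactional state can be sketched like this (simplified Python, assuming ordered commits and same-txid retries; see the links above for the real semantics):

```python
# Simplified sketch of transactional state: store the txid alongside the
# value, and skip a batch whose txid was already applied. Because commits
# are totally ordered and a retried batch keeps its txid, retries are
# idempotent with respect to batches.

class TransactionalState:
    def __init__(self):
        self.last_txid = None
        self.value = 0

    def apply_batch(self, txid, delta):
        if txid == self.last_txid:
            return  # this batch already committed; ignore the retry
        self.value += delta
        self.last_txid = txid

state = TransactionalState()
state.apply_batch(1, 10)
state.apply_batch(1, 10)  # retried batch: skipped, no double count
state.apply_batch(2, 5)
print(state.value)  # 15
```

This is why the total ordering of commits matters: the state only has to remember the single most recent txid to make retries safe.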

On Wed, Jan 21, 2015 at 2:34 PM, Shawn Bonnin <[email protected]> wrote:

> Trying to look for patterns in the input stream based on the arrival
> sequence. We can use something like Kafka on the input to guarantee order,
> but once the tuples enter the topology, how can we make sure that they are
> processed in the same order as they arrived on Kafka?
>
> On Wed, Jan 21, 2015 at 11:30 AM, Naresh Kosgi <[email protected]>
> wrote:
>
>> Also more information about why you need a certain order for processing
>> would help in recommending how to approach the problem
>>
>> On Wed, Jan 21, 2015 at 2:28 PM, Naresh Kosgi <[email protected]>
>> wrote:
>>
>>> Storm as a framework does not guarantee order.  You will have to code it
>>> yourself if you would like your tuples processed in a certain order.
>>>
>>> On Wed, Jan 21, 2015 at 2:24 PM, Shawn Bonnin <[email protected]>
>>> wrote:
>>>
>>>> Resending...
>>>>
>>>> Our use case requires the tuples be processed in order across failures.
>>>>
>>>> So we have SpoutA sending data to bolts B & C, and bolt D is the last
>>>> bolt that aggregates data from B & C and writes to a database.
>>>>
>>>> We want to make sure that when we use tuple-at-a-time processing or use
>>>> the Trident API, the data always gets processed in the same order as it was
>>>> read by our spout. Given that between bolts B & C there would be parallelism
>>>> and intermittent failures, my question is the following:
>>>>
>>>> How does Storm guarantee processing order of tuples?
>>>>
>>>> Thanks in advance!
>>>>
>>>
>>
>


-- 
Twitter: @nathanmarz
http://nathanmarz.com
