Kushan, that's strange. If you are using fieldsGrouping, then this
shouldn't be a problem, as there is only one instance of your bolt
updating any given (x, y) value. It would probably help if you could
paste the TopologyBuilder part of your code. -Harsha


On Tue, Jan 20, 2015, at 01:11 PM, Kushan Maskey wrote:
> Not at the moment. We have been using KafkaSpout for all the other
> projects but have not looked into using Trident. How would it help
> resolve the issue we are facing at the moment? We also need to keep in
> mind the development time it would take to implement Trident, while
> KafkaSpout has been working fine with all the other projects.
>
> --
> Kushan Maskey
>
> On Tue, Jan 20, 2015 at 3:05 PM, Rajiv Onat <[email protected]> wrote:
>> Seems like stateful processing. Have you looked at using Trident?
>>
>> -Rajiv
>>
>> On Jan 20, 2015, at 12:26 PM, Kushan Maskey
>> <[email protected]> wrote:
>>
>>> Thanks Keith and Itai,
>>>
>>> We are using fieldsGrouping. Initially we were using shuffleGrouping;
>>> we saw this problem and then moved to fieldsGrouping, with better
>>> results, until now. I am thinking the bolt's parallelism, which we
>>> have set to 4, is the culprit here. My understanding is that
>>> parallelism means threading; correct me if I am wrong.
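On the parallelism question: in Storm, the parallelism hint given when declaring a bolt sets its number of executors, which are threads, so a hint of 4 does mean 4 threads. Threads by themselves do not cause a lost update as long as every update for a given key runs on the same thread. The following Java sketch models that with plain `java.util.concurrent` executors instead of Storm; the class name, the "X|Y" key format and the message counts are illustrative assumptions, and a `ConcurrentHashMap` stands in for the Cassandra table.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class PartitionedUpdates {
    // Stand-in for the column family: key "X|Y" -> aggregated Z.
    // ConcurrentHashMap makes the map itself safe to share between threads;
    // per-key correctness comes from the routing below, not from locking.
    static final Map<String, Long> table = new ConcurrentHashMap<>();

    // Sends messagesPerKey "+1" updates for every key through `parallelism`
    // single-threaded workers and returns the sum of all stored Z values.
    static long run(int parallelism, int messagesPerKey, List<String> keys) {
        table.clear();
        List<ExecutorService> workers = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            workers.add(Executors.newSingleThreadExecutor()); // one thread per "bolt instance"
        }
        for (int m = 0; m < messagesPerKey; m++) {
            for (String key : keys) {
                // Route by key hash: every update for a key runs on the same thread.
                ExecutorService owner = workers.get(Math.floorMod(key.hashCode(), parallelism));
                owner.submit(() -> {
                    // Plain read-modify-write (not atomic). Safe here only because
                    // no other thread ever reads or writes this key.
                    long z = table.getOrDefault(key, 0L);
                    table.put(key, z + 1);
                });
            }
        }
        for (ExecutorService w : workers) {
            w.shutdown();
            try {
                w.awaitTermination(1, TimeUnit.MINUTES);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return table.values().stream().mapToLong(Long::longValue).sum();
    }

    public static void main(String[] args) {
        List<String> keys = List.of("x1|y1", "x1|y2", "x2|y1", "x2|y2", "x3|y3");
        long total = run(4, 10_000, keys);
        System.out.println("total = " + total + " (expected " + 5 * 10_000 + ")");
    }
}
```

If the routing line were replaced by random or round-robin assignment (roughly what shuffleGrouping does), two threads could interleave their get and put on the same key and updates could be lost.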
>>>
>>> --
>>> Kushan Maskey
>>>
>>> On Tue, Jan 20, 2015 at 1:03 PM, Itai Frenkel <[email protected]>
>>> wrote:
>>>> Hello,
>>>>
>>>> Are you familiar with fields grouping? The idea is that the same
>>>> bolt instance would always update the value of a specific key
>>>> (similar to web load balancer cookie stickiness).
>>>>
>>>> https://storm.apache.org/documentation/Concepts.html
>>>>
>>>> "*Fields grouping*: The stream is partitioned by the fields
>>>> specified in the grouping. For example, if the stream is grouped by
>>>> the "user-id" field, tuples with the same "user-id" will always go
>>>> to the same task, but tuples with different "user-id"'s may go to
>>>> different tasks."
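The stickiness comes from choosing the task based on a hash of the grouping fields. A minimal Java sketch of that idea, not Storm's source: `chooseTask` is a hypothetical helper, and Storm's exact hash function may differ, but the property illustrated is the same: equal field values always map to the same task index.

```java
import java.util.Arrays;
import java.util.List;

public class FieldsRouting {
    // Hypothetical helper: map the grouping-field values of a tuple to a task index.
    // List.hashCode() is computed from the elements, so equal (x, y) pairs always
    // hash the same and therefore always land on the same task.
    static int chooseTask(List<?> groupingValues, int numTasks) {
        // floorMod keeps the index in [0, numTasks) even when the hash is negative.
        return Math.floorMod(groupingValues.hashCode(), numTasks);
    }

    public static void main(String[] args) {
        int numTasks = 4; // the bolt parallelism mentioned in this thread
        List<String> first = Arrays.asList("x1", "y1");
        List<String> again = Arrays.asList("x1", "y1"); // equal but distinct instance
        System.out.println(chooseTask(first, numTasks) == chooseTask(again, numTasks)); // true
    }
}
```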
>>>>
>>>> Itai
>>>>
>>>> *From:* Kushan Maskey <[email protected]>
>>>> *Sent:* Tuesday, January 20, 2015 8:55 PM
>>>> *To:* [email protected]
>>>> *Subject:* URGENT!! Race condition
>>>>
>>>> We are having a major issue updating a Cassandra database: we are
>>>> seeing a race condition in a bolt.
>>>>
>>>> Here is an example:
>>>>
>>>> I have a column family with 2 partitioning columns, say X and Y.
>>>> There is another column, Z, which is basically an aggregated
>>>> number. We are supposed to update Z based on X and Y. Storm is
>>>> reading a huge volume of data from Kafka. When the spout receives
>>>> a message, the first bolt reads the database for that combination
>>>> of X and Y and gets the value of Z. Then it updates the value of Z
>>>> and stores it back into the database. Bolt parallelism is set to
>>>> 4, which means 4 instances of the bolt are trying to update the
>>>> database. So when the first bolt (B1) reads the value of Z as,
>>>> say, 100, the second bolt (B2) reads it as 100 at the same time.
>>>> Once B1 completes execution, the value of Z is 150, but B2 still
>>>> has 100, so the value of Z is out of sync.
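The lost update described above can be replayed deterministically. In this Java sketch an in-memory map stands in for the column family, the two "bolts" are just sequential steps on one thread, and the increments are made-up values (B1 adds 50, matching the 100 to 150 in the example; B2's 30 is an assumption).

```java
import java.util.HashMap;
import java.util.Map;

public class LostUpdateReplay {
    // In-memory stand-in for the column family: key "X|Y" -> Z.
    static final Map<String, Integer> table = new HashMap<>();

    // Replays the problematic interleaving step by step so the outcome is
    // deterministic. Returns the Z value left in the "table" at the end.
    static int replay(int initialZ, int b1Delta, int b2Delta) {
        String key = "x1|y1"; // hypothetical (X, Y) combination
        table.put(key, initialZ);

        int b1Read = table.get(key);      // B1 reads Z
        int b2Read = table.get(key);      // B2 reads Z before B1 has written
        table.put(key, b1Read + b1Delta); // B1 writes its result
        table.put(key, b2Read + b2Delta); // B2 overwrites it using its stale read

        return table.get(key);
    }

    public static void main(String[] args) {
        int finalZ = replay(100, 50, 30);
        System.out.println("final Z = " + finalZ + ", expected " + (100 + 50 + 30));
    }
}
```

B1's write of 150 is silently overwritten, so the stored value ends at 130 instead of 180. This is the read-modify-write interleaving that routing each (X, Y) to a single bolt task is meant to rule out.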
>>>>
>>>> How can we prevent a race condition like this? This is causing a
>>>> major problem for us.
>>>>
>>>> Any help is highly appreciated. Thanks.
>>>>
>>>> --
>>>> Kushan Maskey
>>>>
>>>
>
