Kushan,
That's strange. If you are using fieldsGrouping, then this shouldn't be a problem, because only one instance of your bolt updates the value for any given (x,y). It would probably help if you could paste the TopologyBuilder part of your code.
-Harsha
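For reference, here is a minimal stand-alone sketch of why a fields grouping on both key fields should send every tuple for a given (x,y) to one task. It does not use Storm itself. The class name, the method names, and the "hash modulo task count" routing rule are assumptions made for illustration, not Storm's actual implementation. In a real topology the equivalent wiring would be a `fieldsGrouping(<upstream component id>, new Fields("x", "y"))` call on the bolt declarer, where "x" and "y" stand in for whatever field names the upstream component really declares.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Stand-alone illustration (no Storm dependency) of key-to-task stickiness.
class FieldsGroupingSketch {

    // Assumed routing rule: hash the grouping values, then take it modulo the task count.
    // floorMod keeps the result in [0, numTasks) even for negative hash codes.
    static int taskFor(List<String> groupingValues, int numTasks) {
        return Math.floorMod(groupingValues.hashCode(), numTasks);
    }

    // Records, for each distinct (x,y) key, the set of tasks that received it.
    static Map<String, Set<Integer>> tasksSeenPerKey(List<List<String>> tuples, int numTasks) {
        Map<String, Set<Integer>> seen = new TreeMap<>();
        for (List<String> tuple : tuples) {
            String key = String.join(",", tuple);
            seen.computeIfAbsent(key, k -> new TreeSet<>()).add(taskFor(tuple, numTasks));
        }
        return seen;
    }

    public static void main(String[] args) {
        // Hypothetical (x,y) values; the first key appears three times.
        List<List<String>> tuples = Arrays.asList(
            Arrays.asList("x1", "y1"),
            Arrays.asList("x2", "y7"),
            Arrays.asList("x1", "y1"),
            Arrays.asList("x3", "y2"),
            Arrays.asList("x1", "y1"));
        // Every key maps to a set containing exactly one task id.
        System.out.println(tasksSeenPerKey(tuples, 4));
    }
}
```

If the grouping is declared on only one of the two fields, or on a field that does not carry the (x,y) values, tuples for the same key can still reach different tasks, which is why the builder code is worth checking.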
On Tue, Jan 20, 2015, at 01:11 PM, Kushan Maskey wrote:
> Not at the moment. We have been using KafkaSpout for all the other
> projects but have not looked into using Trident. How would it help
> resolve the issue we are facing at the moment? We also need to keep in
> mind the development time it would take to implement Trident, while
> KafkaSpout has been working fine with all the other projects.
>
> --
> Kushan Maskey
>
> On Tue, Jan 20, 2015 at 3:05 PM, Rajiv Onat <[email protected]> wrote:
>> Seems like stateful processing. Have you looked at using Trident?
>>
>> -Rajiv
>>
>> On Jan 20, 2015, at 12:26 PM, Kushan Maskey
>> <[email protected]> wrote:
>>
>>> Thanks Keith and Itai,
>>>
>>> We are using fieldsGrouping. Initially we were using shuffleGrouping;
>>> we saw this problem and then moved to fieldsGrouping, with better
>>> results, until now. I think the bolt's parallelism, which we
>>> have set to 4, is the culprit here. My understanding is that
>>> parallelism means threading; correct me if I am wrong.
>>>
>>> --
>>> Kushan Maskey
>>>
>>> On Tue, Jan 20, 2015 at 1:03 PM, Itai Frenkel <[email protected]>
>>> wrote:
>>>> Hello,
>>>>
>>>> Are you familiar with fields grouping? The idea is that the same
>>>> bolt instance would always update the value of a specific key
>>>> (similar to web load balancer cookie stickiness).
>>>> https://storm.apache.org/documentation/Concepts.html
>>>> "*Fields grouping*: The stream is partitioned by the fields
>>>> specified in the grouping. For example, if the stream is grouped by
>>>> the "user-id" field, tuples with the same "user-id" will always go
>>>> to the same task, but tuples with different "user-id"'s may go to
>>>> different tasks."
>>>>
>>>> Itai
>>>>
>>>> *From:* Kushan Maskey <[email protected]>
>>>> *Sent:* Tuesday, January 20, 2015 8:55 PM
>>>> *To:* [email protected]
>>>> *Subject:* URGENT!! Race condition
>>>>
>>>> We are having a major issue trying to update a Cassandra database,
>>>> where we see a race condition in a bolt.
>>>>
>>>> Here is an example:
>>>>
>>>> I have a column family with two partitioning columns, say X
>>>> and Y. There is another column, Z, which is basically an aggregated
>>>> number. We are supposed to update Z based on X and Y. Storm is
>>>> reading a huge volume of data from Kafka. When the spout receives a
>>>> message, the first bolt reads the database for that combination of
>>>> X and Y and gets the value of Z. Then it updates the value of Z and
>>>> stores it back into the database. Bolt parallelism is set to 4,
>>>> which means 4 instances of the bolt are trying to update the
>>>> database. So when the first bolt instance (B1) reads the value of Z
>>>> as, say, 100, the second instance (B2) reads it as 100 at the same
>>>> time. Once B1 completes execution, the value of Z is 150, but B2
>>>> still holds 100, so the value of Z ends up out of sync.
>>>>
>>>> How can we prevent a race condition like this? This is causing a
>>>> major nuisance to us.
>>>>
>>>> Any help is highly appreciated. Thanks.
>>>>
>>>> --
>>>> Kushan Maskey
>>>>
>>>
>
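To make the read-modify-write problem from the original message concrete, here is a small, deterministic sketch of the interleaving it describes. Everything in it is a stand-in: the in-memory map plays the role of the Cassandra column family, the key "x|y" and the class and method names are hypothetical, and B2's increment of 50 is an assumption (the message only gives B1's result of 150).

```java
import java.util.HashMap;
import java.util.Map;

// Deterministic replay of the lost update described in the thread (no Storm, no Cassandra).
class LostUpdateSketch {
    // In-memory stand-in for the column family: "X|Y" -> Z.
    private final Map<String, Long> table = new HashMap<>();

    long read(String key) { return table.getOrDefault(key, 0L); }

    void write(String key, long z) { table.put(key, z); }

    // B1 and B2 both read Z before either one writes it back.
    static long interleaved(long initial, long deltaB1, long deltaB2) {
        LostUpdateSketch db = new LostUpdateSketch();
        db.write("x|y", initial);
        long seenByB1 = db.read("x|y");
        long seenByB2 = db.read("x|y");      // stale as soon as B1 writes
        db.write("x|y", seenByB1 + deltaB1);
        db.write("x|y", seenByB2 + deltaB2); // overwrites B1's update
        return db.read("x|y");
    }

    // The same two updates applied one after another, as when a single task owns the key.
    static long serialized(long initial, long deltaB1, long deltaB2) {
        LostUpdateSketch db = new LostUpdateSketch();
        db.write("x|y", initial);
        db.write("x|y", db.read("x|y") + deltaB1);
        db.write("x|y", db.read("x|y") + deltaB2);
        return db.read("x|y");
    }

    public static void main(String[] args) {
        System.out.println(interleaved(100, 50, 50)); // prints 150: one update is lost
        System.out.println(serialized(100, 50, 50));  // prints 200
    }
}
```

The second method shows the ordering that a correctly keyed fields grouping is meant to give: all updates for one (X,Y) pass through a single task, so each read sees the previous write.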
