I am only using fieldsGrouping on X, not on Y. Is it necessary to group by both fields? Is there a sample document I can look at? Thanks.
--
Kushan Maskey
817.403.7500
M. Miller & Associates <http://mmillerassociates.com/>
[email protected]

On Tue, Jan 20, 2015 at 3:14 PM, Nathan Leung <[email protected]> wrote:

> Which fields are you doing fieldsGrouping on? If you do fields grouping
> on X and Y, why are you having a race condition in a separate bolt task?
> Each X and Y combo should always go to the same bolt task with
> fieldsGrouping, and the scenario you describe should work properly whether
> you have 1 task, 4 tasks, or 100 tasks.
>
> On Tue, Jan 20, 2015 at 4:11 PM, Kushan Maskey <[email protected]> wrote:
>
>> Not at the moment. We have been using KafkaSpout for all the other
>> projects but have not looked into using Trident. How would it help
>> resolve the issue we are facing at the moment? We also need to keep in
>> mind the development time it would take to implement Trident, while
>> KafkaSpout has been working fine with all the other projects.
>>
>> --
>> Kushan Maskey
>>
>> On Tue, Jan 20, 2015 at 3:05 PM, Rajiv Onat <[email protected]> wrote:
>>
>>> Seems like stateful processing, have you looked at using Trident?
>>>
>>> -Rajiv
>>>
>>> On Jan 20, 2015, at 12:26 PM, Kushan Maskey <[email protected]> wrote:
>>>
>>> Thanks Keith and Itai,
>>>
>>> We are using fieldsGrouping. Initially we were using shuffleGrouping;
>>> we saw this problem and then moved to fieldsGrouping, with better
>>> results, until now. I think the bolt's parallelism, which we have set
>>> to 4, is the culprit here. My understanding is that parallelism means
>>> threading; correct me if I am wrong.
>>>
>>> --
>>> Kushan Maskey
>>>
>>> On Tue, Jan 20, 2015 at 1:03 PM, Itai Frenkel <[email protected]> wrote:
>>>
>>>> Hello,
>>>>
>>>> Are you familiar with fields grouping? The idea is that the same bolt
>>>> instance would always update the value of a specific key (similar to
>>>> web load balancer cookie stickiness).
>>>>
>>>> https://storm.apache.org/documentation/Concepts.html
>>>>
>>>> "Fields grouping: The stream is partitioned by the fields specified
>>>> in the grouping. For example, if the stream is grouped by the
>>>> "user-id" field, tuples with the same "user-id" will always go to the
>>>> same task, but tuples with different "user-id"'s may go to different
>>>> tasks."
>>>>
>>>> Itai
>>>>
>>>> ------------------------------
>>>>
>>>> From: Kushan Maskey <[email protected]>
>>>> Sent: Tuesday, January 20, 2015 8:55 PM
>>>> To: [email protected]
>>>> Subject: URGENT!! Race condition
>>>>
>>>> We are having a major issue trying to update a Cassandra database,
>>>> where we see a race condition in a bolt.
>>>>
>>>> Here is an example.
>>>>
>>>> I have a column family with 2 partitioning columns, say X and Y.
>>>> There is another column, Z, which is basically an aggregated number.
>>>> We are supposed to update Z based on X and Y. Storm is reading a huge
>>>> volume of data from Kafka. When the spout receives a message, the
>>>> first bolt reads the database for that combination of X and Y and
>>>> gets the value of Z. Then it updates the value of Z and stores it
>>>> back into the database. Bolt parallelism is set to 4, which means 4
>>>> instances of the bolt are trying to update the database. So when the
>>>> first bolt (B1) reads the value of Z as, say, 100, the second bolt
>>>> (B2) reads it as 100 at the same time. Once B1 completes execution
>>>> the value of Z is 150, but B2 still has 100, so the value of Z ends
>>>> up out of sync.
>>>>
>>>> How can we prevent a race condition like this? This is causing a
>>>> major nuisance to us.
>>>>
>>>> Any help is highly appreciated. Thanks.
>>>>
>>>> --
>>>> Kushan Maskey
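
On the question at the top of the thread (grouping on X alone versus on X and Y), here is a minimal sketch of the routing idea behind fields grouping. It is a simplified model and not Storm's actual implementation: the class `FieldsGroupingSketch`, the method `taskFor`, and the sample values are all hypothetical, and the hash-then-modulo rule is an assumption that only captures the property quoted from the Concepts page, namely that equal grouping values always map to the same task. Under that property, grouping on X alone already sends every tuple with a given X, and therefore every (X, Y) pair with that X, to a single task; adding Y to the grouping only lets different Y values for the same X spread across tasks.

```java
import java.util.Arrays;

// Simplified model of fields grouping (NOT Storm's real code): the task is
// chosen from a hash of the grouping field values, so equal values always
// land on the same task.
//
// In a real topology the grouping would be declared roughly as
//   builder.setBolt("aggregate", new AggregateBolt(), 4)
//          .fieldsGrouping("kafka-spout", new Fields("X", "Y"));
// where "aggregate", AggregateBolt and "kafka-spout" are hypothetical names.
final class FieldsGroupingSketch {

    // Pick a task index in [0, numTasks) from the grouping values.
    static int taskFor(int numTasks, Object... groupingValues) {
        return Math.floorMod(Arrays.hashCode(groupingValues), numTasks);
    }

    public static void main(String[] args) {
        int numTasks = 4; // matches the bolt parallelism mentioned in the thread

        // Sample (X, Y) pairs; the first and last are the same key.
        String[][] tuples = {
            {"x1", "y1"}, {"x1", "y2"}, {"x2", "y1"}, {"x1", "y1"}
        };

        for (String[] t : tuples) {
            int byX = taskFor(numTasks, t[0]);        // grouping on X only
            int byXY = taskFor(numTasks, t[0], t[1]); // grouping on X and Y
            System.out.printf("X=%s Y=%s -> task by X: %d, task by (X, Y): %d%n",
                    t[0], t[1], byX, byXY);
        }
    }
}
```

In this model the two `x1`/`y1` tuples print the same task index under either grouping, which is the guarantee Nathan and Itai describe. If two tasks still read the same (X, Y) row at the same time, the grouping fields on the edge into that bolt are worth double-checking first.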
