We are having a major issue updating a Cassandra database: we are seeing
a race condition in a Storm bolt.

Here is an example.

I have a column family with 2 partitioning columns, say X and Y. There is
another column, Z, which is basically an aggregated number. We are supposed
to update Z based on X and Y. Storm is reading a huge volume of data from
Kafka. When the spout receives a message, the first bolt reads the database
for that combination of X and Y and gets the value of Z. Then it updates
the value of Z and stores it back into the database. Bolt parallelism is
set to 4, which means 4 instances of the bolt are trying to update the
database. So when the first bolt instance (B1) reads the value of Z as,
say, 100, the second instance (B2) reads it as 100 at the same time. Once
B1 has completed execution and the value of Z is now 150, B2 still holds
100, so its write overwrites B1's update and the value of Z is out of sync.
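
To make the interleaving concrete, here is a minimal sketch of it in plain
Java. It is not our actual bolt code: the in-memory map is only a stand-in
for the column family, the class name, the key "x1:y1" and B2's increment
of 30 are made up for illustration, and the two reads are sequenced by hand
rather than with real threads so the result is repeatable.

```java
import java.util.HashMap;
import java.util.Map;

class LostUpdateDemo {
    // Replays the read-modify-write sequence described above.
    // The map is a hypothetical stand-in for the table: key "X:Y" -> Z.
    static long simulate() {
        Map<String, Long> table = new HashMap<>();
        String key = "x1:y1";          // assumed combination of X and Y
        table.put(key, 100L);          // Z starts at 100

        long b1Read = table.get(key);  // B1 reads Z = 100
        long b2Read = table.get(key);  // B2 reads Z = 100 before B1 writes

        table.put(key, b1Read + 50);   // B1 writes 150
        table.put(key, b2Read + 30);   // B2 writes 130, overwriting B1's update

        return table.get(key);
    }

    public static void main(String[] args) {
        // Both increments applied correctly would give 100 + 50 + 30 = 180.
        System.out.println("Z after both updates: " + simulate()
                + " (180 if no update were lost)");
    }
}
```

B2's write is based on a stale read, so B1's increment of 50 is lost and Z
ends up lower than the sum of all updates.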

How can we prevent a race condition like this? It is causing us major
problems.

Any help is highly appreciated. Thanks.

--
Kushan Maskey
