Hi guys,
We're looking at Storm to solve a message processing scenario that needs to scale horizontally to handle a high projected volume. The use case goes like this:

1. Receive messages from an external source.
2. Generate a set of messages from this external input, based on rules.
3. Persist these message sets into a DB. There are no updates; the use case is insert-only.

Currently we have implemented this as a PoC using a spout for step 1, a bolt for step 2 and a bolt for step 3. There is no aggregation or partitioning between steps; we're just using shuffle grouping between the bolts and looking to scale by throwing nodes at it.

We need to guarantee exactly-once processing, but bolt #3 writes to a database. How does one guard against the scenario where the DB transaction succeeds but, for some reason, the spout decides to retry and we end up with duplicate entries?

We don't see the need for batching, and we don't quite see how Trident would help us in this case. If you could suggest alternatives (or point out what we are missing about Trident), we would be very grateful.

Thanks in advance,
JG
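
P.S. To make the failure mode concrete, here is a minimal plain-Java sketch of the scenario we are worried about. It does not use the Storm API, and the class and method names (DuplicateInsertSketch, persist, run) are made up for illustration; it is not our actual PoC code. It only assumes that the DB commit and the ack back to the spout are two separate steps, so the second can fail after the first has succeeded.

```java
import java.util.ArrayList;
import java.util.List;

class DuplicateInsertSketch {
    // Stand-in for the insert-only table that bolt #3 writes to.
    static final List<String> table = new ArrayList<>();

    // Hypothetical persist step: the row is committed first, and the
    // return value says whether the ack then made it back to the spout.
    static boolean persist(String row, boolean ackDelivered) {
        table.add(row);       // the DB transaction commits here
        return ackDelivered;  // the ack can still be lost or time out
    }

    // Simulates at-least-once delivery: the message is replayed
    // because the first ack never arrived.
    static List<String> run() {
        table.clear();
        boolean acked = persist("msg-1", false); // commit succeeded, ack lost
        if (!acked) {
            persist("msg-1", true);              // replay of the same message
        }
        return new ArrayList<>(table);
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [msg-1, msg-1]
    }
}
```

The table ends up with two identical rows even though each individual insert succeeded, which is exactly the duplicate we need to rule out.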
