Use a message bus with a transactional get: get the message, send it to Cassandra and, upon write success, submit it to the ESP and commit the get on the bus. Messaging systems like RabbitMQ support these semantics.
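A minimal sketch of that flow with the Java RabbitMQ and DataStax drivers (the queue name, events table, and EspClient stub are invented for illustration):

    import java.nio.charset.StandardCharsets;

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;
    import com.rabbitmq.client.AMQP;
    import com.rabbitmq.client.Channel;
    import com.rabbitmq.client.ConnectionFactory;
    import com.rabbitmq.client.DefaultConsumer;
    import com.rabbitmq.client.Envelope;

    public class BusToCassandraToEsp
    {
        public static void main(String[] args) throws Exception
        {
            Session session = Cluster.builder()
                    .addContactPoint("127.0.0.1").build().connect("my_ks");

            Channel channel = new ConnectionFactory().newConnection().createChannel();
            channel.basicQos(1); // at most one unacknowledged message in flight

            // autoAck=false: the "get" is committed only by the basicAck below
            channel.basicConsume("mutations", false, new DefaultConsumer(channel)
            {
                @Override
                public void handleDelivery(String tag, Envelope env,
                        AMQP.BasicProperties props, byte[] body) throws java.io.IOException
                {
                    String payload = new String(body, StandardCharsets.UTF_8);
                    try
                    {
                        // 1) persist in Cassandra first
                        session.execute("INSERT INTO events (id, payload) VALUES (uuid(), ?)",
                                payload);
                        // 2) write succeeded: hand the event to the ESP
                        EspClient.submit(payload);
                        // 3) commit the get: the broker may now drop the message
                        channel.basicAck(env.getDeliveryTag(), false);
                    }
                    catch (Exception e)
                    {
                        // leave the message on the bus for redelivery
                        channel.basicNack(env.getDeliveryTag(), false, true);
                    }
                }
            });
        }

        // stand-in for whatever client the ESP actually exposes
        static class EspClient
        {
            static void submit(String event) { /* forward to the ESP */ }
        }
    }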
Using Cassandra as a queuing mechanism is an anti-pattern.

--
Colin Clark
+1-320-221-9531

> On Jan 3, 2015, at 6:07 PM, Hugo José Pinto <hugo.pi...@inovaworks.com> wrote:
>
> Thank you all for your answers.
>
> It seems I'll have to go with some event-driven processing before/during the Cassandra write path.
>
> My concern is that I'd love to first guarantee the Cassandra disk write and only then do the event processing (which is mostly CRUD intercepts at this point), even if slightly delayed; doing that via triggers would probably bog down the whole processing pipeline.
>
> What I'd probably do is write, in a trigger, to a separate key table holding all the CRUDed elements and have the ESP process that table (see the trigger sketch at the end of this thread).
>
> Thank you for your contribution. Should anyone else have experience with these scenarios, I'm obviously all ears as well.
>
> Best,
>
> Hugo
>
> Sent from my iPhone
>
> On 03/01/2015, at 11:09, DuyHai Doan <doanduy...@gmail.com> wrote:
>
>> Hello Hugo
>>
>> I was facing the same kind of requirement from some users. Long story short, below are the possible strategies, with the advantages and drawbacks of each:
>>
>> 1) Put Spark in front of the back end: every incoming modification/update/insert goes into Spark first, and Spark then forwards it to Cassandra for persistence. With Spark you can perform pre- or post-processing and notify external clients of mutations.
>>
>> The drawback of this solution is that all incoming mutations must go through Spark. You may set up a Kafka queue as temporary storage to distribute the load and consume mutations with Spark (see the Kafka sketch at the end of this thread), but that adds to the architectural complexity with additional components and technologies.
>>
>> 2) For high availability and resilience, you probably want all mutations saved into Cassandra first and the notifications then processed with Spark. In this case the only way to get notifications out of Cassandra, as of version 2.1, is to rely on manually coded triggers, which are still an experimental feature.
>>
>> With triggers you can notify whatever clients you want, not only Spark.
>>
>> The big drawback of this solution is that playing with triggers is dangerous if you are not familiar with Cassandra internals. The trigger sits on the write path and may hurt performance if it performs complex or blocking tasks.
>>
>> Those are the two solutions I can see; maybe other ML members will propose other innovative choices.
>>
>> Regards
>>
>>> On Sat, Jan 3, 2015 at 11:46 AM, Hugo José Pinto <hugo.pi...@inovaworks.com> wrote:
>>> Hello.
>>>
>>> We're currently using Hazelcast (http://hazelcast.org/) as a distributed in-memory data grid. That's been working sort-of-well for us, but going solely in-memory has reached its limits in our use case, and we're considering porting our application to a NoSQL persistent store. After the usual comparisons and evaluations, we're borderline close to picking Cassandra, plus eventually Spark for analytics.
>>>
>>> Nonetheless, there is a gap in our architectural needs that we're still not grasping how to solve in Cassandra (with or without Spark): Hazelcast allows us to create a Continuous Query so that, whenever a row is added to, removed from, or modified in the clause's result set, Hazelcast calls us back with the corresponding notification (see the Hazelcast sketch at the end of this thread). We use this to continuously update the clients via AJAX streaming with the new/changed rows.
>>>
>>> This is probably a conceptual mismatch on our part, so: how do we best address this use case in Cassandra (with or without Spark's help)? Is there something in the API that allows for Continuous Queries on key/clause changes (we haven't found it)? Is there some other way to get a stream of key/clause updates? Events of some sort?
>>>
>>> I'm aware that we could, eventually, poll Cassandra periodically, but in our use case the client is potentially interested in a large number of table clause notifications (think "all changes to Ship positions on California's coastline"), and iterating over the store would kill the streamer's scalability.
>>>
>>> Hence, the magic question: what are we missing? Is Cassandra the wrong tool for the job? Are we not aware of a particular part of the API, or of an external library in or outside the Apache realm, that would allow for this?
>>>
>>> Many thanks for any assistance!
>>>
>>> Hugo
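For what it's worth, Hugo's trigger-plus-key-table idea could look roughly like the sketch below against the Cassandra 2.1 trigger API, modeled on the InvertedIndex example shipped with Cassandra. The my_ks keyspace and crud_log table are invented, and crud_log is assumed to share the source table's comparator:

    import java.nio.ByteBuffer;
    import java.util.Collection;
    import java.util.Collections;

    import org.apache.cassandra.db.Cell;
    import org.apache.cassandra.db.ColumnFamily;
    import org.apache.cassandra.db.Mutation;
    import org.apache.cassandra.triggers.ITrigger;

    // Logs every mutated row into a separate crud_log table,
    // which the ESP can then drain asynchronously.
    public class CrudLogTrigger implements ITrigger
    {
        public Collection<Mutation> augment(ByteBuffer partitionKey, ColumnFamily update)
        {
            // One log row per mutated partition, keyed by the same partition key
            Mutation log = new Mutation("my_ks", partitionKey);
            // Reuse the incoming cell names and values (the same trick
            // the InvertedIndex example uses)
            for (Cell cell : update)
                log.add("crud_log", cell.name(), cell.value(), System.currentTimeMillis());
            return Collections.singletonList(log);
        }
    }

The jar goes into the node's triggers directory and is attached with CREATE TRIGGER crud_log ON my_ks.my_table USING 'CrudLogTrigger'. As DuyHai notes, augment() runs on the write path, so it must stay simple and non-blocking.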
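On DuyHai's option 1, the ingress side of the Kafka buffer is just an ordinary producer; the topic name and payload below are invented, and the Spark consumer on the other side would do the Cassandra write and the client notifications:

    import java.util.Properties;

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class MutationPublisher
    {
        public static void main(String[] args)
        {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

            // Mutations are published to the "mutations" topic instead of being
            // written to Cassandra directly; Spark consumes and persists them.
            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props))
            {
                producer.send(new ProducerRecord<>("mutations",
                        "ship-42", "{\"lat\": 36.6, \"lon\": -121.9}"));
            }
        }
    }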
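And for anyone unfamiliar with the Hazelcast feature being replaced, a continuous query is registered roughly as below; the map name, predicate, and String values are invented for illustration:

    import com.hazelcast.core.EntryAdapter;
    import com.hazelcast.core.EntryEvent;
    import com.hazelcast.core.Hazelcast;
    import com.hazelcast.core.HazelcastInstance;
    import com.hazelcast.core.IMap;
    import com.hazelcast.query.SqlPredicate;

    public class ShipWatcher
    {
        public static void main(String[] args)
        {
            HazelcastInstance hz = Hazelcast.newHazelcastInstance();
            IMap<String, String> ships = hz.getMap("ships");

            // The listener fires only for entries matching the predicate;
            // this callback is what feeds the AJAX streamer today.
            ships.addEntryListener(new EntryAdapter<String, String>()
            {
                @Override
                public void entryAdded(EntryEvent<String, String> event)
                {
                    push(event);
                }

                @Override
                public void entryUpdated(EntryEvent<String, String> event)
                {
                    push(event);
                }
            }, new SqlPredicate("region = 'CA-coast'"), true);
        }

        // stand-in for the AJAX streaming push
        static void push(EntryEvent<String, String> event)
        {
            System.out.println(event.getKey() + " -> " + event.getValue());
        }
    }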