Many thanks once again. I rethought the target data structure, and things started coming together to allow for really elegant, compact ESP preprocessing and storage.
Best.

Sent from my iPhone

On Jan 3, 2015, at 23:53, Peter Lin <wool...@gmail.com> wrote:

> If you like an SQL dialect, try out products that use StreamSQL to do continuous queries. Esper comes to mind. Google to see what other products support StreamSQL.
>
>> On Sat, Jan 3, 2015 at 6:48 PM, Hugo José Pinto <hugo.pi...@inovaworks.com> wrote:
>> Thanks :)
>>
>> Duly noted - this is all uncharted territory for us, hence the value of seasoned advice.
>>
>> Best
>>
>> --
>> Hugo José Pinto
>>
>> On Jan 3, 2015, at 23:43, Peter Lin <wool...@gmail.com> wrote:
>>
>>> Listen to Colin's advice; avoid the temptation of anti-patterns.
>>>
>>>> On Sat, Jan 3, 2015 at 6:10 PM, Colin <colpcl...@gmail.com> wrote:
>>>> Use a message bus with a transactional get: get the message, send it to Cassandra, and upon write success submit it to the ESP, then commit the get on the bus. Messaging systems like RabbitMQ support these semantics.
>>>>
>>>> Using Cassandra as a queuing mechanism is an anti-pattern.
>>>>
>>>> --
>>>> Colin Clark
>>>> +1-320-221-9531
>>>>
>>>>> On Jan 3, 2015, at 6:07 PM, Hugo José Pinto <hugo.pi...@inovaworks.com> wrote:
>>>>>
>>>>> Thank you all for your answers.
>>>>>
>>>>> It seems I'll have to go with some event-driven processing before/during the Cassandra write path.
>>>>>
>>>>> My concern is that I'd love to first guarantee the disk write of the Cassandra persistence and then do the event processing (which is mostly CRUD intercepts at this point), even if slightly delayed, and doing so via triggers would probably bog down the whole processing pipeline.
>>>>>
>>>>> What I'd probably do is have the trigger write to a separate key table with all the CRUDed elements, and have the ESP process that table.
>>>>>
>>>>> Thank you for your contribution. Should anyone else have any experience in these scenarios, I'm obviously all ears as well.
>>>>>
>>>>> Best,
>>>>>
>>>>> Hugo
>>>>>
>>>>> Sent from my iPhone
>>>>>
>>>>> On Jan 3, 2015, at 11:09, DuyHai Doan <doanduy...@gmail.com> wrote:
>>>>>
>>>>>> Hello Hugo,
>>>>>>
>>>>>> I have faced the same kind of requirement from some users. Long story short, below are the possible strategies, with the advantages and drawbacks of each.
>>>>>>
>>>>>> 1) Put Spark in front of the back-end: every incoming modification/update/insert goes into Spark first, then Spark forwards it to Cassandra for persistence. With Spark, you can perform pre- or post-processing and notify external clients of mutations.
>>>>>>
>>>>>> The drawback of this solution is that all incoming mutations must go through Spark. You may set up a Kafka queue as temporary storage to distribute the load and consume mutations with Spark, but that adds to the architecture's complexity with additional components and technologies.
>>>>>>
>>>>>> 2) For high availability and resilience, you probably want all mutations saved into Cassandra first, and then process notifications with Spark. In this case the only way to get notifications from Cassandra, as of version 2.1, is to rely on manually coded triggers (which are still an experimental feature).
>>>>>>
>>>>>> With triggers you can notify whatever clients you want, not only Spark.
>>>>>>
>>>>>> The big drawback of this solution is that playing with triggers is dangerous if you are not familiar with Cassandra internals. Indeed, the trigger is on the write path and may hurt performance if you are doing complex and blocking tasks.
>>>>>>
>>>>>> Those are the two solutions I can see; maybe other mailing list members will propose other innovative choices.
>>>>>>
>>>>>> Regards
>>>>>>
>>>>>>> On Sat, Jan 3, 2015 at 11:46 AM, Hugo José Pinto <hugo.pi...@inovaworks.com> wrote:
>>>>>>> Hello.
>>>>>>>
>>>>>>> We're currently using Hazelcast (http://hazelcast.org/) as a distributed in-memory data grid. That's been working sort-of-well for us, but going solely in-memory has exhausted its path in our use case, and we're considering porting our application to a NoSQL persistent store. After the usual comparisons and evaluations, we're very close to picking Cassandra, plus eventually Spark for analytics.
>>>>>>>
>>>>>>> Nonetheless, there is a gap in our architectural needs that we still don't know how to solve in Cassandra (with or without Spark): Hazelcast allows us to create a Continuous Query so that, whenever a row is added to, removed from, or modified in the clause's result set, Hazelcast calls us back with the corresponding notification. We use this to continuously update the clients via AJAX streaming with the new/changed rows.
>>>>>>>
>>>>>>> This is probably a conceptual mismatch on our part, so: how do we best address this use case in Cassandra (with or without Spark's help)? Is there something in the API that allows for Continuous Queries on key/clause changes (we haven't found it)? Is there some other way to get a stream of key/clause updates? Events of some sort?
>>>>>>>
>>>>>>> I'm aware that we could periodically poll Cassandra, but in our use case the client is potentially interested in a large number of table clause notifications (think "all changes to ship positions on California's coastline"), and iterating out of the store would kill the streamer's scalability.
>>>>>>>
>>>>>>> Hence, the magic question: what are we missing? Is Cassandra the wrong tool for the job? Are we unaware of a particular part of the API, or of an external library inside or outside the Apache realm, that would allow for this?
>>>>>>>
>>>>>>> Many thanks for any assistance!
>>>>>>>
>>>>>>> Hugo
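For readers less familiar with continuous queries, the semantics Hugo describes can be sketched in a few lines of Python. This is a hypothetical, engine-agnostic illustration: it is not Hazelcast's API and not something Cassandra offers natively (per the replies above, as of 2.1 the closest hook is a trigger). The class name, the ADDED/UPDATED/REMOVED event names and the rough California bounding box are all assumptions made for the example. Each mutation is checked against the query's predicate, and the listener is told whether the row entered, changed within, or left the result set.

```python
from typing import Callable, Dict, Optional

Row = Dict[str, float]
Listener = Callable[[str, str, Optional[Row]], None]


class ContinuousQuery:
    """Hypothetical sketch: remembers which keys currently match a predicate
    and emits ADDED / UPDATED / REMOVED events as mutations arrive."""

    def __init__(self, predicate: Callable[[Row], bool], listener: Listener) -> None:
        self.predicate = predicate
        self.listener = listener
        self.matching: Dict[str, Row] = {}  # current result set, by key

    def on_upsert(self, key: str, row: Row) -> None:
        was_in = key in self.matching
        is_in = self.predicate(row)
        if is_in:
            self.matching[key] = row
            self.listener("UPDATED" if was_in else "ADDED", key, row)
        elif was_in:
            # The row still exists, but it no longer satisfies the clause.
            del self.matching[key]
            self.listener("REMOVED", key, row)
        # Rows that neither matched before nor match now produce no event.

    def on_delete(self, key: str) -> None:
        row = self.matching.pop(key, None)
        if row is not None:
            self.listener("REMOVED", key, row)


def off_california(row: Row) -> bool:
    # Assumed, very rough bounding box; a real system would use a proper geo-index.
    return 32.5 <= row["lat"] <= 42.0 and -124.5 <= row["lon"] <= -117.0


events = []
cq = ContinuousQuery(off_california, lambda kind, key, row: events.append((kind, key)))
cq.on_upsert("ship-1", {"lat": 34.0, "lon": -120.0})  # enters the box
cq.on_upsert("ship-1", {"lat": 34.1, "lon": -120.1})  # moves within the box
cq.on_upsert("ship-2", {"lat": 10.0, "lon": -120.0})  # never matches
cq.on_upsert("ship-1", {"lat": 45.0, "lon": -125.0})  # leaves the box
cq.on_delete("ship-2")                                # never matched: no event
print(events)  # [('ADDED', 'ship-1'), ('UPDATED', 'ship-1'), ('REMOVED', 'ship-1')]
```

Note the REMOVED event when a ship sails out of the box: a plain filter over the mutation stream would simply drop that update, but a client maintaining a live view needs it. This is why the query has to remember which keys currently match, rather than just filtering each event in isolation.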
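Colin's recipe (transactional get, write to Cassandra, submit to the ESP, then commit) can be sketched as follows, assuming in-memory stand-ins: `InMemoryBus` is a hypothetical broker whose `get`/`commit`/`rollback` mimic a transactional get, and the Cassandra write and the ESP submission are passed in as plain callables. With RabbitMQ, the equivalent of commit/rollback would be manual consumer acknowledgements (ack, or nack with requeue). The point of the ordering is that nothing reaches the ESP before it is persisted, and nothing is lost if either step fails.

```python
from collections import deque
from typing import Callable, Deque, Iterable, Optional


class InMemoryBus:
    """Hypothetical broker stand-in: a message taken with get() stays in flight
    until commit() drops it for good or rollback() puts it back at the head."""

    def __init__(self, messages: Iterable[str]) -> None:
        self.ready: Deque[str] = deque(messages)
        self.in_flight: Optional[str] = None

    def get(self) -> Optional[str]:
        if self.in_flight is not None:
            raise RuntimeError("previous message not yet committed or rolled back")
        if not self.ready:
            return None
        self.in_flight = self.ready.popleft()
        return self.in_flight

    def commit(self) -> None:
        self.in_flight = None

    def rollback(self) -> None:
        if self.in_flight is not None:
            self.ready.appendleft(self.in_flight)  # redeliver it next
            self.in_flight = None


def process_one(bus: InMemoryBus,
                write_to_cassandra: Callable[[str], None],
                submit_to_esp: Callable[[str], None]) -> bool:
    """Handle one message; returns False when the bus is empty."""
    msg = bus.get()
    if msg is None:
        return False
    try:
        write_to_cassandra(msg)  # 1. persist first
        submit_to_esp(msg)       # 2. only after the write succeeded
    except Exception:
        bus.rollback()           # neither step is lost; the message is redelivered
        raise
    bus.commit()                 # 3. acknowledge only when both steps succeeded
    return True


# Usage: the first Cassandra write fails once and is retried.
stored, esp_seen = [], []
armed = [True]

def flaky_write(msg: str) -> None:
    if armed[0]:
        armed[0] = False
        raise TimeoutError("simulated write timeout")
    stored.append(msg)

bus = InMemoryBus(["m1", "m2"])
while True:
    try:
        if not process_one(bus, flaky_write, esp_seen.append):
            break
    except TimeoutError:
        pass  # the message was rolled back; the loop picks it up again
print(stored, esp_seen)  # ['m1', 'm2'] ['m1', 'm2']
```

This ordering gives at-least-once delivery: if the ESP submission fails after the write succeeded, the message is redelivered and written again. That is usually harmless for plain INSERT/UPDATE statements, which Cassandra treats as upserts; counter updates, by contrast, are not idempotent.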