Hi Maxim, I think Ken's approach is a good idea. However, you would need to a add a stateful operator to join the results of the individual queries if that is needed. In order to join the results, you would need a unique id on which you can keyBy() to collect all 20 records that originated from the same input record.
Best, Fabian 2018-04-03 19:39 GMT+02:00 Ken Krugler <kkrugler_li...@transpac.com>: > Hi Maxim, > > If reducing latency is the goal, then option #1 seems better. > > Though you’d need additional logic inside of your AsyncFunction to run all > 20 queries in parallel. > > I’d also consider a third option... > > Use a FlatMapFunction to create 20 copies of the event (assuming it’s not > large), with an additional field indicating which query should be made. > > Follow that with a rebalance(), and a single AsyncFunction that makes the > appropriate query for the event, based on this new field. > > Then make sure you’ve got sufficient parallelism for your AsyncFunction to > handle this fan-out. > > This should let you run the queries for a single event in parallel. > > — Ken > > > > On Apr 3, 2018, at 9:59 AM, Maxim Parkachov <lazy.gop...@gmail.com> > wrote: > > > > Hi everyone, > > > > I'm writing streaming job which needs to query Cassandra for each event > multiple times, around 20. I would like to use Async IO for that but not > sure which option to choose: > > > > 1. Implement One AsyncFunction with 20 queries inside > > 2. Implement 20 AsyncFunctions, each with 1 query inside > > > > Taking into account that each event needs all queries. Reduce amount of > queries for each record is not an option. > > > > In this case I would like to minimise processing time of event, even if > throughput will suffer. Any advice or consideration is greatly appreciated. > > > > Thanks, > > Maxim. > > > > -------------------------------------------- > http://about.me/kkrugler > +1 530-210-6378 > >