Hi Maxim,

I think Ken's approach is a good idea. However, you would need to add a
stateful operator to join the results of the individual queries, if that is
needed.
In order to join the results, you would need a unique id on which you can
keyBy() to collect all 20 records that originated from the same input
record.
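
To make the join step concrete, here is a minimal sketch in plain Java
(no Flink dependencies, so it runs on its own). All names in it
(FanOutJoinSketch, Tagged, ResultCollector, eventId, queryIndex) are
hypothetical and only for illustration. The HashMap stands in for what would
be keyed state behind keyBy(eventId) in a real job, and the sketch assumes
that all 20 results for an event eventually arrive.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

public class FanOutJoinSketch {
    static final int QUERIES_PER_EVENT = 20;

    /** One fanned-out copy (or its query result), tagged with the id of the input record. */
    static final class Tagged {
        final String eventId;   // unique id of the originating input record
        final int queryIndex;   // which of the 20 queries this copy is for
        final String payload;   // the event itself, or later the query result

        Tagged(String eventId, int queryIndex, String payload) {
            this.eventId = eventId;
            this.queryIndex = queryIndex;
            this.payload = payload;
        }
    }

    /** Fan-out step: roughly what the FlatMapFunction would emit per input record. */
    static List<Tagged> fanOut(String eventId, String event) {
        List<Tagged> copies = new ArrayList<>();
        for (int i = 0; i < QUERIES_PER_EVENT; i++) {
            copies.add(new Tagged(eventId, i, event));
        }
        return copies;
    }

    /** Join step: buffers results per event id until all 20 have arrived. */
    static final class ResultCollector {
        // Stand-in for keyed state; in a real job this would live behind keyBy(eventId).
        private final Map<String, List<Tagged>> pending = new HashMap<>();

        /** Returns the complete group once the last result arrives, otherwise null. */
        List<Tagged> add(Tagged result) {
            List<Tagged> group =
                pending.computeIfAbsent(result.eventId, k -> new ArrayList<>());
            group.add(result);
            if (group.size() < QUERIES_PER_EVENT) {
                return null;
            }
            pending.remove(result.eventId); // release the buffer once the group is emitted
            return group;
        }

        /** Number of events that are still waiting for results. */
        int pendingEvents() {
            return pending.size();
        }
    }

    public static void main(String[] args) {
        ResultCollector collector = new ResultCollector();
        List<Tagged> a = fanOut(UUID.randomUUID().toString(), "event-A");
        List<Tagged> b = fanOut(UUID.randomUUID().toString(), "event-B");
        // Interleave the two events to mimic results arriving out of order.
        for (int i = 0; i < QUERIES_PER_EVENT; i++) {
            for (Tagged t : List.of(a.get(i), b.get(i))) {
                List<Tagged> done = collector.add(t);
                if (done != null) {
                    System.out.println("joined " + done.size()
                        + " results for " + done.get(0).payload);
                }
            }
        }
    }
}
```

In an actual job you would probably also want a timeout that clears the
buffered results for an event if some of its queries never return, otherwise
the state keeps growing.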

Best, Fabian

2018-04-03 19:39 GMT+02:00 Ken Krugler <kkrugler_li...@transpac.com>:

> Hi Maxim,
>
> If reducing latency is the goal, then option #1 seems better.
>
> Though you’d need additional logic inside of your AsyncFunction to run all
> 20 queries in parallel.
>
> I’d also consider a third option...
>
> Use a FlatMapFunction to create 20 copies of the event (assuming it’s not
> large), with an additional field indicating which query should be made.
>
> Follow that with a rebalance(), and a single AsyncFunction that makes the
> appropriate query for the event, based on this new field.
>
> Then make sure you’ve got sufficient parallelism for your AsyncFunction to
> handle this fan-out.
>
> This should let you run the queries for a single event in parallel.
>
> — Ken
>
>
> > On Apr 3, 2018, at 9:59 AM, Maxim Parkachov <lazy.gop...@gmail.com> wrote:
> >
> > Hi everyone,
> >
> > I'm writing a streaming job which needs to query Cassandra multiple times
> > for each event, around 20. I would like to use Async IO for that, but I'm
> > not sure which option to choose:
> >
> > 1. Implement One AsyncFunction with 20 queries inside
> > 2. Implement 20 AsyncFunctions, each with 1 query inside
> >
> > Each event needs all of the queries, so reducing the number of queries
> > per record is not an option.
> >
> > In this case I would like to minimise the processing time per event, even
> > if throughput suffers. Any advice or consideration is greatly appreciated.
> >
> > Thanks,
> > Maxim.
> >
>
> --------------------------------------------
> http://about.me/kkrugler
> +1 530-210-6378
>
>
