Hello everyone!

So essentially, I have two identical queries (q1 and q2) running in parallel
(as Streams).
I'm trying to activate the ingestion of data into q2 based on what is
processed in q1.
E.g., say that we want to start ingesting data into q2 once a tuple with
timestamp > 5000 appears in q1.

The queries are constructed in this way (they share the same source):
q1: Source -> Filter -> Aggregate -> Filter -> Sink
      |
      v
q2:   Filter -> Filter -> Aggregate -> Filter -> Sink

The initial idea was to have a global variable shared between the two
queries. When the trigger tuple appears in q1, the first Filter operator in
q1 sets the variable to true. Meanwhile, the first Filter operator in q2
passes or drops tuples depending on the value of that variable:
when it is true, data is allowed through; when it is false, no data
is allowed to be ingested.
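To make that concrete, here is a stripped-down sketch of the gating logic in plain Java (no Flink; all names are made up, and in the real job the two methods would live inside Flink FilterFunctions):

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Predicate;

public class GateExample {
    // Shared flag; this only works while both queries run in the same JVM.
    static final AtomicBoolean gateOpen = new AtomicBoolean(false);

    // q1's filter: passes everything through, but opens the gate once it
    // sees a tuple with timestamp > 5000.
    static boolean q1Filter(long timestamp) {
        if (timestamp > 5000) {
            gateOpen.set(true);
        }
        return true;
    }

    // q2's first filter: only lets tuples through once the gate is open.
    static final Predicate<Long> q2Filter = ts -> gateOpen.get();

    public static void main(String[] args) {
        System.out.println(q2Filter.test(100L)); // gate closed -> false
        q1Filter(6000L);                         // q1 sees the trigger tuple
        System.out.println(q2Filter.test(100L)); // gate open -> true
    }
}
```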

This works fine when all the tasks run on the same machine, but of
course it becomes troublesome in distributed deployments (tasks on
different nodes and so on), since the variable is no longer shared across
JVMs.

My second approach was to create some sort of "loop" in the query. So let's
say the triggering logic is placed in the last Filter operator in q1; when
this "special" tuple appears there, it should communicate with the first
Filter operator in q2 in order to allow data to be ingested.
I've tried playing around with IterativeStream, but I can't really get it
to work, and I suspect it's the wrong approach.

How can I achieve this sort of functionality?
I've been looking at the BroadcastState part of the DataStream API, but I'm
confused about how to use it. Is it even possible to broadcast from a
downstream operator to an upstream one?
Suggestions would be much appreciated!

Best Regards,
Mikael Gordani
