Hi,
if you need range queries for the lookups, you can only use Option A (async
calls to an external store).
Queryable State only supports key lookups but no range queries.
Since version 1.2.0, Flink has a dedicated function type for async calls
[1].
This might be helpful to implement your usecas
Also, took a quick read on side input. it's unclear to me how side
input could solve this issue better.
At a high level, this is what I have in mind:
flatmap(byte[] value, Collector<> output) {
var iter = someStoreStateObject.seek(akeyprefix);
//or seek(akeyprefi
Thanks Gordon and Fabian.
The enriching data is really reference data, e.g. the reverseIP
database. It's hard to be keyed as the main data stream as the "ip
address" in the event is not a primary key in the main data stream.
QueryableState is close, but it does not support range scan as far as
I
+1 to what Gordon said.
Queryable state is rather meant as an external interface to streaming jobs
than for lookups within jobs.
Accessing co-located state should give you better performance and is
probably easier to implement and maintain.
Cheers,
Fabian
2017-05-19 7:43 GMT+02:00 Tzu-Li (Gordon
Hi,
Can the enriching data be keyed? Or is it something that has to be broadcasted
to each operator?
Either way, I think Side Inputs (an upcoming feature in the future) is the best
fit for this. You can take a look atÂ
https://issues.apache.org/jira/browse/FLINK-6131.
Regarding the 3 options yo
Hi. Say I have a few reference data sets need to be used for a
streaming job. The sizes range between 10M-10GB. The data is not
static, will be refreshed at minutes and/or day intervals.
With the new advancements in Flink, it seems there are quite a few options.
A. Store all the data in an exte