Re: Colocating Compute

2020-07-31 Thread Dawid Wysakowicz
Hi Satyam, It should be fine to have unbounded InputFormat. The important thing is not to produce more splits than there are parallel instances of your source. In createInputSplits(int minNumSplits) generate only minNumSplits. It is so that all splits can be assigned immediately. Unfortunately you

Re: Colocating Compute

2020-07-30 Thread Satyam Shekhar
Hi Dawid, I am currently on Flink v1.10. Do streaming pipelines support unbounded InputFormat in v1.10? My current setup uses SourceFunction for streaming pipeline and InputFormat for batch queries. I see the documentation for Flink v1.11 describe concepts for Split, SourceReader, and SplitEnumer

Re: Colocating Compute

2020-07-30 Thread Dawid Wysakowicz
Hi Satyam, I think you can use the InputSplitAssigner also for streaming pipelines through an InputFormat. You can use StreamExecutionEnvironment#createInput or for SQL you can write your source according to the documentation here: https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/t

Colocating Compute

2020-07-29 Thread Satyam Shekhar
Hello, I am using Flink v1.10 in a distributed environment to run SQL queries on batch and streaming data. In my setup, data is sharded and distributed across the cluster. Each shard receives streaming updates from some external source. I wish to minimize data movement during query evaluation for