On 10.04.20 17:35, Jark Wu wrote:
1) For correctness, it is necessary to perform the watermark generation as early as possible in order to be close to the actual data generation within a source's data partition. This is also the purpose of per-partition watermark and event-time alignment. Many on going FLIPs (e.g. FLIP-27, FLIP-95) works a lot on this effort. Deseriazing records and generating watermark in chained ProcessFunction makes it difficult to do per-partition watermark in the future.
For me, this this the main reason for this, i.e. we need to extract the records in the source so that we can correctly generate per-partition watermarks.
Best, Aljoscha