Hi Guillermo,

> "When is this ordering done, and until when?"

Assuming the current watermark is 10:00:

1. Data with an event time before 10:00 is sorted and output now.
2. If a record with an event time after 10:00 (say 10:01) arrives now, it is
   stored in state and waits for the watermark to reach 10:01 before it is
   sorted and output.
3. If a record from before the watermark (for example 09:59) arrives now, it
   is late and will be discarded.

> "How would adding an INTERVAL of 10 seconds affect this?"

You should set the watermark to lag 10 seconds behind the event time, for
example WATERMARK FOR eventTime AS eventTime - INTERVAL '10' SECOND (assuming
the event-time column is named eventTime). Records that arrive up to 10
seconds out of order are then still buffered and sorted instead of being
discarded, at the cost of 10 seconds of extra output delay. Refer to
document [1] for setting this up.

[1] https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/create/#watermark

Best,
Feng

On Mon, Nov 11, 2024 at 7:18 PM Guillermo <konstt2...@gmail.com> wrote:

>
> I am running several queries in FlinkSQL, and in a final step before
> inserting into Kafka, I perform an ORDER BY eventTime. When I look at the
> execution plan, I see Exchange(distribution=[single]). Does this mean
> that all the data is going to a single node and getting reordered there? I
> haven't been able to find specific documentation. In the Hive dialect,
> there are options like SORT BY and DISTRIBUTION BY, but is there no
> option in FlinkSQL to sort at the partition level?
>
> On the other hand, I am using some OVER AGGREGATION functions like LAG. In
> these functions, the partitioning and ordering fields are specified. I
> partition by a field clientId and order by timestamp. In the source
> tables, I use WATERMARKING on the timestamp field (event time). My
> question is, when is this ordering done, and until when? When I define the
> WATERMARKING field, I’m not setting any INTERVAL to wait for old events.
> How would adding an INTERVAL of 10 seconds, for example, affect when/what
> triggers the ordering?
>
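
P.S. For anyone who wants to see the buffering rule from the reply in action, here is a rough Python sketch. It is a toy model, not Flink's operator: the class name EventTimeSorter, the integer "seconds" timestamps, and the choice to advance the watermark on every record (Flink emits watermarks periodically) are all assumptions made for illustration. It shows the two points discussed above: records at or behind the watermark are dropped as late, and a watermark delay keeps out-of-order records in the buffer long enough to be sorted.

```python
import heapq
from typing import List, Tuple

Record = Tuple[int, str]  # (event time in seconds, payload)


class EventTimeSorter:
    """Toy model of watermark-driven sorting. Hypothetical, not Flink code."""

    def __init__(self, delay: int = 0) -> None:
        # delay plays the role of the INTERVAL in
        # WATERMARK FOR ts AS ts - INTERVAL 'delay' SECOND
        self.delay = delay
        self.watermark: float = float("-inf")
        self.buffer: List[Record] = []   # min-heap ordered by event time
        self.dropped: List[Record] = []  # late records, kept only for inspection

    def on_record(self, event_time: int, payload: str) -> List[Record]:
        """Accept one record and return whatever can now be emitted, in order."""
        if event_time <= self.watermark:
            # At or behind the watermark: too late to be sorted in.
            self.dropped.append((event_time, payload))
            return []
        heapq.heappush(self.buffer, (event_time, payload))
        # Simplification: the watermark advances with every record.
        self.watermark = max(self.watermark, event_time - self.delay)
        return self._flush()

    def _flush(self) -> List[Record]:
        """Emit every buffered record whose event time the watermark has passed."""
        out: List[Record] = []
        while self.buffer and self.buffer[0][0] <= self.watermark:
            out.append(heapq.heappop(self.buffer))
        return out


if __name__ == "__main__":
    no_delay = EventTimeSorter(delay=0)
    print(no_delay.on_record(100, "a"))     # emitted at once
    print(no_delay.on_record(99, "late"))   # dropped, behind the watermark
    with_delay = EventTimeSorter(delay=10)
    print(with_delay.on_record(100, "a"))   # buffered
    print(with_delay.on_record(99, "b"))    # buffered, still ahead of the watermark
    print(with_delay.on_record(111, "c"))   # releases 99 and 100 in sorted order
```

With delay=0 the record at 99 is lost because the watermark has already reached 100. With delay=10 the same record is kept and comes out ahead of 100 once the watermark passes both.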