Does Flink 1.12.1 DataStream API batch execution mode support side outputs?

2021-05-22 Thread Marco Villalobos
I have been struggling for two days with an issue using the DataStream API in Batch Execution mode. It seems as though my side-output has no elements available to downstream operators. However, I am certain that the downstream operator received events. I logged the side-output element just before

Re: Issue reading from S3

2021-05-22 Thread Angelo G.
Hi Yun Gao, Thank you for your prompt response. I've changed the table 'format' from 'parquet' to 'raw' as in your example and I've been able to access the file: Job has been submitted with JobID 441e7518bb615109624c1f33f222475b ++ |url

Dependency vulnerabilities with flink 1.12.3

2021-05-22 Thread V N, Suchithra (Nokia - IN/Bangalore)
Hello, The following dependency vulnerabilities were found with Flink 1.12.3. Please provide your input on this. 1. commons-io-2.7 Severity: High Description: Apache Commons IO contains a flaw that is due to the program failing to restrict which class can be ser

Re: Parallelism in Production: Best Practices

2021-05-22 Thread Yaroslav Tkachenko
Hi Robert, Thanks for the advice! Checking Flink Forward talks seems like a good idea, will do 👍 On Sat, May 22, 2021 at 4:19 AM Robert Metzger wrote: > Hi Yaroslav, > > My recommendation is to go with the 2nd pattern you've described, but I > only have limited insights into real world producti

Re: Choice of time characteristic and performance

2021-05-22 Thread Robert Metzger
Hi Bob, if you don't need any time characteristics, go with processing time. Ingestion time will call System.currentTimeMillis() on every incoming record, which is a somewhat expensive call. Event time (and ingestion time) will attach a long field to each record, making the records 8 bytes larger
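
As a rough illustration of the "8 bytes larger" point in the preview above, the per-record timestamp cost can be estimated with plain Java. This is only a back-of-the-envelope sketch: the class name `TimestampOverhead`, the helper `extraBytes`, and the record count are hypothetical and do not come from the thread, and the sketch ignores serialization framing and any other per-record overhead.

```java
public class TimestampOverhead {
    // A timestamp attached to a record is a Java long, i.e. Long.BYTES (8) bytes.
    // extraBytes is a hypothetical helper, not part of any Flink API.
    static long extraBytes(long recordCount) {
        // multiplyExact throws ArithmeticException on overflow instead of wrapping.
        return Math.multiplyExact(recordCount, (long) Long.BYTES);
    }

    public static void main(String[] args) {
        long records = 1_000_000_000L; // assumed volume, chosen only for illustration
        System.out.println(extraBytes(records) + " extra bytes for " + records + " records");
    }
}
```

For one billion records this comes to 8,000,000,000 bytes of additional timestamp data, which suggests why the choice may matter mainly for very small records or very high volumes.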

Re: Stop command failure

2021-05-22 Thread Robert Metzger
Hi, can you provide the JobManager log of that run? It seems that the operation timed out. The JobManager log will help us give some insight into the root cause. On Tue, May 18, 2021 at 1:42 PM V N, Suchithra (Nokia - IN/Bangalore) < suchithra@nokia.com> wrote: > Hi, > > > > Stop command

Re: Fastest way for decent lookup JOIN?

2021-05-22 Thread Robert Metzger
Hi Theo, Since you are running Flink locally, it would be quite easy to attach a profiler to Flink to see where most of the CPU cycles are burned (or to check whether you are I/O bound). This could provide us with valuable data for deciding on the next steps. On Tue, May 18, 2021 at 5:26 PM Theo

Re: Parallelism in Production: Best Practices

2021-05-22 Thread Robert Metzger
Hi Yaroslav, My recommendation is to go with the 2nd pattern you've described, but I only have limited insights into real world production workloads. Besides the parallelism configuration, I also recommend looking into slot sharing groups, and maybe disabling operator chaining. I'm pretty sure so

Re: Savepoint/checkpoint confusion

2021-05-22 Thread Robert Metzger
Hi Igor, In my understanding, checkpoints are managed by the system (Flink decides when to create and delete them), while savepoints are managed by the user (they decide when to create and delete them). Indeed, only checkpoints can be incremental (if that feature is enabled). > it's made on-dema