Re: How to write a proper hashCode() for keys.

2022-02-06 Thread Chesnay Schepler
Don't KeySelectors also need to be deterministic? From the Javadoc: "The {@link KeySelector} allows to use deterministic objects for operations such as reduce, reduceGroup, join, coGroup, etc. If invoked multiple times on the same object, the returned key must be the same." On 04/02/2022 18:25, John
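To illustrate the determinism requirement in plain Python (a conceptual sketch, not Flink API code; the record shape and function names are made up for illustration):

```python
import random

# A deterministic key function: the same record always yields the same key,
# which is what keyed operations (reduce, join, coGroup, ...) rely on.
def key_by_user(record: dict) -> str:
    return record["user_id"]

# A NON-deterministic key function: calling it twice on the same record can
# return different keys, so records that belong together may be routed apart.
def broken_key(record: dict) -> str:
    if random.random() < 0.5:
        return record["user_id"]
    return record["user_id"].upper()

record = {"user_id": "alice", "amount": 3}
assert key_by_user(record) == key_by_user(record)  # stable across calls
```

The same reasoning applies to hashCode(): if a key's hash changes between invocations (for example because the key object is mutated), the runtime cannot group records consistently.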

Re: How to prevent checkpointing of timers?

2022-02-06 Thread Caizhi Weng
Hi! Could you elaborate more on your code or share it if possible? Which timer are you talking about? Are you using the data stream API or SQL API? Do you mean the timer registered per record for a window aggregation? Does mini batch aggregation [1] solve your problem? [1] https://nightlies.apach
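For reference, the mini-batch aggregation mentioned in [1] is enabled via table configuration; a sketch using SQL client SET statements (the latency and batch-size values are arbitrary examples, not recommendations):

```
SET 'table.exec.mini-batch.enabled' = 'true';
SET 'table.exec.mini-batch.allow-latency' = '5 s';
SET 'table.exec.mini-batch.size' = '5000';
```

With mini-batch enabled, the aggregation buffers input records and processes them in small batches, which reduces per-record state access and timer overhead.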

Re: Loading a Hive lookup table data into TM memory takes so long.

2022-02-06 Thread Caizhi Weng
Hi! Each parallel instance of the lookup operation will load all data from the lookup table source, so you're loading 10GB of data into each parallel instance and storing it in JVM memory. That is not only slow but also very memory-consuming. Have you tried joining your main stream with the Hive table direct
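To make the memory math concrete (a back-of-the-envelope sketch; the parallelism value is a made-up example):

```python
# Each parallel lookup instance caches the entire table in its own JVM heap,
# so total cached data grows linearly with operator parallelism.
table_size_gb = 10      # size of the Hive lookup table (from this thread)
parallelism = 8         # hypothetical operator parallelism
total_cached_gb = table_size_gb * parallelism
print(total_cached_gb)  # 80 GB cached across the job
```

This is why a regular join (which shuffles and partitions both inputs) scales better here than a lookup join that replicates the full table.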

Re: pyflink datastream job

2022-02-06 Thread Dian Fu
The following code snippet should just work:

```python
from pyflink.datastream import StreamExecutionEnvironment

env = StreamExecutionEnvironment.get_execution_environment()
```

It works both in local deployment and in Flink clusters. You could refer to [1] on how to submit PyFlink jobs to a remote cl
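For reference, submitting such a script to an existing cluster is typically done with the Flink CLI rather than from inside the Python script; a sketch (the JobManager address and file name are placeholders for your own deployment):

```
./bin/flink run -m localhost:8081 --python my_job.py
```

When submitted this way, get_execution_environment() picks up the cluster's execution context instead of spinning up a local mini-cluster.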

Re: Logging for both JM and TM in the web UI

2022-02-06 Thread Caizhi Weng
Hi! There is no need to change log4j configs or *web.log.path*. Just add *log.file* to the JVM options of both JM and TM (by setting env.java.opts: -Dlog.file=/opt/flink/log/flink.log in your Flink config) and it should work. For more information on why it works see [1]. You can see *appender.ro
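Concretely, the suggestion above amounts to a flink-conf.yaml entry like the following (the log path is the one used in this thread; adjust it to your deployment):

```
env.java.opts: "-Dlog.file=/opt/flink/log/flink.log"
```

The web UI's "Logs" tab reads the file named by the *log.file* system property, which is why setting it for both JM and TM is sufficient.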

Logging for both JM and TM in the web UI

2022-02-06 Thread Nikola Hrusov
Hi, I am trying to deploy my Flink application and get some logs in the web UI. I am using the latest version of Flink (1.14), deployed with Docker. When I run the application without any custom log4j2 configuration, everything works: - I can see the logs in the "Logs" tab on the web UI for both JM and

Re: CDC using Query

2022-02-06 Thread mohan radhakrishnan
Hello, I have some specific questions; I'd appreciate some pointers. 1. Does Flink have any support to track missed source JDBC CDC records? 2. What is the equivalent of Kafka consumer groups? 3. Delivery from Flink to Kafka is not exactly-once. Is that right? Thanks On Friday, F

pyflink datastream job

2022-02-06 Thread nagi data monkey
Hi all, has anyone got a pyflink datastream job working? I think I'm having difficulty seeing a small Flink cluster I've set up in Docker. I can't see any way that pyflink can pick up a remote execution environment. This is the only 'compiling' code snippet I can find: from pyflink.datastream import S