Don't KeySelectors also need to be deterministic?
The {@link KeySelector} allows to use deterministic objects for operations
such as reduce, reduceGroup, join, coGroup, etc. If invoked multiple times on
the same object, the returned key must be the same.
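To make that contract concrete, here is a minimal PyFlink sketch (data and job
name are made up) of a deterministic key selector that only looks at the record
itself:
```
from pyflink.datastream import StreamExecutionEnvironment

env = StreamExecutionEnvironment.get_execution_environment()

# Illustrative data: (user id, count) pairs.
events = env.from_collection([("user-1", 3), ("user-2", 5), ("user-1", 7)])

# Deterministic key selector: the key depends only on the record itself, so
# repeated invocations on the same element always return the same key.
# A selector that used random values or mutable external state would break
# the contract quoted above and could corrupt grouping and keyed state.
keyed = events.key_by(lambda record: record[0])
keyed.reduce(lambda a, b: (a[0], a[1] + b[1])).print()

env.execute("deterministic_key_selector_example")
```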
On 04/02/2022 18:25, John
Hi!
Could you elaborate more on your code or share it if possible? Which timer
are you talking about? Are you using the data stream API or SQL API? Do you
mean the timer registered per record for a window aggregation? Does mini
batch aggregation [1] solve your problem?
[1]
https://nightlies.apach
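If you want to experiment with it, a rough sketch of enabling mini-batch
aggregation from PyFlink looks like this (the latency and size values are
placeholders, not recommendations):
```
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())
conf = t_env.get_config().get_configuration()

# Buffer incoming records and fire the aggregation at most every 5 seconds
# or every 5000 rows, whichever comes first.
conf.set_string("table.exec.mini-batch.enabled", "true")
conf.set_string("table.exec.mini-batch.allow-latency", "5 s")
conf.set_string("table.exec.mini-batch.size", "5000")
```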
Hi!
Each parallel instance (subtask) of the lookup operation will load all data
from the lookup table source, so you're loading 10 GB of data into each
subtask and storing it in JVM memory. That is not only slow but also very
memory-consuming.
Have you tried joining your main stream with the hive table direct
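For reference, a sketch of what joining the stream with the Hive table
directly could look like in SQL from PyFlink (table and column names are
hypothetical, and catalog registration is omitted):
```
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# Hypothetical query: join the main stream (orders) with the Hive dimension
# table (hive_dim) directly instead of doing a per-record lookup.
t_env.execute_sql("""
    SELECT o.id, o.amount, d.name
    FROM orders AS o
    JOIN hive_dim AS d
      ON o.id = d.id
""")
```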
The following code snippet should just work:
```
from pyflink.datastream import StreamExecutionEnvironment
env = StreamExecutionEnvironment.get_execution_environment()
```
It works both in a local deployment and on a Flink cluster.
You could refer to [1] on how to submit PyFlink jobs to a remote cl
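In case a slightly fuller example helps, here is a minimal end-to-end sketch
(elements and job name are only illustrative); when submitted with the Flink
CLI, the same code runs against the remote cluster instead of a local one:
```
from pyflink.datastream import StreamExecutionEnvironment

env = StreamExecutionEnvironment.get_execution_environment()

# Tiny pipeline just to verify the environment end to end.
env.from_collection(["hello", "pyflink"]) \
   .map(lambda s: s.upper()) \
   .print()

env.execute("pyflink_datastream_smoke_test")
```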
Hi!
There is no need to change the log4j configs or *web.log.path*. Just add
*log.file* to the JVM options of both the JM and the TM (by adding
env.java.opts: -Dlog.file=/opt/flink/log/flink.log to your Flink config) and it
should work.
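Concretely, that is a single line in flink-conf.yaml, using the same path as
above:
```
env.java.opts: -Dlog.file=/opt/flink/log/flink.log
```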
For more information on why this works, see [1]. You can see
*appender.ro
Hi,
I am trying to deploy my Flink application and get some logs on the web UI.
I am using the latest version of Flink (1.14). I am deploying it with
Docker.
When I run the application without any custom log4j2 configuration, everything
works:
- I can see the logs in "Logs" tab on the web UI for both JM and
Hello,
I have some specific questions and would appreciate some pointers:
1. Does Flink have any support for tracking missed JDBC CDC source records?
2. What is the equivalent of Kafka consumer groups?
3. Delivering to Kafka from Flink is not exactly-once. Is that right?
Thanks
On Friday, F
Hi all,
Has anyone got a PyFlink DataStream job working? I think I'm having difficulty
seeing a small Flink cluster I've set up in Docker. I can't see any way
that PyFlink can pick up a Remote Execution Environment. This is the only
'compiling' code snippet I can find:
from pyflink.datastream import StreamExecutionEnvironment