Re: Optimize exact deduplication for tens of billions data per day

2024-04-10 Thread Lei Wang
Hi Peter, I tried,this improved performance significantly,but i don't know exactly why. According to what i know, the number of keys in RocksDB doesn't decrease. Any specific technical material about this? Thanks, Lei On Fri, Mar 29, 2024 at 9:49 PM Lei Wang wrote: > Perhaps I can keyBy(Has

Re: How to enable RocksDB native metrics?

2024-04-10 Thread Lei Wang
Hi Biao, I tried, it doesn't work. The cmd is: flink run -m yarn-cluster -ys 4 -ynm EventCleaning_wl -yjm 2G -ytm 16G -yqu default -p 8 -Dstate.backend.latency-track.keyed-state-enabled=true -c com.zkj.task.EventCleaningTask SourceDataCleaning-wl_0410.jar --sourceTopic dwd_audio_record --grou

Re: How to enable RocksDB native metrics?

2024-04-10 Thread Lei Wang
Hi Biao, I tried, it does On Mon, Apr 8, 2024 at 9:48 AM Biao Geng wrote: > Hi Lei, > You can use the "-D" option in the command line to set configs for a > specific job. E.g, `flink run-application -t > yarn-application -Djobmanager.memory.process.size=1024m `. > See > https://nightlies

Re: Flink 1.18.1 cannot read from Kafka

2024-04-10 Thread Phil Stavridis
Hi Biao, I will check out running with flink run, but should this be run in the Flink JobManager? Would that mean that the container for the Flink JobManager would require both Python installed and a copy of the flink_client.py module? Are there some examples of running flink run in a Dockerize

Re: How are window's boundaries decided in flink

2024-04-10 Thread Dylan Fontana via user
Hi Sachin, Assignment for tumbling windows is exclusive on the endTime; see description here https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/dev/datastream/operators/windows/#tumbling-windows . So in your example it would be assigned to window (60, 120) as in reality the windows w

Re: Flink 1.18.1 cannot read from Kafka

2024-04-10 Thread Biao Geng
Hi Phil, It should be totally ok to use `python -m flink_client.job`. It just seems to me that the flink cli is being used more often. And yes, you also need to add the sql connector jar to the flink_client container. After putting the jar in your client container, add codes like `table_env.get_con

How are window's boundaries decided in flink

2024-04-10 Thread Sachin Mittal
Hi, Lets say I have defined 1 minute TumblingEventTimeWindows. So it will create windows as: (0, 60), (60, 120), Now lets say I have an event at time t = 60. In which window would this get aggregated ? 1st or second or both. Say I want this to get aggregated only in the second window, how ca

Re: Flink 1.18.1 cannot read from Kafka

2024-04-10 Thread Phil Stavridis
Hi Biao, 1. I have a Flink client container like this: # Flink client flink_client: container_name: flink_client image: flink-client:local build: context: . dockerfile: flink_client/Dockerfile networks: - standard depends_on: - jobmanager - Kafka The flink_client/Dockerfile has this bash file whi

Re: Flink 1.18.1 cannot read from Kafka

2024-04-10 Thread Biao Geng
Hi Phil, Your codes look good. I mean how do you run the python script. Maybe you are using flink cli? i.e. run commands like ` flink run -t .. -py job.py -j /path/to/flink-sql-kafka-connector.jar`. If that's the case, the `-j /path/to/flink-sql-kafka-connector.jar` is necessary so that in client

Re: Flink 1.18.1 cannot read from Kafka

2024-04-10 Thread Phil Stavridis
Hi Biao, For submitting the job, I run t_env.execute_sql. Shouldn’t that be sufficient for submitting the job using the Table API with PyFlink? Isn’t that the recommended way for submitting and running PyFlink jobs on a running Flink cluster? The Flink cluster runs without issues, but there is