Optimize exact deduplication for tens of billions of records per day

2024-03-28 Thread Lei Wang
Use the RocksDB state backend to store whether the element appeared within the last day; here is the code: public class DedupFunction extends KeyedProcessFunction { private ValueState isExist; public void open(Configuration parameters) throws Exception { ValueStateDescriptor de
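The snippet in the preview is cut off. As a plain-Java illustration of the one-day exact-dedup pattern it describes (no Flink dependencies; the class and method names below are hypothetical stand-ins for the ValueState-backed version, where the map would be per-key RocksDB state with a 1-day TTL):

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative one-day exact-dedup cache in plain Java.
// In the Flink version, lastSeen would be a per-key ValueState
// backed by RocksDB, expired via a 1-day state TTL.
public class DedupSketch {
    private final Map<String, Long> lastSeen = new HashMap<>();
    private final long ttlMillis;

    public DedupSketch(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    /** Returns true the first time a key is seen within the TTL window. */
    public boolean isFirstSeen(String key, long nowMillis) {
        Long seenAt = lastSeen.get(key);
        boolean first = (seenAt == null) || (nowMillis - seenAt > ttlMillis);
        lastSeen.put(key, nowMillis); // refresh the window for this key
        return first;
    }
}
```

At tens of billions of records per day, the real job would rely on RocksDB spilling state to disk rather than an in-heap map like this one.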

Re: One query just for curiosity

2024-03-28 Thread gongzhongqiang
Hi Ganesh, As Zhanghao Chen mentioned before, he advised two solutions for different scenarios. 1. Record processing is a CPU-bound task: scale up the parallelism of the task and the Flink cluster to improve TPS. 2. Record processing is an IO-bound task: use Async-IO to reduce resource cost and also get better per

Re: IcebergSourceReader metrics

2024-03-28 Thread Péter Váry
Hi Chetas, Are you looking for this information? public IcebergSourceReaderMetrics(MetricGroup metrics, String fullTableName) { MetricGroup readerMetrics = metrics.addGroup("IcebergSourceReader").addGroup("table", fullTableName); this.assignedSplits = readerMetrics.counter

flink version stable

2024-03-28 Thread Fokou Toukam, Thierry
Hi, just want to know which version of Flink is stable? Thierry FOKOU | IT M.A.Sc Student Département de génie logiciel et TI École de technologie supérieure | Université du Québec 1100, rue Notre-Dame Ouest Montréal (Québec) H3C 1K3 Tél +1 (438) 336-9007

Re: Flink cache support

2024-03-28 Thread Marco Villalobos
Zhanghao is correct. You can use what is called "keyed state". It's like a cache. https://nightlies.apache.org/flink/flink-docs-release-1.19/docs/dev/datastream/fault-tolerance/state/ > On Mar 28, 2024, at 7:48 PM, Zhanghao Chen wrote: > > Hi, > > You can maintain a cache manually in your
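As a plain-Java illustration of why keyed state behaves like a cache (no Flink dependencies; this stand-in class is hypothetical): Flink scopes a ValueState to the current key, which behaves like one entry of a map keyed by the record key.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java illustration of "keyed state as a cache": Flink scopes
// each ValueState<V> to the current record key, so it acts like one
// entry of a map keyed by that key. This class is a hypothetical
// stand-in, not Flink API.
public class KeyedStateSketch<K, V> {
    private final Map<K, V> statePerKey = new HashMap<>();

    /** Like ValueState#value() for the given key. */
    public V value(K key) {
        return statePerKey.get(key);
    }

    /** Like ValueState#update() for the given key. */
    public void update(K key, V value) {
        statePerKey.put(key, value);
    }
}
```

In a real job the backing store is managed by Flink (heap or RocksDB) and is checkpointed, which is what makes it more durable than a hand-rolled map.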

Re: One query just for curiosity

2024-03-28 Thread Zhanghao Chen
Yes. However, a huge parallelism would incur additional coordination cost, so you might need to set up the JobManager with a decent spec (at least 8C/16G by experience). Also, you'll need to make sure there are no external bottlenecks (e.g. reading/writing data from the external storage). Best,

Re: need flink support framework for dependency injection

2024-03-28 Thread Ruibin Xing
Hi Thais, Thanks, that's really detailed and inspiring! I think we can use the same pattern for states too. On Wed, Mar 27, 2024 at 6:40 PM Schwalbe Matthias < matthias.schwa...@viseca.ch> wrote: > Hi Ruibin, > > > > Our code [1] targets a very old version of Flink 1.8, for current > development

Re: Flink cache support

2024-03-28 Thread Zhanghao Chen
Hi, You can maintain a cache manually in your operator implementations. You can load the initial cached data on the operator open() method before the processing starts. However, this would set up a cache per task instance. If you'd like to have a cache shared by all processing tasks without dup
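A minimal sketch of the "load the cache in open()" pattern described above, written as plain Java that mirrors Flink's rich-function lifecycle (open() runs once per task instance before any record is processed). The loadFromOracle() stub below is hypothetical; a real job would query the database there.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of loading a per-task-instance cache before processing starts,
// mirroring Flink's RichMapFunction lifecycle: open() once, then map()
// per record. Plain Java, no Flink dependencies; names are illustrative.
public class CachedEnricher {
    private Map<String, String> cache; // one cache per task instance

    /** Called once before processing, like RichFunction#open. */
    public void open() {
        cache = loadFromOracle();
    }

    /** Called per record, like MapFunction#map. */
    public String map(String key) {
        return cache.getOrDefault(key, "UNKNOWN");
    }

    // Hypothetical stand-in for a real Oracle query.
    private Map<String, String> loadFromOracle() {
        Map<String, String> m = new HashMap<>();
        m.put("42", "answer");
        return m;
    }
}
```

As the message notes, this gives one cache copy per task instance; a shared cache needs an external store instead.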

Re: One query just for curiosity

2024-03-28 Thread Ganesh Walse
Do you mean we can process 32767 records in parallel? And if that is the case, do we need to do anything to enable it? On Fri, 29 Mar 2024 at 8:08 AM, Zhanghao Chen wrote: > Flink can be scaled up to a parallelism of 32767 at max. And if your > record processing is mostly IO-bound

Re: One query just for curiosity

2024-03-28 Thread Zhanghao Chen
Flink can be scaled up to a parallelism of 32767 at max. And if your record processing is mostly IO-bound, you can further boost the throughput via Async-IO [1]. [1] https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/operators/asyncio/ Best, Zhanghao Chen
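The idea behind Async-IO can be sketched with plain CompletableFutures (no Flink dependencies; lookup() below is a hypothetical stand-in for an external call): instead of blocking the task on one lookup at a time, many lookups are in flight concurrently and results are collected as they complete.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

// Plain-Java sketch of the Async-IO idea: issue all external lookups
// before waiting on any of them, so IO latency overlaps instead of
// adding up. lookup() is a hypothetical stand-in for a real call.
public class AsyncSketch {
    static CompletableFuture<String> lookup(int id) {
        return CompletableFuture.supplyAsync(() -> "value-" + id);
    }

    public static List<String> fetchAll(List<Integer> ids) {
        // Start every lookup first; nothing blocks yet.
        List<CompletableFuture<String>> futures = ids.stream()
                .map(AsyncSketch::lookup)
                .collect(Collectors.toList());
        // Then wait, preserving input order.
        return futures.stream()
                .map(CompletableFuture::join)
                .collect(Collectors.toList());
    }
}
```

Flink's AsyncDataStream additionally bounds in-flight requests and integrates with checkpointing, which this sketch does not attempt.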

IcebergSourceReader metrics

2024-03-28 Thread Chetas Joshi
Hello, I am using Flink to read Iceberg (S3). I have enabled all the metrics scopes in my FlinkDeployment as below: metrics.scope.jm: flink.jobmanager metrics.scope.jm.job: flink.jobmanager.job metrics.scope.tm: flink.taskmanager metrics.scope.tm.job: flink.taskmanager.job metrics.scope.task: flin

One query just for curiosity

2024-03-28 Thread Ganesh Walse
Hi Team, If one record gets processed in 1 second in Flink, then what will be the best time taken to process 1000 records in Flink using maximum parallelism?

Flink cache support

2024-03-28 Thread Ganesh Walse
Hi Team, In my project the requirement is to cache data from the Oracle database, where the number of tables is large and the same data will be required for all the transactions to process. Can you please suggest an approach where the cache should be 1st loaded in memory, then stream processing should

Re: [ANNOUNCE] Apache Paimon is graduated to Top Level Project

2024-03-28 Thread Yanfei Lei
Congratulations! Best, Yanfei Zhanghao Chen wrote on Thu, Mar 28, 2024 at 19:59: > > Congratulations! > > Best, > Zhanghao Chen > > From: Yu Li > Sent: Thursday, March 28, 2024 15:55 > To: d...@paimon.apache.org > Cc: dev ; user > Subject: Re: [ANNOUNCE] Apache Paimon is gr

Re: [ANNOUNCE] Apache Paimon is graduated to Top Level Project

2024-03-28 Thread Zhanghao Chen
Congratulations! Best, Zhanghao Chen From: Yu Li Sent: Thursday, March 28, 2024 15:55 To: d...@paimon.apache.org Cc: dev ; user Subject: Re: [ANNOUNCE] Apache Paimon is graduated to Top Level Project CC the Flink user and dev mailing list. Paimon originated wi

Re: [ANNOUNCE] Apache Paimon is graduated to Top Level Project

2024-03-28 Thread gongzhongqiang
Congratulations! Best, Zhongqiang Gong Yu Li wrote on Thu, Mar 28, 2024 at 15:57: > CC the Flink user and dev mailing list. > > Paimon originated within the Flink community, initially known as Flink > Table Store, and all our incubating mentors are members of the Flink > Project Management Committee. I am c

Re: [ANNOUNCE] Apache Paimon is graduated to Top Level Project

2024-03-28 Thread Rui Fan
Congratulations~ Best, Rui On Thu, Mar 28, 2024 at 3:55 PM Yu Li wrote: > CC the Flink user and dev mailing list. > > Paimon originated within the Flink community, initially known as Flink > Table Store, and all our incubating mentors are members of the Flink > Project Management Committee. I a

Re: [ANNOUNCE] Apache Paimon is graduated to Top Level Project

2024-03-28 Thread Yu Li
CC the Flink user and dev mailing list. Paimon originated within the Flink community, initially known as Flink Table Store, and all our incubating mentors are members of the Flink Project Management Committee. I am confident that the bonds of enduring friendship and close collaboration will contin