Re: Re: Optimize exact deduplication for tens of billions of records per day

2024-04-01 Thread Jeyhun Karimov
Hi Lei, In addition to the valuable options suggested above, you could try to optimize your partitioning function (since you know your data). If possible, sample a subset of your data and/or check the key distribution before redefining your partitioning function. Regards, Jeyhun On Mo
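A minimal sketch of the idea above, not code from the thread: after inspecting a sample of the key distribution offline, route records with a custom Partitioner so that skewed keys do not overload a single subtask. `Event`, `getKey()`, and `HOT_KEYS` are illustrative placeholders; `DataStream.partitionCustom` is the actual Flink API used here.

```java
import org.apache.flink.api.common.functions.Partitioner;
import org.apache.flink.streaming.api.datastream.DataStream;

import java.util.Set;
import java.util.concurrent.ThreadLocalRandom;

public class SkewAwarePartitioning {

    // Keys found to dominate the distribution in an offline sample (placeholder values).
    static final Set<String> HOT_KEYS = Set.of("keyA", "keyB");

    public static DataStream<Event> repartition(DataStream<Event> events) {
        return events.partitionCustom(
                (Partitioner<String>) (key, numPartitions) -> {
                    if (HOT_KEYS.contains(key)) {
                        // Salt hot keys across a few partitions. Note: for exact
                        // deduplication this requires a second merge step per hot key,
                        // since duplicates may now land in different partitions.
                        int salt = ThreadLocalRandom.current().nextInt(4);
                        return Math.floorMod(key.hashCode() + salt, numPartitions);
                    }
                    // Regular keys: plain hash partitioning.
                    return Math.floorMod(key.hashCode(), numPartitions);
                },
                Event::getKey);
    }

    public static class Event {
        private String key;
        public String getKey() { return key; }
    }
}
```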

Re: [ANNOUNCE] Apache Flink 1.19.0 released

2024-03-18 Thread Jeyhun Karimov
Congrats! Thanks to the release managers and everyone involved. Regards, Jeyhun On Mon, Mar 18, 2024 at 9:25 AM Lincoln Lee wrote: > The Apache Flink community is very happy to announce the release of Apache > Flink 1.19.0, which is the first release for the Apache Flink 1.19 series. > > Apache Fli

Re: Inquiry Regarding Flink Tumbling Window Persistence and Restart Handling for File Source

2023-12-04 Thread Jeyhun Karimov
Hi Arjun, Thanks for your query. Flink is fault tolerant and supports exactly-once semantics. In your case, the aggregated values can be recovered after a failure or an application restart; you just need to enable checkpointing and configure an appropriate state backend. Regards, Jeyhun > > O
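A minimal sketch of what the reply describes, assuming a recent Flink version; the checkpoint path and job details are placeholders, while `enableCheckpointing`, `setStateBackend`, and `setCheckpointStorage` are real Flink APIs.

```java
import org.apache.flink.runtime.state.hashmap.HashMapStateBackend;
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointedWindowJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Take a checkpoint every 60 s with exactly-once state guarantees.
        env.enableCheckpointing(60_000L, CheckpointingMode.EXACTLY_ONCE);

        // Keep operator state on the heap; checkpoints go to durable storage
        // so window aggregates survive failures and restarts (path is a placeholder).
        env.setStateBackend(new HashMapStateBackend());
        env.getCheckpointConfig().setCheckpointStorage("hdfs:///flink/checkpoints");

        // ... define the file source, keyBy, tumbling window, and aggregation here ...

        env.execute("checkpointed-tumbling-window");
    }
}
```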

Re: Mixing Batch & Streaming

2016-03-04 Thread Jeyhun Karimov
Hi all, We are currently working on this issue to enable efficient mixing between DataStream windows and DataSets. For now, the simplest solution would be to output each window to a sequential file in HDFS and run the computation on that data source as a DataSet. On Fri, Mar 4, 2016 at 4:05 PM sskhiri w
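A rough sketch of that workaround, not from the original thread: materialize each window's result to files, then read them back with the batch API. The source, aggregation, and paths are placeholders, and in practice the two stages would run as separate jobs.

```java
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class WindowToDataSet {
    public static void main(String[] args) throws Exception {
        // Streaming stage: aggregate per window and write results to HDFS.
        StreamExecutionEnvironment streamEnv = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<String> windowed = streamEnv
                .fromElements("a", "b", "a")                         // placeholder source
                .keyBy(line -> line)
                .window(TumblingProcessingTimeWindows.of(Time.minutes(10)))
                .reduce((a, b) -> a);                                // placeholder aggregation
        windowed.writeAsText("hdfs:///tmp/window-output");           // placeholder path
        streamEnv.execute("window-to-files");

        // Batch stage: read the materialized window output back as a DataSet.
        ExecutionEnvironment batchEnv = ExecutionEnvironment.getExecutionEnvironment();
        DataSet<String> asDataSet = batchEnv.readTextFile("hdfs:///tmp/window-output");
        asDataSet.first(10).print();
    }
}
```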

Re: Stream conversion

2016-02-04 Thread Jeyhun Karimov
6 at 10:30 AM, Sane Lee wrote: > I also have a similar scenario. Any suggestion would be appreciated. > On Thu, Feb 4, 2016 at 10:29 AM Jeyhun Karimov wrote: > Hi Matthias, > This need no

Re: Stream conversion

2016-02-04 Thread Jeyhun Karimov
Hi Matthias, This need not necessarily be in API functions. I just want a roadmap for adding this functionality. Should I save each window's data to disk and create a new DataSet environment in parallel? Or maybe change the trigger functionality? I have large windows. As I asked in previous q