Re: is there any detrimental side-effect if i set the max parallelism as 32768

2023-03-07 Thread Hangxiang Yu
Hi, Tony. "be detrimental to performance" means that the extra space taken by the key-group field may affect performance. As we know, Flink writes the key group as a prefix of the key to speed up rescaling. So the format will be like: key group | key len | key | .. You could
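To make the "key group | key len | key" layout above concrete, here is a minimal Python sketch, not Flink's actual serializer. It assumes the key-group prefix takes one byte while max parallelism is at most 128 and two bytes above that (my reading of Flink's behavior, so treat the threshold as an assumption), and it uses a hypothetical one-byte key-length field purely for illustration. The function names are invented for this example.

```python
def key_group_prefix_bytes(max_parallelism: int) -> int:
    """Bytes used for the key-group prefix.

    Assumption: one byte is enough while key-group ids fit in 0..127,
    i.e. max parallelism <= 128; otherwise two bytes are used.
    """
    return 1 if max_parallelism <= 128 else 2


def encode_state_key(key_group: int, key: bytes, max_parallelism: int) -> bytes:
    """Build 'key group | key len | key' as raw bytes (illustrative only)."""
    prefix_len = key_group_prefix_bytes(max_parallelism)
    prefix = key_group.to_bytes(prefix_len, "big")
    # Hypothetical simplification: the key length is stored in a single byte.
    key_len = len(key).to_bytes(1, "big")
    return prefix + key_len + key


small = encode_state_key(5, b"abc", max_parallelism=128)
large = encode_state_key(300, b"abc", max_parallelism=32768)
print(len(small), len(large))  # the second entry is one byte longer
```

Under these assumptions, raising max parallelism from 128 to 32768 costs one extra byte per stored state entry, which is the kind of space overhead the reply refers to.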

Re: CSV File Sink in Streaming Use Case

2023-03-07 Thread ramkrishna vasudevan
Hi all, One thing to note is that the CSVBulkReader does not support the splittable property. Previously, with TextInputFormat, we were able to use the block size to split the files, but in the streaming world this is not available. Regards Ram On Wed, Mar 8, 2023 at 7:22 AM yuxia wrote: > Hi, as the doc sa

[ANNOUNCE] Apache Kyuubi released 1.7.0

2023-03-07 Thread Cheng Pan
Hi all, The Apache Kyuubi community is pleased to announce that Apache Kyuubi 1.7.0 has been released! Apache Kyuubi is a distributed multi-tenant Lakehouse gateway for large-scale data processing and analytics, built on top of Apache Spark, Apache Flink, Trino and also supports other computing e

is there any detrimental side-effect if i set the max parallelism as 32768

2023-03-07 Thread Tony Wei
Hi experts, Setting the maximum parallelism to a very large value can be detrimental to > performance because some state backends have to keep internal data > structures that scale with the number of key-groups (which are the internal > implementation mechanism for rescalable state). > > Changing
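As a rough illustration of why per-subtask bookkeeping grows with max parallelism, the sketch below maps key groups to subtasks with the formula `key_group * parallelism // max_parallelism`. I believe this matches how Flink assigns key groups to operator indices, but treat it as an assumption rather than a reference; the function names are made up for this example.

```python
from collections import Counter


def operator_index_for_key_group(max_parallelism: int, parallelism: int, key_group: int) -> int:
    # Assumed assignment rule: contiguous ranges of key groups per subtask.
    return key_group * parallelism // max_parallelism


def key_groups_per_subtask(max_parallelism: int, parallelism: int) -> Counter:
    # Count how many key groups each subtask ends up owning.
    return Counter(
        operator_index_for_key_group(max_parallelism, parallelism, kg)
        for kg in range(max_parallelism)
    )


# With parallelism 4, each subtask tracks 32 key groups at max parallelism 128,
# but 8192 key groups at max parallelism 32768.
print(key_groups_per_subtask(128, 4))
print(key_groups_per_subtask(32768, 4))
```

Any data structure a state backend keeps per key group would therefore be 256 times larger per subtask in the second configuration.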

Re: Example of dynamic table

2023-03-07 Thread Jie Han
I've got the concept figured out, but I don't know how to apply it. For example, I have 2 Kafka tables `a` and `b`, and want to execute a continuous query like `select a.f1, b.f1 from a left join b on a.f0 = b.f0`. How do I write the SQL to tell Flink that it's a continuous query? > On Mar 8, 2023, at 09:24, yuxia wrote:

Re: CSV File Sink in Streaming Use Case

2023-03-07 Thread yuxia
Hi, as the doc says: 'The BulkFormat reads and decodes batches of records at a time.' So bulk is not bound to columnar formats; the bulk writer for CSV is indeed implemented in the Flink code. Actually, you can use either Row or Bulk depending on which style you would like to write data in. As

Re: Query on ProcessingTime Triggers on EventTime based window

2023-03-07 Thread Shammon FY
Hi, I think more detail, such as an example, would help us trace the cause. Thanks. Best, Shammon On Tue, Mar 7, 2023 at 5:31 PM Saurabh Singh via user wrote: > Hi Community, > > We have the below use case, > >- We have to use EventTime for Windowing (Tumbling Window) and >Wa

Re: CSV File Sink in Streaming Use Case

2023-03-07 Thread Shammon FY
Hi, you can create a `BulkWriter.Factory` that creates a `CsvBulkWriter`, and then create the `FileSink` with `FileSink.forBulkFormat`. You can see the details in `DataStreamCsvITCase.testCustomBulkWriter`. Best, Shammon On Tue, Mar 7, 2023 at 7:41 PM Chirag Dewan via user wrote: > Hi, > > I am working o

Re: Example of dynamic table

2023-03-07 Thread yuxia
What do you mean by "try the feature of dynamic table"? Do you want to know the concept of dynamic tables[1] or User-defined Sources & Sinks[2] with dynamic tables? [1]: https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/concepts/dynamic_tables/ [2]: https://nightlies.apache.org/f

Example of dynamic table

2023-03-07 Thread Jie Han
Hello community! I want to try the dynamic table feature but cannot find examples in the official docs. Is this part missing?

Re: Avoiding data shuffling when reading pre-partitioned data from Kafka

2023-03-07 Thread David Morávek
> That comes with the additional constraints that Ken mentioned, correct? It could break immediately in cases where a key comes through on a different partition, or if the number of partitions happens to change? I'm concerned about that for our use case as we don't have 100% control of the upstream dat

[SUMMARY] Flink 1.17 Release Sync 3/7/2023

2023-03-07 Thread Leonard Xu
Hi devs and users, I'd like to share some highlights from Flink 1.17 release sync on 3/7/2023. 1.17 Blockers: - Currently, there is one blocker issue (FLINK-31351[1]) that needs to be resolved before we can create a votable RC1. Our contributors are working hard to fix it as soon as possible

CSV File Sink in Streaming Use Case

2023-03-07 Thread Chirag Dewan via user
Hi, I am working on a Java DataStream application and need to implement a File sink with CSV format. I see that I have two options here - Row and Bulk (https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/connectors/datastream/filesystem/#format-types-1) So for CSV file distribution wh

Query on ProcessingTime Triggers on EventTime based window

2023-03-07 Thread Saurabh Singh via user
Hi Community, We have the below use case, - We have to use EventTime for Windowing (Tumbling Window) and Watermarking. - We use *TumblingEventTimeWindows* for this - We have to continuously emit the results for Window every 1 minute. - We are planning to use *ContinousProcessi

RE: Avoiding data shuffling when reading pre-partitioned data from Kafka

2023-03-07 Thread Schwalbe Matthias
Hi Tommy, While I can't offer a sure solution, I've got a number of ideas on how to continue and shed light on the matter: * With respect to diagnostics, have you enabled flame graphs (cluster-config.rest.flamegraph.enabled)? * It allows you to see the call tree of each task an