Adjusted frame length exceeds 2147483647

2022-03-14 Thread Ori Popowski
I have been running a production job for at least a year, and today I got this error: org.apache.flink.runtime.io.network.netty.exception.LocalTransportException: Adjusted frame length exceeds 2147483647: 2969686273 - discarded (connection to 'flink-session-playback-prod-1641716499-sw-6q8p.c.data-prod-

[Flink Batch SQL] Why are DISTINCT aggregates not supported in Group Window Aggregation?

2022-03-14 Thread zhouhaifengmath
I tried to use the same SQL (Flink 1.14) in batch as in streaming to calculate the distinct user count every minute, in order to repair the wrong result in streaming, but it gives the error: `DISTINCT aggregates are currently not supported.` I wonder why DISTINCT aggregates are not supported? A
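For context, this is a sketch of the kind of batch query that hits this error in 1.14; the table and column names here are made up for illustration, not taken from the thread:

```sql
-- Hypothetical per-minute distinct-user query; user_actions, ts and
-- user_id are placeholder names. In Flink 1.14 batch mode this shape
-- raises "DISTINCT aggregates are currently not supported."
SELECT
  TUMBLE_START(ts, INTERVAL '1' MINUTE) AS window_start,
  COUNT(DISTINCT user_id) AS distinct_users
FROM user_actions
GROUP BY TUMBLE(ts, INTERVAL '1' MINUTE);
```

A commonly suggested workaround (not from this thread) is a two-level aggregation: group by the window and user_id in a subquery to deduplicate, then COUNT(*) per window in the outer query.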

Re: INSERT INTO not immediate

2022-03-14 Thread dz902
Thank you so much for helping! You were correct that the Hudi sink requires checkpointing, and by default it was not enabled for Zeppelin and the SQL Client. I added the interval setting and now it works. On Tue, Mar 15, 2022 at 10:58 AM Caizhi Weng wrote: > Hi! > > Hudi sink will commit only after a
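The "interval setting" mentioned here is presumably the standard checkpointing option, which can be set per session in the SQL Client (or a Zeppelin Flink SQL session) before running the INSERT; the 10 s value below is just an example:

```sql
-- Enable periodic checkpointing for the session so sinks that commit
-- on checkpoint (such as Hudi) actually flush their records.
SET 'execution.checkpointing.interval' = '10s';
```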

Re: Reading FileSource Files in a particular order

2022-03-14 Thread Caizhi Weng
Hi! Are you running a batch job or a streaming job? For batch jobs, just use the ORDER BY keyword in SQL to sort the records. For streaming jobs, I'm afraid it is hard to do so. A custom FileEnumerator might help; however, if the parallelism of your file system source is more than one, then it is possible
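The batch suggestion above can be sketched as follows; the table and column names are placeholders, not from the thread:

```sql
-- Hypothetical batch query over the GCS archive table. In batch mode,
-- ORDER BY produces globally sorted output regardless of how the files
-- were enumerated.
SELECT *
FROM archived_events
ORDER BY event_time;
```

Note that in streaming mode, Flink SQL only allows ORDER BY on a time attribute in ascending order, which is why sorting is hard to achieve there.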

Re: INSERT INTO not immediate

2022-03-14 Thread Caizhi Weng
Hi! Hudi sink will commit only after a successful checkpoint or at the end of input. I guess you did not enable checkpointing and as Kafka is a never ending source Hudi will never commit the records. For your testing job, as value sources are finite and will end soon you can see records in Hudi in

Re: Debugging SQL queries

2022-03-14 Thread Caizhi Weng
Hi! My understanding is that Flink SQL will generate equivalent Java code and > submit the compiled jar to the cluster as a job, is that correct? > Almost correct. Flink SQL will generate Java code and compile it to Java classes. These compiled classes are stored in transformations (a type of data
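For debugging at the SQL level, Flink also offers EXPLAIN, which shows the logical and optimized execution plans (though not the generated Java classes themselves). A sketch, using the Flink 1.14-era syntax and a placeholder query:

```sql
-- Print the abstract syntax tree, optimized logical plan and physical
-- execution plan for a query; source_table and its columns are made up.
EXPLAIN PLAN FOR
SELECT id, COUNT(*) AS cnt
FROM source_table
GROUP BY id;
```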

Re: Use existing Kafka zookeeper for Flink

2022-03-14 Thread Zhanghao Chen
Hi, This will make the ZK cluster a single point of failure for your overall system. But other than that, as long as ZK is not overloaded (that will depend on the spec of your ZK cluster as well as the scale of your Kafka & Flink clusters), it's totally fine to do that. Best, Zhanghao Chen
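For reference, pointing Flink at an existing ZooKeeper ensemble is done through the high-availability options in flink-conf.yaml. A minimal sketch using Flink 1.14-era keys; the hostnames and storage path are placeholders:

```yaml
# Reuse an existing ZooKeeper ensemble for Flink HA.
high-availability: zookeeper
high-availability.storageDir: hdfs:///flink/ha/
high-availability.zookeeper.quorum: zk1:2181,zk2:2181,zk3:2181
# Keep Flink's znodes under a dedicated root so they do not mix with Kafka's.
high-availability.zookeeper.path.root: /flink
```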

Re: FileSink to GCS

2022-03-14 Thread Bobby Richard
Just to confirm, I would need to use writeAsText instead of the FileSink? On Mon, Mar 14, 2022 at 11:21 AM Martijn Visser wrote: > Hi Bobby, > > You can already use it for writing in Flink 1.14 [1] as long as you're > aware of the implications in case of a failure / need to recover. > > Best

Re: FileSink to GCS

2022-03-14 Thread Martijn Visser
Hi Bobby, You can already use it for writing in Flink 1.14 [1] as long as you're aware of the implications in case of a failure / need to recover. Best regards, Martijn [1] https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/deployment/filesystems/gcs/ On Mon, 14 Mar 2022 at 16:00,

Use existing Kafka zookeeper for Flink

2022-03-14 Thread HG
Hi, Can we use the ZooKeeper that is already running for Kafka to also serve Flink? Or is that not recommended? Regards, Hans

Re: FileSink to GCS

2022-03-14 Thread Bobby Richard
Thanks Martijn, are there any alternatives to write to GCS using Flink 1.14? Recoverability isn't important for my current use case. On Mon, Mar 14, 2022 at 10:58 AM Martijn Visser wrote: > Hi Bobby, > > That's because Flink 1.14 currently doesn't support the RecoverableWriter > for GCS. This wi

Re: FileSink to GCS

2022-03-14 Thread Martijn Visser
Hi Bobby, That's because Flink 1.14 currently doesn't support the RecoverableWriter for GCS. This will be supported as of Flink 1.15, which you can see in the relevant Flink Jira ticket [1]. For more details on why RecoverableWriter is important, the JavaDoc [2] is probably the best source of info

Reading FileSource Files in a particular order

2022-03-14 Thread Kevin Lam
Hi all, We're interested in being able to use a FileSource to read from a Google Cloud Storage (GCS) archive of messages from a Kafka topic, roughly in order. Our GCS archive is

Re: Checkpointing in StateFun

2022-03-14 Thread Seth Wiesman
There's a passing reference to it in the 1.2 docs but we removed it to discourage use. https://nightlies.apache.org/flink/flink-docs-release-1.2/dev/stream/state.html#raw-and-managed-state On Sat, Mar 12, 2022 at 2:50 AM Christopher Gustafson wrote: > Hi, > > > Thanks for the explanation, I had

FileSink to GCS

2022-03-14 Thread Bobby Richard
I am receiving the following exception when attempting to write to GCS with the FileSink in Flink 1.14.3. Using flink-shaded-hadoop 2.8.3-10.0 and gcs-connector hadoop2-2.1.1. java.lang.UnsupportedOperationException: Recoverable writers on Hadoop are only supported for HDFS I am able to write chec

INSERT INTO not immediate

2022-03-14 Thread dz902
Hi, I have two connectors created with the SQL CLI. The source is Kafka/Debezium, and the sink is S3 Hudi. I can SELECT from the source table OK. I can issue INSERT INTO the sink OK. So I think both of them work fine. Both have the same table structure. However, when I do: INSERT INTO sink SELECT id, LAS

Slowness using GenericWriteAheadSink

2022-03-14 Thread James Sandys-Lumsdaine
Hello, We are using the GenericWriteAheadSink to buffer up values to then send to a SQL Server database with a fast bulk copy upload. However, when I watch my process running, it seems to spend a huge amount of time iterating over the Iterable provided to the sendValues() method. It takes such a long ti