Re: unaligned checkpoint for job with large start delay

2022-01-11 Thread Piotr Nowojski
Hi Thias and Mason, > state-backend-rocksdb-metrics-estimate-num-keys Indeed that can be a good indicator. However keep in mind that, depending on your logic, there might be many existing windows for each key. > However, it’s not so clear how to count the windows that have been registered since

Re: JVM SEGV crash in 1.14.2 for scala 2.12

2022-01-11 Thread Eugene Chung
I tested lower versions like 1.12 and 1.13, but it all failed with the same error. Fortunately, it is okay with OpenJDK 11.0_12. I think Flink should consider minimum Java version support as 11. Best regards, Eugene Chung (Korean : 정의근) On Wed, Jan 12, 2022 at 11:11 AM Eugene Chung wrote: > H

RE: unaligned checkpoint for job with large start delay

2022-01-11 Thread Schwalbe Matthias
Hi Mason, Since you are using RocksDB, you could enable this metric [1] state-backend-rocksdb-metrics-estimate-num-keys which gives (afaik) good indication of the number of active windows. I’ve never seen (despite the warning) negative effect on the runtime. Hope this help … Thias [1] htt

Re: Serving Machine Learning models

2022-01-11 Thread Xingbo Huang
Hi sonia, As far as I know, pyflink users prefer to use python udf[1][2] for model prediction. Load the model when the udf is initialized, and then predict each new piece of data [1] https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/dev/python/table/udfs/overview/ [2] https://nightl

JVM SEGV crash in 1.14.2 for scala 2.12

2022-01-11 Thread Eugene Chung
Hi all, I downloaded 1.14.2 for scala 2.12 and executed a simple example, ./bin/flink run examples/streaming/WordCount.jar But in my environment, Mac OS Monterey with OpenJDK 8_312, the TaskManager JVM crashes with SEGV. Clearly, Unsafe class call in MemorySegment makes it happen. How can I reso

Is FlinkKafkaProducer state compatible with KafkaSink sink? How to migrate?

2022-01-11 Thread Kevin Lam
Hi all, We're looking to migrating from FlinkKafkaProducer to the new KafkaSink for the new unified Sink API. Is the state compatible across the two Kafka sink APIs? If not, what's the best way to migrate from one to the other? Thanks in advance, Kevin

Async IO code not working

2022-01-11 Thread Siddhesh Kalgaonkar
I am using below code to get the data from the side output which has filtered records. So, it goes like this: val filterRecords: DataStream[String] = src.process(new ProcessFunction()).getSideOutput(filteredOutputTag) It has filtered records in it. Now, I want to add these records to the db asyn

Re: unaligned checkpoint for job with large start delay

2022-01-11 Thread Mason Chen
Hi Piotrek, No worries—I hope you had a good break. > Counting how many windows have been registered/fired and plotting that over > time. It’s straightforward to count windows that are fired (the trigger exposes the run time context and we can collect the information in that code path). Howeve

Re: Plans to update StreamExecutionEnvironment.readFiles to use the FLIP-27 compatible FileSource?

2022-01-11 Thread Kevin Lam
Hi Fabian, No problem, thanks for the clarification. In terms of its importance, we have some Flink applications running using StreamExecutionEnvironment.readFiles

Re: Sorting/grouping keys and State management in BATCH mode

2022-01-11 Thread Chesnay Schepler
Looping in Dawid who can hopefully answer your questions. On 11/01/2022 13:00, Krzysztof Chmielewski wrote: Hi, Im reading docs and FLIP-140 available for BATCH mode [1][2] where it reads that " In |BATCH| mode, the configured state backend is ignored. Instead, the input of a keyed operation i

Re: StaticFileSplitEnumerator - keeping track 9f already processed files

2022-01-11 Thread Fabian Paul
Hi Krzysztof, Thanks for your investigation. Can you maybe share the code with us? collectWithClient will insert a custom sink into the datastream that buffers all incoming records and make them queryable. It is already deprecated and one should use executeAndCollect that fulfills the same purpose

Re: Plans to update StreamExecutionEnvironment.readFiles to use the FLIP-27 compatible FileSource?

2022-01-11 Thread Fabian Paul
Hi Kevin, Sorry for the misleading information. The FileSink is compatible with the predecessor but unfortunately, it is not the case for the FileSource. I updated the ticket accordingly. Perhaps there is a way to migrate the state but it would be a larger effort. Is this an important feature for

Re: [E] Re: Orphaned job files in HDFS

2022-01-11 Thread David Clutter
Ok, that makes sense. I did see some job failures. However failures could happen occasionally. Is there any option to have the job manager clean-up these directories when the job has failed? On Mon, Jan 10, 2022 at 8:58 PM Yang Wang wrote: > IIRC, the staging directory(/user/{name}/.flink/app

Re: Could not find any factory for identifier 'jdbc'

2022-01-11 Thread Chesnay Schepler
How do you ensure that the connector is actually available at runtime? Are you bundling it in a jar or putting it into Flinks lib directory? On 11/01/2022 14:14, Ronak Beejawat (rbeejawa) wrote: Correcting subject -> Could not find any factory for identifier 'jdbc' From: Ronak Beejawat (rbeeja

Re: Cannot load user class: avro GenericRecord

2022-01-11 Thread Jason Politis
I realize those 2 specific ones are commented out, but I believe they are only used in flink-sql, and there's currently an issue when I include them in the pom. I might not have the repos correct. Thank you Jason Politis Solutions Architect, Carrera Group carrera.io | jpoli...@carrera.io

Re: Cannot load user class: avro GenericRecord

2022-01-11 Thread Jason Politis
http://maven.apache.org/POM/4.0.0"; xmlns:xsi=" http://www.w3.org/2001/XMLSchema-instance"; xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd";> 4.0.0 boars_head w_build_d 0.1 jar Cql Flink DataStream Java https://flink.apache.org UTF-8 1.14.0 1.8

Sorting/grouping keys and State management in BATCH mode

2022-01-11 Thread Krzysztof Chmielewski
Hi, Im reading docs and FLIP-140 available for BATCH mode [1][2] where it reads that " In BATCH mode, the configured state backend is ignored. Instead, the input of a keyed operation is grouped by key (using sorting) and then we process all records of a key in turn." [1] I would like to ask: 1. W

Re: StaticFileSplitEnumerator - keeping track 9f already processed files

2022-01-11 Thread Krzysztof Chmielewski
Hi Fabian, Thank you for your input and I'm sorry for delay on my part. Before I will create a ticket I would like to ask about one thing more. There is test FileSourceTextLinesITCase::testBoundedTextFileSourceWithJobManagerFailover This test uses DataStreamUtils.collectWithClient(...) which retur