Re: DataStream API: Parquet File Format with Scala Case Classes

2022-02-24 Thread Fabian Paul
Hi Ryan, I guess the ticket you are looking for is the following [1]. AFAIK the work on it hasn't started yet. So we are still appreciating initial designs or ideas. Best, Fabian [1] https://issues.apache.org/jira/browse/FLINK-25416 On Tue, Feb 22, 2022 at 11:54 PM Ryan van Huuksloot < ryan.van

Re: java.lang.Exception: Job leader for job id 0efd8681eda64b072b72baef58722bc0 lost leadership.

2022-02-24 Thread Nicolaus Weidner
Hi Jai, Do writes to ValueStates/MapStates have a direct on churn of the Flink > State or is the data buffered in between? > Writes to keyed state go directly to RocksDB. So there shouldn't be any memory issues with buffers overflowing or similar. In general, more memory should increase performan

Re: Flink metrics via permethous or opentelemerty

2022-02-24 Thread Nicolaus Weidner
Hi Sigalit, first of all, have you read the docs page on metrics [1], and in particular the Prometheus section on metrics reporters [2]? Apart from that, there is also a (somewhat older) blog post about integrating Flink with Prometheus, including a link to a repo with example code [3]. Hope that

Re: Flink job recovery after task manager failure

2022-02-24 Thread Afek, Ifat (Nokia - IL/Kfar Sava)
Thanks Zhilong. The first launch of our job is fast, I don’t think that’s the issue. I see in flink job manager log that there were several exceptions during the restart, and the task manager was restarted a few times until it was stabilized. You can find the log here: jobmanager-log.txt.gz

Re: Flink Statefun and Feature computation

2022-02-24 Thread Igal Shilman
Hello, For (1) I welcome you to visit our documentions, and many talks online to understand more about the motivation and the value of StateFun. I can say in a nutshell that StateFun provides few building blocks that makes building distributed stateful applications easier. For (2) checkout our pl

Re: Flink job recovery after task manager failure

2022-02-24 Thread Zhilong Hong
Hi, Afek I've read the log you provided. Since you've set the value of restart-strategy to be exponential-delay and the value of restart-strategy.exponential-delay.initial-backoff is 10s, everytime a failover is triggered, the JobManager will have to wait for 10 seconds before it restarts the job.

Possible BUG in 1.15 SQL JSON_OBJECT()

2022-02-24 Thread Jonathan Weaver
Using the latest SNAPSHOT BUILD. If I have a column definition as .column( "events", DataTypes.ARRAY( DataTypes.ROW( DataTypes.FIELD("status", DataTypes.STRING().notNull()), DataTypes.FIELD("times

Re: [Flink-1.14.3] Restart of pod due to duplicatejob submission

2022-02-24 Thread Yang Wang
This might be related with FLINK-21928 and seems already fixed in 1.14.0. But it will have some limitations and users need to manually clean up the HA entries. Best, Yang Parag Somani 于2022年2月24日周四 13:42写道: > Hello, > > Recently due to log4j vulnerabilities, we have upgraded to Apache Flink >

pyflink object to java object

2022-02-24 Thread Francis Conroy
Hi all, we're using pyflink for most of our flink work and are sometimes into a java process function. Our new java process function takes an argument in in the constructor which is a Row containing default values. I've declared my Row in pyflink like this: default_row = Row(ep_uuid="",

streaming mode with both finite and infinite input sources

2022-02-24 Thread Jin Yi
so we have a streaming job where the main work to be done is processing infinite kafka sources. recently, i added a fromCollection (finite) source to simply write some state once upon startup. this all seems to work fine. the finite source operators all finish, while all the infinite source oper