Re: High DirectByteBuffer Usage

2021-07-14 Thread bat man
Hi Timo, I am looking at these options. However, I had a couple of questions - 1. The off-heap usage grows over time. My job does not do any off-heap operations, so I don't think there is a leak there. Even after GC it keeps adding a few MBs after hours of running. 2. Secondly, I am seeing as the in

Re: Flink UDF Scalar Function called only once for all rows of select SQL in case of no parameter passed

2021-07-14 Thread shamit jain
Thanks!! On 2021/07/14 02:26:47, JING ZHANG wrote: > Hi, Shamit Jain, > In fact, it is an optimization to simplify expressions. > If a UDF has no parameters, the optimizer would treat it as an expression > which always generates constant results. > So it would be calculated once in the optimization pha
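For reference, the standard way to opt out of this constant folding is to declare the UDF non-deterministic. A minimal sketch, assuming a zero-argument Table API scalar function (the class name and body are illustrative):

```java
import org.apache.flink.table.functions.ScalarFunction;

public class RandomValue extends ScalarFunction {

    public double eval() {
        return Math.random();
    }

    // Returning false tells the planner that a call to this function is not a
    // constant expression, so it is evaluated per row instead of once during
    // the optimization phase.
    @Override
    public boolean isDeterministic() {
        return false;
    }
}
```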

Re: Kafka Consumer stop consuming data

2021-07-14 Thread Aeden Jameson
Yes, that’s the scenario I was referring to. Besides that, I’m not sure what could be another source of your issue. On Tue, Jul 13, 2021 at 5:35 PM Jerome Li wrote: > Hi Aeden, > > > > Thanks for getting back. > > > > Do you mean one of the partitions is in an idle state and no new watermark gene

Re: java.lang.Exception: Could not complete the stream element: Record @ 1626200691540 :

2021-07-14 Thread Ragini Manjaiah
Hi, Following the suggestion, I overrode the timeout method in the async function. The Flink job processes real-time events for a few minutes and later hangs, not processing at all. Is there any issue with the method below? I see 0 records per second. Can you please help here? @Override public void timeout(T
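For context, a minimal sketch of a timeout override that keeps the pipeline moving (input/output types and names are placeholders): the key point is that the override must complete the ResultFuture; a handler that never completes it leaves the slot in the async operator's queue occupied forever, which matches the "0 records per second" symptom.

```java
import java.util.Collections;
import org.apache.flink.streaming.api.functions.async.ResultFuture;
import org.apache.flink.streaming.api.functions.async.RichAsyncFunction;

public class EnrichAsyncFunction extends RichAsyncFunction<String, String> {

    @Override
    public void asyncInvoke(String input, ResultFuture<String> resultFuture) {
        // Placeholder for the real async call; complete the future in its callback.
        resultFuture.complete(Collections.singleton(input));
    }

    @Override
    public void timeout(String input, ResultFuture<String> resultFuture) {
        // Completing the future (even with an empty result) releases the queue
        // slot; alternatively call resultFuture.completeExceptionally(...) to
        // fail the record explicitly.
        resultFuture.complete(Collections.emptyList());
    }
}
```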

Re: Dead Letter Queue for JdbcSink

2021-07-14 Thread Maciej Bryński
This is the idea. Of course you need to wrap more functions like: open, close, notifyCheckpointComplete, snapshotState, initializeState and setRuntimeContext. The problem is that if you want to catch the problematic record you need to set the batch size to 1, which gives very bad performance. Regards, Ma
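A minimal sketch of the wrapper idea, assuming the delegate is the sink returned by JdbcSink.sink(...) (a RichSinkFunction under the hood) and that DeadLetterHandler is a hypothetical callback that forwards failed records to a DLQ such as a Kafka producer. The checkpoint-related methods mentioned above would be forwarded the same way when the delegate implements CheckpointedFunction / CheckpointListener:

```java
import org.apache.flink.api.common.functions.RuntimeContext;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;

public class DlqSinkWrapper<T> extends RichSinkFunction<T> {

    /** Hypothetical callback that forwards failed records to a dead letter queue. */
    public interface DeadLetterHandler<T> extends java.io.Serializable {
        void handle(T record, Exception error);
    }

    private final RichSinkFunction<T> delegate;
    private final DeadLetterHandler<T> deadLetters;

    public DlqSinkWrapper(RichSinkFunction<T> delegate, DeadLetterHandler<T> deadLetters) {
        this.delegate = delegate;
        this.deadLetters = deadLetters;
    }

    @Override
    public void setRuntimeContext(RuntimeContext ctx) {
        super.setRuntimeContext(ctx);
        delegate.setRuntimeContext(ctx);
    }

    @Override
    public void open(Configuration parameters) throws Exception {
        delegate.open(parameters);
    }

    @Override
    public void invoke(T value, Context context) throws Exception {
        try {
            delegate.invoke(value, context);
        } catch (Exception e) {
            // Only reliable with a JDBC batch size of 1; with larger batches the
            // failing record inside a batch cannot be identified here.
            deadLetters.handle(value, e);
        }
    }

    @Override
    public void close() throws Exception {
        delegate.close();
    }
}
```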

Re: Dead Letter Queue for JdbcSink

2021-07-14 Thread Rion Williams
Hi Maciej, Thanks for the quick response. I wasn't aware of the idea of using a SinkWrapper, but I'm not quite certain that it would suit this specific use case (as a SinkFunction / RichSinkFunction doesn't appear to support side-outputs). Essentially, what I'd hope to accomplish would be to pick

Re: Dead Letter Queue for JdbcSink

2021-07-14 Thread Maciej Bryński
Hi Rion, We have implemented such a solution with a Sink Wrapper. Regards, Maciek Wed, Jul 14, 2021 at 16:21 Rion Williams wrote: > > Hi all, > > Recently I've been encountering an issue where some external dependency or > process causes writes within my JDBCSink to fail (e.g. something is

Re: Question about POJO rules - why fields should be public or have public setter/getter?

2021-07-14 Thread Timo Walther
Hi Naehee, the serializer for case classes is generated using the Scala macro that is also responsible for implicitly extracting the TypeInformation from your DataStream API program. It should be possible to use the POJO serializer with case classes. But wouldn't it be easier to just use regular

Re: Stateful Functions PersistentTable duration

2021-07-14 Thread Ammon Diether
Excellent, thank you. On Wed, Jul 14, 2021 at 5:53 AM Igal Shilman wrote: > Hi Ammon, > > The duration is per item, and the cleanup happens transparently and > incrementally via RocksDB (background compactions with a custom filter) [1] > > In your example a gets cleaned up, while b will be clean

Re: Running Flink Dataset jobs Sequentially

2021-07-14 Thread Ken Krugler
Hi Jason, Yes, I write the files inside of the mapPartition function. Note that you can get multiple key groups inside of one partition, so you have to manage your own map from the key group to the writer. The Flink DAG ends with a DiscardingSink, after the mapPartition. And no, we didn’t noti
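A rough sketch of that pattern, assuming the partitioned data arrives as (key, line) string pairs; newWriterFor and the output path are hypothetical helpers:

```java
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import org.apache.flink.api.common.functions.RichMapPartitionFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.util.Collector;

public class PartitionedFileWriter
        extends RichMapPartitionFunction<Tuple2<String, String>, Void> {

    @Override
    public void mapPartition(Iterable<Tuple2<String, String>> records, Collector<Void> out)
            throws Exception {
        // One partition can contain several key groups, so keep one writer per key.
        Map<String, BufferedWriter> writers = new HashMap<>();
        try {
            for (Tuple2<String, String> record : records) {
                writers.computeIfAbsent(record.f0, this::newWriterFor)
                        .write(record.f1 + System.lineSeparator());
            }
        } finally {
            for (BufferedWriter writer : writers.values()) {
                writer.close();
            }
        }
        // Nothing is emitted; the DAG ends with a DiscardingSink after this operator.
    }

    private BufferedWriter newWriterFor(String key) {
        try {
            // Hypothetical naming scheme; adjust path and format as needed.
            return new BufferedWriter(new FileWriter("/tmp/output-" + key + ".txt"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```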

Dead Letter Queue for JdbcSink

2021-07-14 Thread Rion Williams
Hi all, Recently I've been encountering an issue where some external dependency or process causes writes within my JDBCSink to fail (e.g. something is being inserted with an explicit constraint that never made its way there). I'm trying to see if there's a pattern or recommendation for handling

Re: Flink 1.13.1 PartitionNotFoundException

2021-07-14 Thread Debraj Manna
I have increased it to 9 and it seems to be running fine. If I still see the failure when I add some load, I will post back in this thread. On Wed, Jul 14, 2021 at 7:19 PM Debraj Manna wrote: > Yes, I forgot to mention in my first email. I have tried increasing > taskmanager.network.request-back

Re: [External] NullPointerException on accumulator after Checkpointing

2021-07-14 Thread Timo Walther
Hi Clemens, first of all, can you try to use the MapView within an accumulator POJO class? This might solve your exception. I'm not sure if we support the views as top-level accumulators. In any case this seems to be a bug. I will open an issue once I get your feedback. We might simply throw a
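A minimal sketch of "MapView within an accumulator POJO" for a Table API aggregate function, assuming a count-distinct style aggregate (names are illustrative):

```java
import org.apache.flink.table.api.dataview.MapView;
import org.apache.flink.table.functions.AggregateFunction;

public class CountDistinct extends AggregateFunction<Long, CountDistinct.Accumulator> {

    /** The MapView is a field of the accumulator POJO, not the accumulator itself. */
    public static class Accumulator {
        public MapView<String, Boolean> seen = new MapView<>();
        public long count = 0L;
    }

    @Override
    public Accumulator createAccumulator() {
        return new Accumulator();
    }

    public void accumulate(Accumulator acc, String value) throws Exception {
        if (value != null && !acc.seen.contains(value)) {
            acc.seen.put(value, true);
            acc.count++;
        }
    }

    @Override
    public Long getValue(Accumulator acc) {
        return acc.count;
    }
}
```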

Re: Upsert Kafka SQL Connector used as a sink does not generate an upsert stream

2021-07-14 Thread Timo Walther
Hi Carlos, currently, the changelog output might not always be optimal. We are continuously improving this. For the upsert Kafka connector, we have added a reducing buffer to avoid those tombstone messages: https://issues.apache.org/jira/browse/FLINK-21191 Unfortunately, this is only availab

Re: High DirectByteBuffer Usage

2021-07-14 Thread Timo Walther
Hi Hemant, did you checkout the dedicated page for memory configuration and troubleshooting: https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/memory/mem_trouble/#outofmemoryerror-direct-buffer-memory https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/
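For reference, the relevant knobs from those pages live in flink-conf.yaml; the sizes below are placeholders to adjust for the actual workload:

```yaml
# Off-heap (direct) memory available to user code and its libraries,
# e.g. Kafka clients allocating DirectByteBuffers.
taskmanager.memory.task.off-heap.size: 256mb

# Off-heap memory reserved for the Flink framework itself.
taskmanager.memory.framework.off-heap.size: 128mb
```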

Re: Stateful Functions PersistentTable duration

2021-07-14 Thread Igal Shilman
Hi Ammon, The duration is per item, and the cleanup happens transparently and incrementally via RocksDB (background compactions with a custom filter) [1] In your example a gets cleaned up, while b will be cleaned in ~10min. Kind regards, Igal. [1] https://ci.apache.org/projects/flink/flink-doc
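For illustration, a sketch of how a per-item expiration can be attached to a PersistedTable in the embedded StateFun Java SDK; this assumes the PersistedTable.of overload that accepts an Expiration, and the state name, types, and duration are placeholders:

```java
import java.time.Duration;
import org.apache.flink.statefun.sdk.Context;
import org.apache.flink.statefun.sdk.StatefulFunction;
import org.apache.flink.statefun.sdk.annotations.Persisted;
import org.apache.flink.statefun.sdk.state.Expiration;
import org.apache.flink.statefun.sdk.state.PersistedTable;

public class MyFunction implements StatefulFunction {

    @Persisted
    private final PersistedTable<String, Long> table =
            PersistedTable.of(
                    "my-table",
                    String.class,
                    Long.class,
                    // The TTL applies per entry; expired entries are removed lazily
                    // by RocksDB background compactions, as described above.
                    Expiration.expireAfterWriting(Duration.ofMinutes(10)));

    @Override
    public void invoke(Context context, Object input) {
        table.set("a", System.currentTimeMillis());
    }
}
```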

Re: Flink 1.13.1 PartitionNotFoundException

2021-07-14 Thread Debraj Manna
Yes, I forgot to mention in my first email. I have tried increasing taskmanager.network.request-backoff.max to 3 in flink-conf.yaml. But I am getting the same error. On Wed, Jul 14, 2021 at 7:10 PM Timo Walther wrote: > Hi Debraj, > > I could find quite a few older emails that were suggesting

Re: Process finite stream and notify upon completion

2021-07-14 Thread Tamir Sagi
Thank you so much mate. Now it makes sense to me. I will test it and keep you posted if anything else comes up. Best, Tamir.

Re: Flink 1.13.1 PartitionNotFoundException

2021-07-14 Thread Timo Walther
Hi Debraj, I could find quite a few older emails suggesting to play around with the `taskmanager.network.request-backoff.max` option. This was also recommended in the link that you shared. Have you tried it? Here is some background: http://deprecated-apache-flink-user-mailing-list-a
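The option in question is set in flink-conf.yaml; the value below is only an example of raising it above the default (10000 ms):

```yaml
# Maximum backoff, in milliseconds, for partition requests of input channels.
taskmanager.network.request-backoff.max: 30000
```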

Re: Process finite stream and notify upon completion

2021-07-14 Thread Piotr Nowojski
Hi Tamir, Ok, let's take a step back. First of all, let's assume we have a bounded source already. If so, when this source ends, it will emit MAX_WATERMARK shortly before closing and ending the stream. This MAX_WATERMARK will start flowing through your job graph, firing all remaining event time time

Re: Process finite stream and notify upon completion

2021-07-14 Thread Timo Walther
Hi Tamir, a nice property of watermarks is that they are kind of synchronized across input operators and their partitions (i.e. parallel instances). Bounded sources will emit a final MAX_WATERMARK once they have processed all data. When you receive a MAX_WATERMARK in your current operator, you
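A minimal sketch of the trick both replies describe: register an event-time timer for Long.MAX_VALUE, which can only fire once the final MAX_WATERMARK arrives, i.e. after all upstream data has been processed (class and method names are illustrative):

```java
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

public class EndOfInputNotifier<K, T> extends KeyedProcessFunction<K, T, T> {

    @Override
    public void processElement(T value, Context ctx, Collector<T> out) {
        // Re-registering the same timer is cheap: timers are deduplicated
        // per key and timestamp.
        ctx.timerService().registerEventTimeTimer(Long.MAX_VALUE);
        out.collect(value);
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext ctx, Collector<T> out) {
        // Fires once per key when MAX_WATERMARK flows through; trigger the
        // completion notification here.
    }
}
```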

Re: Process finite stream and notify upon completion

2021-07-14 Thread Tamir Sagi
Hey Piotr, Thank you for the fast response. The refs are good; however, to be honest, I'm a little confused regarding the trick with MAX_WATERMARK. Maybe I'm missing something. Keep in mind Flink is a distributed system, so downstream operators/functions might still be busy for some time processin

Re: Kafka Consumer Retries Failing

2021-07-14 Thread Piotr Nowojski
Hi Rahul, I would highly doubt that you are hitting the network bottleneck case. It would require either a broken environment/network or throughput on the order of GB/second. More likely you are seeing an empty input pool and you haven't checked the documentation [1]: > inPoolUsage - An estimate of th

Flink 1.13.1 PartitionNotFoundException

2021-07-14 Thread Debraj Manna
Hi, I am observing that my Flink job is failing with the below error 2021-07-14T12:07:00.918Z INFO runtime.executiongraph.Execution flink-akka.actor.default-dispatcher-29 transitionState:1446 MetricAggregateFunction -> (Sink: LateMetricSink10, Sink: TSDBSink9) (12/30) (3489393394c13fd1ad85136e11d67deb

Re: Kafka Consumer Retries Failing

2021-07-14 Thread Rahul Patwari
Thanks, Piotrek. We have two Kafka sources. We are facing this issue for both of them. The downstream tasks with the sources form two independent directed acyclic graphs, running within the same Streaming Job. For Example: source1 -> task1 -> sink1 source2 -> task2 -> sink2 There is backpressure

Re: Process finite stream and notify upon completion

2021-07-14 Thread Piotr Nowojski
Hi Tamir, Sorry, I missed that you want to use Kafka. In that case I would suggest trying out the new KafkaSource [1] interface and its built-in boundedness support [2][3]. Maybe it will do the trick? If you want to be notified explicitly about the completion of such a bounded Kafka stream, you stil
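A sketch of the bounded KafkaSource mentioned above, based on the Flink 1.13-era API; the broker address, topic, and names are placeholders:

```java
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class BoundedKafkaJob {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        KafkaSource<String> source =
                KafkaSource.<String>builder()
                        .setBootstrapServers("broker:9092")
                        .setTopics("input-topic")
                        .setGroupId("bounded-demo")
                        .setStartingOffsets(OffsetsInitializer.earliest())
                        // Stop at the offsets that are the latest at job start; the
                        // source then finishes and MAX_WATERMARK is emitted.
                        .setBounded(OffsetsInitializer.latest())
                        .setValueOnlyDeserializer(new SimpleStringSchema())
                        .build();

        env.fromSource(source, WatermarkStrategy.noWatermarks(), "Bounded Kafka Source")
                .print();

        env.execute("bounded-kafka-demo");
    }
}
```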

Re: Process finite stream and notify upon completion

2021-07-14 Thread Tamir Sagi
Hey Piotr, Thank you for your response. I saw the exact suggestion answered by David Anderson [1] but did not really understand how it may help. Sources, when finishing, emit org.apache.flink.streaming.api.watermark.Watermark#MAX_WATERMARK. Assuming 10 messages are sent to a Kafka topic,

Re: Kafka Consumer Retries Failing

2021-07-14 Thread Piotr Nowojski
Hi, Waiting for memory from LocalBufferPool is a perfectly normal symptom of backpressure [1][2]. Best, Piotrek [1] https://flink.apache.org/2021/07/07/backpressure.html [2] https://www.ververica.com/blog/how-flink-handles-backpressure Wed, Jul 14, 2021 at 06:05 Rahul Patwari wrote: > Th

Re: savepoint failure

2021-07-14 Thread Till Rohrmann
Hi Dan, Can you provide us with more information about your job (maybe even the job code or a minimally working example), the Flink configuration, the exact workflow you are doing and the corresponding logs and error messages? Cheers, Till On Tue, Jul 13, 2021 at 9:39 PM Dan Hill wrote: > Coul