AvroRowDeserializationSchema

2022-04-20 Thread lan tran
Hi team, I want to implement AvroRowDeserializationSchema when consume data from Kafka, however from the documentation, I did not understand what are avro_schema_string and record_class ? I would be great if you can give me the example on this (I only have the example on Java, however, I was doing

Re: Integration Test for Kafka Streaming job

2022-04-20 Thread Aeden Jameson
I've had success using Kafka for Junit, https://github.com/mguenther/kafka-junit, for these kinds of tests. On Wed, Apr 20, 2022 at 3:01 PM Alexey Trenikhun wrote: > > Hello, > We have Flink job that read data from multiple Kafka topics, transforms data > and write in output Kafka topics. We wan

Kubernetes killing TaskManager - Flink ignoring taskmanager.memory.process.size

2022-04-20 Thread Dan Hill
Hi. I upgraded to Flink v1.14.4 and now my Flink TaskManagers are being killed by Kubernetes for exceeding the requested memory. My Flink TM is using an extra ~5gb of memory over the tm.memory.process.size. Here are the flink-config values that I'm using taskmanager.memory.process.size: 2560

Integration Test for Kafka Streaming job

2022-04-20 Thread Alexey Trenikhun
Hello, We have Flink job that read data from multiple Kafka topics, transforms data and write in output Kafka topics. We want write integration test for it. I've looked at KafkaTableITCase, we can do similar setup of Kafka topics, prepopulate data but since in our case it is endless stream, we n

Restore Job from CheckPoint in IntelliJ IDE - MiniCluster

2022-04-20 Thread Κωνσταντίνος Αγαπίδης
Hi everyone, I have a technical question about a problem I am dealing with in my Flink Jobs. I am using IntelliJ, Java 1.8 and Flink Version 1.15.0 Libraries and Maven Dependencies in my jobs. I am trying to restore a job in Flink Minicluster Local Environment but I cannot find documentation a

Re: RocksDB efficiency and keyby

2022-04-20 Thread Yaroslav Tkachenko
Yep, I'd give it another try. EBS could be too slow in some use-cases. On Wed, Apr 20, 2022 at 9:39 AM Trystan wrote: > Thanks for the info! We're running EBS gp2 volumes... awhile back we > tested local SSDs with a different job and didn't notice any gains, but > that was likely due to an under

Re: Problems with PrometheusReporter

2022-04-20 Thread Peter Schrott
Hi kuweiha, Just to confirm, you tried with 1.15 - none of the rcs are working for me? This port is definitely free as it was already used on the same hosts with Flink 1.14.4. And as I said, when no job is running on the taskmanager it actually reports metrics on that certain port - I only get th

Re: RocksDB efficiency and keyby

2022-04-20 Thread Trystan
Thanks for the info! We're running EBS gp2 volumes... awhile back we tested local SSDs with a different job and didn't notice any gains, but that was likely due to an under-optimized job where the bottleneck was elsewhere On Wed, Apr 20, 2022, 11:08 AM Yaroslav Tkachenko wrote: > Hey Trystan, >

Re: Problems with PrometheusReporter

2022-04-20 Thread huweihua
Hi, Peter I have not been able to reproduce this problem. From your description, it is possible that the specified port has been listened by other processes, and PrometheusReporter failed to start. You can confirm it from taskmanager.log, or check if port of the host is being listene

Re: RocksDB efficiency and keyby

2022-04-20 Thread Yaroslav Tkachenko
Hey Trystan, Based on my personal experience, good disk IO for RocksDB matters a lot. Are you using the fastest SSD storage you can get for RocskDB folders? For example, when running on GCP, we noticed *10x* throughput improvement by switching RocksDB storage to https://cloud.google.com/compute/d

RocksDB efficiency and keyby

2022-04-20 Thread Trystan
Hello, We have a job where its main purpose is to track whether or not we've previously seen a particular event - that's it. If it's new, we save it to an external database. If we've seen it, we block the write. There's a 3-day TTL to manage the state size. The downstream db can tolerate new data

Re: How to debug Metaspace exception?

2022-04-20 Thread John Smith
Or I can put in the config to treat org.apache.ignite. classes as first class? On Tue, Apr 19, 2022 at 10:18 PM John Smith wrote: > Ok, so I loaded the dump into Eclipse Mat and followed: > https://cwiki.apache.org/confluence/display/FLINK/Debugging+ClassLoader+leaks > > - On the Histogram, I go

Problems with PrometheusReporter

2022-04-20 Thread Peter Schrott
Hi Flink-Users, After upgrading to Flink 1.15 (rc3) (coming from 1.14) I noticed that there is a problem with the metrics exposed through the PrometheusReporter. It is configured as followed in the flink-config.yml: metrics.reporters: prom metrics.reporter.prom.class: org.apache.flink.metrics.pro

Re: flink-stop command fails with ` Operation not found under key`

2022-04-20 Thread huweihua
Stop-with-savepoint is an async operation, it will trigger savepoint at first, and trigger this operation in a cache. These error logs indicate that the operation is in the cache which can no longer be found. Could you provide more jobmanager.log to find when this operation is evicted from cac

Re: /status endpoint of flink jobmanager not working

2022-04-20 Thread huweihua
Hi, peter, job status endpoint is introduced in FLINK-26641[1], and it will works in Flink 1.16 [1]https://issues.apache.org/jira/browse/FLINK-26641 > 2022年4月19日 下午9:18,Peter Schrott 写道: > > Hi Flink Users, > > Does anyone know what happened to the /status endpoint of a job? > > Calling /job

Re: DecimalType.MAX_PRECISION, why 38?

2022-04-20 Thread Martijn Visser
Hi Yaroslav, I did a quick dive into FLIP-37 [1] which was on the rework of the type system. In the FLIP it's noted: > Decimal considerations: We use Hive’s maximums: https://github.com/apache/hive/blob/master/storage-api/src/java/org/apache/hadoop/hive/common/type/HiveDecimal.java So that's the

Re: how to initialize few things at task managers

2022-04-20 Thread huweihua
yarn.ship-files only works in yarn environment. Maybe you could use a custom Docker entry point[1] like Austin said. [1]https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/resource-providers/standalone/docker/#further-customization > 2022年4月20日 上午1:19,Great Info 写道: > > I a

Re: Alertmanager Sink Connector

2022-04-20 Thread Fabian Paul
Hi Dhruv, Yes, the AsyncSink might be a good fit to implement your use case. In general, the AsyncSink is meant to be used for writing to external systems that do not support native transactions and can therefore not guarantee exactly-once semantics. As a starting point, you can look at the existi

Re: Alertmanager Sink Connector

2022-04-20 Thread Konstantin Knauf
cc user@, bcc dev@ Hi Dhruv, Yes, this should be totally possible. For 1, I would use a ProcessFunction to buffer the alerts and register timers per alert for the repeated firing (or to clear it from state if it is resolved). From this ProcessFunction you send records to an AlertManagerSink. Fo

Re: RocksDB's state size discrepancy with what's seen with state processor API

2022-04-20 Thread Roman Khachatryan
State Processor API works on a higher level and is not aware of any RocksDB specifics (in fact, it can be used with any backend). Regards, Roman On Tue, Apr 19, 2022 at 10:52 PM Alexis Sarda-Espinosa wrote: > > I can look into RocksDB metrics, I need to configure Prometheus at some point > any

Re: Discuss making KafkaSubscriber Public

2022-04-20 Thread Mason Chen
Hi all, Just following up on this thread. The subject header may have been misleading--I was proposing to make KafkaSubscriber @PublicEvolving and expose a setter to pass a custom implementation. It seems logical since the KafkaSource is also @PublicEvolving and this lets the user know that the in