Re: Broadcast state corrupted ?

2022-04-13 Thread Alexey Trenikhun
I think I’ve found the cause : dataInputView.read(data) could read partial data, it returns number of bytes stored in buffer. If I use dataInputView.readFully(data) instead, the problem disappears. Thanks, Alexey From: Alexey Trenikhun Sent: Wednesday, April 13,

Re: Flink state migration from 1.9 to 1.13

2022-04-13 Thread yu'an huang
Hi Qinghui, Did you used a difference keyby() for your KeyedCoProcesserOperator? For example, did you use a fied name (keyBy(“id”)) in 1.9 and while use a lambda (keyBy(e->e.getId()) in 1.13. This will make the key serializer incompatible. You may reference this link for how to use Apache Flink

Re: Broadcast state corrupted ?

2022-04-13 Thread Alexey Trenikhun
* Failing environment is using MinIO. * We use s3p:// filesystem * I don’t see errors in the Job Manager log: {"timestamp":"2022-04-14T00:14:13.358Z","message":"Triggering savepoint for job .","logger_name":"org.apache.flink.runtime.jobmaster.JobMaster"

Supporting different content-types such as String or JSON seems to always fail for me in Stateful Functions.

2022-04-13 Thread Marco Villalobos
I'm trying to write very simple echo app with Stateful Function to prove it as a technology for some of my use cases. I have not been able to accept different content types though. Here is an example of my code for a simple echo function: My Echo stateful function class. package statefun.impl

Re: Weird Flink Kafka source watermark behavior

2022-04-13 Thread Jin Yi
i ended up just going back to FlinkKafkaConsumer instead of the new FlinkSource On Wed, Apr 13, 2022 at 3:01 AM Qingsheng Ren wrote: > Another solution would be setting the parallelism = #partitions, so that > one parallelism would be responsible for reading exactly one partition. > > Qingsheng

Re: Reactive mode and checkpointing

2022-04-13 Thread aryan m
Thanks for the pointers Alexander and David! We are exploring reactive mode on Flink 1.13 and my questions are merely hypothetical. Our streaming job consumes from Kafka, performs enrichment by querying external services and sinks to S3. Under backpressure or at random times, we observed one or t

Flink state migration from 1.9 to 1.13

2022-04-13 Thread XU Qinghui
Hello dear community, We are trying to upgrade our flink from 1.9 to 1.13, but it seems the same job running in 1.13 cannot restore the checkpoint / savepoint created by 1.9. The stacktrace looks like: java.lang.Exception: Exception while creating StreamOperatorStateContext. at org.apache.fli

Re: Discuss making KafkaSubscriber Public

2022-04-13 Thread Mason Chen
Hi Chesnay, Typically, users want to plug in a KafkaSubscriber that depends on an external system [1][2]. We could also provide a higher level interface that doesn’t depend on the Kafka Admin Client, but I think it would be more flexible to be able to re-use the one created by the enumerator if ne

Re: Avro deserialization issue

2022-04-13 Thread Anitha Thankappan
Hi Piotr, *The code i wrtten in 1.13.1* public final class BigQuerySourceFunction extends RichSourceFunction implements ResultTypeQueryable { @Override public void open(Configuration parameters) throws Exception { deserializer.open(RuntimeContextInitializationContextAdapters.deserialization

Re: Avro deserialization issue

2022-04-13 Thread Piotr Nowojski
Hey, Could you be more specific about how it is not working? A compiler error that there is no such class as RuntimeContextInitializationContextAdapters? This class has been introduced in Flink 1.12 in FLINK-18363 [1]. I don't know this code and I also don't know where it's documented, but: a) may

Re: Broadcast state corrupted ?

2022-04-13 Thread Chesnay Schepler
Is the failing environment using Azure, or MinIO? Which Flink filesystem did you use? Where there any errors in the job that took this savepoint? How was the cluster/job shut down? Does this happen reliably in the 1 environment, or only once? (did you try to reproduce it?) AFAIK sequences of A

Re: How to reprocess historical data with event-time windowing?

2022-04-13 Thread David Anderson
Ty, Usually what's done is to run a separate instance of the app to handle the re-ingestion of the historic data while another instance is processing live data. That way the backfill job won't be confused by observing events with recent timestamps -- it will only see the historic data. But you wil

Re: Broadcast state corrupted ?

2022-04-13 Thread Alexey Trenikhun
Any suggestions how to troubleshoot the issue? I still can reproduce the problem in environment A Thanks, Alexey From: Alexey Trenikhun Sent: Tuesday, April 12, 2022 7:10:17 AM To: Chesnay Schepler ; Flink User Mail List Subject: Re: Broadcast state corrupted ?

Re: Discuss making KafkaSubscriber Public

2022-04-13 Thread Chesnay Schepler
Could you expand a bit on possible alternative implementations that require this interface to become public, opposed to providing more built-in ways to subscribe? On 13/04/2022 11:26, Qingsheng Ren wrote: Thanks for the proposal Mason! I think exposing `KafkaSubscriber` as public API is helpf

Re: Weird Flink Kafka source watermark behavior

2022-04-13 Thread Qingsheng Ren
Another solution would be setting the parallelism = #partitions, so that one parallelism would be responsible for reading exactly one partition. Qingsheng > On Apr 13, 2022, at 17:52, Qingsheng Ren wrote: > > Hi Jin, > > Unfortunately I don’t have any quick bypass in mind except increasing t

Re: Weird Flink Kafka source watermark behavior

2022-04-13 Thread Qingsheng Ren
Hi Jin, Unfortunately I don’t have any quick bypass in mind except increasing the tolerance of out of orderness. Best regards, Qingsheng > On Apr 8, 2022, at 18:12, Jin Yi wrote: > > confirmed that moving back to FlinkKafkaConsumer fixes things. > > is there some notification channel/med

Re: Is there any way to get the ExecutionConfigurations in Dynamic factory class

2022-04-13 Thread Qingsheng Ren
Hi Anitha, I’m not quite familiar with BigQuery. Is it possible to postpone the operation to SplitEnumerator, in which you could get parallelism by org.apache.flink.api.connector.source.SplitEnumeratorContext#currentParallelism? If you are using SourceFunction you can get access to current p

Re: Discuss making KafkaSubscriber Public

2022-04-13 Thread Qingsheng Ren
Thanks for the proposal Mason! I think exposing `KafkaSubscriber` as public API is helpful for users to implement more complex subscription logics. +1 (non-binding) Cheers, Qingsheng > On Apr 12, 2022, at 11:46, Mason Chen wrote: > > Hi Flink Devs, > > I was looking to contribute to > ht

Re: Issue with doing filesink to HDFS

2022-04-13 Thread Guowei Ma
Hi,Anubhav Would you like to share the result of `echo $HADOOP_CLASSPATH` and the detailed information after you set up the hadoop classpaht? Best, Guowei On Wed, Apr 13, 2022 at 4:16 PM Anubhav Nanda wrote: > Hi Guomei, > > That i already did but still getting the issue > > Regards, > Anubh

Re: Issue with doing filesink to HDFS

2022-04-13 Thread Anubhav Nanda
Hi Guomei, That i already did but still getting the issue Regards, Anubhav [image: Mailtrack] Sender notified by Mailtrack

Re: Issue with doing filesink to HDFS

2022-04-13 Thread Guowei Ma
Hi I think you need to export HADOOP_CLASSPATH correclty. [1] [1] https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/deployment/resource-providers/yarn/#preparation Best, Guowei On Wed, Apr 13, 2022 at 12:50 PM Anubhav Nanda wrote: > Hi, > > I have setup flink 1.13.5 and we are us