Re: Basic questions about resuming stateful Flink jobs

2022-02-17 Thread Piotr Nowojski
Hi James, > Do I copy the checkpoint into a savepoint directory and treat it like a savepoint? You don't need to copy the checkpoint. Actually, you cannot do that, as checkpoints are not relocatable. But you can point to the checkpoint directory and resume from it just as you would from a savepoint.
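
For reference, resuming from a retained checkpoint uses the same mechanism as resuming from a savepoint; a minimal CLI sketch, with an illustrative checkpoint path (the chk-<n> directory under the job's checkpoint directory):

```sh
# Point -s at the retained checkpoint directory, exactly as you would for a savepoint.
bin/flink run -s s3://my-bucket/checkpoints/<job-id>/chk-42 my-job.jar
```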

Pyflink with pulsar

2022-02-17 Thread Ananth Gundabattula
Hello All, I am trying to build a PyFlink application and I currently have a Pulsar instance that I need to connect to and start streaming messages from. I was wondering if there is any advice regarding Pulsar as a source connector available via Python? Alternatively, Pulsar seems to have a Kafka...

Re: Apache Flink - User Defined Functions - Exception when passing all arguments

2022-02-17 Thread Francesco Guardiani
Hi, The SQL syntax is not supported, as the SQL standard itself does not allow it. It sounds strange that it fails at the validation phase rather than when parsing, but it shouldn't work anyway. I suggest just using the Table API for that, as it's richer. You can even use withColumns(range(..)), which...
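
A minimal Table API sketch of the withColumns(range(..)) approach mentioned above; the UDF class and the column positions/names are illustrative, not from the original thread:

```java
import static org.apache.flink.table.api.Expressions.call;
import static org.apache.flink.table.api.Expressions.range;
import static org.apache.flink.table.api.Expressions.withColumns;

import org.apache.flink.table.api.Table;

// Pass a whole range of columns to a scalar UDF without listing them one by one.
Table byPosition = input.select(call(MyScalarFunction.class, withColumns(range(1, 4))));

// The range can also be addressed by column name instead of position.
Table byName = input.select(call(MyScalarFunction.class, withColumns(range("a", "d"))));
```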

Re: SQL / Table Api lag() over partition by ... and windowing

2022-02-17 Thread Francesco Guardiani
Why do you need MapPartitionFunction? On Wed, Feb 16, 2022 at 7:02 PM HG wrote: > Thanks > > Would the option for DataStream be to write a MapPartitionFunction? > > On Wed, 16 Feb 2022 at 16:35, Francesco Guardiani <france...@ververica.com> wrote: > >> > Which does not work since it cannot fi...

Re: SQL / Table Api lag() over partition by ... and windowing

2022-02-17 Thread HG
Well, I thought that in order to do the same with only the DataStream API I would need to use a MapPartitionFunction. On Thu, 17 Feb 2022 at 10:41, Francesco Guardiani <france...@ververica.com> wrote: > Why do you need MapPartitionFunction? > > On Wed, Feb 16, 2022 at 7:02 PM HG wrote: > >> Tha...
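
A note for later readers: MapPartitionFunction belongs to the DataSet API; in the DataStream API the usual way to get LAG-like behaviour is keyed state. Below is a hedged sketch using a KeyedProcessFunction, with illustrative key/value types that are not from the original thread:

```java
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

/** Emits (key, currentValue, previousValue) per key, i.e. a LAG(1)-style lookup. */
public class LagByKey
        extends KeyedProcessFunction<String, Tuple2<String, Long>, Tuple3<String, Long, Long>> {

    private transient ValueState<Long> previous;

    @Override
    public void open(Configuration parameters) {
        previous = getRuntimeContext().getState(new ValueStateDescriptor<>("previous", Long.class));
    }

    @Override
    public void processElement(Tuple2<String, Long> in, Context ctx,
                               Collector<Tuple3<String, Long, Long>> out) throws Exception {
        // previous.value() is null for the first element of a key, like LAG() without a default.
        out.collect(Tuple3.of(in.f0, in.f1, previous.value()));
        previous.update(in.f1);
    }
}
```

It would be applied as stream.keyBy(t -> t.f0).process(new LagByKey()).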

Re: Task manager errors with Flink ZooKeeper High Availability

2022-02-17 Thread Yun Gao
Hi Koffman, From the TM side, the only possible usage that comes to mind is that of components like the BlobCache, which is used to transfer jars or large task information between the JM and the TM. But specifically for the BlobService, if it fails to find the file it will fall back to the JM via an HTTP connection. If convenient, could...

Re: Task manager errors with Flink ZooKeeper High Availability

2022-02-17 Thread Koffman, Noa (Nokia - IL/Kfar Sava)
Hi, Thanks for your reply. Please see below the full stack trace and the log message right after; it looks like it is trying to download via the BlobClient after failing to download from the store, as you suggested. My question is: is there a way to avoid this attempt to copy from the blob store? Is m...

Re: Task manager errors with Flink ZooKeeper High Availability

2022-02-17 Thread Chesnay Schepler
Everything is fine. The TM tries to retrieve the jar (a.k.a. the blob), and there is a fast path to access it directly from storage. This fails (because the TM has no access to it), and it then falls back to retrieving it from the JM. On 17/02/2022 13:49, Koffman, Noa (Nokia - IL/Kfar Sava) wrote: H...

Re: Task manager errors with Flink ZooKeeper High Availability

2022-02-17 Thread Koffman, Noa (Nokia - IL/Kfar Sava)
Thanks, I understand that the functionality isn’t affected; this is very good news. But is there a way to either skip this check or skip logging it? We see it in our logs more than 400 times per task manager. It would be very helpful if the log level could be reduced or the check could be skipped.

Re: Task manager errors with Flink ZooKeeper High Availability

2022-02-17 Thread Chesnay Schepler
You could reduce the log level of the PermanentBlobCache to WARN via the Log4j configuration. I think you could even filter this specific message with Log4j. On 17/02/2022 14:35, Koffman, Noa (Nokia - IL/Kfar Sava) wrote: Thanks, I understand that the functionality isn’t affected; this is very...
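
A possible snippet for conf/log4j.properties along these lines (Log4j 2 properties syntax, as shipped with Flink); it assumes the messages are logged by org.apache.flink.runtime.blob.PermanentBlobCache:

```properties
# Raise the threshold for the blob cache so the INFO-level fallback messages are suppressed.
logger.blobcache.name = org.apache.flink.runtime.blob.PermanentBlobCache
logger.blobcache.level = WARN
```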

java.io.IOException: Failed to deserialize consumer record due to/ How to serialize table output for KafkaSink

2022-02-17 Thread HG
Hello, I have to convert the table to a DataStream and tried to do it with toAppendStream (I just saw that it is deprecated), but I have not been able to do the conversion yet (see the attached code). Also, my final sink should be Kafka and the format ObjectNode/JSON, so I need a serializer eventually...
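
A hedged sketch of one way to do this on Flink 1.14: convert with StreamTableEnvironment#toDataStream instead of the deprecated toAppendStream, build a JSON string per Row, and hand it to KafkaSink. The tableEnv/resultTable variables, column names, topic, and broker address are illustrative assumptions:

```java
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ObjectNode;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.connector.kafka.sink.KafkaRecordSerializationSchema;
import org.apache.flink.connector.kafka.sink.KafkaSink;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.types.Row;

// Table -> DataStream<Row> (toAppendStream is deprecated in favor of toDataStream).
DataStream<Row> rows = tableEnv.toDataStream(resultTable);

// Row -> JSON string; the column names here are illustrative.
DataStream<String> json = rows.map(row -> {
    ObjectMapper mapper = new ObjectMapper();
    ObjectNode node = mapper.createObjectNode();
    node.put("transactionId", (String) row.getField("transactionId"));
    node.put("elapsedTime", (Long) row.getField("elapsedTime"));
    return mapper.writeValueAsString(node);
}).returns(Types.STRING);

// JSON string -> Kafka, using the non-deprecated KafkaSink.
KafkaSink<String> sink = KafkaSink.<String>builder()
        .setBootstrapServers("broker:9092")
        .setRecordSerializer(KafkaRecordSerializationSchema.builder()
                .setTopic("output-topic")
                .setValueSerializationSchema(new SimpleStringSchema())
                .build())
        .build();
json.sinkTo(sink);
```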

RE: Basic questions about resuming stateful Flink jobs

2022-02-17 Thread Schwalbe Matthias
Hi James, Coming back to your original question on how to restart jobs from savepoints/checkpoints on LocalStreamEnvironment (the one used in a debugger): out of the box, LocalStreamEnvironment does not allow setting a snapshot path to resume the job from. The trick for me to do it anyway was to...

Apache Flink - Continuously monitoring directory using filesystem connector - 1.14.3

2022-02-17 Thread M Singh
Hi: I have a simple application and am using the filesystem connector to monitor a directory and then print to the console (using DataStream). However, the application stops after reading the file in the directory (at the moment I have a single file in the directory). I am using Apache Flink v...

Re: Synchronization across tasks using checkpoint barriers

2022-02-17 Thread Gopi Krishna M
I just found that we can use prepareSnapshotPreBarrier in AbstractStreamOperator to achieve this. On Tue, Feb 15, 2022 at 9:13 AM...
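
For later readers, a hedged sketch of what such an operator can look like; the buffering logic and types are illustrative, the relevant hook is the prepareSnapshotPreBarrier method inherited from AbstractStreamOperator:

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.flink.streaming.api.operators.AbstractStreamOperator;
import org.apache.flink.streaming.api.operators.OneInputStreamOperator;
import org.apache.flink.streaming.runtime.streamrecord.StreamRecord;

/** Buffers elements and flushes them just before the checkpoint barrier is forwarded. */
public class FlushBeforeBarrierOperator extends AbstractStreamOperator<String>
        implements OneInputStreamOperator<String, String> {

    private final List<String> buffer = new ArrayList<>();

    @Override
    public void processElement(StreamRecord<String> element) {
        buffer.add(element.getValue());
    }

    @Override
    public void prepareSnapshotPreBarrier(long checkpointId) throws Exception {
        // Runs before the barrier is emitted downstream, so everything flushed here is
        // still covered by this checkpoint in downstream tasks. A production operator
        // would also persist any remaining buffer in snapshotState for restore.
        for (String value : buffer) {
            output.collect(new StreamRecord<>(value));
        }
        buffer.clear();
    }
}
```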

Re: java.io.IOException: Failed to deserialize consumer record due to/ How to serialize table output for KafkaSink

2022-02-17 Thread Yun Gao
Hi, I tried with a simplified version of the attached code, and it shows that the detailed exception is: Caused by: java.lang.ClassCastException: java.lang.Long cannot be cast to java.time.Instant at org$apache$flink$api$java$tuple$Tuple4$1$Converter.toInternal(Unknown Source) at org.apache.flink...
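
The reply is truncated here, but given that exception one common resolution is to carry the event time as java.time.Instant (or declare the column as BIGINT) before the Table conversion. A hedged sketch, assuming an input DataStream of Tuple4<String, Long, String, String> with epoch milliseconds in the second field:

```java
import java.time.Instant;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple4;
import org.apache.flink.streaming.api.datastream.DataStream;

// Convert the epoch-millis Long into java.time.Instant so the planner's
// timestamp conversion no longer hits the Long -> Instant cast error.
DataStream<Tuple4<String, Instant, String, String>> withInstant =
        input.map(t -> Tuple4.of(t.f0, Instant.ofEpochMilli(t.f1), t.f2, t.f3))
             .returns(Types.TUPLE(Types.STRING, Types.INSTANT, Types.STRING, Types.STRING));
```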

Re: Flink 1.15 deduplication view and lookup join

2022-02-17 Thread Yun Gao
Hi Francis, I think requiring a primary key for the versioned table[1] used in a temporal join[2] is expected. Could you confirm which table serves as the versioned table in this case? Is it the streaming table from RabbitMQ or the joined data? Best, Yun [1] https://n...
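
Since the thread is about a deduplication view, a hedged SQL sketch (embedded in Table API calls) of a versioned view derived by deduplication plus a temporal join against it; tableEnv, table names, and column names are illustrative, and the source tables are assumed to have event-time attributes with watermarks:

```java
// Versioned view via deduplication: keep the latest row per key.
// The PARTITION BY column becomes the inferred primary key of the versioned view.
tableEnv.executeSql(
        "CREATE TEMPORARY VIEW versioned_rates AS "
      + "SELECT currency, rate, update_time "
      + "FROM ( "
      + "  SELECT *, ROW_NUMBER() OVER (PARTITION BY currency ORDER BY update_time DESC) AS rn "
      + "  FROM rates_stream "
      + ") WHERE rn = 1");

// Event-time temporal join against the versioned view.
tableEnv.executeSql(
        "SELECT o.order_id, o.price * r.rate AS converted "
      + "FROM orders AS o "
      + "LEFT JOIN versioned_rates FOR SYSTEM_TIME AS OF o.order_time AS r "
      + "ON o.currency = r.currency");
```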

Re: Pyflink with pulsar

2022-02-17 Thread Luning Wong
The Pulsar Python source connector will be released in version 1.15. If you want to use it right now, you could compile the master branch. When I completed the Python connector code, I only tested the native Pulsar protocol, without KoP. Usage examples are in the comments of the PulsarSource class and...

Re: Apache Flink - Continuously monitoring directory using filesystem connector - 1.14.3

2022-02-17 Thread Francesco Guardiani
Hi, Filesystem source directory watching will be available from 1.15: https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/filesystem/#directory-watching FG On Fri, Feb 18, 2022 at 1:28 AM M Singh wrote: > Hi: > > I have a simple application and am using file system...
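
For reference, a hedged sketch of the 1.15 option the linked docs describe; tableEnv, the path, the schema, and the interval are illustrative assumptions:

```java
// Filesystem source with continuous directory watching (Flink 1.15+):
// 'source.monitor-interval' turns the bounded scan into a continuously monitoring source.
tableEnv.executeSql(
        "CREATE TABLE monitored_files ("
      + "  line STRING"
      + ") WITH ("
      + "  'connector' = 'filesystem',"
      + "  'path' = 'file:///tmp/watched-dir',"
      + "  'format' = 'raw',"
      + "  'source.monitor-interval' = '10s'"
      + ")");
```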