Re: Create a lookup table in a StreamExecutionEnvironment

2021-09-10 Thread JING ZHANG
Hi Robert, First of all, the built-in Kafka connector source is not a `LookupTableSource`. If we use Kafka as a lookup table, we need to implement a user-defined source [1]. Secondly, about how to define a user-defined lookup table source for Kafka, I'm not an expert in Kafka, please correct me if

Re: How to mount PVC volumes using Flink Native Kubernetes ?

2021-09-10 Thread tao xiao
Thanks David for the tips. We have been running Flink on EMR (which is EBS-attached) for more than a year with no observed performance degradation, so we believe the same performance can be achieved in Kubernetes. On Sat, Sep 11, 2021 at 3:13 AM David Morávek wrote: > OT: Beware that even i

Re: [External] : Re: Use FlinkKafkaConsumer to synchronize multiple Kafka topics

2021-09-10 Thread Yan Wang
Hi Arvid, Thanks for your reply. Yes, the warning is thrown by kafka-clients. Here is the warning log after I deleted the topic that the Kafka consumer is listening to. 18:46:27,297 WARN org.apache.kafka.clients.NetworkClient - [Consumer clientId=consumer-2, groupId=osstest] Error while fetching

Re: CEP library support in Python

2021-09-10 Thread Seth Wiesman
Hi Pedro, The DataStream CEP library is not available in Python, but from Python you can use `MATCH_RECOGNIZE` in the Table API, which is implemented on top of the CEP library. https://nightlies.apache.org/flink/flink-docs-release-1.13/docs/dev/table/sql/queries/match_recognize/ Seth On Fri, Sep
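A minimal sketch of what a `MATCH_RECOGNIZE` query issued from Python could look like. The table `Ticker` and its columns (`symbol`, `price`, `rowtime`) are hypothetical and not taken from this thread; the function assumes an existing PyFlink `TableEnvironment` in which such a table, with a time attribute, is already registered.

```python
# Hypothetical table and column names (Ticker, symbol, price, rowtime).
# The query looks for a falling price run followed by a rise, per symbol.
PATTERN_QUERY = """
SELECT *
FROM Ticker
    MATCH_RECOGNIZE (
        PARTITION BY symbol
        ORDER BY rowtime
        MEASURES
            START_ROW.rowtime AS start_tstamp,
            LAST(PRICE_DOWN.rowtime) AS bottom_tstamp,
            LAST(PRICE_UP.rowtime) AS end_tstamp
        ONE ROW PER MATCH
        AFTER MATCH SKIP TO LAST PRICE_UP
        PATTERN (START_ROW PRICE_DOWN+ PRICE_UP)
        DEFINE
            PRICE_DOWN AS
                (LAST(PRICE_DOWN.price, 1) IS NULL AND PRICE_DOWN.price < START_ROW.price)
                OR PRICE_DOWN.price < LAST(PRICE_DOWN.price, 1),
            PRICE_UP AS
                PRICE_UP.price > LAST(PRICE_DOWN.price, 1)
    ) MR
"""


def find_price_dips(t_env):
    """Run the pattern query on an existing PyFlink TableEnvironment.

    Assumes a table named Ticker is already registered in t_env.
    Returns the resulting Table object.
    """
    return t_env.sql_query(PATTERN_QUERY)
```

The pattern logic lives entirely in SQL, so the Python side only needs to submit the query string.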

Re: How to mount PVC volumes using Flink Native Kubernetes ?

2021-09-10 Thread David Morávek
OT: Beware that even if you manage to solve this, EBS is replicated network storage, so RocksDB performance will be affected significantly. Best, D. On Fri 10. 9. 2021 at 16:19, tao xiao wrote: > The use case we have is to store the RocksDB sst files in EBS. The EC2 > instance type (m5)

TaskManagers OOM'ing for Flink App with very large state only when restoring from checkpoint

2021-09-10 Thread Kevin Lam
Hi all, We've seen scenarios where TaskManagers begin to OOM shortly after a job restores from a checkpoint. Our Flink app has very large state (100s of GB) and we use RocksDB as the state backend. Our repro is something like this: run the job for an hour and let it accumulate state, kill a task man

Create a lookup table in a StreamExecutionEnvironment

2021-09-10 Thread Robert Cullen
I have a developer that wants to create a lookup table in Kafka with data that will be used later when sinking with S3. The lookup table will have folder information that will be used as a Bucket Assigner in the StreamingFileSink. I thought using the Table API to generate the lookup table and bui

CEP library support in Python

2021-09-10 Thread Pedro Silva
Hello, Is Flink's CEP library available in Python? I see no references to it in the documentation, so I'm guessing the answer is no, but I wanted some confirmation from the community or developers. Are there plans to support this

Re: DataStreamAPI and Stateful functions

2021-09-10 Thread Barry Higgins
Thanks Igal, I appreciate you coming back to me. I have quickly tried the fat jar solution as you've gone through it and am running into an exception: org.apache.flink.client.program.ProgramInvocationException: The main method caused an error: Failed to execute job 'StatefulFunctions'.

Re: How to mount PVC volumes using Flink Native Kubernetes ?

2021-09-10 Thread tao xiao
The use case we have is to store the RocksDB sst files in EBS. The EC2 instance type (m5) we use doesn't provide local disk storage, so EBS is the only option for storing the local sst files. On Fri, Sep 10, 2021 at 7:10 PM Yang Wang wrote: > I am afraid Flink could not support creating dedica

Re: Usecase for flink

2021-09-10 Thread Timo Walther
If your graphs fit in memory (at least after an initial partitioning), you could use any external library for graph processing within a single node in a Flink ProcessFunction. Flink is a general-purpose data processor that allows arbitrary logic wherever user code is permitted. Regards, Timo On

Re: Usecase for flink

2021-09-10 Thread Dipanjan Mazumder
Good point. What is the better option for graph processing with Flink? Any suggestions? On Friday, September 10, 2021, 04:52:30 PM GMT+5:30, Martijn Visser wrote: Hi, Please keep in mind that Gelly is approaching end-of-life [1] Regards, Martijn [1] https://flink.apache.org/roadmap.htm

Re: DataStreamAPI and Stateful functions

2021-09-10 Thread Igal Shilman
Hello Barry, I assume that by "we don't need another installation of Flink to manage the stateful functions." You mean that you already have a running Flink cluster and you would like to submit an additional Flink Job that executes a Stateful functions application? Then perhaps just try to submit

Re: Streaming Patterns and Best Practices - featuring Apache Flink

2021-09-10 Thread Timo Walther
Thanks for sharing this with us Devin. If you haven't considered it already, maybe this could also be something for next Flink Forward? Regards, Timo On 02.09.21 21:02, Devin Bost wrote: I just released a new video that features Apache Flink in several design patterns: Streaming Patterns an

Re: Job crashing with RowSerializer EOF exception

2021-09-10 Thread Timo Walther
I assume you are still using toAppendStream or toRetractStream? Otherwise I'm wondering where the RowSerializer is actually coming from. The new planner doesn't use a row serializer. Debugging serializer issues is difficult. We need more information about the pipeline. Regards, Timo On 10.0

Re: Questions regarding broadcast join in Flink

2021-09-10 Thread Timo Walther
Hi Gerald, actually, this is a typical issue when performing a streaming join. An ideal solution would be to block the main stream until the broadcast stream is ready. However, this is currently not supported in the API. In any case, a user needs to handle this in a use case specific way to
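One use-case-specific way to handle this is to buffer main-stream records in user code until the broadcast side has delivered its data. The sketch below is a plain-Python model of that idea, not Flink API: the class name, the method names, and the `is_last` flag that signals "broadcast side is ready" are all hypothetical, and in a real job the buffer would need to live in Flink state.

```python
from typing import Any, Dict, List, Tuple


class BufferingJoiner:
    """Toy model of a broadcast join that holds back early main-stream records."""

    def __init__(self) -> None:
        self.lookup: Dict[str, Any] = {}          # data from the broadcast side
        self.ready = False                        # True once the broadcast side is complete
        self.pending: List[Tuple[str, Any]] = []  # main records that arrived too early

    def on_broadcast(self, key: str, value: Any, is_last: bool = False) -> List[tuple]:
        """Store a broadcast record; when the side is complete, flush the buffer."""
        self.lookup[key] = value
        if not is_last:
            return []
        self.ready = True
        flushed = [self._join(k, v) for k, v in self.pending]
        self.pending.clear()
        return flushed

    def on_main(self, key: str, value: Any) -> List[tuple]:
        """Join immediately if the broadcast side is ready, otherwise buffer."""
        if not self.ready:
            self.pending.append((key, value))
            return []
        return [self._join(key, value)]

    def _join(self, key: str, value: Any) -> tuple:
        # Missing lookup entries yield None rather than dropping the record.
        return (key, value, self.lookup.get(key))
```

How "ready" is decided (a marker record, a timeout, a minimum record count) depends on the use case; the flag here only stands in for that decision.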

Re: Usecase for flink

2021-09-10 Thread Martijn Visser
Hi, Please keep in mind that Gelly is approaching end-of-life [1] Regards, Martijn [1] https://flink.apache.org/roadmap.html On Fri, 10 Sept 2021 at 09:09, Dipanjan Mazumder wrote: > Hi Jing, > > Thanks for the input another question i had was can Gelly be used for > processing the graph

Re: Usecase for flink

2021-09-10 Thread Timo Walther
Hi Dipanjan, Gelly is built on top of the DataSet API, which is a batch-only API that is slowly being phased out. It is not possible to connect a DataStream API program with a DataSet API program unless you go through a connector such as CSV in between. Regards, Timo On 10.09.21 09:09, Dipanjan

Re: Issue while creating Hive table from Kafka topic

2021-09-10 Thread Timo Walther
It seems that your Kafka clients dependency is not in your JAR file. A missing ByteArrayDeserializer is a symptom that seems to occur often. At least, I can find a similar question on Stack Overflow: https://stackoverflow.com/questions/58408494/flink-1-3-class-not-found-org-apache-kafka-common-bytearray
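A quick way to check whether a built JAR actually contains the class is to look for its entry in the archive. This is a small sketch using only the Python standard library; the function name is hypothetical, and the JAR path is whatever your build produces.

```python
import zipfile

# Fully qualified name of the class provided by the kafka-clients library.
KAFKA_DESERIALIZER = "org.apache.kafka.common.serialization.ByteArrayDeserializer"


def jar_contains_class(jar_path: str, class_name: str = KAFKA_DESERIALIZER) -> bool:
    """Return True if the JAR (a zip archive) has a .class entry for class_name."""
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()
```

If this returns False for the job JAR, the Kafka client classes were not bundled, which would match the symptom described above.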

Re: How to mount PVC volumes using Flink Native Kubernetes ?

2021-09-10 Thread Yang Wang
I am afraid Flink does not currently support creating a dedicated PVC for each TaskManager pod. But I think it might be a reasonable requirement. Could you please share why you need to mount a persistent volume claim per TaskManager? AFAIK, the TaskManager will be deleted once it fails. You expect the PV

Re: Allocation-preserving scheduling and task-local recovery

2021-09-10 Thread Xiang Zhang
Robert, thank you for your reply! I tried to remove "cluster.evenly-spread-out-slots", and then tested two scenarios: 1) restart the leader job manager; 2) restart a single task manager. These tests are done in a testing environment where I have six task managers and only four tasks to schedul

Re: De/Serialization API to tear-down user code

2021-09-10 Thread Arvid Heise
I created FLINK-24250 [1]. [1] https://issues.apache.org/jira/browse/FLINK-24250 On Fri, Sep 10, 2021 at 10:23 AM Sergio Morales wrote: > Please create the feature request ticket, I have no idea how to do it. > > > > Regards, > > Sergio. > > > > *From: *Arvid Heise > *Date: *Monday, 6 Septembe

Re: Issue while creating Hive table from Kafka topic

2021-09-10 Thread Harshvardhan Shinde
I'm unable to figure out which dependency to add so that the ByteArrayDeserializer class gets included in the jar. I have added all the dependencies as per the documentation, but I am still unable to figure out the cause. On Fri, Sep 10, 2021 at 12:17 AM Robert Metzger wrote: > Does the jar file yo

Re: De/Serialization API to tear-down user code

2021-09-10 Thread Sergio Morales
Please create the feature request ticket, I have no idea how to do it. Regards, Sergio. From: Arvid Heise Date: Monday, 6 September 2021 at 18:21 To: Dawid Wysakowicz Cc: Sergio Morales , user Subject: Re: De/Serialization API to tear-down user code I think it's a valid request to have a tear-

Re: State processor API very slow reading a keyed state with RocksDB

2021-09-10 Thread David Causse
Thank you all for the great insights and suggestions! I understand that the underlying components used by the state processor api are sufficiently different that it may explain this slowness and this behavior is not something caused by the way we use this API. David. On Fri, Sep 10, 2021 at 5:27

Re: Documentation for deep diving into flink (data-streaming) job restart process

2021-09-10 Thread Puneet Duggal
Hi Robert, Thanks for taking the time to go through the logs. Problem: The reason for restarting all the task managers was to apply an increased JVM metaspace size to each existing task manager. Currently each taskmanager has 32 slots. But JVM metaspace size was 256 MB which used to get fil

Re: Usecase for flink

2021-09-10 Thread Dipanjan Mazumder
Hi Jing, Thanks for the input. Another question I had: can Gelly be used for processing the graph that Flink receives through Kafka, so that using Gelly I decompose the graph into its nodes and edges, process them individually through substreams, and then write the final output of