AW: Statefun: cancel "sendAfter"

2021-02-02 Thread Stephan Pelikan
Hi, thank you Gordon for clarification. My use-case is processing business events of customers. Those events are triggered by ourself or by the customer depending of what’s the current state of the ongoing customer’s business use-case. We need to monitor delayed/missing business events which be

Re: Flink cli to upload flink jar and run flink jobs in the Jenkins pipeline

2021-02-02 Thread Yang Wang
I think the Flink client could make a connection with ZooKeeper via the network load balancer. Flink client is not aware of whether it is a network balancer or multiple ZooKeeper server address. After then Flink client will retrieve the active leader JobManager address via ZooKeeperHAService and su

Re: Statefun: cancel "sendAfter"

2021-02-02 Thread Tzu-Li (Gordon) Tai
Hi, You are right, currently StateFun does not support deleting a scheduled delayed message. StateFun supports delayed messages by building on top of two Flink constructs: 1) registering processing time timers, and 2) buffering the message payload to be sent in state. The delayed messages are ke

Re: Question on Flink and Rest API

2021-02-02 Thread Ejaskhan S
Yes Gordon, it's obviously gave me a starting point to think about. On Wed, Feb 3, 2021, 12:02 PM Tzu-Li (Gordon) Tai wrote: > Hi, > > There is no out-of-box Flink source/sink connector for this, but it isn't > unheard of that users have implemented something to support what you > outlined. > >

Re: Question on Flink and Rest API

2021-02-02 Thread Ejaskhan S
Hi Raghavendar, Yes , you are right. Your approach is correct ,and it is the most straightforward one.but I was just thinking about the possibilities of my question mentioned. Thanks EK On Wed, Feb 3, 2021, 12:02 PM Raghavendar T S wrote: > Hi Ejaskhan > > As per my understanding, this approac

Re: Question on Flink and Rest API

2021-02-02 Thread Tzu-Li (Gordon) Tai
Hi, There is no out-of-box Flink source/sink connector for this, but it isn't unheard of that users have implemented something to support what you outlined. One way to possibly achieve this is: in terms of a Flink streaming job graph, what you would need to do is co-locate the source (which expos

Re: Question on Flink and Rest API

2021-02-02 Thread Raghavendar T S
Hi Ejaskhan As per my understanding, this approach will require your data source to run a HTTP server within itself (embedded web server) and I am not sure If it is a good design. It looks like you are trying to build a synchronous(client-server model) processing model in Flink. But Flink is meant

Re: Question a possible use can for Iterative Streams.

2021-02-02 Thread Tzu-Li (Gordon) Tai
Hi Marco, In the ideal setup, enrichment data existing in external databases is bootstrapped into the streaming job via Flink's State Processor API, and any follow-up changes to the enrichment data is streamed into the job as a second union input on the enrichment operator. For this solution to sc

Re: Memory usage increases on every job restart resulting in eventual OOMKill

2021-02-02 Thread Lasse Nedergaard
Hi We had something similar and our problem was class loader leaks. We used a summary log component to reduce logging but still turned out that it used a static object that wasn’t released when we got an OOM or restart. Flink was reusing task managers so only workaround was to stop the job wait

Question on Flink and Rest API

2021-02-02 Thread Ejaskhan S
Team, It's just a random thought. Can I make the Flink application exposing a rest endpoint for the data source? So a client could send data to this endpoint. Subsequently, Flink processed this data and responded to the client application through the endpoint, like a client-server model. Thanks

Re: LEAD/LAG functions

2021-02-02 Thread Patrick Angeles
Thanks, Jark. On Mon, Feb 1, 2021 at 11:50 PM Jark Wu wrote: > Yes. RANK/ROW_NUMBER is not allowed with ROW/RANGE over window, > i.e. the "ROWS BETWEEN 1 PRECEDING AND CURRENT ROW" clause. > > Best, > Jark > > On Mon, 1 Feb 2021 at 22:06, Timo Walther wrote: > >> Hi Patrick, >> >> I could imagi

Re: Question

2021-02-02 Thread Abu Bakar Siddiqur Rahman Rocky
Hi, Is there any source code for the checkpoints, snapshot and zookeeper mechanism? Thank you On Mon, Feb 1, 2021 at 4:23 AM Chesnay Schepler wrote: > Could you expand a bit on what you mean? Are you referring to *savepoints* > ? > > On 1/28/2021 3:24 PM, Abu Bakar Siddiqur Rahman Rocky wrote:

Re: Memory usage increases on every job restart resulting in eventual OOMKill

2021-02-02 Thread Xintong Song
> > How is the memory measured? I meant which flink or k8s metric is collected? I'm asking because depending on which metric is used, the *container memory usage* can be defined differently. E.g., whether mmap memory is included. Also, could you share the effective memory configurations for the t

Question a possible use can for Iterative Streams.

2021-02-02 Thread Marco Villalobos
Hi everybody, I am brainstorming how it might be possible to perform database enrichment with the DataStream API, use keyed state for caching, and also utilize Async IO. Since AsyncIO does not support keyed state, then is it possible to use an Iterative Stream that uses keyed state for caching in

Re: Conflicts between the JDBC and postgresql-cdc SQL connectors

2021-02-02 Thread Sebastián Magrí
The root of the previous error seemed to be the flink version the connector was compiled for. I've tried compiling my own postgresql-cdc connector, but still have some issues with dependencies. On Thu, 28 Jan 2021 at 11:24, Sebastián Magrí wrote: > Applied that parameter and that seems to get me

Re: [Flink SQL] CompilerFactory cannot be cast error when executing statement

2021-02-02 Thread Sebastián Magrí
Hi Timo! I've been building my jobs instead of using the binaries to avoid this issue, hence I've not looked at this again. But I'd say it's still an issue since nothing from the set up have changed in the meantime. Thanks! On Tue, 2 Feb 2021 at 08:51, Timo Walther wrote: > Hi Sebastian, > > s

Statefun: cancel "sendAfter"

2021-02-02 Thread Stephan Pelikan
Hi, I think about using "sendAfter" to implement some kind of timer functionality. I'm wondering if there is no possibility to cancel delayed sent message! In my use case it is possible that intermediate events make the delayed message obsolete. In some cases the statefun of that certain ID is

Re: Flink Datadog Timeout

2021-02-02 Thread Chesnay Schepler
The reported exception looks quite similar to the one in this thread , which was supposedly caused by Datadog rate limits but I don't think this was thoroughly investig

Flink Datadog Timeout

2021-02-02 Thread Claude M
Hello, I have a Flink jobmanager and taskmanagers deployed in a Kubernetes cluster. I integrated it with Datadog by having the following specified in the flink-conf.yaml. metrics.reporter.dghttp.class: org.apache.flink.metrics.datadog.DatadogHttpReporter metrics.reporter.dghttp.apikey: However

Re: Very slow recovery from Savepoint

2021-02-02 Thread Robert Metzger
Hey Yordan, have you checked the log files from the processes in that cluster? The JobManager log should give you hints about issues with the coordination / scheduling of the job. Could it be something unexpected, like your job could not start, because there were not enough TaskManagers available?

Max with retract aggregate function does not support type: ''CHAR''.

2021-02-02 Thread Yuval Itzchakov
Hi, I'm trying to use MAX on a field that is statically known from another table (let's ignore why for a moment). While running the SQL query, I receive an error: Max with retract aggregate function does not support type: ''CHAR''. Looking at the code for creating the max function: [image: image

Re: Dynamic statefun topologies

2021-02-02 Thread Igal Shilman
Hi Frédérique! Thank you for your kind words! let me try to answer your questions: >From the email thread, it looks like there’s going to be support for > dynamic function dispatch by name patterns which is pretty cool, but it > sounds like you still need to redeploy if you add a new ingress or e

Flink cli to upload flink jar and run flink jobs in the Jenkins pipeline

2021-02-02 Thread sidhant gupta
Hi I have a flink ECS cluster setup with HA mode using zookeeper where I have 2 jobmanagers out of which one of will be elected as leader using zookeeper leader election. I have one application load balancer in front of the jobmanagers and one network load balancer in front of zookeeper. As per [

AbstractMethodError while writing to parquet

2021-02-02 Thread Jan Oelschlegel
Hi at all, i'm using Flink 1.11 with the datastream api. I would like to write my results in parquet format into HDFS. Therefore i generated an Avro SpecificRecord with the avro-maven-plugin: org.apache.avro avro-maven-plugin 1.8.2

Re: Memory usage increases on every job restart resulting in eventual OOMKill

2021-02-02 Thread Randal Pitt
Hi Xintong Song, Correct, we are using standalone k8s. Task managers are deployed as a statefulset so have consistent pod names. We tried using native k8s (in fact I'd prefer to) but got persistent "io.fabric8.kubernetes.client.KubernetesClientException: too old resource version: 242214695 (242413

Re: Integration with Apache AirFlow

2021-02-02 Thread Flavio Pompermaier
You're probably right Chesnay, I just asked to this mailing list to know if there are any pointers or blog post about this topic from a Flink perspective. Then the conversation has gone in the wrong direction. Best, Flavio On Tue, Feb 2, 2021 at 12:48 PM Chesnay Schepler wrote: > I'm sorry, but

Re: Integration with Apache AirFlow

2021-02-02 Thread Chesnay Schepler
I'm sorry, but aren't these question better suited for the Airflow mailing lists? On 2/2/2021 12:35 PM, Flavio Pompermaier wrote: Thank you all for the hints. However looking at the REST API[1] of AirFlow 2.0 I can't find how to setup my DAG (if this is the right concept). Do I need to first c

Re: Integration with Apache AirFlow

2021-02-02 Thread Flavio Pompermaier
Thank you all for the hints. However looking at the REST API[1] of AirFlow 2.0 I can't find how to setup my DAG (if this is the right concept). Do I need to first create a Connection? A DAG? a TaskInstance? How do I specify the 2 BashOperator? I was thinking to connect to AirFlow via Java so I can

Re: Memory usage increases on every job restart resulting in eventual OOMKill

2021-02-02 Thread Xintong Song
Hi Randal, The image is too blurred to be clearly seen. I have a few questions. - IIUC, you are using the standalone K8s deployment [1], not the native K8s deployment [2]. Could you confirm that? - How is the memory measured? Thank you~ Xintong Song [1] https://ci.apache.org/projects/flink/flin

RE: Event trigger query

2021-02-02 Thread Colletta, Edward
You can use a tumbling processing time window with an offset of 13 hours + your time zone offset. https://ci.apache.org/projects/flink/flink-docs-release-1.12/dev/stream/operators/windows.html#tumbling-windows https://ci.apache.org/projects/flink/flink-docs-release-1.12/api/java/org/apache/flink/

Memory usage increases on every job restart resulting in eventual OOMKill

2021-02-02 Thread Randal Pitt
Hi, We're running Flink 1.11.3 on Kubernetes. We have a job with parallelism of 10 running on 10 task managers each with 1 task slot. The job has 4 time windows with 2 different keys, 2 windows have reducers and 2 are processed by window functions. State is stored in RocksDB. We've noticed when a

Re: Running Beam Pipelines on a Flink Application Mode Cluster

2021-02-02 Thread Yang Wang
Hi Jan, If you could run your Apache Beam application with Flink session mode, then it could also work in application mode. The key difference for application mode is that the job submission happens in the JobManager pod, not at the Flink client side. If you want to use the standalone application

Re: Proctime consistency

2021-02-02 Thread Timo Walther
As far as I know, we support ROW_NUMBER in SQL that could give you sequence number. Regarding window semantics, the processing time only determines when to trigger the evaluation (also mentioned here [1]). A timer is registered for the next evaluation. The window content and next timer is part

Dynamic statefun topologies

2021-02-02 Thread Frédérique Mittelstaedt
Hi! Thanks for all the great work on both Flink and Statefun. I saw this recent email thread (https://lists.apache.org/thread.html/re984157869f5efd136cda9d679889e6ba2f132213ae7afff715783e2%40%3Cuser.flink.apache.org%3E

Event trigger query

2021-02-02 Thread Abhinav Sharma
Newbie question: How can I set triggers to stream which execute according to system time? Eg: I want to sum the elements of streams at 1PM everyday.

Re: Flink CheckPoint/Savepoint Behavior Question

2021-02-02 Thread Arvid Heise
Hi Jason, you got it perfectly right. So everything that is not in an explicit state (or checkpointed in CheckpointedFunction#snapshotState) is lost on recovery. However, Flink applications always go through the complete life-cycle. Note that I'd look into CheckpointedFunction if the side-informa

Re: Integration with Apache AirFlow

2021-02-02 Thread Arvid Heise
Hi Flavio, If you know a bit of Python, it's also trivial to add a new Flink operator where you can use REST API. In general, I'd consider Airflow to be the best choice for your problem, especially if it gets more complicated in the future (do something else if the first job fails). If you have

Re: Integration with Apache AirFlow

2021-02-02 Thread 姜鑫
Hi Flavio, I probably understand what you need. Apache AirFlow is a scheduling framework which you can define your own dependent operators, therefore you can define a BashOperator to submit flink job to you local flink cluster. For example: ``` t1 = BashOperator( task_id=‘flink-wordcount',

Re: [Flink SQL] CompilerFactory cannot be cast error when executing statement

2021-02-02 Thread Timo Walther
Hi Sebastian, sorry for the late reply. Could you solve the problem in the meantime? It definitely looks like a dependency conflict. Regards, Timo On 22.01.21 18:18, Sebastián Magrí wrote: Thanks a lot Matthias! In the meantime I'm trying out something with the scala quickstart. On Fri,

Re: Integration with Apache AirFlow

2021-02-02 Thread Flavio Pompermaier
Hi Xin, let me state first that I never used AirFlow so I can probably miss some background here. I just want to externalize the job scheduling to some consolidated framework and from what I see Apache AirFlow is probably what I need. However I can't find any good blog post or documentation about h