Using Stateful Functions within a Flink pipeline

2020-04-22 Thread Annemarie Burger
I was wondering if it is possible to use a Stateful Function within a Flink pipeline. I know they work with different API's, so I was wondering if it is possible to have a DataStream as ingress for a Stateful Function. Some context: I'm working on a streaming graph analytics system, and want to sa

Stateful Functions: java.lang.IllegalStateException: There are no routers defined

2020-04-23 Thread Annemarie Burger
Hi, I'm getting to know Stateful Functions and was trying to run the Harness RunnerTest example. If I clone the repository and open and execute the project from there it works fine, but when I copy the code into my own project, it keeps giving a "java.lang.IllegalStateException: There are no route

Re: Using Stateful Functions within a Flink pipeline

2020-04-30 Thread Annemarie Burger
Hi Igal, Thanks for your responses. Regarding "having a pre-processing job that does whatever transformations necessary with the DataStream and outputs to Kafka / Kinesis, and then having a separate StateFun deployment that consumes from that transformed Kafka / Kinesis topic." I was wondering how

Window processing in Stateful Functions

2020-05-06 Thread Annemarie Burger
Hi, I want to do windowed processing in each function when using Stateful Functions. Is this possible? Some pseudo code would be very helpful! Some more context: I'm having a stream of edges as input. I want to window these and save the graph representation (either as edge list, adjacency list, o

Incremental state with purging

2020-05-12 Thread Annemarie Burger
Hi, I'm trying to implement the most efficient way to incrementally put incoming DataStream elements in my (map)state, while removing old elements (older that x) from that same state. I then want to output the state every y seconds. I've looked into using the ProcessFunction with onTimer, or build

Re: Windowed Stream Queryable State Support

2020-05-18 Thread Annemarie Burger
Hi, I was wondering that since it is possible to "query the state of an in-flight window", if it is also possible to make sure we query *every* window at the proper time. So how to access in flight window state of a window of a PU from another PU with Queryable State. I want to query the window st

Using Queryable State within 1 job + docs suggestion

2020-05-18 Thread Annemarie Burger
Hi, I want to use Queryable State to communicate between PU's in the same Flink job. I'm aware this is not the intended use of Queryable State, but I was wondering if and how it could be done. More specifically, I want to query the (event-time) window state of one PU, from another PU, while both

Re: Incremental state with purging

2020-05-18 Thread Annemarie Burger
Hi, Thanks for your suggestions! However, as I'm reading the docs for queryable state, it says that it can only be used for Processing time, and my windows are defined using event time. So, I guess this means I should use the KeyedProcessFunction. Could you maybe suggest a rough implementation fo

Re: Using Queryable State within 1 job + docs suggestion

2020-05-21 Thread Annemarie Burger
Hi, Thanks for your response! What if I'm using regular state instead of windowState, is there any way to use query this state of a PU from another PU in the same Flink job? Best, Annemarie -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Using Queryable State within 1 job + docs suggestion

2020-05-21 Thread Annemarie Burger
Hi, So what I meant was that I have a keyed stream, and from each thread/keygroup/PU I want to query the state of the other threads/keygroups/PUs. Does anybody have any experience with this? I'm currently working on it, and the main problem seems to be that the Queryable State Client requires

Query Rest API from IDE during runtime

2020-05-22 Thread Annemarie Burger
Hi, I want to query Flink's REST API in my IDE during runtime in order to get the jobID of the job that is currently running. Is there any way to do this? I found the RestClient class, but can't seem to figure out how to exactly make this work. Any help much appreciated. Best, Annemarie -- Se

Re: Query Rest API from IDE during runtime

2020-05-25 Thread Annemarie Burger
Hi, Thanks for your response! I can't seem to get past a "java.net.ConnectException: Connection refused" though. Below is the relevant code and exception, any idea what I'm doing wrong? Configuration config = new Configuration(); config.setString(JobManagerOptions.ADDRESS, "localhost"); confi

Re: Query Rest API from IDE during runtime

2020-05-25 Thread Annemarie Burger
Hi, Thanks for your reply and explanation! Do you know of any way to have a job retrieve its own jobID while it's still running? Best, Annemarie -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Using Queryable State within 1 job + docs suggestion

2020-05-26 Thread Annemarie Burger
Hi, I managed to work around the JobID issues, by first starting the task that queries the state, pauzing it, and then using env.executeAsync.getJobID to get the proper jobID to use when querying the state, and passing that to the (pauzed) query state task, which can then continue. However, the Q

Incremental state

2020-06-09 Thread Annemarie Burger
Hi, What I'm trying to do is the following: I want to incrementally add and delete elements to a state. If the element expires/goes out of the window, it needs to be removed from the state. I basically want the functionality of TTL, without using it, since I'm also using Queryable State and these

Global Hashmap & global static variable.

2020-07-17 Thread Annemarie Burger
Hi, I have two questions: 1. In the first part of my pipeline using Flink DataStreams processing graph edges, I'm filling up Hashmap. In it goes a vertex id and the partition this vertex is assigned to. Later in my pipeline I want to query this Hashmap again, to see in which partition exactly I c

Unable to submit high parallelism job in cluster

2020-07-27 Thread Annemarie Burger
Hi, I am running Flink on a cluster with 24 workers, each with 16 cores. Starting the cluster works fine and the Web interface confirms there are 384 slots working. Executing my code with parallelism 24 works fine, but when I try a higher parallelism, eg. 384, the job never succeeds in submitting.

Re: Unable to submit high parallelism job in cluster

2020-07-30 Thread Annemarie Burger
Hi! The problem was indeed a exponentially slow subtask that related to the parallelism, all working now, thanks! -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/