Row to tuple conversion in PyFlink when switching to 'thread' execution mode

2024-03-29 Thread Wouter Zorgdrager
Dear readers, I'm running into some unexpected behaviour in PyFlink when switching execution mode from process to thread. In thread mode, my `Row` gets converted to a tuple whenever I use a UDF in a map operation. Through this conversion to tuples we lose critical information, such as the column names. Bel
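A minimal pure-Python sketch of a defensive workaround (not an official PyFlink API): normalize whatever the runtime hands the UDF, a `Row` in process mode or a plain tuple in thread mode, back into a name-to-value mapping. The field names and the `as_dict()` fallback order are assumptions for illustration.

```python
# Hypothetical column order of the Row; in a real job this should match the
# declared output type of the upstream operator.
FIELDS = ("id", "name", "score")

def as_named(record):
    """Return a dict view of a Row-like object or a positional tuple."""
    as_dict = getattr(record, "as_dict", None)  # PyFlink's Row exposes as_dict()
    if callable(as_dict):
        return as_dict()
    # Thread mode handed us a bare tuple: fall back to the declared field order.
    return dict(zip(FIELDS, record))
```

The UDF body can then work with `as_named(record)["name"]` regardless of execution mode, at the cost of keeping `FIELDS` in sync with the schema by hand.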

Can't start FlinkKafkaProducer using SSL

2021-08-23 Thread Wouter Zorgdrager
Hi all, I'm trying to deploy a FlinkKafkaProducer in PyFlink on a remote cluster. Unfortunately, I'm getting the following exception: Caused by: org.apache.flink.kafka.shaded.org.apache.kafka.common.KafkaException: Failed to construct kafka producer at org.apache.flink.kafka.shaded.org.apach
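"Failed to construct kafka producer" in an SSL setup usually points at the client properties. A hedged sketch of the standard Kafka client keys one would pass to the producer; all paths and passwords below are placeholders:

```python
def ssl_producer_props(bootstrap, truststore, truststore_pw,
                       keystore=None, keystore_pw=None, key_pw=None):
    """Build the properties dict for an SSL-secured Kafka producer."""
    props = {
        "bootstrap.servers": bootstrap,
        "security.protocol": "SSL",
        "ssl.truststore.location": truststore,   # must exist on the TaskManagers
        "ssl.truststore.password": truststore_pw,
    }
    if keystore:  # only needed when the broker requires client auth (mTLS)
        props.update({
            "ssl.keystore.location": keystore,
            "ssl.keystore.password": keystore_pw,
            "ssl.key.password": key_pw,
        })
    return props
```

One common pitfall on a remote cluster: the truststore/keystore files must be readable at those paths on every TaskManager, not just on the machine submitting the job.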

Fwd: PyFlink performance and deployment issues

2021-08-14 Thread Wouter Zorgdrager
flow-evaluation/pyflink_runtime.py \ --jarfile ~/Documents/stateflow-evaluation/benchmark/bin/combined.jar I hope someone can help me with this because it is a blocker for me. Thanks in advance, Wouter -- Forwarded message ----- From: Wouter Zorgdrager Date: Thu, 8 Jul 2021 at 12:20 Subje

Re: PyFlink performance and deployment issues

2021-07-08 Thread Wouter Zorgdrager
memory runner when running it >>> locally in Beam. In that case, the code path is totally different >>> compared to running in a remote cluster. >>> 2) Regarding `flink run`, I'm surprised that it's running locally. >>> Could you submit a Java job with similar commands to see how it runs? >>> 3) Regarding `flink run-application`, could you share the exception >>> stack? >>> >>> Regards, >>> Dian >>> >>> On 6 Jul 2021 at 4:58 PM, Wouter Zorgdrager wrote: >>> >>> uses >>> >>> >>>

Re: PyFlink performance and deployment issues

2021-07-08 Thread Wouter Zorgdrager
> 3) Regarding `flink run-application`, could you share the exception > stack? > > Regards, > Dian > > On 6 Jul 2021 at 4:58 PM, Wouter Zorgdrager wrote: > > uses > > >

PyFlink performance and deployment issues

2021-07-06 Thread Wouter Zorgdrager
Dear community, I have been struggling a lot with the deployment of my PyFlink job. Moreover, the performance seems to be very disappointing, especially the latency at low throughput. I have been playing around with configuration values, but it has not been improving. In short, I have a Datastream job

Re: ByteSerializationSchema in PyFlink

2021-06-08 Thread Wouter Zorgdrager
TypeInformation.of(byte[].class); } } This code is packaged in a jar and uploaded through env.add_jars. That works like a charm! Thanks for the help! Wouter On Fri, 4 Jun 2021 at 14:40, Wouter Zorgdrager wrote: > Hi Dian, all, > > Thanks for your suggestion. Unfortunately, it does no

Re: ByteSerializationSchema in PyFlink

2021-06-04 Thread Wouter Zorgdrager
j_type_serializer= > > j_type_info.createSerializer(gate_way.jvm.org.apache.flink.api.common.ExecutionConfig()) > > j_byte_string_schema = > gate_way.jvm.org.apache.flink.api.common.serialization.TypeInformationSerializationSchema(j_type_info, > j_type_serializer) > > ``` >

ByteSerializationSchema in PyFlink

2021-06-03 Thread Wouter Zorgdrager
Hi all, I have a PyFlink job connected to a KafkaConsumer and Producer. I want to directly work with the bytes from and to Kafka because I want to serialize/deserialize in my Python code rather than the JVM environment. Therefore, I can't use the SimpleStringSchema for (de)serialization (the messa
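A minimal sketch of what the Python-side (de)serialization could look like once the connector is configured to hand over raw bytes; JSON is used here purely as an example wire format, not something the thread prescribes:

```python
import json

def serialize(event: dict) -> bytes:
    """Encode an event to the raw bytes that go to Kafka."""
    return json.dumps(event, separators=(",", ":")).encode("utf-8")

def deserialize(payload: bytes) -> dict:
    """Decode the raw bytes received from Kafka back into an event."""
    return json.loads(payload.decode("utf-8"))
```

Keeping the (de)serialization in plain Python like this is exactly what makes the byte-array schema on the JVM side attractive: the JVM only ever sees opaque `byte[]` values.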

Re: PyFlink DataStream union type mismatch

2021-05-23 Thread Wouter Zorgdrager
,Types.STRING()) > > ``` > > The reason is that `union` turns a `KeyedStream` into a `DataStream`, and > you cannot perform stateful operations on a `DataStream` any more. > > Regards, > Dian > > On 21 May 2021 at 12:38 AM, Wouter Zorgdrager wrote: > > Dear all, >

PyFlink DataStream union type mismatch

2021-05-20 Thread Wouter Zorgdrager
Dear all, I'm having trouble unifying two data streams using the union operator in PyFlink. My code basically looks like this: init_stream = (operator_stream .filter(lambda r: r[0] is None) .map(init_operator,Types.TUPLE([Types.STRING(),Types.PICKLED_BYTE_ARRAY()])) .key_by(lambda x: x[0

Side outputs PyFlink

2021-05-20 Thread Wouter Zorgdrager
Dear Flink community, First of all, I'm very excited about the new 1.13 release. Among other features, I'm particularly excited about the support of stateful operations in Python. I think it will make the wonders of stream processing and the power of Flink accessible to more developers. I'm curre

Re: Statefun 2.0 questions

2020-05-13 Thread Wouter Zorgdrager
u.be/NF0hXZfUyqE > [2] https://www.youtube.com/watch?v=tuSylBadNSo > > It seems like you are on the correct path! > Good luck! > Igal. > > > On Tue, May 12, 2020 at 11:18 PM Wouter Zorgdrager > wrote: > >> Hi Igal, all, >> >> In the meantime we f

Re: Statefun 2.0 questions

2020-05-12 Thread Wouter Zorgdrager
7 May 2020 at 17:17, Wouter Zorgdrager wrote: > Hi Igal, > > Thanks for your quick reply. Getting back to point 2, I was wondering if > you could indeed trigger a stateful function directly from Flask and also > get the reply there instead of using Kafka in between. We want to >

Re: Statefun 2.0 questions

2020-05-07 Thread Wouter Zorgdrager
higher > request rate/latency increases by spinning new instances (something that is > not yet supported with the embedded API) > > Good luck, > Igal. > > > > > > [1] https://github.com/docker-library/official-images/pull/7749 > > > On Wednesday, May 6

Statefun 2.0 questions

2020-05-06 Thread Wouter Zorgdrager
Hi all, I've been using Flink for quite some time now and for a university project I'm planning to experiment with statefun. During the walkthrough I've run into some issues, I hope you can help me with. 1) Is it correct that the Docker image of statefun is not yet published? I couldn't find it a

Re: Unexpected behavior from interval join in Flink

2019-06-24 Thread Wouter Zorgdrager
uld show if there are any issues with late data. > > Best, Fabian > > On Fri, 21 Jun 2019 at 15:32, Wouter Zorgdrager < > w.d.zorgdra...@tudelft.nl> wrote: > >> Does anyone have leads on this issue? I have been looking into the >> IntervalJoinOperator code, but

Re: Unexpected behavior from interval join in Flink

2019-06-21 Thread Wouter Zorgdrager
ter On Mon, 17 Jun 2019 at 13:20, Wouter Zorgdrager < w.d.zorgdra...@tudelft.nl> wrote: > Hi all, > > I'm experiencing some unexpected behavior using an interval join in Flink. > I'm dealing with two data sets, let's call them X and Y. They are finite > (10k eleme

Unexpected behavior from interval join in Flink

2019-06-17 Thread Wouter Zorgdrager
Hi all, I'm experiencing some unexpected behavior using an interval join in Flink. I'm dealing with two data sets, let's call them X and Y. They are finite (10k elements) but I interpret them as a DataStream. The data needs to be joined for enrichment purposes. I use event time and I know (because
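As a pure-Python illustration of the intended interval-join semantics (not Flink API): for each element x of X, every element y of Y with the same key joins iff y's timestamp falls in [x.ts + lower, x.ts + upper], both bounds inclusive, which matches Flink's default behaviour.

```python
def interval_join(xs, ys, lower, upper):
    """Naive interval join; xs, ys are iterables of (key, ts, value) triples."""
    out = []
    for kx, tx, vx in xs:
        for ky, ty, vy in ys:
            # same key, and y's timestamp inside x's relative interval
            if kx == ky and tx + lower <= ty <= tx + upper:
                out.append((kx, vx, vy))
    return out
```

A toy model like this is handy for sanity-checking which pairs *should* appear before blaming watermarks or late data in the real pipeline.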

Re: Re: Flink not giving full reason as to why job submission failed

2019-05-20 Thread Wouter Zorgdrager
rrors on the dashboard? Because many > application teams deploy jobs through the dashboard and don’t have ready > access to the logs. > > > > Thanks, > > Harshith > > > > *From: *Wouter Zorgdrager > *Date: *Thursday, 16 May 2019 at 7:56 PM > *To: *Harshith Kuma

Re: Flink not giving full reason as to why job submission failed

2019-05-16 Thread Wouter Zorgdrager
Hi Harshith, This was indeed an issue in 1.7.2, but it was fixed in 1.8.0. See the corresponding Jira issue [1]. Cheers, Wouter [1]: https://issues.apache.org/jira/browse/FLINK-11902 On Thu, 16 May 2019 at 16:05, Kumar Bolar, Harshith wrote: > Hi all, > > > > After upgrading Flink to 1.7.2, when I tr

Re: Flink and Prometheus setup in K8s

2019-05-15 Thread Wouter Zorgdrager
t this might be useful for other users! Cheers, Wouter [1]: https://github.com/wzorgdrager/flink-k8s/blob/master/docker/Dockerfile [2]: https://hub.docker.com/r/wzorgdrager/flink-prometheus [3]: https://github.com/wzorgdrager/flink-k8s On Mon, 13 May 2019 at 14:16, Wouter Zorgdrager < w.d.zor

Flink and Prometheus setup in K8s

2019-05-13 Thread Wouter Zorgdrager
Hey all, I'm working on a deployment setup with Flink and Prometheus on Kubernetes. I'm running into the following issues: 1) Is it possible to use the default Flink Docker image [1] and enable the Prometheus reporter? Modifying the flink-conf.yaml is easy, but somehow the Prometheus reporter j
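For reference, enabling the reporter in that era boiled down to a flink-conf.yaml fragment like the following. This is a sketch: the property names match the stock PrometheusReporter, and the reporter jar also has to be on Flink's classpath (in the official image, copied from opt/ to lib/), which is the usual stumbling block with the default Docker image.

```yaml
# flink-conf.yaml fragment (sketch); requires flink-metrics-prometheus on the
# classpath, e.g. cp /opt/flink/opt/flink-metrics-prometheus-*.jar /opt/flink/lib/
metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
metrics.reporter.prom.port: 9249-9250
```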

Re: Preserve accumulators after failure in DataStream API

2019-05-02 Thread Wouter Zorgdrager
d use managed state (keyed or operator) for such use cases. >>> >>> Best, >>> Fabian >>> >>> [1] >>> https://github.com/apache/flink/blob/master/flink-table/flink-table-common/src/main/java/org/apache/flink/table/functions/AggregateFunction.java >

Re: Preserve accumulators after failure in DataStream API

2019-05-02 Thread Wouter Zorgdrager
apply the AggregateFunction (window, windowAll, ...)? > > Thanks, > Fabian > > [1] > https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/api/common/functions/AggregateFunction.java > > On Tue, 30 Apr 2019 at 13:19, Wouter Zorgdrager < &g

Preserve accumulators after failure in DataStream API

2019-04-30 Thread Wouter Zorgdrager
Hi all, In the documentation I read about UDF accumulators [1] "Accumulators are automatically backup-ed by Flink’s checkpointing mechanism and restored in case of a failure to ensure exactly-once semantics." So I assumed this also was the case of accumulators used in the DataStream API, but I not
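The AggregateFunction interface linked in the replies keeps the accumulator inside the framework, which is what allows checkpointing to restore it after a failure. A pure-Python mirror of that contract (illustration only, not Flink/PyFlink API), computing a running average:

```python
class AvgAggregate:
    """Mirrors Flink's AggregateFunction contract: the accumulator is handed
    in and out of every call, so the framework, not the UDF, owns the state."""

    def create_accumulator(self):
        return (0.0, 0)                       # (sum, count)

    def add(self, value, acc):
        return (acc[0] + value, acc[1] + 1)   # fold one element in

    def merge(self, a, b):
        return (a[0] + b[0], a[1] + b[1])     # combine partial accumulators

    def get_result(self, acc):
        return acc[0] / acc[1] if acc[1] else None
```

Because the UDF never stores the accumulator in its own fields, the same object can be resumed from a snapshot; hand-rolled counters inside a plain DataStream function don't get that guarantee.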

Re: Read mongo datasource in Flink

2019-04-29 Thread Wouter Zorgdrager
19 at 14:37, Flavio Pompermaier wrote: > But what about parallelism with this implementation? From what I see > there's only a single thread querying Mongo and fetching all the data... am I > wrong? > > On Mon, Apr 29, 2019 at 2:05 PM Wouter Zorgdrager < > w.d.zorgdra..

Re: Read mongo datasource in Flink

2019-04-29 Thread Wouter Zorgdrager
For a framework I'm working on, we actually implemented a (basic) Mongo source [1]. It's written in Scala and uses Json4s [2] to parse the data into a case class. It uses a Mongo observer to iterate over a collection and emit it into a Flink context. Cheers, Wouter [1]: https://github.com/codefee

Re: Case class field limit

2019-03-22 Thread Wouter Zorgdrager
Done! https://issues.apache.org/jira/browse/FLINK-11996 On Fri, 22 Mar 2019 at 11:52, Chesnay Schepler wrote: > It is likely that the documentation is outdated. Could you open a JIRA for > updating the documentation? > > On 22/03/2019 10:12, Wouter Zorgdrager wrote: > > Hey all, &

Case class field limit

2019-03-22 Thread Wouter Zorgdrager
Hey all, Since Scala 2.11 the number of fields in a case class is no longer restricted to 22 [1]. I was wondering if Flink still uses this limit internally; if I check the documentation [2] I also see a max of 22 fields. However, I just ran a simple test setup with a case class > 22 fields and th

Re: Challenges using Flink REST API

2019-03-13 Thread Wouter Zorgdrager
Wed, 13 Mar 2019 at 12:18, Chesnay Schepler wrote: > Can you give me the stacktrace that is logged in the JobManager logs? > > > On 13.03.2019 10:57, Wouter Zorgdrager wrote: > > Hi Chesnay, > > Unfortunately this is not true when I run the Flink 1.7.2 docker i

Re: Challenges using Flink REST API

2019-03-13 Thread Wouter Zorgdrager
Hi Chesnay, Unfortunately this is not true when I run the Flink 1.7.2 Docker images. The response is still: { "errors": [ "org.apache.flink.client.program.ProgramInvocationException: The main method caused an error." ] } Regards, Wouter Zorgdrager On Wed 1
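The error payload shown above is plain JSON, so a submitting client can at least surface the message it does contain. A hedged sketch using only the standard library; the `errors` field name is taken from the response shown, everything else is assumption:

```python
import json

def first_error(body: str) -> str:
    """Extract the first entry of the "errors" array from a Flink REST
    error response, or return an empty string if there is none."""
    payload = json.loads(body)
    errors = payload.get("errors", [])
    return errors[0] if errors else ""
```

The thread's underlying complaint still stands: without the JobManager logs, this one-line `ProgramInvocationException` is all the REST response gives you.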

Challenges using Flink REST API

2019-03-13 Thread Wouter Zorgdrager
uld require users to have the Flink jar locally (and also Scala somewhere, but I assume most do). - Let users provide a list of stage names for all their (interconnected) Flink jobs. This is not really an option, because the (main) idea behind this framework is to reduce the boilerplate and hassle of setting up complex stream processing architectures. Any help is appreciated. Thanks in advance! Kind regards, Wouter Zorgdrager

Re: Using Flink in an university course

2019-03-06 Thread Wouter Zorgdrager
I. I am not > sure what you could do as an automated analysis, but the StreamGraph API is > quite low level and exposes a lot of information about the program. > > Hopefully that is a little bit helpful. Good luck and sounds like a fun > course! > > > On Mon, Mar 4, 2019 at

Re: Using Flink in an university course

2019-03-04 Thread Wouter Zorgdrager
a similar big data >> technology) some years ago >> >> > On 04.03.2019 at 11:32, Wouter Zorgdrager < >> w.d.zorgdra...@tudelft.nl> wrote: >> > >> > Hi all, >> > >> > I'm working on a setup to use Apache Flink in an assignm

Using Flink in an university course

2019-03-04 Thread Wouter Zorgdrager
university course (or have seen this somewhere) as well as assessing Flink code. Thanks a lot! Kind regards, Wouter Zorgdrager

Avro serialization and deserialization to Kafka in Scala

2019-02-07 Thread Wouter Zorgdrager
hese macro-extensions are extremely slow for complex case classes (compile times of 15 minutes for a few nested types). I'm looking for an approach without the use of these libraries and am therefore curious how Flink handles this. Does anyone have some good leads for this? Thanks in a

Akka version conflict running on Flink cluster

2018-06-11 Thread Wouter Zorgdrager
Hi, I think I'm running into an Akka version conflict when running a Flink job on a cluster. The current situation: - Flink cluster on Flink 1.4.2 (using Docker) - Flink job which uses the twitter4s [1] library and Akka version 2.5.8 In my Flink job I try to 'shut down' an Akka actor from the twitter

Re: KafkaProducer with generic (Avro) serialization schema

2018-05-02 Thread Wouter Zorgdrager
serialisable. If indeed this is the problem, maybe a >> better place to ask this question is on Stack Overflow or some scala >> specific mailing list/board (unless someone else from the Flink's community >> can provide an answer to this problem)? >> >> Piotrek >

Re: KafkaProducer with generic (Avro) serialization schema

2018-05-01 Thread Wouter Zorgdrager
serializable, but of course I can't annotate them transient nor make it a lazy val, which gives me the current issue. I hope someone has some leads for me. Thanks! On Thu, 26 Apr 2018 at 23:03, Wouter Zorgdrager wrote: > Hi Bill, > > Thanks for your answer. However this propos

Re: KafkaProducer with generic (Avro) serialization schema

2018-04-26 Thread Wouter Zorgdrager
ng able to serialize the KafkaProducer failing the whole job. Thanks, Wouter On Thu, 26 Apr 2018 at 00:42, Nortman, Bill wrote: > The first things I would try: check that your classes Person and Address > have getters and setters and a no-argument constructor. > > > > *From:* Wouter

KafkaProducer with generic (Avro) serialization schema

2018-04-25 Thread Wouter Zorgdrager
Dear reader, I'm currently working on writing a KafkaProducer which is able to serialize a generic type using avro4s. However, this serialization schema is not itself serializable. Here is my code for this: The serialization schema: class AvroSerializationSchema[IN : SchemaFor : FromRecord: ToReco

RichAsyncFunction in Scala

2018-01-31 Thread Wouter Zorgdrager
Hi, Currently there is no way to use the RichAsyncFunction in Scala, which means I can't get access to the RuntimeContext. I know someone is working on this: https://issues.apache.org/jira/browse/FLINK-6756 ; however, in the meantime is there a workaround for this? I'm particularly interested in g