Dear readers,
I'm running into some unexpected behaviour in PyFlink when switching the
execution mode from process to thread. In thread mode, my `Row` gets
converted to a tuple whenever I use a UDF in a map operation. This
conversion to tuples loses critical information, such as the column names.
Bel
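A minimal PyFlink sketch of the kind of job described above (field names and values are hypothetical); in process mode the lambda receives a `Row`, while in thread mode it may receive a plain tuple, at which point access by column name fails:

```python
from pyflink.common import Configuration, Row
from pyflink.common.typeinfo import Types
from pyflink.datastream import StreamExecutionEnvironment

config = Configuration()
config.set_string("python.execution-mode", "thread")  # "process" keeps Row intact
env = StreamExecutionEnvironment.get_execution_environment(config)

ds = env.from_collection(
    [Row(id=1, name="a"), Row(id=2, name="b")],
    type_info=Types.ROW_NAMED(["id", "name"], [Types.INT(), Types.STRING()]))

# In thread mode the UDF may see a tuple instead of a Row, so r.name breaks.
ds.map(lambda r: r.name, output_type=Types.STRING()).print()
env.execute("row-vs-tuple-sketch")
```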
Hi all,
I'm trying to deploy a FlinkKafkaProducer in PyFlink on a remote cluster.
Unfortunately, I'm getting the following exception:
Caused by: org.apache.flink.kafka.shaded.org.apache.kafka.common.
KafkaException: Failed to construct kafka producer
at org.apache.flink.kafka.shaded.org.apach
flow-evaluation/pyflink_runtime.py \
--jarfile ~/Documents/stateflow-evaluation/benchmark/bin/combined.jar
I hope someone can help me with this because it is a blocker for me.
Thanks in advance,
Wouter
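For context, a minimal sketch of this kind of producer setup in PyFlink (the connector jar path, topic, and broker address are placeholders, not the values from the failing job):

```python
from pyflink.common.serialization import SimpleStringSchema
from pyflink.common.typeinfo import Types
from pyflink.datastream import StreamExecutionEnvironment
from pyflink.datastream.connectors import FlinkKafkaProducer

env = StreamExecutionEnvironment.get_execution_environment()
# The Kafka connector classes must also be available on the remote cluster.
env.add_jars("file:///path/to/flink-sql-connector-kafka.jar")  # placeholder path

producer = FlinkKafkaProducer(
    topic="output-topic",                                    # placeholder topic
    serialization_schema=SimpleStringSchema(),
    producer_config={"bootstrap.servers": "broker:9092"})    # placeholder broker

env.from_collection(["hello", "world"], type_info=Types.STRING()).add_sink(producer)
env.execute("kafka-producer-sketch")
```

"Failed to construct kafka producer" usually surfaces from the producer_config or from missing connector classes on the cluster side, so those are the first things to compare against a locally working run.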
-- Forwarded message -----
From: Wouter Zorgdrager
Date: Thu, 8 Jul 2021 at 12:20
Subje
memory runner when running it
>>> locally in Beam. In that case, the code path is totally different
>>> compared to running in a remote cluster.
>>> 2) Regarding `flink run`, I'm surprised that it's running locally.
>>> Could you submit a Java job with similar commands to see how it runs?
>>> 3) Regarding `flink run-application`, could you share the exception
>>> stack?
>>>
>>> Regards,
>>> Dian
>>>
>>> On 6 Jul 2021, at 4:58 PM, Wouter Zorgdrager wrote:
>>>
>>> uses
>>>
>>>
>>>
Dear community,
I have been struggling a lot with the deployment of my PyFlink job.
Moreover, the performance seems very disappointing, especially the
latency at low throughput. I have been playing around with configuration
values, but it has not improved.
In short, I have a DataStream job
TypeInformation.of(byte[].class);
}
}
This code is packaged in a jar and uploaded through env.add_jars. That
works like a charm!
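(For reference, the registration step on the Python side is just a minimal call like the sketch below; the path is a placeholder, and the URI must be reachable from wherever the job is submitted.)

```python
from pyflink.datastream import StreamExecutionEnvironment

env = StreamExecutionEnvironment.get_execution_environment()
# Ship the packaged jar so its classes (e.g. the byte[] TypeInformation above)
# end up on the job's classpath; placeholder path.
env.add_jars("file:///path/to/combined.jar")
```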
Thanks for the help!
Wouter
On Fri, 4 Jun 2021 at 14:40, Wouter Zorgdrager
wrote:
> Hi Dian, all,
>
> Thanks for your suggestion. Unfortunately, it does no
> j_type_serializer = j_type_info.createSerializer(
>     gate_way.jvm.org.apache.flink.api.common.ExecutionConfig())
> j_byte_string_schema = gate_way.jvm.org.apache.flink.api.common.serialization.TypeInformationSerializationSchema(
>     j_type_info, j_type_serializer)
>
> ```
>
Hi all,
I have a PyFlink job connected to a KafkaConsumer and Producer. I want to
work directly with the bytes from and to Kafka, because I want to
serialize/deserialize in my Python code rather than in the JVM environment.
Therefore, I can't use the SimpleStringSchema for (de)serialization (the
messa
,Types.STRING())
>
> ```
>
> The reason is that `union` turns a `KeyedStream` into a `DataStream`, and
> you can no longer perform stateful operations on a `DataStream`.
>
> Regards,
> Dian
>
> On 21 May 2021, at 12:38 AM, Wouter Zorgdrager wrote:
>
> Dear all,
>
Dear all,
I'm having trouble unifying two data streams using the union operator in
PyFlink. My code basically looks like this:
init_stream = (operator_stream
    .filter(lambda r: r[0] is None)
    .map(init_operator, Types.TUPLE([Types.STRING(), Types.PICKLED_BYTE_ARRAY()]))
    .key_by(lambda x: x[0
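As a hedged sketch of the workaround described in the reply above (the streams here are stand-ins, not the actual job): union the plain `DataStream`s first and key the result again, since `union` always returns an unkeyed `DataStream`:

```python
from pyflink.common.typeinfo import Types
from pyflink.datastream import StreamExecutionEnvironment

env = StreamExecutionEnvironment.get_execution_environment()

# Two stand-in streams with the same element shape.
a = env.from_collection([("k1", 1)], type_info=Types.TUPLE([Types.STRING(), Types.INT()]))
b = env.from_collection([("k2", 2)], type_info=Types.TUPLE([Types.STRING(), Types.INT()]))

# union() yields a plain DataStream, so key it again before any stateful operation.
merged = a.union(b).key_by(lambda x: x[0])
```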
Dear Flink community,
First of all, I'm very excited about the new 1.13 release. Among
other features, I'm particularly excited about the support of stateful
operations in Python. I think it will make the wonders of stream processing
and the power of Flink accessible to more developers.
I'm curre
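As a hedged illustration of what the new Python stateful API enables (a hypothetical keyed count, not taken from any job discussed here):

```python
from pyflink.common.typeinfo import Types
from pyflink.datastream import StreamExecutionEnvironment
from pyflink.datastream.functions import KeyedProcessFunction, RuntimeContext
from pyflink.datastream.state import ValueStateDescriptor

class CountPerKey(KeyedProcessFunction):
    def open(self, runtime_context: RuntimeContext):
        # Keyed ValueState, checkpointed by Flink.
        self.count = runtime_context.get_state(
            ValueStateDescriptor("count", Types.LONG()))

    def process_element(self, value, ctx):
        current = self.count.value() or 0
        self.count.update(current + 1)
        yield value[0], current + 1

env = StreamExecutionEnvironment.get_execution_environment()
env.from_collection(
        [("a", 1), ("a", 2), ("b", 1)],
        type_info=Types.TUPLE([Types.STRING(), Types.INT()])) \
   .key_by(lambda x: x[0]) \
   .process(CountPerKey(), output_type=Types.TUPLE([Types.STRING(), Types.LONG()])) \
   .print()
env.execute("stateful-python-sketch")
```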
u.be/NF0hXZfUyqE
> [2] https://www.youtube.com/watch?v=tuSylBadNSo
>
> It seems like you are on the correct path!
> Good luck!
> Igal.
>
>
> On Tue, May 12, 2020 at 11:18 PM Wouter Zorgdrager
> wrote:
>
>> Hi Igal, all,
>>
>> In the meantime we f
7 mei 2020 om 17:17 schreef Wouter Zorgdrager :
> Hi Igal,
>
> Thanks for your quick reply. Getting back to point 2, I was wondering
> whether you could indeed trigger a stateful function directly from Flask and
> also get the reply there, instead of using Kafka in between. We want to
>
higher
> request rate/latency increases by spinning new instances (something that is
> not yet supported with the embedded API)
>
> Good luck,
> Igal.
>
>
>
>
>
> [1] https://github.com/docker-library/official-images/pull/7749
>
>
> On Wednesday, May 6
Hi all,
I've been using Flink for quite some time now and for a university project
I'm planning to experiment with statefun. During the walkthrough I've run
into some issues that I hope you can help me with.
1) Is it correct that the Docker image of statefun is not yet published? I
couldn't find it a
uld show if there are any issues with late data.
>
> Best, Fabian
>
> On Fri, 21 Jun 2019 at 15:32, Wouter Zorgdrager <
> w.d.zorgdra...@tudelft.nl> wrote:
>
>> Does anyone have some leads on this issue? I have been looking into the
>> IntervalJoinOperator code, but
ter
On Mon, 17 Jun 2019 at 13:20, Wouter Zorgdrager <
w.d.zorgdra...@tudelft.nl> wrote:
> Hi all,
>
> I'm experiencing some unexpected behavior using an interval join in Flink.
> I'm dealing with two data sets, let's call them X and Y. They are finite
> (10k eleme
Hi all,
I'm experiencing some unexpected behavior using an interval join in Flink.
I'm dealing with two data sets, let's call them X and Y. They are finite
(10k elements) but I interpret them as a DataStream. The data needs to be
joined for enrichment purposes. I use event time and I know (because
rrors on the dashboard? Because many
> application teams deploy jobs through the dashboard and don’t have ready
> access to the logs.
>
>
>
> Thanks,
>
> Harshith
>
>
>
> *From: *Wouter Zorgdrager
> *Date: *Thursday, 16 May 2019 at 7:56 PM
> *To: *Harshith Kuma
Hi Harshith,
This was indeed an issue in 1.7.2, but fixed in 1.8.0. See the
corresponding Jira issue [1].
Cheers,
Wouter
[1]: https://issues.apache.org/jira/browse/FLINK-11902
On Thu, 16 May 2019 at 16:05, Kumar Bolar, Harshith wrote:
> Hi all,
>
>
>
> After upgrading Flink to 1.7.2, when I tr
t this might be useful for other users!
Cheers,
Wouter
[1]: https://github.com/wzorgdrager/flink-k8s/blob/master/docker/Dockerfile
[2]: https://hub.docker.com/r/wzorgdrager/flink-prometheus
[3]: https://github.com/wzorgdrager/flink-k8s
On Mon, 13 May 2019 at 14:16, Wouter Zorgdrager <
w.d.zor
Hey all,
I'm working on a deployment setup with Flink and Prometheus on Kubernetes.
I'm running into the following issues:
1) Is it possible to use the default Flink Docker image [1] and enable the
Prometheus reporter? Modifying the flink-conf.yaml is easy, but somehow
the Prometheus reporter j
d use managed state (keyed or operator) for such use cases.
>>>
>>> Best,
>>> Fabian
>>>
>>> [1]
>>> https://github.com/apache/flink/blob/master/flink-table/flink-table-common/src/main/java/org/apache/flink/table/functions/AggregateFunction.java
>
apply the AggregateFunction (window, windowAll, ...)?
>
> Thanks,
> Fabian
>
> [1]
> https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/api/common/functions/AggregateFunction.java
>
> On Tue, 30 Apr 2019 at 13:19, Wouter Zorgdrager <
Hi all,
In the documentation I read about UDF accumulators [1] "Accumulators are
automatically backup-ed by Flink’s checkpointing mechanism and restored in
case of a failure to ensure exactly-once semantics." So I assumed this was
also the case for accumulators used in the DataStream API, but I not
19 at 14:37, Flavio Pompermaier wrote:
> But what about parallelism with this implementation? From what I see
> there's only a single thread querying Mongo and fetching all the data... Am I
> wrong?
>
> On Mon, Apr 29, 2019 at 2:05 PM Wouter Zorgdrager <
> w.d.zorgdra..
For a framework I'm working on, we actually implemented a (basic) Mongo
source [1]. It's written in Scala and uses Json4s [2] to parse the data
into a case class. It uses a Mongo observer to iterate over a collection
and emit it into a Flink context.
Cheers,
Wouter
[1]:
https://github.com/codefee
Done! https://issues.apache.org/jira/browse/FLINK-11996
On Fri, 22 Mar 2019 at 11:52, Chesnay Schepler wrote:
> It is likely that the documentation is outdated. Could you open a JIRA for
> updating the documentation?
>
> On 22/03/2019 10:12, Wouter Zorgdrager wrote:
> > Hey all,
Hey all,
Since Scala 2.11 the number of fields in a case class isn't restricted to
22 anymore [1]. I was wondering whether Flink still uses this limit internally;
if I check the documentation [2], I also see a maximum of 22 fields. However, I
just ran a simple test setup with a case class > 22 fields and th
Wed 13 Mar 2019 at 12:18, Chesnay Schepler wrote:
> Can you give me the stacktrace that is logged in the JobManager logs?
>
>
> On 13.03.2019 10:57, Wouter Zorgdrager wrote:
>
> Hi Chesnay,
>
> Unfortunately this is not true when I run the Flink 1.7.2 docker i
Hi Chesnay,
Unfortunately this is not true when I run the Flink 1.7.2 docker images.
The response is still:
{
  "errors": [
    "org.apache.flink.client.program.ProgramInvocationException: The main method caused an error."
  ]
}
Regards,
Wouter Zorgdrager
Op wo 1
uld require
users to have the Flink jar locally (and also Scala somewhere, but I assume
most have).
- Let users provide a list of stage names for all their (interconnected)
Flink jobs. This is not really an option, because the (main) idea behind
this framework is to reduce the boilerplate and hassle of setting up
complex stream processing architectures.
Any help is appreciated. Thanks in advance!
Kind regards,
Wouter Zorgdrager
I. I am not
> sure what you could do as an automated analysis, but the StreamGraph API is
> quite low level and exposes a lot of information about the program.
>
> Hopefully that is a little bit helpful. Good luck and sounds like a fun
> course!
>
>
> On Mon, Mar 4, 2019 at
a similar big data
>> technology) some years ago
>>
>> > On 4 Mar 2019 at 11:32, Wouter Zorgdrager <
>> w.d.zorgdra...@tudelft.nl> wrote:
>> >
>> > Hi all,
>> >
>> > I'm working on a setup to use Apache Flink in an assignm
university course (or have seen this somewhere) as well as assessing Flink
code.
Thanks a lot!
Kind regards,
Wouter Zorgdrager
hese macro-extensions are extremely slow for complex case classes
(compile-time of 15 minutes for a few nested types). I'm looking for an
approach without the use of these libraries and am therefore curious how Flink
handles this.
Does anyone have some good leads for this?
Thanks in a
Hi,
I think I'm running into an Akka version conflict when running a Flink job
on a cluster.
The current situation:
- Flink cluster on Flink 1.4.2 (using Docker)
- A Flink job which uses the twitter4s [1] library and Akka version 2.5.8
In my Flink job I try to shut down an Akka actor from the twitter
serialisable. If indeed this is the problem, maybe a
>> better place to ask this question is on Stack Overflow or some Scala-specific
>> mailing list/board (unless someone else from the Flink community
>> can provide an answer to this problem)?
>>
>> Piotrek
>
serializable, but of course I can't annotate them transient nor make it a
lazy val which gives me the current issue.
I hope someone has some leads for me.
Thanks!
On Thu, 26 Apr 2018 at 23:03, Wouter Zorgdrager wrote:
> Hi Bill,
>
> Thanks for your answer. However this propos
ng able to serialize the
KafkaProducer failing the whole job.
Thanks,
Wouter
On Thu, 26 Apr 2018 at 00:42, Nortman, Bill wrote:
> The first things I would try: make sure your classes Person and Address
> have getters and setters and a no-argument constructor.
>
>
>
> *From:* Wouter
Dear reader,
I'm currently working on writing a KafkaProducer which is able to serialize
a generic type using avro4s.
However this serialization schema is not serializable itself. Here is my
code for this:
The serialization schema:
class AvroSerializationSchema[IN : SchemaFor : FromRecord: ToReco
Hi,
Currently there is no way of using the RichAsyncFunction in Scala, which
means I can't get access to the RuntimeContext. I know someone is working
on this (https://issues.apache.org/jira/browse/FLINK-6756); in the
meantime, is there a workaround for this? I'm particularly interested in
g