Hi,
We are using Beam to read a stream from Kafka with Spark as the
runner, and I found that our Spark graph looks very strange.
For each stream batch, it generates three stages. One of them is our actual
work, which I can understand.
The other two look kind of duplicated, you can take a look a
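For context, here is a minimal sketch of the kind of pipeline we run (the
bootstrap servers and topic below are placeholders, not our actual
configuration):

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.kafka.common.serialization.StringDeserializer;

public class KafkaReadSketch {
  public static void main(String[] args) {
    // Pass --runner=SparkRunner with beam-runners-spark on the classpath.
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());
    p.apply("ReadFromKafka",
        KafkaIO.<String, String>read()
            .withBootstrapServers("kafka:9092")   // placeholder
            .withTopic("input-topic")             // placeholder
            .withKeyDeserializer(StringDeserializer.class)
            .withValueDeserializer(StringDeserializer.class)
            .withoutMetadata());
    // ... our actual work is applied downstream of the read ...
    p.run().waitUntilFinish();
  }
}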
It would be helpful if the names of the transforms were visible on the
graph; otherwise it is really hard to understand what each stage and step does.
On Mon, Jun 22, 2020 at 3:53 AM 吴亭 wrote:
> Hi,
>
> We are using Beam to read a stream from Kafka with Spark as the
> runner, and I found
Who reads Map 1?
Can it be stale?
It is unclear what you are trying to do in parallel and why you wouldn't
stick all this logic into a single DoFn / stateful DoFn.
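For illustration, a rough sketch of the stateful DoFn pattern I mean (the
state, types, and enrichment logic below are made up, not your actual code):

import org.apache.beam.sdk.state.StateSpec;
import org.apache.beam.sdk.state.StateSpecs;
import org.apache.beam.sdk.state.ValueState;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.values.KV;

// Rough sketch: keep the enrichment data as per-key state inside one DoFn
// instead of coordinating several maps in parallel.
class EnrichFn extends DoFn<KV<String, String>, String> {
  @StateId("lastSeen")
  private final StateSpec<ValueState<String>> lastSeenSpec = StateSpecs.value();

  @ProcessElement
  public void processElement(ProcessContext c,
      @StateId("lastSeen") ValueState<String> lastSeen) {
    String previous = lastSeen.read();  // null on the first element for this key
    String enriched = c.element().getValue()
        + (previous == null ? "" : "|" + previous);
    lastSeen.write(c.element().getValue());
    c.output(enriched);
  }
}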
On Sat, Jun 20, 2020 at 7:14 PM Praveen K Viswanathan <
harish.prav...@gmail.com> wrote:
> Hello Everyone,
>
> I am in the process o
I am trying to run the wordcount quickstart example on a Flink cluster on AWS
EMR, with Beam 2.22 and Flink 1.10.
I get the following error:
ERROR:root:java.util.ServiceConfigurationError:
com.fasterxml.jackson.databind.Module: Provider
com.fasterxml.jackson.module.jaxb.JaxbAnnotationModule no
I think you don’t need to enable EOS in this case, since KafkaIO has a dedicated
EOS-sink implementation on the Beam side (btw, it’s not supported by all runners),
and it relies on setting “enable.idempotence=true” for the KafkaProducer.
I’m not sure that you can achieve “at least once” semantics with cur
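For reference, a sketch of the Beam-side EOS sink usage (servers, topic,
shard count, and sink group id are placeholders, not a tested configuration):

import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.beam.sdk.transforms.PTransform;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PDone;
import org.apache.kafka.common.serialization.StringSerializer;

class EosSinkSketch {
  static PTransform<PCollection<KV<String, String>>, PDone> eosSink() {
    return KafkaIO.<String, String>write()
        .withBootstrapServers("kafka:9092")       // placeholder
        .withTopic("output-topic")                // placeholder
        .withKeySerializer(StringSerializer.class)
        .withValueSerializer(StringSerializer.class)
        // As noted above, this sink relies on "enable.idempotence=true" for
        // the KafkaProducer and is not supported by all runners.
        .withEOS(5, "my-sink-group");             // numShards + sink group id
  }
}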
Hi Luke,
You can think of Map 2 as a kind of template with which you want to enrich
the data in Map 1. As I mentioned in my previous post, this is a high-level
scenario.
All this logic is spread across several classes (with ~4K lines of code
in total). As in any Java application,
1. The code has been
Another way to put this question is: how do we write a Beam pipeline for
an existing pipeline (in Java) that has a dozen custom objects, where you
have to work with multiple HashMaps of those custom objects in order to
transform the data? Currently, I am writing a Beam pipeline by using the same
Custom o
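To make this concrete, here is a rough sketch of the pattern with made-up
types (not our actual classes): the "template" map is materialized as a side
input and used to enrich each element in a DoFn:

import java.util.Map;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.View;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionView;

class EnrichmentSketch {
  // Rough sketch: "templates" plays the role of Map 2, and its entries are
  // looked up while transforming the records that feed Map 1.
  static PCollection<String> enrich(PCollection<KV<String, String>> templates,
                                    PCollection<String> records) {
    PCollectionView<Map<String, String>> templateView =
        templates.apply(View.asMap());
    return records.apply(ParDo.of(new DoFn<String, String>() {
      @ProcessElement
      public void processElement(ProcessContext c) {
        Map<String, String> template = c.sideInput(templateView);
        c.output(c.element() + "|" + template.getOrDefault(c.element(), ""));
      }
    }).withSideInputs(templateView));
  }
}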
Hi Alexey,
Thanks for the response; below are some of my follow-up questions:
Is the KafkaIO EOS-sink you referred to KafkaExactlyOnceSink? If so, the way
I understand it is used is KafkaIO.write().withEOS(numPartitionsForSinkTopic,
"mySlinkGroupId"); reading from your response, do I need additionally
c
Hi again, just replying here in case this could be useful for someone,
as using Flink checkpoints on Beam is not reliable at all right
now...
Even though I removed the class references to the serialized object in AvroCoder,
I finally couldn't make AvroCoder work, as it is inferring the schema using
the ReflectData cla
Yes, I agree that serializing coders into the checkpoint creates problems.
I'm wondering whether it is possible to serialize the coder URN + args
instead.
On Mon, Jun 22, 2020 at 11:00 PM Ivan San Jose
wrote:
> Hi again, just replying here in case this could be useful for someone
> as using Flin
I don't really know; my knowledge of the Beam source code is not that
deep, but I managed to modify AvroCoder to store a string
containing the class name (and the class parameters in case it was a
parameterized class) instead of references to Class. But, as I said,
AvroCoder is using Avro's ReflectDat
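The idea was roughly the following (a simplified standalone sketch, not the
actual patch to AvroCoder):

import java.io.Serializable;

// Simplified sketch of the idea: serialize the class name (a stable string)
// instead of a java.lang.Class reference, and resolve it back on read.
class ClassNameHolder implements Serializable {
  private final String className;

  ClassNameHolder(Class<?> clazz) {
    this.className = clazz.getName();
  }

  // Resolved lazily after deserialization, so the checkpoint only ever
  // contains the string, never the Class object itself.
  Class<?> resolve() throws ClassNotFoundException {
    return Class.forName(className);
  }
}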