Re: Developing Beam applications using Flink checkpoints

2020-05-20 Thread Eleanore Jin
Hi Ivan, Beam coders are wrapped in Flink's TypeSerializers, so I don't think it will result in double serialization. Thanks! Eleanore
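To illustrate the point above, here is a minimal, self-contained sketch of the wrapping pattern: a Flink-style serializer that delegates its byte production directly to a Beam-style coder, so the value is encoded exactly once. The interfaces and class names (`SimpleCoder`, `CoderTypeSerializer`, `CoderWrappingDemo`) are simplified stand-ins for illustration, not the real Beam (`org.apache.beam.sdk.coders.Coder`) or Flink (`org.apache.flink.api.common.typeutils.TypeSerializer`) APIs.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Simplified stand-in for a Beam Coder: knows how to turn T into bytes and back.
interface SimpleCoder<T> {
    void encode(T value, OutputStream out) throws IOException;
    T decode(InputStream in) throws IOException;
}

// Simplified stand-in for the wrapping serializer: when the runner asks it to
// serialize, it delegates straight to the coder -- bytes are produced once,
// not encoded by the coder and then re-encoded by the runner.
class CoderTypeSerializer<T> {
    private final SimpleCoder<T> coder;

    CoderTypeSerializer(SimpleCoder<T> coder) {
        this.coder = coder;
    }

    byte[] serialize(T value) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        coder.encode(value, bos); // single pass through the wrapped coder
        return bos.toByteArray();
    }

    T deserialize(byte[] bytes) throws IOException {
        return coder.decode(new ByteArrayInputStream(bytes));
    }
}

public class CoderWrappingDemo {
    // A length-prefixed UTF-8 string coder, standing in for a Beam StringUtf8Coder.
    static final SimpleCoder<String> UTF8 = new SimpleCoder<String>() {
        public void encode(String v, OutputStream out) throws IOException {
            byte[] b = v.getBytes(StandardCharsets.UTF_8);
            DataOutputStream dout = new DataOutputStream(out);
            dout.writeInt(b.length);
            dout.write(b);
        }

        public String decode(InputStream in) throws IOException {
            DataInputStream din = new DataInputStream(in);
            byte[] b = new byte[din.readInt()];
            din.readFully(b);
            return new String(b, StandardCharsets.UTF_8);
        }
    };

    public static String roundTrip(String s) throws IOException {
        CoderTypeSerializer<String> ser = new CoderTypeSerializer<>(UTF8);
        return ser.deserialize(ser.serialize(s));
    }

    public static void main(String[] args) throws IOException {
        System.out.println(roundTrip("hello"));
    }
}
```

A round trip goes coder-in, coder-out with no second serialization layer in between, which is the behavior Eleanore describes.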

Re: Developing Beam applications using Flink checkpoints

2020-05-19 Thread Ivan San Jose
Perfect, thank you so much Arvid. I'd expect more people to be using Beam on top of Flink, but it seems it is not so popular.

Re: Developing Beam applications using Flink checkpoints

2020-05-19 Thread Arvid Heise
Hi Ivan, I fear that only a few mailing list users actually have deep Beam experience; I only used it briefly 3 years ago. Most users here use Flink directly to avoid these kinds of double-abstraction issues. It might be better to switch to the Beam mailing list if you have Beam-specific…

Re: Developing Beam applications using Flink checkpoints

2020-05-19 Thread Ivan San Jose
Actually, I'm also thinking about how Beam coders relate to the runner's serialization... I mean, in Beam you specify a coder for each Java type so that Beam can serialize/deserialize that type, but then, is the runner used under the hood serializing/deserializing the result again, so…

Re: Developing Beam applications using Flink checkpoints

2020-05-19 Thread Ivan San Jose
Yep, sorry if I'm bothering you, but I think I'm still not getting this: how could I tell Beam to tell Flink to use that serializer instead of the standard Java one? I think Beam abstracts us away from Flink's checkpointing mechanism, so I'm afraid that if we use the Flink API directly we might break…

Re: Developing Beam applications using Flink checkpoints

2020-05-19 Thread Arvid Heise
Hi Ivan, The easiest way is to use some implementation that's already there [1]. I already mentioned Avro and would strongly recommend giving it a go. If you make sure to provide a default value for as many fields as possible, you can always remove them later, giving you great flexibility. I can give…
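As a concrete illustration of the default-value advice above, here is a hypothetical Avro record schema (the record and field names are invented for this example, not from the thread). Fields that declare a `"default"` can be dropped from a later schema version, and readers of old state still resolve them, which is what makes the schema evolvable.

```json
{
  "type": "record",
  "name": "OrderEvent",
  "namespace": "com.example.state",
  "fields": [
    {"name": "orderId",  "type": "string"},
    {"name": "amount",   "type": "double", "default": 0.0},
    {"name": "currency", "type": ["null", "string"], "default": null}
  ]
}
```

Here `amount` and `currency` could be removed (or new defaulted fields added) in a future version without breaking the ability to restore state written with this schema, per Avro's schema-resolution rules.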

Re: Developing Beam applications using Flink checkpoints

2020-05-19 Thread Ivan San Jose
Thanks for your complete answer, Arvid. We will try to approach all the things you mentioned, but take into account that we are using Beam on top of Flink, so, to be honest, I don't know how we could implement the custom serialization thing ( https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/s…

Re: Developing Beam applications using Flink checkpoints

2020-05-18 Thread Arvid Heise
Hi Ivan, First, let's address the issue with idle partitions. The solution is to use a watermark assigner that also emits a watermark with some idle timeout [1]. Now the second question, on why Kafka offsets are committed for in-flight, checkpointed data. The basic idea is that you are not losing…
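The idle-timeout idea above can be sketched in a self-contained way. The class below mimics what a custom periodic watermark assigner (Flink 1.10) or `WatermarkStrategy.withIdleness` (newer Flink versions) does: normally the watermark follows the highest event timestamp seen, but if no events arrive for longer than the idle timeout, the watermark is advanced from the wall clock so the silent partition does not hold back downstream operators. The class and method names here are illustrative, not the actual Flink API.

```java
// Sketch of idle-timeout watermarking (illustrative, not the real Flink API).
public class IdleAwareWatermarkGenerator {
    private final long idleTimeoutMs;
    private long maxSeenTimestamp = Long.MIN_VALUE;
    private long lastEventWallClockMs;

    public IdleAwareWatermarkGenerator(long idleTimeoutMs, long nowMs) {
        this.idleTimeoutMs = idleTimeoutMs;
        this.lastEventWallClockMs = nowMs;
    }

    // Called for every record from this partition.
    public void onEvent(long eventTimestampMs, long nowMs) {
        maxSeenTimestamp = Math.max(maxSeenTimestamp, eventTimestampMs);
        lastEventWallClockMs = nowMs;
    }

    // Called periodically by the runtime to fetch the current watermark.
    public long currentWatermark(long nowMs) {
        if (nowMs - lastEventWallClockMs > idleTimeoutMs) {
            // Partition is idle: advance from processing time instead of
            // waiting forever for the next event.
            return nowMs - idleTimeoutMs;
        }
        // Active partition: watermark tracks the highest event time seen.
        return maxSeenTimestamp;
    }
}
```

With a 1-second timeout, a partition that saw its last event at wall-clock 100 ms still reports its event-time watermark at 200 ms, but by 5000 ms it is considered idle and the watermark jumps forward to `now - idleTimeout`.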

Developing Beam applications using Flink checkpoints

2020-05-15 Thread Ivan San Jose
Hi, we are starting to use Beam with Flink as the runner in our applications, and recently we would like to take advantage of what Flink checkpointing provides, but it seems we are not understanding it clearly. Simplifying, our application does the following: - Read messages from a couple of Kafka topics…