Related thread: https://lists.apache.org/thread.html/d6b6fde764796de31996db9bb5f9de3e7aaf0ab29b99d0adb52ac508@%3Cdev.beam.apache.org%3E
Kubernetes is otherwise more of a runner deployment concern. There are
efforts underway in the Flink community to make deployment on Kubernetes
easier.

Max: thanks for taking notes!

On Mon, Oct 8, 2018 at 10:43 AM Henning Rohde <hero...@google.com> wrote:

> Regarding the Kubernetes/Docker story: the current idea for that setup is
> to use a per-job pod for the user/sdk containers + runner container, so
> that running (and scaling) a job will go with the grain of that ecosystem.
> The Beam code on each worker thus wouldn't do any container management.
> This is also how Dataflow essentially works. The process-based option
> assumes that the runner environment is what the SDK needs, which is
> generally not the case.
>
> Henning
>
> On Sun, Oct 7, 2018 at 1:40 PM Alex Van Boxel <a...@vanboxel.be> wrote:
>
>> Hey Max, I've built up quite some experience with *Kubernetes* over the
>> years. The problem you describe seems like a custom operator story. The
>> thing is I don't know enough of the runner and bootstrapping story. After
>> the summit I'm quite eager to dive into a Beam problem, so if you'd like to
>> collaborate on that topic let me know.
>>
>> _/
>> _/ Alex Van Boxel
>>
>>
>> On Fri, Oct 5, 2018 at 4:05 PM Maximilian Michels <m...@apache.org> wrote:
>>
>>> Hi,
>>>
>>> What do you think about collecting some of the feedback from the
>>> community at Beam Summit last week? Here's what I've come across:
>>>
>>>
>>> * The Kubernetes / Docker Story
>>>
>>> Multiple users reported that they would like a Beam-Kubernetes story.
>>> What is the best way to deploy Beam with Kubernetes? Will there be
>>> built-in support?
>>>
>>> Especially with regard to portability, there are some unsolved
>>> problems, e.g. how to start Beam containerized and bootstrap the SDK
>>> Harness container from within a container? For local testing with the
>>> JobServer we support that via mounting the Docker socket, but this will
>>> be too fragile in production scenarios. Now that we have process-based
>>> execution, we could just use that inside the main container.
>>>
>>> Deployment is a very important topic for users and we should try to
>>> reduce complexity as much as possible.
>>>
>>> * External SDKs / Scio
>>>
>>> Users have asked why Scio is not part of the main repository. Generally,
>>> I don't think that has to be the case, same for the Runners which are
>>> not part of the main repo. However, it does raise the question, what
>>> will be the future model for maintaining SDKs/IOs/Runners? How do we
>>> ensure easy development and a consistent quality of internal/external
>>> components?
>>>
>>> * Documenting Timers & State
>>>
>>> These two have excellent blog posts but are not part of the official
>>> documentation. Since they are part of the model, it would be good to
>>> eventually update the docs.
>>>
>>> * Better Debuggability of pipelines
>>>
>>> Even a simple WordCount in Beam leads to a quite complex Flink execution
>>> graph (due to the involved I/O logic). How can we make pipelines
>>> easier to understand? Will we provide a way to visualize the
>>> architecture of high-level Beam pipelines? If so, do we provide a way to
>>> gain insight into how it is mapped to the Runner execution model? Users
>>> would like to have more insight.
>>>
>>> * Current Roadmap
>>>
>>> This was asked in the context of portability. By the end of the year we
>>> should have at least the FlinkRunner in a ready state, with the rest
>>> following up. There are a lot of other threads in Beam. The newsletter
>>> is a great way to keep up with the project development.
>>>
>>>
>>> Looking forward to any other points you might have.
>>>
>>> Best,
>>> Max
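
For reference, on the Docker vs. process-based execution point in the quoted mail: a rough sketch of how the two SDK harness environments might be selected through pipeline flags. It assumes the portable runner's `--environment_type` and `--environment_config` options; the helper name `harness_flags`, the job endpoint, the boot command path, and the image name are hypothetical placeholders, not values taken from this thread.

```python
import json


def harness_flags(environment_type, job_endpoint="localhost:8099"):
    """Build pipeline flags for a chosen SDK harness environment.

    All concrete values here (endpoint, boot path, image) are
    illustrative assumptions; adjust them to the actual setup.
    """
    flags = ["--runner=PortableRunner", "--job_endpoint=" + job_endpoint]
    if environment_type == "PROCESS":
        # Process-based execution: the runner starts the SDK harness as a
        # local process, so the runner's container must already contain
        # everything the SDK needs.
        config = json.dumps({"command": "/opt/beam/boot"})
    elif environment_type == "DOCKER":
        # Docker-based execution: the runner starts a separate SDK
        # container, which requires access to a Docker daemon.
        config = "example.org/beam/python-sdk:latest"
    else:
        raise ValueError("unsupported environment type: " + environment_type)
    flags.append("--environment_type=" + environment_type)
    flags.append("--environment_config=" + config)
    return flags


if __name__ == "__main__":
    print(" ".join(harness_flags("PROCESS")))
```

The resulting list could be handed to the SDK's pipeline options when constructing a pipeline. As Henning notes, the PROCESS variant only works when the runner environment already matches what the SDK expects.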