Re: Beam Summit community feedback

Thomas Weise Mon, 08 Oct 2018 11:19:10 -0700

Related thread:

https://lists.apache.org/thread.html/d6b6fde764796de31996db9bb5f9de3e7aaf0ab29b99d0adb52ac508@%3Cdev.beam.apache.org%3E


Kubernetes is otherwise more of a runner deployment concern. There are
efforts underway in the Flink community to make deployment on Kubernetes
easier.

Max: thanks for taking notes!


On Mon, Oct 8, 2018 at 10:43 AM Henning Rohde <hero...@google.com> wrote:

> Regarding the Kubernetes/Docker story: the current idea for that setup is
> to use a per-job pod for the user/sdk containers + runner container, so
> that running (and scaling) a job will go with the grain of that ecosystem.
> The Beam code on each worker thus wouldn't do any container management.
> This is also how Dataflow essentially works. The process-based option
> assumes that the runner environment is what the SDK needs, which is
> generally not the case.
>
> Henning
>
> On Sun, Oct 7, 2018 at 1:40 PM Alex Van Boxel <a...@vanboxel.be> wrote:
>
>> Hey Max, I've built quite some experience with *Kubernetes* over the
>> years. The problem you describe seems like a custom operator story. The
>> thing is I don't know enough about the runner and bootstrapping story. After
>> the summit I'm quite eager to dive into a Beam problem, so if you'd like to
>> collaborate on that topic, let me know.
>>
>>  _/
>> _/ Alex Van Boxel
>>
>>
>> On Fri, Oct 5, 2018 at 4:05 PM Maximilian Michels <m...@apache.org> wrote:
>>
>>> Hi,
>>>
>>> What do you think about collecting some of the feedback from the
>>> community at Beam Summit last week? Here's what I've come across:
>>>
>>>
>>> * The Kubernetes / Docker Story
>>>
>>> Multiple users reported that they would like a Beam-Kubernetes story.
>>> What is the best way to deploy Beam with Kubernetes? Will there be
>>> built-in support?
>>>
>>> Especially with regard to portability, there are some unsolved
>>> problems, e.g. how to start Beam containerized and bootstrap the SDK
>>> Harness container from within a container? For local testing with the
>>> JobServer we support that via mounting the Docker socket, but this will
>>> be too fragile in production scenarios. Now that we have process-based
>>> execution, we could just use that inside the main container.
>>>
>>> Deployment is a very important topic for users and we should try to
>>> reduce complexity as much as possible.
>>>
>>> * External SDKs / Scio
>>>
>>> Users have asked why Scio is not part of the main repository. Generally,
>>> I don't think that has to be the case, same for the Runners which are
>>> not part of the main repo. However, it does raise the question, what
>>> will be the future model for maintaining SDKs/IOs/Runners? How do we
>>> ensure easy development and a consistent quality of internal/external
>>> components?
>>>
>>> * Documenting Timers & State
>>>
>>> These two have excellent blog posts but are not part of the official
>>> documentation. Since they are part of the model, it would be good to
>>> eventually update the docs.
>>>
>>> * Better Debuggability of pipelines
>>>
>>> Even a simple WordCount in Beam leads to a quite complex Flink execution
>>> graph (due to the involved I/O logic). How can we make pipelines
>>> easier to understand? Will we provide a way to visualize the
>>> architecture of high-level Beam pipelines? If so, do we provide a way to
>>> gain insight into how it is mapped to the Runner execution model? Users
>>> would like to have more insight.
>>>
>>> * Current Roadmap
>>>
>>> This was asked in the context of portability. By the end of the year we
>>> should have at least the FlinkRunner in a ready state, with the rest
>>> following up. There are a lot of other threads in Beam. The newsletter
>>> is a great way to keep up with the project development.
>>>
>>>
>>> Looking forward to any other points you might have.
>>>
>>> Best,
>>> Max
>>>
>>
