On Mon, Apr 1, 2019 at 8:59 AM Lukasz Cwik <lc...@google.com> wrote:

> To clarify, docker isn't the only environment type we are using. We have
> process-based and "existing" environment modes that don't fit the current
> protobuf and are being worked around.
>

Ah, understood.


> The idea would be to move to a URN + payload model like our PTransforms
> and coders, with a docker-specific one. Using the URN + payload would allow
> us to have a versioned way to update the environment specifications and
> deprecate/remove things that are ill defined.
>

Makes sense to me. It looks like this migration path is already in place in
`message Environment` in beam_runner_api.proto, with `message
StandardEnvironments` enumerating some URNs and corresponding payload
messages just below. So is the gap just getting the two portable runners to
look at the new fields?
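For concreteness, the dispatch that URN + payload enables looks roughly like
this (a sketch with hypothetical Go types, not the generated pipeline_v1
stubs; only the standard environment URNs are taken from the proto):

```go
// Sketch of the URN + payload model: each environment carries a versioned
// URN plus an opaque payload whose schema is defined by that URN, the same
// shape PTransforms and coders already use. Types here are illustrative.
package main

import "fmt"

type Environment struct {
	Urn     string // e.g. "beam:env:docker:v1"
	Payload []byte // serialized, URN-specific payload message
}

// describe dispatches on the URN the way a runner would pick a payload
// parser; unknown URNs can be rejected (or deprecated) cleanly.
func describe(env Environment) string {
	switch env.Urn {
	case "beam:env:docker:v1":
		return "docker environment"
	case "beam:env:process:v1":
		return "process-based environment"
	case "beam:env:external:v1":
		return "existing/external environment"
	default:
		return "unknown environment: " + env.Urn
	}
}

func main() {
	fmt.Println(describe(Environment{Urn: "beam:env:docker:v1"}))
}
```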

Kenn


> On Fri, Mar 29, 2019 at 6:41 PM Kenneth Knowles <k...@apache.org> wrote:
>
>>
>>
>> On Thu, Mar 28, 2019 at 9:30 AM Lukasz Cwik <lc...@google.com> wrote:
>>
>>> The intention is that these kinds of hints such as CPU and/or memory
>>> should be embedded in the environment specification that is associated with
>>> the transforms that need resource hints.
>>>
>>> The environment spec is woefully ill prepared as it only has a docker
>>> URL right now.
>>>
>>
>> FWIW I think this is actually "extremely well prepared" :-)
>>
>> Protobuf makes it easy to add fields when you need more, but removing
>> them is nearly impossible once deployed, so it is best to do the absolute
>> minimum until you need to expand.
>>
>> Kenn
>>
>>
>>>
>>> On Thu, Mar 28, 2019 at 8:45 AM Robert Burke <rob...@frantil.com> wrote:
>>>
>>>> A question came up on the beam-go Slack that I wasn't able to answer: in
>>>> particular for Dataflow*, is there a way to control how much of a Portable
>>>> FnAPI worker's resources are dedicated to the SDK side vs. the Runner side?
>>>>
>>>> My assumption is that runners should manage this, keeping the Runner
>>>> Harness side as lightweight as possible and within reasonable memory
>>>> bounds, so the user code has more room to grow, since its footprint is
>>>> largely unknown.
>>>>
>>>> I saw there's the Provisioning API
>>>> <https://github.com/apache/beam/blob/master/model/fn-execution/src/main/proto/beam_provision_api.proto#L52>
>>>> which communicates resource limits to the SDK side, but is there a way
>>>> to make the request (probably on job startup) in the other direction?
>>>>
>>>> I imagine it has to do with the container boot code, but I have only
>>>> vague knowledge of how that works at present.
>>>>
>>>> If there's a portable way to do it, that's ideal, but I suspect this will
>>>> require a Dataflow-specific answer.
>>>>
>>>> Thanks!
>>>> Robert B
>>>>
>>>> *Dataflow doesn't support the Go SDK, but the Go SDK supports Dataflow.
>>>>
>>>
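On the resource-split question in the quoted thread, the boot-time policy
might look like this minimal sketch (names and policy invented for
illustration; this is not the actual Provision API):

```go
// Hypothetical sketch: container boot code reserves a fixed budget for the
// runner-harness side and hands the remainder of the provisioned memory to
// the SDK harness, since the user code's footprint is largely unknown.
package main

import "fmt"

// sdkMemoryBytes returns how much of the provisioned total the SDK harness
// may use after a fixed runner-side reservation is taken off the top.
func sdkMemoryBytes(totalBytes, runnerReserveBytes int64) int64 {
	if totalBytes <= runnerReserveBytes {
		return 0 // the runner reservation consumes everything
	}
	return totalBytes - runnerReserveBytes
}

func main() {
	// e.g. 4 GiB provisioned, 512 MiB reserved for the runner harness.
	fmt.Println(sdkMemoryBytes(4<<30, 512<<20)) // prints 3758096384
}
```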
