Thanks everyone for the votes!
I’ll summarize the voting result in a separate email.

Best,
Wei

> On Oct 28, 2019, at 11:38, jincheng sun <sunjincheng...@gmail.com> wrote:
> 
> Hi Max,
> 
> Thanks for your feedback. You are right, we really need a more generic
> solution. I volunteer to draft an initial solution design doc and bring up
> the discussion on the Beam dev@ list ASAP (maybe after the release of
> Flink 1.10).
> 
> Thank you for voting.
> 
> Best,
> Jincheng
> 
> On Sat, Oct 26, 2019 at 1:05 AM Maximilian Michels <m...@apache.org> wrote:
> 
>> Hi Wei, hi Jincheng,
>> 
>> +1 on the current approach.
>> 
>> I agree it would be nice to allow for the Beam artifact staging to use
>> Flink's BlobServer. However, the current implementation which uses the
>> distributed file system is more generic, since the BlobServer is only
>> available on the TaskManager and not necessarily inside Harness
>> containers (Stage 3).
>> 
>> So for stage 1 (Client <=> JobServer) we could certainly use the
>> BlobServer. Stage 2 (Flink job submission) already uses it, and Stage 3
>> (container setup) probably has to have some form of distributed file
>> system or directory which has been populated with the dependencies.
>> 
>> Thanks,
>> Max
>> 
>> On 25.10.19 03:45, Wei Zhong wrote:
>>> Hi Max,
>>> 
>>> Are there any other concerns from your side? I would appreciate it if
>>> you could give some feedback and vote on this.
>>> 
>>> Best,
>>> Wei
>>> 
>>>> On Oct 25, 2019, at 09:33, jincheng sun <sunjincheng...@gmail.com> wrote:
>>>> 
>>>> Hi Thomas,
>>>> 
>>>> Thanks for your explanation. I understand your original intention. I
>>>> will seriously consider this issue. After I have an initial solution, I
>>>> will bring up a further discussion on the Beam ML.
>>>> 
>>>> Thanks for voting. :)
>>>> 
>>>> Best,
>>>> Jincheng
>>>> 
>>>> 
>>>> On Fri, Oct 25, 2019 at 7:32 AM Thomas Weise <t...@apache.org> wrote:
>>>> 
>>>>> Hi Jincheng,
>>>>> 
>>>>> Yes, this topic can be further discussed on the Beam ML. The only
>>>>> reason I brought it up here is that it would be desirable, from the
>>>>> Beam Flink runner's perspective, for the artifact staging mechanism
>>>>> that you are working on to be reusable.
>>>>> 
>>>>> Stage 1 in Beam is also up to the runner: artifact staging is a
>>>>> service discovered from the job server, and the fact that the Flink
>>>>> job server currently uses DFS is not set in stone. My interest was
>>>>> more about assumptions regarding the artifact structure, which may or
>>>>> may not allow for a reusable implementation.
>>>>> 
>>>>> +1 for the proposal otherwise
>>>>> 
>>>>> Thomas
>>>>> 
>>>>> 
>>>>> On Mon, Oct 21, 2019 at 8:40 PM jincheng sun <sunjincheng...@gmail.com> wrote:
>>>>> 
>>>>>> Hi Thomas,
>>>>>> 
>>>>>> Thanks for sharing your thoughts. I think improving and solving the
>>>>>> limitations of Beam artifact staging is a good topic (for Beam).
>>>>>> 
>>>>>> As I understand it, the data flow is as follows:
>>>>>> 
>>>>>> For Beam (data):
>>>>>>    Stage 1: BeamClient ------> JobService (data will be uploaded to DFS).
>>>>>>    Stage 2: JobService (FlinkClient) ------> FlinkJob (the operator downloads the data from DFS).
>>>>>>    Stage 3: Operator ------> Harness (artifact staging service).
>>>>>> 
>>>>>> For Flink (data):
>>>>>>    Stage 1: FlinkClient (local data uploaded to the BlobServer via the distributed cache) ------> Operator (data will be downloaded from the BlobServer). No dependency on DFS.
>>>>>>    Stage 2: Operator ------> Harness (for Docker we use the artifact staging service).
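[Editor's note: the two staging paths above can be sketched as a small, stdlib-only Python model. All names here (DFS, BlobServer, the stage functions) are illustrative stand-ins, not real Beam or Flink APIs.]

```python
class DFS:
    """Stand-in for a distributed file system (the Beam path)."""
    def __init__(self):
        self._files = {}

    def upload(self, name, data):
        self._files[name] = data

    def download(self, name):
        return self._files[name]


class BlobServer:
    """Stand-in for Flink's BlobServer (the Flink path)."""
    def __init__(self):
        self._blobs = {}

    def put(self, key, data):
        self._blobs[key] = data

    def get(self, key):
        return self._blobs[key]


def beam_staging(dfs, artifact):
    # Stage 1: BeamClient -> JobService, artifact uploaded to DFS.
    dfs.upload("udf.py", artifact)
    # Stage 2: the operator downloads the artifact from DFS.
    staged = dfs.download("udf.py")
    # Stage 3: operator -> harness via the artifact staging service.
    return staged


def flink_staging(blob_server, artifact):
    # Stage 1: FlinkClient ships the artifact through the BlobServer
    # via the distributed cache; no DFS involved.
    blob_server.put("udf.py", artifact)
    # Stage 2: the operator fetches it locally and hands it to the harness.
    return blob_server.get("udf.py")
```

The difference the thread turns on is visible in the sketch: the Beam path needs a shared store reachable by both the JobService and every operator, while the Flink path only needs the client-to-TaskManager blob transport.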
>>>>>> 
>>>>>> So, I think Beam has to depend on DFS in Stage 1, and Stage 2 can use
>>>>>> the distributed cache if we remove the dependency on DFS for Beam in
>>>>>> Stage 1 (of course we need more detail here). We can bring up that
>>>>>> discussion in a separate Beam dev@ thread. The current discussion
>>>>>> focuses on the Flink 1.10 version of UDF Environment and Dependency
>>>>>> Management for Python, so I recommend voting in the current thread
>>>>>> for Flink 1.10, and discussing Beam artifact staging improvements in
>>>>>> a separate Beam dev@ thread.
>>>>>> 
>>>>>> What do you think?
>>>>>> 
>>>>>> Best,
>>>>>> Jincheng
>>>>>> 
>>>>>> On Mon, Oct 21, 2019 at 10:25 PM Thomas Weise <t...@apache.org> wrote:
>>>>>> 
>>>>>>> Beam artifact staging currently relies on a shared file system, and
>>>>>>> there are limitations, for example when running locally with Docker
>>>>>>> and a local FS. It sounds like a distributed-cache-based
>>>>>>> implementation might be a good (better?) option for artifact
>>>>>>> staging, even for the Beam Flink runner?
>>>>>>> 
>>>>>>> If so, can the implementation you propose be made compatible with
>>>>>>> the Beam artifact staging service, so that it can be plugged into
>>>>>>> the Beam Flink runner?
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Thomas
>>>>>>> 
>>>>>>> 
>>>>>>> On Mon, Oct 21, 2019 at 2:34 AM jincheng sun <sunjincheng...@gmail.com> wrote:
>>>>>>> 
>>>>>>>> Hi Max,
>>>>>>>> 
>>>>>>>> Sorry for the late reply. Regarding the issue you mentioned above,
>>>>>>>> I'm glad to share my thoughts:
>>>>>>>> 
>>>>>>>>> For process-based execution we use Flink's cache distribution
>>>>>>>>> instead of Beam's artifact staging.
>>>>>>>> 
>>>>>>>> In the current design, we use Flink's cache distribution to upload
>>>>>>>> users' files from the client to the cluster in both docker mode and
>>>>>>>> process mode. That is, Flink's cache distribution and Beam's
>>>>>>>> artifact staging service work together in docker mode.
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> Do we want to implement two different ways of staging artifacts?
>>>>>>>>> It seems sensible to use the same artifact staging functionality
>>>>>>>>> also for the process-based execution.
>>>>>>>> 
>>>>>>>> I agree that the implementation would be simpler if we used the
>>>>>>>> same artifact staging functionality for process-based execution as
>>>>>>>> well. However, it's not the best for performance, as it would
>>>>>>>> introduce an additional network transmission: in process mode the
>>>>>>>> TaskManager and the Python worker share the same environment, in
>>>>>>>> which case the user files in Flink's distributed cache can be
>>>>>>>> accessed by the Python worker directly. We do not need the staging
>>>>>>>> service in this case.
>>>>>>>> 
>>>>>>>>> Apart from being simpler, this would also allow the process-based
>>>>>>>>> execution to run in other environments than the Flink TaskManager
>>>>>>>>> environment.
>>>>>>>> 
>>>>>>>> IMHO, this case is more like docker mode, and we can share or
>>>>>>>> reuse the code of Beam's docker mode. Furthermore, in this case the
>>>>>>>> Python worker is launched by the operator, so it is always in the
>>>>>>>> same environment as the operator.
>>>>>>>> 
>>>>>>>> Thanks again for your feedback; it is valuable for finding the
>>>>>>>> best final architecture.
>>>>>>>> 
>>>>>>>> Feel free to correct me if there is anything incorrect.
>>>>>>>> 
>>>>>>>> Best,
>>>>>>>> Jincheng
>>>>>>>> 
>>>>>>>> On Wed, Oct 16, 2019 at 4:23 PM Maximilian Michels <m...@apache.org> wrote:
>>>>>>>> 
>>>>>>>>> I'm also late to the party here :) When I saw the first draft, I
>>>>>>>>> was wondering how exactly the design doc would tie in with Beam.
>>>>>>>>> Thanks for the update.
>>>>>>>>> 
>>>>>>>>> A couple of comments with this regard:
>>>>>>>>> 
>>>>>>>>>> Flink has provided a distributed cache mechanism and allows
>>>>>>>>>> users to upload their files using the "registerCachedFile" method
>>>>>>>>>> in ExecutionEnvironment/StreamExecutionEnvironment. The Python
>>>>>>>>>> files users specify through "add_python_file",
>>>>>>>>>> "set_python_requirements" and "add_python_archive" are also
>>>>>>>>>> uploaded through this method eventually.
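[Editor's note: a minimal sketch of how an API such as "add_python_file" could sit on top of a registerCachedFile-style distributed cache, as the quoted paragraph describes. ToyEnv and its method are hypothetical stand-ins that only mimic the shape of Flink's registerCachedFile(path, name); they are not the real Flink/PyFlink API.]

```python
import os

class ToyEnv:
    """Hypothetical stand-in for an ExecutionEnvironment; only records
    registrations the way env.registerCachedFile(path, name) is called."""
    def __init__(self):
        self.cached_files = []

    def register_cached_file(self, path, name):
        # In real Flink, the file would be shipped to the cluster and made
        # available to each task via the distributed cache. Here we only
        # record the (path, name) pair.
        self.cached_files.append((path, name))


def add_python_file(env, file_path):
    """Sketch: route a user's Python file into the distributed cache
    under a well-known name that the worker can look up later."""
    name = "python_file_" + os.path.basename(file_path)
    env.register_cached_file(file_path, name)
    return name
```

An add_python_archive or set_python_requirements call could follow the same pattern, differing only in how the worker side unpacks or installs the registered file.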
>>>>>>>>> 
>>>>>>>>> For process-based execution we use Flink's cache distribution
>>>>>>>>> instead of Beam's artifact staging.
>>>>>>>>> 
>>>>>>>>>> Apache Beam Portability Framework already supports artifact
>>>>>>>>>> staging that works out of the box with the Docker environment.
>>>>>>>>>> We can use the artifact staging service defined in Apache Beam
>>>>>>>>>> to transfer the dependencies from the operator to the Python SDK
>>>>>>>>>> harness running in the docker container.
>>>>>>>>> 
>>>>>>>>> Do we want to implement two different ways of staging artifacts?
>>>>>>>>> It seems sensible to use the same artifact staging functionality
>>>>>>>>> also for the process-based execution. Apart from being simpler,
>>>>>>>>> this would also allow the process-based execution to run in other
>>>>>>>>> environments than the Flink TaskManager environment.
>>>>>>>>> 
>>>>>>>>> Thanks,
>>>>>>>>> Max
>>>>>>>>> 
>>>>>>>>> On 15.10.19 11:13, Wei Zhong wrote:
>>>>>>>>>> Hi Thomas,
>>>>>>>>>> 
>>>>>>>>>> Thanks a lot for your suggestion!
>>>>>>>>>> 
>>>>>>>>>> As you can see from the section "Goals", this FLIP focuses on
>>>>>>>>>> dependency management in process mode. However, the APIs and
>>>>>>>>>> design proposed in this FLIP also apply to docker mode. So it
>>>>>>>>>> makes sense to me to also describe how this design is integrated
>>>>>>>>>> with the artifact staging service of Apache Beam in docker mode.
>>>>>>>>>> I have updated the design doc and look forward to your feedback.
>>>>>>>>>> 
>>>>>>>>>> Thanks,
>>>>>>>>>> Wei
>>>>>>>>>> 
>>>>>>>>>>> On Oct 15, 2019, at 01:54, Thomas Weise <t...@apache.org> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> Sorry for joining the discussion late.
>>>>>>>>>>> 
>>>>>>>>>>> The Beam environment already supports artifact staging; it
>>>>>>>>>>> works out of the box with the Docker environment. I think it
>>>>>>>>>>> would be helpful to explain in the FLIP how this proposal
>>>>>>>>>>> relates to what Beam offers and how it would be integrated.
>>>>>>>>>>> 
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Thomas
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> On Mon, Oct 14, 2019 at 8:09 AM Jeff Zhang <zjf...@gmail.com> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> +1
>>>>>>>>>>>> 
>>>>>>>>>>>> On Mon, Oct 14, 2019 at 10:55 PM Hequn Cheng <chenghe...@gmail.com> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> +1
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Good job, Wei!
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Best, Hequn
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Mon, Oct 14, 2019 at 2:54 PM Dian Fu <dian0511...@gmail.com> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Hi Wei,
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> +1 (non-binding). Thanks for driving this.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Dian
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Oct 14, 2019, at 1:40 PM, jincheng sun <sunjincheng...@gmail.com> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> +1
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On Sat, Oct 12, 2019 at 8:41 PM Wei Zhong <weizhong0...@gmail.com> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> I would like to start the vote for FLIP-78[1], which has
>>>>>>>>>>>>>>>> been discussed and reached consensus in the discussion
>>>>>>>>>>>>>>>> thread[2].
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> The vote will be open for at least 72 hours. I'll try to
>>>>>>>>>>>>>>>> close it by 2019-10-16 18:00 UTC, unless there is an
>>>>>>>>>>>>>>>> objection or not enough votes.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>> Wei
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-78%3A+Flink+Python+UDF+Environment+and+Dependency+Management
>>>>>>>>>>>>>>>> [2]
>>>>>>>>>>>>>>>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-Python-UDF-Environment-and-Dependency-Management-td33514.html
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> --
>>>>>>>>>>>> Best Regards
>>>>>>>>>>>> 
>>>>>>>>>>>> Jeff Zhang
>>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>> 
>> 
