Re: [DISCUSS] Integration of Volcano Engine TOS in Hadoop.

Steve Loughran Thu, 13 Feb 2025 10:22:58 -0800

Sounds good, though expect no commitment from me to review anything.
My main concerns are about dependency libraries (what are they?) and
testing.


On Tue, 11 Feb 2025 at 05:10, Xiaoqiao He <[email protected]> wrote:

> Thanks Jinglun for your work. Basically +1 from me to bring it into the
> Hadoop codebase.
> a. After a quick review of the JIRA and PR, I think it is solid, including
> documentation and code style.
> b. The contributors involved are diverse, coming from different projects
> and companies, and are active enough.
> c. I have communicated with Jinglun offline many times, and IMO he could be
> responsible for reviewing and testing this module.
> Besides that, I just suggest following the Hadoop guidelines[1] when
> developing new features.
>
> @Steve Loughran <[email protected]> @Shilun Fan <[email protected]> left
> some comments, including some concerns, in JIRA. Would you mind giving more
> suggestions for this discussion?
> Thanks.
>
> Best Regards,
> - He Xiaoqiao
>
> [1] https://hadoop.apache.org/bylaws.html
>
>
> On Sun, Jan 26, 2025 at 3:39 PM jinglun <[email protected]> wrote:
>
>> Hello everyone, I'd like to discuss the integration of Volcano Engine TOS
>> in Hadoop.
>>
>>
>> Volcano Engine is a fast-growing cloud vendor launched by ByteDance, and
>> TOS is the object storage service of Volcano Engine. A common pattern is to
>> store data in TOS and run Hadoop/Spark/Flink applications against it.
>> But there is no native support for TOS in Hadoop, so it is not easy for
>> users to build their big data systems on TOS.
>>
>> My proposal is to integrate TOS with Hadoop to help users run their
>> applications on TOS. Users only need to do some simple configuration, then
>> their applications can read/write TOS without any code change. This work is
>> similar to the existing support for AWS S3, AzureBlob, AliyunOSS, Tencent
>> COS and HuaweiCloud Object Storage in Hadoop.
>>
>>
>> More details can be found at
>> https://issues.apache.org/jira/browse/HADOOP-19236.
>>
>>
>> 1. What is the progress of the work now?
>> The work is currently complete on branch HADOOP_19236. It was developed by
>> the EMR team of Volcano Engine and has served many users, both cloud and
>> IDC, for more than 2 years.
>>
>>
>> 2. How is long-term maintenance and testing guaranteed?
>> The contributors are open-source friendly, including ZhengHu (PMC
>> of HBase and Iceberg), Jinglun (Committer of Hadoop), SunXin (Committer
>> of HBase), XianyinXin (Contributor of Spark), Rascal Wu (Contributor of
>> Flink), FangBo (Contributor of Hive) and Yuanzhihuan. We will all be
>> involved in the long-term maintenance of this work. As time goes by,
>> more people from the EMR team and the hadoop-tos users may join this work.
>> So I'm confident in the long-term maintenance and testing.
>>
>>
>> 3. Why should hadoop-tos be integrated into the Hadoop codebase? Should we
>> use an independent project instead?
>> Integration is for a better user experience. First, users don't need to
>> go to another repo to find the TOS support. Second, users don't need to
>> worry about the version mapping between Hadoop and hadoop-tos. Finally, a
>> connector provided by the Hadoop community is more reliable and
>> trustworthy.
>>
>> If you have any questions, concerns, or anything else that is unclear,
>> please let me know. Sincerely looking forward to your reply, thanks
>> very much.
>
>
