Sounds good, though expect no commitment from me to review anything. My main concerns are about dependency libraries (what are they?) and testing.
On Tue, 11 Feb 2025 at 05:10, Xiaoqiao He <hexiaoq...@apache.org> wrote:

> Thanks Jinglun for your work. Basically +1 from me to bring it into the
> Hadoop codebase.
> a. After a quick review of the JIRA and PR, I think it is solid, including
> documentation and code style.
> b. The contributors involved here are diverse, coming from different
> projects and companies, and active enough.
> c. I have communicated with Jinglun offline many times, and IMO he could be
> responsible for reviewing and testing this module.
> Besides that, I just suggest following the Hadoop guidelines[1] when
> developing new features.
>
> @Steve Loughran <ste...@cloudera.com> @Shilun Fan <slfan1...@foxmail.com>
> left some comments, including some concerns, in JIRA. Would you mind giving
> more suggestions for this discussion?
> Thanks.
>
> Best Regards,
> - He Xiaoqiao
>
> [1] https://hadoop.apache.org/bylaws.html
>
>
> On Sun, Jan 26, 2025 at 3:39 PM jinglun <jinglun...@qq.com.invalid> wrote:
>
>> Hello everyone, I'd like to discuss the integration of Volcano Engine TOS
>> into Hadoop.
>>
>> Volcano Engine is a fast-growing cloud vendor launched by ByteDance, and
>> TOS is the object storage service of Volcano Engine. A common pattern is
>> to store data in TOS and run Hadoop/Spark/Flink applications that access
>> it. But there is no native support for TOS in Hadoop, so it is not easy
>> for users to build their big data systems on TOS.
>>
>> My proposal is to integrate TOS with Hadoop to help users run their
>> applications on TOS. Users only need to do some simple configuration, and
>> then their applications can read/write TOS without any code change. This
>> work is similar to the AWS S3, AzureBlob, AliyunOSS, Tencent COS and
>> HuaweiCloud Object Storage support in Hadoop.
>>
>> More details can be found at
>> https://issues.apache.org/jira/browse/HADOOP-19236.
>>
>> 1. What is the progress of the work now?
>> The work is currently complete on branch HADOOP_19236. It was developed by
>> the EMR team of Volcano Engine and has served many users, both in the
>> cloud and in IDCs, for more than 2 years.
>>
>> 2. How is the long-term maintenance and testing guaranteed?
>> The contributors are open-source friendly, including ZhengHu (PMC
>> of HBase and Iceberg), Jinglun (Committer of Hadoop), SunXin (Committer
>> of HBase), XianyinXin (Contributor of Spark), Rascal Wu (Contributor of
>> Flink), FangBo (Contributor of Hive) and Yuanzhihuan. We will all be
>> involved in the long-term maintenance of this work. As time goes by,
>> more people from the EMR team and the hadoop-tos users may join this work.
>> So I'm confident in the long-term maintenance and testing.
>>
>> 3. Why should hadoop-tos be integrated into the Hadoop codebase? Should it
>> be an independent project instead?
>> Integration is for a better user experience. First, users don't need to
>> go to another repo to find the TOS support. Second, users don't need to
>> worry about the version mapping between hadoop and hadoop-tos. Finally, a
>> connector provided by the Hadoop community is more reliable and
>> trustworthy.
>>
>> If you have any question, concern or anything else that is unclear,
>> please let me know. Sincerely looking forward to your reply, thanks
>> very much.