Thanks Shilun for driving this forward. +1 from my side:

a. Judging from the PR (https://github.com/apache/hadoop/pull/8347), the code is ready.
b. The contributors include PMC members and committers from mature Apache communities.

I would like to hear more from the dev team about the plan below. Good luck!

Best Regards,
- He Xiaoqiao

On Fri, May 22, 2026 at 9:33 PM slfan1989 <[email protected]> wrote:

> Hi Hadoop community,
>
> I would like to start a discussion about adding Baidu Cloud BOS
> (Baidu Object Storage) as a native Hadoop-compatible filesystem connector.
>
> JIRA: https://issues.apache.org/jira/browse/HDFS-11161
> PR: https://github.com/apache/hadoop/pull/8347
> CI Status: +1 overall, all checks passed.
>
> I have had some offline discussions with LuciferYang and the contributors
> working on this connector. Based on those discussions, I am helping bring
> this proposal to the Hadoop community for broader review and feedback.
>
> The goal is to integrate BOS support as a native Hadoop filesystem module,
> similar to the existing hadoop-aws (S3A), hadoop-aliyun, and hadoop-cos
> connectors.
>
> 1. Background
>
> Baidu Cloud is one of the major cloud service providers in China. BOS
> (Baidu Object Storage) is Baidu's core object storage service and is widely
> used for big data analytics, machine learning, and data lake workloads.
>
> A native Hadoop connector would allow Hadoop ecosystem projects, including
> MapReduce, Spark, Hive, Flink, and others, to access BOS storage directly
> through the bos:// scheme.
>
> According to the contributors, this connector has been running in production
> at Baidu for around 8 years, serving both BOS users and Baidu MapReduce
> (BMR) workloads.
>
> 2. Implementation
>
> The proposed module is placed under:
>
>   hadoop-cloud-storage-project/hadoop-bos
>
> This follows the structure of the existing cloud storage connectors.
>
> The implementation includes:
>
> - A full Hadoop FileSystem implementation with the bos:// URI scheme
> - Pluggable credentials provider support
> - Contract tests covering standard filesystem operations
> - Dependency shading or exclusion to avoid classpath conflicts, with shaded
>   dependencies placed under org.apache.hadoop.fs.bos.shaded.*
>
> 3. Long-term Maintenance
>
> The following contributors have expressed commitment to maintaining this
> module:
>
> - yangdong2398, BOS R&D
> - LuciferYang, Apache Spark PMC
> - jackylee-ch, Apache Gluten PMC
> - houzhizhen, Apache HugeGraph committer
> - summaryzb, Apache Uniffle committer
>
> They have committed to:
>
> - Responding to issues and PRs within one week
> - Keeping dependencies up to date
> - Adapting the connector to future Hadoop API changes
>
> 4. Why Consider Integrating This into Hadoop
>
> This proposal follows a similar rationale to hadoop-aws (S3A),
> hadoop-aliyun, and hadoop-cos:
>
> - Users can rely on a single, consistent Hadoop distribution without
>   managing separate connector JARs and version compatibility manually
> - A connector maintained within the Hadoop community is easier for users to
>   trust and review
> - Shared CI helps ensure ongoing compatibility with Hadoop trunk
>
> I would like to invite feedback from the community on whether this connector
> is appropriate to include in Hadoop, and what additional work, review, or
> requirements would be needed before it can be accepted.
>
> The contributors are copied / expected to participate in this discussion and
> can provide more details about the implementation, production usage, and
> maintenance plan.
>
> Best regards,
> Shilun Fan.
>
