Re: [DISCUSS] Add Baidu Cloud BOS filesystem connector to Hadoop

Xiaoqiao He Mon, 25 May 2026 23:41:24 -0700

Thanks Shilun for driving this forward.
+1 from my side:
a. Judging from the PR (https://github.com/apache/hadoop/pull/8347), the
code looks ready.
b. Several of the contributors are PMC members or committers in mature
Apache communities.
I would like to hear more from the dev team about the plan going forward.
Good luck!


Best Regards,
- He Xiaoqiao

On Fri, May 22, 2026 at 9:33 PM slfan1989 wrote:

> Hi Hadoop community,
>
> I would like to start a discussion about adding Baidu Cloud BOS
> (Baidu Object Storage) as a native Hadoop-compatible filesystem connector.
>
> JIRA: https://issues.apache.org/jira/browse/HDFS-11161
> PR: https://github.com/apache/hadoop/pull/8347
> CI Status: +1 overall, all checks passed.
>
> I have had some offline discussions with LuciferYang and the contributors
> working on this connector. Based on those discussions, I am helping bring
> this proposal to the Hadoop community for broader review and feedback.
>
> The goal is to integrate BOS support as a native Hadoop filesystem module,
> similar to the existing hadoop-aws (S3A), hadoop-aliyun, and hadoop-cos
> connectors.
>
> 1. Background
>
> Baidu Cloud is one of the major cloud service providers in China. BOS
> (Baidu Object Storage) is Baidu's core object storage service and is widely
> used for big data analytics, machine learning, and data lake workloads.
>
> A native Hadoop connector would allow Hadoop ecosystem projects, including
> MapReduce, Spark, Hive, Flink, and others, to access BOS storage directly
> through the bos:// scheme.
>
> According to the contributors, this connector has been running in
> production at Baidu for around 8 years, serving both BOS users and Baidu
> MapReduce (BMR) workloads.
>
> 2. Implementation
>
> The proposed module is placed under:
>
>   hadoop-cloud-storage-project/hadoop-bos
>
> This follows the structure of the existing cloud storage connectors.
>
> The implementation includes:
>
> - A full Hadoop FileSystem implementation with the bos:// URI scheme
> - Pluggable credentials provider support
> - Contract tests covering standard filesystem operations
> - Dependency shading or exclusion to avoid classpath conflicts, with shaded
>   dependencies placed under org.apache.hadoop.fs.bos.shaded.*
>
> 3. Long-term Maintenance
>
> The following contributors have expressed commitment to maintaining this
> module:
>
> - yangdong2398, BOS R&D
> - LuciferYang, Apache Spark PMC
> - jackylee-ch, Apache Gluten PMC
> - houzhizhen, Apache HugeGraph committer
> - summaryzb, Apache Uniffle committer
>
> They have committed to:
>
> - Responding to issues and PRs within one week
> - Keeping dependencies up to date
> - Adapting the connector to future Hadoop API changes
>
> 4. Why Consider Integrating This into Hadoop
>
> This proposal follows a similar rationale to hadoop-aws (S3A),
> hadoop-aliyun, and hadoop-cos:
>
> - Users can rely on a single, consistent Hadoop distribution without
>   managing separate connector JARs and version compatibility manually
> - A connector maintained within the Hadoop community is easier for users to
>   trust and review
> - Shared CI helps ensure ongoing compatibility with Hadoop trunk
>
> I would like to invite feedback from the community on whether this
> connector is appropriate to include in Hadoop, and what additional work,
> review, or requirements would be needed before it can be accepted.
>
> The contributors are copied on this thread and are expected to participate
> in the discussion; they can provide more details about the implementation,
> production usage, and maintenance plan.
>
> Best regards,
> Shilun Fan.
>
