One thing to note: we recently removed or reduced the okhttp3 dependency
in Hadoop because its Kotlin dependency brings in large jars and makes
managing transitive dependencies more complicated. Would it be possible
to use a lightweight HTTP client instead, for example the built-in Java
client or Apache HttpClient?

https://issues.apache.org/jira/browse/HADOOP-18890
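
To make the concern concrete, here is a rough sketch of how one might list the
Kotlin, OkHttp and Okio artifacts that a module pulls in. It only filters the
text output of `mvn dependency:tree`. The function name
`list_kotlin_okhttp_deps` is made up for illustration, and the module selector
in the usage comment is simply copied from the test command quoted below, so
treat both as assumptions.

```shell
# Sketch: read `mvn dependency:tree` output on stdin and print the unique
# Kotlin / OkHttp / Okio artifacts as groupId:artifactId:version.
# The function name is hypothetical and not part of any Hadoop tooling.
list_kotlin_okhttp_deps() {
  grep -Eo '(org\.jetbrains\.kotlin|com\.squareup\.(okhttp3|okio)):[A-Za-z0-9._-]+:jar:[A-Za-z0-9._-]+' \
    | sed -E 's/:jar:/:/' \
    | sort -u
}

# Example usage, assuming it is run from the Hadoop project root:
#   mvn dependency:tree -pl org.apache.hadoop:hadoop-tos | list_kotlin_okhttp_deps
```

An empty result would suggest that the module no longer drags in the Kotlin
runtime, although the scopes shown in the full tree are still worth checking.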

On Fri, 14 Feb 2025 at 12:41, Jinglun <jing...@apache.org> wrote:
>
> Thanks Xiaoqiao and Steve for your attention and comments. Let me address the
> dependency and testing questions.
>
> **Dependencies**
> hadoop-tos introduces one new dependency, com.volcengine:ve-tos-java-sdk:2.8.6.
> It is an open source project under the Apache 2.0 license
> (https://github.com/volcengine/ve-tos-java-sdk/blob/main/LICENSE).
>
> Here are the transitive dependencies brought in by
> com.volcengine:ve-tos-java-sdk:2.8.6. They (okhttp, okio, kotlin, jackson)
> are also open source under the Apache 2.0 license.
> [INFO] +- com.volcengine:ve-tos-java-sdk:jar:2.8.7:compile
> [INFO] |  +- com.squareup.okhttp3:okhttp:jar:4.10.0:compile
> [INFO] |  |  +- com.squareup.okio:okio-jvm:jar:3.0.0:compile
> [INFO] |  |  |  \- org.jetbrains.kotlin:kotlin-stdlib-jdk8:jar:1.6.20:test
> [INFO] |  |  |     \- org.jetbrains.kotlin:kotlin-stdlib-jdk7:jar:1.6.20:test
> [INFO] |  |  \- org.jetbrains.kotlin:kotlin-stdlib:jar:1.6.20:compile
> [INFO] |  |     \- org.jetbrains:annotations:jar:13.0:compile
> [INFO] |  \- com.fasterxml.jackson.core:jackson-annotations:jar:2.12.7:compile
> [INFO] +- org.jetbrains.kotlin:kotlin-stdlib-common:jar:1.6.20:compile
>
> **How is it tested**
> The hadoop-tos module has a complete unit test suite, including the contract
> tests and extended test cases. To run it, you need a machine that can connect
> to TOS. Set the six environment variables below.
> ```
> export TOS_ACCESS_KEY_ID={YOUR_ACCESS_KEY}
> export TOS_SECRET_ACCESS_KEY={YOUR_SECRET_ACCESS_KEY}
> export TOS_ENDPOINT={TOS_SERVICE_ENDPOINT}
> export FILE_STORAGE_ROOT=/tmp/local_dev/
> export TOS_BUCKET={YOUR_BUCKET_NAME}
> export TOS_UNIT_TEST_ENABLED=true
> ```
> Then cd to the Hadoop project root directory and run the test command below.
> ```
> mvn -Dtest=org.apache.hadoop.fs.tosfs.** test -pl org.apache.hadoop:hadoop-tos
> ```
> I also tested it in a real Hadoop environment. The documentation (index.md)
> describes how to set up the jars and configure the keys. Common tests include
> shell commands, Terasort, DFSIO, NNBench, DistCp, etc.
>
> **Test Environment**
> A VolcanoEngine account is needed to run all the test cases. I can provide a
> test environment. Please let me know if you need to test hadoop-tos
> (jing...@apache.org).
>
>
>
>
> On 2025/02/13 18:21:57 Steve Loughran wrote:
> > Sounds good, though expect no commitment from me to review anything.
> > My main concerns are about dependency libraries (what are they?) and
> > testing.
> >
> > On Tue, 11 Feb 2025 at 05:10, Xiaoqiao He <hexiaoq...@apache.org> wrote:
> >
> > > Thanks Jinglun for your work. Basically +1 from me on bringing it into the
> > > Hadoop codebase.
> > > a. After a quick review of the JIRA and PR, I think it is solid, including
> > > the documentation and code style.
> > > b. The contributors involved are diverse, come from different projects
> > > and companies, and are active enough.
> > > c. I have communicated with Jinglun offline many times, and IMO he can be
> > > responsible for reviewing and testing this module.
> > > Besides that, I just suggest following the Hadoop guidelines[1] when
> > > developing new features.
> > >
> > > @Steve Loughran <ste...@cloudera.com> @Shilun Fan <slfan1...@foxmail.com>
> > > left some comments, including some concerns, in JIRA. Would you mind giving
> > > more suggestions for this discussion?
> > > Thanks.
> > >
> > > Best Regards,
> > > - He Xiaoqiao
> > >
> > > [1] https://hadoop.apache.org/bylaws.html
> > >
> > >
> > > On Sun, Jan 26, 2025 at 3:39 PM jinglun <jinglun...@qq.com.invalid> wrote:
> > >
> > >> Hello everyone, I'd like to discuss the integration of Volcano Engine TOS
> > >> into Hadoop.
> > >>
> > >>
> > >> Volcano Engine is a fast-growing cloud vendor launched by ByteDance, and
> > >> TOS is the object storage service of Volcano Engine. A common pattern is to
> > >> store data in TOS and run Hadoop/Spark/Flink applications that access TOS.
> > >> But there is no native support for TOS in Hadoop, so it is not easy for
> > >> users to build their big data systems on TOS.
> > >>
> > >> My proposal is to integrate TOS with Hadoop to help users run their
> > >> applications on TOS. Users only need to do some simple configuration, and
> > >> then their applications can read/write TOS without any code change. This work
> > >> is similar to the AWS S3, AzureBlob, AliyunOSS, Tencent COS and HuaweiCloud
> > >> Object Storage support in Hadoop.
> > >>
> > >>
> > >> More details can be found at
> > >> https://issues.apache.org/jira/browse/HADOOP-19236.
> > >>
> > >>
> > >> 1. What is the progress of the work now?
> > >> The work is currently finished on branch HADOOP_19236. It was developed by
> > >> the EMR team of Volcano Engine and has served many users from both cloud and
> > >> IDC for more than 2 years.
> > >>
> > >>
> > >> 2. How is the long-term maintenance and testing guaranteed?
> > >> The contributors are open-source friendly, including ZhengHu (PMC of HBase
> > >> and Iceberg), Jinglun (Committer of Hadoop), SunXin (Committer of HBase),
> > >> XianyinXin (Contributor of Spark), Rascal Wu (Contributor of Flink), FangBo
> > >> (Contributor of Hive) and Yuanzhihuan. We will all be involved in the
> > >> long-term maintenance of this work. As time goes by, more people from the
> > >> EMR team and the hadoop-tos users may join this work. So I'm confident in
> > >> the long-term maintenance and testing.
> > >>
> > >>
> > >> 3. Why should hadoop-tos be integrated into the Hadoop codebase? Should it
> > >> be an independent project instead?
> > >> Integration is for a better user experience. First, users don't need to
> > >> go to another repo to find the TOS support. Second, users don't need to
> > >> worry about the version mapping between Hadoop and hadoop-tos. Finally, a
> > >> connector provided by the Hadoop community is more reliable and
> > >> trustworthy.
> > >>
> > >>
> > >> If you have any questions, concerns or anything else that is unclear,
> > >> please let me know. Sincerely looking forward to your reply, thanks
> > >> very much.
> > >
> > >
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: common-dev-h...@hadoop.apache.org
>

