Thanks to Jinglun for initiating the discussion on TOS. +1 from my personal perspective.
However, considering that we are upgrading to JUnit 5 and need the TOS unit tests to be written in the JUnit 5 style, we can discuss it together under the relevant PR.

Regarding PJ Fanning's suggestion, I think we should pay attention to it. He has deeper insight into this part.

Best Regards,
- Shilun Fan

On Sat, Feb 14, 2025 at 20:58 PM PJ Fanning <fannin...@apache.org> wrote:
>
> Just one thing to note is that we recently removed or reduced the
> okhttp3 dependency in Hadoop because the kotlin dependency brings in
> big jars and more complicated management of transitive dependencies.
> Would it be possible to consider using a lightweight HTTP client
> instead? The built-in Java client or Apache HttpClient are examples.
>
> https://issues.apache.org/jira/browse/HADOOP-18890
>
> On Fri, 14 Feb 2025 at 12:41, Jinglun wrote:
> >
> > Thanks Xiaoqiao and Steve for your attention and comments. Let me address
> > the dependencies and tests.
> >
> > **Dependencies**
> > Hadoop-tos involves a new dependency, com.volcengine:ve-tos-java-sdk:2.8.6.
> > It is an open source project with the Apache 2.0 license
> > (https://github.com/volcengine/ve-tos-java-sdk/blob/main/LICENSE).
> >
> > Here are the dependencies pulled in by com.volcengine:ve-tos-java-sdk:2.8.6.
> > They (okhttp, okio, kotlin, jackson) are open source with the Apache 2.0
> > license too.
> > [INFO] +- com.volcengine:ve-tos-java-sdk:jar:2.8.7:compile
> > [INFO] |  +- com.squareup.okhttp3:okhttp:jar:4.10.0:compile
> > [INFO] |  |  +- com.squareup.okio:okio-jvm:jar:3.0.0:compile
> > [INFO] |  |  |  \- org.jetbrains.kotlin:kotlin-stdlib-jdk8:jar:1.6.20:test
> > [INFO] |  |  |     \- org.jetbrains.kotlin:kotlin-stdlib-jdk7:jar:1.6.20:test
> > [INFO] |  |  \- org.jetbrains.kotlin:kotlin-stdlib:jar:1.6.20:compile
> > [INFO] |  |     \- org.jetbrains:annotations:jar:13.0:compile
> > [INFO] |  \- com.fasterxml.jackson.core:jackson-annotations:jar:2.12.7:compile
> > [INFO] +- org.jetbrains.kotlin:kotlin-stdlib-common:jar:1.6.20:compile
> >
> > **How is it tested**
> > The hadoop-tos module has a complete unit test set, including the
> > contract tests and extended test cases. To run it, we need a machine that
> > can connect to TOS. Set the 6 environment variables below.
> > ```
> > export TOS_ACCESS_KEY_ID={YOUR_ACCESS_KEY}
> > export TOS_SECRET_ACCESS_KEY={YOUR_SECRET_ACCESS_KEY}
> > export TOS_ENDPOINT={TOS_SERVICE_ENDPOINT}
> > export FILE_STORAGE_ROOT=/tmp/local_dev/
> > export TOS_BUCKET={YOUR_BUCKET_NAME}
> > export TOS_UNIT_TEST_ENABLED=true
> > ```
> > Then cd to the hadoop project root directory and run the test command below.
> > ```
> > mvn -Dtest=org.apache.hadoop.fs.tosfs.** test -pl org.apache.hadoop:hadoop-tos
> > ```
> > I also tested it in a real hadoop environment. The document (index.md)
> > describes how to set up the jars and configure the keys. Common tests
> > include: shell commands, Terasort, DFSIO, NNBench, Distcp, etc.
> >
> > **Test Environment**
> > We need a VolcanoEngine account to run all the test cases. I can provide
> > an environment for testing. Please let me know if you need to test
> > hadoop-tos (jing...@apache.org).
> >
> > On 2025/02/13 18:21:57 Steve Loughran wrote:
> > > Sounds good, though expect no commitment from me to review anything.
> > > My main concerns are about dependency libraries (what are they?) and
> > > testing.
> > >
> > > On Tue, 11 Feb 2025 at 05:10, Xiaoqiao He wrote:
> > >
> > > > Thanks Jinglun for your work. Basically +1 from me to bring it into the
> > > > Hadoop codebase.
> > > > a. After a quick review of the JIRA and PR, I think it is solid,
> > > > including documentation and code style.
> > > > b. The contributors involved are diverse, coming from different projects
> > > > and companies, and are active enough.
> > > > c. I have communicated with Jinglun offline many times, and IMO he can be
> > > > responsible for reviewing and testing this module.
> > > > Besides that, I just suggest following the Hadoop guidelines[1] to
> > > > develop the new features.
> > > >
> > > > @Steve Loughran @Shilun Fan, you left some comments, including some
> > > > concerns, in JIRA. Would you mind giving more suggestions for this
> > > > discussion?
> > > > Thanks.
> > > >
> > > > Best Regards,
> > > > - He Xiaoqiao
> > > >
> > > > [1] https://hadoop.apache.org/bylaws.html
> > > >
> > > > On Sun, Jan 26, 2025 at 3:39 PM jinglun wrote:
> > > >
> > > >> Hello everyone, I'd like to discuss the integration of volcano engine
> > > >> tos in hadoop.
> > > >>
> > > >> Volcano Engine is a fast growing cloud vendor launched by ByteDance, and
> > > >> TOS is the object storage service of Volcano Engine. A common way is to
> > > >> store data in TOS and run Hadoop/Spark/Flink applications to access TOS.
> > > >> But there is no native support for TOS in hadoop, thus it is not easy
> > > >> for users to build their Big Data System based on TOS.
> > > >>
> > > >> My proposal is to integrate TOS with Hadoop to help users run their
> > > >> applications on TOS. Users only need to do some simple configuration,
> > > >> then their applications can read/write TOS without any code change. This
> > > >> work is similar to AWS S3, AzureBlob, AliyunOSS, Tencent COS and
> > > >> HuaweiCloud Object Storage in Hadoop.
> > > >>
> > > >> More details can be found at
> > > >> https://issues.apache.org/jira/browse/HADOOP-19236.
> > > >>
> > > >> 1. What is the progress of the work now?
> > > >> The work is currently finished at branch HADOOP_19236. It is developed
> > > >> by the EMR team of Volcano Engine and has served many users from both
> > > >> cloud and IDC for more than 2 years.
> > > >>
> > > >> 2. How is the long-term maintenance and testing guaranteed?
> > > >> The contributors are open-source friendly, including ZhengHu (PMC
> > > >> of HBase and Iceberg), Jinglun (Committer of Hadoop), SunXin (Committer
> > > >> of HBase), XianyinXin (Contributor of Spark), Rascal Wu (Contributor of
> > > >> Flink), FangBo (Contributor of Hive) and Yuanzhihuan. We will all be
> > > >> involved in the long-term maintenance of this work. As time goes by,
> > > >> more people from the EMR team and the hadoop-tos users may join this
> > > >> work. So I'm confident in the long-term maintenance and testing.
> > > >>
> > > >> 3. Why should hadoop-tos be integrated into the hadoop codebase? Should
> > > >> it be an independent project instead?
> > > >> Integration is for a better user experience. First, users don't need to
> > > >> go to another repo to find the tos support. Second, users don't need to
> > > >> worry about the version mapping between hadoop and hadoop-tos. Finally,
> > > >> a connector provided by the hadoop community is more reliable and
> > > >> trustworthy.
> > > >>
> > > >> If you have any question, concern or anything else that is unclear,
> > > >> please let me know. Sincerely looking forward to your reply, thanks
> > > >> very much.
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
> > For additional commands, e-mail: common-dev-h...@hadoop.apache.org