**Update** I discussed this with the main TOS SDK developer, xiang (evansxi...@126.com), and he is happy to provide a new SDK that depends on Apache HttpClient. The new SDK is expected to be released on 3/15.
Thanks xiang for your help. If there are any other questions, please feel free to comment.

On 2025/02/17 03:27:15 Jinglun wrote:
> Thanks PJ Fanning and slfan for your suggestions!
>
> > Would it be possible to consider using a lightweight HTTP client instead?
> Thanks for reminding me, this makes sense to me. I'll try to solve it as soon
> as possible.
>
> > We are upgrading to Junit5 and need TOS's Unit Tests to be developed in the
> > Junit5 way,
> Thanks for your nice suggestion. Let me fix this.
>
> I will update as soon as possible. If you have any other questions, please
> feel free to comment.
>
> On 2025/02/15 02:59:02 slfan1989 wrote:
> > Thanks to Jinglun for initiating the discussion on TOS.
> >
> > +1 from my personal perspective.
> >
> > However, considering that we are upgrading to Junit5 and need TOS's Unit
> > Tests to be developed in the Junit5 way, we can discuss it together under
> > the relevant PR.
> >
> > Regarding PJ Fanning's suggestion, I think we should pay attention to it.
> > He has a deeper insight into this part.
> >
> > Best Regards,
> > - Shilun Fan
> >
> > On Sat, Feb 14, 2025 at 20:58 PM PJ Fanning <fannin...@apache.org> wrote:
> > >
> > > Just one thing to note is that we recently removed or reduced the
> > > okhttp3 dependency in Hadoop because the kotlin dependency brings in
> > > big jars and more complicated management of transitive dependencies.
> > > Would it be possible to consider using a lightweight HTTP client
> > > instead? The built-in Java client or Apache HttpClient are examples.
> > >
> > > https://issues.apache.org/jira/browse/HADOOP-18890
> > >
> > > On Fri, 14 Feb 2025 at 12:41, Jinglun wrote:
> > > >
> > > > Thanks xiaoqiao and steve for your attention and comments. Let me answer
> > > > the dependencies and tests.
> > > >
> > > > **Dependencies**
> > > > Hadoop-tos involves a new dependency com.volcengine:ve-tos-java-sdk:2.8.6.
> > > > It is an open source project under the Apache 2.0 license
> > > > (https://github.com/volcengine/ve-tos-java-sdk/blob/main/LICENSE).
> > > >
> > > > Here are the dependencies brought in by
> > > > com.volcengine:ve-tos-java-sdk:2.8.6. They (okhttp, okio, kotlin, jackson)
> > > > are open source under the Apache 2.0 license too.
> > > > [INFO] +- com.volcengine:ve-tos-java-sdk:jar:2.8.7:compile
> > > > [INFO] |  +- com.squareup.okhttp3:okhttp:jar:4.10.0:compile
> > > > [INFO] |  |  +- com.squareup.okio:okio-jvm:jar:3.0.0:compile
> > > > [INFO] |  |  |  \- org.jetbrains.kotlin:kotlin-stdlib-jdk8:jar:1.6.20:test
> > > > [INFO] |  |  |     \- org.jetbrains.kotlin:kotlin-stdlib-jdk7:jar:1.6.20:test
> > > > [INFO] |  |  \- org.jetbrains.kotlin:kotlin-stdlib:jar:1.6.20:compile
> > > > [INFO] |  |     \- org.jetbrains:annotations:jar:13.0:compile
> > > > [INFO] |  \- com.fasterxml.jackson.core:jackson-annotations:jar:2.12.7:compile
> > > > [INFO] +- org.jetbrains.kotlin:kotlin-stdlib-common:jar:1.6.20:compile
> > > >
> > > > **How is it tested**
> > > > The hadoop-tos module has a complete unit test set, including the
> > > > contract tests and extended test cases. To run it, we need a machine that
> > > > can connect to TOS, with the 6 environment variables below set.
> > > > ```
> > > > export TOS_ACCESS_KEY_ID={YOUR_ACCESS_KEY}
> > > > export TOS_SECRET_ACCESS_KEY={YOUR_SECRET_ACCESS_KEY}
> > > > export TOS_ENDPOINT={TOS_SERVICE_ENDPOINT}
> > > > export FILE_STORAGE_ROOT=/tmp/local_dev/
> > > > export TOS_BUCKET={YOUR_BUCKET_NAME}
> > > > export TOS_UNIT_TEST_ENABLED=true
> > > > ```
> > > > Then cd to the hadoop project root directory and run the test command
> > > > below.
> > > > ```
> > > > mvn -Dtest=org.apache.hadoop.fs.tosfs.** test -pl org.apache.hadoop:hadoop-tos
> > > > ```
> > > > I also tested it in a real hadoop environment. The document (index.md)
> > > > describes how to set up the jars and configure keys. Common tests include:
> > > > shell commands, Terasort, DFSIO, NNBench, Distcp, etc.
> > > >
> > > > **Test Environment**
> > > > We need a VolcanoEngine account to run all the test cases. I can provide
> > > > an environment for testing. Please let me know if you need to test
> > > > hadoop-tos (jing...@apache.org).
> > > >
> > > > On 2025/02/13 18:21:57 Steve Loughran wrote:
> > > > > Sounds good, though expect no commitment from me to review anything.
> > > > > My main concerns are about dependency libraries (what are they?) and
> > > > > testing.
> > > > >
> > > > > On Tue, 11 Feb 2025 at 05:10, Xiaoqiao He wrote:
> > > > >
> > > > > > Thanks Jinglun for your work. Basically +1 from me to bring it into
> > > > > > the Hadoop codebase.
> > > > > > a. After a quick review of the JIRA and PR, I think it is solid,
> > > > > > including the documentation and code style.
> > > > > > b. The contributors involved here are diverse, coming from different
> > > > > > projects and companies, and are active enough.
> > > > > > c. I have communicated with Jinglun offline many times, and IMO he
> > > > > > could be responsible for reviewing and testing this module.
> > > > > > Besides that, I just suggest following the Hadoop guidelines[1] to
> > > > > > develop the new features.
> > > > > >
> > > > > > @Steve Loughran @Shilun Fan left some comments, including some
> > > > > > concerns, in JIRA. Would you mind giving more suggestions for this
> > > > > > discussion?
> > > > > > Thanks.
> > > > > >
> > > > > > Best Regards,
> > > > > > - He Xiaoqiao
> > > > > >
> > > > > > [1] https://hadoop.apache.org/bylaws.html
> > > > > >
> > > > > >
> > > > > > On Sun, Jan 26, 2025 at 3:39 PM jinglun wrote:
> > > > > >
> > > > > >> Hello everyone, I'd like to discuss the integration of volcano
> > > > > >> engine tos in hadoop.
> > > > > >>
> > > > > >>
> > > > > >> Volcano Engine is a fast growing cloud vendor launched by ByteDance,
> > > > > >> and TOS is the object storage service of Volcano Engine.
> > > > > >> A common way is to store data in TOS and run Hadoop/Spark/Flink
> > > > > >> applications to access TOS. But there is no native support for TOS
> > > > > >> in hadoop, so it is not easy for users to build their Big Data
> > > > > >> System based on TOS.
> > > > > >>
> > > > > >> My proposal is to integrate TOS with Hadoop to help users run their
> > > > > >> applications on TOS. Users only need to do some simple
> > > > > >> configuration, then their applications can read/write TOS without
> > > > > >> any code change. This work is similar to AWS S3, AzureBlob,
> > > > > >> AliyunOSS, Tencent COS and HuaweiCloud Object Storage in Hadoop.
> > > > > >>
> > > > > >>
> > > > > >> More details can be found at
> > > > > >> https://issues.apache.org/jira/browse/HADOOP-19236.
> > > > > >>
> > > > > >>
> > > > > >> 1. What is the progress of the work now?
> > > > > >> The work is currently finished at branch HADOOP_19236. It was
> > > > > >> developed by the EMR team of Volcano Engine and has served many
> > > > > >> users from both cloud and IDC for more than 2 years.
> > > > > >>
> > > > > >>
> > > > > >> 2. How is the long-term maintenance and testing guaranteed?
> > > > > >> The contributors are open-source friendly, including ZhengHu (PMC
> > > > > >> of HBase and Iceberg), Jinglun (Committer of Hadoop), SunXin
> > > > > >> (Committer of HBase), XianyinXin (Contributor of Spark), Rascal Wu
> > > > > >> (Contributor of Flink), FangBo (Contributor of Hive) and
> > > > > >> Yuanzhihuan. We will all be involved in the long-term maintenance
> > > > > >> of this work. As time goes by, more people from the EMR team and
> > > > > >> the hadoop-tos users may join this work. So I'm confident in the
> > > > > >> long-term maintenance and testing.
> > > > > >>
> > > > > >>
> > > > > >> 3. Why should hadoop-tos be integrated into the hadoop codebase?
> > > > > >> Shall we use an independent project?
> > > > > >> Integration is for a better user experience. First, users don't
> > > > > >> need to go to another repo to find the TOS support. Second, users
> > > > > >> don't need to worry about the version mapping between hadoop and
> > > > > >> hadoop-tos. Finally, a connector provided by the hadoop community
> > > > > >> is more reliable and trustworthy.
> > > > > >>
> > > > > >>
> > > > > >> If you have any question, concern or anything else that is
> > > > > >> unclear, please let me know. Sincerely looking forward to your
> > > > > >> reply, thanks very much.
> > > >
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
> > > > For additional commands, e-mail: common-dev-h...@hadoop.apache.org