I hear Owen O'Malley has been learning Rust, and as he left Cloudera a year
ago, he'll be missing GitHub and JIRA...

On Thu, 21 Dec 2023 at 15:00, Ayush Saxena <ayush...@gmail.com> wrote:

> It looks pretty challenging to me. Most of the committers aren't
> technically equipped to review this code, so just getting the initial
> code reviewed & merged would be a challenge.
>
> Looking at the repo, it has only 1 or 2 major contributors, which is
> itself a red flag: the bus factor is pretty low. If we don't find
> volunteers in the future, we would be stuck with dead code that most
> of us don't know how to fix or maintain. If any CVE is reported against
> this code post release, it would be a challenge for us to fix.
>
> Quoting:
> > the Rust community has developed around 10 different HDFS client
> > projects. However, almost all of them are no longer maintained.
>
> If they couldn't do it, how will we be able to? And this isn't a
> very good statistic to quote :-)
>
>
> Well, I don't have objections to having this as a separate repo in
> Hadoop. If others are fine with it, I can try to help with whatever is
> in my capacity. But I still have doubts about how easy it would be to
> push code or get votes on a release of this project, since most people
> don't have knowledge of it. Developing a community and so on seems like
> an incubator thing to me.
>
> -Ayush
>
> On Thu, 21 Dec 2023 at 19:01, Xuanwo <xua...@apache.org> wrote:
> >
> > Thanks Xiaoqiao He!
> >
> > Let me provide more context about this project.
> >
> > libhdfs-rust aims to provide native HDFS client support for Rust, a
> > rapidly growing systems programming language commonly used in modern
> > infrastructure such as databases. With libhdfs-rust, Rust developers
> > can more easily integrate with HDFS. libhdfs-rust is analogous to both
> > libhdfs (C API) and libhdfspp
> > <https://github.com/apache/hadoop/tree/trunk/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp>
> > (C++ API). Its current codebase builds upon libhdfs, but there are
> > plans to rewrite it entirely in pure Rust. Consequently, libhdfs-rust
> > will interface directly with the HDFS Java client via JNI, making it
> > fully parallel to both libhdfs and libhdfspp.
> >
> > We have three options to consider:
> >
> > A: Integrate libhdfs-rust into the Hadoop repository, placing it under
> >     'hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native'.
> > B: Accept libhdfs-rust as a subproject and establish a new repository
> >     named 'hadoop-hdfs-rust-client' (or another suitable name).
> > C: Maintain libhdfs-rust as an independent project outside of Hadoop.
> >
> > I personally prefer Option B since:
> >
> > For Option A
> >
> > The release process for Hadoop is already quite complex. We should
> > avoid placing additional burdens on the Release Managers, especially
> > when it involves integrating a new language.
> >
> > And it's impossible to wait for libhdfs-rust to become mature and
> > stable enough to catch the release train.
> >
> > For Option C
> >
> > libhdfs-rust is exactly the same as libhdfs & libhdfspp
> > <https://github.com/apache/hadoop/tree/trunk/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp>,
> > but for Rust. Building a community for libhdfs-rust outside of Hadoop
> > is challenging. In fact, numerous attempts have been made: the Rust
> > community has developed around 10 different HDFS client projects.
> > However, almost all of them are no longer maintained.
> >
> > In conclusion, I believe that Option B is the best choice for us: we
> > can develop a Rust project in the Hadoop community, attract more Rust
> > users, and recruit additional committers from the Rust community.
> >
> >
> > On Wed, Dec 20, 2023, at 21:53, Xiaoqiao He wrote:
> > > Thanks Xuanwo for your work. I believe it is valuable to enlarge the
> > > Hadoop ecosystem.
> > >
> > > I am also concerned that it will involve more hard work for releases
> > > and version matching, especially for someone who is not familiar with
> > > C or Rust. Moreover, I am not aware of the difference between
> > > `accept hdfs-sys as part of hadoop project` and
> > > `one separate project`.
> > >
> > > I think one smooth solution is to follow hadoop-thirdparty[1], which
> > > is a Hadoop sub-project but split into a separate repo and release
> > > line etc., if it is accepted.
> > >
> > > cc @Ayush Saxena <mailto:ayush...@gmail.com> @Wei-Chiu Chuang
> > > <mailto:weic...@apache.org> @Iñigo Goiri <mailto:elgo...@gmail.com>
> > > @Shilun Fan <mailto:slfan1...@foxmail.com> and other folks, what do
> > > you think? Thanks.
> > >
> > > Best Regards,
> > > - He Xiaoqiao
> > >
> > > [1] https://github.com/apache/hadoop-thirdparty
> > >
> > > On Wed, Dec 20, 2023 at 6:17 PM Xuanwo <xua...@apache.org> wrote:
> > >> I'm fine with starting work under a new repo, and I'm willing to help
> > >> maintain it. The repo could be named hadoop-libhdfs-rust or just
> > >> libhdfs-rust.
> > >>
> > >> I'm a PPMC member of other ASF projects, so I know how to do a release
> > >> and how to make sure the licenses fit the requirements. I'm willing to
> > >> become the RM until we find more committers for this sub-project.
> > >>
> > >> I'm currently looking for committers willing to help me review PRs
> > >> and validate my releases. Is there anyone interested in sponsoring me?
> > >>
> > >> On Tue, Jul 18, 2023, at 12:45, Xuanwo wrote:
> > >> > > What is libdirent? How is it relevant in this context?
> > >> >
> > >> > Since version 3.3, libhdfs depends on the dirent.h API. However,
> > >> > MSVC does not provide this header, which causes issues when building
> > >> > libhdfs on Windows platforms. To solve this problem, hdfs-sys uses
> > >> > libdirent, an MSVC port of the dirent.h API for Windows.
> > >> >
> > >> > Fortunately, hdfs has already done similar work in
> > >> > [native/libhdfspp/lib/x-platform]. If libhdfs-rust is accepted, we
> > >> > can migrate to use hdfs's own implementation instead.
> > >> >
> > >> > > How tightly coupled is it to a specific Hadoop version?
> > >> >
> > >> > Thanks to hdfs's stable API, there is no breakage between different
> > >> > Hadoop versions (only additions). So the version matrix will look like:
> > >> >
> > >> > - libhdfs-rust (feature flag: v2_2) can access  hadoop v2.2 ~ v3.3
> > >> > ...
> > >> > - libhdfs-rust (feature flag: v2_10) can access  hadoop v2.10 ~ v3.3
> > >> > ...
> > >> > - libhdfs-rust (feature flag: v3_3) can access  hadoop v3.3
> > >> >
> > >> > > The concern I have as a release manager is that it makes my life
> > >> > > harder to ensure the quality of a language binding that I am not
> > >> > > familiar with.
> > >> >
> > >> > Most of the code in libhdfs-rust is generated by [rust-bindgen], a
> > >> > tool developed by the Rust Team to automatically generate Rust FFI
> > >> > bindings for C (and some C++) libraries. Other parts are related to
> > >> > building and linking, similar to a Makefile, such as finding libjvm
> > >> > and libhdfs.
> > >> >
> > >> > In general, the task that libhdfs-rust performs is simple: it
> > >> > provides an API to Rust and links it with libhdfs.so, which I
> > >> > believe is easy to test.
> > >> >
> > >> > [libdirent]: https://github.com/tronkko/dirent
> > >> > [native/libhdfspp/lib/x-platform]: https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/x-platform/dirent.h
> > >> > [rust-bindgen]: https://github.com/rust-lang/rust-bindgen
> > >> >
> > >> >
> > >> > On Tue, Jul 18, 2023, at 00:14, Wei-Chiu Chuang wrote:
> > >> >> Inline
> > >> >>
> > >> >> On Sat, Jul 15, 2023 at 5:04 AM Ayush Saxena <ayush...@gmail.com> wrote:
> > >> >>> Forwarding from dev@hadoop to relevant ML
> > >> >>>
> > >> >>> Original mail: https://lists.apache.org/thread/r5rcmc7lwwvkysj0320myxltsyokp9kq
> > >> >>>
> > >> >>> -Ayush
> > >> >>>
> > >> >>> On 2023/07/15 09:18:42 Xuanwo wrote:
> > >> >>> > Hello, everyone.
> > >> >>> >
> > >> >>> > I'm the maintainer of [hdfs-sys], a binding to the HDFS native C
> > >> >>> > API for Rust. I would like to know whether it is a good idea to
> > >> >>> > accept hdfs-sys as part of the Hadoop project.
> > >> >>> >
> > >> >>> > Users of hdfs-sys for now:
> > >> >>> >
> > >> >>> > - [OpenDAL]: An Apache Incubator project that allows users to
> > >> >>> >   easily and efficiently retrieve data from various storage
> > >> >>> >   services in a unified way.
> > >> >>> > - [Databend]: A modern cloud data warehouse focusing on
> > >> >>> >   reducing cost and complexity for your massive-scale analytics
> > >> >>> >   needs. (via OpenDAL)
> > >> >>> > - [RisingWave]: The distributed streaming database: SQL stream
> > >> >>> >   processing with Postgres-like experience. (via OpenDAL)
> > >> >>> > - [LakeSoul]: An end-to-end, realtime and cloud native
> > >> >>> >   Lakehouse framework
> > >> >>> >
> > >> >>> > License information for hdfs-sys:
> > >> >>> >
> > >> >>> > - hdfs-sys itself is licensed under Apache-2.0
> > >> >>> > - hdfs-sys only depends on the following libs: cc@1.0.73,
> > >> >>> >   glob@0.3.1, hdfs-sys@0.3.0, java-locator@0.1.5,
> > >> >>> >   lazy_static@1.4.0; they are all dual licensed under Apache-2.0
> > >> >>> >   and MIT.
> > >> >>> >
> > >> >>> > Work needed if accepted:
> > >> >>> >
> > >> >>> > - Replace libdirent with the same dirent API implemented in the
> > >> >>> >   HDFS project.
> > >> >>> > - Remove all bundled hdfs C code.
> > >> >> What is libdirent? How is it relevant in this context?
> > >> >>
> > >> >> How tightly coupled is it to a specific Hadoop version? I am
> > >> >> wondering if it's possible to host it in a separate Hadoop repo, if
> > >> >> it's accepted. The concern I have as a release manager is that it
> > >> >> makes my life harder to ensure the quality of a language binding
> > >> >> that I am not familiar with.
> > >> >>> >
> > >> >>> > [hdfs-sys]: https://github.com/Xuanwo/hdfs-sys
> > >> >>> > [OpenDAL]: https://github.com/apache/incubator-opendal
> > >> >>> > [Databend]: https://github.com/datafuselabs/databend
> > >> >>> > [RisingWave]: https://github.com/risingwavelabs/risingwave
> > >> >>> > [LakeSoul]: https://github.com/lakesoul-io/LakeSoul
> > >> >>> >
> > >> >>> > Xuanwo
> > >> >>> >
> > >> >>> >
> > >> >>> > ---------------------------------------------------------------------
> > >> >>> > To unsubscribe, e-mail: dev-unsubscr...@hadoop.apache.org
> > >> >>> > For additional commands, e-mail: dev-h...@hadoop.apache.org
> > >> >>> >
> > >> >>> >
> > >> >>>
> > >> >>>
> > >> >>> ---------------------------------------------------------------------
> > >> >>> To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
> > >> >>> For additional commands, e-mail: common-dev-h...@hadoop.apache.org
> > >> >
> > >> > Xuanwo
> > >> >
> > >>
> > >> Xuanwo
> >
> > Xuanwo
