On 06/09/2022 at 09:45, Manoj Kumar wrote:
Hi Sutou Kouhei/Team,

*[Background]*

I am working on Intel's gazelle_plugin
<https://github.com/oap-project/gazelle_plugin>,
a C++-based backend for Spark with an Arrow compute engine.
During scans, i.e. when reading data from HDFS/Cloud, we currently use the
cloud/HDFS APIs as mentioned above.
We now have Alluxio Cache
<https://docs.alluxio.io/ee/user/stable/en/core-services/Caching.html> in
between for fast data access.

*[Problem]*

HDFS/Cloud --------> Alluxio ----> arrow FS api ---> arrow parquet scan

*[Need help]*

I need help with the connection below:
[  Alluxio    -----> arrow  FS api ]

It looks like you could use either the S3 API, or the FUSE-based POSIX API:
https://docs.alluxio.io/ee/user/stable/en/api/S3-API.html
https://docs.alluxio.io/ee/user/stable/en/api/POSIX-API.html
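
As a rough illustration of the FUSE route, here is a minimal Python sketch using pyarrow's local filesystem. It assumes Alluxio has already been mounted through its POSIX API. The mount point `/mnt/alluxio-fuse` and the helper names `fuse_path` and `read_parquet_via_fuse` are hypothetical and not part of Arrow or Alluxio. Once mounted, files in the Alluxio namespace should look like ordinary local files to Arrow.

```python
import posixpath

# Hypothetical mount point: wherever Alluxio FUSE was mounted on this host.
FUSE_MOUNT = "/mnt/alluxio-fuse"


def fuse_path(alluxio_path, mount_point=FUSE_MOUNT):
    """Map an Alluxio namespace path to its location under the FUSE mount."""
    return posixpath.join(mount_point, alluxio_path.lstrip("/"))


def read_parquet_via_fuse(alluxio_path, mount_point=FUSE_MOUNT):
    """Read a Parquet file through the FUSE mount with Arrow's local filesystem."""
    # Imported lazily so the path helper above works without pyarrow installed.
    import pyarrow.parquet as pq
    from pyarrow import fs

    local = fs.LocalFileSystem()
    return pq.read_table(fuse_path(alluxio_path, mount_point), filesystem=local)
```

The same idea should carry over to the C++ `arrow::fs::LocalFileSystem`. For the S3 route, Arrow's S3 filesystem accepts an endpoint override, which would point at the Alluxio proxy instead of AWS; the exact endpoint depends on your Alluxio deployment.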
