On 06/09/2022 at 09:45, Manoj Kumar wrote:
Hi Sutou Kouhei/Team,

*[Background]*
I am working on Intel's gazelle_plugin <https://github.com/oap-project/gazelle_plugin>, a C++-based backend for Spark that uses Arrow as its compute engine. During scans, i.e. when reading data from HDFS or cloud storage, we currently use the cloud/HDFS APIs mentioned above. We now have Alluxio Cache <https://docs.alluxio.io/ee/user/stable/en/core-services/Caching.html> in between for fast data access.

*[Problem]*
HDFS/Cloud --------> Alluxio ----> arrow FS api ---> arrow parquet scan

*[Need help]*
We need help with the following connection: [ Alluxio -----> arrow FS api ]

It looks like you could use either the S3 API or the FUSE-based POSIX API:

https://docs.alluxio.io/ee/user/stable/en/api/S3-API.html
https://docs.alluxio.io/ee/user/stable/en/api/POSIX-API.html
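
To illustrate the FUSE route: once Alluxio is mounted through FUSE, its files are ordinary local paths, so the existing Arrow local filesystem can read them. Below is a minimal sketch, using only the standard library, of how an alluxio:// URI might be translated into a file:// URI under the mount. The helper name AlluxioToFileUri, the mount point /mnt/alluxio-fuse and the host/port in the example are assumptions for illustration, not part of gazelle_plugin or Arrow.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper (not an Arrow or Alluxio API): map an alluxio:// URI
// onto the corresponding file:// URI under a FUSE mount point.
// `fuse_mount` is wherever Alluxio was mounted; the value is an assumption.
std::string AlluxioToFileUri(const std::string& alluxio_uri,
                             const std::string& fuse_mount) {
  const std::string scheme = "alluxio://";
  if (alluxio_uri.compare(0, scheme.size(), scheme) != 0) {
    throw std::invalid_argument("not an alluxio:// URI: " + alluxio_uri);
  }

  // Skip the authority part (host:port); the path starts at the next '/'.
  const std::size_t path_start = alluxio_uri.find('/', scheme.size());
  const std::string path = (path_start == std::string::npos)
                               ? std::string("/")
                               : alluxio_uri.substr(path_start);

  // Drop trailing slashes from the mount point to avoid "//" in the result.
  std::string mount = fuse_mount;
  while (!mount.empty() && mount.back() == '/') {
    mount.pop_back();
  }

  return "file://" + mount + path;
}
```

For example, AlluxioToFileUri("alluxio://master:19998/warehouse/t/part-0.parquet", "/mnt/alluxio-fuse") yields "file:///mnt/alluxio-fuse/warehouse/t/part-0.parquet". A URI of that form could then be handed to arrow::fs::FileSystemFromUri, so the Parquet scan would go through Arrow's local filesystem with no Alluxio-specific code. The S3 route would instead rely on Arrow's S3 filesystem pointed at the Alluxio proxy; whether the endpoint format matches should be checked against the Alluxio S3 API documentation linked above.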