Hello everyone,

This is my first time emailing this mailing list, so I hope I am explaining things correctly below.
I am attempting to get started with Arrow Flight. I am storing Parquet files and Iceberg tables on S3. I would like to use Arrow Flight as the interface data consumers use to access my data, so they always receive Arrow back and can then continue to iterate locally with DuckDB, Polars, etc.

I am first attempting to get it working with a single Parquet file in a private bucket on S3. For this test, I am just putting the credentials and paths directly in the server code; once it works, I can move them to environment variables before production. The Parquet file is about 0.6GB. I am running the server on an EC2 t2.micro instance.

I was originally running into an ACCESS_DENIED AWS error during the HeadObject operation when attempting to get the flight_info metadata about the file. Following this issue <https://github.com/apache/arrow/issues/37888>, I switched to using s3fs, and I was able to avoid the HeadObject error. So the client is able to successfully see the available datasets and return the schema.

When I attempt to actually download the data itself, my EC2 instance becomes unresponsive and my SSH connection drops. Is this likely a memory issue, or something with my code?

The goal is to provide users with a common interface to access my data. After getting this working, I would add more datasets and data sources, introduce auth and RBAC, etc. For now, I thought this was a good starting point, and the user simply downloads the entire dataset. In the future, I hope to figure out an easy interface to support more fine-grained data/table scans, or to support running a query first and returning only the desired data.

To keep things simple, I put my code here <https://github.com/ChristianCasazza/arrowflights3example>. When I was actually testing, I connected to the EC2 instance through VS Code for the server, and I ran the client code locally in a different window. I removed my actual Parquet file path and credentials from the repository.
This is my first time working with Arrow Flight, so I apologize if I am overlooking something simple or if the answer was in the docs. Do you have any suggestions for changes I can make to get the data download working, or see any clear errors I am making?

Thank you!

Best,
Christian Casazza