prantogg opened a new issue, #70: URL: https://github.com/apache/sedona-spatialbench/issues/70
**Is your feature request related to a problem? Please describe.**

When generating large-scale SpatialBench datasets (e.g., SF1000 or higher), there is currently no way to write generated data directly to S3. This creates two significant limitations:

- Local storage bottlenecks: Large-scale datasets can be hundreds of GBs or TBs in size, quickly exhausting local disk space. For example, the SF1000 Trip table alone can exceed 500 GB.
- Workflow inefficiency: The current workflow requires generating data locally first, then manually uploading it to S3 with separate tools (aws cli, rclone, etc.), which is time-consuming and error-prone.

**Describe the solution you'd like**

Add support for S3 URIs in the `--output-dir` parameter, enabling the tool to stream generated data directly to S3 without requiring local storage:

```bash
# Current workflow:
spatialbench-cli --scale-factor 1000 --output-dir ./data
# Then manually:
aws s3 cp ./data s3://my-bucket/spatialbench/sf1000 --recursive

# Proposed workflow:
spatialbench-cli --scale-factor 1000 --output-dir s3://my-bucket/spatialbench/sf1000
```

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
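One possible shape for the proposed `--output-dir` handling is to detect the `s3://` scheme up front and dispatch to an object-store writer instead of the local filesystem. The sketch below is illustrative only, assuming a hypothetical `parse_output_dir` helper that is not part of the sedona-spatialbench codebase:

```python
from urllib.parse import urlparse


def parse_output_dir(output_dir: str):
    """Classify an --output-dir value as a local path or an S3 location.

    Hypothetical helper for illustration; not from sedona-spatialbench.
    Returns ("s3", bucket, key_prefix) for s3:// URIs,
    or ("local", path, None) for anything else.
    """
    parsed = urlparse(output_dir)
    if parsed.scheme == "s3":
        bucket = parsed.netloc
        prefix = parsed.path.lstrip("/")
        if not bucket:
            raise ValueError(f"S3 URI missing bucket name: {output_dir!r}")
        return ("s3", bucket, prefix)
    # No scheme (or a plain path): treat as a local directory as today.
    return ("local", output_dir, None)
```

With this split, the generator could open multipart uploads keyed by `bucket`/`prefix` for the S3 case while the local path keeps its current behavior, so existing workflows are unaffected.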
