william-fundrecs opened a new issue, #31482: URL: https://github.com/apache/superset/issues/31482
## [SIP] Proposal for using Athena Presto functionality for large downloads of CSVs

### Motivation

Superset's built-in download functionality is slow for larger files: downloading a 100,000-row CSV from a Presto Athena AWS DB took just shy of 9 minutes from start to completion. This can be sped up considerably with little extra overhead for Superset (for Presto DB users).

An Athena Presto DB can be configured to persist query results to S3 automatically, and it takes only seconds to run the query and save the CSV in the S3 bucket. By starting the query through the Athena API and then checking back for the output location of the CSV, Superset can return that URL to the user immediately, without transferring the raw results itself. In our real-world use case, this reduced the download time from 8 minutes 45 seconds to 11 seconds.

### Proposed Change

The proposed change adds functionality to return only the `output_location` of the existing CSV file when the user requests a download of a chart that uses an Athena Presto DB.

The change will be protected by feature flags so that it is only turned on for Presto DB users. Those users will need to set environment variables for the AWS region, the Athena workgroup, and the Athena DB name. Example from a `.env` file:

- `SUPERSET_REGION=eu-west-1`
- `SUPERSET_WORKGROUP=superset-etl`
- `SUPERSET_ATHENA_DB=my_superset_db`

Two feature flags are added: one enables the S3 download functionality, and one hides the existing default CSV/XLSX options (the S3 download is faster than the default download). From `featureFlags.ts`:

- `DownloadCSVFromS3 = 'DOWNLOAD_CSV_FROM_S3',`
- `ShowDefaultCSVOptions = 'SHOW_DEFAULT_CSV_OPTIONS',`

The option will appear in the right-click context menu.

### New or Changed Public Interfaces

The change reuses the existing data endpoint used by the default and full CSV and XLSX downloads. A new parameter, `result_location`, specifies whether the exported file is built within Superset (the current export) or in S3 (the new flow for Presto Athena).
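The flow described under Motivation (start the query through the Athena API, check back until it finishes, then hand the user a URL instead of the raw rows) might look roughly like the sketch below. This is not the code from the PR. The function names `split_s3_uri` and `export_csv_via_athena`, the polling interval, the timeout, and the URL expiry are all assumptions made for illustration. The sketch also assumes the workgroup already has a query result location configured, and that `athena` and `s3` are boto3 clients (for example `boto3.client("athena", region_name=os.environ["SUPERSET_REGION"])`) passed in by the caller.

```python
import time
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/key' into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def export_csv_via_athena(athena, s3, sql, database, workgroup,
                          poll_seconds=1.0, timeout_seconds=300.0,
                          expires_in=3600):
    """Run `sql` in Athena and return a presigned URL to its CSV result.

    Hypothetical helper: `athena` and `s3` are boto3 clients supplied by
    the caller; the workgroup is assumed to define the output location.
    """
    started = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        WorkGroup=workgroup,
    )
    query_id = started["QueryExecutionId"]

    deadline = time.monotonic() + timeout_seconds
    while True:
        execution = athena.get_query_execution(
            QueryExecutionId=query_id
        )["QueryExecution"]
        state = execution["Status"]["State"]
        if state == "SUCCEEDED":
            break
        if state in ("FAILED", "CANCELLED"):
            reason = execution["Status"].get("StateChangeReason", "unknown")
            raise RuntimeError(f"Athena query {query_id} {state}: {reason}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Athena query {query_id} still {state}")
        time.sleep(poll_seconds)

    # Athena reports where it wrote the CSV, e.g. s3://bucket/prefix/<id>.csv
    bucket, key = split_s3_uri(
        execution["ResultConfiguration"]["OutputLocation"]
    )
    # Only the URL goes back to the user; the rows never pass through Superset.
    return s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires_in,
    )
```

Passing the clients in as arguments keeps the helper independent of how credentials and the region are configured. In the setup described above, `database` and `workgroup` would come from `SUPERSET_ATHENA_DB` and `SUPERSET_WORKGROUP`.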
The model changes to carry `output_location`, the returned presigned URL that identifies the file inside the S3 bucket. All other changes are in the PR's code changes.

### New dependencies

No changes here.

### Migration Plan and Compatibility

No DB changes.

### Rejected Alternatives

Using a Lambda to get the file from S3 and returning it through a proprietary application. This was rejected because users are already using Superset, so it makes sense to let them download large files through the Superset UI.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
