william-fundrecs opened a new issue, #31482:
URL: https://github.com/apache/superset/issues/31482

   ## [SIP] Proposal for using Athena Presto functionality for large downloads of CSVs
   
   ### Motivation
   
   Superset's built-in download functionality is slow for large files: downloading a 100,000-row CSV from an AWS Athena (Presto) database took just under 9 minutes from start to completion.
   
   However, for Athena (Presto) users this can be sped up dramatically with little extra overhead for Superset. Athena can be configured to automatically persist query results to S3 as CSV, and querying the DB and saving the CSV to the S3 bucket takes only seconds.
   
   By starting the query through the Athena API and then polling for the output location of the CSV, Superset can return that URL to the user immediately, without the raw results passing through Superset. In our real-world use case, this reduced the download time from 8 min 45 s to 11 s.
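A minimal sketch of the flow described above, assuming `boto3`'s Athena client: `start_query_execution` and `get_query_execution` are real `boto3` calls, while the helper name `run_athena_query`, the polling interval, and the timeout are illustrative assumptions, not the PR's implementation. The client is passed in so the function can be exercised without AWS credentials.

```python
import time
from typing import Any


def run_athena_query(
    athena: Any,
    sql: str,
    database: str,
    workgroup: str,
    poll_interval: float = 0.5,
    timeout: float = 300.0,
) -> str:
    """Start an Athena query and return the S3 URI of the CSV Athena writes.

    Hypothetical helper; `athena` is expected to behave like
    boto3.client("athena").
    """
    query_id = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        # No ResultConfiguration here, so the workgroup's S3 output location is used.
        WorkGroup=workgroup,
    )["QueryExecutionId"]

    deadline = time.monotonic() + timeout
    while True:
        execution = athena.get_query_execution(QueryExecutionId=query_id)[
            "QueryExecution"
        ]
        state = execution["Status"]["State"]
        if state == "SUCCEEDED":
            # Only the location is returned; the rows never pass through Superset.
            return execution["ResultConfiguration"]["OutputLocation"]
        if state in ("FAILED", "CANCELLED"):
            reason = execution["Status"].get("StateChangeReason", "no reason given")
            raise RuntimeError(f"Athena query {query_id} {state}: {reason}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Athena query {query_id} still {state} after {timeout}s")
        time.sleep(poll_interval)  # QUEUED or RUNNING: check in again later
```

Usage would look like `run_athena_query(boto3.client("athena", region_name="eu-west-1"), "SELECT ...", "my_superset_db", "superset-etl")`, reusing the example settings from this proposal.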
   
   ### Proposed Change
   
   The proposed change adds functionality that, when a user requests a CSV download of a chart backed by an Athena (Presto) database, returns only the `output_location` of the CSV file that Athena has already written, instead of the raw results.
   
   This change will be gated behind feature flags so that it is only enabled for Athena (Presto) users. Those users will also need to set environment variables for the AWS region, the Athena workgroup, and the Athena database name, for example in a `.env` file:
   
   ```
   SUPERSET_REGION=eu-west-1
   SUPERSET_WORKGROUP=superset-etl
   SUPERSET_ATHENA_DB=my_superset_db
   ```
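One way the backend could read these three settings, as a sketch: the variable names come from the example above, while `AthenaExportConfig` and `load_athena_export_config` are hypothetical names. Failing fast on a missing variable avoids a confusing error at download time.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class AthenaExportConfig:
    region: str     # e.g. eu-west-1
    workgroup: str  # Athena workgroup whose S3 output location holds the CSVs
    database: str   # Athena database queried by the chart


# Maps each (hypothetical) config field to the environment variable proposed in this SIP.
_ENV_VARS = {
    "region": "SUPERSET_REGION",
    "workgroup": "SUPERSET_WORKGROUP",
    "database": "SUPERSET_ATHENA_DB",
}


def load_athena_export_config(env: Optional[Mapping[str, str]] = None) -> AthenaExportConfig:
    """Read the S3-download settings, raising if any of them is unset or empty."""
    env = os.environ if env is None else env
    missing = [name for name in _ENV_VARS.values() if not env.get(name)]
    if missing:
        raise RuntimeError(f"S3 CSV download needs: {', '.join(missing)}")
    return AthenaExportConfig(**{field: env[name] for field, name in _ENV_VARS.items()})
```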
   
   Two feature flags are added to `featureFlags.ts`: one enables the S3 download functionality, and the other allows hiding the existing default CSV/XLSX options (since the S3 download is faster than the default download):
   
   ```ts
   export enum FeatureFlag {
     DownloadCSVFromS3 = 'DOWNLOAD_CSV_FROM_S3',
     ShowDefaultCSVOptions = 'SHOW_DEFAULT_CSV_OPTIONS',
   }
   ```
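On the backend, Superset deployments turn feature flags on through the `FEATURE_FLAGS` dictionary in `superset_config.py`. Below is a sketch of enabling the two proposed flags, plus a hypothetical helper showing how the menu could combine them. The helper and its menu labels are illustrative, and treating `SHOW_DEFAULT_CSV_OPTIONS` as on by default is an assumption made so existing deployments keep their current options.

```python
# superset_config.py (sketch): both flag names are the ones proposed in this SIP.
FEATURE_FLAGS = {
    "DOWNLOAD_CSV_FROM_S3": True,       # show the new S3 download option
    "SHOW_DEFAULT_CSV_OPTIONS": False,  # hide the slower built-in CSV/XLSX options
}


def download_menu_options(flags: dict) -> list:
    """Hypothetical: which download entries the context menu would list.

    Labels are illustrative, not Superset's actual menu text.
    """
    options = []
    # Assumed default True so deployments that never set the flag are unchanged.
    if flags.get("SHOW_DEFAULT_CSV_OPTIONS", True):
        options += ["Export to .CSV", "Export to Excel"]
    if flags.get("DOWNLOAD_CSV_FROM_S3", False):
        options.append("Download CSV from S3")
    return options
```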
   
   The new option will appear in the chart's right-click context menu:
   
   
![image](https://github.com/user-attachments/assets/f95a001b-3156-484d-a7e0-fa5115d2751b)
   
   ### New or Changed Public Interfaces
   
   This reuses the existing data endpoint used by the default and full CSV/XLSX downloads.
   
   A new parameter, `result_location`, specifies whether the exported file is built within Superset (the current export) or in S3 (the new flow for Athena Presto).
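A sketch of how the data endpoint could branch on `result_location`. The accepted values `"superset"` and `"s3"`, the default, and the two callables passed in are assumptions for illustration; defaulting to the current export keeps existing clients working unchanged.

```python
from typing import Any, Callable, Dict


def export_chart_data(
    form_data: Dict[str, Any],
    build_in_superset: Callable[[Dict[str, Any]], Any],
    build_in_s3: Callable[[Dict[str, Any]], str],
) -> Any:
    """Hypothetical dispatch on the proposed `result_location` parameter."""
    # Requests that omit the parameter keep today's behaviour.
    result_location = form_data.get("result_location", "superset")
    if result_location == "superset":
        return build_in_superset(form_data)  # existing CSV/XLSX export
    if result_location == "s3":
        # New Athena flow: return only where the file lives, not its contents.
        return {"output_location": build_in_s3(form_data)}
    raise ValueError(f"Unsupported result_location: {result_location!r}")
```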
   
   The model gains an `output_location` field: the returned presigned URL that points to the file inside the S3 bucket.
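Athena reports its `OutputLocation` as an `s3://bucket/key` URI, which a browser cannot open directly; a presigned URL lets the user download the object without AWS credentials of their own. A minimal sketch using `boto3`'s real `generate_presigned_url` call (the helper name and the one-hour default expiry are assumptions):

```python
from typing import Any
from urllib.parse import urlparse


def presigned_url_for_output(s3: Any, output_location: str, expires_in: int = 3600) -> str:
    """Turn an Athena OutputLocation (s3://bucket/key) into a presigned GET URL.

    Hypothetical helper; `s3` is expected to behave like boto3.client("s3").
    """
    parsed = urlparse(output_location)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an S3 URI: {output_location!r}")
    return s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": parsed.netloc, "Key": parsed.path.lstrip("/")},
        ExpiresIn=expires_in,  # seconds the link stays valid
    )
```

The URL grants access with the permissions of the credentials that signed it, so those credentials need `s3:GetObject` on the results bucket.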
   
   All other changes are covered in the PR's code changes.
   
   ### New dependencies
   
   No changes here
   
   ### Migration Plan and Compatibility
   
   No DB changes
   
   ### Rejected Alternatives
   
   
   Using an AWS Lambda function to fetch the file from S3 and return it through a proprietary application. This was rejected because users already use Superset, so it makes sense to let them download large files through the Superset UI.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
