frlm commented on PR #30961: URL: https://github.com/apache/superset/pull/30961#issuecomment-2485508993
### UPDATE: Export Pivot Tables into CSV Format In the last commit, I made a small change to export pivot tables without flattening multi-index rows/columns. This feature, initially implemented by the community for managing JSON export, is particularly unhelpful when dealing with pivot tables exported into CSV format with a large number of columns corresponding to a multi-index. To better explain the issue, I have attached an example. If I create a pivot table with many columns associated with multi-index rows, the flattening process transforms them into a single column with a field value obtained by concatenating all fields separated by a space. This approach is not very effective if we want to use these exports in Excel or other tools. ##### Example: I created this pivot table using your example dataset:  The AS-IS behavior generates this type of file, where the MultiIndex columns get collapsed into one row and MultiIndex Rows get merged into a single column, obtained by concatenating all fields separated by a space. This export is not very easy to use in Excel for the next steps.  [export_as_is_20241119_101212.csv](https://github.com/user-attachments/files/17814564/export_as_is_20241119_101212.csv) The TO-BE behavior generates this type of file, where the MultiIndex rows/columns get preserved:  [export_to_be_20241119_100853.csv](https://github.com/user-attachments/files/17814637/export_to_be_20241119_100853.csv) #### Code Change ~~~python if query["result_format"] == ChartDataResultFormat.JSON: # Flatten hierarchical columns/index since they are represented as # `Tuple[str]`. Otherwise encoding to JSON later will fail because # maps cannot have tuples as their keys in JSON. processed_df.columns = [ " ".join(str(name) for name in column).strip() if isinstance(column, tuple) else column for column in processed_df.columns ] processed_df.index = [ " ".join(str(name) for name in index).strip() if isinstance(index, tuple) else index for index in processed_df.index ] query["data"] = processed_df.to_dict() elif query["result_format"] == ChartDataResultFormat.CSV: buf = StringIO() processed_df.to_csv(buf) buf.seek(0) query["data"] = buf.getvalue() ~~~ #### Code Complete ~~~python def apply_post_process( result: dict[Any, Any], form_data: Optional[dict[str, Any]] = None, datasource: Optional[Union["BaseDatasource", "Query"]] = None, ) -> dict[Any, Any]: form_data = form_data or {} viz_type = form_data.get("viz_type") if viz_type not in post_processors: return result post_processor = post_processors[viz_type] for query in result["queries"]: if query["result_format"] not in (rf.value for rf in ChartDataResultFormat): raise Exception( # pylint: disable=broad-exception-raised f"Result format {query['result_format']} not supported" ) data = query["data"] if isinstance(data, str): data = data.strip() if not data: # do not try to process empty data continue if query["result_format"] == ChartDataResultFormat.JSON: df = pd.DataFrame.from_dict(data) elif query["result_format"] == ChartDataResultFormat.CSV: df = pd.read_csv(StringIO(data), delimiter=superset_config.CSV_EXPORT.get('sep'), encoding=superset_config.CSV_EXPORT.get('encoding'), decimal=superset_config.CSV_EXPORT.get('decimal')) # convert all columns to verbose (label) name if datasource: df.rename(columns=datasource.data["verbose_map"], inplace=True) processed_df = post_processor(df, form_data, datasource) query["colnames"] = list(processed_df.columns) query["indexnames"] = list(processed_df.index) query["coltypes"] = extract_dataframe_dtypes(processed_df, datasource) query["rowcount"] = len(processed_df.index) if query["result_format"] == ChartDataResultFormat.JSON: # Flatten hierarchical columns/index since they are represented as # `Tuple[str]`. Otherwise encoding to JSON later will fail because # maps cannot have tuples as their keys in JSON. processed_df.columns = [ " ".join(str(name) for name in column).strip() if isinstance(column, tuple) else column for column in processed_df.columns ] processed_df.index = [ " ".join(str(name) for name in index).strip() if isinstance(index, tuple) else index for index in processed_df.index ] query["data"] = processed_df.to_dict() elif query["result_format"] == ChartDataResultFormat.CSV: buf = StringIO() processed_df.to_csv(buf) buf.seek(0) query["data"] = buf.getvalue() return result ~~~ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
