frlm opened a new pull request, #30961:
URL: https://github.com/apache/superset/pull/30961
**Title:** fix(csv_export): use custom CSV_EXPORT parameters in pd.read_csv
### Bug description
Function: apply_post_process
The issue is that `pd.read_csv` uses the default values of pandas instead of
the parameters defined in `CSV_EXPORT` in `superset_config`. This problem is
rarely noticeable when using the separator `,` and the decimal `.`. However,
with the configuration `CSV_EXPORT='{"encoding": "utf-8", "sep": ";",
"decimal": ","}'`, the issue becomes evident. This change ensures that
`pd.read_csv` uses the parameters defined in `CSV_EXPORT`.
**Steps to reproduce error**:
- Configure `CSV_EXPORT` with the following parameters:
```python
CSV_EXPORT = {
    "encoding": "utf-8",
    "sep": ";",
    "decimal": ","
}
```
- Open a default Pivot Table chart in Superset. In this example, we use the
Pivot Table v2 chart from the USA Births Names dashboard:

- Click on Download > **Export to Pivoted .CSV**

- Download is blocked by an error.
**Cause**: The error is caused by a malformed input DataFrame `df`: instead of
one column per field, it contains a single column whose values hold all the
fields joined by the semicolon separator:
```
,state;name;sum__num
0,other;Michael;1047996
1,other;Christopher;803607
2,other;James;749686
```
**Fix**: Read the data back with the separator, encoding, and decimal settings defined in `CSV_EXPORT` instead of the pandas defaults.
**Code Changes:**
~~~python
elif query["result_format"] == ChartDataResultFormat.CSV:
    df = pd.read_csv(
        StringIO(data),
        delimiter=superset_config.CSV_EXPORT.get("sep"),
        encoding=superset_config.CSV_EXPORT.get("encoding"),
        decimal=superset_config.CSV_EXPORT.get("decimal"),
    )
~~~
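The `decimal` parameter matters for the same reason as `sep`: with `decimal=","` in `CSV_EXPORT`, exported numbers use a comma as the decimal mark, and re-reading them without that setting would not recover the numeric values. A small sketch, assuming the example config from the bug report (the `csv_export` dict below is illustrative, not Superset's actual config object):

```python
from io import StringIO

import pandas as pd

# Mirrors the example CSV_EXPORT config from the bug report
csv_export = {"encoding": "utf-8", "sep": ";", "decimal": ","}

data = "state;avg__num\nother;1047,5"

df = pd.read_csv(
    StringIO(data),
    sep=csv_export["sep"],
    decimal=csv_export["decimal"],
)
# "1047,5" is parsed as the float 1047.5 rather than left as a string
print(df["avg__num"].iloc[0])  # 1047.5
```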
**Complete Code**
~~~python
def apply_post_process(
    result: dict[Any, Any],
    form_data: Optional[dict[str, Any]] = None,
    datasource: Optional[Union["BaseDatasource", "Query"]] = None,
) -> dict[Any, Any]:
    form_data = form_data or {}
    viz_type = form_data.get("viz_type")
    if viz_type not in post_processors:
        return result

    post_processor = post_processors[viz_type]

    for query in result["queries"]:
        if query["result_format"] not in (rf.value for rf in ChartDataResultFormat):
            raise Exception(  # pylint: disable=broad-exception-raised
                f"Result format {query['result_format']} not supported"
            )

        data = query["data"]

        if isinstance(data, str):
            data = data.strip()
            if not data:
                # do not try to process empty data
                continue

        if query["result_format"] == ChartDataResultFormat.JSON:
            df = pd.DataFrame.from_dict(data)
        elif query["result_format"] == ChartDataResultFormat.CSV:
            df = pd.read_csv(
                StringIO(data),
                delimiter=superset_config.CSV_EXPORT.get("sep"),
                encoding=superset_config.CSV_EXPORT.get("encoding"),
                decimal=superset_config.CSV_EXPORT.get("decimal"),
            )

        # convert all columns to verbose (label) name
        if datasource:
            df.rename(columns=datasource.data["verbose_map"], inplace=True)

        processed_df = post_processor(df, form_data, datasource)

        query["colnames"] = list(processed_df.columns)
        query["indexnames"] = list(processed_df.index)
        query["coltypes"] = extract_dataframe_dtypes(processed_df, datasource)
        query["rowcount"] = len(processed_df.index)

        # Flatten hierarchical columns/index since they are represented as
        # `Tuple[str]`. Otherwise encoding to JSON later will fail because
        # maps cannot have tuples as their keys in JSON.
        processed_df.columns = [
            " ".join(str(name) for name in column).strip()
            if isinstance(column, tuple)
            else column
            for column in processed_df.columns
        ]
        processed_df.index = [
            " ".join(str(name) for name in index).strip()
            if isinstance(index, tuple)
            else index
            for index in processed_df.index
        ]

        if query["result_format"] == ChartDataResultFormat.JSON:
            query["data"] = processed_df.to_dict()
        elif query["result_format"] == ChartDataResultFormat.CSV:
            buf = StringIO()
            processed_df.to_csv(buf)
            buf.seek(0)
            query["data"] = buf.getvalue()

    return result
~~~
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]