dosubot[bot] commented on issue #34718:
URL: https://github.com/apache/superset/issues/34718#issuecomment-3197815314
Here's a starting point for a PR to improve type inference for virtual
datasets (especially DuckDB) in Superset:
1. **Extend DuckDBEngineSpec type mappings**
Patch `superset/db_engine_specs/duckdb.py` to recognize more DuckDB type
strings as numeric. For example:
```python
import re

from sqlalchemy import types

from superset.db_engine_specs.duckdb import DuckDBEngineSpec
from superset.utils.core import GenericDataType

# Prepend the new patterns so they are checked before any existing mappings.
DuckDBEngineSpec.column_type_mappings = (
    (re.compile(r"^double$", re.IGNORECASE), types.Float(), GenericDataType.NUMERIC),
    (re.compile(r"^double precision$", re.IGNORECASE), types.Float(), GenericDataType.NUMERIC),
    (re.compile(r"^float$", re.IGNORECASE), types.Float(), GenericDataType.NUMERIC),
    # Add more DuckDB-specific patterns as needed
) + getattr(DuckDBEngineSpec, "column_type_mappings", ())
```
With this patch, columns whose type string is `DOUBLE`, `DOUBLE PRECISION`, or
`FLOAT` (in any letter case) should be recognized as numeric for both physical
and virtual datasets
[[source]](https://github.com/apache/superset/blob/829e4d92d91ceae4b43b1ed3b063ffe45377799c/superset/db_engine_specs/base.py).
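To see why the new patterns are prepended rather than appended, here is a minimal standalone sketch of a first-match-wins lookup. It does not import Superset: `GenericType`, `lookup`, and the catch-all entry in `DEFAULT_MAPPINGS` are hypothetical stand-ins, and the assumption that the first matching pattern wins should be checked against `base.py`.

```python
import re
from enum import Enum
from typing import Optional, Pattern, Tuple


class GenericType(Enum):
    """Hypothetical stand-in for Superset's GenericDataType (not imported here)."""

    NUMERIC = "numeric"
    STRING = "string"


Mapping = Tuple[Pattern[str], GenericType]

# The three patterns from the patch above, minus the SQLAlchemy type column.
ENGINE_MAPPINGS: Tuple[Mapping, ...] = (
    (re.compile(r"^double$", re.IGNORECASE), GenericType.NUMERIC),
    (re.compile(r"^double precision$", re.IGNORECASE), GenericType.NUMERIC),
    (re.compile(r"^float$", re.IGNORECASE), GenericType.NUMERIC),
)

# Hypothetical pre-existing mapping: a catch-all that treats everything as a string.
DEFAULT_MAPPINGS: Tuple[Mapping, ...] = (
    (re.compile(r".*"), GenericType.STRING),
)


def lookup(type_string: str, mappings: Tuple[Mapping, ...]) -> Optional[GenericType]:
    """Return the generic type of the first pattern that matches, or None."""
    for pattern, generic_type in mappings:
        if pattern.match(type_string):
            return generic_type
    return None


# Prepending: the specific patterns are consulted before the catch-all.
PREPENDED = ENGINE_MAPPINGS + DEFAULT_MAPPINGS
# Appending: the catch-all shadows the specific patterns.
APPENDED = DEFAULT_MAPPINGS + ENGINE_MAPPINGS

print(lookup("DOUBLE PRECISION", PREPENDED))
print(lookup("DOUBLE PRECISION", APPENDED))
```

Under these assumptions, order decides the outcome: the same type string resolves to numeric when the engine patterns come first and to string when a broader pattern precedes them.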
2. **Fallback to pandas dtype inference for virtual datasets**
In `superset/result_set.py`, `SupersetResultSet` already uses pyarrow/pandas
to infer types when cursor descriptions are ambiguous. To make this more
robust, you can enhance the fallback logic so that if the DB-API type string is
missing or unrecognized, the inferred type (e.g., a pandas `float64` dtype or a
pyarrow `double` type) is mapped to Superset's `GenericDataType.NUMERIC`
[[source]](https://github.com/apache/superset/blob/829e4d92d91ceae4b43b1ed3b063ffe45377799c/superset/result_set.py).
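The fallback order described here can be sketched in isolation. This is not the `SupersetResultSet` implementation: `GenericType`, `resolve_generic_type`, and both pattern tables are hypothetical, and the function works on dtype names (for example `str(series.dtype)`) so the sketch runs without pandas or pyarrow installed.

```python
import re
from enum import Enum
from typing import Optional


class GenericType(Enum):
    """Hypothetical stand-in for Superset's GenericDataType (not imported here)."""

    NUMERIC = 0
    STRING = 1
    TEMPORAL = 2
    BOOLEAN = 3


# Step 1 table: DB-API type strings, mirroring the DuckDB patterns from step 1.
DB_TYPE_PATTERNS = (
    (re.compile(r"^(double( precision)?|float)$", re.IGNORECASE), GenericType.NUMERIC),
)

# Step 2 table: dtype names as pandas or pyarrow would print them (assumed spellings).
DTYPE_PATTERNS = (
    (re.compile(r"^bool", re.IGNORECASE), GenericType.BOOLEAN),
    (re.compile(r"^(datetime64|timestamp)", re.IGNORECASE), GenericType.TEMPORAL),
    (
        re.compile(r"^(u?int\d*|float\d*|double|decimal\d*(\(.*\))?)$", re.IGNORECASE),
        GenericType.NUMERIC,
    ),
)


def resolve_generic_type(db_type: Optional[str], dtype_name: str) -> GenericType:
    """Prefer the DB-API type string; fall back to the inferred dtype name."""
    if db_type:
        for pattern, generic_type in DB_TYPE_PATTERNS:
            if pattern.match(db_type.strip()):
                return generic_type
    # The DB-API type is missing or unrecognized: use the inferred dtype instead.
    for pattern, generic_type in DTYPE_PATTERNS:
        if pattern.match(dtype_name.strip()):
            return generic_type
    # Nothing matched: treat the column as a string, the most conservative choice.
    return GenericType.STRING


print(resolve_generic_type(None, "float64"))
print(resolve_generic_type("SOME_UNKNOWN_TYPE", "double"))
```

The dtype patterns are anchored on purpose: a loose prefix check on `int` would also classify a pandas `interval[...]` dtype as numeric.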
3. **Integration points**
Key files for this work:
- `superset/db_engine_specs/duckdb.py` (engine-specific regex mappings)
- `superset/db_engine_specs/base.py` (type mapping logic)
- `superset/result_set.py` (`SupersetResultSet` type inference and fallback)
- `superset/connectors/sqla/utils.py` (`get_virtual_table_metadata` pipeline)
4. **Testing**
Test with virtual datasets using DuckDB columns of type `DOUBLE`, `DOUBLE
PRECISION`, and `FLOAT`. Confirm that these columns show the `#` icon and are
eligible for numeric chart controls (e.g., histograms).
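Before checking the UI, a regex-only sanity check can confirm which type strings the three patterns from step 1 accept. This is a sketch that runs without Superset or DuckDB; `is_numeric_type_string` and the sample lists are hypothetical, and the sample strings are illustrative spellings rather than captured cursor output.

```python
import re

# The three patterns from the step 1 patch, copied verbatim.
PATTERNS = (
    re.compile(r"^double$", re.IGNORECASE),
    re.compile(r"^double precision$", re.IGNORECASE),
    re.compile(r"^float$", re.IGNORECASE),
)


def is_numeric_type_string(type_string: str) -> bool:
    """Return True if any of the step 1 patterns matches the type string."""
    return any(pattern.match(type_string) for pattern in PATTERNS)


# Type strings the patch is meant to cover, in mixed letter case.
SHOULD_MATCH = ["DOUBLE", "double", "Double Precision", "FLOAT", "float"]

# Strings the anchored patterns leave to other mappings, if any exist.
SHOULD_NOT_MATCH = ["VARCHAR", "DOUBLE[]", "DECIMAL(18,3)", "FLOAT4"]

for type_string in SHOULD_MATCH + SHOULD_NOT_MATCH:
    print(f"{type_string!r}: {is_numeric_type_string(type_string)}")
```

Numeric-looking strings that land in the second list, such as `DECIMAL(18,3)`, are candidates for the "add more patterns as needed" slot, unless mappings that this sketch does not model already handle them.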
This approach should make type recognition for virtual datasets more robust
and consistent with physical datasets. Let me know if you want more detailed
patch examples or guidance for a specific part of the pipeline.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]