dosubot[bot] commented on issue #34718:
URL: https://github.com/apache/superset/issues/34718#issuecomment-3197815314

   Here's a starting point for a PR to improve type inference for virtual 
datasets (especially DuckDB) in Superset:
   
   1. **Extend DuckDBEngineSpec type mappings**  
   Patch `superset/db_engine_specs/duckdb.py` to recognize more DuckDB type 
strings as numeric. For example:
   
   ```python
   import re
   from sqlalchemy import types
   from superset.utils.core import GenericDataType
   from superset.db_engine_specs.duckdb import DuckDBEngineSpec
   
   DuckDBEngineSpec.column_type_mappings = (
       (
           re.compile(r"^double$", re.IGNORECASE),
           types.Float(),
           GenericDataType.NUMERIC,
       ),
       (
           re.compile(r"^double precision$", re.IGNORECASE),
           types.Float(),
           GenericDataType.NUMERIC,
       ),
       (
           re.compile(r"^float$", re.IGNORECASE),
           types.Float(),
           GenericDataType.NUMERIC,
       ),
       # Add more DuckDB-specific patterns as needed
   ) + getattr(DuckDBEngineSpec, "column_type_mappings", ())
   ```
   With these patterns in place, columns whose type strings are `DOUBLE`,
   `DOUBLE PRECISION`, or `FLOAT` should be recognized as numeric for both
   physical and virtual datasets
   [[source]](https://github.com/apache/superset/blob/829e4d92d91ceae4b43b1ed3b063ffe45377799c/superset/db_engine_specs/base.py).
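
To illustrate why new patterns are placed at the front of the tuple, here is a minimal standalone sketch of first-match-wins resolution over an ordered list of regex mappings. It does not import Superset: the `GenericDataType` enum is a stand-in, `ENGINE_MAPPINGS`, `DEFAULT_MAPPINGS`, and `resolve_generic_type` are hypothetical names, and the first-match-wins behavior is an assumption to verify against `base.py`.

```python
import re
from enum import IntEnum


class GenericDataType(IntEnum):
    """Stand-in for Superset's enum of the same name (not imported from Superset)."""

    NUMERIC = 0
    STRING = 1
    TEMPORAL = 2
    BOOLEAN = 3


# Hypothetical engine-specific patterns, checked before the defaults.
ENGINE_MAPPINGS = (
    (re.compile(r"^double$", re.IGNORECASE), GenericDataType.NUMERIC),
    (re.compile(r"^double precision$", re.IGNORECASE), GenericDataType.NUMERIC),
    (re.compile(r"^float$", re.IGNORECASE), GenericDataType.NUMERIC),
)

# Hypothetical default patterns, checked only if no engine pattern matched.
DEFAULT_MAPPINGS = (
    (re.compile(r"^varchar", re.IGNORECASE), GenericDataType.STRING),
)


def resolve_generic_type(native_type):
    """Return the generic type of the first matching pattern, or None."""
    for pattern, generic_type in ENGINE_MAPPINGS + DEFAULT_MAPPINGS:
        if pattern.match(native_type):
            return generic_type
    return None
```

Because the three numeric patterns are anchored with `$`, a type string such as `DOUBLE[]` is not matched by them and falls through to the remaining patterns.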
   
   2. **Fallback to pandas dtype inference for virtual datasets**  
   In `superset/result_set.py`, `SupersetResultSet` already uses pyarrow/pandas
   to infer types when cursor descriptions are ambiguous. To make this more
   robust, you can enhance the fallback logic so that if the DB-API type string
   is missing or unrecognized, the inferred dtype (e.g., pandas `float64` or
   pyarrow `double`) is mapped to Superset's `GenericDataType.NUMERIC`
   [[source]](https://github.com/apache/superset/blob/829e4d92d91ceae4b43b1ed3b063ffe45377799c/superset/result_set.py).
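
The fallback described above could look roughly like the following standalone sketch. It is not Superset's actual code: the `GenericDataType` enum is a stand-in, `generic_type_from_dtype`, `resolve_column_type`, and the `known_types` lookup are hypothetical, and the dtype name is assumed to arrive as a string (for example from `str(series.dtype)` in pandas).

```python
import re
from enum import IntEnum
from typing import Mapping, Optional


class GenericDataType(IntEnum):
    """Stand-in for Superset's enum of the same name (not imported from Superset)."""

    NUMERIC = 0
    STRING = 1
    TEMPORAL = 2
    BOOLEAN = 3


# dtype names treated as numeric, e.g. "float64", "Int64", "uint8", "double".
_NUMERIC_DTYPE = re.compile(r"^(u?int\d+|float\d*|double|decimal.*)$", re.IGNORECASE)


def generic_type_from_dtype(dtype_name: str) -> Optional[GenericDataType]:
    """Map an inferred dtype name to NUMERIC, or None if it is not numeric."""
    if _NUMERIC_DTYPE.match(dtype_name.strip()):
        return GenericDataType.NUMERIC
    return None


def resolve_column_type(
    db_type: Optional[str],
    dtype_name: str,
    known_types: Mapping[str, GenericDataType],
) -> Optional[GenericDataType]:
    """Prefer the DB-API type string; fall back to the inferred dtype name."""
    if db_type:
        generic = known_types.get(db_type.strip().upper())
        if generic is not None:
            return generic
    return generic_type_from_dtype(dtype_name)
```

The DB-API type string is tried first so that an explicit database type is never overridden by inference; the dtype is consulted only when that lookup yields nothing.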
   
   3. **Integration points**  
   Key files for this work:  
   - `superset/db_engine_specs/duckdb.py` (engine-specific regex mappings)  
   - `superset/db_engine_specs/base.py` (type mapping logic)  
   - `superset/result_set.py` (`SupersetResultSet` type inference and fallback)  
   - `superset/connectors/sqla/utils.py` (`get_virtual_table_metadata` pipeline)
   
   4. **Testing**  
   Test with virtual datasets using DuckDB columns of type `DOUBLE`, `DOUBLE 
PRECISION`, and `FLOAT`. Confirm that these columns show the `#` icon and are 
eligible for numeric chart controls (e.g., histograms).
   
   This approach should make type recognition for virtual datasets more robust
   and consistent with physical datasets. Let me know if you want more detailed
   patch examples or guidance for a specific part of the pipeline.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

