Re: [PR] fix(charting): correctly categorize numeric columns with NULL values [superset]

via GitHub Thu, 17 Jul 2025 12:24:03 -0700


korbit-ai[bot] commented on code in PR #34213:
URL: https://github.com/apache/superset/pull/34213#discussion_r2214115690



##########
superset/utils/core.py:
##########
@@ -1503,6 +1514,71 @@
     return [col for col in map(get_column_name_from_metric, metrics) if col]
 
 
+def map_sql_type_to_inferred_type(sql_type: Optional[str]) -> str:
+    """把数据库类型映射成 pandas dtype 里 infer_dtype 可识别的类型字符串"""
+    if sql_type is None:
+        return "string"  # 默认类型
+    if "INT" in sql_type:
+        return "integer"
+    if "CHAR" in sql_type or "TEXT" in sql_type or "VARCHAR" in sql_type:
+        return "string"
+    if (
+        "DECIMAL" in sql_type
+        or "NUMERIC" in sql_type
+        or "FLOAT" in sql_type
+        or "DOUBLE" in sql_type
+    ):
+        return "floating"
+    if "BOOL" in sql_type:
+        return "boolean"
+    if "DATE" in sql_type or "TIME" in sql_type:
+        return "datetime64"
+    # 默认返回字符串类型
+    return "string"
+
+
+# 函数：根据列名获取度量类型
+def get_metric_type_from_column(column: Any, datasource: BaseDatasource | 
Query) -> str:
+    # 度量类型映射
+    metric_type_map = {
+        "SUM": "floating",
+        "AVG": "floating",
+        "COUNT": "floating",
+        "COUNT_DISTINCT": "floating",
+        "MIN": "floating",
+        "MAX": "floating",
+        "FIRST": "",
+        "LAST": "",
+    }
+
+    from superset.connectors.sqla.models import SqlMetric
+
+    # 1. 查找该列名是否是度量列
+    metric: SqlMetric = next(
+        (metric for metric in datasource.metrics if metric.metric_name == 
column),
+        SqlMetric(metric_name=""),
+    )

Review Comment:
   ### Invalid default metric initialization <sub>![category Error 
Handling](https://img.shields.io/badge/Error%20Handling-ea580c)</sub>
   
   <details>
     <summary>Tell me more</summary>
   
   ###### What is the issue?
   The default SqlMetric with empty metric_name can cause issues when accessing 
the expression attribute later.
   
   
   ###### Why this matters
   When no metric is found, accessing metric.expression will raise an 
AttributeError because the default SqlMetric is not properly initialized.
   
   ###### Suggested change ∙ *Feature Preview*
   Return early if no metric is found:
   ```python
   metric: SqlMetric = next(
       (metric for metric in datasource.metrics if metric.metric_name == 
column),
       None,
   )
   
   if metric is None:
       return ""  # Return early if no metric found
   ```
   
   
   ###### Provide feedback to improve future suggestions
   [![Nice 
Catch](https://img.shields.io/badge/👍%20Nice%20Catch-71BC78)](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/d25e9169-ea23-46f6-9825-23ada1c3bcb8/upvote)
 
[![Incorrect](https://img.shields.io/badge/👎%20Incorrect-white)](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/d25e9169-ea23-46f6-9825-23ada1c3bcb8?what_not_true=true)
  [![Not in 
Scope](https://img.shields.io/badge/👎%20Out%20of%20PR%20scope-white)](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/d25e9169-ea23-46f6-9825-23ada1c3bcb8?what_out_of_scope=true)
 [![Not in coding 
standard](https://img.shields.io/badge/👎%20Not%20in%20our%20standards-white)](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/d25e9169-ea23-46f6-9825-23ada1c3bcb8?what_not_in_standard=true)
 
[![Other](https://img.shields.io/badge/👎%20Other-white)](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/d25e9169-ea23-46f6-9825-23ada1c3bcb8)
   </details>
   
   <sub>
   
   💬 Looking for more details? Reply to this comment to chat with Korbit.
   </sub>
   
   <!--- korbi internal id:c23fe1c9-429d-4504-8a15-920de9ecd48d -->
   
   
   [](c23fe1c9-429d-4504-8a15-920de9ecd48d)



##########
superset/utils/core.py:
##########
@@ -1503,6 +1514,71 @@ def get_column_names_from_metrics(metrics: list[Metric]) 
-> list[str]:
     return [col for col in map(get_column_name_from_metric, metrics) if col]
 
 
+def map_sql_type_to_inferred_type(sql_type: Optional[str]) -> str:
+    """把数据库类型映射成 pandas dtype 里 infer_dtype 可识别的类型字符串"""
+    if sql_type is None:
+        return "string"  # 默认类型

Review Comment:
   ### Mixed Language Documentation <sub>![category 
Readability](https://img.shields.io/badge/Readability-0284c7)</sub>
   
   <details>
     <summary>Tell me more</summary>
   
   ###### What is the issue?
   Function comment is in Chinese while the codebase is in English, causing 
confusion and inconsistency in documentation.
   
   
   ###### Why this matters
   Mixed language documentation makes the code harder to maintain and 
understand for English-speaking developers.
   
   ###### Suggested change ∙ *Feature Preview*
   ```python
   def map_sql_type_to_inferred_type(sql_type: Optional[str]) -> str:
       """Maps database SQL types to pandas infer_dtype compatible type 
strings."""
       if sql_type is None:
           return "string"  # default type
   ```
   
   
   ###### Provide feedback to improve future suggestions
   [![Nice 
Catch](https://img.shields.io/badge/👍%20Nice%20Catch-71BC78)](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/31f0cc4f-403b-46c1-9d51-06831cb4074e/upvote)
 
[![Incorrect](https://img.shields.io/badge/👎%20Incorrect-white)](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/31f0cc4f-403b-46c1-9d51-06831cb4074e?what_not_true=true)
  [![Not in 
Scope](https://img.shields.io/badge/👎%20Out%20of%20PR%20scope-white)](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/31f0cc4f-403b-46c1-9d51-06831cb4074e?what_out_of_scope=true)
 [![Not in coding 
standard](https://img.shields.io/badge/👎%20Not%20in%20our%20standards-white)](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/31f0cc4f-403b-46c1-9d51-06831cb4074e?what_not_in_standard=true)
 
[![Other](https://img.shields.io/badge/👎%20Other-white)](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/31f0cc4f-403b-46c1-9d51-06831cb4074e)
   </details>
   
   <sub>
   
   💬 Looking for more details? Reply to this comment to chat with Korbit.
   </sub>
   
   <!--- korbi internal id:2a9bdb6d-3e33-45a8-a326-6b38cbae507a -->
   
   
   [](2a9bdb6d-3e33-45a8-a326-6b38cbae507a)



##########
superset/utils/core.py:
##########
@@ -1503,6 +1514,71 @@
     return [col for col in map(get_column_name_from_metric, metrics) if col]
 
 
+def map_sql_type_to_inferred_type(sql_type: Optional[str]) -> str:
+    """把数据库类型映射成 pandas dtype 里 infer_dtype 可识别的类型字符串"""
+    if sql_type is None:
+        return "string"  # 默认类型
+    if "INT" in sql_type:
+        return "integer"
+    if "CHAR" in sql_type or "TEXT" in sql_type or "VARCHAR" in sql_type:
+        return "string"
+    if (
+        "DECIMAL" in sql_type
+        or "NUMERIC" in sql_type
+        or "FLOAT" in sql_type
+        or "DOUBLE" in sql_type
+    ):
+        return "floating"
+    if "BOOL" in sql_type:
+        return "boolean"
+    if "DATE" in sql_type or "TIME" in sql_type:
+        return "datetime64"
+    # 默认返回字符串类型
+    return "string"
+
+
+# 函数：根据列名获取度量类型
+def get_metric_type_from_column(column: Any, datasource: BaseDatasource | 
Query) -> str:
+    # 度量类型映射
+    metric_type_map = {
+        "SUM": "floating",
+        "AVG": "floating",
+        "COUNT": "floating",
+        "COUNT_DISTINCT": "floating",
+        "MIN": "floating",
+        "MAX": "floating",
+        "FIRST": "",
+        "LAST": "",
+    }

Review Comment:
   ### Constant dictionary defined inside function <sub>![category 
Design](https://img.shields.io/badge/Design-0d9488)</sub>
   
   <details>
     <summary>Tell me more</summary>
   
   ###### What is the issue?
   The metric_type_map dictionary is defined inside the function, which means 
it will be recreated every time the function is called. This is inefficient and 
violates DRY principles.
   
   
   ###### Why this matters
   Creating the same dictionary on every function call wastes memory and CPU 
cycles, especially if this function is called frequently.
   
   ###### Suggested change ∙ *Feature Preview*
   Move the metric_type_map to module level as a constant:
   ```python
   METRIC_TYPE_MAP = {
       "SUM": "floating",
       "AVG": "floating",
       "COUNT": "floating",
       "COUNT_DISTINCT": "floating",
       "MIN": "floating",
       "MAX": "floating",
       "FIRST": "",
       "LAST": "",
   }
   
   def get_metric_type_from_column(column: Any, datasource: BaseDatasource | 
Query) -> str:
       # Use METRIC_TYPE_MAP constant
   ```
   
   
   ###### Provide feedback to improve future suggestions
   [![Nice 
Catch](https://img.shields.io/badge/👍%20Nice%20Catch-71BC78)](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/04c45011-9521-4bc9-b313-480af24226eb/upvote)
 
[![Incorrect](https://img.shields.io/badge/👎%20Incorrect-white)](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/04c45011-9521-4bc9-b313-480af24226eb?what_not_true=true)
  [![Not in 
Scope](https://img.shields.io/badge/👎%20Out%20of%20PR%20scope-white)](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/04c45011-9521-4bc9-b313-480af24226eb?what_out_of_scope=true)
 [![Not in coding 
standard](https://img.shields.io/badge/👎%20Not%20in%20our%20standards-white)](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/04c45011-9521-4bc9-b313-480af24226eb?what_not_in_standard=true)
 
[![Other](https://img.shields.io/badge/👎%20Other-white)](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/04c45011-9521-4bc9-b313-480af24226eb)
   </details>
   
   <sub>
   
   💬 Looking for more details? Reply to this comment to chat with Korbit.
   </sub>
   
   <!--- korbi internal id:f9ba471d-634c-4d82-abd8-29b72453812b -->
   
   
   [](f9ba471d-634c-4d82-abd8-29b72453812b)



##########
superset/utils/core.py:
##########
@@ -1503,6 +1514,71 @@
     return [col for col in map(get_column_name_from_metric, metrics) if col]
 
 
+def map_sql_type_to_inferred_type(sql_type: Optional[str]) -> str:
+    """把数据库类型映射成 pandas dtype 里 infer_dtype 可识别的类型字符串"""
+    if sql_type is None:
+        return "string"  # 默认类型
+    if "INT" in sql_type:
+        return "integer"
+    if "CHAR" in sql_type or "TEXT" in sql_type or "VARCHAR" in sql_type:
+        return "string"
+    if (
+        "DECIMAL" in sql_type
+        or "NUMERIC" in sql_type
+        or "FLOAT" in sql_type
+        or "DOUBLE" in sql_type
+    ):
+        return "floating"
+    if "BOOL" in sql_type:
+        return "boolean"
+    if "DATE" in sql_type or "TIME" in sql_type:
+        return "datetime64"
+    # 默认返回字符串类型
+    return "string"
+
+
+# 函数：根据列名获取度量类型
+def get_metric_type_from_column(column: Any, datasource: BaseDatasource | 
Query) -> str:
+    # 度量类型映射
+    metric_type_map = {
+        "SUM": "floating",
+        "AVG": "floating",
+        "COUNT": "floating",
+        "COUNT_DISTINCT": "floating",
+        "MIN": "floating",
+        "MAX": "floating",
+        "FIRST": "",
+        "LAST": "",
+    }
+
+    from superset.connectors.sqla.models import SqlMetric
+
+    # 1. 查找该列名是否是度量列
+    metric: SqlMetric = next(
+        (metric for metric in datasource.metrics if metric.metric_name == 
column),
+        SqlMetric(metric_name=""),
+    )
+
+    if metric is None:
+        # 不是度量列
+        return ""
+
+    # 2. 提取度量的 SQL 表达式
+    expression: str = metric.expression
+
+    # 3. 使用正则表达式从 expression 中提取聚合操作（如 SUM, COUNT 等）
+    match = re.match(
+        r"(SUM|AVG|COUNT|COUNT_DISTINCT|MIN|MAX|FIRST|LAST)\((.*)\)", 
expression
+    )
+
+    if match:
+        operation = match.group(1)  # 聚合操作，如 SUM, COUNT
+        # 获取对应的 GenericDataType
+        return metric_type_map.get(operation, "")

Review Comment:
   ### Silent Error Masking with Empty String Default <sub>![category Error 
Handling](https://img.shields.io/badge/Error%20Handling-ea580c)</sub>
   
   <details>
     <summary>Tell me more</summary>
   
   ###### What is the issue?
   Silent fallback to empty string when looking up operation in metric_type_map
   
   
   ###### Why this matters
   Using get() with empty string default masks potential errors when an 
unexpected operation type is encountered, making debugging harder.
   
   ###### Suggested change ∙ *Feature Preview*
   Log unexpected operation types to aid debugging:
   ```python
   if match:
       operation = match.group(1)  # 聚合操作，如 SUM, COUNT 
       if operation not in metric_type_map:
           logger.warning("Unexpected metric operation type: %s", operation)
       return metric_type_map.get(operation, "")
   ```
   
   
   ###### Provide feedback to improve future suggestions
   [![Nice 
Catch](https://img.shields.io/badge/👍%20Nice%20Catch-71BC78)](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/7afb76d9-da51-4b95-b995-0505d98fbbde/upvote)
 
[![Incorrect](https://img.shields.io/badge/👎%20Incorrect-white)](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/7afb76d9-da51-4b95-b995-0505d98fbbde?what_not_true=true)
  [![Not in 
Scope](https://img.shields.io/badge/👎%20Out%20of%20PR%20scope-white)](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/7afb76d9-da51-4b95-b995-0505d98fbbde?what_out_of_scope=true)
 [![Not in coding 
standard](https://img.shields.io/badge/👎%20Not%20in%20our%20standards-white)](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/7afb76d9-da51-4b95-b995-0505d98fbbde?what_not_in_standard=true)
 
[![Other](https://img.shields.io/badge/👎%20Other-white)](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/7afb76d9-da51-4b95-b995-0505d98fbbde)
   </details>
   
   <sub>
   
   💬 Looking for more details? Reply to this comment to chat with Korbit.
   </sub>
   
   <!--- korbi internal id:118152b7-b4a4-4a78-a98d-5f7a5721fb86 -->
   
   
   [](118152b7-b4a4-4a78-a98d-5f7a5721fb86)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] fix(charting): correctly categorize numeric columns with NULL values [superset]

Reply via email to