korbit-ai[bot] commented on code in PR #34213:
URL: https://github.com/apache/superset/pull/34213#discussion_r2214115690
##########
superset/utils/core.py:
##########
@@ -1503,6 +1514,71 @@
return [col for col in map(get_column_name_from_metric, metrics) if col]
+def map_sql_type_to_inferred_type(sql_type: Optional[str]) -> str:
+ """把数据库类型映射成 pandas dtype 里 infer_dtype 可识别的类型字符串"""
+ if sql_type is None:
+ return "string" # 默认类型
+ if "INT" in sql_type:
+ return "integer"
+ if "CHAR" in sql_type or "TEXT" in sql_type or "VARCHAR" in sql_type:
+ return "string"
+ if (
+ "DECIMAL" in sql_type
+ or "NUMERIC" in sql_type
+ or "FLOAT" in sql_type
+ or "DOUBLE" in sql_type
+ ):
+ return "floating"
+ if "BOOL" in sql_type:
+ return "boolean"
+ if "DATE" in sql_type or "TIME" in sql_type:
+ return "datetime64"
+ # 默认返回字符串类型
+ return "string"
+
+
+# 函数:根据列名获取度量类型
+def get_metric_type_from_column(column: Any, datasource: BaseDatasource |
Query) -> str:
+ # 度量类型映射
+ metric_type_map = {
+ "SUM": "floating",
+ "AVG": "floating",
+ "COUNT": "floating",
+ "COUNT_DISTINCT": "floating",
+ "MIN": "floating",
+ "MAX": "floating",
+ "FIRST": "",
+ "LAST": "",
+ }
+
+ from superset.connectors.sqla.models import SqlMetric
+
+ # 1. 查找该列名是否是度量列
+ metric: SqlMetric = next(
+ (metric for metric in datasource.metrics if metric.metric_name ==
column),
+ SqlMetric(metric_name=""),
+ )
Review Comment:
### Invalid default metric initialization <sub></sub>
<details>
<summary>Tell me more</summary>
###### What is the issue?
The default SqlMetric with empty metric_name can cause issues when accessing
the expression attribute later.
###### Why this matters
When no metric is found, accessing metric.expression will raise an
AttributeError because the default SqlMetric is not properly initialized.
###### Suggested change ∙ *Feature Preview*
Return early if no metric is found:
```python
metric: SqlMetric = next(
(metric for metric in datasource.metrics if metric.metric_name ==
column),
None,
)
if metric is None:
return "" # Return early if no metric found
```
###### Provide feedback to improve future suggestions
[](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/d25e9169-ea23-46f6-9825-23ada1c3bcb8/upvote)
[](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/d25e9169-ea23-46f6-9825-23ada1c3bcb8?what_not_true=true)
[](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/d25e9169-ea23-46f6-9825-23ada1c3bcb8?what_out_of_scope=true)
[](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/d25e9169-ea23-46f6-9825-23ada1c3bcb8?what_not_in_standard=true)
[](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/d25e9169-ea23-46f6-9825-23ada1c3bcb8)
</details>
<sub>
💬 Looking for more details? Reply to this comment to chat with Korbit.
</sub>
<!--- korbi internal id:c23fe1c9-429d-4504-8a15-920de9ecd48d -->
[](c23fe1c9-429d-4504-8a15-920de9ecd48d)
##########
superset/utils/core.py:
##########
@@ -1503,6 +1514,71 @@ def get_column_names_from_metrics(metrics: list[Metric])
-> list[str]:
return [col for col in map(get_column_name_from_metric, metrics) if col]
+def map_sql_type_to_inferred_type(sql_type: Optional[str]) -> str:
+ """把数据库类型映射成 pandas dtype 里 infer_dtype 可识别的类型字符串"""
+ if sql_type is None:
+ return "string" # 默认类型
Review Comment:
### Mixed Language Documentation <sub></sub>
<details>
<summary>Tell me more</summary>
###### What is the issue?
Function comment is in Chinese while the codebase is in English, causing
confusion and inconsistency in documentation.
###### Why this matters
Mixed language documentation makes the code harder to maintain and
understand for English-speaking developers.
###### Suggested change ∙ *Feature Preview*
```python
def map_sql_type_to_inferred_type(sql_type: Optional[str]) -> str:
"""Maps database SQL types to pandas infer_dtype compatible type
strings."""
if sql_type is None:
return "string" # default type
```
###### Provide feedback to improve future suggestions
[](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/31f0cc4f-403b-46c1-9d51-06831cb4074e/upvote)
[](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/31f0cc4f-403b-46c1-9d51-06831cb4074e?what_not_true=true)
[](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/31f0cc4f-403b-46c1-9d51-06831cb4074e?what_out_of_scope=true)
[](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/31f0cc4f-403b-46c1-9d51-06831cb4074e?what_not_in_standard=true)
[](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/31f0cc4f-403b-46c1-9d51-06831cb4074e)
</details>
<sub>
💬 Looking for more details? Reply to this comment to chat with Korbit.
</sub>
<!--- korbi internal id:2a9bdb6d-3e33-45a8-a326-6b38cbae507a -->
[](2a9bdb6d-3e33-45a8-a326-6b38cbae507a)
##########
superset/utils/core.py:
##########
@@ -1503,6 +1514,71 @@
return [col for col in map(get_column_name_from_metric, metrics) if col]
+def map_sql_type_to_inferred_type(sql_type: Optional[str]) -> str:
+ """把数据库类型映射成 pandas dtype 里 infer_dtype 可识别的类型字符串"""
+ if sql_type is None:
+ return "string" # 默认类型
+ if "INT" in sql_type:
+ return "integer"
+ if "CHAR" in sql_type or "TEXT" in sql_type or "VARCHAR" in sql_type:
+ return "string"
+ if (
+ "DECIMAL" in sql_type
+ or "NUMERIC" in sql_type
+ or "FLOAT" in sql_type
+ or "DOUBLE" in sql_type
+ ):
+ return "floating"
+ if "BOOL" in sql_type:
+ return "boolean"
+ if "DATE" in sql_type or "TIME" in sql_type:
+ return "datetime64"
+ # 默认返回字符串类型
+ return "string"
+
+
+# 函数:根据列名获取度量类型
+def get_metric_type_from_column(column: Any, datasource: BaseDatasource |
Query) -> str:
+ # 度量类型映射
+ metric_type_map = {
+ "SUM": "floating",
+ "AVG": "floating",
+ "COUNT": "floating",
+ "COUNT_DISTINCT": "floating",
+ "MIN": "floating",
+ "MAX": "floating",
+ "FIRST": "",
+ "LAST": "",
+ }
Review Comment:
### Constant dictionary defined inside function <sub></sub>
<details>
<summary>Tell me more</summary>
###### What is the issue?
The metric_type_map dictionary is defined inside the function, which means
it will be recreated every time the function is called. This is inefficient and
violates DRY principles.
###### Why this matters
Creating the same dictionary on every function call wastes memory and CPU
cycles, especially if this function is called frequently.
###### Suggested change ∙ *Feature Preview*
Move the metric_type_map to module level as a constant:
```python
METRIC_TYPE_MAP = {
"SUM": "floating",
"AVG": "floating",
"COUNT": "floating",
"COUNT_DISTINCT": "floating",
"MIN": "floating",
"MAX": "floating",
"FIRST": "",
"LAST": "",
}
def get_metric_type_from_column(column: Any, datasource: BaseDatasource |
Query) -> str:
# Use METRIC_TYPE_MAP constant
```
###### Provide feedback to improve future suggestions
[](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/04c45011-9521-4bc9-b313-480af24226eb/upvote)
[](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/04c45011-9521-4bc9-b313-480af24226eb?what_not_true=true)
[](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/04c45011-9521-4bc9-b313-480af24226eb?what_out_of_scope=true)
[](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/04c45011-9521-4bc9-b313-480af24226eb?what_not_in_standard=true)
[](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/04c45011-9521-4bc9-b313-480af24226eb)
</details>
<sub>
💬 Looking for more details? Reply to this comment to chat with Korbit.
</sub>
<!--- korbi internal id:f9ba471d-634c-4d82-abd8-29b72453812b -->
[](f9ba471d-634c-4d82-abd8-29b72453812b)
##########
superset/utils/core.py:
##########
@@ -1503,6 +1514,71 @@
return [col for col in map(get_column_name_from_metric, metrics) if col]
+def map_sql_type_to_inferred_type(sql_type: Optional[str]) -> str:
+ """把数据库类型映射成 pandas dtype 里 infer_dtype 可识别的类型字符串"""
+ if sql_type is None:
+ return "string" # 默认类型
+ if "INT" in sql_type:
+ return "integer"
+ if "CHAR" in sql_type or "TEXT" in sql_type or "VARCHAR" in sql_type:
+ return "string"
+ if (
+ "DECIMAL" in sql_type
+ or "NUMERIC" in sql_type
+ or "FLOAT" in sql_type
+ or "DOUBLE" in sql_type
+ ):
+ return "floating"
+ if "BOOL" in sql_type:
+ return "boolean"
+ if "DATE" in sql_type or "TIME" in sql_type:
+ return "datetime64"
+ # 默认返回字符串类型
+ return "string"
+
+
+# 函数:根据列名获取度量类型
+def get_metric_type_from_column(column: Any, datasource: BaseDatasource |
Query) -> str:
+ # 度量类型映射
+ metric_type_map = {
+ "SUM": "floating",
+ "AVG": "floating",
+ "COUNT": "floating",
+ "COUNT_DISTINCT": "floating",
+ "MIN": "floating",
+ "MAX": "floating",
+ "FIRST": "",
+ "LAST": "",
+ }
+
+ from superset.connectors.sqla.models import SqlMetric
+
+ # 1. 查找该列名是否是度量列
+ metric: SqlMetric = next(
+ (metric for metric in datasource.metrics if metric.metric_name ==
column),
+ SqlMetric(metric_name=""),
+ )
+
+ if metric is None:
+ # 不是度量列
+ return ""
+
+ # 2. 提取度量的 SQL 表达式
+ expression: str = metric.expression
+
+ # 3. 使用正则表达式从 expression 中提取聚合操作(如 SUM, COUNT 等)
+ match = re.match(
+ r"(SUM|AVG|COUNT|COUNT_DISTINCT|MIN|MAX|FIRST|LAST)\((.*)\)",
expression
+ )
+
+ if match:
+ operation = match.group(1) # 聚合操作,如 SUM, COUNT
+ # 获取对应的 GenericDataType
+ return metric_type_map.get(operation, "")
Review Comment:
### Silent Error Masking with Empty String Default <sub></sub>
<details>
<summary>Tell me more</summary>
###### What is the issue?
Silent fallback to empty string when looking up operation in metric_type_map
###### Why this matters
Using get() with empty string default masks potential errors when an
unexpected operation type is encountered, making debugging harder.
###### Suggested change ∙ *Feature Preview*
Log unexpected operation types to aid debugging:
```python
if match:
operation = match.group(1) # 聚合操作,如 SUM, COUNT
if operation not in metric_type_map:
logger.warning("Unexpected metric operation type: %s", operation)
return metric_type_map.get(operation, "")
```
###### Provide feedback to improve future suggestions
[](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/7afb76d9-da51-4b95-b995-0505d98fbbde/upvote)
[](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/7afb76d9-da51-4b95-b995-0505d98fbbde?what_not_true=true)
[](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/7afb76d9-da51-4b95-b995-0505d98fbbde?what_out_of_scope=true)
[](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/7afb76d9-da51-4b95-b995-0505d98fbbde?what_not_in_standard=true)
[](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/7afb76d9-da51-4b95-b995-0505d98fbbde)
</details>
<sub>
💬 Looking for more details? Reply to this comment to chat with Korbit.
</sub>
<!--- korbi internal id:118152b7-b4a4-4a78-a98d-5f7a5721fb86 -->
[](118152b7-b4a4-4a78-a98d-5f7a5721fb86)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]