DanielCarter-stack commented on PR #10638:
URL: https://github.com/apache/seatunnel/pull/10638#issuecomment-4111958086
<!-- code-pr-reviewer -->
<!-- cpr:pr_reply_v2_parts {"group": "apache/seatunnel#10638", "part": 1,
"total": 1} -->
## Issue 1: sourceType hardcoded causing incorrect NVARCHAR type display
**Location**: `DmdbTypeConverter.java:207-212`
```java
case DM_NVARCHAR:
case DM_NVARCHAR2:
builder.sourceType(String.format("%s(%s)", DM_NVARCHAR2,
typeDefine.getLength()));
builder.dataType(BasicType.STRING_TYPE);
builder.columnLength(TypeDefineUtils.charTo4ByteLength(typeDefine.getLength()));
break;
```
**Related Context**:
- Parent class/interface: `TypeConverter.java:41`
- Caller 1: `DamengCatalog.java:170`
- Caller 2: `DmdbTypeMapper.java:31`
- Caller 3: `DamengCreateTableSqlBuilder.java:123`
**Problem Description**:
The modified code uses the `DM_NVARCHAR2` constant to build sourceType in
both `DM_NVARCHAR` and `DM_NVARCHAR2` cases. This means:
- When the original type is `NVARCHAR`, the generated sourceType will
incorrectly display as `NVARCHAR2(...)`
- When the original type is `NVARCHAR2`, the generated sourceType correctly
displays as `NVARCHAR2(...)`
This results in loss of type information, and users cannot distinguish
whether the original column type is NVARCHAR or NVARCHAR2.
**Potential Risks**:
- Risk 1: Schema recognition error. Users will see incorrect type names when
viewing table structure
- Risk 2: Bidirectional synchronization issue. If synchronized from an
NVARCHAR column and then synchronized back, the type will become NVARCHAR2
- Risk 3: Difficult auditing and debugging. Types shown in logs do not match
actual values
**Scope of Impact**:
- Direct impact: All tasks using Dameng data source and containing NVARCHAR
columns
- Indirect impact: Schema inference, table structure queries, CREATE TABLE
statement generation
- Affected area: Dameng Connector
**Severity**: MAJOR
**Improvement Suggestion**:
```java
case DM_NVARCHAR:
case DM_NVARCHAR2:
// Use dmType variable instead of hardcoding DM_NVARCHAR2
builder.sourceType(String.format("%s(%s)", dmType,
typeDefine.getLength()));
builder.dataType(BasicType.STRING_TYPE);
builder.columnLength(TypeDefineUtils.charTo4ByteLength(typeDefine.getLength()));
break;
```
**Reason**:
`dmType` is the variable of the switch statement (line 125), containing the
actual type name (uppercase). Using `dmType` ensures:
1. NVARCHAR generates `NVARCHAR(length)`
2. NVARCHAR2 generates `NVARCHAR2(length)`
3. Consistent with handling of other types (e.g., `DM_CHAR`, `DM_VARCHAR2`
both use `dmType` or explicit corresponding constants)
Refer to the handling of `DM_CHAR` and `DM_CHARACTER` (lines 195-200), which
are handled together but maintain their respective sourceTypes.
---
## Issue 2: Missing test cases for NVARCHAR2 type
**Location**: `DmdbTypeConverterTest.java`
**Related Context**:
- Test class: `DmdbTypeConverterTest.java`
- Existing tests: `testNvarchar()` (lines 350-363)
**Problem Description**:
The PR adds NVARCHAR2 type support but does not add corresponding unit test
cases. Existing tests only have `testNvarchar()`, covering NVARCHAR type.
**Potential Risks**:
- Risk 1: Code changes are not verified by tests, ensuring functionality
correctness
- Risk 2: Future refactoring may break NVARCHAR2 support without detection
- Risk 3: CI/CD cannot capture regression issues
**Scope of Impact**:
- Direct impact: Reliability of NVARCHAR2 type
- Indirect impact: Code quality and maintainability
- Affected area: Dameng Connector test coverage
**Severity**: MAJOR
**Improvement Suggestion**:
```java
@Test
public void testConvertNvarchar2() {
BasicTypeDefine<Object> typeDefine =
BasicTypeDefine.builder()
.name("test")
.columnType("nvarchar2(10)")
.dataType("nvarchar2")
.length(10L)
.build();
Column column = DmdbTypeConverter.INSTANCE.convert(typeDefine);
Assertions.assertEquals(typeDefine.getName(), column.getName());
Assertions.assertEquals(BasicType.STRING_TYPE, column.getDataType());
Assertions.assertEquals(40, column.getColumnLength()); // 10 * 4
Assertions.assertEquals(typeDefine.getColumnType(),
column.getSourceType().toLowerCase());
}
@Test
public void testConvertNvarchar2WithoutLength() {
BasicTypeDefine<Object> typeDefine =
BasicTypeDefine.builder()
.name("test")
.columnType("nvarchar2")
.dataType("nvarchar2")
.build();
Column column = DmdbTypeConverter.INSTANCE.convert(typeDefine);
Assertions.assertEquals(typeDefine.getName(), column.getName());
Assertions.assertEquals(BasicType.STRING_TYPE, column.getDataType());
// charTo4ByteLength returns null when length is null
Assertions.assertNull(column.getColumnLength());
Assertions.assertEquals("nvarchar2",
column.getSourceType().toLowerCase());
}
```
**Reason**:
1. Adding test cases is a basic requirement for code quality
2. Verify correct conversion of NVARCHAR2 type
3. Ensure consistent behavior with NVARCHAR type
4. Provide regression protection for future refactoring
5. Follow existing test patterns in the project
---
## Issue 3: Missing comment explaining NVARCHAR vs NVARCHAR2 differences
**Location**: `DmdbTypeConverter.java:207-212`
**Related Context**:
- DM database documentation reference (line 35)
- String type constant definitions (lines 64-74)
**Problem Description**:
The code handles NVARCHAR and NVARCHAR2 together, but there is no comment
explaining the differences between these two types in Dameng database and why
they can be uniformly mapped to SeaTunnel's STRING_TYPE.
**Potential Risks**:
- Risk 1: Reduced code readability, maintainers do not understand why they
are merged
- Risk 2: Future developers may mistakenly think differentiation is needed,
leading to unnecessary complexity
**Scope of Impact**:
- Direct impact: Code maintainability
- Affected area: Dameng TypeConverter
**Severity**: MINOR
**Improvement Suggestion**:
```java
// NVARCHAR and NVARCHAR2 are both Unicode character types in DM database
// They support storing multi-byte characters (Unicode) and can be treated
the same way
// Reference:
https://eco.dameng.com/document/dm/zh-cn/sql-dev/dmpl-sql-datatype.html
case DM_NVARCHAR:
case DM_NVARCHAR2:
builder.sourceType(String.format("%s(%s)", dmType,
typeDefine.getLength()));
builder.dataType(BasicType.STRING_TYPE);
builder.columnLength(TypeDefineUtils.charTo4ByteLength(typeDefine.getLength()));
break;
```
**Reason**:
1. Explain why two types can be handled uniformly
2. Provide official documentation reference
3. Help maintainers understand design intent
4. Avoid unnecessary future refactoring
---
## Issue 4: reconvert method does not consider NVARCHAR/NVARCHAR2 distinction
**Location**: `DmdbTypeConverter.java:422-435`
```java
case STRING:
builder.length(column.getColumnLength());
if (column.getColumnLength() == null || column.getColumnLength() <= 0) {
builder.columnType(DM_TEXT);
builder.dataType(DM_TEXT);
} else if (column.getColumnLength() <= MAX_CHAR_LENGTH_FOR_PAGE_4K) {
builder.columnType(
String.format("%s(%s)", DM_VARCHAR2,
column.getColumnLength()));
builder.dataType(DM_VARCHAR2);
} else {
builder.columnType(DM_TEXT);
builder.dataType(DM_TEXT);
}
break;
```
**Related Context**:
- convert method: lines 207-212
- Caller: `DamengCreateTableSqlBuilder.java:123`
**Problem Description**:
When converting from SeaTunnel Column back to Dameng type, the `reconvert`
method uniformly uses VARCHAR2, without considering that the original type
might be NVARCHAR or NVARCHAR2. Although this is not a bug (because it is
functionally acceptable), it results in loss of type information.
**Potential Risks**:
- Risk 1: If users synchronize from an NVARCHAR column to SeaTunnel and then
synchronize back to Dameng, the type will become VARCHAR2
- Risk 2: For scenarios with specific type requirements (e.g., needing
NVARCHAR's Unicode support), this may not be sufficient
**Scope of Impact**:
- Direct impact: Bidirectional synchronization scenarios
- Indirect impact: Type selection when creating target tables
- Affected area: Dameng Connector
**Severity**: MINOR (because functionally acceptable)
**Improvement Suggestion**:
This requires more complex design. Consider:
1. Store original type information in Column (sourceType already exists)
2. Prioritize using sourceType when reconverting
However, considering:
- VARCHAR2 also supports Unicode (Dameng database feature)
- Simplifying implementation is a reasonable design decision
- This is an optimization item rather than a mandatory fix
It is recommended to not make changes for now, but document this behavior.
**Reason**:
1. Dameng database's VARCHAR2 supports Unicode (similar to NVARCHAR)
2. Simplifying implementation reduces complexity
3. If differentiation is truly needed in the future, Column metadata can be
extended
---
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]