Hisoka-X commented on code in PR #9445:
URL: https://github.com/apache/seatunnel/pull/9445#discussion_r2179984160

##########
seatunnel-e2e/seatunnel-transforms-v2-e2e/seatunnel-transforms-v2-e2e-part-1/src/test/java/org/apache/seatunnel/e2e/transform/TestDataValidatorIT.java:
##########
@@ -0,0 +1,66 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.seatunnel.e2e.transform;
+
+import org.apache.seatunnel.e2e.common.container.TestContainer;
+
+import org.junit.jupiter.api.Assertions;
+import org.junit.jupiter.api.TestTemplate;
+import org.testcontainers.containers.Container;
+
+import java.io.IOException;
+
+public class TestDataValidatorIT extends TestSuiteBase {
+
+    @TestTemplate
+    public void testDataValidatorWithValidData(TestContainer container)
+            throws IOException, InterruptedException {
+        Container.ExecResult execResult = container.executeJob("/data_validator_valid.conf");
+        Assertions.assertEquals(0, execResult.getExitCode());
+    }
+
+    @TestTemplate
+    public void testDataValidatorWithSkipMode(TestContainer container)
+            throws IOException, InterruptedException {
+        Container.ExecResult execResult = container.executeJob("/data_validator_skip.conf");
+        Assertions.assertEquals(0, execResult.getExitCode());
+    }
+
+    @TestTemplate
+    public void testDataValidatorWithFailMode(TestContainer container)
+            throws IOException, InterruptedException {
+        Container.ExecResult execResult = container.executeJob("/data_validator_fail.conf");
+        // Should fail due to validation errors
+        Assertions.assertNotEquals(0, execResult.getExitCode());

Review Comment:
   Could you also add a check for the error message in the logs?



##########
docs/en/transform-v2/data-validator.md:
##########
@@ -0,0 +1,322 @@
+# DataValidator
+
+> Data validation transform plugin
+
+## Description
+
+The DataValidator transform validates field values according to configured rules and handles validation failures based on the specified error handling strategy. It supports multiple validation rule types including null checks, range validation, length validation, and regex pattern matching.
+
+## Options
+
+| name            | type   | required | default value |
+|-----------------|--------|----------|---------------|
+| error_handle_way| enum   | no       | FAIL          |
+| error_table     | string | no       |               |
+| field_rules     | array  | yes      |               |
+
+### error_handle_way [enum]
+
+Error handling strategy when validation fails:
+- `FAIL`: Fail the entire task when validation errors occur
+- `SKIP`: Skip invalid rows and continue processing
+- `ROUTE_TO_TABLE`: Route invalid data to a specified error table
+
+**Note**: `ROUTE_TO_TABLE` mode only works with sinks that support multiple tables. The sink must be capable of handling data routed to different table destinations.
+
+### error_table [string]
+
+Target table name for routing invalid data when `error_handle_way` is set to `ROUTE_TO_TABLE`. This parameter is required when using `ROUTE_TO_TABLE` mode.
+
+#### Error Table Schema
+
+When using `ROUTE_TO_TABLE` mode, DataValidator automatically creates an error table with a fixed schema to store validation failure data. The error table contains the following fields:
+
+| Field Name | Data Type | Description |
+|------------|-----------|-------------|
+| source_table_id | STRING | Source table identifier that identifies the originating table |
+| source_table_path | STRING | Source table path with complete table path information |
+| original_data | STRING | JSON representation of the original data containing the complete row that failed validation |
+| validation_errors | STRING | JSON array of validation error details containing all failed fields and error information |

Review Comment:
   Could you add an example showing the value format of these two fields (`original_data` and `validation_errors`)?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
