[GitHub] [seatunnel] MonsterChenzhuo opened a new pull request, #5143: [WIP][TEST] Locate the problem of e2e instability
MonsterChenzhuo opened a new pull request, #5143: URL: https://github.com/apache/seatunnel/pull/5143 ## Purpose of this pull request ## Check list * [ ] Code changed are covered with tests, or it does not need tests for reason: * [ ] If any new Jar binary package adding in your PR, please add License Notice according [New License Guide](https://github.com/apache/seatunnel/blob/dev/docs/en/contribution/new-license.md) * [ ] If necessary, please update the documentation to describe the new feature. https://github.com/apache/seatunnel/tree/dev/docs * [ ] If you are contributing the connector code, please check that the following files are updated: 1. Update change log that in connector document. For more details you can refer to [connector-v2](https://github.com/apache/seatunnel/tree/dev/docs/en/connector-v2) 2. Update [plugin-mapping.properties](https://github.com/apache/seatunnel/blob/dev/plugin-mapping.properties) and add new connector information in it 3. Update the pom file of [seatunnel-dist](https://github.com/apache/seatunnel/blob/dev/seatunnel-dist/pom.xml) * [ ] Update the [`release-note`](https://github.com/apache/seatunnel/blob/dev/release-note.md). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@seatunnel.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [seatunnel] hailin0 commented on pull request #5045: [Improve][Connector[File] Optimize files commit order
hailin0 commented on PR #5045: URL: https://github.com/apache/seatunnel/pull/5045#issuecomment-1647365304 @EricJoy2048 @Hisoka-X @ic4y PTAL -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@seatunnel.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [seatunnel] hailin0 merged pull request #5045: [Improve][Connector[File] Optimize files commit order
hailin0 merged PR #5045: URL: https://github.com/apache/seatunnel/pull/5045 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@seatunnel.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[seatunnel] branch dev updated: [Improve][Connector[File] Optimize files commit order (#5045)
This is an automated email from the ASF dual-hosted git repository. wanghailin pushed a commit to branch dev in repository https://gitbox.apache.org/repos/asf/seatunnel.git The following commit(s) were added to refs/heads/dev by this push: new 1e18a8c530 [Improve][Connector[File] Optimize files commit order (#5045) 1e18a8c530 is described below commit 1e18a8c5303fb071d6c4d5d8dc893b8882c3bc4a Author: hailin0 AuthorDate: Mon Jul 24 15:35:27 2023 +0800 [Improve][Connector[File] Optimize files commit order (#5045) Before using `HashMap` store files path, so every checkpoint file commit is out of order. Now switch to using `LinkedHashMap` to ensure that files are commit in the generated order --- .../seatunnel/file/sink/BaseFileSinkWriter.java| 6 +- .../file/sink/commit/FileAggregatedCommitInfo.java | 6 +- .../seatunnel/file/sink/commit/FileCommitInfo.java | 6 +- .../sink/commit/FileSinkAggregatedCommitter.java | 15 +++-- .../file/sink/commit/FileSinkCommitter.java| 75 -- .../seatunnel/file/sink/state/FileSinkState.java | 6 +- .../file/sink/writer/AbstractWriteStrategy.java| 44 - .../file/sink/writer/ExcelWriteStrategy.java | 7 +- .../file/sink/writer/JsonWriteStrategy.java| 5 +- .../file/sink/writer/OrcWriteStrategy.java | 6 +- .../file/sink/writer/ParquetWriteStrategy.java | 7 +- .../file/sink/writer/TextWriteStrategy.java| 5 +- .../seatunnel/file/sink/writer/WriteStrategy.java | 4 +- .../commit/S3RedshiftSinkAggregatedCommitter.java | 5 +- 14 files changed, 68 insertions(+), 129 deletions(-) diff --git a/seatunnel-connectors-v2/connector-file/connector-file-base/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/sink/BaseFileSinkWriter.java b/seatunnel-connectors-v2/connector-file/connector-file-base/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/sink/BaseFileSinkWriter.java index 7102e954a4..22200249f6 100644 --- a/seatunnel-connectors-v2/connector-file/connector-file-base/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/sink/BaseFileSinkWriter.java +++ b/seatunnel-connectors-v2/connector-file/connector-file-base/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/sink/BaseFileSinkWriter.java @@ -34,14 +34,14 @@ import org.apache.hadoop.fs.Path; import java.io.IOException; import java.util.Collections; -import java.util.HashMap; +import java.util.LinkedHashMap; import java.util.List; import java.util.Optional; import java.util.UUID; import java.util.stream.Collectors; public class BaseFileSinkWriter implements SinkWriter { -private final WriteStrategy writeStrategy; +protected final WriteStrategy writeStrategy; private final FileSystemUtils fileSystemUtils; @SuppressWarnings("checkstyle:MagicNumber") @@ -67,7 +67,7 @@ public class BaseFileSinkWriter implements SinkWriter transactions = findTransactionList(jobId, uuidPrefix); FileSinkAggregatedCommitter fileSinkAggregatedCommitter = new FileSinkAggregatedCommitter(fileSystemUtils); -HashMap fileStatesMap = new HashMap<>(); +LinkedHashMap fileStatesMap = new LinkedHashMap<>(); fileSinkStates.forEach( fileSinkState -> fileStatesMap.put(fileSinkState.getTransactionId(), fileSinkState)); diff --git a/seatunnel-connectors-v2/connector-file/connector-file-base/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/sink/commit/FileAggregatedCommitInfo.java b/seatunnel-connectors-v2/connector-file/connector-file-base/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/sink/commit/FileAggregatedCommitInfo.java index 16d94a1f63..5ca3b30fad 100644 --- a/seatunnel-connectors-v2/connector-file/connector-file-base/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/sink/commit/FileAggregatedCommitInfo.java +++ b/seatunnel-connectors-v2/connector-file/connector-file-base/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/sink/commit/FileAggregatedCommitInfo.java @@ -21,8 +21,8 @@ import lombok.AllArgsConstructor; import lombok.Data; import java.io.Serializable; +import java.util.LinkedHashMap; import java.util.List; -import java.util.Map; @Data @AllArgsConstructor @@ -34,7 +34,7 @@ public class FileAggregatedCommitInfo implements Serializable { * * V is the target file path of the data file. */ -private final Map> transactionMap; +private final LinkedHashMap> transactionMap; /** * Storage the partition information in map. @@ -43,5 +43,5 @@ public class FileAggregatedCommitInfo implements Serializable { * * V is the list of partition column's values. */ -private final Map> partitionDirAndValuesMap; +private final LinkedHashMap> partitionDirAndValuesMap; } diff --git a/seatunnel-co
[GitHub] [seatunnel] hailin0 opened a new issue, #3743: [Feature][connector][kafka] Support read `debezium` format message from kafka
hailin0 opened a new issue, #3743: URL: https://github.com/apache/seatunnel/issues/3743 ### Search before asking - [X] I had searched in the [feature](https://github.com/apache/incubator-seatunnel/issues?q=is%3Aissue+label%3A%22Feature%22) and found no similar feature requirement. ### Description Support read `debezium` format message from kafka debezium format https://debezium.io/documentation/reference/1.3/configuration/avro.html - [ ] using 'debezium-json' as the format to interpret Debezium JSON messages - [ ] please use 'debezium-avro-confluent' if Debezium encodes messages in Avro format ### Usage Scenario _No response_ ### Related issues _No response_ ### Are you willing to submit a PR? - [ ] Yes I am willing to submit a PR! ### Code of Conduct - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@seatunnel.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [seatunnel] 15767714253 opened a new issue, #5144: [Feature][Web] support seatunnel on k8s
15767714253 opened a new issue, #5144: URL: https://github.com/apache/seatunnel/issues/5144 ### Search before asking - [X] I had searched in the [feature](https://github.com/apache/seatunnel/issues?q=is%3Aissue+label%3A%22Feature%22) and found no similar feature requirement. ### Description 可以在Web中支持调度seatunnel on k8s ,application or operator 模式都可以 ### Usage Scenario 生产是完全基于k8s集群 ### Related issues _No response_ ### Are you willing to submit a PR? - [X] Yes I am willing to submit a PR! ### Code of Conduct - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@seatunnel.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [seatunnel-web] zhangchengming601 opened a new pull request, #92: [IMPROVE] Add plugin isolation function
zhangchengming601 opened a new pull request, #92: URL: https://github.com/apache/seatunnel-web/pull/92 ## Purpose of this pull request ## Check list * [ ] Code changed are covered with tests, or it does not need tests for reason: * [ ] If any new Jar binary package adding in your PR, please add License Notice according [New License Guide](https://github.com/apache/incubator-seatunnel/blob/dev/docs/en/contribution/new-license.md) * [ ] If necessary, please update the documentation to describe the new feature. https://github.com/apache/incubator-seatunnel/tree/dev/docs -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@seatunnel.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [seatunnel] SharkAndCat opened a new issue, #5145: impala connector
SharkAndCat opened a new issue, #5145: URL: https://github.com/apache/seatunnel/issues/5145 ### Search before asking - [X] I had searched in the [feature](https://github.com/apache/seatunnel/issues?q=is%3Aissue+label%3A%22Feature%22) and found no similar feature requirement. ### Description What do I need to do if I want to connect to impala using the JDBC source connector in seatunnel.I made many attempts, but they all ended in failure ### Usage Scenario impala connect ### Related issues NO ### Are you willing to submit a PR? - [X] Yes I am willing to submit a PR! ### Code of Conduct - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@seatunnel.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [seatunnel] EricJoy2048 commented on a diff in pull request #5086: [Feature-WIP][Connector-V2][Doris] Add Doris ConnectorV2 Source
EricJoy2048 commented on code in PR #5086: URL: https://github.com/apache/seatunnel/pull/5086#discussion_r1272032391 ## seatunnel-connectors-v2/connector-doris/src/main/java/org/apache/seatunnel/connectors/doris/rest/RestService.java: ## @@ -524,7 +531,7 @@ public static List findPartitions(DorisConfig dorisConfig, return tabletsMapToPartition( dorisConfig, be2Tablets, -queryPlan.getOpaquedQueryPlan(), +queryPlan.getOpaqued_query_plan(), Review Comment: Why use `getOpaqued_query_plan ` instead of `getOpaquedQueryPlan ` ? ## docs/en/connector-v2/source/Doris.md: ## @@ -0,0 +1,121 @@ +# Doris + +> Doris source connector + +## Description + +Used to read data from Doris. + +## Key features + +- [x] [batch](../../concept/connector-v2-features.md) +- [ ] [stream](../../concept/connector-v2-features.md) +- [ ] [exactly-once](../../concept/connector-v2-features.md) +- [x] [schema projection](../../concept/connector-v2-features.md) +- [x] [parallelism](../../concept/connector-v2-features.md) +- [x] [support user-defined split](../../concept/connector-v2-features.md) + +## Options + +| name | type | required | default value | +|--||--|---| +| fenodes | string | yes | - | +| username | string | yes | - | +| password | string | yes | - | +| table.identifier | string | yes | - | +| schema | config | yes | - | Review Comment: I don't understand what the purpose of the scheme parameter is? Doris is a data source with a table schema, and connectors can obtain the table structure information of the tables that need to be read through Doris connections. If the schema is only used to specify the `schema projection` feature, I believe that using the schema is not a good choice in this case, because the schema needs to configure fields and their type information, which is redundant for the `schema projection`. An array type parameter that only needs to configure field columns would be more suitable. ## docs/en/connector-v2/source/Doris.md: ## @@ -0,0 +1,121 @@ +# Doris Review Comment: Please reference https://github.com/apache/seatunnel/issues/4544 to refactoring documents. ## seatunnel-connectors-v2/connector-doris/src/main/java/org/apache/seatunnel/connectors/doris/source/DorisSourceFactory.java: ## @@ -0,0 +1,52 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.seatunnel.connectors.doris.source; + +import org.apache.seatunnel.api.configuration.util.OptionRule; +import org.apache.seatunnel.api.source.SeaTunnelSource; +import org.apache.seatunnel.api.table.catalog.CatalogTableUtil; +import org.apache.seatunnel.api.table.factory.Factory; +import org.apache.seatunnel.api.table.factory.TableSourceFactory; +import org.apache.seatunnel.connectors.doris.config.DorisConfig; + +import com.google.auto.service.AutoService; + +@AutoService(Factory.class) +public class DorisSourceFactory implements TableSourceFactory { +@Override +public String factoryIdentifier() { +return "Doris"; +} + +@Override +public OptionRule optionRule() { +return OptionRule.builder() +.required( +DorisConfig.FENODES, +DorisConfig.USERNAME, +DorisConfig.PASSWORD, +DorisConfig.TABLE_IDENTIFIER, +CatalogTableUtil.SCHEMA) Review Comment: Why `schema` option is required? ## seatunnel-connectors-v2/connector-doris/src/main/java/org/apache/seatunnel/connectors/doris/source/DorisSourceFactory.java: ## @@ -0,0 +1,52 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF
[GitHub] [seatunnel] lightzhao commented on pull request #4382: [Feature][Connector-v2][PulsarSink]Add Pulsar Sink Connector.
lightzhao commented on PR #4382: URL: https://github.com/apache/seatunnel/pull/4382#issuecomment-1647686975 It's been a long time, @TyrantLucifer @hailin0 @EricJoy2048 @Hisoka-X PTAL. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@seatunnel.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [seatunnel] bingquanzhao commented on a diff in pull request #5086: [Feature-WIP][Connector-V2][Doris] Add Doris ConnectorV2 Source
bingquanzhao commented on code in PR #5086: URL: https://github.com/apache/seatunnel/pull/5086#discussion_r1272117581 ## docs/en/connector-v2/source/Doris.md: ## @@ -0,0 +1,121 @@ +# Doris + +> Doris source connector + +## Description + +Used to read data from Doris. + +## Key features + +- [x] [batch](../../concept/connector-v2-features.md) +- [ ] [stream](../../concept/connector-v2-features.md) +- [ ] [exactly-once](../../concept/connector-v2-features.md) +- [x] [schema projection](../../concept/connector-v2-features.md) +- [x] [parallelism](../../concept/connector-v2-features.md) +- [x] [support user-defined split](../../concept/connector-v2-features.md) + +## Options + +| name | type | required | default value | +|--||--|---| +| fenodes | string | yes | - | +| username | string | yes | - | +| password | string | yes | - | +| table.identifier | string | yes | - | +| schema | config | yes | - | Review Comment: Did you mean `fields = [f1,f2,f3,...]`? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@seatunnel.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [seatunnel] hailin0 commented on pull request #5066: [Feature][connector][kafka] Support read debezium format message from kafka
hailin0 commented on PR #5066: URL: https://github.com/apache/seatunnel/pull/5066#issuecomment-1647740389 good pr -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@seatunnel.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [seatunnel] jackyyyyyssss commented on a diff in pull request #5097: [Improve][Connector-v2][Jdbc] check url not null throw friendly message
jacky commented on code in PR #5097: URL: https://github.com/apache/seatunnel/pull/5097#discussion_r1272165143 ## seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/source/JdbcSource.java: ## @@ -93,6 +98,19 @@ public String getPluginName() { @Override public void prepare(Config pluginConfig) throws PrepareFailException { +CheckResult checkResult = Review Comment: I found that KuduSource CassandraSource uses this method, or do you have any better suggestions for me to make some modifications@Hisoka-X -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@seatunnel.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [seatunnel] liugddx opened a new pull request, #5146: [Bugfix][AmazonDynamoDB] Fix the problem that all table data cannot be obtained
liugddx opened a new pull request, #5146: URL: https://github.com/apache/seatunnel/pull/5146 close #5124 ## Purpose of this pull request ## Check list * [ ] Code changed are covered with tests, or it does not need tests for reason: * [ ] If any new Jar binary package adding in your PR, please add License Notice according [New License Guide](https://github.com/apache/seatunnel/blob/dev/docs/en/contribution/new-license.md) * [ ] If necessary, please update the documentation to describe the new feature. https://github.com/apache/seatunnel/tree/dev/docs * [ ] If you are contributing the connector code, please check that the following files are updated: 1. Update change log that in connector document. For more details you can refer to [connector-v2](https://github.com/apache/seatunnel/tree/dev/docs/en/connector-v2) 2. Update [plugin-mapping.properties](https://github.com/apache/seatunnel/blob/dev/plugin-mapping.properties) and add new connector information in it 3. Update the pom file of [seatunnel-dist](https://github.com/apache/seatunnel/blob/dev/seatunnel-dist/pom.xml) * [ ] Update the [`release-note`](https://github.com/apache/seatunnel/blob/dev/release-note.md). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@seatunnel.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [seatunnel] ic4y opened a new pull request, #5147: [Feature][CDC base] Support string type shard fields.
ic4y opened a new pull request, #5147: URL: https://github.com/apache/seatunnel/pull/5147 ## Purpose of this pull request ## Check list * [ ] Code changed are covered with tests, or it does not need tests for reason: * [ ] If any new Jar binary package adding in your PR, please add License Notice according [New License Guide](https://github.com/apache/seatunnel/blob/dev/docs/en/contribution/new-license.md) * [ ] If necessary, please update the documentation to describe the new feature. https://github.com/apache/seatunnel/tree/dev/docs * [ ] If you are contributing the connector code, please check that the following files are updated: 1. Update change log that in connector document. For more details you can refer to [connector-v2](https://github.com/apache/seatunnel/tree/dev/docs/en/connector-v2) 2. Update [plugin-mapping.properties](https://github.com/apache/seatunnel/blob/dev/plugin-mapping.properties) and add new connector information in it 3. Update the pom file of [seatunnel-dist](https://github.com/apache/seatunnel/blob/dev/seatunnel-dist/pom.xml) * [ ] Update the [`release-note`](https://github.com/apache/seatunnel/blob/dev/release-note.md). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@seatunnel.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [seatunnel] ic4y commented on pull request #5123: [Feature][Iceberg Sink] Add iceberg sink connector
ic4y commented on PR #5123: URL: https://github.com/apache/seatunnel/pull/5123#issuecomment-1647924622 > Repeated with #5072 The feature gap between the two pull requests is quite significant, it may take some time to discuss how to merge them. I'm also currently implementing save mode and automatic table creation. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@seatunnel.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [seatunnel] github-actions[bot] closed issue #4913: kafka-source 接入的数据格式是xml类型,应该如何配置能解析成表做处理
github-actions[bot] closed issue #4913: kafka-source 接入的数据格式是xml类型,应该如何配置能解析成表做处理 URL: https://github.com/apache/seatunnel/issues/4913 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@seatunnel.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [seatunnel] github-actions[bot] closed issue #4605: [Feature][CDC] Support flink running cdc job
github-actions[bot] closed issue #4605: [Feature][CDC] Support flink running cdc job URL: https://github.com/apache/seatunnel/issues/4605 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@seatunnel.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [seatunnel] github-actions[bot] commented on issue #4913: kafka-source 接入的数据格式是xml类型,应该如何配置能解析成表做处理
github-actions[bot] commented on issue #4913: URL: https://github.com/apache/seatunnel/issues/4913#issuecomment-1648794303 This issue has been closed because it has not received response for too long time. You could reopen it if you encountered similar problems in the future. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@seatunnel.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [seatunnel] github-actions[bot] commented on issue #4587: [Feature][Core] Design of Dirty Data Collection
github-actions[bot] commented on issue #4587: URL: https://github.com/apache/seatunnel/issues/4587#issuecomment-1648794339 This issue has been automatically marked as stale because it has not had recent activity for 30 days. It will be closed in next 7 days if no further activity occurs. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@seatunnel.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [seatunnel] chaorongzhi opened a new pull request, #5148: [Fix][Zeta] Resolve out-of-heap memory overflow
chaorongzhi opened a new pull request, #5148: URL: https://github.com/apache/seatunnel/pull/5148 ## Purpose of this pull request I will run about 15 batch synchronization tasks per minute. When I add MaxMetaspaceSize = 2g in jvm_options, OOM will appear after about 1.5 hours of running. I used Arthas to check and analyze dump image files. I found that the class loader is created again every time a batch task is submitted. After the batch task is completed, the class loader is not uninstalled, resulting in the rapid expansion of the metaSpace area and finally oom. So I added a cache to the classloader. ## Check list * [ ] Code changed are covered with tests, or it does not need tests for reason: * [ ] If any new Jar binary package adding in your PR, please add License Notice according [New License Guide](https://github.com/apache/seatunnel/blob/dev/docs/en/contribution/new-license.md) * [ ] If necessary, please update the documentation to describe the new feature. https://github.com/apache/seatunnel/tree/dev/docs * [ ] If you are contributing the connector code, please check that the following files are updated: 1. Update change log that in connector document. For more details you can refer to [connector-v2](https://github.com/apache/seatunnel/tree/dev/docs/en/connector-v2) 2. Update [plugin-mapping.properties](https://github.com/apache/seatunnel/blob/dev/plugin-mapping.properties) and add new connector information in it 3. Update the pom file of [seatunnel-dist](https://github.com/apache/seatunnel/blob/dev/seatunnel-dist/pom.xml) * [ ] Update the [`release-note`](https://github.com/apache/seatunnel/blob/dev/release-note.md). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@seatunnel.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [seatunnel] Hisoka-X merged pull request #5143: [Hotfix][Mongodb cdc] Solve startup resume token is negative
Hisoka-X merged PR #5143: URL: https://github.com/apache/seatunnel/pull/5143 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@seatunnel.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[seatunnel] branch dev updated: [Hotfix][Mongodb cdc] Solve startup resume token is negative (#5143)
This is an automated email from the ASF dual-hosted git repository. fanjia pushed a commit to branch dev in repository https://gitbox.apache.org/repos/asf/seatunnel.git The following commit(s) were added to refs/heads/dev by this push: new e964c03dca [Hotfix][Mongodb cdc] Solve startup resume token is negative (#5143) e964c03dca is described below commit e964c03dca69f3a95da72fee2fba68e5cf24485d Author: monster <60029759+monsterchenz...@users.noreply.github.com> AuthorDate: Tue Jul 25 10:09:11 2023 +0800 [Hotfix][Mongodb cdc] Solve startup resume token is negative (#5143) - Co-authored-by: chenzy15 --- .../cdc/mongodb/source/dialect/MongodbDialect.java | 7 .../seatunnel/cdc/mongodb/utils/ResumeToken.java | 46 +- .../src/test/java/mongodb/MongoDBContainer.java| 1 + 3 files changed, 26 insertions(+), 28 deletions(-) diff --git a/seatunnel-connectors-v2/connector-cdc/connector-cdc-mongodb/src/main/java/org/apache/seatunnel/connectors/seatunnel/cdc/mongodb/source/dialect/MongodbDialect.java b/seatunnel-connectors-v2/connector-cdc/connector-cdc-mongodb/src/main/java/org/apache/seatunnel/connectors/seatunnel/cdc/mongodb/source/dialect/MongodbDialect.java index 11ef57ffc5..25e463c17e 100644 --- a/seatunnel-connectors-v2/connector-cdc/connector-cdc-mongodb/src/main/java/org/apache/seatunnel/connectors/seatunnel/cdc/mongodb/source/dialect/MongodbDialect.java +++ b/seatunnel-connectors-v2/connector-cdc/connector-cdc-mongodb/src/main/java/org/apache/seatunnel/connectors/seatunnel/cdc/mongodb/source/dialect/MongodbDialect.java @@ -34,6 +34,7 @@ import org.bson.BsonDocument; import com.mongodb.client.MongoClient; import io.debezium.relational.TableId; +import lombok.extern.slf4j.Slf4j; import javax.annotation.Nonnull; @@ -52,6 +53,7 @@ import static org.apache.seatunnel.connectors.seatunnel.cdc.mongodb.utils.Mongod import static org.apache.seatunnel.connectors.seatunnel.cdc.mongodb.utils.MongodbUtils.getCurrentClusterTime; import static org.apache.seatunnel.connectors.seatunnel.cdc.mongodb.utils.MongodbUtils.getLatestResumeToken; +@Slf4j public class MongodbDialect implements DataSourceDialect { private final Map cache = @@ -137,6 +139,11 @@ public class MongodbDialect implements DataSourceDialect { ChangeStreamOffset changeStreamOffset; if (startupResumeToken != null) { changeStreamOffset = new ChangeStreamOffset(startupResumeToken); +log.info( +"startup resume token={},change stream offset={}", +startupResumeToken, +changeStreamOffset); + } else { changeStreamOffset = new ChangeStreamOffset(getCurrentClusterTime(mongoClient)); } diff --git a/seatunnel-connectors-v2/connector-cdc/connector-cdc-mongodb/src/main/java/org/apache/seatunnel/connectors/seatunnel/cdc/mongodb/utils/ResumeToken.java b/seatunnel-connectors-v2/connector-cdc/connector-cdc-mongodb/src/main/java/org/apache/seatunnel/connectors/seatunnel/cdc/mongodb/utils/ResumeToken.java index 3ddd2ccbb2..5ee8962bc5 100644 --- a/seatunnel-connectors-v2/connector-cdc/connector-cdc-mongodb/src/main/java/org/apache/seatunnel/connectors/seatunnel/cdc/mongodb/utils/ResumeToken.java +++ b/seatunnel-connectors-v2/connector-cdc/connector-cdc-mongodb/src/main/java/org/apache/seatunnel/connectors/seatunnel/cdc/mongodb/utils/ResumeToken.java @@ -17,8 +17,6 @@ package org.apache.seatunnel.connectors.seatunnel.cdc.mongodb.utils; -import org.apache.seatunnel.connectors.seatunnel.cdc.mongodb.exception.MongodbConnectorException; - import org.bson.BsonDocument; import org.bson.BsonTimestamp; import org.bson.BsonValue; @@ -29,41 +27,33 @@ import java.nio.ByteBuffer; import java.nio.ByteOrder; import java.util.Objects; -import static org.apache.seatunnel.common.exception.CommonErrorCode.ILLEGAL_ARGUMENT; - public class ResumeToken { private static final int K_TIMESTAMP = 130; -public static @Nonnull BsonTimestamp decodeTimestamp(BsonDocument resumeToken) { -Objects.requireNonNull(resumeToken, "Missing ResumeToken."); -BsonValue bsonValue = resumeToken.get("_data"); -byte[] keyStringBytes = extractKeyStringBytes(bsonValue); -validateKeyType(keyStringBytes); - -ByteBuffer buffer = ByteBuffer.wrap(keyStringBytes).order(ByteOrder.BIG_ENDIAN); -int t = buffer.getInt(); -int i = buffer.getInt(); -return new BsonTimestamp(t, i); -} - -private static byte[] extractKeyStringBytes(@Nonnull BsonValue bsonValue) { -if (bsonValue.isBinary()) { -return bsonValue.asBinary().getData(); -} else if (bsonValue.isString()) { -return hexToUint8Array(bsonValue.asString().getValue()); +public static BsonTimestamp decodeTimestamp(BsonDocument resumeToken) { +BsonValue bsonValue = +Objects.re
[GitHub] [seatunnel] EricJoy2048 merged pull request #5066: [Feature][connector][kafka] Support read debezium format message from kafka
EricJoy2048 merged PR #5066: URL: https://github.com/apache/seatunnel/pull/5066 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@seatunnel.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[seatunnel] branch dev updated: [Feature][connector][kafka] Support read debezium format message from kafka (#5066)
This is an automated email from the ASF dual-hosted git repository. gaojun2048 pushed a commit to branch dev in repository https://gitbox.apache.org/repos/asf/seatunnel.git The following commit(s) were added to refs/heads/dev by this push: new 53a1f0c6c1 [Feature][connector][kafka] Support read debezium format message from kafka (#5066) 53a1f0c6c1 is described below commit 53a1f0c6c14190c37c0b28320db049f822436a4f Author: Xiaojian Sun AuthorDate: Tue Jul 25 10:09:24 2023 +0800 [Feature][connector][kafka] Support read debezium format message from kafka (#5066) --- .github/workflows/backend.yml | 2 +- docs/en/connector-v2/formats/debezium-json.md | 107 ++ docs/en/connector-v2/sink/Kafka.md | 16 +- docs/en/connector-v2/source/kafka.md | 17 +- release-note.md| 12 +- .../connectors/seatunnel/kafka/config/Config.java | 8 +- .../seatunnel/kafka/config/MessageFormat.java | 1 + .../serialize/DefaultSeaTunnelRowSerializer.java | 3 + .../seatunnel/kafka/source/KafkaSource.java| 10 + .../seatunnel/kafka/source/KafkaSourceFactory.java | 1 + .../e2e/connector/kafka/DebeziumToKafkaIT.java | 418 + .../test/resources/debezium/register-mysql.json| 16 + .../kafkasource_debezium_cdc_to_pgsql.conf | 62 +++ .../resources/kafkasource_debezium_to_kafka.conf | 57 +++ .../seatunnel/format/json/JsonFormatOptions.java | 13 +- .../DebeziumJsonDeserializationSchema.java | 168 + .../json/debezium/DebeziumJsonFormatFactory.java | 70 .../json/debezium/DebeziumJsonFormatOptions.java | 53 +++ .../debezium/DebeziumJsonSerializationSchema.java | 80 .../org.apache.seatunnel.api.table.factory.Factory | 1 + .../json/debezium/DebeziumJsonSerDeSchemaTest.java | 163 .../src/test/resources/debezium-data.txt | 16 + 22 files changed, 1270 insertions(+), 24 deletions(-) diff --git a/.github/workflows/backend.yml b/.github/workflows/backend.yml index fbe37acece..6da4f4a5ab 100644 --- a/.github/workflows/backend.yml +++ b/.github/workflows/backend.yml @@ -564,7 +564,7 @@ jobs: matrix: java: [ '8', '11' ] os: [ 'ubuntu-latest' ] -timeout-minutes: 90 +timeout-minutes: 150 steps: - uses: actions/checkout@v2 - name: Set up JDK ${{ matrix.java }} diff --git a/docs/en/connector-v2/formats/debezium-json.md b/docs/en/connector-v2/formats/debezium-json.md new file mode 100644 index 00..4c40a0298e --- /dev/null +++ b/docs/en/connector-v2/formats/debezium-json.md @@ -0,0 +1,107 @@ +# Debezium Format + +Changelog-Data-Capture Format: Serialization Schema Format: Deserialization Schema + +Debezium is a set of distributed services to capture changes in your databases so that your applications can see those changes and respond to them. Debezium records all row-level changes within each database table in a *change event stream*, and applications simply read these streams to see the change events in the same order in which they occurred. + +Seatunnel supports to interpret Debezium JSON messages as INSERT/UPDATE/DELETE messages into seatunnel system. This is useful in many cases to leverage this feature, such as + +synchronizing incremental data from databases to other systems +auditing logs +real-time materialized views on databases +temporal join changing history of a database table and so on. + +Seatunnel also supports to encode the INSERT/UPDATE/DELETE messages in Seatunnel asDebezium JSON messages, and emit to storage like Kafka. + +# Format Options + +| option | default | required | Description | +|---|-|--|--| +| format| (none) | yes | Specify what format to use, here should be 'debezium_json'. | +| debezium-json.ignore-parse-errors | false | no | Skip fields and rows with parse errors instead of failing. Fields are set to null in case of errors. | + +# How to use Debezium format + +## Kafka uses example + +Debezium provides a unified format for changelog, here is a simple example for an update operation captured from a MySQL products table: + +```bash +{ + "before": { + "id": 111, + "name": "scooter", + "description": "Big 2-wheel scooter ", + "weight": 5.18 + }, + "after": { + "id": 111, + "name": "scooter", + "description": "Big 2-wheel scooter ", + "weight": 5.17 + }, + "source": { + "version": "1.1.1
[GitHub] [seatunnel] suyonggong opened a new issue, #5149: how to stop the mysql_cdc task
suyonggong opened a new issue, #5149: URL: https://github.com/apache/seatunnel/issues/5149 ### Search before asking - [X] I had searched in the [feature](https://github.com/apache/seatunnel/issues?q=is%3Aissue+label%3A%22Feature%22) and found no similar feature requirement. ### Description I started a mysql_cdc task, but I don't know what command to stop it ### Usage Scenario _No response_ ### Related issues _No response_ ### Are you willing to submit a PR? - [ ] Yes I am willing to submit a PR! ### Code of Conduct - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@seatunnel.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [seatunnel] bingquanzhao commented on a diff in pull request #5086: [Feature-WIP][Connector-V2][Doris] Add Doris ConnectorV2 Source
bingquanzhao commented on code in PR #5086: URL: https://github.com/apache/seatunnel/pull/5086#discussion_r1272925705 ## seatunnel-connectors-v2/connector-doris/src/main/java/org/apache/seatunnel/connectors/doris/rest/RestService.java: ## @@ -524,7 +531,7 @@ public static List findPartitions(DorisConfig dorisConfig, return tabletsMapToPartition( dorisConfig, be2Tablets, -queryPlan.getOpaquedQueryPlan(), +queryPlan.getOpaqued_query_plan(), Review Comment: The QueryPlan returned by doris fe is a json format, which is directly mapped to a QueryPlan object, and the field inside is this "opaqued_query_plan" -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@seatunnel.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [seatunnel] EricJoy2048 commented on pull request #5101: [Doc] Improve S3File Source & S3File Sink document
EricJoy2048 commented on PR #5101: URL: https://github.com/apache/seatunnel/pull/5101#issuecomment-1648925829 > LGTM It would be better to fix these typos. Thanks, I finished it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@seatunnel.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [seatunnel] hailin0 commented on pull request #4516: [Feature][Connector-V2] connector-kafka source support data conversion extracted by kafka connect source
hailin0 commented on PR #4516: URL: https://github.com/apache/seatunnel/pull/4516#issuecomment-1648958563 @sunxiaojian please resolve the conflict -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@seatunnel.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [seatunnel] hailin0 commented on pull request #3981: [Feature][connector][kafka] Support read debezium format message from kafka
hailin0 commented on PR #3981: URL: https://github.com/apache/seatunnel/pull/3981#issuecomment-1648960791 Sorry, this PR has been merged https://github.com/apache/seatunnel/pull/5066 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@seatunnel.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [seatunnel] TaoZex closed pull request #3981: [Feature][connector][kafka] Support read debezium format message from kafka
TaoZex closed pull request #3981: [Feature][connector][kafka] Support read debezium format message from kafka URL: https://github.com/apache/seatunnel/pull/3981 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@seatunnel.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [seatunnel] hailin0 opened a new pull request, #5150: [Feature][CDC] Support tables without primary keys (with unique keys)…
hailin0 opened a new pull request, #5150: URL: https://github.com/apache/seatunnel/pull/5150 … (#163) ## Purpose of this pull request ## Check list * [ ] Code changed are covered with tests, or it does not need tests for reason: * [ ] If any new Jar binary package adding in your PR, please add License Notice according [New License Guide](https://github.com/apache/seatunnel/blob/dev/docs/en/contribution/new-license.md) * [ ] If necessary, please update the documentation to describe the new feature. https://github.com/apache/seatunnel/tree/dev/docs * [ ] If you are contributing the connector code, please check that the following files are updated: 1. Update change log that in connector document. For more details you can refer to [connector-v2](https://github.com/apache/seatunnel/tree/dev/docs/en/connector-v2) 2. Update [plugin-mapping.properties](https://github.com/apache/seatunnel/blob/dev/plugin-mapping.properties) and add new connector information in it 3. Update the pom file of [seatunnel-dist](https://github.com/apache/seatunnel/blob/dev/seatunnel-dist/pom.xml) * [ ] Update the [`release-note`](https://github.com/apache/seatunnel/blob/dev/release-note.md). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@seatunnel.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [seatunnel] hailin0 commented on pull request #5150: [Feature][CDC] Support tables without primary keys (with unique keys)
hailin0 commented on PR #5150: URL: https://github.com/apache/seatunnel/pull/5150#issuecomment-1648981927 PTAL -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@seatunnel.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [seatunnel] Hisoka-X commented on pull request #4778: [Improve][Docs][Kafka]Reconstruct the kafka connector document
Hisoka-X commented on PR #4778: URL: https://github.com/apache/seatunnel/pull/4778#issuecomment-1649194711 Hi, please resolve conflict, thanks. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@seatunnel.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [seatunnel] Hisoka-X commented on pull request #5118: [Bug][Improve][LocalFileSink]Fix LocalFile Sink file_format_type.
Hisoka-X commented on PR #5118: URL: https://github.com/apache/seatunnel/pull/5118#issuecomment-1649197990 Over LGTM, maybe you should take a look @TyrantLucifer -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@seatunnel.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org