Re: [I] [Bug] [seatunnel-engine] close pending status job failed [seatunnel]
github-actions[bot] commented on issue #8564: URL: https://github.com/apache/seatunnel/issues/8564#issuecomment-2673032340 This issue has been automatically marked as stale because it has not had recent activity for 30 days. It will be closed in the next 7 days if no further activity occurs. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@seatunnel.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] [Bug] [MYSQL CDC] Automatic conversion of the geometry type to the BYTES type fails [seatunnel]
github-actions[bot] commented on issue #8548: URL: https://github.com/apache/seatunnel/issues/8548#issuecomment-2673032464 This issue has been automatically marked as stale because it has not had recent activity for 30 days. It will be closed in the next 7 days if no further activity occurs.
Re: [I] [Feature][Connector - V2] Support for AWS Glue Data Catalog in Apache Iceberg connector [seatunnel]
github-actions[bot] commented on issue #8559: URL: https://github.com/apache/seatunnel/issues/8559#issuecomment-2673032418 This issue has been automatically marked as stale because it has not had recent activity for 30 days. It will be closed in the next 7 days if no further activity occurs.
Re: [I] [Bug] [Paimon] The data has repeated after the dynamic hash table write. [seatunnel]
github-actions[bot] commented on issue #8565: URL: https://github.com/apache/seatunnel/issues/8565#issuecomment-2673032257 This issue has been automatically marked as stale because it has not had recent activity for 30 days. It will be closed in the next 7 days if no further activity occurs.
Re: [PR] [improve] update localfile connector config [seatunnel]
liunaijie commented on code in PR #8765: URL: https://github.com/apache/seatunnel/pull/8765#discussion_r1964868961 ## seatunnel-connectors-v2/connector-file/connector-file-local/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/local/source/config/LocalFileSourceConfig.java: ## @@ -37,7 +36,7 @@ public HadoopConf getHadoopConfig() { @Override public String getPluginName() { -return FileSystemType.LOCAL.getFileSystemPluginName(); +return "LocalFile"; Review Comment: This change is not necessary. ## seatunnel-connectors-v2/connector-file/connector-file-base/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/config/BaseSinkConfig.java: ## @@ -35,7 +33,7 @@ import static org.apache.seatunnel.api.sink.DataSaveMode.DROP_DATA; import static org.apache.seatunnel.api.sink.DataSaveMode.ERROR_WHEN_DATA_EXISTS; -public class BaseSinkConfig { +public class BaseSinkConfig extends FileBaseOptions { Review Comment: I suggest we adopt a unified class naming style. How about renaming to - `FileBaseSinkOptions` - `FileBaseSourceOptions` - `FileBaseOptions`
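The reviewer's naming proposal can be sketched as a tiny hierarchy. Everything below is illustrative: the fields are placeholders, and only the three class names come from the review comment.

```java
// Hypothetical sketch of the proposed naming scheme: a shared
// FileBaseOptions class, extended by source- and sink-specific classes.
class FileBaseOptions {
    protected String filePath; // option shared by source and sink (placeholder)
}

class FileBaseSourceOptions extends FileBaseOptions {
    protected String fileFilterPattern; // source-only option (placeholder)
}

class FileBaseSinkOptions extends FileBaseOptions {
    protected String tmpPath; // sink-only option (placeholder)
}
```

The point of the split is that common file options live in one place while source- and sink-only options stay out of each other's way.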
Re: [PR] [improve] RabbitMq connector options [seatunnel]
liunaijie commented on PR #8699: URL: https://github.com/apache/seatunnel/pull/8699#issuecomment-2673157897 Hi @akulabs8, I will close this PR since the changes have already been merged in [PR #8740](https://github.com/apache/seatunnel/pull/8740). Feel free to contribute with another PR!
Re: [PR] [improve] RabbitMq connector options [seatunnel]
liunaijie closed pull request #8699: [improve] RabbitMq connector options URL: https://github.com/apache/seatunnel/pull/8699
(seatunnel) branch dev updated: [Fix][Connector-V2]Fix Descriptions for CUSTOM_SQL in Connector (#8778)
This is an automated email from the ASF dual-hosted git repository. fanjia pushed a commit to branch dev in repository https://gitbox.apache.org/repos/asf/seatunnel.git The following commit(s) were added to refs/heads/dev by this push: new 96b610eb7e [Fix][Connector-V2]Fix Descriptions for CUSTOM_SQL in Connector (#8778) 96b610eb7e is described below commit 96b610eb7ec7ac0834e34ce3e693c04047a4bc18 Author: xiaochen <598457...@qq.com> AuthorDate: Fri Feb 21 10:04:29 2025 +0800 [Fix][Connector-V2]Fix Descriptions for CUSTOM_SQL in Connector (#8778) --- docs/en/connector-v2/sink/Clickhouse.md | 1 + docs/zh/connector-v2/sink/Clickhouse.md | 1 + .../connectors/seatunnel/clickhouse/config/ClickhouseSinkOptions.java | 2 +- .../connectors/seatunnel/starrocks/config/StarRocksSinkOptions.java | 2 +- 4 files changed, 4 insertions(+), 2 deletions(-) diff --git a/docs/en/connector-v2/sink/Clickhouse.md b/docs/en/connector-v2/sink/Clickhouse.md index 0837f76203..e01f6b6ee9 100644 --- a/docs/en/connector-v2/sink/Clickhouse.md +++ b/docs/en/connector-v2/sink/Clickhouse.md @@ -61,6 +61,7 @@ They can be downloaded via install-plugin.sh or from the Maven central repositor | allow_experimental_lightweight_delete | Boolean | No | false | Allow experimental lightweight delete based on `*MergeTree` table engine. | | schema_save_mode | Enum| no | CREATE_SCHEMA_WHEN_NOT_EXIST | Schema save mode. Please refer to the `schema_save_mode` section below. | | data_save_mode | Enum| no | APPEND_DATA | Data save mode. Please refer to the `data_save_mode` section below. | +| custom_sql | String | no | - | When data_save_mode selects CUSTOM_PROCESSING, you should fill in the CUSTOM_SQL parameter. This parameter usually fills in a SQL that can be executed. SQL will be executed before synchronization tasks.| | save_mode_create_template | string | no | see below | See below. 
| | common-options| | No | - | Sink plugin common parameters, please refer to [Sink Common Options](../sink-common-options.md) for details. | diff --git a/docs/zh/connector-v2/sink/Clickhouse.md b/docs/zh/connector-v2/sink/Clickhouse.md index 3dc04ce962..48190c8b55 100644 --- a/docs/zh/connector-v2/sink/Clickhouse.md +++ b/docs/zh/connector-v2/sink/Clickhouse.md @@ -60,6 +60,7 @@ | allow_experimental_lightweight_delete | Boolean | No | false | 允许基于`MergeTree`表引擎实验性轻量级删除. | | schema_save_mode | Enum| no | CREATE_SCHEMA_WHEN_NOT_EXIST | schema保存模式,请参考下面的`schema_save_mode` | | data_save_mode | Enum| no | APPEND_DATA | 数据保存模式,请参考下面的`data_save_mode`。 | +| custom_sql | String | no | - | 当data_save_mode设置为CUSTOM_PROCESSING时,必须同时设置CUSTOM_SQL参数。CUSTOM_SQL的值为可执行的SQL语句,在同步任务开启前SQL将会被执行 | | save_mode_create_template | string | no | see below | 见下文。 | | common-options| | No | - | Sink插件查用参数,详见[Sink常用选项](../sink-common-options.md). | diff --git a/seatunnel-connectors-v2/connector-clickhouse/src/main/java/org/apache/seatunnel/connectors/seatunnel/clickhouse/config/ClickhouseSinkOptions.java b/seatunnel-connectors-v2/connector-clickhouse/src/main/java/org/apache/seatunnel/con
Re: [PR] [Fix][Connector-V2]Fix Descriptions for CUSTOM_SQL in Connector [seatunnel]
Hisoka-X merged PR #8778: URL: https://github.com/apache/seatunnel/pull/8778
Re: [PR] [Improve][Connector-V2] Optimizes default value of jdbc fetch_size [seatunnel]
wuchunfu commented on PR #8774: URL: https://github.com/apache/seatunnel/pull/8774#issuecomment-2670840094 @corgy-w Please also update the document, thanks.
[I] [Bug] [MongoDB] The collStats value is too large; the Long type cannot handle string values in scientific notation [seatunnel]
qifanlili opened a new issue, #8775: URL: https://github.com/apache/seatunnel/issues/8775 ### Search before asking - [x] I had searched in the [issues](https://github.com/apache/seatunnel/issues?q=is%3Aissue+label%3A%22bug%22) and found no similar issues. ### What happened The collStats value is too large; the Long type cannot handle string values expressed in scientific notation. The error occurs here: https://github.com/user-attachments/assets/db6d6a06-e285-4b3b-a7f9-ae74e2ce8601 ### SeaTunnel Version 2.3.9 ### SeaTunnel Config ```conf env { job.mode = "BATCH" spark.yarn.queue = "root.jwth.sync" spark.task.maxFailures = 1 spark.executor.memory = 4g spark.executor.cores = 1 spark.yarn.maxAppAttempts = 1 spark.shuffle.service.enabled=true } source { MongoDB { parallelism = 3 uri = mongodb://user:password@hosts:27017/database?readPreference=secondary&slaveOk=true. database = db_name collection = student_detail schema = { fields { "_id" = "String" } } } } sink { Console {} } ``` ### Running Command ```shell sh /opt/seatunnel/bin/start-seatunnel-spark-3-connector-v2.sh --user username --master yarn --deploy-mode cluster --name jobName --config mongo2hive.conf ``` ### Error Exception ```log Caused by: java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.RuntimeException: SourceSplitEnumerator run failed. 
at org.apache.seatunnel.translation.spark.source.partition.batch.SeaTunnelBatchPartitionReader.next(SeaTunnelBatchPartitionReader.java:38) at org.apache.spark.sql.execution.datasources.v2.PartitionIterator.hasNext(DataSourceRDD.scala:119) at org.apache.spark.sql.execution.datasources.v2.MetricsIterator.hasNext(DataSourceRDD.scala:156) at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.$anonfun$hasNext$1(DataSourceRDD.scala:63) at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.$anonfun$hasNext$1$adapted(DataSourceRDD.scala:63) at scala.Option.exists(Option.scala:376) at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.hasNext(DataSourceRDD.scala:63) at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.advanceToNextIter(DataSourceRDD.scala:97) at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.hasNext(DataSourceRDD.scala:63) at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37) at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source) at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:760) at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.$anonfun$run$1(WriteToDataSourceV2Exec.scala:435) at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1538) at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.run(WriteToDataSourceV2Exec.scala:480) at org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.$anonfun$writeWithV2$2(WriteToDataSourceV2Exec.scala:381) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) at org.apache.spark.scheduler.Task.run(Task.scala:136) at 
org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548) at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: java.util.concurrent.ExecutionException: java.lang.RuntimeException: SourceSplitEnumerator run failed. at java.util.concurrent.FutureTask.report(FutureTask.java:122) at java.util.concurrent.FutureTask.get(FutureTask.java:192) at org.apache.seatunnel.translation.source.ParallelSource.run(ParallelSource.java:142) at org.apache.seatunnel.translation.spark.source.partition.batch.ParallelBatchPartitionReader.lambda$prepare$0(ParallelBatchPartitionReader.java:117) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.
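The failure reported in this issue comes down to `Long.parseLong` rejecting scientific-notation strings such as the ones `collStats` can return for very large collections. A minimal sketch of a `BigDecimal`-based workaround follows; the class and method names are illustrative, not SeaTunnel code.

```java
import java.math.BigDecimal;

class SciNotationToLong {

    // Long.parseLong("9.4273917E7") throws NumberFormatException,
    // but BigDecimal parses scientific notation and converts exactly
    // to long when the value has no fractional part.
    static long parseToLong(String value) {
        return new BigDecimal(value).longValueExact();
    }

    public static void main(String[] args) {
        System.out.println(parseToLong("9.4273917E7")); // prints 94273917
    }
}
```

`longValueExact` also guards against silent truncation: it throws `ArithmeticException` if the value has a fractional part or overflows `long`, which is usually what a connector wants for a stats field.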
Re: [I] [Feature][FTP/SFTP] FTP/SFTP connectors add validation file when data transform ended [seatunnel]
liunaijie commented on issue #8773: URL: https://github.com/apache/seatunnel/issues/8773#issuecomment-2671026163 Hi, @18859108815 Welcome to contribute!
[PR] [Fix][Connector-V2]Fix Descriptions for CUSTOM_SQL in Connector [seatunnel]
xiaochen-zhou opened a new pull request, #8778: URL: https://github.com/apache/seatunnel/pull/8778 ### Purpose of this pull request ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? ### Check list * [ ] If any new Jar binary package adding in your PR, please add License Notice according [New License Guide](https://github.com/apache/seatunnel/blob/dev/docs/en/contribution/new-license.md) * [ ] If necessary, please update the documentation to describe the new feature. https://github.com/apache/seatunnel/tree/dev/docs * [ ] If you are contributing the connector code, please check that the following files are updated: 1. Update [plugin-mapping.properties](https://github.com/apache/seatunnel/blob/dev/plugin-mapping.properties) and add new connector information in it 2. Update the pom file of [seatunnel-dist](https://github.com/apache/seatunnel/blob/dev/seatunnel-dist/pom.xml) 3. Add ci label in [label-scope-conf](https://github.com/apache/seatunnel/blob/dev/.github/workflows/labeler/label-scope-conf.yml) 4. Add e2e testcase in [seatunnel-e2e](https://github.com/apache/seatunnel/tree/dev/seatunnel-e2e/seatunnel-connector-v2-e2e/) 5. Update connector [plugin_config](https://github.com/apache/seatunnel/blob/dev/config/plugin_config) * [ ] Update the [`release-note`](https://github.com/apache/seatunnel/blob/dev/release-note.md).
[PR] [Bugfix] Fix ClassCastException of ExceptionUtil [seatunnel]
hailin0 opened a new pull request, #8776: URL: https://github.com/apache/seatunnel/pull/8776 ### Purpose of this pull request https://github.com/user-attachments/assets/d7162242-35b7-4a5e-99c4-3ff3dfc2e73e ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? ### Check list * [ ] If any new Jar binary package adding in your PR, please add License Notice according [New License Guide](https://github.com/apache/seatunnel/blob/dev/docs/en/contribution/new-license.md) * [ ] If necessary, please update the documentation to describe the new feature. https://github.com/apache/seatunnel/tree/dev/docs * [ ] If you are contributing the connector code, please check that the following files are updated: 1. Update [plugin-mapping.properties](https://github.com/apache/seatunnel/blob/dev/plugin-mapping.properties) and add new connector information in it 2. Update the pom file of [seatunnel-dist](https://github.com/apache/seatunnel/blob/dev/seatunnel-dist/pom.xml) 3. Add ci label in [label-scope-conf](https://github.com/apache/seatunnel/blob/dev/.github/workflows/labeler/label-scope-conf.yml) 4. Add e2e testcase in [seatunnel-e2e](https://github.com/apache/seatunnel/tree/dev/seatunnel-e2e/seatunnel-connector-v2-e2e/) 5. Update connector [plugin_config](https://github.com/apache/seatunnel/blob/dev/config/plugin_config) * [ ] Update the [`release-note`](https://github.com/apache/seatunnel/blob/dev/release-note.md).
[PR] [Fix][Connector-V2]Fix Descriptions for CUSTOM_SQL in Connector [seatunnel]
xiaochen-zhou opened a new pull request, #8777: URL: https://github.com/apache/seatunnel/pull/8777 ### Purpose of this pull request ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? ### Check list * [ ] If any new Jar binary package adding in your PR, please add License Notice according [New License Guide](https://github.com/apache/seatunnel/blob/dev/docs/en/contribution/new-license.md) * [ ] If necessary, please update the documentation to describe the new feature. https://github.com/apache/seatunnel/tree/dev/docs * [ ] If you are contributing the connector code, please check that the following files are updated: 1. Update [plugin-mapping.properties](https://github.com/apache/seatunnel/blob/dev/plugin-mapping.properties) and add new connector information in it 2. Update the pom file of [seatunnel-dist](https://github.com/apache/seatunnel/blob/dev/seatunnel-dist/pom.xml) 3. Add ci label in [label-scope-conf](https://github.com/apache/seatunnel/blob/dev/.github/workflows/labeler/label-scope-conf.yml) 4. Add e2e testcase in [seatunnel-e2e](https://github.com/apache/seatunnel/tree/dev/seatunnel-e2e/seatunnel-connector-v2-e2e/) 5. Update connector [plugin_config](https://github.com/apache/seatunnel/blob/dev/config/plugin_config) * [ ] Update the [`release-note`](https://github.com/apache/seatunnel/blob/dev/release-note.md).
[PR] [Improve][Connector-V2] Optimizes default value of jdbc fetch_size [seatunnel]
corgy-w opened a new pull request, #8774: URL: https://github.com/apache/seatunnel/pull/8774 ### Purpose of this pull request ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? ### Check list * [ ] If any new Jar binary package adding in your PR, please add License Notice according [New License Guide](https://github.com/apache/seatunnel/blob/dev/docs/en/contribution/new-license.md) * [ ] If necessary, please update the documentation to describe the new feature. https://github.com/apache/seatunnel/tree/dev/docs * [ ] If you are contributing the connector code, please check that the following files are updated: 1. Update [plugin-mapping.properties](https://github.com/apache/seatunnel/blob/dev/plugin-mapping.properties) and add new connector information in it 2. Update the pom file of [seatunnel-dist](https://github.com/apache/seatunnel/blob/dev/seatunnel-dist/pom.xml) 3. Add ci label in [label-scope-conf](https://github.com/apache/seatunnel/blob/dev/.github/workflows/labeler/label-scope-conf.yml) 4. Add e2e testcase in [seatunnel-e2e](https://github.com/apache/seatunnel/tree/dev/seatunnel-e2e/seatunnel-connector-v2-e2e/) 5. Update connector [plugin_config](https://github.com/apache/seatunnel/blob/dev/config/plugin_config) * [ ] Update the [`release-note`](https://github.com/apache/seatunnel/blob/dev/release-note.md).
Re: [PR] [Fix][Connector-V2]Fix Descriptions for CUSTOM_SQL in Connector [seatunnel]
xiaochen-zhou closed pull request #8777: [Fix][Connector-V2]Fix Descriptions for CUSTOM_SQL in Connector URL: https://github.com/apache/seatunnel/pull/8777
Re: [PR] [Feature][Connector-V2] Add Handling Logic for HBase Asynchronous Data Write Failures [seatunnel]
xiaochen-zhou closed pull request #8279: [Feature][Connector-V2] Add Handling Logic for HBase Asynchronous Data Write Failures URL: https://github.com/apache/seatunnel/pull/8279
Re: [PR] [Improve][Connector-V2] Improve Paimon source split enumerator [seatunnel]
xiaochen-zhou closed pull request #6766: [Improve][Connector-V2] Improve Paimon source split enumerator URL: https://github.com/apache/seatunnel/pull/6766
[I] [Feature][FTP/SFTP] FTP/SFTP connectors add validation file when data transform ended [seatunnel]
18859108815 opened a new issue, #8773: URL: https://github.com/apache/seatunnel/issues/8773 ### Search before asking - [x] I had searched in the [feature](https://github.com/apache/seatunnel/issues?q=is%3Aissue+label%3A%22Feature%22) and found no similar feature requirement. ### Description After an FTP or SFTP file transfer completes, generate an additional validation file. Requirements: ``` 1. The validation file contains the data file name, the data file size, the number of records in the data file (text/csv), the data event time, the generation time of the data file, and so on; if there are multiple data files, multiple records are written into the same validation file. 2. Support a custom suffix for the validation file. 3. Support a custom validation file name. ``` ### Usage Scenario _No response_ ### Related issues _No response_ ### Are you willing to submit a PR? - [ ] Yes I am willing to submit a PR! ### Code of Conduct - [x] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
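The requested validation (manifest) file could look roughly like the sketch below: one pipe-separated record per data file with name, size, record count, and generation time, and a caller-supplied file name and extension. All names, the record layout, and the `.verf` extension are assumptions for illustration, not an existing SeaTunnel feature.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Instant;
import java.util.List;
import java.util.stream.Stream;

class ValidationFileWriter {

    // One record per data file: name|size|recordCount|generatedAt
    // (requirement 1); the validation file name and extension are
    // caller-supplied (requirements 2 and 3).
    static Path write(Path dir, String name, String ext, List<Path> dataFiles)
            throws IOException {
        StringBuilder sb = new StringBuilder();
        for (Path f : dataFiles) {
            long records;
            try (Stream<String> lines = Files.lines(f)) {
                records = lines.count(); // record count for text/csv files
            }
            sb.append(f.getFileName()).append('|')
              .append(Files.size(f)).append('|')
              .append(records).append('|')
              .append(Instant.now()).append('\n');
        }
        Path out = dir.resolve(name + "." + ext);
        Files.writeString(out, sb.toString());
        return out;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("ftp-out");
        Path data = Files.writeString(dir.resolve("data.csv"), "a,1\nb,2\n");
        Path check = write(dir, "data_check", "verf", List.of(data));
        System.out.print(Files.readString(check));
    }
}
```

In a real connector this would run once per commit, after all data files for the batch are finalized, so the manifest never references a partially written file.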
Re: [PR] [Improve][Connector-V2] Optimizes default value of jdbc fetch_size [seatunnel]
corgy-w commented on PR #8774: URL: https://github.com/apache/seatunnel/pull/8774#issuecomment-2670889619 > @corgy-w Please also update the document, thanks. Got it.
Re: [PR] [Fix][Connector-V2]Fix Descriptions for CUSTOM_SQL in Connector [seatunnel]
hailin0 commented on code in PR #8778: URL: https://github.com/apache/seatunnel/pull/8778#discussion_r1963437456 ## docs/en/connector-v2/sink/Clickhouse.md: ## @@ -61,6 +61,7 @@ They can be downloaded via install-plugin.sh or from the Maven central repositor | allow_experimental_lightweight_delete | Boolean | No | false | Allow experimental lightweight delete based on `*MergeTree` table engine. | | schema_save_mode | Enum| no | CREATE_SCHEMA_WHEN_NOT_EXIST | Schema save mode. Please refer to the `schema_save_mode` section below. | | data_save_mode | Enum| no | APPEND_DATA | Data save mode. Please refer to the `data_save_mode` section below. | +| custom_sql | String | no | - | When data_save_mode selects CUSTOM_PROCESSING, you should fill in the CUSTOM_SQL parameter. This parameter usually fills in a SQL that can be executed. SQL will be executed before synchronization tasks.| Review Comment: Please update the zh docs as well.
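For readers following along, the documented option pairs with `data_save_mode` roughly as in this sketch; the host, database, table, and SQL values are illustrative placeholders, not taken from the PR.

```conf
sink {
  Clickhouse {
    host = "localhost:8123"
    database = "default"
    table = "sink_table"
    # custom_sql is only consulted in CUSTOM_PROCESSING mode.
    data_save_mode = "CUSTOM_PROCESSING"
    # Executed once, before the synchronization task starts.
    custom_sql = "TRUNCATE TABLE default.sink_table"
  }
}
```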
Re: [PR] [Improve][Jdbc] Optimize source data dynamic splitting when where_condition is configured [seatunnel]
davidzollo commented on PR #8760: URL: https://github.com/apache/seatunnel/pull/8760#issuecomment-2673747278 Good job! Please add related unit tests, thanks.
Re: [PR] [Improve][Connector-V2] Optimizes default value of jdbc fetch_size [seatunnel]
corgy-w closed pull request #8774: [Improve][Connector-V2] Optimizes default value of jdbc fetch_size URL: https://github.com/apache/seatunnel/pull/8774
Re: [I] [Bug] [MYSQL CDC] Automatic conversion of the geometry type to the BYTES type fails [seatunnel]
Quilian commented on issue #8548: URL: https://github.com/apache/seatunnel/issues/8548#issuecomment-2673082921 help
Re: [PR] [Feature][Connector-V2] Add `filename_extension` parameter for read/write file [seatunnel]
Hisoka-X commented on code in PR #8769: URL: https://github.com/apache/seatunnel/pull/8769#discussion_r1964668736 ## seatunnel-connectors-v2/connector-file/connector-file-base/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/sink/writer/AbstractWriteStrategy.java: ## @@ -224,8 +224,16 @@ public LinkedHashMap> generatorPartitionDir(SeaTunnelRow se public final String generateFileName(String transactionId) { String fileNameExpression = fileSinkConfig.getFileNameExpression(); FileFormat fileFormat = fileSinkConfig.getFileFormat(); -String suffix = fileFormat.getSuffix(); -suffix = compressFormat.getCompressCodec() + suffix; +String suffix; +if (StringUtils.isNotEmpty(fileSinkConfig.getFilenameExtension())) { +suffix = +fileSinkConfig.getFilenameExtension().startsWith(".") +? fileSinkConfig.getFilenameExtension() +: "." + fileSinkConfig.getFilenameExtension(); Review Comment: I'm not sure if this is appropriate, as the `.` prefix is pretty common rather than connector-specific. Maybe you could start a discussion on the mailing list to see what others think? Either changing it or keeping it makes sense to me. PS: If we do need to change it, we can open another PR.
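Standalone, the extension-normalization branch under discussion reduces to the following behavior. This is a sketch of the diff's logic outside its SeaTunnel context; the class and method names are invented for illustration.

```java
class FilenameExtension {

    // Mirrors the diff: accept "csv" or ".csv" and always emit ".csv",
    // so users may configure the extension with or without the dot.
    static String normalize(String extension) {
        return extension.startsWith(".") ? extension : "." + extension;
    }

    public static void main(String[] args) {
        System.out.println(normalize("csv"));  // prints .csv
        System.out.println(normalize(".csv")); // prints .csv
    }
}
```

The review's question is whether accepting both forms is friendly flexibility or hides user typos; the normalization itself is a single ternary either way.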
[I] [Bug] [Module Name] Spark changed the metadata source information of Hudi after writing data. Seatunnel reported an error when writing data and read Hudi's metadata information [seatunnel]
lm520hy opened a new issue, #8781: URL: https://github.com/apache/seatunnel/issues/8781 ### Search before asking - [x] I had searched in the [issues](https://github.com/apache/seatunnel/issues?q=is%3Aissue+label%3A%22bug%22) and found no similar issues. ### What happened Spark changed the metadata source information of Hudi after writing data. SeaTunnel then reported an error when writing data and reading Hudi's metadata information. ### SeaTunnel Version 2.3.8 ### SeaTunnel Config ```conf env { parallelism = 1 job.mode = "BATCH" } source { FakeSource { parallelism = 1 result_table_name = "fake2" row.num = 16 schema = { fields { id = "int" name = "string" price = "double" ts = "bigint" } } rows = [ { kind = INSERT fields = [ 7,"l", 1100,117] } ] } } sink { Hudi { table_dfs_path = "hdfs:///hudi/" table_name = "hudi_mor_tbl2" table_type = "COPY_ON_WRITE" conf_files_path = "/soft/hadoop/etc/hadoop/hdfs-site.xml;/soft/hadoop/etc/hadoop/core-site.xml;/soft/hadoop/etc/hadoop/yarn-site.xml" batch_size = 1 } ``` ### Running Command ```shell env { parallelism = 1 job.mode = "BATCH" } source { FakeSource { parallelism = 1 result_table_name = "fake2" row.num = 16 schema = { fields { id = "int" name = "string" price = "double" ts = "bigint" } } rows = [ { kind = INSERT fields = [ 7,"l", 1100,117] } ] } } sink { Hudi { table_dfs_path = "hdfs:///hudi/" table_name = "hudi_mor_tbl2" table_type = "COPY_ON_WRITE" conf_files_path = "/soft/hadoop/etc/hadoop/hdfs-site.xml;/soft/hadoop/etc/hadoop/core-site.xml;/soft/hadoop/etc/hadoop/yarn-site.xml" batch_size = 1 } ``` ### Error Exception ```log 2025-02-20 19:27:42,144 INFO [a.h.c.t.t.HoodieActiveTimeline] [st-multi-table-sink-writer-2] - Loaded instants upto : Option{val=[20250220192742010__clean__COMPLETED__20250220192742120]} 2025-02-20 19:27:42,145 INFO [o.a.h.c.t.HoodieTableConfig ] [st-multi-table-sink-writer-2] - Loading table properties from 
hdfs:/hudi/default/hudi_mor_tbl2/.hoodie/hoodie.properties 2025-02-20 19:27:42,147 WARN [o.a.s.e.s.TaskExecutionService] [BlockingWorker-TaskGroupLocation{jobId=944918221583024129, pipelineId=1, taskGroupId=5}] - [localhost]:5801 [seatunnel-825957] [5.1] Exception in org.apache.seatunnel.engine.server.task.TransformSeaTunnelTask@1f445812 java.lang.RuntimeException: java.lang.RuntimeException: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieException: Error limiting instant archival based on metadata table at org.apache.seatunnel.engine.server.task.flow.SinkFlowLifeCycle.received(SinkFlowLifeCycle.java:253) ~[seatunnel-starter.jar:2.3.8] at org.apache.seatunnel.engine.server.task.flow.SinkFlowLifeCycle.received(SinkFlowLifeCycle.java:66) ~[seatunnel-starter.jar:2.3.8] at org.apache.seatunnel.engine.server.task.SeaTunnelTransformCollector.collect(SeaTunnelTransformCollector.java:39) ~[seatunnel-starter.jar:2.3.8] at org.apache.seatunnel.engine.server.task.SeaTunnelTransformCollector.collect(SeaTunnelTransformCollector.java:27) ~[seatunnel-starter.jar:2.3.8] at org.apache.seatunnel.engine.server.task.group.queue.IntermediateBlockingQueue.handleRecord(IntermediateBlockingQueue.java:70) ~[seatunnel-starter.jar:2.3.8] at org.apache.seatunnel.engine.server.task.group.queue.IntermediateBlockingQueue.collect(IntermediateBlockingQueue.java:50) ~[seatunnel-starter.jar:2.3.8] at org.apache.seatunnel.engine.server.task.flow.IntermediateQueueFlowLifeCycle.collect(IntermediateQueueFlowLifeCycle.java:51) ~[seatunnel-starter.jar:2.3.8] at org.apache.seatunnel.engine.server.task.TransformSeaTunnelTask.collect(TransformSeaTunnelTask.java:73) ~[seatunnel-starter.jar:2.3.8] at org.apache.seatunnel.engine.server.task.SeaTunnelTask.stateProcess(SeaTunnelTask.java:168) ~[seatunnel-starter.jar:2.3.8] at org.apache.seatunnel.engine.server.task.TransformSeaTunnelTask.call(TransformSeaTunnelTask.java:78) ~[seatunnel-starter.jar:2.3.8] at 
org.apache.seatunnel.engine.server.TaskExecutionService$BlockingWorker.run(TaskExecutionService.java:693) [seatunnel-starter.jar:2.3.8] at org.apache.seatunnel.engine.server.TaskExecutionService$NamedTaskWrapper.run(TaskExecutionService.java:1018) [seatunnel-starter.jar:2.3.8] at org.apache.seatunnel.api.tracing.MDCRunna
Re: [PR] [Improve][Connector-V2] Optimizes default value of jdbc fetch_size [seatunnel]
Hisoka-X commented on code in PR #8774: URL: https://github.com/apache/seatunnel/pull/8774#discussion_r1964965488

## seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/config/JdbcOptions.java:

```diff
@@ -84,7 +84,7 @@ public interface JdbcOptions {
     Option<Integer> FETCH_SIZE =
             Options.key("fetch_size")
                     .intType()
-                    .defaultValue(0)
+                    .defaultValue(1024)
```

Review Comment: Changing the default value may cause loss of optimization logic inside the driver. It is better to keep it as 0 and let the driver decide the default value itself.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@seatunnel.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
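The review comment above hinges on JDBC's fetch-size convention: a `Statement.setFetchSize(0)` (or never calling it) tells the driver to use its own default, so a connector that keeps 0 as its option default preserves driver-side optimizations. A minimal sketch of that resolution logic follows; `resolveFetchSize` and `FetchSizeExample` are illustrative names, not part of SeaTunnel's API.

```java
// Sketch: resolving a connector's fetch_size option in the JDBC convention,
// where 0 means "let the driver decide" and is passed through untouched.
public class FetchSizeExample {
    static final int DEFAULT_FETCH_SIZE = 0; // 0 = driver default, per java.sql.Statement

    static int resolveFetchSize(Integer configured) {
        if (configured == null) {
            return DEFAULT_FETCH_SIZE; // unset: keep the driver's own behavior
        }
        if (configured < 0) {
            throw new IllegalArgumentException("fetch_size must be >= 0");
        }
        return configured; // explicit value, e.g. stmt.setFetchSize(1024)
    }

    public static void main(String[] args) {
        System.out.println(resolveFetchSize(null)); // prints 0
        System.out.println(resolveFetchSize(1024)); // prints 1024
    }
}
```

In real use the resolved value would be applied with `statement.setFetchSize(...)` before executing the query; drivers such as MySQL's additionally attach special meaning to values like `Integer.MIN_VALUE` (row-by-row streaming), which is another reason not to second-guess the driver default.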
Re: [PR] [Doc][Improve] support chinese [docs/zh/connector-v2/sink/TDengine.md] [seatunnel]
Hisoka-X commented on code in PR #8732: URL: https://github.com/apache/seatunnel/pull/8732#discussion_r1964721109

## docs/zh/connector-v2/sink/TDengine.md:

```diff
@@ -0,0 +1,71 @@
+# TDengine
+
+> TDengine 数据接收器
+
+## 描述
+
+用于将数据写入TDengine. 在运行 seatunnel 任务之前,你需要创建稳定的环境。
+
+## 主要特性
+
+- [x] [exactly-once](../../concept/connector-v2-features.md)
+- [ ] [cdc](../../concept/connector-v2-features.md)
```

Review Comment: oh my mistake. I checked the source doc.
Re: [PR] [Bugfix] Fix ClassCastException of ExceptionUtil [seatunnel]
liunaijie merged PR #8776: URL: https://github.com/apache/seatunnel/pull/8776
Re: [PR] [Feature] [Zeta] Client print event which received from jobmaster [seatunnel]
Hisoka-X commented on code in PR #8696: URL: https://github.com/apache/seatunnel/pull/8696#discussion_r1964742072

## docs/en/seatunnel-engine/rest-api-v1.md:

```diff
@@ -843,4 +843,46 @@
 Returns a list of logs from the requested node.
 
 To get a list of logs from the current node: `http://localhost:5801/hazelcast/rest/maps/log`
 To get the content of a log file: `http://localhost:5801/hazelcast/rest/maps/log/job-898380162133917698.log`
+
+------
```

Review Comment: Let's not add new features for V1, because it is already deprecated.

## seatunnel-engine/seatunnel-engine-server/src/main/java/org/apache/seatunnel/engine/server/master/JobMaster.java:

```diff
@@ -142,6 +144,8 @@ public class JobMaster {
 
     private SeaTunnelServer seaTunnelServer;
 
+    @Getter private ArrayBlockingQueue<Event> events;
+    @Getter private ArrayBlockingQueue<Event> historyEvents;
```

Review Comment: Why need two `ArrayBlockingQueue`?

## seatunnel-engine/seatunnel-engine-server/src/main/java/org/apache/seatunnel/engine/server/operation/GetEventOperation.java:

```java
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.seatunnel.engine.server.operation;

import org.apache.seatunnel.api.event.Event;
import org.apache.seatunnel.engine.common.exception.SeaTunnelEngineException;
import org.apache.seatunnel.engine.common.utils.concurrent.CompletableFuture;
import org.apache.seatunnel.engine.server.CoordinatorService;
import org.apache.seatunnel.engine.server.SeaTunnelServer;
import org.apache.seatunnel.engine.server.master.JobMaster;
import org.apache.seatunnel.engine.server.serializable.ResourceDataSerializerHook;

import com.hazelcast.internal.serialization.Data;
import com.hazelcast.nio.ObjectDataInput;
import com.hazelcast.nio.ObjectDataOutput;
import com.hazelcast.nio.serialization.IdentifiedDataSerializable;
import com.hazelcast.spi.impl.operationservice.Operation;
import lombok.extern.slf4j.Slf4j;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.ExecutionException;

@Slf4j
public class GetEventOperation extends Operation implements IdentifiedDataSerializable {

    private Long jobId;

    private Boolean isAll;

    private List<Event> events = new ArrayList<>();

    public GetEventOperation() {}

    private Data response;

    /**
     * @param jobId job id
     * @param isAll When isAll is true, retrieve all events; otherwise, retrieve the latest event
     */
    public GetEventOperation(Long jobId, boolean isAll) {
        this.jobId = jobId;
        this.isAll = isAll;
    }

    @Override
    public void run() throws Exception {
        if (jobId == null) {
            throw new SeaTunnelEngineException("JobId cannot be null");
        }
        SeaTunnelServer server = getService();
        CoordinatorService coordinatorService = server.getCoordinatorService();

        try {
            response =
                    CompletableFuture.supplyAsync(
                                    () -> retrieveEvents(coordinatorService),
                                    getNodeEngine()
                                            .getExecutionService()
                                            .getExecutor("get_event_operation"))
```

Review Comment: Please do not use a new executor to do this. You can just use the Executor in org.apache.seatunnel.engine.common.utils.concurrent.CompletableFuture

## seatunnel-engine/seatunnel-engine-server/src/main/java/org/apache/seatunnel/engine/server/CoordinatorService.java:

```diff
@@ -150,8 +154,10 @@ public class CoordinatorService {
      * key: job id;
      * value: job master;
      */
-    private final Map<Long, JobMaster> runningJobMasterMap = new ConcurrentHashMap<>();
+    @Getter private final Map<Long, JobMaster> runningJobMasterMap = new ConcurrentHashMap<>();
 
+    @Getter @Setter
+    private Map<Long, ArrayBlockingQueue<Event>> jobEventMap = new ConcurrentHashMap<>();
```

Review Comment: put `ArrayBlockingQueue` into JobMaster?
[I] [Bug] [Module Name] jdbc sink create table failed [seatunnel]
OYRoy opened a new issue, #8782: URL: https://github.com/apache/seatunnel/issues/8782

### Search before asking

- [x] I had searched in the [issues](https://github.com/apache/seatunnel/issues?q=is%3Aissue+label%3A%22bug%22) and found no similar issues.

### What happened

I'm trying to integrate data from multiple PostgreSQL tables into another PostgreSQL table, adding a column that describes which table each row originates from and another column for record batch codes. I used metadata transformation and SQL transformation for this purpose. When using the console sink, these transformations work successfully; however, when switching to the JDBC sink, the creation of the new table fails due to the additional columns. I haven't found any properties in the documentation that allow configuring the JDBC sink to recognize these new columns. Is this a bug? Or how can I correctly configure my settings?

### SeaTunnel Version

2.3.9

### SeaTunnel Config

```conf
env {
  job.mode = "STREAMING"
  parallelism = 4
}
source {
  Jdbc {
    url = "jdbc:postgresql://xxx.xxx.xxx.xxx:/test"
    driver = "org.postgresql.Driver"
    user = ""
    password = ""
    schema_name = "test"
    database_name = "test"
    table_pattern = "^test_[a-z]_\\d{4}$"
    plugin_output = "source_table"
  }
}
transform {
  Sql {
    plugin_input = "source_table"
    query = "select *, '123123abc' as batch_code from source_table"
    plugin_output = "source_table1"
  }
  Metadata {
    plugin_input = "source_table1"
    metadata_fields {
      Table = table
    }
    plugin_output = "source_table2"
  }
}
sink {
  jdbc {
    plugin_input = "source_table2"
    url = "jdbc:postgresql://xxx.xxx.xxx.xxx:/test"
    driver = "org.postgresql.Driver"
    user = "xx"
    password = "x"
    table = "testt.ttest_a_1999"
    generate_sink_sql = true
    database = "test"
    schema_save_mode = "CREATE_SCHEMA_WHEN_NOT_EXIST"
  }
}
```

### Running Command

```shell
./bin/seatunnel.sh --config testconfig -m local
```

### Error Exception

```log
2025-02-21 11:24:29,238 INFO  [.s.c.s.j.c.AbstractJdbcCatalog] [seatunnel-coordinator-service-2] - Execute sql : CREATE TABLE "testt"."ttest_a_1999" (
	"a" varchar(255),
	"b" varchar(255),
	"c" varchar(255),
	"d" varchar(255),
	"e" varchar(255),
	"f" varchar(255),
	"batch_code" null,
	"table" null
);
2025-02-21 11:24:29,242 INFO  [.s.c.s.j.c.AbstractJdbcCatalog] [seatunnel-coordinator-service-2] - Catalog Postgres closing
2025-02-21 11:24:29,242 ERROR [o.a.s.e.s.CoordinatorService  ] [seatunnel-coordinator-service-2] - [localhost]:5801 [seatunnel-271533] [5.1] submit job 945159011949346817 error
org.apache.seatunnel.common.exception.SeaTunnelRuntimeException: ErrorCode:[API-09], ErrorDescription:[Handle save mode failed]
	at org.apache.seatunnel.engine.server.master.JobMaster.handleSaveMode(JobMaster.java:527)
	at org.apache.seatunnel.engine.server.master.JobMaster.handleSaveMode(JobMaster.java:533)
	at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
	at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
	at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:179)
	at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
	at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
	at java.base/java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1939)
	at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
	at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
	at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
	at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
	at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
	at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:596)
	at org.apache.seatunnel.engine.server.master.JobMaster.init(JobMaster.java:256)
	at org.apache.seatunnel.engine.server.CoordinatorService.lambda$submitJob$6(CoordinatorService.java:649)
	at org.apache.seatunnel.api.tracing.MDCRunnable.run(MDCRunnable.java:43)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
	at java.base/java.util.concurrent.ThreadPoolExe
```
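The generated DDL in the log above fails because the two columns added by the SQL and Metadata transforms reach the catalog without a resolved SQL type, so their type is rendered as the literal `null`. A small sketch of that DDL-generation step, with a safe fallback type, illustrates the failure mode; all names below (`DdlFallbackExample`, `columnDefinition`) are illustrative and not SeaTunnel's actual API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: building CREATE TABLE column definitions where some columns have
// no resolved SQL type (as happens for transform-added columns in the report).
// Falling back to varchar(255) avoids emitting the invalid `"batch_code" null`.
public class DdlFallbackExample {
    static String columnDefinition(String name, String resolvedSqlType) {
        // Fall back to a safe default when no type mapping was found.
        String type = (resolvedSqlType == null) ? "varchar(255)" : resolvedSqlType;
        return "\"" + name + "\" " + type;
    }

    public static void main(String[] args) {
        Map<String, String> columns = new LinkedHashMap<>();
        columns.put("a", "varchar(255)");
        columns.put("batch_code", null); // type lost through the SQL transform
        columns.put("table", null);      // metadata column, no mapping either
        StringBuilder ddl = new StringBuilder("CREATE TABLE \"testt\".\"ttest_a_1999\" (\n");
        columns.forEach((n, t) -> ddl.append("  ").append(columnDefinition(n, t)).append(",\n"));
        ddl.setLength(ddl.length() - 2); // drop trailing ",\n"
        ddl.append("\n);");
        System.out.println(ddl);
    }
}
```

Until the catalog does something like this, a practical workaround is to create the target table manually (so `generate_sink_sql` only has to generate INSERTs against an existing schema).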
[I] [Bug] [connector maxcompute] maxcompute connector ClassCastException: java.time.ZonedDateTime cannot be cast to java.util.Date [seatunnel]
liujunfei2230 opened a new issue, #8780: URL: https://github.com/apache/seatunnel/issues/8780

### Search before asking

- [x] I had searched in the [issues](https://github.com/apache/seatunnel/issues?q=is%3Aissue+label%3A%22bug%22) and found no similar issues.

### What happened

A column in the MaxCompute table has type timestamp; the following error is reported when writing to Doris (SeaTunnel 2.3.9):

```log
Exception in thread "main" org.apache.seatunnel.core.starter.exception.CommandExecuteException: SeaTunnel job executed failed
	at org.apache.seatunnel.core.starter.seatunnel.command.ClientExecuteCommand.execute(ClientExecuteCommand.java:228)
	at org.apache.seatunnel.core.starter.SeaTunnel.run(SeaTunnel.java:40)
	at org.apache.seatunnel.core.starter.seatunnel.SeaTunnelClient.main(SeaTunnelClient.java:34)
Caused by: org.apache.seatunnel.engine.common.exception.SeaTunnelEngineException: org.apache.seatunnel.connectors.seatunnel.maxcompute.exception.MaxcomputeConnectorException: ErrorCode:[COMMON-12], ErrorDescription:[Source reader operation failed, such as (open, close) etc...]
	at org.apache.seatunnel.connectors.seatunnel.maxcompute.source.MaxcomputeSourceReader.lambda$pollNext$0(MaxcomputeSourceReader.java:85)
	at java.lang.Iterable.forEach(Iterable.java:75)
	at org.apache.seatunnel.connectors.seatunnel.maxcompute.source.MaxcomputeSourceReader.pollNext(MaxcomputeSourceReader.java:67)
	at org.apache.seatunnel.engine.server.task.flow.SourceFlowLifeCycle.collect(SourceFlowLifeCycle.java:159)
	at org.apache.seatunnel.engine.server.task.SourceSeaTunnelTask.collect(SourceSeaTunnelTask.java:127)
	at org.apache.seatunnel.engine.server.task.SeaTunnelTask.stateProcess(SeaTunnelTask.java:169)
	at org.apache.seatunnel.engine.server.task.SourceSeaTunnelTask.call(SourceSeaTunnelTask.java:132)
	at org.apache.seatunnel.engine.server.TaskExecutionService$BlockingWorker.run(TaskExecutionService.java:694)
	at org.apache.seatunnel.engine.server.TaskExecutionService$NamedTaskWrapper.run(TaskExecutionService.java:1019)
	at org.apache.seatunnel.api.tracing.MDCRunnable.run(MDCRunnable.java:43)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassCastException: java.time.ZonedDateTime cannot be cast to java.util.Date
	at org.apache.seatunnel.connectors.seatunnel.maxcompute.util.MaxcomputeTypeMapper.resolveObject2SeaTunnel(MaxcomputeTypeMapper.java:192)
	at org.apache.seatunnel.connectors.seatunnel.maxcompute.util.MaxcomputeTypeMapper.getSeaTunnelRowData(MaxcomputeTypeMapper.java:66)
	at org.apache.seatunnel.connectors.seatunnel.maxcompute.source.MaxcomputeSourceReader.lambda$pollNext$0(MaxcomputeSourceReader.java:79)
	... 14 more
	at org.apache.seatunnel.core.starter.seatunnel.command.ClientExecuteCommand.execute(ClientExecuteCommand.java:220)
	... 2 more
2025-02-21 11:26:12,176 INFO  [s.c.s.s.c.ClientExecuteCommand] [SeaTunnel-CompletableFuture-Thread-3] - run shutdown hook because get close signal
```

### SeaTunnel Version

2.3.9

### SeaTunnel Config

```conf
(none provided)
```

### Running Command

```shell
seatunnel.sh -c xxx.conf -e local
```

### Error Exception

```log
Exception in thread "main" org.apache.seatunnel.core.starter.exception.CommandExecuteException: SeaTunnel job executed failed
	at org.apache.seatunnel.core.starter.seatunnel.command.ClientExecuteCommand.execute(ClientExecuteCommand.java:228)
	at org.apache.seatunnel.core.starter.SeaTunnel.run(SeaTunnel.java:40)
	at org.apache.seatunnel.core.starter.seatunnel.SeaTunnelClient.main(SeaTunnelClient.java:34)
Caused by: org.apache.seatunnel.engine.common.exception.SeaTunnelEngineException: org.apache.seatunnel.connectors.seatunnel.maxcompute.exception.MaxcomputeConnectorException: ErrorCode:[COMMON-12], ErrorDescription:[Source reader operation failed, such as (open, close) etc...]
	at org.apache.seatunnel.connectors.seatunnel.maxcompute.source.MaxcomputeSourceReader.lambda$pollNext$0(MaxcomputeSourceReader.java:85)
	at java.lang.Iterable.forEach(Iterable.java:75)
	at org.apache.seatunnel.connectors.seatunnel.maxcompute.source.MaxcomputeSourceReader.pollNext(MaxcomputeSourceReader.java:67)
	at org.apache.seatunnel.engine.server.task.flow.SourceFlowLifeCycle.collect(SourceFlowLifeCycle.java:159)
	at org.apache.seatunnel.engine.server.task.SourceSeaTunnelTask.collect(Sour
```
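The root cause in the trace above is a direct cast of a `java.time.ZonedDateTime` (what the MaxCompute SDK returns for TIMESTAMP columns here) to `java.util.Date`. A type mapper can avoid the `ClassCastException` by converting explicitly instead of casting; this is a sketch of that idea, not the actual `MaxcomputeTypeMapper` code, and `toLocalDateTime` is an illustrative name.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.util.Date;

// Sketch: safely mapping a timestamp field that may arrive as either
// ZonedDateTime or java.util.Date, instead of the failing (Date) cast.
public class TimestampMappingExample {
    static LocalDateTime toLocalDateTime(Object field) {
        if (field instanceof ZonedDateTime) {
            // The case that crashed: convert rather than cast.
            return ((ZonedDateTime) field).toLocalDateTime();
        }
        if (field instanceof Date) {
            return LocalDateTime.ofInstant(((Date) field).toInstant(), ZoneId.systemDefault());
        }
        throw new IllegalArgumentException("Unsupported timestamp type: " + field.getClass());
    }

    public static void main(String[] args) {
        ZonedDateTime zdt = ZonedDateTime.now();
        // (Date) zdt would throw ClassCastException, as in the reported stack trace.
        System.out.println(toLocalDateTime(zdt));
    }
}
```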
Re: [PR] [Doc][Improve] support chinese [docs/zh/connector-v2/sink/TDengine.md] [seatunnel]
Hisoka-X commented on code in PR #8732: URL: https://github.com/apache/seatunnel/pull/8732#discussion_r1964725280

## docs/zh/connector-v2/sink/TDengine.md:

````diff
@@ -0,0 +1,71 @@
+# TDengine
+
+> TDengine 数据接收器
+
+## 描述
+
+用于将数据写入TDengine。
+
+## 主要特性
+
+- [x] [exactly-once](../../concept/connector-v2-features.md)
+- [ ] [cdc](../../concept/connector-v2-features.md)
+
+## 选项
+
+| 名称     | 类型   | 是否必传 | 默认值 |
+|----------|--------|----------|--------|
+| url      | string | 是       | -      |
+| username | string | 是       | -      |
+| password | string | 是       | -      |
+| database | string | 是       |        |
+| stable   | string | 是       | -      |
+| timezone | string | 否       | UTC    |
+
+### url [string]
+
+选择TDengine时的TDengine的url
+
+例如
+
+```
+jdbc:TAOS-RS://localhost:6041/
+```
+
+### username [string]
+
+选择时TDengine的用户名
+
+### password [string]
+
+选择时TDengine的密码
+
+### database [string]
+
+当您选择时,TDengine的数据库
+
+### stable [string]
+
+选择时TDengine的稳定性
+
+### timezone [string]
+
+TDengine服务器的时间对ts领域很重要
````

Review Comment:

```suggestion
### username [string]

TDengine的用户名

### password [string]

TDengine的密码

### database [string]

TDengine的数据库

### stable [string]

TDengine的超级表

### timezone [string]

TDengine服务器的时间,对ts字段很重要
```

## docs/zh/connector-v2/sink/TDengine.md:

```
### url [string]

选择TDengine时的TDengine的url
```

Review Comment:

```suggestion
TDengine的url
```

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
Re: [PR] [Fix][Connector-V2] Fix possible data loss in scenarios of request_tablet_size is less than the number of BUCKETS [seatunnel]
hailin0 commented on PR #8768: URL: https://github.com/apache/seatunnel/pull/8768#issuecomment-2673321695

good pr
Re: [PR] [Fix][Connector-V2] Fix possible data loss in scenarios of request_tablet_size is less than the number of BUCKETS [seatunnel]
hailin0 merged PR #8768: URL: https://github.com/apache/seatunnel/pull/8768
(seatunnel) branch dev updated: [Fix][Connector-V2] Fix possible data loss in scenarios of request_tablet_size is less than the number of BUCKETS (#8768)
This is an automated email from the ASF dual-hosted git repository.

wanghailin pushed a commit to branch dev in repository https://gitbox.apache.org/repos/asf/seatunnel.git

The following commit(s) were added to refs/heads/dev by this push:
     new 3c6f216135 [Fix][Connector-V2] Fix possible data loss in scenarios of request_tablet_size is less than the number of BUCKETS (#8768)

3c6f216135 is described below

commit 3c6f216135dfd48c2539a400b596ef82ce24fcb3
Author: xiaochen <598457...@qq.com>
AuthorDate: Fri Feb 21 11:39:24 2025 +0800

    [Fix][Connector-V2] Fix possible data loss in scenarios of request_tablet_size is less than the number of BUCKETS (#8768)
---
 .../client/source/StarRocksBeReadClient.java       |  5 +++-
 .../e2e/connector/starrocks/StarRocksIT.java       |  9 +-
 .../starrocks-thrift-to-starrocks-streamload.conf  |  1 +
 ...ks-streamload.conf => starrocks-to-assert.conf} | 32 ++
 4 files changed, 28 insertions(+), 19 deletions(-)

```diff
diff --git a/seatunnel-connectors-v2/connector-starrocks/src/main/java/org/apache/seatunnel/connectors/seatunnel/starrocks/client/source/StarRocksBeReadClient.java b/seatunnel-connectors-v2/connector-starrocks/src/main/java/org/apache/seatunnel/connectors/seatunnel/starrocks/client/source/StarRocksBeReadClient.java
index c0be0106bb..3fa50f1cc0 100644
--- a/seatunnel-connectors-v2/connector-starrocks/src/main/java/org/apache/seatunnel/connectors/seatunnel/starrocks/client/source/StarRocksBeReadClient.java
+++ b/seatunnel-connectors-v2/connector-starrocks/src/main/java/org/apache/seatunnel/connectors/seatunnel/starrocks/client/source/StarRocksBeReadClient.java
@@ -92,7 +92,6 @@ public class StarRocksBeReadClient implements Serializable {
     }
 
     public void openScanner(QueryPartition partition, SeaTunnelRowType seaTunnelRowType) {
-        this.seaTunnelRowType = seaTunnelRowType;
         Set<Long> tabletIds = partition.getTabletIds();
         TScanOpenParams params = new TScanOpenParams();
         params.setTablet_ids(new ArrayList<>(tabletIds));
@@ -135,6 +134,10 @@ public class StarRocksBeReadClient implements Serializable {
                 contextId,
                 tabletIds.size(),
                 tabletIds);
+        this.eos.set(false);
+        this.rowBatch = null;
+        this.readerOffset = 0;
+        this.seaTunnelRowType = seaTunnelRowType;
     }
 
     public boolean hasNext() {
diff --git a/seatunnel-e2e/seatunnel-connector-v2-e2e/connector-starrocks-e2e/src/test/java/org/apache/seatunnel/e2e/connector/starrocks/StarRocksIT.java b/seatunnel-e2e/seatunnel-connector-v2-e2e/connector-starrocks-e2e/src/test/java/org/apache/seatunnel/e2e/connector/starrocks/StarRocksIT.java
index c49b1bfa41..2d2eda1ae2 100644
--- a/seatunnel-e2e/seatunnel-connector-v2-e2e/connector-starrocks-e2e/src/test/java/org/apache/seatunnel/e2e/connector/starrocks/StarRocksIT.java
+++ b/seatunnel-e2e/seatunnel-connector-v2-e2e/connector-starrocks-e2e/src/test/java/org/apache/seatunnel/e2e/connector/starrocks/StarRocksIT.java
@@ -105,7 +105,7 @@ public class StarRocksIT extends TestSuiteBase implements TestResource {
                         + "  DATE_COL DATE\n"
                         + ")ENGINE=OLAP\n"
                         + "DUPLICATE KEY(`BIGINT_COL`)\n"
-                        + "DISTRIBUTED BY HASH(`BIGINT_COL`) BUCKETS 1\n"
+                        + "DISTRIBUTED BY HASH(`BIGINT_COL`) BUCKETS 3\n"
                         + "PROPERTIES (\n"
                         + "\"replication_num\" = \"1\",\n"
                         + "\"in_memory\" = \"false\","
@@ -419,4 +419,11 @@ public class StarRocksIT extends TestSuiteBase implements TestResource {
         Assertions.assertFalse(starRocksCatalog.tableExists(tablePathStarRocksSink));
         starRocksCatalog.close();
     }
+
+    @TestTemplate
+    public void testStarRocksReadRowCount(TestContainer container)
+            throws IOException, InterruptedException {
+        Container.ExecResult execResult = container.executeJob("/starrocks-to-assert.conf");
+        Assertions.assertEquals(0, execResult.getExitCode());
+    }
 }
diff --git a/seatunnel-e2e/seatunnel-connector-v2-e2e/connector-starrocks-e2e/src/test/resources/starrocks-thrift-to-starrocks-streamload.conf b/seatunnel-e2e/seatunnel-connector-v2-e2e/connector-starrocks-e2e/src/test/resources/starrocks-thrift-to-starrocks-streamload.conf
index ca47a8eb08..8af2b36107 100644
--- a/seatunnel-e2e/seatunnel-connector-v2-e2e/connector-starrocks-e2e/src/test/resources/starrocks-thrift-to-starrocks-streamload.conf
+++ b/seatunnel-e2e/seatunnel-connector-v2-e2e/connector-starrocks-e2e/src/test/resources/starrocks-thrift-to-starrocks-streamload.conf
@@ -28,6 +28,7 @@ source {
     database = "test"
     table = "e2e_table_source"
     max_retries = 3
+    request_tablet_size = 5
     schema {
       fields {
         BIGINT_COL = BIGINT
diff --git a/seatunnel-e2e/seatunnel-connector-v2-e2e/connector-starrocks-e2e/src/test/resources
```
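The core of the patch above is resetting per-scan state (`eos`, `rowBatch`, `readerOffset`) inside `openScanner`, because one client object serves several scanner openings when `request_tablet_size` is smaller than the bucket count. A self-contained toy model of that bug (not the real StarRocks client; all names here are illustrative) shows how leftover end-of-stream state silently drops every partition after the first:

```java
// Toy model of the stale-state bug: per-scan state must be cleared on each
// openScanner-style call, or later scans appear already exhausted.
public class ScannerStateExample {
    private boolean eos = false; // end-of-stream flag, left dirty by the bug
    private int remaining = 0;

    void openScanner(int rows, boolean resetState) {
        if (resetState) {
            eos = false; // the fix: clear state left over from the last scan
        }
        remaining = rows;
    }

    Integer next() {
        if (eos || remaining == 0) {
            eos = true;
            return null;
        }
        remaining--;
        return remaining;
    }

    int drain() { // read the current scan to the end, counting rows
        int count = 0;
        while (next() != null) count++;
        return count;
    }

    public static void main(String[] args) {
        ScannerStateExample buggy = new ScannerStateExample();
        buggy.openScanner(50, true);
        buggy.drain();                      // first partition reads fine
        buggy.openScanner(50, false);       // no reset: eos is still true
        System.out.println(buggy.drain());  // prints 0 -> data loss

        ScannerStateExample fixed = new ScannerStateExample();
        fixed.openScanner(50, true);
        fixed.drain();
        fixed.openScanner(50, true);        // reset on every open
        System.out.println(fixed.drain());  // prints 50
    }
}
```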
Re: [I] [SqlServer-CDC] CDC does not work when the number of columns exceeds 128 [seatunnel]
lirulei commented on issue #6658: URL: https://github.com/apache/seatunnel/issues/6658#issuecomment-2673189015

I met the same bug too (my data source is a MySQL database).
Re: [I] [Bug] [Module Name] Spark changed the metadata source information of Hudi after writing data. Seatunnel reported an error when writing data and read Hudi's metadata information [seatunnel]
lm520hy commented on issue #8781: URL: https://github.com/apache/seatunnel/issues/8781#issuecomment-2673803138

When Hudi's metadata is configured with `hoodie.table.metadata.partitions=files`, SeaTunnel's data synchronization fails with an error.
Re: [I] [Demos-Collection] Collecting Use Case Demos for Apache SeaTunnel [seatunnel]
davidzollo commented on issue #8388: URL: https://github.com/apache/seatunnel/issues/8388#issuecomment-2673812869

Streaming task from Kafka to Doris

```conf
env {
  # It is recommended to adjust according to the number of Kafka partitions, keeping it consistent with the partition count
  execution.parallelism = 4
  job.mode = "STREAMING"
  checkpoint.interval = 3
  checkpoint.timeout = 60
  # Your current rate limits seem high but reasonable, ~700MB/s
  read_limit.bytes_per_second = 7
  read_limit.rows_per_second = 4
}
source {
  Kafka {
    result_table_name = "kafka_log"
    # Kafka server address
    bootstrap.servers = "x"
    topic = ""
    consumer.group = "kafka2table"
    start_mode = "earliest"
    kafka.config = {
      "fetch.min.bytes" = "1048576"           # 1MB, increase the minimum batch fetch size
      "fetch.max.wait.ms" = "500"             # Wait time when data is insufficient
      "max.partition.fetch.bytes" = "5242880" # 5MB, maximum data fetch per partition
      "max.poll.records" = "5000"             # Maximum number of records per poll
      "isolation.level" = "read_committed"    # Ensure data consistency
    }
    format = json
    schema = {
      fields = {
        ev = STRING
        pg = STRING
        uuid = STRING
        userId = bigint
        fromDevice = STRING
        ip = STRING
        source = STRING
        np = STRING
        lp = STRING
        tg = STRING
        ch = STRING
        v = STRING
        nt = STRING
        wifi = STRING
        dbd = STRING
        dmd = STRING
        bs = STRING
        browser_version = STRING
        ext = STRING
        sid = STRING
        timestamp = bigint
        reporttime = bigint
      }
    }
  }
}
transform {
  Sql {
    source_table_name = "kafka_log"
    result_table_name = "log"
    query = "select ev as event,pg as page,uuid,userId as userid,fromDevice as platform,ip,source,np as nextpage,lp as lastpage,tg as target,ch as channel,v as version,nt as network,wifi,dbd as device_brand,dmd as device_model,bs as browser,browser_version,ext as extra,sid as sessionid,timestamp,reporttime,CURRENT_DATE as dt from kafka_log"
  }
}
sink {
  Doris {
    source_table_name = "log"
    fenodes = "dxxx"
    username = xxx
    password = ""
    table.identifier = "ods.ods_log"
    sink.label-prefix = "log"
    sink.enable-2pc = "false"
    doris.batch.size = 50
    sink.buffer-size = 104857600
    sink.max-retries = 5
    doris.config {
      format = "json"
      read_json_by_line = "true"
    }
  }
}
```
(seatunnel) branch dev updated: [Fix][E2e] Optimized pom file name tag (#8770)
This is an automated email from the ASF dual-hosted git repository.

fanjia pushed a commit to branch dev in repository https://gitbox.apache.org/repos/asf/seatunnel.git

The following commit(s) were added to refs/heads/dev by this push:
     new 7763326726 [Fix][E2e] Optimized pom file name tag (#8770)

7763326726 is described below

commit 77633267269fbd85ee60079f80d442e2ee44bd6b
Author: corgy-w <73771213+corg...@users.noreply.github.com>
AuthorDate: Thu Feb 20 20:44:31 2025 +0800

    [Fix][E2e] Optimized pom file name tag (#8770)
---
 seatunnel-e2e/seatunnel-connector-v2-e2e/connector-cdc-tidb-e2e/pom.xml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

```diff
diff --git a/seatunnel-e2e/seatunnel-connector-v2-e2e/connector-cdc-tidb-e2e/pom.xml b/seatunnel-e2e/seatunnel-connector-v2-e2e/connector-cdc-tidb-e2e/pom.xml
index 300d2f47b9..6936b36ee7 100644
--- a/seatunnel-e2e/seatunnel-connector-v2-e2e/connector-cdc-tidb-e2e/pom.xml
+++ b/seatunnel-e2e/seatunnel-connector-v2-e2e/connector-cdc-tidb-e2e/pom.xml
@@ -23,7 +23,7 @@
     <artifactId>connector-cdc-tidb-e2e</artifactId>
 
-    <name>SeaTunnel : E2E : Connector V2 : TiDB</name>
+    <name>SeaTunnel : E2E : Connector V2 : CDC TiDB</name>
 
     8
```
Re: [PR] [Fix][E2e] Optimized pom file name tag [seatunnel]
Hisoka-X merged PR #8770: URL: https://github.com/apache/seatunnel/pull/8770
Re: [PR] [Fix][Connector-V2] Fix possible data loss in certain scenarios of starrocks [seatunnel]
xiaochen-zhou commented on PR #8768: URL: https://github.com/apache/seatunnel/pull/8768#issuecomment-2672117863

I added a test case, `StarRocksIT#testStarRocksReadRowCount()`, to verify whether the number of rows written to the sink matches the number of rows read from the source in scenarios where `request_tablet_size` is less than the number of `BUCKETS`.

When I set the table's buckets to 3:

```sql
DISTRIBUTED BY HASH(`BIGINT_COL`) BUCKETS 3
```

At the same time, when `request_tablet_size` is set to a value less than 3:

[screenshot]

The `StarRocksIT#testStarRocksReadRowCount()` test could not pass before the fix:

[screenshot]

In this case, the row count is 31, which is less than the expected 100.

[screenshot]

After applying the fix, the `StarRocksIT#testStarRocksReadRowCount()` test now passes successfully:

[screenshot]

@hailin0 @Hisoka-X
Re: [PR] [Fix][Connector-V2]Fix Descriptions for CUSTOM_SQL in Connector [seatunnel]
xiaochen-zhou commented on code in PR #8778: URL: https://github.com/apache/seatunnel/pull/8778#discussion_r1963550533

## docs/en/connector-v2/sink/Clickhouse.md:

```diff
@@ -61,6 +61,7 @@
 | allow_experimental_lightweight_delete | Boolean | No | false                        | Allow experimental lightweight delete based on `*MergeTree` table engine. |
 | schema_save_mode                      | Enum    | no | CREATE_SCHEMA_WHEN_NOT_EXIST | Schema save mode. Please refer to the `schema_save_mode` section below.   |
 | data_save_mode                        | Enum    | no | APPEND_DATA                  | Data save mode. Please refer to the `data_save_mode` section below.       |
+| custom_sql                            | String  | no | -                            | When data_save_mode selects CUSTOM_PROCESSING, you should fill in the CUSTOM_SQL parameter. This parameter usually fills in a SQL that can be executed. SQL will be executed before synchronization tasks. |
```

Review Comment:

> update zh docs

done.
Re: [PR] [Fix][connector-http] fix when post have param [seatunnel]
Hisoka-X commented on PR #8434: URL: https://github.com/apache/seatunnel/pull/8434#issuecomment-2670765327

> Please update https://github.com/apache/seatunnel/pull/8434/files#diff-31bd0672f7a316bde3b591f194484526bed84f71b007370b75c2c2bf99223fb4R362 with how to set the paging parameter into the body.

@CosmosNi