longdpt opened a new issue, #8689: URL: https://github.com/apache/seatunnel/issues/8689
### Search before asking

- [x] I had searched in the [issues](https://github.com/apache/seatunnel/issues?q=is%3Aissue+label%3A%22bug%22) and found no similar issues.

### What happened

When importing data from MySQL to HDFS with the Zeta engine, in a fully configured environment with sufficient resources, sinking data in Parquet or TXT format, or sinking to a local file (LocalFile), sustains a write speed of 100,000+ rows per second. With the ORC format, however, the write speed starts at about 15,000 rows per second and gradually drops to about 3,000 rows per second.

**The configuration of the ORC sink and its execution results are as follows:**

```conf
sink {
  HdfsFile {
    fs.defaultFS = "hdfs://mycluster"
    path = "/tmp/hive/warehouse/test2"
    hdfs_site_path = "/data/hadoop-2.7.1/etc/hadoop/hdfs-site.xml"
    file_format_type = "orc"
    compress_codec = "snappy"
    remote_user = "hadoop"
    file_exists_action = "OVERWRITE"
  }
}
```

Statistic Information:

```
***********************************************
           Job Statistic Information
***********************************************
Start Time                : 2025-02-13 16:19:23
End Time                  : 2025-02-13 16:40:15
Total Time(s)             :                1251
Total Read Count          :            11568409
Total Write Count         :            11568409
Total Failed Count        :                   0
***********************************************
```

**The configuration of the Parquet sink and its execution results are as follows:**

```conf
sink {
  HdfsFile {
    fs.defaultFS = "hdfs://mycluster"
    path = "/tmp/hive/warehouse/test2"
    hdfs_site_path = "/data/hadoop-2.7.1/etc/hadoop/hdfs-site.xml"
    file_format_type = "parquet"
    compress_codec = "snappy"
    remote_user = "hadoop"
    file_exists_action = "OVERWRITE"
  }
}
```

Statistic Information:

```
***********************************************
           Job Statistic Information
***********************************************
Start Time                : 2025-02-13 16:42:32
End Time                  : 2025-02-13 16:45:22
Total Time(s)             :                 170
Total Read Count          :            11568987
Total Write Count         :            11568987
Total Failed Count        :                   0
***********************************************
```
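For context, the throughput implied by the two statistic blocks can be computed from the reported row counts and total times (my own arithmetic, not part of the job output):

```python
# Rough throughput comparison derived from the two "Job Statistic
# Information" blocks above: rows written divided by total seconds.
runs = {
    "orc":     {"rows": 11_568_409, "seconds": 1251},
    "parquet": {"rows": 11_568_987, "seconds": 170},
}

for fmt, r in runs.items():
    rate = r["rows"] / r["seconds"]
    print(f"{fmt}: ~{rate:,.0f} rows/s")

# Parquet comes out roughly 7x faster than ORC in this run.
ratio = (runs["parquet"]["rows"] / runs["parquet"]["seconds"]) / (
    runs["orc"]["rows"] / runs["orc"]["seconds"]
)
print(f"parquet/orc speedup: ~{ratio:.1f}x")
```

So on essentially the same ~11.5M-row dataset, the ORC sink averages around 9,000 rows/s end to end versus around 68,000 rows/s for Parquet, which matches the gradual slowdown described above.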
### SeaTunnel Version

2.3.9

### SeaTunnel Config

```conf
env {
  parallelism = 5
  job.mode = "BATCH"
}

source {
  Jdbc {
    url = "jdbc:mysql://10.101.xx.xx:3711/information_schema?serverTimezone=Asia/Shanghai&useUnicode=true&characterEncoding=UTF-8&rewriteBatchedStatements=true"
    driver = "com.mysql.cj.jdbc.Driver"
    connection_check_timeout_sec = 100
    user = "etldb"
    password = "xxxxx"
    query = "select xxx from yyrenting_mall.tb_trade_order t"
    partition_column = "id"
    split.size = 500000
    fetch_size = 20000
  }
}

sink {
  HdfsFile {
    fs.defaultFS = "hdfs://mycluster"
    path = "/tmp/hive/warehouse/test2"
    # path = "/ODS/YYRENTING_MALL/TB_TRADE_ORDER/etl_date=${etl_date}/child=${child}"
    hdfs_site_path = "/data/hadoop-2.7.1/etc/hadoop/hdfs-site.xml"
    # custom_filename = true
    file_format_type = "parquet"
    compress_codec = "snappy"
    remote_user = "hadoop"
    file_exists_action = "OVERWRITE"
  }
}
```

### Running Command

```shell
./bin/seatunnel.sh --config ./job/mysql2hdfs.cnf -n mysql2hdfs
```

### Error Exception

```log
None
```

### Zeta or Flink or Spark Version

_No response_

### Java or Scala Version

_No response_

### Screenshots

_No response_

### Are you willing to submit PR?

- [x] Yes I am willing to submit a PR!

### Code of Conduct

- [x] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@seatunnel.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org