cw-longwang opened a new issue, #9133:
URL: https://github.com/apache/seatunnel/issues/9133

   ### Search before asking
   
   - [x] I had searched in the [issues](https://github.com/apache/seatunnel/issues?q=is%3Aissue+label%3A%22bug%22) and found no similar issues.
   
   
   ### What happened
   
   The Flink engine reads from Kafka and MySQL-CDC and writes to S3. The files written by two different checkpoints have the same name; as a result, the later file overwrites the earlier one, resulting in data loss.
   
   
   
   
   ### SeaTunnel Version
   
   2.3.10
   
   ### SeaTunnel Config
   
   ```conf
   # Defining the runtime environment
   env {
     parallelism = 1
     job.mode = "STREAMING"
     checkpoint.interval = 60000
   }
   
   source {
     MySQL-CDC {
       base-url = "jdbc:mysql://ai-bigdatanew-instance-1.ca56eeecuskl.us-east-1.rds.amazonaws.com:3306/test"
       username = "admin"
       password = "abc123456"
       table-names = ["test.test"]
       startup.mode = "latest"
       exactly_once = true
       format = compatible_debezium_json
       debezium = {
         key.converter.schemas.enable = false
         value.converter.schemas.enable = false
         database.server.name = "mysql_cdc_1"
       }
     }
   }
   
   sink {
     S3File {
       bucket = "s3a://sparktest111cw"
       path = "/sea/test666"
       tmp_path = "/tmp/seatunnel"
       fs.s3a.endpoint = "s3.us-east-1.amazonaws.com"
       fs.s3a.aws.credentials.provider = "org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider"
       access_key = "AKIAZ3MGNGECBK2XNCJE"
       secret_key = "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"
       format = "compatible_debezium_json"
       file_format_type = "parquet"
       data_save_mode = "APPEND_DATA"
       have_partition = true
       partition_by = ["topic"]
       partition_dir_expression = "${k0}=${v0}"
       single_file_mode = false
       is_enable_transaction = true
       batch_size = 2
       custom_filename = true
       file_name_expression = "${transactionId}_${now}"
       filename_time_format = "yyyy.MM.dd"
     }
   }
   ```
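
   In the sink above, `filename_time_format = "yyyy.MM.dd"` means the `${now}` placeholder resolves only to the calendar date, so two files written on the same day are distinguished solely by `${transactionId}`. As a possible workaround while the collision is investigated (the exact cause here is an assumption), the sketch below widens the name with a time-of-day pattern and the `${uuid}` placeholder that the file sink's `file_name_expression` also supports:
   
   ```conf
   sink {
     S3File {
       # ... all other options unchanged ...
       custom_filename = true
       # Add time-of-day plus a per-file UUID so two checkpoints cannot
       # render identical file names (a workaround sketch, not a fix for
       # the underlying collision).
       file_name_expression = "${transactionId}_${uuid}_${now}"
       filename_time_format = "yyyy.MM.dd.HH.mm.ss"
     }
   }
   ```
   
   If names still collide with this expression, that would point at the transaction id itself being duplicated rather than at the time format.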
   
   ### Running Command
   
   ```shell
   kubectl create -f sen.yaml -n flink-bigdata
   ```
   
   ### Error Exception
   
   ```log
   Two checkpoints produce files with the same name; the later file overwrites the earlier one, resulting in data loss.
   ```
   
   ### Zeta or Flink or Spark Version
   
   _No response_
   
   ### Java or Scala Version
   
   Java 8
   
   ### Screenshots
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   

