zhouxumeng213 commented on issue #6009: URL: https://github.com/apache/hudi/issues/6009#issuecomment-1171841056
### Environment Description

* Hudi version: 0.10.0
* Spark version: 2.4.5
* Hadoop version: 3.2.1
* Storage (HDFS/S3/GCS..): HDFS
* Running on Docker? (yes/no): no

### Steps to reproduce

1. Start Zookeeper and Kafka.
2. Run `setupKafka.sh` under `hudi-master/hudi-kafka-connect/demo` to produce the records:

   ```sh
   sh setupKafka.sh -n 100000
   ```
3. From the Kafka home directory, start the Connect worker:

   ```sh
   ./bin/connect-distributed.sh /opt/hudi-kafka-connect/demo/connect-distributed.properties
   ```
4. Register the sink connector:

   ```sh
   curl -X POST -H "Content-Type:application/json" \
        -d @/opt/config-sink-test.json http://master:8084/connectors
   ```

### Notes

1. `connect-distributed.properties` configuration:

   ```properties
   bootstrap.servers=51.38.135.107:9092
   group.id=hudi-connect-cluster
   key.converter=org.apache.kafka.connect.json.JsonConverter
   value.converter=org.apache.kafka.connect.json.JsonConverter
   key.converter.schemas.enable=true
   value.converter.schemas.enable=true
   offset.storage.topic=connect-offsets
   offset.storage.replication.factor=1
   config.storage.topic=connect-configs
   config.storage.replication.factor=1
   status.storage.topic=connect-status
   status.storage.replication.factor=1
   offset.flush.interval.ms=60000
   listeners=HTTP://:8084
   plugin.path=/usr/local/share/kafka/plugins
   ```
2. `config-sink-test.json` configuration:

   ```json
   {
     "name": "hudi-test-topic",
     "config": {
       "bootstrap.servers": "51.38.135.107:9092",
       "connector.class": "org.apache.hudi.connect.HoodieSinkConnector",
       "tasks.max": "12",
       "key.converter": "org.apache.kafka.connect.storage.StringConverter",
       "value.converter": "org.apache.kafka.connect.storage.StringConverter",
       "value.converter.schemas.enable": "false",
       "topics": "hudi-test-topic15",
       "hoodie.table.name": "test_hudi_table",
       "hoodie.table.type": "COPY_ON_WRITE",
       "hoodie.base.path": "hdfs://tdh0623hudi.storage.huawei.com/tmp/huditest",
       "hoodie.datasource.write.partitionpath.field": "date",
       "hoodie.datasource.write.recordkey.field": "volume",
       "hoodie.schemaprovider.class": "org.apache.hudi.schema.FilebasedSchemaProvider",
       "hoodie.deltastreamer.schemaprovider.source.schema.file": "hdfs://tdh0623hudi.storage.huawei.com/tmp/schema.avsc",
       "hoodie.clean.automatic": false,
       "hoodie.kafka.commit.interval.secs": 60
     }
   }
   ```

### Reference

https://github.com/apache/hudi/tree/master/hudi-kafka-connect

### Observed behavior

After the connector is registered, data is written to HDFS but is deleted again after a while. The Kafka Connect logs are as follows. With smaller amounts of data (e.g. 20,000 or 50,000 records) there is no problem; with 100,000 records the issue occurs.
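For context on the setup above: the `FilebasedSchemaProvider` reads the Avro schema from the HDFS path given in `hoodie.deltastreamer.schemaprovider.source.schema.file`, and that schema must at minimum cover the fields used as record key (`volume`) and partition path (`date`). A minimal sketch of what such a `schema.avsc` could look like — the record name and field types here are assumptions for illustration, not the actual demo schema:

```json
{
  "type": "record",
  "name": "HudiTestRecord",
  "fields": [
    {"name": "volume", "type": "long"},
    {"name": "date",   "type": "string"}
  ]
}
```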
