zhouxumeng213 commented on issue #6009: URL: https://github.com/apache/hudi/issues/6009#issuecomment-1171841056
### Environment Description

* Hudi version: 0.10.0
* Spark version: 2.4.5
* Hadoop version: 3.2.1
* Storage (HDFS/S3/GCS..): HDFS
* Running on Docker? (yes/no): no

### Steps to reproduce

1. Start Zookeeper and Kafka.
2. Run `setupKafka.sh` under `hudi-master/hudi-kafka-connect/demo` to produce the records:

   ```sh
   sh setupKafka.sh -n 100000
   ```
3. From the Kafka home directory, start the Connect worker:

   ```sh
   ./bin/connect-distributed.sh /opt/hudi-kafka-connect/demo/connect-distributed.properties
   ```
4. Register the sink connector:

   ```sh
   curl -X POST -H "Content-Type:application/json" \
        -d @/opt/config-sink-test.json http://master:8084/connectors
   ```

### Notes

1. `connect-distributed.properties` configuration:

   ```properties
   bootstrap.servers=51.38.135.107:9092
   group.id=hudi-connect-cluster
   key.converter=org.apache.kafka.connect.json.JsonConverter
   value.converter=org.apache.kafka.connect.json.JsonConverter
   key.converter.schemas.enable=true
   value.converter.schemas.enable=true
   offset.storage.topic=connect-offsets
   offset.storage.replication.factor=1
   config.storage.topic=connect-configs
   config.storage.replication.factor=1
   status.storage.topic=connect-status
   status.storage.replication.factor=1
   offset.flush.interval.ms=60000
   listeners=HTTP://:8084
   plugin.path=/usr/local/share/kafka/plugins
   ```
2. `config-sink-test.json` configuration:

   ```json
   {
     "name": "hudi-test-topic",
     "config": {
       "bootstrap.servers": "51.38.135.107:9092",
       "connector.class": "org.apache.hudi.connect.HoodieSinkConnector",
       "tasks.max": "12",
       "key.converter": "org.apache.kafka.connect.storage.StringConverter",
       "value.converter": "org.apache.kafka.connect.storage.StringConverter",
       "value.converter.schemas.enable": "false",
       "topics": "hudi-test-topic15",
       "hoodie.table.name": "test_hudi_table",
       "hoodie.table.type": "COPY_ON_WRITE",
       "hoodie.base.path": "hdfs://tdh0623hudi.storage.huawei.com/tmp/huditest",
       "hoodie.datasource.write.partitionpath.field": "date",
       "hoodie.datasource.write.recordkey.field": "volume",
       "hoodie.schemaprovider.class": "org.apache.hudi.schema.FilebasedSchemaProvider",
       "hoodie.deltastreamer.schemaprovider.source.schema.file": "hdfs://tdh0623hudi.storage.huawei.com/tmp/schema.avsc",
       "hoodie.clean.automatic": false,
       "hoodie.kafka.commit.interval.secs": 60
     }
   }
   ```

### Reference

https://github.com/apache/hudi/tree/master/hudi-kafka-connect

### Observed behavior

After the connector is registered, data is written to HDFS but is deleted again after a while. The Kafka Connect logs are as follows. With smaller amounts of data (e.g. 20,000 or 50,000 records) there is no problem; with 100,000 records the issue occurs.
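For context on the setup above: the `FilebasedSchemaProvider` reads the Avro schema from the HDFS path given in `hoodie.deltastreamer.schemaprovider.source.schema.file`, and that schema must at minimum cover the fields used as record key (`volume`) and partition path (`date`). A minimal sketch of what such a `schema.avsc` could look like — the record name and field types here are assumptions for illustration, not the actual demo schema:

```json
{
  "type": "record",
  "name": "HudiTestRecord",
  "fields": [
    {"name": "volume", "type": "long"},
    {"name": "date",   "type": "string"}
  ]
}
```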
