Hi - I have a Kafka Streams application that writes Avro records to a topic, which is read by a Kafka Connect process running the HDFS Sink connector. The topic has around 1.6 million messages. The Kafka Connect invocation is as follows ..
bin/connect-standalone etc/schema-registry/connect-avro-standalone.properties etc/kafka-connect-hdfs/quickstart-hdfs.properties

where quickstart-hdfs.properties contains the following ..

name=hdfs-sink
connector.class=io.confluent.connect.hdfs.HdfsSinkConnector
tasks.max=1
topics=avro-topic
hdfs.url=hdfs://0.0.0.0:9000
flush.size=3

The problem is that the Kafka Connect process appears to be running in an infinite loop, logging messages like the following ..

[2017-07-18 20:02:04,487] INFO Starting commit and rotation for topic partition avro-topic-0 with start offsets {partition=0=1143033} and end offsets {partition=0=1143035} (io.confluent.connect.hdfs.TopicPartitionWriter:297)
[2017-07-18 20:02:04,491] INFO Committed hdfs://0.0.0.0:9000/topics/avro-topic/partition=0/avro-topic+0+0001143033+0001143035.avro for avro-topic-0 (io.confluent.connect.hdfs.TopicPartitionWriter:625)

The result is that so many Avro files get created that I cannot even do an ls on the folder.

$ hdfs dfs -ls /topics/avro-topic
Found 1 items
drwxr-xr-x   - debasishghosh supergroup          0 2017-07-18 20:02 /topics/avro-topic/partition=0

Trying to list one level deeper in the HDFS folder results in an OutOfMemoryError ..

$ hdfs dfs -ls /topics/avro-topic/partition=0
17/07/18 20:02:19 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
        at java.util.Arrays.copyOfRange(Arrays.java:3664)
        at java.lang.String.<init>(String.java:207)
        at java.lang.String.substring(String.java:1969)
        at java.net.URI$Parser.substring(URI.java:2869)
        at java.net.URI$Parser.parseHierarchical(URI.java:3106)
        ...

Why is the Kafka Connect process going into an infinite loop? How can I prevent it?

I am using Confluent 3.2.2 for the Schema Registry and Avro serialization, and Apache Kafka 0.10.2.1 for the Kafka Streams client and the broker.

Any help is appreciated.

regards.

--
Debasish Ghosh
http://manning.com/ghosh2
http://manning.com/ghosh

Twttr: @debasishg
Blog: http://debasishg.blogspot.com
Code: http://github.com/debasishg
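
P.S. Doing the arithmetic, I wonder whether it is really an infinite loop or just flush.size=3 at work: a file is committed every 3 records, so 1.6 million messages would yield roughly 533,000 files under partition=0, and the start/end offsets in the log do keep advancing. Below is a minimal sketch of what I plan to try, based on my reading of the connector docs; the flush.size value of 10000 and the rotate.interval.ms setting are my assumptions, not something I have verified yet.

name=hdfs-sink
connector.class=io.confluent.connect.hdfs.HdfsSinkConnector
tasks.max=1
topics=avro-topic
hdfs.url=hdfs://0.0.0.0:9000
# commit one file per 10,000 records instead of one per 3 (value is my guess)
flush.size=10000
# additionally rotate on a time interval (milliseconds), so a quiet topic
# still gets its last records committed -- my assumption from the docs
rotate.interval.ms=60000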
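
P.P.S. To gauge how many files are already there without blowing up the ls client, I am trying the following; hdfs dfs -count prints a summary instead of every entry, and HADOOP_CLIENT_OPTS with a 4g heap is just my guess at a workaround for the GC overhead error.

# summary only: directory count, file count, content size, path
hdfs dfs -count /topics/avro-topic

# or give the listing client a bigger heap
HADOOP_CLIENT_OPTS="-Xmx4g" hdfs dfs -ls /topics/avro-topic/partition=0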