Hi -

I have a Kafka Streams application that writes Avro records to a topic, which
is being read by a Kafka Connect process running the HDFS Sink connector. The
topic has around 1.6 million messages. The Kafka Connect process is started as
follows:

bin/connect-standalone \
  etc/schema-registry/connect-avro-standalone.properties \
  etc/kafka-connect-hdfs/quickstart-hdfs.properties


where quickstart-hdfs.properties contains the following:

name=hdfs-sink
connector.class=io.confluent.connect.hdfs.HdfsSinkConnector
tasks.max=1
topics=avro-topic
hdfs.url=hdfs://0.0.0.0:9000
flush.size=3


The problem is that the Kafka Connect process appears to be running in an
infinite loop, logging messages like the following:

[2017-07-18 20:02:04,487] INFO Starting commit and rotation for topic partition avro-topic-0 with start offsets {partition=0=1143033} and end offsets {partition=0=1143035} (io.confluent.connect.hdfs.TopicPartitionWriter:297)
[2017-07-18 20:02:04,491] INFO Committed hdfs://0.0.0.0:9000/topics/avro-topic/partition=0/avro-topic+0+0001143033+0001143035.avro for avro-topic-0 (io.confluent.connect.hdfs.TopicPartitionWriter:625)


The result is that so many Avro files get created that I cannot even do an ls
on the folder:

$ hdfs dfs -ls /topics/avro-topic
Found 1 items
drwxr-xr-x   - debasishghosh supergroup          0 2017-07-18 20:02 /topics/avro-topic/partition=0

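To at least get a count of what is inside that directory without printing every
path, I assume something like hdfs dfs -count should work (my understanding is
that it asks the namenode for a summary rather than returning each entry, but I
have not verified that):

$ hdfs dfs -count /topics/avro-topic/partition=0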

Trying to list the HDFS folder one level deeper results in an
OutOfMemoryError:

$ hdfs dfs -ls /topics/avro-topic/partition=0
17/07/18 20:02:19 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.util.Arrays.copyOfRange(Arrays.java:3664)
    at java.lang.String.<init>(String.java:207)
    at java.lang.String.substring(String.java:1969)
    at java.net.URI$Parser.substring(URI.java:2869)
    at java.net.URI$Parser.parseHierarchical(URI.java:3106)
    ...

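In case the listing failure is just the CLI JVM running out of heap while
buffering the result, I assume the client heap can be raised via
HADOOP_CLIENT_OPTS, something along these lines (the 4g value is an arbitrary
guess on my part):

$ HADOOP_CLIENT_OPTS="-Xmx4g" hdfs dfs -ls /topics/avro-topic/partition=0

But even if that works, it would not address why so many files are being
written in the first place.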

Why is the Kafka Connect process going into an infinite loop? How can I
prevent it?
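One guess: if flush.size=3 means the connector commits and rotates a new file
every 3 records, then 1.6 million messages would come to roughly 500,000 tiny
Avro files, which would match what I am seeing. Assuming that reading is
correct, I suppose the configuration would need something like the following
(an untested sketch on my side; the values are arbitrary, and rotate.interval.ms
only applies if this version supports time-based rotation):

name=hdfs-sink
connector.class=io.confluent.connect.hdfs.HdfsSinkConnector
tasks.max=1
topics=avro-topic
hdfs.url=hdfs://0.0.0.0:9000
# guess: commit far fewer, much larger files
flush.size=100000
# guess: also rotate on time so data still gets flushed when the topic is quiet
rotate.interval.ms=60000

Is that the right way to think about flush.size, or does it control something
else?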

I am using Confluent 3.2.2 for the Schema Registry and Avro serialization, and
Apache Kafka 0.10.2.1 for the Kafka Streams client and the broker.

Any help would be appreciated.

regards.

-- 
Debasish Ghosh
http://manning.com/ghosh2
http://manning.com/ghosh

Twttr: @debasishg
Blog: http://debasishg.blogspot.com
Code: http://github.com/debasishg
