Hi,

I’m writing my own producer that reads text files and sends them line by line 
to a Kafka cluster. I notice that the producer is extremely slow: it currently 
sends at ~57KB/node/s, which is roughly 50-100 times slower than using 
bin/kafka-console-producer.sh.


Here’s my producer:
final File dir = new File(dataDir);
List<File> files = new ArrayList<>(Arrays.asList(dir.listFiles()));
int key = 0;
for (final File file : files) {
    // try-with-resources closes the reader even if readLine() throws
    try (BufferedReader br = new BufferedReader(new FileReader(file))) {
        for (String line = br.readLine(); line != null; line = br.readLine()) {
            // Sequential key, consumed by the custom partitioner
            KeyedMessage<String, String> data =
                    new KeyedMessage<>(topic, Integer.toString(key++), line);
            producer.send(data);
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
}



And partitioner:
public int partition(Object key, int numPartitions) {
    String stringKey = (String)key;
    return Integer.parseInt(stringKey) % numPartitions;
}
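
To show what this partitioner does with my sequential keys, here is a small 
standalone sketch with no Kafka dependency. The class name ModPartitionDemo, 
the helper countPerPartition, and the values (4 partitions, 10 messages) are 
made up for illustration only; the partition logic is the same as above. 
Consecutive lines should land on consecutive partitions, round-robin style:

```java
import java.util.Arrays;

public class ModPartitionDemo {
    // Same logic as my partitioner: parse the key, take it modulo the partition count.
    static int partition(Object key, int numPartitions) {
        String stringKey = (String) key;
        return Integer.parseInt(stringKey) % numPartitions;
    }

    // Hypothetical helper: counts how many of the keys 0..messages-1 land on each partition.
    static int[] countPerPartition(int messages, int numPartitions) {
        int[] counts = new int[numPartitions];
        for (int key = 0; key < messages; key++) {
            counts[partition(Integer.toString(key), numPartitions)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // 10 messages over 4 partitions are arbitrary illustration values.
        System.out.println(Arrays.toString(countPerPartition(10, 4))); // prints [3, 3, 2, 2]
    }
}
```

So the keys themselves spread the lines evenly; the partitioner only does a 
parseInt and a modulo per message.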


As far as I can tell, the only difference between the kafka-console-producer.sh 
code and mine is that I use a custom partitioner. I have no idea why it’s so slow.

Best regards,
Huy, Le Van
