Re: kafka async producer takes a lot of cpu

2014-12-14 Thread Rajiv Kurian
I'll try it and let you guys know. Is there anything that comes to mind from these log messages though? Why would there be so many log messages? Would you suggest doing something else to find out why things are working so poorly? I am worried about making the risky transition to the beta producer c

Re: Kafka sending messages with zero copy

2014-12-14 Thread Rajiv Kurian
Resuscitating this thread. I've done some more experiments and profiling. My messages are very tiny (currently 25 bytes) per message and creating multiple objects per message leads to a lot of churn. The memory churn through creation of convenience objects is more than the memory being used by my o

Re: spark kafka batch integration

2014-12-14 Thread Koert Kuipers
hey joe, glad to hear you think its a good use case of SimpleConsumer. not sure if i understand your question. are you asking why we make the offsets available in the rdd? we have a daily partitioned dataset on hdfs, and processes that run at night that do the following: 1) read the last daily par

Re: spark kafka batch integration

2014-12-14 Thread Koert Kuipers
hey gwen, no immediate plans to contribute it to spark but of course we are open to this. given sparks pullreq backlog my suspicion is that spark community prefers a user library at this point. if you lose a node the task will restart. and since each task reads until the end of a kafka partition,

Re: spark kafka batch integration

2014-12-14 Thread Joe Stein
I like the idea of the KafkaRDD and Spark partition/split per Kafka partition. That is good use of the SimpleConsumer. I can see a few different strategies for the commitOffsets and partitionOwnership. What use case are you committing your offsets for? /**

spark kafka batch integration

2014-12-14 Thread Koert Kuipers
hello all, we at tresata wrote a library to provide for batch integration between spark and kafka. it supports: * distributed write of rdd to kafa * distributed read of rdd from kafka our main use cases are (in lambda architecture speak): * periodic appends to the immutable master dataset on hdfs

Re: kafka async producer takes a lot of cpu

2014-12-14 Thread Neha Narkhede
Thanks for reporting the issue, Rajiv. Since we are actively phasing out the old client, it will be very helpful to know what the behavior on the new client is. On Fri, Dec 12, 2014 at 8:12 PM, Rajiv Kurian wrote: > > I am using the kafka java api async client (the one that wraps the Scala > cli

metrics about how behind a replica is?

2014-12-14 Thread Xiaoyu Wang
Hello, If I understand it correctly, when the number of messages a replica is behind from the leader is < replica.lag.max.messages, the replica is considered in sync with the master and are eligible for leader election. This means we can lose at most replica.lag.max.messages messages during leade

Re: csv files to kafka

2014-12-14 Thread Joe Stein
Here is a non-production example of a file load to Kafka using Scala https://github.com/stealthly/f2k/blob/master/src/main/scala/ly/stealth/f2k/KafkaUploader.scala If you wanted to convert that to javacode the Scala compiler does that for you :) and makes it importable as jar to your java program