Re: how to ensure strong consistency with reasonable availabilit

2014-07-23 Thread Jiang Wu (Pricehistory) (BLOOMBERG/ 731 LEX -)
Jun, There are still other concerns regarding ack=-1. A single disk failure may cause data loss with ack=-1. When 2 out of 3 brokers drop out of the ISR, acknowledged messages may be stored on the leader only. If the leader's disk then fails, these messages are lost. In a less severe situation w
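
To make the concern above concrete, here is a minimal sketch (not Kafka code). It assumes that with ack=-1 the leader acknowledges a message once every replica currently in the ISR has it, so the number of copies at acknowledgement time equals the ISR size. The class name, method names, and broker ids 1 to 3 are hypothetical.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

public class IsrAckSketch {
    // Assumption: with ack=-1, an acknowledged message exists on every
    // replica that is in the ISR at acknowledgement time, and nowhere else
    // is guaranteed.
    static int copiesAtAck(Set<Integer> isr) {
        return isr.size();
    }

    // An acknowledged message survives the loss of one disk only if at
    // least two replicas held it when it was acknowledged.
    static boolean survivesSingleDiskLoss(Set<Integer> isr) {
        return copiesAtAck(isr) >= 2;
    }

    public static void main(String[] args) {
        // Hypothetical broker ids; broker 1 is the leader.
        Set<Integer> healthyIsr = new TreeSet<>(Arrays.asList(1, 2, 3));
        Set<Integer> leaderOnlyIsr = new TreeSet<>(Arrays.asList(1));

        System.out.println("full ISR survives one disk loss: "
                + survivesSingleDiskLoss(healthyIsr));
        System.out.println("leader-only ISR survives one disk loss: "
                + survivesSingleDiskLoss(leaderOnlyIsr));
    }
}
```

With the ISR shrunk to the leader alone, an acknowledgement covers a single copy, which is the case the message describes.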

Re: much reduced io utilization after upgrade to 0.8.0 -> 0.8.1.1

2014-07-23 Thread Neha Narkhede
Yes, that is most likely the improvement due to which you see the drop in io utilization, though there were several improvements since 0.8.0 that could've helped as well. Thanks, Neha On Tue, Jul 22, 2014 at 9:37 PM, Jason Rosenberg wrote: > I recently upgraded some of our kafka clusters to us

Re: much reduced io utilization after upgrade to 0.8.0 -> 0.8.1.1

2014-07-23 Thread Jay Kreps
Yes, it could definitely be related to KAFKA-615. The default in 0.8.1 is to let the OS handle disk writes. This is much more efficient, as it will schedule them in an order friendly to the layout on disk and do a good job of merging adjacent writes. However, if you are explicitly configuring an fsyn

Re: much reduced io utilization after upgrade to 0.8.0 -> 0.8.1.1

2014-07-23 Thread Jason Rosenberg
Thanks for the improvement! (I'm not explicitly configuring fsync policy) Jason On Wed, Jul 23, 2014 at 12:33 PM, Jay Kreps wrote: > Yes, it could definitely be related to KAFKA-615. The default in 0.8.1 > is to let the OS handle disk writes. This is much more efficient as it > will schedu

Kafka consumer per topic

2014-07-23 Thread Nickolas Simi
Hello All, I hope that this is the right place for this question. I am trying to determine whether having a separate connection per Kafka topic that I want to consume would cause any performance or usage problems for my Kafka servers or the clients. Thank you, Nick

Re: Kafka consumer per topic

2014-07-23 Thread Philip O'Toole
How many partitions in your topic? Are you talking about Producing or Consuming? All those factors will determine the number of TCP connections to your Kafka cluster. In any event, Kafka can support lots, and lots, and lots, of connections (I've run systems with hundreds of connections to a 3-

Re: [DISCUSS] Kafka Security Specific Features

2014-07-23 Thread Chris Neal
Pramod, I got that same error when following the configuration from Raja's presentation earlier in this thread. If you'll notice the usage for the console_producer.sh, it is slightly different, which is also slightly different than the scala code for the ConsoleProducer. :) When I changed this:

Partitions per Machine for a topic

2014-07-23 Thread Kashyap Mhaisekar
Hi, Is the maximum number of partitions for a topic dependent on the number of machines in a Kafka cluster? For example, if I have 3 machines in a cluster, can I have 5 partitions, with the caveat that one machine can host multiple partitions for a given topic? Regards, Kashyap

Re: Partitions per Machine for a topic

2014-07-23 Thread Philip O'Toole
Brokers can host multiple partitions for the same topic without any problems. Philip - http://www.philipotoole.com On Wednesday, July 23, 2014 2:15 PM, Kashyap Mhaisekar wrote: Hi, Is the maximum no. of partitions for a topic dependent on the no. of
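
As an illustration of the answer above, the following sketch spreads 5 partitions over 3 brokers with a plain round-robin rule. This is a simplified stand-in and not Kafka's actual assignment algorithm, which also places replicas; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class PartitionSpreadSketch {
    // Simplified placement: partition p goes to broker (p mod brokerCount).
    // Returns broker id -> list of partition ids hosted on that broker.
    static Map<Integer, List<Integer>> assign(int partitions, int brokers) {
        Map<Integer, List<Integer>> byBroker = new TreeMap<>();
        for (int b = 0; b < brokers; b++) {
            byBroker.put(b, new ArrayList<Integer>());
        }
        for (int p = 0; p < partitions; p++) {
            byBroker.get(p % brokers).add(p);
        }
        return byBroker;
    }

    public static void main(String[] args) {
        // 5 partitions on 3 brokers: two brokers host two partitions each.
        System.out.println(assign(5, 3)); // prints {0=[0, 3], 1=[1, 4], 2=[2]}
    }
}
```

The partition count is not capped by the broker count; some brokers simply host more than one partition of the topic.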

Kafka on yarn

2014-07-23 Thread hsy...@gmail.com
Hi guys, Kafka is getting more and more popular, and in most cases people run Kafka as a long-running service in the cluster. Is there any discussion of running Kafka on a YARN cluster, so that we can utilize its convenient configuration/resource management and HA? I think there is a big potential and require

Re: Kafka on yarn

2014-07-23 Thread Kam Kasravi
Hi, Kafka-on-YARN requires YARN to consistently allocate a Kafka broker at a particular resource, since the broker needs to always use its local data. YARN doesn't do this well unless you override the default scheduler (CapacityScheduler or FairScheduler). SequenceIO did something alo

Re: Kafka on yarn

2014-07-23 Thread Joe Stein
There are folks that run Kafka Brokers on Apache Mesos. I don't know of anyone running Kafka brokers on YARN but if there were I would hope they chime in. Without getting into a long debate about Mesos vs YARN I do agree with cluster resource allocation being an important direction for the indust

Re: Kafka on yarn

2014-07-23 Thread Jay Kreps
Hey Kam, It would be nice to have a way to get a failed node back with its original data, but this isn't strictly necessary; it is just a good optimization. As long as you run with replication you can restart a broker elsewhere with no data, and it will restore its state off the other replicas.

num.partitions vs CreateTopicCommand.main(args)

2014-07-23 Thread Mingtao Zhang
Hi All, In kafka.properties, I put (forgot to change): num.partitions=1 While I create topics programmatically: String[] args = new String[]{ "--zookeeper", config.getString("zookeeper"), "--topic", config.getString("topic"), "--replica", config.getStr

Re: Kafka on yarn

2014-07-23 Thread Kam Kasravi
Thanks Joe for the input related to Mesos as well as acknowledging the need for YARN to support this type of cluster allocation - long running services with node locality priority. Thanks Jay - That's an interesting fact that I wasn't aware of - though I imagine there could possibly be a long

Re: num.partitions vs CreateTopicCommand.main(args)

2014-07-23 Thread Guozhang Wang
num.partitions is only used as a default value when the createTopic command does not specify the number of partitions or when the topic is automatically created. In your case, since you always use its value in createTopic, you will always get one partition. Try changing your code to something like: String[] args
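
A minimal sketch of the idea, assuming the 0.8.0 topic-creation command accepts a `--partition` flag alongside the `--zookeeper`, `--topic`, and `--replica` flags shown in the original question. The helper name, the ZooKeeper address, the topic name, and the counts are hypothetical, and the call into Kafka itself is left as a comment so the block runs without the Kafka jar.

```java
import java.util.Arrays;

public class CreateTopicArgsSketch {
    // Hypothetical helper: builds the argument array with an explicit
    // partition count instead of relying on the broker's num.partitions.
    // Assumption: the flag is spelled "--partition" in the 0.8.0 command.
    static String[] buildArgs(String zookeeper, String topic,
                              int replicas, int partitions) {
        return new String[] {
            "--zookeeper", zookeeper,
            "--topic", topic,
            "--replica", String.valueOf(replicas),
            "--partition", String.valueOf(partitions)
        };
    }

    public static void main(String[] unused) {
        // Illustrative values only.
        String[] args = buildArgs("localhost:2181", "example-topic", 2, 5);
        System.out.println(Arrays.toString(args));
        // With the Kafka 0.8.0 jar on the classpath, these args would be
        // passed on as in the original post:
        // kafka.admin.CreateTopicCommand.main(args);
    }
}
```

Keeping the partition count as its own parameter makes it harder to reuse the broker default by accident.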

Re: Kafka on yarn

2014-07-23 Thread Steve Morin
Kam, Give it some time; I think Kafka on YARN is becoming a real possibility. There are new capabilities coming out in YARN/HDFS to allow for node groups/labels that can work with locality, and secondarily new functionality in HDFS that, depending on the use-case, can be very interes

Re: Kafka on yarn

2014-07-23 Thread Jay Kreps
Yeah, restoring data is definitely expensive. If you have 5TB/machine then you will need to restore 5TB of data. Running this way, there is no particular functionality you need out of the app master other than setting the right node id. Obviously you do need HA RM to make this work. I think

Re: num.partitions vs CreateTopicCommand.main(args)

2014-07-23 Thread Mingtao Zhang
Thank you for the clarification! In fact, the config instance is our own file ... Mingtao On Wed, Jul 23, 2014 at 7:57 PM, Guozhang Wang wrote: > num.partitions is only used as a default value when the createTopic command > does not specify the num.partitions or it is automatically created. I

Re: Kafka on yarn

2014-07-23 Thread hsy...@gmail.com
Thanks guys for your knowledge. Is there any other concern on the producer/consumer side? My understanding is that the high-level consumer and producer would refresh the cluster metadata and detect a leadership change or node failure. I guess there shouldn't be anything to worry about if I delete 1 broker and a

Re: Kafka on yarn

2014-07-23 Thread Gwen Shapira
Hi, Can we discuss for a moment the use-case of Kafka-on-YARN? I (as a Cloudera field engineer) typically advise my customers to install Kafka on their own nodes, to allow Kafka uninterrupted access to disks. Hadoop processes tend to be a bit IO heavy. Also, I can't see any benefit from co-locating