Re: Recommendation for number of brokers on kafka(0.7.2) hosts

2013-10-01 Thread rk vishu
Thank you Neha for the suggestion. On Tue, Oct 1, 2013 at 1:50 PM, Neha Narkhede wrote: > 1) Will setting 4 brokers per host with different ports and different log > data directories be beneficial to use all the available space? > 2) Will there be any disadvantage using multiple brokers on same

Re: Recommendation for number of brokers on kafka(0.7.2) hosts

2013-10-01 Thread Neha Narkhede
1) Will setting 4 brokers per host with different ports and different log data directories be beneficial to use all the available space? 2) Will there be any disadvantage using multiple brokers on same host? It is recommended that you do not deploy multiple brokers on the same box since that will

Recommendation for number of brokers on kafka(0.7.2) hosts

2013-10-01 Thread rk vishu
Hello All, I am currently using a 5-node Kafka cluster running version 0.7.2. I would like to get some advice on the optimal number of brokers on each Kafka host. Below is the specification of each machine - 4 data directories /data1, /data2, /data3, /data4 with 200+GB usable space. RAID10 - 24 Core CPU - 32

Re: use case with high rate of duplicate messages

2013-10-01 Thread Neha Narkhede
It is only available in 0.8.1 (current trunk) which has not been released yet. We plan to release it right after 0.8-final is out. Here are some wikis that describe the deduplication feature - https://cwiki.apache.org/confluence/display/KAFKA/Keyed+Messages+Proposal https://cwiki.apache.org/conflu

RE: use case with high rate of duplicate messages

2013-10-01 Thread Sybrandy, Casey
Interesting. I didn't know that Kafka had deduplication capabilities. How do you leverage it? Also, is it supported in Kafka 0.7.x? -Original Message- From: Guozhang Wang [mailto:wangg...@gmail.com] Sent: Tuesday, October 01, 2013 11:33 AM To: users@kafka.apache.org Subject: Re: use c

Re: as i understand rebalance happens on client side

2013-10-01 Thread Kane Kane
Thanks for reply, David, your library is great and indeed the rebalancing is currently somewhat quirky and complicated. And I guess it doesn't make sense to implement it considering 0.9 is planned relatively soon. On Tue, Oct 1, 2013 at 10:09 AM, David Arthur wrote: > Kane, > > I'm the creator

Re: as i understand rebalance happens on client side

2013-10-01 Thread David Arthur
Kane, I'm the creator of kafka-python, just thought I'd give some insight. Consumer rebalancing is actually pretty tricky to get right. It requires interaction with ZooKeeper which (though possible via kazoo) is something I've tried to avoid in kafka-python. It also seems a little strange to

Re: as i understand rebalance happens on client side

2013-10-01 Thread Neha Narkhede
Currently we are preparing for the 0.8-final release. But 0.9 is not slated to release until early next year. Thanks, Neha On Tue, Oct 1, 2013 at 9:26 AM, Kane Kane wrote: > Btw, is it expected to be released on Oct 31? > > Thanks! > > > On Tue, Oct 1, 2013 at 9:01 AM, Neha Narkhede >wrote: >

Re: as i understand rebalance happens on client side

2013-10-01 Thread Kane Kane
Btw, is it expected to be released on Oct 31? Thanks! On Tue, Oct 1, 2013 at 9:01 AM, Neha Narkhede wrote: > We do plan to move the group membership over to the server side and have a > very thin consumer client. The proposal is here - > > https://cwiki.apache.org/confluence/display/KAFKA/Clien

Re: Iterator Question

2013-10-01 Thread Neha Narkhede
It is recommended you use the iterator() API since that invokes Kafka's ConsumerIterator which has state management logic for consuming Kafka messages properly. If you use toIterator(), it just gives you a plain Scala iterator over KafkaStream. Thanks, Neha On Tue, Oct 1, 2013 at 6:03 AM, Sybran

Re: as i understand rebalance happens on client side

2013-10-01 Thread Kane Kane
Thanks! Direction in that proposal looks very good, I wish that would be implemented already On Tue, Oct 1, 2013 at 9:01 AM, Neha Narkhede wrote: > We do plan to move the group membership over to the server side and have a > very thin consumer client. The proposal is here - > > https://cwiki.apa

Re: as i understand rebalance happens on client side

2013-10-01 Thread Neha Narkhede
We do plan to move the group membership over to the server side and have a very thin consumer client. The proposal is here - https://cwiki.apache.org/confluence/display/KAFKA/Client+Rewrite#ClientRewrite-ConsumerAPI and this is being planned for the 0.9 release. Once this is complete, the non-java c

Re: bandwidth usage issue

2013-10-01 Thread Neha Narkhede
This is a capacity planning issue. I think the right thing to do here is to expand the cluster and use the partition reassignment tool to move some partitions over to the new brokers to evenly spread out the load. Thanks, Neha On Tue, Oct 1, 2013 at 8:53 AM, Yu, Libo wrote: > Hi team, > > Here

Re: as i understand rebalance happens on client side

2013-10-01 Thread Kane Kane
The reason i was asking is that this library seems to have support only for SimpleConsumer https://github.com/mumrah/kafka-python/, i was curious if all should be implemented on client or kafka has some rebalancing logic and prevent consuming from the same queue on server side in case of SimpleCons

Re: as i understand rebalance happens on client side

2013-10-01 Thread Kane Kane
Yes, and I actually want to use high level api, i just didn't understand how kafka works inside. For some reason i thought in case of SimpleConsumer, kafka would implement locking and rebalancing on server side, letting client know that it can't attach to specific partition, etc. But i see now that

bandwidth usage issue

2013-10-01 Thread Yu, Libo
Hi team, Here is a use case: Assume each host in a Kafka cluster has a gigabit network adaptor. The incoming traffic is 0.8 Gbps, and at one point all the traffic goes to one host. The remaining bandwidth is not enough for the followers to replicate messages from this leader. To make sure no b

Re: as i understand rebalance happens on client side

2013-10-01 Thread Guozhang Wang
I do not understand your question, what are you trying to implement? On Tue, Oct 1, 2013 at 8:42 AM, Kane Kane wrote: > So essentially you can't do "queue" pattern, unless you somehow implement > locking on the client? > > > On Tue, Oct 1, 2013 at 8:35 AM, Guozhang Wang wrote: > > > SimpleCons

Re: as i understand rebalance happens on client side

2013-10-01 Thread Neha Narkhede
I think you want locking to prevent two processes from consuming the same partition. You can just use the high level consumer to get this group management functionality. It ensures that a partition is only consumed by one high level consumer at any given time. Is there any particular reason why you

Re: as i understand rebalance happens on client side

2013-10-01 Thread Kane Kane
So essentially you can't do "queue" pattern, unless you somehow implement locking on the client? On Tue, Oct 1, 2013 at 8:35 AM, Guozhang Wang wrote: > SimpleConsumer do not have any concept of group management, only the > high-level consumers have. So multiple simple consumers can independentl

Re: use case with high rate of duplicate messages

2013-10-01 Thread Guozhang Wang
Batch processing will increase the throughput but also increase latency; how much latency can your real-time processing tolerate? One thing you could try is to use keyed messages, with the key being the md5 hash of your message. Kafka has a deduplication mechanism on the brokers that dedup messages
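The keyed-message idea above can be illustrated with a small local simulation. This is only a sketch under assumptions: it does not call any Kafka API, the names `message_key` and `compact` are hypothetical, and the "keep the latest record per key" behaviour is an approximation of what key-based deduplication on the broker would achieve.

```python
import hashlib


def message_key(payload: bytes) -> str:
    """Derive a key from the message body (hypothetical helper): its md5 hex digest."""
    return hashlib.md5(payload).hexdigest()


def compact(log):
    """Keep only the latest record for each key, in offset order.

    `log` is a list of (key, value) tuples standing in for a partition.
    This mimics key-based deduplication locally; it is not broker code.
    """
    latest_offset = {}
    for offset, (key, _value) in enumerate(log):
        latest_offset[key] = offset  # later duplicates overwrite earlier ones
    return [log[offset] for offset in sorted(latest_offset.values())]


# Three status messages, two of which are byte-for-byte identical.
messages = [b"host-1 OK", b"host-2 DOWN", b"host-1 OK"]
log = [(message_key(m), m) for m in messages]
compacted = compact(log)
```

Because identical payloads hash to the same key, the duplicate `host-1 OK` record collapses into a single surviving entry, which is the property the md5-as-key suggestion relies on.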

Re: as i understand rebalance happens on client side

2013-10-01 Thread Guozhang Wang
SimpleConsumer does not have any concept of group management; only the high-level consumers do. So multiple simple consumers can independently consume from the same partition(s). Guozhang On Tue, Oct 1, 2013 at 8:11 AM, Kane Kane wrote: > Yeah, I noticed that, i'm curious how balancing happens

use case with high rate of duplicate messages

2013-10-01 Thread S Ahmed
I have a use case where thousands of servers send status type messages, which I am currently handling real-time w/o any kind of queueing system. Currently, when I receive a message, I compute an md5 hash of the message and perform a lookup in my database to see if this is a duplicate; if not, I st

Re: as i understand rebalance happens on client side

2013-10-01 Thread Kane Kane
Yeah, I noticed that, i'm curious how balancing happens if SimpleConsumer is used. I.e. i can provide a partition to read from if i use SimpleConsumer, but what if someone else already attached to that partition, what would happen? Also what would happen if one SimpleConsumer attached to all partit

Re: Strategies for auto generating broker ID

2013-10-01 Thread Aniket Bhatnagar
Right. It is currently a Java integer. However, as per the previous thread, it seems possible to change it to a string. In that case, we can use instance IDs, IP addresses, custom ID generators, etc. How are you currently generating broker IDs from IP address? Chef script or custom shell script? On 1 O

Re: as i understand rebalance happens on client side

2013-10-01 Thread Neha Narkhede
There are 2 types of consumer clients in Kafka - ZookeeperConsumerConnector and SimpleConsumer. Only the former has the rebalancing logic. Thanks, Neha On Oct 1, 2013 6:30 AM, "Kane Kane" wrote: > But it looks like some clients don't implement it? >
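To make the client-side rebalancing discussed in this thread more concrete, here is a rough sketch of a range-style assignment in which every consumer in a group computes its own share of partitions from the same sorted inputs. It is an illustration only, not the ZookeeperConsumerConnector code: the function name `assign_partitions` is hypothetical, and in a real client the partition and consumer lists would be read from ZooKeeper rather than passed in.

```python
def assign_partitions(partitions, consumers, me):
    """Return the partitions that consumer `me` would own.

    Every group member runs the same deterministic computation over the
    same sorted lists, so the results do not overlap. (Illustrative
    sketch; the inputs are assumed to come from a coordination service.)
    """
    parts = sorted(partitions)
    members = sorted(consumers)
    pos = members.index(me)
    per_consumer, extra = divmod(len(parts), len(members))
    # The first `extra` consumers each take one additional partition.
    start = per_consumer * pos + min(pos, extra)
    count = per_consumer + (1 if pos < extra else 0)
    return parts[start:start + count]


group = ["c1", "c2"]
owned_by_c1 = assign_partitions(range(5), group, "c1")
owned_by_c2 = assign_partitions(range(5), group, "c2")
```

A client that skips this step, as SimpleConsumer does, has nothing stopping two processes from reading the same partition, which is the situation described elsewhere in the thread.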

as i understand rebalance happens on client side

2013-10-01 Thread Kane Kane
But it looks like some clients don't implement it?

Re: Strategies for auto generating broker ID

2013-10-01 Thread Maxime Brugidou
I think it currently is a java (signed) integer or maybe this was zookeeper? We are generating the id from IP address for now but this is not ideal (and can cause integer overflow with java signed ints) On Oct 1, 2013 12:52 PM, "Aniket Bhatnagar" wrote: > I would like to revive an older thread ar
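The overflow mentioned above is easy to reproduce. The sketch below assumes the ID is derived by reading the IPv4 address as a 32-bit number, which is one plausible scheme and not necessarily the one the poster uses; `ip_to_unsigned` and `as_java_int` are hypothetical helper names.

```python
import ipaddress

JAVA_INT_MAX = 2**31 - 1


def ip_to_unsigned(ip: str) -> int:
    """Interpret a dotted IPv4 address as an unsigned 32-bit integer."""
    return int(ipaddress.IPv4Address(ip))


def as_java_int(value: int) -> int:
    """Show what a 32-bit value becomes when stored in a signed Java int."""
    value &= 0xFFFFFFFF
    return value - 2**32 if value > JAVA_INT_MAX else value


low = ip_to_unsigned("10.0.0.5")        # first octet < 128, fits in a signed int
high = ip_to_unsigned("192.168.1.10")   # first octet >= 128, exceeds JAVA_INT_MAX
wrapped = as_java_int(high)             # wraps around to a negative number
```

Any address whose first octet is 128 or higher exceeds `JAVA_INT_MAX`, so it wraps to a negative value when stored as a signed int.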

Iterator Question

2013-10-01 Thread Sybrandy, Casey
Hello, What's the difference between the .toIterator() and .iterator() methods for KafkaStream? I see they return different types, and one of my coworkers is noticing that .toIterator() will block at times where .iterator() will not. Casey

Running Kafka 0.8 as supervised service

2013-10-01 Thread Aniket Bhatnagar
Has anyone been able to install and start Kafka 0.8 as a supervised service so that it comes back up after a crash/reboot/etc?

Strategies for auto generating broker ID

2013-10-01 Thread Aniket Bhatnagar
I would like to revive an older thread around auto generating broker ID. As an AWS user, I would like Kafka to just use the instance's ID or instance's IP or instance's internal domain (whichever is easier). This would mean I can easily clone from an AMI to launch kafka instances without having to wo