Thank you Neha for the suggestion.
On Tue, Oct 1, 2013 at 1:50 PM, Neha Narkhede wrote:
> 1) Will running 4 brokers per host, each with a different port and its own
> log data directory, help make use of all the available space?
> 2) Will there be any disadvantage to running multiple brokers on the same
1) Will running 4 brokers per host, each with a different port and its own
log data directory, help make use of all the available space?
2) Will there be any disadvantage to running multiple brokers on the same host?
It is recommended that you do not deploy multiple brokers on the same box
since that will
Hello All,
I am currently running a 5-node Kafka cluster on version 0.7.2. I would like
some advice on the optimal number of brokers per Kafka host. Below are the
specifications of each machine:
- 4 data directories /data1, /data2, /data3, /data4 with 200+ GB usable
space, RAID10
- 24-core CPU
- 32
It is only available in 0.8.1 (current trunk) which has not been released
yet. We plan to release it right after 0.8-final is out. Here are some
wikis that describe the deduplication feature -
https://cwiki.apache.org/confluence/display/KAFKA/Keyed+Messages+Proposal
https://cwiki.apache.org/conflu
Interesting. I didn't know that Kafka had deduplication capabilities. How do
you leverage it? Also, is it supported in Kafka 0.7.x?
-----Original Message-----
From: Guozhang Wang [mailto:wangg...@gmail.com]
Sent: Tuesday, October 01, 2013 11:33 AM
To: users@kafka.apache.org
Subject: Re: use c
Thanks for the reply, David. Your library is great, and the rebalancing is
indeed somewhat quirky and complicated at the moment. I guess it doesn't make
sense to implement it, considering 0.9 is planned for relatively soon.
On Tue, Oct 1, 2013 at 10:09 AM, David Arthur wrote:
> Kane,
>
> I'm the creator
Kane,
I'm the creator of kafka-python, just thought I'd give some insight.
Consumer rebalancing is actually pretty tricky to get right. It requires
interaction with ZooKeeper, which (though possible via kazoo) is
something I've tried to avoid in kafka-python. It also seems a little
strange to
Currently we are preparing for the 0.8-final release, but 0.9 is not slated
for release until early next year.
Thanks,
Neha
On Tue, Oct 1, 2013 at 9:26 AM, Kane Kane wrote:
> Btw, is it expected to be released on Oct 31?
>
> Thanks!
>
>
> On Tue, Oct 1, 2013 at 9:01 AM, Neha Narkhede wrote:
>
Btw, is it expected to be released on Oct 31?
Thanks!
On Tue, Oct 1, 2013 at 9:01 AM, Neha Narkhede wrote:
> We do plan to move the group membership over to the server side and have a
> very thin consumer client. The proposal is here -
>
> https://cwiki.apache.org/confluence/display/KAFKA/Clien
It is recommended that you use the iterator() API, since that invokes Kafka's
ConsumerIterator, which has the state management logic needed to consume Kafka
messages properly. If you use toIterator(), it just gives you a plain Scala
iterator over the KafkaStream.
Thanks,
Neha
On Tue, Oct 1, 2013 at 6:03 AM, Sybran
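A minimal sketch of the recommended iterator() usage, assuming the 0.8
high-level Java consumer API; the ZooKeeper address, group id, and topic name
below are placeholders, not anything from the thread:

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.Properties;
    import kafka.consumer.Consumer;
    import kafka.consumer.ConsumerConfig;
    import kafka.consumer.ConsumerIterator;
    import kafka.consumer.KafkaStream;
    import kafka.javaapi.consumer.ConsumerConnector;

    public class IteratorExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("zookeeper.connect", "localhost:2181"); // placeholder
            props.put("group.id", "status-consumers");        // placeholder
            ConsumerConnector connector =
                Consumer.createJavaConsumerConnector(new ConsumerConfig(props));

            Map<String, Integer> topicCountMap = new HashMap<String, Integer>();
            topicCountMap.put("status", 1); // one stream for the "status" topic
            Map<String, List<KafkaStream<byte[], byte[]>>> streams =
                connector.createMessageStreams(topicCountMap);
            KafkaStream<byte[], byte[]> stream = streams.get("status").get(0);

            // iterator() returns Kafka's ConsumerIterator, which blocks waiting
            // for new messages and tracks consumed offsets for the group;
            // the Scala toIterator() bypasses that state management.
            ConsumerIterator<byte[], byte[]> it = stream.iterator();
            while (it.hasNext()) {
                byte[] message = it.next().message();
                System.out.println(new String(message));
            }
        }
    }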
Thanks! The direction in that proposal looks very good; I wish it were
implemented already.
On Tue, Oct 1, 2013 at 9:01 AM, Neha Narkhede wrote:
> We do plan to move the group membership over to the server side and have a
> very thin consumer client. The proposal is here -
>
> https://cwiki.apa
We do plan to move the group membership over to the server side and have a
very thin consumer client. The proposal is here -
https://cwiki.apache.org/confluence/display/KAFKA/Client+Rewrite#ClientRewrite-ConsumerAPI and
this is being planned for the 0.9 release. Once this is complete, the
non-java c
This is a capacity planning issue. I think the right thing to do here is to
expand the cluster and use the partition reassignment tool to move some
partitions over to the new brokers to evenly spread out the load.
Thanks,
Neha
On Tue, Oct 1, 2013 at 8:53 AM, Yu, Libo wrote:
> Hi team,
>
> Here
The reason I was asking is that this library seems to support only
SimpleConsumer (https://github.com/mumrah/kafka-python/). I was curious
whether all of this has to be implemented on the client, or whether Kafka has
some rebalancing logic on the server side that prevents consuming from the
same queue in the case of SimpleCons
Yes, and I actually want to use the high-level API; I just didn't understand
how Kafka works inside. For some reason I thought that in the case of
SimpleConsumer, Kafka would implement locking and rebalancing on the server
side, letting the client know that it can't attach to a specific partition,
etc. But I see now that
Hi team,
Here is a use case: assume each host in a Kafka cluster has a gigabit network
adapter, and the incoming traffic is 0.8 Gbps. At one point all the traffic
goes to one host. The remaining bandwidth is not enough for the followers to
replicate messages from this leader.
To make sure no b
I do not understand your question. What are you trying to implement?
On Tue, Oct 1, 2013 at 8:42 AM, Kane Kane wrote:
> So essentially you can't do "queue" pattern, unless you somehow implement
> locking on the client?
>
>
> On Tue, Oct 1, 2013 at 8:35 AM, Guozhang Wang wrote:
>
> > SimpleCons
I think you want locking to prevent two processes from consuming the same
partition. You can just use the high level consumer to get this group
management functionality. It ensures that a partition is only consumed by
one high level consumer at any given time. Is there any particular reason
why you
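Building on the consumer sketch above, a hedged illustration of the "queue"
behaviour: start the same high-level consumer code in several processes with
the same group.id (all names below are placeholders) and rebalancing assigns
each partition to exactly one of them, so no client-side locking is needed:

    import java.util.Properties;
    import kafka.consumer.Consumer;
    import kafka.consumer.ConsumerConfig;
    import kafka.javaapi.consumer.ConsumerConnector;

    public class QueueWorker {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("zookeeper.connect", "localhost:2181"); // placeholder
            props.put("group.id", "status-workers");          // same group.id in every worker

            // Run this program on two machines (or twice on one machine): the
            // high-level consumer's rebalancing splits the topic's partitions
            // between the connectors, so each partition is consumed by only
            // one worker at a time -- queue semantics without explicit locking.
            ConsumerConnector worker =
                Consumer.createJavaConsumerConnector(new ConsumerConfig(props));
            // ... create streams and iterate exactly as in the earlier sketch ...
        }
    }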
So essentially you can't do the "queue" pattern unless you somehow implement
locking on the client?
On Tue, Oct 1, 2013 at 8:35 AM, Guozhang Wang wrote:
> SimpleConsumer do not have any concept of group management, only the
> high-level consumers have. So multiple simple consumers can independentl
Batch processing will increase throughput but also increase latency. How much
latency can your real-time processing tolerate?
One thing you could try is to use keyed messages, with the key set to the MD5
hash of your message. Kafka has a deduplication mechanism on the brokers that
dedups messages
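A rough sketch of producing keyed messages with an MD5 key, assuming the 0.8
Java producer API; the broker list and topic name are placeholders, and the
broker-side deduplication itself is only in 0.8.1+, as noted elsewhere in the
thread:

    import java.security.MessageDigest;
    import java.util.Properties;
    import kafka.javaapi.producer.Producer;
    import kafka.producer.KeyedMessage;
    import kafka.producer.ProducerConfig;

    public class KeyedStatusProducer {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("metadata.broker.list", "broker1:9092,broker2:9092"); // placeholders
            props.put("serializer.class", "kafka.serializer.StringEncoder");
            Producer<String, String> producer =
                new Producer<String, String>(new ProducerConfig(props));

            String message = "example status payload";

            // Use the MD5 hash of the message body as the key, so identical
            // payloads always carry the same key.
            byte[] digest = MessageDigest.getInstance("MD5").digest(message.getBytes("UTF-8"));
            StringBuilder key = new StringBuilder();
            for (byte b : digest) {
                key.append(String.format("%02x", b));
            }

            producer.send(new KeyedMessage<String, String>("status", key.toString(), message));
            producer.close();
        }
    }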
SimpleConsumer does not have any concept of group management; only the
high-level consumers do. So multiple simple consumers can independently
consume from the same partition(s).
Guozhang
On Tue, Oct 1, 2013 at 8:11 AM, Kane Kane wrote:
> Yeah, I noticed that, i'm curious how balancing happens
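To illustrate the point, a minimal SimpleConsumer fetch sketch assuming the
0.8 Java API (host, topic, partition, and offset 0 are placeholders; in
practice you would first issue an offset request). Running this from two
processes at once is legal: nothing on the broker stops both from reading the
same partition.

    import kafka.api.FetchRequest;
    import kafka.api.FetchRequestBuilder;
    import kafka.javaapi.FetchResponse;
    import kafka.javaapi.consumer.SimpleConsumer;
    import kafka.javaapi.message.ByteBufferMessageSet;
    import kafka.message.MessageAndOffset;

    public class IndependentFetcher {
        public static void main(String[] args) {
            // host, port, socket timeout, buffer size, client id
            SimpleConsumer consumer =
                new SimpleConsumer("broker1", 9092, 100000, 64 * 1024, "fetcher-a");

            FetchRequest req = new FetchRequestBuilder()
                .clientId("fetcher-a")
                .addFetch("status", 0, 0L, 100000) // topic, partition, offset, max bytes
                .build();

            FetchResponse response = consumer.fetch(req);
            ByteBufferMessageSet messages = response.messageSet("status", 0);
            for (MessageAndOffset messageAndOffset : messages) {
                System.out.println("offset " + messageAndOffset.offset());
            }
            consumer.close();
        }
    }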
I have a use case where thousands of servers send status-type messages, which
I am currently handling in real time without any kind of queueing system.
Currently, when I receive a message, I compute an MD5 hash of it and perform
a lookup in my database to see if it is a duplicate; if not, I st
Yeah, I noticed that. I'm curious how balancing happens if SimpleConsumer is
used. I.e., I can specify a partition to read from if I use SimpleConsumer,
but what happens if someone else is already attached to that partition? Also,
what would happen if one SimpleConsumer attached to all partit
Right. It is currently a Java integer. However, as per the previous thread,
it seems possible to change it to a string. In that case, we could use
instance IDs, IP addresses, custom ID generators, etc.
How are you currently generating broker IDs from the IP address? A Chef
script or a custom shell script?
On 1 O
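For what it's worth, a small sketch of why packing a full IPv4 address into a
signed 32-bit broker id overflows, as mentioned later in the thread; the
address and the last-two-octets workaround are just illustrative assumptions:

    import java.net.InetAddress;
    import java.nio.ByteBuffer;

    public class BrokerIdFromIp {
        public static void main(String[] args) throws Exception {
            byte[] octets = InetAddress.getByName("10.0.1.17").getAddress(); // example address

            // Full address as a signed 32-bit int: 10.0.1.17 -> 167772433,
            // but any address >= 128.0.0.0 wraps negative (the overflow
            // mentioned in the thread).
            int fullId = ByteBuffer.wrap(octets).getInt();

            // One workaround: use only the last two octets (fits in 0..65535).
            int shortId = ((octets[2] & 0xFF) << 8) | (octets[3] & 0xFF);

            System.out.println("full=" + fullId + " short=" + shortId);
        }
    }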
There are 2 types of consumer clients in Kafka - ZookeeperConsumerConnector
and SimpleConsumer. Only the former has the rebalancing logic.
Thanks,
Neha
On Oct 1, 2013 6:30 AM, "Kane Kane" wrote:
> But it looks like some clients don't implement it?
>
But it looks like some clients don't implement it?
I think it is currently a Java (signed) integer, or maybe that was ZooKeeper?
We are generating the ID from the IP address for now, but this is not ideal
(and can cause integer overflow with Java signed ints).
On Oct 1, 2013 12:52 PM, "Aniket Bhatnagar"
wrote:
> I would like to revive an older thread ar
Hello,
What's the difference between the .toIterator() and .iterator() methods on
KafkaStream? I see they return different types, and one of my coworkers is
noticing that .toIterator() will block at times where .iterator() will not.
Casey
Has anyone been able to install and start Kafka 0.8 as a supervised service
so that it comes back up after a crash/reboot/etc?
I would like to revive an older thread around auto-generating broker IDs. As
an AWS user, I would like Kafka to just use the instance's ID, IP address, or
internal domain (whichever is easier). This would mean I can easily clone an
AMI to launch Kafka instances without having to
wo