Jun,
There are still other concerns regarding ack=-1. A single disk failure can
still cause data loss even with ack=-1: when 2 out of 3 brokers fall out of the
ISR, acknowledged messages may be stored on the leader only. If the leader's
disk then fails, those messages are lost. In a less severe situation w
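For reference, the acknowledgement level under discussion is a producer-side setting. A minimal sketch of the relevant 0.8-era producer properties (names per the 0.8 producer config; the timeout value is just an illustration):

```properties
# Wait for acknowledgement from all in-sync replicas before
# considering a send successful. Note: if the ISR shrinks to the
# leader alone, this still acknowledges on the leader only, which
# is the data-loss window described above.
request.required.acks=-1
# How long the broker waits for the required acks before erroring.
request.timeout.ms=10000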
Yes, that is most likely the improvement behind the drop in I/O
utilization you see, though there have been several improvements since 0.8.0
that could have helped as well.
Thanks,
Neha
On Tue, Jul 22, 2014 at 9:37 PM, Jason Rosenberg wrote:
> I recently upgraded some of our kafka clusters to us
Yes, it could definitely be related to KAFKA-615. The default in 0.8.1
is to let the OS handle disk writes. This is much more efficient as it
will schedule them in an order friendly to the layout on disk and do a
good job of merging adjacent writes. However, if you are explicitly
configuring an fsyn
Thanks for the improvement!
(I'm not explicitly configuring fsync policy)
Jason
On Wed, Jul 23, 2014 at 12:33 PM, Jay Kreps wrote:
> Yes, it could definitely be related to KAFKA-615. The default in 0.8.1
> is to let the OS handle disk writes. This is much more efficient as it
> will schedu
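For anyone looking for the knob Jay is describing: in 0.8.1 the flush-to-disk policy is controlled by the broker settings below, and both are effectively unset by default, so the OS page cache decides when to write. A sketch of the relevant server.properties lines (shown commented out, matching the defaults):

```properties
# Force an fsync after this many messages. Left unset by default in
# 0.8.1, so the OS schedules disk writes itself (the KAFKA-615 change).
#log.flush.interval.messages=10000
# Force an fsync after this much time (ms) has passed since the last flush.
#log.flush.interval.ms=1000
```

Explicitly setting either of these reintroduces application-level fsyncs and gives up the write-merging benefit Jay mentions.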
Hello All,
I hope this is the right place for this question. I am trying to determine
whether having a separate connection per Kafka topic that I want to consume
would cause any performance or usage problems for my Kafka servers or the
clients.
Thank you,
Nick
How many partitions in your topic? Are you talking about Producing or
Consuming? All those factors will determine the number of TCP connections to
your Kafka cluster.
In any event, Kafka can support lots, and lots, and lots, of connections (I've
run systems with hundreds of connections to a 3-
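To make the point concrete for the consuming case: with the 0.8 high-level consumer you do not need one connection per topic. A single ConsumerConnector takes a topic-to-stream-count map and multiplexes all topics over its broker connections. A sketch of building that map (the topic names are made up; the commented call is where the connector would use it):

```java
import java.util.HashMap;
import java.util.Map;

public class TopicMapSketch {
    public static void main(String[] args) {
        // One map, one connector: both topics share the same
        // underlying TCP connections to the brokers.
        Map<String, Integer> topicCountMap = new HashMap<>();
        topicCountMap.put("orders", 1); // 1 consumer stream for "orders"
        topicCountMap.put("clicks", 1); // 1 consumer stream for "clicks"
        // connector.createMessageStreams(topicCountMap) would go here.
        System.out.println(topicCountMap.size());
    }
}
```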
Pramod,
I got that same error when following the configuration from Raja's
presentation earlier in this thread. If you look at the usage for
console_producer.sh, it is slightly different, and it also differs slightly
from the Scala code for ConsoleProducer. :)
When I changed this:
Hi,
Is the maximum number of partitions for a topic dependent on the number of
machines in a Kafka cluster?
For example, if I have 3 machines in a cluster, can I have 5 partitions, with
the caveat that one machine may host multiple partitions for a given topic?
Regards,
Kashyap
Brokers can host multiple partitions for the same topic without any problems.
Philip
-
http://www.philipotoole.com
On Wednesday, July 23, 2014 2:15 PM, Kashyap Mhaisekar
wrote:
Hi,
Is the maximum number of partitions for a topic dependent on the number of
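To Kashyap's example: the partition count is independent of the broker count, so a topic with 5 partitions on a 3-broker cluster is fine; two brokers will simply lead two partitions each. A sketch with the 0.8.1 topic tool (the ZooKeeper address and topic name are placeholders):

```shell
# 5 partitions on 3 brokers: the brokers end up leading 2, 2, and 1
# partitions respectively, with replicas spread similarly.
bin/kafka-topics.sh --create \
  --zookeeper localhost:2181 \
  --topic my-topic \
  --partitions 5 \
  --replication-factor 3
```

The replication factor (not the partition count) is what is capped by the broker count, since each replica of a partition must live on a distinct broker.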
Hi guys,
Kafka is getting more and more popular, and in most cases people run Kafka
as a long-term service in the cluster. Is there any discussion of running Kafka
on a YARN cluster, so that we could take advantage of YARN's convenient
configuration/resource management and HA? I think there is big potential and require
Hi
Kafka-on-YARN requires YARN to consistently allocate a Kafka broker on a
particular node, since the broker always needs to use its local data. YARN
doesn't do this well unless you provide (override) the default scheduler
(CapacityScheduler or FairScheduler). SequenceIO did something alo
There are folks that run Kafka brokers on Apache Mesos. I don't know of
anyone running Kafka brokers on YARN, but if there were, I would hope they
would chime in.
Without getting into a long debate about Mesos vs. YARN, I do agree that
cluster resource allocation is an important direction for the indust
Hey Kam,
It would be nice to have a way to get a failed node back with its
original data, but this isn't strictly necessary; it is just a good
optimization. As long as you run with replication, you can restart a
broker elsewhere with no data, and it will restore its state from the
other replicas.
Hi All,
In kafka.properties, I put (forgot to change):
num.partitions=1
While I create topics programmatically:
String[] args = new String[]{
"--zookeeper", config.getString("zookeeper"),
"--topic", config.getString("topic"),
"--replica", config.getStr
Thanks Joe for the input related to Mesos, as well as for acknowledging the
need for YARN to support this type of cluster allocation - long-running
services with node-locality priority.
Thanks Jay - That's an interesting fact that I wasn't aware of - though I
imagine there could possibly be a long
num.partitions is only used as a default value when the createTopic command
does not specify the number of partitions, or when the topic is auto-created.
In your case, since you always use its value in createTopic, you will always
get one partition. Try changing your code to something like:
String[] args
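A sketch of what Guozhang is suggesting: pass the partition count explicitly so the broker-side num.partitions default is never consulted. The flag names assume the 0.8 topic tooling and the placeholder values stand in for the `config.getString(...)` lookups in the original code:

```java
public class CreateTopicArgsSketch {
    public static void main(String[] argv) {
        String zkConnect = "localhost:2181"; // placeholder for config.getString("zookeeper")
        String topicName = "my-topic";       // placeholder for config.getString("topic")
        String[] args = new String[]{
            "--zookeeper", zkConnect,
            "--topic", topicName,
            "--partitions", "5",             // explicit, so num.partitions=1 is ignored
            "--replication-factor", "3"
        };
        // The topic command (e.g. TopicCommand.main(args)) would be invoked here.
        System.out.println(args.length);
    }
}
```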
Kam,
Give it some time; I think it's getting better as a real possibility for
Kafka on YARN. There are new capabilities coming out in YARN/HDFS to allow
for node groups/labels that can work with locality, and secondarily new
functionality in HDFS that, depending on the use case, can be very
interes
Yeah, restoring data is definitely expensive. If you have 5TB/machine,
then you will need to restore 5TB of data. Running this way, there is
no particular functionality you need out of the app master other
than setting the right node id.
Obviously you do need HA RM to make this work. I think
Thank you for the clarification!
In fact, the config instance is our own file ...
Mingtao
On Wed, Jul 23, 2014 at 7:57 PM, Guozhang Wang wrote:
> num.partitions is only used as a default value when the createTopic command
> does not specify the num.partitions or it is automatically created. I
Thanks, guys, for sharing your knowledge. Is there any other concern on the
producer/consumer side? My understanding is that the high-level consumer and
the producer refresh cluster metadata and detect leadership changes or node
failures. I guess there shouldn't be anything to worry about if I
delete 1 broker and a
Hi,
Can we discuss for a moment the use-case of Kafka-on-YARN?
I (as a Cloudera field engineer) typically advise my customers to
install Kafka on dedicated nodes, to allow Kafka uninterrupted access
to disks. Hadoop processes tend to be a bit I/O heavy. Also, I can't
see any benefit from co-locating