Explicit control over flushing the messages

2015-03-04 Thread Ponmani Rayar
Hi Group, I have started using Kafka 0.8.2 with the new producer API. Just wanted to know if we can have an explicit control over flushing the messages batch to Kafka cluster. Configuring batch.size will flush the messages when the batch.size is reached for a partition. But is there any

RE: Kafka producer failed to send but actually does

2015-03-04 Thread Arunkumar Srambikkal (asrambik)
Thanks for responding. I was creating an instance of kafka.server.KafkaServer in my code for running some tests and this was what I referred to by an embedded broker. The scenario you described was what was happening. In my case when I kill my broker, it fails to send an ack. I added handling

JSON parsing causing rebalance to fail

2015-03-04 Thread Arunkumar Srambikkal (asrambik)
Hi, When I start a new consumer, it throws a Rebalance exception. However I hit it only on some machines where the runtime libraries are different. The stack given below is what I encounter - is this a known issue? I saw this Jira but it's not resolved so thought to confirm - https://issues.

Re: reassign a topic partition which has no ISR and leader set to -1

2015-03-04 Thread todd
When we ran into this problem we ended up going into ZooKeeper and changing the leader to point to one of the replicas, then did a forced leader election. This got the partition back online.   Original Message   From: Virendra Pratap Singh Sent: Wednesday, March 4, 2015 2:00 AM To: Gwen Shapir

Kafka web console error

2015-03-04 Thread Bhuvana Baskar
Hi, Using kafka-Web-Console: when I run the command play start, it works fine. I tried to register the ZooKeeper, but I am getting the error below. *java.nio.channels.ClosedChannelException* at org.jboss.netty.channel.socket.nio.AbstractNioWorker.cleanUpWriteBuffer(AbstractNioWorker.java:433)

Re: Database Replication Question

2015-03-04 Thread Josh Rader
Thanks everyone for your responses! These are great. It seems our case most closely matches Jay's recommendations. The one part that sounds a little tricky is point #5 'Include in each message the database's transaction id, scn, or other identifier '. This is pretty straightforward with the RDBM

Re: Got negative offset lag after restarting brokers

2015-03-04 Thread tao xiao
Thanks guys. With unclean.leader.election.enable set to false the issue is fixed. On Tue, Mar 3, 2015 at 2:50 PM, Gwen Shapira wrote: > of course :) > unclean.leader.election.enable > > On Mon, Mar 2, 2015 at 9:10 PM, tao xiao wrote: > > How do I achieve point 3? is there a config that I can set?

Re: Explicit control over flushing the messages

2015-03-04 Thread Jeff Holoman
Take a look here: https://issues.apache.org/jira/browse/KAFKA-1865 On Wed, Mar 4, 2015 at 4:28 AM, Ponmani Rayar wrote: > Hi Group, > > I have started using Kafka 0.8.2 with the new producer API. > Just wanted to know if we can have an explicit control over flushing the > messages b

Re: Database Replication Question

2015-03-04 Thread Xiao
Hi, Josh, That depends on how you implemented it. Basically, Kafka can provide a good throughput only when you have multiple partitions. - If you have multiple consumers and multiple partitions, each of which has a dedicated partition. That means, you need a coordinator to ensure all the c

Re: Database Replication Question

2015-03-04 Thread Jay Kreps
Hey Josh, NoSQL DBs may actually be easier because they themselves generally don't have a global order. I.e. I believe Mongo has a per-partition oplog, is that right? Their partitions would match our partitions. -Jay On Wed, Mar 4, 2015 at 5:18 AM, Josh Rader wrote: > Thanks everyone for your

Trying to get kafka data to Hadoop

2015-03-04 Thread max square
Hi all, I have browsed through different conversations around Camus, and bring this as a kinda Kafka question. I know it is not the most orthodox, but if someone has some thoughts I'd appreciate it. That said, I am trying to set up Camus, using a 3 node Kafka cluster 0.8.2.1, using a project that is

Re: Database Replication Question

2015-03-04 Thread Jay Kreps
Hey Xiao, 1. Nothing prevents applying transactions transactionally on the destination side, though that is obviously more work. But I think the key point here is that much of the time the replication is not Oracle=>Oracle, but Oracle=>{W, X, Y, Z} where W/X/Y/Z are totally heterogenous systems th

Re: Database Replication Question

2015-03-04 Thread Xiao
Hey Jay, Yeah. I understand that the advantage of Kafka is one to many. That is why I am reading the source code of Kafka. You guys built a good product! : ) Our major concern is its message persistence. Zero data loss is a must in our applications. Below is what I copied from the Kafka document.

Re: Database Replication Question

2015-03-04 Thread Jay Kreps
Hey Xiao, Yeah I agree that without fsync you will not get durability in the case of a power outage or other correlated failure, and likewise without replication you won't get durability in the case of disk failure. If each batch is fsync'd it will definitely be slower, depending on the capabilit

Re: Database Replication Question

2015-03-04 Thread Jonathan Hodges
Yes you are right on the oplog per partition as well as that mapping well to the Kafka partitions. I think we are making this harder than it is based on previous attempts and trying to leverage something like Databus for propagating log changes from MongoDB and Cassandra since it requires a scn.

New Errors in 0.8.2 Protocol

2015-03-04 Thread Evan Huus
Hey all, it seems that 0.8.2 has added a handful more errors to the protocol which are not yet reflected on the wiki page [1]. Specifically, [2] seems to indicate that codes 17-20 now have associated meanings. My questions are: - Which of these are exposed "publicly"? (for example, the existing er

Problem deleting topics in 0.8.2?

2015-03-04 Thread Jeff Schroeder
So I've got 3 kafka brokers that were started with delete.topic.enable set to true. When they start, I can see in the logs that the property was successfully set. The dataset in each broker is only approximately 2G (per du). When running kafaka-delete.sh with the correct arguments to delete all of

Re: Problem deleting topics in 0.8.2?

2015-03-04 Thread Harsha
Hi Jeff, Are you seeing any errors in state-change.log or controller.log after issuing the kafka-topics.sh --delete command? Another known issue is that if you have auto.create.topics.enable = true (this is true by default), your consumer or producer can re-create the topic. So try

Re: Problem deleting topics in 0.8.2?

2015-03-04 Thread Timothy Chen
Hi Jeff, The controller should have a Topic deletion thread running coordinating the delete in the cluster, and the progress should be logged to the controller log. Can you look at the controller log to see what's going on? Tim On Wed, Mar 4, 2015 at 10:28 AM, Jeff Schroeder wrote: > So I've g

Re: New Errors in 0.8.2 Protocol

2015-03-04 Thread Joe Stein
Hey Evan, moving forward (so 0.8.3.0 and beyond) the release documentation is going to match up more with specific KIP changes https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals which elaborated on things like "breaking changes" and "major modifications you should adopt b

Re: Problem deleting topics in 0.8.2?

2015-03-04 Thread Jeff Schroeder
Timothy and Harsha, Conveniently, a coworker figured this out almost immediately after I sent this email. I was passing the zookeeper bits as: --zookeeper 'host1:2181,host2:2181,host3:2181/path/to/zk/chroot' When the actual correct thing to do was: --zookeeper 'host1:2181/path/to/zk/chroot' I

NodeJS Consumer library for 0.8.2

2015-03-04 Thread Julio Castillo
Looking around the npm repo, it looks like there is no current support for 0.8.2. Is the only alternative to use REST/Proxy? Thanks Julio Castillo

Re: New Errors in 0.8.2 Protocol

2015-03-04 Thread Evan Huus
Thanks Joe, keeping documentation in sync with KIPs does seem like a reasonable process going forward. And I apologize for the confrontational tone I used to end my original email; that was not called for. In the meantime, where can I find the answers to my two actual questions? I think I've figu

high level consumer rollback

2015-03-04 Thread Luiz Geovani Vier
Hello, I'm using the high level consumer with auto-commit disabled and a single thread per consumer, in order to consume messages in batches. In case of failures on the database, I'd like to stop processing, roll back and restart from the last committed offset. Is there a way to receive the messages

Re: high level consumer rollback

2015-03-04 Thread Mayuresh Gharat
As far as I know, you cannot do that with an online stream. You will have to reset the offsets to a particular offset in the past to start consuming from that. Another way would be to start a separate consumer with a different groupId. In any case you cannot consume from past offset wi

Re: Camus Issue about Output File EOF Issue

2015-03-04 Thread Bhavesh Mistry
Hi Gwen, The root cause of all io related problems seems to be file rename that Camus does and underlying Hadoop MapR FS. We are copying files from user volume to a day volume (rename does copy) when mapper commits file to FS. Please refer to http://answers.mapr.com/questions/162562/volume-issue

Re: Trying to get kafka data to Hadoop

2015-03-04 Thread Joel Koshy
I think the camus mailing list would be more suitable for this question. Thanks, Joel On Wed, Mar 04, 2015 at 11:00:51AM -0500, max square wrote: > Hi all, > > I have browsed through different conversations around Camus, and bring this > as a kinda Kafka question. I know is not the most orthodo

Re: high level consumer rollback

2015-03-04 Thread Joel Koshy
This is not possible with the current high-level consumer without a restart, but the new consumer (under development) does have support for this. On Wed, Mar 04, 2015 at 03:04:57PM -0500, Luiz Geovani Vier wrote: > Hello, > > I'm using the high level consumer with auto-commit disabled and a > sin

Re: moving replications

2015-03-04 Thread Joel Koshy
I think what you may be looking for is being discussed here: https://cwiki.apache.org/confluence/display/KAFKA/KIP-6+-+New+reassignment+partition+logic+for+rebalancing On Wed, Mar 04, 2015 at 12:34:30PM +0530, sunil kalva wrote: > Is there any way to automate > On Mar 3, 2015 11:57 AM, "sunil kalv

Re: Trying to get kafka data to Hadoop

2015-03-04 Thread Jagat Singh
Also see the related tool http://confluent.io/downloads/ Confluent is bringing the glue together for Kafka, Avro, Camus. Though there is no clarity around support (e.g. update of Kafka) around it at this moment. On Thu, Mar 5, 2015 at 8:57 AM, Joel Koshy wrote: > I think the camus mailing l

Re: Camus reads from multiple offsets in parallel?

2015-03-04 Thread Yang
Thanks for that info Jun. On Tue, Mar 3, 2015 at 3:56 PM, Jun Rao wrote: > Camus only fetches from different partitions in parallel. > > Thanks, > > Jun > > On Fri, Feb 27, 2015 at 4:24 PM, Yang wrote: > > > we have a single partition, and the topic contains 300k events. > > > > we fired off a

Re: high level consumer rollback

2015-03-04 Thread Luiz Geovani Vier
Thanks, Mayuresh and Joel. Reconnecting works just fine, although it's much more complex than just calling rollback(), so I'm looking forward to the new version :) -Geovani On Wed, Mar 4, 2015 at 4:57 PM, Joel Koshy wrote: > This is not possible with the current high-level consumer without a >
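
The restart-from-last-commit behaviour discussed in this thread can be pictured with a small simulation. This is only a sketch in plain Python, not Kafka consumer code: `consume_in_batches`, `flaky_store` and the in-memory `log` list are hypothetical names made up for illustration. The offset is committed only after a batch has been stored, so a failed batch is simply read again on the next run.

```python
def consume_in_batches(log, committed, batch_size, store):
    """Process `log` starting at `committed`; return the new committed offset.

    `store` is a hypothetical callable that persists one batch and raises
    IOError when the database write fails.
    """
    position = committed
    while position < len(log):
        batch = log[position:position + batch_size]
        try:
            store(batch)
        except IOError:
            # Roll back: drop the in-flight batch and stop at the last commit.
            return committed
        position += len(batch)
        committed = position          # commit only after the batch is stored
    return committed


log = ["m%d" % i for i in range(6)]
saved = []
calls = {"n": 0}


def flaky_store(batch):
    calls["n"] += 1
    if calls["n"] == 2:               # simulate a single database failure
        raise IOError("database unavailable")
    saved.extend(batch)


offset_after_failure = consume_in_batches(log, 0, 2, flaky_store)
# "Reconnect": start again from the last committed offset.
final_offset = consume_in_batches(log, offset_after_failure, 2, flaky_store)
```

In this toy run the second batch fails, the first call returns the offset of the last stored batch, and the second call picks up from there without losing or duplicating messages.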

Re: Topicmetadata response miss some partitions information sometimes

2015-03-04 Thread Mayuresh Gharat
Cool. So this is a non-issue then. To make things better we can expose the availablePartitons() api through the Kafka producer. What do you think? Thanks, Mayuresh On Tue, Mar 3, 2015 at 4:56 PM, Guozhang Wang wrote: > Hey Jun, > > You are right. Previously I thought only in your recent patch

Re: Database Replication Question

2015-03-04 Thread James Cheng
Another thing to think about is delivery guarantees. Exactly once, at least once, etc. If you have a publisher that consumes from the database log and pushes out to Kafka, and then the publisher crashes, what happens when it starts back up? Depending on how you keep track of the database's tran

Re: Trying to get kafka data to Hadoop

2015-03-04 Thread Lakshmanan Muthuraman
I think the libjars is not required. The Maven package command for the Camus project builds the uber jar (fat jar), which contains all the dependencies. I generally run Camus the following way. hadoop jar camus-example-0.1.0-SNAPSHOT-shaded.jar com.linkedin.camus.etl.kafka.CamusJob -P camus.prope

Re: Database Replication Question

2015-03-04 Thread Jonathan Hodges
Thanks James. This is really helpful. Another extreme edge case might be that the single producer is sending the database log changes and the network causes them to reach Kafka out of order. How do you prevent something like this, I guess relying on the scn on the consumer side? On Wed, Mar 4,
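
One way to picture "relying on the scn on the consumer side" is a small reorder buffer. The sketch below is plain Python and rests on a strong assumption that is not from the thread: scn values are consecutive integers, which real database SCNs need not be. `apply_in_order` and the sample records are hypothetical.

```python
def apply_in_order(records, next_scn=1):
    """Return payloads in scn order, skipping duplicates.

    Assumes scn values are consecutive integers starting at `next_scn`
    (an illustrative simplification; real SCNs are not necessarily dense).
    """
    held = {}          # scn -> payload for records that arrived early
    applied = []
    for scn, payload in records:
        if scn < next_scn or scn in held:
            continue                       # already applied or already held
        held[scn] = payload
        while next_scn in held:            # drain everything now in sequence
            applied.append(held.pop(next_scn))
            next_scn += 1
    return applied


# Record 3 arrives before record 2, and record 2 is delivered twice.
arrivals = [(1, "insert a"), (3, "delete a"), (2, "update a"), (2, "update a")]
result = apply_in_order(arrivals)
```

The early record is held until the gap before it is filled, and the redelivered record is ignored, so the changes are applied once each and in source order.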

RE: Trying to get kafka data to Hadoop

2015-03-04 Thread Thunder Stumpges
What branch of camus are you using? We have our own fork in which we updated the camus dependency from the avro snapshot of the REST Schema Repository to the new "official" one you mention in github.com/schema-repo. I was not aware of a branch on the main linked-in camus repo that has this. That be

Re: Kafka Poll: Version You Use?

2015-03-04 Thread Otis Gospodnetic
Hello hello, Results of the poll are here! Any guesses before looking? What % of Kafka users are on 0.8.2.x already? What % of people are still on 0.7.x? http://blog.sematext.com/2015/03/04/poll-results-kafka-version-distribution/ Otis -- Monitoring * Alerting * Anomaly Detection * Centralized L

Re: Kafka Poll: Version You Use?

2015-03-04 Thread Christian Csar
Do you have anything on the number of voters, or audience breakdown? Christian On Wed, Mar 4, 2015 at 8:08 PM, Otis Gospodnetic wrote: > Hello hello, > > Results of the poll are here! > Any guesses before looking? > What % of Kafka users are on 0.8.2.x already? > What % of people are still on

Re: Kafka Poll: Version You Use?

2015-03-04 Thread Otis Gospodnetic
Hi, You can see the number of voters in the poll itself (view poll results link in the poll widget). Audience details unknown, but the poll was posted on: * twitter - https://twitter.com/sematext/status/57050147435776 * LinkedIn - a few groups - Kafka, DevOps, and I think another larger one *

Re: Trying to get kafka data to Hadoop

2015-03-04 Thread max square
Thunder, thanks for your reply. The hadoop job is now correctly configured (the client was not getting the correct jars), however I am getting Avro formatting exceptions due to the format the schema-repo server follows. I think I will do something similar and create our own branch that uses the sc

Re: If you run Kafka in AWS or Docker, how do you persist data?

2015-03-04 Thread Otis Gospodnetic
Hi, On Fri, Feb 27, 2015 at 1:36 AM, James Cheng wrote: > Hi, > > I know that Netflix might be talking about "Kafka on AWS" at the March > meetup, but I wanted to bring up the topic anyway. > > I'm sure that some people are running Kafka in AWS. I'd say most, not some :) > Is anyone running

Re: Best way to show lag?

2015-03-04 Thread Otis Gospodnetic
Hi, On Sat, Feb 28, 2015 at 9:16 AM, Gene Robichaux wrote: > What is the best way to detect consumer lag? > > We are running each consumer as a separate group and I am running the > ConsumerOffsetChecker to assess the partitions and the lag for each > group/consumer. I run this every 5 minutes.
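
For readers new to the term, lag for a partition is the latest offset in the log minus the offset the consumer group has committed. A minimal sketch in plain Python with made-up numbers; `partition_lag` is a hypothetical helper, not part of any Kafka tool, and treating a partition with no commit as offset 0 is an assumption.

```python
def partition_lag(log_end_offsets, committed_offsets):
    """Lag per partition: log end offset minus the group's committed offset.

    Partitions with no committed offset are treated as offset 0 (an assumption).
    """
    return {
        partition: log_end_offsets[partition] - committed_offsets.get(partition, 0)
        for partition in log_end_offsets
    }


lag = partition_lag({0: 1500, 1: 980}, {0: 1200, 1: 980})
total_lag = sum(lag.values())
```

Tracking this value over time, rather than a single reading, shows whether a consumer is catching up or falling further behind.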

Re: Kafka Poll: Version You Use?

2015-03-04 Thread Neha Narkhede
Thanks for running the poll and sharing the results! On Wed, Mar 4, 2015 at 8:34 PM, Otis Gospodnetic wrote: > Hi, > > You can see the number of voters in the poll itself (view poll results link > in the poll widget). > Audience details unknown, but the poll was posted on: > * twitter - https://

Re: [kafka-clients] Re: [VOTE] 0.8.2.1 Candidate 2

2015-03-04 Thread Neha Narkhede
+1. Verified quick start, unit tests. On Tue, Mar 3, 2015 at 12:09 PM, Joe Stein wrote: > Ok, lets fix the transient test failure on trunk agreed not a blocker. > > +1 quick start passed, verified artifacts, updates in scala > https://github.com/stealthly/scala-kafka/tree/0.8.2.1 and go > https:

Re: If you run Kafka in AWS or Docker, how do you persist data?

2015-03-04 Thread Colin
Hello, We use Docker for Kafka on VMs with both NAS and local disk. We mount the volumes externally. We haven't had many problems at all, and a restart has cleared any issue. We are on 0.8.1. We have also started to deploy to AWS. -- Colin +1 612 859 6129 Skype colin.p.clark > On Mar 4, 2015

Re: Trying to get kafka data to Hadoop

2015-03-04 Thread Neha Narkhede
Thanks Jagat for the callout! Confluent Platform 1.0 includes Camus and we are happy to address any questions on our community mailing list. On Wed, Mar 4, 2015 at 8:41 PM, max square wrote: > Thunder, > > thanks for your reply. The hadoop job is now correctly

please subscribe

2015-03-04 Thread Michael Minar
thank you

Re: Explicit control over flushing the messages

2015-03-04 Thread Ponmani Rayar
Thanks a lot Jeff for redirecting me to the right place.. :-) Is there any tentative date when we can get the official release with this patch. On 4 March 2015 at 19:42, Jeff Holoman wrote: > Take a look here: > > https://issues.apache.org/jira/browse/KAFKA-1865 > > > > On Wed, Mar 4, 2015
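
The difference between a size-triggered send and an explicit flush can be pictured with a toy buffer. This is a sketch in plain Python, not the Kafka producer: `BatchingBuffer`, `send`, `flush` and the in-memory `sent` list are hypothetical names for illustration, and `batch_size` is counted in bytes per partition.

```python
from collections import defaultdict


class BatchingBuffer:
    """Toy model of size-triggered batching with an explicit flush (not Kafka code)."""

    def __init__(self, batch_size):
        self.batch_size = batch_size        # threshold in bytes per partition
        self.pending = defaultdict(list)    # partition -> buffered payloads
        self.sent = []                      # (partition, batch) pairs shipped so far

    def send(self, partition, payload):
        self.pending[partition].append(payload)
        # Size-triggered: ship the batch once it reaches the threshold.
        if sum(len(p) for p in self.pending[partition]) >= self.batch_size:
            self._ship(partition)

    def flush(self):
        # Explicit: ship every non-empty batch regardless of its size.
        for partition in sorted(self.pending):
            if self.pending[partition]:
                self._ship(partition)

    def _ship(self, partition):
        self.sent.append((partition, self.pending[partition]))
        self.pending[partition] = []


buf = BatchingBuffer(batch_size=10)
buf.send(0, b"12345")     # 5 bytes buffered, below the threshold
buf.send(0, b"67890")     # 10 bytes reached, the batch is shipped
buf.send(1, b"abc")       # stays buffered until flush() is called
buf.flush()
```

Without the final `flush()` call the three bytes for partition 1 would sit in the buffer until enough further records arrived, which is the situation the original question describes.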

Increasing the throughput of Kafka Publisher

2015-03-04 Thread Vineet Mishra
Hi, I have a Logstash Forwarder which is publishing events to Kafka, but as far as I can see the rate at which the events are published to Kafka is really very slow. According to some links, the Kafka publish throughput can reach 50-60Mbs per second but in my case I am hardly g

Mirror maker end to end latency metric

2015-03-04 Thread tao xiao
Hi team, Is there a built-in metric that can measure the end to end latency in MM? -- Regards, Tao

Re: Increasing the throughput of Kafka Publisher

2015-03-04 Thread Roger Hoover
Hi Vineet, Try enabling compression. That improves throughput 3-4x usually for me. Also, you can use async mode if you're willing to trade some chance of dropping messages for more throughput. kafka { codec => 'json' broker_list => "localhost:9092" topic_id => "blah"

Re: Increasing the throughput of Kafka Publisher

2015-03-04 Thread Vineet Mishra
Hi Roger, I have already enabled snappy; the throughput I mentioned is with compression enabled. Could you mention what throughput you are reaching? Thanks! On Thu, Mar 5, 2015 at 12:56 PM, Roger Hoover wrote: > Hi Vineet, > > Try enabling compression. That improves throughput 3-4x u

Re: Database Replication Question

2015-03-04 Thread James Cheng
> On Mar 3, 2015, at 4:18 PM, Guozhang Wang wrote: > > In addition to Jay's recommendation, you also need to take special > care in the error handling of the producer in order to preserve ordering since > the producer uses batching and async sending. That is, if you already sent > messages 1,2,3,

Re: Increasing the throughput of Kafka Publisher

2015-03-04 Thread Roger Hoover
Seeing around 5k msgs/s. The messages are small (average 42 bytes after snappy compression) On Wed, Mar 4, 2015 at 11:34 PM, Vineet Mishra wrote: > Hi Roger, > > I have already enabled the snappy, the throughput which I have mentioned is > after only. > > Could you mention what's the throughput
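
To compare the message rate above with a figure quoted in bytes or bits per second, it helps to convert it. A short worked calculation in Python using only the two numbers from the message; the use of decimal megabytes and megabits is an assumption about units.

```python
msgs_per_second = 5000      # "around 5k msgs/s" from the message above
avg_bytes = 42              # average message size after snappy compression

bytes_per_second = msgs_per_second * avg_bytes
megabytes_per_second = bytes_per_second / 1_000_000      # decimal megabytes
megabits_per_second = bytes_per_second * 8 / 1_000_000   # decimal megabits
```

This works out to 210,000 compressed bytes per second, roughly 0.21 MB/s or 1.68 Mbit/s on the wire, so with messages this small the message rate says more than the byte rate.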