Re: Consuming Kafka Messages Inside of EC2 Instances

2015-01-28 Thread Su She
Thank you Dillian and Guozhang for the responses. Yes, Dillian, you are understanding my issue correctly. I am not sure what the best approach to this is... I'm not sure if there's a way to whitelist certain IPs, create a VPC, or use the cluster launcher as the Kafka ZooKeeper/broker. I guess this is m

Re: Error writing to highwatermark file

2015-01-28 Thread Guozhang Wang
Hi, Could you check that the specified file /nfs/trust-machine/machine11/kafka-logs/replication-offset-checkpoint.tmp is not being deleted beforehand? Guozhang On Tue, Jan 27, 2015 at 10:54 PM, wan...@act.buaa.edu.cn < wan...@act.buaa.edu.cn> wrote: > Hi, > I use 2 brokers as a cluster, and write d

[VOTE] 0.8.2.0 Candidate 3

2015-01-28 Thread Jun Rao
This is the third candidate for release of Apache Kafka 0.8.2.0. Release Notes for the 0.8.2.0 release https://people.apache.org/~junrao/kafka-0.8.2.0-candidate3/RELEASE_NOTES.html *** Please download, test and vote by Saturday, Jan 31, 11:30pm PT Kafka's KEYS file containing PGP keys we use to

Re: Consuming Kafka Messages Inside of EC2 Instances

2015-01-28 Thread Guozhang Wang
Su, Does this help for your case? https://cwiki.apache.org/confluence/display/KAFKA/FAQ Guozhang On Wed, Jan 28, 2015 at 3:36 PM, Dillian Murphey wrote: > Am I understanding your question correctly... You're asking how do you > establish connectivity to an instance in a private subnet from th

Re: Poll RESULTS: Producer/Consumer languages

2015-01-28 Thread Guozhang Wang
Thanks for sharing this, Otis. I am also quite surprised by the Python / Go popularity. At LinkedIn we use a REST proxy server for our non-Java clients, but introducing a second hop also brings extra overhead and complexity, such as producer acking and offset committing. So I think

Re: How to mark a message as needing to retry in Kafka?

2015-01-28 Thread Christian Csar
noodles, Without an external mechanism you won't be able to mark individual messages/offsets as needing to be retried at a later time. Guozhang is describing a way to get the offset of a message that's been received so that you can find it later. You would need to save that into a 'failed messag
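
Christian's suggestion of recording the offset of a failed message so it can be found and retried later can be sketched as below. This is a minimal Python sketch, assuming a hypothetical `FailedMessageStore` class: the in-memory dict only stands in for whatever external store (a database, a separate retry topic) you would actually use, and `handler` stands in for the webhook delivery call.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# (topic, partition, offset) uniquely identifies a Kafka message.
MessageKey = Tuple[str, int, int]


@dataclass
class FailedMessageStore:
    """Hypothetical external store of messages to retry later (in-memory here)."""

    pending: Dict[MessageKey, bytes] = field(default_factory=dict)

    def record_failure(self, topic: str, partition: int, offset: int, payload: bytes) -> None:
        # Save the coordinates so the message can be found (or re-fetched) later.
        self.pending[(topic, partition, offset)] = payload

    def retry_all(self, handler: Callable[[bytes], bool]) -> List[MessageKey]:
        """Re-run handler on each pending message in (topic, partition, offset)
        order; drop the ones that now succeed and return their keys."""
        succeeded = [key for key, payload in sorted(self.pending.items()) if handler(payload)]
        for key in succeeded:
            del self.pending[key]
        return succeeded
```

The consumer keeps moving forward through the partition; only the failures are parked, so one unreachable webhook does not stall the rest of the stream.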

Re: How to mark a message as needing to retry in Kafka?

2015-01-28 Thread noodles
I did not describe my problem clearly. In my case, I got the message from Kafka, but I could not handle it for some reason, for example because the external server is down. So I want to mark the message as not yet consumed. 2015-01-28 23:26 GMT+08:00 Guozhang Wang : > Hi, > >

Re: How to mark a message as needing to retry in Kafka?

2015-01-28 Thread Guozhang Wang
I see. If you are using the high-level consumer, once the message is returned to the application it is considered "consumed", and currently it is not supported to "rewind" to a previously consumed message. With the new consumer coming in the 0.8.3 release, we have an API for you to get the offset of ea
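
The idea of advancing the consumed position only after a message has been handled can be simulated without Kafka. A minimal sketch, assuming a hypothetical `PartitionReader` class over an in-memory list standing in for one partition; it does not reproduce the new consumer's actual API.

```python
from typing import Callable, List


class PartitionReader:
    """Hypothetical in-memory stand-in for one partition read with manual offsets."""

    def __init__(self, log: List[str]) -> None:
        self.log = log
        self.committed = 0  # next offset to process; advanced only after success

    def poll_once(self, handler: Callable[[str], bool]) -> int:
        """Process from the committed offset and stop at the first failure.

        Returns the new committed offset. A failed message is not skipped, so
        the next call starts from it again (the "rewind" described above).
        """
        offset = self.committed
        while offset < len(self.log):
            if not handler(self.log[offset]):
                break
            offset += 1
        self.committed = offset
        return self.committed
```

Stopping at the first failure preserves ordering but blocks every later message in that partition while the external server is down; parking failures in a separate store, as Christian suggests, avoids that at the cost of ordering.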

Error writing to highwatermark file

2015-01-28 Thread wan...@act.buaa.edu.cn
Hi, I use 2 brokers as a cluster, and write data into NFS. I encountered this problem several days after starting the brokers: FATAL [Replica Manager on Broker 1]: Error writing to highwatermark file: (kafka.server.ReplicaManager) java.io.IOException: File rename from /nfs/trust-machine/machi
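
The error mentions a rename from a `.tmp` file, which points to a write-then-rename pattern: the checkpoint is first written to `replication-offset-checkpoint.tmp`, then renamed over the real file. A minimal Python sketch of that pattern, with a simplified line format that is an assumption and not Kafka's actual checkpoint layout, shows why a `.tmp` file removed by another process makes the rename fail.

```python
import os


def write_checkpoint(path: str, offsets: dict) -> None:
    """Write {(topic, partition): offset} to path.tmp, then rename it over path.

    The one-line-per-partition format is a simplified assumption.
    """
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        for (topic, partition), offset in sorted(offsets.items()):
            f.write(f"{topic} {partition} {offset}\n")
        f.flush()
        os.fsync(f.fileno())  # make sure the data is on disk before the rename
    # On a local POSIX filesystem this rename is atomic. If something deleted
    # tmp_path between the write and this call, it raises FileNotFoundError.
    os.replace(tmp_path, path)
```

Readers of the real file therefore see either the old or the new checkpoint, never a half-written one; NFS may not behave identically to a local filesystem here.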

Re: Poll: Producer/Consumer impl/language you use?

2015-01-28 Thread Jay Kreps
Yeah, Joe is exactly right. Let's not confuse Scala APIs with the existing Scala clients. There are a ton of downsides to those clients. They aren't going away any time in the foreseeable future, so don't stress, but I think we can kind of "deprecate" them and try to shame people into upgrading. For

Re: One or multiple instances of MM to aggregate kafka data to one hadoop

2015-01-28 Thread Daniel Compton
Hi Mingjie, I would recommend the first option of running one MirrorMaker instance pulling from multiple DCs. A single MM instance will be able to make more efficient use of the machine resources in two ways: 1. You will only have to run one process which will be able to be allocated the full amo

Re: Consuming Kafka Messages Inside of EC2 Instances

2015-01-28 Thread Dillian Murphey
Am I understanding your question correctly... You're asking how you establish connectivity to an instance in a private subnet from the outside world? Are you thinking in terms of ZooKeeper or just general AWS network connectivity? On Wed, Jan 28, 2015 at 11:03 AM, Su She wrote: > Hello All,

Re: Resilient Producer

2015-01-28 Thread Lakshmanan Muthuraman
We have been using Flume to solve a very similar use case. Our servers write the log files to a local file system, and then a Flume agent ships the data to Kafka. You can use Flume with an exec source running tail. Though the exec source runs well with tail, there are issues if the agent goe

Re: Routing modifications at runtime

2015-01-28 Thread Lakshmanan Muthuraman
Hi Toni, a couple of thoughts. 1. Kafka's behaviour need not be changed at runtime. The producers that push your MAC data into Kafka should know which topic to write to. Your producer can be Flume, Logstash, or your own custom-written Java producer. As long as your producer know
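
Keeping the MAC-to-topic routing in the producer rather than in Kafka can be sketched as a small lookup the producer consults before each send. A minimal Python sketch; `MacRouter`, the default topic, and the MAC normalization are hypothetical, and the returned topic name would be passed to whichever producer client you use.

```python
from typing import Dict, Optional


class MacRouter:
    """Hypothetical producer-side lookup from device MAC address to Kafka topic."""

    def __init__(self, routes: Dict[str, str], default_topic: Optional[str] = None) -> None:
        self.default_topic = default_topic
        self.routes: Dict[str, str] = {}
        self.update(routes)

    @staticmethod
    def _normalize(mac: str) -> str:
        # Treat "AA-BB-CC-..." and "aa:bb:cc:..." as the same address.
        return mac.lower().replace("-", ":")

    def update(self, routes: Dict[str, str]) -> None:
        """Swap in a new dictionary at runtime without restarting the producer."""
        self.routes = {self._normalize(mac): topic for mac, topic in routes.items()}

    def topic_for(self, mac: str) -> Optional[str]:
        return self.routes.get(self._normalize(mac), self.default_topic)
```

When the externally defined dictionary changes, calling `update` with the new mapping changes where subsequent messages go; Kafka itself sees only ordinary writes to different topics.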

One or multiple instances of MM to aggregate kafka data to one hadoop

2015-01-28 Thread Mingjie Lai
Hi. We have a pretty typical data ingestion use case: we use MirrorMaker at one Hadoop data center to mirror Kafka data from multiple remote application data centers. I know MirrorMaker can consume Kafka data from multiple Kafka sources with one instance on one physical node. By this

Re: Proper Relationship Between Partition and Threads

2015-01-28 Thread Ricardo Ferreira
Thank you very much Christian. That's what I concluded too; I just wanted to double-check. Best regards, Ricardo Ferreira On Wed, Jan 28, 2015 at 4:44 PM, Christian Csar wrote: > Ricardo, >The parallelism of each logical consumer (consumer group) is the number > of partitions. So with fou

Re: Proper Relationship Between Partition and Threads

2015-01-28 Thread Christian Csar
Ricardo, The parallelism of each logical consumer (consumer group) is the number of partitions. So with four partitions it could make sense to have one logical consumer (application) have two processes on different machines each with two threads, or one process with four. While with two logical
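
The point that a consumer group's parallelism is capped by its partition count can be illustrated by spreading partitions over threads. A minimal Python sketch; `assign_partitions` is hypothetical and uses round-robin for simplicity, which may differ from the assignment strategy your Kafka consumer is configured with.

```python
from typing import Dict, List


def assign_partitions(num_partitions: int, threads: List[str]) -> Dict[str, List[int]]:
    """Spread the partitions of one topic across the threads of one consumer group.

    Each partition goes to exactly one thread, so at most num_partitions threads
    receive work and any extra threads stay idle. Requires at least one thread.
    """
    assignment: Dict[str, List[int]] = {t: [] for t in threads}
    for p in range(num_partitions):
        assignment[threads[p % len(threads)]].append(p)
    return assignment
```

With four partitions, two processes with two threads each (`["m1-t1", "m1-t2", "m2-t1", "m2-t2"]`) get one partition per thread, while six threads leave two of them with nothing to do.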

Proper Relationship Between Partition and Threads

2015-01-28 Thread Ricardo Ferreira
Hi experts, I'm a newbie in the Kafka world, so excuse me for such a basic question. I'm in the process of designing a client for Kafka, and after a few hours of study, I was told that to achieve a proper level of parallelism, it is best practice to have one thread for each partition of a topic. My

Routing modifications at runtime

2015-01-28 Thread Toni Cebrián
Hi, I'm starting to weigh different alternatives for data ingestion and I'd like to know whether Kafka fits the problem I have. Say we have a set of devices, each with its own MAC, and we receive their data in Kafka. There is a dictionary defined elsewhere that says each MAC to which topic

Re: Can't create a topic; can't delete it either

2015-01-28 Thread Sumit Rangwala
On Tue, Jan 27, 2015 at 10:54 PM, Joel Koshy wrote: > Do you still have the controller and state change logs from the time > you originally tried to delete the topic? > > If you can tell me where to find the logs I can check. I haven't restarted my brokers since the issue. Sumit > On Tue, Ja

Consuming Kafka Messages Inside of EC2 Instances

2015-01-28 Thread Su She
Hello All, I have set up a cluster of EC2 instances using this method: http://blogs.aws.amazon.com/bigdata/post/Tx2D0J7QOVRJBRX/Deploying-Cloudera-s-Enterprise-Data-Hub-on-AWS As you can see, the instances are within a private subnet. I was wondering if anyone has any advice on how I can set up a K

Re: Resilient Producer

2015-01-28 Thread Magnus Edenhill
The big syslog daemons have supported Kafka for a while now. rsyslog: http://www.rsyslog.com/doc/master/configuration/modules/omkafka.html syslog-ng: https://czanik.blogs.balabit.com/2015/01/syslog-ng-kafka-destination-support/#more-1013 And Bruce might be of interest as well: https://github.com/tagg

Re: Resilient Producer

2015-01-28 Thread Colin
Logstash -- Colin Clark +1 612 859 6129 Skype colin.p.clark > On Jan 28, 2015, at 10:47 AM, Gwen Shapira wrote: > > It sounds like you are describing Flume, with SpoolingDirectory source > (or exec source running tail) and Kafka channel. > >> On Wed, Jan 28, 2015 at 10:39 AM, Fernando O. wro

Re: Resilient Producer

2015-01-28 Thread Gwen Shapira
It sounds like you are describing Flume, with SpoolingDirectory source (or exec source running tail) and Kafka channel. On Wed, Jan 28, 2015 at 10:39 AM, Fernando O. wrote: > Hi all, > I'm evaluating using Kafka. > > I liked this thing of Facebook scribe that you log to your own machine and >

Re: Poll: Producer/Consumer impl/language you use?

2015-01-28 Thread Joe Stein
I kind of look at the Storm, Spark, Samza, etc. integrations as producers/consumers too. Not sure whether those were also getting lumped into "Other". I think Jason's 90/10 80/20 70/30 would be found to be typical. As far as the Scala API goes, I think we should have a wrapper around the shiny new J

Re: Resilient Producer

2015-01-28 Thread Fernando O.
Something like Heka but lightweight :D On Wed, Jan 28, 2015 at 3:39 PM, Fernando O. wrote: > Hi all, > I'm evaluating using Kafka. > > I liked this thing of Facebook scribe that you log to your own machine and > then there's a separate process that forwards messages to the central > logger.

Resilient Producer

2015-01-28 Thread Fernando O.
Hi all, I'm evaluating using Kafka. I like the way Facebook's Scribe works: you log to your own machine and then a separate process forwards messages to the central logger. With Kafka it seems that I have to embed the publisher in my app, and deal with any communication proble
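
The Scribe-style pattern described here (the application writes locally, a separate process forwards to the central system) is what the Flume, Logstash, and syslog suggestions in this thread provide. A minimal Python sketch of the forwarding half, assuming a hypothetical `SpoolForwarder` class and a `send` callable standing in for a real Kafka producer call that returns True only once the message is acknowledged; it is not how any of those tools are implemented.

```python
import os
from typing import Callable


class SpoolForwarder:
    """Hypothetical forwarder: ships lines the app appended to a local spool file."""

    def __init__(self, spool_path: str, position_path: str, send: Callable[[str], bool]) -> None:
        self.spool_path = spool_path
        self.position_path = position_path  # byte offset of the next unsent line
        self.send = send

    def _load_position(self) -> int:
        if not os.path.exists(self.position_path):
            return 0
        with open(self.position_path) as f:
            return int(f.read().strip() or 0)

    def _save_position(self, pos: int) -> None:
        with open(self.position_path, "w") as f:
            f.write(str(pos))

    def forward(self) -> int:
        """Ship complete lines not yet acknowledged; return how many were sent."""
        pos = self._load_position()
        sent = 0
        with open(self.spool_path, "rb") as f:
            f.seek(pos)
            for raw in f:
                if not raw.endswith(b"\n"):
                    break  # partial line: the app is still writing it
                if not self.send(raw.decode("utf-8").rstrip("\n")):
                    break  # broker unreachable: resume from this line next time
                pos += len(raw)
                sent += 1
                self._save_position(pos)
        return sent
```

Persisting the byte position after each acknowledged line gives at-least-once delivery: if the forwarder crashes after a send but before saving the position, that line is sent again on restart. The application itself never talks to Kafka.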

Re: Poll: Producer/Consumer impl/language you use?

2015-01-28 Thread Otis Gospodnetic
Good point, Jason. Not sure how we could account for that easily. But maybe that is at least a partial explanation of the Java % being under 50% when Java in general is more popular than that... Otis -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr & Elasticsearch S

Re: Poll: Producer/Consumer impl/language you use?

2015-01-28 Thread Jason Rosenberg
I think the results could be a bit skewed, in cases where an organization uses multiple languages, but not equally. In our case, we overwhelmingly use java clients (>90%). But we also have ruby and Go clients too. But in the poll, these come out as equally used client languages. Jason On Wed,
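
Jason's skew argument can be made concrete by comparing a one-vote-per-language count with a usage-weighted one. A minimal Python sketch; both functions are hypothetical, and each organization is described by the fraction of its clients written in each language.

```python
from collections import Counter
from typing import Dict, List


def unweighted_share(orgs: List[Dict[str, float]]) -> Dict[str, float]:
    """Poll-style count: every language an org uses counts once."""
    counts = Counter(lang for usage in orgs for lang in usage)
    total = sum(counts.values())
    return {lang: n / total for lang, n in counts.items()}


def usage_weighted_share(orgs: List[Dict[str, float]]) -> Dict[str, float]:
    """Weight each language by the fraction of the org's clients using it."""
    weights: Counter = Counter()
    for usage in orgs:
        for lang, frac in usage.items():
            weights[lang] += frac
    total = sum(weights.values())
    return {lang: w / total for lang, w in weights.items()}
```

For a single organization with hypothetical usage `{"java": 0.9, "ruby": 0.05, "go": 0.05}`, the unweighted count gives each language one third, while the weighted count gives Java about 0.9.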

question on the mailing list

2015-01-28 Thread Dillian Murphey
Hi all, Sorry for asking, but is there some easier way to use the mailing list? Maybe a tool which makes reading and replying to messages more like google groups? I like the hadoop searcher, but the UI on that is really bad. tnx

Re: Poor performance running performance test

2015-01-28 Thread Dillian Murphey
You could be right Ewen. I was starting to wonder about the load balancer too. Is using a load balancer a bad idea? How else do users know which kafka broker to connect to? I'm using one of the IPs directly and I don't see that error. I am seeing an occasional connection refused. What the heck. Ma

Re: Poll: Producer/Consumer impl/language you use?

2015-01-28 Thread David McNelis
I agree with Stephen, it would be really unfortunate to see the Scala api go away. On Wed, Jan 28, 2015 at 11:57 AM, Stephen Boesch wrote: > The scala API going away would be a minus. As Koert mentioned we could use > the java api but it is less .. well .. functional. > > Kafka is included in t

Re: Poll: Producer/Consumer impl/language you use?

2015-01-28 Thread Stephen Boesch
The Scala API going away would be a minus. As Koert mentioned, we could use the Java API but it is less .. well .. functional. Kafka is included in the Spark examples and external modules and is popular as a component of ecosystems on Spark (for which Scala is the primary language). 2015-01-28 8:

Poll RESULTS: Producer/Consumer languages

2015-01-28 Thread Otis Gospodnetic
Hi, I promised to share the results of this poll, and here they are: http://blog.sematext.com/2015/01/28/kafka-poll-results-producer-consumer/ List of "surprises" is there. I wonder if anyone else is surprised by any aspect of the breakdown, or is the breakdown just as you expected? Otis -- Mo

Re: Poll: Producer/Consumer impl/language you use?

2015-01-28 Thread Otis Gospodnetic
Hi, I don't have a good excuse here. :( I thought about including Scala, but for some reason didn't do it. I see 12-13% of people chose "Other". Do you think that is because I didn't include Scala? Also, is the Scala API reeally going away? Otis -- Monitoring * Alerting * Anomaly Detection

WARN Error in I/O with NetworkReceive.readFrom(NetworkReceive.java

2015-01-28 Thread Dillian Murphey
Running the performance test. What is the nature of this error? I'm running a very high-end cluster on AWS. I tried this even within the same subnet on AWS. bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance topic9 5000 100 -1 acks=1 bootstrap.servers=$IP:9092 buffer.mem

Re: How to mark a message as needing to retry in Kafka?

2015-01-28 Thread Guozhang Wang
Hi, Which consumer are you using? If you are using the high-level consumer, then retries are automatic upon network exceptions. Guozhang On Wed, Jan 28, 2015 at 1:32 AM, noodles wrote: > Hi group: > > I'm working for building a webhook notification service based on Kafka. I > produce all of th

How to mark a message as needing to retry in Kafka?

2015-01-28 Thread noodles
Hi group: I'm building a webhook notification service based on Kafka. I produce all of the payloads into Kafka, and consumers consume these payloads by offset. Sometimes some payloads cannot be consumed because of a network exception or HTTP server exception. So I want to mark the faile

Re: Poor performance running performance test

2015-01-28 Thread Ewen Cheslack-Postava
That error indicates the broker closed the connection for some reason. Any useful logs from the broker? It looks like you're using ELB, which could also be the culprit. A connection timeout seems doubtful, but ELB can also close connections for other reasons, like failed health checks. -Ewen On T