RE: [kafka-clients] Re: Heads up: KAFKA-1697 - remove code related to ack>1 on the broker
Next time the protocol is evolved and new error codes can be introduced, would it make sense to add a new one called "Deprecated" (or "Deprecation" or "DeprecatedOperation" or whatever sounds best)? This would act as a more precise form of the "Unknown" error and could help identify the problem more easily when debugging clients. Of course, this is the kind of lever one would prefer never to pull, but when you need it, you're better off having it than not, and if you end up having it and never using it, it does not do much harm either.

--
Felix GV
Data Infrastructure Engineer
Distributed Data Systems
LinkedIn
f...@linkedin.com
linkedin.com/in/felixgv

From: kafka-clie...@googlegroups.com [kafka-clie...@googlegroups.com] on behalf of Magnus Edenhill [mag...@edenhill.se]
Sent: Thursday, January 15, 2015 10:40 AM
To: dev@kafka.apache.org
Cc: kafka-clie...@googlegroups.com
Subject: [kafka-clients] Re: Heads up: KAFKA-1697 - remove code related to ack>1 on the broker

I very much agree with what Joe is saying: let's use the version field as intended and be very strict about not removing nor altering existing behaviour without bumping the version. Old API versions could be deprecated (documentation only?) immediately and removed completely in the next minor version bump (0.8 -> 0.9).

An API to query supported API versions would be a good addition in the long run, but it doesn't help current clients much, as such a request to an older broker version will kill the connection without any error reporting to the client, thus making it rather useless in the short term.

Regards,
Magnus
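To make the suggestion concrete, here is a minimal sketch of what a dedicated error code could look like on the client side. This is not Kafka's actual error-code table; the enum name, the numeric code, and the message below are made up purely for illustration.

// Hypothetical sketch only: NOT Kafka's actual Errors enum. It illustrates how
// a dedicated "deprecated operation" code could be surfaced to clients instead
// of a generic "unknown" error, making deprecation easy to spot when debugging.
public enum HypotheticalErrorCode {
    NONE((short) 0, null),
    UNKNOWN((short) -1, "Unexpected server error"),
    // New, more precise code for requests the broker no longer supports:
    DEPRECATED_OPERATION((short) 100, "The request uses an API that has been deprecated and removed");

    private final short code;
    private final String message;

    HypotheticalErrorCode(short code, String message) {
        this.code = code;
        this.message = message;
    }

    public short code() { return code; }
    public String message() { return message; }

    // Map a code received on the wire back to an error, falling back to UNKNOWN.
    public static HypotheticalErrorCode forCode(short code) {
        for (HypotheticalErrorCode e : values()) {
            if (e.code == code) return e;
        }
        return UNKNOWN;
    }
}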
Merge improvements back into Kafka Metrics?
Hi,

We've been using the new Kafka Metrics within Voldemort for a little while now, and we have made some improvements to the library that you might like to copy back into Kafka proper. You can view the changes that went in after we forked here: https://github.com/tehuti-io/tehuti/commits/master

The most critical ones are probably these two:

- A pretty simple yet nasty bug in Percentile that otherwise pretty much made Histograms useless: https://github.com/tehuti-io/tehuti/commit/913dcc0dcc79e2ce87a4c3e52a1affe2aaae9948
- A few improvements to SampledStat (unfortunately littered across several commits) were made to prevent spurious values from being measured out of a disproportionately small time window (either initially, or because all windows expired in the case of an infrequently used stat): https://github.com/tehuti-io/tehuti/blob/master/src/main/java/io/tehuti/metrics/stats/SampledStat.java

There were other minor changes here and there to make the APIs more usable (IMHO), though that may be a matter of personal taste more than correctness.

If you're interested in the above changes, I could put together a patch and file a JIRA. Or someone else can do it if they prefer.

On an unrelated note, if you do merge the changes back into Kafka, it would be nice if you considered releasing kafka-metrics as a standalone artifact. Voldemort could depend on kafka-metrics rather than tehuti if it were fixed properly, but it would be a stretch for Voldemort to depend on all of Kafka (or even Kafka clients...). The fork was just a way to iterate quicker at the time we needed this, but it would be nice to bring it back together.

Let me know if I can help in any way.

--
Felix GV
Senior Software Engineer
Data Infrastructure
LinkedIn
f...@linkedin.com
linkedin.com/in/felixgv
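For context on the SampledStat issue, here is a rough, self-contained sketch of the kind of guard being described. It is not the actual Tehuti or Kafka code; the class name and the specific clamping choice are made up for illustration under the assumption that the spurious values came from dividing by a very small elapsed time.

// Illustrative sketch only -- not the actual Tehuti/Kafka SampledStat code.
// It shows the general idea: guard against measurements taken over a
// disproportionately small elapsed time (right after startup, or after all
// samples of a rarely-updated stat have expired), which would otherwise
// produce wildly inflated rates.
public class WindowedRateSketch {
    private final long windowSizeMs;  // configured length of one sample window
    private long oldestSampleTimeMs;  // start time of the oldest surviving sample
    private double eventCount;        // events recorded in the surviving samples

    public WindowedRateSketch(long windowSizeMs, long nowMs) {
        this.windowSizeMs = windowSizeMs;
        this.oldestSampleTimeMs = nowMs;
    }

    public void record(double value) {
        // (sample rotation and expiry omitted for brevity)
        eventCount += value;
    }

    public double measureRatePerSecond(long nowMs) {
        long elapsedMs = nowMs - oldestSampleTimeMs;
        // Guard: never divide by less than one full window's worth of time.
        // Without this, a couple of events recorded in the first few
        // milliseconds would be reported as an enormous per-second rate.
        long effectiveMs = Math.max(elapsedMs, windowSizeMs);
        return eventCount * 1000.0 / effectiveMs;
    }
}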
Re: Kafka/Hadoop consumers and producers
IMHO, I think Camus should probably be decoupled from Avro before the simpler contribs are deleted. We don't actually use the contribs, so I'm not saying this for our sake, but it seems like the right thing to do to provide simple examples for this type of stuff, no...?

--
Felix

On Wed, Jul 3, 2013 at 4:56 AM, Cosmin Lehene wrote:

> If the Hadoop consumer/producer use-case will remain relevant for Kafka (I assume it will), it would make sense to have the core components (Kafka input/output format at least) as part of Kafka so that they could be built, tested and versioned together to maintain compatibility. This would also make it easier to build custom MR jobs on top of Kafka, rather than having to decouple stuff from Camus. It would also be less confusing for users, at least when starting to use Kafka.
>
> Camus could use those instead of providing its own.
>
> This being said, we did some work on the consumer side (0.8 and the new(er) MR API). We could probably try to rewrite them to use Camus or fix Camus or whatever, but please consider this alternative as well.
>
> Thanks,
> Cosmin
>
> On 7/3/13 11:06 AM, "Sam Meder" wrote:
>
> > I think it makes sense to kill the hadoop consumer/producer code in Kafka, given, as you said, Camus and the simplicity of the Hadoop producer.
> >
> > /Sam
> >
> > On Jul 2, 2013, at 5:01 PM, Jay Kreps wrote:
> >
> >> We currently have a contrib package for consuming and producing messages from mapreduce (https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=tree;f=contrib;h=e53e1fb34893e733b10ff27e79e6a1dcbb8d7ab0;hb=HEAD).
> >>
> >> We keep running into problems (e.g. KAFKA-946) that are basically due to the fact that the Kafka committers don't seem to mostly be Hadoop developers and aren't doing a good job of maintaining this code (keeping it tested, improving it, documenting it, writing tutorials, getting it moved over to the more modern APIs, getting it working with newer Hadoop versions, etc).
> >>
> >> A couple of options:
> >> 1. We could try to get someone in the Kafka community (either a current committer or not) who would adopt this as their baby (it's not much code).
> >> 2. We could just let Camus take over this functionality. They already have a more sophisticated consumer and the producer is pretty minimal.
> >>
> >> So are there any people who would like to adopt the current Hadoop contrib code?
> >>
> >> Conversely, would it be possible to provide the same or similar functionality in Camus and just delete these?
> >>
> >> -Jay
Re: Kafka/Hadoop consumers and producers
The contrib code is simple and probably wouldn't require too much work to fix, but it's a lot less robust than Camus, so you would ideally need to do some work to make it solid against all edge cases, failure scenarios and performance bottlenecks...

I would definitely recommend investing in Camus instead, since it already covers a lot of the challenges I'm mentioning above, and also has more community support behind it at the moment (as far as I can tell, anyway), so it is more likely to keep getting improvements than the contrib code.

--
Felix

On Thu, Aug 8, 2013 at 9:28 AM, wrote:

> We also have a need today to ETL from Kafka into Hadoop, and we do not currently use Avro nor have any plans to.
>
> So is the official direction based on this discussion to ditch the Kafka contrib code and direct people to use Camus without Avro as Ken described, or are both solutions going to survive?
>
> I can put time into the contrib code and/or work on documenting the tutorial on how to make Camus work without Avro.
>
> Which is the preferred route, for the long term?
>
> Thanks,
> Andrew
>
> On Wednesday, August 7, 2013 10:50:53 PM UTC-6, Ken Goodhope wrote:
>
> > Hi Andrew,
> >
> > Camus can be made to work without Avro. You will need to implement a message decoder and a data writer. We need to add a better tutorial on how to do this, but it isn't that difficult. If you decide to go down this path, you can always ask questions on this list. I try to make sure each email gets answered, but it can take me a day or two.
> >
> > -Ken
> >
> > On Aug 7, 2013, at 9:33 AM, ao...@wikimedia.org wrote:
> >
> > > Hi all,
> > >
> > > Over at the Wikimedia Foundation, we're trying to figure out the best way to do our ETL from Kafka into Hadoop. We don't currently use Avro and I'm not sure if we are going to. I came across this post.
> > >
> > > If the plan is to remove the hadoop-consumer from Kafka contrib, do you think we should not consider it as one of our viable options?
> > >
> > > Thanks!
> > > -Andrew
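For anyone wondering what "implement a message decoder" amounts to in practice, here is a rough, self-contained sketch of the idea. It deliberately does not use the real Camus interfaces (whose class names and signatures vary by version); the names below are made up for illustration, under the assumption that the decoder's job is to turn a raw Kafka message into a record plus a timestamp that Camus can use to partition output by time.

import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for Camus's decoder hook. The real Camus interface
// differs in names and signatures, but the job is the same: decode a raw
// (non-Avro) Kafka message into a payload plus a timestamp.
public class PlainTextDecoderSketch {

    public static final class DecodedRecord {
        public final String payload;
        public final long timestampMs;

        public DecodedRecord(String payload, long timestampMs) {
            this.payload = payload;
            this.timestampMs = timestampMs;
        }
    }

    // Decode a plain UTF-8 text message. A real decoder would typically parse
    // a timestamp out of the payload; here we fall back to wall-clock time.
    public DecodedRecord decode(byte[] rawMessage) {
        String text = new String(rawMessage, StandardCharsets.UTF_8);
        return new DecodedRecord(text, System.currentTimeMillis());
    }
}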
Re: kafka replication blog
Thanks Jun! I hadn't been following the discussions regarding 0.8 and replication for a little while, and this was a great post to refresh my memory and get up to speed on the current replication architecture's design.

--
Felix

On Tue, Feb 5, 2013 at 2:21 PM, Jun Rao wrote:

> I just posted the following blog on Kafka replication. This may answer some of the questions that a few people have asked in the mailing list before.
>
> http://engineering.linkedin.com/kafka/intra-cluster-replication-apache-kafka
>
> Thanks,
>
> Jun
[jira] [Commented] (KAFKA-2750) Sender.java: handleProduceResponse does not check protocol version
[ https://issues.apache.org/jira/browse/KAFKA-2750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15139784#comment-15139784 ]

Felix GV commented on KAFKA-2750:
---------------------------------

Shouldn't the producer automatically do a graceful degradation of its protocol, without even bubbling anything up to the user?

> Sender.java: handleProduceResponse does not check protocol version
> -------------------------------------------------------------------
>
> Key: KAFKA-2750
> URL: https://issues.apache.org/jira/browse/KAFKA-2750
> Project: Kafka
> Issue Type: Bug
> Affects Versions: 0.10.0.0
> Reporter: Geoff Anderson
>
> If you try to run an 0.9 producer against an 0.8.2.2 Kafka broker, you get a fairly cryptic error message:
> [2015-11-04 18:55:43,583] ERROR Uncaught error in kafka producer I/O thread: (org.apache.kafka.clients.producer.internals.Sender)
> org.apache.kafka.common.protocol.types.SchemaException: Error reading field 'throttle_time_ms': java.nio.BufferUnderflowException
>         at org.apache.kafka.common.protocol.types.Schema.read(Schema.java:71)
>         at org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:462)
>         at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:279)
>         at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:216)
>         at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:141)
> Although we shouldn't expect an 0.9 producer to work against an 0.8.x broker since the protocol version has been increased, perhaps the error could be clearer.
> The cause seems to be that in Sender.java, handleProduceResponse does not have any mechanism for checking the protocol version of the received produce response - it just calls a constructor which blindly tries to grab the throttle time field, which in this case fails.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
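A rough sketch of the graceful-degradation idea suggested in the comment above, and not actual Kafka client code: the class and method names are made up, and real version negotiation would have to live in the sender/network layer. The assumption is simply that the client starts at its newest supported produce-request version and steps down when the response cannot be parsed at that version (e.g. an older broker that predates throttle_time_ms).

// Illustrative sketch only -- hypothetical names, not part of the Kafka client.
public class ProduceVersionFallbackSketch {
    private int requestVersion;            // version currently in use
    private final int minSupportedVersion; // oldest version this client can speak

    public ProduceVersionFallbackSketch(int maxSupportedVersion, int minSupportedVersion) {
        this.requestVersion = maxSupportedVersion;
        this.minSupportedVersion = minSupportedVersion;
    }

    public int currentVersion() {
        return requestVersion;
    }

    // Called when a response fails to parse against the schema of the version
    // we sent. Returns true if we downgraded and the request should be retried,
    // false if there is nothing older to fall back to (surface a clear error).
    public boolean onSchemaMismatch() {
        if (requestVersion > minSupportedVersion) {
            requestVersion--;
            return true;
        }
        return false;
    }
}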
[jira] [Commented] (KAFKA-260) Add audit trail to kafka
[ https://issues.apache.org/jira/browse/KAFKA-260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13556958#comment-13556958 ]

Felix GV commented on KAFKA-260:
--------------------------------

It would be possible to have optional timestamps by using the magic byte at the beginning of the Kafka Messages, no? If the Message contains the old (current) magic byte, then there's no timestamp; if it's the new magic byte, then there is a timestamp (without needing a key) somewhere in the header...

> Add audit trail to kafka
> ------------------------
>
> Key: KAFKA-260
> URL: https://issues.apache.org/jira/browse/KAFKA-260
> Project: Kafka
> Issue Type: New Feature
> Affects Versions: 0.8
> Reporter: Jay Kreps
> Assignee: Jay Kreps
> Attachments: kafka-audit-trail-draft.patch, Picture 18.png
>
> LinkedIn has a system that does monitoring on top of our data flow to ensure all data is delivered to all consumers of data. This works by having each logical "tier" through which data passes produce messages to a central "audit-trail" topic; these messages give a time period and the number of messages that passed through that tier in that time period. Examples of tiers for data might be "producer", "broker", "hadoop-etl", etc. This makes it possible to compare the total events for a given time period to ensure that all events that are produced are consumed by all consumers.
> This turns out to be extremely useful. We also have an application that "balances the books" and checks that all data is consumed in a timely fashion. This gives graphs for each topic and shows any data loss and the lag at which the data is consumed (if any).
> This would be an optional feature that would allow you to do this kind of reconciliation automatically for all the topics Kafka hosts, against all the tiers of applications that interact with the data.
> Some details: the proposed format of the data is JSON, using the following format for messages:
> {
>   "time": 1301727060032, // the timestamp at which this audit message is sent
>   "topic": "my_topic_name", // the topic this audit data is for
>   "tier": "producer", // a user-defined "tier" name
>   "bucket_start": 130172640, // the beginning of the time bucket this data applies to
>   "bucket_end": 130172700, // the end of the time bucket this data applies to
>   "host": "my_host_name.datacenter.linkedin.com", // the server that this was sent from
>   "datacenter": "hlx32", // the datacenter this occurred in
>   "application": "newsfeed_service", // a user-defined application name
>   "guid": "51656274-a86a-4dff-b824-8e8e20a6348f", // a unique identifier for this message
>   "count": 43634
> }
> DISCUSSION
> Time is complex:
> 1. The audit data must be based on a timestamp in the events, not the time on the machine processing the event. Using this timestamp means that all downstream consumers will report audit data on the right time bucket. This means that there must be a timestamp in the event, which we don't currently require. Arguably we should just add a timestamp to the events, but I think it is sufficient for now just to allow the user to provide a function to extract the time from their events.
> 2. For counts to reconcile exactly we can only do analysis at a granularity based on the least common multiple of the bucket size used by all tiers. The simplest is just to configure them all to use the same bucket size. We currently use a bucket size of 10 mins, but anything from 1-60 mins is probably reasonable.
> For analysis purposes one tier is designated as the source tier and we do reconciliation against this count (e.g. if another tier has less, that is treated as lost; if another tier has more, that is treated as duplication).
> Note that this system makes false positives possible since you can lose an audit message. It also makes false negatives possible since if you lose both normal messages and the associated audit messages it will appear that everything adds up. The latter problem is astronomically unlikely to happen exactly, though.
> This would integrate into the client (producer and consumer both) in the following way:
> 1. The user provides a way to get timestamps from messages (required)
> 2. The user configures the tier name, host name, datacenter name, and applicatio
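To make the reconciliation step described in the ticket concrete, here is a small self-contained sketch of comparing per-bucket counts against the designated source tier. The class and method names are made up for illustration; this is not part of the attached patch.

import java.util.Map;

// Illustrative sketch of the reconciliation described above: for one
// (topic, time bucket), counts from every tier are compared against the
// designated "source" tier. A lower count suggests loss, a higher count
// suggests duplication.
public class AuditReconciliationSketch {

    // countsByTier: total message count per tier for one (topic, time bucket)
    // sourceTier:   the tier whose count is treated as the reference
    public static void reconcile(Map<String, Long> countsByTier, String sourceTier) {
        long expected = countsByTier.getOrDefault(sourceTier, 0L);
        for (Map.Entry<String, Long> entry : countsByTier.entrySet()) {
            if (entry.getKey().equals(sourceTier)) {
                continue;
            }
            long delta = entry.getValue() - expected;
            if (delta < 0) {
                System.out.printf("tier %s is missing %d messages (possible loss)%n",
                        entry.getKey(), -delta);
            } else if (delta > 0) {
                System.out.printf("tier %s has %d extra messages (possible duplication)%n",
                        entry.getKey(), delta);
            }
        }
    }
}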