Python Kafka Client API to support kerberized secure Kafka cluster?

2017-04-13 Thread Yuxiang Mai
Hi all, We have upgraded our Kafka cluster to version 0.10.0, mainly for the Kerberos security feature, and our Kafka-Client Java API is working fine. But when I tried to use a Python API such as Python-Kafka against our secure Kafka cluster, it seems there isn't any Python API that supports a secure Kafka cluster. Can any

Re: Causes for Kafka messages being unexpectedly delivered more than once? The 'exactly once' semantic

2017-04-13 Thread Dmitry Goldenberg
Thank you, Matthias. A great writeup! Very detailed and definitely gives us "food for thought" and such. - Dmitry On Thu, Apr 13, 2017 at 8:05 PM, Matthias J. Sax wrote: > Dmitry. > > let me do one step back, to help you better understand the tradeoffs: > > A message will only be delivered mul

Re: Causes for Kafka messages being unexpectedly delivered more than once? The 'exactly once' semantic

2017-04-13 Thread Matthias J. Sax
Dmitry, let me take one step back to help you better understand the tradeoffs: A message will only be delivered multiple times in case of failure -- i.e., if a consumer crashed or timed out. In this case, another consumer will take over the partitions assigned to the failing consumer and start con
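
To illustrate the failure case described in this reply, here is a minimal toy simulation of at-least-once delivery. It uses no Kafka client at all; the log, the offsets, and every name in it (run_consumer, crash_before_commit_at) are hypothetical stand-ins, and it only assumes that a consumer commits an offset after processing the record.

```python
# Toy simulation of at-least-once delivery; no real Kafka client is used.
# All names here (run_consumer, crash_before_commit_at) are hypothetical.

def run_consumer(log, committed_offset, crash_before_commit_at=None):
    """Process records starting at the last committed offset.

    Returns (processed_records, new_committed_offset). If
    crash_before_commit_at is set, the consumer "crashes" right after
    processing that offset but before committing it.
    """
    processed = []
    offset = committed_offset
    while offset < len(log):
        processed.append(log[offset])  # the side effect happens first
        if offset == crash_before_commit_at:
            # Crash: the commit for this record never happens.
            return processed, committed_offset
        offset += 1
        committed_offset = offset  # commit only after processing
    return processed, committed_offset


log = ["m0", "m1", "m2", "m3"]

# The first consumer processes m0..m2 but crashes before committing m2.
first, committed = run_consumer(log, 0, crash_before_commit_at=2)

# A second consumer takes over and resumes from the committed offset.
second, committed = run_consumer(log, committed)

delivered = first + second
print(delivered)
```

In this sketch the record processed just before the crash shows up in both runs, which is the duplicate the thread is asking about. Committing before processing would avoid the duplicate but could lose the record instead.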

Re: Causes for Kafka messages being unexpectedly delivered more than once? The 'exactly once' semantic

2017-04-13 Thread Dmitry Goldenberg
Thanks, Matthias. Will read the doc you referenced. The duplicates are on the consumer side. We've been trying to curtail this by increasing the consumer session timeout. Would that potentially help? Basically, we're grappling with the causes of the behavior. Why would messages ever be delivered

Re: Delayed consumer in a Streams topology

2017-04-13 Thread Matthias J. Sax
Hi, reading a topic twice -- which is the first requirement you have -- is not possible (and not necessary IMHO) with the Streams API, regardless of a "delayed" read. The reason is that Streams uses a single consumer group.id internally and thus Streams can commit only one offset per topic-partitio

Delayed consumer in a Streams topology

2017-04-13 Thread Marcos Juarez
I'm building a prototype with Kafka Streams that will be consuming from the same topic twice, once with no delay, just like any normal consumer, and once with a 60 minute delay, using the new timestamp-per-message field. It will also store state coming from other topics that are being read simulta

Re: Causes for Kafka messages being unexpectedly delivered more than once? The 'exactly once' semantic

2017-04-13 Thread Matthias J. Sax
Hi, the first question to ask would be whether you get duplicate writes at the producer or duplicate reads at the consumer... For exactly-once: it's a work in progress and we aim for the 0.11 release (which might still be a beta version). In short, there will be an idempotent producer that will avoid dupli

Re: Kafka-Streams: Cogroup

2017-04-13 Thread Eno Thereska
Hi Kyle, (cc-ing user list as well) This could be an interesting scenario. Two things to help us think through it some more: 1) it seems you attached a figure, but I cannot seem to open it. 2) what about using the low level processor API instead of the DSL as approach 3? Do you have any thought

Re: [VOTE] 0.10.2.1 RC1

2017-04-13 Thread Eno Thereska
+1 (non-binding) Built sources, ran all unit and integration tests, checked new documentation, esp with an eye on the streams library. Thanks Gwen Eno > On 12 Apr 2017, at 17:25, Gwen Shapira wrote: > > Hello Kafka users, developers, client-developers, friends, romans, > citizens, etc, > >

Re: Kafka Streams Application does not start after 10.1 to 10.2 update if topics need to be auto-created

2017-04-13 Thread Eno Thereska
No, internal topics do not need to be manually created. Eno > On 13 Apr 2017, at 10:00, Shimi Kiviti wrote: > > Is that (manual topic creation) also true for internal topics? > > On Thu, 13 Apr 2017 at 19:14 Matthias J. Sax wrote: > >> Hi, >> >> thanks for reporting this issue. We are aware

Failure scenarios for a java kafka producer reading from stdin

2017-04-13 Thread Milind Vaidya
Hi, Background: I have the following setup: Apache server >> Apache Kafka Producer >> Apache Kafka Cluster >> Apache Storm. In the normal scenario, front-end boxes run the Apache server and populate the log files. The requirement is to read every log and send it to the Kafka cluster. The Java producer r

Re: Streams error handling

2017-04-13 Thread Sachin Mittal
We are also catching the exception in the serde, returning null, and then filtering out null values downstream so that they are not included. Thanks Sachin On Thu, Apr 13, 2017 at 9:13 PM, Mike Gould wrote: > Great to know I've not gone off in the wrong direction > Thanks > > On Thu, 13 Apr 2017 a
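
The pattern described in this reply (catch the deserialisation error, return null, filter nulls downstream) can be sketched roughly as follows. This is plain Python, not the Kafka Streams API; the JSON payload format and the names safe_deserialize and deserialize_stream are assumptions made only for illustration.

```python
import json


def safe_deserialize(raw: bytes):
    """Return the decoded value, or None if the bytes cannot be decoded."""
    try:
        return json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        # Swallow the error so one bad record does not break the stream.
        return None


def deserialize_stream(raw_records):
    """Deserialise every record, then drop those that failed (None)."""
    decoded = map(safe_deserialize, raw_records)
    return [value for value in decoded if value is not None]


raw_records = [b'{"id": 1}', b"not json", b"\xff\xfe", b'{"id": 2}']
good = deserialize_stream(raw_records)
print(good)
```

One caveat of this design: a record whose legitimate value is a JSON null is indistinguishable from a failed record and is dropped as well. If that matters, a dedicated sentinel or an (ok, value) pair would be a safer marker than None.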

Re: Kafka Streams Application does not start after 10.1 to 10.2 update if topics need to be auto-created

2017-04-13 Thread Shimi Kiviti
Is that (manual topic creation) also true for internal topics? On Thu, 13 Apr 2017 at 19:14 Matthias J. Sax wrote: > Hi, > > thanks for reporting this issue. We are aware of a bug in 0.10.2 that > seems to be related: https://issues.apache.org/jira/browse/KAFKA-5037 > > However, I also want to p

Re: Causes for Kafka messages being unexpectedly delivered more than once? The 'exactly once' semantic

2017-04-13 Thread Dmitry Goldenberg
Thanks, Jayesh and Vincent. It seems rather extreme that one has to implement a cache of already seen messages using Redis, memcached or some such. I would expect Kafka to "do the right thing". The data loss is a worse problem, especially for mission critical applications. So what is the curren

Re: Causes for Kafka messages being unexpectedly delivered more than once? The 'exactly once' semantic

2017-04-13 Thread Timur Fayruzov
Very enlightening presentation, thanks for sharing! On Thu, Apr 13, 2017 at 9:07 AM, Thakrar, Jayesh < jthak...@conversantmedia.com> wrote: > Hi Dmitri, > > This presentation might help you understand and take appropriate actions > to deal with data duplication (and data loss) > > https://www.sli

Re: Kafka Streams Application does not start after 10.1 to 10.2 update if topics need to be auto-created

2017-04-13 Thread Matthias J. Sax
Hi, thanks for reporting this issue. We are aware of a bug in 0.10.2 that seems to be related: https://issues.apache.org/jira/browse/KAFKA-5037 However, I also want to point out that it is highly recommended not to use auto topic creation for Streams, but to manually create all input/output topics

Re: Causes for Kafka messages being unexpectedly delivered more than once? The 'exactly once' semantic

2017-04-13 Thread Thakrar, Jayesh
Hi Dmitri, This presentation might help you understand and take appropriate actions to deal with data duplication (and data loss) https://www.slideshare.net/JayeshThakrar/kafka-68540012 Regards, Jayesh On 4/13/17, 10:05 AM, "Vincent Dautremont" wrote: One of the case where you would get

Re: Streams error handling

2017-04-13 Thread Eno Thereska
Hi Mike, Thank you. Could you open a JIRA to capture this specific problem (a copy-paste would suffice)? Alternatively we can open it, up to you. Thanks Eno > On 13 Apr 2017, at 08:43, Mike Gould wrote: > > Great to know I've not gone off in the wrong direction > Thanks > > On Thu, 13 Apr 20

Re: Kafka Support as Service

2017-04-13 Thread Roger Hoover
Hi Diego, Confluent offers support for Apache Kafka. https://www.confluent.io/ Cheers, Roger On Wed, Apr 12, 2017 at 11:14 AM, Diego Paes Ramalho Pereira < diego.pere...@b3.com.br> wrote: > Hello, > > > > I work for a Stock Exchange in Brazil and We are looking for a company > that can provid

Re: Kafka best practice on bare metal hardware

2017-04-13 Thread Marcos Juarez
That's correct, if you're mostly dealing with "latest" message consumption, faster disks will be mostly worthless. You will get some benefit if you have to rebalance partitions, since the cluster needs to shuffle a lot of data around for that, but during normal operations, there will be no benefit

Re: Streams error handling

2017-04-13 Thread Mike Gould
Great to know I've not gone off in the wrong direction Thanks On Thu, 13 Apr 2017 at 16:34, Matthias J. Sax wrote: > Mike, > > thanks for your feedback. You are absolutely right that Streams API does > not have great support for this atm. And it's very valuable that you > report this (you are no

Re: Streams error handling

2017-04-13 Thread Matthias J. Sax
Mike, thanks for your feedback. You are absolutely right that the Streams API does not have great support for this atm. And it's very valuable that you report this (you are not the first person). It helps us prioritize :) For now, there is no better solution than the one you described in your email,

Streams error handling

2017-04-13 Thread Mike Gould
Hi, Are there any better error handling options for Kafka Streams in Java? Any errors in the serdes will break the stream. The suggested implementation is to use the byte[] serde and do the deserialisation in a map operation. However, this isn't ideal either, as there's no great way to handle excep

Re: Causes for Kafka messages being unexpectedly delivered more than once? The 'exactly once' semantic

2017-04-13 Thread Vincent Dautremont
One of the cases where you would get a message more than once is if you get disconnected / kicked off the consumer group / etc. and fail to commit the offset for messages you have already read. What I do is insert the message into an in-memory Redis cache database. If it fails to insert because o
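
A rough sketch of the "cache of already seen messages" idea follows. The preview above is cut off, so the exact failure condition is not known; the sketch assumes the insert is rejected when the message ID is already present. A bounded in-memory structure stands in for Redis, and SeenCache, add_if_absent, and process_once are hypothetical names.

```python
from collections import OrderedDict


class SeenCache:
    """Bounded set of message IDs; a stand-in for an external cache."""

    def __init__(self, max_size=10000):
        self.max_size = max_size
        self._seen = OrderedDict()

    def add_if_absent(self, message_id):
        """Return True if message_id is new, False if it was already seen."""
        if message_id in self._seen:
            return False
        self._seen[message_id] = True
        if len(self._seen) > self.max_size:
            self._seen.popitem(last=False)  # evict the oldest ID
        return True


def process_once(messages, cache, handle):
    """Call handle(payload) only for message IDs not seen before."""
    for message_id, payload in messages:
        if cache.add_if_absent(message_id):
            handle(payload)


handled = []
cache = SeenCache(max_size=3)
incoming = [("a", 1), ("b", 2), ("a", 1), ("c", 3), ("b", 2)]
process_once(incoming, cache, handled.append)
print(handled)
```

The cache is bounded, so a duplicate that arrives after its ID has been evicted would be processed again; the size has to cover the window in which redelivery can realistically happen. With a shared store, a set-if-not-exists write would play the role of add_if_absent.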

Causes for Kafka messages being unexpectedly delivered more than once? The 'exactly once' semantic

2017-04-13 Thread Dmitry Goldenberg
Hi all, I was wondering if someone could list some of the causes which may lead to Kafka delivering the same messages more than once. We've looked around and see no errors of note, yet intermittently we see messages being delivered more than once. The Kafka documentation talks about the below

Re: Kafka best practice on bare metal hardware

2017-04-13 Thread Ali Nazemian
Thank you very much, Marcos. My application does real-time processing, so I would say most of the time I am dealing with the "latest" message, which emphasizes page caching. In this case, does it mean there is no additional throughput provided by using 10k or 15k disks? What about having virtualized cl