Re: IRC logs now available on botbot.me

2014-09-10 Thread Jay Kreps
That's awesome. -Jay On Wed, Sep 10, 2014 at 11:05 AM, David Arthur wrote: > https://botbot.me/freenode/apache-kafka/ > > Just FYI, wasn't sure if we had any logging in place > > Cheers, > David > >

Re: [Java New Producer] CPU Usage Spike to 100% when network connection is lost

2014-09-17 Thread Jay Kreps
Also, do you know what version you are running? We did fix several bugs similar to this against trunk. -Jay On Wed, Sep 17, 2014 at 2:14 PM, Bhavesh Mistry wrote: > Hi Kafka Dev team, > > I see my CPU spike to 100% when network connection is lost for while. It > seems network IO thread are very

Re: [Java New Producer] CPU Usage Spike to 100% when network connection is lost

2014-09-18 Thread Jay Kreps
ewed by Neha Narkhede and Jun Rao > > > Thanks, > > Bhavesh > > On Wed, Sep 17, 2014 at 11:22 PM, Jay Kreps wrote: > >> Also do you know what version you are running we did fix several bugs >> similar to this against trunk. >> >> -Jay >> >> On We

Re: Different partitioning between new producer and old producer

2014-09-18 Thread Jay Kreps
Hey Jae, The rationale for switching was to use a hash code that is cross language and not dependent on the particular object. There are all kinds of gotchas with Java's hashCode() as a partition assignment strategy (e.g. two byte arrays with the same bytes will have different hash codes). -Jay
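A minimal Java sketch of the byte-array gotcha mentioned above: `byte[]` inherits identity-based `Object.hashCode()`, so two arrays with identical contents usually hash differently, while a hash computed from the bytes themselves is stable. The class and method names are hypothetical, and `Arrays.hashCode` is used only for illustration; it is not the hash the new producer uses.

```java
import java.util.Arrays;

class KeyHashDemo {
    // Partition choice from Object.hashCode(): for byte[] this is identity-based,
    // so two arrays with equal contents will generally land on different partitions.
    static int identityPartition(byte[] key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // Partition choice from a content-based hash: equal bytes always give the same partition.
    // Arrays.hashCode is only an illustration here, not the hash Kafka uses.
    static int contentPartition(byte[] key, int numPartitions) {
        return Math.floorMod(Arrays.hashCode(key), numPartitions);
    }

    public static void main(String[] args) {
        byte[] a = {1, 2, 3};
        byte[] b = {1, 2, 3};
        System.out.println("same bytes: " + Arrays.equals(a, b));
        System.out.println("same identity hash: " + (a.hashCode() == b.hashCode()));
        System.out.println("same content partition: "
                + (contentPartition(a, 16) == contentPartition(b, 16)));
    }
}
```

A hash defined purely over the key bytes can also be reimplemented identically in a non-Java client, which is the cross-language property the post is after.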

Re: kafka producer performance test

2014-10-01 Thread Jay Kreps
Hi Sa, That script was developed with the new producer that is included on trunk. Check out trunk and build and it should be there. -Jay On Wed, Oct 1, 2014 at 7:55 PM, Sa Li wrote: > Hi, All > > I built a 3-node kafka cluster, I want to make performance test, I found > someone post following t

Re: migrating log data to new locations

2014-10-07 Thread Jay Kreps
I think the more automated/lazy way right now would be to shut down one broker, rm -rf all its data, add the data directories in config, and restart to let the broker restore off the replicas. This may actually be okay though it is a little slower. -Jay On Tue, Oct 7, 2014 at 3:25 PM, Jun Rao wro

Re: Connections from kafka consumer

2014-10-08 Thread Jay Kreps
Yes, that is all correct--the consumer will use zookeeper for discovery and then make direct connections to the appropriate brokers on 9092 or whatever port you have configured. -Jay On Wed, Oct 8, 2014 at 3:32 PM, ravi singh wrote: > I have few questions regarding Kafka Consumer. > > In kafka

Re: Sending Same Message to Two Topics on Same Broker Cluster

2014-10-21 Thread Jay Kreps
Hey Bhavesh, This would only work if both topics happened to be on the same machine, which generally they wouldn't. -Jay On Tue, Oct 21, 2014 at 9:14 AM, Bhavesh Mistry wrote: > Hi Neha, > > All, I am saying is that if same byte[] or data has to go to two topics > then, I have to call send tw

Re: Performance issues

2014-10-21 Thread Jay Kreps
What version of Kafka is this? Can you try the same test against trunk? We fixed a couple of latency related bugs which may be the cause. -Jay On Tue, Oct 21, 2014 at 10:50 AM, Mohit Anchlia wrote: > It's consistently close to 100ms which makes me believe that there are some > settings that I m

Re: Performance issues

2014-10-21 Thread Jay Kreps
t version > On Tue, Oct 21, 2014 at 10:57 AM, Jay Kreps wrote: > > > What version of Kafka is this? Can you try the same test against trunk? > We > > fixed a couple of latency related bugs which may be the cause. > > > > -Jay > > > > On Tue, Oct 21,

Re: Kafka sending messages with zero copy

2014-10-23 Thread Jay Kreps
It sounds like you are primarily interested in optimizing the producer? There is no way to produce data without any allocation being done and I think getting to that would be pretty hard and lead to bad apis, but avoiding memory allocation entirely shouldn't be necessary. Small transient objects i

Re: Kafka sending messages with zero copy

2014-10-24 Thread Jay Kreps
iii) Write an encoder that just takes the byte array from this > > wrapper > > > > > object and hands it to Kafka. > > > > > > > > > > Similarly on the consumer: > > > > > i) Kafka will make copies of slices (representing user values) of > > th

Re: question about async publishing for 0.8.1

2014-10-27 Thread Jay Kreps
You may also need to set the retries to something high. I think the default is something like 1 or 3, so it will try a few times and then give up. -Jay On Mon, Oct 27, 2014 at 6:01 PM, Libo Yu wrote: > This seems to be a bug of 0.8.1 > More info: > queue.enqueue.timeout.ms is explicitly set
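For reference, a sketch of the relevant 0.8.x (Scala) async producer settings assembled as `java.util.Properties`. The keys are the old producer's config names; the values (the retry count passed in, a 500 ms backoff) are illustrative assumptions, not recommendations from the thread.

```java
import java.util.Properties;

class AsyncProducerConfig {
    // Builds config for the 0.8.x async producer with a raised retry count.
    static Properties asyncWithRetries(String brokerList, int retries) {
        Properties props = new Properties();
        props.put("metadata.broker.list", brokerList);
        props.put("producer.type", "async");
        // How many times a failed send is retried before the messages are given up on.
        props.put("message.send.max.retries", Integer.toString(retries));
        // Pause between retries (illustrative value), e.g. to let a leader election finish.
        props.put("retry.backoff.ms", "500");
        // -1: block the caller when the async queue is full rather than dropping messages.
        props.put("queue.enqueue.timeout.ms", "-1");
        return props;
    }
}
```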

Re: Kafka producer error

2014-10-30 Thread Jay Kreps
Yeah, the intention of requiring those properties is to pipe all the configuration that goes to the producer through to the partitioner. That way, if your partitioner needs to query some external system, it can get the configuration for that. -Jay On Thu, Oct 30, 2014 at 5:57 PM, Rajiv Kurian wrote:

Re: Dynamically adding Kafka brokers

2014-11-03 Thread Jay Kreps
I agree it would be really nice to get KAFKA-1070 figured out. FWIW, the reason for having a name or id other than ip was to make it possible to move the identity to another physical server (e.g. scp the data directory) and have it perform the same role on that new piece of hardware. Systems that

Announcing Confluent

2014-11-06 Thread Jay Kreps
Hey all, I’m happy to announce that Jun Rao, Neha Narkhede and I are creating a company around Kafka called Confluent. We are planning on productizing the kind of Kafka-based real-time data platform we built out at LinkedIn. We are doing this because we think this is a really powerful idea and we

Re: No longer supporting Java 6, if? when?

2014-11-06 Thread Jay Kreps
Yeah it is a little bit silly that people are still using Java 6. I guess this is a tradeoff--being more conservative in our java support means more people can use our software, whereas upgrading gives us developers a better experience since we aren't stuck with ancient stuff. Nonetheless I would

Re: Strategies for high-concurrency consumers

2014-11-06 Thread Jay Kreps
Unfortunately the performance of the consumer balancing scales poorly with the number of partitions. This is one of the things the consumer rewrite project is meant to address, however that is not complete yet. A reasonable workaround may be to decouple your application parallelism from the number
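One way to read "decouple your application parallelism from the number of partitions" is a single consumer loop feeding a worker pool. This is a minimal sketch under that assumption; the list stands in for a consumer stream and `handle` is hypothetical application work.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class DecoupledProcessing {
    // A single loop (standing in for one consumer stream) hands each message to a
    // fixed pool, so processing parallelism is set by poolSize, not by partition count.
    static int processAll(List<String> messages, int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        AtomicInteger processed = new AtomicInteger();
        for (String message : messages) {
            pool.submit(() -> {
                handle(message);
                processed.incrementAndGet();
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return processed.get();
    }

    // Hypothetical per-message application work.
    static void handle(String message) {
    }
}
```

The trade-off: ordering within a partition is no longer preserved across workers, and offsets should only be committed once the submitted work has finished.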

Re: High CPU usage of Crc32 on Kafka broker

2014-11-06 Thread Jay Kreps
I suspect it is possible to save and reuse the CRCs though it might be a bit of an invasive change. I suspect the first usage is when we are checking the validity of the messages and the second is from when we rebuild the compressed message set (I'm assuming you guys are using compression because I
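To make the cost concrete, a sketch of a CRC validity check with `java.util.zip.CRC32`. Each check is a full pass over the bytes, so validating and then rebuilding a compressed set pays for it twice unless the first result is reused. Simplification: this computes the checksum over a bare payload, whereas a real Kafka message CRC also covers its header fields.

```java
import java.util.zip.CRC32;

class CrcCheck {
    // Full pass over the payload to compute its CRC32.
    static long crcOf(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        return crc.getValue();
    }

    // Validity check: recompute and compare with the stored checksum.
    static boolean isValid(byte[] payload, long storedCrc) {
        return crcOf(payload) == storedCrc;
    }
}
```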

Re: spikes in producer requests/sec

2014-11-11 Thread Jay Kreps
There are some fixes in 0.8.2-beta for periodic latency spikes if you are using acks=-1 in the producer. -Jay On Tue, Nov 11, 2014 at 10:50 AM, Wes Chow wrote: > > We're seeing periodic spikes in req/sec rates across our nodes. Our > cluster is 10 nodes, and the topic has a replication factor o

Re: benchmark kafka on 10GbE network

2014-11-18 Thread Jay Kreps
Hey Manu, I'm not aware of a benchmark on 10GbE. I'd love to see that though. Diving into the results may help us find bottlenecks hidden by the slower network. Can you figure out where the bottleneck is in your test? I assume this is a single producer and consumer instance and you are using the

Re: benchmark kafka on 10GbE network

2014-11-18 Thread Jay Kreps
s but I'm not using the new producer. > CPU is typically 50% utilized on client and merely used on broker. Disks > aren't busy either as a lot of data are cached in memory. > Would you please give a link for the producer metrics you are referring to > ? > > Thanks, >

Re: benchmark kafka on 10GbE network

2014-11-20 Thread Jay Kreps
for 1KB > message on 10GbE network. The difference is that a producer is created per > topic partition. > > > On Wed, Nov 19, 2014 at 12:34 PM, Jay Kreps wrote: > > > Yeah this will involve some experimentation. > > > > The metrics are visible with jconsole or anothe

Re: Are logs portable?

2014-11-20 Thread Jay Kreps
Yes, this will work. You will want to configure the new instance with the same node id as the failed instance. -Jay On Thu, Nov 20, 2014 at 10:06 AM, Parag Shah wrote: > Hi all, > > This question is related to node recovery. Say, there is a hardware > failure and the node cannot be brought

Re: new producer api and batched Futures....

2014-11-20 Thread Jay Kreps
Internally it works as you describe: there is only one CountDownLatch per batch sent, and each of the futures is just a wrapper around that. It is true that if you accumulate thousands of futures in a list that may be a fair number of objects you are retaining, and there will be some work involved in
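A sketch of the structure described above: one latch per batch, with each record's future a thin wrapper that delegates to it. The class names are hypothetical and this is not the producer's actual code; it only illustrates why many futures cost little beyond their wrapper objects.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Shared completion state for one batch: a single latch for all of its records.
class BatchResult {
    private final CountDownLatch done = new CountDownLatch(1);
    private volatile long baseOffset = -1;

    void complete(long baseOffset) {
        this.baseOffset = baseOffset;
        done.countDown();
    }
    boolean isDone() { return done.getCount() == 0; }
    void await() throws InterruptedException { done.await(); }
    boolean await(long timeout, TimeUnit unit) throws InterruptedException { return done.await(timeout, unit); }
    long baseOffset() { return baseOffset; }
}

// Per-record future: just the shared batch plus the record's position in it.
class RecordFuture implements Future<Long> {
    private final BatchResult batch;
    private final int relativeOffset;

    RecordFuture(BatchResult batch, int relativeOffset) {
        this.batch = batch;
        this.relativeOffset = relativeOffset;
    }
    @Override public boolean cancel(boolean mayInterrupt) { return false; }
    @Override public boolean isCancelled() { return false; }
    @Override public boolean isDone() { return batch.isDone(); }
    @Override public Long get() throws InterruptedException {
        batch.await();
        return batch.baseOffset() + relativeOffset;
    }
    @Override public Long get(long timeout, TimeUnit unit) throws InterruptedException, TimeoutException {
        if (!batch.await(timeout, unit)) throw new TimeoutException();
        return batch.baseOffset() + relativeOffset;
    }
}
```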

Re: benchmark kafka on 10GbE network

2014-11-20 Thread Jay Kreps
.Metadata.timeToNextUpdate > 47 0.02% 99.42% 40 300795 sun.nio.ch.NativeThread.current > 48 0.01% 99.43% 28 300785 sun.nio.ch.EPollArrayWrapper.epollCtl > 49 0.01% 99.44% 25 301055 sun.nio.ch.EPollSelectorImpl.wakeup > 50 0.01% 99.45% 22 300806 java.lang.T

Re: benchmark kafka on 10GbE network

2014-11-25 Thread Jay Kreps
o 0.59* > *io-wait-time-ns-avg 62881* > > It seems to confirm that IO spent much more time waiting than doing real > work. > > Given the above stats, how could I trace down and pinpoint the bottleneck ? > I guess computing crc32s can not be avoided. > > On Fri, Nov 21, 2

Re: [DISCUSSION] adding the serializer api back to the new java producer

2014-12-02 Thread Jay Kreps
Hey Joel, you are right, we discussed this, but I think we didn't think about it as deeply as we should have. I think our take was strongly shaped by having a wrapper api at LinkedIn that DOES do the serialization transparently so I think you are thinking of the producer as just an implementation d

Re: [DISCUSSION] adding the serializer api back to the new java producer

2014-12-02 Thread Jay Kreps
ctice > most organizations (big and small) tend to have at least some specific > organization-specific detail that warrants a custom serializer anyway; > and it's going to be easier to override a serializer than an entire > producer API. > > Joel > > On Tue, Dec 02, 2014

Re: [DISCUSSION] adding the serializer api back to the new java producer

2014-12-04 Thread Jay Kreps
I agree that having the new Producer(KeySerializer, ValueSerializer) interface would be useful. People suggested cases where you want to mix and match serialization types. The ByteArraySerializer is a no-op that would give the current behavior so any odd case where you need to mix and match serial
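A sketch of the mix-and-match idea: key and value serializers are chosen independently, and a byte-array serializer is a pass-through. `Encoder` and the other names are hypothetical stand-ins; the real Kafka serializer interface has a different, larger signature.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical minimal serializer interface.
interface Encoder<T> {
    byte[] encode(T value);
}

// No-op: bytes in, bytes out (the "current behavior" case).
class ByteArrayEncoder implements Encoder<byte[]> {
    public byte[] encode(byte[] value) { return value; }
}

class Utf8StringEncoder implements Encoder<String> {
    public byte[] encode(String value) {
        return value == null ? null : value.getBytes(StandardCharsets.UTF_8);
    }
}

// Key and value encoders are independent, so e.g. String keys can pair with raw byte values.
class TypedSender<K, V> {
    private final Encoder<K> keyEncoder;
    private final Encoder<V> valueEncoder;

    TypedSender(Encoder<K> keyEncoder, Encoder<V> valueEncoder) {
        this.keyEncoder = keyEncoder;
        this.valueEncoder = valueEncoder;
    }

    // Returns the raw key and value bytes that would be handed to the network layer.
    byte[][] toBytes(K key, V value) {
        return new byte[][] { keyEncoder.encode(key), valueEncoder.encode(value) };
    }
}
```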

Re: [DISCUSSION] adding the serializer api back to the new java producer

2014-12-04 Thread Jay Kreps
Hey Guozhang, These are good points, let me try to address them. 1. Our goal is to be able to provide a best-of-breed serialization package that works out of the box that does most of the magic. This best-of-breed plugin would allow schemas, schema evolution, compatibility checks, etc. We think i

Re: [DISCUSSION] adding the serializer api back to the new java producer

2014-12-05 Thread Jay Kreps
Hey Sriram, Thanks! I think this is a very helpful summary. Let me try to address your point about passing in the serde at send time. I think the first objection is really to the paired key/value serializer interfaces. This leads to kind of a weird combinatorial thing where you would have an avr

Re: OutOfMemoryException when starting replacement node.

2014-12-10 Thread Jay Kreps
Hey Solon, The 10MB size is per-partition. The rationale for this is that the fetch size per-partition is effectively a max message size. However with so many partitions on one machine this will lead to a very large fetch size. We don't do a great job of scheduling these to stay under a memory bou
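The worst case grows linearly: per-partition fetch size times the number of partitions fetched at once. A back-of-the-envelope sketch; the 2,000-partition count is a hypothetical example, and at that count a 10MB (10 MiB) per-partition size gives roughly 19.5 GiB.

```java
class FetchMemoryEstimate {
    // Worst case: every partition returns a full fetch at the same time.
    static long worstCaseBytes(long fetchBytesPerPartition, int partitions) {
        return fetchBytesPerPartition * partitions;
    }

    public static void main(String[] args) {
        long perPartition = 10L * 1024 * 1024; // the 10MB per-partition fetch size
        int partitions = 2000;                 // hypothetical partition count
        double gib = worstCaseBytes(perPartition, partitions) / (1024.0 * 1024 * 1024);
        System.out.printf("%.1f GiB%n", gib);
    }
}
```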

Re: Kafka 0.8.2 new producer blocking on metadata

2014-12-18 Thread Jay Kreps
Hey Paul, Here are the constraints: 1. We wanted the storage of messages to be in their compact binary form so we could bound memory usage. This implies partitioning prior to enqueue. And as you note partitioning requires having metadata (even stale metadata) about topics. 2. We wanted to avoid pr

Re: In Flight Requests

2014-12-18 Thread Jay Kreps
Hi David, Each request sent to Kafka gets acknowledged. The protocol allows multiple requests to be sent on a connection without waiting for a response. The number of requests currently awaiting acknowledgement is the in-flight request count. By default once there are five unacknowledged requests
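A sketch of an in-flight cap using a `Semaphore`: each send takes a permit and each acknowledgement returns one. In the new Java producer this cap is the `max.in.flight.requests.per.connection` setting; the class below is a hypothetical illustration, not the client's implementation.

```java
import java.util.concurrent.Semaphore;

class InFlightLimiter {
    private final int maxInFlight;
    private final Semaphore permits;

    InFlightLimiter(int maxInFlight) {
        this.maxInFlight = maxInFlight;
        this.permits = new Semaphore(maxInFlight);
    }

    // Try to send one more request; false means the cap is reached and the
    // caller must wait for an acknowledgement before sending.
    boolean tryStartRequest() {
        return permits.tryAcquire();
    }

    // Called when the broker's response for one request arrives.
    void onAcknowledged() {
        permits.release();
    }

    int inFlight() {
        return maxInFlight - permits.availablePermits();
    }
}
```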

Re: Kafka 0.8.2 new producer blocking on metadata

2014-12-19 Thread Jay Kreps
like that be a baked in option be accepted into > Kafka clients mainline? > > A quick win might be to clarify the documentation so that it is clear that > this API will block in cases XYZ (maybe this is mentioned somewhere and I > missed it). > > Thanks, > Paul > >

Re: Kafka 0.8.2 new producer blocking on metadata

2014-12-19 Thread Jay Kreps
are code or send a PR. > > Thanks, > Paul > > On Fri, Dec 19, 2014 at 2:05 PM, Jay Kreps wrote: > > > Hey Paul, > > > > I agree we should document this better. > > > > We allow and encourage using partitions to semantically distribute data. > So

Re: Trying to figure out kafka latency issues

2014-12-29 Thread Jay Kreps
Hey Rajiv, This sounds like a bug. The more info you can help us get, the easier it will be to fix. Things that would help: 1. Can you check if the request log on the servers shows latency spikes (in which case it is a server problem)? 2. It would be worth also getting the jmx stats on the producer as the

Re: Kafka 0.8.2 new producer blocking on metadata

2014-12-29 Thread Jay Kreps
I don't think a separate queue will be a very simple solution to implement. Could you describe your use case a little bit more. It does seem to me that as long as the metadata fetch happens only once and the blocking has a tight time bound this should be okay in any use case I can imagine. And, of

Re: Trying to figure out kafka latency issues

2014-12-29 Thread Jay Kreps
rian > >> wrote: > >> > >>> Thanks Jay. Will check (1) and (2) and get back to you. The test is not > >>> stand-alone now. It might be a bit of work to extract it to a > stand-alone > >>> executable. It might take me a bit of time to get that g

Re: Trying to figure out kafka latency issues

2014-12-30 Thread Jay Kreps
g at offset 19321510 > > 2014-12-30T02:13:44.501Z TRACE [kafka-request-handler-3] > [kafka.server.KafkaApis ]: [KafkaApi-11] 3 bytes written to > log MY.TOPIC-463 beginning at offset 28777627 and ending at offset 28777629 > > On Mon, Dec 29, 2014 at 5:43 PM,

Re: Trying to figure out kafka latency issues

2014-12-30 Thread Jay Kreps
producer and consumer code to get a self-contained load > test going. I'll do the same end to end lag measurements and see if it's my > environment that is adding this lag somehow. > > Thanks! > > > On Tue, Dec 30, 2014 at 11:58 AM, Jay Kreps wrote: > > >

Re: kafka logs gone after reboot the server

2015-01-02 Thread Jay Kreps
Nice catch Joe--several people have complained about this as a problem and we were a bit mystified as to what kind of bug could lead to all their logs getting deleted and re-replicated when they bounced the server. We assumed "bounced" meant restarted the app, but I think likely what is happening i

Re: Consumer and offset management support in 0.8.2 and 0.9

2015-01-07 Thread Jay Kreps
Hey guys, We need to take the versioning of the protocol seriously. People are definitely using the offset commit functionality in 0.8.1 and I really think we should treat this as a bug and revert the change to version 0. -Jay On Wed, Jan 7, 2015 at 9:24 AM, Jun Rao wrote: > Yes, we did make a

Re: Consumer and offset management support in 0.8.2 and 0.9

2015-01-07 Thread Jay Kreps
new functionality. > This way everyone works and nothing breaks =8^) > > /*** > Joe Stein > Founder, Principal Consultant > Big Data Open Source Security LLC > http://www.stealth.ly > Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>

Re: Using Kafka for Event Sourcing

2015-01-11 Thread Jay Kreps
Hey Yann, Yes, you can just make the retention infinite which will disable any deletion. What you describe with compaction might work, but wasn't exactly the intention. This type of event logging can work two ways: you can log the "command" or you can log the result of the command. In databases

Re: 0.8.2.0 behavior change: ReplicaNotAvailableError

2015-01-14 Thread Jay Kreps
I agree. Also, is this behavior a good one? It seems kind of hacky to give an error code and a result both, no? -Jay On Wed, Jan 14, 2015 at 6:35 PM, Dana Powers wrote: > Thanks -- i see that this was more of a bug in 0.8.1 than a regression in > 0.8.2. But I do think the 0.8.2 bug fix to the

Re: [VOTE] 0.8.2.0 Candidate 1

2015-01-15 Thread Jay Kreps
Yeah, good call on removing it. -Jay On Thu, Jan 15, 2015 at 6:39 AM, scott@heroku wrote: > That opt has been the default for a long while now iirc, maybe it can be > removed? > > Sent from my iPhone > > > On Jan 15, 2015, at 5:21 AM, Jaikiran Pai > wrote: > > > > I just downloaded the Kafka b

Re: [kafka-clients] Re: [VOTE] 0.8.2.0 Candidate 2 (with the correct links)

2015-01-23 Thread Jay Kreps
I don't think so--see if you buy my explanation. We previously defaulted to the byte array serializer and it was a source of unending frustration and confusion. Since it wasn't a required config people just went along plugging in whatever objects they had, and thinking that changing the parametric

Re: [kafka-clients] Re: [VOTE] 0.8.2.0 Candidate 2 (with the correct links)

2015-01-24 Thread Jay Kreps
s just makes things worse. -Jay On Sat, Jan 24, 2015 at 2:51 PM, Joe Stein wrote: > Maybe. I think the StringSerializer could look more like a typical type of > message. Instead of encoding being a property it would be more typically > just written in the bytes. > > On Sat, Jan 24, 201

Re: kafka deleted old logs but not released

2015-01-25 Thread Jay Kreps
Also, what is the configuration for the servers? In particular it would be good to know the retention and/or log compaction settings as those delete files. -Jay On Sun, Jan 25, 2015 at 4:34 AM, Jaikiran Pai wrote: > Hi Yonghui, > > Do you still have this happening? If yes, can you tell us a bit

Re: [DISCUSSION] Boot dependency in the new producer

2015-01-26 Thread Jay Kreps
Hey Guozhang, That line shouldn't cause any connections to Kafka to be established, does it? All that is doing is creating the Cluster pojo using the supplied addresses. The use of InetSocketAddress may cause some dns stuff to happen, though... -Jay On Mon, Jan 26, 2015 at 10:50 AM, Guozhang Wan

Re: does kafka support "COMMIT" of a batch ?

2015-01-26 Thread Jay Kreps
We did a relatively complete prototype but it isn't integrated into the mainline code yet and there isn't a target release date. There is rather a lot of testing and compatibility work that would have to be done to fully productionize it. I suspect someone will pick it up in 2015 but I wouldn't blo

Re: [DISCUSSION] Boot dependency in the new producer

2015-01-26 Thread Jay Kreps
On Mon, Jan 26, 2015 at 1:34 PM, Guozhang Wang wrote: > It will set the needUpdate flag to true and hence the background Sender > will try to talk to the bootstrap servers. > > Guozhang > > On Mon, Jan 26, 2015 at 1:12 PM, Jay Kreps wrote: > > > Hey Guozhang, > >

Re: Poll: Producer/Consumer impl/language you use?

2015-01-28 Thread Jay Kreps
Yeah, Joe is exactly right. Let's not confuse Scala APIs with the existing Scala clients. There are a ton of downsides to those clients. They aren't going away any time in the foreseeable future, so don't stress, but I think we can kind of "deprecate" them and try to shame people into upgrading. For

Re: is RequestTimedOut (error code: 7) retryable in producer?

2015-01-31 Thread Jay Kreps
This means the write occurred on the leader but the followers couldn't acknowledge in the time bound specified by the user. The write will likely complete but is not guaranteed to (the leader could immediately crash after the response is sent). So if you retry you will potentially (likely) have dup

Re: [VOTE] 0.8.2.0 Candidate 3

2015-02-01 Thread Jay Kreps
You may already know this but the producer doesn't require a complete list of brokers in its config, it just requires the connection info for one active broker which it uses to discover the rest of the brokers. We allow you to specify multiple urls here for failover in cases where you aren't using
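A sketch of why one live address is enough: the client only needs to reach any listed broker once, and that broker's metadata describes the rest of the cluster. The `canConnect` predicate is a hypothetical stand-in for a connection attempt, and the real client's choice of which bootstrap node to contact is not this simple in-order loop.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

class Bootstrap {
    // Tries each configured address in order and returns the first reachable one.
    // Extra entries only matter for failover when earlier ones are down.
    static Optional<String> firstReachable(List<String> bootstrapServers, Predicate<String> canConnect) {
        for (String server : bootstrapServers) {
            if (canConnect.test(server)) {
                return Optional.of(server);
            }
        }
        return Optional.empty();
    }
}
```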

Re: Increased CPU usage with 0.8.2-beta

2015-02-02 Thread Jay Kreps
Actually that fetch call blocks on the server side. That is, if there is no data, the server will wait until data arrives or the timeout occurs to send a response. This is done to help simplify the client development. If that isn't happening it is likely a bug or a configuration change in the timeo
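A sketch of the server-side long poll described above: the request waits until data arrives or a timeout expires, instead of returning empty immediately and making the client spin. A `BlockingQueue` stands in for the log here, which is an assumption for illustration; in the old consumer the wait bound is `fetch.wait.max.ms`.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class LongPollFetch {
    private final BlockingQueue<String> log = new LinkedBlockingQueue<>();

    void append(String message) {
        log.add(message);
    }

    // Returns the next message, or null if nothing arrived within maxWaitMs.
    String fetch(long maxWaitMs) {
        try {
            return log.poll(maxWaitMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }
}
```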

Re: Increased CPU usage with 0.8.2-beta

2015-02-02 Thread Jay Kreps
Ah, yeah, you're right. That is just wait time, not CPU time. We should check that profile; it must be something else on the list. -Jay On Mon, Feb 2, 2015 at 9:33 AM, Jun Rao wrote: > Hi, Mathias, > > From the hprof output, it seems that the top CPU consumers are > socketAccept() and epollWait()

Re: Increased CPU usage with 0.8.2-beta

2015-02-02 Thread Jay Kreps
0.09% 77.19% 1106 306190 org.xerial.snappy.SnappyNative.rawCompress On Mon, Feb 2, 2015 at 9:39 AM, Jay Kreps wrote: > Ah, yeah, you're right. That is just wait time not CPU time. We should > check that profile it must be something else on the list. > > -Jay > > On Mon,

Re: New Producer - ONLY sync mode?

2015-02-02 Thread Jay Kreps
Yeah as Gwen says there is no sync/async mode anymore. There is a new configuration which does a lot of what async did in terms of allowing batching: batch.size - This is the target amount of data per partition the server will attempt to batch together. linger.ms - This is the time the producer wi
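A sketch of the send trigger that `batch.size` and `linger.ms` describe: a partition's batch is ready once it is full or once its oldest record has waited long enough. The class is hypothetical and ignores other triggers the real producer has (for example, memory pressure or a flush).

```java
class BatchTrigger {
    private final int batchSizeBytes; // plays the role of batch.size
    private final long lingerMs;      // plays the role of linger.ms

    BatchTrigger(int batchSizeBytes, long lingerMs) {
        this.batchSizeBytes = batchSizeBytes;
        this.lingerMs = lingerMs;
    }

    // Ready when the batch is full or the first record appended has lingered long enough.
    boolean readyToSend(int bufferedBytes, long firstAppendMs, long nowMs) {
        return bufferedBytes >= batchSizeBytes || nowMs - firstAppendMs >= lingerMs;
    }
}
```

With a linger of 0, every batch is ready as soon as it has data, so batching only happens under load; a small positive linger trades a little latency for fuller batches.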

Re: [kafka-clients] Re: [VOTE] 0.8.2.0 Candidate 3

2015-02-02 Thread Jay Kreps
Yay! -Jay On Mon, Feb 2, 2015 at 2:23 PM, Neha Narkhede wrote: > Great! Thanks Jun for helping with the release and everyone involved for > your contributions. > > On Mon, Feb 2, 2015 at 1:32 PM, Joe Stein wrote: > > > Huzzah! > > > > Thanks Jun for preparing the release candidates and getting

Re: Kafka long tail latency issue

2015-02-03 Thread Jay Kreps
If you are on 0.8.1 or higher and are running with replication consider disabling the forced log flush, that will definitely lead to latency spikes as the flush is synchronous. You will still get durability from replication and the background OS flush. On Linux the background I/O flush the OS does

Re: New Producer - ONLY sync mode?

2015-02-03 Thread Jay Kreps
> Thanks for the info. Here's the use case. We have something up > >> stream > >> >> > sending data, say a log shipper called X. It sends it to some > remote > >> >> > component Y. Y is the Kafka Producer and it puts data into Kafka. > >> Bu

Re: high cpu and network traffic when cluster has no topic

2015-02-03 Thread Jay Kreps
Hey Steven, That sounds like a bug. I think we fixed a few producer high CPU issues since the beta; I wonder if you could repeat the same test with the 0.8.2 final release? -Jay On Tue, Feb 3, 2015 at 8:37 PM, Steven Wu wrote: > actually, my local test can reproduce the issue although not imm

Re: New Producer - ONLY sync mode?

2015-02-04 Thread Jay Kreps
ant > Big Data Open Source Security LLC > http://www.stealth.ly > Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop> > / > > On Tue, Feb 3, 2015 at 10:45 PM, Jay Kreps wrote: > > > Hey guys, > > >

Re: high cpu and network traffic when cluster has no topic

2015-02-04 Thread Jay Kreps
> Guozhang > > > > On Tue, Feb 3, 2015 at 8:52 PM, Steven Wu wrote: > > > > > sure. will try my unit test again with 0.8.2.0 release tomorrow and > > report > > > back my findings. > > > > > > On Tue, Feb 3, 2015 at 8:42 PM, Jay K

Re: high cpu and network traffic when cluster has no topic

2015-02-08 Thread Jay Kreps
n KAFKA-1642 > >> > > <https://issues.apache.org/jira/browse/KAFKA-1642>. > >> > > > >> > > As Jay said, a bunch of such issues are fixed in the new release. > >> Please > >> > > let us know if you still see the issue wit

Re: New Producer - ONLY sync mode?

2015-02-08 Thread Jay Kreps
sort of like > what Kinesis does as Pradeep mentioned. > -Steve > > On Wed, Feb 4, 2015 at 11:19 AM, Jay Kreps wrote: > > > Yeah totally. Using a callback is, of course, the Right Thing for this > kind > > of stuff. But I have found that kind of asynchronous thinking

Re: New Producer - ONLY sync mode?

2015-02-08 Thread Jay Kreps
er. > >> > > >> > If producer is ONLY async, Y can't easily do that. Or maybe Y would > just > >> > need to wait for the Future to come back and only then send the > response > >> > back to X? If so, I'm guessing the delay would be m

Re: New Producer - ONLY sync mode?

2015-02-08 Thread Jay Kreps
/KAFKA-1865 -Jay On Sat, Feb 7, 2015 at 1:24 PM, Jay Kreps wrote: > Hey Otis, > > Yeah, Gwen is correct. The future from the send will be satisfied when the > response is received so it will be exactly the same as the performance of > the sync producer previously. > > -
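A sketch of why blocking on the future matches the old sync producer: the future completes when the broker's response arrives, so calling `get()` right after `send` waits for exactly that round trip. With the new producer the idiom is `producer.send(record).get()`; below, `sendAsync` is a hypothetical stand-in that completes immediately.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

class SyncOverAsync {
    // Stand-in for an async send: the future completes with the offset once the
    // "broker" responds. The message itself is ignored in this sketch.
    static Future<Long> sendAsync(String message, long offset) {
        return CompletableFuture.supplyAsync(() -> offset);
    }

    // Blocking on the future gives synchronous semantics: return only after the ack.
    static long sendSync(String message, long offset) throws InterruptedException, ExecutionException {
        return sendAsync(message, offset).get();
    }
}
```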

[DISCUSS] KIP-8 Add a flush method to the new Java producer

2015-02-08 Thread Jay Kreps
Following up on our previous thread on making batch send a little easier, here is a concrete proposal to add a flush() method to the producer: https://cwiki.apache.org/confluence/display/KAFKA/KIP-8+-+Add+a+flush+method+to+the+producer+API A proposed implementation is here: https://issues.apache.

Re: [DISCUSS] KIP-8 Add a flush method to the new Java producer

2015-02-08 Thread Jay Kreps
> On Sun, Feb 8, 2015 at 10:25 AM, Jay Kreps wrote: > > > Following up on our previous thread on making batch send a little easier, > > here is a concrete proposal to add a flush() method to the producer: > > > > > > > https://cwiki.apache.org/conflue

Re: could new java producer miss callbacks after successful send?

2015-02-09 Thread Jay Kreps
Hmm, that does sound like a bug, we haven't seen that. How easy is it to reproduce this? -Jay On Mon, Feb 9, 2015 at 5:19 PM, Steven Wu wrote: > We observed some small discrepancy in messages sent per second reported at > different points. 1) and 4) matches very close. 2) and 3) matches very >

Re: Lack of JMX LogCleaner and LogCleanerManager metrics

2015-02-10 Thread Jay Kreps
Probably you need to enable the log cleaner for those to show up? We disable it by default and so I think those metrics never get created. -Jay On Tue, Feb 10, 2015 at 3:33 AM, o...@sematext.com wrote: > Hello, > > I have a problem with some JMX metrics. In Kafka source code I see > LogCleaner

Re: Lack of JMX LogCleaner and LogCleanerManager metrics

2015-02-10 Thread Jay Kreps
I've seen. > > Gwen > > On Tue, Feb 10, 2015 at 8:07 AM, Jay Kreps wrote: > > > Probably you need to enable the log cleaner for those to show up? We > > disable it by default and so I think those metrics never get created. > > > > -Jay > > > >

Re: could new java producer miss callbacks after successful send?

2015-02-10 Thread Jay Kreps
s. > > this could be some metric issues. > > On Mon, Feb 9, 2015 at 8:23 PM, Steven Wu wrote: > > > I don't have strong evidence that this is a bug yet. let me write some > > test program and see if I can confirm/reproduce the issue. > > > > On Mon, Feb 9

Re: Poll RESULTS: Producer/Consumer languages

2015-02-11 Thread Jay Kreps
Hey Justin, I don't think LinkedIn is, but Confluent has made a pretty complete producer and consumer REST proxy that we will be releasing quite soon. -Jay On Wed, Feb 11, 2015 at 2:40 PM, Justin Maltat wrote: > Hi, > > Is Linkedin planning to release its REST proxy server as an official > rel

Re: Does kafka use sbt (or) gradle

2015-02-12 Thread Jay Kreps
I think the problem is that google was giving people that obsolete wiki page rather than the real quickstart. I deleted the wiki page and linked the quickstart on the main page. -Jay On Thu, Feb 12, 2015 at 8:30 AM, Mark Reddy wrote: > Hi Saravana, > > Since 0.8.1 Kafka uses Gradle, previous to

Re: Increased CPU usage with 0.8.2-beta

2015-02-12 Thread Jay Kreps
This is a serious issue, we'll take a look. -Jay On Thu, Feb 12, 2015 at 3:19 PM, Solon Gordon wrote: > I saw a very similar jump in CPU usage when I tried upgrading from 0.8.1.1 > to 0.8.2.0 today in a test environment. The Kafka cluster there is two > m1.larges handling 2,000 partitions acros

Re: Increased CPU usage with 0.8.2-beta

2015-02-13 Thread Jay Kreps
We can reproduce this issue, have a theory as to the cause, and are working on a fix. Here is the ticket to track it: https://issues.apache.org/jira/browse/KAFKA-1952 I would recommend people hold off on 0.8.2 upgrades until we have a handle on this. -Jay On Fri, Feb 13, 2015 at 1:47 PM, Solon G

Hold off on 0.8.2 upgrades

2015-02-13 Thread Jay Kreps
Hey all, We found an issue in 0.8.2 that can lead to high CPU usage on brokers with lots of partitions. We are working on a fix for this. You can track progress here: https://issues.apache.org/jira/browse/KAFKA-1952 I would recommend holding off on upgrading to 0.8.2 until we have a fix for this

Re: KafkaConsumer Class Usage in Kafka 0.8.2 Beta

2015-02-13 Thread Jay Kreps
As Manikumar mentioned the code in 0.8.2 is not released and that class is just a stub (it doesn't do anything yet). If you would like to try out the new consumer you can try it on trunk. However be aware that it doesn't yet do partition balancing among topics as that is pending server side work. H

Re: CallBackHandler is not being called after successful delivery of message

2015-02-16 Thread Jay Kreps
Sounds like a potential bug, and it sounds like you can easily reproduce it. Can you post your test code and a description of the server version and how you started/configured it, and what you expect to see from your test and what you actually see: https://issues.apache.org/jira/browse/KAFKA/ This

Re: CallBackHandler is not being called after successful delivery of message

2015-02-17 Thread Jay Kreps
handler , will retry logic > in case of rebalance exception or any other exception handle by > kafka-producer-network-thread or that logic has be to implemented in > callback handler. > > > > > > > > On Tue, Feb 17, 2015 at 4:42 AM, Jay Kreps wrote: > > > Sounds

Re: Producer duplicates

2015-02-17 Thread Jay Kreps
There are some details to get this right, the lookup table has to survive failures. But yes this is exactly what we would like to add: https://cwiki.apache.org/confluence/display/KAFKA/Idempotent+Producer -Jay On Tue, Feb 17, 2015 at 12:44 AM, Arunkumar Srambikkal (asrambik) < asram...@cisco.com
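A minimal sketch of the lookup-table idea, assuming each producer has an id and numbers its messages with increasing sequence numbers: a retried write whose sequence was already seen is dropped instead of appended twice. All names are hypothetical, and this table lives only in memory, so it does not address the failure-survival requirement mentioned above.

```java
import java.util.HashMap;
import java.util.Map;

class DedupTable {
    // Highest sequence number accepted so far, per producer id.
    private final Map<Long, Long> lastSeq = new HashMap<>();

    // True if the write should be appended; false if it repeats an already-written sequence.
    boolean accept(long producerId, long sequence) {
        Long last = lastSeq.get(producerId);
        if (last != null && sequence <= last) {
            return false;
        }
        lastSeq.put(producerId, sequence);
        return true;
    }
}
```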

Re: Hold off on 0.8.2 upgrades

2015-02-18 Thread Jay Kreps
y new issue comes up. After that, we will do an 0.8.2.1 release. > > Thanks, > > Jun > > On Fri, Feb 13, 2015 at 3:28 PM, Jay Kreps wrote: > > > Hey all, > > > > We found an issue in 0.8.2 that can lead to high CPU usage on brokers > with > > lots of

Re: Consuming a snapshot from log compacted topic

2015-02-18 Thread Jay Kreps
If you catch up off a compacted topic and keep consuming then you will become consistent with the log. I think what you are saying is that you want to create a snapshot from the Kafka topic but NOT do continual reads after that point. For example you might be creating a backup of the data to a fil

Re: Hold off on 0.8.2 upgrades

2015-02-18 Thread Jay Kreps
t 50 partitions. Since there is a bit > of overhead running a release, I was hoping to collect some more feedback > from people trying the 0.8.2.0 release who may not be affected by this > issue. But I agree that we don't need to wait for too long. > > Thanks, > > Jun >

Re: Consuming a snapshot from log compacted topic

2015-02-18 Thread Jay Kreps
log then begin consuming and read up to that point compaction may > > > have > > > > already kicked in (if the reading takes a while) and hence you might > have > > > > an incomplete snapshot. > > > > > > Isn't it sufficient to just repeat the c

Re: Broker w/ high memory due to index file sizes

2015-02-18 Thread Jay Kreps
40G is really huge, generally you would want more like 4G. Are you sure you need that? Not sure what you mean by lsof and index files being too large, but the index files are memory mapped so they should be able to grow arbitrarily large and their memory usage is not counted in the java heap (in fa

Re: Consuming a snapshot from log compacted topic

2015-02-19 Thread Jay Kreps
To confirm then, the log-end-offset is the same as the cleaner point? > > > > On 19 February 2015 at 03:10, Jay Kreps wrote: > > > Yeah I was thinking either along the lines Joel was suggesting or else > > adding a logEndOffset(TopicPartition) method or something like
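A sketch of the snapshot approach discussed in this thread: capture the log-end offset first, replay from the beginning, and stop at that offset instead of continuing to consume. The in-memory list stands in for a compacted topic and the end offset is passed in, since the method that would supply it was only being proposed here.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CompactedSnapshot {
    // One log entry; a null value is a delete marker.
    static final class Record {
        final long offset;
        final String key;
        final String value;

        Record(long offset, String key, String value) {
            this.offset = offset;
            this.key = key;
            this.value = value;
        }
    }

    // Replays the log up to (not including) endOffset, captured before reading began.
    // Compacted logs have offset gaps, so compare offsets rather than counting records.
    static Map<String, String> snapshot(List<Record> log, long endOffset) {
        Map<String, String> state = new HashMap<>();
        for (Record r : log) {
            if (r.offset >= endOffset) {
                break;
            }
            if (r.value == null) {
                state.remove(r.key);
            } else {
                state.put(r.key, r.value);
            }
        }
        return state;
    }
}
```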

Re: New Producer - Is the configurable partitioner gone?

2015-02-21 Thread Jay Kreps
Hey Daniel, partitionsFor() will block the very first time it sees a new topic that it doesn't have metadata for yet. If you want to ensure you don't block even that one time, call it prior to your regular usage so it initializes then. The rationale for adding a partition in ProducerRecord was th
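A sketch of the warm-up suggestion: the first lookup for a topic pays the blocking metadata fetch, later lookups are served from a cache, so doing the first lookup at startup keeps it off the hot path. The cache and the `fetcher` function are hypothetical stand-ins for the producer's `partitionsFor(topic)` and its internal metadata handling.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

class PartitionMetadataCache {
    private final Map<String, List<Integer>> cache = new ConcurrentHashMap<>();
    private final Function<String, List<Integer>> fetcher; // possibly slow, blocking lookup

    PartitionMetadataCache(Function<String, List<Integer>> fetcher) {
        this.fetcher = fetcher;
    }

    // First call for a topic invokes the fetcher; later calls hit the cache.
    List<Integer> partitionsFor(String topic) {
        return cache.computeIfAbsent(topic, fetcher);
    }

    // Call at startup so the one-time blocking fetch happens before regular sends.
    void warmUp(List<String> topics) {
        topics.forEach(this::partitionsFor);
    }
}
```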

Re: New Producer - Is the configurable partitioner gone?

2015-02-22 Thread Jay Kreps
h the interface we would end up with. - Currently Cluster is not a public class so we'll have to think about whether we want to make that public. -Jay On Sun, Feb 22, 2015 at 4:44 AM, Daniel Wegener < daniel.wege...@holisticon.de> wrote: > > Jay Kreps writes: > > > > &

Re: High CPU usage of Crc32 on Kafka broker

2015-02-22 Thread Jay Kreps
debase to judge if server > side decompression happens before acknowledge. If so, these would be some > additional milliseconds to respond faster if we could spare > de/recompression. > > Those are my thoughts about server side de/recompression. It would be > great if I could get so

Re: New Producer - Is the configurable partitioner gone?

2015-02-22 Thread Jay Kreps
and reduced cpu util at broker side by 60%. We plan to make it our > default partitioner. > > > > On Sun, Feb 22, 2015 at 10:28 AM, Jay Kreps > wrote: > > > Hey Daniel, > > > > Yeah I think that would be doable. If you want to pursue it you would > need

Re: Anyone interested in speaking at Bay Area Kafka meetup @ LinkedIn on March 24?

2015-02-23 Thread Jay Kreps
+1 I think something like "Kafka on AWS at Netflix" would be hugely interesting to a lot of people. -Jay On Mon, Feb 23, 2015 at 3:02 PM, Allen Wang wrote: > We (Steven Wu and Allen Wang) can talk about Kafka use cases and operations > in Netflix. Specifically, we can talk about how we scale a

Tips for working with Kafka and data streams

2015-02-25 Thread Jay Kreps
Hey guys, One thing we tried to do along with the product release was start to put together a practical guide for using Kafka. I wrote this up here: http://blog.confluent.io/2015/02/25/stream-data-platform-1/ I'd like to keep expanding on this as good practices emerge and we learn more stuff. So

Re: Tips for working with Kafka and data streams

2015-02-25 Thread Jay Kreps
might be dealt with by > covering disk encryption and how the conversations between Kafka instances > are protected. > > Christian > > > On Wed, Feb 25, 2015 at 11:51 AM, Jay Kreps wrote: > > > Hey guys, > > > > One thing we tried to do along with the prod

Re: Unlimited Log Retention

2015-02-28 Thread Jay Kreps
It is totally reasonable to have unlimited retention. We don't have an explicit setting for this but you can set the time based retention policy to something large log.retention.hours=2147483647 which will retain the log for 245,146 years. :-) -Jay On Fri, Feb 27, 2015 at 4:12 PM, Warren Kiser
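A quick check of the arithmetic: 2147483647 is `Integer.MAX_VALUE`, and dividing by 24 × 365 = 8760 hours per (365-day) year gives 245,146 whole years.

```java
class RetentionYears {
    // Converts a retention setting in hours to whole 365-day years.
    static long hoursToYears(long hours) {
        return hours / (24L * 365);
    }

    public static void main(String[] args) {
        // log.retention.hours=2147483647 (Integer.MAX_VALUE); prints 245146
        System.out.println(hoursToYears(Integer.MAX_VALUE));
    }
}
```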
