Out of memory exception

2013-08-30 Thread Vadim Keylis
I followed the LinkedIn setup example in the docs and allocated 3g for heap size. java -Xmx3G -Xms3G -server -XX:+UseCompressedOops -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled -XX:+CMSScavengeBeforeRemark -XX:+DisableExplicitGC - After a day of normal run scenario I discover

Re: Kafka -> HDFS

2013-08-30 Thread Jun Rao
You can take a look at the hadoop consumer under contrib. Thanks, Jun On Fri, Aug 30, 2013 at 11:18 AM, Mark wrote: > What is the quickest and easiest way to write message from Kafka into > HDFS? I've come across Camus but before we go the whole route of writing > Avro messages we want to tes

Re: kafka produce API batching

2013-08-30 Thread Rajasekar Elango
Great. Thanks for the reply, Jun. Thanks, Raja. On Fri, Aug 30, 2013 at 11:50 PM, Jun Rao wrote: > If you are using the async mode, there is no difference. If you use sync > mode, the former API gives you a way to batch requests in sync mode, which > can lead to better throughput. > > Thanks, > > Ju

Re: kafka produce API batching

2013-08-30 Thread Jun Rao
If you are using the async mode, there is no difference. If you use sync mode, the former API gives you a way to batch requests in sync mode, which can lead to better throughput. Thanks, Jun On Fri, Aug 30, 2013 at 10:28 AM, Rajasekar Elango wrote: > Kafka producer API supports sending a single
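
As a rough illustration of the sync-mode point above, the sketch below counts broker round trips under a deliberately simplified model: a single broker, and a sync producer that issues one request per send call. The function names and the model are assumptions made for illustration; they are not part of Kafka's API, and a real cluster with several brokers would split a list across brokers.

```python
import math


def sync_requests_single(num_messages: int) -> int:
    """Sync mode, one message per send call: one request per message (assumed model)."""
    return num_messages


def sync_requests_list(num_messages: int, list_size: int) -> int:
    """Sync mode, list_size messages per send call: one request per call (assumed model)."""
    if list_size <= 0:
        raise ValueError("list_size must be positive")
    return math.ceil(num_messages / list_size)


if __name__ == "__main__":
    total = 10_000  # hypothetical message count
    print("single-message sends:", sync_requests_single(total))
    print("list sends of 200:   ", sync_requests_list(total, 200))
```

Under this model the list-based call reduces the number of blocking round trips by roughly the list size, which is where the throughput gain in sync mode would come from. In async mode the producer batches in the background either way, so the two calls would behave the same.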

Re: Unable to send and consume compressed events.

2013-08-30 Thread Jun Rao
Not sure if this is a bug. I think the issue is that the log4j property is not set properly so that no log shows up. Thanks, Jun On Fri, Aug 30, 2013 at 9:32 AM, Jay Kreps wrote: > This seems like more of a bug than a FAQ, no? We are swallowing the > exception... > > -Jay > > > On Thu, Aug 29

Re: What are my options? (Ruby/Rails environment)

2013-08-30 Thread Jun Rao
In 0.8, the producer no longer depends on ZK. It only takes a list of brokers. At LinkedIn, we have a 0.8 C producer implementation and plan to open source it soon. Thanks, Jun On Fri, Aug 30, 2013 at 9:00 AM, Travis Brady wrote: > I think this points out the need for a single canonical cross-

Re: Num of streams for consumers using TopicFilter.

2013-08-30 Thread Jun Rao
It seems to me option 1) is easier. Option 2) has the same issue as option 1) since you have to manage different whitelists. A more general solution is probably to change the consumer distribution model to divide partitions across topics. That way, one can create as many streams as total # partiti

Re: Securing kafka

2013-08-30 Thread Jay Kreps
Yeah, if nobody else does it first, LinkedIn will definitely do Kerberos/SSL + Unix permissions at the topic level soonish. If folks already have a head start on the auth piece we would love to have that contribution. On Fri, Aug 30, 2013 at 5:25 AM, Maxime Brugidou wrote: > We would love to see k

Re: Securing kafka

2013-08-30 Thread Scott Clasen
Please contribute that back! It would potentially be huge for mirroring clusters across Amazon regions, for instance. On Thu, Aug 29, 2013 at 8:22 PM, Rajasekar Elango wrote: > We have made changes to kafka code to support certificate based mutual SSL > authentication. So the clients and broker wi

Re: Application Logic: In Kafka, Storm or Redis?

2013-08-30 Thread Eric Tschetter
One other option is to use something like Druid, especially if you care about doing arbitrary dimensional drilldowns. http://druid.io It reads from Kafka and can do simple rollups for you automatically (meaning you don't need storm if all you are doing with Storm is a simple "group by" style roll

Re: Application Logic: In Kafka, Storm or Redis?

2013-08-30 Thread Dan Di Spaltro
You could also use something more oriented at timeseries data like https://github.com/rackerlabs/blueflood/. Then you'd have to write some output adapters to feed the additional processing of your data elsewhere. I think the team is working on making an output adapter for Kafka for the rolled-up m

Re: Securing kafka

2013-08-30 Thread Calvin Lei
That sounds very interesting. Looking forward to it! On Aug 29, 2013 11:23 PM, "Rajasekar Elango" wrote: > We have made changes to kafka code to support certificate based mutual SSL > authentication. So the clients and broker will exchange trusted > certificates for successful communication. Th

Re: Kafka -> HDFS

2013-08-30 Thread Joe Stein
Can you elaborate on your use case a bit? At what point would your business logic decide that the file is complete (by time or other decision to cut a file as completed)? And then when do you batch process from what the stream has piled up for you? Writing to HDFS http://wiki.apache.org/hadoop/

Re: Kafka 0.7 performance compared to bare metal

2013-08-30 Thread Benjamin Black
Your producer test uses a thread per core. Your consumer test uses a single thread. A single thread is likely insufficient to get maximum throughput. On Aug 30, 2013 8:46 AM, "Rafael Bagmanov" wrote: > Benjamin, do you mean thread on a client side? I'm not quite getting > what I'm limited with. Ca

Kafka -> HDFS

2013-08-30 Thread Mark
What is the quickest and easiest way to write message from Kafka into HDFS? I've come across Camus but before we go the whole route of writing Avro messages we want to test plain old vanilla messages. Thanks

Re: Unable to send and consume compressed events.

2013-08-30 Thread Jay Kreps
This seems like more of a bug than a FAQ, no? We are swallowing the exception... -Jay On Thu, Aug 29, 2013 at 11:30 PM, Lu Xuechao wrote: > Hi Jun, > > Thanks for your help. Finally, I found the reason by enabling producer side > DEBUG info output. The snappy jar is not included in the classpat

kafka produce API batching

2013-08-30 Thread Rajasekar Elango
The Kafka producer API supports sending a single message as well as sending a list of messages, via two different methods. I believe that irrespective of which send method is used, the producer internally batches batch.num.messages messages and sends them in bulk. Is there any advantage in performance of using send(List

Re: Unable to send and consume compressed events.

2013-08-30 Thread Joe Stein
Yeah, the error should have shown up; will create a JIRA. On Fri, Aug 30, 2013 at 12:32 PM, Jay Kreps wrote: > This seems like more of a bug than a FAQ, no? We are swallowing the > exception... > > -Jay > > > On Thu, Aug 29, 2013 at 11:30 PM, Lu Xuechao wrote: > > > Hi Jun, > > > > Thanks for you

Re: Kafka 0.7 performance compared to bare metal

2013-08-30 Thread Jay Kreps
It is hard to say where the bottleneck is just from your description. Would it be possible for you to rerun the consumer test using hprof on the consumer so we can understand whether the fetcher is waiting on the fetches (i.e. the broker is the bottleneck) or on the enqueue (i.e. the consumer is the

Re: Unable to send and consume compressed events.

2013-08-30 Thread Joe Stein
I think it is still good to have this one in the FAQ. Even when something is a known issue, folks sometimes need to know how to work around things until they're fixed. Here is the JIRA for the defect: https://issues.apache.org/jira/browse/KAFKA-1037 <== great place, I think, for someone looking to jump in and start cont

Re: Kafka 0.7 performance compared to bare metal

2013-08-30 Thread Rafael Bagmanov
Benjamin, do you mean a thread on the client side? I'm not quite getting what I'm limited by. Can you please explain a little bit more? A single-threaded producer is still capable of doing 50 MB/s on hi1.4xlarge, which is much slower than the 377 MB/s from a single FIO job, but still 5 times faster than

Re: What are my options? (Ruby/Rails environment)

2013-08-30 Thread Travis Brady
I think this points out the need for a single canonical cross-platform C client lib with support for Zookeeper that could easily be wrapped for use in other languages. It would make it much easier for people using Python, Ruby, Node, Lua, Haskell, Go, OCaml, etc to have such a library that matches th

Re: Num of streams for consumers using TopicFilter.

2013-08-30 Thread Rajasekar Elango
Yeah. The actual bottleneck is the number of topics that match the topic filter. The number of streams is going to be shared between all topics it's consuming from. I thought about the following ideas to work around this (I am basically referring to the mirrormaker consumer in the examples). Option 1). Instead of

Re: Kafka 0.7 performance compared to bare metal

2013-08-30 Thread Rafael Bagmanov
I tried two different deployments: 1) Clients and broker on the same host (all the results I've shown are for this configuration) 2) Client and broker on different hosts with 1 Gbit/s network channel bandwidth between them (verified with iperf). The results are practically the same. Except that

Re: Kafka 0.7 performance compared to bare metal

2013-08-30 Thread Benjamin Black
You are maxing out the single consumer thread. On Aug 30, 2013 1:35 AM, "Rafael Bagmanov" wrote: > Hi, > > I am trying to understand how fast is kafka 0.7 compared to what I can get > from hard drive. In essence I have 3 questions. > > In all tests below, I'm using single broker with single one-p

Re: Kafka 0.7 performance compared to bare metal

2013-08-30 Thread Jun Rao
Are the clients on the same host as the broker? Could network be the bottleneck? Thanks, Jun On Fri, Aug 30, 2013 at 1:34 AM, Rafael Bagmanov wrote: > Hi, > > I am trying to understand how fast is kafka 0.7 compared to what I can get > from hard drive. In essence I have 3 questions. > > In all

Re: Num of streams for consumers using TopicFilter.

2013-08-30 Thread Jun Rao
Right, but if you set #partitions in each topic to 16, you can use a total of 16 streams. Thanks, Jun On Thu, Aug 29, 2013 at 9:08 PM, Rajasekar Elango wrote: > With option 1) I can't really use 8 streams in each consumer. If I do, only > one consumer seems to be doing all the work. So I had to actu
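
To make the partitions-versus-streams point concrete, here is a minimal sketch of a per-topic range assignment. It is a simplified model of how partitions might be divided among consumer streams, assumed for illustration and not taken from Kafka's source; the topic names, partition counts, and function names are hypothetical.

```python
from typing import Dict, List, Tuple


def assign_streams(partitions_per_topic: Dict[str, int],
                   num_streams: int) -> List[List[Tuple[str, int]]]:
    """Split each topic's partitions into contiguous ranges, one range per stream."""
    streams: List[List[Tuple[str, int]]] = [[] for _ in range(num_streams)]
    for topic in sorted(partitions_per_topic):
        base, extra = divmod(partitions_per_topic[topic], num_streams)
        start = 0
        for i in range(num_streams):
            # The first `extra` streams take one additional partition.
            count = base + (1 if i < extra else 0)
            streams[i].extend((topic, p) for p in range(start, start + count))
            start += count
    return streams


def busy_streams(streams: List[List[Tuple[str, int]]]) -> int:
    """Number of streams that own at least one partition."""
    return sum(1 for owned in streams if owned)


if __name__ == "__main__":
    few = assign_streams({"topic-a": 8, "topic-b": 8}, num_streams=16)
    many = assign_streams({"topic-a": 16, "topic-b": 16}, num_streams=16)
    print("8 partitions per topic :", busy_streams(few), "busy streams")
    print("16 partitions per topic:", busy_streams(many), "busy streams")
```

In this model the assignment is done topic by topic, so a stream can never own more than its share of any one topic, and streams beyond the per-topic partition count stay idle no matter how many topics match the filter. Raising every topic to 16 partitions is what lets all 16 streams receive data.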

Re: Unable to send and consume compressed events.

2013-08-30 Thread Jun Rao
Xuechao, Thanks for updating the wiki. Not setting up log4j properly seems to be a general problem, not just limited to this particular problem. Could you reword it a bit to make it more general? Jun On Fri, Aug 30, 2013 at 2:02 AM, Lu Xuechao wrote: > Hi, Joe. wiki updated. Hope it helps. >

Re: Securing kafka

2013-08-30 Thread Maxime Brugidou
We would love to see kerberos authentication + some unix-like permission system for topics (where one topic is a file and users/groups have read and/or write access). I guess this is not high-priority but it enables some sort of kafka-as-a-service possibility with multi tenancy. You could integrat

Re: Unable to send and consume compressed events.

2013-08-30 Thread Lu Xuechao
Hi, Joe. wiki updated. Hope it helps. On Fri, Aug 30, 2013 at 3:22 PM, Joe Stein wrote: > I feel like this is maybe a usual case as we have heard it before now a few > bits > > Lu Xuechao would you mind updating the FAQ > https://cwiki.apache.org/confluence/display/KAFKA/FAQ with what the > pro

Kafka 0.7 performance compared to bare metal

2013-08-30 Thread Rafael Bagmanov
Hi, I am trying to understand how fast Kafka 0.7 is compared to what I can get from the hard drive. In essence I have 3 questions. In all tests below, I'm using a single broker with a single one-partitioned topic. Kafka perf tests have been run in 2 deployment configs: - broker, perf-test on same host -

Re: Unable to send and consume compressed events.

2013-08-30 Thread Joe Stein
I feel like this is maybe a usual case, as we have heard it a few times before now. Lu Xuechao, would you mind updating the FAQ https://cwiki.apache.org/confluence/display/KAFKA/FAQ with what the problem was and your solution, just to capture this thread in the wiki? Thanks! /***