I followed the LinkedIn setup example in the docs and allocated 3G for the heap size.
java -Xmx3G -Xms3G -server -XX:+UseCompressedOops -XX:+UseParNewGC
-XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled
-XX:+CMSScavengeBeforeRemark -XX:+DisableExplicitGC -
After a day of running a normal scenario I discover
You can take a look at the hadoop consumer under contrib.
Thanks,
Jun
On Fri, Aug 30, 2013 at 11:18 AM, Mark wrote:
> What is the quickest and easiest way to write messages from Kafka into
> HDFS? I've come across Camus but before we go the whole route of writing
> Avro messages we want to test plain old vanilla messages.
Great. Thanks for reply Jun.
Thanks,
Raja.
On Fri, Aug 30, 2013 at 11:50 PM, Jun Rao wrote:
> If you are using the async mode, there is no difference. If you use sync
> mode, the former API gives you a way to batch requests in sync mode, which
> can lead to better throughput.
>
> Thanks,
>
> Jun
If you are using the async mode, there is no difference. If you use sync
mode, the former API gives you a way to batch requests in sync mode, which
can lead to better throughput.
Thanks,
Jun
On Fri, Aug 30, 2013 at 10:28 AM, Rajasekar Elango
wrote:
> Kafka producer API supports sending a single
Not sure if this is a bug. I think the issue is that log4j is not set up
properly, so no log shows up.
Thanks,
Jun
On Fri, Aug 30, 2013 at 9:32 AM, Jay Kreps wrote:
> This seems like more of a bug than a FAQ, no? We are swallowing the
> exception...
>
> -Jay
>
>
> On Thu, Aug 29
In 0.8, the producer no longer depends on ZK. It only takes a list of
brokers. At LinkedIn, we have a 0.8 C producer implementation and plan to
open source it soon.
Thanks,
Jun
On Fri, Aug 30, 2013 at 9:00 AM, Travis Brady wrote:
> I think this points out the need for a single canonical cross-
It seems to me option 1) is easier. Option 2) has the same issue as option
1) since you have to manage different whitelists.
A more general solution is probably to change the consumer distribution
model to divide partitions across topics. That way, one can create as many
streams as the total # of partitions.
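For reference, a minimal sketch of the wildcard-consumer pattern being
discussed, assuming the 0.8 high-level Java consumer API; the ZooKeeper
address, group id, topic regex, and stream count below are placeholders:

    import java.util.List;
    import java.util.Properties;

    import kafka.consumer.Consumer;
    import kafka.consumer.ConsumerConfig;
    import kafka.consumer.KafkaStream;
    import kafka.consumer.Whitelist;
    import kafka.javaapi.consumer.ConsumerConnector;

    public class WildcardConsumerSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("zookeeper.connect", "zkhost:2181");   // placeholder
            props.put("group.id", "mirror-group");           // placeholder
            ConsumerConnector connector =
                Consumer.createJavaConsumerConnector(new ConsumerConfig(props));

            // One whitelist covering every topic; the requested number of
            // streams is shared across all topics that match the filter,
            // which is the limitation discussed above.
            List<KafkaStream<byte[], byte[]>> streams =
                connector.createMessageStreamsByFilter(new Whitelist(".*"), 8);

            System.out.println("streams created: " + streams.size());
            connector.shutdown();
        }
    }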
Yeah, if nobody else does it first, LinkedIn will definitely do Kerberos/SSL
+ Unix permissions at the topic level soonish. If folks already have a head
start on the auth piece, we would love to have that contribution.
On Fri, Aug 30, 2013 at 5:25 AM, Maxime Brugidou
wrote:
> We would love to see k
Please contribute that back! It would potentially be huge for mirroring
clusters across Amazon regions, for instance.
On Thu, Aug 29, 2013 at 8:22 PM, Rajasekar Elango wrote:
> We have made changes to the Kafka code to support certificate-based mutual SSL
> authentication. So the clients and broker will exchange trusted certificates
> for successful communication.
One other option is to use something like Druid, especially if you care
about doing arbitrary dimensional drilldowns.
http://druid.io
It reads from Kafka and can do simple rollups for you automatically
(meaning you don't need Storm if all you are doing with Storm is a simple
"group by" style rollup).
You could also use something more oriented toward timeseries data, like
https://github.com/rackerlabs/blueflood/. Then you'd have to write some
output adapters to feed the additional processing of your data elsewhere.
I think the team is working on making an output adapter for Kafka for the
rolled-up metrics.
That sounds very interesting. Looking forward to it!
On Aug 29, 2013 11:23 PM, "Rajasekar Elango" wrote:
> We have made changes to the Kafka code to support certificate-based mutual SSL
> authentication. So the clients and broker will exchange trusted
> certificates for successful communication. Th
Can you elaborate on your use case a bit?
At what point would your business logic decide that the file is complete
(by time or other decision to cut a file as completed)? And then when do
you batch process from what the stream has piled up for you?
Writing to HDFS http://wiki.apache.org/hadoop/
Your producer test uses a thread per core. Your consumer test uses a single
thread. A single thread is likely insufficient to get maximum throughput.
On Aug 30, 2013 8:46 AM, "Rafael Bagmanov" wrote:
> Bejamin, do you mean thread on a client side? I'm not quite getting
> what I'm limited with. Ca
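To illustrate the single-consumer-thread point above, a rough sketch (not
from the thread) of consuming with one thread per stream, assuming the
0.8-style high-level Java consumer; the topic name, ZooKeeper address, and
thread count are placeholders, and the topic needs at least that many
partitions for every thread to receive data:

    import java.util.Collections;
    import java.util.List;
    import java.util.Properties;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    import kafka.consumer.Consumer;
    import kafka.consumer.ConsumerConfig;
    import kafka.consumer.KafkaStream;
    import kafka.javaapi.consumer.ConsumerConnector;
    import kafka.message.MessageAndMetadata;

    public class ThreadedConsumerSketch {
        public static void main(String[] args) {
            final int numThreads = 4;                          // placeholder
            Properties props = new Properties();
            props.put("zookeeper.connect", "zkhost:2181");     // placeholder
            props.put("group.id", "perf-test");                // placeholder
            ConsumerConnector connector =
                Consumer.createJavaConsumerConnector(new ConsumerConfig(props));

            // Ask for numThreads streams on the topic and drain each stream
            // in its own thread.
            List<KafkaStream<byte[], byte[]>> streams = connector
                .createMessageStreams(Collections.singletonMap("perf-topic", numThreads))
                .get("perf-topic");

            ExecutorService pool = Executors.newFixedThreadPool(numThreads);
            for (final KafkaStream<byte[], byte[]> stream : streams) {
                pool.submit(new Runnable() {
                    public void run() {
                        long bytes = 0;
                        for (MessageAndMetadata<byte[], byte[]> msg : stream) {
                            bytes += msg.message().length;     // consume as fast as possible
                        }
                    }
                });
            }
        }
    }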
What is the quickest and easiest way to write messages from Kafka into HDFS?
I've come across Camus but before we go the whole route of writing Avro
messages we want to test plain old vanilla messages.
Thanks
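For a quick test with plain (non-Avro) messages, something along these lines
could work; this is only a rough sketch assuming the 0.8 high-level Java
consumer and the Hadoop FileSystem API (the contrib hadoop-consumer and Camus
are the more complete options), and the topic, ZooKeeper address, NameNode
URI, and output path are placeholders:

    import java.net.URI;
    import java.util.Collections;
    import java.util.Properties;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    import kafka.consumer.Consumer;
    import kafka.consumer.ConsumerConfig;
    import kafka.consumer.KafkaStream;
    import kafka.javaapi.consumer.ConsumerConnector;
    import kafka.message.MessageAndMetadata;

    public class KafkaToHdfsSketch {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("zookeeper.connect", "zkhost:2181");      // placeholder
            props.put("group.id", "hdfs-writer");               // placeholder
            props.put("auto.offset.reset", "smallest");
            ConsumerConnector connector =
                Consumer.createJavaConsumerConnector(new ConsumerConfig(props));

            KafkaStream<byte[], byte[]> stream = connector
                .createMessageStreams(Collections.singletonMap("my-topic", 1))
                .get("my-topic").get(0);

            FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"),
                                           new Configuration());
            FSDataOutputStream out = fs.create(new Path("/data/kafka/my-topic/part-00000"));

            // Append each message as one line. No batching, file rotation,
            // or offset management here; that is what Camus / the contrib
            // consumer add on top.
            for (MessageAndMetadata<byte[], byte[]> msg : stream) {
                out.write(msg.message());
                out.write('\n');
                out.hsync();   // Hadoop 2.x; use sync()/flush() on older versions
            }
        }
    }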
This seems like more of a bug than a FAQ, no? We are swallowing the
exception...
-Jay
On Thu, Aug 29, 2013 at 11:30 PM, Lu Xuechao wrote:
> Hi Jun,
>
> Thanks for your help. Finally, I found the reason by enabling producer-side
> DEBUG info output. The snappy jar is not included in the classpath
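For anyone hitting the same thing: a minimal client-side log4j.properties
along these lines, placed on the producer's classpath, should surface errors
like the missing snappy jar (the exact levels and layout here are just an
example):

    log4j.rootLogger=INFO, stdout
    log4j.appender.stdout=org.apache.log4j.ConsoleAppender
    log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
    log4j.appender.stdout.layout.ConversionPattern=[%d] %p %m (%c)%n
    # Turn up producer logging to see why sends fail silently
    log4j.logger.kafka.producer=DEBUG
    log4j.logger.kafka=INFO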
The Kafka producer API supports sending a single message as well as sending a
list of messages, via two different methods. I believe that irrespective of
which send method is used, the producer internally batches batch.num.messages
and sends them in bulk. Is there any advantage in performance to using
send(List
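To make the two send variants concrete, a hedged sketch assuming the 0.8 Java
producer API (the broker address and topic name are placeholders); per Jun's
reply above, batching only matters in sync mode, where passing a list lets
the producer send the messages together instead of one request per call:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Properties;

    import kafka.javaapi.producer.Producer;
    import kafka.producer.KeyedMessage;
    import kafka.producer.ProducerConfig;

    public class BatchedSendSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("metadata.broker.list", "broker1:9092");   // placeholder
            props.put("serializer.class", "kafka.serializer.StringEncoder");
            props.put("producer.type", "sync");
            Producer<String, String> producer =
                new Producer<String, String>(new ProducerConfig(props));

            // Single-message variant: one request per call in sync mode.
            producer.send(new KeyedMessage<String, String>("my-topic", "one message"));

            // List variant: the messages in the list are sent together
            // rather than one request per message.
            List<KeyedMessage<String, String>> batch =
                new ArrayList<KeyedMessage<String, String>>();
            for (int i = 0; i < 100; i++) {
                batch.add(new KeyedMessage<String, String>("my-topic", "message-" + i));
            }
            producer.send(batch);

            producer.close();
        }
    }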
Yeah, the error should have shown up. Will create a JIRA.
On Fri, Aug 30, 2013 at 12:32 PM, Jay Kreps wrote:
> This seems like more of a bug than a FAQ, no? We are swallowing the
> exception...
>
> -Jay
>
>
> On Thu, Aug 29, 2013 at 11:30 PM, Lu Xuechao wrote:
>
> > Hi Jun,
> >
> > Thanks for you
It is hard to say where the bottleneck is just from your description. Would
it be possible for you to rerun the consumer test using hprof on the
consumer so we can understand whether the fetcher is waiting on the fetches
(i.e. the broker is the bottleneck) or on the enqueue (i.e. the consumer is
the
I think it is still good to have this one in the FAQ; even for bugs, folks
sometimes need to know what to work around and how, until they are fixed.
Here is the JIRA for the defect:
https://issues.apache.org/jira/browse/KAFKA-1037 <== a great place, I think,
for someone looking to jump in and start contributing.
Bejamin, do you mean a thread on the client side? I'm not quite getting
what I'm limited by. Can you please explain a little bit more?
A single-threaded producer is still capable of doing 50 MB/s on hi1.4xlarge,
which is quite a bit slower than the 377 MB/s from a single FIO job, but
still 5 times faster than
I think this points out the need for a single canonical cross-platform C
client lib with support for ZooKeeper that could easily be wrapped for use in
other languages.
It would make it much easier for people using Python, Ruby, Node, Lua,
Haskell, Go, OCaml, etc. to have such a library that matches th
Yeah. The actual bottleneck is the number of topics that match the topic
filter. The number of streams is going to be shared between all topics it's
consuming from. I thought about the following ideas to work around this. (I
am basically referring to the mirrormaker consumer in the examples.)
Option 1). Instead of
I've tried two different deployments:
1) Clients and broker on the same host (all the results I've shown are
for this configuration)
2) Client and broker on different hosts with 1 Gbit/s network channel
bandwidth between them (verified with iperf)
The results are practically the same.
Except that
You are maxing out the single consumer thread.
On Aug 30, 2013 1:35 AM, "Rafael Bagmanov" wrote:
> Hi,
>
> I am trying to understand how fast Kafka 0.7 is compared to what I can get
> from the hard drive. In essence I have 3 questions.
>
> In all tests below, I'm using a single broker with a single one-p
Are the clients on the same host as the broker? Could network be the
bottleneck?
Thanks,
Jun
On Fri, Aug 30, 2013 at 1:34 AM, Rafael Bagmanov wrote:
> Hi,
>
> I am trying to understand how fast Kafka 0.7 is compared to what I can get
> from the hard drive. In essence I have 3 questions.
>
> In all
Right, but if you set #partitions in each topic to 16, you can use a total
of 16 streams.
Thanks,
Jun
On Thu, Aug 29, 2013 at 9:08 PM, Rajasekar Elango wrote:
> With option 1) I can't really use 8 streams in each consumer. If I do, only
> one consumer seems to be doing all the work. So I had to actu
Xuechao,
Thanks for updating the wiki. Not setting up log4j properly seems to be a
general problem, not just limited to this particular issue. Could you
reword it a bit to make it more general?
Jun
On Fri, Aug 30, 2013 at 2:02 AM, Lu Xuechao wrote:
> Hi, Joe. wiki updated. Hope it helps.
>
We would love to see Kerberos authentication + some Unix-like permission
system for topics (where one topic is a file and users/groups have read
and/or write access).
I guess this is not high priority, but it enables some sort of
Kafka-as-a-service possibility with multi-tenancy. You could integrat
Hi, Joe. Wiki updated. Hope it helps.
On Fri, Aug 30, 2013 at 3:22 PM, Joe Stein wrote:
> I feel like this is maybe a usual case, as we have heard bits of it a few
> times before now.
>
> Lu Xuechao would you mind updating the FAQ
> https://cwiki.apache.org/confluence/display/KAFKA/FAQ with what the
> pro
Hi,
I am trying to understand how fast Kafka 0.7 is compared to what I can get
from the hard drive. In essence I have 3 questions.
In all tests below, I'm using a single broker with a single one-partition
topic. The Kafka perf tests have been run in 2 deployment configs:
- broker, perf-test on same host
-
I feel like this is maybe a usual case, as we have heard bits of it a few
times before now.
Lu Xuechao would you mind updating the FAQ
https://cwiki.apache.org/confluence/display/KAFKA/FAQ with what the problem
was and your solution just to capture this thread in the wiki please, thanx!