Hi everyone,
I've done some raw disk, Kafka and Samza benchmarking. I maxed out a
single Samza container's consumer at around 2 MB/s. Running a Kafka Consumer
Perf test on the same machine, though, I can do hundreds of MB/s. It seems like
most of the bottleneck exists in the Kafka async client. There ap
s.apache.org/jira/browse/SAMZA-6
> >
> > Here's what I'd recommend:
> >
> > 0. Write something reproducible and post it on SAMZA-6. For bonus points,
> > write an equivalent raw-Kafka-producer test (no Samza) so we can compare
> > them.
> > 1. Ch
VM, and do CPU sampling?
> It'd be good to get a view of exactly where in the "produce" call things
> are slow.
>
> Cheers,
> Chris
>
> On Sun, Feb 8, 2015 at 9:47 PM, Jordan Shaw wrote:
>
> > Hey Chris,
> > Sorry for the delayed response, did a Tahoe
rmal results. Thanks!
-Jordan
On Tue, Feb 10, 2015 at 10:27 AM, Jordan Shaw wrote:
> Hey Chris,
> We've done pretty extensive testing already on that task. Here's a screenshot of a
> sample of those results showing the 2 MB/s rate. I haven't done that
> profiling specifically, we
Hey Everyone,
I have a question somewhat related to SAMZA-109 and this line in
run-class.sh:
  # Check if a max-heap size is specified. If not - set a 768M heap
  [[ $JAVA_OPTS != *-Xmx* ]] && JAVA_OPTS="$JAVA_OPTS -Xmx768M"
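To make the effect of that check concrete, here is a minimal sketch that wraps the same test in a function. The name apply_default_heap and the sample option strings are assumptions made for illustration; they are not part of run-class.sh.

```shell
#!/usr/bin/env bash
# Hypothetical helper: same pattern match as the run-class.sh line quoted above.
apply_default_heap() {
  local opts="$1"
  # Append the 768M default only when no -Xmx flag is already present.
  [[ $opts != *-Xmx* ]] && opts="$opts -Xmx768M"
  printf '%s\n' "$opts"
}

apply_default_heap "-Dfoo=bar"          # prints: -Dfoo=bar -Xmx768M
apply_default_heap "-Xmx3G -Dfoo=bar"   # prints: -Xmx3G -Dfoo=bar
```

So the 768M default only applies when nothing else has put an -Xmx flag into JAVA_OPTS; an explicit heap size passes through unchanged.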
If I were to set the container.memory.mb for yarn to 4GB (
yarn.containe
es, and off-heap memory usage. All of these
> contribute to the physical memory usage that YARN cares about, but are
> outside the JVM heap. This means that we can't just use one memory setting
> for both YARN and Java. We have to have two.
>
> Cheers,
> Chris
>
> On Tue
all over?
> 6. If there was a highly-optimized and reliable way of ingesting
> partitioned streams quickly into your online serving system, would that
> help you leverage Samza more effectively?
>
> Your insights would be much appreciated!
>
>
> Thanks (:
>
>
> --
> Felix
>
--
Jordan Shaw
Full Stack Software Engineer
PubNub Inc
1045 17th St
San Francisco, CA 94107
I'm all for any optimizations that can be made to the Yarn workflow.
I actually agree with Jakob in regard to the producers/consumers. I have
spent some time writing consumers and producers for other transport
abstractions, and overall I feel the current API abstractions in Samza are
pretty good. Th
Jay,
I think doing this iteratively in smaller chunks is a better way to go as
new issues arise. As Navina said, Kafka is a "stream system" and Samza is a
"stream processor" and those two ideas should be mutually exclusive.
-Jordan
On Mon, Jul 13, 2015 at 10:06 AM, Jay Kreps wrote:
> Hmm, though
from the samza
producer. Any idea what could be causing this? Just about the only thing
that I can find is maybe an issue with snappy or compression, but I don't see
a snappy call in the traceback.
--
Jordan Shaw
Full Stack Software Engineer
PubNub Inc
on and the new
> producer. If you disable compression or switch to lz4 or gzip, does the
> issue go away?
>
> Cheers,
>
> Roger
>
> On Wed, Jul 22, 2015 at 11:54 PM, Jordan Shaw wrote:
>
> > Hey Everyone,
> > I'm getting an:
> > "kafka.me
Roger,
We upgraded from YARN 2.4 to 2.6 a while ago and have been running it in prod with
no issues. It was basically a drop-in if I remember right.
Jordan
> On Aug 20, 2015, at 1:48 PM, Yi Pan wrote:
>
> Hi, Selina,
>
> Samza 0.9.1 on YARN 2.6 is the proven working solution.
>
> Best,
>
> -Yi
>
ou to do this, (well, discouraged anyway :). Samza
> >> by
> >>>>> default does not provide this feature. So you may want to be a little cautious
> >>>> when
> >>>>> implementing this.
> >>>>>
> >>>>> Thanks,
> >>>>>
> >>>>> Fang, Yan
> >>>>> yanfang...@gmail.com
> >>>>>
> >>>>> On Sun, Sep 20, 2015 at 4:28 PM, Michael Sklyar >>>
> >>>>> wrote:
> >>>>>
> >>>>>> Hi,
> >>>>>>
> >>>>>> What would be the best approach for doing "blocking" operations in
> >>>> Samza?
> >>>>>>
> >>>>>> For example, we have a kafka stream of urls for which we need to
> >>> gather
> >>>>>> external data via HTTP (such as alexa rank, get the page title and
> >>>>>> headers..). Other scenarios include database access and decision
> >>> making
> >>>>> via
> >>>>>> a rule engine.
> >>>>>>
> >>>>>> Samza processes messages in a single thread, HTTP requests might
> >> take
> >>>>>> hundreds of milliseconds. With the single threaded design the
> >>> throughput
> >>>>>> would be very limited, which can be solved with an asynchronous
> >>>> approach.
> >>>>>> However Samza documentation explicitly states
> >>>>>> "*You are strongly discouraged from using threads in your job’s
> >>> code*".
> >>>>>>
> >>>>>> It seems that Samza design suits very well "data transformation"
> >>>>> scenarios,
> >>>>>> what is not clear is how well can it support external services?
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Michael Sklyar
>
>
>
>
>
> --
> Ken Krugler
> +1 530-210-6378
> http://www.scaleunlimited.com
> custom big data solutions & training
> Hadoop, Cascading, Cassandra & Solr
>
>
>
>
>
>
--
Jordan Shaw
Full Stack Software Engineer
PubNub Inc
1045 17th St
San Francisco, CA 94107
> >
> > If I set it to the first of my RMs, the job submission works ok if I
> submit the job from that RM and the RM is the active one. If the RM machine
> that I run the job submission from is not active, I get connection refused
> errors on port 8032. If I don't set it, I get errors where run-job.sh
> tries to submit to 0.0.0.0:8032
> >
> >
> > Many thanks,
> >
> >
> > John
>
>
--
Jordan Shaw
Full Stack Software Engineer
PubNub Inc
1045 17th St
San Francisco, CA 94107
ging?
>
> On a related subject, I'd also like to monitor throughput per topic in
> terms of messages per second and bytes per second. Should I query brokers
> periodically, or maybe there is a better way?
>
> Thanks,
> Michael
>
--
Jordan Shaw
Full Stack Software Engineer
PubNub Inc
1045 17th St
San Francisco, CA 94107
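For the per-topic throughput question quoted above, the arithmetic is the same however the counters are collected: take two samples a known number of seconds apart and divide the difference by the interval. The sketch below assumes you already have such samples (for example summed log-end offsets, or a byte counter); the helper name rate_per_sec and the sample numbers are made up for illustration. Kafka brokers also publish per-topic MessagesInPerSec and BytesInPerSec meters over JMX, which may avoid computing this by hand.

```shell
#!/usr/bin/env bash
# Hypothetical helper: integer rate between two counter samples.
# Usage: rate_per_sec START END SECONDS
rate_per_sec() {
  local start="$1" end="$2" seconds="$3"
  # Refuse a zero or negative interval instead of dividing by it.
  if (( seconds <= 0 )); then
    echo "interval must be positive" >&2
    return 1
  fi
  echo $(( (end - start) / seconds ))
}

# Made-up samples taken 30 seconds apart.
rate_per_sec 1000 4000 30      # messages/s, prints 100
rate_per_sec 0 6000000 30      # bytes/s, prints 200000
```

Bash arithmetic is integer-only, so rates are truncated; that is usually fine for monitoring but not for precise accounting.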
umed that it's the default Kafka config
> for committing offsets. Will try again with Burrow set to read from
> __consumer_offsets.
>
> Thanks
>
> On Mon, Nov 16, 2015 at 8:04 PM, Jordan Shaw wrote:
>
> > Michael,
> > It depends on how you define lag.
> >