---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30825/
---
Review request for samza.
Repository: samza
Description
---
I added 2 im
---
Hey Jordan,
It looks like your task is almost identical to the one in SAMZA-548. Did
you have a chance to test your job out with 0.9?
> If I just consume off the envelope I was seeing much faster consume
> rates, which was one of the indications that the producer was causing
> problems.
Yes, this s
---
Hey Karthik,
I've never tried running ZK on the same machines as Kafka/Samza.
Co-locating Kafka/Samza worked pretty well for us until we started using
Samza's state management facilities. At that point, Samza's state stores
started messing with the OS page cache in a way that impacted the Kafka
brokers
---
Hey Chris,
We've done pretty extensive testing already on that task. Here's a
screenshot of a sample of those results showing the 2 MB/s rate. I haven't
done that profiling specifically; we were running htop and a network
profiler to get a general idea of system consumption. We'll add that to
our todo's fo
---
The answer depends on your design philosophy. We need to strike a balance
between making it possible and making it easy. Because SQL is a powerful closed
language, we can achieve a lot by combining the elements. For example, I think
that your example can be solved by joining a "heartbeat" stream
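The heartbeat idea can be sketched as follows. This is a toy illustration with made-up names (`Window`, `on_event`), not Samza's or Calcite's API: merging a periodic heartbeat stream into the row stream means window boundaries are observed even when no data rows arrive.

```python
from dataclasses import dataclass, field

@dataclass
class Window:
    """A tumbling window of `size` time units, closed by data rows or heartbeats."""
    size: int = 5
    start: int = 0
    rows: list = field(default_factory=list)
    closed: list = field(default_factory=list)

    def on_event(self, ts, row=None):
        # Heartbeats (row is None) and data rows both advance event time,
        # so a window closes even if no rows fall inside it.
        while ts >= self.start + self.size:
            self.closed.append((self.start, list(self.rows)))
            self.rows.clear()
            self.start += self.size
        if row is not None:
            self.rows.append(row)

w = Window()
w.on_event(1, "a")
w.on_event(2, "b")
w.on_event(7)        # heartbeat at t=7 closes window [0, 5)
print(w.closed)      # [(0, ['a', 'b'])]
```

Because the heartbeat is just another stream being joined in, the SQL planner never needs a special "timeout" construct; time progress arrives as ordinary rows.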
---
Hey Chris,
Good news! ...sorta. We found that our serde (msgpack) was taking 4x our
process time, and when I changed the serde to String it supported our test
traffic rates of 9 MB/s without any signs of strain; see here: [image:
Inline image 1]
We also benchmark
---
Hey Jordan,
That's awesome! Yes, 9 MB/s should be doable, so I'm glad to hear that
worked. :)
In practice, what we've found is that serde does take most of the time.
Protobuf, Thrift, or Avro are usually the ones people end up using when
they care about performance at the level you're talking about.
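As a rough illustration of how serde cost can dominate per-message work, here is a minimal timing sketch. It uses stdlib `json` as a stand-in for a structured serde (msgpack isn't assumed installed), and a near pass-through byte encoding as the String-serde analogue; the payloads and `bench` helper are hypothetical, not part of Samza.

```python
import json
import time

def bench(serialize, payloads, rounds=5):
    """Return the best wall-clock time to serialize every payload once."""
    best = float("inf")
    for _ in range(rounds):
        start = time.perf_counter()
        for p in payloads:
            serialize(p)
        best = min(best, time.perf_counter() - start)
    return best

# A stand-in for a stream of small messages.
payloads = [{"user": i, "action": "click", "ts": 1234567890 + i}
            for i in range(10_000)]

json_time = bench(json.dumps, payloads)                            # structured serde
string_time = bench(lambda p: str(p["user"]).encode(), payloads)   # near pass-through

print(f"json serde:   {json_time:.4f}s")
print(f"string serde: {string_time:.4f}s")
print(f"ratio:        {json_time / string_time:.1f}x")
```

The absolute numbers are machine-dependent; the point is that the structured serde is a constant multiple slower per message, which matches the kind of ratio described above.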
---
Hi, Julian,
Thanks for the example. Could you also comment on how a user specifies the
"timeout" to terminate the window in my example? I.e., if no rows are
delivered in 5 min, close the current window? Essentially, how does a user
specify a "punctuation point" in the stream without breaking the SQL
planner?