Review Request 30825: SAMZA-547: Add Java Serializable Serde

2015-02-10 Thread Ruslan Khafizov
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30825/ Review request for samza. Repository: samza. Description: I added 2 im
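
The preview above is cut off before it describes the change, so what follows is only a minimal sketch of what a serde built on Java's built-in serialization could look like. The class name `JavaSerializableSerde` and its `toBytes`/`fromBytes` methods are assumptions made for illustration; this is not the code under review, and it deliberately does not depend on any Samza classes so that it stays self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical sketch: a serde that round-trips any Serializable object
// through java.io object streams. Names are illustrative assumptions.
final class JavaSerializableSerde<T extends Serializable> {

    // Serialize the object to a byte array; null passes through as null.
    public byte[] toBytes(T obj) {
        if (obj == null) {
            return null;
        }
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(obj);
        } catch (IOException e) {
            throw new IllegalStateException("Failed to serialize object", e);
        }
        // The object stream is closed (and therefore flushed) at this point.
        return bos.toByteArray();
    }

    // Deserialize a byte array back into an object; null passes through as null.
    @SuppressWarnings("unchecked")
    public T fromBytes(byte[] bytes) {
        if (bytes == null) {
            return null;
        }
        try (ObjectInputStream ois =
                 new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (T) ois.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("Failed to deserialize bytes", e);
        }
    }
}
```

Java serialization is convenient because any `Serializable` class works without a schema, but it tends to be slower and more verbose on the wire than formats such as Avro or Protobuf, so a serde like this is probably best suited to prototyping.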

Re: container concurrency and pipelining

2015-02-10 Thread Chris Riccomini
Hey Jordan, It looks like your task is almost identical to the one in SAMZA-548. Did you have a chance to test your job out with 0.9? > If I just consume off the envelope I was seeing much faster consume rates. Which was one of the indications that the producer was causing problems. Yes, this s

Re: Collocating Samza(YARN) and Kafka/ZK clusters

2015-02-10 Thread Chris Riccomini
Hey Karthik, I've never tried running ZK on the same machines as Kafka/Samza. Co-locating Kafka/Samza worked pretty well for us until we started using Samza's state management facilities. At this point, Samza's state stores started messing with the OS page cache in a way that impacted the Kafka b

Re: container concurrency and pipelining

2015-02-10 Thread Jordan Shaw
Hey Chris, We've done pretty extensive testing already on that task. Here's a screenshot of a sample of those results showing the 2MB/s rate. I haven't done that profiling specifically; we were running htop and a network profiler to get a general idea of system consumption. We'll add that to our todo's fo

Re: Window spec in SQL language vs Samza system details

2015-02-10 Thread Julian Hyde
The answer depends on your design philosophy. We need to strike a balance between making it possible and making it easy. Because SQL is a powerful closed language, we can achieve a lot by combining the elements. For example, I think that your example can be solved by joining a "heartbeat" stream

Re: container concurrency and pipelining

2015-02-10 Thread Jordan Shaw
Hey Chris, Good news! ...sorta. We found that our serialization serde (msgpack) was taking 4x our process time, and when I changed the serde to String it supported our test traffic rates of 9MB/s without any sign that it couldn't support more; see here: [image: Inline image 1] We also benchmark

Re: container concurrency and pipelining

2015-02-10 Thread Chris Riccomini
Hey Jordan, That's awesome! Yes, 9MB/s should be doable, so I'm glad to hear that worked. :) In practice, what we've found is that serde does take most of the time. Protobuf, Thrift, or Avro are usually the ones people end up using when they care about performance at the level you're talking abo

Re: Window spec in SQL language vs Samza system details

2015-02-10 Thread Yi Pan
Hi Julian, Thanks for the example. Could you also comment on how a user can specify the "timeout" to terminate the window in my example? I.e., if no rows are delivered in 5 min, close the current window? Essentially, how does a user specify a "punctuation point" in the stream w/o breaking the SQL planner?