Re: [VOTE] Apache Samza 0.9.0 RC0

2015-03-26 Thread Yi Pan
I have run the integration test suite w/ 0.9.0-rc0. There were some issues related to the integration test (SAMZA-621), but the test suite passed after I manually created a symlink with the file name the test script is looking for. Hence, +1 on the release. On Thu, Mar 26, 2015 at 5:39 PM, Roger Hoo

Re: [VOTE] Apache Samza 0.9.0 RC0

2015-03-26 Thread Roger Hoover
Hi Chris + all, I created a basic job that does a join from local state with Samza 0.9.0 ( https://github.com/Quantiply/rico-playground/tree/master/join/samza). So far so good. I'm hoping to get some time this weekend to benchmark it on my laptop. I think I saw that 0.9.0 includes support for send

Re: [VOTE] Apache Samza 0.9.0 RC0

2015-03-26 Thread Chris Riccomini
Hey all, I'm running validations and some burn-in. I'll post my vote tomorrow. It's been pretty quiet. It'd be good to get other committers/non-committers to do validation as well. Cheers, Chris On Wed, Mar 25, 2015 at 11:20 AM, Yan Fang wrote: > Hi Chris, > > Opps, signed it with another ke

Re: Samza questions

2015-03-26 Thread Gian Merlino
Hi Ori, Maybe an example would be useful. We use Samza to transform data for materialization in Druid, because Druid is built to index and aggregate a single event stream, but our raw data actually exists in a bunch of streams and tables that need joining. So we have Samza handle the joining and t

Re: Samza questions

2015-03-26 Thread Yi Pan
Hi, Ori, My interpretation of the MV (materialized view) usage in Martin's talk is exactly what you mentioned: it is considered a "view" rather than a regular table in a DB, hence read-only and possibly derived data that has already gone through the business logic. On Thu, Mar 26, 2015 at 1:55 PM, Yan Fang wrote

Re: Samza questions

2015-03-26 Thread Yan Fang
I guess you mean "Martin", not "Matrin". Here is the link for Ori's question, to give everyone some background: https://thestrangeloop.com/sessions/turning-the-database-inside-out-with-apache-samza On Thu, Mar 26, 2015 at 3:15 AM, Ori Cohen wrote:

Re: Kafka partition key

2015-03-26 Thread Milinda Pathirage
Hi Shekar, Please refer to [1]. You can set a custom partitioner through the producer config. You will have to implement your own partitioner based on your application and partitioning strategy. Thanks Milinda [1] https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+Producer+Example On Thu,

Re: Kafka partition key

2015-03-26 Thread Shekar Tippur
So if I have a feed with {user_id:12345, ethnicity: asian, location: "cerritos, ca", Height:"5.9", weight: "150 lbs"} I am referring to https://kafka.apache.org/081/ops.html#topic-config How do I map the 3 columns - (user_id, ethnicity, and location) to a partition id. If I map it this way and s

Re: Kafka partition key

2015-03-26 Thread Roger Hoover
Hi Richard, You can also partition by a key like "user_id" so that all messages for a given user would end up in the same partition. This can be useful for calculating user-specific aggregations or doing a distributed join where the local state is also partitioned on user_id. Cheers, Roger On
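To make Roger's point concrete, here is a minimal sketch in plain Java (no Kafka or Samza dependencies) of why partitioning both the profile stream and the event stream on user_id allows a purely local join. The partition function (non-negative hash of the key, modulo the partition count) and all class and method names are hypothetical; Kafka's default partitioner may hash keys differently, so this only illustrates the co-location principle, not Kafka's or Samza's actual routing.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: two inputs keyed by user_id are routed with the SAME
// partition function, so each task can join them using only its local state.
class CoPartitionedJoin {

    // Hypothetical partition function: non-negative hash of the key, modulo partition count.
    // Not necessarily the hash Kafka's default partitioner uses.
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // One "task": owns the local profile store for a single partition.
    static class Task {
        final Map<String, String> localProfiles = new HashMap<>();

        void onProfile(String userId, String profile) {
            localProfiles.put(userId, profile); // update local state
        }

        // Join an event against local state; a missing profile means no match yet.
        String onEvent(String userId, String event) {
            String profile = localProfiles.get(userId);
            return event + " joined with " + (profile == null ? "<no profile>" : profile);
        }
    }

    final List<Task> tasks = new ArrayList<>();

    CoPartitionedJoin(int numPartitions) {
        for (int i = 0; i < numPartitions; i++) {
            tasks.add(new Task());
        }
    }

    // Both streams use partitionFor(userId, ...), so a user's profile and events meet in one task.
    void sendProfile(String userId, String profile) {
        tasks.get(partitionFor(userId, tasks.size())).onProfile(userId, profile);
    }

    String sendEvent(String userId, String event) {
        return tasks.get(partitionFor(userId, tasks.size())).onEvent(userId, event);
    }

    public static void main(String[] args) {
        CoPartitionedJoin job = new CoPartitionedJoin(4);
        job.sendProfile("12345", "location=cerritos, ca");
        System.out.println(job.sendEvent("12345", "page_view"));
    }
}
```

If the two streams were partitioned on different keys, a user's profile and events could land in different tasks, and the lookup would need a remote store or a repartitioning step instead.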

Re: Kafka partition key

2015-03-26 Thread Richard Lee
Is there a typo below? Are all of these actually in the same topic, just different partitions? Partitioning, AFAIK, is mainly done for parallelism & throughput reasons. What is the reason for partitioning your dataset by ‘columns’? https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-Ho

Re: java.lang.NoClassDefFoundError on Yarn job

2015-03-26 Thread Roger Hoover
Hi Jordi, You might be running into this issue ( https://issues.apache.org/jira/browse/SAMZA-456) which I just hit as well. You probably need to add a couple more jars to your YARN lib dir. Cheers, Roger On Thu, Mar 26, 2015 at 9:21 AM, Jordi Blasi Uribarri wrote: > Hi: > > I got samza runnin

java.lang.NoClassDefFoundError on Yarn job

2015-03-26 Thread Jordi Blasi Uribarri
Hi: I got samza running a job in local mode with the property: job.factory.class=org.apache.samza.job.local.ThreadJobFactory Now I am trying to get it running on multiple machines. I have followed the steps in the following guide: https://github.com/apache/samza/blob/master/docs/learn/tutorials

Re: Kafka partition key

2015-03-26 Thread Dotan Patrich
Hi Shekar, Each Kafka partition is basically just a number; you would need to specify which partitioner strategy to use when mapping your event key to the partition number. You can take the 4 columns you have in the event and map them to a partition number, the partitioner in that case would be a func
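Dotan's suggestion can be sketched as follows, assuming the partition key is built by joining the chosen field values (user_id, ethnicity and location, as in Shekar's example event) with a separator and hashing the result. The separator, the hash-modulo mapping and the helper names `compositeKey`/`compositePartition` are hypothetical; with a real Kafka producer, this logic would have to be wrapped in a custom partitioner class configured through the producer config, as described in the example Milinda linked.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical sketch: derive one partition number from several fields of an event.
class CompositeKeyPartitioner {

    // Build a composite key from the chosen fields, in a fixed order.
    // The separator keeps ("ab","c") and ("a","bc") distinct, assuming values never contain '|'.
    static String compositeKey(Map<String, String> event, String... fields) {
        StringJoiner joiner = new StringJoiner("|");
        for (String field : fields) {
            joiner.add(String.valueOf(event.get(field)));
        }
        return joiner.toString();
    }

    // Map the composite key to a partition in [0, numPartitions).
    static int compositePartition(Map<String, String> event, int numPartitions, String... fields) {
        return Math.floorMod(compositeKey(event, fields).hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        Map<String, String> event = new LinkedHashMap<>();
        event.put("user_id", "12345");
        event.put("ethnicity", "asian");
        event.put("location", "cerritos, ca");
        event.put("Height", "5.9");      // not part of the key
        event.put("weight", "150 lbs");  // not part of the key

        System.out.println(compositeKey(event, "user_id", "ethnicity", "location"));
        System.out.println(compositePartition(event, 8, "user_id", "ethnicity", "location"));
    }
}
```

One trade-off follows directly from this design: two events with the same user_id but a different location may hash to different partitions, so per-user state would no longer be local to a single task, which is what partitioning on user_id alone avoids.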

Kafka partition key

2015-03-26 Thread Shekar Tippur
Hello, Want to confirm a basic understanding of Kafka. If I have a dataset that needs to be partitioned by 4 columns, then the progression is {topic1:partition_key1} -> {Group by samza on partition_key1} -> {topic2:partition_key2} -> {Group by samza on partition_key2} -> {topic3:partition_key3} -

RE: cannot be cast to java.lang.String

2015-03-26 Thread Jordi Blasi Uribarri
Got it!!! It was all about (as Chinmay pointed out) correctly defining the serializer:

serializers.registry.string.class=org.apache.samza.serializers.StringSerdeFactory
systems.kafka.streams.syslog.samza.msg.serde=string
systems.kafka.streams.samzaout.samza.msg.serde=string

Now going for the next st
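Why the serde setting fixes the cast error can be illustrated without Samza on the classpath. The sketch assumes, as is typical when no serde is configured for a stream, that the task receives the raw `byte[]` payload and then casts it to `String`. `SimpleSerde` is a hypothetical stand-in shaped like Samza's `Serde` (`toBytes`/`fromBytes`), not the real interface, and `Utf8StringSerde` only approximates what `StringSerdeFactory` provides.

```java
import java.nio.charset.StandardCharsets;

class SerdeDemo {

    // Hypothetical stand-in modeled on the shape of Samza's Serde (toBytes/fromBytes);
    // this is NOT the Samza interface itself.
    interface SimpleSerde<T> {
        byte[] toBytes(T object);
        T fromBytes(byte[] bytes);
    }

    // UTF-8 string serde, similar in spirit to what StringSerdeFactory provides.
    static class Utf8StringSerde implements SimpleSerde<String> {
        public byte[] toBytes(String s) {
            return s.getBytes(StandardCharsets.UTF_8);
        }

        public String fromBytes(byte[] bytes) {
            return new String(bytes, StandardCharsets.UTF_8);
        }
    }

    // Simulates a task casting the incoming message object to String.
    static String asString(Object message) {
        return (String) message; // throws ClassCastException if message is a byte[]
    }

    public static void main(String[] args) {
        byte[] raw = "syslog line".getBytes(StandardCharsets.UTF_8);

        // Without a serde: the message is still a byte[], so the cast fails.
        try {
            asString(raw);
        } catch (ClassCastException e) {
            System.out.println("cast failed: ClassCastException");
        }

        // With a string serde: bytes are decoded before the task sees them.
        Object decoded = new Utf8StringSerde().fromBytes(raw);
        System.out.println("cast ok: " + asString(decoded));
    }
}
```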

Samza questions

2015-03-26 Thread Ori Cohen
Hi everyone Based on Matrin's StrangeLoop "turning the database inside out" what I understand is that he meant for Samza to be a tool to pull sequential event data from a pub-sub such as Kafka, then process the data to generate materialized views. The next piece of the puzzle I couldn't figure out
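As a toy illustration of the idea Ori describes (and of Yi Pan's reading of the view as read-only, derived data), a materialized view can be thought of as the result of replaying an ordered event log and applying each event to a keyed table. The event shape (`PurchaseEvent` with `userId` and `amount`) and the running-total view are hypothetical examples chosen for illustration, not something taken from the talk or from Samza's API.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a materialized view derived by replaying an event log.
class MaterializedViewDemo {

    // Hypothetical event: a purchase amount for a user.
    static final class PurchaseEvent {
        final String userId;
        final long amount;

        PurchaseEvent(String userId, long amount) {
            this.userId = userId;
            this.amount = amount;
        }
    }

    private final Map<String, Long> totalsByUser = new HashMap<>();

    // Apply events in log order; the view is derived state and can be rebuilt from the log.
    void apply(PurchaseEvent e) {
        totalsByUser.merge(e.userId, e.amount, Long::sum);
    }

    // Readers get an unmodifiable map: they can query the view but not write to it.
    Map<String, Long> view() {
        return Collections.unmodifiableMap(totalsByUser);
    }

    public static void main(String[] args) {
        List<PurchaseEvent> log = Arrays.asList(
                new PurchaseEvent("alice", 30),
                new PurchaseEvent("bob", 5),
                new PurchaseEvent("alice", 12));

        MaterializedViewDemo mv = new MaterializedViewDemo();
        for (PurchaseEvent e : log) {
            mv.apply(e);
        }
        System.out.println(mv.view());
    }
}
```

Because the view is a pure function of the log, losing it is not fatal: replaying the same events in the same order rebuilds the same totals.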

RE: cannot be cast to java.lang.String

2015-03-26 Thread Jordi Blasi Uribarri
I am not sure I understand what you mean. Does this mean that the Kafka producer is not valid? Should I test a producer made specifically for Samza, or is it a configuration issue? I have working producers and they are sending strings in different data structures, but they all end bei

Review Request 32528: SAMZA-540 modified

2015-03-26 Thread Yan Fang
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32528/ --- Review request for samza. Repository: samza Description --- modified bas

Re: Review Request 32528: SAMZA-540 modified

2015-03-26 Thread Yan Fang
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32528/ --- (Updated March 26, 2015, 8:15 a.m.) Review request for samza. Bugs: SAMZA-540