I have run the integration test suite with 0.9.0-rc0. There was an issue
with the integration test (SAMZA-621), but the test suite passed after
I manually created a symlink with the file name the test script is looking
for.
Hence, +1 on the release.
On Thu, Mar 26, 2015 at 5:39 PM, Roger Hoo
Hi Chris + all,
I created a basic job that does a join from local state with Samza 0.9.0 (
https://github.com/Quantiply/rico-playground/tree/master/join/samza). So
far so good. I'm hoping to get some time this weekend to benchmark it on my
laptop. I think I saw that 0.9.0 includes support for send
Hey all,
I'm running validations and some burn-in. I'll post my vote tomorrow.
It's been pretty quiet. It'd be good to get other committers/non-committers
to do validation as well.
Cheers,
Chris
On Wed, Mar 25, 2015 at 11:20 AM, Yan Fang wrote:
> Hi Chris,
>
> Oops, signed it with another ke
Hi Ori,
Maybe an example would be useful. We use Samza to transform data for
materialization in Druid, because Druid is built to index and aggregate a
single event stream, but our raw data actually exists in a bunch of streams
and tables that need joining. So we have Samza handle the joining and t
Hi, Ori,
My interpretation of the materialized view (MV) usage in Martin's talk is
exactly what you mentioned: it is treated as a "view" rather than a regular
table in a DB, hence read-only and, possibly, derived data that has already
gone through the business logic.
On Thu, Mar 26, 2015 at 1:55 PM, Yan Fang wrote
I guess you mean "Martin", not "Matrin". Here is the link for Ori's
question, to give everyone some background.
https://thestrangeloop.com/sessions/turning-the-database-inside-out-with-apache-samza
Fang, Yan
yanfang...@gmail.com
+1 (206) 849-4108
On Thu, Mar 26, 2015 at 3:15 AM, Ori Cohen wrote:
Hi Shekar,
Please refer to [1]. You can set a custom partitioner through the producer
config. You will have to implement your own partitioner based on your
application and partitioning strategy.
Thanks
Milinda
[1] https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+Producer+Example
On Thu,
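For reference, a rough sketch of the kind of custom partitioner Milinda is
describing (not from the thread; it assumes the Kafka 0.8.1 producer API and
a String key such as user_id, and the class name is hypothetical):

import kafka.producer.Partitioner;
import kafka.utils.VerifiableProperties;

// Hypothetical example: route messages by user_id so that all events for a
// given user land on the same partition.
public class UserIdPartitioner implements Partitioner {

  // The 0.8.1 producer instantiates the partitioner with the producer properties.
  public UserIdPartitioner(VerifiableProperties props) {
  }

  @Override
  public int partition(Object key, int numPartitions) {
    // Mask keeps the hash non-negative before taking the modulus.
    return (key.hashCode() & 0x7fffffff) % numPartitions;
  }
}

It would then be wired in through the producer config, e.g.
partitioner.class=com.example.UserIdPartitioner (hypothetical class name).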
So if I have a feed with
{user_id:12345,
ethnicity: asian,
location: "cerritos, ca",
Height:"5.9",
weight: "150 lbs"}
I am referring to https://kafka.apache.org/081/ops.html#topic-config
How do I map the 3 columns (user_id, ethnicity, and location) to a
partition id? If I map it this way and s
Hi Richard,
You can also partition by a key like "user_id" so that all messages for a
given user would end up in the same partition. This can be useful for
calculating user-specific aggregations or doing a distributed join where
the local state is also partitioned on user_id.
Cheers,
Roger
On
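To make that concrete, a minimal sketch of sending with user_id as the key
from a Samza task, as Roger describes above (not from the thread; the output
stream name and the map-shaped message are hypothetical, and a string key
serde is assumed to be configured for the kafka system):

import java.util.Map;
import org.apache.samza.system.IncomingMessageEnvelope;
import org.apache.samza.system.OutgoingMessageEnvelope;
import org.apache.samza.system.SystemStream;
import org.apache.samza.task.MessageCollector;
import org.apache.samza.task.StreamTask;
import org.apache.samza.task.TaskCoordinator;

public class RepartitionByUserIdTask implements StreamTask {
  // Hypothetical output stream; substitute your own topic name.
  private static final SystemStream OUTPUT = new SystemStream("kafka", "events-by-user");

  @Override
  public void process(IncomingMessageEnvelope envelope, MessageCollector collector,
                      TaskCoordinator coordinator) {
    // Assumes the incoming messages are deserialized into a Map (e.g. via a JSON serde).
    @SuppressWarnings("unchecked")
    Map<String, Object> event = (Map<String, Object>) envelope.getMessage();
    Object userId = event.get("user_id");
    // Keying the envelope by user_id sends all of a user's events to one partition.
    collector.send(new OutgoingMessageEnvelope(OUTPUT, userId, event));
  }
}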
Is there a typo below? Are all of these actually in the same topic, just
different partitions? Partitioning, AFAIK, is mainly done for parallelism &
throughput reasons. What is the reason for partitioning your dataset by
‘columns’?
https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-Ho
Hi Jordi,
You might be running into this issue (
https://issues.apache.org/jira/browse/SAMZA-456) which I just hit as well.
You probably need to add a couple more jars to your YARN lib dir.
Cheers,
Roger
On Thu, Mar 26, 2015 at 9:21 AM, Jordi Blasi Uribarri
wrote:
> Hi:
>
> I got samza runnin
Hi:
I got Samza running a job in local mode with the property:
job.factory.class=org.apache.samza.job.local.ThreadJobFactory
Now I am trying to get it running in multiple machines. I have followed the
steps in the following guide:
https://github.com/apache/samza/blob/master/docs/learn/tutorials
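For anyone hitting the same step, the usual change when moving off
ThreadJobFactory is to switch to the YARN job factory and point Samza at the
published job package, roughly as follows (the package path is hypothetical):

job.factory.class=org.apache.samza.job.yarn.YarnJobFactory
# hypothetical location of the job's .tar.gz package, readable by every node
yarn.package.path=hdfs://namenode:8020/samza/my-job-dist.tar.gz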
Hi Shekar,
Each Kafka partition is basically just a number; you would need to specify
which partitioner strategy to use when mapping your event key to the
partition number.
You can take the 4 columns you have in the event and map them to a partition
number, the partitioner in that case would be a func
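A rough sketch of such a function (not from the thread; the column names are
taken from the earlier feed example):

import java.util.Objects;

public class ColumnPartitioner {
  // Combine the event columns into a single hash and map it onto the
  // available partitions; the mask keeps the result non-negative.
  public static int partitionFor(String userId, String ethnicity, String location,
                                 String height, int numPartitions) {
    return (Objects.hash(userId, ethnicity, location, height) & 0x7fffffff) % numPartitions;
  }
}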
Hello,
Want to confirm a basic understanding of Kafka.
If I have a dataset that needs to be partitioned by 4 columns, then the
progression is
{topic1:partition_key1} -> {Group by samza on partition_key1}
->
{topic2:partition_key2} -> {Group by samza on partition_key2}
->
{topic3:partition_key3} -
Got it!!!
It was all about (as Chinmay pointed out) defining the serializer correctly:
serializers.registry.string.class=org.apache.samza.serializers.StringSerdeFactory
systems.kafka.streams.syslog.samza.msg.serde=string
systems.kafka.streams.samzaout.samza.msg.serde=string
Now going for the next st
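For anyone following along: with the string serde registered as above, the
task receives plain Strings, e.g. (hypothetical task class, not from the
thread):

import org.apache.samza.system.IncomingMessageEnvelope;
import org.apache.samza.task.MessageCollector;
import org.apache.samza.task.StreamTask;
import org.apache.samza.task.TaskCoordinator;

public class SyslogEchoTask implements StreamTask {
  @Override
  public void process(IncomingMessageEnvelope envelope, MessageCollector collector,
                      TaskCoordinator coordinator) {
    // samza.msg.serde=string means the deserialized message is already a String.
    String line = (String) envelope.getMessage();
    System.out.println(line);
  }
}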
Hi everyone
Based on Martin's StrangeLoop talk "Turning the Database Inside Out", what I
understand is that he meant for Samza to be a tool to pull sequential event
data from a pub-sub system such as Kafka, then process the data to generate
materialized views. The next piece of the puzzle I couldn't figure out
I am not sure I understand what you mean. Does this mean that the
Kafka producer is not valid? Should I test a producer specifically made for
Samza, or is it a configuration issue? I have working producers and they are
sending strings in different data structures, but they all end bei
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/32528/
---
Review request for samza.
Repository: samza
Description
---
modified bas
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/32528/
---
(Updated March 26, 2015, 8:15 a.m.)
Review request for samza.
Bugs: SAMZA-540