---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/32528/
---
(Updated March 26, 2015, 8:15 a.m.)
Review request for samza.
Bugs: SAMZA-540
Repository: samza
Description
---
modified bas
I am not sure I understand what you are saying. Does this mean that the
kafka producer is not valid? I should test a producer specifically made for
Samza or is it a configuration issue? I have working producers and they are
sending strings in different data structures, but they all end bei
Hi everyone
Based on Matrin's StrangeLoop "turning the database inside out" what I
understand is that he meant for Samza to be a tool to pull sequential event
data from a pub-sub such as Kafka, then process the data to generate
materialized views. The next piece of the puzzle I couldn't figure out
Got it!!!
It was all about (as Chinmay pointed out) correctly defining the serializer:
serializers.registry.string.class=org.apache.samza.serializers.StringSerdeFactory
systems.kafka.streams.syslog.samza.msg.serde=string
systems.kafka.streams.samzaout.samza.msg.serde=string
Now going for the next st
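For readers following the serde fix above: the `string` serde setting tells Samza to treat message payloads as text rather than some other encoding. The following is a minimal standalone sketch of that idea, assuming UTF-8 encoding. The class `StringSerdeSketch` and its methods are hypothetical illustrations, not Samza's actual `StringSerdeFactory` implementation.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only: shows the round trip a string serde performs.
// This is NOT the Samza class; the UTF-8 encoding is an assumption.
final class StringSerdeSketch {

    // Encode a message as bytes before it is written to a stream.
    static byte[] toBytes(String message) {
        return message == null ? null : message.getBytes(StandardCharsets.UTF_8);
    }

    // Decode bytes read from a stream back into a string.
    static String fromBytes(byte[] bytes) {
        return bytes == null ? null : new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] wire = toBytes("a syslog line");
        System.out.println(fromBytes(wire));
    }
}
```

If the producer and the job disagree on the serde for a stream, the bytes on the wire are decoded with the wrong format, which is consistent with the symptom described in this thread.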
Hello,
Want to confirm a basic understanding of Kafka.
If I have a dataset that needs to be partitioned by 4 columns, then the
progression is
{topic1:partition_key1} -> {Group by samza on partition_key1}
->
{topic2:partition_key2} -> {Group by samza on partition_key2}
->
{topic3:partition_key3} -
Hi Shekar,
Each Kafka partition is basically just a number; you need to specify
which partitioner strategy to use when mapping your event key to the
partition number.
You can take the 4 columns you have in the event and map them to a partition
number; the partitioner in that case would be a func
Hi:
I got Samza running a job in local mode with the property:
job.factory.class=org.apache.samza.job.local.ThreadJobFactory
Now I am trying to get it running on multiple machines. I have followed the
steps in the following guide:
https://github.com/apache/samza/blob/master/docs/learn/tutorials
Hi Jordi,
You might be running into this issue (
https://issues.apache.org/jira/browse/SAMZA-456) which I just hit as well.
You probably need to add a couple more jars to your YARN lib dir.
Cheers,
Roger
On Thu, Mar 26, 2015 at 9:21 AM, Jordi Blasi Uribarri
wrote:
> Hi:
>
> I got samza runnin
Is there a typo below? Are all of these actually in the same topic, just
different partitions? Partitioning, AFAIK, is mainly done for parallelism &
throughput reasons. What is the reason for partitioning your dataset by
‘columns’?
https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-Ho
Hi Richard,
You can also partition by a key like "user_id" so that all messages for a
given user would end up in the same partition. This can be useful for
calculating user-specific aggregations or doing a distributed join where
the local state is also partitioned on user_id.
Cheers,
Roger
On
So if I have a feed with
{user_id:12345,
ethnicity: asian,
location: "cerritos, ca",
Height:"5.9",
weight: "150 lbs"}
I am referring to https://kafka.apache.org/081/ops.html#topic-config
How do I map the 3 columns (user_id, ethnicity, and location) to a
partition id? If I map it this way and s
Hi Shekar,
Please refer to [1]. You can set a custom partitioner through the producer
config. You will have to implement your own partitioner based on your
application and partitioning strategy.
Thanks
Milinda
[1] https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+Producer+Example
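The mapping from several columns to a partition number discussed above can be sketched as a plain function. This is a minimal standalone illustration, not the Kafka partitioner interface itself. The class `CompositeKeyPartitioner`, the method `partitionFor`, the NUL separator, and the choice of hash are all assumptions made for the example; a real custom partitioner would wrap similar logic in whatever interface the producer version in use expects.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch: maps a composite key (several columns) to a partition number.
final class CompositeKeyPartitioner {

    // Returns a partition in [0, numPartitions) that is stable for equal key fields.
    static int partitionFor(int numPartitions, String... keyFields) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        // Join the fields with a separator unlikely to appear in the data,
        // so ("ab", "c") and ("a", "bc") produce different keys.
        String joined = String.join("\u0000", keyFields);
        int hash = Arrays.hashCode(joined.getBytes(StandardCharsets.UTF_8));
        // floorMod keeps the result non-negative even when the hash is negative.
        return Math.floorMod(hash, numPartitions);
    }

    public static void main(String[] args) {
        int partition = partitionFor(8, "12345", "asian", "cerritos, ca");
        System.out.println("partition = " + partition);
    }
}
```

Every event with the same (user_id, ethnicity, location) values lands in the same partition. Events that share only user_id do not, so keying on user_id alone is the better fit when per-user aggregation is the goal.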
On Thu,
I guess you mean "Martin", not "Matrin". To give everyone some background,
here is the link for Ori's question:
https://thestrangeloop.com/sessions/turning-the-database-inside-out-with-apache-samza
Fang, Yan
yanfang...@gmail.com
+1 (206) 849-4108
On Thu, Mar 26, 2015 at 3:15 AM, Ori Cohen wrote:
Hi, Ori,
My interpretation of the MV usage in Martin's talk is exactly what you
mentioned: it is considered a "view" rather than a regular table in a DB,
hence read-only and possibly derived data that has already gone through the
business logic.
On Thu, Mar 26, 2015 at 1:55 PM, Yan Fang wrote
Hi Ori,
Maybe an example would be useful. We use Samza to transform data for
materialization in Druid, because Druid is built to index and aggregate a
single event stream, but our raw data actually exists in a bunch of streams
and tables that need joining. So we have Samza handle the joining and t
Hey all,
I'm running validations and some burn-in. I'll post my vote tomorrow.
It's been pretty quiet. It'd be good to get other committers/non-committers
to do validation as well.
Cheers,
Chris
On Wed, Mar 25, 2015 at 11:20 AM, Yan Fang wrote:
> Hi Chris,
>
> Opps, signed it with another ke
Hi Chris + all,
I created a basic job that does a join from local state with Samza 0.9.0 (
https://github.com/Quantiply/rico-playground/tree/master/join/samza). So
far so good. I'm hoping to get some time this weekend to benchmark it on my
laptop. I think I saw that 0.9.0 includes support for send
I have run the integration test suite with 0.9.0-rc0. There were some issues
with the integration test (SAMZA-621), but the test suite passed after
I manually created a symlink to the file name the test script is looking
for.
Hence, +1 on the release.
On Thu, Mar 26, 2015 at 5:39 PM, Roger Hoo