Thanks, Julian.  Good point about needing aliasing for unique names in
SQL.  I didn't know about array_agg...nice.

On Thu, Apr 9, 2015 at 12:35 PM, Julian Hyde <jul...@hydromatic.net> wrote:

> Much of this is about mapping from logical fields (i.e. the fields you can
> reference in SQL) down to the Avro representation; I’m no expert on that
> mapping, so I’ll focus on the SQL stuff.
>
> First, SQL doesn’t allow a record to have two fields of the same name, so
> you wouldn’t be allowed to have two “name” fields. When you do a join, you
> might need to alias output columns:
>
> select stream orders.id, products.id as productId
> from orders
> join products on orders.id = products.id;
>
> Second, JOIN isn’t the only SQL operator that combines records; GROUP BY
> also combines records. JOIN combines records from different streams, and
> they usually have different types (i.e. different numbers/types of fields),
> whereas GROUP BY combines records from the same stream. Use whichever best
> suits your purpose.
>
> select stream zipcode, floor(rowtime to hour), array_agg(orderid) as
> orderIds
> from orders
> group by zipcode, floor(rowtime to hour)
>
> (array_agg is an aggregate function, recently added to the SQL standard,
> that gathers input values into an array. See
> http://www.craigkerstiens.com/2013/04/17/array-agg/.)
>
> Output:
>
> { zipcode: “94705”, rowtime: “2015-04-09 11:00:00”, orderIds: [123, 156,
> 1056] },
> { zipcode: “94117”, rowtime: “2015-04-09 11:00:00”, orderIds: [45, 777] },
> { zipcode: “94705”, rowtime: “2015-04-09 12:00:00”, orderIds: [55] }
>
> Julian
>
>
> On Apr 9, 2015, at 12:07 PM, Yi Pan <nickpa...@gmail.com> wrote:
>
> > Hi, Roger,
> >
> > Good question on that. I am actually not aware of any "automatic" way of
> > doing this in Avro. I have tried to add generic Schema and Data interface
> > in samza-sql branch to address the morphing of the schemas from input
> > streams to the output streams. The basic idea is to have wrapper Schema
> and
> > Data classes on-top-of the deserialized objects to access the data fields
> > according to the schema w/o changing and copying the actual data fields.
> > Hence, when there is a need to morph the input data schemas into a new
> > output data schema, we just need an implementation of the new output data
> > Schema class that can read the corresponding data fields from the input
> > data and write them out in the output schema. An interface function
> > transform() is added in the Schema class for this exact purpose.
> Currently,
> > it only takes one input data and one example of "projection"
> transformation
> > can be found in the implementation of AvroSchema class. A join case as
> you
> > presented may well be a reason to have an implementation of "join" with
> > multiple input data.
> >
> > All the above solution is still experimental and please feel free to
> > provide your feedback and comments on that. If we agree that this
> solution
> > is good and suit for a broader use case, it can be considered to be used
> > outside the "SQL" context as well.
> >
> > Best regards!
> >
> > -Yi
> >
> > On Thu, Apr 9, 2015 at 8:55 AM, Roger Hoover <roger.hoo...@gmail.com>
> wrote:
> >
> >> Hi Milinda and others,
> >>
> >> This is an Avro question but since you guys are working on Avro support
> for
> >> stream SQL, I thought I'd ask you for help.
> >>
> >> If I have a two records of type A and B as below and want to join them
> >> similar to "SELECT *" in SQL to produce a record of type AB, is there an
> >> simple way to do this with Avro without writing code to copy each field
> >> individually?
> >>
> >> I appreciate any help.
> >>
> >> Thanks,
> >>
> >> Roger
> >>
> >> {
> >>  "name": "A",
> >>  "type": "record",
> >>  "namespace": "fubar",
> >>  "fields": [{"name": "a", "type" : "int"}]
> >> }
> >>
> >> {
> >>  "name": "B",
> >>  "type": "record",
> >>  "namespace": "fubar",
> >>  "fields": [{"name": "b", "type" : "int"}]
> >> }
> >>
> >> {
> >>  "name": "AB",
> >>  "type": "record",
> >>  "namespace": "fubar",
> >>  "fields": [{"name": "a", "type" : "int"}, {"name": "b", "type" :
> "int"}]
> >> }
> >>
>
>

Reply via email to