Thanks, Julian. Good point about needing aliasing for unique names in SQL. I didn't know about array_agg...nice.
On Thu, Apr 9, 2015 at 12:35 PM, Julian Hyde <jul...@hydromatic.net> wrote: > Much of this is about mapping from logical fields (i.e. the fields you can > reference in SQL) down to the Avro representation; I’m no expert on that > mapping, so I’ll focus on the SQL stuff. > > First, SQL doesn’t allow a record to have two fields of the same name, so > you wouldn’t be allowed to have two “name” fields. When you do a join, you > might need to alias output columns: > > select stream orders.id, products.id as productId > from orders > join products on orders.id = products.id; > > Second, JOIN isn’t the only SQL operator that combines records; GROUP BY > also combines records. JOIN combines records from different streams, and > they usually have different types (i.e. different numbers/types of fields), > whereas GROUP BY combines records from the same stream. Use whichever best > suits your purpose. > > select stream zipcode, floor(rowtime to hour), array_agg(orderid) as > orderIds > from orders > group by zipcode, floor(rowtime to hour) > > (array_agg is an aggregate function, recently added to the SQL standard, > that gathers input values into an array. See > http://www.craigkerstiens.com/2013/04/17/array-agg/.) > > Output: > > { zipcode: “94705”, rowtime: “2015-04-09 11:00:00”, orderIds: [123, 156, > 1056] }, > { zipcode: “94117”, rowtime: “2015-04-09 11:00:00”, orderIds: [45, 777] }, > { zipcode: “94705”, rowtime: “2015-04-09 12:00:00”, orderIds: [55] } > > Julian > > > On Apr 9, 2015, at 12:07 PM, Yi Pan <nickpa...@gmail.com> wrote: > > > Hi, Roger, > > > > Good question on that. I am actually not aware of any "automatic" way of > > doing this in Avro. I have tried to add generic Schema and Data interface > > in samza-sql branch to address the morphing of the schemas from input > > streams to the output streams. The basic idea is to have wrapper Schema > and > > Data classes on-top-of the deserialized objects to access the data fields > > according to the schema w/o changing and copying the actual data fields. > > Hence, when there is a need to morph the input data schemas into a new > > output data schema, we just need an implementation of the new output data > > Schema class that can read the corresponding data fields from the input > > data and write them out in the output schema. An interface function > > transform() is added in the Schema class for this exact purpose. > Currently, > > it only takes one input data and one example of "projection" > transformation > > can be found in the implementation of AvroSchema class. A join case as > you > > presented may well be a reason to have an implementation of "join" with > > multiple input data. > > > > All the above solution is still experimental and please feel free to > > provide your feedback and comments on that. If we agree that this > solution > > is good and suit for a broader use case, it can be considered to be used > > outside the "SQL" context as well. > > > > Best regards! > > > > -Yi > > > > On Thu, Apr 9, 2015 at 8:55 AM, Roger Hoover <roger.hoo...@gmail.com> > wrote: > > > >> Hi Milinda and others, > >> > >> This is an Avro question but since you guys are working on Avro support > for > >> stream SQL, I thought I'd ask you for help. > >> > >> If I have a two records of type A and B as below and want to join them > >> similar to "SELECT *" in SQL to produce a record of type AB, is there an > >> simple way to do this with Avro without writing code to copy each field > >> individually? > >> > >> I appreciate any help. > >> > >> Thanks, > >> > >> Roger > >> > >> { > >> "name": "A", > >> "type": "record", > >> "namespace": "fubar", > >> "fields": [{"name": "a", "type" : "int"}] > >> } > >> > >> { > >> "name": "B", > >> "type": "record", > >> "namespace": "fubar", > >> "fields": [{"name": "b", "type" : "int"}] > >> } > >> > >> { > >> "name": "AB", > >> "type": "record", > >> "namespace": "fubar", > >> "fields": [{"name": "a", "type" : "int"}, {"name": "b", "type" : > "int"}] > >> } > >> > >