Hi all, In case this helps anyone, I was able to create a simple class to do the join and it works nicely for my use case. It assumes you have schema for the input and output records.
Example ( https://github.com/Quantiply/rico/blob/master/avro-serde/src/test/java/com/quantiply/avro/JoinTest.java#L44-L52 ): GenericRecord in1 = getIn1(); GenericRecord in2 = getIn2(); GenericRecord joined = new Join(getJoinedSchema()) .merge(in1) .merge(in2) .getBuilder() .set("charlie", "blah blah") .build(); Class is here: https://github.com/Quantiply/rico/blob/master/avro-serde/src/main/java/com/quantiply/avro/Join.java Cheers, Roger On Thu, Apr 9, 2015 at 12:54 PM, Roger Hoover <[email protected]> wrote: > Yi Pan, > > Thanks for your response. I'm thinking that I'll iterate over the fields > of the input schemas (similar to this > https://github.com/apache/samza/blob/samza-sql/samza-sql/src/main/java/org/apache/samza/sql/metadata/AvroSchemaConverter.java#L58-L62), > match them up with the output schema and then copy the values. It'll let > you know how it goes in case it's useful. > > Cheers, > > Roger > > On Thu, Apr 9, 2015 at 12:07 PM, Yi Pan <[email protected]> wrote: > >> Hi, Roger, >> >> Good question on that. I am actually not aware of any "automatic" way of >> doing this in Avro. I have tried to add generic Schema and Data interface >> in samza-sql branch to address the morphing of the schemas from input >> streams to the output streams. The basic idea is to have wrapper Schema >> and >> Data classes on-top-of the deserialized objects to access the data fields >> according to the schema w/o changing and copying the actual data fields. >> Hence, when there is a need to morph the input data schemas into a new >> output data schema, we just need an implementation of the new output data >> Schema class that can read the corresponding data fields from the input >> data and write them out in the output schema. An interface function >> transform() is added in the Schema class for this exact purpose. >> Currently, >> it only takes one input data and one example of "projection" >> transformation >> can be found in the implementation of AvroSchema class. A join case as you >> presented may well be a reason to have an implementation of "join" with >> multiple input data. >> >> All the above solution is still experimental and please feel free to >> provide your feedback and comments on that. If we agree that this solution >> is good and suit for a broader use case, it can be considered to be used >> outside the "SQL" context as well. >> >> Best regards! >> >> -Yi >> >> On Thu, Apr 9, 2015 at 8:55 AM, Roger Hoover <[email protected]> >> wrote: >> >> > Hi Milinda and others, >> > >> > This is an Avro question but since you guys are working on Avro support >> for >> > stream SQL, I thought I'd ask you for help. >> > >> > If I have a two records of type A and B as below and want to join them >> > similar to "SELECT *" in SQL to produce a record of type AB, is there an >> > simple way to do this with Avro without writing code to copy each field >> > individually? >> > >> > I appreciate any help. >> > >> > Thanks, >> > >> > Roger >> > >> > { >> > "name": "A", >> > "type": "record", >> > "namespace": "fubar", >> > "fields": [{"name": "a", "type" : "int"}] >> > } >> > >> > { >> > "name": "B", >> > "type": "record", >> > "namespace": "fubar", >> > "fields": [{"name": "b", "type" : "int"}] >> > } >> > >> > { >> > "name": "AB", >> > "type": "record", >> > "namespace": "fubar", >> > "fields": [{"name": "a", "type" : "int"}, {"name": "b", "type" : >> "int"}] >> > } >> > >> > >
