Very well - I'll give this a try. Thanks, Dawid. // ah
From: Dawid Wysakowicz <dwysakow...@apache.org>
Sent: Wednesday, January 8, 2020 7:21 AM
To: Hailu, Andreas [Engineering] <andreas.ha...@ny.email.gs.com>; user@flink.apache.org
Cc: Richards, Adam S [Engineering] <adam.richa...@ny.email.gs.com>
Subject: Re: Table API: Joining on Tables of Complex Types

Hi Andreas,

Converting your GenericRecords to Rows would definitely be the safest option. You can check how it's done in org.apache.flink.formats.avro.AvroRowDeserializationSchema and reuse the logic from there to write something like:

    DataSet<GenericRecord> dataset = ...;
    DataSet<Row> rows = dataset
        .map(/* convert GenericRecord to Row */)
        .returns(AvroSchemaConverter.convertToTypeInfo(avroSchemaString));

Another thing you could try is to make sure that GenericRecord is seen as an Avro type by Flink (Flink should understand that an Avro type is a complex type):

    // GenericRecordAvroTypeInfo takes an org.apache.avro.Schema
    dataset.map(/* ... */).returns(new GenericRecordAvroTypeInfo(/* Avro Schema */));

Then the TableEnvironment should pick it up as a structured type and flatten it automatically when registering the Table. Bear in mind that the returns method is part of SingleInputUdfOperator, so you can apply it right after a transformation such as map or flatMap.

Best,
Dawid

On 06/01/2020 18:03, Hailu, Andreas wrote:

Hi Dawid, thanks for getting back.

From what you've said, I think we'll need to convert our GenericRecord into structured types. Do you have any references or examples I can have a look at? If not, perhaps you could just show me a basic example of flattening a complex object with accessors into a Table of structured types. Or, by structured types, did you mean Row?
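As a plain illustration of what converting to a Row means: a Row is a fixed-arity record whose fields are addressed by position, so turning a GenericRecord into a Row amounts to copying each field into the slot given by its index in the Avro schema. The following JDK-only sketch models just that step. Its stand-ins are assumptions, not Flink or Avro API: a Map<String, Object> plays GenericRecord, an ordered List<String> of field names plays the schema, and an Object[] plays org.apache.flink.types.Row; the class name RecordToRowSketch is hypothetical. Nested records, unions, and logical types, which AvroRowDeserializationSchema does handle, are left out.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of "convert a GenericRecord to a Row". Assumptions (not Flink
// or Avro API): a Map<String, Object> stands in for GenericRecord, an ordered
// List<String> of field names stands in for the Avro schema, and an Object[]
// stands in for org.apache.flink.types.Row.
final class RecordToRowSketch {

    private RecordToRowSketch() {}

    // Copies each named field into the slot given by its position in the schema,
    // so every row has the same fixed layout no matter how the record stores it.
    static Object[] toRow(Map<String, Object> rec, List<String> fieldNames) {
        Object[] row = new Object[fieldNames.size()];
        for (int i = 0; i < fieldNames.size(); i++) {
            String name = fieldNames.get(i);
            if (!rec.containsKey(name)) {
                // A real Avro record always has every schema field; fail loudly here.
                throw new IllegalArgumentException("record is missing field: " + name);
            }
            row[i] = rec.get(name); // flat copy; nested records are not converted
        }
        return row;
    }

    public static void main(String[] args) {
        List<String> schema = Arrays.asList("userName", "age");
        Map<String, Object> rec = new HashMap<>();
        rec.put("age", 42);
        rec.put("userName", "alice");
        // Field order follows the schema, not insertion order.
        System.out.println(Arrays.toString(toRow(rec, schema))); // prints [alice, 42]
    }
}
```

In an actual job, the same loop would sit inside the MapFunction passed to dataset.map(...), filling an org.apache.flink.types.Row created with new Row(arity) via row.setField(i, value), while .returns(AvroSchemaConverter.convertToTypeInfo(avroSchemaString)) declares the matching row type so the Table API sees named, typed columns instead of one opaque field.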
// ah

From: Dawid Wysakowicz <dwysakow...@apache.org>
Sent: Monday, January 6, 2020 9:32 AM
To: Hailu, Andreas [Engineering] <andreas.ha...@ny.email.gs.com>; user@flink.apache.org
Cc: Richards, Adam S [Engineering] <adam.richa...@ny.email.gs.com>
Subject: Re: Table API: Joining on Tables of Complex Types

Hi Andreas,

First of all, I would highly recommend converting non-structured types to structured types as early as possible, as it opens up more possibilities to optimize the plan. Have you tried:

    // select() takes one comma-separated expression string; 'userName' is a string literal
    Table users = batchTableEnvironment.fromDataSet(usersDataset)
        .select("getField(f0, 'userName') as userName, f0 as userRecord");
    // both inputs expose their record as f0, so rename it to keep the joined field names distinct
    Table other = batchTableEnvironment.fromDataSet(otherDataset)
        .select("getField(f0, 'userName') as user, f0 as otherRecord");
    Table result = other.join(users, "user = userName");

You could also check how the org.apache.flink.formats.avro.AvroRowDeserializationSchema class is implemented; internally it converts an Avro record to a structured Row.

Hope this helps.

Best,
Dawid

On 03/01/2020 23:16, Hailu, Andreas wrote:

Hi folks,

I'm trying to join two Tables which are composed of complex types, Avro's GenericRecord to be exact. I have to use a custom UDF to extract fields out of the record, and I'm having some trouble figuring out how to do joins on them, as I need to call this UDF to read what I need. Example below:

    batchTableEnvironment.registerFunction("getField", new GRFieldExtractor()); // GenericRecord field extractor
    Table users = batchTableEnvironment.fromDataSet(usersDataset); // Converting from some pre-existing DataSet
    Table otherDataset = batchTableEnvironment.fromDataSet(someOtherDataset);
    // This is how the UDF is used, as GenericRecord is a complex type requiring you to invoke a get() method on the field you're interested in.
    Table userNames = users.select("getField(f0, 'userName')");
Here, the UDF calls get() on the field 'userName'.

Using the Table API, I'd like to do something similar to the query "SELECT * FROM otherDataset JOIN users ON otherDataset.userName = users.userName". How is this done?

Best,
Andreas

The Goldman Sachs Group, Inc. All rights reserved. See http://www.gs.com/disclaimer/global_email for important risk disclosures, conflicts of interest and other terms and conditions relating to this e-mail and your reliance on information contained in it. This message may contain confidential or privileged information. If you are not the intended recipient, please advise us immediately and delete this message. See http://www.gs.com/disclaimer/email for further information on confidentiality and the risks of non-secure electronic communication. If you cannot access these links, please notify us by reply message and we will send the contents to you.
________________________________
Your Personal Data: We may collect and process information about you that may be subject to data protection laws. For more information about how we use and disclose your personal data, how we protect your information, our legal basis to use your information, your rights and who you can contact, please refer to: www.gs.com/privacy-notices
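Andreas's original question, matching records of otherDataset to records of users on userName, is an inner equi-join on a key that first has to be extracted from each record; that is what Dawid's getField-then-join suggestion expresses in the Table API. The following JDK-only sketch models those semantics with a hash join. It is an illustration under stated assumptions, not how Flink executes the query (the planner chooses its own physical join strategy): records are Maps rather than Avro GenericRecords, a key-extractor function plays the role of the getField UDF, and the names KeyExtractJoinSketch, join, Pair, and singleField are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Plain-Java model of other.join(users, "user = userName"): an inner equi-join
// on keys extracted from each record (the role the getField UDF plays).
// Assumptions: records are Maps rather than Avro GenericRecords, and a hash
// join is used for illustration; Flink's planner picks its own join strategy.
final class KeyExtractJoinSketch {

    private KeyExtractJoinSketch() {}

    // One matched pair: a record from each input with equal join keys.
    static final class Pair<L, R> {
        final L left;
        final R right;

        Pair(L left, R right) {
            this.left = left;
            this.right = right;
        }
    }

    static <L, R, K> List<Pair<L, R>> join(List<L> left, Function<L, K> leftKey,
                                          List<R> right, Function<R, K> rightKey) {
        // Build phase: index the right input by key; several records may share one.
        Map<K, List<R>> index = new HashMap<>();
        for (R r : right) {
            K k = rightKey.apply(r);
            if (k != null) { // as in SQL, a NULL key never matches anything
                index.computeIfAbsent(k, unused -> new ArrayList<>()).add(r);
            }
        }
        // Probe phase: emit one pair per matching right record.
        List<Pair<L, R>> out = new ArrayList<>();
        for (L l : left) {
            K k = leftKey.apply(l);
            if (k == null) {
                continue;
            }
            for (R r : index.getOrDefault(k, Collections.emptyList())) {
                out.add(new Pair<>(l, r));
            }
        }
        return out;
    }

    // Hypothetical helper that builds a one-field record.
    static Map<String, Object> singleField(String name, Object value) {
        Map<String, Object> m = new HashMap<>();
        m.put(name, value);
        return m;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> users = new ArrayList<>();
        users.add(singleField("userName", "alice"));
        users.add(singleField("userName", "bob"));
        List<Map<String, Object>> other = new ArrayList<>();
        other.add(singleField("userName", "bob"));
        other.add(singleField("userName", "carol"));
        Function<Map<String, Object>, Object> userName = m -> m.get("userName");
        System.out.println(join(other, userName, users, userName).size()); // prints 1
    }
}
```

Projecting p.left from each pair keeps only the otherDataset side, which is closer to the SELECT * in the question; as in the SQL join, a left record appears once per matching user.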