@Brian Hulette <bhule...@google.com> The main issue I am trying to report is that the error message says "Specify it explicitly using withCoder()." but I could not find a withCoder() API on ParquetIO. So maybe we need to add that method.

Getting back to your ask, here is roughly the code I was running. Hope this helps.

    PCollection<Row> inputDataTest =
        pipeline.apply(
            ParquetIO.parseGenericRecords(
                    new SerializableFunction<GenericRecord, Row>() {
                      public Row apply(GenericRecord record) {
                        return AvroUtils.toBeamRowStrict(record, null);
                      }
                    })
                .from(path));
From: Brian Hulette <bhule...@google.com>
Reply-To: "user@beam.apache.org" <user@beam.apache.org>
Date: Thursday, February 25, 2021 at 3:11 PM
To: Anant Damle <ana...@google.com>
Cc: user <user@beam.apache.org>
Subject: Re: Potential bug with BEAM-11460?

Hi Tao,

Thanks for reporting this! Could you share more details about your use case? Anant mentioned that he's having trouble coming up with a test case where inferCoder doesn't work [1].

Brian

[1] https://github.com/apache/beam/pull/14078#issuecomment-786293576

On Wed, Feb 24, 2021 at 6:49 PM Anant Damle <ana...@google.com> wrote:

Hi Brian,

I think you are right. Created BEAM-11861 <https://issues.apache.org/jira/browse/BEAM-11861>, will send a PR today.
The present workaround is to call .setCoder directly on the output PCollection.
On Thu, Feb 25, 2021 at 5:25 AM Brian Hulette <bhule...@google.com> wrote:

+Anant Damle <ana...@google.com> is this an oversight in https://github.com/apache/beam/pull/13616? What would be the right way to fix this?

On Tue, Feb 23, 2021 at 5:24 PM Tao Li <t...@zillow.com> wrote:

Hi Beam community,

I cannot log into Beam jira so I am asking this question here. I am testing this new feature from Beam 2.28 and see the error below:

Exception in thread "main" java.lang.IllegalArgumentException: Unable to infer coder for output of parseFn. Specify it explicitly using withCoder().
    at org.apache.beam.sdk.io.parquet.ParquetIO$ParseFiles.inferCoder(ParquetIO.java:554)
    at org.apache.beam.sdk.io.parquet.ParquetIO$ParseFiles.expand(ParquetIO.java:521)
    at org.apache.beam.sdk.io.parquet.ParquetIO$ParseFiles.expand(ParquetIO.java:483)
    at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:547)

However, the ParquetIO builder does not have this withCoder() method.
I think this error message is mimicking AvroIO: https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroIO.java#L1010

Should we add this method to ParquetIO?

Thanks!
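
To make the question concrete, here is a minimal sketch of the pattern being asked for: an explicit coder that takes precedence over inference. It deliberately uses no Beam classes. SketchCoder, SketchRegistry, and SketchParse are hypothetical stand-ins invented for this illustration, and nothing here reflects how ParquetIO is or will be implemented. It only shows the control flow: use the explicitly supplied coder if there is one, otherwise look one up by output type and fail with the message from the stack trace above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical stand-in for a coder; NOT a Beam class. */
final class SketchCoder {
    final String name;

    SketchCoder(String name) {
        this.name = name;
    }
}

/** Hypothetical stand-in for a coder registry keyed by output class. */
final class SketchRegistry {
    private final Map<Class<?>, SketchCoder> coders = new HashMap<>();

    void register(Class<?> type, SketchCoder coder) {
        coders.put(type, coder);
    }

    Optional<SketchCoder> lookup(Class<?> type) {
        return Optional.ofNullable(coders.get(type));
    }
}

/** Hypothetical parse transform that accepts an optional explicit coder. */
final class SketchParse {
    private final Class<?> outputType;
    private final SketchCoder explicitCoder; // null means "try to infer"

    private SketchParse(Class<?> outputType, SketchCoder explicitCoder) {
        this.outputType = outputType;
        this.explicitCoder = explicitCoder;
    }

    static SketchParse of(Class<?> outputType) {
        return new SketchParse(outputType, null);
    }

    /** Returns a copy of this transform that uses the given coder. */
    SketchParse withCoder(SketchCoder coder) {
        return new SketchParse(outputType, coder);
    }

    /** An explicit coder wins; otherwise fall back to the registry. */
    SketchCoder resolveCoder(SketchRegistry registry) {
        if (explicitCoder != null) {
            return explicitCoder;
        }
        return registry.lookup(outputType).orElseThrow(
            () -> new IllegalArgumentException(
                "Unable to infer coder for output of parseFn. "
                    + "Specify it explicitly using withCoder()."));
    }
}
```

With these stand-ins, SketchParse.of(SomeType.class).resolveCoder(registry) throws when the registry has no entry for SomeType, while SketchParse.of(SomeType.class).withCoder(coder).resolveCoder(registry) returns the supplied coder without consulting the registry. That escape hatch is what the error message already promises.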