If you just want a PCollection<GenericRecord> as output, you would still need to set an AvroCoder, but which schema would you use in that case?

> On 6 Jan 2021, at 19:53, Tao Li <t...@zillow.com> wrote:
>
> Hi Alexey,
>
> Thank you so much for this info. I will definitely give it a try once 2.28 is
> released.
>
> Regarding this feature, it’s basically mimicking the feature from AvroIO:
> https://beam.apache.org/releases/javadoc/2.26.0/org/apache/beam/sdk/io/AvroIO.html
>
> I have one more quick question regarding the “reading records of an unknown
> schema” scenario. In the sample code a PCollection<Foo> is being returned and
> the parseGenericRecords requires a parsing logic. What if I just want to get
> a PCollection<GenericRecord> instead of a specific class (e.g. Foo in the
> example)? I guess I can just skip the ParquetIO.parseGenericRecords
> transform? So do I still have to specify the dummy parsing logic like below?
> Thanks!
>
> p.apply(AvroIO.parseGenericRecords(new SerializableFunction<GenericRecord, GenericRecord>() {
>     public GenericRecord apply(GenericRecord record) {
>         return record;
>     }
> }));
>
> From: Alexey Romanenko <aromanenko....@gmail.com>
> Reply-To: "user@beam.apache.org" <user@beam.apache.org>
> Date: Wednesday, January 6, 2021 at 10:13 AM
> To: "user@beam.apache.org" <user@beam.apache.org>
> Subject: Re: Quick question regarding ParquetIO
>
> Hi Tao,
>
> This jira [1] looks like exactly what you are asking for, but it was merged
> recently (thanks to Anant Damle for working on this!) and it should be
> available only in Beam 2.28.0.
>
> [1] https://issues.apache.org/jira/browse/BEAM-11460
>
> Regards,
> Alexey
>
>> On 6 Jan 2021, at 18:57, Tao Li <t...@zillow.com> wrote:
>>
>> Hi beam community,
>>
>> Quick question about ParquetIO
>> <https://beam.apache.org/releases/javadoc/2.25.0/org/apache/beam/sdk/io/parquet/ParquetIO.html>.
>> Is there a way to avoid specifying the avro schema when reading parquet
>> files? The reason is that we may not know the parquet schema until we read
>> the files. In comparison, spark parquet reader
>> <https://spark.apache.org/docs/latest/sql-data-sources-parquet.html>
>> does not require such a schema specification.
>>
>> Please advise. Thanks a lot!