Hi Arvid,

Assuming that I have A0, B0, C0 parquet files with different schemas and a common field *ID*, I want to write them to A1, B2, C3 files respectively; each source file should have a matching destination file. My problem is that my code should not know the full schema: it should only filter on the *ID* field and write the unfiltered lines as-is to the destination file. I tried to implement it using ParquetInputFormat, but there I need to define the schema (MessageType) in advance:

    class ParquetInput(path: Path, messageType: MessageType) extends ParquetInputFormat[Row](path, messageType) { ... }

I am looking for a way for my code to be agnostic to the schema and only know the "ID" field, just like in Spark, e.g.

*val filtered = rawsDF.filter(col("id") =!= "123")*
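What I have in mind is something along these lines (an untested sketch; it assumes parquet-hadoop is on the classpath, and schemaOf is just a name I made up): read the MessageType from each file's own footer at runtime, so that only the "ID" field name is hardcoded:

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.Path
    import org.apache.parquet.hadoop.ParquetFileReader
    import org.apache.parquet.hadoop.util.HadoopInputFile
    import org.apache.parquet.schema.MessageType

    // Pull the schema out of the parquet footer at runtime, so the job never
    // hardcodes a MessageType; only the "ID" field name is known to the code.
    def schemaOf(path: Path, conf: Configuration): MessageType = {
      val reader = ParquetFileReader.open(HadoopInputFile.fromPath(path, conf))
      try reader.getFooter.getFileMetaData.getSchema
      finally reader.close()
    }

    // Then e.g. new ParquetRowInputFormat(flinkPath, schemaOf(hadoopPath, conf)),
    // if I understand ParquetRowInputFormat's constructor correctly.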
Thanks,
Avi

On Thu, Mar 11, 2021 at 2:53 PM Arvid Heise <ar...@apache.org> wrote:

> Hi Avi,
>
> I'm not entirely sure I understand the question. Let's say you have sources
> A, B, C, all with different schemas, but all of them have an id. You could
> use the ParquetMapInputFormat, which provides a map of the records, and
> just do a map lookup.
>
> However, I'm not sure how you want to write these records with different
> schemas into the same parquet file. Maybe you just want to extract the
> common fields of A, B, C? Then you can also use the Table API and just
> declare the fields that are common.
>
> Or do you have sinks A, B, C and actually 3 separate topologies?
>
> On Wed, Mar 10, 2021 at 10:50 AM Avi Levi <a...@theneura.com> wrote:
>
>> Hi all,
>> I am trying to filter lines from parquet files. The problem is that they
>> have different schemas; however, the field that I am using to filter
>> exists in all of them.
>> In Spark this is quite straightforward:
>>
>> *val filtered = rawsDF.filter(col("id") =!= "123")*
>>
>> I tried to do it in Flink by extending ParquetInputFormat, but in this
>> case I need the schema (MessageType) and have to implement a convert
>> method, which I want to avoid since I do not want to convert the line (I
>> want to write it as-is to another parquet file).
>>
>> Any ideas?
>>
>> Cheers,
>> Avi
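A rough sketch of the ParquetMapInputFormat route suggested above (untested; the path is illustrative, schemaOf is the footer lookup sketched earlier, and the format still needs a MessageType, it just no longer has to be hardcoded):

    import org.apache.flink.api.scala._
    import org.apache.flink.core.fs.Path
    import org.apache.flink.formats.parquet.ParquetMapInputFormat
    import org.apache.hadoop.conf.Configuration

    val env = ExecutionEnvironment.getExecutionEnvironment

    // One source file; the same lines would be repeated for B0 and C0, each
    // with its own destination, so no schema is shared across the three.
    val src = new Path("file:///data/A0.parquet")  // illustrative path
    val schema = schemaOf(new org.apache.hadoop.fs.Path(src.toUri), new Configuration())

    // Records arrive as java.util.Map[String, AnyRef]; only the "ID" key is
    // inspected, all other fields ride along unconverted.
    val rows = env.createInput(new ParquetMapInputFormat(src, schema))
    val kept = rows.filter(m => m.get("ID") != "123")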