Hi,

Reading from Parquet files in the DataStream API (which is what I'm assuming you are doing) is officially supported and documented only since 1.14 [1]. Before that it was only supported in the Table API. As far as I can tell, the basic classes (`FileSource` and `ParquetColumnarRowInputFormat`) have been in the code base since 1.12.x, but I don't know how stable they were or how well they worked. I would suggest upgrading to Flink 1.14.1. As a last resort you can try using at least the latest release of the 1.12.x branch, following the 1.14 documentation, but I cannot guarantee that it will work.
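For reference, a minimal sketch of what that 1.14 DataStream Parquet source looks like, loosely following the 1.14 documentation [1]; the schema, field names, batch size and path below are placeholders:

```java
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.connector.file.src.FileSource;
import org.apache.flink.connector.file.src.FileSourceSplit;
import org.apache.flink.core.fs.Path;
import org.apache.flink.formats.parquet.ParquetColumnarRowInputFormat;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.data.RowData;
import org.apache.flink.table.types.logical.DoubleType;
import org.apache.flink.table.types.logical.IntType;
import org.apache.flink.table.types.logical.LogicalType;
import org.apache.flink.table.types.logical.RowType;
import org.apache.flink.table.types.logical.VarCharType;
import org.apache.hadoop.conf.Configuration;

public class ParquetSourceSketch {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Projected schema: only the columns you want to read (names/types are placeholders).
        RowType rowType =
                RowType.of(
                        new LogicalType[] {new DoubleType(), new IntType(), new VarCharType()},
                        new String[] {"price", "quantity", "name"});

        // Columnar Parquet reader for the DataStream API: batch size 500,
        // timestamps not interpreted as UTC, case-sensitive column matching.
        ParquetColumnarRowInputFormat<FileSourceSplit> format =
                new ParquetColumnarRowInputFormat<>(new Configuration(), rowType, 500, false, true);

        // Bounded file source over a directory of Parquet files.
        FileSource<RowData> source =
                FileSource.forBulkFileFormat(format, new Path("/tmp/parquet-input")).build();

        DataStream<RowData> stream =
                env.fromSource(source, WatermarkStrategy.noWatermarks(), "parquet-file-source");

        stream.print();
        env.execute("parquet-source-sketch");
    }
}
```

The projected `RowType` controls which columns are actually read, and the source emits `RowData` records that you can map to your own types downstream.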
Regarding the S3 issue, have you followed the documentation? [2][3]

Best,
Piotrek

[1] https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/connectors/datastream/formats/parquet/
[2] https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/deployment/filesystems/s3/#hadooppresto-s3-file-systems-plugins
[3] https://nightlies.apache.org/flink/flink-docs-release-1.12/deployment/filesystems/s3.html

On Fri, 17 Dec 2021 at 10:10, Alexandre Montecucco <alexandre.montecu...@grabtaxi.com> wrote:

> Hello everyone,
> I am struggling to read Parquet files from S3 with Flink Streaming 1.12.2.
> I had some difficulty simply reading from local Parquet files. I finally
> managed that part, though the solution feels dirty:
> - I use the readFile function + the ParquetInputFormat abstract class (which
> is protected), as I could not find a way to use the public ParquetRowInputFormat.
> - the open function in ParquetInputFormat uses
> org.apache.hadoop.conf.Configuration, and I am not sure which import to add.
> It seems the flink-parquet library imports the dependency from hadoop-common,
> but that dependency is marked as provided. The docs only show usage of
> flink-parquet from Flink SQL, so I am under the impression that this might not
> work in the streaming case without extra code. I 'solved' this by adding a
> dependency on hadoop-common. We did something similar to write Parquet data to S3.
>
> Now, when trying to run the application to read from S3, I get an exception
> with root cause:
> ```
> Caused by: org.apache.hadoop.fs.UnsupportedFileSystemException: No FileSystem for scheme "s3"
> ```
> I guess there are some issues with hadoop-common not knowing about the
> flink-s3-hadoop plugin setup, but I ran out of ideas on how to solve this.
>
> I also noticed there were some changes to flink-parquet in Flink 1.14, but I
> had some issues with simply reading data (I did not investigate that version
> very deeply).
>
> Many thanks for any help.
> --
> Alexandre Montecucco / Grab, Software Developer
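On the S3 point, for reference: per [2], S3 support in Flink is enabled by copying the flink-s3-fs-hadoop (or flink-s3-fs-presto) jar from the distribution's opt/ directory into a plugins/s3-fs-hadoop/ (resp. plugins/s3-fs-presto/) subdirectory on every node, not by adding Hadoop S3 dependencies to the job jar. A rough sketch of the job side, reusing the `format` from the example above (the bucket and prefix are made up):

```java
// Prerequisite on every Flink node (JobManager and TaskManagers), per [2][3]:
//   mkdir ./plugins/s3-fs-hadoop
//   cp ./opt/flink-s3-fs-hadoop-<version>.jar ./plugins/s3-fs-hadoop/
// Adding hadoop-common to the job jar does not register an "s3" scheme, which is
// consistent with the UnsupportedFileSystemException quoted above.

// `format` is the ParquetColumnarRowInputFormat from the previous sketch; the only
// job-side change is that the path now uses the s3:// scheme.
FileSource<RowData> s3Source =
        FileSource.forBulkFileFormat(format, new Path("s3://my-bucket/parquet-data/")).build();
```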