Hi,

Reading Parquet files in the DataStream API (which is what I assume you are
using) is officially supported and documented only since 1.14 [1]. Before
that, it was only supported for the Table API. As far as I can tell, the
basic classes (`FileSource` and `ParquetColumnarRowInputFormat`) have been
in the code base since 1.12.x, but I don't know how stable they were or how
well they were working. I would suggest upgrading to Flink 1.14.1. As a
last resort, you can try using at the very least the latest version of the
1.12.x branch, following the 1.14 documentation, but I cannot guarantee
that it will work.
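
For reference, the 1.14 DataStream setup from [1] looks roughly like the
sketch below. I have not tested this exact snippet; the column names,
types, bucket path, and jar layout are placeholders for your own schema
and environment:

```java
// Sketch of a Flink 1.14 DataStream job reading Parquet via FileSource,
// following [1]. Assumes flink-parquet and flink-connector-files are on
// the classpath; the "name"/"age" columns and the S3 path are hypothetical.
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.connector.file.src.FileSource;
import org.apache.flink.connector.file.src.FileSourceSplit;
import org.apache.flink.core.fs.Path;
import org.apache.flink.formats.parquet.ParquetColumnarRowInputFormat;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.data.RowData;
import org.apache.flink.table.types.logical.IntType;
import org.apache.flink.table.types.logical.LogicalType;
import org.apache.flink.table.types.logical.RowType;
import org.apache.flink.table.types.logical.VarCharType;

public class ParquetReadJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // Projected read schema: only the listed columns are materialized.
        RowType rowType = RowType.of(
                new LogicalType[] {new VarCharType(), new IntType()},
                new String[] {"name", "age"}); // hypothetical columns

        ParquetColumnarRowInputFormat<FileSourceSplit> format =
                new ParquetColumnarRowInputFormat<>(
                        new org.apache.hadoop.conf.Configuration(),
                        rowType,
                        500,    // batch size
                        false,  // isUtcTimestamp
                        true);  // isCaseSensitive

        FileSource<RowData> source = FileSource
                .forBulkFileFormat(format, new Path("s3://my-bucket/parquet/"))
                .build();

        DataStream<RowData> stream = env.fromSource(
                source, WatermarkStrategy.noWatermarks(), "parquet-source");
        stream.print();
        env.execute("read-parquet");
    }
}
```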

Regarding the S3 issue, have you followed the documentation? [2][3]
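
The key point in [2] is that the S3 filesystem must be loaded as a plugin
(from its own subdirectory under `plugins/`), not added as a regular
classpath dependency, which would explain why plain `hadoop-common` does
not know about the "s3" scheme. Roughly (a deployment sketch; `FLINK_HOME`
and the exact jar version depend on your installation):

```shell
# Copy the bundled S3 filesystem jar into its own plugins subdirectory,
# as described in [2]. FLINK_HOME and the jar version here are assumptions.
mkdir -p "$FLINK_HOME/plugins/s3-fs-hadoop"
cp "$FLINK_HOME/opt/flink-s3-fs-hadoop-1.14.1.jar" \
   "$FLINK_HOME/plugins/s3-fs-hadoop/"
```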

Best,
Piotrek

[1]
https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/connectors/datastream/formats/parquet/
[2]
https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/deployment/filesystems/s3/#hadooppresto-s3-file-systems-plugins
[3]
https://nightlies.apache.org/flink/flink-docs-release-1.12/deployment/filesystems/s3.html


On Fri, 17 Dec 2021 at 10:10, Alexandre Montecucco <
alexandre.montecu...@grabtaxi.com> wrote:

> Hello everyone,
> I am struggling to read Parquet files from S3 with Flink Streaming
> 1.12.2.
> I had some difficulty simply reading from local parquet files. I finally
> managed that part, though the solution feels dirty:
> - I use the readFile function + the ParquetInputFormat abstract class
> (which is protected), as I could not find a way to use the public
> ParquetRowInputFormat.
> - the open function in ParquetInputFormat uses
> org.apache.hadoop.conf.Configuration, and I am not sure which import to
> add. It seems the flink-parquet library imports the dependency from
> hadoop-common, but that dep is marked as provided. The doc only shows
> usage of flink-parquet from Flink SQL, so I am under the impression that
> this might not work in the streaming case without extra code. I 'solved'
> this by adding a dependency on hadoop-common. We did something similar to
> write parquet data to S3.
>
> Now, when trying to run the application to read from S3, I get an
> exception with root cause:
> ```
> Caused by: org.apache.hadoop.fs.UnsupportedFileSystemException: No
> FileSystem for scheme "s3"
> ```
> I guess there are some issues with hadoop-common not knowing about the
> flink-s3-hadoop plugin setup. But I ran out of ideas on how to solve this.
>
>
> I also noticed there were some changes to flink-parquet in Flink 1.14,
> but I had some issues simply reading data there as well (I did not
> investigate so deeply for that version).
>
> Many thanks for any help.
> --
>
>
> Alexandre Montecucco / Grab, Software Developer
> alexandre.montecu...@grab.com / 8782 0937
> Grab, 138 Cecil Street, Cecil Court #01-01, Singapore 069538
> https://www.grab.com/
>
>
>
