Hello everyone,
I am struggling to read Parquet files from S3 with Flink Streaming 1.12.2.
I had some difficulty simply reading local Parquet files. I finally
managed that part, though the solution feels dirty:
- I use the readFile function with a subclass of the abstract
ParquetInputFormat class, whose constructor is protected, as I could not
find a way to use the public ParquetRowInputFormat.
- The open function in ParquetInputFormat uses
org.apache.hadoop.conf.Configuration, and I am not sure which dependency to
add. It seems flink-parquet gets this class from hadoop-common, but that
dependency is marked as provided. The documentation only shows flink-parquet
being used from Flink SQL, so I am under the impression that it might not
work in the streaming case without extra code. I 'solved' this by adding a
dependency on hadoop-common. We did something similar to write Parquet data
to S3.
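For illustration, a minimal sketch of the readFile + ParquetInputFormat-subclass approach described above. It assumes Flink 1.12.2 with flink-parquet and hadoop-common on the runtime classpath (the latter because ParquetInputFormat's open function creates an org.apache.hadoop.conf.Configuration). The class names, schema, path and String conversion are hypothetical placeholders, not necessarily the exact code used here, and this sketch only covers local files, not S3:

```java
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.core.fs.Path;
import org.apache.flink.formats.parquet.ParquetInputFormat;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.source.FileProcessingMode;
import org.apache.flink.types.Row;
import org.apache.parquet.schema.MessageType;
import org.apache.parquet.schema.MessageTypeParser;

// Hypothetical job class, for illustration only.
public class ReadLocalParquetJob {

    // Hypothetical schema: replace with the schema of the actual files.
    private static final MessageType SCHEMA = MessageTypeParser.parseMessageType(
            "message example { required binary name (UTF8); required int64 id; }");

    // ParquetInputFormat's constructor is protected, so a subclass is needed.
    // convert() maps each record, already decoded into a Row, to the output type.
    static final class NameInputFormat extends ParquetInputFormat<String> {
        NameInputFormat(Path path, MessageType schema) {
            super(path, schema);
        }

        @Override
        protected String convert(Row row) {
            // Field 0 is "name" in the hypothetical schema above.
            return String.valueOf(row.getField(0));
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical default path; pass a real directory as the first argument.
        String inputPath = args.length > 0 ? args[0] : "file:///tmp/parquet-input";
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // PROCESS_ONCE reads the current contents of the path once; the interval
        // is ignored in that mode. The type info is passed explicitly.
        DataStream<String> names = env.readFile(
                new NameInputFormat(new Path(inputPath), SCHEMA),
                inputPath,
                FileProcessingMode.PROCESS_ONCE,
                -1L,
                Types.STRING);

        names.print();
        env.execute("read-local-parquet");
    }
}
```

Passing the TypeInformation explicitly avoids relying on type extraction from the generic subclass.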

Now, when trying to run the application to read from S3, I get an exception
with root cause:
```
Caused by: org.apache.hadoop.fs.UnsupportedFileSystemException: No FileSystem for scheme "s3"
```
I guess the problem is that hadoop-common does not know about the
flink-s3-fs-hadoop plugin setup, but I have run out of ideas on how to solve
this.


I also noticed some changes to flink-parquet in Flink 1.14, but I had
issues simply reading data with that version (I did not investigate it in
depth).

Many thanks for any help.
--

Alexandre Montecucco / Grab, Software Developer
