We have an input file that is tarred and gzip-compressed to 12 GB. It is
about 50 GB uncompressed.

With readTextFile(), I see it decompress the file, but Flink doesn't seem
to handle the untar step. The archive contains just a single file. (We
don't control the input format.)

foo.tar.gz  12 GB
foo.tar     50 GB
Untarring foo.tar yields a single file of valid JSONL.

When reading, we get this exception:

Caused by:
Unrecognized token 'playstore': was expecting (JSON String, Number, Array,
Object or token 'null', 'true' or 'false')
 at [Source: UNKNOWN; line: 1, column: 10]

The JSON parser is seeing the tar header, which starts with the entry's
file name (presumably where 'playstore' comes from), and rightly
complaining that it is not valid JSON.
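
For context, here is a rough sketch of the untar step we would need,
written in plain Java with no Flink involved. It is only meant to show
what has to happen between the gunzip and the JSON parsing. The class and
method names (TarGzLines, forEachLine) are made up for illustration. It
assumes a GNU/ustar-style archive, ignores checksums, and does not apply
size overrides from pax extended headers, so treat it as a starting point
rather than a complete tar reader.

```java
import java.io.BufferedReader;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;
import java.util.zip.GZIPInputStream;

// Hypothetical helper, not part of Flink: streams the lines of every
// regular file inside a .tar.gz without unpacking it to disk.
final class TarGzLines {
    static final int BLOCK = 512; // tar headers and data padding use 512-byte blocks

    // Size field: 12 bytes at offset 124, either octal text or GNU base-256
    // (high bit of the first byte set), which is used for very large entries.
    static long parseSize(byte[] header) {
        if ((header[124] & 0x80) != 0) {
            long v = header[124] & 0x7F;
            for (int i = 125; i < 136; i++) v = (v << 8) | (header[i] & 0xFF);
            return v;
        }
        String s = new String(header, 124, 12, StandardCharsets.US_ASCII).trim();
        return s.isEmpty() ? 0 : Long.parseLong(s, 8);
    }

    // Exposes at most n bytes of the underlying stream, then reports EOF.
    static InputStream limit(InputStream in, long n) {
        return new InputStream() {
            long left = n;
            @Override public int read() throws IOException {
                if (left <= 0) return -1;
                int b = in.read();
                if (b >= 0) left--;
                return b;
            }
            @Override public int read(byte[] buf, int off, int len) throws IOException {
                if (left <= 0) return -1;
                int r = in.read(buf, off, (int) Math.min(len, left));
                if (r > 0) left -= r;
                return r;
            }
        };
    }

    // Discards exactly n bytes, failing if the stream ends early.
    static void skipFully(InputStream in, long n) throws IOException {
        byte[] buf = new byte[8192];
        while (n > 0) {
            int r = in.read(buf, 0, (int) Math.min(buf.length, n));
            if (r < 0) throw new EOFException("archive ended inside an entry");
            n -= r;
        }
    }

    static void forEachLine(InputStream tarGz, Consumer<String> sink) throws IOException {
        DataInputStream in = new DataInputStream(new GZIPInputStream(tarGz));
        byte[] header = new byte[BLOCK];
        while (true) {
            try {
                in.readFully(header);
            } catch (EOFException e) {
                return; // stream ended without an end-of-archive marker
            }
            // A header whose name starts with NUL is treated as the
            // all-zero block that marks the end of the archive.
            if (header[0] == 0) return;
            long size = parseSize(header);
            long padding = (BLOCK - size % BLOCK) % BLOCK;
            byte type = header[156];
            if (type == '0' || type == 0) { // regular file
                BufferedReader r = new BufferedReader(
                        new InputStreamReader(limit(in, size), StandardCharsets.UTF_8));
                for (String line; (line = r.readLine()) != null; ) sink.accept(line);
                skipFully(in, padding);
            } else {
                skipFully(in, size + padding); // directories, extension records, etc.
            }
        }
    }
}
```

Something like TarGzLines.forEachLine(stream, line -> parse(line)) would
then hand each JSONL record to the parser (parse is a placeholder here)
instead of the raw tar header. How to wire this into a Flink source or
input format is exactly the part I am unsure about.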

Is it possible to untar this file using Flink?

Wayne D. Young
aka Billy Bob Bain
