We have an input file that is a gzipped tar archive of about 12 GB. It is
about 50 GB uncompressed.

With readTextFile(), I see Flink decompress the file, but it doesn't seem
to handle the untar step. The archive contains just a single file. (We
don't control the input format.)

foo.tar.gz  12 GB
foo.tar     50 GB
Untarring foo.tar yields a single file of valid JSONL.
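
To make the layout concrete, here is a minimal sketch, outside Flink, of
streaming the records straight out of the .tar.gz without writing the
50 GB tar to disk. It uses only Python's standard tarfile and json
modules. The function name iter_jsonl_from_targz is made up for
illustration, and this is not something Flink provides.

```python
import json
import tarfile


def iter_jsonl_from_targz(fileobj):
    """Yield one parsed JSON value per non-empty line of each regular
    file found in a gzip-compressed tar stream."""
    # "r|gz" reads the archive as a forward-only stream, so neither the
    # intermediate tar nor the extracted member is written to disk.
    with tarfile.open(fileobj=fileobj, mode="r|gz") as archive:
        for member in archive:
            if not member.isfile():
                continue  # skip directories, links, and other entries
            extracted = archive.extractfile(member)
            for raw_line in extracted:
                line = raw_line.strip()
                if line:
                    yield json.loads(line)
```

Usage would look roughly like opening foo.tar.gz in binary mode and
looping over iter_jsonl_from_targz(f). It runs on a single machine, so it
only works as a pre-processing step, not as a parallel source.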

When reading, we get this exception:

Caused by: org.apache.flink.shaded.jackson2.com.fasterxml.jackson.core.JsonParseException:
Unrecognized token 'playstore': was expecting (JSON String, Number, Array,
Object or token 'null', 'true' or 'false')
 at [Source: UNKNOWN; line: 1, column: 10]
    at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:1840)

The JSON parser is being handed the tar header instead of the first JSON
record, and it is rightly rejecting it as invalid JSON.
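
A small sketch of why I think that is what is happening: after gunzip
alone, the stream still starts with a 512-byte tar header, and the first
100 bytes of that header are the member's file name. That would explain
the parser choking on a bare token like 'playstore' at line 1. The helper
name below is made up. I am also assuming a plain ustar-style header;
archives with pax or GNU extended headers may start with an extra header
block instead.

```python
import gzip

TAR_BLOCK_SIZE = 512  # a tar archive is a sequence of 512-byte blocks
TAR_NAME_FIELD = 100  # the first 100 bytes of a header hold the member name


def first_member_name_after_gunzip(fileobj):
    """Gunzip only (no untar) and return the name stored in the first
    tar header block, i.e. the bytes a text reader would see first."""
    with gzip.GzipFile(fileobj=fileobj, mode="rb") as gz:
        header = gz.read(TAR_BLOCK_SIZE)
    # The name field is NUL-padded, so cut at the first NUL byte.
    name_field = header[:TAR_NAME_FIELD]
    return name_field.split(b"\0", 1)[0].decode("utf-8", errors="replace")
```

If the file inside our archive has a name starting with "playstore", this
would return that name, which matches the token in the exception.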

Is it possible to untar this file using Flink?

-- 
Wayne D. Young
aka Billy Bob Bain
billybobb...@gmail.com
