You need to use sc.wholeTextFiles to read each file as a whole. Otherwise,
a file can be split across partitions.
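
Something along these lines might work as a starting point. It is only a rough sketch, not tested against your data: parse_json_lines and load_records are names I made up for illustration, and I am assuming `paths` is the same path string you pass to textFile today. Keep in mind that wholeTextFiles loads each file entirely into memory as one (path, content) pair, so it suits many small files better than a few very large ones.

```python
import json


def parse_json_lines(path_and_content):
    """Parse one (path, content) pair into a list of JSON records.

    Hypothetical helper: wholeTextFiles yields (path, content) pairs,
    where content is the full text of a single file.
    """
    _path, content = path_and_content
    # Split on "\n" only; str.splitlines() would also break on characters
    # such as U+2028, which may legally appear inside JSON strings.
    return [json.loads(line) for line in content.split("\n") if line.strip()]


def load_records(sc, paths):
    """Return an RDD of parsed records, reading each file as a whole.

    Assumes `sc` is an existing SparkContext and `paths` is a path string.
    """
    return sc.wholeTextFiles(paths).flatMap(parse_json_lines)
```

With this, sc.textFile(paths).map(json.loads) would become load_records(sc, paths).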

DB Tsai - Sent From My Phone
On Mar 17, 2016 12:45 AM, "Blaž Šnuderl" <snud...@gmail.com> wrote:

> Hi.
>
> We have JSON data stored in S3 (one JSON record per line). When reading
> the data from S3 using the following code, we started noticing JSON
> decode errors.
>
> sc.textFile(paths).map(json.loads)
>
>
> After a bit more investigation we noticed an incomplete line, basically
> the line was
>
>> {"key": "value", "key2":  <- notice the line abruptly ends with no json
>> close tag etc
>
>
> It is not an issue with our data and it doesn't happen very often, but it
> makes us very scared since it means spark could be dropping data.
>
> We are using Spark 1.5.1. Any ideas why this happens, and are there any
> possible fixes?
>
> Regards,
> Blaž Šnuderl
>
