cutionException e) {
>         logger.error("", e);
>     }
> }
> }
>
> static class SaveData {
>     private DataFrame df;
>     private String path;
>
>     SaveData(DataFrame df, String path) {
>         this.df = df;
>         this.path = path;
te().json(data.path);
}
}
}
}
From: Pedro Rodriguez
Date: Wednesday, July 27, 2016 at 8:40 PM
To: Andrew Davidson
Cc: "user @spark"
Subject: Re: performance problem when reading lots of small files created
by spark streaming.
There are a few blog posts that detail one possible/likely issue; for
example:
http://tech.kinja.com/how-not-to-pull-from-s3-using-apache-spark-1704509219
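The gist of the workaround posts like that one describe is to enumerate the object keys yourself and fetch them in parallel, rather than letting a filesystem-oriented input format repeatedly list the bucket. A minimal, hypothetical Java sketch of that pattern follows; a plain thread pool stands in for Spark's executors, and the placeholder `fetchObject()` stands in for a real S3 client call:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: list the keys up front, then fetch each object
// directly and in parallel, instead of treating the bucket as a filesystem.
public class ParallelFetchSketch {

    // Placeholder for a real call such as s3Client.getObject(bucket, key).
    static String fetchObject(String key) {
        return "contents-of-" + key;
    }

    // Fetch every key on a fixed-size thread pool, preserving input order.
    public static List<String> fetchAll(List<String> keys) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String key : keys) {
                futures.add(pool.submit(() -> fetchObject(key)));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> keys = List.of("part-0000", "part-0001", "part-0002");
        System.out.println(fetchAll(keys));
    }
}
```

In a real Spark job the key list would typically come from the AWS SDK (e.g. `AmazonS3.listObjects`) and the per-key fetch would run inside `mapPartitions` on an RDD of keys, so the driver never asks Hadoop's input format to glob the bucket.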
TLDR: The Hadoop libraries Spark uses assume that their input comes from a
file system (this works with HDFS); however, S3 is a key-value store, not a