> Can you kindly explain how Spark uses parallelism for a bigger (say 1GB)
> text file? Does it use InputFormat to create multiple splits, with one
> partition per split? Also, in the case of S3 or NFS, how does the input split
> work? I understand that HDFS files are already pre-split, so Spark can use
> dfs.blocksize to determine partitions. But how does it work other than HDFS?
>
S3 is similar to HDFS, I think: the Hadoop InputFormat still computes the splits, it just uses the block size the filesystem reports rather than real HDFS block boundaries. I'm not sure off-hand exactly how it decides to split for the local filesystem, but it does. Maybe someone else will be able to explain the details.
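For what it's worth, sc.textFile goes through the old-API Hadoop FileInputFormat, whose split size is roughly max(minSplitSize, min(goalSize, blockSize)), where goalSize is totalSize divided by the requested minimum number of partitions. A rough sketch of that arithmetic (the function and parameter names here are mine, not Spark's API):

```python
# Sketch of the old-API Hadoop FileInputFormat split-size formula that
# sc.textFile delegates to. All sizes are in bytes. Illustrative only.

def split_count(total_size, block_size, min_partitions=2, min_split_size=1):
    # goalSize: size each split would have if the file were divided evenly
    # across the requested minimum number of partitions
    goal_size = total_size // max(min_partitions, 1)
    # splitSize = max(minSize, min(goalSize, blockSize))
    split_size = max(min_split_size, min(goal_size, block_size))
    # roughly ceil(totalSize / splitSize) splits -> that many partitions
    return -(-total_size // split_size)

MB = 1024 * 1024
# 1 GB file on HDFS with 128 MB blocks, default minPartitions=2:
print(split_count(1024 * MB, 128 * MB))  # 8 splits, so 8 partitions
```

On S3 there are no real blocks, so the connector just reports a configured block size to the split computation (for s3a it is fs.s3a.block-size, 32 MB by default, which would give a 1 GB file about 32 partitions); the local filesystem similarly reports a fixed default block size.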