> Can you kindly explain how Spark uses parallelism for a bigger (say 1GB)
> text file? Does it use InputFormat to create multiple splits, with one
> partition per split? Also, in the case of S3 or NFS, how does the input split
> work? I understand that HDFS files are already pre-split, so Spark can use
> dfs.blocksize to determine partitions. But how does it work other than HDFS?
>
S3 is similar to HDFS, I think: the Hadoop InputFormat still computes the splits, it just uses the block size the filesystem reports rather than real HDFS block boundaries. I'm not sure off-hand exactly how it decides to split for the local filesystem, but it does. Maybe someone else will be able to explain the details.
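For what it's worth, sc.textFile goes through the old-API Hadoop FileInputFormat, whose split size is roughly max(minSplitSize, min(goalSize, blockSize)), where goalSize is totalSize divided by the requested minimum number of partitions. A rough sketch of that arithmetic (the function and parameter names here are mine, not Spark's API):

```python
# Sketch of the old-API Hadoop FileInputFormat split-size formula that
# sc.textFile delegates to. All sizes are in bytes. Illustrative only.

def split_count(total_size, block_size, min_partitions=2, min_split_size=1):
    # goalSize: size each split would have if the file were divided evenly
    # across the requested minimum number of partitions
    goal_size = total_size // max(min_partitions, 1)
    # splitSize = max(minSize, min(goalSize, blockSize))
    split_size = max(min_split_size, min(goal_size, block_size))
    # roughly ceil(totalSize / splitSize) splits -> that many partitions
    return -(-total_size // split_size)

MB = 1024 * 1024
# 1 GB file on HDFS with 128 MB blocks, default minPartitions=2:
print(split_count(1024 * MB, 128 * MB))  # 8 splits, so 8 partitions
```

On S3 there are no real blocks, so the connector just reports a configured block size to the split computation (for s3a it is fs.s3a.block-size, 32 MB by default, which would give a 1 GB file about 32 partitions); the local filesystem similarly reports a fixed default block size.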