Thank you for the reply, saurabh85. We do tune our shuffle partition count, but that does not influence how the Parquet files are read (as I understand it, the data is not shuffled while it is being read).
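For reference, this is roughly what we are doing, sketched in PySpark since that is what we use (the app name, path, and partition value below are just placeholders):

from pyspark.sql import SparkSession

# Placeholder session setup; we tune the shuffle partition value per workload.
spark = (
    SparkSession.builder
    .appName("parquet-partition-check")
    .config("spark.sql.shuffle.partitions", "400")
    .getOrCreate()
)

# As I understand it, spark.sql.shuffle.partitions only affects shuffles
# (joins, aggregations, etc.), not the partitioning of the initial Parquet scan.
df = spark.read.parquet("/path/to/table.parquet")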
Apologies, I did actually receive an answer, but it was not captured on the mailing list here.
When reading a Parquet file with more than 50 parts, Spark gives me a DataFrame with far fewer in-memory partitions.
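Concretely, something like the following is how I am observing it (the path is a placeholder; the directory on disk has more than 50 part files):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Placeholder path; the Parquet directory contains >50 part files.
df = spark.read.parquet("/path/to/table.parquet")

# I would expect something close to the number of part files here,
# but the count comes back far lower.
print(df.rdd.getNumPartitions())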
I'm happy to troubleshoot this further, but I don't know Scala well and
could use some help pointing me in the right direction. Where should I be
looking in the code base to understand this behavior?