Thank you for the reply, saurabh85. We do tune our shuffle partition count, but that does not influence how the Parquet files are read (as I understand it, the data is not shuffled while it is being read).
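For reference, this is roughly what we are doing, sketched in PySpark since that is what we use (the app name, path, and partition value below are just placeholders):

from pyspark.sql import SparkSession

# Placeholder session setup; we tune the shuffle partition value per workload.
spark = (
    SparkSession.builder
    .appName("parquet-partition-check")
    .config("spark.sql.shuffle.partitions", "400")
    .getOrCreate()
)

# As I understand it, spark.sql.shuffle.partitions only affects shuffles
# (joins, aggregations, etc.), not the partitioning of the initial Parquet scan.
df = spark.read.parquet("/path/to/table.parquet")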
Apologies, I did actually receive an answer, but it was not captured on the mailing list here.
When reading a Parquet file with more than 50 parts, Spark gives me a DataFrame with far fewer in-memory partitions.
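Concretely, something like the following is how I am observing it (the path is a placeholder; the directory on disk has more than 50 part files):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Placeholder path; the Parquet directory contains >50 part files.
df = spark.read.parquet("/path/to/table.parquet")

# I would expect something close to the number of part files here,
# but the count comes back far lower.
print(df.rdd.getNumPartitions())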
I'm happy to troubleshoot this further, but I don't know Scala well and
could use some help pointing me in the right direction. Where should I be
looking in the code base to understand this behavior?