Re: [Performance] Spark DataFrame is slow with wide data. Polynomial complexity on the number of columns is observed. Why?

Steve Loughran Tue, 07 Aug 2018 10:26:19 -0700

CSV with schema inference requires a full read of the data, so that could be 
one of the problems. Infer the schema at most once, print it out, and from then 
on supply it explicitly during ingest; use something other than CSV for persistence.


On 6 Aug 2018, at 05:44, makatun <d.i.maka...@gmail.com> wrote:

         a. csv and parquet formats (parquet created from the same csv): .format(<csv/parquet>)
         b. schema-on-read on/off:  .option(inferSchema=<true/false>)
