Reading CSV with schema inference requires a full read of the data, so that could be one of the problems. Run the inference at most once, print out the schema, and supply it explicitly from then on during ingress. Use a format other than CSV for persistence.
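A minimal PySpark sketch of the infer-once approach, in case it helps. The function names, the schema file path, and the `header` option are my assumptions for illustration, not anything from your job; adjust them to your data. It stores the inferred schema as JSON next to the job and reuses it on every later read, so the full inference pass happens only the first time.

```python
import json
from pathlib import Path


def save_schema(schema_json: str, path: str) -> None:
    """Write a schema's JSON text (e.g. from df.schema.json()) to disk."""
    Path(path).write_text(schema_json, encoding="utf-8")


def load_schema_dict(path: str) -> dict:
    """Read the saved schema back as a plain dict."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def read_csv_with_saved_schema(spark, csv_path: str, schema_path: str):
    """Hypothetical helper: infer the schema once, then reuse the saved copy."""
    # Imported lazily so the two helpers above work without Spark installed.
    from pyspark.sql.types import StructType

    if not Path(schema_path).exists():
        # One-off full pass over the data to infer column types.
        inferred = (
            spark.read.format("csv")
            .option("header", "true")  # assumption: the file has a header row
            .option("inferSchema", "true")
            .load(csv_path)
        )
        inferred.printSchema()
        save_schema(inferred.schema.json(), schema_path)

    # Later reads pass the schema explicitly, so no inference pass is needed.
    schema = StructType.fromJson(load_schema_dict(schema_path))
    return (
        spark.read.format("csv")
        .option("header", "true")
        .schema(schema)
        .load(csv_path)
    )
```

Once the data is loaded this way, something like `df.write.parquet(...)` would cover the persistence side, since Parquet files carry their own schema.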
On 6 Aug 2018, at 05:44, makatun <d.i.maka...@gmail.com> wrote:

a. csv and parquet formats (parquet created from the same csv): .format(<csv/parquet>)
b. schema-on-read on/off: .option(inferSchema=<true/false>)