This is unfortunately due to the way we handle default values in Python. I agree it doesn't follow the principle of least astonishment.
Maybe the best thing to do here is to put the actual default values in the Python API for csv (and json, parquet, etc.), rather than using None in Python. This would require us to duplicate the default values (once in the data source options, and again in the Python API), but that's probably OK given they shouldn't change often.

Ticket: https://issues.apache.org/jira/browse/SPARK-15585

On Thu, May 26, 2016 at 3:35 PM, Koert Kuipers <ko...@tresata.com> wrote:
> in spark 1.6.1 we used:
>
> sqlContext.read
>   .format("com.databricks.spark.csv")
>   .delimiter("~")
>   .option("quote", null)
>
> this effectively turned off quoting, which is a necessity for certain data
> formats where quoting is not supported and "\"" is a valid character itself
> in the data.
>
> in spark 2.0.0-SNAPSHOT we did the same thing:
>
> sqlContext.read
>   .format("csv")
>   .delimiter("~")
>   .option("quote", null)
>
> but this did not work: we got weird blowups where spark was trying to
> parse thousands of lines as if they were one record. the reason was that a
> (valid) quote character ("\"") was present in the data, for example:
>
> a~b"c~d
>
> as it turns out, setting quote to null no longer turns off quoting;
> instead it means "use the default quote character".
>
> does anyone know how to turn off quoting now?
>
> our current workaround is:
>
> sqlContext.read
>   .format("csv")
>   .delimiter("~")
>   .option("quote", "☃")
>
> (we assume there are no unicode snowmen in our data...)
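To illustrate the proposal above, here is a minimal toy sketch (not the actual Spark code; the function names `read_csv_sentinel` and `read_csv_explicit` are hypothetical) of why a None-as-sentinel default makes "disable quoting" unexpressible, while putting the real default in the Python signature restores the intuitive meaning of None:

```python
def read_csv_sentinel(quote=None):
    # Sentinel style: None means "option not set", so the backend
    # silently falls back to the default quote character '"'.
    # There is no way for the caller to say "no quote character at all".
    return '"' if quote is None else quote

def read_csv_explicit(quote='"'):
    # Explicit-default style: the real default lives in the signature,
    # so passing None can genuinely mean "turn quoting off"
    # (represented here as the NUL character, a common "no quote" marker).
    return '\u0000' if quote is None else quote

# Sentinel style: quote=None is indistinguishable from "use the default".
assert read_csv_sentinel(None) == '"'
assert read_csv_sentinel() == '"'

# Explicit style: the default is still '"', but None now disables quoting.
assert read_csv_explicit() == '"'
assert read_csv_explicit(None) == '\u0000'
```

The trade-off is exactly the duplication mentioned above: the default `'"'` now appears both in the data source options and in the Python signature, and the two must be kept in sync.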