in spark 1.6.1 we used:

  sqlContext.read
    .format("com.databricks.spark.csv")
    .option("delimiter", "~")
    .option("quote", null)

this effectively turned off quoting, which is a necessity for certain data formats where quoting is not supported and "\"" is a valid character in the data itself.

in spark 2.0.0-SNAPSHOT we did the same thing:

  sqlContext.read
    .format("csv")
    .option("delimiter", "~")
    .option("quote", null)

but this did not work: we got weird blowups where spark was trying to parse thousands of lines as if they were one record. the reason was that a (valid) quote character ("\"") was present in the data, for example:

  a~b"c~d

as it turns out, setting quote to null does not turn off quoting anymore. instead it means to use the default quote character.

does anyone know how to turn off quoting now?

our current workaround is:

  sqlContext.read
    .format("csv")
    .option("delimiter", "~")
    .option("quote", "☃")

(we assume there are no unicode snowmen in our data...)
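
to illustrate the blowup described above, here is a minimal sketch in plain scala with no spark dependency. it is a toy record splitter and not spark's actual csv parser; the names QuoteDemo and splitRecords are made up for this example. the only assumption is that a quote-aware parser treats newlines inside an open quote as data, which is enough to show how one unmatched quote character can merge every following line into a single record.

```scala
// Toy record splitter (hypothetical, NOT Spark's CSV parser): it only tracks
// whether we are inside a quoted section and treats newlines there as data.
object QuoteDemo {
  def splitRecords(input: String, quote: Option[Char]): Vector[String] = {
    val records = Vector.newBuilder[String]
    val current = new StringBuilder
    var inQuotes = false
    for (ch <- input) {
      if (quote.contains(ch)) {
        inQuotes = !inQuotes           // a quote character toggles quoted mode
        current.append(ch)
      } else if (ch == '\n' && !inQuotes) {
        records += current.toString    // a newline outside quotes ends the record
        current.clear()
      } else {
        current.append(ch)             // anything else, including quoted newlines, is data
      }
    }
    if (current.nonEmpty) records += current.toString
    records.result()
  }

  def main(args: Array[String]): Unit = {
    // first line contains a lone, perfectly valid quote character
    val data = "a~b\"c~d\ne~f~g\nh~i~j"
    println(splitRecords(data, Some('"')).size) // quoting on: prints 1
    println(splitRecords(data, None).size)      // quoting off: prints 3
  }
}
```

with quoting enabled, the lone quote in the first line is never closed, so all three lines come back as one record. with no quote character configured, each line is its own record, which is the behavior we want from the reader.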