Andreas Franz created SPARK-51199:
-------------------------------------

             Summary: Valid CSV records considered malformed
                 Key: SPARK-51199
                 URL: https://issues.apache.org/jira/browse/SPARK-51199
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 3.5.4
         Environment: SparkContext: Running Spark version 3.5.4
SparkContext: OS info Mac OS X, 15.3, aarch64
SparkContext: Java version 17.0.14 2025-01-21 LTS
OpenJDK Runtime Environment Corretto-17.0.14.7.1 (build 17.0.14+7-LTS)
OpenJDK 64-Bit Server VM Corretto-17.0.14.7.1 (build 17.0.14+7-LTS, mixed mode, sharing)
            Reporter: Andreas Franz
There is an issue when parsing CSV files that contain a combination of escaped double quotes and commas in a field. I've created a small example that demonstrates the issue:

{code:java}
package com.example

import org.apache.spark.sql.SparkSession

object Example {

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("CSV Example")
      .master("local[*]")
      .config("spark.driver.host", "localhost")
      .config("spark.ui.enabled", "false")
      .getOrCreate()

    val csv = spark
      .read
      .option("header", "true")
      .option("mode", "FAILFAST")
      .csv("./src/main/scala/com/example/example.csv")

    csv.show(2, truncate = false)

    spark.stop()
  }
}
{code}

The contents of example.csv:

{code:java}
id,region_name,gp_id,gp_name,gp_group_id,gp_group_name,gp_group_region_name
111234567,east,1122723,"Test 1",,,
001234567,east,1122723,"Foo ""Bar"", New York, US",,,
{code}

The second record escapes the double quotes around "Bar" by doubling them and contains commas inside the quoted field. According to [https://www.ietf.org/rfc/rfc4180.txt] this is a valid CSV record, yet Spark considers it malformed and the read fails in FAILFAST mode.
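Not part of the original report: a possible workaround sketch, assuming the failure comes from Spark's default CSV escape character (backslash, controlled by the "escape" reader option) interacting with the RFC 4180 doubled-quote convention. The object name ExampleWorkaround is made up for illustration; the path and the other options are copied from the example above.

{code:java}
package com.example

import org.apache.spark.sql.SparkSession

// Sketch of a possible workaround, not a confirmed fix:
// explicitly set the escape character to a double quote so that
// doubled quotes ("") inside quoted fields are read per RFC 4180.
object ExampleWorkaround {

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("CSV Example Workaround")
      .master("local[*]")
      .config("spark.driver.host", "localhost")
      .config("spark.ui.enabled", "false")
      .getOrCreate()

    val csv = spark
      .read
      .option("header", "true")
      .option("mode", "FAILFAST")
      .option("escape", "\"") // default is backslash; assumption: this is what trips up the sample file
      .csv("./src/main/scala/com/example/example.csv")

    csv.show(2, truncate = false)

    spark.stop()
  }
}
{code}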