[ https://issues.apache.org/jira/browse/SPARK-51199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17927475#comment-17927475 ]
Snehal Bhatnagar commented on SPARK-51199:
------------------------------------------

Hi [~andreasfranz], I would like to start contributing here, is there any way I might help with this?

> Valid CSV records considered malformed
> --------------------------------------
>
>                 Key: SPARK-51199
>                 URL: https://issues.apache.org/jira/browse/SPARK-51199
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.5.4
>         Environment: SparkContext: Running Spark version 3.5.4
> SparkContext: OS info Mac OS X, 15.3, aarch64
> SparkContext: Java version 17.0.14 2025-01-21 LTS
> OpenJDK Runtime Environment Corretto-17.0.14.7.1 (build 17.0.14+7-LTS)
> OpenJDK 64-Bit Server VM Corretto-17.0.14.7.1 (build 17.0.14+7-LTS, mixed mode, sharing)
>            Reporter: Andreas Franz
>            Priority: Major
>
> There is an issue parsing CSV files with a combination of escaped double quotes and commas in a field.
> I've created a small example that demonstrates the issue:
> {code:java}
> package com.example
>
> import org.apache.spark.sql.SparkSession
>
> object Example {
>   def main(args: Array[String]): Unit = {
>     val spark = SparkSession.builder()
>       .appName("CSV Example")
>       .master("local[*]")
>       .config("spark.driver.host", "localhost")
>       .config("spark.ui.enabled", "false")
>       .getOrCreate()
>
>     val csv = spark
>       .read
>       .option("header", "true")
>       .option("mode", "FAILFAST")
>       .csv("./src/main/scala/com/example/example.csv")
>
>     csv.show(2, truncate = false)
>     spark.stop()
>   }
> }
> {code}
> {code:java}
> id,region_name,gp_id,gp_name,gp_group_id,gp_group_name,gp_group_region_name
> 111234567,east,1122723,"Test 1",,,
> 001234567,east,1122723,"Foo ""Bar"", New York, US",,,
> {code}
> According to [https://www.ietf.org/rfc/rfc4180.txt|http://example.com/] this is a valid CSV record.
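
The RFC 4180 claim in the report can be checked without Spark. The following is a minimal sketch, not Spark's parser: `Rfc4180Check` and `parseRecord` are hypothetical names introduced here for illustration. It splits a single record using the RFC 4180 rule that a double quote inside a quoted field is written as two double quotes, and it assumes well-formed input with no line breaks inside fields.

```scala
object Rfc4180Check {
  // Split one CSV record into fields using RFC 4180 quoting rules:
  // a field may be wrapped in double quotes, and a literal double quote
  // inside a quoted field is written as two double quotes ("").
  // Illustrative only: it does not reject malformed input and does not
  // handle line breaks inside quoted fields.
  def parseRecord(line: String): Vector[String] = {
    val fields = Vector.newBuilder[String]
    val current = new StringBuilder
    var inQuotes = false
    var i = 0
    while (i < line.length) {
      val c = line.charAt(i)
      if (inQuotes) {
        if (c == '"') {
          if (i + 1 < line.length && line.charAt(i + 1) == '"') {
            current.append('"') // doubled quote becomes one literal quote
            i += 1              // skip the second quote of the pair
          } else {
            inQuotes = false    // closing quote of the field
          }
        } else {
          current.append(c)     // commas inside quotes are field content
        }
      } else if (c == '"') {
        inQuotes = true         // opening quote of a quoted field
      } else if (c == ',') {
        fields += current.toString // unquoted comma ends the field
        current.clear()
      } else {
        current.append(c)
      }
      i += 1
    }
    fields += current.toString  // last field has no trailing comma
    fields.result()
  }

  def main(args: Array[String]): Unit = {
    // Second data record from the report.
    val record = "001234567,east,1122723,\"Foo \"\"Bar\"\", New York, US\",,,"
    val fields = parseRecord(record)
    println(fields.length) // prints 7, matching the seven header columns
    println(fields(3))     // prints Foo "Bar", New York, US
  }
}
```

Under these rules the record yields seven fields, the same count as the header. As a possibly related point, Spark's CSV reader documents an `escape` option whose default is a backslash rather than a double quote; whether setting it to `"` resolves this report is an assumption that has not been verified here.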
--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org