Andreas Franz created SPARK-51199:
-------------------------------------

             Summary: Valid CSV records considered malformed
                 Key: SPARK-51199
                 URL: https://issues.apache.org/jira/browse/SPARK-51199
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 3.5.4
         Environment: SparkContext: Running Spark version 3.5.4
SparkContext: OS info Mac OS X, 15.3, aarch64
SparkContext: Java version 17.0.14 2025-01-21 LTS
OpenJDK Runtime Environment Corretto-17.0.14.7.1 (build 17.0.14+7-LTS)
OpenJDK 64-Bit Server VM Corretto-17.0.14.7.1 (build 17.0.14+7-LTS, mixed mode, sharing)
            Reporter: Andreas Franz
There is an issue when parsing CSV files that contain a combination of escaped double quotes and commas in a field. I've created a small example that demonstrates the issue:

{code:java}
package com.example

import org.apache.spark.sql.SparkSession

object Example {

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("CSV Example")
      .master("local[*]")
      .config("spark.driver.host", "localhost")
      .config("spark.ui.enabled", "false")
      .getOrCreate()

    val csv = spark
      .read
      .option("header", "true")
      .option("mode", "FAILFAST")
      .csv("./src/main/scala/com/example/example.csv")

    csv.show(2, truncate = false)

    spark.stop()
  }
}
{code}

The contents of example.csv:

{code:java}
id,region_name,gp_id,gp_name,gp_group_id,gp_group_name,gp_group_region_name
111234567,east,1122723,"Test 1",,,
001234567,east,1122723,"Foo ""Bar"", New York, US",,,
{code}

The second record escapes the double quotes around "Bar" by doubling them and contains commas inside the quoted field. According to [https://www.ietf.org/rfc/rfc4180.txt] this is a valid CSV record, yet Spark considers it malformed and the read fails in FAILFAST mode.
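Not part of the original report: a possible workaround sketch, assuming the failure comes from Spark's default CSV escape character (backslash, controlled by the "escape" reader option) interacting with the RFC 4180 doubled-quote convention. The object name ExampleWorkaround is made up for illustration; the path and the other options are copied from the example above.

{code:java}
package com.example

import org.apache.spark.sql.SparkSession

// Sketch of a possible workaround, not a confirmed fix:
// explicitly set the escape character to a double quote so that
// doubled quotes ("") inside quoted fields are read per RFC 4180.
object ExampleWorkaround {

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("CSV Example Workaround")
      .master("local[*]")
      .config("spark.driver.host", "localhost")
      .config("spark.ui.enabled", "false")
      .getOrCreate()

    val csv = spark
      .read
      .option("header", "true")
      .option("mode", "FAILFAST")
      .option("escape", "\"") // default is backslash; assumption: this is what trips up the sample file
      .csv("./src/main/scala/com/example/example.csv")

    csv.show(2, truncate = false)

    spark.stop()
  }
}
{code}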