Márcio Furlani Carmona created SPARK-23308:
----------------------------------------------

             Summary: ignoreCorruptFiles should not ignore retryable IOException
                 Key: SPARK-23308
                 URL: https://issues.apache.org/jira/browse/SPARK-23308
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.2.1, 2.3.1
            Reporter: Márcio Furlani Carmona


When `spark.sql.files.ignoreCorruptFiles` is set, Spark ignores any 
RuntimeException or IOException thrown while reading a file, but some 
IOExceptions can occur even when the file is not corrupted.

One example is SocketTimeoutException: the read can be retried and may then 
succeed, so the exception does not mean the data is corrupted.
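
To make the distinction concrete, here is a minimal sketch of the behaviour 
being asked for. It is not Spark's code and not a proposed patch: the class 
`RetryableReadSketch`, the helpers `isRetryable` and `readWithPolicy`, and the 
`maxAttempts` and `emptyResult` parameters are all hypothetical names chosen 
for illustration. It also assumes that SocketTimeoutException is the only 
transient error; a real fix would likely need a broader list.

```java
import java.io.EOFException;
import java.io.IOException;
import java.net.SocketTimeoutException;
import java.util.concurrent.Callable;

// Hypothetical sketch: none of these names exist in Spark.
final class RetryableReadSketch {

    // Assumption: only socket timeouts are treated as transient here.
    static boolean isRetryable(Throwable t) {
        return t instanceof SocketTimeoutException;
    }

    // Runs `read`, retrying transient failures up to `maxAttempts` times.
    // Non-transient IOExceptions and RuntimeExceptions are swallowed only
    // when `ignoreCorruptFiles` is true, in which case `emptyResult` is returned.
    static <T> T readWithPolicy(Callable<T> read, int maxAttempts,
                                boolean ignoreCorruptFiles, T emptyResult)
            throws Exception {
        for (int attempt = 1; ; attempt++) {
            try {
                return read.call();
            } catch (IOException | RuntimeException e) {
                if (isRetryable(e)) {
                    // Out of retries: fail the read instead of silently
                    // treating a healthy file as corrupt.
                    if (attempt >= maxAttempts) {
                        throw e;
                    }
                    continue;
                }
                if (ignoreCorruptFiles) {
                    return emptyResult; // skip the (presumably) corrupt file
                }
                throw e;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Times out twice, then succeeds on the third attempt.
        String rows = readWithPolicy(() -> {
            if (calls[0]++ < 2) {
                throw new SocketTimeoutException("simulated timeout");
            }
            return "rows";
        }, 3, true, "");
        System.out.println("after retries: " + rows);

        // A truncated file is skipped because ignoreCorruptFiles is true.
        String skipped = readWithPolicy(() -> {
            throw new EOFException("simulated truncated file");
        }, 3, true, "<skipped>");
        System.out.println("corrupt file: " + skipped);
    }
}
```

The point of the sketch is that a timeout which exhausts its retries is 
rethrown rather than swallowed, so a file that is merely slow to fetch is 
never silently dropped from the result.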

 

See: 

https://github.com/apache/spark/blob/e30e2698a2193f0bbdcd4edb884710819ab6397c/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileScanRDD.scala#L163




