Interesting: the links on http://spark.apache.org/community.html point to http://apache-spark-user-list.1001560.n3.nabble.com/
On 11 May 2017 at 12:35, Vadim Semenov <vadim.seme...@datadoghq.com> wrote:

> Use the official mailing list archive:
>
> http://mail-archives.apache.org/mod_mbox/spark-user/201705.mbox/%3ccajyeq0gh1fbhbajb9gghognhqouogydba28lnn262hfzzgf...@mail.gmail.com%3e
>
> On Thu, May 11, 2017 at 2:50 PM, lucas.g...@gmail.com <lucas.g...@gmail.com> wrote:
>
>> Also, and this is unrelated to the actual question... Why don't these
>> messages show up in the archive?
>>
>> http://apache-spark-user-list.1001560.n3.nabble.com/
>>
>> Ideally I'd want to post a link to our internal wiki for these questions,
>> but I can't find them in the archive.
>>
>> On 11 May 2017 at 07:16, lucas.g...@gmail.com <lucas.g...@gmail.com> wrote:
>>
>>> Looks like this isn't viable in Spark 2.0.0 (and greater, I presume).
>>> I'm pretty sure I came across this blog and ignored it for that reason.
>>>
>>> Any other thoughts? The linked tickets in
>>> https://issues.apache.org/jira/browse/SPARK-10063,
>>> https://issues.apache.org/jira/browse/HADOOP-13786, and
>>> https://issues.apache.org/jira/browse/HADOOP-9565 look relevant too.
>>>
>>> On 10 May 2017 at 22:24, Miguel Morales <therevolti...@gmail.com> wrote:
>>>
>>>> Try using the DirectParquetOutputCommitter:
>>>> http://dev.sortable.com/spark-directparquetoutputcommitter/
>>>>
>>>> On Wed, May 10, 2017 at 10:07 PM, lucas.g...@gmail.com <lucas.g...@gmail.com> wrote:
>>>>
>>>>> Hi users, we have a bunch of pyspark jobs that use S3 for loading,
>>>>> intermediate steps, and final output of parquet files.
>>>>>
>>>>> We're running into the following issues on a semi-regular basis.
>>>>> These are intermittent errors, i.e. we have about 300 jobs that run
>>>>> nightly, and a fairly random but smallish percentage of them fail
>>>>> with the following classes of errors.
>>>>>
>>>>> S3 write errors:
>>>>>
>>>>>> "ERROR Utils: Aborting task
>>>>>> com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 404,
>>>>>> AWS Service: Amazon S3, AWS Request ID: 2D3RP, AWS Error Code: null,
>>>>>> AWS Error Message: Not Found, S3 Extended Request ID: BlaBlahEtc="
>>>>>
>>>>>> "Py4JJavaError: An error occurred while calling o43.parquet.
>>>>>> : com.amazonaws.services.s3.model.MultiObjectDeleteException:
>>>>>> Status Code: 0, AWS Service: null, AWS Request ID: null, AWS Error
>>>>>> Code: null, AWS Error Message: One or more objects could not be
>>>>>> deleted, S3 Extended Request ID: null"
>>>>>
>>>>> S3 read errors:
>>>>>
>>>>>> [Stage 1:=================================================> (27 + 4) / 31]
>>>>>> 17/05/10 16:25:23 ERROR Executor: Exception in task 10.0 in stage 1.0 (TID 11)
>>>>>> java.net.SocketException: Connection reset
>>>>>>   at java.net.SocketInputStream.read(SocketInputStream.java:196)
>>>>>>   at java.net.SocketInputStream.read(SocketInputStream.java:122)
>>>>>>   at sun.security.ssl.InputRecord.readFully(InputRecord.java:442)
>>>>>>   at sun.security.ssl.InputRecord.readV3Record(InputRecord.java:554)
>>>>>>   at sun.security.ssl.InputRecord.read(InputRecord.java:509)
>>>>>>   at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:927)
>>>>>>   at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:884)
>>>>>>   at sun.security.ssl.AppInputStream.read(AppInputStream.java:102)
>>>>>>   at org.apache.http.impl.io.AbstractSessionInputBuffer.read(AbstractSessionInputBuffer.java:198)
>>>>>>   at org.apache.http.impl.io.ContentLengthInputStream.read(ContentLengthInputStream.java:178)
>>>>>>   at org.apache.http.impl.io.ContentLengthInputStream.read(ContentLengthInputStream.java:200)
>>>>>>   at org.apache.http.impl.io.ContentLengthInputStream.close(ContentLengthInputStream.java:103)
>>>>>>   at org.apache.http.conn.BasicManagedEntity.streamClosed(BasicManagedEntity.java:168)
>>>>>>   at org.apache.http.conn.EofSensorInputStream.checkClose(EofSensorInputStream.java:228)
>>>>>>   at org.apache.http.conn.EofSensorInputStream.close(EofSensorInputStream.java:174)
>>>>>>   at java.io.FilterInputStream.close(FilterInputStream.java:181)
>>>>>>   at java.io.FilterInputStream.close(FilterInputStream.java:181)
>>>>>>   at java.io.FilterInputStream.close(FilterInputStream.java:181)
>>>>>>   at java.io.FilterInputStream.close(FilterInputStream.java:181)
>>>>>>   at com.amazonaws.services.s3.model.S3Object.close(S3Object.java:203)
>>>>>>   at org.apache.hadoop.fs.s3a.S3AInputStream.close(S3AInputStream.java:187)
>>>>>
>>>>> We have literally tons of logs we could add, but they would make this
>>>>> email unwieldy. If it would be helpful I'll drop them in a pastebin or
>>>>> something.
>>>>>
>>>>> Our config is along the lines of:
>>>>>
>>>>> spark-2.1.0-bin-hadoop2.7
>>>>> '--packages com.amazonaws:aws-java-sdk:1.10.34,org.apache.hadoop:hadoop-aws:2.6.0 pyspark-shell'
>>>>>
>>>>> Given the Stack Overflow reading / googling I've been doing, I know
>>>>> we're not the only org with these issues, but I haven't found a good
>>>>> set of solutions in those spaces yet.
>>>>>
>>>>> Thanks!
>>>>>
>>>>> Gary Lucas
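
Since DirectParquetOutputCommitter was removed in Spark 2.0 (SPARK-10063), the usual mitigations for this class of S3 failure are configuration-level. Below is a minimal PySpark sketch of those settings, assuming Spark 2.1 on the hadoop2.7 build with the s3a connector; the app name and bucket paths are hypothetical placeholders, and the values are starting points to tune rather than a definitive fix:

    from pyspark.sql import SparkSession

    # Minimal sketch of S3A-hardening settings for Spark 2.1 / Hadoop 2.7.
    # The app name and s3a:// paths below are hypothetical.
    spark = (
        SparkSession.builder
        .appName("s3a-parquet-job")
        # The v2 commit algorithm skips the rename-heavy v1 job-commit
        # step, which is slow and failure-prone on an object store.
        .config("spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version", "2")
        # Speculative duplicate tasks can race each other's commits on S3,
        # one known source of spurious 404s and failed multi-object deletes.
        .config("spark.speculation", "false")
        # Retry harder on transient connection resets like the
        # java.net.SocketException in the trace above.
        .config("spark.hadoop.fs.s3a.attempts.maximum", "10")
        .config("spark.hadoop.fs.s3a.connection.timeout", "60000")
        .config("spark.hadoop.fs.s3a.connection.maximum", "100")
        .getOrCreate()
    )

    df = spark.read.parquet("s3a://my-bucket/input/")
    df.write.mode("overwrite").parquet("s3a://my-bucket/output/")

One further assumption worth checking: the --packages line quoted above pairs hadoop-aws:2.6.0 with a hadoop2.7 Spark build, and an AWS SDK version the 2.7 connector was not built against. Using org.apache.hadoop:hadoop-aws:2.7.3 together with com.amazonaws:aws-java-sdk:1.7.4 (the SDK version hadoop-aws 2.7.x declares as its dependency) matches the distribution and avoids a separate class of s3a classpath errors.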