Hi,
These are only my thoughts, not a solution; I hope they help you.

First of all, we need a full stack trace, not just the exception, to draw a
conclusion.
I see you're using s3a. Where do you run your job? Is that EMR? Normally
you need to make S3 more consistent first to make it usable. This means
using some consistency layer, e.g. via EMRFS on EMR or S3Guard on vanilla
Hadoop. Databricks uses DBFS for that purpose, and there are some
others. I'm not sure that Delta Lake can work with S3 directly without such
a layer, even though I see that they're trying to do that in their code.
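As a rough sketch of what the S3Guard route could look like: the s3a connector's S3Guard support (Hadoop 2.9+/3.x, so the hadoop-aws 2.8.5 in your command would need upgrading) is enabled through `fs.s3a.*` properties, which Spark passes through with the `spark.hadoop.` prefix. The DynamoDB table name and region below are placeholders, not something from your setup:

```shell
# Sketch only: spark-submit with S3Guard's DynamoDB-backed metadata store
# enabled for s3a. Requires a Hadoop/hadoop-aws version with S3Guard (2.9+/3.x);
# the table name and region are placeholders -- adjust to your environment.
spark-submit \
  --packages io.delta:delta-core_2.11:0.6.0,org.apache.hadoop:hadoop-aws:3.2.1 \
  --conf spark.delta.logStore.class=org.apache.spark.sql.delta.storage.S3SingleDriverLogStore \
  --conf spark.hadoop.fs.s3a.metadatastore.impl=org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore \
  --conf spark.hadoop.fs.s3a.s3guard.ddb.table=my-s3guard-table \
  --conf spark.hadoop.fs.s3a.s3guard.ddb.region=us-east-1 \
  --conf spark.hadoop.fs.s3a.s3guard.ddb.table.create=true \
  --class Pipeline1 Pipeline.jar
```

On EMR you wouldn't do this at all; you'd use EMRFS consistent view instead, which is configured at cluster-creation time rather than per job.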

Best regards,
Viacheslav Rodionov


On Fri, 17 Jul 2020, 18:03 Nagendra Darla, <dvv.nagen...@gmail.com> wrote:

> Hi,
>
> Thanks, I know about FileNotFoundException.
>
> This error is with S3 buckets, which have a delay in showing newly created
> files. These files eventually show up after some time.
>
> These errors are coming up while converting a parquet table into a Delta
> table.
>
> My question is more about avoiding this error with Spark jobs which
> create / update / delete lots of files on S3 buckets.
>
> On Thu, Jul 16, 2020 at 10:28 PM Hulio andres <hulioand...@usa.com> wrote:
>
>>
>> https://examples.javacodegeeks.com/java-io-filenotfoundexception-how-to-solve-file-not-found-exception/
>>
>> Are you a programmer?
>>
>> Regards,
>>
>> Hulio
>>
>>
>>
>> > Sent: Friday, July 17, 2020 at 2:41 AM
>> > From: "Nagendra Darla" <dvv.nagen...@gmail.com>
>> > To: user@spark.apache.org
>> > Subject: File not found exceptions on S3 while running spark jobs
>> >
>> > Hello All,
>> > I am converting an existing parquet table (size: 50 GB) into Delta
>> > format. It took around 1 hr 45 mins to convert.
>> > And I see that there are a lot of FileNotFoundExceptions in the logs:
>> >
>> > Caused by: java.io.FileNotFoundException: No such file or directory:
>> > s3a://old-data/delta-data/PL1/output/denorm_table/part-00031-183e54ef-50bc-46fc-83a3-7836baa28f86-c000.snappy.parquet
>> >
>> > *How do I fix these errors?* I am using the below options in the
>> > spark-submit command:
>> >
>> > spark-submit --packages
>> > io.delta:delta-core_2.11:0.6.0,org.apache.hadoop:hadoop-aws:2.8.5
>> > --conf spark.delta.logStore.class=org.apache.spark.sql.delta.storage.S3SingleDriverLogStore
>> > --class Pipeline1 Pipeline.jar
>> >
>> > Thank You,
>> > Nagendra Darla
>> >
>>
>> ---------------------------------------------------------------------
>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>>
>> --
> Sent from iPhone
>
