Hi Diogo,

thanks for reporting this issue. It looks quite strange, to be honest: flink-s3-fs-hadoop-1.10.0.jar does contain the DateTimeParserBucket class. So either this class wasn't loaded when the application was started from scratch, or there could be a problem with the plugin mechanism on restarts. I'm pulling in Arvid, who worked on the plugin mechanism and might be able to tell us more.

In the meantime, could you provide us with the logs? They might tell us a bit more about what happened.
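If it is easy for you to try, verbose class-loading output would also tell us where the class is (or isn't) loaded from. A minimal sketch, assuming you start the cluster via the standard scripts and can edit flink-conf.yaml:

  # flink-conf.yaml: pass -verbose:class to the JobManager/TaskManager JVMs
  env.java.opts: -verbose:class

After a cancel/restart, grepping the taskmanager .out files for DateTimeParserBucket should show whether the class is still being loaded, and from which jar.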
Cheers,
Till

On Wed, Apr 15, 2020 at 5:54 PM Diogo Santos <diogodssan...@gmail.com> wrote:

> Hi guys,
>
> I'm using AvroParquetWriter to write parquet files to S3. When I set up
> the cluster (starting fresh jobmanager/taskmanager instances, etc.), the
> scheduled job starts executing without problems and can write the files
> to S3, but if the job is canceled and started again, it throws
> java.lang.NoClassDefFoundError: org/joda/time/format/DateTimeParserBucket:
>
> Caused by: java.lang.NoClassDefFoundError: org/joda/time/format/DateTimeParserBucket
>   at org.joda.time.format.DateTimeFormatter.parseMillis(DateTimeFormatter.java:825)
>   at com.amazonaws.util.DateUtils.parseRFC822Date(DateUtils.java:196)
>   at com.amazonaws.services.s3.internal.ServiceUtils.parseRfc822Date(ServiceUtils.java:88)
>   at com.amazonaws.services.s3.internal.AbstractS3ResponseHandler.populateObjectMetadata(AbstractS3ResponseHandler.java:121)
>   at com.amazonaws.services.s3.internal.S3MetadataResponseHandler.handle(S3MetadataResponseHandler.java:32)
>   at com.amazonaws.services.s3.internal.S3MetadataResponseHandler.handle(S3MetadataResponseHandler.java:25)
>   at com.amazonaws.http.response.AwsResponseHandlerAdapter.handle(AwsResponseHandlerAdapter.java:69)
>   at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleResponse(AmazonHttpClient.java:1714)
>   at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleSuccessResponse(AmazonHttpClient.java:1434)
>   at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1356)
>   at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1139)
>   at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:796)
>   at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:764)
>   at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:738)
>   at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:698)
>   at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:680)
>   at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:544)
>   at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:524)
>   at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5052)
>   at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4998)
>   at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:1335)
>   at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:1309)
>   at org.apache.hadoop.fs.s3a.S3AFileSystem.getObjectMetadata(S3AFileSystem.java:904)
>   at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:1553)
>   at org.apache.hadoop.fs.s3a.S3AFileSystem.create(S3AFileSystem.java:555)
>   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:929)
>   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:910)
>   at org.apache.parquet.hadoop.util.HadoopOutputFile.createOrOverwrite(HadoopOutputFile.java:81)
>   at org.apache.parquet.hadoop.ParquetFileWriter.<init>(ParquetFileWriter.java:246)
>   at org.apache.parquet.hadoop.ParquetWriter.<init>(ParquetWriter.java:280)
>   at org.apache.parquet.hadoop.ParquetWriter$Builder.build(ParquetWriter.java:535)
>   at ....
> Environment configuration:
> - Apache Flink 1.10
> - Scala 2.12
> - the uber jar flink-shaded-hadoop-2-uber-2.8.3-10.0.jar is in the
>   application classloader (/lib)
> - the plugins folder contains the folder s3-fs-hadoop with the jar
>   flink-s3-fs-hadoop-1.10.0.jar
>
> I can fix this issue by adding the joda-time dependency to the Flink lib
> folder and excluding joda-time from the hadoop-aws dependency that is
> required by the application code (see the sketch below).
>
> Do you know what the root cause of this is? Or is there something I could
> do other than adding the joda-time dependency to the Flink lib folder?
>
> Thanks
>
> --
> regards,
> Diogo Santos
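For reference, the exclusion Diogo describes would look roughly like the following in a Maven build. This is a sketch only: that the project uses Maven, and the hadoop-aws version (2.8.3, matching the shaded Hadoop uber jar) are assumptions, not something stated in the report.

  <!-- Application's hadoop-aws dependency with joda-time excluded, so the
       DateTimeParserBucket class is resolved from the joda-time jar placed
       in Flink's lib folder instead of being bundled in the user jar. -->
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-aws</artifactId>
    <version>2.8.3</version> <!-- illustrative; use the version the project pins -->
    <exclusions>
      <exclusion>
        <groupId>joda-time</groupId>
        <artifactId>joda-time</artifactId>
      </exclusion>
    </exclusions>
  </dependency>

Together with a joda-time jar copied into Flink's lib/ directory, this makes joda-time resolve through the parent classloader on every (re)start rather than from the user-code classloader.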