Hi Mayur, sorry I did not follow up on this earlier. Were you able to fix the issue with the AWS SDK upgrade?

-Jack Ye
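(A quick way to confirm which SDK version actually lands on the classpath after a dependency bump is to print it at runtime. A minimal sketch, assuming the VersionInfo utility in the SDK's sdk-core module; if that class is not present in your version, your build tool's dependency report gives the same answer.)

    import software.amazon.awssdk.core.util.VersionInfo;

    public class SdkVersionCheck {
        public static void main(String[] args) {
            // Prints the AWS SDK for Java v2 version resolved at runtime; this can
            // differ from the declared version if another module pins sdk-core
            // transitively.
            System.out.println("AWS SDK version: " + VersionInfo.SDK_VERSION);
        }
    }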
On Thu, Sep 23, 2021 at 1:13 PM Mayur Srivastava <mayur.srivast...@twosigma.com> wrote:

> I'll try to upgrade the version and retry.
>
> Thanks,
> Mayur
>
> From: Jack Ye <yezhao...@gmail.com>
> Sent: Thursday, September 23, 2021 2:35 PM
> To: Iceberg Dev List <dev@iceberg.apache.org>
> Subject: Re: Error when writing large number of rows with S3FileIO
>
> Thanks. While I am looking into this, I noticed this seems to be a very old version. Is there any reason to use that version specifically? Have you tried a newer one? I know there have been quite a few updates to the S3 package related to uploading since then, so upgrading may solve the problem.
>
> -Jack
>
> On Thu, Sep 23, 2021 at 11:02 AM Mayur Srivastava <mayur.srivast...@twosigma.com> wrote:
>
> > No problem, Jack.
> >
> > I'm using https://mvnrepository.com/artifact/software.amazon.awssdk/s3/2.10.53
> >
> > Thanks,
> > Mayur
> >
> > From: Jack Ye <yezhao...@gmail.com>
> > Sent: Thursday, September 23, 2021 1:24 PM
> > To: Iceberg Dev List <dev@iceberg.apache.org>
> > Subject: Re: Error when writing large number of rows with S3FileIO
> >
> > Hi Mayur,
> >
> > Thanks for reporting this issue. Could you report what version of AWS SDK v2 you are using?
> >
> > Best,
> > Jack Ye
> >
> > On Thu, Sep 23, 2021 at 8:39 AM Mayur Srivastava <mayur.srivast...@twosigma.com> wrote:
> >
> > Hi,
> >
> > I have an Iceberg table with 400+ columns and >100k rows, partitioned monthly by a single "time" column. I'm using Parquet files and PartitionedWriter<Record> + S3FileIO to write the data. When I write fewer than ~50k rows, the writer works, but it fails with the exception below when I write more than ~50k rows. The writer does work for the full >100k rows if I use HadoopFileIO.
> >
> > Has anyone seen this error before and know of a way to fix it?
> >
> > The writer code is as follows:
> >
> >     AppendFiles append = table.newAppend();
> >
> >     for (GenericRecord record : records) {
> >         writer.write(record);
> >     }
> >
> >     Arrays.stream(writer.complete().dataFiles()).forEach(append::appendFile);
> >     append.commit();
> >
> > Thanks,
> > Mayur
> >
> > software.amazon.awssdk.services.s3.model.S3Exception: The specified media type is unsupported. Content type binary/octet-stream is not legal.
> > (Service: S3, Status Code: 415, Request ID: xxxxxx)
> >     at software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handleErrorResponse(AwsXmlPredicatedResponseHandler.java:158)
> >     at software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handleResponse(AwsXmlPredicatedResponseHandler.java:108)
> >     at software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handle(AwsXmlPredicatedResponseHandler.java:86)
> >     at software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handle(AwsXmlPredicatedResponseHandler.java:44)
> >     at software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler$Crc32ValidationResponseHandler.handle(AwsSyncClientHandler.java:94)
> >     at software.amazon.awssdk.core.internal.handler.BaseClientHandler.lambda$successTransformationResponseHandler$4(BaseClientHandler.java:215)
> >     at software.amazon.awssdk.core.internal.http.pipeline.stages.HandleResponseStage.execute(HandleResponseStage.java:40)
> >     at software.amazon.awssdk.core.internal.http.pipeline.stages.HandleResponseStage.execute(HandleResponseStage.java:30)
> >     at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
> >     at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:74)
> >     at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:43)
> >     at software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:78)
> >     at software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:40)
> >     at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage$RetryExecutor.doExecute(RetryableStage.java:114)
> >     at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage$RetryExecutor.execute(RetryableStage.java:87)
> >     at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:63)
> >     at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:43)
> >     at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
> >     at software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:57)
> >     at software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:37)
> >     at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.executeWithTimer(ApiCallTimeoutTrackingStage.java:81)
> >     at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:61)
> >     at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:43)
> >     at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
> >     at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
> >     at software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:37)
> >     at software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:26)
> >     at software.amazon.awssdk.core.internal.http.AmazonSyncHttpClient$RequestExecutionBuilderImpl.execute(AmazonSyncHttpClient.java:198)
> >     at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.invoke(BaseSyncClientHandler.java:122)
> >     at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.doExecute(BaseSyncClientHandler.java:148)
> >     at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.execute(BaseSyncClientHandler.java:102)
> >     at software.amazon.awssdk.core.client.handler.SdkSyncClientHandler.execute(SdkSyncClientHandler.java:45)
> >     at software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler.execute(AwsSyncClientHandler.java:55)
> >     at software.amazon.awssdk.services.s3.DefaultS3Client.createMultipartUpload(DefaultS3Client.java:1410)
> >     at org.apache.iceberg.aws.s3.S3OutputStream.initializeMultiPartUpload(S3OutputStream.java:209)
> >     at org.apache.iceberg.aws.s3.S3OutputStream.write(S3OutputStream.java:168)
> >     at java.io.OutputStream.write(OutputStream.java:122)
> >     at org.apache.parquet.io.DelegatingPositionOutputStream.write(DelegatingPositionOutputStream.java:56)
> >     at org.apache.parquet.bytes.ConcatenatingByteArrayCollector.writeAllTo(ConcatenatingByteArrayCollector.java:46)
> >     at org.apache.parquet.hadoop.ParquetFileWriter.writeColumnChunk(ParquetFileWriter.java:620)
> >     at org.apache.parquet.hadoop.ColumnChunkPageWriteStore$ColumnChunkPageWriter.writeToFileWriter(ColumnChunkPageWriteStore.java:241)
> >     at org.apache.parquet.hadoop.ColumnChunkPageWriteStore.flushToFileWriter(ColumnChunkPageWriteStore.java:319)
> >     at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >     at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> >     at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> >     at java.lang.reflect.Method.invoke(Method.java:566)
> >     at org.apache.iceberg.common.DynMethods$UnboundMethod.invokeChecked(DynMethods.java:65)
> >     at org.apache.iceberg.common.DynMethods$UnboundMethod.invoke(DynMethods.java:77)
> >     at org.apache.iceberg.common.DynMethods$BoundMethod.invoke(DynMethods.java:180)
> >     at org.apache.iceberg.parquet.ParquetWriter.flushRowGroup(ParquetWriter.java:176)
> >     at org.apache.iceberg.parquet.ParquetWriter.close(ParquetWriter.java:211)
> >     at org.apache.iceberg.io.DataWriter.close(DataWriter.java:71)
> >     at org.apache.iceberg.io.BaseTaskWriter$BaseRollingWriter.closeCurrent(BaseTaskWriter.java:282)
> >     at org.apache.iceberg.io.BaseTaskWriter$BaseRollingWriter.close(BaseTaskWriter.java:298)
> >     at org.apache.iceberg.io.PartitionedWriter.close(PartitionedWriter.java:82)
> >     at org.apache.iceberg.io.BaseTaskWriter.complete(BaseTaskWriter.java:83)
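(For readers trying to reproduce this: the `writer` in Mayur's snippet is not shown being constructed. Below is a minimal sketch of one way to set up a PartitionedWriter<Record>, assuming class names from Iceberg's data and io packages such as GenericAppenderFactory, OutputFileFactory.builderFor, and PartitionKey; constructor and builder signatures vary across Iceberg versions, so treat this as illustrative rather than drop-in.)

    import org.apache.iceberg.FileFormat;
    import org.apache.iceberg.PartitionKey;
    import org.apache.iceberg.PartitionSpec;
    import org.apache.iceberg.Schema;
    import org.apache.iceberg.Table;
    import org.apache.iceberg.data.GenericAppenderFactory;
    import org.apache.iceberg.data.Record;
    import org.apache.iceberg.io.FileAppenderFactory;
    import org.apache.iceberg.io.OutputFileFactory;
    import org.apache.iceberg.io.PartitionedWriter;

    public class WriterSetup {
        static PartitionedWriter<Record> newWriter(Table table) {
            Schema schema = table.schema();
            PartitionSpec spec = table.spec();

            FileAppenderFactory<Record> appenders = new GenericAppenderFactory(schema);
            OutputFileFactory files = OutputFileFactory.builderFor(table, 1, 1)
                .format(FileFormat.PARQUET)
                .build();

            long targetFileSizeBytes = 128L * 1024 * 1024;  // illustrative value
            return new PartitionedWriter<>(spec, FileFormat.PARQUET, appenders,
                files, table.io(), targetFileSizeBytes) {
                private final PartitionKey key = new PartitionKey(spec, schema);

                @Override
                protected PartitionKey partition(Record row) {
                    // Fill the key from this row's partition source fields; the
                    // base class copies the key whenever a new partition starts.
                    key.partition(row);
                    return key;
                }
            };
        }
    }

Note that PartitionedWriter expects incoming rows to be grouped by partition and throws if a completed partition reappears; production code also usually wraps each Record in an InternalRecordWrapper so date/time values match what the partition transform expects.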
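(The trace also narrows the failure down: the 415 comes back from S3's CreateMultipartUpload call, issued from S3OutputStream.initializeMultiPartUpload, and S3FileIO only starts a multipart upload once the bytes written cross its multipart threshold. Smaller files go out as a single PUT on close, which would explain why writes under ~50k rows succeed. While testing the suggested SDK upgrade, the multipart sizing knobs are worth knowing about. A minimal sketch, assuming the s3.multipart.* property names documented for Iceberg's S3FileIO; the values are illustrative.)

    import java.util.Map;
    import org.apache.iceberg.aws.s3.S3FileIO;

    public class S3FileIOSetup {
        static S3FileIO newFileIO() {
            // Larger parts (and a higher threshold factor) keep more files on the
            // single-PUT path, useful to confirm only the multipart path fails.
            S3FileIO io = new S3FileIO();
            io.initialize(Map.of(
                "s3.multipart.part-size-bytes", String.valueOf(64L * 1024 * 1024),
                "s3.multipart.threshold", "1.5"));
            return io;
        }
    }

This is a diagnostic aid, not a fix: if the endpoint rejects CreateMultipartUpload's content type, large files will still fail once they cross whatever threshold is configured, so the SDK upgrade remains the real candidate.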