I’ll try to upgrade the version and retry.

Thanks,
Mayur

From: Jack Ye <yezhao...@gmail.com>
Sent: Thursday, September 23, 2021 2:35 PM
To: Iceberg Dev List <dev@iceberg.apache.org>
Subject: Re: Error when writing large number of rows with S3FileIO

Thanks. While I am looking into this: that seems to be a very old version. Is there any reason to use that version specifically? Have you tried a newer one? I know there have been quite a few updates to the S3 package related to uploading since then, so upgrading may solve the problem.

-Jack

On Thu, Sep 23, 2021 at 11:02 AM Mayur Srivastava <mayur.srivast...@twosigma.com> wrote:

No problem Jack. I’m using https://mvnrepository.com/artifact/software.amazon.awssdk/s3/2.10.53

Thanks,
Mayur

From: Jack Ye <yezhao...@gmail.com>
Sent: Thursday, September 23, 2021 1:24 PM
To: Iceberg Dev List <dev@iceberg.apache.org>
Subject: Re: Error when writing large number of rows with S3FileIO

Hi Mayur,

Thanks for reporting this issue. Could you tell us which version of AWS SDK V2 you are using?

Best,
Jack Ye

On Thu, Sep 23, 2021 at 8:39 AM Mayur Srivastava <mayur.srivast...@twosigma.com> wrote:

Hi,

I have an Iceberg table with 400+ columns and >100k rows, partitioned monthly on a single "time" column. I'm using Parquet files and PartitionedWriter<Record> + S3FileIO to write the data. When I write fewer than ~50k rows, the writer works, but it fails with the exception below if I write more than ~50k rows. The writer does, however, work for the full >100k rows if I use HadoopFileIO. Has anyone seen this error before, or does anyone know a way to fix it?

The writer code is as follows:

    AppendFiles append = table.newAppend();
    for (GenericRecord record : records) {
      writer.write(record);
    }
    Arrays.stream(writer.complete().dataFiles()).forEach(append::appendFile);
    append.commit();

Thanks,
Mayur

software.amazon.awssdk.services.s3.model.S3Exception: The specified media type is unsupported. Content type binary/octet-stream is not legal. (Service: S3, Status Code: 415, Request ID: xxxxxx)
    at software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handleErrorResponse(AwsXmlPredicatedResponseHandler.java:158)
    at software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handleResponse(AwsXmlPredicatedResponseHandler.java:108)
    at software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handle(AwsXmlPredicatedResponseHandler.java:86)
    at software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handle(AwsXmlPredicatedResponseHandler.java:44)
    at software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler$Crc32ValidationResponseHandler.handle(AwsSyncClientHandler.java:94)
    at software.amazon.awssdk.core.internal.handler.BaseClientHandler.lambda$successTransformationResponseHandler$4(BaseClientHandler.java:215)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.HandleResponseStage.execute(HandleResponseStage.java:40)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.HandleResponseStage.execute(HandleResponseStage.java:30)
    at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:74)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:43)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:78)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:40)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage$RetryExecutor.doExecute(RetryableStage.java:114)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage$RetryExecutor.execute(RetryableStage.java:87)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:63)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:43)
    at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
    at software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:57)
    at software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:37)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.executeWithTimer(ApiCallTimeoutTrackingStage.java:81)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:61)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:43)
    at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
    at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:37)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:26)
    at software.amazon.awssdk.core.internal.http.AmazonSyncHttpClient$RequestExecutionBuilderImpl.execute(AmazonSyncHttpClient.java:198)
    at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.invoke(BaseSyncClientHandler.java:122)
    at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.doExecute(BaseSyncClientHandler.java:148)
    at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.execute(BaseSyncClientHandler.java:102)
    at software.amazon.awssdk.core.client.handler.SdkSyncClientHandler.execute(SdkSyncClientHandler.java:45)
    at software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler.execute(AwsSyncClientHandler.java:55)
    at software.amazon.awssdk.services.s3.DefaultS3Client.createMultipartUpload(DefaultS3Client.java:1410)
    at org.apache.iceberg.aws.s3.S3OutputStream.initializeMultiPartUpload(S3OutputStream.java:209)
    at org.apache.iceberg.aws.s3.S3OutputStream.write(S3OutputStream.java:168)
    at java.io.OutputStream.write(OutputStream.java:122)
    at org.apache.parquet.io.DelegatingPositionOutputStream.write(DelegatingPositionOutputStream.java:56)
    at org.apache.parquet.bytes.ConcatenatingByteArrayCollector.writeAllTo(ConcatenatingByteArrayCollector.java:46)
    at org.apache.parquet.hadoop.ParquetFileWriter.writeColumnChunk(ParquetFileWriter.java:620)
    at org.apache.parquet.hadoop.ColumnChunkPageWriteStore$ColumnChunkPageWriter.writeToFileWriter(ColumnChunkPageWriteStore.java:241)
    at org.apache.parquet.hadoop.ColumnChunkPageWriteStore.flushToFileWriter(ColumnChunkPageWriteStore.java:319)
    at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:566)
    at org.apache.iceberg.common.DynMethods$UnboundMethod.invokeChecked(DynMethods.java:65)
    at org.apache.iceberg.common.DynMethods$UnboundMethod.invoke(DynMethods.java:77)
    at org.apache.iceberg.common.DynMethods$BoundMethod.invoke(DynMethods.java:180)
    at org.apache.iceberg.parquet.ParquetWriter.flushRowGroup(ParquetWriter.java:176)
    at org.apache.iceberg.parquet.ParquetWriter.close(ParquetWriter.java:211)
    at org.apache.iceberg.io.DataWriter.close(DataWriter.java:71)
    at org.apache.iceberg.io.BaseTaskWriter$BaseRollingWriter.closeCurrent(BaseTaskWriter.java:282)
    at org.apache.iceberg.io.BaseTaskWriter$BaseRollingWriter.close(BaseTaskWriter.java:298)
    at org.apache.iceberg.io.PartitionedWriter.close(PartitionedWriter.java:82)
    at org.apache.iceberg.io.BaseTaskWriter.complete(BaseTaskWriter.java:83)
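
A note on reading the trace: the 415 is raised from createMultipartUpload, which is reached through S3OutputStream.initializeMultiPartUpload during a write issued while the Parquet writer flushes a row group on close. One possible reading, not confirmed anywhere in this thread, is that smaller writes never reach the multipart path at all, which would explain why the failure only appears above a certain row count. The sketch below illustrates that general pattern with a toy stream; it is not Iceberg's S3OutputStream, and every name in it (Store, ThresholdOutputStream, the threshold value) is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical illustration only; NOT Iceberg's S3OutputStream.
class ThresholdStreamSketch {

    /** Stand-in for a remote object store, reduced to the two calls used here. */
    interface Store {
        void startMultipart() throws IOException;     // assumed multipart initiation call
        void putWhole(byte[] data) throws IOException; // assumed single-request upload
    }

    /** Stages bytes locally and switches to multipart once a size threshold is reached. */
    static class ThresholdOutputStream extends OutputStream {
        private final Store store;
        private final int threshold;
        private final ByteArrayOutputStream staged = new ByteArrayOutputStream();
        private boolean multipart = false;

        ThresholdOutputStream(Store store, int threshold) {
            this.store = store;
            this.threshold = threshold;
        }

        @Override
        public void write(int b) throws IOException {
            staged.write(b);
            if (!multipart && staged.size() >= threshold) {
                // A rejection of the multipart initiation surfaces here, inside write().
                store.startMultipart();
                multipart = true;
            }
        }

        @Override
        public void close() throws IOException {
            if (!multipart) {
                store.putWhole(staged.toByteArray()); // small payloads never touch multipart
            }
            // Part upload and completion are omitted from this sketch.
        }
    }

    /** Writes `bytes` zero bytes to a store that rejects multipart initiation. */
    static String run(int bytes, int threshold) {
        Store rejectsMultipart = new Store() {
            public void startMultipart() throws IOException {
                throw new IOException("415 on multipart start");
            }
            public void putWhole(byte[] data) { /* accepted */ }
        };
        try {
            OutputStream out = new ThresholdOutputStream(rejectsMultipart, threshold);
            out.write(new byte[bytes]);
            out.close();
            return "ok";
        } catch (IOException e) {
            return "failed: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(run(10, 100));  // below the threshold
        System.out.println(run(200, 100)); // crosses the threshold
    }
}
```

With a store that only rejects multipart initiation, the small write succeeds and the large one fails, which matches the shape of the symptom reported above. Whether that is the actual cause here would still need to be checked against the S3FileIO configuration and the endpoint in use.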