I’ll try to upgrade the version and retry.

Thanks,
Mayur

From: Jack Ye <yezhao...@gmail.com>
Sent: Thursday, September 23, 2021 2:35 PM
To: Iceberg Dev List <dev@iceberg.apache.org>
Subject: Re: Error when writing large number of rows with S3FileIO

Thanks. While I look into this, I noticed that this is a very old version. Is
there any reason to use that version specifically? Have you tried a newer
one? There have been quite a few upload-related updates to the S3 package
since then, so upgrading may solve the problem.

-Jack

On Thu, Sep 23, 2021 at 11:02 AM Mayur Srivastava 
<mayur.srivast...@twosigma.com> wrote:
No problem Jack.

I’m using https://mvnrepository.com/artifact/software.amazon.awssdk/s3/2.10.53

Thanks,
Mayur

From: Jack Ye <yezhao...@gmail.com>
Sent: Thursday, September 23, 2021 1:24 PM
To: Iceberg Dev List <dev@iceberg.apache.org>
Subject: Re: Error when writing large number of rows with S3FileIO

Hi Mayur,

Thanks for reporting this issue. Which version of the AWS SDK V2 are you
using?

Best,
Jack Ye

On Thu, Sep 23, 2021 at 8:39 AM Mayur Srivastava 
<mayur.srivast...@twosigma.com> wrote:
Hi,

I have an Iceberg table with 400+ columns and >100k rows, partitioned monthly
on a single "time" column. I'm writing Parquet files with
PartitionedWriter<Record> + S3FileIO. The writer works when I write fewer
than ~50k rows, but it fails with the exception below when I write more than
~50k rows. With HadoopFileIO, however, the writer works for the full >100k
rows.

Has anyone seen this error before and found a way to fix it?

The writer code is as follows:
AppendFiles append = table.newAppend();

for (GenericRecord record : records) {
    writer.write(record);
}

Arrays.stream(writer.complete().dataFiles()).forEach(append::appendFile);
append.commit();

Thanks,
Mayur

software.amazon.awssdk.services.s3.model.S3Exception: The specified media type is unsupported. Content type binary/octet-stream is not legal. (Service: S3, Status Code: 415, Request ID: xxxxxx)
    at software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handleErrorResponse(AwsXmlPredicatedResponseHandler.java:158)
    at software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handleResponse(AwsXmlPredicatedResponseHandler.java:108)
    at software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handle(AwsXmlPredicatedResponseHandler.java:86)
    at software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handle(AwsXmlPredicatedResponseHandler.java:44)
    at software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler$Crc32ValidationResponseHandler.handle(AwsSyncClientHandler.java:94)
    at software.amazon.awssdk.core.internal.handler.BaseClientHandler.lambda$successTransformationResponseHandler$4(BaseClientHandler.java:215)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.HandleResponseStage.execute(HandleResponseStage.java:40)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.HandleResponseStage.execute(HandleResponseStage.java:30)
    at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:74)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:43)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:78)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:40)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage$RetryExecutor.doExecute(RetryableStage.java:114)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage$RetryExecutor.execute(RetryableStage.java:87)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:63)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:43)
    at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
    at software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:57)
    at software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:37)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.executeWithTimer(ApiCallTimeoutTrackingStage.java:81)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:61)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:43)
    at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
    at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:37)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:26)
    at software.amazon.awssdk.core.internal.http.AmazonSyncHttpClient$RequestExecutionBuilderImpl.execute(AmazonSyncHttpClient.java:198)
    at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.invoke(BaseSyncClientHandler.java:122)
    at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.doExecute(BaseSyncClientHandler.java:148)
    at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.execute(BaseSyncClientHandler.java:102)
    at software.amazon.awssdk.core.client.handler.SdkSyncClientHandler.execute(SdkSyncClientHandler.java:45)
    at software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler.execute(AwsSyncClientHandler.java:55)
    at software.amazon.awssdk.services.s3.DefaultS3Client.createMultipartUpload(DefaultS3Client.java:1410)
    at org.apache.iceberg.aws.s3.S3OutputStream.initializeMultiPartUpload(S3OutputStream.java:209)
    at org.apache.iceberg.aws.s3.S3OutputStream.write(S3OutputStream.java:168)
    at java.io.OutputStream.write(OutputStream.java:122)
    at org.apache.parquet.io.DelegatingPositionOutputStream.write(DelegatingPositionOutputStream.java:56)
    at org.apache.parquet.bytes.ConcatenatingByteArrayCollector.writeAllTo(ConcatenatingByteArrayCollector.java:46)
    at org.apache.parquet.hadoop.ParquetFileWriter.writeColumnChunk(ParquetFileWriter.java:620)
    at org.apache.parquet.hadoop.ColumnChunkPageWriteStore$ColumnChunkPageWriter.writeToFileWriter(ColumnChunkPageWriteStore.java:241)
    at org.apache.parquet.hadoop.ColumnChunkPageWriteStore.flushToFileWriter(ColumnChunkPageWriteStore.java:319)
    at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:566)
    at org.apache.iceberg.common.DynMethods$UnboundMethod.invokeChecked(DynMethods.java:65)
    at org.apache.iceberg.common.DynMethods$UnboundMethod.invoke(DynMethods.java:77)
    at org.apache.iceberg.common.DynMethods$BoundMethod.invoke(DynMethods.java:180)
    at org.apache.iceberg.parquet.ParquetWriter.flushRowGroup(ParquetWriter.java:176)
    at org.apache.iceberg.parquet.ParquetWriter.close(ParquetWriter.java:211)
    at org.apache.iceberg.io.DataWriter.close(DataWriter.java:71)
    at org.apache.iceberg.io.BaseTaskWriter$BaseRollingWriter.closeCurrent(BaseTaskWriter.java:282)
    at org.apache.iceberg.io.BaseTaskWriter$BaseRollingWriter.close(BaseTaskWriter.java:298)
    at org.apache.iceberg.io.PartitionedWriter.close(PartitionedWriter.java:82)
    at org.apache.iceberg.io.BaseTaskWriter.complete(BaseTaskWriter.java:83)
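
The Iceberg frames in the trace (S3OutputStream.write calling
initializeMultiPartUpload, which calls createMultipartUpload) suggest one
possible reason the failure depends on row count: if the stream only starts a
multipart upload once the bytes written pass some threshold, small files would
never issue the request that S3 rejects with 415. The thread does not confirm
this, so the following is only a toy Java model of such a switch, not
Iceberg's implementation. The class name, the threshold parameter, and the
recorded request names are all assumptions made for illustration.

```java
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

/**
 * Toy model (hypothetical, not Iceberg code): records which kind of upload
 * request a size-threshold output stream would issue. No data is sent anywhere.
 */
final class ThresholdUploadSketch extends OutputStream {
    private final long multipartThresholdBytes; // assumed, illustrative parameter
    private final List<String> calls = new ArrayList<>();
    private long bytesWritten = 0;
    private boolean multipartStarted = false;
    private boolean closed = false;

    ThresholdUploadSketch(long multipartThresholdBytes) {
        this.multipartThresholdBytes = multipartThresholdBytes;
    }

    @Override
    public void write(int b) {
        record(1);
    }

    @Override
    public void write(byte[] buf, int off, int len) {
        Objects.checkFromIndexSize(off, len, buf.length);
        record(len);
    }

    // Count the bytes; start the multipart path the first time the threshold is reached.
    private void record(int len) {
        bytesWritten += len;
        if (!multipartStarted && bytesWritten >= multipartThresholdBytes) {
            multipartStarted = true;
            calls.add("createMultipartUpload");
        }
    }

    @Override
    public void close() {
        if (closed) {
            return;
        }
        closed = true;
        // Below the threshold, a single request uploads everything at close time.
        calls.add(multipartStarted ? "completeMultipartUpload" : "putObject");
    }

    List<String> calls() {
        return List.copyOf(calls);
    }

    public static void main(String[] args) {
        ThresholdUploadSketch small = new ThresholdUploadSketch(100);
        small.write(new byte[50], 0, 50);
        small.close();
        System.out.println(small.calls());

        ThresholdUploadSketch large = new ThresholdUploadSketch(100);
        large.write(new byte[150], 0, 150);
        large.close();
        System.out.println(large.calls());
    }
}
```

With a threshold of 100 bytes, main prints [putObject] for the 50-byte write
and [createMultipartUpload, completeMultipartUpload] for the 150-byte write.
Under this reading, only the larger write reaches the multipart request, which
would match the ~50k-row boundary described above.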
