Hi Mayur,

Thanks for reporting this issue. Could you tell us which version of AWS SDK
V2 you are using?

Best,
Jack Ye

On Thu, Sep 23, 2021 at 8:39 AM Mayur Srivastava <
mayur.srivast...@twosigma.com> wrote:

> Hi,
>
> I have an Iceberg table with 400+ columns and >100k rows, partitioned
> monthly on a single "time" column. I'm using Parquet files and
> PartitionedWriter<Record> + S3FileIO to write the data. When I write fewer
> than ~50k rows, the writer works, but it fails with the exception below if
> I write more than ~50k rows. The writer does, however, work for the full
> set of >100k rows if I use HadoopFileIO.
>
> Has anyone seen this error before, or does anyone know a way to fix it?
>
> The writer code is as follows:
>
> AppendFiles append = table.newAppend();
>
> for (GenericRecord record : records) {
>     writer.write(record);
> }
>
> Arrays.stream(writer.complete().dataFiles()).forEach(append::appendFile);
> append.commit();
>
> Thanks,
> Mayur
>
> software.amazon.awssdk.services.s3.model.S3Exception: The specified media
> type is unsupported. Content type binary/octet-stream is not legal.
> (Service: S3, Status Code: 415, Request ID: xxxxxx)
>
>     at software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handleErrorResponse(AwsXmlPredicatedResponseHandler.java:158)
>     at software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handleResponse(AwsXmlPredicatedResponseHandler.java:108)
>     at software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handle(AwsXmlPredicatedResponseHandler.java:86)
>     at software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handle(AwsXmlPredicatedResponseHandler.java:44)
>     at software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler$Crc32ValidationResponseHandler.handle(AwsSyncClientHandler.java:94)
>     at software.amazon.awssdk.core.internal.handler.BaseClientHandler.lambda$successTransformationResponseHandler$4(BaseClientHandler.java:215)
>     at software.amazon.awssdk.core.internal.http.pipeline.stages.HandleResponseStage.execute(HandleResponseStage.java:40)
>     at software.amazon.awssdk.core.internal.http.pipeline.stages.HandleResponseStage.execute(HandleResponseStage.java:30)
>     at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
>     at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:74)
>     at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:43)
>     at software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:78)
>     at software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:40)
>     at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage$RetryExecutor.doExecute(RetryableStage.java:114)
>     at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage$RetryExecutor.execute(RetryableStage.java:87)
>     at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:63)
>     at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:43)
>     at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
>     at software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:57)
>     at software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:37)
>     at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.executeWithTimer(ApiCallTimeoutTrackingStage.java:81)
>     at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:61)
>     at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:43)
>     at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
>     at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
>     at software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:37)
>     at software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:26)
>     at software.amazon.awssdk.core.internal.http.AmazonSyncHttpClient$RequestExecutionBuilderImpl.execute(AmazonSyncHttpClient.java:198)
>     at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.invoke(BaseSyncClientHandler.java:122)
>     at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.doExecute(BaseSyncClientHandler.java:148)
>     at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.execute(BaseSyncClientHandler.java:102)
>     at software.amazon.awssdk.core.client.handler.SdkSyncClientHandler.execute(SdkSyncClientHandler.java:45)
>     at software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler.execute(AwsSyncClientHandler.java:55)
>     at software.amazon.awssdk.services.s3.DefaultS3Client.createMultipartUpload(DefaultS3Client.java:1410)
>     at org.apache.iceberg.aws.s3.S3OutputStream.initializeMultiPartUpload(S3OutputStream.java:209)
>     at org.apache.iceberg.aws.s3.S3OutputStream.write(S3OutputStream.java:168)
>     at java.io.OutputStream.write(OutputStream.java:122)
>     at org.apache.parquet.io.DelegatingPositionOutputStream.write(DelegatingPositionOutputStream.java:56)
>     at org.apache.parquet.bytes.ConcatenatingByteArrayCollector.writeAllTo(ConcatenatingByteArrayCollector.java:46)
>     at org.apache.parquet.hadoop.ParquetFileWriter.writeColumnChunk(ParquetFileWriter.java:620)
>     at org.apache.parquet.hadoop.ColumnChunkPageWriteStore$ColumnChunkPageWriter.writeToFileWriter(ColumnChunkPageWriteStore.java:241)
>     at org.apache.parquet.hadoop.ColumnChunkPageWriteStore.flushToFileWriter(ColumnChunkPageWriteStore.java:319)
>     at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:566)
>     at org.apache.iceberg.common.DynMethods$UnboundMethod.invokeChecked(DynMethods.java:65)
>     at org.apache.iceberg.common.DynMethods$UnboundMethod.invoke(DynMethods.java:77)
>     at org.apache.iceberg.common.DynMethods$BoundMethod.invoke(DynMethods.java:180)
>     at org.apache.iceberg.parquet.ParquetWriter.flushRowGroup(ParquetWriter.java:176)
>     at org.apache.iceberg.parquet.ParquetWriter.close(ParquetWriter.java:211)
>     at org.apache.iceberg.io.DataWriter.close(DataWriter.java:71)
>     at org.apache.iceberg.io.BaseTaskWriter$BaseRollingWriter.closeCurrent(BaseTaskWriter.java:282)
>     at org.apache.iceberg.io.BaseTaskWriter$BaseRollingWriter.close(BaseTaskWriter.java:298)
>     at org.apache.iceberg.io.PartitionedWriter.close(PartitionedWriter.java:82)
>     at org.apache.iceberg.io.BaseTaskWriter.complete(BaseTaskWriter.java:83)
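
One possible reading of the trace, offered as an assumption rather than a
confirmed diagnosis: the exception is raised inside createMultipartUpload,
which is reached from S3OutputStream.initializeMultiPartUpload. If the stream
only switches to a multipart upload once the written data passes a size
threshold, smaller writes would never issue that request, which would fit the
~50k-row cutoff described in the report. The toy class below is a sketch of
that threshold behaviour only. It is not Iceberg's S3OutputStream; the class
name, the threshold value, and the recorded call list are hypothetical, and
no S3 request is made.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;

/** Toy stream that records which upload calls a threshold-based writer might issue. */
class ThresholdUploadStream extends OutputStream {
    private final int multipartThreshold; // hypothetical threshold, in bytes
    private final ByteArrayOutputStream staged = new ByteArrayOutputStream();
    private final List<String> calls = new ArrayList<>();
    private boolean multipartStarted = false;
    private boolean closed = false;

    ThresholdUploadStream(int multipartThreshold) {
        this.multipartThreshold = multipartThreshold;
    }

    @Override
    public void write(int b) {
        staged.write(b);
        maybeStartMultipart();
    }

    @Override
    public void write(byte[] b, int off, int len) {
        staged.write(b, off, len);
        maybeStartMultipart();
    }

    // The multipart request is recorded only once the staged size reaches the threshold.
    private void maybeStartMultipart() {
        if (!multipartStarted && staged.size() >= multipartThreshold) {
            multipartStarted = true;
            calls.add("createMultipartUpload");
        }
    }

    @Override
    public void close() {
        if (closed) {
            return;
        }
        closed = true;
        // Small payloads finish with a single request; large ones finish the multipart upload.
        calls.add(multipartStarted ? "completeMultipartUpload" : "putObject");
    }

    List<String> calls() {
        return calls;
    }

    public static void main(String[] args) {
        ThresholdUploadStream small = new ThresholdUploadStream(1024);
        small.write(new byte[100], 0, 100);
        small.close();
        System.out.println(small.calls()); // [putObject]

        ThresholdUploadStream large = new ThresholdUploadStream(1024);
        large.write(new byte[2048], 0, 2048);
        large.close();
        System.out.println(large.calls()); // [createMultipartUpload, completeMultipartUpload]
    }
}
```

In this sketch only the larger payload ever records createMultipartUpload, so
an endpoint that rejects that one request would fail large writes while small
ones succeed. Whether that is what happens here depends on the SDK version
and the S3 endpoint in use.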
