[ 
https://issues.apache.org/jira/browse/HIVE-26657?focusedWorklogId=819135&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-819135
 ]

ASF GitHub Bot logged work on HIVE-26657:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 21/Oct/22 13:31
            Start Date: 21/Oct/22 13:31
    Worklog Time Spent: 10m 
      Work Description: sonarcloud[bot] commented on PR #3695:
URL: https://github.com/apache/hive/pull/3695#issuecomment-1286966296

   Kudos, SonarCloud Quality Gate passed! [Quality Gate passed](https://sonarcloud.io/dashboard?id=apache_hive&pullRequest=3695)

   [0 Bugs](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=3695&resolved=false&types=BUG) (rating A)
   [0 Vulnerabilities](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=3695&resolved=false&types=VULNERABILITY) (rating A)
   [0 Security Hotspots](https://sonarcloud.io/project/security_hotspots?id=apache_hive&pullRequest=3695&resolved=false&types=SECURITY_HOTSPOT) (rating A)
   [0 Code Smells](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=3695&resolved=false&types=CODE_SMELL) (rating A)

   [No Coverage information](https://sonarcloud.io/component_measures?id=apache_hive&pullRequest=3695&metric=coverage&view=list)
   [No Duplication information](https://sonarcloud.io/component_measures?id=apache_hive&pullRequest=3695&metric=duplicated_lines_density&view=list)
   
   




Issue Time Tracking
-------------------

    Worklog Id:     (was: 819135)
    Time Spent: 0.5h  (was: 20m)

> [Iceberg] Filter out the metadata.json file when migrating 
> -----------------------------------------------------------
>
>                 Key: HIVE-26657
>                 URL: https://issues.apache.org/jira/browse/HIVE-26657
>             Project: Hive
>          Issue Type: Bug
>            Reporter: László Pintér
>            Assignee: László Pintér
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> When migrating a Hive table to Iceberg, in certain cases a 
> RuntimeException is raised:
> {code:java}
> ERROR : Failed
> java.lang.RuntimeException: 
> s3a://dev-nfqe-base/cc-cdw-nfqe-q7wj9a/archive/env-8pt556/parquet/bakeoff/large/pli/metadata/00000-94fffe5c-c307-4341-9ea3-f5fa4863d301.metadata.json
>  is not a Parquet file. Expected magic number at tail, but found [32, 93, 10, 
> 125]
> {code}
> The Hive-to-Iceberg table migration follows this logic:
> 1. To walk through all the data files, we request a file iterator from the 
> filesystem. This iterator provides the references needed to scan the data 
> files.
> 2. The new Iceberg table is created: a new entry is added to the Hive 
> catalog and, at the filesystem level, the metadata directory is created 
> together with the first metadata file (*.metadata.json).
> 3. All the data files are scanned and the manifests are created.
> The issue occurs when there are so many data files that the listing doesn't 
> fit into memory in one go. In that case, while step 3 walks through the list 
> of data files, the iterator has to run another round of file listing, which 
> also picks up the contents of the metadata directory created in step 2. The 
> *.metadata.json file is then treated as a data file, and reading it as 
> Parquet fails with the error above.
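
As the issue title suggests, one way to avoid the failure is to skip metadata files while iterating over the listing. The following is a minimal sketch of that idea, not the code from the pull request: the class `MetadataFileFilter`, its methods, and the use of plain `String` paths instead of Hadoop file statuses are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

// Hypothetical helper; names are illustrative and not taken from Hive or Iceberg.
class MetadataFileFilter {

    // Assumption: a file is treated as table metadata if its name ends with ".metadata.json".
    static final Predicate<String> IS_DATA_FILE =
        path -> !path.endsWith(".metadata.json");

    // Wraps a lazily evaluated listing and yields only the elements accepted by `keep`.
    // The source is consumed one element at a time, so a listing that is fetched
    // in several rounds is still filtered as it is produced.
    static <T> Iterator<T> filter(Iterator<T> source, Predicate<? super T> keep) {
        return new Iterator<T>() {
            private T pending;
            private boolean ready;

            @Override
            public boolean hasNext() {
                // Advance until an accepted element is buffered or the source is exhausted.
                while (!ready && source.hasNext()) {
                    T candidate = source.next();
                    if (keep.test(candidate)) {
                        pending = candidate;
                        ready = true;
                    }
                }
                return ready;
            }

            @Override
            public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                ready = false;
                T result = pending;
                pending = null;
                return result;
            }
        };
    }

    // Collects the paths that should be scanned as data files.
    static List<String> dataFiles(Iterator<String> listing) {
        List<String> result = new ArrayList<>();
        Iterator<String> filtered = filter(listing, IS_DATA_FILE);
        while (filtered.hasNext()) {
            result.add(filtered.next());
        }
        return result;
    }
}
```

With this sketch, a listing that contains two Parquet files and one `*.metadata.json` file would yield only the two Parquet paths from `MetadataFileFilter.dataFiles(listing.iterator())`. Filtering inside the iterator, rather than on a fully materialized list, keeps the check effective even when later listing rounds return files created after the iterator was requested.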



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
