[ https://issues.apache.org/jira/browse/IMPALA-13691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17921560#comment-17921560 ]
ASF subversion and git services commented on IMPALA-13691:
----------------------------------------------------------
Commit fb45c786e9527d00844dfe986bf624ec5181cb31 in impala's branch
refs/heads/master from stiga-huang
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=fb45c786e ]
IMPALA-13691: Partition values from HMS events don't need URL decoding
Hive uses URL encoding to format the partition strings when creating the
partition folders, e.g. "00:00:00" will be encoded into "00%3A00%3A00".
If you create a partition of the string-type partition column "p" using
"00:00:00" as the partition value, the underlying partition folder is
"p=00%3A00%3A00".
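The following Java sketch illustrates this encoding by percent-encoding a partition value the way Hive names partition folders. It is a simplification loosely modeled on Hive's FileUtils.escapePathName. The escaped character set below is an assumed subset (Hive's real set also covers control characters), and the class and method names are hypothetical. Note that '%' is itself escaped, so a value that already contains "%3A" becomes "%253A" in the folder name.

```java
public class PartitionPathEscape {
    // Simplified subset of the characters Hive escapes in partition folder
    // names (assumption: Hive's real set is larger, e.g. control characters).
    private static final String CHARS_TO_ESCAPE = "\"#%'*/:=?\\{[]^";

    // Replaces each character in CHARS_TO_ESCAPE with "%XX" (uppercase hex).
    static String escapePathName(String value) {
        StringBuilder sb = new StringBuilder();
        for (char c : value.toCharArray()) {
            if (CHARS_TO_ESCAPE.indexOf(c) >= 0) {
                sb.append('%').append(String.format("%02X", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Builds the "column=value" folder name for one partition column.
    static String partitionFolder(String column, String value) {
        return escapePathName(column) + "=" + escapePathName(value);
    }

    public static void main(String[] args) {
        System.out.println(partitionFolder("p", "00:00:00"));     // p=00%3A00%3A00
        System.out.println(partitionFolder("p", "00%3A00%3A00")); // p=00%253A00%253A00
    }
}
```

Escaping an already-escaped value accounts for folder names such as "00%25253A00%25253A00" in the reproduction: each pass turns a literal '%' into "%25".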
When parsing partition folder names, Impala URL-decodes them to get the
correct partition values. This is correct for the ALTER TABLE RECOVER
PARTITIONS command, which gets the partition strings from the file paths.
However, partition strings that come from HMS events are the original
partition values and are not URL-encoded, so Impala shouldn't URL-decode
them. Decoding them anyway causes HMS events on partitions whose values
contain percent signs to be matched to the wrong partitions.
This patch fixes the issue by only URL-decoding the partition strings
that come from file paths.
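Below is a minimal Java sketch of the idea. It assumes a hypothetical Source flag that records where a partition string came from, and Impala's actual change may be structured differently. The simplified unescapePathName decodes only well-formed "%XX" sequences, in the spirit of Hive's path-name unescaping. The sketch contrasts the old behavior (decoding the HMS event value, which lands on the partition for "00:00:00") with the fixed behavior (using the event value as-is).

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PartitionValueDecoding {
    // Hypothetical flag recording where a partition string came from.
    enum Source { FILE_PATH, HMS_EVENT }

    // Simplified unescaping: each "%XX" with two hex digits becomes that
    // character; a '%' not followed by two hex digits is kept as-is.
    static String unescapePathName(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '%' && i + 2 < s.length()) {
                int hi = Character.digit(s.charAt(i + 1), 16);
                int lo = Character.digit(s.charAt(i + 2), 16);
                if (hi >= 0 && lo >= 0) {
                    sb.append((char) (hi * 16 + lo));
                    i += 2; // skip the two hex digits
                    continue;
                }
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Only strings parsed from folder names are decoded; values carried by
    // HMS events are already the original partition values.
    static String partitionValue(String raw, Source source) {
        return source == Source.FILE_PATH ? unescapePathName(raw) : raw;
    }

    public static void main(String[] args) {
        // Partition value -> folder, as RECOVER PARTITIONS would register them.
        Map<String, String> valueToFolder = new LinkedHashMap<>();
        for (String folder : new String[] {
                "2024-09-10 00%253A00%253A00", "2024-09-10 00%3A00%3A00"}) {
            valueToFolder.put(partitionValue(folder, Source.FILE_PATH), folder);
        }
        // Value carried by the INSERT event for s="2024-09-10 00%3A00%3A00".
        String eventValue = "2024-09-10 00%3A00%3A00";

        // Old behavior: decoding the event value selects the wrong folder.
        System.out.println(valueToFolder.get(unescapePathName(eventValue)));
        // prints "2024-09-10 00%3A00%3A00"

        // Fixed behavior: the raw event value selects the folder Hive wrote to.
        System.out.println(valueToFolder.get(partitionValue(eventValue, Source.HMS_EVENT)));
        // prints "2024-09-10 00%253A00%253A00"
    }
}
```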
Tests:
- Ran tests/metadata/test_recover_partitions.py
- Added custom-cluster test.
Change-Id: I7ba7fbbed47d39b02fa0b1b86d27dcda5468e344
Reviewed-on: http://gerrit.cloudera.org:8080/22388
Reviewed-by: Wenzhe Zhou <[email protected]>
Reviewed-by: Csaba Ringhofer <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
> Processing INSERT event failed by partition values mismatch
> -----------------------------------------------------------
>
> Key: IMPALA-13691
> URL: https://issues.apache.org/jira/browse/IMPALA-13691
> Project: IMPALA
> Issue Type: Bug
> Components: Catalog
> Reporter: Quanlong Huang
> Assignee: Quanlong Huang
> Priority: Critical
>
> Create a partitioned table:
> {code:sql}
> create external table test_part (i int) partitioned by (s string);{code}
> Add the following partition folders inside the table location:
> {code:bash}
> TBL_DIR=hdfs://localhost:20500/test-warehouse/test_part
> hdfs dfs -mkdir "$TBL_DIR/s=2024-09-09 00%25253A00%25253A00"
> hdfs dfs -mkdir "$TBL_DIR/s=2024-09-09 00%253A00%253A00"
> hdfs dfs -mkdir "$TBL_DIR/s=2024-09-09 00%3A00%3A00"
> hdfs dfs -mkdir "$TBL_DIR/s=2024-09-10 00%25253A00%25253A00"
> hdfs dfs -mkdir "$TBL_DIR/s=2024-09-10 00%253A00%253A00"
> hdfs dfs -mkdir "$TBL_DIR/s=2024-09-10 00%3A00%3A00"
> hdfs dfs -mkdir "$TBL_DIR/s=2025-01-21 00%253A00%253A00"
> hdfs dfs -mkdir "$TBL_DIR/s=2025-01-21 00%3A00%3A00"
> hdfs dfs -mkdir "$TBL_DIR/s=2025-01-22 00%3A00%3A00"{code}
> In Impala, create the partitions using ALTER TABLE RECOVER PARTITIONS:
> {code:sql}
> impala> alter table test_part recover partitions;{code}
> The partition values are inconsistent with the partition folders:
> {noformat}
> Query: show partitions test_part
> +-----------------------------+-------+--------+--------+--------------+-------------------+---------+-------------------+-----------------------------------------------------------------------------------+-----------+
> | s                           | #Rows | #Files | Size   | Bytes Cached | Cache Replication | Format  | Incremental stats | Location                                                                          | EC Policy |
> +-----------------------------+-------+--------+--------+--------------+-------------------+---------+-------------------+-----------------------------------------------------------------------------------+-----------+
> | 2024-09-09 00%253A00%253A00 | -1    | 0      | 0B     | NOT CACHED   | NOT CACHED        | PARQUET | false             | hdfs://localhost:20500/test-warehouse/test_part/s=2024-09-09 00%25253A00%25253A00 | NONE      |
> | 2024-09-09 00%3A00%3A00     | -1    | 0      | 0B     | NOT CACHED   | NOT CACHED        | PARQUET | false             | hdfs://localhost:20500/test-warehouse/test_part/s=2024-09-09 00%253A00%253A00     | NONE      |
> | 2024-09-09 00:00:00         | -1    | 0      | 0B     | NOT CACHED   | NOT CACHED        | PARQUET | false             | hdfs://localhost:20500/test-warehouse/test_part/s=2024-09-09 00%3A00%3A00         | NONE      |
> | 2024-09-10 00%253A00%253A00 | -1    | 0      | 0B     | NOT CACHED   | NOT CACHED        | PARQUET | false             | hdfs://localhost:20500/test-warehouse/test_part/s=2024-09-10 00%25253A00%25253A00 | NONE      |
> | 2024-09-10 00%3A00%3A00     | -1    | 4      | 1.70KB | NOT CACHED   | NOT CACHED        | PARQUET | false             | hdfs://localhost:20500/test-warehouse/test_part/s=2024-09-10 00%253A00%253A00     | NONE      |
> | 2024-09-10 00:00:00         | -1    | 0      | 0B     | NOT CACHED   | NOT CACHED        | PARQUET | false             | hdfs://localhost:20500/test-warehouse/test_part/s=2024-09-10 00%3A00%3A00         | NONE      |
> | 2025-01-21 00%3A00%3A00     | -1    | 0      | 0B     | NOT CACHED   | NOT CACHED        | PARQUET | false             | hdfs://localhost:20500/test-warehouse/test_part/s=2025-01-21 00%253A00%253A00     | NONE      |
> | 2025-01-21 00:00:00         | -1    | 0      | 0B     | NOT CACHED   | NOT CACHED        | PARQUET | false             | hdfs://localhost:20500/test-warehouse/test_part/s=2025-01-21 00%3A00%3A00         | NONE      |
> | 2025-01-22 00:00:00         | -1    | 0      | 0B     | NOT CACHED   | NOT CACHED        | PARQUET | false             | hdfs://localhost:20500/test-warehouse/test_part/s=2025-01-22 00%3A00%3A00         | NONE      |
> | Total                       | -1    | 4      | 1.70KB | 0B           |                   |         |                   |                                                                                   |           |
> +-----------------------------+-------+--------+--------+--------------+-------------------+---------+-------------------+-----------------------------------------------------------------------------------+-----------+{noformat}
> INSERT one partition in Hive:
> {code:sql}
> hive> insert into test_part partition(s="2024-09-10 00%3A00%3A00") values (0);{code}
> The EventProcessor in catalogd failed to process the INSERT event:
> {noformat}
> E0124 12:37:52.303791 1926240 MetastoreEventsProcessor.java:1098] Unexpected exception received while processing event
> Java exception follows:
> java.lang.IllegalArgumentException
>     at com.google.common.base.Preconditions.checkArgument(Preconditions.java:129)
>     at org.apache.impala.catalog.HdfsTable.reloadPartitions(HdfsTable.java:3054)
>     at org.apache.impala.catalog.HdfsTable.reloadPartitionsFromNames(HdfsTable.java:2946)
>     at org.apache.impala.service.CatalogOpExecutor.reloadPartitionsFromNamesIfExists(CatalogOpExecutor.java:5092)
>     at org.apache.impala.service.CatalogOpExecutor.reloadPartitionsIfExist(CatalogOpExecutor.java:5021)
>     at org.apache.impala.catalog.events.MetastoreEvents$MetastoreTableEvent.reloadPartitions(MetastoreEvents.java:1112)
>     at org.apache.impala.catalog.events.MetastoreEvents$InsertEvent.processPartitionInserts(MetastoreEvents.java:1671)
>     at org.apache.impala.catalog.events.MetastoreEvents$InsertEvent.processTableEvent(MetastoreEvents.java:1653)
>     at org.apache.impala.catalog.events.MetastoreEvents$MetastoreTableEvent.process(MetastoreEvents.java:1339)
>     at org.apache.impala.catalog.events.MetastoreEvents$MetastoreEvent.processIfEnabled(MetastoreEvents.java:701)
>     at org.apache.impala.catalog.events.MetastoreEventsProcessor.processEvents(MetastoreEventsProcessor.java:1336)
>     at org.apache.impala.catalog.events.MetastoreEventsProcessor.processEvents(MetastoreEventsProcessor.java:1079)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:750)
> E0124 12:37:52.306558 1926240 MetastoreEventsProcessor.java:1436] Event id: 38879
> Event Type: INSERT
> Event time: 1737693396
> Database name: default
> Table name: test_part
> Event message:
> H4sIAAAAAAAAAO1WW2+bMBj9KxPV3ggxJoUQaQ9pS7VM3VqlTJu0TJELTvHkYGqbTl2V/z5fSBsovajawx6ah9h85/D5u/mIW0dgfo25M3FkwclKToZDyjJECybkJAbjwHENhWT4jJMyIxWiiqys+YVac7xCNZXqUaILirUbLOSyQvzOlt5U2p58T5P5l+nJMp0enCRb8PTi1yfBSoXfLhx/4UzUIiRXm8W9p4WzcRcObKPNyRYL2thVjUrKyksLjixIAu3Bj4IojAM/ijW0vwsBbQkfWCJr4TizmyZKKtTZkx8N4PruwwRIb+Ck1EFvfvZARb4SrQZAsA/AUBdi8BtxXLBa4GGnLp3cGb/0UIWyAnsFyhmrvIJcY++KeoR56q2rGkvvM6o4zs/s06ysannM+BrJVsFe7/G0lh2XTaHl6uV1hq+IQk1qjr0mio8KP8f8CLfqtEaV7Ztx7G5X4N6qlmw0cdxpcEMwDYt7m28xH7zgBM3zn5mo3QNhBzObButmZLGejHzYKn9vlo+PsegdY7USc8M2u4V5JPdAu90qgHk9nX9NDEFyVAqCS7mkSMijnKZkjQ3l/qoa4unBp8Pp2fRgdjJLZ8m5oSiX82R65Kr123yWKo9NiTvBtsXH5uM/mEk/6lxHNUANd9zSEGNqMibhaPs+DHdNMBzDzUYJXSXLPpnrtNvXDgAcDUA88ME7AN4H0+Zv4fSJxVMC2JGIXgWM4ZMK+B/q3VB8aFcI7psa2eVNDd/U8NVqGD0Z7EgP+7NCBcehVTQmET0nfyw4joCxlvX6mFAsEo5EzfEhy3FuCG3YmOBWx+JHBQnsSs3AN0IjVUBConV1d8mDOHRVZSuKMv0NtkJUYEVc6ZNUnv/2Ag6B+S3BMmPVzTLY29tTwvUXNRWTVGIKAAA=
> W0124 12:37:52.306648 1926240 MetastoreEventsProcessor.java:1067] Event processing is skipped since status is ERROR. Last synced event id is 38878{noformat}
> Note that to reproduce the issue after IMPALA-12832, you need to launch
> catalogd with "--invalidate_metadata_on_event_processing_failure=false".
> CC [~hemanth619], [~VenuReddy]
--
This message was sent by Atlassian Jira
(v8.20.10#820010)