[jira] [Updated] (KUDU-3564) Range specific hashing table when queried with InList predicate may lead to incorrect results

Alexey Serbin (Jira) Fri, 05 Apr 2024 22:58:06 -0700


     [ https://issues.apache.org/jira/browse/KUDU-3564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Alexey Serbin updated KUDU-3564:
--------------------------------
    Description: 
Reproduction steps, copied from the Slack channel:
 
{code:sql}
-- create the table and data in Impala:
CREATE TABLE age_table
(
id BIGINT,
name STRING,
age INT,
PRIMARY KEY(id,name,age)
)
PARTITION BY HASH (id) PARTITIONS 4,
HASH (name) PARTITIONS 4,
range (age)
( 
PARTITION 30 <= VALUES < 60,
PARTITION 60 <= VALUES < 90
) 
STORED AS KUDU 
TBLPROPERTIES ('kudu.num_tablet_replicas' = '1');

ALTER TABLE age_table ADD RANGE PARTITION 90<= VALUES <120
HASH(id) PARTITIONS 3 HASH(name) PARTITIONS 3;

INSERT INTO age_table VALUES (3, 'alex', 50);
INSERT INTO age_table VALUES (12, 'bob', 100);
{code}

Now, let's run a few queries using the {{kudu table scan}} CLI tool:
{noformat}
# This query produces wrong results: the expected row for 'bob' isn't returned.
# Note that the troublesome row is in the range partition with custom (per-range) hash schema.
$ sudo -u kudu kudu table scan <master.url> default.age_table -columns=id,age -predicates='["AND", ["IN", "id", [12,20]]]'
Total count 0 cost 0.0224966 seconds

# This query produces correct results: the expected row for 'alex' is returned.
$ sudo -u kudu kudu table scan <master.url> default.age_table -columns=id,age -predicates='["AND", ["IN", "id", [3,20]]]'
(int64 id=3, int32 age=50)
Total count 1 cost 0.0178102 seconds

# However, equality predicates on the primary key columns seem to work as expected, even for the rows in the range with custom hash schema.
$ sudo -u kudu kudu table scan <master.url> default.age_table -columns=id,age -predicates='["AND", ["=", "id", 12]]'
(int64 id=12, int32 age=100)
Total count 1 cost 0.0137217 seconds
{noformat}
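
When repeating the scans above with other IN-list values, the predicate JSON is easy to mistype. The following is a minimal sketch of a shell helper that prints a predicate in the same shape as the ones used above. The function name {{in_list_predicate}} is hypothetical and is not part of the kudu CLI; the sketch assumes a numeric column, since the values are emitted unquoted.

```shell
#!/bin/sh
# Hypothetical helper (not part of the kudu CLI): prints a single-column
# IN-list predicate in the JSON shape used by the scans above, for example
# ["AND", ["IN", "id", [12,20]]]. Values are emitted unquoted, so this
# sketch only suits numeric columns such as id.
in_list_predicate() {
  column="$1"
  shift
  # Join the remaining arguments with commas, then drop the trailing comma.
  values=$(printf '%s,' "$@")
  printf '["AND", ["IN", "%s", [%s]]]\n' "$column" "${values%,}"
}

# Print the two IN-list predicates from the reproduction steps.
in_list_predicate id 12 20
in_list_predicate id 3 20
```

The output can be passed to the scan in place of the literal string, for instance {{-predicates="$(in_list_predicate id 12 20)"}}.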

  was:
Reproduction steps, copied from the Slack channel:
 
{code:sql}
// create the table and data in Impala:
CREATE TABLE age_table
(
id BIGINT,
name STRING,
age INT,
PRIMARY KEY(id,name,age)
)
PARTITION BY HASH (id) PARTITIONS 4,
HASH (name) PARTITIONS 4,
range (age)
( 
PARTITION 30 <= VALUES < 60,
PARTITION 60 <= VALUES < 90
) 
STORED AS KUDU 
TBLPROPERTIES ('kudu.num_tablet_replicas' = '1');

ALTER TABLE age_table ADD RANGE PARTITION 90<= VALUES <120
HASH(id) PARTITIONS 3 HASH(name) PARTITIONS 3;


INSERT INTO age_table VALUES (3, 'alex', 50);
INSERT INTO age_table VALUES (12, 'bob', 100);

# This query produces wrong results: the expected row for 'bob' isn't returned.
# Note that the troublesome row is in the range partition with custom (per-range) hash schema.
sudo -u kudu kudu table scan <master.url> default.age_table -columns=id,age -predicates='["AND", ["IN", "id", [12,20]]]'
Total count 0 cost 0.0224966 seconds

# This query produces correct results: the expected row for 'alex' is returned.
sudo -u kudu kudu table scan <master.url> default.age_table -columns=id,age -predicates='["AND", ["IN", "id", [3,20]]]'
(int64 id=3, int32 age=50)
Total count 1 cost 0.0178102 seconds

# However, equality predicates on the primary key columns seem to work as expected, even for the rows in the range with custom hash schema.
sudo -u kudu kudu table scan <master.url> default.age_table -columns=id,age -predicates='["AND", ["=", "id", 12]]'
(int64 id=12, int32 age=100)
Total count 1 cost 0.0137217 seconds

{code}


> Range specific hashing table when queried with InList predicate may lead to incorrect results
> ---------------------------------------------------------------------------------------------
>
>                 Key: KUDU-3564
>                 URL: https://issues.apache.org/jira/browse/KUDU-3564
>             Project: Kudu
>          Issue Type: Bug
>    Affects Versions: 1.17.0
>            Reporter: YifanZhang
>            Priority: Major
>
> Reproduction steps, copied from the Slack channel:
>  
> {code:sql}
> -- create the table and data in Impala:
> CREATE TABLE age_table
> (
> id BIGINT,
> name STRING,
> age INT,
> PRIMARY KEY(id,name,age)
> )
> PARTITION BY HASH (id) PARTITIONS 4,
> HASH (name) PARTITIONS 4,
> range (age)
> ( 
> PARTITION 30 <= VALUES < 60,
> PARTITION 60 <= VALUES < 90
> ) 
> STORED AS KUDU 
> TBLPROPERTIES ('kudu.num_tablet_replicas' = '1');
> ALTER TABLE age_table ADD RANGE PARTITION 90<= VALUES <120
> HASH(id) PARTITIONS 3 HASH(name) PARTITIONS 3;
> INSERT INTO age_table VALUES (3, 'alex', 50);
> INSERT INTO age_table VALUES (12, 'bob', 100);
> {code}
> Now, let's run a few queries using the {{kudu table scan}} CLI tool:
> {noformat}
> # This query produces wrong results: the expected row for 'bob' isn't returned.
> # Note that the troublesome row is in the range partition with custom (per-range) hash schema.
> $ sudo -u kudu kudu table scan <master.url> default.age_table -columns=id,age -predicates='["AND", ["IN", "id", [12,20]]]'
> Total count 0 cost 0.0224966 seconds
> # This query produces correct results: the expected row for 'alex' is returned.
> $ sudo -u kudu kudu table scan <master.url> default.age_table -columns=id,age -predicates='["AND", ["IN", "id", [3,20]]]'
> (int64 id=3, int32 age=50)
> Total count 1 cost 0.0178102 seconds
> # However, equality predicates on the primary key columns seem to work as expected, even for the rows in the range with custom hash schema.
> $ sudo -u kudu kudu table scan <master.url> default.age_table -columns=id,age -predicates='["AND", ["=", "id", 12]]'
> (int64 id=12, int32 age=100)
> Total count 1 cost 0.0137217 seconds
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
