YinChunGuang opened a new issue, #13887:
URL: https://github.com/apache/hudi/issues/13887
**Hudi record level index does not work when executing Spark SQL**
The Hudi record level index does not take effect when a query is executed through Spark SQL.
**To Reproduce**
Steps to reproduce the behavior:
1. Create the table:
create table if not exists hudix.hudihudi_table2(
id string,
name string,
price string
) using hudi
options (
type = 'cow',
primaryKey = 'id',
'hoodie.index.type'='RECORD_INDEX',
'hoodie.metadata.record.index.enable'='true'
);
2. Insert data:
insert into hudix.hudihudi_table2 select UUID() AS id, UUID() as id2,
UUID() AS id3 from range(0,10000007,1,200);
3. Execute the query:
3.1 SET hoodie.metadata.record.index.enable=true;
3.2 select * from hudix.hudihudi_table2 where id='1';
20250912145409224 20250912145409224_0_0 1 9922068f-73ab-4a2e-bc3f-dc241e4b0368-0_0-267-6144_20250912152837536.parquet 1 2 3
20250912145553793 20250912145553793_0_2 1 9922068f-73ab-4a2e-bc3f-dc241e4b0368-0_0-267-6144_20250912152837536.parquet 1 2 3
4. The Spark DAG shows that all records are scanned, as follows:
<img width="972" height="336" alt="Image"
src="https://github.com/user-attachments/assets/b9d0e19c-ca77-4d31-bfab-bc0d099ccc02"
/>
<img width="575" height="696" alt="Image"
src="https://github.com/user-attachments/assets/08a1cdca-612b-496c-bc79-a942241aaeeb"
/>
<img width="1784" height="709" alt="Image"
src="https://github.com/user-attachments/assets/c7479136-4cfb-46b9-9101-b778ef451021"
/>
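For a text version of the plan shown in the screenshots, EXPLAIN can be run on the same query. This is only a sketch, assuming the same Spark SQL session and table as in the steps above; its output is not included here.

```sql
-- Print the physical plan for the point lookup on the record key,
-- to check which files and filters the Hudi scan node reports.
SET hoodie.metadata.record.index.enable=true;
EXPLAIN FORMATTED
select * from hudix.hudihudi_table2 where id='1';
```

EXPLAIN FORMATTED is standard Spark SQL (3.0 and later), so it should be available on the Spark 3.3.1 setup described in this report.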
**Expected behavior**
The Hudi record level index does not take effect with Spark SQL. I expected the index
filter to work, so that the query does not scan all records.
**Environment Description**
* Hudi version : 0.15
* Spark version : 3.3.1
* Hive version : 2.3.9
* Hadoop version : 3.3.1
* Storage (HDFS/S3/GCS..) : hdfs