YinChunGuang opened a new issue, #13887:
URL: https://github.com/apache/hudi/issues/13887

   **Hudi record level index does not work when executing Spark SQL**
   
   
   
   The Hudi record level index does not take effect when a query is executed through Spark SQL.
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1.  create table 
   create table if not exists hudix.hudihudi_table2(
     id string, 
     name string, 
     price string
   ) using hudi
   options (
     type = 'cow',
     primaryKey = 'id',
     'hoodie.index.type'='RECORD_INDEX',
     'hoodie.metadata.record.index.enable'='true'
   );
   
   
   2.  insert data 
   insert into hudix.hudihudi_table2 select  UUID() AS id, UUID() as id2, UUID() AS id3 from  range(0,10000007,1,200);
   
   3. execute query  
   
   3.1 SET hoodie.metadata.record.index.enable=true;
   3.2 select * from hudix.hudihudi_table2 where id='1';
   20250912145409224    20250912145409224_0_0   1               9922068f-73ab-4a2e-bc3f-dc241e4b0368-0_0-267-6144_20250912152837536.parquet     1       2       3
   20250912145553793    20250912145553793_0_2   1               9922068f-73ab-4a2e-bc3f-dc241e4b0368-0_0-267-6144_20250912152837536.parquet     1       2       3
   
   4. The Spark DAG shows that the query scans all records, as follows.
   
   
   <img width="972" height="336" alt="Image" src="https://github.com/user-attachments/assets/b9d0e19c-ca77-4d31-bfab-bc0d099ccc02" />
   
   <img width="575" height="696" alt="Image" src="https://github.com/user-attachments/assets/08a1cdca-612b-496c-bc79-a942241aaeeb" />
   
   <img width="1784" height="709" alt="Image" src="https://github.com/user-attachments/assets/c7479136-4cfb-46b9-9101-b778ef451021" />
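   
   A possible way to double-check the read path is sketched below. It assumes that read-side pruning with the record index also needs the metadata table and data skipping enabled; `hoodie.metadata.enable` and `hoodie.enable.data.skipping` are Hudi configs that were not set in the steps above, and whether they change the behavior on 0.15 has not been verified here.
   
   ```sql
   -- Assumption: file pruning on the read path needs the metadata table
   -- and data skipping switched on, in addition to the record index flag.
   SET hoodie.metadata.enable=true;
   SET hoodie.enable.data.skipping=true;
   SET hoodie.metadata.record.index.enable=true;
   
   -- Print the physical plan for the point lookup on the record key.
   EXPLAIN FORMATTED
   SELECT * FROM hudix.hudihudi_table2 WHERE id = '1';
   ```
   
   The plan alone may not show how many files were pruned, so the number of files read by the scan node in the Spark UI is the more direct thing to compare before and after setting these options.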
   
   
   
   **Expected behavior**
   
   The Hudi record level index does not take effect with Spark SQL. I expected the index filter to work, so that the query does not scan all records.
   
   **Environment Description**
   
   * Hudi version :  0.15
   
   * Spark version : 3.3.1
   
   * Hive version : 2.3.9
   
   * Hadoop version : 3.3.1
   
   * Storage (HDFS/S3/GCS..) : hdfs
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

