[I] Bug: column stat expression index does not work as expected in Quick Start Guide Scala implementation. [hudi]

via GitHub Tue, 25 Nov 2025 05:20:20 -0800


rangareddy opened a new issue, #14352:
URL: https://github.com/apache/hudi/issues/14352


   ### Bug Description
   
   **What happened:**
   
   The column stat expression index example in the Scala section of the Spark Quick Start Guide does not work as expected. The query that is supposed to prune data using the `idx_column_ts` index returns no rows:
   
   ```scala
   scala> // Query on ts column would prune the data using the idx_column_ts index
   
   scala> spark.sql(s"SELECT * FROM hudi_indexed_table WHERE from_unixtime(ts, 'yyyy-MM-dd') = '2023-09-24'").show(false);
   25/11/24 11:20:31 WARN CacheManager: Asked to cache already cached data.
   25/11/24 11:20:32 WARN CacheManager: Asked to cache already cached data.
   +-------------------+--------------------+------------------+----------------------+-----------------+---+----+-----+------+----+----+
   |_hoodie_commit_time|_hoodie_commit_seqno|_hoodie_record_key|_hoodie_partition_path|_hoodie_file_name|ts |uuid|rider|driver|fare|city|
   +-------------------+--------------------+------------------+----------------------+-----------------+---+----+-----+------+----+----+
   +-------------------+--------------------+------------------+----------------------+-----------------+---+----+-----+------+----+----+
   ```
   
   **What you expected:**
   
   I expected the query to return the rows matching the predicate, and the Scala code to use the column stat expression index for the anticipated query optimization (e.g., predicate pushdown or faster data filtering), as documented in the Quick Start Guide.
   
   **Steps to reproduce:**
   1. Follow the index example in the Spark Quick Start Guide (https://hudi.apache.org/docs/quick-start-guide#indexing).
   2. Query the table data with the `from_unixtime(ts, 'yyyy-MM-dd')` predicate; the result is empty.
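
One check that may help narrow this down, offered as a sketch rather than a diagnosis: Spark's `from_unixtime` interprets its argument as seconds since the epoch. If `ts` happened to hold epoch milliseconds (an assumption, this report does not state the unit), the derived date would never equal '2023-09-24' and the query would return no rows whether or not the index is used. The snippet below uses plain `java.time` instead of Spark, fixes the zone to UTC for determinism (Spark uses the session time zone), and uses a hypothetical sample value that is not taken from the table.

```scala
import java.time.{Instant, ZoneOffset}
import java.time.format.DateTimeFormatter

object EpochUnitCheck {
  // Same pattern as the query in this report; UTC is an assumption made for determinism.
  private val fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd").withZone(ZoneOffset.UTC)

  // Formats a value the way a seconds-based conversion would read it.
  def formatAsSeconds(epoch: Long): String = fmt.format(Instant.ofEpochSecond(epoch))

  def main(args: Array[String]): Unit = {
    val seconds = 1695513600L    // hypothetical sample: midnight UTC on 2023-09-24, in seconds
    val millis  = seconds * 1000 // the same instant expressed in milliseconds

    println(formatAsSeconds(seconds))       // value read as seconds
    println(formatAsSeconds(millis))        // milliseconds misread as seconds: a far-future date
    println(formatAsSeconds(millis / 1000)) // milliseconds converted to seconds first
  }
}
```

If the values in `ts` turn out to be in milliseconds, the empty result would come from the predicate itself and not necessarily from index pruning. Running the same query on a table without the index would help separate the two cases.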
   
   
   ### Environment
   
   **Hudi version:**
   **Query engine:** (Spark/Flink/Trino etc)
   **Relevant configs:**
   
   
   ### Logs and Stack Trace
   
   _No response_

