Re: [PR] feat: add `get_file_slices_splits_between` API [hudi-rs]

via GitHub Tue, 12 Aug 2025 08:37:37 -0700


xushiyan commented on code in PR #411:
URL: https://github.com/apache/hudi-rs/pull/411#discussion_r2270308275



##########
crates/core/src/table/mod.rs:
##########
@@ -520,6 +520,72 @@ impl Table {
             })
     }
 
+    /// Get all the changed [FileSlice]s in splits from the table between the given timestamps.
+    ///
+    /// # Arguments
+    ///     * `n` - The number of chunks to split the file slices into.
+    ///     * `start_timestamp` - If provided, only file slices that were changed after this timestamp will be returned.
+    ///     * `end_timestamp` - If provided, only file slices that were changed before or at this timestamp will be returned.

Review Comment:
   @codope Yes, I wanted to go with `[start, end)`, the more common semantics,
but then I realized that we have to make end inclusive so that file slices for
the freshest data are returned. Say you set end = latest commit time: with an
exclusive end you would not get the latest records until the next pull, which
can annoy people :). Making end inclusive also aligns it with the semantics of
the other query types: when end is omitted, it implicitly gets the latest,
which aligns with a snapshot query (T_end = the latest, inclusive); when end is
present, it gets the data up to T_end, aligning with a time-travel query.
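
To make the range semantics concrete, here is a minimal sketch of the `(start, end]` check described above. `in_range` and `select_commits` are hypothetical helpers written only for illustration and are not part of the hudi-rs API. The sketch also assumes commit timestamps are fixed-width strings, so that plain string comparison matches chronological order.

```rust
/// Hypothetical helper (not a hudi-rs API): returns true when `commit_ts`
/// falls in the range (start, end], i.e. start exclusive, end inclusive.
/// A missing bound leaves that side of the range open.
/// Assumes fixed-width timestamp strings, so lexicographic order is
/// chronological order.
fn in_range(commit_ts: &str, start: Option<&str>, end: Option<&str>) -> bool {
    // Start is exclusive: only commits strictly after `start` qualify.
    let after_start = start.map_or(true, |s| commit_ts > s);
    // End is inclusive: a commit exactly at `end` still qualifies.
    let at_or_before_end = end.map_or(true, |e| commit_ts <= e);
    after_start && at_or_before_end
}

/// Hypothetical helper: keeps the commits that a pull over (start, end]
/// would see, preserving input order.
fn select_commits<'a>(
    commits: &[&'a str],
    start: Option<&str>,
    end: Option<&str>,
) -> Vec<&'a str> {
    commits
        .iter()
        .copied()
        .filter(|ts| in_range(ts, start, end))
        .collect()
}
```

With an inclusive end, setting end to the latest commit time returns that commit in the same pull. Passing the previous end as the next start does not return it twice, because start is exclusive.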



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

