abhibhat98 opened a new issue #1675:
URL: https://github.com/apache/hudi/issues/1675


   **Describe the problem you faced**
   When I do an incremental query, I only get the latest event per key. I want 
to get all the events as a log.
   For example:
   at time T1, key value as K1-V1
   at time T2, updated key value is K1-V2
   at time T3, updated key value is K1-V3
   
   When I do an incremental query from time 0 (start) to T3, I only get 
K1-V3. Is there a way to set something like maxCommits so that I can stream 
all of these events back from a certain time? (I see that 
HiveIncrementalPuller has such an option: "Setting fromCommitTime=0 and 
maxCommits=-1 will fetch the entire source table".)
   
   As an example, if I ask for incremental updates from T1+1 onward, I'd get:
   K1-V2
   K1-V3
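
   To make the difference concrete, here is a minimal plain-Python sketch of 
the two behaviours. It does not use any Hudi API: the function names, the 
integer commit times standing in for T1..T3, and the exclusive lower bound 
are all assumptions made only for illustration.

```python
from typing import Dict, List, Tuple

# (commit_time, key, value); integers 1, 2, 3 stand in for T1, T2, T3 (assumption).
Event = Tuple[int, str, str]

EVENTS: List[Event] = [
    (1, "K1", "V1"),  # T1
    (2, "K1", "V2"),  # T2
    (3, "K1", "V3"),  # T3
]


def latest_per_key(events: List[Event], begin: int, end: int) -> List[Event]:
    """Model of what I see today: only the newest event per key in (begin, end]."""
    latest: Dict[str, Event] = {}
    for commit_time, key, value in sorted(events):
        if begin < commit_time <= end:
            # A later commit overwrites the earlier one for the same key.
            latest[key] = (commit_time, key, value)
    return sorted(latest.values())


def change_log(events: List[Event], begin: int, end: int) -> List[Event]:
    """Model of what I want: every event in (begin, end], in commit order."""
    return [e for e in sorted(events) if begin < e[0] <= end]
```

   Here `latest_per_key(EVENTS, 0, 3)` models the result I currently get 
(only K1-V3), while `change_log(EVENTS, 1, 3)` models the result I am asking 
for (K1-V2 followed by K1-V3).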
   
   I am able to get it using spark.read.parquet ... Is there a way I can get it 
from Hudi?
   
   My environment is EMR 6.0.0 on AWS with Hudi.
   


