[ https://issues.apache.org/jira/browse/KUDU-2673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xu Yao updated KUDU-2673:
-------------------------
    Labels: features roadmap-candidate  (was: features)

> Event timestamp support with kudu.
> ----------------------------------
>
>                 Key: KUDU-2673
>                 URL: https://issues.apache.org/jira/browse/KUDU-2673
>             Project: Kudu
>          Issue Type: Improvement
>          Components: java, spark, tserver
>    Affects Versions: 1.8.0
>            Reporter: yangz
>            Priority: Major
>              Labels: features, roadmap-candidate
>             Fix For: 1.8.0
>
>
> Kudu can read historical data, but only as of timestamps produced by Kudu's
> transaction and MVCC system. Relying on these internal timestamps greatly
> weakens the usability of historical reads.
> In our use case, we write data to Kudu from a data stream, and the table is
> range-partitioned by day.
> We want to read the hourly version of the data, so we need to read
> historical data from Kudu, which is reconstructed from the undo files.
> However, when a user gives a timestamp, they mean the time the event
> happened (the event time associated with the data), not the timestamp Kudu
> produced. So we need a way to pass the event timestamp into Kudu.
> We eventually found a way to solve this problem, but our solution has two
> limitations:
>  # Updates are applied one row at a time, and each row carries its own
> event timestamp.
>  # To get the correct historical version of the data, the data stream must
> send data in event-time order.
> Despite these limitations, the solution meets our current business needs.
>  
> Our implementation also partially solves the out-of-order event time problem
> if you only need the newest data, because reading the newest data does not
> touch the undo files. For data sent to Kudu with event times t1 < t2:
> t1 upsert -> t2 upsert  ->  newest value is t2
> t2 upsert -> t1 upsert  ->  newest value is t1 (current Kudu implementation),
>                             t2 (our implementation)
>  
> Our solution may not be the best one for this problem, but I think Kudu
> snapshot reads should support event time.
> Our solution does not cover every use case, but I hope it will be useful
> to the community for some of them.
>  
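The event-time behaviour described above (newest value chosen by largest event time, and snapshot reads as of an event time) can be modelled with a small, hypothetical in-memory store. This is a minimal Python sketch under stated assumptions, not Kudu's storage engine, its client API, or the reporter's patch: `EventTimeStore`, `upsert`, `snapshot_read` and the other names are made up for illustration, and event timestamps are assumed to be integers (for example epoch seconds). The description's second limitation suggests the reporter's approach relies on in-order delivery to keep history correct; this sketch sidesteps that by inserting each version at its sorted position, at the cost of an O(n) list insert per upsert.

```python
import bisect
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Row:
    """Version history of one row, kept sorted by event timestamp (hypothetical model)."""
    event_ts: List[int] = field(default_factory=list)
    values: List[Any] = field(default_factory=list)
    # Value that arrival-order last-writer-wins would return.
    latest_by_arrival: Any = None


class EventTimeStore:
    """Hypothetical in-memory model for illustration; not Kudu's storage or API."""

    def __init__(self) -> None:
        self.rows: Dict[str, Row] = {}

    def upsert(self, key: str, event_ts: int, value: Any) -> None:
        row = self.rows.setdefault(key, Row())
        # Keep versions ordered by event time so out-of-order arrivals land
        # in the right place; on equal event times the later arrival goes last.
        i = bisect.bisect_right(row.event_ts, event_ts)
        row.event_ts.insert(i, event_ts)
        row.values.insert(i, value)
        row.latest_by_arrival = value

    def newest_by_event_time(self, key: str) -> Any:
        # Largest event time wins, regardless of arrival order.
        return self.rows[key].values[-1]

    def newest_by_arrival(self, key: str) -> Any:
        # Last write to arrive wins.
        return self.rows[key].latest_by_arrival

    def snapshot_read(self, key: str, as_of_event_ts: int) -> Optional[Any]:
        # Latest version whose event time is <= the requested event time,
        # or None if the row is unknown or had no version yet.
        row = self.rows.get(key)
        if row is None:
            return None
        i = bisect.bisect_right(row.event_ts, as_of_event_ts)
        return row.values[i - 1] if i > 0 else None


if __name__ == "__main__":
    store = EventTimeStore()
    # t1 = 1 < t2 = 2, but the t2 upsert arrives first.
    store.upsert("k", 2, "v2")
    store.upsert("k", 1, "v1")
    print(store.newest_by_arrival("k"))     # v1
    print(store.newest_by_event_time("k"))  # v2
    print(store.snapshot_read("k", 1))      # v1
    print(store.snapshot_read("k", 0))      # None
```

The usage at the bottom replays the out-of-order case from the t1/t2 example in the description: arrival-order resolution returns the t1 value, while event-time resolution returns the t2 value.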



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
