What you are asking looks similar to this: HBASE-5010 Filter HFiles based on TTL
It went into 0.94.0.

Cheers

On Thu, Mar 14, 2013 at 3:53 PM, Pankaj Gupta <[email protected]> wrote:

> Hi,
>
> I have a question regarding query performance for rows greater than a
> timestamp. The use case is this:
> I want to find all the rows in a key range that have changed after a
> certain timestamp and up to a certain timestamp, i.e. exactly using this
> Scan API:
>
>     Scan setTimeRange(long minStamp, long maxStamp)
>     Get versions of columns only within the specified timestamp
>     range, [minStamp, maxStamp)
>
> Would this query go through all the rows in the key range, or is there an
> optimization that makes it faster?
>
> I ask because I read about such an optimization in the following paper:
>
> http://oss.csie.fju.edu.tw/~tzu98/Apache%20Hadoop%20Goes%20Realtime%20at%20Facebook.pdf
>
> Here is the excerpt:
> "For data stored in HBase that is time-series or contains a specific,
> known timestamp, a special timestamp file selection algorithm
> was added. Since time moves forward and data is rarely inserted
> at a significantly later time than its timestamp, each HFile will
> generally contain values for a fixed range of time. This
> information is stored as metadata in each HFile and queries that
> ask for a specific timestamp or range of timestamps will check if
> the request intersects with the ranges of each file, skipping those
> which do not overlap. "
>
> This will work perfectly for my use case, but I don't know if this
> optimization, or any other for this use case, exists in Apache HBase.
> The version of Apache HBase we are currently using is 0.92.1, but we are
> considering moving to 0.94.
>
> Thanks,
> Pankaj
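To make the file-selection idea from the quoted excerpt concrete, here is a minimal sketch, assuming each HFile carries its minimum and maximum cell timestamp (both inclusive) as metadata and that the query range follows the half-open [minStamp, maxStamp) semantics of Scan.setTimeRange. The class, record, and method names (TimeRangeFileSelection, FileTimeRange, overlaps, selectFiles) are hypothetical and only illustrate the intersection test; this is not HBase's internal code, and HBASE-5010 itself filters files by TTL rather than by an arbitrary query range.

```java
import java.util.ArrayList;
import java.util.List;

public class TimeRangeFileSelection {

    // Hypothetical stand-in for per-HFile metadata: the smallest and largest
    // cell timestamps stored in the file (both inclusive).
    public record FileTimeRange(String name, long minTs, long maxTs) {}

    // True if the file may hold cells inside the half-open query range
    // [minStamp, maxStamp); if false, the file can be skipped entirely.
    public static boolean overlaps(FileTimeRange f, long minStamp, long maxStamp) {
        return f.maxTs() >= minStamp && f.minTs() < maxStamp;
    }

    // Keeps only the files whose timestamp range intersects the query range.
    public static List<String> selectFiles(List<FileTimeRange> files, long minStamp, long maxStamp) {
        List<String> selected = new ArrayList<>();
        for (FileTimeRange f : files) {
            if (overlaps(f, minStamp, maxStamp)) {
                selected.add(f.name());
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<FileTimeRange> files = List.of(
            new FileTimeRange("hfile-1", 100, 199),
            new FileTimeRange("hfile-2", 200, 299),
            new FileTimeRange("hfile-3", 300, 399));
        // Query [250, 300): only hfile-2 can contain matching cells.
        System.out.println(selectFiles(files, 250, 300)); // prints [hfile-2]
    }
}
```

Because the check only reads per-file metadata, a time-range scan over mostly old data can skip most files without reading their blocks; how much this helps in practice depends on how closely each file's timestamps cluster, which is exactly the "time moves forward" assumption in the excerpt.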
