It does sound like what I'd want (that's why I was trying to use it :) ),
but it isn't working as described. Maybe it is a bug?

The behavior I'm seeing is that the delete markers are removed on major
compaction, even with hbase.hstore.time.to.purge.deletes set in
hbase-site.xml:
https://gist.github.com/housejester/2b8fbba0d05c6abbe784

I think I've found the issue now. You mentioned the setting could be
applied per CF...so I tested that way, and it works as expected. My
hbase-site.xml had:

<property>
  <name>hbase.hstore.time.to.purge.deletes</name>
  <value>600000</value>
</property>

But that doesn't seem to be applied (even with restarts, etc). Setting
hbase.hstore.time.to.purge.deletes directly on the column family does work
though:
https://gist.github.com/housejester/a81274bf74a8666fba32

Not sure why it isn't picking up from my hbase-site.xml, but I'll just
configure it on the CFs. This is on hbase-0.98.6.1-hadoop2 and
hbase-0.96.0-hadoop2 running in local mode.
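
For reference, here is a minimal sketch of the retention rule as I read it
from the hbase-default.xml description quoted below. It is plain Java with no
HBase dependency. The names (PurgeRule, survivesMajorCompaction) are made up
for illustration, and this is not HBase's actual code:

```java
public class PurgeRule {
    /**
     * Returns true if a delete marker should survive a major compaction that
     * runs at compactionTimeMs, following the documented rule: the marker is
     * kept until the major compaction that occurs after
     * markerTimestampMs + timeToPurgeMs. A setting of 0 (or less) means the
     * marker is purged at the next major compaction.
     */
    static boolean survivesMajorCompaction(long markerTimestampMs,
                                           long compactionTimeMs,
                                           long timeToPurgeMs) {
        if (timeToPurgeMs <= 0) {
            return false; // unset or 0: purge at the next major compaction
        }
        return compactionTimeMs <= markerTimestampMs + timeToPurgeMs;
    }

    public static void main(String[] args) {
        long timeToPurgeMs = 600000L; // 10 minutes, the value I configured
        long deleteTs = 1_000_000L;   // arbitrary marker timestamp

        // Compaction 1 minute after the delete: marker is still inside the window.
        System.out.println(survivesMajorCompaction(deleteTs, deleteTs + 60_000L, timeToPurgeMs));
        // Compaction ~11.7 minutes after the delete: the window has passed.
        System.out.println(survivesMajorCompaction(deleteTs, deleteTs + 700_000L, timeToPurgeMs));
        // Setting left at 0: marker is purged regardless of its age.
        System.out.println(survivesMajorCompaction(deleteTs, deleteTs + 60_000L, 0L));
    }
}
```

With the setting at 600000, a marker written less than 10 minutes before the
major compaction should still show up in a raw scan afterwards, which is what
I see once the setting is on the CF.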

Thanks Lars,
James

On Mon, Sep 22, 2014 at 11:04 PM, lars hofhansl <la...@apache.org> wrote:

> You can use the hbase.hstore.time.to.purge.deletes config option.
> You can set it globally or per Column Family.
>
> This is the description in hbase-default.xml:
>   <property>
>     <name>hbase.hstore.time.to.purge.deletes</name>
>     <value>0</value>
>     <description>The amount of time to delay purging of delete markers
>       with future timestamps. If unset, or set to 0, all delete markers,
>       including those with future timestamps, are purged during the next
>       major compaction. Otherwise, a delete marker is kept until the major
>       compaction which occurs after the marker's timestamp plus the value
>       of this setting, in milliseconds.
>     </description>
>   </property>
>
> That seems to be exactly what you want.
>
> -- Lars
>
>
> ----- Original Message -----
> From: James Estes <james.es...@gmail.com>
> To: user@hbase.apache.org
> Cc:
> Sent: Monday, September 22, 2014 10:39 AM
> Subject: Configuring tombstone purge independent of deleted cell purge
>
> Could tombstone purges be independent of purging deleted cells and
> KEEP_DELETED_CELLS setting? In my use case, I do not want to keep deleted
> cells, but I do need to keep the tombstones around. Without the tombstones,
> I'm not able to do incremental backups (custom, we do timerange raw scans
> ourselves for this).
>
> As a rough example, if I have the following timeline for the same row key
> (where t# is time):
> t0 - full backup (using a time range up to t0)
> t1 - PUT v1
> t2 - incremental backup #1 (time range t0 up to t2)
> t3 - DELETE
> t4 - flush and major compaction happens
> t5 - incremental backup #2 (time range t2 up to t5)
> t6 - full system crash
> t7 - data restored from full backup + incrementals #1 and #2
>
> When the restore completes, the row will have been un-deleted. This is
> because incremental backup #2 will not have the tombstone, since it
> was compacted out at t4.
>
> So in our case, I do NOT want to keep deleted cells (because I do not want
> the cells to show up in time range scans users may do), but I DO want to
> keep the tombstones for a configurable amount of time (much larger than our
> planned incremental backup schedule) so they are captured during backup.
> This would allow for the custom incremental backups to be independent of
> major compactions. Without it, the backup schedule would have to manually
> handle compactions and would always have to do a FULL Backup after a major
> compaction (otherwise there can be loss because when any major compaction
> happens, any tombstone that came in after the last incremental will be
> lost).
>
> It seems like there could be another setting for when to purge tombstones.
> Currently, there is hbase.hstore.time.to.purge.deletes for when to purge
> deleted cells, but ONLY if KEEP_DELETED_CELLS is configured (which makes
> sense). I'd like to propose a hbase.hstore.time.to.purge.tombstones that
> could default to the same value as hbase.hstore.time.to.purge.deletes, but
> would take effect regardless of the KEEP_DELETED_CELLS setting. It should
> have a constraint so that hbase.hstore.time.to.purge.deletes <=
> hbase.hstore.time.to.purge.tombstones (b/c we don't want tombstones
> disappearing before the deleted cells).
>
> Does this seem reasonable? Is there another approach I might take?
>
> Thanks,
>
>
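
The t0..t7 timeline from the original question can be replayed with a small,
simplified model. This is only a sketch: it uses plain Java with no HBase
dependency, treats the DELETE as masking every older PUT on the row, uses
abstract ticks instead of milliseconds, and all names (TombstoneBackupSketch,
rawScan, majorCompact, and so on) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class TombstoneBackupSketch {
    // One raw cell for a single row key: a PUT or a DELETE marker at time ts.
    static final class Cell {
        final long ts;
        final boolean isDelete;
        Cell(long ts, boolean isDelete) { this.ts = ts; this.isDelete = isDelete; }
    }

    // Raw scan over [from, to): returns puts and delete markers alike.
    static List<Cell> rawScan(List<Cell> store, long from, long to) {
        List<Cell> out = new ArrayList<>();
        for (Cell c : store) {
            if (c.ts >= from && c.ts < to) out.add(c);
        }
        return out;
    }

    static long newestDelete(List<Cell> cells) {
        long newest = Long.MIN_VALUE;
        for (Cell c : cells) {
            if (c.isDelete) newest = Math.max(newest, c.ts);
        }
        return newest;
    }

    // Major compaction at time now: drops puts masked by a delete marker and
    // drops markers older than timeToPurge (0 means drop every marker).
    static List<Cell> majorCompact(List<Cell> store, long now, long timeToPurge) {
        long newest = newestDelete(store);
        List<Cell> out = new ArrayList<>();
        for (Cell c : store) {
            if (c.isDelete) {
                if (timeToPurge > 0 && now - c.ts <= timeToPurge) out.add(c);
            } else if (c.ts > newest) {
                out.add(c);
            }
        }
        return out;
    }

    // The row is visible if some put is newer than every delete marker.
    static boolean rowVisible(List<Cell> cells) {
        long newest = newestDelete(cells);
        for (Cell c : cells) {
            if (!c.isDelete && c.ts > newest) return true;
        }
        return false;
    }

    // Replays t0..t7 and reports whether the row is visible after the restore.
    static boolean restoredRowVisible(long timeToPurge) {
        List<Cell> store = new ArrayList<>();
        List<Cell> backup = new ArrayList<>(rawScan(store, Long.MIN_VALUE, 0)); // t0 full backup
        store.add(new Cell(1, false));                // t1 PUT v1
        backup.addAll(rawScan(store, 0, 2));          // t2 incremental #1
        store.add(new Cell(3, true));                 // t3 DELETE
        store = majorCompact(store, 4, timeToPurge);  // t4 flush + major compaction
        backup.addAll(rawScan(store, 2, 5));          // t5 incremental #2
        return rowVisible(backup);                    // t7 restore from all backups
    }

    public static void main(String[] args) {
        System.out.println("purge at once, row un-deleted: " + restoredRowVisible(0));
        System.out.println("markers kept 10 ticks, row un-deleted: " + restoredRowVisible(10));
    }
}
```

In this model the row comes back after the restore only when the marker is
purged at t4. Keeping markers longer than the gap between incremental backups
lets incremental #2 capture the tombstone.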
