Re: RegionServers shutdown randomly

2015-08-07 Thread James Estes
There is this http://mail-archives.apache.org/mod_mbox/hbase-user/201507.mbox/%3CCAE8tVdmyUfG%2BajK0gvMG_tLjoStZ0HjrQxJuuJzQ3Z%2B4vbzSuQ%40mail.gmail.com%3E which points to https://issues.apache.org/jira/browse/HDFS-8809 But (at least for us) this hasn't led to region servers crashing...though I'm

Re: Full GC on client may lead to empty scan results

2015-07-31 Thread James Estes
tion? > > On Thu, Jul 30, 2015 at 1:13 PM, James Estes > wrote: > > > All, > > > > If a full GC happens on the client when a scan is in progress, the scan > can > > be missing rows. I have a test that repros this almost every time. > > > > The t

fsck reports WALs corrupt in Hadoop 2.6.0

2015-07-30 Thread James Estes
All, With 0.98.12 on top of Hadoop 2.6.0, running: ./bin/hdfs fsck -openforwrite reports that the WALs are corrupt. Without -openforwrite, everything is fine. They all seem to be "MISSING 1 blocks of total size 83 B" for each region server WAL. May be the same as https://issues.apache.org/jir

Full GC on client may lead to empty scan results

2015-07-30 Thread James Estes
All, If a full GC happens on the client when a scan is in progress, the scan can be missing rows. I have a test that repros this almost every time. The test runs against a local standalone server with 10g heap, using jdk1.7.0_45. The Test: - run with -Xmx1900m to restrict client heap - run with

RS crash: OOME: Requested array size exceeds VM limit.

2015-07-29 Thread James Estes
All, Running 0.98.12 on hadoop 2.6.0 and Java 7. TL;DR; If your RS is crashing for no apparent reason, check the stderr/stdout log, and if you have wide rows, upgrade to 0.98.13, OR set batch size AND max result size on scans. I'm reporting here because the failure was tough to track down and it

Data loss after split in 0.98.12

2015-07-28 Thread James Estes
All, We've been running with HBase 0.98.12 and Hadoop 2.6.0* for about 3 months now with really no issues in 4 clusters. However, recently we've been seeing some issues. I'm not sure they're related to the combination, and they may be fixed in 1.1.1 (which we are in the process of rolling out soon

Re: Troubles with HBase 1.1.0 RC2

2015-05-15 Thread James Estes
to modify the code you plan to deploy on the server. I don't think any >> client side changes are needed. Unless your coprocessor implements an >> Endpoint and _you_ are changing your RPC message formats, a 1.0.x client >> shouldn't care whether it is talking to a 1.0.x se

Troubles with HBase 1.1.0 RC2

2015-05-13 Thread James Estes
I saw the vote thread for RC2, so tried to build my project against it. My build fails when I depend on 1.1.0. I created a bare bones project to show the issue I'm running into: https://github.com/housejester/hbase-deps-test To be clear, it works in 1.0.0 (and I did add the repository). Further,

Re: Hbase row ingestion ..

2015-04-30 Thread James Estes
Guatam, Michael makes a lot of good points, especially the importance of analyzing your use case when determining the row key design. We (Jive) gave a talk at HBaseCon a couple of years back about our row key redesign, which vastly improved performance. It also talks a little about the write path

Replication Codec

2015-04-20 Thread James Estes
I'm running HBase 0.98 and looking into replication. Looking at the Codecs (those that include tags), I see KeyValueCodecWithTags and CellCodecWithTags. Is there a reason to prefer one over the other? Thanks, James

Slow Scan can loop forever

2014-10-14 Thread James Estes
I'm having an issue where a Scan gets into a loop, never completes, and never times out. I've seen it run for hours, and it is repeatable on my system. After looking through the HBase code and logs, I think I understand what is going on. I'm using HBase 0.96.0-hadoop2, running on Hadoop 2.2.0. I'm using

Re: Configuring tombstone purge independent of deleted cell purge

2014-09-23 Thread James Estes
g> wrote: > >> Hi James, >> >> Is it possible that you are impacted by >> https://issues.apache.org/jira/browse/HBASE-10118 ? Any chance to test >> with >> one release where HBASE-10118 is available? >> >> JM >> >> 2014-09-23 12:1

Re: Configuring tombstone purge independent of deleted cell purge

2014-09-23 Thread James Estes
are purged > during the next major compaction. Otherwise, a delete marker is kept > until the major compaction > which occurs after the marker's timestamp plus the value of this > setting, in milliseconds. > > > > That seems to be exactly what you
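
For readers skimming this excerpt, the retention rule quoted above ("a delete marker is kept until the major compaction which occurs after the marker's timestamp plus the value of this setting, in milliseconds") can be sketched as a single comparison. This is only an illustrative model of the quoted sentence, not HBase code: the class name, method name, and parameter names below are hypothetical, and the setting itself is not named in the excerpt.

```java
// Illustrative model of the quoted rule only; not taken from HBase.
// All names here (DeleteMarkerPurge, isPurgeable, purgeDelayMs) are hypothetical.
public final class DeleteMarkerPurge {

    private DeleteMarkerPurge() {
    }

    /**
     * Returns true if a major compaction running at compactionTimeMs may drop
     * a delete marker stamped markerTimestampMs, given a purge delay in
     * milliseconds. The marker is kept until a compaction runs strictly after
     * (marker timestamp + delay).
     */
    public static boolean isPurgeable(long markerTimestampMs,
                                      long purgeDelayMs,
                                      long compactionTimeMs) {
        // Written as a subtraction so a large marker timestamp plus a large
        // delay does not overflow in the addition.
        return compactionTimeMs - markerTimestampMs > purgeDelayMs;
    }

    public static void main(String[] args) {
        long markerTs = 1_000L; // marker written at t = 1000 ms
        long delay = 500L;      // assumed purge delay of 500 ms

        // Compaction exactly at marker timestamp + delay: marker is kept.
        System.out.println(isPurgeable(markerTs, delay, 1_500L)); // false
        // Compaction one millisecond later: marker may be purged.
        System.out.println(isPurgeable(markerTs, delay, 1_501L)); // true
    }
}
```

Under this reading, a larger delay keeps tombstones visible to raw time-range scans across more major compactions, independent of whether the deleted cells themselves are retained.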

Configuring tombstone purge independent of deleted cell purge

2014-09-22 Thread James Estes
Could tombstone purges be independent of purging deleted cells and KEEP_DELETED_CELLS setting? In my use case, I do not want to keep deleted cells, but I do need to keep the tombstones around. Without the tombstones, I'm not able to do incremental backups (custom, we do timerange raw scans ourselve

Re: Missing region data.

2012-01-12 Thread James Estes
would be just bounce the server if compactions pile up and we see something like this in the logs :) Thanks, James On Tue, Jan 10, 2012 at 11:18 AM, Stack wrote: > On Mon, Jan 9, 2012 at 1:57 PM, James Estes wrote: >> Should we file a ticket for this issue?  FWIW we got this fixed (not

Re: Missing region data.

2012-01-09 Thread James Estes
hu, Dec 22, 2011 at 2:34 PM, James Estes wrote: > We have a 6 node 0.90.3-cdh3u1 cluster.  We have 8092 regions.  I > realize we have too many regions and too few nodes…we're addressing > that.  We currently have an issue where we seem to have lost region > data.  When data is re

Missing region data.

2011-12-22 Thread James Estes
We have a 6 node 0.90.3-cdh3u1 cluster. We have 8092 regions. I realize we have too many regions and too few nodes…we're addressing that. We currently have an issue where we seem to have lost region data. When data is requested for a couple of our regions, we get errors like the following on th