I found that if I split the last row off the tablet and then merge it into the 
following tablet, the orphans are removed, because the old tablet directory 
gets added to the delete list.
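
In Accumulo shell terms the workaround looks something like this (the table 
name and row keys are placeholders, not what I actually used):

    # split the last row off into its own tablet
    addsplits -t mytable 'last_row'
    # then merge a range covering that sliver and the following tablet;
    # the old tablet directory ends up on the delete list
    merge -t mytable -b 'last_row' -e 'end_of_next_tablet'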

As to why it is crashing, my theory is this: if there is a very large tablet 
that Accumulo tries to split while ingest is causing minor compactions, then 
Accumulo somehow loses count of the minor compactions, and eventually the pool 
of concurrent minor compactions is full and no more can be started. The 
dashboard shows minor compactions as running, but listcompactions does not 
show any. At this point hold time is triggered, and if the tablet cannot be 
split within 5 minutes the tserver exits.
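
For what it's worth, the check I mean is just the stock shell command:

    # the monitor's overview page reports minor compactions in progress,
    # yet this returns no active compactions
    listcompactions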

I don't understand why that would orphan an rfile though.

As to the very large tablet: if you repeatedly add to the same row, the tablet 
can't be split (splits only happen on row boundaries), but if you then add 
some different row that ends up in the same tablet, Accumulo will attempt to 
split what is now a very large tablet.  Or is there a mechanism to prevent 
this?

For instance, I never see a log like this "Tablet x contains a large row y, 
isolating it in own tablet, splitting x into x,y,z"
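
To see how big individual tablets have become, I can sum the file sizes 
recorded per tablet in the metadata table. A rough sketch, using table id 3 
from my mail below; the awk field positions assume the default scan output 
(row, column, visibility, then a "size,entries" value) and may need adjusting 
for other versions:

    accumulo shell -u root -e 'scan -t accumulo.metadata -b 3; -e 3< -c file -np' \
        | awk -F'[ \t,]+' '{bytes[$1] += $(NF-1)} END {for (t in bytes) print bytes[t], t}' \
        | sort -rn | head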

From: Christopher <ctubb...@apache.org>
Sent: 29 June 2022 13:23
To: accumulo-user <user@accumulo.apache.org>; Hart, Andrew <and.h...@cgi.com>
Subject: Re: Un-referenced rfiles in hdfs



The Accumulo file garbage collection mechanism is designed to fail safe: it 
only deletes files it knows are no longer in use. It also tries to do this 
with minimal interaction with the HDFS namenode (so, no scanning the entire 
file system to find files). It's possible that in some circumstances a server 
can crash in a way that leaves a file on the file system that Accumulo is no 
longer using, but Accumulo has no evidence of its existence and so does not 
know to clean it up. That failure scenario is preferable to accidentally 
deleting files that could still be in use.

My recommendation is to periodically check your file system for such orphaned 
files, and decide whether to delete them based on their age or content. These 
should only appear after a server failure, so you could perform such checks 
during triage/investigation of whatever failure occurred in your system. You 
could also write a trivial monitoring service that identifies old unreferenced 
files and reports them to you by whatever means you prefer. Since these files 
should only appear after an unexpected failure, it's hard to provide a general 
solution within Accumulo itself.
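
For example, here is a crude sketch of such a check. It assumes the default 
/accumulo/tables volume and that the metadata file entries can be normalized 
to the same path form as the hdfs listing; depending on your version those 
entries may be relative paths or full URIs, so adjust accordingly:

    # every rfile actually present in hdfs
    hdfs dfs -ls -R /accumulo/tables | grep '\.rf$' \
        | awk '{print $NF}' | sort -u > in-hdfs.txt
    # every rfile referenced by the metadata table
    accumulo shell -u root -e 'scan -t accumulo.metadata -c file -np' \
        | awk '{sub(/^file:/, "", $2); print $2}' | sort -u > referenced.txt
    # present in hdfs but never referenced: orphan candidates
    comm -23 in-hdfs.txt referenced.txt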

On Wed, Jun 29, 2022, 07:54 Hart, Andrew via user 
<user@accumulo.apache.org> wrote:
Hi,

I have some rfiles in hdfs that aren't referenced in the accumulo.metadata 
table. So there will be a file like

    8500000000 2022-02-02 11:59 /accumulo/tables/3/t-1234567/Cabcdef.rf

but grep -t accumulo.metadata Cabcdef.rf doesn't find anything.

Is there any way to run the gc process so that it cleans up the orphan rfiles?

And.

