FWIW, migrations that never go away have been a symptom of bugs in the
Master before. The master gets into a state where it either stops
processing migrations or it doesn't realize that there is a migration to
process. You might be able to grep over the Master log and find
information about migrations. Sorry I don't have anything more specific.
The lock without a FATE op also seems problematic, but might be
unrelated to the migration? You might be able to find more information
in the master log about that FATE transaction ID.
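As a rough sketch of that log search, assuming the master log is a plain-text file: the snippet below keeps only the lines that mention "migration" or a given FATE transaction ID. The function name and the sample lines are made up for illustration and are not real Accumulo log output; the txid is the one quoted further down in this thread.

```python
import re

def find_relevant_lines(lines, txid=None):
    """Return (line_number, text) pairs that mention migrations or the given FATE txid."""
    patterns = [re.compile(r"migration", re.IGNORECASE)]
    if txid:
        # Escape the txid so it is matched literally, ignoring case
        patterns.append(re.compile(re.escape(txid), re.IGNORECASE))
    hits = []
    for number, line in enumerate(lines, start=1):
        if any(p.search(line) for p in patterns):
            hits.append((number, line.rstrip("\n")))
    return hits

# Made-up sample lines, for illustration only
sample = [
    "INFO  Not balancing due to 1 outstanding migrations\n",
    "DEBUG some unrelated line\n",
    "WARN  something about tx 667becf32c0fe544\n",
]
for number, line in find_relevant_lines(sample, txid="667becf32c0fe544"):
    print(number, line)
```

To run it against a real log, pass an open file object instead of the sample list; the path to the master log depends on the installation.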
Michael Wall wrote:
Are you currently experiencing 1 outstanding migration? Does it go away
on its own? Unless servers are going down, tablets will migrate when
their split threshold is reached. Is it possible you are constantly
splitting a table?
If all the tservers appear to be in good shape, maybe it is an issue
with the master. What does the jstack look like for that?
On Thu, Aug 4, 2016 at 12:06 PM, Tim I <t...@timisrael.com> wrote:
Hi Mike,
Thanks for the direction.
The scan you suggested returned an empty result set.
There was a lock without an associated FATE operation:
The following locks did not have an associated FATE operation
txid: 667becf32c0fe544 locked: [R:+default]
No recoveries stuck currently, and no long running scans.
Otherwise, the system seems fine.
Is it possible this is just benign? Should we monitor for locks
that don't have FATE operations and delete them from time to time?
Thanks,
Tim
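A minimal sketch of the monitoring idea, assuming the report keeps the `txid: ... locked: [...]` line format quoted above. It only extracts the transaction IDs and lock names from the report text; whether such locks are safe to delete is a separate question that this does not answer. The function name is hypothetical.

```python
import re

# Matches lines such as: txid: 667becf32c0fe544 locked: [R:+default]
LOCK_LINE = re.compile(r"txid:\s*([0-9a-fA-F]+)\s+locked:\s*\[([^\]]*)\]")

def parse_orphan_locks(report):
    """Map each txid found in the report text to the list of locks it holds."""
    locks = {}
    for match in LOCK_LINE.finditer(report):
        txid, held = match.groups()
        locks[txid] = [item.strip() for item in held.split(",") if item.strip()]
    return locks

report = """The following locks did not have an associated FATE operation
txid: 667becf32c0fe544 locked: [R:+default]
"""
print(parse_orphan_locks(report))
```

A periodic job could feed the saved report into this function and alert when the result is non-empty, rather than deleting anything automatically.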
On Thu, Aug 4, 2016 at 11:44 AM, Michael Wall <mjw...@gmail.com> wrote:
Hi Tim,
You can try scanning the metadata table for the future column family.
Something like
scan -t accumulo.metadata -c future
If you find one, look at the tabletserver that is slated to host
that tablet. There could be an issue with that server
preventing assignment from completing. Get a jstack and save
the logs so you can further troubleshoot. Killing that tserver
will cause the assignment to go elsewhere, but make sure you get
as much info as you can before killing it.
What else is going on with the system? Do you have any
recoveries that are stuck? Are there any fate transactions that
have been running for a while? Any long running scans?
HTH
Mike
On Thu, Aug 4, 2016 at 11:04 AM, Tim I <t...@timisrael.com> wrote:
Hi all,
We're running Accumulo 1.6.5.
One of the issues we're seeing on a consistent basis is this
message:
"Not balancing due to 1 outstanding migrations".
Is there a simple way to see the number of outstanding
migrations? Based on what we've read and experienced, it
eventually means we have to bounce the master to get things
to a better state. However, the message comes back within
about an hour.
Any thoughts and suggestions would be greatly appreciated.
Thanks,
Tim