I would suspect that the metadata table became corrupted when the system went unstable, and two tablet servers somehow ended up both thinking that they were responsible for the same extent(s). This should not be caused by the balancer running.

If you scan the accumulo.metadata table using the shell (scan -t accumulo.metadata -c loc) or (scan -t accumulo.metadata -c loc -b [TABLE_ID#]:[EXTENT]), there will be duplicated loc entries.
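I don't have a cluster handy to verify exact output, but the bad state should show up in a scan roughly like this (the table id 2a, end row m, loc qualifiers, and hostnames below are all made up for illustration):

  scan -t accumulo.metadata -c loc

  2a;m loc:14f2b3a9c0d0005 []    tserver1.example.com:9997
  2a;m loc:24c1d0e8a1b0007 []    tserver2.example.com:9997

(example rows only, not real output)

A healthy tablet row should have exactly one loc entry; two loc entries under the same metadata row (extent), as above, is the state the master is complaining about. If it came to the last-resort delete described below, for this made-up example it would be something like:

  table accumulo.metadata
  delete "2a;m" loc 14f2b3a9c0d0005

where the third argument is the column qualifier of the loc entry you want to remove.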
I am uncertain on the best way to fix this and do not have a place to try things out, but some possible actions:

- Shut down / bounce the tservers that have the duplicated assignments. You could start with just one and see what happens. When the tservers go offline, the tablets should be reassigned and maybe only one (re)assignment will occur.

- Try bouncing the manager (master).

- If those don't work, then a very aggressive / dangerous / only-as-a-last-resort option: delete the specific loc entries from the metadata table (delete [row_id] loc [loc_qualifier] -t accumulo.metadata), as in the sketch above. This will cause a future entry in ZooKeeper; to get that to reassign, it might be enough to bounce the master, or you may need to shut down / restart the cluster.

Ed Coleman

From: Ligade, Shailesh [USA] <ligade_shail...@bah.com>
Sent: Tuesday, April 12, 2022 8:36 AM
To: user@accumulo.apache.org
Subject: Accumulo 1.10.0

Hello,

Last weekend we ran out of hdfs space (all volumes were at 100%); yeah, it was crazy. This accumulo has many tables with good data. Although accumulo was up, it had 3 unassigned tablets.

So I added a few nodes to hdfs/accumulo, and now hdfs capacity is 33% free. I also issued the hdfs balancer command (just in case). So all good. The Accumulo unassigned tablets went away, but the tables show no assigned tablets on the accumulo monitor.

On the active master I am seeing this error:

ERROR: Error processing table state for store Normal Tablets
java.lang.RuntimeException: org.apache.accumulo.server.master.state.TabletLocationState$BadLocationStateException: found two locations for the same extent xxxxxxxx

Question is: am I getting this because the balancer is running, and once it finishes will it recover? What can be done to save this cluster?

Thanks
-S