Could it be the second problem? I'm seeing exceptions like this one in the tablet server logs:

2018-01-02 09:09:23,481 [zookeeper.DistributedWorkQueue] WARN : Failed to process work b1bba0c2-dde2-42d4-8c10-ef51d13448ca|peer1|4l|6
java.lang.RuntimeException: Instance name peer1 does not exist in zookeeper. Run "accumulo org.apache.accumulo.server.util.ListInstances" to see a list.

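To make the exception easier to reason about, here is a rough sketch of how the peer definition could be sanity-checked. It assumes the `replication.peer.<name>` value has the form `<ReplicaSystem class>,<instance name>,<ZooKeeper quorum>` as described in the replication chapter of the manual. The helper names, the example addresses, and the list of known instances are all hypothetical; the list stands in for whatever ListInstances reports against the ZooKeeper quorum that the peer definition points at.

```python
def parse_peer_value(value):
    """Split a replication.peer.<name> value into its three assumed parts.

    Assumed layout: <ReplicaSystem class>,<instance name>,<zk1>[,<zk2>...]
    """
    parts = value.split(",", 2)
    if len(parts) != 3:
        raise ValueError("expected '<class>,<instance name>,<zookeepers>'")
    replica_class = parts[0].strip()
    instance_name = parts[1].strip()
    # The quorum itself may be a comma-separated list of host[:port] entries.
    zookeepers = [z.strip() for z in parts[2].split(",") if z.strip()]
    return replica_class, instance_name, zookeepers


def instance_is_known(instance_name, known_instances):
    """Return True if the configured instance name is among the listed ones."""
    return instance_name in known_instances


if __name__ == "__main__":
    # Hypothetical peer definition; the address is a placeholder.
    value = ("org.apache.accumulo.tserver.replication.AccumuloReplicaSystem,"
             "peer1,192.0.2.10:2181")
    _, name, quorum = parse_peer_value(value)
    # Hypothetical ListInstances result for that quorum.
    known = ["primary"]
    if not instance_is_known(name, known):
        print(f"instance '{name}' not found via quorum {quorum}")
```

If the second field does not match an instance name that actually exists in the ZooKeeper quorum given in the third field, a lookup failure like the one in the log would be the expected result.
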
When I run "accumulo org.apache.accumulo.server.util.ListInstances" it only lists the primary Accumulo instance. Could the problem be in the ZooKeeper quorum I used when I registered the peer instance? I used the IP of the peer as the only entry in the ZooKeeper Quorum value.

2017-12-29 16:07 GMT+01:00 Josh Elser <[email protected]>:
> If the system is reporting files that need to be replicated, it's probably
> one of two problems:
>
> * The WALs are still in use by the TabletServers. In its current
> implementation, the WALs are not replicated until the TabletServers don't
> referenced those WALs. This happens either by writing enough data or when
> the tabletserver is restarted. You can try to investigate either for this.
> * The replication is trying to happen but fails. You can look at the
> TabletServer logs on the primary instance to see if there are any reported
> exceptions around sending the data to the peer.
>
>
> On 12/29/17 8:24 AM, vLex Systems wrote:
>>
>> Hi,
>>
>> I've configured replication between two instances of accumulo: one is
>> the primary accumulo and the other is a peer created from a restore of
>> the backup of the primary.
>>
>> I've followed the instructions in the manual
>> (https://accumulo.apache.org/1.7/accumulo_user_manual#_replication)
>> and I can see the 4 tables I've configured to replicate in the
>> Accumulo Monitor but they do not replicate. They have 1 or 2 "Files
>> needing replication" and this number never decreases.
>>
>> I've also tried inserting data in one of the tables and the data does
>> not replicate to the accumulo peer instance.
>>
>> In the master log I see many entries like this one:
>> 2017-12-29 13:22:25,490 [replication.RemoveCompleteReplicationRecords]
>> INFO : Removed 0 complete replication entries from the table
>> accumulo.replication
>>
>> Does anyone know what could be happening?
>>
>> Thanks.
>>
>
