Not daily, but over the weekend.

________________________________
From: Mike Miller <mmil...@apache.org>
Sent: Tuesday, March 15, 2022 10:39 AM
To: user@accumulo.apache.org <user@accumulo.apache.org>
Subject: Re: [External] Re: odd issue with accumulo 1.10.0 starting up

Why are you bringing the cluster down every night? That is not ideal.

On Tue, Mar 15, 2022 at 9:24 AM Ligade, Shailesh [USA] <ligade_shail...@bah.com> wrote:

Thanks Mike,

We bring the servers down nightly; these are on AWS. This worked yesterday (Monday), but today (Tuesday) I went to check on it and it was down. I guess I didn't check yesterday; I assume it was up, as no one complained, and it was up and kicking last week for sure.

So I am not exactly sure when or what caused it. All services are up (tserver, master), so the services are not crashing by themselves.

I guess, worst case, I can re-initialize and recreate the tables from HDFS. :-(

-S
________________________________
From: Mike Miller <mmil...@apache.org>
Sent: Tuesday, March 15, 2022 9:16 AM
To: user@accumulo.apache.org <user@accumulo.apache.org>
Subject: Re: [External] Re: odd issue with accumulo 1.10.0 starting up

What was going on in the tserver before you saw that error? Did it finish recovering after the restart? If it is still recovering, I don't think you will be able to do any scans.

On Tue, Mar 15, 2022 at 8:56 AM Ligade, Shailesh [USA] <ligade_shail...@bah.com> wrote:

Thanks Mike,

That was my first reaction, but the instance is backed by Puppet and no configuration was updated (I double-checked, and ran Puppet manually as well as automatically after the restart). Since the system was operational yesterday, I think I can rule that out.

For the other error, I did see the exact error reported here:

https://lists.apache.org/thread/bobn2vhkswl6c0pkzpy8n13z087z1s6j
https://github.com/RENCI-NRIG/COMET-Accumulo/issues/14
https://markmail.org/message/bc7ijdsgqmod5p2h

but those are for a much older Accumulo, and the server didn't run out of memory, so I think that must have been fixed.

-S
________________________________
From: Mike Miller <mmil...@apache.org>
Sent: Tuesday, March 15, 2022 8:47 AM
To: user@accumulo.apache.org <user@accumulo.apache.org>
Subject: [External] Re: odd issue with accumulo 1.10.0 starting up

Check your configuration. The log message indicates that there is a problem with the internal system user performing operations. The internal system user uses credentials derived from the configuration (such as the instance.secret field). Make sure your configuration is identical across all nodes in your cluster.

On Tue, Mar 15, 2022 at 8:34 AM Ligade, Shailesh [USA] <ligade_shail...@bah.com> wrote:

Hello,

I am getting a slightly odd issue with Accumulo starting up.

On the tserver I am seeing:

[tserver.TabletServer] ERROR: Caller doesn't have permission to get active scans
ThriftSecurityException(user:!SYSTEM, code:BAD_CREDENTIALS)

In the master log I am seeing:

ERROR: read a frame size of 1195725856, which is bigger than the maximum allowable buffer size for ALL connections.

From the shell I can list all the tables but cannot scan any. The monitor is showing a tablet count of 0 and 1 unassigned tablet.

HDFS fsck is all healthy.

Any suggestions?

Thanks

-S
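
The frame size in the master log line above may be worth decoding. Framed Thrift transports prefix each message with a 4-byte big-endian length, so an implausibly large "frame size" is often just the first four bytes of some non-Thrift traffic read as an integer. A minimal sketch, assuming that is what happened here (the helper name decode_frame_size is hypothetical, used only for illustration):

```python
def decode_frame_size(frame_size: int) -> bytes:
    """Return the 4 raw bytes behind a big-endian 32-bit length prefix."""
    return frame_size.to_bytes(4, byteorder="big")


if __name__ == "__main__":
    # The value reported in the master log in this thread.
    raw = decode_frame_size(1195725856)
    print(raw)  # b'GET '
```

The bytes spell "GET " (with a trailing space), which would be consistent with an HTTP client connecting to a Thrift port on the master. That is only an inference from the number itself; nothing in this thread confirms which client, if any, was responsible, or whether it is related to the BAD_CREDENTIALS error on the tserver.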