Re: [External] Re: odd issue with accumulo 1.10.0 starting up

Ligade, Shailesh [USA] Tue, 15 Mar 2022 09:47:33 -0700

Not daily, but over the weekend.
________________________________
From: Mike Miller <[email protected]>
Sent: Tuesday, March 15, 2022 10:39 AM
To: [email protected] <[email protected]>
Subject: Re: [External] Re: odd issue with accumulo 1.10.0 starting up


Why are you bringing the cluster down every night? That is not ideal.

On Tue, Mar 15, 2022 at 9:24 AM Ligade, Shailesh [USA] 
<[email protected]> wrote:
Thanks Mike,

We bring the servers down nightly. These are on AWS. This worked yesterday 
(Monday), but today (Tuesday) I went to check on it and it was down. I guess I 
didn't check yesterday; I assume it was up, as no one complained. It was up 
and kicking last week for sure.

So I'm not exactly sure when or what caused it. All services are up (tserver, 
master), so the services themselves are not crashing.

I guess worst case, I can re-initialize and recreate the tables from HDFS. :-(

-S
________________________________
From: Mike Miller <[email protected]>
Sent: Tuesday, March 15, 2022 9:16 AM
To: [email protected] <[email protected]>
Subject: Re: [External] Re: odd issue with accumulo 1.10.0 starting up

What was going on in the tserver before you saw that error? Did it finish 
recovering after the restart? If it is still recovering, I don't think you will 
be able to do any scans.

On Tue, Mar 15, 2022 at 8:56 AM Ligade, Shailesh [USA] 
<[email protected]> wrote:
Thanks Mike,

That was my first reaction, but the instance is backed up by Puppet and no 
configuration was updated (I double-checked and ran Puppet manually, as well as 
automatically after restart). Since the system was operational yesterday, I 
think I can rule that out.

For the other error, I did find the exact same error in
https://lists.apache.org/thread/bobn2vhkswl6c0pkzpy8n13z087z1s6j,
https://github.com/RENCI-NRIG/COMET-Accumulo/issues/14 and
https://markmail.org/message/bc7ijdsgqmod5p2h
but those are for a much older Accumulo, and the server didn't run out of 
memory, so I think that must have been fixed.


-S

________________________________
From: Mike Miller <[email protected]>
Sent: Tuesday, March 15, 2022 8:47 AM
To: [email protected] <[email protected]>
Subject: [External] Re: odd issue with accumulo 1.10.0 starting up

Check your configuration. The log message indicates that there is a problem 
with the internal system user performing operations. The internal system user 
uses credentials derived from the configuration (such as the instance.secret 
field). Make sure your configuration is identical across all nodes in your 
cluster.
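
One rough way to check this is to hash each node's copy of the configuration file and compare the digests, which avoids printing the secret itself. The sketch below is only an illustration: the node names are hypothetical, and it assumes you have already copied each node's accumulo-site.xml to one place (how you fetch them is up to you).

```python
import hashlib
from collections import defaultdict


def group_nodes_by_config(configs):
    """Group node names by the SHA-256 digest of their config bytes.

    `configs` maps a node name (hypothetical) to the raw bytes of that
    node's configuration file.
    """
    groups = defaultdict(list)
    for node, content in configs.items():
        digest = hashlib.sha256(content).hexdigest()
        groups[digest].append(node)
    return dict(groups)


def configs_match(configs):
    """Return True when every node has byte-identical configuration."""
    return len(group_nodes_by_config(configs)) <= 1


if __name__ == "__main__":
    # Hypothetical in-memory copies; in practice read the fetched files.
    sample = {
        "master1": b"<value>secret-a</value>",
        "tserver1": b"<value>secret-a</value>",
        "tserver2": b"<value>secret-b</value>",
    }
    for digest, nodes in group_nodes_by_config(sample).items():
        # Print only a digest prefix, never the file contents.
        print(digest[:12], sorted(nodes))
```

More than one group in the output means at least one node differs from the rest. A byte-level comparison also flags harmless differences such as whitespace, so treat a mismatch as a prompt to look closer rather than as proof of a bad secret.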

On Tue, Mar 15, 2022 at 8:34 AM Ligade, Shailesh [USA] 
<[email protected]> wrote:
Hello,

I am getting a slightly odd issue with Accumulo starting up.

On the tserver I am seeing:

[tserver.TabletServer] ERROR: Caller doesn't have permission to get active scans
ThriftSecurityException(user:!SYSTEM, code:BAD_CREDENTIALS)

In the master log I am seeing:

ERROR: read a frame size of 1195725856, which is bigger than the maximum 
allowable buffer size for ALL connections.
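
A side note on that number, which is not something established in this thread: Thrift's framed transport reads the first four bytes of an incoming message as a big-endian length, so a client speaking some other protocol to a Thrift port tends to show up as an implausibly large frame size. The small sketch below (the function name is made up for illustration) turns the reported size back into the four bytes that were read.

```python
def frame_size_as_text(frame_size):
    """Decode a 32-bit frame size back into the 4 bytes it was read from.

    The value is treated as a signed big-endian 32-bit integer; bytes that
    are not ASCII are shown as replacement characters.
    """
    raw = frame_size.to_bytes(4, byteorder="big", signed=True)
    return raw.decode("ascii", errors="replace")


if __name__ == "__main__":
    print(repr(frame_size_as_text(1195725856)))  # 'GET '
```

The bytes spell "GET ", which is consistent with an HTTP request arriving on the master's Thrift port. What sent it (a health check, a scanner, a misdirected browser) is a guess, and this message may well be unrelated to the BAD_CREDENTIALS error on the tserver.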

From the shell I can list all the tables but cannot scan any. The monitor is 
showing a tablet count of 0 and 1 unassigned tablet.

HDFS fsck is all healthy.

Any suggestions?

Thanks

-S
