Are you still running Replication? I would turn it off if you can.

On Thu, Mar 17, 2022 at 7:44 AM dev1 <d...@etcoleman.com> wrote:
> When an Accumulo process terminates abnormally, there may be a file created with the exception for the problem. The files may be named *.out (or *.err), I can't recall which. Normally the files have 0 size, but on termination they will have some text.
>
> Are you seeing those files, and do they point to the issue?
>
> Do you have the JVM configured to terminate on out of memory, and to print that error condition? Maybe the manager is running out of memory.
>
> Ed Coleman
>
> *From:* Ligade, Shailesh [USA] <ligade_shail...@bah.com>
> *Sent:* Wednesday, March 16, 2022 3:31 PM
> *To:* user@accumulo.apache.org
> *Subject:* RE: [External] Re: odd issue with accumulo 1.10.0 starting up
>
> Thanks,
>
> I think we are having the same or a similar issue with a virus scan/security scan. However, that should not bring down the master, should it?
>
> I am still digging through the logs.
>
> -S
>
> *From:* Adam J. Shook <adamjsh...@gmail.com>
> *Sent:* Wednesday, March 16, 2022 2:46 PM
> *To:* user@accumulo.apache.org
> *Subject:* Re: [External] Re: odd issue with accumulo 1.10.0 starting up
>
> This is certainly anecdotal, but we've seen this "ERROR: Read a frame size of (large number)" before on our Accumulo cluster, showing up at a regular and predictable frequency. The root cause was a routine scan done by the security team looking for vulnerabilities across the entire enterprise (nothing Accumulo-specific). I don't have any additional information about the specifics of the scan. From all that we can tell, it has no impact on our Accumulo cluster outside of these error messages.
>
> --Adam
>
> On Wed, Mar 16, 2022 at 8:35 AM Christopher <ctubb...@apache.org> wrote:
>
> Since that error message is coming from the libthrift library, and not Accumulo code, we would need a lot more context to even begin helping you troubleshoot it.
> For example, the complete stack trace that shows the Accumulo code that called into the Thrift library would be extremely helpful.
>
> It's a bit concerning that you're trying to send a single buffer over Thrift that's over a gigabyte in size, according to that number. You've said before that you use live ingest. Are you trying to send a 1GB mutation to a tablet server? Or are you using replication and the stack trace looks like it's sending 1GB of replication data?
>
> On Wed, Mar 16, 2022 at 7:14 AM Ligade, Shailesh [USA] <ligade_shail...@bah.com> wrote:
>
> Well, I re-initialized Accumulo but I still see
>
> ERROR: Read a frame size of 1195725856, which is bigger than the maximum allowable buffer size for ALL connections.
>
> Is there a setting that I can increase to get past it?
>
> -S
>
> ------------------------------
>
> *From:* Ligade, Shailesh [USA] <ligade_shail...@bah.com>
> *Sent:* Tuesday, March 15, 2022 12:47 PM
> *To:* user@accumulo.apache.org <user@accumulo.apache.org>
> *Subject:* Re: [External] Re: odd issue with accumulo 1.10.0 starting up
>
> Not daily but over the weekend.
> ------------------------------
>
> *From:* Mike Miller <mmil...@apache.org>
> *Sent:* Tuesday, March 15, 2022 10:39 AM
> *To:* user@accumulo.apache.org <user@accumulo.apache.org>
> *Subject:* Re: [External] Re: odd issue with accumulo 1.10.0 starting up
>
> Why are you bringing the cluster down every night? That is not ideal.
>
> On Tue, Mar 15, 2022 at 9:24 AM Ligade, Shailesh [USA] <ligade_shail...@bah.com> wrote:
>
> Thanks Mike,
>
> We bring the servers down nightly. These are on AWS. This worked yesterday (Monday), but today (Tuesday) I went to check on it and it was down. I guess I didn't check yesterday. I assume it was up as no one complained, but it was up and kicking last week for sure.
>
> So I'm not exactly sure when or what caused it. All services are up (tserver, master), so the services are not crashing themselves.
>
> I guess worst case, I can re-initialize and recreate the tables from HDFS. :-(
>
> -S
> ------------------------------
>
> *From:* Mike Miller <mmil...@apache.org>
> *Sent:* Tuesday, March 15, 2022 9:16 AM
> *To:* user@accumulo.apache.org <user@accumulo.apache.org>
> *Subject:* Re: [External] Re: odd issue with accumulo 1.10.0 starting up
>
> What was going on in the tserver before you saw that error? Did it finish recovering after the restart? If it is still recovering, I don't think you will be able to do any scans.
>
> On Tue, Mar 15, 2022 at 8:56 AM Ligade, Shailesh [USA] <ligade_shail...@bah.com> wrote:
>
> Thanks Mike,
>
> That was my first reaction, but the instance is backed up by Puppet and no configuration was updated (I double-checked and ran Puppet manually as well as automatically after restart). Since the system was operational yesterday, I think I can rule that out.
>
> For the other error, I did see the exact error at
> https://lists.apache.org/thread/bobn2vhkswl6c0pkzpy8n13z087z1s6j ,
> https://github.com/RENCI-NRIG/COMET-Accumulo/issues/14
> https://markmail.org/message/bc7ijdsgqmod5p2h
> but those are for a much older Accumulo, and the server didn't go out of memory, so I think that must have been fixed.
>
> COMET - accumulomaster out of memory issue · Issue #14 · RENCI-NRIG/COMET-Accumulo
> In COMET cluster running in AWS, node running accumulomaster also hosts comet head node. In current deployment, EC2 instance is of type small which has 2GB ram. Issue: Accumulomaster process is kil...
> github.com
>
> -S
> ------------------------------
>
> *From:* Mike Miller <mmil...@apache.org>
> *Sent:* Tuesday, March 15, 2022 8:47 AM
> *To:* user@accumulo.apache.org <user@accumulo.apache.org>
> *Subject:* [External] Re: odd issue with accumulo 1.10.0 starting up
>
> Check your configuration.
> The log message indicates that there is a problem with the internal system user performing operations. The internal system user uses credentials derived from the configuration (such as the instance.secret field). Make sure your configuration is identical across all nodes in your cluster.
>
> On Tue, Mar 15, 2022 at 8:34 AM Ligade, Shailesh [USA] <ligade_shail...@bah.com> wrote:
>
> Hello,
>
> I am getting a slightly odd issue with Accumulo starting up.
>
> On the tserver I am seeing
>
> [tserver.TabletServer] ERROR: Caller doesn't have permission to get active scans
>
> ThriftSecurityException(user:!SYSTEM, code:BAD_CREDENTIALS)
>
> In the master log I am seeing
>
> ERROR: read a frame size of 1195725856, which is bigger than the maximum allowable buffer size for ALL connections.
>
> From the shell I can list all the tables but cannot scan any. The monitor is showing tablet count 0 and unassigned tablets 1.
>
> HDFS fsck is all healthy.
>
> Any suggestions?
>
> Thanks
>
> -S
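
One observation on the frame size quoted in this thread: Thrift's framed transport reads the first four bytes of a message as a big-endian length, so a client that is not speaking Thrift can produce a nonsensical "frame size". The snippet below is a minimal sketch, assuming that framing, which turns the reported number back into the four bytes that were read. The helper name `frame_size_to_bytes` is made up for illustration and is not part of Accumulo or Thrift.

```python
import struct


def frame_size_to_bytes(frame_size: int) -> bytes:
    """Return the four big-endian bytes that encode this 32-bit frame size."""
    return struct.pack(">i", frame_size)


# Value copied from the master log quoted in this thread.
reported = 1195725856

print(hex(reported))                  # 0x47455420
print(frame_size_to_bytes(reported))  # b'GET '
```

The bytes spell the start of an HTTP GET request, which would fit Adam's account of a security scanner probing the port rather than an actual 1GB mutation or replication payload. It is only a hint, though, and it would not by itself explain the BAD_CREDENTIALS error on the tserver.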