Re: [External] Re: odd issue with accumulo 1.10.0 starting up

Mike Miller Thu, 17 Mar 2022 08:45:04 -0700

Are you still running Replication? I would turn it off if you can.

On Thu, Mar 17, 2022 at 7:44 AM dev1 <d...@etcoleman.com> wrote:


> When an Accumulo process terminates abnormally, there may be a file created
> with the exception that caused the problem. The files may be named *.out (or
> *.err), I can't recall which. Normally the files have 0 size, but on
> termination they will have some text.
>
>
>
> Are you seeing those files and do they point to the issue?
>
>
>
> Do you have the jvm configured to terminate on out of memory – and print
> that error condition? Maybe the manager is running out of memory.
>
>
>
> Ed Coleman
>
>
>
> *From:* Ligade, Shailesh [USA] <ligade_shail...@bah.com>
> *Sent:* Wednesday, March 16, 2022 3:31 PM
> *To:* user@accumulo.apache.org
> *Subject:* RE: [External] Re: odd issue with accumulo 1.10.0 starting up
>
>
>
> Thanks,
>
>
>
> I think we are having the same or a similar issue with a virus scan/security
> scan. However, that should not bring down the master, should it?
>
>
>
> I am still digging through the logs.
>
>
>
> -S
>
>
>
> *From:* Adam J. Shook <adamjsh...@gmail.com>
> *Sent:* Wednesday, March 16, 2022 2:46 PM
> *To:* user@accumulo.apache.org
> *Subject:* Re: [External] Re: odd issue with accumulo 1.10.0 starting up
>
>
>
> This is certainly anecdotal, but we've seen this "ERROR: Read a frame size
> of (large number)" before on our Accumulo cluster that would show up at a
> regular and predictable frequency. The root cause was due to a routine scan
> done by the security team looking for vulnerabilities across the entire
> enterprise (nothing Accumulo-specific). I don't have any additional
> information about the specifics of the scan. From all that we can tell, it
> has no impact on our Accumulo cluster outside of these error messages.
>
>
>
> --Adam
>
>
>
> On Wed, Mar 16, 2022 at 8:35 AM Christopher <ctubb...@apache.org> wrote:
>
> Since that error message is coming from the libthrift library, and not
> Accumulo code, we would need a lot more context to even begin helping you
> troubleshoot it. For example, the complete stack trace that shows the
> Accumulo code that called into the Thrift library, would be extremely
> helpful.
>
> It's a bit concerning that you're trying to send a single buffer over
> thrift that's over a gigabyte in size, according to that number. You've
> said before that you use live ingest. Are you trying to send a 1GB mutation
> to a tablet server? Or are you using replication and the stack trace looks
> like it's sending 1GB of replication data?
>
>
>
> On Wed, Mar 16, 2022 at 7:14 AM Ligade, Shailesh [USA] <
> ligade_shail...@bah.com> wrote:
>
> Well, I re-initialized accumulo but I still see
>
>
>
> ERROR: Read a frame size of 1195725856, which is bigger than the maximum
> allowable buffer size for ALL connections.
>
>
>
> Is there a setting that I can increase to get past it?
>
>
>
> -S
>
>
>
>
> ------------------------------
>
> *From:* Ligade, Shailesh [USA] <ligade_shail...@bah.com>
> *Sent:* Tuesday, March 15, 2022 12:47 PM
> *To:* user@accumulo.apache.org <user@accumulo.apache.org>
> *Subject:* Re: [External] Re: odd issue with accumulo 1.10.0 starting up
>
>
>
> Not daily, but over the weekend.
> ------------------------------
>
> *From:* Mike Miller <mmil...@apache.org>
> *Sent:* Tuesday, March 15, 2022 10:39 AM
> *To:* user@accumulo.apache.org <user@accumulo.apache.org>
> *Subject:* Re: [External] Re: odd issue with accumulo 1.10.0 starting up
>
>
>
> Why are you bringing the cluster down every night? That is not ideal.
>
>
>
> On Tue, Mar 15, 2022 at 9:24 AM Ligade, Shailesh [USA] <
> ligade_shail...@bah.com> wrote:
>
> Thanks Mike,
>
>
>
> We bring the servers down nightly. These are on AWS. This worked yesterday
> (Monday), but today (Tuesday) I went to check on it and it was down. I
> guess I didn't check yesterday; I assume it was up, as no one complained.
> It was up and kicking last week for sure.
>
>
>
> So I am not exactly sure when or what caused it. All services are up
> (tserver, master), so the services themselves are not crashing.
>
>
>
> I guess worst case, I can re-initialize and recreate tables from HDFS..:-(
>
>
>
> -S
> ------------------------------
>
> *From:* Mike Miller <mmil...@apache.org>
> *Sent:* Tuesday, March 15, 2022 9:16 AM
> *To:* user@accumulo.apache.org <user@accumulo.apache.org>
> *Subject:* Re: [External] Re: odd issue with accumulo 1.10.0 starting up
>
>
>
> What was going on in the tserver before you saw that error? Did it finish
> recovering after the restart? If it is still recovering, I don't think you
> will be able to do any scans.
>
>
>
> On Tue, Mar 15, 2022 at 8:56 AM Ligade, Shailesh [USA] <
> ligade_shail...@bah.com> wrote:
>
> Thanks Mike,
>
>
>
> That was my first reaction, but the instance is backed up by Puppet and no
> configuration was updated (I double checked and ran Puppet manually as well
> as automatically after restart). Since the system was operational
> yesterday, I think I can rule that out.
>
>
>
> For the other error, I did see the exact error in
> https://lists.apache.org/thread/bobn2vhkswl6c0pkzpy8n13z087z1s6j ,
> https://github.com/RENCI-NRIG/COMET-Accumulo/issues/14 and
> https://markmail.org/message/bc7ijdsgqmod5p2h
> but those are for a much older Accumulo, and the server didn't run out of
> memory, so I think that must have been fixed.
>
> COMET - accumulomaster out of memory issue · Issue #14 ·
> RENCI-NRIG/COMET-Accumulo
>
> In COMET cluster running in AWS, node running accumulomaster also hosts
> comet head node. In current deployment, EC2 instance is of type small which
> has 2GB ram. Issue: Accumulomaster process is kil...
>
> github.com
>
> -S
> ------------------------------
>
> *From:* Mike Miller <mmil...@apache.org>
> *Sent:* Tuesday, March 15, 2022 8:47 AM
> *To:* user@accumulo.apache.org <user@accumulo.apache.org>
> *Subject:* [External] Re: odd issue with accumulo 1.10.0 starting up
>
> Check your configuration. The log message indicates that there is a
> problem with the internal system user performing operations. The internal
> system user uses credentials derived from the configuration (such as the
> instance.secret field). Make sure your configuration is identical across
> all nodes in your cluster.
>
> On Tue, Mar 15, 2022 at 8:34 AM Ligade, Shailesh [USA] <
> ligade_shail...@bah.com> wrote:
>
> Hello,
>
> I am getting a slightly odd issue with Accumulo starting up.
>
> On the tserver I am seeing
>
> [tserver.TabletServer] ERROR: Caller doesn't have permission to get active
> scans
>
> ThriftSecurityException(user:!SYSTEM, code:BAD_CREDENTIALS)
>
> In the master log I am seeing
>
> ERROR: read a frame size of 1195725856, which is bigger than the maximum
> allowable buffer size for ALL connections.
>
> From the shell I can list all the tables but cannot scan any. The Monitor is
> showing tablet count 0 and unassigned tablet 1.
>
> HDFS fsck is all healthy.
>
> Any suggestions?
>
> Thanks
>
> -S
>
>
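
A note on the frame size reported in the thread: Thrift's framed transport reads the first four bytes of an incoming message as a big-endian 32-bit length. Reinterpreting 1195725856 as those four raw bytes gives readable ASCII, which would be consistent with Adam's observation about a security scanner that speaks something other than Thrift to the port. The sketch below only shows the arithmetic; it is not a confirmed diagnosis of this cluster, and the helper name frame_size_as_text is made up for illustration.

```python
import struct

def frame_size_as_text(frame_size: int) -> str:
    """Reinterpret a 32-bit frame size as the four bytes that were on the wire."""
    # Big-endian signed 32-bit integer, the way a framed transport would read it.
    raw = struct.pack(">i", frame_size)
    # Non-ASCII bytes are replaced rather than raising, since this is only a hint.
    return raw.decode("ascii", errors="replace")

print(repr(frame_size_as_text(1195725856)))  # 'GET '
```

If the number decodes to text like this, raising a buffer-size setting is unlikely to help; finding out what is connecting to the port is probably the more useful next step.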

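Ed's suggestion to look for non-empty *.out / *.err files can be sketched as a short script. The log directory is an assumption (it depends on how Accumulo was installed and configured), and nonempty_crash_files is a hypothetical helper name, not part of Accumulo.

```python
from pathlib import Path

def nonempty_crash_files(log_dir):
    """Return (path, size_in_bytes) for every non-empty *.out / *.err file in log_dir."""
    hits = []
    for pattern in ("*.out", "*.err"):
        for path in sorted(Path(log_dir).glob(pattern)):
            size = path.stat().st_size
            # Files that stay at 0 bytes are the normal case and are skipped.
            if size > 0:
                hits.append((path, size))
    return hits
```

Calling nonempty_crash_files on the log directory of each node (for example a path such as /var/log/accumulo, which is only a guess) returns the files worth reading first.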