On Fri, Mar 4, 2022 at 8:36 AM Ligade, Shailesh [USA]
<ligade_shail...@bah.com> wrote:
>
> Thanks Christopher,
>
> Appreciate your support!
>
> Not sure why instance.volumes.replacements was set, especially since we have an HA 
> namenode and that’s the only HDFS targeted. The replacement was set 
> to the same URL though, e.g. nameservice/accumulo, nameservice:8020/accumulo

That explains the relocation messages.

>
> Regardless, when the tserver went down, even though we had set 
> table.suspend.duration=15m, I was seeing volume replacement messages in the 
> master log for every hosted tablet, and that is taking a long time (hours for 
> 33k tablets/tserver). So how best to remove these volumes? There is no 
> delete-volumes; I see only add-volumes under accumulo init. Is there anything 
> I need to do after I remove the entire instance.volumes.replacements section from 
> accumulo-site.xml?

Just restart any server that had that replacements config, so they
don't try to unnecessarily update metadata that is already correct.
Updating volume references using the replacements config is just a
metadata update, though, not a lot of I/O, so I'm not sure it would
explain things taking a long time. It's possible that it's
contributing to the slowness, I suppose, if the tserver hosting
the metadata tablet for the tablet whose metadata is being updated is
also managing 33k other tablets.
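
If it helps while reviewing the config, here is a rough Python sketch (not
Accumulo code) for spotting replacement pairs that don't actually change
anything. It assumes the property value is a comma-separated list of pairs,
with the old and new volume URI in each pair separated by whitespace. The
function names and the example URIs are made up for illustration, and the
only normalization it does is dropping trailing slashes.

```python
def parse_replacements(value):
    """Split a replacements string into (old, new) volume pairs."""
    pairs = []
    for entry in value.split(","):
        parts = entry.split()
        if not parts:
            continue  # skip empty entries such as a trailing comma
        if len(parts) != 2:
            raise ValueError(f"expected 'old new', got: {entry!r}")
        pairs.append((parts[0], parts[1]))
    return pairs


def noop_pairs(value):
    """Return pairs whose old and new URIs match once trailing slashes are dropped."""
    return [
        (old, new)
        for old, new in parse_replacements(value)
        if old.rstrip("/") == new.rstrip("/")
    ]


if __name__ == "__main__":
    # Hypothetical value; substitute the one from your accumulo-site.xml.
    example = "hdfs://ns1/accumulo hdfs://ns1/accumulo/, hdfs://old/accumulo hdfs://new/accumulo"
    for old, new in noop_pairs(example):
        print(f"no-op replacement: {old} -> {new}")
```

A pair this flags is a candidate for removal. Note that it compares strings
only, so two URIs that differ in spelling (for example, with and without an
explicit port) are treated as different.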

In the past, I think we've recommended around 100 up to 1K tablets per
server. I'm not sure if that's still a good recommendation or not. In
any case, you can't reduce the number of tablets you have without
doing merges, or deleting entire ranges, or compacting and bulk
importing into a new table with more reasonable split points. And you
probably shouldn't try that until you have your current situation
under control. But, that's sorta why I was previously suggesting to
examine your whole config. Maybe think about your whole architecture,
to figure out where you want to go, and compare with where you are
now, so you can figure out how to get to your target setup from your
current setup.
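
To put rough numbers on that gap, here is a back-of-the-envelope Python
sketch. The 33k figure is from your message and the 100 and 1K targets are
the old rule of thumb mentioned above, not a current official
recommendation; the function name is just for illustration.

```python
import math


def reduction_factor(tablets_per_server, target_per_server):
    """Smallest whole-number factor by which the per-server tablet count must shrink."""
    return math.ceil(tablets_per_server / target_per_server)


if __name__ == "__main__":
    current = 33_000  # tablets per tserver, as reported in this thread
    for target in (1_000, 100):
        factor = reduction_factor(current, target)
        print(f"target {target}/server: reduce tablet count by about {factor}x")
```

The same factor could also be reached by adding servers rather than merging
tablets, or by some mix of the two.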

>
> I will have to look at each and every property to ensure it makes sense for 
> sure..
>
> Thanks
>
> -S
>
> -----Original Message-----
> From: Christopher <ctubb...@apache.org>
> Sent: Wednesday, March 2, 2022 3:09 PM
> To: accumulo-user <user@accumulo.apache.org>
> Subject: Re: [External] Re: accumulo 1.10.0 unassigned tablets issue
>
> On Wed, Mar 2, 2022 at 1:51 PM Ligade, Shailesh [USA] 
> <ligade_shail...@bah.com> wrote:
> >
> > Thanks Christopher,
> >
> > I do have instance.volumes.replacements overridden.
> >
> > Does that mean it will not work with the table.suspend.duration property?
>
> No. It's just that's where the RecoveryManager message is coming from.
>
> >
> > Hmm, thinking about it, I am not sure why we set that, as we have only one 
> > HDFS and we have fewer than 10 beefy nodes...
> >
> > Maybe I can remove this property after I set table.suspend.duration, and 
> > stop/reboot the tserver. After I am done, I can restore the property. Please 
> > advise.
>
> I have no idea why you would set that if you're not replacing one volume with 
> another. I think you would probably benefit from reviewing all of your 
> configuration. Please check the documentation for an explanation of each 
> property. If you have a specific question regarding them, you can ask here, 
> but I would start by reviewing your configs against the docs.
>
> >
> > Thanks
> >
> > -S
> >
> >
> > ________________________________
> > From: Christopher <ctubb...@apache.org>
> > Sent: Wednesday, March 2, 2022 1:32 PM
> > To: accumulo-user <user@accumulo.apache.org>
> > Subject: [External] Re: accumulo 1.10.0 unassigned tablets issue
> >
> > The replacements message should only appear if you have
> > instance.volumes.replacements set in your configuration.
> >
> > On Wed, Mar 2, 2022 at 11:02 AM Ligade, Shailesh [USA]
> > <ligade_shail...@bah.com> wrote:
> > >
> > > Hello,
> > >
> > > > I need to reboot a tserver with 34k hosted tablets.
> > > >
> > > > I set table.suspend.duration to 15 min, stopped the tserver, and rebooted the 
> > > > machine.
> > > >
> > > > As soon as the tablet server came online, its hosted tablet count went 
> > > > from 0 to 34k; however, on the master I see 34k unassigned tablets. 
> > > > Although the count is going down, it is taking hours.
> > > > Not sure why the master is reporting unassigned tablets when the tablet server 
> > > > has the correct hosted tablet count?
> > >
> > > Also in the master log i see
> > >
> > > recovery.RecoveryManager INFO: Volume replaced hdfs://xxxx -> hdfs://xxxx 
> > > >   The issue is that the from and to HDFS URLs are identical, so why is the 
> > > > master trying to do that?
> > >
> > > > Is the cluster safe to use? Can I reboot another tablet server before 
> > > > this unassigned tablet count goes to 0? I can reboot the entire cluster if I 
> > > > have to; will that help?
> > >
> > > Thanks in advance.
> > >
> > > -S
