Re: Restrict Frequency of BlockReport To Namenode startup and failover

Surendra Singh Lilhore Thu, 06 Feb 2020 22:59:24 -0800

Thanks Wei-Chiu,

I feel IBR is now more stable in branch 3.x. If the BR was added only to guard
against bugs in IBR, I feel we should fix those bugs in IBR itself. Adding one
feature to work around bugs in another is not good.


I also think the DN should send a BR only in failure and process-start
scenarios.

-Surendra

On Fri, Feb 7, 2020 at 10:52 AM Ayush Saxena <[email protected]> wrote:

> Hi Wei-Chiu,
> Thanks for the response.
> Yes, we are talking about the FBR only.
> Increasing the interval (that is, reducing the frequency) limits the
> problem, but doesn't seem to solve it. As the cluster grows, the interval
> needs to be increased further, and we cannot increase it indefinitely,
> because in some cases the FBR is needed.
> One such case is Namenode failover. On failover, the namenode marks all
> the storages as stale and corrects them only once an FBR arrives. Any
> over-replicated blocks won't be deleted while the storages are in the
> stale state.
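
The failover behaviour described above can be modelled with a small sketch. This is a simplified toy model, not HDFS code: the class and method names (`NameNodeModel`, `on_failover`, `on_full_block_report`, `can_delete_excess_replica`) are hypothetical, and it assumes that deleting an excess replica is postponed while any storage holding a replica of that block is still stale.

```python
class NameNodeModel:
    """Toy model of stale-storage tracking (hypothetical, not the HDFS API)."""

    def __init__(self, storage_ids):
        # Map each storage ID to a "stale" flag; storages start out trusted.
        self.stale = {sid: False for sid in storage_ids}

    def on_failover(self):
        # After failover, the new active namenode distrusts every storage.
        for sid in self.stale:
            self.stale[sid] = True

    def on_full_block_report(self, storage_id):
        # A full block report refreshes the namenode's view of that storage.
        self.stale[storage_id] = False

    def can_delete_excess_replica(self, replica_storage_ids):
        # Assumption: deletion is postponed while any replica's storage is stale.
        return not any(self.stale[sid] for sid in replica_storage_ids)
```

In this model, a block replicated on storages `s1` and `s2` becomes eligible for excess-replica deletion only after both storages have sent a full block report following the failover.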
>
> Regarding the IBR error: the block is set to Completed after the IBR, when
> the value claimed by the client and the IBR values match, so any
> discrepancy here would raise an alarm at that point itself.
>
> If a block passes this check, the FBR would also send the same values
> from memory; it doesn't check the actual disk.
> DirectoryScanner checks whether the in-memory data is the same as that
> on the disk.
> Another scenario where the FBR could be needed is to counter a split-brain
> scenario, but with QJM that is unlikely to happen.
>
> In case of any connection loss during the interval, we would still send
> the BR, so we should be safe here.
>
> Anyway, if a client gets hold of an invalid block, it too will report it
> to the Namenode.
>
> Beyond these, we cannot think of a case where not sending the FBR would
> cause an issue.
>
> Let us know your thoughts on this.
>
> -Ayush
>
> >>> On 07-Feb-2020, at 4:12 AM, Wei-Chiu Chuang <[email protected]> wrote:
> >> Hey Ayush,
> >>
> >> Thanks a lot for your proposal.
> >>
> >> Do you mean the Full Block Report that is sent out every 6 hours per
> >> DataNode?
> >> Someone told me they reduced the frequency of FBR to once every 24 hours
> >> and it seems okay.
> >>
> >> One of the purposes of FBR was to prevent bugs in the incremental block
> >> report implementation. In other words, it's a fail-safe mechanism. Any
> >> bugs in IBRs get corrected after an FBR that refreshes the state of
> >> blocks at the NameNode. At least, that's my understanding of FBRs in
> >> their early days.
> >>
> >> On Tue, Feb 4, 2020 at 12:21 AM Ayush Saxena <[email protected]> wrote:
> >>
> >> Hi All,
> >> Surendra and I have lately been trying to minimise the impact of Block
> >> Reports on the Namenode in huge clusters. We observed that in a huge
> >> cluster, with about 10k datanodes, the periodic block reports adversely
> >> impact Namenode performance.
> >> We have been thinking of restricting block reports so that they are
> >> triggered only during Namenode startup or on failover, and eliminating
> >> the periodic block report.
> >> The main purpose of the block report is to get corrupt blocks
> >> recognised, so as a follow-up we can maintain a service at the datanode
> >> that runs periodically to check whether the block size in memory is the
> >> same as that reported to the namenode, and the datanode can alert the
> >> namenode if anything looks suspect. (We still need to plan this.)
> >>
> >> On the datanode side, a datanode can send a BlockReport, or restore its
> >> usual frequency, if during the configured time period the Datanode was
> >> shut down or lost its connection with the namenode. Say the datanode
> >> was supposed to send a BR at 2100 hrs: if during the last 6 hrs there
> >> has been any failover or loss of connection between the namenode and
> >> the datanode, it will trigger the BR normally; otherwise it shall skip
> >> sending the BR.
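
The skip rule in the paragraph above can be sketched as a small decision function. This is a minimal illustration of the proposal as described, not an existing DataNode implementation: `ReportWindow` and `should_send_full_block_report` are hypothetical names, and the three flags are assumed to be tracked by the datanode over the configured report interval.

```python
from dataclasses import dataclass


@dataclass
class ReportWindow:
    """Events a datanode saw since its last full block report (hypothetical)."""
    namenode_failover: bool = False
    lost_connection: bool = False
    datanode_restarted: bool = False


def should_send_full_block_report(window: ReportWindow) -> bool:
    """Return True if the scheduled report should be sent, False to skip it."""
    # Send only when something happened that could have left the
    # namenode's view of this datanode out of date.
    return (window.namenode_failover
            or window.lost_connection
            or window.datanode_restarted)
```

For the 2100 hrs example, a datanode that saw no failover, no connection loss, and no restart in the preceding 6 hours would evaluate `should_send_full_block_report(ReportWindow())` and skip the report.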
> >>
> >> Let us know your thoughts/challenges/improvements on this.
> >>
> >> -Ayush
> >>
> >>
> >>