Thanks everyone!!!
Just to conclude the thread: I have created HDFS-15162 to track this.

-Ayush

> On 09-Feb-2020, at 5:01 PM, Ayush Saxena <ayush...@gmail.com> wrote:
> 
> Hi Stephen,
> We are trying this on 3.1.1
> We aren't upgrading from 2.x; we are trying to increase the cluster size to 
> go beyond 10K datanodes.
> In the process, we found that block reports from that many DNs are quite 
> costly.
> There are plenty of reasons why block reports hurt performance, the major 
> one being the Namenode holding the lock while processing reports from that 
> many datanodes, as you mentioned.
> HDFS-14657 may improve the situation a bit (I haven't followed it closely), 
> but our point is that rather than just reducing the impact, we can 
> completely get rid of them in most cases.
> 
> Why carry the load of processing block reports unnecessarily, if they aren't 
> doing anything useful?
> 
> So, we just wanted to know if people are aware of any cases, which we might 
> have missed, where eliminating regular BRs could be a problem.
> 
> Let me know if you have reservations about the change or doubt anything.
> 
> -Ayush
> 
>>> On 07-Feb-2020, at 4:03 PM, Stephen O'Donnell <sodonn...@cloudera.com> wrote:
>> 
>> Are you seeing this problem on the 3.x branch, and if so, did the problem 
>> exist before you upgraded to 3.x? I am wondering if the situation is better 
>> or worse since moving to 3.x.
>> 
>> Also, do you believe the issue is driven by the namenode holding its lock 
>> for too long while it processes each block report, blocking other threads?
>> 
>> There was an interesting proposal in 
>> https://issues.apache.org/jira/browse/HDFS-14657 to allow the NN lock to be 
>> dropped and retaken periodically while processing FBRs, but it has not 
>> progressed recently. I wonder if that would help here?
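
For context, the pattern HDFS-14657 describes is, roughly, processing a full
block report in chunks and releasing the namesystem write lock between chunks
so that other operations can make progress. A minimal, generic Java sketch of
that pattern (not the actual HDFS-14657 patch; class, field, and chunk-size
values here are invented):

  import java.util.List;
  import java.util.concurrent.locks.ReentrantReadWriteLock;

  class ChunkedReportProcessingSketch {
    private final ReentrantReadWriteLock nsLock = new ReentrantReadWriteLock();
    private static final int CHUNK_SIZE = 1000; // blocks handled per lock hold

    void processFullBlockReport(List<Long> reportedBlockIds) {
      for (int start = 0; start < reportedBlockIds.size(); start += CHUNK_SIZE) {
        int end = Math.min(start + CHUNK_SIZE, reportedBlockIds.size());
        nsLock.writeLock().lock();   // hold the lock for one chunk only
        try {
          for (long blockId : reportedBlockIds.subList(start, end)) {
            // update the block map / replica state for blockId here
          }
        } finally {
          nsLock.writeLock().unlock(); // drop the lock so other RPCs can run
        }
      }
    }
  }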
>> 
>> Thanks,
>> 
>> Stephen.
>> 
>>> On Fri, Feb 7, 2020 at 6:58 AM Surendra Singh Lilhore <surendralilh...@apache.org> wrote:
>>> Thanks Wei-Chiu,
>>> 
>>> I feel IBR is now more stable in branch 3.x. If the BR exists just to prevent
>>> bugs in IBR, I feel we should fix such bugs in IBR instead. Adding one piece
>>> of functionality to work around bugs in another is not good.
>>> 
>>> I also think the DN should send a BR only in the failure and process-start scenarios.
>>> 
>>> -Surendra
>>> 
>>> On Fri, Feb 7, 2020 at 10:52 AM Ayush Saxena <ayush...@gmail.com> wrote:
>>> 
>>> > Hi Wei-Chiu,
>>> > Thanks for the response.
>>> > Yes, we are talking about the FBR only.
>>> > Increasing the interval limits the problem, but doesn't seem to solve it.
>>> > With increasing cluster size, the interval needs to be increased further,
>>> > and we cannot increase it indefinitely, as in some cases an FBR is needed.
>>> > One such case is Namenode failover: on failover the Namenode marks all the
>>> > storages as stale, and corrects them only once an FBR comes in. Any
>>> > over-replicated blocks won't be deleted while the storages are in the
>>> > stale state.
>>> >
>>> > Regarding IBR errors: the block is marked COMPLETE after the IBR, when the
>>> > client-claimed value and the IBR value match, so if there is a discrepancy
>>> > there, it would already raise an alarm at that point.
>>> >
>>> > If it gets past that point, the FBR would also be sending the same values
>>> > from memory; it doesn't check the actual disk.
>>> > It is the DirectoryScanner that checks whether the in-memory data is the
>>> > same as that on the disk.
>>> > Another scenario where an FBR could be needed is to counter a split-brain
>>> > scenario, but with QJMs that is unlikely to happen.
>>> >
>>> > In case of any connection loss during the interval, we would still send the
>>> > BR, so we should be safe here.
>>> >
>>> > In any case, if a client gets hold of an invalid block, it too will report
>>> > it to the Namenode.
>>> >
>>> > Beyond these, we cannot think of other cases where not sending the FBR could
>>> > cause an issue.
>>> >
>>> > Let us know your thoughts on this.
>>> >
>>> > -Ayush
>>> >
>>> > >>> On 07-Feb-2020, at 4:12 AM, Wei-Chiu Chuang <weic...@apache.org> wrote:
>>> > >> Hey Ayush,
>>> > >>
>>> > >> Thanks a lot for your proposal.
>>> > >>
>>> > >> Do you mean the Full Block Report that is sent out every 6 hours per
>>> > >> DataNode?
>>> > >> Someone told me they reduced the frequency of FBR to 24 hours and it
>>> > >> seems okay.
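
For reference, the interval being discussed is dfs.blockreport.intervalMsec,
which defaults to 21600000 ms (6 hours). A minimal sketch of raising it to 24
hours through the Hadoop Configuration API; in a real cluster the same value
would normally just be set in hdfs-site.xml on the datanodes:

  import org.apache.hadoop.conf.Configuration;

  public class FbrIntervalSketch {
    public static void main(String[] args) {
      Configuration conf = new Configuration();
      // Default full block report interval is 21600000 ms (6 hours).
      long twentyFourHoursMs = 24L * 60 * 60 * 1000;
      conf.setLong("dfs.blockreport.intervalMsec", twentyFourHoursMs);
      System.out.println("FBR interval (ms): "
          + conf.getLong("dfs.blockreport.intervalMsec", 21600000L));
    }
  }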
>>> > >>
>>> > >> One of the purposes of FBR was to prevent bugs in the incremental block
>>> > >> report implementation. In other words, it's a fail-safe mechanism. Any
>>> > >> bugs in IBRs get corrected after an FBR that refreshes the state of
>>> > >> blocks at the NameNode. At least, that's my understanding of FBRs in
>>> > >> their early days.
>>> > >>
>>> > >> On Tue, Feb 4, 2020 at 12:21 AM Ayush Saxena <ayush...@gmail.com> wrote:
>>> > >>
>>> > >> Hi All,
>>> > >> Surendra and I have lately been trying to minimise the impact of Block
>>> > >> Reports on the Namenode in huge clusters. We observed that in a huge
>>> > >> cluster of about 10k datanodes, the periodic block reports adversely
>>> > >> impact Namenode performance.
>>> > >> We have been thinking of restricting block reports to be triggered only
>>> > >> during Namenode startup or in case of failover, and eliminating the
>>> > >> periodic block report.
>>> > >> The main purpose of the block report is to get corrupt blocks
>>> > >> recognised, so as a follow-up we can maintain a service at the datanode
>>> > >> that runs periodically to check whether the block size in memory is the
>>> > >> same as that reported to the namenode, and the datanode can alert the
>>> > >> namenode in case of any suspect block. (We still need to plan this.)
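
That datanode-side check has not been designed yet, but as a rough
illustration of the idea (hypothetical names only, nothing here exists in
Hadoop today), a periodic task could compare each replica's current in-memory
length against the length last reported to the namenode and flag any mismatch:

  import java.util.Map;
  import java.util.concurrent.ConcurrentHashMap;

  class ReplicaLengthCheckSketch implements Runnable {
    // blockId -> length last reported to the NN (hypothetical bookkeeping)
    private final Map<Long, Long> reportedLength = new ConcurrentHashMap<>();
    // blockId -> current length in the DN's replica map (hypothetical bookkeeping)
    private final Map<Long, Long> inMemoryLength = new ConcurrentHashMap<>();

    @Override
    public void run() {
      reportedLength.forEach((blockId, reported) -> {
        Long current = inMemoryLength.get(blockId);
        if (current == null || !current.equals(reported)) {
          // In the real feature the DN would alert the NN about this suspect block.
          System.out.println("Suspect block " + blockId
              + ": reported=" + reported + ", in-memory=" + current);
        }
      });
    }
  }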
>>> > >>
>>> > >> On the datanode side, a datanode can send a BlockReport (restoring its
>>> > >> normal frequency) if, during the configured time period, the datanode
>>> > >> was shut down or lost its connection with the namenode. Say the datanode
>>> > >> was supposed to send a BR at 2100 hrs: if during the last 6 hrs there
>>> > >> has been any failover or loss of connection between the namenode and the
>>> > >> datanode, it will trigger the BR normally; otherwise it shall skip
>>> > >> sending the BR.
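
A minimal sketch of the skip decision described above, with invented field and
method names (this logic does not exist in the current codebase):

  class FbrSkipPolicySketch {
    private final long brIntervalMs;        // configured FBR interval, e.g. 6 hours
    private volatile long lastDisruptionMs; // last failover or NN connection loss

    FbrSkipPolicySketch(long brIntervalMs) {
      this.brIntervalMs = brIntervalMs;
    }

    void onFailoverOrConnectionLoss(long nowMs) {
      lastDisruptionMs = nowMs;
    }

    // Send the scheduled FBR only if a failover or connection loss happened
    // within the last interval; otherwise skip it.
    boolean shouldSendFullBlockReport(long nowMs) {
      return (nowMs - lastDisruptionMs) < brIntervalMs;
    }
  }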
>>> > >>
>>> > >> Let us know your thoughts/challenges/improvements on this.
>>> > >>
>>> > >> -Ayush
>>> > >>
>>> > >>
>>> > >>
>>> >
