Thanks everyone! Just to conclude the thread: I have created HDFS-15162 to track this.
-Ayush

> On 09-Feb-2020, at 5:01 PM, Ayush Saxena <ayush...@gmail.com> wrote:
>
> Hi Stephen,
> We are trying this on 3.1.1.
> We aren't upgrading from 2.x; we are trying to increase the cluster size to go beyond 10K datanodes.
> In the process, we found that block reports from this many DNs are quite bothersome.
> There are plenty of reasons why block reports hurt performance, the major one being the namenode holding its lock for this many datanodes, as you mentioned.
> HDFS-14657 may improve the situation a bit (I didn't follow it), but our point is that rather than reducing the impact, we can get rid of block reports completely in most cases.
>
> Why carry the load of processing block reports unnecessarily, if it isn't doing any good?
>
> So we just wanted to know whether people are aware of any cases, which we might have missed, where eliminating regular BRs could be a problem.
>
> Let us know if you have strong reservations about the change or doubts about anything.
>
> -Ayush
>
>> On 07-Feb-2020, at 4:03 PM, Stephen O'Donnell <sodonn...@cloudera.com> wrote:
>>
>> Are you seeing this problem on the 3.x branch, and if so, did the problem exist before you upgraded to 3.x? I am wondering if the situation is better or worse since moving to 3.x.
>>
>> Also, do you believe the issue is driven by the namenode holding its lock for too long while it processes each block report, blocking other threads?
>>
>> There was an interesting proposal in https://issues.apache.org/jira/browse/HDFS-14657 to allow the NN lock to be dropped and retaken periodically while processing FBRs, but it has not progressed recently. I wonder if that would help here?
>>
>> Thanks,
>>
>> Stephen.
>>
>>> On Fri, Feb 7, 2020 at 6:58 AM Surendra Singh Lilhore <surendralilh...@apache.org> wrote:
>>> Thanks Wei-Chiu,
>>>
>>> I feel IBR is now more stable in branch 3.x. If BR was only added to guard against bugs in IBR, I feel we should fix such bugs in IBR.
>>> Adding one new piece of functionality to work around a bug in another is not good.
>>>
>>> I also think the DN should send a BR only on failure and at process start.
>>>
>>> -Surendra
>>>
>>> On Fri, Feb 7, 2020 at 10:52 AM Ayush Saxena <ayush...@gmail.com> wrote:
>>>
>>> > Hi Wei-Chiu,
>>> > Thanks for the response.
>>> > Yes, we are talking about the FBR only.
>>> > Increasing the interval limits the problem but doesn't seem to solve it. With increasing cluster size the interval needs to keep growing, and we cannot increase it indefinitely, as in some cases the FBR is needed.
>>> > One such case is Namenode failover: on failover the namenode marks all the storages as stale and corrects them only once an FBR arrives, and any over-replicated blocks won't be deleted while the storages are in the stale state.
>>> >
>>> > Regarding IBR errors: a block is marked COMPLETE after the IBR, when the client-claimed value and the IBR value match, so any discrepancy would raise an alarm right there.
>>> >
>>> > If it passes that check, the FBR would also be sending the same values from memory; it doesn't check the actual disk. It is the DirectoryScanner that checks whether the in-memory data matches what is on disk.
>>> > Another scenario where an FBR could be needed is to counter a split-brain scenario, but with QJMs that is unlikely to happen.
>>> >
>>> > In case of any connection loss during the interval, we would still send the BR, so we should be safe there.
>>> >
>>> > Anyway, if a client gets hold of an invalid block, it too will report it to the Namenode.
>>> >
>>> > Beyond these, we cannot think of a case where not sending the FBR can cause any issue.
>>> >
>>> > Let us know your thoughts on this.
>>> >
>>> > -Ayush
>>> >
>>> > On 07-Feb-2020, at 4:12 AM, Wei-Chiu Chuang <weic...@apache.org> wrote:
>>> >
>>> > >> Hey Ayush,
>>> > >>
>>> > >> Thanks a lot for your proposal.
>>> > >>
>>> > >> Do you mean the Full Block Report that is sent out every 6 hours per DataNode?
>>> > >> Someone told me they reduced the frequency of FBRs to 24 hours and it seems okay.
>>> > >>
>>> > >> One of the purposes of the FBR was to guard against bugs in the incremental block report implementation. In other words, it's a fail-safe mechanism: any bugs in IBRs get corrected after an FBR refreshes the state of the blocks at the NameNode. At least, that's my understanding of FBRs in their early days.
>>> > >>
>>> > >> On Tue, Feb 4, 2020 at 12:21 AM Ayush Saxena <ayush...@gmail.com> wrote:
>>> > >>
>>> > >> Hi All,
>>> > >> Surendra and I have lately been trying to minimise the impact of block reports on the Namenode in huge clusters. We observed that in a huge cluster, of about 10k datanodes, the periodic block reports adversely impact Namenode performance.
>>> > >> We have been thinking of restricting block reports to be triggered only during Namenode startup or in case of failover, and eliminating the periodic block report.
>>> > >> The main purpose of the block report is to get corrupt blocks recognised, so as a follow-up we can maintain a service at the datanode that runs periodically to check whether the block size in memory is the same as that reported to the namenode, and the datanode can alert the namenode about any suspect block. (We still need to plan this.)
>>> > >>
>>> > >> At the datanode side, a datanode can still send a BlockReport, or restore its normal frequency, if during the configured time period the datanode was shut down or lost connection with the namenode. Say the datanode was supposed to send a BR at 2100 hrs: if during the last 6 hrs there has been any failover or loss of connection between the namenode and datanode, it will trigger the BR normally; otherwise it shall skip sending the BR.
>>> > >>
>>> > >> Let us know thoughts/challenges/improvements on this.
>>> > >>
>>> > >> -Ayush
>>> > >>
>>> > >>
>>> > >>
>>> > >> ---------------------------------------------------------------------
>>> > >> To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
>>> > >> For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
>>>
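For reference, the interval tuning Wei-Chiu mentions (6 hours to 24 hours) is controlled by `dfs.blockreport.intervalMsec` in hdfs-site.xml; a sketch of that change:

```xml
<!-- hdfs-site.xml: lengthen the periodic full block report interval
     from the 6-hour default (21600000 ms) to 24 hours. -->
<property>
  <name>dfs.blockreport.intervalMsec</name>
  <value>86400000</value> <!-- 24 hours in milliseconds -->
</property>
```

As the thread notes, this only limits the FBR load rather than eliminating it.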
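The DataNode-side rule Ayush proposes (send the scheduled FBR only if the last interval saw a process restart, a failover, or a loss of connection to the namenode) can be sketched as below. This is a minimal illustration of the proposal, not HDFS code; the class and field names are hypothetical.

```java
/**
 * Sketch of the proposed DataNode-side decision: skip the periodic FBR
 * unless something happened during the last interval that could have left
 * the NameNode's view of this DataNode stale.
 * All names here are hypothetical, not actual HDFS APIs.
 */
public class BlockReportPolicy {

    /** Events observed since the last full block report. */
    public static final class IntervalState {
        final boolean datanodeRestarted;  // process start: NN has no state for us yet
        final boolean namenodeFailedOver; // the new active NN marked storages stale
        final boolean connectionLost;     // heartbeats to the NN were interrupted

        public IntervalState(boolean restarted, boolean failedOver, boolean connLost) {
            this.datanodeRestarted = restarted;
            this.namenodeFailedOver = failedOver;
            this.connectionLost = connLost;
        }
    }

    /**
     * Returns true if the FBR scheduled at the end of this interval should
     * actually be sent; otherwise the DataNode skips it and relies on IBRs
     * (plus the DirectoryScanner for memory-vs-disk discrepancies).
     */
    public static boolean shouldSendFullBlockReport(IntervalState s) {
        return s.datanodeRestarted || s.namenodeFailedOver || s.connectionLost;
    }

    public static void main(String[] args) {
        // Quiet interval: nothing happened, so the periodic FBR is skipped.
        System.out.println(shouldSendFullBlockReport(new IntervalState(false, false, false))); // false
        // A failover during the interval forces the report.
        System.out.println(shouldSendFullBlockReport(new IntervalState(false, true, false)));  // true
    }
}
```

The open question raised in the thread still applies: this covers staleness the DataNode can observe locally, but not cases where the NameNode's state diverges for reasons invisible to the DataNode.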