[jira] [Created] (HDFS-15161) When evictableMmapped or evictable size is zero, do not throw NoSuchElementException in ShortCircuitCache#close()

2020-02-11 Thread Lisheng Sun (Jira)
Lisheng Sun created HDFS-15161:
--

 Summary: When evictableMmapped or evictable size is zero, do not 
throw NoSuchElementException in ShortCircuitCache#close() 
 Key: HDFS-15161
 URL: https://issues.apache.org/jira/browse/HDFS-15161
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Lisheng Sun
Assignee: Lisheng Sun


For details, see HDFS-14541.

{code:java}
/**
 * Close the cache and free all associated resources.
 */
@Override
public void close() {
  try {
    lock.lock();
    if (closed) return;
    closed = true;
    LOG.info(this + ": closing");
    maxNonMmappedEvictableLifespanMs = 0;
    maxEvictableMmapedSize = 0;
    // Close and join cacheCleaner thread.
    IOUtilsClient.cleanupWithLogger(LOG, cacheCleaner);
    // Purge all replicas.
    while (true) {
      Object eldestKey;
      try {
        eldestKey = evictable.firstKey();
      } catch (NoSuchElementException e) {
        break;
      }
      purge((ShortCircuitReplica) evictable.get(eldestKey));
    }
    while (true) {
      Object eldestKey;
      try {
        eldestKey = evictableMmapped.firstKey();
      } catch (NoSuchElementException e) {
        break;
      }
      purge((ShortCircuitReplica) evictableMmapped.get(eldestKey));
    }
  } finally {
    lock.unlock();
  }
}
{code}
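
The summary suggests avoiding the exception-driven control flow above. A minimal sketch of the two purge loops rewritten to check for emptiness first is shown below; it is only an illustration of the idea, and it assumes, as the original loops already imply, that purge() removes the replica from the corresponding evictable map so that each loop terminates.
{code:java}
// Sketch only: purge all replicas without relying on NoSuchElementException.
while (!evictable.isEmpty()) {
  Object eldestKey = evictable.firstKey();
  purge((ShortCircuitReplica) evictable.get(eldestKey));
}
while (!evictableMmapped.isEmpty()) {
  Object eldestKey = evictableMmapped.firstKey();
  purge((ShortCircuitReplica) evictableMmapped.get(eldestKey));
}
{code}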
 




Apache Hadoop qbt Report: branch2.10+JDK7 on Linux/x86

2020-02-11 Thread Apache Jenkins Server
For more details, see 
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/593/

No changes




-1 overall


The following subsystems voted -1:
asflicense findbugs hadolint pathlen unit xml


The following subsystems voted -1 but
were configured to be filtered/ignored:
cc checkstyle javac javadoc pylint shellcheck shelldocs whitespace


The following subsystems are considered long running:
(runtime bigger than 1h  0m  0s)
unit


Specific tests:

XML :

   Parsing Error(s): 
   
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/conf/empty-configuration.xml
 
   hadoop-tools/hadoop-azure/src/config/checkstyle-suppressions.xml 
   hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/public/crossdomain.xml 
   
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/public/crossdomain.xml
 

FindBugs :

   
module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase/hadoop-yarn-server-timelineservice-hbase-client
 
   Boxed value is unboxed and then immediately reboxed in 
org.apache.hadoop.yarn.server.timelineservice.storage.common.ColumnRWHelper.readResultsWithTimestamps(Result,
 byte[], byte[], KeyConverter, ValueConverter, boolean) At 
ColumnRWHelper.java:then immediately reboxed in 
org.apache.hadoop.yarn.server.timelineservice.storage.common.ColumnRWHelper.readResultsWithTimestamps(Result,
 byte[], byte[], KeyConverter, ValueConverter, boolean) At 
ColumnRWHelper.java:[line 335] 

Failed junit tests :

   hadoop.hdfs.qjournal.server.TestJournalNodeRespectsBindHostKeys 
   hadoop.fs.viewfs.TestViewFileSystemHdfs 
   hadoop.hdfs.server.namenode.ha.TestDelegationTokensWithHA 
   hadoop.hdfs.TestRollingUpgrade 
   hadoop.contrib.bkjournal.TestBookKeeperHACheckpoints 
   hadoop.contrib.bkjournal.TestBookKeeperHACheckpoints 
   hadoop.registry.secure.TestSecureLogins 
   hadoop.yarn.server.nodemanager.amrmproxy.TestFederationInterceptor 
   hadoop.yarn.server.timelineservice.security.TestTimelineAuthFilterForV2 
  

   cc:

   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/593/artifact/out/diff-compile-cc-root-jdk1.7.0_95.txt
  [4.0K]

   javac:

   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/593/artifact/out/diff-compile-javac-root-jdk1.7.0_95.txt
  [328K]

   cc:

   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/593/artifact/out/diff-compile-cc-root-jdk1.8.0_242.txt
  [4.0K]

   javac:

   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/593/artifact/out/diff-compile-javac-root-jdk1.8.0_242.txt
  [308K]

   checkstyle:

   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/593/artifact/out/diff-checkstyle-root.txt
  [16M]

   hadolint:

   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/593/artifact/out/diff-patch-hadolint.txt
  [4.0K]

   pathlen:

   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/593/artifact/out/pathlen.txt
  [12K]

   pylint:

   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/593/artifact/out/diff-patch-pylint.txt
  [24K]

   shellcheck:

   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/593/artifact/out/diff-patch-shellcheck.txt
  [56K]

   shelldocs:

   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/593/artifact/out/diff-patch-shelldocs.txt
  [8.0K]

   whitespace:

   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/593/artifact/out/whitespace-eol.txt
  [12M]
   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/593/artifact/out/whitespace-tabs.txt
  [1.3M]

   xml:

   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/593/artifact/out/xml.txt
  [12K]

   findbugs:

   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/593/artifact/out/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-timelineservice-hbase_hadoop-yarn-server-timelineservice-hbase-client-warnings.html
  [8.0K]

   javadoc:

   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/593/artifact/out/diff-javadoc-javadoc-root-jdk1.7.0_95.txt
  [16K]
   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/593/artifact/out/diff-javadoc-javadoc-root-jdk1.8.0_242.txt
  [1.1M]

   unit:

   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/593/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
  [236K]
   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/593/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs_src_contrib_bkjournal.txt
  [12K]
   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/593/artifact/out/patch-unit-hadoop-yarn-project_hadoop-

Apache Hadoop qbt Report: trunk+JDK8 on Linux/x86

2020-02-11 Thread Apache Jenkins Server
For more details, see 
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1407/

[Feb 10, 2020 4:13:11 AM] (iwasakims) HADOOP-16739. Fix native build failure of 
hadoop-pipes on CentOS 8.




-1 overall


The following subsystems voted -1:
asflicense findbugs pathlen unit xml


The following subsystems voted -1 but
were configured to be filtered/ignored:
cc checkstyle javac javadoc pylint shellcheck shelldocs whitespace


The following subsystems are considered long running:
(runtime bigger than 1h  0m  0s)
unit


Specific tests:

XML :

   Parsing Error(s): 
   
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-excerpt.xml
 
   
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-missing-tags.xml
 
   
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-missing-tags2.xml
 
   
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-sample-output.xml
 
   
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/fair-scheduler-invalid.xml
 
   
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/yarn-site-with-invalid-allocation-file-ref.xml
 

FindBugs :

   
module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-mawo/hadoop-yarn-applications-mawo-core
 
   Class org.apache.hadoop.applications.mawo.server.common.TaskStatus 
implements Cloneable but does not define or use clone method At 
TaskStatus.java:does not define or use clone method At TaskStatus.java:[lines 
39-346] 
   Equals method for 
org.apache.hadoop.applications.mawo.server.worker.WorkerId assumes the argument 
is of type WorkerId At WorkerId.java:the argument is of type WorkerId At 
WorkerId.java:[line 114] 
   
org.apache.hadoop.applications.mawo.server.worker.WorkerId.equals(Object) does 
not check for null argument At WorkerId.java:null argument At 
WorkerId.java:[lines 114-115] 

FindBugs :

   module:hadoop-cloud-storage-project/hadoop-cos 
   Redundant nullcheck of dir, which is known to be non-null in 
org.apache.hadoop.fs.cosn.BufferPool.createDir(String) Redundant null check at 
BufferPool.java:is known to be non-null in 
org.apache.hadoop.fs.cosn.BufferPool.createDir(String) Redundant null check at 
BufferPool.java:[line 66] 
   org.apache.hadoop.fs.cosn.CosNInputStream$ReadBuffer.getBuffer() may 
expose internal representation by returning CosNInputStream$ReadBuffer.buffer 
At CosNInputStream.java:by returning CosNInputStream$ReadBuffer.buffer At 
CosNInputStream.java:[line 87] 
   Found reliance on default encoding in 
org.apache.hadoop.fs.cosn.CosNativeFileSystemStore.storeFile(String, File, 
byte[]):in org.apache.hadoop.fs.cosn.CosNativeFileSystemStore.storeFile(String, 
File, byte[]): new String(byte[]) At CosNativeFileSystemStore.java:[line 199] 
   Found reliance on default encoding in 
org.apache.hadoop.fs.cosn.CosNativeFileSystemStore.storeFileWithRetry(String, 
InputStream, byte[], long):in 
org.apache.hadoop.fs.cosn.CosNativeFileSystemStore.storeFileWithRetry(String, 
InputStream, byte[], long): new String(byte[]) At 
CosNativeFileSystemStore.java:[line 178] 
   org.apache.hadoop.fs.cosn.CosNativeFileSystemStore.uploadPart(File, 
String, String, int) may fail to clean up java.io.InputStream Obligation to 
clean up resource created at CosNativeFileSystemStore.java:fail to clean up 
java.io.InputStream Obligation to clean up resource created at 
CosNativeFileSystemStore.java:[line 252] is not discharged 

Failed junit tests :

   hadoop.hdfs.server.namenode.ha.TestDelegationTokensWithHA 
   hadoop.yarn.applications.distributedshell.TestDistributedShell 
  

   cc:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1407/artifact/out/diff-compile-cc-root.txt
  [8.0K]

   javac:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1407/artifact/out/diff-compile-javac-root.txt
  [428K]

   checkstyle:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1407/artifact/out/diff-checkstyle-root.txt
  [16M]

   pathlen:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1407/artifact/out/pathlen.txt
  [12K]

   pylint:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1407/artifact/out/diff-patch-pylint.txt
  [24K]

   shellcheck:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1407/artifact/out/diff-patch-shellcheck.txt
  [16K]

   shelldocs:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1407/artifact/out/diff-patch-shelldocs.txt
  [44K]

   whitespace:

   
https://build

[jira] [Created] (HDFS-15162) Optimize frequency of regular block reports

2020-02-11 Thread Ayush Saxena (Jira)
Ayush Saxena created HDFS-15162:
---

 Summary: Optimize frequency of regular block reports
 Key: HDFS-15162
 URL: https://issues.apache.org/jira/browse/HDFS-15162
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Ayush Saxena
Assignee: Ayush Saxena


Avoid sending block reports at the regular interval if there has been no failover, 
DiskError, or exception encountered while connecting to the Namenode.
This JIRA intends to limit regular block reports to the above scenarios and to 
datanode re-registration, to eliminate the overhead of processing BlockReports at 
the Namenode in huge clusters.
*Eg.* If a block report was sent at 0000 hours and the next one is scheduled for 
0600 hours, and none of the above scenarios has occurred, the datanode will skip 
sending the BR and schedule the next one for 1200 hours. If any such event happens 
between 0600 and 1200 hours, it will send the BR normally.

*NOTE*: This would be optional and turned off by default; a configuration would be 
added to enable it.
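
To make the intent concrete, a rough sketch of the datanode-side decision is shown 
below. This is illustrative only: the class, fields, and config semantics here are 
hypothetical and are not part of the existing DataNode/BPServiceActor code.
{code:java}
/**
 * Illustrative sketch of the proposed behaviour; names are hypothetical,
 * not the actual DataNode/BPServiceActor implementation.
 */
class BlockReportPolicy {
  private final boolean skipRegularReportsEnabled;   // proposed new config, off by default
  private volatile boolean failoverObserved;         // NN failover seen since last FBR
  private volatile boolean diskErrorObserved;        // volume failure since last FBR
  private volatile boolean connectionErrorObserved;  // exception while talking to the NN
  private volatile boolean reRegistrationRequested;  // NN asked the DN to re-register

  BlockReportPolicy(boolean skipRegularReportsEnabled) {
    this.skipRegularReportsEnabled = skipRegularReportsEnabled;
  }

  /** Decide, at each scheduled block report time, whether to actually send the FBR. */
  boolean shouldSendFullBlockReport() {
    if (!skipRegularReportsEnabled) {
      return true; // current behaviour: always send at the regular interval
    }
    // With the optimization on, send only if something happened that could have
    // made the Namenode's view of this datanode stale; otherwise skip this cycle.
    return failoverObserved || diskErrorObserved
        || connectionErrorObserved || reRegistrationRequested;
  }

  /** Clear the triggers once a full block report has been sent. */
  void onFullBlockReportSent() {
    failoverObserved = false;
    diskErrorObserved = false;
    connectionErrorObserved = false;
    reRegistrationRequested = false;
  }
}
{code}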





Re: Restrict Frequency of BlockReport To Namenode startup and failover

2020-02-11 Thread Ayush Saxena
Thanx Everyone!!!
Just to conclude the thread.
Have created HDFS-15162 to track this.

-Ayush

> On 09-Feb-2020, at 5:01 PM, Ayush Saxena  wrote:
> 
> Hi Stephen,
> We are trying this on 3.1.1.
> We aren’t upgrading from 2.x; we are trying to increase the cluster size to 
> go beyond 10K datanodes.
> In the process, we found that block reports from this many DNs are quite 
> bothersome.
> There are plenty of reasons why block reports hurt performance, the major 
> one being the namenode holding its lock while processing reports from this many 
> datanodes, as you mentioned.
> HDFS-14657 may improve the situation a bit (I didn’t follow it closely), but our 
> point is that rather than reducing the impact, we can get rid of them completely 
> in most cases.
> 
> Why take on the load of processing Block Reports unnecessarily, if they aren’t 
> doing anything useful?
> 
> So, we just wanted to know whether people are aware of any cases, which we might 
> have missed, where eliminating regular BRs could be a problem.
> 
> Let me know if you have any reservations about the change or doubt anything.
> 
> -Ayush
> 
>>> On 07-Feb-2020, at 4:03 PM, Stephen O'Donnell  
>>> wrote:
>>> 
>> 
>> Are you seeing this problem on the 3.x branch, and if so, did the problem 
>> exist before you upgraded to 3.x? I am wondering if the situation is better 
>> or worse since moving to 3.x.
>> 
>> Also, do you believe the issue is driven by the namenode holding its lock 
>> for too long while it processes each block report, blocking other threads?
>> 
>> There was an interesting proposal in 
>> https://issues.apache.org/jira/browse/HDFS-14657 to allow the NN lock to be 
>> dropped and retaken periodically while processing FBRs, but it has not 
>> progressed recently. I wonder if that would help here?
>> 
>> Thanks,
>> 
>> Stephen.
>> 
>>> On Fri, Feb 7, 2020 at 6:58 AM Surendra Singh Lilhore 
>>>  wrote:
>>> Thanks Wei-Chiu,
>>> 
>>> I feel IBR is now more stable in branch 3.x. If the BR exists just to guard
>>> against bugs in IBR, I feel we should fix those bugs in IBR instead. Adding one
>>> new feature to cover bugs in another is not good.
>>> 
>>> I also think the DN should send a BR only in failure and process-start scenarios.
>>> 
>>> -Surendra
>>> 
>>> On Fri, Feb 7, 2020 at 10:52 AM Ayush Saxena  wrote:
>>> 
>>> > Hi Wei-Chiu,
>>> > Thanx for the response.
>>> > Yes, we are talking about the FBR only.
>>> > Increasing the interval limits the problem, but doesn’t seem to
>>> > solve it. With increasing cluster size, the interval needs to be
>>> > increased further, and we cannot increase it indefinitely, as in some cases
>>> > an FBR is needed.
>>> > One such case is Namenode failover: on failover the namenode marks
>>> > all the storages as stale and corrects them only once an FBR arrives; any
>>> > over-replicated blocks won’t be deleted while the storages are in the stale
>>> > state.
>>> >
>>> > Regarding IBR errors, the block is marked Completed after the IBR, when the
>>> > client-claimed value matches the IBR value, so any discrepancy here would be
>>> > flagged at that point.
>>> >
>>> > If it gets past this point, the FBR would also just send the same
>>> > values from memory; it doesn’t check the actual disk.
>>> > The DirectoryScanner is what checks whether the in-memory data matches that
>>> > on the disk.
>>> > Another scenario where an FBR could be needed is to counter a split-brain
>>> > scenario, but with QJMs that is unlikely to happen.
>>> >
>>> > In case of any connection loss during the interval, we would still send the
>>> > BR, so we should be safe there.
>>> >
>>> > In any case, if a client gets hold of an invalid block, it too will report it
>>> > to the Namenode.
>>> >
>>> > Other than these, we cannot think of cases where not sending the FBR could
>>> > cause any issue.
>>> >
>>> > Let us know your thoughts on this.
>>> >
>>> > -Ayush
>>> >
>>> > >>> On 07-Feb-2020, at 4:12 AM, Wei-Chiu Chuang 
>>> > wrote:
>>> > >> Hey Ayush,
>>> > >>
>>> > >> Thanks a lot for your proposal.
>>> > >>
>>> > >> Do you mean the Full Block Report that is sent out every 6 hours per
>>> > >> DataNode?
>>> > >> Someone told me they reduced the frequency of FBR to 24 hours and it
>>> > seems
>>> > >> okay.
>>> > >>
>>> > >> One of the purposes of FBR was to prevent bugs in incremental block
>>> > report
>>> > >> implementation. In other words, it's a fail-safe mechanism. Any bugs in
>>> > >> IBRs get corrected after a FBR that refreshes the state of blocks at
>>> > >> NameNode. At least, that's my understanding of FBRs in its early days.
>>> > >>
>>> > >> On Tue, Feb 4, 2020 at 12:21 AM Ayush Saxena 
>>> > wrote:
>>> > >>
>>> > >> Hi All,
>>> > >> Surendra and I have lately been trying to minimise the impact of Block
>>> > >> Reports on the Namenode in huge clusters. We observed that in a huge
>>> > >> cluster of about 10k datanodes, the periodic block reports adversely
>>> > >> impact the Namenode performance.
>>> > >> We have been thinking of restricting the block reports to be triggered
>>> > >> only during