[jira] [Created] (HDFS-15420) approx scheduled blocks not resetting over time

2020-06-18 Thread Max Mizikar (Jira)
Max Mizikar created HDFS-15420:
--

 Summary: approx scheduled blocks not resetting over time
 Key: HDFS-15420
 URL: https://issues.apache.org/jira/browse/HDFS-15420
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: block placement
Affects Versions: 3.0.0, 2.6.0
 Environment: Our 2.6.0 environment is a 3-node cluster running cdh5.15.0.
Our 3.0.0 environment is a 4-node cluster running cdh6.3.0.
Reporter: Max Mizikar
 Attachments: Screenshot from 2020-06-18 09-29-57.png, Screenshot from 
2020-06-18 09-31-15.png

We have been experiencing large numbers of scheduled blocks that never get 
cleared out. This prevents blocks from being placed even when there is plenty 
of space on the system.
Here is an example of the block growth over 24 hours on one of our systems 
running 2.6.0:
 !Screenshot from 2020-06-18 09-29-57.png! 
Here is an example of the block growth over 24 hours on one of our systems 
running 3.0.0:
 !Screenshot from 2020-06-18 09-31-15.png! 
https://issues.apache.org/jira/browse/HDFS-1172 appears to have been the main 
issue we were hitting on 2.6.0, and the growth has decreased since upgrading 
to 3.0.0. However, there is still systemic growth in scheduled blocks over 
time, and we still have to restart the namenode on occasion to reset the 
count. I have not determined what is causing the leaked blocks in 3.0.0.

Looking into the issue, I discovered that the intention is for scheduled blocks 
to slowly go back down to 0 after errors cause blocks to be leaked.
{code}
  /** Increment the number of blocks scheduled. */
  void incrementBlocksScheduled(StorageType t) {
    currApproxBlocksScheduled.add(t, 1);
  }

  /** Decrement the number of blocks scheduled. */
  void decrementBlocksScheduled(StorageType t) {
    if (prevApproxBlocksScheduled.get(t) > 0) {
      prevApproxBlocksScheduled.subtract(t, 1);
    } else if (currApproxBlocksScheduled.get(t) > 0) {
      currApproxBlocksScheduled.subtract(t, 1);
    }
    // its ok if both counters are zero.
  }

  /** Adjusts curr and prev number of blocks scheduled every few minutes. */
  private void rollBlocksScheduled(long now) {
    if (now - lastBlocksScheduledRollTime > BLOCKS_SCHEDULED_ROLL_INTERVAL) {
      prevApproxBlocksScheduled.set(currApproxBlocksScheduled);
      currApproxBlocksScheduled.reset();
      lastBlocksScheduledRollTime = now;
    }
  }
{code}

However, this code does not do what is intended when the system has a constant 
flow of written blocks. Once a leaked block makes it into 
prevApproxBlocksScheduled, each newly scheduled block increments 
currApproxBlocksScheduled, and when that block completes it decrements 
prevApproxBlocksScheduled instead, so the leaked block is never removed from 
the approximate count. For the error to be corrected, no data can be written 
for an entire roll period of 10 minutes. We write a large number of blocks per 
10 minutes, so the error in the approximate counts can grow very large.
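
To make the failure mode concrete, here is a minimal, self-contained sketch of 
the same increment/decrement/roll rules (my own toy model, not the actual 
DatanodeDescriptor code, which tracks these counters per StorageType): a single 
leaked block survives every roll as long as at least one block is scheduled 
and completed within each roll interval.
{code}
/** Toy model of the curr/prev approx-blocks-scheduled counters. */
public class ScheduledBlocksLeakDemo {
  static int curr = 0; // stands in for currApproxBlocksScheduled
  static int prev = 0; // stands in for prevApproxBlocksScheduled

  static void increment() { curr++; }

  static void decrement() {
    if (prev > 0) {
      prev--;            // completed blocks drain prev first
    } else if (curr > 0) {
      curr--;
    }
  }

  static void roll() {   // rollBlocksScheduled, without the time check
    prev = curr;
    curr = 0;
  }

  public static void main(String[] args) {
    increment();         // block scheduled but never reported back: leaked
    for (int interval = 1; interval <= 5; interval++) {
      roll();            // the leaked count moves into prev
      increment();       // steady traffic: one block scheduled...
      decrement();       // ...and completed, but it drains prev, not curr,
                         // so the leak is now back in curr
      System.out.println("interval " + interval
          + ": curr=" + curr + ", prev=" + prev);
    }
    // Prints curr=1, prev=0 after every interval: the leak never decays.
    // Only a full interval with zero traffic lets the next roll clear it.
  }
}
{code}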

The comments in the ticket for the original implementation, 
https://issues.apache.org/jira/browse/HADOOP-3707, suggest this issue was 
known, but it's not clear to me whether its severity was understood at the 
time:
> So if there are some blocks that are not reported back by the datanode, they 
> will eventually get adjusted (usually 10 min; bit longer if datanode is 
> continuously receiving blocks).
The comment suggests the count will eventually get cleared out, but in our 
case it never does.






Apache Hadoop qbt Report: branch2.10+JDK7 on Linux/x86

2020-06-18 Thread Apache Jenkins Server
For more details, see 
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/721/

No changes




-1 overall


The following subsystems voted -1:
asflicense findbugs hadolint jshint pathlen unit xml


The following subsystems voted -1 but
were configured to be filtered/ignored:
cc checkstyle javac javadoc pylint shellcheck shelldocs whitespace


The following subsystems are considered long running:
(runtime bigger than 1h  0m  0s)
unit


Specific tests:

XML :

   Parsing Error(s):

      hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/conf/empty-configuration.xml
      hadoop-tools/hadoop-azure/src/config/checkstyle-suppressions.xml
      hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/public/crossdomain.xml
      hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/public/crossdomain.xml

findbugs :

   module:hadoop-common-project/hadoop-minikdc

      Possible null pointer dereference in org.apache.hadoop.minikdc.MiniKdc.delete(File) due to return value of called method. Dereferenced at MiniKdc.java:[line 515]

findbugs :

   module:hadoop-common-project/hadoop-auth

      org.apache.hadoop.security.authentication.server.MultiSchemeAuthenticationHandler.authenticate(HttpServletRequest, HttpServletResponse) makes inefficient use of keySet iterator instead of entrySet iterator. At MultiSchemeAuthenticationHandler.java:[line 192]

findbugs :

   module:hadoop-common-project/hadoop-common

      org.apache.hadoop.crypto.CipherSuite.setUnknownValue(int) unconditionally sets the field unknownValue. At CipherSuite.java:[line 44]
      org.apache.hadoop.crypto.CryptoProtocolVersion.setUnknownValue(int) unconditionally sets the field unknownValue. At CryptoProtocolVersion.java:[line 67]
      Possible null pointer dereference in org.apache.hadoop.fs.FileUtil.fullyDeleteOnExit(File) due to return value of called method. Dereferenced at FileUtil.java:[line 118]
      Possible null pointer dereference in org.apache.hadoop.fs.RawLocalFileSystem.handleEmptyDstDirectoryOnWindows(Path, File, Path, File) due to return value of called method. Dereferenced at RawLocalFileSystem.java:[line 383]
      Useless condition: lazyPersist == true at this point. At CommandWithDestination.java:[line 502]
      org.apache.hadoop.io.DoubleWritable.compareTo(DoubleWritable) incorrectly handles double value. At DoubleWritable.java:[line 78]
      org.apache.hadoop.io.DoubleWritable$Comparator.compare(byte[], int, int, byte[], int, int) incorrectly handles double value. At DoubleWritable.java:[line 97]
      org.apache.hadoop.io.FloatWritable.compareTo(FloatWritable) incorrectly handles float value. At FloatWritable.java:[line 71]
      org.apache.hadoop.io.FloatWritable$Comparator.compare(byte[], int, int, byte[], int, int) incorrectly handles float value. At FloatWritable.java:[line 89]
      Possible null pointer dereference in org.apache.hadoop.io.IOUtils.listDirectory(File, FilenameFilter) due to return value of called method. Dereferenced at IOUtils.java:[line 389]
      Possible bad parsing of shift operation in org.apache.hadoop.io.file.tfile.Utils$Version.hashCode(). At Utils.java:[line 398]
      org.apache.hadoop.metrics2.lib.DefaultMetricsFactory.setInstance(MutableMetricsFactory) unconditionally sets the field mmfImpl. At DefaultMetricsFactory.java:[line 49]
      org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.setMiniClusterMode(boolean) unconditionally sets the field miniClusterMode. At DefaultMetricsSystem.java:[line 92]
      Useless object stored in variable seqOs of method org.apache.hadoop.security.token.delegation.ZKDelegationTokenSecretManager.addOrUpdateToken(AbstractDelegationTokenIdentifier, AbstractDelegationTokenSecretManager$DelegationTokenInformation, boolean). At ZKDelegationTokenSecretManager.java

Re: [DISCUSS] Hadoop 3.3.0 Release include ARM binary

2020-06-18 Thread Adam Antal
YARN-10314 is also merged. I don't see any blockers at this point (actually, I
couldn't see any JIRAs targeted for 3.3.0).

In the community sync yesterday we wanted to discuss the 3.3.0 release, but
nobody had information about it in the call. Could you share the latest on
the upcoming 3.3.0 release?

Thanks,
Adam

On Mon, Jun 15, 2020 at 9:17 AM Ayush Saxena  wrote:

> YARN-10314 also seems to be a blocker.
>
> https://issues.apache.org/jira/browse/YARN-10314
>
> We should wait for that as well, should get concluded in a day or two.
>
> -Ayush
>
> > On 15-Jun-2020, at 7:21 AM, Sheng Liu  wrote:
> >
> > The HADOOP-17046 has been merged :)
> >
> > Brahma Reddy Battula wrote on Thu, Jun 4, 2020 at 10:43 PM:
> >
> >> The following blocker is pending for the 3.3.0 release and is ready for
> >> review. Most likely we'll have an RC soon.
> >> https://issues.apache.org/jira/browse/HADOOP-17046
> >>
> >> The protobuf dependency issue was unexpected.
> >>
> >>> On Mon, Jun 1, 2020 at 7:11 AM Sheng Liu wrote:
> >>>
> >>> Hi folks,
> >>>
> >>> It looks like the 3.3.0 branch was created quite a while ago. I'm not
> >>> sure whether any blocker issues remain that need to be addressed before
> >>> the Hadoop 3.3.0 release is published; maybe we can bring them up here
> >>> and move the release forward?
> >>>
> >>> Thanks.
> >>>
> >>> Brahma Reddy Battula wrote on Wed, Mar 25, 2020 at 1:55 AM:
> >>>
>  Thanks to all.
> 
>  Will make this optional and will update the wiki accordingly.
> 
>  On Wed, Mar 18, 2020 at 12:05 AM Vinayakumar B <vinayakum...@apache.org> wrote:
> 
> > Making the ARM artifact optional makes the release process simpler for the
> > RM and unblocks the release process (if ARM resources are unavailable).
> >
> > There are still options to collaborate with the RM (as Brahma mentioned
> > earlier) and provide the ARM artifact, maybe before or after the vote.
> > If feasible, the RM can decide to add the ARM artifact by collaborating
> > with @Brahma Reddy Battula or me.
> >
> > -Vinay
> >
> > On Tue, Mar 17, 2020 at 11:39 PM Arpit Agarwal wrote:
> >
> >> Thanks for the clarification, Brahma. Can you update the proposal to
> >> state that it is optional (it may help to put the proposal on cwiki)?
> >>
> >> Also, if we go ahead, the RM documentation should make clear that this
> >> is an optional step.
> >>
> >>
> >>> On Mar 17, 2020, at 11:06 AM, Brahma Reddy Battula <bra...@apache.org> wrote:
> >>>
> >>> Sure, we can't make it mandatory for voting, and we can upload it to
> >>> downloads once the release vote has passed.
> >>>
> >>> On Tue, 17 Mar 2020 at 11:24 PM, Arpit Agarwal wrote:
> >>>
> > Sorry, didn't get you... do you mean, once release voting is
> > processed and uploaded by the RM?
> 
>  Yes, that is what I meant. I don’t want us to make more mandatory work
>  for the release manager because the job is hard enough already.
> 
> 
> > On Mar 17, 2020, at 10:46 AM, Brahma Reddy Battula <bra...@apache.org> wrote:
> >
> > Sorry, didn't get you... do you mean, once release voting is processed
> > and uploaded by the RM?
> >
> > FYI, there is a docker image for ARM also which supports all the
> > scripts (createrelease, start-build-env.sh, etc.).
> >
> > https://issues.apache.org/jira/browse/HADOOP-16797
> >
> > On Tue, Mar 17, 2020 at 10:59 PM Arpit Agarwal wrote:
> >
> >> Can ARM binaries be provided after the fact? We cannot increase the
> >> RM’s burden by asking them to generate an extra set of binaries.
> >>
> >>
> >>> On Mar 17, 2020, at 10:23 AM, Brahma Reddy Battula <bra...@apache.org> wrote:
> >>>
> >>> + Dev mailing list.
> >>>
> >>> -- Forwarded message -
> >>> From: Brahma Reddy Battula 
> >>> Date: Tue, Mar 17, 2020 at 10:31 PM
> >>> Subject: Re: [DISCUSS] Hadoop 3.3.0 Release include ARM binary
> >>> To: junping_du 
> >>>
> >>>
> >>> Thanks, Junping, for your reply.
> >>>
> >>> bq. I think most of us in the Hadoop community don't want to be
> >>> biased toward ARM or any other platforms.
> >>>
> >>> Yes, release voting will be based on the source code. AFAIK, binary

Apache Hadoop qbt Report: trunk+JDK8 on Linux/x86_64

2020-06-18 Thread Apache Jenkins Server
For more details, see 
https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/177/

[Jun 17, 2020 3:56:41 AM] (noreply) YARN-10314. YarnClient throws 
NoClassDefFoundError for WebSocketException with only shaded client jars (#2075)
[Jun 17, 2020 8:25:40 AM] (Ayush Saxena) HADOOP-9851. dfs -chown does not like 
"+" plus sign in user name. Contributed by Andras Bokor.
[Jun 17, 2020 12:34:40 PM] (Szilard Nemeth) YARN-10281. Redundant QueuePath 
usage in UserGroupMappingPlacementRule and AppNameMappingPlacementRule. 
Contributed by Gergely Pollak
[Jun 17, 2020 3:15:26 PM] (noreply) HADOOP-17020. Improve RawFileSystem 
Performance (#2063)
[Jun 17, 2020 4:04:26 PM] (Eric Yang) YARN-10308. Update javadoc and variable 
names for YARN service.




Apache Hadoop qbt Report: branch2.10+JDK7 on Linux/x86

2020-06-18 Thread Apache Jenkins Server
For more details, see 
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/722/

No changes




-1 overall


The following subsystems voted -1:
docker


Powered by Apache Yetus https://yetus.apache.org
