[jira] [Created] (HDFS-15474) HttpFS: WebHdfsFileSystem cannot renew an expired delegation token from HttpFS response

2020-07-17 Thread Takanobu Asanuma (Jira)
Takanobu Asanuma created HDFS-15474:
---

 Summary: HttpFS: WebHdfsFileSystem cannot renew an expired 
delegation token from HttpFS response
 Key: HDFS-15474
 URL: https://issues.apache.org/jira/browse/HDFS-15474
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Takanobu Asanuma
Assignee: Takanobu Asanuma


When clients use WebHdfsFileSystem against HttpFS, they cannot renew expired 
delegation tokens; the renewal fails with the following error.
{noformat}
org.apache.hadoop.ipc.RemoteException: 
org.apache.hadoop.security.token.SecretManager$InvalidToken: token (owner=..., 
renewer=..., realUser=..., issueDate=..., maxDate=..., sequenceNumber=..., 
masterKeyId=...) is expired
at 
org.apache.hadoop.hdfs.web.JsonUtilClient.toRemoteException(JsonUtilClient.java:89)
at 
org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:509)
...
{noformat}
When using WebHdfsFileSystem against the NameNode, the renewal succeeds. This is 
because the response from HttpFS differs from that of the NameNode. We should fix 
the HttpFS response.
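
For context, a minimal reproduction sketch (the host, port and renewer below are placeholders, not taken from this report) that exercises token renewal through WebHdfsFileSystem pointed at an HttpFS endpoint:
{code:java}
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.security.token.Token;

public class RenewTokenViaHttpFS {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // HttpFS serves the WebHDFS REST API, typically on port 14000.
        FileSystem fs = FileSystem.get(URI.create("webhdfs://httpfs-host:14000"), conf);
        // "renewer-user" is a placeholder renewer principal.
        Token<?> token = fs.getDelegationToken("renewer-user");
        // ... wait until the token has expired ...
        // Against HttpFS the renewal surfaces the InvalidToken RemoteException shown above,
        // while the report states the same renewal succeeds against the NameNode's WebHDFS endpoint.
        token.renew(conf);
    }
}
{code}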

This issue is reported by Masayuki Yatagawa.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



Re: [RESULT][VOTE] Release Apache Hadoop-3.3.0

2020-07-17 Thread Ayush Saxena
Hi Brahma,
It seems the link to the changelog for Release 3.3.0 isn't correct at:
https://hadoop.apache.org/

It points to:
http://hadoop.apache.org/docs/r3.3.0/hadoop-project-dist/hadoop-common/release/3.3.0/CHANGES.3.3.0.html

CHANGES.3.3.0.html isn't there; it should instead point to:

http://hadoop.apache.org/docs/r3.3.0/hadoop-project-dist/hadoop-common/release/3.3.0/CHANGELOG.3.3.0.html

Please give it a check!

-Ayush




On Wed, 15 Jul 2020 at 19:18, Brahma Reddy Battula 
wrote:

> Hi Stephen,
>
> thanks for bringing this to my attention.
>
> Looks like it's too late. I pushed the release tag (which can't be reverted) and
> updated the release date in the Jira.
>
> Can we plan the next release for the near future?
>
>
> On Wed, Jul 15, 2020 at 5:25 PM Stephen O'Donnell
>  wrote:
>
> > Hi All,
> >
> > Sorry for being a bit late to this, but I wonder if we have a potential
> > blocker to this release.
> >
> > At Cloudera we have recently encountered a serious data loss issue in HDFS
> > surrounding snapshots. To hit the data loss issue, you must have
> HDFS-13101
> > and HDFS-15012 on the build (which branch-3.3.0 does). To prevent it, you
> > must also have HDFS-15313; unfortunately, that was only committed to
> > trunk, so we need to cherry-pick it down to the active branches.
> >
> > With data loss being a serious issue, should we pull this Jira into
> > branch-3.3.0 and cut a new release candidate?
> >
> > Thanks,
> >
> > Stephen.
> >
> > On Tue, Jul 14, 2020 at 1:22 PM Brahma Reddy Battula 
> > wrote:
> >
> > > Hi All,
> > >
> > > With 8 binding and 11 non-binding +1s and no -1s the vote for Apache
> > > hadoop-3.3.0 Release
> > > passes.
> > >
> > > Thank you everybody for contributing to the release, testing, and
> voting.
> > >
> > > Special thanks to whoever verified the ARM binary, as this is the first
> > > release to support ARM in Hadoop.
> > >
> > >
> > > Binding +1s
> > >
> > > =
> > > Akira Ajisaka
> > > Vinayakumar B
> > > Inigo Goiri
> > > Surendra Singh Lilhore
> > > Masatake Iwasaki
> > > Rakesh Radhakrishnan
> > > Eric Badger
> > > Brahma Reddy Battula
> > >
> > > Non-binding +1s
> > >
> > > =
> > > Zhenyu Zheng
> > > Sheng Liu
> > > Yikun Jiang
> > > Tianhua huang
> > > Ayush Saxena
> > > Hemanth Boyina
> > > Bilwa S T
> > > Takanobu Asanuma
> > > Xiaoqiao He
> > > CR Hota
> > > Gergely Pollak
> > >
> > > I'm going to work on staging the release.
> > >
> > >
> > > The voting thread is:
> > >
> > >  https://s.apache.org/hadoop-3.3.0-Release-vote-thread
> > >
> > >
> > >
> > > --Brahma Reddy Battula
> > >
> >
>
>
> --
>
>
>
> --Brahma Reddy Battula
>


Apache Hadoop qbt Report: branch2.10+JDK7 on Linux/x86

2020-07-17 Thread Apache Jenkins Server
For more details, see 
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/750/

[Jul 16, 2020 4:17:17 PM] (hexiaoqiao) HDFS-14498. LeaseManager can loop 
forever on the file for which create




-1 overall


The following subsystems voted -1:
asflicense findbugs hadolint jshint pathlen unit xml


The following subsystems voted -1 but
were configured to be filtered/ignored:
cc checkstyle javac javadoc pylint shellcheck shelldocs whitespace


The following subsystems are considered long running:
(runtime bigger than 1h  0m  0s)
unit


Specific tests:

XML :

   Parsing Error(s): 
   
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/conf/empty-configuration.xml
 
   hadoop-tools/hadoop-azure/src/config/checkstyle-suppressions.xml 
   hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/public/crossdomain.xml 
   
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/public/crossdomain.xml
 

findbugs :

   module:hadoop-yarn-project/hadoop-yarn 
   Useless object stored in variable removedNullContainers of method 
org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.removeOrTrackCompletedContainersFromContext(List)
 At NodeStatusUpdaterImpl.java:removedNullContainers of method 
org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.removeOrTrackCompletedContainersFromContext(List)
 At NodeStatusUpdaterImpl.java:[line 664] 
   
org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.removeVeryOldStoppedContainersFromCache()
 makes inefficient use of keySet iterator instead of entrySet iterator At 
NodeStatusUpdaterImpl.java:keySet iterator instead of entrySet iterator At 
NodeStatusUpdaterImpl.java:[line 741] 
   
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.createStatus()
 makes inefficient use of keySet iterator instead of entrySet iterator At 
ContainerLocalizer.java:keySet iterator instead of entrySet iterator At 
ContainerLocalizer.java:[line 359] 
   
org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainerMetrics.usageMetrics
 is a mutable collection which should be package protected At 
ContainerMetrics.java:which should be package protected At 
ContainerMetrics.java:[line 134] 
   Boxed value is unboxed and then immediately reboxed in 
org.apache.hadoop.yarn.server.timelineservice.storage.common.ColumnRWHelper.readResultsWithTimestamps(Result,
 byte[], byte[], KeyConverter, ValueConverter, boolean) At 
ColumnRWHelper.java:then immediately reboxed in 
org.apache.hadoop.yarn.server.timelineservice.storage.common.ColumnRWHelper.readResultsWithTimestamps(Result,
 byte[], byte[], KeyConverter, ValueConverter, boolean) At 
ColumnRWHelper.java:[line 335] 
   
org.apache.hadoop.yarn.state.StateMachineFactory.generateStateGraph(String) 
makes inefficient use of keySet iterator instead of entrySet iterator At 
StateMachineFactory.java:keySet iterator instead of entrySet iterator At 
StateMachineFactory.java:[line 505] 
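
For reference, the recurring "keySet iterator instead of entrySet iterator" warning refers to the pattern sketched below (an illustrative snippet, not code from the Hadoop source tree): calling get() for every key iterated from keySet() pays for a second map lookup that iterating entrySet() avoids.

import java.util.HashMap;
import java.util.Map;

public class KeySetVsEntrySet {
    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("a", 1);
        counts.put("b", 2);

        // Pattern findbugs flags: each iteration does an extra lookup via get().
        for (String key : counts.keySet()) {
            System.out.println(key + "=" + counts.get(key));
        }

        // Preferred form: entrySet() yields key and value together, no second lookup.
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            System.out.println(entry.getKey() + "=" + entry.getValue());
        }
    }
}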

findbugs :

   module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
   
org.apache.hadoop.yarn.state.StateMachineFactory.generateStateGraph(String) 
makes inefficient use of keySet iterator instead of entrySet iterator At 
StateMachineFactory.java:keySet iterator instead of entrySet iterator At 
StateMachineFactory.java:[line 505] 

findbugs :

   module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server 
   Useless object stored in variable removedNullContainers of method 
org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.removeOrTrackCompletedContainersFromContext(List)
 At NodeStatusUpdaterImpl.java:removedNullContainers of method 
org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.removeOrTrackCompletedContainersFromContext(List)
 At NodeStatusUpdaterImpl.java:[line 664] 
   
org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.removeVeryOldStoppedContainersFromCache()
 makes inefficient use of keySet iterator instead of entrySet iterator At 
NodeStatusUpdaterImpl.java:keySet iterator instead of entrySet iterator At 
NodeStatusUpdaterImpl.java:[line 741] 
   
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.createStatus()
 makes inefficient use of keySet iterator instead of entrySet iterator At 
ContainerLocalizer.java:keySet iterator instead of entrySet iterator At 
ContainerLocalizer.java:[line 359] 
   
org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainerMetrics.usageMetrics
 is a mutable collection which should be package protected At 
ContainerMetrics.java:which should be package protected At 
ContainerMetrics.java:[line 134] 
   Boxed value is unboxed and then immediately reboxed in 
org.apache.hadoop.yarn.server.timelineservice.storage.common.ColumnRWHelper.readResultsWithTimestamps(Result,
 byte[], byte[], KeyConverter, ValueConverter, boolean)

[Reposting] How to set md5 checksum

2020-07-17 Thread nikita Balakrishnan
Hey team,

I’m developing a system where we are trying to sink to an immutable S3
bucket as part of a Flink job. This bucket has server-side encryption set
to KMS. The DataStream sink works perfectly fine when I don’t use the
immutable bucket, but when I use an immutable bucket, I get exceptions
about multipart upload failures. It says we need to enable md5 hashing
for the put object to work.

According to the AWS S3 documentation for immutable buckets (with object locks),
it’s mandatory to have a Content-MD5 header -
https://docs.aws.amazon.com/AmazonS3/latest/API/API_PutObject.html
“The Content-MD5 header is required for any request to upload an object
with a retention period configured using Amazon S3 Object Lock. For more
information about Amazon S3 Object Lock, see Amazon S3 Object Lock Overview
 in
the *Amazon Simple Storage Service Developer Guide*."

My question to the Flink team was - "How do I set this HTTP header while
sinking? I checked most of the documentation and tried going through the
source code too but couldn’t really find a provision where we could set the
headers for a request that goes in as a sink." And they got back asking me
to set fs.s3a.etag.checksum.enabled: true. But that didn’t work either. And
then they redirected me to the Hadoop team. Can you please help me out here?

The one thing I’m unclear about is how the system knows that we’re using md5
hashing when we enable the checksum option. Is there some way to specify
that? It feels like I’m missing something.

Here’s the stack trace:

org.apache.flink.streaming.runtime.tasks.AsynchronousException: Caught
exception while processing timer.
at
org.apache.flink.streaming.runtime.tasks.StreamTask$StreamTaskAsyncExceptionHandler.handleAsyncException(StreamTask.java:1090)
at
org.apache.flink.streaming.runtime.tasks.StreamTask.handleAsyncException(StreamTask.java:1058)
at
org.apache.flink.streaming.runtime.tasks.StreamTask.invokeProcessingTimeCallback(StreamTask.java:1520)
at
org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$null$10(StreamTask.java:1509)
at
org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$SynchronizedStreamTaskActionExecutor.run(StreamTaskActionExecutor.java:87)
at org.apache.flink.streaming.runtime.tasks.mailbox.Mail.run(Mail.java:78)
at
org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMail(MailboxProcessor.java:261)
at
org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:186)
at
org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:487)
at
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:470)
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:707)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:532)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: org.apache.flink.streaming.runtime.tasks.TimerException:
java.io.IOException: Uploading parts failed
... 11 common frames omitted
Caused by: java.io.IOException: Uploading parts failed
at
org.apache.flink.fs.s3.common.writer.RecoverableMultiPartUploadImpl.awaitPendingPartUploadToComplete(RecoverableMultiPartUploadImpl.java:231)
at
org.apache.flink.fs.s3.common.writer.RecoverableMultiPartUploadImpl.awaitPendingPartsUpload(RecoverableMultiPartUploadImpl.java:215)
at
org.apache.flink.fs.s3.common.writer.RecoverableMultiPartUploadImpl.snapshotAndGetRecoverable(RecoverableMultiPartUploadImpl.java:151)
at
org.apache.flink.fs.s3.common.writer.RecoverableMultiPartUploadImpl.snapshotAndGetCommitter(RecoverableMultiPartUploadImpl.java:123)
at
org.apache.flink.fs.s3.common.writer.RecoverableMultiPartUploadImpl.snapshotAndGetCommitter(RecoverableMultiPartUploadImpl.java:56)
at
org.apache.flink.fs.s3.common.writer.S3RecoverableFsDataOutputStream.closeForCommit(S3RecoverableFsDataOutputStream.java:167)
at
org.apache.flink.streaming.api.functions.sink.filesystem.PartFileWriter.closeForCommit(PartFileWriter.java:71)
at
org.apache.flink.streaming.api.functions.sink.filesystem.Bucket.closePartFile(Bucket.java:239)
at
org.apache.flink.streaming.api.functions.sink.filesystem.Bucket.onProcessingTime(Bucket.java:338)
at
org.apache.flink.streaming.api.functions.sink.filesystem.Buckets.onProcessingTime(Buckets.java:304)
at
org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink.onProcessingTime(StreamingFileSink.java:439)
at
org.apache.flink.streaming.runtime.tasks.StreamTask.invokeProcessingTimeCallback(StreamTask.java:1518)
... 10 common frames omitted
Caused by: org.apache.hadoop.fs.s3a.AWSBadRequestException: upload part on
raw_events/xxx/xxx/2020/07/15/20/archived-2-0.txt:
com.amazonaws.services.s3.model.AmazonS3Exception: Content-MD5 HTTP header
is required for Put Part requests with Object Lock parameters (Service:
Amazon S3; Status Code: 400; Error Code: InvalidRequest; Request ID: xxx;
S3 Extended Request ID: ), 

[jira] [Created] (HDFS-15475) -D mapreduce.framework.name CLI parameter for miniCluster not working

2020-07-17 Thread Xiang Zhang (Jira)
Xiang Zhang created HDFS-15475:
--

 Summary: -D mapreduce.framework.name CLI parameter for miniCluster 
not working
 Key: HDFS-15475
 URL: https://issues.apache.org/jira/browse/HDFS-15475
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.7.0
Reporter: Xiang Zhang


I am running the miniCluster following the doc here: 
https://hadoop.apache.org/docs/r2.7.0/hadoop-project-dist/hadoop-common/CLIMiniCluster.html

I noticed that by default MapReduce jobs do not run on YARN, and I understand 
that setting mapreduce.framework.name to yarn makes them run there.
{code:xml}
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
{code}
So I added this to etc/hadoop/mapred-site.xml and managed to run a 
wordcount example with this:
{code:java}
bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.0.jar 
wordcount hdfs://localhost:8020/user/iamabug/input 
hdfs://localhost:8020/user/iamabug/output
{code}
However, according to the doc, this parameter should also be settable 
through the -D option, i.e.,
{code:java}
bin/hadoop jar 
./share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.7.0-tests.jar 
minicluster  -format -D mapreduce.framework.name=yarn -writeConfig 2.txt
{code}
Note that I wrote the config to 2.txt, and the parameter can be found in that file:
{code:xml}
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
  <source>programatically</source>
</property>
{code}
I submitted a wordcount example again and it didn't run on YARN, according to 
the logs and the YARN Web UI (http://localhost:8088).
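
A possibly relevant detail (my understanding, not verified against the 2.7.0 sources): the MapReduce client chooses between YARN and the local runner from the mapreduce.framework.name value in the Configuration of the process that submits the job, so a -D option passed only to the minicluster command may never reach jobs submitted separately via bin/hadoop jar. A minimal sketch of setting it on the job's own configuration (class name and paths are placeholders):
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SubmitOnYarn {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Equivalent to the mapred-site.xml property above, but set in the submitting client.
        conf.set("mapreduce.framework.name", "yarn");

        Job job = Job.getInstance(conf, "identity-job-on-yarn");
        job.setJarByClass(SubmitOnYarn.class);
        // Default (identity) mapper and reducer are used; args are input/output paths.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
{code}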

 


