Re: [VOTE] Release Apache Hadoop 3.1.1 - RC0

2018-08-07 Thread Craig Condit
+1 (non-binding).

Built from tag, spun up single-node cluster, ran basic DFS commands, various MR 
/ YARN jobs with and without Docker.

Craig Condit

On 2018/08/02 18:43:50, Wangda Tan <w...@gmail.com> wrote:
> Hi folks,
>
> I've created RC0 for Apache Hadoop 3.1.1. The artifacts are available here:
>
> http://people.apache.org/~wangda/hadoop-3.1.1-RC0/
>
> The RC tag in git is release-3.1.1-RC0:
> https://github.com/apache/hadoop/commits/release-3.1.1-RC0
>
> The maven artifacts are available via repository.apache.org at
> https://repository.apache.org/content/repositories/orgapachehadoop-1139/
>
> You can find my public key at
> http://svn.apache.org/repos/asf/hadoop/common/dist/KEYS
>
> This vote will run 5 days from now.
>
> 3.1.1 contains 435 [1] fixed JIRA issues since 3.1.0.
>
> I have done testing with a pseudo cluster and a distributed shell job. My +1
> to start.
>
> Best,
> Wangda Tan
>
> [1] project in (YARN, HADOOP, MAPREDUCE, HDFS) AND fixVersion in (3.1.1)
> ORDER BY priority DESC
>


Re: [VOTE] Release Apache Hadoop 3.2.0 - RC1

2019-01-10 Thread Craig Condit
+1 (non-binding)

- built from source on CentOS 7.5
- deployed single-node cluster
- ran several YARN jobs
- ran multiple Docker jobs, including Spark on Docker

On 1/8/19, 5:42 AM, "Sunil G" wrote:

Hi folks,

Thanks to all of you who helped with this release [1] and for helping to vote
on RC0. I have created a second release candidate (RC1) for Apache Hadoop
3.2.0.

Artifacts for this RC are available here:

http://home.apache.org/~sunilg/hadoop-3.2.0-RC1/

RC tag in git is release-3.2.0-RC1.

The maven artifacts are available via repository.apache.org at
https://repository.apache.org/content/repositories/orgapachehadoop-1178/

This vote will run for 7 days (5 weekdays), ending on 14th Jan at 11:59 pm PST.

3.2.0 contains 1092 [2] fixed JIRA issues since 3.1.0. The feature additions
below are the highlights of this release.

1. Node Attributes Support in YARN
2. Hadoop Submarine project for running Deep Learning workloads on YARN
3. Support service upgrade via YARN Service API and CLI
4. HDFS Storage Policy Satisfier
5. Support Windows Azure Storage - Blob file system in Hadoop
6. Phase 3 improvements for S3Guard and Phase 5 improvements for S3A
7. Improvements in Router-based HDFS federation

Thanks to Wangda, Vinod, and Marton for helping me in preparing the release.

I have done some testing with my pseudo cluster. My +1 to start.

Regards,

Sunil

[1]
https://lists.apache.org/thread.html/68c1745dcb65602aecce6f7e6b7f0af3d974b1bf0048e7823e58b06f@%3Cyarn-dev.hadoop.apache.org%3E

[2] project in (YARN, HADOOP, MAPREDUCE, HDFS) AND fixVersion in (3.2.0)
AND fixVersion not in (3.1.0, 3.0.0, 3.0.0-beta1) AND status = Resolved
ORDER BY fixVersion ASC




Re: [EXTERNAL] Re: [DISCUSS] Secure Hadoop without Kerberos

2020-05-20 Thread Craig Condit
I have to strongly disagree with making UGI.doAs() private. Just because you
feel that impersonation isn't an important feature does not make it so for all
users. There are many valid use cases which require impersonation, and in fact
I consider it one of the differentiating features of the Hadoop ecosystem. We
use it heavily to build a variety of services which would not be possible
without it. Also consider that in addition to gateway services such as Knox
being broken by this change, you would also cripple job schedulers such as
Oozie. Running workloads on YARN as different users is vital to ensure that
queue resources are allocated and accounted for properly and that file
permissions are enforced. Without impersonation, all users of a cluster would
need to be granted access to talk directly to YARN; higher-level access points
or APIs would not be possible.
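
To make the use case concrete, here is a minimal sketch of the proxy-user
pattern a gateway service relies on. The user name "alice" and the path are
made up for illustration, and the hadoop.proxyuser.* settings in core-site.xml
must already permit the impersonation:

import java.security.PrivilegedExceptionAction;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

public class ProxyUserExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // The service authenticates as itself (e.g. from its own keytab or
        // ticket cache).
        UserGroupInformation serviceUgi = UserGroupInformation.getLoginUser();

        // Impersonate the end user; core-site.xml must allow it via
        // hadoop.proxyuser.<service>.hosts / hadoop.proxyuser.<service>.groups.
        UserGroupInformation proxyUgi =
            UserGroupInformation.createProxyUser("alice", serviceUgi);

        // Everything inside doAs() runs as "alice" for authorization purposes:
        // HDFS permissions, YARN queue placement, and resource accounting all
        // see the end user, not the service principal.
        proxyUgi.doAs((PrivilegedExceptionAction<Void>) () -> {
            FileSystem fs = FileSystem.get(conf);
            fs.mkdirs(new Path("/user/alice/staging"));
            return null;
        });
    }
}

Every call inside doAs() is authorized, queued, and accounted against the end
user rather than the service principal, which is exactly what breaks if doAs()
becomes private.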

Craig Condit


From: Eric Yang 
Sent: Wednesday, May 20, 2020 1:57 PM
To: Akira Ajisaka 
Cc: Hadoop Common 
Subject: [EXTERNAL] Re: [DISCUSS] Secure Hadoop without Kerberos

Hi Akira,

Thank you for the information. Knox plays a main role as a reverse proxy for
the Hadoop cluster. I understand the importance of keeping Knox running to
centralize audit logs for ingress into the cluster. Other reverse proxy
solutions like Nginx are more feature-rich for caching static content and
load balancing. It would be great to have the ability to use either Knox or
Nginx as the reverse proxy solution. A company-wide OIDC provider is likely
to run independently from the Hadoop cluster, but it is also possible to run
it in a Hadoop cluster. The reverse proxy must be able to redirect to OIDC
where the exposed endpoint is appropriate.

HADOOP-11717 was a good effort to enable SSO integration, except it is
written to extend Kerberos authentication, which prevents decoupling from
Kerberos from becoming a reality. I gathered a few design requirements this
morning, and contributions are welcome:

1. Encryption is mandatory. Server certificate validation is required.
2. Existing token infrastructure for block access tokens remains the same.
3. Replace delegation token transport with OIDC JWT tokens.
4. Patch token renewer logic to support renewing tokens with the OIDC
endpoint before they expire (see the sketch after this list).
5. Impersonation logic uses service user credentials. A new way to renew
service user credentials securely is needed.
6. Replace Hadoop RPC SASL transport with TLS, because OIDC works with TLS
natively.
7. CLI improvements to use environment variables or files for accessing
client credentials.
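
As a rough illustration of item 4, here is a minimal sketch of deciding when
to go back to the OIDC endpoint by reading the JWT's standard "exp" claim.
This is not an existing Hadoop API; it assumes Jackson for JSON parsing,
assumes the token carries an "exp" claim, and skips signature validation (a
real renewer would also verify the token against the provider's keys):

import java.nio.charset.StandardCharsets;
import java.time.Instant;
import java.util.Base64;

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class JwtExpiryCheck {
    private static final ObjectMapper MAPPER = new ObjectMapper();

    // Returns true when the token's "exp" claim falls within the renewal
    // window, i.e. the caller should request a fresh token from the OIDC
    // endpoint before continuing.
    static boolean needsRenewal(String jwt, long renewWindowSeconds) throws Exception {
        // A JWT is header.payload.signature; the payload is base64url-encoded JSON.
        String[] parts = jwt.split("\\.");
        byte[] payloadBytes = Base64.getUrlDecoder().decode(parts[1]);
        JsonNode claims = MAPPER.readTree(new String(payloadBytes, StandardCharsets.UTF_8));
        long exp = claims.get("exp").asLong();
        return Instant.now().getEpochSecond() >= exp - renewWindowSeconds;
    }
}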

Downgrade the use of UGI.doAs() to private within Hadoop. Services should not
run with elevated privileges unless there is a good reason for it (e.g.
loading Hive external tables).

I think this is a good starting point, and feedback can help turn these
requirements into tasks. Let me know what you think. Thanks

regards,
Eric

On Tue, May 19, 2020 at 9:47 PM Akira Ajisaka  wrote:

> Hi Eric, thank you for starting the discussion.
>
> I'm interested in OpenID Connect (OIDC) integration.
>
> In addition to the benefits (security, cloud native), operating costs may
> be reduced in some companies.
> We have our company-wide OIDC provider and enable SSO for Hadoop Web UIs
> via Knox + OIDC in Yahoo! JAPAN.
> On the other hand, Hadoop administrators have to manage our own KDC
> servers just for the Hadoop ecosystem.
> If Hadoop and its ecosystem can support OIDC, we won't have to manage KDC
> servers, and operating costs will be reduced.
>
> Regards,
> Akira
>
> On Thu, May 7, 2020 at 7:32 AM Eric Yang  wrote:
>
>> Hi all,
>>
>> Kerberos was developed a decade before web development became popular.
>> There are some Kerberos limitations which do not work well in Hadoop. A
>> few examples of corner cases:
>>
>> 1. A Kerberos principal doesn't encode a port number, so it is difficult to
>> know whether the principal is coming from an authorized daemon or a hacker
>> container trying to forge a service principal.
>> 2. Hadoop Kerberos principals are used as high-privileged principals, a
>> form of credential to impersonate end users.
>> 3. Delegation tokens may allow expired users to continue to run jobs long
>> after they are gone, without rechecking whether end user credentials are
>> still valid.
>> 4. Passing different forms of tokens does not work well with cloud provider
>> security mechanisms. For example, passing an AWS STS token for an S3 bucket:
>> there is no renewal mechanism, nor a good way to identify when the token
>> would expire.
>>
>> There are companies that work on bridging security mechanisms of different
>> types, but this is not a primary goal for Hadoop. Hadoop can benefit from
>> modernized security using open standards like OpenID Connect, which
>> proposes to unify web applic

[jira] [Created] (HADOOP-16240) start-build-env.sh can consume all disk space during image creation

2019-04-08 Thread Craig Condit (JIRA)
Craig Condit created HADOOP-16240:
-

 Summary: start-build-env.sh can consume all disk space during 
image creation
 Key: HADOOP-16240
 URL: https://issues.apache.org/jira/browse/HADOOP-16240
 Project: Hadoop Common
  Issue Type: Bug
  Components: build
Affects Versions: 3.3.0, 3.2.1
 Environment: macOS Mojave 10.14.4
Reporter: Craig Condit


The start-build-env.sh script creates a Docker image and creates a user within
it which maps to the user ID from the host. When the host UID is very large
(> 1 billion or so, not uncommon in large AD deployments), the resulting image
fails to build because /var/log/lastlog and /var/log/faillog grow to consume
all available disk space.

These files are not necessary for the build process, and if they do not exist,
they will not be grown.
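
For illustration, a rough back-of-the-envelope of why a large UID blows up the
image: lastlog is indexed by UID with one fixed-size record per UID, so its
apparent size grows linearly with the UID, and files that would normally stay
sparse can end up fully materialized in Docker image layers. The per-entry
record size below is an assumption and varies by platform; only the linear
growth matters here.

public class LastlogSizeEstimate {
    public static void main(String[] args) {
        // Example of a large AD-mapped UID on the host.
        long uid = 1_500_000_000L;
        // Assumed lastlog entry size; platform-dependent, used only to show scale.
        long recordSizeBytes = 292;
        // lastlog must extend to (uid + 1) * recordSize bytes to hold the entry.
        double sizeGiB = (uid + 1) * recordSizeBytes / (1024.0 * 1024 * 1024);
        System.out.printf("lastlog would grow to roughly %.0f GiB%n", sizeGiB);
    }
}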

 


