Hi Colin,

Thanks for the hints around JIRAs.

You are correct: errno still exists. However, sys_errlist does not.

Hadoop uses a function, terror (defined in exception.c), which indexes sys_errlist by errno to return the error message from the array. This function is called 26 times in various places (in 2.2).

Originally, I thought to replace all calls to terror with strerror, but strerror can cause problems in multi-threaded code (it returns a buffer which can be overwritten), so it seemed simpler just to recreate the sys_errlist message array.
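
As a rough illustration of the "recreate the message array" approach described above, here is a minimal sketch in C. The names sketch_errlist, lookup_message and sketch_terror are hypothetical, the table holds only a few entries, and the message strings are illustrative rather than copied from Solaris or from exception.c.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, partial message table indexed by errno value.
 * The strings are illustrative only; unlisted slots are NULL. */
static const char *const sketch_errlist[] = {
    [EPERM]  = "Operation not permitted",
    [ENOENT] = "No such file or directory",
    [EINTR]  = "Interrupted system call",
    [EIO]    = "I/O error",
};

/* Bounds-checked lookup into a message table. */
static const char *lookup_message(const char *const *table, size_t count,
                                  int errnum)
{
    assert(table != NULL);
    if (errnum < 0 || (size_t)errnum >= count || table[errnum] == NULL) {
        return "Unknown error";
    }
    return table[errnum];
}

/* Stand-in for a terror-style function: one argument in, constant
 * string out, so existing call sites would not need to change. */
const char *sketch_terror(int errnum)
{
    size_t count = sizeof(sketch_errlist) / sizeof(sketch_errlist[0]);
    return lookup_message(sketch_errlist, count, errnum);
}
```

A call such as sketch_terror(ENOENT) returns the table entry, while negative, out-of-range or unlisted values fall back to "Unknown error". Because the table is constant, the lookup can be called from several threads at once.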

There is also a thread-safe version, strerror_r, where you pass the buffer as a parameter, but this would necessitate replacing every call to terror with multiple lines of code.
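
To show what the strerror_r route involves, here is a hedged sketch of a small wrapper. The name describe_errno is hypothetical and is not a function in the Hadoop tree. The wrapper assumes the POSIX variant of strerror_r (which returns an int) except on glibc with _GNU_SOURCE defined, where the GNU variant returns the message pointer instead.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* request the POSIX strerror_r declaration */
#endif

#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes the message for errnum into the
 * caller-supplied buffer and returns a pointer to the message. */
const char *describe_errno(int errnum, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
#if defined(__GLIBC__) && defined(_GNU_SOURCE)
    /* GNU variant: returns the message, which may or may not live in buf. */
    return strerror_r(errnum, buf, len);
#else
    /* POSIX variant: returns 0 on success and fills buf. */
    if (strerror_r(errnum, buf, len) != 0) {
        snprintf(buf, len, "Unknown error %d", errnum);
    }
    return buf;
#endif
}
```

Each caller has to declare its own buffer, for example char msg[128], and pass it in as describe_errno(errno, msg, sizeof(msg)). That extra declaration at every call site is the "multiple lines of code" cost mentioned above.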

Sorry, I wasn't clear.

Also, I have been requested to ensure my port is available on 2.4, which is perceived as a more stable release. If I make changes to this branch, are they automatically available for 2.6, or will I need multiple JIRAs?

Thanks,
Malcolm

On 12/10/2014 10:45 AM, Colin McCabe wrote:
Hi Malcolm,

In general we file JIRAs for particular issues.  So if one issue is
handling errlist on Solaris, that might be one JIRA.  Another issue
might be handling socket write timeouts on Solaris.  And so on.  Most
of these should probably be HADOOP tickets since they sound like they
are mostly in the generic hadoop-common code.

"solaris does not have errno" seems like a bold statement.  errno is
part of POSIX, and Solaris is a POSIX OS, right?  Am I way off base on
this?  I googled around and one of the first results I found talked
about errno values on Solaris:
http://www.pixelstech.net/article/1413273556-A-trick-of-building-multithreaded-application-on-Solaris
Perhaps I misunderstood what you meant by this statement.

Anyway, please file JIRAs for any portability improvements you can think of!

best,
Colin

On Mon, Dec 8, 2014 at 9:09 PM, malcolm <malcolm.kaval...@oracle.com> wrote:
Hi Colin,

A short summary of my changes is as follows:

- Native C source files: added 5, modified 6, which also required changes
to CMakeLists.txt. Of course, all changes are "ifdeffed" for Solaris
appropriately, and new files are prefixed with solaris_ as well.

For example, Solaris does not have errno or errlist any more, both of which
are used quite a lot in hadoop native code. I could have replaced all calls
with strerror, which would be compatible with Linux; however, in the
interests of making minimal changes, I recreated these files from a running
Solaris machine and added them instead.

Another issue is that Solaris doesn't have the timeout option for sockets,
so I had to write my own solaris_read routine with a timeout and add it to
DomainSocket.c. A few issues with lz4 on Sparc needed modification, as did
some other OS-specific issues: getgrouplist, container-executor (from yarn).

- Some very minor changes were made to some Java source files (mainly to
tests, to get them to pass on Solaris).

The above changes were made to 2.2. I will recheck everything against the
latest trunk; maybe some fixes aren't needed any more.

I have generated a single patch file with all changes. Perhaps it would be
better to file multiple JIRAs, perhaps grouped, one per issue? Or should I
file a JIRA for each modified source file?

Thank you,
Malcolm


On 12/08/2014 09:53 PM, Colin McCabe wrote:
Hi Malcolm,

It's great that you are going to contribute!  Please make your patches
against trunk.

2.2 is fairly old at this point.  It hasn't been the focus of
development in more than a year.

We don't use github or pull requests.

Check the section on http://wiki.apache.org/hadoop/HowToContribute
that talks about "Contributing your work".  Excerpt:
"Finally, patches should be attached to an issue report in Jira via
the Attach File link on the issue's Jira. Please add a comment that
asks for a code review following our code review checklist. Please
note that the attachment should be granted license to ASF for
inclusion in ASF works (as per the Apache License §5)."

As this says, you attach the patch file to a JIRA that you have
created, and then hit "submit patch."

I don't think a branch is required for this work since it is just
build fixes, right?

best,
Colin


On Mon, Dec 8, 2014 at 3:30 AM, malcolm <malcolm.kaval...@oracle.com>
wrote:
I have ported Hadoop native libraries to Solaris 11 (both Sparc and Intel).
Oracle have agreed to release my changes to the community so that Solaris
platforms can benefit.
Reading the HowToContribute and GitandHadoop documents, I am not 100% clear
on how to get my changes into the main tree. I am also a Git(hub) newbie,
and was using svn previously.

Please let me know if I am on the correct path:

1. I forked Hadoop on Github and downloaded a clone to my development
machine.

2. The changes I made were to 2.2.0. Can I still add changes to this
branch and hopefully get them accepted, or must I migrate my changes to
2.6? (On the main Hadoop download page, 2.2 is still listed as the GA
version.)

3. I understand that I should create a new branch for my changes, and then
generate pull requests after uploading them to Github.

4. I also registered at Jira on the understanding that I need to generate
a Jira number for my changes, and to name my branch accordingly?

Does all this make sense?

Thanks,
Malcolm


