Thanks, Malcom.  I reviewed it.  The only thing you still have to do
is hit "submit patch" to get a Jenkins run.  See our HowToContribute
wiki page for more details.

wiki.apache.org/hadoop/HowToContribute

best,
Colin

On Sat, Dec 13, 2014 at 9:22 PM, malcolm <malcolm.kaval...@oracle.com> wrote:
> I am checking on the latest release of Solaris 11 and yes, it is still
> thread safe (or MT Safe as documented on the man page).
>
> strerror checks the error code, and returns the same "unknown error" string
> as terror does, if it receives an invalid code. I checked this on Windows,
> Solaris and Linux (though my changes only affect Solaris platforms).
>
> JIRA newbie question:
>
> I have filed the JIRA attaching the patch  HADOOP-11403 against the trunk,
> asking for reviewers in the comments section.
> Is there any other protocol I should follow ?
>
> Thanks,
> Malcolm
>
>
> On 12/14/2014 01:08 AM, Asokan, M wrote:
>>
>> Malcom,
>>     That's great! Is strerror() thread-safe in the recent version of
>> Solaris?  In any case, to be correct you still need to make sure that the
>> code passed to strerror() is a valid one.  For this you need to check errno
>> after the call to strerror().  Please check the code snippet I sent earlier
>> for HPUX.
>>
>> -- Asokan
>> ________________________________________
>> From: malcolm [malcolm.kaval...@oracle.com]
>> Sent: Saturday, December 13, 2014 3:13 PM
>> To: common-dev@hadoop.apache.org
>> Subject: Re: Solaris Port SOLVED!
>>
>> Wiping egg off face  ...
>>
>> After consulting with the Solaris team (and looking at the source code
>> and man page) ,  it turns out that strerror itself on Solaris is MT-Safe
>> ! (Just like HPUX)
>>
>> So, after all this effort, all I need to do is modify terror as follows:
>>
>>      const char* terror(int errnum)
>>      {
>>
>>      #if defined(__sun)
>>         return strerror(errnum); //  MT-Safe under Solaris
>>      #else
>>         if ((errnum < 0) || (errnum >= sys_nerr)) {
>>           return "unknown error.";
>>         }
>>         return sys_errlist[errnum];
>>      #endif
>>      }
>>
>> And in two other files where sys_errlist is referenced directly
>> (NativeIO and hdfs_http_client.c), I replaced this direct access instead
>> with a call to terror.
>>
>> Thanks for all your help and patience,
>>
>> I'll file a JIRA asap,
>>
>> Cheers,
>> Malcolm
>>
>> On 12/13/2014 05:26 PM, malcolm wrote:
>>>
>>> Thanks Asokan,
>>>
>>> Looked up Gcc's thread local variables, seems a bit complex though and
>>> quite specific to Gnu.
>>>
>>> Intialization of the static errlist array should be thread safe i.e.
>>> initially the array is nulled out, and afterwards if two threads write
>>> to the same address, then they would be writing the same string.
>>>
>>> But if we are ok with changing 5 files, not just terror, then I would
>>> just remove terror completely and use strerror_r (or the alternatives
>>> for Windows and HP_UX) in the caller code instead i.e. using your
>>> suggestion for a local buffer in the caller, wherever needed. The more
>>> I think about it, the more this seems to be the right thing to do.
>>>
>>> Cheers,
>>> Malcolm
>>>
>>>
>>> On 12/13/2014 04:38 PM, Asokan, M wrote:
>>>>
>>>> Malcom,
>>>>      Gcc supports thread-local variables. See
>>>>
>>>> https://gcc.gnu.org/onlinedocs/gcc-3.3.1/gcc/Thread-Local.html
>>>>
>>>> I am not sure about native compilers on Solaris, HPUX, or AIX.
>>>>
>>>> In any case, I found out that the Windows native code in Hadoop seems
>>>> to handle error messages properly. Here is what I found:
>>>>
>>>> $ find ~/work/hadoop/hadoop-trunk/ -name '*.c'|xargs grephadoop how to
>>>> file a jira
>>>>
>>>> FormatMessage|awk -F: '{print $1}'|sort -u
>>>>
>>>> /home/asokan/work/hadoop/hadoop-trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/io/nativeio/NativeIO.c
>>>>
>>>>
>>>> /home/asokan/work/hadoop/hadoop-trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/security/JniBasedUnixGroupsMappingWin.c
>>>>
>>>>
>>>> /home/asokan/work/hadoop/hadoop-trunk/hadoop-common-project/hadoop-common/src/main/winutils/libwinutils.c
>>>>
>>>>
>>>>
>>>> $ find ~/work/hadoop/hadoop-trunk/ -name '*.c'|xargs grep terror|awk
>>>> -F: '{print $1}'|sort -u
>>>>
>>>> /home/asokan/work/hadoop/hadoop-trunk/hadoop-common-project/hadoop-common/src/main/native/src/exception.c
>>>>
>>>>
>>>> /home/asokan/work/hadoop/hadoop-trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/io/nativeio/SharedFileDescriptorFactory.c
>>>>
>>>>
>>>> /home/asokan/work/hadoop/hadoop-trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/net/unix/DomainSocket.c
>>>>
>>>>
>>>> /home/asokan/work/hadoop/hadoop-trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/net/unix/DomainSocketWatcher.c
>>>>
>>>>
>>>> /home/asokan/work/hadoop/hadoop-trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/security/JniBasedUnixGroupsMapping.c
>>>>
>>>>
>>>>
>>>> This means you need not worry about the Windows version of terror().
>>>> You need to change five files that contain UNIX specific native code.
>>>>
>>>> I have a question on your suggested implementation:
>>>>
>>>> How do you initialize the static errlist array in a thread-safe manner?
>>>>
>>>> ________________________________
>>>> Here is another thread-safe implementation that I could come up with:
>>>>
>>>> #include <string.h>
>>>> #include <stdlib.h>
>>>> #include <errno.h>
>>>> #include <stdio.h>
>>>>
>>>> #define MESSAGE_BUFFER_SIZE 256
>>>>
>>>> char * getSystemErrorMessage(char * buf, int buf_len, int code) {
>>>> #if defined(_HPUX_SOURCE)
>>>>     char * msg;
>>>>     errno = 0;
>>>>     msg = strerror(code);
>>>>     if (errno == 0) {
>>>>       strncpy(buf, msg, buf_len-1);
>>>>       buf[buf_len-1] = '\0';
>>>>     } else {
>>>>       snprintf(buf, buf_len, "%s %d",
>>>>           "Can't get system error message for code", code);
>>>>     }
>>>> #else
>>>>     if (strerror_r(code, buf, buf_len) != 0) {
>>>>       snprintf(buf, buf_len, "%s %d",
>>>>           "Can't get system error message for code", code);
>>>>     }
>>>> #endif
>>>>     return buf;
>>>> }
>>>>
>>>> #define TERROR(code) \
>>>> getSystemErrorMessage(messageBuffer, sizeof(messageBuffer), code)
>>>>
>>>> int main(int argc, char ** argv) {
>>>>     if (argc > 1) {
>>>>       char messageBuffer[MESSAGE_BUFFER_SIZE];
>>>>       int code = atoi(argv[1]);
>>>>
>>>>       fprintf(stderr, "System error for code %s: %s\n", argv[1],
>>>> TERROR(code));
>>>>     }
>>>>     return 0;
>>>> }
>>>>
>>>>
>>>> This changes terror to a macro TERROR and requires all functions that
>>>> call TERROR macro to declare the local variable messageBuffer. Since
>>>> there are only five files to modify, I think it is not a big effort.
>>>> What do you think?
>>>>
>>>> -- Asokan
>>>>
>>>> On 12/13/2014 04:29 AM, malcolm wrote:
>>>> Colin,
>>>>
>>>> I am not sure what you mean by a thread-local buffer (in native
>>>> code). In Java this is pretty standard, but I couldn't find any
>>>> implementation for C code.
>>>>
>>>> Here is the terror function:
>>>>
>>>>       const char* terror(int errnum)
>>>>       {
>>>>         if ((errnum < 0) || (errnum >= sys_nerr)) {
>>>>           return "unknown error.";
>>>>         }
>>>>         return sys_errlist[errnum];
>>>>       }
>>>>
>>>>
>>>> The interface is identical to strerror, but the implementation is
>>>> actually re-entrant since it returns a pointer to a static string.
>>>>
>>>> If I understand your suggestion, the new function would look like this:
>>>>
>>>>      const char* terror(int errnum)
>>>>      {
>>>>         static char result[65];
>>>>
>>>>         strerror_r(errnum, result, 64);
>>>>
>>>>         return result;
>>>>      }
>>>>
>>>> No need for snprintf, strerror_r  has the 'n' bounding built-in.
>>>>
>>>> Of course, this is still non-re-entrant, so unless the caller copies
>>>> the returned buffer, before the function is called again, there is a
>>>> problem.
>>>>
>>>> After considerable thought, I have come up with this version of
>>>> terror, tested OK on Windows, Linux and Solaris:
>>>>
>>>>      #if defined(_WIN32)
>>>>      #define strerror_r(errno,buf,len) strerror_s(buf,len,errno)
>>>>      #endif
>>>>
>>>>      #define MAX_ERRORS 256
>>>>      #define MAX_ERROR_LEN 80
>>>>
>>>>      char *terror(int errnum)
>>>>      {
>>>>
>>>>         static char errlist[MAX_ERRORS][MAX_ERROR_LEN+1]; // cache of
>>>>      error messages
>>>>
>>>>         if ( errnum >= 0 && errnum < MAX_ERRORS )
>>>>           {
>>>>             if ( errlist[errnum][0] == 0 )
>>>>               strerror_r( errnum, errlist[errnum], MAX_ERROR_LEN);
>>>>
>>>>             return errlist[errnum];
>>>>           }
>>>>         else
>>>>           {
>>>>             return "Unknown error";
>>>>           }
>>>>      }
>>>>
>>>> This version is portable and re-entrant.
>>>>
>>>> On windows, the largest errnum is 43, on Ubuntu 14.04 we have 133,
>>>> and on Solaris 11.1 we get 151.
>>>>
>>>> If this is OK with you, I will open a jira for this.
>>>>
>>>>
>>>> Thanks,
>>>> Malcolm
>>>>
>>>>
>>>> On 12/12/2014 11:10 PM, Colin McCabe wrote:
>>>> Just use snprintf to copy the error message from strerror_r into a
>>>> thread-local buffer of 64 bytes or so.  Then preserve the existing
>>>> terror() interface.
>>>>
>>>> Can you open a jira for this?
>>>>
>>>> best,
>>>> Colin
>>>>
>>>> On Thu, Dec 11, 2014 at 8:35 PM,
>>>> malcolm<malcolm.kaval...@oracle.com><mailto:malcolm.kaval...@oracle.com>
>>>> wrote:
>>>> So, turns out that if I had naively changed all calls to terror or
>>>> references to sys_errlist, to using strerror_r, then I would have broken
>>>> code for Windows and HPUX (and possibly other OSes).
>>>>
>>>> If we are to assume that current code runs fine on all platforms
>>>> (maybe even
>>>> AIX an MacOS, for example), then any change/additions made to the
>>>> code and
>>>> not ifdeffed appropriately can break on other OSes. On the other
>>>> hand,  too
>>>> many ifdefs can pollute the code source and render it less readable
>>>> (though
>>>> possibly less important).
>>>>
>>>> In the general case what are code contributors responsibilities to
>>>> adding
>>>> code regarding OSes besides Linux ?
>>>> What OSes does jenkins test on ?
>>>> I guess maintainers of code on non-tested platforms are responsible for
>>>> their own testing ?
>>>>
>>>> How do we avoid the ping-pong effect, i.e. I make a generic change to
>>>> code
>>>> which breaks on Windows, then the Windows maintainer reverts changes to
>>>> break on Solaris for example ? Or does this not happen in actuality ?
>>>>
>>>>
>>>> On 12/11/2014 11:25 PM, Asokan, M wrote:
>>>> Hi Malcom,
>>>>       The Windows versions of strerror() and strerror_s() functions are
>>>> probably meant for ANSI C library functions that set errno.  For core
>>>> Windows API calls (like UNIX system calls), one gets the error number by
>>>> calling GetLastError() function.  In the code snippet I sent earlier,
>>>> the
>>>> "code" argument is the value returned by GetLastError(). Neither
>>>> strerror()
>>>> nor strerror_s() will give the correct error message for this error
>>>> code.
>>>>
>>>> You could probably look at libwinutils.c in Hadoop source.  It uses
>>>> FormatMessageW (which returns messages in Unicode.)  My requirement
>>>> was to
>>>> return messages in current system locale.
>>>>
>>>> -- Asokan
>>>> ________________________________________
>>>> From: malcolm
>>>> [malcolm.kaval...@oracle.com<mailto:malcolm.kaval...@oracle.com>]
>>>> Sent: Thursday, December 11, 2014 4:04 PM
>>>> To:common-dev@hadoop.apache.org<mailto:To:common-dev@hadoop.apache.org>
>>>> Subject: Re: Solaris Port
>>>>
>>>> Hi Asok,
>>>>
>>>> I googled and found that windows has strerror, and strerror_s (which is
>>>> the strerror_r equivalent).
>>>> Is there a reason why you didn't use this call ?
>>>>
>>>> On 12/11/2014 06:27 PM, Asokan, M wrote:
>>>> Hi Malcom,
>>>>         Recently, I had to work on a function to get system error
>>>> message on
>>>> various systems.  Here is the piece of code I came up with. Hope it
>>>> helps.
>>>>
>>>> static void get_system_error_message(char * buf, int buf_len, int code)
>>>> {
>>>> #if defined(_WIN32)
>>>>          LPVOID lpMsgBuf;
>>>>          DWORD status = FormatMessage(FORMAT_MESSAGE_ALLOCATE_BUFFER |
>>>>                                       FORMAT_MESSAGE_FROM_SYSTEM |
>>>> FORMAT_MESSAGE_IGNORE_INSERTS,
>>>>                                       NULL, code,
>>>>                                       MAKELANGID(LANG_NEUTRAL,
>>>> SUBLANG_DEFAULT),
>>>>                                                               /* Default
>>>> language */
>>>>                                       (LPTSTR) &lpMsgBuf, 0, NULL);
>>>>          if (status > 0)
>>>>          {
>>>>              strncpy(buf, (char *)lpMsgBuf, buf_len-1);
>>>>              buf[buf_len-1] = '\0';
>>>>              /* Free the buffer returned by system */
>>>>              LocalFree(lpMsgBuf);
>>>>          }
>>>>          else
>>>>          {
>>>>              _snprintf(buf, buf_len-1 , "%s %d",
>>>>                  "Can't get system error message for code", code);
>>>>              buf[buf_len-1] = '\0';
>>>>          }
>>>> #else
>>>> #if defined(_HPUX_SOURCE)
>>>>          {
>>>>              char * msg;
>>>>              errno = 0;
>>>>              msg = strerror(code);
>>>>              if (errno == 0)
>>>>              {
>>>>                  strncpy(buf, msg, buf_len-1);
>>>>                  buf[buf_len-1] = '\0';
>>>>              }
>>>>              else
>>>>              {
>>>>                  snprintf(buf, buf_len, "%s %d",
>>>>                      "Can't get system error message for code", code);
>>>>              }
>>>>          }
>>>> #else
>>>>          if (strerror_r(code, buf, buf_len) != 0)
>>>>          {
>>>>              snprintf(buf, buf_len, "%s %d",
>>>>                  "Can't get system error message for code", code);
>>>>          }
>>>> #endif
>>>> #endif
>>>> }
>>>>
>>>> Note that HPUX does not have strerror_r() since strerror() itself is
>>>> thread-safe.  Also Windows does not have snprintf().  The equivalent
>>>> function _snprintf() has a subtle difference in its interface.
>>>>
>>>> -- Asokan
>>>> ________________________________________
>>>> From: malcolm
>>>> [malcolm.kaval...@oracle.com<mailto:malcolm.kaval...@oracle.com>]
>>>> Sent: Thursday, December 11, 2014 11:02 AM
>>>> To:common-dev@hadoop.apache.org<mailto:To:common-dev@hadoop.apache.org>
>>>> Subject: Re: Solaris Port
>>>>
>>>> Fine with me, I volunteer to do this, if accepted.
>>>>
>>>> On 12/11/2014 05:48 PM, Allen Wittenauer wrote:
>>>> sys_errlist was removed for a reason.  Creating a fake sys_errlist on
>>>> Solaris will mean the libhadoop.so will need to be tied a specific build
>>>> (kernel/include pairing) and therefore limits upward
>>>> mobility/compatibility.
>>>> That doesn’t seem like a very good idea.
>>>>
>>>> IMO, switching to strerror_r is much preferred, since other than the
>>>> brain-dead GNU libc version, is highly portable and should work
>>>> regardless
>>>> of the kernel or OS in place.
>>>>
>>>> On Dec 11, 2014, at 5:20 AM,
>>>> malcolm<malcolm.kaval...@oracle.com><mailto:malcolm.kaval...@oracle.com>
>>>> wrote:
>>>>
>>>> FYI, there are a couple more files that reference sys_errlist directly
>>>> (not just terror within exception.c) , but also hdfs_http_client.c and
>>>> NativeiO.c
>>>>
>>>> On 12/11/2014 07:38 AM, malcolm wrote:
>>>> Hi Colin,
>>>>
>>>> Exactly, as you noticed, the problem is the thread-local buffer needed
>>>> to return from terror.
>>>> Currently, terror just returns a static string from an array, this is
>>>> fast, simple and error-proof.
>>>>
>>>> In order to use strerror_r inside terror,  would require allocating a
>>>> buffer inside terror  and depend on the caller to free the buffer after
>>>> using it, or to pass a buffer to terrror (which is basically the same as
>>>> strerror_r, rendering terror redundant).
>>>> Both cases require modification outside terror itself, as far as I can
>>>> tell, no simple fix. Unless you have an alternative which I haven't
>>>> thought
>>>> of ?
>>>>
>>>> As far as I can tell, we have two choices:
>>>>
>>>> 1. Remove terror and replace calls with strerror_r, passing a buffer
>>>> from the callee.
>>>>          Advantage: a more modern portable interface.
>>>>          Disadvantage: All calls to terror need to be modified, though
>>>> all seem to be in a few files as far as I can tell.
>>>>
>>>> 2. Adding a sys_errlist array (ifdeffed for Solaris)
>>>>          Advantage: no change to any calls to terror
>>>>          Disadvantage: 2 additional files added to source tree (.c and
>>>> .h) and some minor ifdefs only used for Solaris.
>>>>
>>>> I think it is more a question of style than anything else, so I leave
>>>> you to make the call.
>>>>
>>>> Thanks for your patience,
>>>> Malcolm
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On 12/10/2014 09:54 PM, Colin McCabe wrote:
>>>> On Wed, Dec 10, 2014 at 2:31 AM, malcolm
>>>> <malcolm.kaval...@oracle.com><mailto:malcolm.kaval...@oracle.com> wrote:
>>>> Hi Colin,
>>>>
>>>> Thanks for the hints around JIRAs.
>>>>
>>>> You are correct errno still exists, however sys_errlist does not.
>>>>
>>>> Hadoop uses a function terror (defined in exception.c) which indexes
>>>> sys_errlist by errno to return the error message from the array.
>>>> This
>>>> function is called 26 times in various places (in 2.2)
>>>>
>>>> Originally, I thought to replace all calls to terror with strerror,
>>>> but
>>>> there can be issues with multi-threading (it returns a buffer which
>>>> can be
>>>> overwritten), so it seemed simpler just to recreate the sys_errlist
>>>> message
>>>> array.
>>>>
>>>> There is also a multi-threaded version strerror_r where you pass the
>>>> buffer
>>>> as a parameter, but this would necessitate changing every call to
>>>> terror
>>>> with mutiple lines of code.
>>>> Why don't you just use strerror_r inside terror()?
>>>>
>>>> I wrote that code originally.  The reason I didn't want to use
>>>> strerror_r there is because GNU libc provides a non-POSIX definition
>>>> of strerror_r, and forcing it to use the POSIX one is a pain. But you
>>>> can do it.  You also will require a thread-local buffer to hold the
>>>> return from strerror_r, since it is not guaranteed to be static
>>>> (although in practice it is 99% of the time-- another annoyance with
>>>> the API).
>>>>
>>>>
>>>> ________________________________
>>>>
>>>>
>>>>
>>>> ATTENTION: -----
>>>>
>>>> The information contained in this message (including any files
>>>> transmitted with this message) may contain proprietary, trade secret or
>>>> other confidential and/or legally privileged information. Any pricing
>>>> information contained in this message or in any files transmitted
>>>> with this
>>>> message is always confidential and cannot be shared with any third
>>>> parties
>>>> without prior written approval from Syncsort. This message is
>>>> intended to be
>>>> read only by the individual or entity to whom it is addressed or by
>>>> their
>>>> designee. If the reader of this message is not the intended
>>>> recipient, you
>>>> are on notice that any use, disclosure, copying or distribution of this
>>>> message, in any form, is strictly prohibited. If you have received this
>>>> message in error, please immediately notify the sender and/or
>>>> Syncsort and
>>>> destroy all copies of this message in your possession, custody or
>>>> control.
>>>>
>>>>
>>>>
>>
>> ________________________________
>>
>>
>>
>> ATTENTION: -----
>>
>> The information contained in this message (including any files transmitted
>> with this message) may contain proprietary, trade secret or other
>> confidential and/or legally privileged information. Any pricing information
>> contained in this message or in any files transmitted with this message is
>> always confidential and cannot be shared with any third parties without
>> prior written approval from Syncsort. This message is intended to be read
>> only by the individual or entity to whom it is addressed or by their
>> designee. If the reader of this message is not the intended recipient, you
>> are on notice that any use, disclosure, copying or distribution of this
>> message, in any form, is strictly prohibited. If you have received this
>> message in error, please immediately notify the sender and/or Syncsort and
>> destroy all copies of this message in your possession, custody or control.
>
>

Reply via email to