Alan, I also meant to say that I didn't understand the comment about "in
production it seems that DomainSocket is less commonly used".  The current
implementation of short-circuit read definitely utilizes DomainSocket, and
it's very common to enable this in production clusters.  The documentation
page you mentioned includes discussion of a legacy short-circuit read
implementation, which did not utilize UNIX domain sockets, but the legacy
implementation is rarely used in practice now.

--Chris Nauroth




On 9/30/15, 8:46 AM, "Chris Nauroth" <cnaur...@hortonworks.com> wrote:

>Hello Alan,
>
>I think this sounds like a reasonable approach.  I recommend that you file
>a JIRA with the proposal (copy-paste the content of your email into a
>comment) and then wait a few days before starting work in earnest to see
>if anyone else wants to discuss it first.  I also recommend notifying
>Colin Patrick McCabe on that JIRA.  It would be good to get a second
>opinion from him, since he is the original author of much of this code.
>
>--Chris Nauroth
>
>
>
>
>On 9/30/15, 1:14 AM, "Alan Burlison" <alan.burli...@oracle.com> wrote:
>
>>Now that the Hadoop native code builds on Solaris I've been chipping
>>away at all the test failures. About 50% of the failures involve
>>DomainSocket, either directly or indirectly. That seems to be mainly
>>because the tests use DomainSocket to do single-node testing, whereas in
>>production it seems that DomainSocket is less commonly used
>>(https://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-hdfs/ShortCircuitLocalReads.html).
>>
>>The particular problem on Solaris is that socket read/write timeouts
>>(the SO_SNDTIMEO and SO_RCVTIMEO socket options) are not supported for
>>UNIX domain (PF_UNIX) sockets. Those options are however supported for
>>PF_INET sockets. That's because the socket implementation on Solaris is
>>split roughly into two parts, for inet sockets and for STREAMS sockets,
>>and the STREAMS implementation lacks support for SO_SNDTIMEO and
>>SO_RCVTIMEO. As an aside, performance of sockets that use loopback or
>>the host's own IP is slightly better than that of UNIX domain sockets on
>>Solaris.
>>
>>I'm investigating getting timeouts supported for PF_UNIX sockets added
>>to Solaris, but in the meantime I'm also looking at how this might be
>>worked around in Hadoop. One way would be to implement timeouts by
>>wrapping all the read/write/send/recv etc calls in DomainSocket.c with
>>either poll() or select().
>>
>>The basic idea is to add two new fields to DomainSocket.c to hold the
>>read/write timeouts. On platforms that support SO_SNDTIMEO and
>>SO_RCVTIMEO these would be unused as setsockopt() would be used to set
>>the socket timeouts. On platforms such as Solaris the JNI code would use
>>the values to implement the timeouts appropriately.
>>
>>To prevent the code in DomainSocket.c becoming a #ifdef hairball, the
>>current socket IO function calls such as accept(), send(), read() etc
>>would be replaced with macros such as HD_ACCEPT. On platforms that
>>provide timeouts these would just expand to the normal socket functions;
>>on platforms that don't support timeouts they would expand to wrappers
>>that implement timeouts for them.
>>
>>The only caveat is that all code that does anything to a PF_UNIX
>>socket would *always* have to do so via DomainSocket. As far as I can
>>tell that's not an issue, but it would have to be borne in mind if any
>>changes were made in this area.
>>
>>Before I set about doing this, does the approach seem reasonable?
>>
>>Thanks,
>>
>>-- 
>>Alan Burlison
>>--
>>
>

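For readers following the thread, the poll()-based workaround Alan proposes might look roughly like the sketch below. It is not code from DomainSocket.c. The struct hd_sock type, the hd_wait/hd_recv_timed/hd_send_timed helpers, the HD_RECV and HD_SEND macro names (modelled on the HD_ACCEPT name in the email), and the HD_EMULATE_SOCKET_TIMEOUTS switch are all hypothetical names chosen for illustration.

```c
#include <errno.h>
#include <poll.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical switch: a real build would set this from a configure-time
 * check for SO_RCVTIMEO/SO_SNDTIMEO support on PF_UNIX sockets. It is
 * defined here only so that the sketch exercises the emulated path. */
#define HD_EMULATE_SOCKET_TIMEOUTS 1

/* Hypothetical per-socket state holding the "two new fields". */
struct hd_sock {
    int fd;
    int rcv_timeout_ms;   /* <= 0 means wait indefinitely */
    int snd_timeout_ms;   /* <= 0 means wait indefinitely */
};

#if HD_EMULATE_SOCKET_TIMEOUTS
/* Wait until fd is ready for `events` or the timeout expires.
 * Returns 0 when ready, -1 with errno set otherwise. For simplicity,
 * an EINTR restarts the full timeout rather than the remaining time. */
static int hd_wait(int fd, short events, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = events, .revents = 0 };
    int rc;

    do {
        rc = poll(&pfd, 1, timeout_ms > 0 ? timeout_ms : -1);
    } while (rc < 0 && errno == EINTR);

    if (rc == 0) {
        errno = EAGAIN;   /* mimic what an expired SO_RCVTIMEO reports */
        return -1;
    }
    return rc < 0 ? -1 : 0;
}

static ssize_t hd_recv_timed(struct hd_sock *s, void *buf, size_t len,
                             int flags)
{
    if (hd_wait(s->fd, POLLIN, s->rcv_timeout_ms) < 0)
        return -1;
    return recv(s->fd, buf, len, flags);
}

/* Limitation: on a blocking socket, send() may still block after POLLOUT
 * if len exceeds the free buffer space; a fuller version would loop. */
static ssize_t hd_send_timed(struct hd_sock *s, const void *buf, size_t len,
                             int flags)
{
    if (hd_wait(s->fd, POLLOUT, s->snd_timeout_ms) < 0)
        return -1;
    return send(s->fd, buf, len, flags);
}

#define HD_RECV(s, buf, len, flags) hd_recv_timed((s), (buf), (len), (flags))
#define HD_SEND(s, buf, len, flags) hd_send_timed((s), (buf), (len), (flags))
#else
/* Platforms with native timeouts: the macros expand to the plain calls,
 * and setsockopt() is used elsewhere to set the timeouts. */
#define HD_RECV(s, buf, len, flags) recv((s)->fd, (buf), (len), (flags))
#define HD_SEND(s, buf, len, flags) send((s)->fd, (buf), (len), (flags))
#endif
```

Call sites would use HD_RECV and HD_SEND everywhere instead of calling recv() and send() directly, which keeps the platform #ifdef in one place. It also shows why the caveat in the email matters: any code that touches the descriptor directly would bypass the emulated timeouts.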