Hello Alan,

I think this sounds like a reasonable approach. I recommend that you file a JIRA with the proposal (copy-paste the content of your email into a comment) and then wait a few days before starting work in earnest to see if anyone else wants to discuss it first. I also recommend notifying Colin Patrick McCabe on that JIRA. It would be good to get a second opinion from him, since he is the original author of much of this code.
--Chris Nauroth

On 9/30/15, 1:14 AM, "Alan Burlison" <alan.burli...@oracle.com> wrote:

>Now that the Hadoop native code builds on Solaris I've been chipping
>away at all the test failures. About 50% of the failures involve
>DomainSocket, either directly or indirectly. That seems to be mainly
>because the tests use DomainSocket to do single-node testing, whereas in
>production it seems that DomainSocket is less commonly used
>(https://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-hdfs/ShortCircuitLocalReads.html).
>
>The particular problem on Solaris is that socket read/write timeouts
>(the SO_SNDTIMEO and SO_RCVTIMEO socket options) are not supported for
>UNIX domain (PF_UNIX) sockets. Those options are however supported for
>PF_INET sockets. That's because the socket implementation on Solaris is
>split roughly into two parts, for inet sockets and for STREAMS sockets,
>and the STREAMS implementation lacks support for SO_SNDTIMEO and
>SO_RCVTIMEO. As an aside, performance of sockets that use loopback or
>the host's own IP is slightly better than that of UNIX domain sockets on
>Solaris.
>
>I'm investigating getting timeouts supported for PF_UNIX sockets added
>to Solaris, but in the meantime I'm also looking at how this might be
>worked around in Hadoop. One way would be to implement timeouts by
>wrapping all the read/write/send/recv etc calls in DomainSocket.c with
>either poll() or select().
>
>The basic idea is to add two new fields to DomainSocket.c to hold the
>read/write timeouts. On platforms that support SO_SNDTIMEO and
>SO_RCVTIMEO these would be unused as setsockopt() would be used to set
>the socket timeouts. On platforms such as Solaris the JNI code would use
>the values to implement the timeouts appropriately.
>
>To prevent the code in DomainSocket.c becoming a #ifdef hairball, the
>current socket IO function calls such as accept(), send(), read() etc
>would be replaced with macros such as HD_ACCEPT. On platforms that
>provide timeouts these would just expand to the normal socket functions;
>on platforms that don't support timeouts they would expand to wrappers
>that implement timeouts for them.
>
>The only caveat is that all code that does anything to a PF_UNIX
>socket would *always* have to do so via DomainSocket. As far as I can
>tell that's not an issue, but it would have to be borne in mind if any
>changes were made in this area.
>
>Before I set about doing this, does the approach seem reasonable?
>
>Thanks,
>
>--
>Alan Burlison
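
For anyone following the thread, the poll()-based wrapper and macro scheme proposed above might look roughly like the sketch below. This is only an illustration under stated assumptions, not code from DomainSocket.c: the names hd_wait, hd_recv_timeout, HD_RECV and HD_EMULATE_TIMEOUTS are hypothetical (only HD_ACCEPT is named in the proposal), and the timeout is passed as a plain millisecond argument rather than read from the proposed new fields.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Wait until fd is ready for the requested event or the timeout expires.
 * timeout_ms <= 0 means "no timeout", mirroring a zero SO_RCVTIMEO value.
 * Returns 0 when ready, -1 with errno set on error or timeout.
 * Simplification: a retry after EINTR restarts the full timeout.
 */
static int hd_wait(int fd, short events, int timeout_ms)
{
    struct pollfd pfd;
    int rc;

    assert(events == POLLIN || events == POLLOUT);
    pfd.fd = fd;
    pfd.events = events;
    pfd.revents = 0;
    do {
        rc = poll(&pfd, 1, timeout_ms > 0 ? timeout_ms : -1);
    } while (rc < 0 && errno == EINTR);
    if (rc == 0) {
        /* Same errno a blocking recv() reports when SO_RCVTIMEO expires. */
        errno = EAGAIN;
        return -1;
    }
    return rc < 0 ? -1 : 0;
}

/* Hypothetical wrapper: recv() with an emulated receive timeout. */
static ssize_t hd_recv_timeout(int fd, void *buf, size_t len, int timeout_ms)
{
    if (hd_wait(fd, POLLIN, timeout_ms) != 0) {
        return -1;
    }
    return recv(fd, buf, len, 0);
}

/*
 * HD_EMULATE_TIMEOUTS is an assumed build-time switch, defined only on
 * platforms where SO_RCVTIMEO/SO_SNDTIMEO are unsupported for PF_UNIX.
 * Elsewhere the macro expands to the plain call and the timeout set with
 * setsockopt() applies, so the ms argument is ignored.
 */
#ifdef HD_EMULATE_TIMEOUTS
#define HD_RECV(fd, buf, len, ms) hd_recv_timeout((fd), (buf), (len), (ms))
#else
#define HD_RECV(fd, buf, len, ms) recv((fd), (buf), (len), 0)
#endif
```

Call sites would then use HD_RECV(fd, buf, len, timeout_ms) everywhere, which keeps the #ifdef in one place. Wrappers for send(), accept() and the other calls would follow the same pattern, waiting for POLLOUT where the call writes.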