> >> Hi Ken
> >>
> >>>>>>>>>>> Using AF_UNIX/SOCK_DGRAM with current version (3.2.0) seems to
> >>>>>>>>>>> drop messages or at least they are not received in the same
> >>>>>>>>>>> order they are sent
> >>>>>>>
> >>>>>>> [snip]
> >>>>>>>
> >>>>>>>> Thanks for the test case.  I can confirm the problem.  I'm not
> >>>>>>>> familiar enough with the current AF_UNIX implementation to
> >>>>>>>> debug this easily.  I'd rather spend my time on the new
> >>>>>>>> implementation (on the topic/af_unix branch).  It turns out
> >>>>>>>> that your test case fails there too, but in a completely
> >>>>>>>> different way, due to a bug in sendto for datagrams.  I'll see
> >>>>>>>> if I can fix that bug and then try again.
> >>>>>>>>
> >>>>>>>> Ken
> >>>>>>>
> >>>>>>> Ok, too bad it wasn't our own code base but good that the
> >>>>>>> "mystery" is verified
> >>>>>>>
> >>>>>>> I finally succeeded in building topic/af_unix (after finding out
> >>>>>>> what version of zlib was needed), but not with -D__WITH_AF_UNIX
> >>>>>>> added to CXXFLAGS, and thus I haven’t tested it yet
> >>>>>>>
> >>>>>>> Is it sufficient to add the define to the "main" Makefile or do
> >>>>>>> you have to add it to all the Makefiles? I guess I can find
> >>>>>>> out though
> >>>>>>
> >>>>>> I do it on the configure line, like this:
> >>>>>>
> >>>>>>     ../af_unix/configure CXXFLAGS="-g -O0 -D__WITH_AF_UNIX" --prefix=...
> >>>>>>
> >>>>>>> Is topic/af_unix fairly up to date with master branch ?
> >>>>>>
> >>>>>> Yes, I periodically cherry-pick commits from master to topic/af_unix.
> >>>>>> I'll do that again right now.
> >>>>>>
> >>>>>>> Either way, I'll be glad to help out testing topic/af_unix
> >>>>>>
> >>>>>> Thanks!
> >>>>>
> >>>>> I've now pushed a fix for that sendto bug, and your test case runs
> >>>>> without error on the topic/af_unix branch.
> >>>>
> >>>> It seems like the test case does work now with topic/af_unix in
> >>>> blocking mode, but when using non-blocking (with MSG_DONTWAIT)
> >>>> there are some issues, I think
> >>>>
> >>>> 1. When the queue is empty with non-blocking recv(), errno is set
> >>>> to EPIPE but I think it should be EAGAIN (or maybe the pipe is
> >>>> getting broken for real for some reason?)
> >>>>
> >>>> 2. When using non-blocking recv() and no message is written at all,
> >>>> it seems like recv() blocks forever
> >>>>
> >>>> 3. Using non-blocking recv() where the "client" sends fewer than
> >>>> "count" messages, sometimes recv() blocks forever (as well)
> >>>>
> >>>>
> >>>> My naïve analysis of this is that for the first issue (if any) the
> >>>> wrong errno is set and for the second issue it blocks if no
> >>>> sendto() is done after the first recv(), i.e. nothing kicks the
> >>>> "reader thread" in the butt to realise the queue is empty. It is
> >>>> not super clear though what POSIX says about creating blocking
> >>>> descriptors and then using non-blocking flags with recv(), but this
> >>>> works in Linux anyway
> >>>
> >>> The explanation is actually much simpler.  In the recv code where a
> >>> bound datagram socket waits for a remote socket to connect to the
> >>> pipe, I simply forgot to handle MSG_DONTWAIT.  I've pushed a fix.
> >>> Please retest.
> >>>
> >>> I should add that in all my work so far on the topic/af_unix branch,
> >>> I've thought mainly about stream sockets.  So there may still be
> >>> things remaining to be implemented for the datagram case.
> >>
> >> I finally got some time to test topic/af_unix in our "real"
> >> Cygwin application (casual) and unfortunately very few of our unit
> >> tests pass
> >>
> >> The symptoms are that there's unexpected eternal blocking, sometimes
> >> there's unexpected EADDRNOTAVAIL, sometimes it looks like some memory
> >> corruption (and core dumps)
> >>
> >> Of course the memory corruption etc. could be our own fault and the
> >> core dumps might be because of uncaught exceptions
> >>
> >> Needless to say, all unit tests pass on Linux, but of course
> >> Cygwin's topic/af_unix could be acting according to the POSIX standard
> >> and the behaviour could be due to our own misinterpretation of how
> >> POSIX works
> >
> > More likely it's due to bugs in the topic/af_unix branch.  This is
> > still very much a work in progress.
> >
> >> I will try to narrow down the quite complex logic and reproduce the
> >> problems
> >
> > That would be ideal.
> >
> >> If you for some reason want to try it with casual, I'd be glad to help
> >> you out (it should be easier now than last time, but there might be
> >> some documentation missing for Cygwin still)
> >>
> >> https://bitbucket.org/casualcore/
> >
> > I'm going on vacation in a few days, but I might do this when I get back.
> >
> > Thanks for your testing.
> 
> By the way, if your code is using datagram sockets, then there are very
> serious problems with our implementation (even aside from the performance
> issue that we've already discussed).  For example, I don't know of any
> reasonable way for select to test whether such a socket is ready for
> writing.  We'll need to solve that somehow.

If by that you mean whether we're using SOCK_DGRAM, the answer is yes

I tried SOCK_STREAM (and SOCK_SEQPACKET I think) with Cygwin 3.2.0 but that
didn't work at all

As far as I understand, all of these socket types preserve message ordering
on pretty much all implementations, though
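
For reference, here is a minimal sketch of the kind of ordering check I have
in mind. It is not our actual test case: it assumes a connected socketpair()
instead of bound sockets with sendto(), and the names dgram_order_preserved
and BATCH are made up for illustration.

```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical batch size: kept small so the sends never fill the
   socket buffer before the matching receives run. */
enum { BATCH = 8 };

/* Hypothetical helper: sends numbered datagrams over an AF_UNIX/SOCK_DGRAM
   socketpair in batches and checks that they come back in sending order.
   Returns 1 if the order was preserved, 0 if a datagram arrived out of
   order, and -1 on a socket error or a short read/write. */
int dgram_order_preserved(int batches)
{
    int fds[2];
    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, fds) == -1)
        return -1;

    int result = 1;
    int next_expected = 0;

    for (int b = 0; b < batches && result == 1; ++b) {
        /* Queue a whole batch before reading anything back. */
        for (int i = 0; i < BATCH; ++i) {
            int seq = b * BATCH + i;
            if (send(fds[0], &seq, sizeof seq, 0) != (ssize_t)sizeof seq) {
                result = -1;
                break;
            }
        }
        /* Drain the batch and compare against the expected sequence. */
        for (int i = 0; i < BATCH && result == 1; ++i) {
            int seq = -1;
            ssize_t n = recv(fds[1], &seq, sizeof seq, 0);
            if (n != (ssize_t)sizeof seq)
                result = -1;
            else if (seq != next_expected++)
                result = 0;
        }
    }

    close(fds[0]);
    close(fds[1]);
    return result;
}
```

Calling dgram_order_preserved(50) from main() and printing the result should
be enough for a quick check; a return value of 0 would indicate reordering.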

I haven't tried SOCK_STREAM and/or SOCK_SEQPACKET with the 
topic/af_unix-branch. Is that worth a try ?
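
If it is, a first probe could be something like the sketch below, which
repeats the empty-queue check from issue 1 earlier in the thread for any
socket type. Again this assumes socketpair() is usable, and empty_recv_errno
is a hypothetical name, not something from our code base.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical probe: creates an AF_UNIX socketpair of the given type
   (SOCK_DGRAM, SOCK_STREAM, SOCK_SEQPACKET) and calls recv() with
   MSG_DONTWAIT while nothing has been sent.
   Returns the errno reported by recv(), 0 if recv() unexpectedly did not
   fail, or -1 if the socketpair could not be created. */
int empty_recv_errno(int type)
{
    int fds[2];
    if (socketpair(AF_UNIX, type, 0, fds) == -1)
        return -1;

    char buf[16];
    errno = 0;
    ssize_t n = recv(fds[1], buf, sizeof buf, MSG_DONTWAIT);

    /* Save errno before close(), which may overwrite it. */
    int err = (n == -1) ? errno : 0;

    close(fds[0]);
    close(fds[1]);
    return err;
}
```

I would expect EAGAIN (or EWOULDBLOCK) here for each type; EPIPE or a call
that never returns would point at the same kind of problem as before.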

Best regards,
Kristian

> Ken
