On 15/05/07, guy keren <[EMAIL PROTECTED]> wrote:
> Amos Shapira wrote:
> > On 15/05/07, guy keren <[EMAIL PROTECTED]> wrote:
> > > > I think you are tinkering with semantics and so miss the real
> > > > issue (do you work as a consultant? :).
> > >
> > > did you write that to rafi or to me? i'm not dealing with
> > > semantics - i am dealing with a real problem, that stable
> > > applications have to deal with - when the network breaks, and you
> > > never get the close from the other side.
> >
> > I wrote this to you, Guy. Rafi maybe used "disconnect" when he
> > basically meant that the TCP connection went down from the other
> > side while you seemed to hang on "disconnect" being defined as
> > "cable eaten by an alligator" :).
>
> let's leave this subject. i brought it up, because many programmers
> new to socket programming are surprised by the fact that a network
> disconnection does not cause the socket to close, and that the
> connection may stay there for hours.
>
> > As long as Rafi feels happy about the replies that's not relevant
> > any more, IMHO.
> >
> > > > Alas - I think that I've just read not long ago that there is a
> > > > bug in Linux' select in implementing just that and it might miss
> > > > the close from the other side sometimes
> > >
> > > what you are describing here sounds astonishing - that such a
> > > basic feature of the sockets implementation is broken? i find this
> > > hard to believe, without clear evidence.
> >
> > Here is something about what I read before, it's the other way
> > around, and possibly only relevant to UDP but I'm not sure - if a
> > packet arrives with bad CRC, it's possible that the FD will be
> > marked as "ready to read" by select but then the packet will be
> > discarded (because of the CRC error) and when the process reads the
> > socket it won't get anything. That would make the process get a
> > "0 read right after select" which does NOT indicate a close from
> > the other side.
> >
> > http://www.uwsg.indiana.edu/hypermail/linux/kernel/0410.2/0001.html
> >
> > I don't know what would be a select(2)-based work-around, if
> > required at all.
>
> first, it does not return a '0 read'. this situation could have two
> different effects, depending on the blocking-mode of the socket.
>
> if the socket is in blocking mode (the default mode) - select() might
> state there's data to be read, but recvmsg (or read) will block.
>
> if the socket is in non-blocking mode - select() might state there's
> data to be read, but recvmsg (or read) will return with -1, and errno
> set to EAGAIN.
>
> in neither case will read return 0. the only time that read is
> allowed to return 0, is when it encounters an EOF. for a socket, this
> happens ONLY if the other side closed the sending-side of the
> connection.
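The difference between "no data yet" and EOF can be checked with a short experiment. The sketch below is in Python rather than C, on the assumption that its socket module passes recv() results straight through from the underlying system call. probe() is a hypothetical helper name, and socket.socketpair() stands in for a real TCP connection. It does not reproduce the spurious-readiness case from the linked thread, only the two return paths described above.

```python
import socket


def probe(sock):
    """Classify one non-blocking recv(): 'no-data', 'data' or 'eof'."""
    try:
        chunk = sock.recv(4096)
    except BlockingIOError:
        # The C call returned -1 with errno EAGAIN/EWOULDBLOCK.
        return "no-data"
    # An empty result is the Python spelling of read() returning 0.
    return "eof" if chunk == b"" else "data"


# A connected local stream pair, used here instead of a TCP connection.
a, b = socket.socketpair()
a.setblocking(False)

results = [probe(a)]          # peer still open, nothing sent yet
b.sendall(b"ping")
results.append(probe(a))      # bytes are waiting
b.shutdown(socket.SHUT_WR)    # peer closes its sending side
results.append(probe(a))      # only now does recv() report EOF

a.close()
b.close()
print(results)
```

The empty read shows up only after the peer shuts down its sending side. While the peer is merely idle, the non-blocking call fails with EAGAIN instead.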
Is there an on-line reference (or a manual page) to support this?
From what I remember about select, the definition of it returning a
"ready to read" bit set is "the next read won't block", which is always
true for non-blocking sockets, and that is why using them together with
select wasn't encouraged.

> of course, whenever i did select-based socket programming, i always
> set the sockets to non-blocking mode. this requires some careful
> programming, to avoid busy-waits, but it's the only way to guarantee
> fully non-blocking behaviour. and people should also note that the
> socket should be set to non-blocking mode before calling connect, and
> be ready to handle the peculiar way that the connect call works for
> non-blocking sockets.
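The non-blocking connect sequence mentioned above can be sketched as follows, in Python and under POSIX assumptions (connect reports EINPROGRESS; other platforms may use a different code). nonblocking_connect() is a hypothetical helper, and the loopback listener exists only to make the example self-contained.

```python
import errno
import os
import select
import socket


def nonblocking_connect(addr, timeout=5.0):
    """Connect a TCP socket that is already non-blocking; return it."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setblocking(False)             # switch mode BEFORE connect
    rc = sock.connect_ex(addr)          # returns an errno value, 0 on success
    if rc not in (0, errno.EINPROGRESS):
        sock.close()
        raise OSError(rc, os.strerror(rc))
    # Completion (success or failure) is signalled as writability.
    _, writable, _ = select.select([], [sock], [], timeout)
    if not writable:
        sock.close()
        raise TimeoutError("connect did not complete in time")
    # Writability alone does not mean success; SO_ERROR holds the outcome.
    err = sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
    if err != 0:
        sock.close()
        raise OSError(err, os.strerror(err))
    return sock


# Demo against a throwaway listener on the loopback interface.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))         # port 0: let the kernel choose
listener.listen(1)

client = nonblocking_connect(listener.getsockname())
server_side, peer_addr = listener.accept()
connected = (peer_addr == client.getsockname())

for s in (client, server_side, listener):
    s.close()
print(connected)
```

The step that is easy to miss is the SO_ERROR check: select reports the socket as writable both when the connection succeeds and when it fails.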
Also there is the issue of signals. If you want robust programs then
you'll have to use pselect.

> doing socket programming without referencing stevens' latest TCP/IP
> book is foolish.
Sorry for being foolish, I learned TCP/IP from RFCs and socket
programming from BSD4.2 sources in '86, Stevens' book wasn't available
then. :^)

I have since read the early editions of his books (circa early 90's, I
remember reading a volume while the later ones were still "in the
making"), but it's been a while since I had to write a complete C
socket program with select in earnest, and I accept that some
interfaces may have changed over the years.

These days, with pthreads being mainstream, I'd consider using multiple
threads. select() is nice when you absolutely *must* use a single
thread (which was the case back when pthreads wasn't invented yet, or
later when the various UNIX versions had their own idea on thread
APIs), but if you have so many connections that multiple threads will
become a problem then a single thread having to cycle through all these
connections one by one will also slow things down. Not to mention the
signal problem and just generally the fact that one connection taking
too much time to handle will slow the handling of other connections.

A possible go-between might be to select/poll on multiple FDs and then
hand the work to threads from a thread pool, but such a job would be
justifiable only for a large number of connections, IMHO.

If you insist on using a single thread then select seems to be the
underdog today - poll is just as portable (AFAICT), and Boost ASIO (and
I'd expect ACE) allows making portable code which uses the superior
APIs such as epoll/kqueue/"/dev/poll".

--Amos
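For the single-threaded case, here is a rough sketch of a readiness loop that does not call select(2) directly. It uses Python's standard selectors module, whose DefaultSelector picks epoll, kqueue, /dev/poll, poll or select depending on what the platform offers. This is a stand-in for the Boost ASIO approach, not an example of it. pump_once() and the inbox lists are hypothetical names, and socket.socketpair() again replaces real network connections.

```python
import selectors
import socket


def pump_once(sel, timeout=1.0):
    """Service every ready socket once; return how many were serviced."""
    serviced = 0
    for key, _events in sel.select(timeout):
        sock = key.fileobj
        try:
            data = sock.recv(4096)
        except BlockingIOError:
            continue                    # readiness was reported but no data
        if data:
            key.data.append(data)       # key.data is this socket's inbox
        else:
            sel.unregister(sock)        # EOF: peer closed its sending side
            sock.close()
        serviced += 1
    return serviced


sel = selectors.DefaultSelector()
inbox_a, inbox_b = [], []
a, a_peer = socket.socketpair()
b, b_peer = socket.socketpair()
for sock, inbox in ((a, inbox_a), (b, inbox_b)):
    sock.setblocking(False)
    sel.register(sock, selectors.EVENT_READ, data=inbox)

a_peer.sendall(b"hello")
first_pass = pump_once(sel)             # only a has something to read

a_peer.close()
second_pass = pump_once(sel)            # a sees EOF and is dropped
remaining = len(sel.get_map())          # b is still registered

sel.unregister(b)
for s in (b, b_peer):
    s.close()
sel.close()
print(first_pass, second_pass, remaining, inbox_a, inbox_b)
```

Handing each ready socket to a worker pool instead of reading it inline would give the select-plus-thread-pool arrangement described above; that part is left out here.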