But those connections aren't supposed to ever terminate unless a node dies or is partitioned. So if we "fix" it by adding a socket.close(), I worry that we're covering up something more important.
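For reference, here is a minimal sketch of the kind of change being discussed: the read loop wrapped in try/finally so the socket is always closed, with the reason the loop ended returned to the caller so it can be logged instead of hidden. This is not the actual IncomingTcpConnection code. The class name ReadLoopSketch, the Exit enum, and the simplified size-prefixed framing are assumptions made up for illustration, and a plain Closeable stands in for the socket.

```java
import java.io.ByteArrayInputStream;
import java.io.Closeable;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;

// Hypothetical stand-in for the read loop quoted below; not Cassandra code.
final class ReadLoopSketch
{
    /** Why the loop ended, so the caller can log it before the close is forgotten. */
    public enum Exit { EOF, IO_ERROR }

    /**
     * Reads size-prefixed frames until the stream ends or fails.
     * The finally block closes the socket on every exit path.
     */
    public static Exit readUntilClosed(DataInputStream input, Closeable socket) throws IOException
    {
        Exit exit = Exit.IO_ERROR;
        try
        {
            while (true)
            {
                int size = input.readInt();          // throws EOFException once the peer has closed
                byte[] contentBytes = new byte[size];
                input.readFully(contentBytes);
                // The real code would hand contentBytes to a deserialization task here.
            }
        }
        catch (EOFException e)
        {
            exit = Exit.EOF;                         // peer closed its side of the connection
        }
        catch (IOException e)
        {
            exit = Exit.IO_ERROR;                    // any other read failure
        }
        finally
        {
            socket.close();                          // without this, the local end can linger in CLOSE_WAIT
        }
        return exit;
    }

    public static void main(String[] args) throws IOException
    {
        // One frame: a 4-byte length of 2, followed by 2 payload bytes.
        byte[] frame = { 0, 0, 0, 2, 10, 20 };
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(frame));
        Exit exit = readUntilClosed(in, () -> System.out.println("socket closed"));
        System.out.println("loop ended because: " + exit);
    }
}
```

Returning the exit reason is a deliberate choice in this sketch: it keeps the close from masking why a supposedly long-lived connection ended, which is the open question in this thread.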

On Wed, Apr 21, 2010 at 8:53 PM, Ingram Chen <ingramc...@gmail.com> wrote:
> I agree your point. I patch the code and log more informations to find out
> the real cause.
>
> Here is the code snip I think may be the cause:
>
> IncomingTcpConnection:
>
>     public void run()
>     {
>         while (true)
>         {
>             try
>             {
>                 MessagingService.validateMagic(input.readInt());
>                 int header = input.readInt();
>                 int type = MessagingService.getBits(header, 1, 2);
>                 boolean isStream = MessagingService.getBits(header, 3, 1) == 1;
>                 int version = MessagingService.getBits(header, 15, 8);
>
>                 if (isStream)
>                 {
>                     new IncomingStreamReader(socket.getChannel()).read();
>                 }
>                 else
>                 {
>                     int size = input.readInt();
>                     byte[] contentBytes = new byte[size];
>                     input.readFully(contentBytes);
>                     MessagingService.getDeserializationExecutor().submit(new MessageDeserializationTask(new ByteArrayInputStream(contentBytes)));
>                 }
>             }
>             catch (EOFException e)
>             {
>                 if (logger.isTraceEnabled())
>                     logger.trace("eof reading from socket; closing", e);
>                 break;
>             }
>             catch (IOException e)
>             {
>                 if (logger.isDebugEnabled())
>                     logger.debug("error reading from socket; closing", e);
>                 break;
>             }
>         }
>     }
>
> In normal condition, while loop is terminated after input.readInt() throw
> EOFException. but it quits without socket.close(). what I do is wrap whole
> while block inside a try { ... } finally {socket.close();}
>
>
> On Thu, Apr 22, 2010 at 01:14, Jonathan Ellis <jbel...@gmail.com> wrote:
>>
>> I'd like to get something besides "I'm seeing close wait but i have no
>> idea why" for a bug report, since most people aren't seeing that.
>>
>> On Tue, Apr 20, 2010 at 9:33 AM, Ingram Chen <ingramc...@gmail.com> wrote:
>> > I trace IncomingStreamReader source and found that incoming socket comes
>> > from MessagingService$SocketThread.
>> > but there is no close() call on either accepted socket or socketChannel.
>> >
>> > Should I file a bug report ?
>> >
>> > On Tue, Apr 20, 2010 at 11:02, Ingram Chen <ingramc...@gmail.com> wrote:
>> >>
>> >> this happened after several hours of operations and both nodes are started
>> >> at the same time (clean start without any data). so it might not relate to
>> >> Bootstrap.
>> >>
>> >> In system.log I do not see any logs like "xxx node dead" or exceptions.
>> >> and both nodes in test are alive. they serve read/write well, too. Below
>> >> four connections between nodes are keep healthy from time to time.
>> >>
>> >> tcp   0   0 ::ffff:192.168.2.87:7000    ::ffff:192.168.2.88:58447   ESTABLISHED
>> >> tcp   0   0 ::ffff:192.168.2.87:54986   ::ffff:192.168.2.88:7000    ESTABLISHED
>> >> tcp   0   0 ::ffff:192.168.2.87:59138   ::ffff:192.168.2.88:7000    ESTABLISHED
>> >> tcp   0   0 ::ffff:192.168.2.87:7000    ::ffff:192.168.2.88:39074   ESTABLISHED
>> >>
>> >> so connections end in CLOSE_WAIT should be newly created. (for streaming ?)
>> >> This seems related to streaming issues we suffered recently:
>> >> http://n2.nabble.com/busy-thread-on-IncomingStreamReader-td4908640.html
>> >>
>> >> I would like add some debug codes around opening and closing of socket to
>> >> find out what happend.
>> >>
>> >> Could you give me some hint, about what classes I should take look ?
>> >>
>> >>
>> >> On Tue, Apr 20, 2010 at 04:47, Jonathan Ellis <jbel...@gmail.com> wrote:
>> >>>
>> >>> Is this after doing a bootstrap or other streaming operation? Or did
>> >>> a node go down?
>> >>>
>> >>> The internal sockets are supposed to remain open, otherwise.
>> >>>
>> >>> On Mon, Apr 19, 2010 at 10:56 AM, Ingram Chen <ingramc...@gmail.com> wrote:
>> >>> > Thank your information.
>> >>> >
>> >>> > We do use connection pools with thrift client and ThriftAdress is on port
>> >>> > 9160.
>> >>> >
>> >>> > Those problematic connections we found are all in port 7000, which is
>> >>> > internal communications port between nodes. I guess this related to
>> >>> > StreamingService.
>> >>> >
>> >>> > On Mon, Apr 19, 2010 at 23:46, Brandon Williams <dri...@gmail.com> wrote:
>> >>> >>
>> >>> >> On Mon, Apr 19, 2010 at 10:27 AM, Ingram Chen <ingramc...@gmail.com> wrote:
>> >>> >>>
>> >>> >>> Hi all,
>> >>> >>>
>> >>> >>> We have observed several connections between nodes in CLOSE_WAIT
>> >>> >>> after several hours of operation:
>> >>> >>
>> >>> >> This is symptomatic of not pooling your client connections correctly. Be
>> >>> >> sure you're using one connection per thread, not one connection per
>> >>> >> operation.
>> >>> >> -Brandon
>> >>> >
>> >>> >
>> >>> > --
>> >>> > Ingram Chen
>> >>> > online share order: http://dinbendon.net
>> >>> > blog: http://www.javaworld.com.tw/roller/page/ingramchen
>> >>> >
>> >>
>> >>
>> >>
>> >> --
>> >> Ingram Chen
>> >> online share order: http://dinbendon.net
>> >> blog: http://www.javaworld.com.tw/roller/page/ingramchen
>> >
>> >
>> >
>> > --
>> > Ingram Chen
>> > online share order: http://dinbendon.net
>> > blog: http://www.javaworld.com.tw/roller/page/ingramchen
>> >
>
>
>
> --
> Ingram Chen
> online share order: http://dinbendon.net
> blog: http://www.javaworld.com.tw/roller/page/ingramchen
>