Do you have the VBox client utilities installed? I have seen strange clock problems when the Agent is not installed.
Mike

On Tue, 2012-08-07 at 12:31 +0800, Patrick Yu wrote:
> I was actually trying to say "strange problem of TCP reset packet (or
> the lack of)". :-)
>
> Anyway, after some more hours of digging around, I found some leads:
>
> # ndd tcp tcp_rst_sent_rate_enabled
> 1
> # ndd tcp tcp_rst_sent_rate
> 40
> # kstat tcp 1 1 | egrep '[Rr]st'
>         outRsts                         874
>         tcp_rst_unsent                  3644
> # telnet 127.0.0.1 12345
> Trying 127.0.0.1...
> ^C
> # kstat tcp 1 1 | egrep '[Rr]st'
>         outRsts                         875
>         tcp_rst_unsent                  3648
>
> The limit of 40 RSTs per second does not seem to be honored: resets
> are suppressed even though nothing on the system generates reset
> packets except the test run. I did some more tests: when I increase
> tcp_rst_sent_rate, it takes a value of 800+ to make reset packets
> work, and the value needs to be raised further as more reset packets
> are sent. It seems the per-second counter for reset packets never
> gets zeroed.
>
> Looks like a real bug to me. But I am still not sure how to trigger
> this - it runs fine for the first day or two before exhibiting this
> strange behavior. I even ran the "stress" test from
> https://blogs.oracle.com/clive/entry/tcp_reset_delay against a freshly
> rebooted system in failed attempts to reproduce the erroneous
> condition. But I am sure it will come back when it's left there for
> another day.
>
> I suspect it could be a time accuracy problem due to it being a vbox
> VM. I looked at tcp_output.c from
> https://hg.openindiana.org/upstream/illumos/illumos-gate/file/adffc698eaf5/usr/src/uts/common/inet/tcp/tcp_output.c#l3279
> and tried to change the clock backwards and forwards, but still could
> not reproduce it.
>
> For now, my temporary workaround is to set tcp_rst_sent_rate_enabled
> to 0, which in effect totally disables any TCP reset DoS protection.
> I hope this helps someone with a similar case.
>
> Best regards,
> Patrick
>
> On Mon, Aug 6, 2012 at 3:09 PM, Patrick Yu <ipaq3...@gmail.com> wrote:
> > Hi,
> >
> > I am experiencing a very strange TCP problem (the lack of) with my new
> > oi_151a5 install. The machine ran fine for the first day or two after a
> > fresh reboot, and after that SSH connections broke down and hung
> > mysteriously during SSL handshake, where no connections could be made
> > either from outside or even from inside using loopback lo0.
> >
> > It took me a while to track it down to this bug -
> > https://www.illumos.org/issues/1983 where the workaround posted solved
> > my SSH problem. But upon closer examination I found the source of the
> > problem is actually something else in my particular case. It turns out
> > that a TCP connection to a closed port (one that nothing is listening
> > on) does not generate a TCP reset packet from the networking core. Any
> > client connecting to such a port hangs there through lengthy retries.
> >
> > I initially thought it was due to ipfilter, but even after I cleared
> > the table, RST was still not being sent no matter what interface was
> > involved (lo0, e1000g0). The connection and RST packet come back
> > after a reboot, and the problem recurs after a few days even with
> > low/no load, as this is a testing installation running as a VM.
> >
> > Things like X don't start properly when TCP RSTs are missing. I
> > didn't have time to look into it, but I presume it's related to this
> > problem too. Worth noting is that ports that are being listened on
> > exhibit no problems whatsoever - I can even run an iperf test across
> > the network with very good results.
> >
> > I could do some silly thing like the below ipf.conf snippet to "force"
> > RST packets to be sent. But then if there's any pass statement at the
> > end like "pass in quick on lo0", RST would disappear again!
> > set intercept_loopback true;
> > block return-rst in
> >
> > Does anyone have an idea what could be the cause?
> > A misconfiguration or a bug? Any pointer would be greatly appreciated.
> > I still keep a snapshot of the problematic VM and am ready to do some
> > more experiments with it. Below is what the problematic session looks
> > like, and a normal snoop after reboot.
> >
> > # telnet 127.0.0.1 12345
> > Trying 127.0.0.1...
> > ^C
> >
> > # snoop -I lo0 -tr -r
> > Using device ipnet/lo0 (promiscuous mode)
> > 0.00000 127.0.0.1 -> 127.0.0.1 TCP D=12345 S=36692 Syn Seq=1227588634 Len=0 Win=32768 Options=<mss 8192,sackOK,tstamp 71716119 0,nop,wscale 2>
> > 1.13752 127.0.0.1 -> 127.0.0.1 TCP D=12345 S=36692 Syn Seq=1227588634 Len=0 Win=32768 Options=<mss 8192,sackOK,tstamp 71716119 0,nop,wscale 2>
> > 3.40631 127.0.0.1 -> 127.0.0.1 TCP D=12345 S=36692 Syn Seq=1227588634 Len=0 Win=32768 Options=<mss 8192,sackOK,tstamp 71716119 0,nop,wscale 2>
> > 7.92479 127.0.0.1 -> 127.0.0.1 TCP D=12345 S=36692 Syn Seq=1227588634 Len=0 Win=32768 Options=<mss 8192,sackOK,tstamp 71716119 0,nop,wscale 2>
> > 16.93940 127.0.0.1 -> 127.0.0.1 TCP D=12345 S=36692 Syn Seq=1227588634 Len=0 Win=32768 Options=<mss 8192,sackOK,tstamp 71716119 0,nop,wscale 2>
> > ^C#
> >
> > # ifconfig lo0
> > lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
> >         inet 127.0.0.1 netmask ff000000
> > #
> > # netstat -r -n | grep lo0
> > 127.0.0.1 127.0.0.1 UH 5 9638 lo0
> > ::1 ::1 UH 7 1612 lo0
> > #
> > # ipf -Fa
> > #
> > # ipfstat -io
> > empty list for ipfilter(out)
> > empty list for ipfilter(in)
> > #
> > # netstat -anv | grep 12345
> > #
> > # svccfg -s ipfilter:default listprop |grep firewall_config
> > firewall_config_default com.sun,fw_configuration
> > firewall_config_default/value_authorization astring solaris.smf.value.firewall.config
> > firewall_config_default/version count 1
> > firewall_config_default/apply_to astring
> > firewall_config_default/exceptions astring
> > firewall_config_default/policy astring custom
> > firewall_config_default/custom_policy_file astring /etc/ipf/ipf.conf
> > firewall_config_default/open_ports astring
> > firewall_config_override com.sun,fw_configuration
> > firewall_config_override/apply_to astring
> > firewall_config_override/value_authorization astring solaris.smf.value.firewall.config
> > firewall_config_override/policy astring none
> > #
> > # reboot
> > #
> > # telnet 127.0.0.1 12345
> > Trying 127.0.0.1...
> > telnet: Unable to connect to remote host: Connection refused
> > #
> > # snoop -I lo0 -tr -r
> > Using device ipnet/lo0 (promiscuous mode)
> > 0.00000 127.0.0.1 -> 127.0.0.1 TCP D=12345 S=53940 Syn Seq=1084268217 Len=0 Win=32768 Options=<mss 8192,sackOK,tstamp 6061 0,nop,wscale 2>
> > 0.00005 127.0.0.1 -> 127.0.0.1 TCP D=53940 S=12345 Rst Ack=1084268218 Win=0
> > ^C#
> >
> > Thanks.
> >
> > Best regards,
> > Patrick
>
> _______________________________________________
> OpenIndiana-discuss mailing list
> OpenIndiana-discuss@openindiana.org
> http://openindiana.org/mailman/listinfo/openindiana-discuss
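
The symptom described in the follow-up message (a cap of 40 RSTs per second that behaves as if its counter is never zeroed, so that only a limit above 800 lets resets through) is what a fixed-window rate limiter does when its window never expires. The following is a minimal Python sketch of that idea, not the illumos code from tcp_output.c. The class name, its fields, and the "stuck" state values are hypothetical and chosen only to mirror the numbers quoted in the thread; whether a clock problem is the real trigger remains unconfirmed.

```python
class RstRateLimiter:
    """Fixed one-second window limiter (illustrative sketch only)."""

    def __init__(self, limit_per_sec, enabled=True):
        self.limit = limit_per_sec      # plays the role of tcp_rst_sent_rate
        self.enabled = enabled          # plays the role of tcp_rst_sent_rate_enabled
        self.window_start = 0.0         # time at which the current window began
        self.count = 0                  # RSTs counted in the current window
        self.unsent = 0                 # RSTs suppressed by the limiter

    def allow(self, now):
        """Return True if an RST may be sent at time `now` (in seconds)."""
        if not self.enabled:
            return True
        if now - self.window_start > 1.0:
            # More than a second has passed: open a new window, zero the count.
            self.window_start = now
            self.count = 1
            return True
        # Still inside the window: count this RST and compare with the limit.
        self.count += 1
        if self.count > self.limit:
            self.unsent += 1
            return False
        return True


if __name__ == "__main__":
    # Healthy case: one RST every ten seconds never comes near the limit.
    healthy = RstRateLimiter(40)
    print([healthy.allow(t) for t in (10.0, 20.0, 30.0)])

    # Hypothetical stuck case: the stored window start is ahead of "now",
    # so the elapsed time is negative and the window never expires.
    stuck = RstRateLimiter(40)
    stuck.window_start = 1e9
    stuck.count = 800
    print(stuck.allow(100.0), stuck.unsent)

    # Raising the limit above the accumulated count helps only for a while.
    stuck.limit = 900
    print(stuck.allow(110.0))
```

In this sketch, setting `enabled` to False corresponds to the workaround of setting tcp_rst_sent_rate_enabled to 0: every reset is allowed, at the cost of losing the rate limit altogether.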