Re: snapshots total freeze

2012-12-25 Thread epsilon
On Tue, Dec 25, 2012 at 06:05:10PM +0100, frantisek holop wrote:
> since a couple of snapshots back i can quite reliably
> freeze my openbsd notebook simply by leaving it on
> overnight.  the desktop is there, all the open windows
> are there, but it has become a painting...
> nothing in the logs, no panic, nothing.
> anybody else is seeing something similar?

Not really the same, but maybe comparable. I am unsure, but let's
see:

Since the upgrade to 5.2 my gateway box freezes about one out of four
times I boot it (it's switched off overnight). It freezes somewhere
after "starting network daemons" and "starting local daemons". I
tried disabling services I do not strictly need, or substituting
them with other solutions. So far no findings here.

But this box runs no X. I have connected a keyboard and a monitor and
I am able to switch between the virtual terminals but no reaction
there. If I simply hit return, nothing happens. No login possible.

ICMP pings are answered, but I cannot SSH into the box. Connections are
NOT rejected, they just time out. Same with all other TCP connections.

After a while the fan accelerates. It looks like the CPU is working
very hard. Unfortunately this is really the only reaction this box
gives me. But better than nothing.

> nothing in the logs, no panic, nothing.

Yes! Even the named startup logging is missing from /var/log/messages.
The freeze always appears somewhere after named starts (see above).
It looks like syslogd did not have the time to write the file.
The last thing I got in /var/log/messages is:

... /bsd: root on wd0a ...

After rebooting the hard way, the only thing I sometimes (not always)
get are /var/lost+found/* files (/var is a separate partition).

For some weeks I used more recent hardware. The only difference is:
The fan is louder. So I am sticking with this one to minimize the harm
done to me (OK - it has other NICs, and a SATA drive).

Maybe I misinterpret things, but to me it looks like the kernel is
still running, but all userland activities are completely
dead/blocked/locked/looping/whatever.

IPv6 is disabled on all NICs. Just saying ...

Greetings
  E.

$ cat rc.conf.local
sendmail_flags="-L sm-mta -C/etc/mail/sendmail.cf -bd -q30m"
named_flags=""
httpd_flags="-DSSL -u"
ftpproxy_flags=""
tftpd_flags="-4 -l xx.xx.xx.xx /tftpboot"
ifstated_flags=""
dhcpd_flags="xl0"

$ dmesg
OpenBSD 5.2 (GENERIC) #278: Wed Aug  1 10:04:16 MDT 2012
dera...@i386.openbsd.org:/usr/src/sys/arch/i386/compile/GENERIC
cpu0: Intel Pentium III ("GenuineIntel" 686-class) 732 MHz
cpu0: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PSE36,MMX,FXSR,SSE
real mem  = 266727424 (254MB)
avail mem = 251506688 (239MB)
mainbus0 at root
bios0 at mainbus0: AT/286+ BIOS, date 08/25/00, BIOS32 rev. 0 @ 0xe7300, SMBIOS rev. 2.3 @ 0xf8dc6 (47 entries)
bios0: vendor Compaq version "686P2 v2.04" date 08/25/2000
bios0: Compaq Deskpro
apm0 at bios0: Power Management spec V1.2
acpi at bios0 function 0x0 not configured
pcibios0 at bios0: rev 2.1 @ 0xe7300/0x8d00
pcibios0: PCI IRQ Routing Table rev 1.0 @ 0xf6260/208 (11 entries)
pcibios0: PCI Interrupt Router at 000:31:0 ("Intel 82801BA LPC" rev 0x00)
pcibios0: PCI bus #2 is the last bus
bios0: ROM list: 0xc/0xa000 0xca000/0x800 0xca800/0xd800! 0xe/0x1!
cpu0 at mainbus0: (uniprocessor)
pci0 at mainbus0 bus 0: configuration mode 1 (bios)
pchb0 at pci0 dev 0 function 0 "Intel 82815 Host" rev 0x02
vga1 at pci0 dev 2 function 0 "Intel 82815 Video" rev 0x02
wsdisplay0 at vga1 mux 1: console (80x25, vt100 emulation)
wsdisplay0: screen 1-5 added (80x25, vt100 emulation)
intagp0 at vga1
agp0 at intagp0: aperture at 0x4400, size 0x400
ppb0 at pci0 dev 30 function 0 "Intel 82801BA Hub-to-PCI" rev 0x02
pci1 at ppb0 bus 2
rl0 at pci1 dev 4 function 0 "Realtek 8139" rev 0x10: irq 5, address 00:08:a1:57:08:83
rlphy0 at rl0 phy 0: RTL internal PHY
fxp0 at pci1 dev 8 function 0 "Intel 82562" rev 0x01, i82562: irq 10, address 00:02:a5:2b:0f:43
inphy0 at fxp0 phy 1: i82562EM 10/100 PHY, rev. 0
xl0 at pci1 dev 9 function 0 "3Com 3c905C 100Base-TX" rev 0x78: irq 11, address 00:04:76:26:b5:0f
exphy0 at xl0 phy 24: 3Com internal media interface
ichpcib0 at pci0 dev 31 function 0 "Intel 82801BA LPC" rev 0x02: 24-bit timer at 3579545Hz
pciide0 at pci0 dev 31 function 1 "Intel 82801BA IDE" rev 0x02: DMA, channel 0 wired to compatibility, channel 1 wired to compatibility
atapiscsi0 at pciide0 channel 0 drive 0
scsibus0 at atapiscsi0: 2 targets
cd0 at scsibus0 targ 0 lun 0:  ATAPI 5/cdrom removable
cd0(pciide0:0:0): using PIO mode 4, DMA mode 2
wd0 at pciide0 channel 1 drive 0: 
wd0: 16-sector PIO, LBA48, 76319MB, 156301488 sectors
wd0(pciide0:1:0): using PIO mode 4, Ultra-DMA mode 5
uhci0 at pci0 dev 31 function 4 "Intel 82801BA USB" rev 0x02: irq 10
auich0 at pci0 dev 31 function 5 "Intel 82801BA AC97" rev 0x02: irq 5, ICH2 AC97
ac97: codec id 0x41445360 (Analog Devices AD1885)
ac97: codec features headphone, Analog Devices

Re: snapshots total freeze

2012-12-28 Thread epsilon
Hi,

On Fri, Dec 28, 2012 at 12:01:37PM +0100, Joerg Goltermann wrote:
> ...
> We hit this problem on a physical server after upgrading to 5.2 too.
> ...
> Since 9 days, I run sync every 5 minutes and both systems did
> *not* freeze again.

Thanks for the hint. I will cronjob this.
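
A minimal sketch of such a cron job, assuming root's crontab and
/bin/sync as the path to sync(8) (both are my assumptions, they are not
stated in the thread). The helper name sync_cron_line is made up for
illustration; it only prints the crontab line so it can be inspected
before it is installed:

```shell
#!/bin/sh
# Print a crontab line that runs sync(8) every N minutes (default 5).
# Assumption: sync lives at /bin/sync.
sync_cron_line() {
    minutes=${1:-5}
    printf '*/%s * * * * /bin/sync\n' "$minutes"
}
```

To install it, something like
`(crontab -l 2>/dev/null; sync_cron_line 5) | crontab -` run as root
should append the line to the existing crontab.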

 - Eps



Various system freeze

2012-12-29 Thread epsilon
Hi all,

recently we have read a lot about total system freezes. Let me try to
summarize:

Common in many cases is: The system totally freezes. No keyboard
interaction possible. No kernel panic. No coredump. Nothing in the
logs. Network (ICMP, routing) looks up. But no userland action.

The situations differ: Some users observe this during boot,
others in X during the night, some see high disk I/O just before the
freeze, others see heavy network load. Some systems run in a VM,
others on real hardware. Sometimes the issue is reproducible at the
same time during the night, in other cases it occurs randomly.

So we have a wide variety of situations, but often the same result:
Total freeze without any log or coredump.

Let's assume all these cases have something in common. Then something
very fundamental is broken.

On the other hand, is it really likely that all these cases are different
bugs?

To all the users: Thanks for all the reports.

To the developers: What should users provide if they have nothing
in their logs, no coredump, nothing, only a totally frozen system? Maybe
dmesg and config files, right? And a verbal description of what happens,
right?
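
To make that concrete, here is a rough sketch of how dmesg and the
config files could be gathered into one file for a report. The function
name collect_report and the choice of files are only assumptions for
illustration, not an official procedure:

```shell
#!/bin/sh
# Write dmesg output and the given config files into one report file.
# Usage: collect_report OUTFILE FILE...
collect_report() {
    out=$1; shift
    {
        echo "== dmesg =="
        # Fall back to a note if dmesg is missing or not permitted.
        dmesg 2>/dev/null || echo "(dmesg not available)"
        for f in "$@"; do
            echo "== $f =="
            if [ -r "$f" ]; then
                cat "$f"
            else
                echo "(not readable)"
            fi
        done
    } > "$out"
}
```

For example: `collect_report /tmp/report.txt /etc/rc.conf.local /etc/rc.local`.
The verbal description of what happens still has to be written by hand.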

 - Eps



Re: Various system freeze

2012-12-29 Thread epsilon
Hi Philip,

thanks for answering. Let's go ...

On Sat, Dec 29, 2012 at 04:10:05AM -0800, Philip Guenther wrote:
> Your case, as far as you described it, is not the same as frantisek holop's.

Right. Not totally the same. But some similarities.
 
> Most of the descriptions I've seen have been too imprecise to help in 
> diagnosis.
>  "It freezes somewhere after "starting network daemons" and "starting
> local daemons". I
>   tried to disable services I do not essentially need or to substitute
>   them with other solutions. So far no findings here."
> 
> Freezes 'somewhere'?  Hard to make hypotheses about the cause when
> we're not told what processes were started, or whether it's consistent
> from freeze to freeze.  If you turn on ddb.console=1 in sysctl.conf

ddb.console=1 turned on now. Will check the next time the freeze
occurs.
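
For reference, a small sketch of how the setting could be made
persistent. The helper name ensure_sysctl is hypothetical; it appends
key=value to a sysctl.conf-style file only if the key is not there yet.
The file is a parameter so it can be tried on a scratch copy first:

```shell
#!/bin/sh
# Append "key=value" to a sysctl.conf-style file unless a line for that
# key already exists. The dot in the key is treated as a regex wildcard
# by grep, which is harmless for this purpose.
ensure_sysctl() {
    file=$1 key=$2 value=$3
    if grep -q "^${key}=" "$file" 2>/dev/null; then
        echo "$key already present in $file"
    else
        echo "${key}=${value}" >> "$file"
    fi
}
```

As root, `ensure_sysctl /etc/sysctl.conf ddb.console 1` would add the
line; /etc/sysctl.conf is read at boot, so it takes effect on the next
start.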

> can you break into ddb when it hangs?

Shout at me, but the magic key mentioned in the manpage is ctrl+c on
i386, right?

> What's trace and ps show in
> that case?  show bcstats?  If you've performed tests of various sorts,
> what did they show?  Negative results are sometimes _more_ important
> than positive results; why bother doing a test if you're going to
> throw out the result?  What hypotheses have been *excluded* by your
> test results?

First rc.conf.local:

sendmail_flags="-L sm-mta -C/etc/mail/sendmail.cf -bd -q30m"
named_flags=""
httpd_flags="-DSSL -u"
ftpproxy_flags=""
tftpd_flags="-4 -l 192.168.xx.xx /tftpboot"
ifstated_flags=""
dhcpd_flags="xl0"

additionally rc.local:

/usr/local/sbin/sockd -D
/usr/local/sbin/squid

I used the old style for starting local daemons to eliminate
problems with the new rc.d system. It was just a guess, and
unfortunately it gave no result. Please: I do not mean that there are
problems with the recently introduced rc.d system. It was just a guess
to see if this changes anything. But it did not.

Now what did I mean with somewhere: Randomly. The freeze happens
randomly after starting one of the daemons. There is no pattern.
Sometimes it freezes after starting sshd, sometimes later. In one case
the freeze came after the login prompt appeared. In most cases it's
earlier.

What else did I try?

o I substituted sockd (dante) with nylon. Result: For three days no
  freeze. At first I was happy, I thought I had found the problem. But
  then again a freeze.
o I disabled ifstated after the freeze occurred just after starting
  this daemon. One day no freeze. But then again: freeze.
o I disabled ntpd completely because it's possible to operate the box
  with slightly inaccurate time. So ntpd can be excluded for sure!

What does this prove? Is it possible to exclude dante/nylon/ifstated for
sure? Not really. Maybe it's a combination I did not find so far.

I have not disabled squid. I do not use NAT, so disabling squid and
dante makes this box worthless (i.e. me offline).

> The title of the original thread was "snapshots total freeze", but
> there were dmesg's in the thread showing Aug kernel builds; for those
> who haven't tried running a (recent) snapshot, does your problem
> reproduce or change symptoms when you do?

For now I don't want to update the system to a snapshot. My primary
reason is that this would imply a complete new installation when 5.3 comes
out. The update process is described from stable to stable and not
for anything else. I hope it's possible to find something without
switching to current/snapshot. This box survived two months of 5.2, so
maybe it will survive the next four months too :\ Shout at me, but I
am a -stable user.

> Is this consistent across hardware?  Drop another machine into place
> where the freezing one is; does it freeze too?

It is consistent across hardware. I tested other hardware with some
differences:

o SATA drive instead of IDE.
o other NICs
o faster CPU (and a heavy duty fan that gave me the ability to make a
  guess on CPU load which was confirmed by j...@osn.de using a VM).

Additionally I ran a `dd if=/dev/rwd0c of=/dev/null bs=1m` as suggested
on the list. No errors.

What makes me wonder is the following: Why did those freezes occur on
5.2 and on snapshots starting in November? My box runs as a gateway
using pppoe(4). Again I guessed: Maybe something "from the evil
internets" like that nasty bug we once had with protocol 0 (maybe
you remember the guy running nmap protocol scans through PF). So I did
not power on the DSL modem during boot for some days. But no success.
The box froze after one or two days during boot, with the modem
powered off.

I think this is really the only thing I can exclude for sure. Because
my modem was switched off, it cannot be something triggered from the
"evil internets". It must be triggered from my local site. And
additionally, it must be triggered from within this box, because for
some days I powered on this box alone, i.e. all other machines on my
local network were still switched off. Again: Freezes after some days.

But the network is still on topic: Someone claimed he had no freezes
if he disabled l

Re: Various system freeze

2012-12-29 Thread epsilon
Hi,

On Sat, Dec 29, 2012 at 07:47:12PM +0100, Loïc BLOT wrote:
> After those events, i try a new approach, i saw squid and moreover
> squidguard (when loads blacklists, 96 childs) use big IOPS over the
> raid. I decided to create two MFS for blacklists and squid disk cache.
> Now the problem is resolved, IOPS are backported to memory and no freeze
> problem occurs. Maybe you must try this solution.
> (i use 1G mfs for blacklists and 4G mfs for squid disk cache).

I do blacklists with squid acls, and therefore squidguard is not
needed here. Further, /var/squid lies on its own partition. For test
purposes it looks easier to me just to disable the squid disk cache
altogether. This makes squid a non-caching proxy, but that is better than
a box that sometimes freezes.

Will give this a try after checking the other stuff Philip pointed
out.

Thanks a lot!
  - Eps



Re: Various system freeze

2012-12-30 Thread epsilon
On Sun, Dec 30, 2012 at 12:05:50AM -0800, Philip Guenther wrote:
> > Shout at me, but the magic key mentioned in the manpage is ctrl+c on
> > i386, right?
> No.  Try the second and third paragraphs of "man ddb".

Thanks. Got it! I will report as soon as the freeze occurs again.

> If it *ever* froze before ifstated/squid/dante/whatever was started,
> then they are not required for it to occur. 

If it is a single bug you are totally right.

Yes and you are right. If I only focus on a single bug, it is possible
to eliminate more cases.

> Indeed, happening that
> early tends to make me think it's something in the network stack, its
> configuration, and/or the network traffic that the machine sees.

Yes, this is why I powered off the DSL modem during boot.

> Hmm, you mention later that you _have_ reproduced it on other
> hardware; can you install the current snapshot on to that other
> hardware and test with it?  You could then leave your current box as
> is...

Will put this on my ToDo list.

> No traffic but it freezes.  Does your building/area get power "brownouts"?

During thunderstorms, when lightning hits the infrastructure in my area,
yes. But no such event occurred when the freeze happened. No flashing/dimming
lights in this room.

> > But the network is still on topic: Someone claimed he had no freezes
> > if he disabled logging in PF. pflogd is started _after_ PF is enabled.
> > Did anyone check what happens if pflogd is started before PF? Maybe I
> > give it a try. It's just I feel uncomfortable in hacking /etc/rc. This
> > file is not intended to be changed by users, right?
> 
> That seems very unlikely.  If it was something like that I would
> expect it to be more consistent.

Thanks. So no need to do such hacks.

BTW: No freeze for three days now (booted the box about 8 times).
*knock*on*wood*

 - Eps



D.ROOT-SERVERS.NET.

2013-01-03 Thread epsilon
Hi,

according to

ftp://rs.internic.net/domain/named.root

D.ROOT-SERVERS.NET changed its IPv4 address.
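
A quick way to see which address a given hints file still carries is
sketched here. The function name hint_a_record is made up for
illustration, and the sketch assumes the usual "NAME TTL A ADDRESS"
layout of the hints file:

```shell
#!/bin/sh
# Print the IPv4 (A) address recorded for NAME in a root hints file.
# Usage: hint_a_record NAME FILE
hint_a_record() {
    name=$1 file=$2
    # Only consider lines with at least 4 fields whose first field is
    # the name and whose second-to-last field is the record type "A".
    awk -v n="$name" '
        NF >= 4 && toupper($1) == toupper(n) && $(NF-1) == "A" { print $NF }
    ' "$file"
}
```

Running `hint_a_record D.ROOT-SERVERS.NET. named.root` on a freshly
fetched copy and on the installed hints file shows whether the two
agree.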


Index: etc/bind/root.hint
===
RCS file: /cvs/src/etc/bind/root.hint,v
retrieving revision 1.9
diff -u -p -r1.9 root.hint
--- etc/bind/root.hint  22 Jun 2011 05:22:20 -  1.9
+++ etc/bind/root.hint  3 Jan 2013 17:25:45 -
@@ -33,7 +33,7 @@ C.ROOT-SERVERS.NET.  360  A 
 ; FORMERLY TERP.UMD.EDU
 ;
 .360  NSD.ROOT-SERVERS.NET.
-D.ROOT-SERVERS.NET.  360  A 128.8.10.90
+D.ROOT-SERVERS.NET.  360  A 199.7.91.13
 D.ROOT-SERVERS.NET. 360    2001:500:2D::D
 ;
 ; FORMERLY NS.NASA.GOV
Index: usr.sbin/bind/lib/dns/rootns.c
===
RCS file: /cvs/src/usr.sbin/bind/lib/dns/rootns.c,v
retrieving revision 1.6
diff -u -p -r1.6 rootns.c
--- usr.sbin/bind/lib/dns/rootns.c  7 Feb 2008 09:14:47 -   1.6
+++ usr.sbin/bind/lib/dns/rootns.c  3 Jan 2013 17:26:44 -
@@ -63,7 +63,7 @@ static char root_ns[] =
 "A.ROOT-SERVERS.NET. 360 IN  2001:503:BA3E::2:30\n"
 "B.ROOT-SERVERS.NET. 360 IN  A   192.228.79.201\n"
 "C.ROOT-SERVERS.NET. 360 IN  A   192.33.4.12\n"
-"D.ROOT-SERVERS.NET. 360 IN  A   128.8.10.90\n"
+"D.ROOT-SERVERS.NET. 360 IN  A   199.7.91.13\n"
 "E.ROOT-SERVERS.NET. 360 IN  A   192.203.230.10\n"
 "F.ROOT-SERVERS.NET. 360 IN  A   192.5.5.241\n"
 "F.ROOT-SERVERS.NET. 360 IN  2001:500:2F::F\n"
Index: usr.sbin/unbound/iterator/iter_hints.c
===
RCS file: /cvs/src/usr.sbin/unbound/iterator/iter_hints.c,v
retrieving revision 1.1.1.1
diff -u -p -r1.1.1.1 iter_hints.c
--- usr.sbin/unbound/iterator/iter_hints.c  26 Mar 2012 18:05:43 -  1.1.1.1
+++ usr.sbin/unbound/iterator/iter_hints.c  3 Jan 2013 17:26:47 -
@@ -119,7 +119,7 @@ compile_time_root_prime(struct regional*
if(!ah(dp, r, "A.ROOT-SERVERS.NET.", "198.41.0.4")) return 0;
if(!ah(dp, r, "B.ROOT-SERVERS.NET.", "192.228.79.201")) return 0;
if(!ah(dp, r, "C.ROOT-SERVERS.NET.", "192.33.4.12"))return 0;
-   if(!ah(dp, r, "D.ROOT-SERVERS.NET.", "128.8.10.90"))return 0;
+   if(!ah(dp, r, "D.ROOT-SERVERS.NET.", "199.7.91.13"))return 0;
if(!ah(dp, r, "E.ROOT-SERVERS.NET.", "192.203.230.10")) return 0;
if(!ah(dp, r, "F.ROOT-SERVERS.NET.", "192.5.5.241"))return 0;
if(!ah(dp, r, "G.ROOT-SERVERS.NET.", "192.112.36.4"))   return 0;
Index: usr.sbin/unbound/ldns/drill/root.c
===
RCS file: /cvs/src/usr.sbin/unbound/ldns/drill/root.c,v
retrieving revision 1.1.1.1
diff -u -p -r1.1.1.1 root.c
--- usr.sbin/unbound/ldns/drill/root.c  26 Mar 2012 18:08:26 -  1.1.1.1
+++ usr.sbin/unbound/ldns/drill/root.c  3 Jan 2013 17:26:47 -
@@ -32,7 +32,7 @@ init_root(void)
ldns_rr_list_push_rr(global_dns_root, r);
(void)ldns_rr_new_frm_str(&r, "C.ROOT-SERVERS.NET.  360  A     192.33.4.12", 0, NULL, NULL);
ldns_rr_list_push_rr(global_dns_root, r);
-   (void)ldns_rr_new_frm_str(&r, "D.ROOT-SERVERS.NET.  360  A     128.8.10.90", 0, NULL, NULL);
+   (void)ldns_rr_new_frm_str(&r, "D.ROOT-SERVERS.NET.  360  A     199.7.91.13", 0, NULL, NULL);
ldns_rr_list_push_rr(global_dns_root, r);
(void)ldns_rr_new_frm_str(&r, "E.ROOT-SERVERS.NET.  360  A     192.203.230.10", 0, NULL, NULL);
ldns_rr_list_push_rr(global_dns_root, r);