Re: impossible packet length ...
I'm reposting this to hackers, and there is some more info.

> Hi,
> on 2 different servers, running 7.1-stable + zfs, I get this
> error rather frequently:
>
> Feb 5 17:01:03 warhol-00 kernel: impossible packet length (543383918) from nfs server sunfire:/dist
> Feb 5 17:01:03 warhol-00 kernel: impossible packet length (1936028704) from nfs server sunfire:/dist
> Feb 5 17:01:03 warhol-00 kernel: impossible packet length (1869363744) from nfs server sunfire:/dist
> Feb 5 17:01:03 warhol-00 kernel: impossible packet length (1667787057) from nfs server sunfire:/dist
> Feb 5 17:01:03 warhol-00 kernel: impossible packet length (976040755) from nfs server sunfire:/dist
> Feb 5 17:01:03 warhol-00 kernel: impossible packet length (1953459488) from nfs server sunfire:/dist
> Feb 5 17:01:03 warhol-00 kernel: impossible packet length (1348825156) from nfs server sunfire:/dist
> Feb 5 17:01:03 warhol-00 kernel: impossible packet length (0) from nfs server sunfire:/dist
> Feb 5 17:01:03 warhol-00 kernel: impossible packet length (1647208041) from nfs server sunfire:/dist
>
> in this case the server is running FreeBSD-7.0-stable, but I also get it
> when the server is a netapp.
>
> is there a connection?
>
> thanks,
> danny

Going through the logs after it happened again, I got a glimpse of this:

Feb 6 18:00:13 warhol-00.cs.huji.ac.il kernel: bce0: discard frame w/o leading ethernet header (len 0 pkt len 0)
Feb 6 18:00:19 klee-05.cs.huji.ac.il kernel: nfs: server warhol-00 not responding, timed out
...
Feb 6 19:00:00 warhol-00.cs.huji.ac.il amd[715]: More than a single value for /defaults in hesiod.local
Feb 6 19:00:00 warhol-00.cs.huji.ac.il amd[715]: Unknown $ sequence in "rhost:=${RHOST};type:=nfsl;fs:=${FS};rfs:=$huldig#^ZM-^KoM- abase"
Feb 6 19:00:00 warhol-00.cs.huji.ac.il kernel: impossible packet length (2068989523) from nfs server sunfire:/dist

which seems to point fingers at bce...
danny ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
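A pattern in the logged lengths is worth checking. NFS over TCP frames each RPC with a 4-byte big-endian record mark (ONC RPC record marking), and the sketch below assumes the kernel message prints that field as it was read from the stream. Under that assumption, turning each logged value back into its four bytes shows whether the client was reading payload data where it expected a length. `LOGGED_LENGTHS`, `as_wire_bytes` and `looks_like_text` are names made up for this illustration.

```python
import struct

# Values copied from the "impossible packet length" log lines in the report.
LOGGED_LENGTHS = [
    543383918, 1936028704, 1869363744, 1667787057, 976040755,
    1953459488, 1348825156, 0, 1647208041, 2068989523,
]


def as_wire_bytes(length):
    """Return the four bytes that read back as `length` in big-endian order."""
    return struct.pack(">I", length)


def looks_like_text(raw):
    """True when every byte is a printable ASCII character."""
    return all(0x20 <= b < 0x7F for b in raw)


for value in LOGGED_LENGTHS:
    raw = as_wire_bytes(value)
    print(f"{value:>10}  0x{raw.hex()}  {raw!r}  text={looks_like_text(raw)}")
```

The first value comes back as `b' can'` and the last as `b'{RFS'`, which resembles the `${RFS}` variable in the amd map line logged in the same second. That is consistent with a stream that lost its framing after corruption, though it does not by itself say where the corruption happened.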
Re: impossible packet length ...
> On 2009-Feb-08 10:45:13 +0200, Danny Braniss wrote:
> >Feb 6 18:00:13 warhol-00.cs.huji.ac.il kernel: bce0: discard frame w/o leading ethernet header (len 0 pkt len 0)
> ...
> >Feb 6 19:00:00 warhol-00.cs.huji.ac.il amd[715]: Unknown $ sequence in "rhost:=${RHOST};type:=nfsl;fs:=${FS};rfs:=$huldig#^ZM-^KoM- abase"
> >Feb 6 19:00:00 warhol-00.cs.huji.ac.il kernel: impossible packet length (2068989523) from nfs server sunfire:/dist
> >
> >which seems to point fingers at bce...
>
> It does rather suggest that bce is not behaving. What happens if you
> turn off checksum off-loading? This should make the kernel drop the
> corrupt packets instead of trying to process them. If practical, you
> could also try (temporarily) plugging in a different NIC.

I have, and now it's a matter of waiting...

Q: with rxcsum on, if a packet with a bad checksum is received, is it
dropped by the NIC? If not, then that somewhat explains the behaviour.

Changing the NIC is tough, but if needed it will be done.

danny

> Peter Jeremy
Re: impossible packet length ...
On 2009-Feb-08 11:31:45 +0200, Danny Braniss wrote:
>Q: with rxcsum on, and a bad checksum packet is received, is it
> dropped by the NIC? if not, then it somewhat explains the behaviour

If checksum offloading is working correctly then a bad packet should be
dropped by the NIC. If checksum offloading isn't working correctly then
you can wind up in the situation where both the NIC and the driver think
the other party has verified the checksum. It's also possible that you
may be running into corruption during DMA transfer from the NIC to RAM.
ISTR there have been some issues reported recently with checksum
offloading on some NICs - though I don't have details to hand - you
might like to search the lists.

>changing the nic is tough, but if needed will be done.

If disabling checksum offloading fixes the problem and the additional
CPU load is acceptable (at least until you find a real fix) then there's
no need to change NICs.

--
Peter Jeremy
Re: impossible packet length ...
On Sun, Feb 08, 2009 at 10:45:13AM +0200, Danny Braniss wrote:
> I'm reposting this to hackers, and there is some more info.
>
> > Hi,
> > on 2 different servers, running 7.1-stable + zfs, I get this
> > error rather frequently:
> >
> > Feb 5 17:01:03 warhol-00 kernel: impossible packet length (543383918) from nfs server sunfire:/dist
> > Feb 5 17:01:03 warhol-00 kernel: impossible packet length (1936028704) from nfs server sunfire:/dist
> > Feb 5 17:01:03 warhol-00 kernel: impossible packet length (1869363744) from nfs server sunfire:/dist
> > Feb 5 17:01:03 warhol-00 kernel: impossible packet length (1667787057) from nfs server sunfire:/dist
> > Feb 5 17:01:03 warhol-00 kernel: impossible packet length (976040755) from nfs server sunfire:/dist
> > Feb 5 17:01:03 warhol-00 kernel: impossible packet length (1953459488) from nfs server sunfire:/dist
> > Feb 5 17:01:03 warhol-00 kernel: impossible packet length (1348825156) from nfs server sunfire:/dist
> > Feb 5 17:01:03 warhol-00 kernel: impossible packet length (0) from nfs server sunfire:/dist
> > Feb 5 17:01:03 warhol-00 kernel: impossible packet length (1647208041) from nfs server sunfire:/dist
> >
> > in this case the server is running Freebsd-7.0-stable, but I also get
> > it when the server is a netapp.
> >
> > is there a connection?
> >
> > thanks,
> > danny
>
> going through the logs, after it happened again, I got a glimps of this:
>
> Feb 6 18:00:13 warhol-00.cs.huji.ac.il kernel: bce0: discard frame w/o leading ethernet header (len 0 pkt len 0)
> Feb 6 18:00:19 klee-05.cs.huji.ac.il kernel: nfs: server warhol-00 not responding, timed out
> ...
> Feb 6 19:00:00 warhol-00.cs.huji.ac.il amd[715]: More than a single value for /defaults in hesiod.local
> Feb 6 19:00:00 warhol-00.cs.huji.ac.il amd[715]: Unknown $ sequence in "rhost:=${RHOST};type:=nfsl;fs:=${FS};rfs:=$huldig#^ZM-^KoM- abase"
> Feb 6 19:00:00 warhol-00.cs.huji.ac.il kernel: impossible packet length (2068989523) from nfs server sunfire:/dist
>
> which seems to point fingers at bce...

bce(4) is broken in stable; your best option is to revert to the driver
in releng 7.1.
Possible VFS KPI and KBI breakage on stable/7
There are three sets of changes that would benefit stable/7:

1. Improvements for the UFS unmount or rw->ro remount, which perform
   suspension during the operation. The changes depend on the suspension
   mechanism patch, which introduced the suspension owner and added a new
   VFS OP to the mount method table. This might also fix the hangs with
   gjournal, or gjournal together with snapshots, experienced by some
   users. Since the only real consumer of the suspension is UFS, I
   believe that the MFC would have quite low impact, if any. The
   corresponding revision is 183073.

2. The openat(2) and similar syscalls. The new ZFS requires openat()
   functionality. We have to change struct nameidata to merge
   NDINIT_ATVP(). All modules using namei() need to be recompiled.

3. Marcus' work on vn_fullpath() support for synthetic filesystems
   introduces a new VOP, vop_vptocnp. This would allow procstat(1) to
   work on devfs and pseudofs vnodes. As I understand it, this would
   also improve the Gnome experience on FreeBSD. All fs modules need to
   be recompiled.

There was one very magisterial voice that objected to KBI breakage on a
stable branch in principle. In my opinion, the benefits of the bug fixes
and functionality improvements from the proposed merges are much greater
than the inconvenience of having to recompile out-of-tree fs modules.

The changes were discussed with re@ to some extent. If there is vocal
objection to the merge, I will abstain from doing it.
Re: impossible packet length ...
On Sun, 8 Feb 2009, Peter Jeremy wrote:

> On 2009-Feb-08 11:31:45 +0200, Danny Braniss wrote:
>> Q: with rxcsum on, and a bad checksum packet is received, is it
>> dropped by the NIC? if not, then it somewhat explains the behaviour
>
> If checksum offloading is working correctly then a bad packet should be
> dropped by the NIC. If checksum offloading isn't working correctly then
> you can wind up in the situation where both the NIC and the driver think
> the other party has verified the checksum. It's also possible that you
> may be running into corruption during DMA transfer from the NIC to RAM.
> ISTR there have been some issues reported recently with checksum
> offloading on some NICs - though I don't have details to hand - you
> might like to search the lists.
>
>> changing the nic is tough, but if needed will be done.
>
> If disabling checksum offloading fixes the problem and the additional
> CPU load is acceptable (at least until you find a real fix) then there's
> no need to change NICs.

Actually, my understanding was that packets with bad checksums are
delivered to software, and a flag in the descriptor ring header for each
packet tells us whether the checksum was (a) checked and (b) validated
by the hardware. We then propagate these to mbuf flags so that higher
stack layers know whether or not to calculate the checksum themselves.
Regardless of the specifics, though, packets with checked but bad
checksums shouldn't make it to the socket layer where they would be
visible to NFS. If the NIC is marking apparently bad packets as good,
there are a number of possible sources -- be it bad checksum handling in
the card, or corruption between the card and higher levels of the stack
(a DMA problem, as you point out, would have this symptom).

Robert N M Watson
Computer Laboratory
University of Cambridge
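The "checked" and "validated" flags described above give the IP layer three possible outcomes per packet. The following is a minimal model of that decision, not the kernel's code: the flag names mirror the mbuf flags quoted later in the thread, but the numeric values and the function `ip_checksum_action` are placeholders invented for this sketch.

```python
# Placeholder bit values for illustration only; they are not taken from
# the FreeBSD headers.
CSUM_IP_CHECKED = 0x1  # hardware looked at the IP header checksum
CSUM_IP_VALID = 0x2    # hardware says the checksum was good


def ip_checksum_action(csum_flags):
    """Decide what the IP layer does with a packet, given its flags."""
    if csum_flags & CSUM_IP_CHECKED:
        # Hardware expressed a view: trust it either way.
        return "accept" if csum_flags & CSUM_IP_VALID else "drop"
    # No view from the hardware: compute the checksum in software.
    return "verify-in-software"


# The failure mode discussed in the thread: a corrupt packet that arrives
# flagged as checked and valid is accepted without any software check.
print(ip_checksum_action(CSUM_IP_CHECKED | CSUM_IP_VALID))
```

In this model, turning offloading off corresponds to always passing `0`, so every packet takes the software path. That is why disabling rxcsum is a useful test for a NIC suspected of mis-flagging packets.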
Re: Big problems with 7.1 locking up :-(
> load. Kip Macy has corrected at least one (both?) problems in head, and
> plans to MFC the fixes in the near future. We'll follow up further once
> the fixes are merged, and if any further problems transpire.

Hi, just wondering if we are any closer to having the MFC for this yet,
or if there are any patches I could test?

cheers,

-pete.
Re: impossible packet length ...
> On Sun, 8 Feb 2009, Peter Jeremy wrote:
>
> > On 2009-Feb-08 11:31:45 +0200, Danny Braniss wrote:
> >> Q: with rxcsum on, and a bad checksum packet is received, is it
> >> dropped by the NIC? if not, then it somewhat explains the behaviour
> >
> > If checksum offloading is working correctly then a bad packet should be
> > dropped by the NIC. If checksum offloading isn't working correctly then
> > you can wind up in the situation where both the NIC and the driver think
> > the other party has verified the checksum. It's also possible that you
> > may be running into corruption during DMA transfer from the NIC to RAM.
> > ISTR there have been some issues reported recently with checksum
> > offloading on some NICs - though I don't have details to hand - you
> > might like to search the lists.
> >
> >> changing the nic is tough, but if needed will be done.
> >
> > If disabling checksum offloading fixes the problem and the additional
> > CPU load is acceptable (at least until you find a real fix) then there's
> > no need to change NICs.
>
> Actually, my understanding was that packets with bad checksums are
> delivered to software, and flag the descriptor ring header for each
> packet tells us whether the checksum was (a) checked and (b) validated
> by the hardware. We then propagate these to mbuf flags so that higher
> stack layers know whether or not to calculate the checksum themselves.
> Regardless of the specifics, though, packets with checked but bad
> checksums shouldn't make it to the socket layer where they would be
> visible to NFS. If the NIC is marking apparently bad packets as good,
> there are a number of possible sources -- be it bad checksum handling in
> the card, corruption between the card and higher levels of the stack
> (a DMA problem, as you point out, would have this symptom).

Looking at the bce source, it's not clear (to me :-). If errors are
detected in bce_rx_intr(), the packet gets dropped, which I would expect
to be the treatment of an offloaded checksum error, but it seems that is
not the case.

danny
Re: impossible packet length ...
On Sun, 8 Feb 2009, Danny Braniss wrote:

> looking at the bce source, it's not clear (to me :-). If errors are
> detected in bce_rx_intr(), the packet gets dropped, which I would expect
> to be the treatment of an offloaded checksum error, but it seems that is
> not the case.

I think we're thinking of different checksums -- devices/device drivers
drop frames with bad ethernet checksums, but not IP and above layer
checksums.

Robert N M Watson
Computer Laboratory
University of Cambridge
Re: impossible packet length ...
On 2009-Feb-08 10:45:13 +0200, Danny Braniss wrote:
>Feb 6 18:00:13 warhol-00.cs.huji.ac.il kernel: bce0: discard frame w/o leading ethernet header (len 0 pkt len 0)
...
>Feb 6 19:00:00 warhol-00.cs.huji.ac.il amd[715]: Unknown $ sequence in "rhost:=${RHOST};type:=nfsl;fs:=${FS};rfs:=$huldig#^ZM-^KoM- abase"
>Feb 6 19:00:00 warhol-00.cs.huji.ac.il kernel: impossible packet length (2068989523) from nfs server sunfire:/dist
>
>which seems to point fingers at bce...

It does rather suggest that bce is not behaving. What happens if you
turn off checksum off-loading? This should make the kernel drop the
corrupt packets instead of trying to process them. If practical, you
could also try (temporarily) plugging in a different NIC.

--
Peter Jeremy
Re: impossible packet length ...
> On Sun, 8 Feb 2009, Danny Braniss wrote:
>
> > looking at the bce source, it's not clear (to me :-). If errors are
> > detected in bce_rx_intr(), the packet gets dropped, which I would
> > expect to be the treatment of an offloaded checksum error, but it
> > seems that is not the case.
>
> I think we're thinking of different checksums -- devices/device drivers
> drop frames with bad ethernet checksums, but not IP and above layer
> checksums.

I know I'm treading on thin ice here - I haven't touched Stevens for a
while (and I doubt it mentions offloading) - but if the offload checksum
is bad, why not just drop the packet?

The way I read the driver, if checksum offload is on, and if no errors
were detected, then the packet is marked as ok.

danny
Re: impossible packet length ...
On Sun, 8 Feb 2009, Danny Braniss wrote:

>> On Sun, 8 Feb 2009, Danny Braniss wrote:
>>
>>> looking at the bce source, it's not clear (to me :-). If errors are
>>> detected in bce_rx_intr(), the packet gets dropped, which I would
>>> expect to be the treatment of an offloaded checksum error, but it
>>> seems that is not the case.
>>
>> I think we're thinking of different checksums -- devices/device drivers
>> drop frames with bad ethernet checksums, but not IP and above layer
>> checksums.
>
> I know I'm stepping on thin ice here - haven't touched Stevens for a
> while, (and I doubt it mentions offloading), but if the offload
> checksum is bad, why not just drop the packet?
>
> The way I read the driver, if the offload checksum is on, and if no
> errors were detected, then it's marked as ok.

There are a few good reasons I can think of, but this is hardly a
comprehensive list:

(1) If there are bad higher level checksums on the wire, you want to see
    them in tcpdump, so allow them to get up to a higher layer if network
    layer checksums aren't good.

(2) It's a matter of local policy as to whether UDP checksums (for
    example) are observed or not.

(3) If you're forwarding or bridging packets, it should be up to the end
    nodes how they deal with bad UDP checksums on packets to them, not
    the routers.

Looking at if_bce.c, the following seems to be reasonable logic; first,
ethernet-layer checksums:

    /* Check the received frame for errors. */
    if (status & (L2_FHDR_ERRORS_BAD_CRC |
        L2_FHDR_ERRORS_PHY_DECODE | L2_FHDR_ERRORS_ALIGNMENT |
        L2_FHDR_ERRORS_TOO_SHORT | L2_FHDR_ERRORS_GIANT_FRAME)) {

        /* Log the error and release the mbuf. */
        ifp->if_ierrors++;
        DBRUN(sc->l2fhdr_status_errors++);

        m_freem(m0);
        m0 = NULL;
        goto bce_rx_int_next_rx;
    }

I.e., if there are ethernet-level CRC failures, drop the packet.

    /* Validate the checksum if offload enabled. */
    if (ifp->if_capenable & IFCAP_RXCSUM) {

        /* Check for an IP datagram. */
        if (!(status & L2_FHDR_STATUS_SPLIT) &&
            (status & L2_FHDR_STATUS_IP_DATAGRAM)) {
            m0->m_pkthdr.csum_flags |= CSUM_IP_CHECKED;

            /* Check if the IP checksum is valid. */
            if ((l2fhdr->l2_fhdr_ip_xsum ^ 0xffff) == 0)
                m0->m_pkthdr.csum_flags |= CSUM_IP_VALID;
        }

        /* Check for a valid TCP/UDP frame. */
        if (status & (L2_FHDR_STATUS_TCP_SEGMENT |
            L2_FHDR_STATUS_UDP_DATAGRAM)) {

            /* Check for a good TCP/UDP checksum. */
            if ((status & (L2_FHDR_ERRORS_TCP_XSUM |
                L2_FHDR_ERRORS_UDP_XSUM)) == 0) {
                m0->m_pkthdr.csum_data =
                    l2fhdr->l2_fhdr_tcp_udp_xsum;
                m0->m_pkthdr.csum_flags |= (CSUM_DATA_VALID
                    | CSUM_PSEUDO_HDR);
            }
        }
    }

Only look at higher level checksums if policy enables it on the
interface; then, only if the hardware has a view on the IP-layer
checksums, propagate that information to the mbuf flags from the
descriptor ring entry flags, both whether or not the checksum was
verified, and whether or not it was good. If policy disables it, or the
hardware expresses no view, we don't set flags, which simply defers
checksumming to a higher layer (if required -- for forwarded packets, we
won't test UDP-layer checksums at all).

Robert N M Watson
Computer Laboratory
University of Cambridge
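When the driver sets no flags, the deferred check is the standard Internet checksum (RFC 1071): a 16-bit ones' complement sum over the header, which comes out as 0xffff when the stored checksum is intact. The sketch below shows that computation in isolation. It is not the kernel's `in_cksum()`; the function names and the sample header bytes are illustrative assumptions.

```python
def ones_complement_sum(data):
    """16-bit ones' complement sum of `data` (RFC 1071), not inverted."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return total


def header_is_valid(header):
    """A header with its checksum field in place must sum to 0xffff."""
    return (ones_complement_sum(header) ^ 0xFFFF) == 0


# Sample 20-byte IPv4 header, used here only for illustration; its
# checksum field holds 0xb861.
SAMPLE = bytes.fromhex("45000073000040004011b861c0a80001c0a800c7")
CORRUPT = bytes([SAMPLE[0] ^ 0x01]) + SAMPLE[1:]  # flip one bit

print(header_is_valid(SAMPLE), header_is_valid(CORRUPT))
```

A single flipped bit changes the sum, so the software path would reject the corrupted copy. This is the protection that is lost if hardware wrongly reports a checksum as already verified.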
Re: Big problems with 7.1 locking up :-(
Hi all,

In this thread someone mentioned a problem with soekris devices.
I personally have one of those new soekris devices with 7.1R installed,
and it is very easy to freeze it.
All I have to do is copy a big file over WIFI (atheros) at a speed
higher than 1-2MB/s.
It takes less than 2 minutes to freeze. I wonder if there is some
improvement in 7.1-stable that I can try, or if I can help by compiling
a debug kernel?
But I'm not sure if this is the same problem, as it may be just the
wireless driver in my case.

On Feb 8, 2009, at 3:11 PM, Pete French wrote:

>> load. Kip Macy has corrected at least one (both?) problems in head, and
>> plans to MFC the fixes in the near future. We'll follow up further once
>> the fixes are merged, and if any further problems transpire.
>
> Hi, just wondering if we are any closer to having the MFC for this yet,
> or if there are any patches I could test ?
>
> cheers,
>
> -pete.

--
Best Wishes,
Stefan Lambrev
ICQ# 24134177
Re: Big problems with 7.1 locking up :-(
On Sun, Feb 08, 2009 at 05:11:02PM +0200, Stefan Lambrev wrote:
> Hi all,
>
> In this thread someone mention a problem with soekris devices.
> I personally have one of those new soekris devices and installed 7.1R
> and it is very easy to freeze it.
> All that I have to do is to copy big file over WIFI (atheros) with
> speed higher than 1-2MB/s.
> It takes less than 2 minutes to freeze. I wonder if there is some
> improvement in 7.1-stable so I can try it or if I can help by compiling
> debug kernel?
> But I'm not sure if this is the same problem as it may be just the
> wireless driver in my case.

On some net4801's without WIFI, I also experience frequent freezes
after anything from a couple of hours up to 2-5 days... so it's probably
not only ath related.

What's your kern.hz value? In my /boot/loader.conf, it is set to 100.
Could you try it too, and see if you can still freeze the box (just to
rule out some weird timing / interrupt issue)?

> Best Wishes,
> Stefan Lambrev
> ICQ# 24134177

Regards,
-cpghost.

--
Cordula's Web. http://www.cordula.ws/
Re: Big problems with 7.1 locking up :-(
At 10:11 AM 2/8/2009, Stefan Lambrev wrote:
>Hi all,
>
>In this thread someone mention a problem with soekris devices.
>I personally have one of those new soekris devices and installed 7.1R
>and it is very easy to freeze it.
>All that I have to do is to copy big file over WIFI (atheros) with
>speed higher than 1-2MB/s.

Try copying across the ethernet. I have several RELENG_7 boxes
deployed on soekris and Alix boards (pretty much the same chipset) and
have not seen any stability issues.

---Mike
Re: Broken loader on 7.1-STABLE?
Mark Kirkwood wrote:
> ...specifying /boot/loader.old got me booted ok (not sure why this
> *didn't* work with the Asus, maybe I need to try it again with the Feb
> sources).

I tried the latest RELENG_7 sources, same result - does *not* boot even
specifying the old loader. I spent a bit of time narrowing down why. I'd
previously noted that an empty loader.conf was sufficient to get it to
boot again. After some experimentation I discovered that this line in
loader.conf:

    sound_load="YES"

made the boot with the old loader fail (loading the sound module after
booting seems to work ok). The box is an Asus a8vx with amd64 x2 3800+,
running i386.

regards

Mark
Re: Unhappy Xorg upgrade
Bruce M. Simpson wrote:
> S.N.Grigoriev wrote:
>> I thank you for your response. I've applied the patch to pci.c from
>> kern/130957. Unfortunately there are no positive results. USB is still
>> unreachable with X.
>
> Just following up to confirm that you are seeing exactly the same
> symptoms with USB and Xorg 7.4 as I see on my amd64 desktop running
> 7-STABLE from 00:00 UTC on this Wednesday.

I still see the USB symptoms with xorg-server port as of today -- forced
rebuild with libpciaccess also. So amd64 is still regressed -- USB is
totally unusable there after X is started. My theory was that somehow
Xorg was stomping on the USB controller registers on this machine. The
USB controller on this box is ALi, card=0x81561043.

My i386 laptop (IBM/Lenovo T43) is not affected, and USB mice work just
fine there.

Obviously it's difficult to check what Xorg is actually doing to the
registers on the box w/o a PCI bus analyzer, and of course due to normal
decoding, those cycles probably won't be seen on the backplane itself as
it sits behind a bridge; I haven't fully read what libpciaccess is doing.

I skimmed patch-src-freebsd_pci.c. I wonder if this code may be stomping
on the USB controller in some way (i.e. how it frobs the BARs).

According to src/tools/tools/pciroms, the only PCI devices on this box
with ROM BARs are mskc0 and vgapci0.

(I also wonder if it's possible to guarantee that the window at 0xC
is always going to be available, even in the amd64 case.)

cheers
BMS
Re: impossible packet length ...
On Feb 8, 2009, at 3:31 AM, Danny Braniss wrote:

>> On 2009-Feb-08 10:45:13 +0200, Danny Braniss wrote:
>>> Feb 6 18:00:13 warhol-00.cs.huji.ac.il kernel: bce0: discard frame w/o leading ethernet header (len 0 pkt len 0)
>> ...
>>> Feb 6 19:00:00 warhol-00.cs.huji.ac.il amd[715]: Unknown $ sequence in "rhost:=${RHOST};type:=nfsl;fs:=${FS};rfs:=$huldig#^ZM-^KoM- abase"
>>> Feb 6 19:00:00 warhol-00.cs.huji.ac.il kernel: impossible packet length (2068989523) from nfs server sunfire:/dist
>>>
>>> which seems to point fingers at bce...
>>
>> It does rather suggest that bce is not behaving. What happens if you
>> turn off checksum off-loading? This should make the kernel drop the
>> corrupt packets instead of trying to process them. If practical, you
>> could also try (temporarily) plugging in a different NIC.
>
> I have, and now it's a matter of waiting...
>
> Q: with rxcsum on, and a bad checksum packet is received, is it
> dropped by the NIC? if not, then it somewhat explains the behaviour
>
> changing the nic is tough, but if needed will be done.
>
> danny
>
>> Peter Jeremy

We were hitting this quite a bit (also bce). We updated to a recent
7-branch and it seems to be behaving better for now: it has been running
for 12 days so far, which is better than what we had been seeing.

Eric
sysctl lock in RELENG_6
Hi!

I have a RELENG_6 system controlling a local PBX through an RS-232 port,
sio(4). It also runs syslogd, cron, sshd, bsnmpd and sendmail for
outgoing reports.

It locks up very often: it answers pings, but the PBX controlling
software stops responding, and local and remote login attempts hang
because the 'login' process is stuck in the 'sysctl lock' state. Local
consoles do switch with 'Alt-Fn' and DDB works. It shows that sendmail
is in the 'sysctl lock' state too.

This is a NanoBSD installation running from IDE flash. It's swapless,
but I think I could manage to obtain a crashdump if there is interest
in one.

I dug through the commit logs a bit and found this change, which was
MFC'd to RELENG_7 but not RELENG_6:

http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/kern/kern_sysctl.c#rev1.177.6.2

It seems RELENG_6 needs this too, doesn't it? I'm going to merge the
change to RELENG_6 and give it a try.

Eugene Grosbein
Re: Unhappy Xorg upgrade
On Mon, 2009-02-09 at 02:08 +, Bruce M. Simpson wrote:
> Bruce M. Simpson wrote:
> > S.N.Grigoriev wrote:
> >> I thank you for your response. I've applied the patch to pci.c from
> >> kern/130957. Unfortunately there are no positive results. USB is still
> >> unreachable with X.
> >
> > Just following up to confirm that you are seeing exactly the same
> > symptoms with USB and Xorg 7.4 as I see on my amd64 desktop running
> > 7-STABLE from 00:00 UTC on this Wednesday.
>
> I still see the USB symptoms with xorg-server port as of today -- forced
> rebuild with libpciaccess also. So amd64 is still regressed -- USB is
> totally unusable there after X is started. My theory was that somehow
> Xorg was stomping on the USB controller registers on this machine. The
> USB controller on this box is ALi, card=0x81561043.
>
> My i386 laptop (IBM/Lenovo T43) is not affected, and USB mice work just
> fine there.
>
> Obviously it's difficult to check what Xorg is actually doing to the
> registers on the box w/o a PCI bus analyzer, and of course due to normal
> decoding, those cycles probably won't be seen on the backplane itself as
> it sits behind a bridge; I haven't fully read what libpciaccess is doing.
>
> I skimmed patch-src-freebsd_pci.c. I wonder if this code may be stomping
> on the USB controller in some way (i.e. how it frobs the BARs).

Until last night, it only probed pci resources for pci class DISPLAY,
subclass VGA. The rom reading was restricted to 0xc/0x1, which it
mmapped and copied out to a userland buffer.

As of last night, I committed the code that actually checks for a pci
rom. If it finds one, it uses those values (base address, length) to
mmap the bios for the copy. If it doesn't find a pci rom (as with most
IGDs: intel, via, sis), it just uses the 0xc mapping as it did before,
provided the platform is i386 or amd64. Otherwise, bios reading just
fails.

robert.

> According to src/tools/tools/pciroms, the only PCI devices on this box
> with ROM BARs are mskc0 and vgapci0.
>
> (I also wonder if it's possible to guarantee that the window at 0xC
> is always going to be available, even in the amd64 case.)
>
> cheers
> BMS

--
Robert Noland
FreeBSD
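The ROM lookup order described in this reply can be written as a three-way decision. This is a sketch of the described behaviour only, not the libpciaccess patch itself. The archive has truncated the addresses in the message, so `LEGACY_ROM_WINDOW` uses the conventional legacy video BIOS window as an assumed stand-in, and `choose_rom_mapping` is a hypothetical name.

```python
# Assumed (base, length) of the legacy video BIOS window; the message's own
# numbers are truncated in the archive, so treat these as placeholders.
LEGACY_ROM_WINDOW = (0xC0000, 0x10000)


def choose_rom_mapping(pci_rom, arch):
    """Pick the (base, length) to mmap for the BIOS copy, or None.

    pci_rom is the (base, length) reported for the device's PCI ROM,
    or None when the device has no PCI ROM.
    """
    if pci_rom is not None:
        return pci_rom            # a PCI ROM was found: use its values
    if arch in ("i386", "amd64"):
        return LEGACY_ROM_WINDOW  # fall back to the legacy window
    return None                   # elsewhere, BIOS reading just fails


print(choose_rom_mapping(None, "amd64"))
```

Only the middle branch touches a fixed physical range regardless of which device is being probed, which is the case the question about the legacy window on amd64 is concerned with.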
Re: Unhappy Xorg upgrade
On Mon, 2009-02-09 at 02:08 +, Bruce M. Simpson wrote:
> Bruce M. Simpson wrote:
> > S.N.Grigoriev wrote:
> >> I thank you for your response. I've applied the patch to pci.c from
> >> kern/130957. Unfortunately there are no positive results. USB is still
> >> unreachable with X.
> >
> > Just following up to confirm that you are seeing exactly the same
> > symptoms with USB and Xorg 7.4 as I see on my amd64 desktop running
> > 7-STABLE from 00:00 UTC on this Wednesday.
>
> I still see the USB symptoms with xorg-server port as of today -- forced
> rebuild with libpciaccess also. So amd64 is still regressed -- USB is
> totally unusable there after X is started. My theory was that somehow
> Xorg was stomping on the USB controller registers on this machine. The
> USB controller on this box is ALi, card=0x81561043.

Is your usb sharing interrupts with the video card? Does the issue occur
if you aren't using a usb mouse?

robert.

> My i386 laptop (IBM/Lenovo T43) is not affected, and USB mice work just
> fine there.
>
> Obviously it's difficult to check what Xorg is actually doing to the
> registers on the box w/o a PCI bus analyzer, and of course due to normal
> decoding, those cycles probably won't be seen on the backplane itself as
> it sits behind a bridge; I haven't fully read what libpciaccess is doing.
>
> I skimmed patch-src-freebsd_pci.c. I wonder if this code may be stomping
> on the USB controller in some way (i.e. how it frobs the BARs).
>
> According to src/tools/tools/pciroms, the only PCI devices on this box
> with ROM BARs are mskc0 and vgapci0.
>
> (I also wonder if it's possible to guarantee that the window at 0xC
> is always going to be available, even in the amd64 case.)
>
> cheers
> BMS

--
Robert Noland
FreeBSD
7.1 Panic on degraded disk w/mpt
Howdy,

I dug around and can't find a PR on this, and the only other report I
saw was in this mailing list post that has no replies:

http://www.nabble.com/7.1-BETA2-panic-on-mpt-degrade-td20183173.html

The hardware is a Dell PowerEdge 860 with the Dell/LSI SAS5 controller:

mpt0: port 0xec00-0xecff mem 0xfe9fc000-0xfe9f,0xfe9e-0xfe9e irq 16 at device 8.0 on pci2
mpt0: MPI Version=1.5.13.0

The panic is repeatable by forcing the array into a degraded state.

Here's my best shot at getting info out of kgdb:

[r...@uniweb /home/spork]# cd /usr/obj/usr/src/sys/BWAY7/
[r...@uniweb /usr/obj/usr/src/sys/BWAY7]# kgdb kernel.debug /var/crash/vmcore.0
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd"...

Unread portion of the kernel message buffer:

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x14
fault code              = supervisor read, page not present
instruction pointer     = 0x20:0xc044b09b
stack pointer           = 0x28:0xe6ee5b80
frame pointer           = 0x28:0xe6ee5b9c
code segment            = base 0x0, limit 0xf, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 17 (swi2: cambio)
trap number             = 12
panic: page fault
cpuid = 0
Uptime: 3m7s
Physical memory: 3575 MB
Dumping 94 MB: 79 63 47 31 15

Reading symbols from /boot/kernel/acpi.ko...Reading symbols from /boot/kernel/acpi.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/acpi.ko
#0  doadump () at pcpu.h:196
196             __asm __volatile("movl %%fs:0,%0" : "=r" (td));
(kgdb) list *0xc044b09b
0xc044b09b is in xpt_done (/usr/src/sys/cam/cam_xpt.c:4832).
4827            if ((done_ccb->ccb_h.func_code & XPT_FC_QUEUED) != 0) {
4828                    /*
4829                     * Queue up the request for handling by our SWI handler
4830                     * any of the "non-immediate" type of ccbs.
4831                     */
4832                    sim = done_ccb->ccb_h.path->bus->sim;
4833                    switch (done_ccb->ccb_h.path->periph->type) {
4834                    case CAM_PERIPH_BIO:
4835                            TAILQ_INSERT_TAIL(&sim->sim_doneq, &done_ccb->ccb_h,
4836                                              sim_links.tqe);
(kgdb) backtrace
#0  doadump () at pcpu.h:196
#1  0xc061d0f7 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:418
#2  0xc061d3c9 in panic (fmt=Variable "fmt" is not available.
) at /usr/src/sys/kern/kern_shutdown.c:574
#3  0xc0865fcc in trap_fatal (frame=0xe6ee5b40, eva=20) at /usr/src/sys/i386/i386/trap.c:939
#4  0xc0866230 in trap_pfault (frame=0xe6ee5b40, usermode=0, eva=20) at /usr/src/sys/i386/i386/trap.c:852
#5  0xc0866bc2 in trap (frame=0xe6ee5b40) at /usr/src/sys/i386/i386/trap.c:530
#6  0xc084d45b in calltrap () at /usr/src/sys/i386/i386/exception.s:159
#7  0xc044b09b in xpt_done (done_ccb=0xc6bf5000) at /usr/src/sys/cam/cam_xpt.c:4832
#8  0xc044eee9 in xpt_scan_bus (periph=0xc6984b00, request_ccb=0xc6bf5000) at /usr/src/sys/cam/cam_xpt.c:5395
#9  0xc044d241 in camisr_runqueue (V_queue=Variable "V_queue" is not available.
) at /usr/src/sys/cam/cam_xpt.c:7316
#10 0xc044d39e in camisr (dummy=0x0) at /usr/src/sys/cam/cam_xpt.c:7216
#11 0xc05fb41b in ithread_loop (arg=0xc699d770) at /usr/src/sys/kern/kern_intr.c:1088
#12 0xc05f7f69 in fork_exit (callout=0xc05fb260 , arg=0xc699d770, frame=0xe6ee5d38) at /usr/src/sys/kern/kern_fork.c:810
#13 0xc084d4d0 in fork_trampoline () at /usr/src/sys/i386/i386/exception.s:264

I can supply dmesg, more info, make it crash more, etc. I suspect it
will panic again when the rebuild completes, I'll capture that one as
well.

Please let me know how to proceed - I can open a PR if this is truly a
bug, or bring it over to freebsd-scsi if more appropriate.
Thanks,

Charles

___
Charles Sprickman
NetEng/SysAdmin
Bway.net - New York's Best Internet - www.bway.net
sp...@bway.net - 212.655.9344
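A note on reading the trap in the report above: a fault virtual address as small as 0x14 usually means a structure member was read through a NULL pointer, so the faulting address is simply the member's offset. Line 4832 chases done_ccb->ccb_h.path->bus->sim, so one of those intermediate pointers may have been NULL, but the dump alone does not say which. The C sketch below only illustrates the arithmetic. The struct layout, the names model_bus, member_address and looks_like_null_deref, and the 4096-byte page size are all assumptions made up for the illustration; this is not the real CAM structure layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * HYPOTHETICAL layout, not the real CAM bus structure: five 32-bit
 * fields followed by a 32-bit slot that models an i386 pointer member.
 */
struct model_bus {
    uint32_t links[2];
    uint32_t path_id;
    uint32_t refcount;
    uint32_t generation;
    uint32_t sim;           /* stands in for a 32-bit "sim" pointer */
};

/* In this made-up layout the member happens to sit at offset 0x14. */
static_assert(offsetof(struct model_bus, sim) == 0x14,
    "model layout places sim at offset 0x14");

/* Address the CPU touches when reading a member through `base`. */
static uintptr_t
member_address(uintptr_t base, size_t member_offset)
{
    return base + member_offset;
}

/*
 * Heuristic only: a fault address inside the first page (assumed to be
 * 4096 bytes) suggests "NULL pointer plus small member offset".
 */
static int
looks_like_null_deref(uintptr_t fault_addr)
{
    return fault_addr < 4096;
}
```

With a NULL base, member_address(0, offsetof(struct model_bus, sim)) gives 0x14, the same shape as the "fault virtual address" line in the trap. Printing the intermediate pointers in kgdb at frame 7 would be the way to confirm which one was actually NULL.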
Re: impossible packet length ...
> On Sun, 8 Feb 2009, Danny Braniss wrote:
>
> >> On Sun, 8 Feb 2009, Danny Braniss wrote:
> >>
> >>> looking at the bce source, it's not clear (to me :-). If errors are
> >>> detected in bce_rx_intr(), the packet gets dropped, which I would expect
> >>> to be the treatment of an offloaded checksum error, but it seems that is
> >>> not the case.
> >>
> >> I think we're thinking of different checksums -- devices/device drivers
> >> drop frames with bad ethernet checksums, but not IP and above layer
> >> checksums.
> >
> > I know I'm stepping on thin ice here - haven't touched Stevens for a while,
> > (and I doubt it mentions offloading), but if the offload checksum is bad,
> > why not just drop the packet?
> >
> > The way I read the driver, if the offload checksum is on, and if no errors
> > were detected, then it's marked as ok.
>
> There are a few good reasons I can think of, but this is hardly a
> comprehensive list:
>
> (1) If there are bad higher level checksums on the wire, you want to see them
>     in tcpdump, so allow them to get up to a higher layer if network layer
>     checksums aren't good.
>
> (2) It's a matter of local policy as to whether UDP checksums (for example)
>     are observed or not.
>
> (3) If you're forwarding or bridging packets, it should be up to the end nodes
>     how they deal with bad UDP checksums on packets to them, not the routers.

ok, I can understand the logic.

> Looking at if_bce.c, the following seems to be reasonable logic; first,
> ethernet-layer checksums:
>
> 5902         /* Check the received frame for errors. */
> 5903         if (status & (L2_FHDR_ERRORS_BAD_CRC |
> 5904             L2_FHDR_ERRORS_PHY_DECODE | L2_FHDR_ERRORS_ALIGNMENT |
> 5905             L2_FHDR_ERRORS_TOO_SHORT | L2_FHDR_ERRORS_GIANT_FRAME)) {
> 5906
> 5907                 /* Log the error and release the mbuf. */
> 5908                 ifp->if_ierrors++;
> 5909                 DBRUN(sc->l2fhdr_status_errors++);
> 5910
> 5911                 m_freem(m0);
> 5912                 m0 = NULL;
> 5913                 goto bce_rx_int_next_rx;
> 5914         }
>
> I.e., if there are ethernet-level CRC failures, drop the packet.
>
> 5922         /* Validate the checksum if offload enabled. */
> 5923         if (ifp->if_capenable & IFCAP_RXCSUM) {
> 5924
> 5925                 /* Check for an IP datagram. */
> 5926                 if (!(status & L2_FHDR_STATUS_SPLIT) &&
> 5927                     (status & L2_FHDR_STATUS_IP_DATAGRAM)) {
> 5928                         m0->m_pkthdr.csum_flags |= CSUM_IP_CHECKED;
> 5929
> 5930                         /* Check if the IP checksum is valid. */
> 5931                         if ((l2fhdr->l2_fhdr_ip_xsum ^ 0x) == 0)
> 5932                                 m0->m_pkthdr.csum_flags |= CSUM_IP_VALID;
> 5933                 }
> 5934
> 5935                 /* Check for a valid TCP/UDP frame. */
> 5936                 if (status & (L2_FHDR_STATUS_TCP_SEGMENT |
> 5937                     L2_FHDR_STATUS_UDP_DATAGRAM)) {
> 5938
> 5939                         /* Check for a good TCP/UDP checksum. */
> 5940                         if ((status & (L2_FHDR_ERRORS_TCP_XSUM |
> 5941                             L2_FHDR_ERRORS_UDP_XSUM)) == 0) {
> 5942                                 m0->m_pkthdr.csum_data =
> 5943                                     l2fhdr->l2_fhdr_tcp_udp_xsum;
> 5944                                 m0->m_pkthdr.csum_flags |= (CSUM_DATA_VALID
> 5945                                     | CSUM_PSEUDO_HDR);
> 5946                         }
> 5947                 }
> 5948         }
>
> Only look at higher level checksums if policy enables it on the interface;
> then, only if the hardware has a view on the IP-layer checksums, propagate
> that information to the mbuf flags from the descriptor ring entry flags, both
> whether or not the checksum was verified, and whether or not it was good. If
> policy disables it, or the hardware expresses no view, we don't set flags,
> which simply defers checksumming to a higher layer (if required -- for
> forwarded packets, we won't test UDP-layer checksums at all).

I missed line 5928, and as usual, your explanation is most educational!
The comment on line 5939 is a bit misleading: the way I read the code,
it does not check for a good checksum.
Cheers,
	danny
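To make the "defers checksumming to a higher layer" point from the quoted explanation concrete, here is a small user-space C sketch of the receiving side. It is a model under stated assumptions, not the kernel code. The SK_* flag values, struct sk_pkthdr, and the function names are hypothetical stand-ins for the real mbuf fields. The software path assumes the buffer already holds everything the checksum covers (pseudo-header included). The "XOR with 0xffff and compare to zero" test is how I recall the TCP input path treating csum_data, so treat that as an assumption too.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* HYPOTHETICAL flag values; the real CSUM_* constants live in sys/mbuf.h. */
#define SK_CSUM_DATA_VALID  0x1u
#define SK_CSUM_PSEUDO_HDR  0x2u

/* Minimal stand-in for the checksum fields of an mbuf packet header. */
struct sk_pkthdr {
    uint32_t csum_flags;    /* what the driver claims to have verified */
    uint16_t csum_data;     /* checksum result reported by the hardware */
};

/* One's-complement sum over a buffer, folded to 16 bits (RFC 1071 style). */
static uint16_t
sk_ones_sum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;

    assert(data != NULL || len == 0);
    while (len > 1) {
        sum += ((uint32_t)data[0] << 8) | data[1];
        data += 2;
        len -= 2;
    }
    if (len == 1)
        sum += (uint32_t)data[0] << 8;  /* pad the odd trailing byte */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Returns 1 if the segment should be accepted, 0 if it should be dropped. */
static int
sk_checksum_ok(const struct sk_pkthdr *ph, const uint8_t *seg, size_t len)
{
    const uint32_t hw = SK_CSUM_DATA_VALID | SK_CSUM_PSEUDO_HDR;
    uint16_t sum;

    if ((ph->csum_flags & hw) == hw)
        sum = ph->csum_data;            /* use the hardware's result */
    else
        sum = sk_ones_sum(seg, len);    /* no flags set: verify in software */

    /* A correct packet sums to 0xffff once its checksum field is included. */
    return (sum ^ 0xffff) == 0;
}
```

In this model the driver never drops a packet for a bad TCP/UDP checksum. It either attaches a result for the stack to test or leaves the flags clear, and the decision to drop is made only by whichever layer actually cares about that checksum.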