ext4: recursive oops doing online resize on 3.8.3, flexbg-related

2013-03-19 Thread Nix
Yes, it's another issue of 'Nix uses experimental options long before they are fully baked and causes trouble after years in which they seemed to work perfectly fine'. So I tried to double the size of one of my ext4 filesystems just now, using x86-64 Linux 3.8.3. Like almost all

Re: Areca hardware RAID / first-ever SCSI bus reset: am I about to lose this disk controller?

2012-10-01 Thread Nix
On 1 Oct 2012, Pierre Beck stated: > On 23.09.2012 17:42, Nix wrote: >> On 19 Sep 2012, Chris Murphy outgrape: >> >>> On Sep 19, 2012, at 12:52 PM, Nix wrote: >>> >>>> So I have this x86-64 server running Linux 3.5.1 with a SATA-on-PCIe >>>

Re: udev breakages -

2012-10-03 Thread Nix
On 3 Oct 2012, Al Viro spake thusly: > Looks sane. TBH, I'd still prefer to see udev forcibly taken over and put > into > usr/udev in kernel tree - I don't trust that crowd at all and the fewer > critical userland bits they can play leverage games with, the safer we are. > > Al, that -><- clos

Re: udev breakages -

2012-10-04 Thread Nix
[Kay removed because I don't like emailing arguable flamebait directly to the person flamed.] On 4 Oct 2012, n...@esperi.org.uk stated: > By udev 175 I, and a lot of other people, had simply stopped upgrading > udev entirely on the grounds that we could no longer tolerate the > uncertainty over w

Re: Heads-up: 3.6.2 / 3.6.3 NFS server panic: 3.6.2+ regression?

2012-10-23 Thread Nix
On 23 Oct 2012, J. Bruce Fields uttered the following: > On Mon, Oct 22, 2012 at 05:17:04PM +0100, Nix wrote: >> I just had a panic/oops on upgrading from 3.6.1 to 3.6.3, after weeks of >> smooth operation on 3.6.1: one of the NFS changes that went into one of >> the two

Re: Heads-up: 3.6.2 / 3.6.3 NFS server oops: 3.6.2+ regression? (also an unrelated ext4 data loss bug)

2012-10-23 Thread Nix
On 23 Oct 2012, J. Bruce Fields uttered the following: > nfs-utils shouldn't be capable of oopsing the kernel, so from my > (selfish) point of view I'd actually rather you stick with whatever you > have and try to reproduce the oops. Reproduced in 3.6.3, not in 3.6.1, not tried 3.6.2. Capturing it

Re: Heads-up: 3.6.2 / 3.6.3 NFS server oops: 3.6.2+ regression? (also an unrelated ext4 data loss bug)

2012-10-23 Thread Nix
On 23 Oct 2012, Trond Myklebust spake thusly: > On Tue, 2012-10-23 at 12:46 -0400, J. Bruce Fields wrote: >> Looks like there's some confusion about whether nsm_client_get() returns >> NULL or an error? > > nsm_client_get() looks extremely racy in the case where ln->nsm_users == > 0. Since we neve

Re: Heads-up: 3.6.2 / 3.6.3 NFS server oops: 3.6.2+ regression? (also an unrelated ext4 data loss bug)

2012-10-23 Thread Nix
On 23 Oct 2012, n...@esperi.org.uk uttered the following: > On 23 Oct 2012, Trond Myklebust spake thusly: >> On Tue, 2012-10-23 at 12:46 -0400, J. Bruce Fields wrote: >>> Looks like there's some confusion about whether nsm_client_get() returns >>> NULL or an error? >> >> nsm_client_get() looks ext

Re: Heads-up: 3.6.2 / 3.6.3 NFS server oops: 3.6.2+ regression? (also an unrelated ext4 data loss bug)

2012-10-23 Thread Nix
On 23 Oct 2012, Trond Myklebust outgrape: > On Tue, 2012-10-23 at 13:57 -0400, Trond Myklebust wrote: >> On Tue, 2012-10-23 at 17:44 +, Myklebust, Trond wrote: >> > You can't hold a spinlock while sleeping. Both mutex_lock() and >> > nsm_create() can definitely sleep. >> > >> > The correct w

Apparent serious progressive ext4 data corruption bug in 3.6.3 (and other stable branches?)

2012-10-23 Thread Nix
[Bruce, Trond, I fear it may be hard for me to continue chasing this NFS lockd crash as long as ext4 on 3.6.3 is hosing my filesystems like this. Apologies.] On 23 Oct 2012, n...@esperi.org.uk uttered the following: > Reproduced in 3.6.3, not in 3.6.1, not tried 3.6.2. Capturing it was > rendere

Re: Apparent serious progressive ext4 data corruption bug in 3.6.3 (and other stable branches?)

2012-10-23 Thread Nix
On 23 Oct 2012, Theodore Ts'o said: > The reason why the problem happens rarely is that the effect of the > buggy commit is that if the journal's starting block is zero, we fail > to truncate the journal when we unmount the file system. Oh dear oh dear. >

Re: Apparent serious progressive ext4 data corruption bug in 3.6.3 (and other stable branches?)

2012-10-23 Thread Nix
On 23 Oct 2012, Theodore Ts'o verbalised: > *Sigh*. My apologies for not catching this when I reviewed this > patch. I believe the following patch should fix the bug; once it's > reviewed by other ext4 developers, I'll push this to Linus ASAP. I note that the patch is in the latest stable releas

Re: Apparent serious progressive ext4 data corruption bug in 3.6.3 (and other stable branches?)

2012-10-23 Thread Nix
On 24 Oct 2012, Theodore Ts'o told this: > hurt, but we do want to make 100% sure that it really fixes the > problem. Well, yes, that would be nice. I can certainly try to verify that it stops my filesystems getting corrupted. (And if so, I owe you a $BEVERAGE. Though I suspect I owe you about th

Re: Apparent serious progressive ext4 data corruption bug in 3.6.3 (and other stable branches?)

2012-10-23 Thread Nix
On 24 Oct 2012, Eric Sandeen uttered the following: > On 10/23/12 3:57 PM, Nix wrote: >> The only unusual thing about the filesystems on this machine are that >> they have hardware RAID-5 (using the Areca driver), so I'm mounting with >> 'nobarrier':

Re: Apparent serious progressive ext4 data corruption bug in 3.6.3 (and other stable branches?)

2012-10-24 Thread Nix
On 24 Oct 2012, Theodore Ts'o stated: > Journal flushes outside of an unmount does > happen as part of online resizing, the FIBMAP ioctl, or when the file > system is frozen. But it didn't sound like Toralf or Nix was using > any of those feat

Re: Apparent serious progressive ext4 data corruption bug in 3.6.3 (and other stable branches?)

2012-10-24 Thread Nix
On 24 Oct 2012, Hugh Dickins verbalised: > On Wed, 24 Oct 2012, Theodore Ts'o wrote: >> Journal flushes outside of an unmount does >> happen as part of online resizing, the FIBMAP ioctl, or when the file >> system is frozen. But it didn't sound like Toralf o

Re: Apparent serious progressive ext4 data corruption bug in 3.6.3 (and other stable branches?)

2012-10-24 Thread Nix
On 24 Oct 2012, Theodore Ts'o spake thusly: > Toralf, Nix, if you could try applying this patch (at the end of this > message), and let me know how and when the WARN_ON triggers, and if it > does, please send the empty_bug_workaround plus the WARN_ON(1) report. > I know about the

Re: Apparent serious progressive ext4 data corruption bug in 3.6.3 (and other stable branches?)

2012-10-24 Thread Nix
On 24 Oct 2012, n...@esperi.org.uk uttered the following: > So, the net effect of this is that normally I get no journal recovery on > anything at all -- but sometimes, if umounting takes longer than a few > seconds, I reboot with not everything unmounted, and journal recovery > kicks in on reboot.

Re: Apparent serious progressive ext4 data corruption bug in 3.6.3 (and other stable branches?)

2012-10-24 Thread Nix
On 24 Oct 2012, Eric Sandeen uttered the following: > On 10/24/2012 02:49 PM, Nix wrote: >> On 24 Oct 2012, Theodore Ts'o spake thusly: >>> Toralf, Nix, if you could try applying this patch (at the end of this >>> message), and let me know how and when the WARN

Re: Apparent serious progressive ext4 data corruption bug in 3.6.3 (and other stable branches?)

2012-10-24 Thread Nix
On 24 Oct 2012, n...@esperi.org.uk spake thusly: > So, the net effect of this is that normally I get no journal recovery on > anything at all -- but sometimes, if umounting takes longer than a few > seconds, I reboot with not everything unmounted, and journal recovery > kicks in on reboot. It occu

Re: Apparent serious progressive ext4 data corruption bug in 3.6 (when rebooting during umount)

2012-10-24 Thread Nix
On 24 Oct 2012, Theodore Ts'o verbalised: > On Wed, Oct 24, 2012 at 09:45:47PM +0100, Nix wrote: >> >> It occurs to me that it is possible that this bug hits only those >> filesystems for which a umount has started but been unable to complete. >> If so, this is a

Re: Apparent serious progressive ext4 data corruption bug in 3.6 (when rebooting during umount)

2012-10-24 Thread Nix
On 25 Oct 2012, n...@esperi.org.uk said: > Even though my own system relies on the possibility of rebooting during > umount to reboot reliably, I'd be inclined to say 'not a bug, don't do > that then' -- except that this renders it unreliable to use umount -l to > unmount all the filesystems you ca

Re: Apparent serious progressive ext4 data corruption bug in 3.6.3 (and other stable branches?)

2012-10-24 Thread Nix
On 24 Oct 2012, Theodore Ts'o uttered the following: > (Keep in mind this is why commercial software corporations like > Microsoft or Apple generally don't make discussions as they are trying > to root cause a problem public; sometimes the initial theories can be > incorrect, and it's unfortunate w

Re: Apparent serious progressive ext4 data corruption bug in 3.6 (when rebooting during umount)

2012-10-24 Thread Nix
On 25 Oct 2012, Theodore Ts'o stated: > On Thu, Oct 25, 2012 at 12:27:02AM +0100, Nix wrote: >> >> - /sbin/reboot -f of running system >>-> Journal replay, no problems other than the expected free block >> count problems. This is not such a severe

Re: Apparent serious progressive ext4 data corruption bug in 3.6 (when rebooting during umount)

2012-10-25 Thread Nix
On 25 Oct 2012, Theodore Ts'o stated: > Also, can you reproduce the problem with the nobarrier and > journal_async_commit options *removed*? Yes, I know you have battery > backup, but it would be interesting to see if the problem shows up in > the default configuration with none of the more specia

Re: Apparent serious progressive ext4 data corruption bug in 3.6 (when rebooting during umount)

2012-10-25 Thread Nix
On 25 Oct 2012, Theodore Ts'o stated: > I've been thinking about this some more, and if you don't have a lot > of time, I've got time, but it's this weekend, not during the week :) > perhaps the most important test to do is this. Does the > chance of your seeing corrupted files in v3.6

Re: Apparent serious progressive ext4 data corruption bug in 3.6 (when rebooting during umount)

2012-10-25 Thread Nix
On 25 Oct 2012, n...@esperi.org.uk said: > This I can verify, sometime this evening. Sometime *tomorrow* evening. This has been quite stressful and I can hardly keep my eyes open. I'm not doing anything risky in this state. -- NULL && (void)

Re: Apparent serious progressive ext4 data corruption bug in 3.6 (when rebooting during umount) (possibly blockdev / arcmsr at fault??)

2012-10-25 Thread Nix
On 25 Oct 2012, Theodore Ts'o told this: > If that does make the problem go away, that will be a very interesting > data point I'll be looking at this tomorrow, but as sod's law would have it I have another user on this machine who didn't want it mega-rebooted tonight, so I was reduced to tryi

Re: Apparent serious progressive ext4 data corruption bug in 3.6.3 (and other stable branches?)

2012-10-26 Thread Nix
On 26 Oct 2012, Theodore Ts'o spake thusly: > On Thu, Oct 25, 2012 at 08:11:12PM -0400, Ric Wheeler wrote: >> >> Sending this just to you two to avoid embarrassing myself if I >> misread the thread, but >> >> Can we reproduce this with any other hardware RAID card? Or with MD? > > There was

Upgrade to 2.6.24 breaks NFS service

2008-02-13 Thread Nix
I upgraded from 2.6.23.10 to 2.6.24.2 yesterday, and found NFS service failing. To be specific, all locks were blocking forever, with an endless flood of Feb 12 22:53:10 loki notice: kernel: statd: server localhost not responding, timed out Feb 12 22:53:10 loki notice: kernel: lockd: cannot moni

Re: Upgrade to 2.6.24 breaks NFS service

2008-02-13 Thread Nix
On 13 Feb 2008, Jeff Layton told this: > If upgrading nfs-utils doesn't help, on this box, could you run: > > # rpcinfo -p localhost > > send the output? statd expects that lockd will always be listening on a > UDP socket and some changes recently made it so that when there are > only TCP mounts t

Re: Areca hardware RAID / first-ever SCSI bus reset: am I about to lose this disk controller?

2012-09-23 Thread Nix
On 19 Sep 2012, Chris Murphy outgrape: > > On Sep 19, 2012, at 12:52 PM, Nix wrote: > >> So I have this x86-64 server running Linux 3.5.1 with a SATA-on-PCIe >> Areca 1210 hardware RAID-5 controller > > Did you find this? Same controller family. Weird that this just

Re: Apparent serious progressive ext4 data corruption bug in 3.6.3 (and other stable branches?)

2012-10-26 Thread Nix
On 26 Oct 2012, Martin spake thusly: > On 10/24/2012 07:38 PM, Martin wrote: >> On 10/24/2012 01:40 AM, Nix wrote: >> >>> It's true that in less than a week >>> probably not all that many people have rebooted often enough to trip >>> over this.

Re: Apparent serious progressive ext4 data corruption bug in 3.6.3 (and other stable branches?)

2012-10-26 Thread Nix
On 26 Oct 2012, Eric Sandeen outgrape: > On 10/23/12 3:57 PM, Nix wrote: >> The only unusual thing about the filesystems on this machine are that >> they have hardware RAID-5 (using the Areca driver), so I'm mounting with >> 'nobarrier': the full set of opt

Re: Apparent serious progressive ext4 data corruption bug in 3.6.3 (and other stable branches?)

2012-10-26 Thread Nix
On 26 Oct 2012, Martin said: > On 10/26/2012 10:24 PM, Nix wrote: >> On 26 Oct 2012, Martin spake thusly: >>> Computer is booted again in order to copy a few files to memory stick. >>> Unbeknownst to me, the following entries are logged in the >>> system

Re: Apparent serious progressive ext4 data corruption bug in 3.6.3 (and other stable branches?)

2012-10-26 Thread Nix
On 26 Oct 2012, Theodore Ts'o stated: > On Fri, Oct 26, 2012 at 09:37:08PM +0100, Nix wrote: >> >> I can reproduce this on a small filesystem and stick the image somewhere >> if that would be of any use to anyone. (If I'm very lucky, merely making >> thi

Re: Apparent serious progressive ext4 data corruption bug in 3.6.3 (and other stable branches?)

2012-10-26 Thread Nix
On 26 Oct 2012, Theodore Ts'o uttered the following: > The plan is that eventually, we will have checksums on a > per-journalled block basis, instead of a per-commit basis, and when we > get a failed checksum, we skip the replay of that block, But not of everything it implies, since that's quite

Re: Apparent serious progressive ext4 data corruption bug in 3.6.3 (and other stable branches?)

2012-10-27 Thread Nix
[nfs people purged from Cc] On 27 Oct 2012, Theodore Ts'o verbalised: > Huh? It's not turned on by default. If you mount with no mount > options, journal checksums are *not* turned on. ?! it's turned on for me, and though I use weird mount options I don't use that one: /dev/main/var /var

Re: Apparent serious progressive ext4 data corruption bug in 3.6.3 (and other stable branches?)

2012-10-27 Thread Nix
On 27 Oct 2012, Theodore Ts'o said: > On Sat, Oct 27, 2012 at 01:45:25PM +0100, Nix wrote: >> Ah! it's turned on by journal_async_commit. OK, that alone argues >> against use of journal_async_commit, tested or not, and I'd not have >> turned it on if I'd n

Re: [PATCH] ext4: fix unjournaled inode bitmap modification

2012-10-28 Thread Nix
On 28 Oct 2012, Eric Sandeen outgrape: > I've tested this by mounting with journal_checksum and > running fsstress then dropping power; I've also tested by > hacking DM to create snapshots w/o first quiescing, which > allows me to test journal replay repeatedly w/o actually > power-cycling the box

Re: Apparent serious progressive ext4 data corruption bug in 3.6.3 (and other stable branches?)

2012-10-28 Thread Nix
On 29 Oct 2012, Theodore Ts'o spake thusly: > commit 119c0d4460b001e44b41dcf73dc6ee794b98bd31 modified this function > such that the inode bitmap was being modified outside a transaction, > which could lead to corruption, and was discovered when journal_checksum > found a bad check

Re: [PATCH] scsi disk: Use its own buffer for the vpd request

2013-08-30 Thread Nix
On 1 Aug 2013, Bernd Schubert said: > Once I noticed that scsi_get_vpd_page() works fine from other function > calls and that it is not 0x89, but already 0x0 that fails fixing it became > easy. > > Nix, any chance you could verify it also works for you? As an aside, this commit

Re: [PATCH] scsi disk: Use its own buffer for the vpd request

2013-08-31 Thread Nix
On 31 Aug 2013, Greg KH said: > On Fri, Aug 30, 2013 at 11:01:56AM +0100, Nix wrote: >> On 1 Aug 2013, Bernd Schubert said: >> >> > Once I noticed that scsi_get_vpd_page() works fine from other function >> > calls and that it is not 0x89, but already 0x0 that

Re: udev breakages -

2012-10-06 Thread Nix
On 5 Oct 2012, Henrique de Moraes Holschuh told this: > On Fri, 05 Oct 2012, da...@lang.hm wrote: >> >On Thu, Oct 4, 2012 at 9:50 PM, Kurt H Maier wrote: >> >>On Wed, Oct 03, 2012 at 07:27:01PM +, Al Viro wrote: >> >>>Al, that -><- close to volunteering for maintaining that FPOS >> >>>kernel

Heads-up: 3.6.2 / 3.6.3 NFS server panic: 3.6.2+ regression?

2012-10-22 Thread Nix
I just had a panic/oops on upgrading from 3.6.1 to 3.6.3, after weeks of smooth operation on 3.6.1: one of the NFS changes that went into one of the two latest stable kernels appears to be lethal after around half an hour of uptime. The oops came from NFSv4, IIRC (relying on memory since my camera

Re: Linux 3.7-rc4

2012-11-08 Thread Nix
On 4 Nov 2012, Linus Torvalds stated: > Perhaps notable just because of the noise it caused in certain > circles, there's the ext4 bitmap journaling fix for the issue that > caused such a ruckus. It's a tiny patch and despite all the noise > about it you couldn't actually trigger the problem unles

Re: NFS/lazy-umount/path-lookup-related panics at shutdown (at kill of processes on lazy-umounted filesystems) with 3.9.2 and 3.9.5

2013-06-12 Thread Nix
On 12 Jun 2013, Al Viro outgrape: > On Wed, Jun 12, 2013 at 01:08:26PM +0100, Nix wrote: > >> At this point, we have a sibcall to call_connect() I think. The RPC task >> of discourse happens to be local, and as the relevant comment says >> >> * We w

NFS/lazy-umount/path-lookup-related panics at shutdown (at kill of processes on lazy-umounted filesystems) with 3.9.2 and 3.9.5

2013-06-10 Thread Nix
Yes, my shutdown scripts are panicking the kernel again! They're not causing filesystem corruption this time, but it's still fs-related. Here's the 3.9.5 panic, seen on an x86-32 NFS client using NFSv3: NFSv4 was compiled in but not used. This happened when processes whose current directory was on

Re: NFS/lazy-umount/path-lookup-related panics at shutdown (at kill of processes on lazy-umounted filesystems) with 3.9.2 and 3.9.5

2013-06-11 Thread Nix
On 11 Jun 2013, Al Viro spake thusly: > On Mon, Jun 10, 2013 at 06:42:49PM +0100, Nix wrote: >> Yes, my shutdown scripts are panicking the kernel again! They're not >> causing filesystem corruption this time, but it's still fs-related. >> >> Here's the

Re: NFS/lazy-umount/path-lookup-related panics at shutdown (at kill of processes on lazy-umounted filesystems) with 3.9.2 and 3.9.5

2013-06-12 Thread Nix
On 12 Jun 2013, Al Viro told this: > On Mon, Jun 10, 2013 at 06:42:49PM +0100, Nix wrote: >> Yes, my shutdown scripts are panicking the kernel again! They're not >> causing filesystem corruption this time, but it's still fs-related. >> >> Here's the

Re: [SCSI REGRESSION] 3.10.2 or 3.10.3: arcmsr failure at bootup / early userspace transition

2013-07-29 Thread Nix
On 29 Jul 2013, Bernd Schubert said: > Hi Nick, > > On 07/29/2013 12:10 PM, Nick Alcock wrote: >> arcmsr0: abort device command of scsi id = 0 lun = 1 >> arcmsr0: abort device command of scsi id = 0 lun = 0 >> arcmsr: executing bus reset eh.num_resets=0, num_[...] >> >> arcmsr0: wait 'abort al

Re: [SCSI REGRESSION] 3.10.2 or 3.10.3: arcmsr failure at bootup / early userspace transition

2013-07-29 Thread Nix
On 29 Jul 2013, Bernd Schubert spake thusly: > On 07/29/2013 03:05 PM, Nix wrote: >> On 29 Jul 2013, Bernd Schubert said: >>> I tested this patch with ARC-1260 and F/W V1.49, no issues. Also, this >>> patch is only in 3.10.3, but not yet in 3.10.1. >> >> ...

Re: [SCSI REGRESSION] 3.10.2 or 3.10.3: arcmsr failure at bootup / early userspace transition

2013-07-29 Thread Nix
On 29 Jul 2013, Bernd Schubert spake thusly: > Could you try to run these commands with 3.10.1? > > # # check if reporting opcodes works > # sg_opcodes -v -n /dev/sdX spindle:/boot# sg_opcodes -v -n /dev/sda inquiry cdb: 12 00 00 00 24 00 Report Supported Operation Codes cmd: a3 0c 00 00

Re: [SCSI REGRESSION] 3.10.2 or 3.10.3: arcmsr failure at bootup / early userspace transition

2013-07-29 Thread Nix
On 29 Jul 2013, Bernd Schubert uttered the following: > On 07/29/2013 03:05 PM, Nix wrote: >> On 29 Jul 2013, Bernd Schubert said: >> >>> Hi Nick, >>> >>> On 07/29/2013 12:10 PM, Nick Alcock wrote: >>>> arcmsr0: abort device command of scs

Re: [SCSI REGRESSION] 3.10.2 or 3.10.3: arcmsr failure at bootup / early userspace transition

2013-07-29 Thread Nix
On 30 Jul 2013, Douglas Gilbert outgrape: > Please supply the information that Martin Petersen asked > for. Did it in private IRC (the advantage of working for the same division of the same company!) I didn't realise the original fix was actually implemented to allow Bernd, with a different Arec

Re: [SCSI REGRESSION] 3.10.2 or 3.10.3: arcmsr failure at bootup / early userspace transition

2013-07-30 Thread Nix
On 30 Jul 2013, Bernd Schubert told this: > On 07/30/2013 02:56 AM, Nix wrote: >> On 30 Jul 2013, Douglas Gilbert outgrape: >> >>> Please supply the information that Martin Petersen asked >>> for. >> >> Did it in private IRC (the advantage of working

Re: [SCSI REGRESSION] 3.10.2 or 3.10.3: arcmsr failure at bootup / early userspace transition

2013-08-01 Thread Nix
On 1 Aug 2013, Bernd Schubert verbalised: > On 07/30/2013 11:20 PM, Nix wrote: >> On 30 Jul 2013, Bernd Schubert told this: >> >>> On 07/30/2013 02:56 AM, Nix wrote: >>>> On 30 Jul 2013, Douglas Gilbert outgrape: >>>> >>>>> Ple

[3.10.4] NFS locking panic, plus persisting NFS shutdown panic from 3.9.*

2013-08-04 Thread Nix
I just got this panic on 3.10.4, in the middle of a large parallel compilation (of Chromium, as it happens) over NFSv3: [16364.527516] BUG: unable to handle kernel NULL pointer dereference at 0008 [16364.527571] IP: [] nlmclnt_setlockargs+0x55/0xcf [16364.527611] PGD 0 [16364.527626]

Re: [3.10.4] NFS locking panic, plus persisting NFS shutdown panic from 3.9.*

2013-08-05 Thread Nix
On 5 Aug 2013, Jeff Layton stated: > On Sun, 04 Aug 2013 16:40:58 +0100 > Nix wrote: > >> I just got this panic on 3.10.4, in the middle of a large parallel >> compilation (of Chromium, as it happens) over NFSv3: >> >> [16364.527516] BUG: unable to handle k

Re: [3.10.4] NFS locking panic, plus persisting NFS shutdown panic from 3.9.*

2013-08-05 Thread Nix
On 5 Aug 2013, Jeff Layton said: > On Mon, 5 Aug 2013 11:04:27 -0400 > Jeff Layton wrote: > >> On Mon, 05 Aug 2013 15:48:01 +0100 >> Nix wrote: >> >> > On 5 Aug 2013, Jeff Layton stated: >> > >> > > On Sun, 04 Aug 2013 16:40:58 +0100

Re: [3.10.4] NFS locking panic, plus persisting NFS shutdown panic from 3.9.*

2013-08-05 Thread Nix
On 5 Aug 2013, Trond Myklebust told this: > Does the attached patch fix the problem? > From 3c50ba80105464a28d456d9a1e0f1d81d4af92a8 Mon Sep 17 00:00:00 2001 > From: Trond Myklebust > Date: Mon, 5 Aug 2013 12:06:12 -0400 > Subject: [PATCH] LOCKD: Don't call utsname()->nodename from > nlmclnt_set

Re: [3.10.4] NFS locking panic, plus persisting NFS shutdown panic from 3.9.*

2013-08-06 Thread Nix
On 5 Aug 2013, Trond Myklebust uttered the following: > Yes. This scheme will only work if we make sure that host->h_rpcclnt is > initialised at mount time. Here is a v2 patch that should do the right > thing. Confirmed, that fixes it! I'll try your shutdown crash fix next. -- NULL && (void)

Re: [3.10.4] NFS locking panic, plus persisting NFS shutdown panic from 3.9.*

2013-08-07 Thread Nix
On 6 Aug 2013, Trond Myklebust verbalised: > True. How about something like the following instead. Note the change to > the original patch... Well, with those applied I could reboot without a panic for the first time since 3.8.x: looking good. I'll give it a reboot or two with a system that's not

Re: [3.10.4] NFS locking panic, plus persisting NFS shutdown panic from 3.9.*

2013-08-07 Thread Nix
On 7 Aug 2013, Trond Myklebust said: > On Wed, 2013-08-07 at 11:18 +0100, Nix wrote: >> On 6 Aug 2013, Trond Myklebust verbalised: >> > True. How about something like the following instead. Note the change to >> > the original patch... >> >> Well, with

[3.5 regression] DRM: Massive (EDID-probing?) X startup delay on ATI Radeon RV770 (HD4870)

2012-08-04 Thread Nix
Possibly-relevant info: - Two DVI monitors, identical specs, one dual-head graphics card (so no VGA switcheroo or awesome-yet-terrifying PRIME madness needed) - KMS, Xserver 1.12.3, driver 6.14.6-28 (trunk current as of today), Mesa 8.0.4, libdrm 2.4.37 As of kernel 3.5 EDID probing of t

Re: [3.5 regression] DRM: Massive (EDID-probing?) X startup delay on ATI Radeon RV770 (HD4870)

2012-08-06 Thread Nix
On 6 Aug 2012, Alex Deucher outgrape: > On Sat, Aug 4, 2012 at 12:13 PM, Nix wrote: >> Something appears to be wrong, but I have no idea what. I've not changed >> anything other than the kernel since my last non-huge-delayed startup >> earlier this week, and bo

Areca hardware RAID / first-ever SCSI bus reset: am I about to lose this disk controller?

2012-09-19 Thread Nix
So I have this x86-64 server running Linux 3.5.1 with a SATA-on-PCIe Areca 1210 hardware RAID-5 controller driven by libata which has been humming along happily for years -- but suddenly, today, the entire machine froze for a couple of minutes (or at least fs access froze), followed by this in the

Re: Areca hardware RAID / first-ever SCSI bus reset: am I about to lose this disk controller?

2012-09-19 Thread Nix
On 19 Sep 2012, Stan Hoeppner stated: > On 9/19/2012 1:52 PM, Nix wrote: >> So I have this x86-64 server running Linux 3.5.1 > > When did you install 3.5.1 on this machine? Forty days ago. > If fairly recently, does it > run

Re: [3.5 regression] DRM: Massive (EDID-probing?) X startup delay on ATI Radeon RV770 (HD4870)

2012-08-10 Thread Nix
On 6 Aug 2012, Alex Deucher verbalised: > On Sat, Aug 4, 2012 at 12:13 PM, Nix wrote: >> Possibly-relevant info: >> >> - Two DVI monitors, identical specs, one dual-head graphics card >>(so no VGA switcheroo or awesome-yet-terrifying PRIME madness needed)

2.6.24.2: 4KSTACKS + PCA403CD IDE CD + pcdrw + mount + PREEMPT -> stack overflow

2008-02-23 Thread Nix
A loop mount/umounting a pcdrw or iso9660 (through the pktcdvd device) sees a stack overflow in four or five tries. Doing the same thing with the same CD in a normal non-pktcdvd-mounted drive doesn't cause a crash. Here's a couple of oopses. config follows. (There are a wide variety. Some I could

Re: 2.6.24.2: 4KSTACKS + pcdrw + dm + mount -> stack overflow: ide-cd related? dm-related?

2008-02-24 Thread Nix
On 24 Feb 2008, [EMAIL PROTECTED] outgrape: > A loop mount/umounting a pcdrw or iso9660 (through the pktcdvd device) > sees a stack overflow in four or five tries. Doing the same thing with > the same CD in a normal non-pktcdvd-mounted drive doesn't cause a crash. > (This may or may not be PREEMP

Re: 2.6.24.2: 4KSTACKS + pcdrw + dm + mount -> stack overflow: ide-cd related? dm-related?

2008-02-24 Thread Nix
On 24 Feb 2008, Peter Osterlund told this: > Nix <[EMAIL PROTECTED]> writes: >> But while I'd normally blame pktcdvd there's only one pktcdvd function >> in these tracebacks (pkt_open) and it's not got a significant stack >> footprint. > > Did y

Re: Linux 2.2.18pre21

2000-11-17 Thread Nix
Peter Samuelson <[EMAIL PROTECTED]> writes: > Two easy "get out of jail free" cards. There are other, more complex > exploits. You have added one more. They all require root privileges. Unless I'm missing something, not all of them do. I haven't checked this or anything, but it seems to me th

Re: [BUG] Inconsistent behaviour of rmdir

2000-11-17 Thread Nix
Alexander Viro <[EMAIL PROTECTED]> writes: > If every way from foo to target goes through the source rename(source,target) > _will_ make the graph disconnected. Checking that for generic DAG is a hell. Why do you say this? Algorithms for cycle detection are comparatively computationally expensiv
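
The condition quoted above (every path from some starting directory to the target passes through the source) can be checked with one traversal that refuses to walk through the source. What follows is a minimal sketch, assuming the directory graph is available as a plain adjacency dict. The function name, the `start` parameter and the graph representation are all hypothetical and are not taken from the kernel.

```python
from collections import deque


def every_path_goes_through(graph, start, source, target):
    """Return True if target cannot be reached from start without passing source.

    graph is a hypothetical adjacency dict: node -> iterable of child nodes.
    """
    if start == source or target == source:
        return True
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == target:
            # Found a route to target that never touched source.
            return False
        for child in graph.get(node, ()):
            # Pretend source has been removed from the graph.
            if child != source and child not in seen:
                seen.add(child)
                queue.append(child)
    return True
```

The walk is linear in the size of the graph. In a strict tree the answer reduces to an ancestor check, while a graph with multiply-parented directories needs the full traversal, which may be the cost the quoted remark has in mind.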

Re: 2.4.0-test8-pre1 is quite bad / how about integrating Rik's VM

2000-09-08 Thread Nix
Martin Dalecki <[EMAIL PROTECTED]> writes: > There is some facility allowing to implement this kind of things > in the C++ part of the most recent EGCS version which makes implementing > such things "relatively" easy - basically there is the provision to dump > the parser trees as easy to process

2.2.17 --- extreme format string weirdness in /proc

2000-09-30 Thread Nix
Yesterday, I noticed that netstat had stopped working on my 2.2.17 box (with the reiserfs 3.5.26 patches and lm-sensors/i2c-2.5.2, as it happens), built with gcc-1.1.2. The reason is fairly self-evident: : loki:/# cat /proc/net/dev : Inter-| Receive

Re: 2.2.17 --- extreme format string weirdness in /proc

2000-09-30 Thread Nix
Andries Brouwer <[EMAIL PROTECTED]> writes: > On Sat, Sep 30, 2000 at 03:09:10PM +0100, Nix wrote: > > > Yesterday, I noticed that netstat had stopped working on my 2.2.17 box > > The reason is fairly self-evident: > > > > : loki:/# cat /proc/net/dev > .

Re: What is up with Redhat 7.0?

2000-10-01 Thread Nix
Martin Dalecki <[EMAIL PROTECTED]> writes: > Get real: RedHat owns cygnus and cygnus owns GCC so what do you complain > about? It's up to them to decide which compiler is stable or which > isn't. > (Forget about the "committee" stuff...) Marc will probably agree here that this (except for the bit

Re: pktcddvd -> immediate crash

2005-04-05 Thread Nix
On 5 Apr 2005, Soeren Sonnenburg whispered secretively: > I wonder whether anyone could use the pktcddvd device without killing > random jobs (due to sudden out of memory or better memory leaks in > pktcddvd) and finally a complete freeze of the machine ? I'm using it without difficulty. > To rep

Re: kernel page size explanation

2005-07-24 Thread Nix
On 22 Jul 2005, Jesper Juhl suggested tentatively: > You can > A) look in the .config file for your current kernel (if your arch > supports different page sizes at all). > B) You can use the getpagesize(2) syscall at runtime. getpagesize() > returns the nr of bytes in a page - man getpagesize -
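
Option B in the quoted advice can be tried from userspace without writing any C. This is a minimal sketch in Python, assuming a Unix-like system where the standard `resource` module is available. `page_size` is a name made up for illustration.

```python
import mmap
import os
import resource


def page_size():
    """Return the page size in bytes as reported by sysconf(3)."""
    return os.sysconf("SC_PAGE_SIZE")
```

On a typical system `page_size()`, `resource.getpagesize()` and `mmap.PAGESIZE` should all report the same number. Which number that is depends on the architecture and on how the kernel was configured.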

Re: kernel page size explanation

2005-07-24 Thread Nix
On Mon, 25 Jul 2005, VASM wrote: > i had one question > does the linux kernel support only one default page size even if the > processor on which it is working supports multiple ? No. Some architectures have compile-time support for multiple different page sizes (e.g. Itanium, SPARC64); many have

Re: Broke nice range for RLIMIT NICE

2005-07-29 Thread Nix
On 29 Jul 2005, Michael Kerrisk stated: > Yes, as noted in my earlier message -- at the moment RLIMIT_NICE > still isn't in the current glibc snapshot... According to traffic on libc-hacker, Ulrich committed it on Jun 20 (along with RLIMIT_RTPRIO support). -- `Tor employs several thousand edito

Re: Broke nice range for RLIMIT NICE

2005-07-29 Thread Nix
On Fri, 29 Jul 2005, Michael Kerrisk uttered the following: >> On 29 Jul 2005, Michael Kerrisk stated: >> > Yes, as noted in my earlier message -- at the moment RLIMIT_NICE >> > still isn't in the current glibc snapshot... >> >> According to traffic on libc-hacker, Ulrich committed it on Jun 20 >
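
Whether a given runtime exposes `RLIMIT_NICE` can be probed rather than assumed. The sketch below is in Python and assumes Linux. The `20 - rlim_cur` mapping is the one documented in getrlimit(2), and both function names are hypothetical.

```python
import resource


def nice_ceiling(rlim_cur):
    """Map an RLIMIT_NICE soft limit (1..40) to the lowest nice value it permits."""
    return 20 - rlim_cur


def current_nice_limit():
    """Return (soft, hard) for RLIMIT_NICE, or None if the constant is not exposed."""
    limit = getattr(resource, "RLIMIT_NICE", None)
    if limit is None:
        return None
    return resource.getrlimit(limit)
```

A soft limit of 40 therefore permits renicing down to -20, and a soft limit of 1 permits nothing lower than 19.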

Re: [PATCH] 2.6.13: Filesystem capabilities 0.16

2005-09-02 Thread Nix
On 1 Sep 2005, Olaf Dietsche murmured woefully: > This patch implements filesystem capabilities. It allows to run > privileged executables without the need for suid root. Is there some reason why this doesn't keep its capability data in xattrs? -- `... published last year in a limited edition...

Re: kernel 2.6.13 - space not freed to kernel

2005-09-05 Thread Nix
On 2 Sep 2005, [EMAIL PROTECTED] murmured woefully: > The usual malloc() never resets the break address or remaps memory > because it is an expensive operation. This means that when new > data space needs to be allocated, malloc() doesn't have to get > anything from the kernel because it already ha

Re: Kernel 2.6.23.9 + mdadm 2.6.2-2 + Auto rebuild RAID1?

2007-12-06 Thread Nix
On 6 Dec 2007, Jan Engelhardt verbalised: > On Dec 5 2007 19:29, Nix wrote: >>> >>> On Dec 1 2007 06:19, Justin Piszcz wrote: >>> >>>> RAID1, 0.90.03 superblocks (in order to be compatible with LILO, if >>>> you use 1.x superblocks with

sym53c8xx2: incredible sloth after parity error / SCSI bus reset

2007-12-01 Thread Nix
About once a year I get a SCSI parity error on one of my systems (the only one with SCSI). I presume the cabling is substandard, but given my coordination deficits and the rarity of the errors I'd do far more damage replacing it than leaving it be. I had one of these today. The system (2.6.23.9)

Re: Kernel 2.6.23.9 + mdadm 2.6.2-2 + Auto rebuild RAID1?

2007-12-05 Thread Nix
On 1 Dec 2007, Jan Engelhardt uttered the following: > > On Dec 1 2007 06:19, Justin Piszcz wrote: > >> RAID1, 0.90.03 superblocks (in order to be compatible with LILO, if >> you use 1.x superblocks with LILO you can't boot) > > Says who? (Don't use LILO ;-) Well, your kernels must be on a 0.90-s

Re: Is there any word about this bug in gcc ?

2007-11-20 Thread Nix
On 20 Nov 2007, H. Peter Anvin outgrape: > This one is definitely messy. There is absolutely no way to know what > gcc has miscompiled. Actually, since this only affects abs() calls containing multiplications or divisions by negative constants, you can at least make a pretty good guess as to its

[PATCH] linux-libc-headers-2.6.10.0: if_tunnel.h relies on byteorder.h having been included

2005-02-13 Thread Nix
In iproute2, ip/iptunnel.c says: #include #include #include #include Now the original Linux kernel includes byteorder.h as a side-effect of including netdevice.h, which it does inside a __KERNEL__ ifdef when if_arp.h is included. I think it makes more sense to include it in those headers tha

2.6.10: SPARC64 mapped figure goes unsignedly negative...

2005-01-30 Thread Nix
/proc/meminfo on my UltraSPARC IIi: MemTotal: 512816 kB MemFree: 14208 kB Buffers: 51328 kB Cached: 163056 kB SwapCached: 0 kB Active: 142160 kB Inactive: 304712 kB HighTotal: 0 kB HighFree: 0 kB LowTotal: 512816 kB Lo

Re: 2.6.10: SPARC64 mapped figure goes unsignedly negative...

2005-01-31 Thread Nix
On Mon, 31 Jan 2005, Hugh Dickins suggested tentatively: > On Sun, 30 Jan 2005, Nix wrote: >> /proc/meminfo on my UltraSPARC IIi: >> Mapped: 18446744073687883208 kB >> >> (This kernel is compiled with GCC-3.4.3, which might be relevant.) > > Indeed: s
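
The Mapped figure quoted above is what a small negative number looks like when it is printed as an unsigned 64-bit value. A minimal sketch of the reinterpretation follows, assuming a 64-bit `unsigned long` as on SPARC64; the helper name `as_signed64` is hypothetical, and this only illustrates the arithmetic, not where in the kernel the counter went wrong.

```python
def as_signed64(value: int) -> int:
    """Reinterpret an unsigned 64-bit integer as two's-complement signed."""
    value &= (1 << 64) - 1          # keep only the low 64 bits
    if value >= (1 << 63):          # top bit set: the value is negative
        return value - (1 << 64)
    return value


# The figure reported in /proc/meminfo, in kB.
mapped_kb = 18446744073687883208

print(as_signed64(mapped_kb))       # prints -21668408
```

Read as a signed quantity, the counter has gone about 21 GB below zero, which is far more than the 512 MB of RAM in the box.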

Re: 2.6.10: SPARC64 mapped figure goes unsignedly negative...

2005-01-31 Thread Nix
On Mon, 31 Jan 2005, Hugh Dickins said: > On Mon, 31 Jan 2005, Nix wrote: >> (2.6.10 seems to *run* perfectly well on that box, for what it's worth; >> unless this is a symptom of some underlying dark and terrible failure, >> it looks like a not-very-important cosmetic bu

Re: 2.6.10: SPARC64 mapped figure goes unsignedly negative...

2005-01-31 Thread Nix
On Mon, 31 Jan 2005, Hugh Dickins uttered the following: > On Mon, 31 Jan 2005, Nix wrote: >> Filename Type Size Used Priority >> /dev/sda2 partition 523016 0 1 >> /dev/sda4

Re: a problem with linux 2.6.11 and sa

2005-03-09 Thread Nix
On Tue, 8 Mar 2005, George Georgalis announced authoritatively: > Here's what I'm doing that is broken. I use tcpserver (functionally > similar to inetd) to receive an incoming smtp connection. While the > smtp session is still open, the message is piped to a temp file which > is then scanned for s

Re: a problem with linux 2.6.11 and sa

2005-03-09 Thread Nix
On Wed, 09 Mar 2005, Paul Jarc uttered the following: > "George Georgalis" <[EMAIL PROTECTED]> wrote: >> It (Gerrit Pape's technique) very defiantly stopped working a few revs >> back (2.6.7?). I'm seeing a similar failed read from /dev/rtc and >> mplayer with 2.6.10, now too. > > The /proc/kmsg p

Re: Exporting a lot of data to other processes?

2007-10-26 Thread Nix
On 25 Oct 2007, Ph. Marek told this: > -) use some ramfs/shmfs or similar, and overwrite the data occasionally >- not current data >- runtime overhead (processor load) This is roughly what the nscd implementation in glibc does: the client can work over a socket, but prefers to ask the daem

Re: [uml-devel] User Mode Linux still doesn't build in 2.6.23-final.

2007-10-20 Thread Nix
On 20 Oct 2007, Paolo Giarrusso told this: > Guess most people are not using SMP right now, and that the error disappears > without that setting It doesn't. It fails with non-SMP as well. Rob, your patch works for me. (Not that the reboot into 2.6.23.1 was problem-free: iproute2-071016 fails to

Re: [uml-devel] User Mode Linux still doesn't build in 2.6.23-final.

2007-10-21 Thread Nix
On 22 Oct 2007, WANG Cong uttered the following: > I build UML for non-SMP x86. But I don't know about UML_NET_VDE. ;( > > Errors threw out by gcc (too many) are put here: > http://wangcong.org/down/errors.txt It's hard to tell without LOCALE=C, but those are the sorts of results I'd expect

Re: Adding subroot information to /proc/mounts, or obtaining that through other means

2007-06-20 Thread Nix
On 20 Jun 2007, H. Peter Anvin verbalised: > Right now it is actually impossible to conclusively determine a > filesystem-relative path in the presence of bind (and possibly move) > mounts. This is highly desirable to be able to do in contexts that > involve non-Linux (or not-the-current-instance
