Reiserfs, 3 Raid1 arrays, 2.4.1 machine locks up

2001-02-19 Thread James A. Pattie

I'm not subscribed to the kernel mailing list, so please cc any replies
to me.

I'm building a firewall on a P133 with 48 MB of memory using RH 7.0,
latest updates, etc., and kernel 2.4.1.
I've built a customized install of RH (~200 MB) which I untar onto the
system after building my raid arrays, etc., via a rescue CD I created
using Timo's Rescue CD project.  The booting kernel is 2.4.1-ac10: no
networking, raid compiled in but raid1 as a module, reiserfs as a module,
ext2 and ISO 9660 compiled in, with sg support for CD-ROMs.  I had to
strip the kernel down so it would fit on a floppy, as the system does not
support booting from CD-ROM.

After booting and getting my initial file system in memory (20+ MB
ramdisk), I create a partition for swap, format it, and run swapon so I
don't run out of memory.  At this point I usually have 3-5 MB of free
memory and 128 MB of swap.

I partitioned the two drives (on the 1st and 2nd controllers, 1.3 GB
each) into 4 partitions apiece: the first is swap, and the next three
(one primary, two extended) are for raid1 arrays.  I've given 20 MB to
/boot (md0), 650 MB to / (md1), and the rest (400+ MB) to /var (md2).  I
format md0 as ext2, and md1 and md2 as reiserfs.  When I go to untar the
image on the CD to /mnt/slash (which has md1 mounted on it), the system
extracts about 30 MB of data and then just stops responding.  No kernel
output, etc.  I can change to the other virtual consoles, but no other
keyboard input is accepted.  After resetting the machine, the raid arrays
rebuild OK, and reiserfs gives me no problems other than usually
replaying 2 or 3 transactions.  If I tell tar to pick up at the last
directory I saw extracted, it gets about another 30 MB of data and stops
again.  I've tried waiting for the raid syncing to finish, and also
starting just after the arrays become available; it doesn't matter.
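For reference, a three-array layout like the one above would be described
to the 2001-era raidtools in /etc/raidtab roughly as follows.  This is
only a sketch with assumed device names (/dev/hda and /dev/hdc for the
two controllers); adjust to the actual partitions:

```
# /etc/raidtab sketch: md0 = the /boot mirror.  md1 and md2 follow the
# same pattern with their own partition pairs (e.g. hda5/hdc5, hda6/hdc6).
raiddev /dev/md0
    raid-level              1
    nr-raid-disks           2
    nr-spare-disks          0
    persistent-superblock   1
    chunk-size              4
    device                  /dev/hda2
    raid-disk               0
    device                  /dev/hdc2
    raid-disk               1
```

mkraid /dev/md0 would then build the array, with raidstart/raidstop
managing it at runtime.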

I first tried with 2.4.1 stock and then went to 2.4.1-ac10 (the latest
at the time I was playing with this) and it did exactly the same thing.
If I format md1 and md2 with ext2, then everything works fine.  I was
initially compiling 386 only support in and have tried with 586 support
(no difference).  I've tried both r5 and tea hashes with reiserfs.

One thing I did notice was that the syncing of the raid1 arrays went in
sequence (md0, md1, md2) instead of in parallel.  I assume this is
because the machine just doesn't have the horsepower, etc., or is it
that I have multiple raid arrays on the same drives?

This isn't a life or death issue at the moment, but I would like to be
able to use reiserfs in this scenario in the future.

I have tested the same rescue CD boot image on a K6-2 450 MHz system
with 128 MB.  No raid, just one reiserfs partition, and it untarred
without any issues.  I'm thinking this is something specific to older,
lower-memory machines?

--
James A. Pattie
[EMAIL PROTECTED]

Linux  --  SysAdmin / Programmer
PC & Web Xperience, Inc.
http://www.pcxperience.com/



-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Reiserfs, 3 Raid1 arrays, 2.4.1 machine locks up

2001-02-20 Thread James A. Pattie

Colonel wrote:

> In clouddancer.list.kernel.owner, you wrote:
> >
> >I'm not subscribed to the kernel mailing list, so please cc any replies
> >to me.
> >
> >I'm building a firewall on a P133 with 48 MB of memory using RH 7.0,
> >latest updates, etc. and kernel 2.4.1.
> >I've built a customized install of RH (~200MB)  which I untar onto the
> >system after building my raid arrays, etc. via a Rescue CD which I
> >created using Timo's Rescue CD project.  The booting kernel is
> >2.4.1-ac10, no networking, raid compiled in but raid1 as a module
>
> Hmm, raid as a module was always a Bad Idea(tm) in the 2.2 "alpha"
> raid (which was misnamed and is 2.4 raid).  I suggest you change that
> and update, as I had no problems with 2.4.2-pre2/3, nor have any been
> posted to the raid list.

I just tried with 2.4.1-ac14, raid and raid1 compiled in, and it did the
same thing.  I'm going to try compiling reiserfs in (if I can still fit
the kernel on the floppy with its initial ramdisk, etc.) and see what
that does.





Re: Reiserfs, 3 Raid1 arrays, 2.4.1 machine locks up

2001-02-20 Thread James A. Pattie

Tom Sightler wrote:

> >> >I'm building a firewall on a P133 with 48 MB of memory using RH 7.0,
> >> >latest updates, etc. and kernel 2.4.1.
> >> >I've built a customized install of RH (~200MB) which I untar onto the
> >> >system after building my raid arrays, etc. via a Rescue CD which I
> >> >created using Timo's Rescue CD project.  The booting kernel is
> >> >2.4.1-ac10, no networking, raid compiled in but raid1 as a module
> >>
> >> Hmm, raid as a module was always a Bad Idea(tm) in the 2.2 "alpha"
> >> raid (which was misnamed and is 2.4 raid).  I suggest you change that
> >> and update, as I had no problems with 2.4.2-pre2/3, nor have any been
> >> posted to the raid list.
> >
> >I just tried with 2.4.1-ac14, raid and raid1 compiled in and it did the
> >same thing.  I'm going to try to compile reiserfs in (if I have enough
> >room to still fit the kernel on the floppy with its initial ramdisk,
> >etc.) and see what that does.
>
> There seem to be several reports of reiserfs falling over when memory is
> low.  It is still undetermined whether this problem is actually reiserfs or
> MM related, but there are other threads on this list regarding similar issues.
> This would explain why the same disk would work on a different machine with
> more memory.  Any chance you could add memory to the box temporarily, just to
> see if it helps?  That may help prove whether this is the problem.
>
> Later,
> Tom

Out of all the old 72-pin SIMMs we have, we have it maxed out at 48 MB.
I'm tempted to take the two drives out and put them in the K6-2, but
that's too much of a hassle.  I'm currently going to try 2.4.1-ac19 and
see what happens.

The machine does have 128 MB of swap working, and whenever I've checked
memory usage (while the system was still responding), it never went over
a couple of megs of swap used.




Re: Reiserfs, 3 Raid1 arrays, 2.4.1 machine locks up

2001-02-20 Thread James A. Pattie

Tom Sightler wrote:

> > > There seem to be several reports of reiserfs falling over when memory
> > > is low.  It seems to be undetermined if this problem is actually
> > > reiserfs or MM related, but there are other threads on this list
> > > regarding similar issues.  This would explain why the same disk would
> > > work on a different machine with more memory.  Any chance you could add
> > > memory to the box temporarily just to see if it helps, this may help
> > > prove if this is the problem or not.
> > >
> >
> > Out of all the old 72 pin simms we have, we have it maxed out at 48 MB's.
> > I'm tempted to take the 2 drives out and put them in the k6-2, but that's
> > too much of a hassle.  I'm currently going to try 2.4.1-ac19 and see what
> > happens.
> >
> > The machine does have 128MB of swap space working, and whenever I've
> > checked memory usage (while the system was still responding), it never
> > went over a couple megs of swap space used.
>
> Ah yes, but, from what I've read, the problem seems to occur when
> buffer/cache memory is low (<6MB), you could have tons of swap and still
> reach this level.
>
> Later,
> Tom

You were right!  I managed to find another 32 MB of memory to bump it up
to 64 MB total, and it worked perfectly.  It appears I had only about
4 MB of buffer/cache on the 48 MB system and over 15 MB on the 64 MB
system.  I did my install, switched back to the 48 MB configuration
running normally, and it's working just fine.
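For anyone reproducing this, the numbers Tom is talking about can be
watched directly from /proc/meminfo rather than top; a minimal sketch
(field names as in 2.4-era kernels, still present in later ones):

```shell
# Report free memory and buffers+cache (in kB) from /proc/meminfo.
awk '/^MemFree:/ {free=$2}
     /^Buffers:/ {buf=$2}
     /^Cached:/  {cache=$2}
     END {printf "free=%d kB  buffers+cache=%d kB\n", free, buf+cache}' /proc/meminfo
```

Running that in a loop (while sleep 1; do ...; done) during the untar
should show buffers+cache sinking toward the few-MB range right before
the freeze.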

Thanks,





Re: Reiserfs, 3 Raid1 arrays, 2.4.1 machine locks up

2001-02-21 Thread James A. Pattie

Colonel wrote:

>>There seem to be several reports of reiserfs falling over when memory is
>>low.  It seems to be undetermined if this problem is actually reiserfs
>> or MM related, but there are other threads on this list regarding similar
>> issues. This would explain why the same disk would work on a different
>> machine with more memory.  Any chance you could add memory to the box
>> temporarily just to see if it helps, this may help prove if this is the
>> problem or not.
>>
>>
>> Well, I didn't happen to start the thread, but your comments may
>> explain some "gee I wonder if it died" problems I just had with my
>> 2.4.1-pre2+reiser test box.  It only has 16M, so it's always low
>> memory (never been a real problem in the past however).  The test
>> situation is easily repeatable for me [1].  It's a 486 wall mount, so
>> it's easier to convert the fs than add memory, and it showed about
>> 200k free at the time of the sluggishness.  Previous 2.4.1 testing
>> with ext2 fs didn't show any sluggishness, but I also didn't happen to
>> run the test above either.  When I come back to the office later, I'll
>> convert the fs, repeat the test and pass on the results.
>>
>>
>> [1]  Since I decided to try to catch up on kernels, I had just grabbed
>> -ac18, cd to ~linux and run "rm -r *" via an ssh connection.  In a
>> second connection, I tried a simple "dmesg" and waited over a minute
>> for results (long enough to log in directly on the box and bring up
>> top) followed by loading emacs for ftp transfers from kernel.org,
>> which again 'went to sleep'.
>> -
>
>If these are freezes I had them too in 2.4.1, 2.4.2-pre1 fixed it for me.
>Really I think it was the patch in handle_mm_fault setting TASK_RUNNING.
>
>/RogerL
>
> Ohoh, I see that I fat-fingered the kernel version.  The test box
> kernel is 2.4.2-pre2 with Axboe's loop4 patch to the loopback fs.  It
> runs a three partition drive, a small /boot in ext2, / as reiser and
> swap.  I am verifying that the freeze is repeatable at the moment, and
> so far I cannot cause free memory to drop to 200k and a short ice age
> does not occur.  Unless I can get that to repeat, the effort will be
> useless... the only real difference is swap, it was not initially
> active and now it is.  Free memory never drops below 540k now, so I
> would suspect a MM influence.  [EMAIL PROTECTED] didn't mention
> the memory values in his initial post, but it would be interesting to
> see if he simply leaves his machine alone if it recovers
> (i.e. probable swap thrashing) and then determine if the freeze ever
> re-occurs.  James seems to have better repeatability than I do.
> Rebooting and retrying still doesn't result in a noticeable freeze for
> me.  Some other factor must have been involved that I didn't notice.
> Still seems like MM over reiser, though.

When the machine stopped responding the first time, I let it go over the
weekend (2+ days) and it still didn't recover.  I never saw a thrashing
effect.  The initial memory values were 2 MB free memory and < 1 MB
cache.  I never really looked at the cache values, as I wasn't sure how
they affected the system.  When the system was untarring my tarball, free
memory would get below 500 kB and swap usage was usually around a couple
of megs.

>
>
> PS for james:
> >One thing I did notice was that the syncing of the raid 1 arrays went in
> sequence, md0, md1, md2 instead of in parallel.  I assume it is because
> the machine just doesn't have the horsepower, etc. or is it that I have
> multiple raid arrays on the same drives?
>
> Same drives.

That's what I thought.

Thanks,





Re: Reiserfs, 3 Raid1 arrays, 2.4.1 machine locks up

2001-02-21 Thread James A. Pattie

Colonel wrote:

>Sender: [EMAIL PROTECTED]
>Date: Wed, 21 Feb 2001 08:45:02 -0600
>    From: "James A. Pattie" <[EMAIL PROTECTED]>
>
>Colonel wrote:
>
>>>There seem to be several reports of reiserfs falling over when memory is
>>>low.  It seems to be undetermined if this problem is actually reiserfs
>>> or MM related, but there are other threads on this list regarding similar
>>> issues. This would explain why the same disk would work on a different
>>> machine with more memory.  Any chance you could add memory to the box
>>> temporarily just to see if it helps, this may help prove if this is the
>>> problem or not.
>>>
>>>
>
>When the machine stopped responding, the first time, I let it go over the
>weekend (2 days+) and it still didn't recover.  I never saw a thrashing
>effect.  The initial memory values were 2MB free memory, < 1MB cache.  I
>never really looked at the cache values as I wasn't sure how they affected
>the system.  When the system was untarring my tarball, the memory usage
>would get down < 500kb and swap would be around a couple of megs usually.
>
> Well, it still looks like you have a good test case to resolve the
> problem.  Can you add memory per the above request?
>
> I should drop out of this, it seems I had a one time event.  Something
> to keep in mind is /boot should either be ext2 or mounted differently
> under reiser (check their website for details).  You should probably
> try the Magic SysREQ stuff to see what's up at the time of freeze.
> You should probably run memtest86 to head off questions about your
> memory stability.
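The Magic SysRq suggestion above boils down to one proc toggle plus a few
console chords; a sketch, assuming the kernel was built with
CONFIG_MAGIC_SYSRQ:

```
# Enable Magic SysRq (as root):
echo 1 > /proc/sys/kernel/sysrq

# Then, at the console during a freeze:
#   Alt+SysRq+t          dump task states (look for tasks stuck in D state)
#   Alt+SysRq+m          dump memory information
#   Alt+SysRq+s, u, b    emergency sync, remount read-only, reboot
```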

I added memory yesterday and got it to work after putting 64 MB in the
system; free memory (cache/buffer) was over 30 MB.  I didn't have any
problems then.

After I got everything installed, I dropped the memory back to 48 MB and
it is running fine.  I don't have the 17+ MB ramdisk taking up memory
anymore, so the system has > 15 MB of cache/buffer available at all
times, even running ssh, sendmail, squid, firewalling, etc.


>
>
> Just to check on the raid setup, the drives are on separate
> controllers and there is not a slow device on the same bus?  I've been
> running the "2.4" raid for a couple years and that was the usual
> problem.  Reiserfs is probably more aggressive working the drive and
> it may tend to unhide other system problems.
>

They are on separate controllers.  The second controller has the CD-ROM
drive (32x), which should be faster than the hard drives (since the
drives are older).

>
> --
> "... being a Linux user is sort of like living in a house inhabited by
> a large family of carpenters and architects. Every morning when you
> wake up, the house is a little different. Maybe there is a new turret,
> or some walls have moved. Or perhaps someone has temporarily removed
> the floor under your bed." - Unix for Dummies, 2nd Edition




PROBLEM: Reiserfs + 3Ware issue with 2.4.5

2001-06-12 Thread James A. Pattie

1. Reiserfs + 3Ware issue with 2.4.5
2. I'm building a rescue CD based on Timo's Rescue CD v0.5.4.  I've
compiled the 2.4.5 kernel for a 386, with all SCSI controllers I might
encounter compiled in, plus software raid and reiserfs compiled in.  The
only modules are SCSI tape, network cards, NFS and smbfs support.

When I boot my K6-II 350 box (128 MB memory) off the CD, it boots up and
eventually gets me to a prompt.  In the process, the insmod of 3c59x
gives me the following error:
/lib/modules/2.4.5/kernel/drivers/net/3c59x.o: unresolved symbol
del_timer_sync

(This is just something I noticed along the way; annoying, but not a
show stopper yet.)

The real issue: after mounting my filesystems from the 3ware card and
then trying to unmount them, the following happens:

bash# umount tmp
journal_begin called without kernel lock held
kernel BUG at journal.c:423!
invalid operand: 
CPU:0
EIP:  0010:[]
EFLAGS: 00010282
eax: 001d   ebx: c65aff24   ecx: c65f4000   edx: 0001
esi: c4a1e800   edi:    ebp: 3b26397a   esp: c65afeac
ds:  0018   es: 0018ss: 0018
Process umount (pid: 116, stackpage=c65af000)
Stack: c027b0cc c027b264 01a7 c017b62f c027c281 5b86 0808 

   c0106fb0 5b86 c48f1e50 c48f1df0 c65aff24 c4a1e800 
c02c3560
  c02c35d8 c017b857 c65aff24 c4a1e800 000a  c016e064 
c65aff24
Call Trace: [] [] [] [] 
[] [] []
[][][][]

Code: 0f 0b 83 c4 0c c3 8d 76 00 31 c0 c3 90 31 c0 c3 90 53 31 db
Segmentation fault
bash#


/boot is ext2
/, /home, /tmp, /var, /usr are all reiserfs

--
As long as I don't retry the unmount, or unmount another filesystem, the
system is still usable.  But when I try to unmount another filesystem,
that process just goes into a never-ending state.  Once I've locked up
all my consoles, I can only hit the reset button :(

ps ax shows:

   51 tty3     SW   0:00 [kreiserfsd]
  119 tty3     D    0:00 umount home

I can get to my other virtual consoles and still look into the currently
mounted filesystems; I just can't shut down the box, kill the umount
process, etc.  I can even look into the filesystem mounted as home that I
am currently trying to umount, as seen in the ps output.
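A quick way to spot tasks wedged like that umount is to filter ps for
uninterruptible ("D") state, which is also why kill has no effect on
them; a small sketch:

```shell
# List tasks in uninterruptible sleep ("D" state).  These are blocked
# inside the kernel and cannot be killed, like the hung umount here.
ps -eo pid,stat,wchan:25,comm | awk 'NR == 1 || $2 ~ /^D/'
```

On a healthy box this prints only the header; a task that stays in D for
minutes is stuck on I/O or, as in this oops, on a kernel lock.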

The box is currently running a RH 7.1 system with a custom-built 2.4.2
kernel, so I know reiserfs and the 3ware card worked correctly before
2.4.5.  :)

ver_linux shows (on my box that built the kernel):

Linux navi.zelda.pcxperience.com 2.4.5 #1 Sat Jun 2 13:26:40 CDT 2001 
i586 unknown
 
Gnu C  2.96
Gnu make   3.79.1
binutils   2.10.91.0.2
util-linux 2.10s
mount  2.11b
modutils   2.4.2
e2fsprogs  1.19
reiserfsprogs  3.x.0f
PPP2.4.0
isdn4k-utils   3.1pre1
Linux C Library2.2.3
Dynamic linker (ldd)   2.2.3
Procps 2.0.7
Net-tools  1.57
Console-tools  0.3.3
Sh-utils   2.0
Modules Loaded tdfx 3c59x sb sb_lib uart401 sound soundcore

I'm attaching my .config file if that helps.

I can't really provide more output from the test box, as I am having to
manually type everything into this email on my other machine. :(




#
# Automatically generated make config: don't edit
#
CONFIG_X86=y
CONFIG_ISA=y
# CONFIG_SBUS is not set
CONFIG_UID16=y

#
# Code maturity level options
#
CONFIG_EXPERIMENTAL=y

#
# Loadable module support
#
CONFIG_MODULES=y
CONFIG_MODVERSIONS=y
CONFIG_KMOD=y

#
# Processor type and features
#
CONFIG_M386=y
# CONFIG_M486 is not set
# CONFIG_M586 is not set
# CONFIG_M586TSC is not set
# CONFIG_M586MMX is not set
# CONFIG_M686 is not set
# CONFIG_MPENTIUMIII is not set
# CONFIG_MPENTIUM4 is not set
# CONFIG_MK6 is not set
# CONFIG_MK7 is not set
# CONFIG_MCRUSOE is not set
# CONFIG_MWINCHIPC6 is not set
# CONFIG_MWINCHIP2 is not set
# CONFIG_MWINCHIP3D is not set
# CONFIG_MCYRIXIII is not set
# CONFIG_X86_CMPXCHG is not set
# CONFIG_X86_XADD is not set
CONFIG_X86_L1_CACHE_SHIFT=4
CONFIG_RWSEM_GENERIC_SPINLOCK=y
# CONFIG_RWSEM_XCHGADD_ALGORITHM is not set
# CONFIG_TOSHIBA is not set
# CONFIG_MICROCODE is not set
# CONFIG_X86_MSR is not set
# CONFIG_X86_CPUID is not set
CONFIG_NOHIGHMEM=y
# CONFIG_HIGHMEM4G is not set
# CONFIG_HIGHMEM64G is not set
CONFIG_MATH_EMULATION=y
# CONFIG_MTRR is not set
CONFIG_SMP=y

#
# General setup
#
CONFIG_NET=y
# CONFIG_VISWS is not set
CONFIG_X86_IO_APIC=y
CONFIG_X86_LOCAL_APIC=y
CONFIG_PCI=y
# CONFIG_PCI_GOBIOS is not set
# CONFIG_PCI_GODIRECT is not set
CONFIG_PCI_GOANY=y
CONFIG_PCI_BIOS=y
CONFIG_PCI_DIRECT=y
# CONFIG_PCI_NAMES is not set
# CONFIG_EISA is not set
# CONFIG_MCA is not set
# CONFIG_HOTPLUG is not set
# CONFIG_PCMCIA is not set
CONF