Rm performance issue

2007-09-26 Thread Ken Naim
I am removing 300 GB of data spread across 130 files within a single
directory, and the process takes just over 2 hours. In my past experience,
removing a small number of large files was very quick, almost instantaneous.
I am running Red Hat Linux on IBM pSeries hardware against a SAN with SATA
and Fibre Channel drives. I see this issue on both the SATA and Fibre Channel
sides, although the rm process is slightly faster on Fibre Channel.

 

uname -a: Linux hostname 2.6.9-55.EL #1 SMP Fri Apr 20 16:33:09 EDT 2007
ppc64 ppc64 ppc64 GNU/Linux

Commands: cd /path/directory/subdirectory

rm -f *

 

I wanted to know if there is a way to speed this up, as it causes a 3-hour
process to stretch to 5 hours.

 

Thanks,

Ken Naim

 

 

 

___
Bug-coreutils mailing list
Bug-coreutils@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-coreutils


Re: Rm performance issue

2007-09-26 Thread Jim Meyering
"Ken Naim" <[EMAIL PROTECTED]> wrote:
> I am removing 300 GB of data spread across 130 files within a single
> directory, and the process takes just over 2 hours. In my past experience,
> removing a small number of large files was very quick, almost instantaneous.
> I am running Red Hat Linux on IBM pSeries hardware against a SAN with SATA
> and Fibre Channel drives. I see this issue on both the SATA and Fibre Channel
> sides, although the rm process is slightly faster on Fibre Channel.
>
> uname -a: Linux hostname 2.6.9-55.EL #1 SMP Fri Apr 20 16:33:09 EDT 2007
> ppc64 ppc64 ppc64 GNU/Linux
>
> Commands: cd /path/directory/subdirectory
>
> rm -f *

Thanks for the report.
In general, it's good to include the version of the tool
in question (rm --version), but here, it probably makes no
difference since rm is almost certainly not at fault.
A performance problem like this is more likely to be a function of
your kernel and the file system type than of the rm command.

What type of file system are you using?
Are you using LVM?
Have you checked dmesg and the syslog for indications of hardware failure?




Re: Rm performance issue

2007-09-26 Thread Philip Rowlands

On Wed, 26 Sep 2007, Ken Naim wrote:


I am removing 300 GB of data spread across 130 files within a single
directory, and the process takes just over 2 hours. In my past experience,
removing a small number of large files was very quick, almost instantaneous.
I am running Red Hat Linux on IBM pSeries hardware against a SAN with SATA
and Fibre Channel drives. I see this issue on both the SATA and Fibre Channel
sides, although the rm process is slightly faster on Fibre Channel.


It's more likely to be caused by your storage system than rm, but here's 
how to tell:


mkdir foo
touch foo/{a,b,c,d,e,f}
strace -T -e trace=file rm -rf foo

Try this in a temporary directory on your local disk, to get some idea 
of how long the unlink(2) system call takes. Then try the strace on your 
slow-running rm command, and see how long rm is spending waiting for the 
system call to complete.



Cheers,
Phil




RE: Rm performance issue

2007-09-26 Thread Philip Rowlands

[ re-adding bug-coreutils]

On Wed, 26 Sep 2007, Ken Naim wrote:

I created a test case similar to my nightly job, which removes 130 or
so 5 GB files. The apps* files are identical to the files I remove nightly
and are 4.3 GB in size, the ctx files are 20 MB, and the single-letter
files are 0-byte files. Based on the strace there is a correlation
between file size and unlink time: 4.3 GB takes 10 seconds, 20 MB takes
less than 0.1 seconds, and 0-byte files take no time.



unlink("apps_ts_tx_data_2.270.632954231") = 0 <10.418224>
unlink("apps_ts_tx_data_2.271.632954231") = 0 <10.691083>
unlink("ctxd.367.632955010")= 0 <0.051140>
unlink("ctxd.367.632955011")= 0 <0.078666>
unlink("d") = 0 <0.59>
unlink("e") = 0 <0.50>


As I thought, rm is limited by the speed of the underlying I/O. I'd 
suggest some performance tuning for your filesystem and SAN, but that's 
very dependent on your current setup. unlink shouldn't cause much I/O 
compared to other read/write operations, so I'm surprised you only 
noticed issues with rm.



Cheers,
Phil






Re: FW: Rm performance issue

2007-09-26 Thread Jim Meyering
[it's good to reply to the list, so I've Cc'd it]

"Ken Naim" <[EMAIL PROTECTED]> wrote:
> I created a test case similar to my nightly job, which removes 130 or so 5 GB
> files. The apps* files are identical to the files I remove nightly and are
> 4.3 GB in size, the ctx files are 20 MB, and the single-letter files are
> 0-byte files. Based on the strace there is a correlation between file size
> and unlink time: 4.3 GB takes 10 seconds, 20 MB takes less than 0.1 seconds,
> and 0-byte files take no time.
>
> I appreciate your help.
>
> [EMAIL PROTECTED] foo2]$ strace -T -e trace=file rm -f * >../x.log
> execve("/bin/rm", ["rm", "-f", "apps_ts_tx_data_2.270.632954231",
> "apps_ts_tx_data_2.271.632954231", "apps_ts_tx_data_2.272.632954231",
> "apps_ts_tx_data_2.273.632954231", "apps_ts_tx_data_2.274.632954231",
> "apps_ts_tx_data_2.275.632954271", "apps_ts_tx_data_2.276.632954291",
> "apps_ts_tx_data_2.277.632954313", "apps_ts_tx_data_2.278.632954313",
> "apps_ts_tx_data_2.278.632954333", "apps_ts_tx_data_2.279.632954313",
> "apps_ts_tx_data_2.279.632954353", "apps_ts_tx_data_2.280.632954373",
> "ctxd.367.632955010", ...], [/* 32 vars */]) = 0 <0.000292>
> access("/etc/ld.so.preload", R_OK)  = -1 ENOENT (No such file or
> directory) <0.62>
> open("/etc/ld.so.cache", O_RDONLY)  = 3 <0.62>
> open("/lib/tls/libc.so.6", O_RDONLY)= 3 <0.64>
> open("/usr/lib/locale/locale-archive", O_RDONLY|O_LARGEFILE) = 3 <0.70>
> unlink("apps_ts_tx_data_2.270.632954231") = 0 <10.418224>
> unlink("apps_ts_tx_data_2.271.632954231") = 0 <10.691083>
> unlink("apps_ts_tx_data_2.272.632954231") = 0 <9.708125>
> unlink("apps_ts_tx_data_2.273.632954231") = 0 <11.170446>
> unlink("apps_ts_tx_data_2.274.632954231") = 0 <10.192923>
> unlink("apps_ts_tx_data_2.275.632954271") = 0 <9.677868>
> unlink("apps_ts_tx_data_2.276.632954291") = 0 <10.157322>
> unlink("apps_ts_tx_data_2.277.632954313") = 0 <10.624669>
> unlink("apps_ts_tx_data_2.278.632954313") = 0 <10.640957>
> unlink("apps_ts_tx_data_2.278.632954333") = 0 <10.649074>
> unlink("apps_ts_tx_data_2.279.632954313") = 0 <9.764071>
> unlink("apps_ts_tx_data_2.279.632954353") = 0 <9.486272>
> unlink("apps_ts_tx_data_2.280.632954373") = 0 <9.312557>

As I said, it looks like something related to your file system.
What type of file system are you using?  "df -T ." will tell you.

> unlink("ctxd.367.632955010")= 0 <0.051140>
> unlink("ctxd.367.632955011")= 0 <0.078666>
> unlink("ctxd.367.632955012")= 0 <0.057871>
> unlink("ctxd.367.632955013")= 0 <0.051694>
> unlink("ctxd.367.632955014")= 0 <0.084658>
> unlink("ctxd.367.632955015")= 0 <0.047987>
> unlink("ctxd.367.632955016")= 0 <0.049673>
> unlink("ctxd.367.632955017")= 0 <0.098318>
> unlink("ctxd.367.632955018")= 0 <0.059037>
> unlink("ctxd.367.632955019")= 0 <0.049887>
> unlink("ctxd.367.632955020")= 0 <0.092457>
> unlink("ctxd.367.632955021")= 0 <0.060425>
> unlink("ctxd.367.632955022")= 0 <0.045415>
> unlink("ctxd.367.632955023")= 0 <0.067629>
> unlink("ctxd.367.632955024")= 0 <0.068119>
> unlink("ctxd.367.632955025")= 0 <0.044039>
> unlink("ctxd.367.632955026")= 0 <0.048564>
> unlink("ctxd.367.632955027")= 0 <0.034952>
> unlink("ctxd.367.632955028")= 0 <0.056535>
> unlink("ctxd.367.632955029")= 0 <0.073922>
> unlink("ctxd.367.632955030")= 0 <0.050084>
> unlink("ctxd.367.632955031")= 0 <0.076422>
> unlink("ctxd.367.632955032")= 0 <0.083707>
> unlink("ctxd.367.632955033")= 0 <0.062325>
> unlink("ctxd.367.632955034")= 0 <0.056119>
> unlink("ctxd.367.632955035")= 0 <0.056179>
> unlink("ctxd.367.632955036")= 0 <0.061726>
> unlink("ctxd.367.632955037")= 0 <0.170615>
> unlink("ctxd.367.632955038")= 0 <0.173498>
> unlink("ctxd.367.632955039")= 0 <0.088021>
> unlink("ctxd.368.632955010")= 0 <0.045662>
> unlink("ctxd.368.632955011")= 0 <0.050733>
> unlink("ctxd.368.632955012")= 0 <0.051693>
> unlink("ctxd.368.632955013")= 0 <0.054854>
> unlink("ctxd.368.632955014")= 0 <0.061138>
> unlink("ctxd.368.632955015")= 0 <0.048361>
> unlink("ctxd.368.632955016")= 0 <0.084508>
> unlink("ctxd.368.632955017")= 0 <0.062291>
> unlink("ctxd.368.632955018")= 0 <0.110628>
> unlink("ctxd.368.632955019")= 0 <0.064505>
> unlink("ctxd.368.632955020")= 0 <0.080547>
> unlink("ctxd.368.632955021")= 0 <0.058048>
> unlink("ctxd.368.632955022")= 0 <0.063159>
> unlink("ctxd.368.632955023")= 0 <0.068933>
> unlink("ctxd.368.632955024")= 0 <0.045993>
> unlink("ctxd.368.632955025")= 0 <0.042045>
> unlink("ctxd.368.63295

Re: Rm performance issue

2007-09-26 Thread Andreas Schwab
Philip Rowlands <[EMAIL PROTECTED]> writes:

> unlink shouldn't cause much I/O compared to other read/write
> operations, so I'm surprised you only noticed issues with rm.

Deleting a big file can require quite a bit of block reading, depending
on the filesystem and the fragmentation thereof.
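To see that size dependence directly, here is a small sketch (run in a scratch directory on the filesystem of interest) that creates a file and times its removal; on ext3, unlinking must free all of the file's indirect block metadata, so the cost grows with size:

```shell
# Create a ~64 MiB scratch file, then time its unlink.  Repeat with
# larger counts to watch the removal time grow with file size.
dd if=/dev/zero of=scratch.bin bs=1M count=64
time rm -f scratch.bin
```
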

Andreas.

-- 
Andreas Schwab, SuSE Labs, [EMAIL PROTECTED]
SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
PGP key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."




Re: Rm performance issue

2007-09-26 Thread Jim Meyering
"Ken Naim" <[EMAIL PROTECTED]> wrote:
> Thanks for your quick reply.
>
> 1. ext3
> 2. Yes, we are using LVM.

Now that we've established this isn't a problem with rm,
you might want to ask around on ext3- or LVM-specific lists.
You might even want to consider using a different type of
file system.  xfs is supposed to be particularly well-suited
to usage patterns involving very large files.
Also, your kernel is rather old.  Upgrading would probably pull
in much newer ext3 support.

> 3. Yes, we aren't seeing any hardware issues.
>
> I am about to perform an strace of the process and will update you on the
> progress.

Please Cc the mailing list on any reply, i.e., use "Reply-All".
Replying to an individual (and dropping the list Cc:), as you have
done two or three times, is frowned upon in some circles.




RE: Rm performance issue

2007-09-26 Thread Ken Naim
We are using ext3 on top of LVM on an IBM SAN. I don't know the SAN hardware
specifics, although I have been trying to squeeze this info out of the
client for a while.

As for bad I/O experiences: our core production systems use raw devices for
our databases, so they don't have the same issue(s); this is our production
reporting system, which gets cloned over nightly. The process removes all
the existing files and then writes new versions of them from a backup onto a
file system. I have noticed the poor I/O performance since I came onsite, but
the Unix team keeps saying everything is fine. This rm issue is causing the
database clone process to exceed its allocated downtime window, so I thought
I'd start there.

If anyone can point me to any specific information on tuning the ext3 file
system, I'd appreciate it. I am googling it now.

Thanks much for the help,
Ken

-Original Message-
From: Andreas Schwab [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, September 26, 2007 12:51 PM
To: Philip Rowlands
Cc: Ken Naim; bug-coreutils@gnu.org; [EMAIL PROTECTED]
Subject: Re: Rm performance issue

Philip Rowlands <[EMAIL PROTECTED]> writes:

> unlink shouldn't cause much I/O compared to other read/write
> operations, so I'm surprised you only noticed issues with rm.

Deleting a big file can require quite a bit of block reading, depending
on the filesystem and the fragmentation thereof.

Andreas.

-- 
Andreas Schwab, SuSE Labs, [EMAIL PROTECTED]
SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
PGP key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."





[bug #21163] join stops on numeric field if last number before double-digit is missing

2007-09-26 Thread anonymous

URL:
  

 Summary: join stops on numeric field if last number before
double-digit is missing
 Project: GNU Core Utilities
Submitted by: None
Submitted on: Wednesday 09/26/2007 at 18:52 UTC
Category: None
Severity: 3 - Normal
  Item Group: None
  Status: None
 Privacy: Public
 Assigned to: None
 Open/Closed: Open
 Discussion Lock: Any

___

Details:

tested with coreutils-5.2.1-31.4 on RHEL4, coreutils-5.97-12.1.el5 on RHEL5

Looks like join terminates when joining on a numeric field if the last
n-digit number before the first n+1-digit number is missing. E.g., I have 2
sorted files with numbers 1-2000 (and then some data in other fields). If in
one file the line with "999 some data" is missing, the join output will stop
at the line before it. If instead the "998 some other data" line is missing,
join's output continues as expected.

# cat a   (sequence)
7
8
9
10
11
12
# cat b   (sequence with 9 missing)
7
8
10
11
12
# cat c   (sequence with 8 missing)
7
9
10
11
12


# join a b
7
8
<--- where's the rest?
# join a c
7
9
10
11
12





___

Reply to this item at:

  

___
  Message sent via/by Savannah
  http://savannah.gnu.org/





[bug #21163] join stops on numeric field if last number before double-digit is missing

2007-09-26 Thread Jim Meyering

Update of bug #21163 (project coreutils):

  Status:None => Invalid
 Open/Closed:Open => Closed 

___

Follow-up Comment #1:

Thanks for the report, but this isn't a bug.
Join requires that its inputs be sorted, and yours are not.
Compare with the output of sort (without -n).
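To make the reporter's example behave, the inputs must first be put in the order join expects, which is lexicographic (where "10" sorts before "9"). A quick sketch, assuming the a and b files from the report:

```shell
# join requires lexicographically sorted inputs; re-sort (without -n)
# before joining, and the output no longer stops early.
sort a > a.sorted
sort b > b.sorted
join a.sorted b.sorted
```
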

___

Reply to this item at:

  

___
  Message sent via/by Savannah
  http://savannah.gnu.org/





Re: Rm performance issue

2007-09-26 Thread Bauke Jan Douma

Ken Naim wrote on 26-09-07 19:19:

We are using ext3 on top of LVM on an IBM SAN. I don't know the SAN hardware
specifics, although I have been trying to squeeze this info out of the
client for a while.

As for bad I/O experiences: our core production systems use raw devices for
our databases, so they don't have the same issue(s); this is our production
reporting system, which gets cloned over nightly. The process removes all
the existing files and then writes new versions of them from a backup onto a
file system. I have noticed the poor I/O performance since I came onsite, but
the Unix team keeps saying everything is fine. This rm issue is causing the
database clone process to exceed its allocated downtime window, so I thought
I'd start there.

If anyone can point me to any specific information on tuning the ext3 file
system, I'd appreciate it. I am googling it now.

Thanks much for the help,
Ken


Why remove them in the first place?
And, although I'm not that familiar with it, isn't rsync
better suited to this kind of process?

bjd




Re: Rm performance issue

2007-09-26 Thread James Youngman
On 9/26/07, Ken Naim <[EMAIL PROTECTED]> wrote:
> As for bad io experiences, our core production system use raw devices for
> our databases so we don't have the same issue(s), this is our production
> reporting system that gets cloned over nightly. So the process removes all
> the existing files and then writes new versions of them from a backup onto a
> file system. I have noticed the poor io performance since I came onsite but
> the unix team keeps saying everything is fine.

They probably just mean that nothing is reporting an error, unless
you have a performance SLA agreed with them. (If you do have a
performance SLA, you should start by measuring how many blocks of I/O
you are issuing from your host while trying to delete the files.)

> This rm issue is causing the
> database clone process to exceed its allocated downtime window so I thought
> I'd start there.
>
> If anyone can point me to any specific information on tuning ext3 file
> system I'd appreciate it. I am googling it now.

If you are not space limited, you will find it faster in terms of wall
time to move the old files to a staging area (using either mv(1) or
rename(2)) on the same filesystem, launch a file removal process in
the background, and meanwhile create the new files (from the backup
you refer to).  This will probably get you the biggest win (assuming
there are a reasonable number of spindles to spread the I/O load).
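A minimal sketch of that staging-area approach (the directory names are hypothetical; substitute your real paths). rename(2) within one filesystem is near-instant regardless of file size, so the slow unlinks happen off the critical path:

```shell
# Assume $DATADIR holds the old database files (hypothetical path).
DATADIR=/path/directory/subdirectory
# Rename the old files aside; this is cheap within one filesystem.
mkdir -p "$DATADIR/../.pending_delete"
mv "$DATADIR"/* "$DATADIR/../.pending_delete"/
# Delete in the background while the restore from backup proceeds.
rm -rf "$DATADIR/../.pending_delete" &
```
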

Another option here is to issue all the unlink system calls in
parallel.  Again, this assumes that the performance limitation is due
to SAN latency rather than I/O bandwidth.
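One way to sketch the parallel-unlink idea, assuming GNU xargs (for the -P option):

```shell
# Delete everything in the current directory with up to 8 concurrent
# rm processes; worthwhile only if per-unlink latency, not I/O
# bandwidth, is the bottleneck.
printf '%s\0' ./* | xargs -0 -P 8 -n 8 rm -f
```
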

If these files are the only things on this filesystem, the ext3
journalling is gaining you nothing except a limited fsck time. You
could try turning off journalling or moving the journal to a different
LUN. However, if you go down this route you will need to script
something to avoid fsck on startup after an unclean shutdown (e.g. by
making a new, empty filesystem with mkfs).
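For reference, removing the journal can be done with tune2fs. This is an illustrative admin fragment only: the device name is hypothetical, the filesystem must be unmounted first, and you take on the fsck caveats described above:

```shell
# Hypothetical LVM device; unmount before changing features.
umount /path/directory
tune2fs -O ^has_journal /dev/mapper/vg-reportlv   # drop the ext3 journal
e2fsck -f /dev/mapper/vg-reportlv                 # required after the change
mount /dev/mapper/vg-reportlv /path/directory
```
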

Alternatively, if your production database supports point-in-time
recovery, you could just snapshot the LUNs on the production system,
copy the transaction logs to the clone system, attach the snapshot
LUNs to it, switch the snapshots to read-write, perform point-in-time
recovery with the transaction logs, and then use the copy of the
database you have there.

James.




Re: Fwd: Re: error.c: "Unknown system error" should report errno value

2007-09-26 Thread Martin Koeppe


Hi Eric,

On Tue, 25 Sep 2007, Eric Blake wrote:


Martin, [offlist]

I don't know if you saw this when the discussion migrated to gnulib, but


I didn't, thank you.


rather than patching error.c, I went with the option of fixing Interix's
non-POSIX strerror instead (C99 and POSIX require strerror to always
return non-NULL, even on failure).  Perhaps you can give it a spin, since
I don't have an Interix environment set up?


Hold off on the change! I just tested strerror() and strerror_r() on
Interix, and they already report "Unknown error: 4294967294" for e.g.
-2. Sorry.


I saw the unresolved errno only in a very special case, i.e. when
running make in a chroot, where make calls an rm that open()'s a
directory. Not always, but I can reproduce it. For example, it stops
when building the coreutils man pages with /bin/rm being the
just-built rm. The open() fails in this case with errno = -1, while it
should report a useful error if it fails; but it should have no reason
to fail here. Then strerror() also fails. -1 isn't a documented errno
value for open().


Everything is fine when running in the chroot without make, or with
make outside a chroot, so strerror() isn't exercised there. (I now get
the same behaviour with "sed -i" when it opens its temp file within
the chroot.)


While trying to debug this I noticed the missing translation of
errno -1. But I now think it is probably a memory-corruption issue or
something like that, so that strerror()/strerror_r() gets confused,
too.


Sorry for the inconvenience.

OTOH, rm from coreutils 5.97 works on Interix, while the current 
6.10pre doesn't. I'm still investigating...


BTW: Interix has locales. While I build on 3.5 and never saw nor cared
about locales (though I always built with libintl/libiconv), on 5.2
(2003R2) I suddenly saw translated messages from the binaries I had
built earlier for 3.5.


And yes, I'll of course try a new coreutils/gnulib version (but I
think in this case I shouldn't yet). Are there any coreutils snapshot
.tar.gz files available?



Martin

