Yes, I did try that. Doesn't make much of a (speed) difference.

It seems that the problem is less that rm gets stuck for good than that it takes really long breaks (about 20 seconds) while deleting. During those breaks the whole partition is stuck, and iostat reports 100% utilization, compared to ~95% while it is actually deleting files. Could the "hang time" be DRBD writing its metadata (internal, in my case) and blocking every other access until the metadata has been written to disk? Of course the ext3 journal also has to be written, but I still don't see why it should take that long: I'm currently timing how long it takes to delete a subdirectory with 285868 block-sized files in it (already more than 30 minutes).
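One way to confirm that the deletion really pauses periodically is to remove the files in fixed-size batches, log how long each batch takes, and watch `iostat -x 5` in a second terminal at the same time. The following is a minimal sketch under that assumption. `delete_in_batches` is a hypothetical helper name, and the sketch relies on GNU find, xargs and date (`-print0`, `-0`, `+%s`). Timing uses whole seconds, so a stall of about 20 seconds would show up as a clear outlier line in the log.

```bash
#!/bin/bash
# Hypothetical helper (not part of any package): delete the regular files
# under DIR in batches of BATCH names and print how long each batch took.
# Assumes GNU find/xargs (-print0 / -0) and GNU date (+%s).
delete_in_batches() {
    local dir=$1 batch=${2:-1000}
    # -print0/-0 keep unusual file names intact; -n caps the names per rm call.
    find "$dir" -type f -print0 |
        xargs -0 -n "$batch" sh -c '
            start=$(date +%s)
            rm -f -- "$@"
            end=$(date +%s)
            # $# is the number of file names handed to this batch.
            echo "$(date +%H:%M:%S) removed $# files in $((end - start)) s"
        ' sh
}
```

Usage (the path is a placeholder): `delete_in_batches /path/to/subdir 1000 | tee rm-timing.log`. A small batch size gives finer resolution on when the stalls happen, at the cost of starting more rm processes.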

dmesg is clean, so it does not seem to be a SATA reset.

Any other ideas?




On 2011-01-28 20:02, Moti Levy wrote:
Have you tried:
find dirname -type f -exec rm {} \;
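For comparison, find(1) offers three ways to remove the files, and they differ mainly in how many processes are started. The suggestion above forks one rm per file. The POSIX `-exec ... {} +` form packs many names into each rm call. GNU find's `-delete` unlinks the files itself. The wrapper names below are hypothetical and are only there for illustration. All three still perform one unlink per file, including the associated journal and metadata writes, so they reduce process overhead but are not expected to remove I/O stalls like the ones described at the top of this mail.

```bash
#!/bin/bash
# Hypothetical wrappers around find(1); each takes the directory as $1
# and removes only regular files, leaving the (empty) directories behind.

# Suggestion from the thread: one rm process per file.
rm_each() { find "$1" -type f -exec rm -f {} \; ; }

# POSIX '+' terminator: as many file names per rm call as fit on a command line.
rm_batched() { find "$1" -type f -exec rm -f {} + ; }

# GNU find extension: find calls unlink itself, so no rm processes at all.
rm_delete() { find "$1" -type f -delete ; }
```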


On Fri, Jan 28, 2011 at 1:46 PM, Joseph Hauptmann <[email protected]> wrote:

    Hello DRBD-users worldwide...

    I've been using DRBD for almost a year now, so far without any
    problems that I couldn't resolve myself.
    But now I've run into quite a serious problem, and I'd like to know
    whether anyone else has experienced something similar, with or
    without DRBD (as of course I can't be sure that DRBD is the problem):

    A few months ago a colleague of mine forgot to activate a cronjob
    that deletes a couple of thousand very small temporary files each
    night on a DRBD device. Now I have a directory with, I'd guess, more
    than a million files, which wouldn't be so bad if rm -rf {dir}/
    could delete it. Sadly, that is not the case.
    rm gets stuck after deleting a few hundred files and doesn't
    resume. Furthermore, all I/O on the DRBD device is completely
    stuck until the rm process is killed.

    I've already disconnected all resources from their peers and shut
    down most of the non-essential services on the machine.

    It's running Debian Lenny with

    uname -a
    Linux srv1.xxx.at 2.6.26-2-openvz-amd64 #1 SMP Wed May 12 18:14:56 UTC 2010 x86_64 GNU/Linux

    cat /proc/drbd
    version: 8.3.7 (api:88/proto:86-91)
    GIT-hash: ea9e28dbff98e331a62bcbcc63a6135808fe2917 build by [email protected], 2010-03-28 21:47:13
     0: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r----
        ns:1875795496 nr:0 dw:225995436 dr:566154981 al:105639961 bm:11019801 lo:2 pe:0 ua:0 ap:1 ep:1 wo:b oos:1242040
     1: cs:StandAlone ro:Secondary/Unknown ds:UpToDate/DUnknown   r----
        ns:0 nr:31796784 dw:31796784 dr:2253416 al:0 bm:1134 lo:0 pe:0 ua:0 ap:0 ep:1 wo:d oos:0
     2: cs:StandAlone ro:Secondary/Unknown ds:UpToDate/DUnknown   r----
        ns:0 nr:57709884 dw:143774088 dr:8480 al:0 bm:50 lo:0 pe:0 ua:0 ap:0 ep:1 wo:d oos:0

    The filesystem on resource 0 is ext3 with a block size of 4096
    and sits on a software RAID5 (far from ideal, I know).


    At the moment I'm using a bash hack that kills the rm process every
    30 seconds and restarts it as long as the directory still exists.
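The workaround described above could look roughly like the sketch below. The original script was not posted, so this is a reconstruction under assumptions. `rm_with_restarts` is a hypothetical name, the 30-second default matches the interval mentioned in the mail, and the loop needs bash (`for ((...))`, `kill -0`). An rm process blocked in uninterruptible disk I/O only exits after that I/O completes, so `wait` may block for a while after the kill.

```bash
#!/bin/bash
# Reconstruction (not the original script): run rm -rf on DIR, kill it after
# INTERVAL seconds, and start it again until DIR no longer exists.
# Note: loops forever if DIR can never be removed (e.g. permission errors).
rm_with_restarts() {
    local dir=$1 interval=${2:-30} pid i
    while [ -d "$dir" ]; do
        rm -rf -- "$dir" &
        pid=$!
        # Check once per second so a finished rm does not cost the full interval.
        for ((i = 0; i < interval; i++)); do
            kill -0 "$pid" 2>/dev/null || break
            sleep 1
        done
        # Still running after INTERVAL seconds: stop it (no-op if it already exited).
        kill "$pid" 2>/dev/null
        # Reap the rm process before the next round starts.
        wait "$pid" 2>/dev/null
    done
}
```

Usage (the path is a placeholder): `rm_with_restarts /path/to/dir 30`.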

    Thanks for any hints as to what might be causing this problem.

    Joe

-- Joseph Hauptmann

    /digiconcept/ - GmbH.
    1080 Wien
    Blindengasse 52/1

    Tel. +43 1 218 0 212 - 24
    Fax +43 1 218 0 212 - 10

    _______________________________________________
    drbd-user mailing list
    [email protected]
    http://lists.linbit.com/mailman/listinfo/drbd-user



