Frank Steinmetzger wrote:
> On Sun, Oct 08, 2023 at 07:44:06PM -0500, Dale wrote:
>
>> Just as an update.  The file system I was trying to check was my
>> large one, about 40TBs worth.  While running the file system check,
>> it started using HUGE amounts of memory.  It used almost all of my
>> 32GBs and most of swap as well.  It couldn't finish due to not having
>> enough memory; it literally crashed itself.  So, I don't know if this
>> is because of some huge problem or what, but if this is expected
>> behavior, don't try to do a file system check on devices that large
>> unless you have a LOT of memory.
> Or use a different filesystem. O:-)

I'm using ext4, which is said to be one of the most reliable and widely
used file systems.  I do wonder, tho: am I creating file systems that
are too large, or that fsck just has trouble with?  I doubt it, but I'm
up to about 40TBs now.  I just can't figure out a way to split that
data up yet.
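
One thing I did run across that might help next time, tho I haven't
tried it yet: e2fsck can be told to keep its big tables in scratch
files on disk instead of in RAM, through /etc/e2fsck.conf (the
[scratch_files] section in man e2fsck.conf).  Something like this,
where the directory is one you create beforehand:

  [scratch_files]
  directory = /var/cache/e2fsck

It's supposed to make the check slower, but it should keep it from
eating all 32GBs plus swap.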


>> I ended up recreating the LVM devices from scratch and redoing the
>> encryption as well.  I have backups tho.  This all started when using
>> pvmove to replace a hard drive with a larger drive.  I guess pvmove
>> isn't always safe.
> I think that may be a far-fetched conclusion. If it weren’t safe, it 
> wouldn’t be in the software – or at least not advertised as safe.
>

Well, something went sideways.  Honestly, I think it might not be
pvmove; something may have happened within the file system itself.
After all, LVM wasn't complaining at all, and everything showed the
move completed with no errors.  I guess it is possible pvmove had a
problem, but given it was the file system that complained so loudly,
I'm leaning toward it being a file system issue.


>> P.S.  I currently have my backup system on my old Gigabyte 770T mobo
>> and friends.  It is still a bit slower than copying when no encryption
>> is used, so I guess encryption does slow things down a bit.  That
>> said, the CPU does hang around 50% most of the time.  htop doesn't
>> show what is using that, so it must be IO or encryption.
> You can add more widgets (“meters”) to htop; one of them shows disk 
> throughput.  But there is none for I/O wait.  One tool that does show 
> that is glances, and also dstat, which I mentioned a few days ago.  Not 
> only can dstat tell you the total percentage, but also which process is 
> the most expensive one.
>
> I set up bash aliases for different use cases of dstat:
> alias ,d='dstat --time --cpu --disk -D $(ls /dev/sd? /dev/nvme?n? /dev/mmcblk? 2>/dev/null | tr "\n" ,) --net --mem --swap'
> alias ,dd='dstat --time --cpu --disk --disk-util -D $(ls /dev/sd? /dev/nvme?n? /dev/mmcblk? 2>/dev/null | tr "\n" ,) --mem-adv'
> alias ,dm='dstat --time --cpu --disk -D $(ls /dev/sd? /dev/nvme?n? /dev/mmcblk? 2>/dev/null | tr "\n" ,) --net --mem-adv --swap'
> alias ,dt='dstat --time --cpu --disk -D $(ls /dev/sd? /dev/nvme?n? /dev/mmcblk? 2>/dev/null | tr "\n" ,) --net --mem --swap --top-cpu --top-bio --top-io --top-mem'
>
> Because I attach external storage once in a while, I use a dynamic list of 
> devices to watch that is passed to the -D argument. If I don’t use -D, dstat 
> will only show a total for all drives.
>
> The first is a simple overview (d = dstat).
>
> The second is the same but only for disk statistics (dd = dstat disks). I 
> use it mostly on my NAS (five SATA drives in total, which creates a very 
> wide table).
>
> The third shows more memory details like dirty cache (dm = dstat memory), 
> which is interesting when copying large files.
>
> And the last one shows the top “pigs”, i.e. expensive processes in terms of 
> CPU, IO and memory (dt = dstat top).
>
>> Or something kernel
>> related that htop doesn't show.  No idea. 
> Perhaps my tool tips give you ideas. :)
>
> -- 
> Grüße | Greetings | Salut | Qapla’
> Please do not share anything from, with or about me on any social
> network.
>
> What is the difference between two flutes? – A semitone.


Dang, I have a lot of drives here to add to all that.  Bad thing is,
every time I reboot, all but two of them tend to move around, even tho
I haven't moved anything.  This is why I use either labels or UUIDs, by
the way.  Ages ago, I saw a way to make commands/scripts see all the
drives on a system with some sort of inclusive trick.  I think it used
brackets, but I'm not sure.  I can't find that trick anymore.  I should
have saved that thing.
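
Maybe it was shell globbing with character classes?  I'm not sure this
is the exact trick I saw, and the device-name patterns below are only
examples, but something along these lines would match every drive that
actually exists:

  # character-class globs only expand to device nodes that exist
  ls /dev/sd[a-z] /dev/nvme[0-9]n[0-9] 2>/dev/null

  # brace expansion generates every name whether it exists or not,
  # so the error redirect does the filtering
  ls /dev/sd{a..z} 2>/dev/null

  # or skip globbing and ask lsblk for the whole-disk devices
  lsblk -dpno NAME

The lsblk line would also sidestep the drives-moving-around problem,
since it just reports whatever is there on that boot.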

I used some command, tho I can't recall which one, and I think it is
the kernel itself using most of that CPU time.  Given when it happens,
I think it is either processing the encryption or working to send the
data to the disks, or both.  I'd suspect both, but I dunno.
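
If I want to find out whether the encryption itself is the bottleneck,
something like this might narrow it down (assuming cryptsetup is
installed, since that's what sets up the encrypted devices anyway):

  # does the CPU advertise hardware AES support?
  grep -m1 -o aes /proc/cpuinfo

  # measure raw cipher speed in memory, with no disks involved
  cryptsetup benchmark

If the benchmark numbers are way above what the drives can do, the
slowdown is probably IO rather than encryption.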

Anyway, I'm restoring from a fresh LVM rebuild now, so there's no way
to test anything to see what the problem was.

Dale

:-)  :-) 
