yes, 2 * 8g partitions on separate disks, so i have 16g swap

but the issues i see aren't "usual" memory problems. even building rust works, it 
just takes 10g of swap and an entire day. nothing fails. nothing lags

but what i see is the kernel taking too much, as far as i understand. i can't 
figure out when and how

it all seems to originate from git. but what could it possibly do that makes it 
a perfect stress test tool? number of files in a single directory? file size? 
count? type and speed of the access calls? order?

basically don't touch git, esp. large repos

but then dovecot, also running there, nearly caused similar issues too. all i 
know is that maildirs are also a ton of small files

i don't know what's happening, or whether it should be happening or not

during earlier tests with git on the ports tree, which had a ton of changes, i 
observed that arc stayed quite low but wired went super high, super fast. nothing 
was swapped out

iirc that was unswappable kernel memory of a kind
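
fwiw, something like this should show that in numbers (i believe these sysctl 
names are right on 13.4, but they may differ per version):

        # arc size in bytes
        sysctl kstat.zfs.misc.arcstats.size
        # total wired, in pages (multiply by hw.pagesize)
        sysctl vm.stats.vm.v_wire_count hw.pagesize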

funnily, git without limits took the machine down, but git with limits had no 
real speed reduction while not taking it down
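
by limits i mean the git config quoted further down. a blunter option, which i 
haven't tested, would be a plain rlimit via limits(1), hypothetical values:

        # cap git's address space so a runaway pull fails instead of eating the box
        limits -v 1g git -C /usr/ports pull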

i even tried to figure out where the kernel memory might be going but couldn't 
work that out either. each part seemed to have normal, small usage, including zfs

i wonder what's the best way to look into that?
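
in case someone wants exact numbers, these are the kind of dumps i can run and 
paste (as far as i know they all exist on 13.4):

        # kernel malloc usage by type
        vmstat -m
        # uma zones, where arc buffers and zfs metadata structures live
        vmstat -z
        # full arc breakdown
        sysctl kstat.zfs.misc.arcstats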

and is this expected or not. i don't know!

the question of swap would matter if i were running out of memory in userland 
or so

unsure what runs out. maybe it's even fine. but i refuse to believe it's 
expected?

unsure if it could even be fixed without also affecting zfs performance. there 
are also massive servers, and people would be pissed if fixes for puny-little-box 
issues affected them. but why can't it be dynamic or so?

i recall zfs being worse before. first it required a ton of ram, like 4g minimum, 
10 years ago or so. then that was somehow reduced. then r/w speed was low, like 
40mb/s. then that was fixed too. what's this now?

how to even figure out what kind of memory is exhausted? and why would i need to 
tune any of this, as the system would know the installed ram size

i kind of have trouble imagining a system where the whole ram goes to the kernel. 
maybe there is one

i tried to look at what else besides arc could be limited but couldn't find 
anything. i don't even know what happens
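
one way to at least enumerate the knobs is to dump the whole subtree (assuming 
the vfs.zfs.arc name; it moved around between versions):

        # list every arc-related tunable the kernel exposes
        sysctl vfs.zfs.arc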

all i know is this isn't the usual case of ram running low, where everything 
grinds to a halt until things swap out and eventually get killed

i even tested that. if i specifically try to allocate a ton of memory from 
userland, arc reduces properly, wired goes down, etc, and eventually something 
gets killed off, usually the offending process
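
for completeness, that test was basically a big anonymous allocation held for a 
while. something like sysutils/stress from ports does it (a sketch, not the 
exact command i ran):

        # grab 3g of anonymous memory and hold it for a minute;
        # arc should shrink and wired should drop while this runs
        stress --vm 1 --vm-bytes 3g --vm-keep --timeout 60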

but this is something else i don't fully get. as i use zfs, i blame it. maybe i 
should not



On April 22, 2025 6:49:43 PM GMT+03:00, Rick Macklem <rick.mack...@gmail.com> 
wrote:
>I wouldn't normally top post, but all I have is a generic question.
>
>Do you have a swap partition setup?
>(I'd use something like 6-8Gbytes for a 4Gbyte system.)
>
>rick
>
>On Tue, Apr 22, 2025 at 8:23 AM Sulev-Madis Silber
><freebsd-current-freebsd-org...@ketas.si.pri.ee> wrote:
>>
>> well i don't have those errors anymore so there's nothing to give
>>
>> i've tried to tune arc but it didn't do anything so i took those things off 
>> again
>>
>> right now i'm looking at
>>
>> ARC: 1487M Total, 1102M MFU, 128M MRU, 1544K Anon, 56M Header, 199M Other
>>      942M Compressed, 18G Uncompressed, 19.36:1 Ratio
>>
>> and wonder wtf
>>
>> i bet there's an issue somewhere and i somehow can't properly recreate it. under 
>> memory pressure it does resize arc down properly, so it seems like i don't need 
>> any limits
>>
>> and there's no tmpfs. it would be useless at such low memory sizes
>>
>> the problem is that i can't figure out what all those problems are, how to 
>> recreate those conditions, and how to work around them or maybe find bugs. i also 
>> don't have enough hw to test it on separately. unless i can maybe try it in a 
>> tiny 512m vm. and then i would need to know what to try
>>
>> i also don't know why those git settings help me:
>>
>> [core]
>>         packedGitWindowSize = 32m
>>         packedGitLimit = 128m
>>         preloadIndex = false
>> [diff]
>>         renameLimit = 16384
>>
>> how to tune it from some global place. and so on. and why would it even need so 
>> much fiddling? zfs has indeed improved a lot, previously it was quite a hell 
>> to use
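>>
>> i guess the global place would be the system-level config, something like this 
>> (assuming the freebsd git port reads /usr/local/etc/gitconfig, which i haven't 
>> verified):
>>
>>         git config --system core.packedGitWindowSize 32m
>>         git config --system core.packedGitLimit 128m
>>         git config --system core.preloadIndex false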
>>
>> i don't even know if this is related to mmap. even then, i don't really get 
>> what that function even does. hence the "zfs (?) issue" subject. it might not 
>> even be zfs at all
>>
>> there are probably multiple combined issues here
>>
>> i also don't really buy the idea that a ton of ram would automatically fix this
>>
>> so yeah unsure what to think of this
>>
>> some of the issues i found are ones that others also have. some of them seem new
>>
>> some fixes looked like trial and error, and nobody even seemed to know what was 
>> wrong. granted, that was a forum, so maybe it's better here?
>>
>> i mean i have used below-average equipment my entire life, and the usual way to 
>> cope with that is to just give it more time. add more swap and just wait
>>
>> i think someone tested my git issues in a 4g vm and found no issues at all? 
>> other things seem like only i have them
>>
>> i also find it kind of confusing that if this is hw, i don't see any other 
>> issues
>>
>> this is not the first time i have found something confusing in fbsd that later 
>> turned out to be a bug and was further tested and fixed by others
>>
>> hence the current mailing list, so maybe someone else has ideas, or it already 
>> has a fix. and i hope there are people with much larger labs who could easily 
>> tell / test things
>>
>> so in the end,
>>
>> 1) why should git on a large repo cause the machine to run out of memory, 
>> instead of just being as slow as it needs to be
>>
>> 2) why / what fs operations could cause a low-power machine to mysteriously 
>> fail on zfs, when the expected result would be slow fs behaviour
>>
>> i don't know what really happens, and all the memory management that goes on in 
>> the kernel is way too complex for me to get. i only have this wild guess that 
>> any type of caching should happen in "leftover" ram and make things faster if 
>> possible. and any fs operations the kernel has already reported as completed 
>> can't suddenly be found incomplete later. whatever that fs-related stray 
>> buildworld error was, it somehow resolved itself. and i don't know what of it i 
>> can recreate
>>
>> and i'm not an expert in this, so how do i even know?
>>
>> what's fun is how running rsync over several tb's of data doesn't seem to 
>> cause any issues at all. this is still the same machine many would not 
>> recommend using for this. different workload?
>>
>> hell knows what all this is. maybe later i could figure it out, or actually 
>> save some logs. i didn't save those, as i assumed it would repeat itself. it 
>> didn't, and it scrolled off the tmux window history
>>
>> oh well. yes, this is a questionable report, but these are "heisenbugs" as 
>> well. at least some of them?
>>
>>
>>
>> On April 22, 2025 3:52:11 PM GMT+03:00, Ronald Klop <ronald-li...@klop.ws> 
>> wrote:
>> >Hi,
>> >
>> >First, instead of writing "it gives vague errors", it really helps others 
>> >on this list if you can copy-paste the errors into your email.
>> >
>> >Second, as far as I can see FreeBSD 13.4 uses OpenZFS 2.1.14. FreeBSD 14 
>> >uses OpenZFS 2.2.X which has bugfixes and improved tuning, although I 
>> >cannot claim that will fix your issues.
>> >What you can try is to limit the growth of the ARC.
>> >
>> >Set "sysctl vfs.zfs.arc_max=1073741824" or add this to /etc/sysctl.conf to 
>> >set the value at boot.
>> >
>> >This will limit the ARC to 1GB. I used similar settings on small machines 
>> >without really noticing a speed difference, while usability increased. You 
>> >can play a bit with the value. Maybe 512MB will even be enough for your use 
>> >case.
>> >
>> >NB: sysctl vfs.zfs.arc_max was renamed to vfs.zfs.arc.max with arc_max as a 
>> >legacy alias, but I don't know if that already happened in 13.4.
>> >
>> >Another thing to check is the usage of tmpfs. If you don't restrict the max 
>> >size of a tmpfs filesystem it will compete for memory. Although this will 
>> >also show an increase in swap usage.
>> >
>> >Regards,
>> >Ronald.
>> >
>> >
>> >From: Sulev-Madis Silber <freebsd-current-freebsd-org...@ketas.si.pri.ee>
>> >Date: Monday, 21 April 2025 03:25
>> >To: freebsd-current <freebsd-current@freebsd.org>
>> >Subject: zfs (?) issues?
>> >>
>> >> i have a long-running issue on my 13.4 box (amd64)
>> >>
>> >> others don't get it at all and only suggest adding more than 4g of ram
>> >>
>> >> it manifests as some mmap or other problems i don't really get
>> >>
>> >> basically unrestricted git consumes all the memory. i had to turn the 
>> >> watchdog on because something like a git pull on the ports tree causes the 
>> >> kernel to take 100% of ram. it keeps killing userland off until it's just 
>> >> the kernel running there happily. it never panics, and killing off userland 
>> >> obviously makes the problem disappear, since nothing will do any fs 
>> >> operations anymore
>> >>
>> >> dovecot without tuning or with some tuning tended to do this too
>> >>
>> >> what is it?
>> >>
>> >> now i noticed another issue. if i happen to do too many src git pulls in 
>> >> a row, they never actually "pull" anything. and / or if i clean my obj tree 
>> >> out, i can't run buildworld anymore. it gives vague errors
>> >>
>> >> if i wait a little before starting buildworld, it always works
>> >>
>> >> what could possibly be happening here? the way buildworld fails means 
>> >> there's a serious issue with the fs. and how could it be fixed by waiting? 
>> >> it means that some fs operations are still going on in the background
>> >>
>> >> i have no idea what's happening here. zfs doesn't report any issues. nor 
>> >> does the storage. nothing was killed due to out of memory, but arc usage 
>> >> somehow increased a lot. and its compression ratio went weirdly high, like 
>> >> ~22:1 or so
>> >>
>> >> i don't know whether it's acceptable zfs behaviour when it runs low on 
>> >> memory or not. how to test it. etc. and whether this is fixed on 14, on 
>> >> stable, or on current. i don't have enough hw to test it on all of them
>> >>
>> >> i have done other stuff on that box that might also be improper for the 
>> >> amount of ram i have there, but then it's just slow, nothing fails like this
>> >>
>> >> unsure how this could be fixed or tuned or something else. or why it 
>> >> behaves like this, as opposed to the usual low-resource issues that just 
>> >> mean you need more time
>> >>
>> >> i mean it would be easy to add huge amounts of ram, but people could also 
>> >> want to use zfs in slightly less powerful embedded systems, where lack of 
>> >> power is expected but weird failures maybe are not
>> >>
>> >> so is this a bug? a feature? something fixed? something that can't be 
>> >> fixed? what would be an acceptable ram size? 8g? 16g? and why can't it just 
>> >> tune everything down and become slower, as expected
>> >>
>> >> i tried to look up any openzfs-related bugs, but zfs is huge and i'm 
>> >> not an fs expert either
>> >>
>> >> i also don't know what happens while i wait. it doesn't show any serious 
>> >> io load. no cpu is taken. load is down. the system is responsive
>> >>
>> >> it all feels like bug still
>> >>
>> >> i have wondered if this is second-hand hw acting up, but i checked and 
>> >> tested it as best as i could. and why would it only bug out when i try 
>> >> more complex things on zfs?
>> >>
>> >> i'm curious about using zfs on super low memory systems too, because it 
>> >> offers certain features. maybe we could fix this if the whole issue is ram. 
>> >> or if it's elsewhere, maybe that too
>> >>
>> >> i don't know what to think of all this. esp. the last issue. i'm not 
>> >> really alone here with the earlier issues, but unsure
>> >>
>> >>
>> >>
>> >
>>
>
