Re: FreeBSD 13.2-RELEASE started failing to load i915kms.ko after upgrade from RC5
On 2023-04-09, Yoshihiro Ota wrote:
> Hi,
>
> I've been following releng/13.2 since it was branched.
> I use amd64 arch for this.
>
> I had built kernel modules during BETA/RC period.
> The above i915kms had worked until RC5.
> I had not built RC6 locally and picked up RELEASE on releng/13.2.
[...]
> Any hints or same experiences?

This is a bit strange - I had that same issue between RC2 and RC3. And there the commit logs show that somebody did explicitly change the version number (from 1302000 to 1302001), for whatever reason, which I didn't fully grok. It seems the modules do not like such a change and have to be rebuilt.

That version number is visible with pkg info:

> Annotations:
>         FreeBSD_version: 1302000

It is also present in the base installation ...

$ grep FreeBSD_version /usr/include/sys/param.h
#define __FreeBSD_version 1302000	/* Master, propagated to newvers */

... and in the kernel source:

$ grep FreeBSD_version /usr/src/sys/sys/param.h
#define __FreeBSD_version 1302000	/* Master, propagated to newvers */

And somewhere it lingers also in the kernel itself:

$ strings /boot/kernel/kernel | grep 13020
1302000
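As a sketch of how one might compare these numbers mechanically: the awk field index is an assumption about the param.h layout, and the literal line below is the one quoted above, so the example is self-contained rather than run against a live system.

```shell
# extract the numeric version from a param.h-style line; shown on the
# literal line quoted above so this runs anywhere
line='#define __FreeBSD_version 1302000 /* Master, propagated to newvers */'
ver=$(printf '%s\n' "$line" | awk '/__FreeBSD_version/ {print $3}')
echo "$ver"
# prints: 1302000

# on a live system, the same field can be pulled from the header and
# compared with what the running kernel reports:
#   awk '/__FreeBSD_version/ {print $3}' /usr/include/sys/param.h
#   sysctl -n kern.osreldate
```

If the two disagree after a releng update, rebuilt modules are in order.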
Re: Camcontrol question related to Seagate disks
On 2023-06-06, Karl Denninger wrote:
> Certain "newer" Seagate drives have an EPC profile that doesn't interact
> as expected with the camcontrol "standard" way to tell spinning disks to
> go into an idle state.
>
> Specifically those that support this:
> https://www.seagate.com/files/docs/pdf/en-GB/whitepaper/tp608-powerchoice-tech-provides-gb.pdf

"This" whole lengthy babble sounds just like EPC.

> The usual has been "camcontrol idle da{x} -t 600" has typically resulted
> in a 10 minute timeout, after which it goes into low power

I entertain quite a zoo of disks of various brands and ages, and my impression is that about a third of them might behave as "typically" expected in that regard. And practically every model behaves differently, some of them in obscure and unexpected ways.

Sadly I don't have actual SAS devices available (and it is not clear to me whether Yours is SAS or SATA). With SATA You can send low-level commands to the disk via 'camcontrol cmd' - *IF* you manage to figure out what these commands should read.

> (sometimes
> you want "standby" rather than "idle" depending on the recovery time and
> power mode you're after and the specifics of the drive in question.)

And which one would You want? Short abstract of the crap:

- deskstar/ultrastar older models may have a two-level timer variable with separate values for idle and stop.
- WD (whatever rainbow) may have the timer value hidden behind a "vendor specific" gate, as described on truenas ("hacking green and red...").
- ultrastar (newer) may have EPC, but the timer values have only vague realtime resemblance.
- seagate (consumer) may be configured to kill themselves with an incredible amount of almost immediate parkings.
- ...

I for my part got tired of the whole stuff, and there is a little tool in ports, sysutils/gstopd, that can easily be expanded to handle SATA, and then the machine (and not the disk) will control when the disk is to stop.
(ask me for patch)

> The reason is that /it appears/ /these drives, on power-up, do not
> enable the timers/ until and unless you send a SSU "START" with the
> correct power conditioning bits. Specifically, the power conditioning
> value of "7h" has to be specified. If it's not, then the EPC timers are
> present but, it appears, they are not used.

If You can figure out how this command should actually look byte-wise, you can probably send it.

> Does anyone know the proper camcontrol command to do this? The "start"
> command sent when the system spins up does not appear to do so. If I
> send an "idle" or "standby" to the drive with a timeout it takes
> effect

That stays unclear to me. "start" is SCSI, "idle" and "standby" is SATA.

> immediately but any access to it spins it up (as expected) and it does
> not re-enter the lower-power mode on the timer, implying that the SSU
> command did not enable the timers, and thus they remain inactive even
> though they ARE set and camcontrol does report them.

Hm, what is SSU? Staggered Spin-Up? I'm trying to configure delayed spin-up with this one on plain SATA:

# camcontrol cmd /dev/ada2 -a "EF 06 00 00 00 00 00 00 00 00 00 00"

but that doesn't really work, because the device driver has its own ideas about when to taste the device...

No, wait...

> To allow unlimited flexibility in controlling the drive's
> PowerChoice technology feature, the Start/Stop Unit (SSU) SCSI
> command can be used.

Hm... so this would be the *SCSI* command 0x1B... then this one should work (here is the 'STOP' incantation; replace with Your proper bit values according to that Seagate paper):

# /sbin/camcontrol cmd /dev/xxx -c "1B 01 00 00 00 00"

HTH
PMc
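Byte 4 of that 0x1B CDB can be computed rather than guessed. A minimal sketch, assuming the standard SBC layout of START STOP UNIT (POWER CONDITION field in bits 7:4 of byte 4, START in bit 0, LOEJ in bit 1) and taking the "7h" value from the quoted mail and the Seagate paper, not from me:

```shell
# START STOP UNIT (opcode 1B): compose byte 4 from the POWER CONDITION
# field (bits 7:4) and the START bit (bit 0)
pc=7        # power condition value per the Seagate paper
start=1     # START bit set
byte4=$(printf '%02X' $(( (pc << 4) | start )))
echo "1B 00 00 00 $byte4 00"
# prints: 1B 00 00 00 71 00
```

So the "enable the timers" incantation would presumably be `camcontrol cmd /dev/xxx -c "1B 00 00 00 71 00"` - untested here, so check the bit values against the Seagate paper before sending it to a real disk.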
Re: EARLY_AP_STARTUP now (effectively) mandatory?
On 2023-08-07, Garrett Wollman wrote:
> This option was apparently added in 2016 by jhb@, and in his
> PHabricator description, he wrote:
>
>     As a transition aid, the new behavior is moved under a new
>     kernel option (EARLY_AP_STARTUP). This will allow the option
>     to be turned off if need be during initial testing. I hope to
>     enable this on x86 by default in a followup commit and to have
>     all platforms moved over before 11.0. Once the transition is
>     complete, the option will be removed along with the
>     !EARLY_AP_STARTUP code.

I remember reading that stance, so probably I ran into this one once as well. It seems we who build our own custom kernels are becoming a minority. :(
Re: Interesting (Open)ZFS issue
On 2023-08-13, Garrett Wollman wrote:
> This seems to me like a bug: `zpool scrub` correctly identified the
> damaged parts of the disk, so ZFS knows that those regions of the pool
> are bad in some way -- they should cause an error rather than a panic!

Yes, but so it does. On seriously inconsistent data - and zerofill is seriously inconsistent - it can behave badly. I think one can hardly code for and catch every possible exception while still providing excellent performance.

OTOH, I adopted ZFS very early for my database, and I am usually running on scrap hardware, but it never gave me a real data loss issue.

cheers,
PMc
Re: vfs.zfs.compressed_arc_enabled=0 is INCOMPATIBLE with L2ARC at least in FreeBSD 13 (Was: Crash on adding L2ARC to raidz1 pool)
On 2024-01-13, Alexander Burke wrote:
> Hello,
>
> It looks like the issue is fixed in OpenZFS 2.2 (and thus in FreeBSD
> 14-RELEASE):
>
> https://github.com/openzfs/zfs/issues/15764#issuecomment-1890491789
>
> Cheers,
> Alex
>
> Jan 13, 2024 12:26:50 Lev Serebryakov:
>> On 08.01.2024 18:34, Lev Serebryakov wrote:
>>
>> I've found that all my L2ARC problems (live-locks and crashes) are the
>> result of an OpenZFS bug which cannot support L2ARC with an
>> uncompressed ARC (vfs.zfs.compressed_arc_enabled=0).
>>
>> It is NOT hardware-dependent (and my NVMe is perfectly OK and healthy)
>> and can be easily reproduced under a VM with all-virtual disks.
>>
>> I've opened a ticket in the OpenZFS project
>> (https://github.com/openzfs/zfs/issues/15764).
>>
>> Maybe FreeBSD needs an ERRATA entry?
>>
>> Previous threads:
>>
>> [1] ZFS pool hangs (live-locks?) after adding L2ARC
>> [2] Crash on adding L2ARC to raidz1 pool
>>
>> --
>> // Lev Serebryakov

Just for the record, there is a note in my loader.conf:

vfs.zfs.compressed_arc_enabled="1"  # 27.7.17: since R11.1 l2_cksum_errs if 0

Apparently I didn't bother to open a ticket, since the general stance was that one shouldn't mess with the defaults. The more interesting question might be how we managed to progress from checksum errors (that were otherwise harmless) to "live-locks and crashes" ;)
Re: git log - how to find out latest stable/14 breakage
On 2024-01-20, Harry Schmalzbauer wrote:
> How can you all manage your daily jobs with git?!?! For me as a

Daily jobs? Hm, maybe that's the mistake. This is FreeBSD, this ought to be fun! (I don't have a job, I don't get a job, I'm just normal unemployed trash :/ ). But to answer your question:

> part-time RCS user, git is a huge regression. Never had anything to
> lookup/read twice with subversion or cvs in the past, but never
> found

What I did is put this into /usr/local/etc/gitconfig:

[alias]
        dir = log --topo-order --compact-summary --pretty=fuller

This is slow, but it gives a log output that is more exhaustive, and similar to the one I could get from SVN.

But then, self-creating sourcefiles are always a bit difficult. At least people here are quite strict with the naming, so the appearance of a second dot in a sourcefile name should give an alert that something unkosher is going on.

Cheerio!
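For what it's worth, the alias is used like any other git log invocation. A minimal self-contained demo (it builds a throwaway repository so it runs anywhere; in real use You would just run `git dir` inside /usr/src, with whatever branch or path limits suit you):

```shell
# demo in a throwaway repository
tmp=$(mktemp -d) && cd "$tmp" && git init -q
git -c user.name=demo -c user.email=demo@example.org \
    commit -q --allow-empty -m "demo commit"
# the alias from the gitconfig above, supplied inline with -c for the demo:
out=$(git -c 'alias.dir=log --topo-order --compact-summary --pretty=fuller' dir -n 1)
printf '%s\n' "$out"
```

The `--pretty=fuller` part is what restores the SVN-ish detail: separate Author/AuthorDate and Commit/CommitDate lines, plus the per-file summary from `--compact-summary`.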
Re: gpart device permissions security hole (/dev/geom.ctl)
On 2024-02-24, Miroslav Lachman <000.f...@quip.cz> wrote:
> On 24/02/2024 21:00, Vincent Stemen wrote:
>> On Sat, Feb 24, 2024 at 04:40:00PM +0100, Miroslav Lachman wrote:
>>> I agree with this security problem. Just a small note - there are
>>> backups of partitions (/var/backups/gpart.*) created by periodic script
>>> /etc/periodic/daily/221.backup-gpart (if you have
>>> daily_backup_gpart_enable="YES" in your /etc/periodic.conf or in
>>> /etc/defaults/periodic.conf which is the default). That way you can get
>>> back the number plate on your house in some cases.
>>
>> Thanks. That's good to know. I was not aware of those features of
>> periodic.
>
> Almost nobody knows.

Oh, now I see why there is a problem. Actually I found the partition tables missing when I planned for disaster recovery, and thought it would be helpful to have a copy of them. So I implemented such a periodic backup long before it was officially provided.

I think there are many possibilities for how things can go wrong, and evil action is only one of them. So my first imperative is to get the data safely into backup (and then the backup offsite). That accomplished, we can in a relaxed mood think about what we will do to the person who actually manages to delete the partition table...

cheerio,
PMc
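For reference, enabling that backup is a one-line knob. A minimal periodic.conf fragment; the gpart command in the comment is my guess at what the script boils down to - read /etc/periodic/daily/221.backup-gpart for the real thing:

```
# /etc/periodic.conf
daily_backup_gpart_enable="YES"    # nightly partition-table dumps
                                   # into /var/backups/gpart.*
# per disk this presumably boils down to something like:
#   gpart backup ada0 > /var/backups/gpart.ada0
# which `gpart restore` can later read back onto a blank disk
```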
Re: 13-STABLE high idprio load gives poor responsiveness and excessive CPU time per task
On 2024-02-27, Edward Sanford Sutton, III wrote:
> More recently looked and see top showing threads+system processes
> shows I have one core getting 100% cpu for kernel{arc_prune} which has
> 21.2 hours over a 2 hour 23 minute uptime.

Ack.

> I started looking to see if
> https://www.freebsd.org/security/advisories/FreeBSD-EN-23:18.openzfs.asc
> was available as a fix for 13 but it is not (and doesn't quite sound
> like it was supposed to apply to this issue). Would a kernel thread time
> at 100% cpu for only 1 core explain the system becoming unusually
> unresponsive?

That depends. This arc_prune issue usually goes alongside some other kernel thread (vm-whatever) also blocking, so you have two cores busy. How many remain?

There is an updated patch in PR 275594 (5 pieces) that works for 13.3; I have it installed, and only with that am I able to build gcc12 - otherwise the system would just OOM-crash (vm.pageout_oom_seq=5120 does not help with this). I didn't see any lagging behaviour, but then, I have 20 vCores.

cheerio,
PMc
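If anyone wants to try that knob anyway, it persists via sysctl.conf; the value is the one from the mail, and the default-value note is from memory, so verify it with `sysctl -d vm.pageout_oom_seq`:

```
# /etc/sysctl.conf - give the page daemon many more passes before the
# OOM killer fires (stock default is 12); did NOT prevent the OOM crash
# here without the PR 275594 patches
vm.pageout_oom_seq=5120
```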