Re: Troubles building world on stable/13 [the little bit of evidence about the compiler failures: a jemalloc-tie/ASLR-tie?]
0 00 00 .@.U
0xaa10: 01 00 00 00 00 00 00 00 c0 0d b1 55 00 00 00 00 ...U
0xaa20: 00 00 00 00 00 00 00 00 e2 40 b2 55 00 00 00 00 .@.U
0xaa30: 01 00 00 00 00 00 00 00 c0 0d b1 55 00 00 00 00 ...U
0xaa40: 00 00 00 00 00 00 00 00 2a 41 b2 55 00 00 00 00 *A.U
0xaa50: 01 00 00 00 00 00 00 00 c0 0d b1 55 00 00 00 00 ...U
0xaa60: 00 00 00 00 00 00 00 00 72 41 b2 55 00 00 00 00 rA.U
0xaa70: 01 00 00 00 00 00 00 00 c0 0d b1 55 00 00 00 00 ...U
0xaa80: 00 00 00 00 00 00 00 00 ba 41 b2 55 00 00 00 00 .A.U

When the 0x05's show up they are instead of the 0x01's, just after the ": ". After that the pattern is different. But quickly something looks like another fp/lr pair in memory, and that, in turn, references another:

0xaa90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0xaaa0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0xaab0: 00 00 00 00 00 00 00 00 44 c4 95 07 0e 02 46 57 D.FW
0xaac0: 10 ab ff ff ff ff 00 00 8c c6 aa 02 00 00 00 00
. . .
0xab10: 90 ac ff ff ff ff 00 00 e0 18 ab 02 00 00 00 00
. . .

But after that the following does not seem to fit the pattern:

0xac90: 00 ac ff ff ff ff 00 00 44 c4 95 07 0e 02 46 57 D.FW

and:

0xac00: 01 00 00 00 00 00 00 00 18 ae ff ff ff ff 00 00

The a5 sequences make me wonder if jemalloc assigned a memory allocation to stack space or was told to handle a stack address as if it was an assigned address for some aspects of an allocation (if that can even be requested). I wonder if there is any chance of ASLR being involved, with the stack and memory allocation possibly overlapping. But I've really no clue.

I've given up on trying to isolate what is going on for the compiler failures. I've only been able to see after the failure, not just before: debugger interactions with the compiler process at times close to the failure point in the code prevent the failure. I've not found any alternative that avoids such. This is on top of the issue that the plain runs (no debugger) vary in behavior, sometimes running to completion, sometimes stopping at similar but varying places in the source code being processed. There is still no known way to get a full reproduction of failure details each time. (Which instance of the example type of source code being compiled at the point of failure does vary.)

For reference: I've been using .sh/.cpp pairs that Bob published and a copy of the c++ from his system to investigate. The .cpp is large. Bob's RPi3* is a RAM+SWAP context of 1 GiByte + 2 GiByte and I made such a context on a RPi3* as well. But I ran his stable/13 c++ on a system with a non-debug main [so: 14] kernel and either a main world or a stable/13 chroot. From the chroot:

# uname -apKU
FreeBSD Rock64_RPi_4_3_2v1p2 14.0-CURRENT FreeBSD 14.0-CURRENT #28 main-n252475-e76c0108990b-dirty: Sat Jan 15 23:39:27 PST 2022 root@CA72_16Gp_ZFS:/usr/obj/BUILDs/main-CA53-nodbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-NODBG-CA53 arm64 aarch64 1400047 1300524

# freebsd-version -ru
14.0-CURRENT
13.0-STABLE

# ~/fbsd-based-on-what-commit.sh -C /usr/13S-src/
branch: stable/13
merge-base: a5f69859956049b5153b0e1b67f8f4a99622dc6f
merge-base: CommitDate: 2022-01-15 12:55:32 +
a5f698599560 (HEAD -> stable/13, freebsd/stable/13) Ignore debugger-injected signals left after detaching

Bob's recent stable/13 context (kernel too) is more recent than mine. So the problem has been observed over a range of contexts. But, as I said, I've given up on finding a way to isolate whatever is going on.

=== Mark Millard marklmi at yahoo.com
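For anyone wanting to poke at the ASLR or jemalloc guesses above, the sort of checks I have in mind are roughly the following (example commands only; verify the knob names against the installed system, and MALLOC_CONF=junk assumes the libc jemalloc was built with fill support):

# show the 64-bit ASLR sysctl settings
sysctl kern.elf64.aslr
# run one compile with ASLR disabled for just that process
proccontrol -m aslr -s disable c++ ... (rest of the failing command line)
# run one compile with jemalloc junk filling enabled, so allocator-written
# 0xa5 (allocated) / 0x5a (freed) patterns stand out in later dumps
env MALLOC_CONF=junk:true c++ ... (rest of the failing command line)

If the failures stop with ASLR disabled for the process, that would at least narrow down where to look.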
Re: git: 4a864f624a70 - main - vm_pageout: Print a more accurate message to the console before an OOM kill [MFC in time for 13.1?]
On 2022-Jan-15, at 07:55, Mark Johnston wrote: > On Fri, Jan 14, 2022 at 09:38:56PM -0800, Mark Millard wrote: >> Thanks. This will allow me to remove part of my personal additions >> in this area --and my having to explain the misnomer when trying >> to help someone analyze why they end up with OOM activity so they >> can figure out what to do about it. >> >> There seem to be two separate sources of VM_OOM_SWAPZ. Showing >> my personal additions for them (just making them explicit in the >> sequence of messages generated): >> >> diff --git a/sys/vm/swap_pager.c b/sys/vm/swap_pager.c >> index 01cf9233329f..280621ca51be 100644 >> --- a/sys/vm/swap_pager.c >> +++ b/sys/vm/swap_pager.c >> @@ -2091,6 +2091,7 @@ swp_pager_meta_build(vm_object_t object, vm_pindex_t >> pindex, daddr_t swapblk) >>0, 1)) >>printf("swap blk zone exhausted, " >>"increase kern.maxswzone\n"); >> + printf("swp_pager_meta_build: swap blk uma >> zone exhausted\n"); >>vm_pageout_oom(VM_OOM_SWAPZ); >>pause("swzonxb", 10); >>} else >> @@ -2121,6 +2122,7 @@ swp_pager_meta_build(vm_object_t object, vm_pindex_t >> pindex, daddr_t swapblk) >>0, 1)) >>printf("swap pctrie zone exhausted, " >>"increase kern.maxswzone\n"); >> + printf("swp_pager_meta_build: swap pctrie >> uma zone exhausted\n"); >>vm_pageout_oom(VM_OOM_SWAPZ); >>pause("swzonxp", 10); >>} else >> >> Care to comment on the distinctions and why there are two >> contexts classified as "out of swap space"? Would either >> one show the swap space as (nearly?) all used in, say, top? >> Or might one of them still end up looking like a misnomer >> from just a top (or whatever) display? > > Hmm, those cases should likely be changed from "out of swap space" to > "failed to allocate swap metadata" or something like that. The above does not seem to have happened yet in main [so: 14]. Will 13.1 get an MFC of 4a864f624a70 in time, possibly with the above change also in place to fully avoid misnomer reporting that misleads folks? 4a864f624a70 listed: MFC after: 2 weeks but it has been more than a month. > . . . > === Mark Millard marklmi at yahoo.com
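A side note for anyone trying to tell the two situations apart on a live system: the VM_OOM_SWAPZ paths above fire when the swap metadata UMA zones are exhausted, not necessarily when the swap devices are full. A rough check (assuming the zone names are still "swblk" and "swpctrie", as created in swap_pager.c):

# actual swap device usage
swapinfo -h
# swap metadata zone usage and limits
vmstat -z | egrep "ITEM|swblk|swpctrie"

If swapinfo shows plenty of free swap while those zones are at their limits, the "out of swap space" wording is the misnomer being discussed.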
Re: git: 4a864f624a70 - main - vm_pageout: Print a more accurate message to the console before an OOM kill [MFC in time for 13.1?]
On 2022-Feb-26, at 17:10, Mark Millard wrote: > On 2022-Jan-15, at 07:55, Mark Johnston wrote: > >> On Fri, Jan 14, 2022 at 09:38:56PM -0800, Mark Millard wrote: >>> Thanks. This will allow me to remove part of my personal additions >>> in this area --and my having to explain the misnomer when trying >>> to help someone analyze why they end up with OOM activity so they >>> can figure out what to do about it. >>> >>> There seem to be two separate sources of VM_OOM_SWAPZ. Showing >>> my personal additions for them (just making them explicit in the >>> sequence of messages generated): >>> >>> diff --git a/sys/vm/swap_pager.c b/sys/vm/swap_pager.c >>> index 01cf9233329f..280621ca51be 100644 >>> --- a/sys/vm/swap_pager.c >>> +++ b/sys/vm/swap_pager.c >>> @@ -2091,6 +2091,7 @@ swp_pager_meta_build(vm_object_t object, vm_pindex_t >>> pindex, daddr_t swapblk) >>> 0, 1)) >>> printf("swap blk zone exhausted, " >>> "increase kern.maxswzone\n"); >>> + printf("swp_pager_meta_build: swap blk uma >>> zone exhausted\n"); >>> vm_pageout_oom(VM_OOM_SWAPZ); >>> pause("swzonxb", 10); >>> } else >>> @@ -2121,6 +2122,7 @@ swp_pager_meta_build(vm_object_t object, vm_pindex_t >>> pindex, daddr_t swapblk) >>> 0, 1)) >>> printf("swap pctrie zone exhausted, " >>> "increase kern.maxswzone\n"); >>> + printf("swp_pager_meta_build: swap pctrie >>> uma zone exhausted\n"); >>> vm_pageout_oom(VM_OOM_SWAPZ); >>> pause("swzonxp", 10); >>> } else >>> >>> Care to comment on the distinctions and why there are two >>> contexts classified as "out of swap space"? Would either >>> one show the swap space as (nearly?) all used in, say, top? >>> Or might one of them still end up looking like a misnomer >>> from just a top (or whatever) display? >> >> Hmm, those cases should likely be changed from "out of swap space" to >> "failed to allocate swap metadata" or something like that. > > The above does not seem to have happened yet in main [so: 14]. > > Will 13.1 get an MFC of 4a864f624a70 in time, possibly with the > above change also in place to fully avoid misnomer reporting > that misleads folks? > > 4a864f624a70 listed: > > MFC after:2 weeks > > but it has been more than a month. > >> . . . >> > Thanks for the stable/13 MFC as 13ba1d283676. It provides a big improvement over the prior messaging for the OOM kills. For reference, I do still view: + case VM_OOM_SWAPZ: + reason = "out of swap space"; + break; as using a confusing misnomer ("swap space") for its message. But, so far as I know, VM_OOM_SWAPZ is a rarity and possibly very difficult to produce. If so, any confusions from the message should also be rare. === Mark Millard marklmi at yahoo.com
Re: panic: data abort in critical section or under mutex (was: Re: panic: Unknown kernel exception 0 esr_el1 2000000 (on 14-CURRENT/aarch64 Feb 28))
On 2022-Mar-7, at 08:45, Mark Johnston wrote: > On Mon, Mar 07, 2022 at 04:25:22PM +, Andrew Turner wrote: >> >>> On 7 Mar 2022, at 15:13, Mark Johnston wrote: >>> ... >>> A (the?) problem is that the compiler is treating "pc" as an alias >>> for x18, but the rmlock code assumes that the pcpu pointer is loaded >>> once, as it dereferences "pc" outside of the critical section. On >>> arm64, if a context switch occurs between the store at _rm_rlock+144 and >>> the load at +152, and the thread is migrated to another CPU, then we'll >>> end up using the wrong CPU ID in the rm->rm_writecpus test. >>> >>> I suspect the problem is unique to arm64 as its get_pcpu() >>> implementation is different from the others in that it doesn't use >>> volatile-qualified inline assembly. This has been the case since >>> https://cgit.freebsd.org/src/commit/?id=63c858a04d56529eddbddf85ad04fc8e99e73762 >>> >>> <https://cgit.freebsd.org/src/commit/?id=63c858a04d56529eddbddf85ad04fc8e99e73762> >>> . >>> >>> I haven't been able to reproduce any crashes running poudriere in an >>> arm64 AWS instance, though. Could you please try the patch below and >>> confirm whether it fixes your panics? I verified that the apparent >>> problem described above is gone with the patch. >> >> Alternatively (or additionally) we could do something like the following. >> There are only a few MI users of get_pcpu with the main place being in rm >> locks. >> >> diff --git a/sys/arm64/include/pcpu.h b/sys/arm64/include/pcpu.h >> index 09f6361c651c..59b890e5c2ea 100644 >> --- a/sys/arm64/include/pcpu.h >> +++ b/sys/arm64/include/pcpu.h >> @@ -58,7 +58,14 @@ struct pcpu; >> >> register struct pcpu *pcpup __asm ("x18"); >> >> -#defineget_pcpu() pcpup >> +static inline struct pcpu * >> +get_pcpu(void) >> +{ >> + struct pcpu *pcpu; >> + >> + __asm __volatile("mov %0, x18" : "=&r"(pcpu)); >> + return (pcpu); >> +} >> >> static inline struct thread * >> get_curthread(void) > > Indeed, I think this is probably the best solution. Is this just partially reverting: https://cgit.freebsd.org/src/commit/?id=63c858a04d56 If so, there might need to be comments about why the updated code is as it will be. Looks like stable/13 picked up sensitivity to the get_pcpu details in rmlock in: https://cgit.freebsd.org/src/commit/?h=stable/13&id=543157870da5 (a 2022-03-04 commit) and stable/13 also has the get_pcpu misdefinition in: https://cgit.freebsd.org/src/commit/sys/arm64/include/pcpu.h?h=stable/13&id=63c858a04d56 . So an MFC would be appropriate in order for aarch64 to be reliable for any variations in get_pcpu in stable/13 (and for 13.1 to be so as well). === Mark Millard marklmi at yahoo.com
Re: https://ci.freebsd.org/job/FreeBSD-main-amd64-gcc9_build broken again after openzfs merge: multiple definitions building --- all_subdir_rescue ---
On 2022-Mar-18, at 12:32, Mark Millard wrote: > Looks like . . . > > /workspace/src/sys/contrib/openzfs/module/zstd/lib/common/error_private.h > and: > /workspace/src/sys/contrib/zstd/lib/common/error_private.h > > are both used in building in: > > /tmp/obj/workspace/src/amd64.amd64/rescue/rescue > > and each is providing various definitions that the other also does: > > multiple definition of `ZSTD_versionNumber' > multiple definition of `ZSTD_versionString'; > multiple definition of `ZSTD_isError'; > multiple definition of `ZSTD_getErrorName'; > multiple definition of `ZSTD_getErrorCode'; > multiple definition of `ZSTD_getErrorString'; > > Looks like this goes back to: > > Build #3075 (Mar 8, 2022 9:33:24 PM) > [c03c5b1c8091: "zfs: merge openzfs/zfs@a86e08941 (master) into main"] > > after Build #3074 (Mar 8, 2022 6:16:32 PM) had built fine. > FYI: I tried to build 13.1-BETA2 with a gcc9 xtoolchain and got: --- all_subdir_stand/efi/gptboot --- . . . /local/bin/x86_64-unknown-freebsd13.0-ld: gptboot.sym.full: error: PHDR segment not covered by LOAD segment collect2: error: ld returned 1 exit status So I tried continuing using WITHOUT_BOOT= and the next stopping points were: --- all_subdir_cxgbe --- /usr/13_1R-src/sys/compat/linuxkpi/common/include/linux/overflow.h:45:2: error: #error "Compiler does not support __builtin_add_overflow" 45 | #error "Compiler does not support __builtin_add_overflow" | ^ /usr/13_1R-src/sys/compat/linuxkpi/common/include/linux/overflow.h:62:2: error: #error "Compiler does not support __builtin_mul_overflow" 62 | #error "Compiler does not support __builtin_mul_overflow" | ^ . . . --- all_subdir_cxgbe/iw_cxgbe --- In file included from /usr/13_1R-src/sys/compat/linuxkpi/common/include/linux/slab.h:42, from /usr/13_1R-src/sys/dev/cxgbe/iw_cxgbe/ev.c:40: /usr/13_1R-src/sys/compat/linuxkpi/common/include/linux/overflow.h:45:2: error: #error "Compiler does not support __builtin_add_overflow" 45 | #error "Compiler does not support __builtin_add_overflow" | ^ . . . --- device.o --- from /usr/13_1R-src/sys/compat/linuxkpi/common/include/linux/sched.h:41, from /usr/13_1R-src/sys/compat/linuxkpi/common/include/linux/kernel.h:50, from /usr/13_1R-src/sys/compat/linuxkpi/common/include/linux/kobject.h:36, from /usr/13_1R-src/sys/compat/linuxkpi/common/include/linux/module.h:43, from /usr/13_1R-src/sys/dev/cxgbe/iw_cxgbe/device.c:41: /usr/13_1R-src/sys/compat/linuxkpi/common/include/linux/overflow.h: At top level: /usr/13_1R-src/sys/compat/linuxkpi/common/include/linux/overflow.h:45:2: error: #error "Compiler does not support __builtin_add_overflow" 45 | #error "Compiler does not support __builtin_add_overflow" | ^ /usr/13_1R-src/sys/compat/linuxkpi/common/include/linux/overflow.h:62:2: error: #error "Compiler does not support __builtin_mul_overflow" 62 | #error "Compiler does not support __builtin_mul_overflow" | ^ With that I stopped the experiments. === Mark Millard marklmi at yahoo.com
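For anyone wanting to reproduce this sort of gcc9-xtoolchain build, the usual form of the invocation is roughly the following (a sketch with assumptions: devel/freebsd-gcc9 with the amd64 flavor is installed, registering the amd64-gcc9 cross toolchain, and the source tree is /usr/13_1R-src as in the paths above):

cd /usr/13_1R-src
make -j8 CROSS_TOOLCHAIN=amd64-gcc9 buildworld
# to get past the stand/efi/gptboot failure, add the knob that skips
# the boot loader bits:
make -j8 CROSS_TOOLCHAIN=amd64-gcc9 -DWITHOUT_BOOT buildworld

These spellings are assumptions based on the usual xtoolchain setup, not the exact command lines used here.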
Re: git: 43741377b143 - main - security/openssl: Security update to 1.1.1n
> From: Mark Johnston > Date: Sat, 19 Mar 2022 12:42:59 -0400 > On Sat, Mar 19, 2022 at 12:40:45PM +0100, Thomas Zander wrote: > > On Sat, 19 Mar 2022 at 12:11, Rene Ladan wrote: > > > > > > On Sat, Mar 19, 2022 at 11:04:58AM +0100, Thomas Zander wrote: > > > > On Sat, 19 Mar 2022 at 09:00, Matthias Fechner > > > > wrote: > > > > > > > > > I can confirm now, the problem is definitely related to the -p8 > > > > > update. > > > > > I rolled back now to -p7 using `freebsd-update rollback`. > > > > > [...] > > > > > System is now up and running again. > > > > > This all works even if poudriere jail is using -p8. No need to > > > > > downgrade the jail/base version poudriere is using. > > > > > It is caused by the kernel so the ZFS patch seems to be broken and > > > > > -p8 should maybe not rolled out to not break more systems of users. > > > > > > > > On top of "stop rollout", there is the question how to identify the > > > > broken files for the users who have already upgraded to -p8. A `zpool > > > > scrub` presumably won't help. > > > > > > I think it also applies to 13.1-BETA2 ? > > > > > > Should we involve/CC some src committers? > > > > I have just rolled back to -p7 and run a number of test builds in > > poudriere (the jails still have the -p8 user land). I see the same as > > Matthias and Christoph, the rollback to the -p7 kernel/zfs resolved > > the build problems, there are no NUL byte files generated anymore. > > Adding markj_at_ to the discussion. Mark, the TLDR so far: > > - One of the zfs patches in -p8 seems to cause erroneous writes. > > - We noticed because of many build failures with poudriere (presumably > > highly io-loaded during build). > > - Symptom: Production of files with large runs of NUL-bytes. > > I've had zero luck reproducing this locally. I built several hundred > ports, including textproc/py-pystemmer mentioned elsewhere in the > thread, without any failures or instances of zero-filled files. Another > member of secteam also hasn't been able to trigger any build failures on > -p8. Any hints on a reproducer would be useful. > > We can simply push a -p9 which reverts EN-22:10 and :11, but of course > it would be preferable to precisely identify the problem. Anything about the types of hardware involved that is different for those getting the problem vs. those that do not get the problem? May be it would be appropriate for folks getting the problem to detail their hardware configurations, including storage hardware. === Mark Millard marklmi at yahoo.com
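In case it helps with such reports: the sort of detail I have in mind can be gathered with, for example (all base-system commands; adjust for the pool and controllers involved):

# CPU/RAM basics
sysctl hw.model hw.ncpu hw.physmem
# disks and controllers as probed
geom disk list
camcontrol devlist
nvmecontrol devlist
# pool layout and error counters
zpool status -v

Knowing whether the affected systems share, say, a particular controller or drive family might help narrow things down.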
Re: git: 43741377b143 - main - security/openssl: Security update to 1.1.1n
On 2022-Mar-19, at 11:07, Thomas Zander wrote: > On Sat, 19 Mar 2022 at 18:32, Mark Millard wrote: >> May be report to Mark J. how to run the same test builds >> that failed for -p8 but worked for -p7? > > Sure, good point. > A build that reliably causes broken packages on p8 but not on p7 for > me is running: > > poudriere testport -o multimedia/mplayer -j <13.0-amd64-jail here> > > This caused the broken png and python packages when they were built as > dependencies. > In poudriere.conf I set this: > DISTFILES_CACHE=/vcache/distfiles > CCACHE_DIR=/vcache/ccache > ALLOW_MAKE_JOBS=yes > > The ALLOW_MAKE_JOBS should increase the number of parallel IO > operations in-flight on the pool, maybe this increases the likelihood > of triggering the issue? > The DISTFILES_CACHE and CCACHE_DIR are in the same zfs pool as > /poudriere, not sure if this is relevant. > The zfs pool is a single disk, no raid, mirror or anything fancy. On a ThreadRipper 1950X, PCIe Optane storage, 128 GiBytes of RAM, I've used bectl to boot the 13.0_RELEASE-p8 environment and have started: poudriere testport -o multimedia/mplayer -j13_0R-amd64-bulk_a where the jail had nothing built in it at the start. So: [00:00:08] Building 271 packages using up to 32 builders The primary difference is that I've never used ccache and did not try to do so here. The "zfs pool is a single disk, no raid, mirror or anything fancy" is accurate, as is the use of ALLOW_MAKE_JOBS= . That did not take long . . . It proves that ccache is not required. Also some files seem to get only small blocks of zero-bytes, others large ones. But I've not checked for the null characters being at the end instead of earlier in the file. libXcomposite-0.4.5,1.log : --- Xcomposite.lo --- /bin/sh ../libtool --tag=CC--mode=compile cc -DHAVE_CONFIG_H -I. -I.. -I../include -Wall -Wpointer-arith -Wmissing-declarations -Wformat=2 -Wstrict-prototypes -Wmissing-prototypes -Wnested-externs -Wbad-function-cast -Wold-style-definition -Wdeclaration-after-statement -Wunused -Wuninitialized -Wshadow -Wmissing-noreturn -Wmissing-format-attribute -Wredundant-decls -Werror=implicit -Werror=nonnull -Werror=init-self -Werror=main -Werror=missing-braces -Werror=sequence-point -Werror=return-type -Werror=trigraphs -Werror=array-bounds -Werror=write-strings -Werror=address -Werror=int-to-pointer-cast -Werror=pointer-to-int-cast -fno-strict-aliasing -I/usr/local/include -D_THREAD_SAFE -pthread -I/usr/local/include -D_THREAD_SAFE -pthread -pipe -Werror=uninitialized -g -fstack-protector-strong -fno-strict-aliasing -MT Xcomposite.lo -MD -MP -MF .deps/Xcomposite.Tpo -c -o Xcomposite.lo Xcomposite.c libtool: compile: cc -DHAVE_CONFIG_H -I. -I.. 
-I../include -Wall -Wpointer-arith -Wmissing-declarations -Wformat=2 -Wstrict-prototypes -Wmissing-prototypes -Wnested-externs -Wbad-function-cast -Wold-style-definition -Wdeclaration-after-statement -Wunused -Wuninitialized -Wshadow -Wmissing-noreturn -Wmissing-format-attribute -Wredundant-decls -Werror=implicit -Werror=nonnull -Werror=init-self -Werror=main -Werror=missing-braces -Werror=sequence-point -Werror=return-type -Werror=trigraphs -Werror=array-bounds -Werror=write-strings -Werror=address -Werror=int-to-pointer-cast -Werror=pointer-to-int-cast -fno-strict-aliasing -I/usr/local/include -D_THREAD_SAFE -pthread -I/usr/local/include -D_THREAD_SAFE -pthread -pipe -Werror=uninitialized -g -fstack-protector-strong -fno-strict-aliasing -MT Xcomposite.lo -MD -MP -MF .deps/Xcomposite.Tpo -c Xcomposite.c -fPIC -DPIC -o .libs/Xcomposite.o In file included from Xcomposite.c:45: In file included from ./xcompositeint.h:53: In file included from ../include/X11/extensions/Xcomposite.h:49: /usr/local/include/X11/extensions/Xfixes.h:1:1: warning: null character ignored [-Wnull-character] /usr/local/include/X11/extensions/Xfixes.h:1:2: warning: null character ignored [-Wnull-character] /usr/local/include/X11/extensions/Xfixes.h:1:3: warning: null character ignored [-Wnull-character] /usr/local/include/X11/extensions/Xfixes.h:1:4: warning: null character ignored [-Wnull-character] /usr/local/include/X11/extensions/Xfixes.h:1:5: warning: null character ignored [-Wnull-character] /usr/local/include/X11/extensions/Xfixes.h:1:6: warning: null character ignored [-Wnull-character] /usr/local/include/X11/extensions/Xfixes.h:1:7: warning: null character ignored [-Wnull-character] /usr/local/include/X11/extensions/Xfixes.h:1:8: warning: null character ignored [-Wnull-character] . . . (the list is long) . . . libXdamage-1.1.5.log . . . --- Xdamage.lo --- /bin/sh ../libtool --tag=CC--mode=compile cc -DHAVE_CONFIG_H -I. -I.. -I../include/X11/extensions -Wall -Wpointer-arith -Wmissing-declarations -Wformat=2 -Wstrict-prototypes -Wmissing-prototy pes -Wnested-externs -Wbad-function-cast -Wold-style-defini
Re: git: 43741377b143 - main - security/openssl: Security update to 1.1.1n
The reports of corrupted files with zero-bytes seemed vaguely familiar, including my own addition to the list. Turns out I'd reported such for main [so: 14] back in 2021-Nov. My context at the time was aarch64. The following has my report sequence, including where I got past the problem vs. before that point. May be the history will help. https://lists.freebsd.org/archives/freebsd-current/2021-November/001052.html I've not had problems with the issue on main since then. === Mark Millard marklmi at yahoo.com
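For anyone wanting to scan a tree for such damage after the fact, a crude check for files containing NUL bytes can be done with just base tools (a sketch; adjust the starting path):

find /usr/local/include -type f -exec sh -c 'tr -d "\000" < "$1" | cmp -s - "$1" || echo "$1"' sh {} \;

Files listed have at least one NUL byte somewhere; the check does not distinguish trailing zero-fill from zero-fill in the middle of a file.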
Re: git: 43741377b143 - main - security/openssl: Security update to 1.1.1n
On 2022-Mar-19, at 12:54, Mark Johnston wrote: > On Sat, Mar 19, 2022 at 12:00:20PM -0700, Mark Millard wrote: >> On 2022-Mar-19, at 11:07, Thomas Zander wrote: >> >>> On Sat, 19 Mar 2022 at 18:32, Mark Millard wrote: >>>> May be report to Mark J. how to run the same test builds >>>> that failed for -p8 but worked for -p7? >>> >>> Sure, good point. >>> A build that reliably causes broken packages on p8 but not on p7 for >>> me is running: >>> >>> poudriere testport -o multimedia/mplayer -j <13.0-amd64-jail here> >>> >>> This caused the broken png and python packages when they were built as >>> dependencies. >>> In poudriere.conf I set this: >>> DISTFILES_CACHE=/vcache/distfiles >>> CCACHE_DIR=/vcache/ccache >>> ALLOW_MAKE_JOBS=yes >>> >>> The ALLOW_MAKE_JOBS should increase the number of parallel IO >>> operations in-flight on the pool, maybe this increases the likelihood >>> of triggering the issue? >>> The DISTFILES_CACHE and CCACHE_DIR are in the same zfs pool as >>> /poudriere, not sure if this is relevant. >>> The zfs pool is a single disk, no raid, mirror or anything fancy. >> >> On a ThreadRipper 1950X, PCIe Optane storage, 128 GiBytes of >> RAM, I've used bectl to boot the 13.0_RELEASE-p8 environment >> and have started: >> >> poudriere testport -o multimedia/mplayer -j13_0R-amd64-bulk_a >> >> where the jail had nothing built in it at the start. So: >> >> [00:00:08] Building 271 packages using up to 32 builders >> >> The primary difference is that I've never used ccache and >> did not try to do so here. The "zfs pool is a single disk, >> no raid, mirror or anything fancy" is accurate, as is the >> use of ALLOW_MAKE_JOBS= . >> >> That did not take long . . . >> >> It proves that ccache is not required. Also some files >> seem to get only small blocks of zero-bytes, others >> large ones. But I've not checked for the null characters >> being at the end instead of earlier in the file. > > I still am not able to reproduce it. I think it's indeed a concurrency > problem, and I found a possible culprit. Mark or Thomas, if you're able > to build a new kernel from the releng/13.0 branch and test it, could you > please try this patch? > Sure. (I build ports in a way that allows large load averages relative to the hardware-thread count. I also have a lot of swap configured. I avoid significant use of tmpfs.) > diff --git a/sys/contrib/openzfs/module/zfs/dnode.c > b/sys/contrib/openzfs/module/zfs/dnode.c > index 8592c5f8c3a9..b69ba68ec780 100644 > --- a/sys/contrib/openzfs/module/zfs/dnode.c > +++ b/sys/contrib/openzfs/module/zfs/dnode.c > @@ -1661,7 +1661,7 @@ dnode_is_dirty(dnode_t *dn) > mutex_enter(&dn->dn_mtx); > > for (int i = 0; i < TXG_SIZE; i++) { > - if (list_head(&dn->dn_dirty_records[i]) != NULL) { > + if (multilist_link_active(&dn->dn_dirty_link[i])) { > mutex_exit(&dn->dn_mtx); > return (B_TRUE); > } > Change made. Rebuilt. Reinstalled. Rebooted into the 13_0R-amd64 be area. Bulk build started. Bulk build completed. (Took longer because I let it run to completion.) No explicit reports of null characters. The same 2 ports that failed before, not reporting zero-byte issues, failed again. Likely independent issues: [00:28:28] Failed ports: security/libgcrypt:build print/freetype2:package Overall it skipped something like 54 ports. libgcrypt-1.9.4.log . . . 
--- basic.o --- basic.c:315:16: error: inline assembly requires more registers than available asm volatile("movdqu %[data0], %%xmm0\n" ^ basic.c:315:16: error: inline assembly requires more registers than available basic.c:315:16: error: inline assembly requires more registers than available basic.c:315:16: error: inline assembly requires more registers than available basic.c:315:16: error: inline assembly requires more registers than available basic.c:315:16: error: inline assembly requires more registers than available basic.c:315:16: error: inline assembly requires more registers than available basic.c:315:16: error: inline assembly requires more registers than available basic.c:315:16: error: inline assembly requires more registers than available basic.c:315:16: error: inline assembly requires more registers than available basic.c:315:16: error: inline assembly requires more registers than available basic.c:315:16: error: inline assembly requires more registers than available --- mpitests --- . . . --- basi
Re: git: 43741377b143 - main - security/openssl: Security update to 1.1.1n
On 2022-Mar-19, at 14:24, Mark Millard wrote: > On 2022-Mar-19, at 12:54, Mark Johnston wrote: > >> On Sat, Mar 19, 2022 at 12:00:20PM -0700, Mark Millard wrote: >>> On 2022-Mar-19, at 11:07, Thomas Zander wrote: >>> >>>> On Sat, 19 Mar 2022 at 18:32, Mark Millard wrote: >>>>> May be report to Mark J. how to run the same test builds >>>>> that failed for -p8 but worked for -p7? >>>> >>>> Sure, good point. >>>> A build that reliably causes broken packages on p8 but not on p7 for >>>> me is running: >>>> >>>> poudriere testport -o multimedia/mplayer -j <13.0-amd64-jail here> >>>> >>>> This caused the broken png and python packages when they were built as >>>> dependencies. >>>> In poudriere.conf I set this: >>>> DISTFILES_CACHE=/vcache/distfiles >>>> CCACHE_DIR=/vcache/ccache >>>> ALLOW_MAKE_JOBS=yes >>>> >>>> The ALLOW_MAKE_JOBS should increase the number of parallel IO >>>> operations in-flight on the pool, maybe this increases the likelihood >>>> of triggering the issue? >>>> The DISTFILES_CACHE and CCACHE_DIR are in the same zfs pool as >>>> /poudriere, not sure if this is relevant. >>>> The zfs pool is a single disk, no raid, mirror or anything fancy. >>> >>> On a ThreadRipper 1950X, PCIe Optane storage, 128 GiBytes of >>> RAM, I've used bectl to boot the 13.0_RELEASE-p8 environment >>> and have started: >>> >>> poudriere testport -o multimedia/mplayer -j13_0R-amd64-bulk_a >>> >>> where the jail had nothing built in it at the start. So: >>> >>> [00:00:08] Building 271 packages using up to 32 builders >>> >>> The primary difference is that I've never used ccache and >>> did not try to do so here. The "zfs pool is a single disk, >>> no raid, mirror or anything fancy" is accurate, as is the >>> use of ALLOW_MAKE_JOBS= . >>> >>> That did not take long . . . >>> >>> It proves that ccache is not required. Also some files >>> seem to get only small blocks of zero-bytes, others >>> large ones. But I've not checked for the null characters >>> being at the end instead of earlier in the file. >> >> I still am not able to reproduce it. I think it's indeed a concurrency >> problem, and I found a possible culprit. Mark or Thomas, if you're able >> to build a new kernel from the releng/13.0 branch and test it, could you >> please try this patch? >> > > Sure. (I build ports in a way that allows large load > averages relative to the hardware-thread count. I also > have a lot of swap configured. I avoid significant use > of tmpfs.) > >> diff --git a/sys/contrib/openzfs/module/zfs/dnode.c >> b/sys/contrib/openzfs/module/zfs/dnode.c >> index 8592c5f8c3a9..b69ba68ec780 100644 >> --- a/sys/contrib/openzfs/module/zfs/dnode.c >> +++ b/sys/contrib/openzfs/module/zfs/dnode.c >> @@ -1661,7 +1661,7 @@ dnode_is_dirty(dnode_t *dn) >> mutex_enter(&dn->dn_mtx); >> >> for (int i = 0; i < TXG_SIZE; i++) { >> -if (list_head(&dn->dn_dirty_records[i]) != NULL) { >> +if (multilist_link_active(&dn->dn_dirty_link[i])) { >> mutex_exit(&dn->dn_mtx); >> return (B_TRUE); >> } >> > > Change made. > Rebuilt. > Reinstalled. > Rebooted into the 13_0R-amd64 be area. > Bulk build started. > Bulk build completed. > (Took longer because I let it run to completion.) > > No explicit reports of null characters. The same 2 ports that > failed before, not reporting zero-byte issues, failed again. > Likely independent issues: > > [00:28:28] Failed ports: security/libgcrypt:build print/freetype2:package These are what happens for WITH_DEBUG= style builds. 
Turns out that the *make.conf files from my last bulk -a experiment were still in place and were causing WITH_DEBUG= builds. (Not my normal context.) I'll disable that and rerun the bulk from scratch. > Overall it skipped something like 54 ports. > > libgcrypt-1.9.4.log . . . > > --- basic.o --- > basic.c:315:16: error: inline assembly requires more registers than available > asm volatile("movdqu %[data0], %%xmm0\n" > ^ > basic.c:315:16: error: inline assembly requires more registers than available > basic.c:315:16: error: inline assembly requires more registers than av
Re: git: 43741377b143 - main - security/openssl: Security update to 1.1.1n
On 2022-Mar-19, at 14:34, Mark Millard wrote: > On 2022-Mar-19, at 14:24, Mark Millard wrote: > >> On 2022-Mar-19, at 12:54, Mark Johnston wrote: >> >>> On Sat, Mar 19, 2022 at 12:00:20PM -0700, Mark Millard wrote: >>>> On 2022-Mar-19, at 11:07, Thomas Zander wrote: >>>> >>>>> On Sat, 19 Mar 2022 at 18:32, Mark Millard wrote: >>>>>> May be report to Mark J. how to run the same test builds >>>>>> that failed for -p8 but worked for -p7? >>>>> >>>>> Sure, good point. >>>>> A build that reliably causes broken packages on p8 but not on p7 for >>>>> me is running: >>>>> >>>>> poudriere testport -o multimedia/mplayer -j <13.0-amd64-jail here> >>>>> >>>>> This caused the broken png and python packages when they were built as >>>>> dependencies. >>>>> In poudriere.conf I set this: >>>>> DISTFILES_CACHE=/vcache/distfiles >>>>> CCACHE_DIR=/vcache/ccache >>>>> ALLOW_MAKE_JOBS=yes >>>>> >>>>> The ALLOW_MAKE_JOBS should increase the number of parallel IO >>>>> operations in-flight on the pool, maybe this increases the likelihood >>>>> of triggering the issue? >>>>> The DISTFILES_CACHE and CCACHE_DIR are in the same zfs pool as >>>>> /poudriere, not sure if this is relevant. >>>>> The zfs pool is a single disk, no raid, mirror or anything fancy. >>>> >>>> On a ThreadRipper 1950X, PCIe Optane storage, 128 GiBytes of >>>> RAM, I've used bectl to boot the 13.0_RELEASE-p8 environment >>>> and have started: >>>> >>>> poudriere testport -o multimedia/mplayer -j13_0R-amd64-bulk_a >>>> >>>> where the jail had nothing built in it at the start. So: >>>> >>>> [00:00:08] Building 271 packages using up to 32 builders >>>> >>>> The primary difference is that I've never used ccache and >>>> did not try to do so here. The "zfs pool is a single disk, >>>> no raid, mirror or anything fancy" is accurate, as is the >>>> use of ALLOW_MAKE_JOBS= . >>>> >>>> That did not take long . . . >>>> >>>> It proves that ccache is not required. Also some files >>>> seem to get only small blocks of zero-bytes, others >>>> large ones. But I've not checked for the null characters >>>> being at the end instead of earlier in the file. >>> >>> I still am not able to reproduce it. I think it's indeed a concurrency >>> problem, and I found a possible culprit. Mark or Thomas, if you're able >>> to build a new kernel from the releng/13.0 branch and test it, could you >>> please try this patch? >>> >> >> Sure. (I build ports in a way that allows large load >> averages relative to the hardware-thread count. I also >> have a lot of swap configured. I avoid significant use >> of tmpfs.) >> >>> diff --git a/sys/contrib/openzfs/module/zfs/dnode.c >>> b/sys/contrib/openzfs/module/zfs/dnode.c >>> index 8592c5f8c3a9..b69ba68ec780 100644 >>> --- a/sys/contrib/openzfs/module/zfs/dnode.c >>> +++ b/sys/contrib/openzfs/module/zfs/dnode.c >>> @@ -1661,7 +1661,7 @@ dnode_is_dirty(dnode_t *dn) >>> mutex_enter(&dn->dn_mtx); >>> >>> for (int i = 0; i < TXG_SIZE; i++) { >>> - if (list_head(&dn->dn_dirty_records[i]) != NULL) { >>> + if (multilist_link_active(&dn->dn_dirty_link[i])) { >>> mutex_exit(&dn->dn_mtx); >>> return (B_TRUE); >>> } >>> >> >> Change made. >> Rebuilt. >> Reinstalled. >> Rebooted into the 13_0R-amd64 be area. >> Bulk build started. >> Bulk build completed. >> (Took longer because I let it run to completion.) >> >> No explicit reports of null characters. The same 2 ports that >> failed before, not reporting zero-byte issues, failed again. 
>> Likely independent issues: >> >> [00:28:28] Failed ports: security/libgcrypt:build print/freetype2:package > > These are what happens for WITH_DEBUG= style builds. Turns > out that the *make.conf files from my last bulk -a experiment > were still in place and were causing WITH_DEBUG= builds. (Not > my normal context.) > > I'll disable that and rerun the bulk fr
Re: FreeBSD Errata Notice FreeBSD-EN-22:13.zfs
This went through with no change to releng/13.0 's sys/conf/newvers.sh , so it still has:

BRANCH="RELEASE-p8"

from releng/13.0 c3540b3a2bdf . Similarly, UPDATING still has just:

20220315:
	13.0-RELEASE-p8
	FreeBSD-EN-22:10.zfs
	FreeBSD-EN-22:11.zfs
	FreeBSD-EN-22:12.zfs
	FreeBSD-SA-22:02.wifi
	FreeBSD-SA-22:03.openssl
	. . .

from the same commit. This might make it more difficult for some to verify what status they have for the zfs problem.

=== Mark Millard marklmi at yahoo.com
A possible unintended difference in 13.1-RELEASE vs., for example, 13.1-RELEASE-p3
I downloaded and looked at: FreeBSD-13.1-RELEASE-arm64-aarch64-RPI.img # mdconfig -u md0 -f FreeBSD-13.1-RELEASE-arm64-aarch64-RPI.img # mount -onoatime /dev/md0s2a /mnt # strings /mnt/boot/kern*/kernel | grep 13.1-RELEASE @(#)FreeBSD 13.1-RELEASE releng/13.1-n250148-fc952ac2212 GENERIC FreeBSD 13.1-RELEASE releng/13.1-n250148-fc952ac2212 GENERIC 13.1-RELEASE Note the: releng/13.1-n250148-fc952ac2212 Looking at the live system after the freebsd-update to -p3 : # strings /boot/kernel/kernel | grep 13.1-RELEASE @(#)FreeBSD 13.1-RELEASE-p3 GENERIC FreeBSD 13.1-RELEASE-p3 GENERIC 13.1-RELEASE-p3 No text analogous to: releng/13.1-n250148-fc952ac2212 I'll note that the actual 13.1-RELEASE-p3 for the binary release build appears to have been a build of at: QUOTE author Mark Johnston2022-11-01 20:54:33 + committer Mark Johnston2022-11-01 20:55:10 + commit c3c13035ef270dcf0d24d2d847dd590edc535ed0 (patch) treef6582d69009a70d8ae8b52e00da4cabe6d159fb7 parent e81b1bd17fb4e83865d60461c2554d90f72cd395 (diff) downloadsrc-c3c13035ef270dcf0d24d2d847dd590edc535ed0.tar.gz src-c3c13035ef270dcf0d24d2d847dd590edc535ed0.zip zfs: Fix an improperly resolved merge conflict releng/13.1 Approved by:so Fixes: 8838c650cb59 ("Fix use-after-free in btree code") Diffstat -rw-r--r-- sys/contrib/openzfs/module/zfs/btree.c 1 1 files changed, 0 insertions, 1 deletions diff --git a/sys/contrib/openzfs/module/zfs/btree.c b/sys/contrib/openzfs/module/zfs/btree.c index 77cb2543e93d..09625bc92f92 100644 --- a/sys/contrib/openzfs/module/zfs/btree.c +++ b/sys/contrib/openzfs/module/zfs/btree.c @@ -1766,7 +1766,6 @@ zfs_btree_remove_idx(zfs_btree_t *tree, zfs_btree_index_t *where) zfs_btree_poison_node_at(tree, keep_hdr, keep_hdr->bth_count); rm_hdr->bth_count = 0; - zfs_btree_node_destroy(tree, rm_hdr); /* Remove the emptied node from the parent. */ zfs_btree_remove_from_node(tree, parent, rm_hdr); zfs_btree_node_destroy(tree, rm_hdr); END QUOTE I'll also note that none of the FreeBSD-EN-22:* notices lists an exact match to what was actually built for the binary update. That would have been true even without the merge conflict fix, in that, without such involved, the final build would normally be based on a "Add UPDATING entries and bump version" type of commit after the last of the FreeBSD-EN-* commits reported. In other words, nothing seems to record and show anything identifying the actual commit used for the binary update. That could also be of interest to folks that want to build by starting with the exact same source code vintage as the binary update did. In this case: https://lists.freebsd.org/archives/freebsd-announce/2022-November/48.html looks like it needs an update because the reference: releng/13.1/8838c650cb59 releng/13.1-n250167 is to before the "zfs: Fix an improperly resolved merge conflict". That update will identify the commit built for the binary update. === Mark Millard marklmi at yahoo.com
Re: A possible unintended difference in 13.1-RELEASE vs., for example, 13.1-RELEASE-p3
On 2022-Nov-4, at 11:37, Paul Mather wrote: > On Nov 3, 2022, at 11:50 PM, Mark Millard wrote: > >> I downloaded and looked at: >> >> FreeBSD-13.1-RELEASE-arm64-aarch64-RPI.img >> >> # mdconfig -u md0 -f FreeBSD-13.1-RELEASE-arm64-aarch64-RPI.img >> # mount -onoatime /dev/md0s2a /mnt >> # strings /mnt/boot/kern*/kernel | grep 13.1-RELEASE >> @(#)FreeBSD 13.1-RELEASE releng/13.1-n250148-fc952ac2212 GENERIC >> FreeBSD 13.1-RELEASE releng/13.1-n250148-fc952ac2212 GENERIC >> 13.1-RELEASE >> >> Note the: releng/13.1-n250148-fc952ac2212 >> >> Looking at the live system after the freebsd-update to >> -p3 : >> >> # strings /boot/kernel/kernel | grep 13.1-RELEASE >> @(#)FreeBSD 13.1-RELEASE-p3 GENERIC >> FreeBSD 13.1-RELEASE-p3 GENERIC >> 13.1-RELEASE-p3 >> >> No text analogous to: releng/13.1-n250148-fc952ac2212 > > > I'm just wondering, but could this have anything to reproducible builds? > It's my understanding that setting is standard for -RELEASE branches. Note > this entry in /usr/src/UPDATING: > > = > 20180913: >Reproducible build mode is now on by default, in preparation for >FreeBSD 12.0. This eliminates build metadata such as the user, >host, and time from the kernel (and uname), unless the working tree >corresponds to a modified checkout from a version control system. >The previous behavior can be obtained by setting the /etc/src.conf >knob WITHOUT_REPRODUCIBLE_BUILD. > = > Could be, but text like releng/13.1-n250148-fc952ac2212 is reproducible for use of the same commit to do mulitple builds. Use of different commits across builds should be detectable even for reproducible build style activity, or so I would expect. The releng/13.1-n250148-fc952ac2212 type of text does have an issue if incremental style builds are sometimes used instead of from-scratch builds: It is for/from the kernel build and if the kernel is not built but is left at an older build and, say, only part of world is built, the identification would then be out of date (inaccurate) overall. I was not really trying to claim that the releng/13.1-n250148-fc952ac2212 text I referenced was the best place to have the identification of the exact commit used for binary updates. It is just that right now there is no place and some manual inspection/analysis is required to (hopefully) identify the right commit. (The mismerge-fix being an example of something that would need to be noticed.) Glen, Warner, etc. may well determine that the current status relative to the build that produced the binary update is sufficient overall. I've primarily identified related questions for consideration. === Mark Millard marklmi at yahoo.com
RE: LLVM error when building www/firefox 107.0_2,2
Hiroo Ono (小野寛生) wrote on Date: Sat, 19 Nov 2022 13:10:15 UTC :

> while building www/firefox with poudriere, following error occurred.
> Should I report it to LLVM project as the error message says? Or is
> this just a bug in firefox or the combination of options I chose?
>
> The system is:
> FreeBSD 13.1-STABLE #7 stable/13-af3ccd7b6d: Thu Nov 10 08:00:43 JST 2022
> and the llvm suite is LLVM 13.0.1 from ports.
> . . .
> LLVM ERROR: Type mismatch in constant table!
> . . .

That looks like a report of an internally detected error, something that should never happen and that firefox code or build options should not result in.

As a somewhat confirming note . . .

http://beefy18.nyi.freebsd.org/data/main-amd64-default/pf5ce9b7ee067_s183088934a/logs/firefox-107.0_2,2.log

is a log from a successful build of 107.0_2,2 on/for main [so: 14] --thus using a more recent clang++ vintage as well. Thus, it suggests that "LLVM 13.0.1 from ports" has a problem that firefox's build ran into.

=== Mark Millard marklmi at yahoo.com
RE: Unusual errors on recent stable/13 22-Dec-2022 (a related problem report on freebsd-ports?)
Jonathan Chen wrote on Date: Thu, 22 Dec 2022 19:21:37 UTC :

> I recently updated my package builder machine to the
> stable/13-n253297-fc15d5bf1109of (22-Dec-2022); and appear to be having
> some unusual issues when building with a high number of jobs. My package
> builder uses synth (which uses nullfs on ZFS), and I have had failures
> with missing files, as well as what appears to be sequencing issues with
> Makefiles. If I re-run the build, these errors usually do not reoccur.
>
> I'm puzzled as to what is happening. Is this just happening to me? It
> appears that the ZFS code has been updated recently, and I'm wondering
> whether a regression has crept in. [Or it could just be my hardware?]

https://lists.freebsd.org/archives/freebsd-ports/2022-December/003153.html

indicates a problem with tmpfs use such that using USE_TMPFS=no avoids a problem for poudriere bulk builds on 13.1 amd64. (Unclear if the note is for stable/13 vs. releng/13.1 vs. both.)

I'll not repeat the material here but it might be worth a look.

=== Mark Millard marklmi at yahoo.com
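For poudriere that workaround is just a poudriere.conf knob, for example:

# in /usr/local/etc/poudriere.conf
USE_TMPFS=no

(USE_TMPFS also accepts data, wrkdir, localbase, all, or combinations.) synth keeps its tmpfs settings in its own profile configuration, so the spelling there differs.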
Re: Unusual errors on recent stable/13 22-Dec-2022 (a related problem report on freebsd-ports?)
On Dec 23, 2022, at 10:14, Mark Millard wrote: > Jonathan Chen wrote on > Date: Thu, 22 Dec 2022 19:21:37 UTC : > >> I recently updated my package builder machine to the >> stable/13-n253297-fc15d5bf1109of (22-Dec-2022); and appear to be having >> some unusual issues when building with a high number of jobs. My package >> builder uses synth (which uses nullfs on ZFS), and I have had failures >> with missing files, as well as what appears to be sequencing issues with >> Makefiles. If I re-run the build, these errors usually do not reoccur. >> >> I'm puzzled as to what is happening. Is this just happening to me? It >> appears that the ZFS code has been updated recently, and I'm wondering >> whether a regression has crept in. [Or it could just be my hardware?] > > > https://lists.freebsd.org/archives/freebsd-ports/2022-December/003153.html > > indicates a problem with tmpfs use such that using USE_TMPFS=no > avoids a problem for poudriere bulk builds on 13.1 amd64. > (Unclear if the note is for stable/13 vs. releng/13.1 vs. both.) A note on Discord indicates: stable/13 as a context with the devel/nasm example build problem. > I'll note repeat the material here but it might be worth a look. === Mark Millard marklmi at yahoo.com
Re: Unusual errors on recent stable/13 22-Dec-2022 (a related problem report on freebsd-ports?)
Jonathan Chen wrote on Date: Fri, 23 Dec 2022 18:40:27 UTC : > On 24/12/22 07:14, Mark Millard wrote: > > Jonathan Chen wrote on > > Date: Thu, 22 Dec 2022 19:21:37 UTC : > > > >> I recently updated my package builder machine to the > >> stable/13-n253297-fc15d5bf1109of (22-Dec-2022); and appear to be having > >> some unusual issues when building with a high number of jobs. My package > >> builder uses synth (which uses nullfs on ZFS), and I have had failures > >> with missing files, as well as what appears to be sequencing issues with > >> Makefiles. If I re-run the build, these errors usually do not reoccur. > >> > >> I'm puzzled as to what is happening. Is this just happening to me? It > >> appears that the ZFS code has been updated recently, and I'm wondering > >> whether a regression has crept in. [Or it could just be my hardware?] > > > > > > https://lists.freebsd.org/archives/freebsd-ports/2022-December/003153.html > > > > indicates a problem with tmpfs use such that using USE_TMPFS=no > > avoids a problem for poudriere bulk builds on 13.1 amd64. > > (Unclear if the note is for stable/13 vs. releng/13.1 vs. both.) > > I'll try disabling tmpfs on synth. > FYI . . . The following is about the tmpfs issue referenced in freebsd-ports/2022-December/003153.html . Here is what is going on (manually entered commands, not a script). First under a tmpfs: # df -m Filesystem 1M-blocks Used Avail Capacity Mounted on /dev/ufs/rootfs 221683 97879 106068 48% / devfs 0 0 0 100% /dev /dev/msdosfs/MSDOSBOOT 49 31 18 62% /boot/msdos tmpfs 7716 0 7716 0% /tmp # cd /tmp # : > mmjnk.test # ls -Tld mmjnk.test -rw-r--r-- 1 root wheel 0 Mar 9 08:56:53 2022 mmjnk.test # : > mmjnk.test # ls -Tld mmjnk.test -rw-r--r-- 1 root wheel 0 Mar 9 08:56:53 2022 mmjnk.test (no time change). The makefile involved is using ": > NAME" notation to try to update timestamps on deliberately empty files. Vs. under (for example) UFS: # cd ~/ # : > mmjnk.test # ls -Tld mmjnk.test -rw-r--r-- 1 root wheel 0 Mar 9 09:00:45 2022 mmjnk.test # : > mmjnk.test # ls -Tld mmjnk.test -rw-r--r-- 1 root wheel 0 Mar 9 09:00:54 2022 mmjnk.test (time changed). Back in tmpfs land . . . Part of this is that the file is already of size zero and continues to be so. By contrast, starting with a file with 15 bytes in it: # ls -Tld mmjnk.test -rw-r--r-- 1 root wheel 15 Mar 9 09:07:38 2022 mmjnk.test # : > mmjnk.test # ls -Tld mmjnk.test -rw-r--r-- 1 root wheel 0 Mar 9 09:07:49 2022 mmjnk.test # ls -Tld mmjnk.test -rw-r--r-- 1 root wheel 0 Mar 9 09:07:49 2022 mmjnk.test The lack of a timestamp change when the file already has size zero looks like an example of a bug to me. truncate for tmpfs files behaves similarly (showing just the lack of timestamp change context): # truncate -s 0 mmjnk.test # ls -Tld mmjnk.test -rw-r--r-- 1 root wheel 0 Mar 9 09:11:31 2022 mmjnk.test # truncate -s 0 mmjnk.test # ls -Tld mmjnk.test -rw-r--r-- 1 root wheel 0 Mar 9 09:11:31 2022 mmjnk.test (UFS got a timestamp update from such a sequence.) I'll note that touch does not get this tmpfs behavior: # touch mmjnk.test # ls -Tld mmjnk.test -rw-r--r-- 1 root wheel 0 Mar 9 09:11:26 2022 mmjnk.test # touch mmjnk.test # ls -Tld mmjnk.test -rw-r--r-- 1 root wheel 0 Mar 9 09:11:31 2022 mmjnk.test (But it would not force size zero on its down.) 
I did these tests on:

# uname -apKU
FreeBSD generic 13.1-STABLE FreeBSD 13.1-STABLE #0 stable/13-n253133-b51ee7ac252c: Wed Nov 23 03:36:16 UTC 2022 r...@releng3.nyi.freebsd.org:/usr/obj/usr/src/arm64.aarch64/sys/GENERIC arm64 aarch64 1301509 1301509

However, I previously did a devel/nasm bulk test with USE_TMPFS=all on:

# uname -apKU
FreeBSD amd64_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #55 main-n259064-f83db6441a2f-dirty: Sun Nov 6 16:31:55 PST 2022 root@amd64_ZFS:/usr/obj/BUILDs/main-amd64-nodbg-clang/usr/main-src/amd64.amd64/sys/GENERIC-NODBG amd64 amd64 1400073 1400073

and it got the problem. (I normally use USE_TMPFS=data , which does not get the problem because the files in question end up not on a tmpfs.)

So: not specific to amd64 , not specific to stable/13 , existed in early November in main. This may have been around for some time for tmpfs.

=== Mark Millard marklmi at yahoo.com
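A self-contained form of the above check, for quickly testing a given kernel (a sketch using a throwaway tmpfs mount; run as root, and adjust the mount point to taste):

#!/bin/sh
# demonstrate the missing mtime update when re-truncating an
# already-empty file on tmpfs
mkdir -p /mnt/tmpfs_test
mount -t tmpfs tmpfs /mnt/tmpfs_test
cd /mnt/tmpfs_test
: > mmjnk.test
ls -Tld mmjnk.test
sleep 2
: > mmjnk.test   # on an affected kernel the timestamp does not change
ls -Tld mmjnk.test
cd /
umount /mnt/tmpfs_test
rmdir /mnt/tmpfs_test

On UFS (or using touch on tmpfs) the second timestamp advances; on an affected tmpfs it stays the same.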
RE: ofw_pci: Fix incorrectly sized softc causing pci(4) out-of-bounds reads (Should it have been MFC'd?)
Should the following have been MFC'd? (I ran into this while looking to see why I see a boot message oddity on 13.* that I do not see on main [so: 14]. There was a time when main also produced the odd messages. But I'm not claiming that this is what makes the difference. The oddity was observed on aarch64 RPi4B's.) author Jessica Clarke 2022-01-15 19:03:53 + committer Jessica Clarke 2022-01-15 19:03:53 + commit 4e3a43905e3ff7b9fcf228022f05d636f79c4b42 (patch) tree b6be66e54604bb2c1fbdfde27bf8a6644e04fd05 parent 3266a0c5d5abe8dd14de8478edec3e878e4a1c0b (diff) download src-4e3a43905e3ff7b9fcf228022f05d636f79c4b42.tar.gz src-4e3a43905e3ff7b9fcf228022f05d636f79c4b42.zip ofw_pci: Fix incorrectly sized softc causing pci(4) out-of-bounds reads We do not include sys/rman.h and so machine/resource.h ends up not being included by the time pci_private.h is included. This means PCI_RES_BUS is never defined, and so the sc_bus member of pci_softc is not present when compiling ofw_pci, resulting in the wrong softc size being passed to DEFINE_CLASS_1 and thus any attempts by pci(4) to access that member are out-of-bounds reads or writes. This is pretty fragile; arguably pci_private.h should be including sys/rman.h, but this is the minimal needed change to fix the bug whilst maintaining the status quo. Found by: CHERI Reported by: andrew Diffstat -rw-r--r-- sys/dev/ofw/ofw_pci.c 1 1 files changed, 1 insertions, 0 deletions diff --git a/sys/dev/ofw/ofw_pci.c b/sys/dev/ofw/ofw_pci.c index 7f7aad379ddc..4bd6ccd64420 100644 --- a/sys/dev/ofw/ofw_pci.c +++ b/sys/dev/ofw/ofw_pci.c @@ -33,6 +33,7 @@ __FBSDID("$FreeBSD$"); #include #include #include +#include #include #include (Note: leading whitespace might not be preserved.) === Mark Millard marklmi at yahoo.com
Re: ofw_pci: Fix incorrectly sized softc causing pci(4) out-of-bounds reads (Should it have been MFC'd?)
On Dec 26, 2022, at 19:54, Mark Millard wrote: > Should the following have been MFC'd? (I ran into this while > looking to see why I see a boot message oddity on 13.* that > I do not see on main [so: 14]. There was a time when main > also produced the odd messages. But I'm not claiming that > this is what makes the difference. The oddity was observed > on aarch64 RPi4B's.) > Never mind. I got myself confused over the history. 13.* does not have the file at all. > author Jessica Clarke 2022-01-15 19:03:53 + > committer Jessica Clarke 2022-01-15 19:03:53 + > commit 4e3a43905e3ff7b9fcf228022f05d636f79c4b42 (patch) > tree b6be66e54604bb2c1fbdfde27bf8a6644e04fd05 > parent 3266a0c5d5abe8dd14de8478edec3e878e4a1c0b (diff) > download src-4e3a43905e3ff7b9fcf228022f05d636f79c4b42.tar.gz > src-4e3a43905e3ff7b9fcf228022f05d636f79c4b42.zip > > ofw_pci: Fix incorrectly sized softc causing pci(4) out-of-bounds reads > > We do not include sys/rman.h and so machine/resource.h ends up not being > included by the time pci_private.h is included. This means PCI_RES_BUS is > never defined, and so the sc_bus member of pci_softc is not present when > compiling ofw_pci, resulting in the wrong softc size being passed to > DEFINE_CLASS_1 and thus any attempts by pci(4) to access that member are > out-of-bounds reads or writes. > > This is pretty fragile; arguably pci_private.h should be including > sys/rman.h, but this is the minimal needed change to fix the bug whilst > maintaining the status quo. > > Found by: CHERI > Reported by: andrew > > > Diffstat > -rw-r--r-- sys/dev/ofw/ofw_pci.c 1 > 1 files changed, 1 insertions, 0 deletions > > diff --git a/sys/dev/ofw/ofw_pci.c b/sys/dev/ofw/ofw_pci.c > index 7f7aad379ddc..4bd6ccd64420 100644 > --- a/sys/dev/ofw/ofw_pci.c > +++ b/sys/dev/ofw/ofw_pci.c > @@ -33,6 +33,7 @@ __FBSDID("$FreeBSD$"); > #include > #include > #include > +#include > > #include > #include > > > > > (Note: leading whitespace might not be preserved.) === Mark Millard marklmi at yahoo.com
stable/13 snapshot's /etc/rc.d/machine_id has use of main's startmsg from /etc/rc.subr so it reports 2 "eval: startmsg: not found"
When I booted a new stable/13 snapshot install, the messaging included:

. . .
Updating motd:.
eval: startmsg: not found
eval: startmsg: not found
Clearing /tmp (X related).
. . .

It looks like the "eval: startmsg: not found" lines are from:

# grep -r "\<startmsg\>" /etc/
/etc/rc.d/machine_id: startmsg -n "Creating ${machine_id_file} "
/etc/rc.d/machine_id: startmsg 'done.'

(No more matches found.)

The following was not found on stable/13:

/etc/rc.subr:# startmsg
/etc/rc.subr:startmsg()
/etc/rc.subr: startmsg "Starting ${name}."

=== Mark Millard marklmi at yahoo.com
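Until an MFC of the rc.subr startmsg bits lands, one crude local workaround (a sketch, not the committed fix) is to give /etc/rc.d/machine_id a fallback definition after it sources /etc/rc.subr:

# fallback for an rc.subr that predates startmsg
if ! type startmsg >/dev/null 2>&1; then
        startmsg() { echo "$@"; }
fi

That only quiets the "eval: startmsg: not found" noise; the proper fix is the rc.subr update itself.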
FYI: upcoming 13.2-RELEASE vs. 8 GiByte RPi4B's based on U-Boot 2023.01 recently in use, given UEFI style booting
This is an FYI about 8 GiByte RPi4B coverage by 13.2-RELEASE. (The existing snapshots and such show the issue now --but I'm noting the 13.2-RELEASE consequences for things as they are.)

When sysutils/u-boot-rpi-arm64 and sysutils/u-boot-rpi4 recently changed to be based on U-Boot 2023.01, the U-Boot produced no longer boots 8 GiByte RPi4B's for FreeBSD: U-Boot increased the number of U-Boot "lmb" regions it uses for RPi4B's for UEFI booting --without adjusting the bound imposed on the number that can be in use. The 8 GiByte RPi4B's end up over the limit during part of the activity and U-Boot aborts the UEFI-boot attempt.

The U-Boot message about this is misleading. The middle line of:

Found EFI removable media binary efi/boot/bootaa64.efi
** Reading file would overwrite reserved memory **
Failed to load 'efi/boot/bootaa64.efi'

is actually caused by the rejection of adding another lmb range, having nothing to do with potentially overwriting reserved memory. (The message is generated far from the code that did the rejection and no rejection reason is propagated.)

A FreeBSD bugzilla for this issue is:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=269181

I'd not be surprised if U-Boot 2023.04 has things working by default again. But, until such, either an older U-Boot, such as 2022.10, or a patched 2023.01 U-Boot, would be needed for 8 GiByte RPi4B's to end up being directly bootable by 13.2-RELEASE as-built.

I'm not aware of any other type of FreeBSD aarch64 context broken via the use of 2023.01 U-Boot.

=== Mark Millard marklmi at yahoo.com
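For anyone experimenting with patching 2023.01 rather than dropping back to 2022.10: the bound involved appears to be U-Boot's Kconfig-controlled lmb region count. The option names and value below are assumptions to be verified against the 2023.01 sources (lib/lmb.c and the related Kconfig entries), not a tested recipe:

# candidate tweak in the U-Boot build configuration for the RPi4 target
CONFIG_LMB_USE_MAX_REGIONS=y
CONFIG_LMB_MAX_REGIONS=16

Raising that count is the sort of change the ports' U-Boot patches could carry until an upstream release adjusts the default.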
RE: 13.2 BETA2: how do debug META_MODE?
Peter wrote on Date: Tue, 21 Feb 2023 03:45:12 UTC :

> on /some/ of my nodes, META_MODE seems not being honored anymore:
> I had to build them another time, and the lengthy lib/clang gets
> built all over again (tried two times).
> This is so since 13.2 (BETA2). It did work in 13.1 (RELENG), at least
> according to the timing from the logfiles.
>
> Now I'm trying to figure out the difference, because I have some
> nodes where it appears to more-or-less work (have seen buildworld
> take 5 minutes), and others where it doesn't (take an hour to build).
> The thing is scripted, so it is not so very likely an operator error
> (while not impossible either).
>
> But it seems difficult to figure out details: "make -n" seems to not
> care about META_MODE, while META_MODE suppresses all useful output from
> make. And the docs say there are *.meta files (yes there are), but no
> info about how to verify their content, or how to get make tell what
> it is going to do and why (and the buildworld is not the most easy
> to understand target)...
>
> So, some inspiration would be welcome...

One thing to check on is if filemon.ko is loaded and operational. META_MODE greatly depends on it.

Another thing to know is that the following are very different for what all is built for the "(again #0)" line vs. the other two "again" lines, using buildworld as an example context. Imagine here that the first buildworld rebuilds llvm/clang materials.

# cd /usr/src/
# env WITH_META_MODE=yes make buildworld
# env WITH_META_MODE=yes make installworld
# env WITH_META_MODE=yes make buildworld (again #0)
## no more rebuilds below?
# env WITH_META_MODE=yes make buildworld (again #1)
# env WITH_META_MODE=yes make buildworld (again #2)

Unfortunately, some of the install activity registers as activity that is to cause later rebuild activity: updated file dates. There are also issues of sort of a feedback loop: rm ends up updated (deleted and replaced) by install but rm was also listed as part of the sequence of replacing some other files. Result? The rm removal/replacement ends up meaning the files are to be regenerated, not just recopied. There is a long list of such commands, not just rm.

"again #0" will rebuild llvm/clang. The other two "again"s will not.

See:

https://lists.freebsd.org/pipermail/freebsd-current/2021-January/078488.html

and:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=257616 .

=== Mark Millard marklmi at yahoo.com
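As a concrete form of the filemon check (example commands; kldload -n just avoids an error if the module is already loaded):

# kldstat | grep filemon
# kldload -n filemon

Having filemon_load="YES" in /boot/loader.conf on the build nodes avoids depending on anything else loading it.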
Re: 13.2 BETA2: how do debug META_MODE?
On Feb 21, 2023, at 04:55, Peter wrote: > On Mon, Feb 20, 2023 at 08:44:59PM -0800, Mark Millard wrote: > ! Peter wrote on > ! Date: Tue, 21 Feb 2023 03:45:12 UTC : > ! > ! > on /some/ of my nodes, META_MODE seems not being honored anymore: > ! > I had to build them another time, and the lengthy lib/clang gets > ! > built all over again (tried two times). > ! > This is so since 13.2 (BETA2). It did work in 13.1 (RELENG), at least > ! > according to the timing from the logfiles. > ! > > ! > Now I'm trying to figure out the difference, because I have some > ! > nodes where it appears to more-or-less work (have seen buildworld > ! > take 5 minutes), and others where it doesn't (take an hour to build). > ! > The thing is scripted, so it is not so very likely an operator error > ! > (while not impossible either). > ! > > ! > But it seems difficult to figure out details: "make -n" seems to not > ! > care about META_MODE, while META_MODE suppresses all useful output from > ! > make. And the docs say there are *.meta files (yes there are), but no > ! > info about how to verify their content, or how to get make tell what > ! > it is going to do and why (and the buildworld is not the most easy > ! > to understand target)... > ! > > ! > So, some inspiration would be welcome... > ! > ! On thing to check on is if filemon.ko is loaded and operational. > ! META_MODE greatly depends on it. > > That should be the case - 'kldstat' shows it (and I've seen warnings > where it didn't). > > ! Another thing to know is that the following are very different > ! for what all is built for the "(again #0)" line vs. the other > ! two "again" lines, using buildworld as an example context. > ! Imagine here the the first buildworld rebuilds llvm/clang > ! materials. > ! > ! # cd /usr/src/ > ! # env WITH_META_MODE=yes make buildworld > ! # env WITH_META_MODE=yes make installworld > ! # env WITH_META_MODE=yes make buildworld (again #0) > ! ## no more rebuilds below? > ! # env WITH_META_MODE=yes make buildworld (again #1) > ! # env WITH_META_MODE=yes make buildworld (again #2) > > But what is the difference between #0 and #1?

awk, cp, ln, rm, sed, and many more from . . ./tmp/legacy/usr/sbin/ have new dates for rebuilds after installworld (that targets the running system). Not true for #1 and #2.

The dates on these tools being more recent than the files that they were involved in producing leads to rebuilding those files. That in turn leads to other files being rebuilt.

make with -dM reports the likes of:

file '. . ./tmp/legacy/usr/sbin/awk' is newer than the target...

explicitly as it goes. As I remember, tmp/legacy/usr/sbin/ was always part of the path for what I found.

One still has to trace back to where a rebuild is not due to something rebuilt earlier in the same build. Noting that tmp/legacy/usr/sbin/awk is reported as newer than its target leaves the question of how it ended up being newer: earlier in the same build vs. before the build activity? It too must be traced back to something based on just material from prior to the build in question.

Note that the above make sequence was only intended for showing the dependency, not as instructions for a normal update sequence.

> . . . > > ! See: > ! > ! https://lists.freebsd.org/pipermail/freebsd-current/2021-January/078488.html

This (and later messages in the thread) are about the "awk, cp, ln, rm, sed, and many more" that make with -dM explicitly reports (likely from tmp/legacy/usr/sbin/ ).
If you trust the make date comparisons, they are the easiest way to find out what has "is newer than the target" status that leads to starting a rebuild sequence. (Other dependent things then rebuild based on this rebuild. One still has to trace back to where things start.)

I did not do the analysis of how (e.g.) tmp/legacy/usr/sbin/awk ended up being newer than such a target and, so, caused a rebuild of that target. I was going in the direction that the tool being newer really is unlikely to justify rebuilding the target(s) in question. The other direction, how it got to be newer, is also relevant.

> ! > ! and: > ! > ! https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=257616 . > > Thank You, that's exactly the inspiration I was looking for! > Diving back in...
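For reference, one way to capture and summarize that -dM output (a minimal sketch; the log file name is a placeholder and the logged run takes as long as the buildworld itself):

# log a META_MODE buildworld with make's -dM meta/date debugging enabled
cd /usr/src
script /tmp/meta-dM.log env WITH_META_MODE=yes make -dM buildworld
# summarize which files were reported as newer than their targets
grep "is newer than the target" /tmp/meta-dM.log | sort | uniq -c | sort -rn | head

=== Mark Millard marklmi at yahoo.com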
Re: 13.2 BETA2: how do debug META_MODE?
On Feb 21, 2023, at 11:56, Mark Millard wrote: > On Feb 21, 2023, at 04:55, Peter wrote: > >> On Mon, Feb 20, 2023 at 08:44:59PM -0800, Mark Millard wrote: >> ! Peter wrote on >> ! Date: Tue, 21 Feb 2023 03:45:12 UTC : >> ! >> ! > on /some/ of my nodes, META_MODE seems not being honored anymore: >> ! > I had to build them another time, and the lengthy lib/clang gets >> ! > built all over again (tried two times). >> ! > This is so since 13.2 (BETA2). It did work in 13.1 (RELENG), at least >> ! > according to the timing from the logfiles. >> ! > >> ! > Now I'm trying to figure out the difference, because I have some >> ! > nodes where it appears to more-or-less work (have seen buildworld >> ! > take 5 minutes), and others where it doesn't (take an hour to build). >> ! > The thing is scripted, so it is not so very likely an operator error >> ! > (while not impossible either). >> ! > >> ! > But it seems difficult to figure out details: "make -n" seems to not >> ! > care about META_MODE, while META_MODE suppresses all useful output from >> ! > make. And the docs say there are *.meta files (yes there are), but no >> ! > info about how to verify their content, or how to get make tell what >> ! > it is going to do and why (and the buildworld is not the most easy >> ! > to understand target)... >> ! > >> ! > So, some inspiration would be welcome... >> ! >> ! On thing to check on is if filemon.ko is loaded and operational. >> ! META_MODE greatly depends on it. >> >> That should be the case - 'kldstat' shows it (and I've seen warnings >> where it didn't). >> >> ! Another thing to know is that the following are very different >> ! for what all is built for the "(again #0)" line vs. the other >> ! two "again" lines, using buildworld as an example context. >> ! Imagine here the the first buildworld rebuilds llvm/clang >> ! materials. >> ! >> ! # cd /usr/src/ >> ! # env WITH_META_MODE=yes make buildworld >> ! # env WITH_META_MODE=yes make installworld >> ! # env WITH_META_MODE=yes make buildworld (again #0) >> ! ## no more rebuilds below? >> ! # env WITH_META_MODE=yes make buildworld (again #1) >> ! # env WITH_META_MODE=yes make buildworld (again #2) >> >> But what is the difference between #0 and #1? > > awk, cp, ln, rm, sed, and many more from > . . ./tmp/legacy/usr/sbin/have new dates > for rebuilds after installworld (that targets > the running system). Not true for #1 and #2. > > The dates on these tools being more recent than > the files that they were involved in producing > leads to rebuilding those files. That in turn > leads to other files being rebuilt. > > make with -dM reports the likes of: > > file '. . ./tmp/legacy/usr/sbin/awk' is newer than the target... > > explicitly as it goes. As I remember tmp/legacy/usr/sbin/ > was always part of the path for what I found. > > One still has to trace back to were rebuild a rebuild > is not due to something rebuilt in earlier in the same > build. Noting that tmp/legacy/usr/sbin/awk is reported > as newer than its target, leaves the question of how > it ended up being newer: earlier in same build vs. > before build activity? It too must be traced back > to something based on just material from prior to > the build in question. > > Note that the above make sequence was only intended > for showing the dependency, not as instructions for a > normal update sequence. > >> . . . >> >> ! See: >> ! >> ! 
>> https://lists.freebsd.org/pipermail/freebsd-current/2021-January/078488.html > > This (and later messages in the thread) are about the > "awk, cp, ln, rm, sed, and many more" that make with -dM > explicitly reports (likely from tmp/legacy/usr/sbin/ ). > If you trust the make date comparisons, it is the easiest > way to find out what has "is newer than the target" status > that leads to starting a rebuild sequence. (Other dependent > things then rebuild based on this rebuild. One still has > to trace back to where things start.) > > I did not do the analysis of how (e.g.) tmp/legacy/usr/sbin/awk > ended up being newer than such a target and, so, causing a > rebuild of that target. I was going the direction: that > it is newer really is unlikely to justify the rebuild for > the target(s) in question. The other direction about how > it got to be newer is also relevant. > >> ! >> ! and: >> ! >> ! https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=257616 . >> >> Thank You, that's exactly the inspiration I was looking for! >> Diving back in... >

I had forgotten about Simon J. Gerraty's notes in his reply: https://lists.freebsd.org/pipermail/freebsd-current/2021-January/078628.html It is about telling META_MODE to ignore things that would otherwise cause rebuild activity. Had I remembered, I would have also listed it explicitly, not just listing the start of the thread.
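.MAKE.META.IGNORE_PATHS is the knob those notes center on. A minimal sketch of the idea for make.conf or src-env.conf (the object-tree prefix below is an assumption and has to match your actual build tree; a fuller worked example appears later in this thread):

# tell META_MODE not to treat these bootstrap tools' dates as reasons to rebuild
.MAKE.META.IGNORE_PATHS+= /usr/obj/usr/src/amd64.amd64/tmp/legacy/usr/sbin/awk
.MAKE.META.IGNORE_PATHS+= /usr/obj/usr/src/amd64.amd64/tmp/legacy/usr/sbin/sed

=== Mark Millard marklmi at yahoo.com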
Re: 13.2 BETA2: how do debug META_MODE?
On Feb 21, 2023, at 11:56, Mark Millard wrote: > On Feb 21, 2023, at 04:55, Peter wrote: > >> On Mon, Feb 20, 2023 at 08:44:59PM -0800, Mark Millard wrote: >> ! Peter wrote on >> ! Date: Tue, 21 Feb 2023 03:45:12 UTC : >> ! >> ! > on /some/ of my nodes, META_MODE seems not being honored anymore: >> ! > I had to build them another time, and the lengthy lib/clang gets >> ! > built all over again (tried two times). >> ! > This is so since 13.2 (BETA2). It did work in 13.1 (RELENG), at least >> ! > according to the timing from the logfiles. >> ! > >> ! > Now I'm trying to figure out the difference, because I have some >> ! > nodes where it appears to more-or-less work (have seen buildworld >> ! > take 5 minutes), and others where it doesn't (take an hour to build). >> ! > The thing is scripted, so it is not so very likely an operator error >> ! > (while not impossible either). >> ! > >> ! > But it seems difficult to figure out details: "make -n" seems to not >> ! > care about META_MODE, while META_MODE suppresses all useful output from >> ! > make. And the docs say there are *.meta files (yes there are), but no >> ! > info about how to verify their content, or how to get make tell what >> ! > it is going to do and why (and the buildworld is not the most easy >> ! > to understand target)... >> ! > >> ! > So, some inspiration would be welcome... >> ! >> ! On thing to check on is if filemon.ko is loaded and operational. >> ! META_MODE greatly depends on it. >> >> That should be the case - 'kldstat' shows it (and I've seen warnings >> where it didn't). >> >> ! Another thing to know is that the following are very different >> ! for what all is built for the "(again #0)" line vs. the other >> ! two "again" lines, using buildworld as an example context. >> ! Imagine here the the first buildworld rebuilds llvm/clang >> ! materials. >> ! >> ! # cd /usr/src/ >> ! # env WITH_META_MODE=yes make buildworld >> ! # env WITH_META_MODE=yes make installworld >> ! # env WITH_META_MODE=yes make buildworld (again #0) >> ! ## no more rebuilds below? >> ! # env WITH_META_MODE=yes make buildworld (again #1) >> ! # env WITH_META_MODE=yes make buildworld (again #2) >> >> But what is the difference between #0 and #1? > > awk, cp, ln, rm, sed, and many more from > . . ./tmp/legacy/usr/sbin/have new dates > for rebuilds after installworld (that targets > the running system). Not true for #1 and #2. > > The dates on these tools being more recent than > the files that they were involved in producing > leads to rebuilding those files. That in turn > leads to other files being rebuilt. > > make with -dM reports the likes of: > > file '. . ./tmp/legacy/usr/sbin/awk' is newer than the target... > > explicitly as it goes. As I remember tmp/legacy/usr/sbin/ > was always part of the path for what I found. > > One still has to trace back to were rebuild a rebuild > is not due to something rebuilt in earlier in the same > build. Noting that tmp/legacy/usr/sbin/awk is reported > as newer than its target, leaves the question of how > it ended up being newer: earlier in same build vs. > before build activity? It too must be traced back > to something based on just material from prior to > the build in question. > > Note that the above make sequence was only intended > for showing the dependency, not as instructions for a > normal update sequence. > >> . . . >> >> ! See: >> ! >> ! 
>> https://lists.freebsd.org/pipermail/freebsd-current/2021-January/078488.html > > This (and later messages in the thread) are about the > "awk, cp, ln, rm, sed, and many more" that make with -dM > explicitly reports (likely from tmp/legacy/usr/sbin/ ). > If you trust the make date comparisons, it is the easiest > way to find out what has "is newer than the target" status > that leads to starting a rebuild sequence. (Other dependent > things then rebuild based on this rebuild. One still has > to trace back to where things start.) > > I did not do the analysis of how (e.g.) tmp/legacy/usr/sbin/awk > ended up being newer than such a target and, so, causing a > rebuild of that target. I was going the direction: that > it is newer really is unlikely to justify the rebuild for > the target(s) in question. The other direction about how > it got to be newer is also relevant. Using awk as an example, for the (re)build of awk in: /usr/obj/BUILDs/main-amd
Re: 13.2 BETA2: how do debug META_MODE?
On Feb 21, 2023, at 18:10, Peter wrote: > On Tue, Feb 21, 2023 at 11:56:13AM -0800, Mark Millard wrote: > ! On Feb 21, 2023, at 04:55, Peter wrote: > ! > ! > ! # cd /usr/src/ > ! > ! # env WITH_META_MODE=yes make buildworld > ! > ! # env WITH_META_MODE=yes make installworld > ! > ! # env WITH_META_MODE=yes make buildworld (again #0) > ! > ! ## no more rebuilds below? > ! > ! # env WITH_META_MODE=yes make buildworld (again #1) > ! > ! # env WITH_META_MODE=yes make buildworld (again #2) > ! > > ! > But what is the difference between #0 and #1? > ! > ! awk, cp, ln, rm, sed, and many more from > ! . . ./tmp/legacy/usr/sbin/have new dates > ! for rebuilds after installworld (that targets > ! the running system). Not true for #1 and #2. > ! > ! The dates on these tools being more recent than > ! the files that they were involved in producing > ! leads to rebuilding those files. That in turn > ! leads to other files being rebuilt. > ! > ! make with -dM reports the likes of: > ! > !file '. . ./tmp/legacy/usr/sbin/awk' is newer than the target... > ! > ! explicitly as it goes. As I remember tmp/legacy/usr/sbin/ > ! was always part of the path for what I found. > > Mark, thanks a lot for the proper input at the right time! > > This put me on the right track and I mananged to analyze and > understand what is actually happening. > > It looks like my issue does resolve itself somehow, and things > start to behave as expected again after four builds.

Interesting.

> ! I did not do the analysis of how (e.g.) tmp/legacy/usr/sbin/awk > ! ended up being newer than such a target and, so, causing a > ! rebuild of that target. I was going the direction: that > ! it is newer really is unlikely to justify the rebuild for > ! the target(s) in question. The other direction about how > ! it got to be newer is also relevant. > > I have now analyzed some parts of it. META_MODE typically finds some > build-tools to rebuild, but then if the result is not different > from what was there before, then "install" will not copy it to the > bin-dir, and so the avalanche gets usually avoided. >

The implication is that "install -C" is in use, quoting the man page:

-C      Copy the file. If the target file already exists and the files
        are the same, then do not change the modification time of the
        target. If the target's file flags and mode need not to be
        changed, the target's inode change time is also unchanged.

-c      Copy the file. This is actually the default. The -c option is
        only included for backwards compatibility.

-C might have more of an effect in a reproducible-build style build process than on a non-reproducible-build style one.
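A quick way to see the -C behavior in isolation (a hypothetical two-file demonstration, not part of the build; the file names are placeholders):

# first copy creates dst; the second copy finds identical content
echo example > src
install -C src dst
ls -lT dst
sleep 2
install -C src dst
ls -lT dst   # mtime unchanged because the content matched

=== Mark Millard marklmi at yahoo.com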
Re: [analysis] 13.2 BETA2: how do debug META_MODE?
> 230217231308.admn.pass1.jail.sst:real 9346.90 jail w/ compiler
> 230217231308.admn.pass2.jail.sst:real 5460.16
> 230217231308.data.pass1.jail.sst:real 4094.04 jail w/o compiler
> 230217231308.data.pass2.jail.sst:real 143.39
> 230217231308.iamk.pass1.jail.sst:real 8050.27 jail w/ compiler
> 230217231308.iamk.pass2.jail.sst:real 5226.32
> 230217231308.oper.pass1.jail.sst:real 2910.28 jail w/o compiler
> 230217231308.oper.pass2.jail.sst:real 92.05
> 230217231308.rail.pass1.jail.sst:real 3236.29 jail w/o compiler
> 230217231308.rail.pass2.jail.sst:real 99.49
> 230217231308.tele.pass1.jail.sst:real 3170.34 jail w/o compiler
> 230217231308.tele.pass2.jail.sst:real 180.65
>
> pass3
> (10 vcore)
> 230222000242.base.std.sst: real 1162.80 base w/ kernels
> 230222000242.admn.std.jail.sst:real 1759.15 jail w/ compiler
> 230222000242.data.std.jail.sst:real 155.54 jail w/o compiler
> 230222000242.iamk.std.jail.sst:real 1715.07 jail w/ compiler
> 230222000242.oper.std.jail.sst:real 149.51 jail w/o compiler
> 230222000242.rail.std.jail.sst:real 151.73 jail w/o compiler
> 230222000242.tele.std.jail.sst:real 150.52 jail w/o compiler
>
> pass4
> (10 vcore)
> 230222021535.edge.std.sst: real 1018.79 base w/ kernels
> 230222021535.admn.std.jail.sst:real 101.61 jail w/ compiler
> 230222021535.data.std.jail.sst:real 67.47 jail w/o compiler
> 230222021535.iamk.std.jail.sst:real 100.91 jail w/ compiler
> 230222021535.oper.std.jail.sst:real 66.52 jail w/o compiler
> 230222021535.rail.std.jail.sst:real 68.00 jail w/o compiler
> 230222021535.tele.std.jail.sst:real 66.54 jail w/o compiler

=== Mark Millard marklmi at yahoo.com
Re: 13.2 BETA2: how do debug META_MODE?
On Feb 21, 2023, at 19:11, Peter wrote: > On Tue, Feb 21, 2023 at 06:44:09PM -0800, Mark Millard wrote: > ! On Feb 21, 2023, at 18:10, Peter wrote: > ! > ! > On Tue, Feb 21, 2023 at 11:56:13AM -0800, Mark Millard wrote: > ! > ! On Feb 21, 2023, at 04:55, Peter wrote: > ! > ! > ! > ! > ! # cd /usr/src/ > ! > ! > ! # env WITH_META_MODE=yes make buildworld > ! > ! > ! # env WITH_META_MODE=yes make installworld > ! > ! > ! # env WITH_META_MODE=yes make buildworld (again #0) > ! > ! > ! ## no more rebuilds below? > ! > ! > ! # env WITH_META_MODE=yes make buildworld (again #1) > ! > ! > ! # env WITH_META_MODE=yes make buildworld (again #2) > ! > ! > > ! > ! > But what is the difference between #0 and #1? > ! > ! > ! > ! awk, cp, ln, rm, sed, and many more from > ! > ! . . ./tmp/legacy/usr/sbin/have new dates > ! > ! for rebuilds after installworld (that targets > ! > ! the running system). Not true for #1 and #2. > ! > ! > ! > ! The dates on these tools being more recent than > ! > ! the files that they were involved in producing > ! > ! leads to rebuilding those files. That in turn > ! > ! leads to other files being rebuilt. > ! > ! > ! > ! make with -dM reports the likes of: > ! > ! > ! > !file '. . ./tmp/legacy/usr/sbin/awk' is newer than the target... > ! > ! > ! > ! explicitly as it goes. As I remember tmp/legacy/usr/sbin/ > ! > ! was always part of the path for what I found. > ! > > ! > Mark, thanks a lot for the proper input at the right time! > ! > > ! > This put me on the right track and I mananged to analyze and > ! > understand what is actually happening. > ! > > ! > It looks like my issue does resolve itself somehow, and things > ! > start to behave as expected again after four builds. > ! > ! Intersting. > ! > ! > ! I did not do the analysis of how (e.g.) tmp/legacy/usr/sbin/awk > ! > ! ended up being newer than such a target and, so, causing a > ! > ! rebuild of that target. I was going the direction: that > ! > ! it is newer really is unlikely to justify the rebuild for > ! > ! the target(s) in question. The other direction about how > ! > ! it got to be newer is also relevant. > ! > > ! > I have now analyzed some parts of it. META_MODE typically finds some > ! > build-tools to rebuild, but then if the result is not different > ! > from what was there before, then "install" will not copy it to the > ! > bin-dir, and so the avalanche gets usually avoided. > ! > > ! > ! The implication is that "install -C" is in use, quoting the > ! man page: > ! > ! -C Copy the file. If the target file already exists and the files > ! are the same, then do not change the modification time of the > ! target. If the target's file flags and mode need not to be > ! changed, the target's inode change time is also unchanged. > ! > ! -c Copy the file. This is actually the default. The -c option is > ! only included for backwards compatibility. > ! > ! -C might have more of an effect in a reproducible-build > ! style build process than on a non-reproducible-build > ! style one. > > Yepp. "install -p" is used, see /usr/src/tools/install.sh > The code for the _bootstap_tools_links uses "cp -pf", not install, to establish part of . . ./tmp/legacy/bin/ . (Note: . . ./tmp/legacy/sbin -> ../bin so is a via a symbolic link.) Before the "cp -pf" there is a "rm -f" deleting the target file before the copy: the prior file in . . ./tmp/legacy/bin/ is never directly preserved. (The new copy might still be identical to the old one: the source path one might happen to be identical as well.) 
# Link the tools that we need for building but don't need to bootstrap because # the host version is known to be compatible into ${WORLDTMP}/legacy # We do this before building any of the bootstrap tools in case they depend on # the presence of any of the links (e.g. as m4/lex/awk) ${_bt}-links: .PHONY .for _tool in ${_bootstrap_tools_links} ${_bt}-link-${_tool}: .PHONY @rm -f "${WORLDTMP}/legacy/bin/${_tool}"; \ source_path=`which ${_tool}`; \ if [ ! -e "$${source_path}" ] ; then \ echo "Cannot find host tool '${_tool}'"; false; \ fi; \ cp -pf "$${source_path}" "${WORLDTMP}/legacy/bin/${_tool}" ${_bt}-links: ${_bt}-link-${_tool} .endfor Note: This is for the !defined(BOOTSTRAP_ALL_TOOLS) case. Note: the code uses the abbreviation: _bt=_bootst
Re: 13.2 BETA2: how do debug META_MODE?
On Feb 21, 2023, at 20:51, Mark Millard wrote: > On Feb 21, 2023, at 19:11, Peter wrote: > >> On Tue, Feb 21, 2023 at 06:44:09PM -0800, Mark Millard wrote: >> ! On Feb 21, 2023, at 18:10, Peter wrote: >> ! >> ! > On Tue, Feb 21, 2023 at 11:56:13AM -0800, Mark Millard wrote: >> ! > ! On Feb 21, 2023, at 04:55, Peter wrote: >> ! > ! >> ! > ! > ! # cd /usr/src/ >> ! > ! > ! # env WITH_META_MODE=yes make buildworld >> ! > ! > ! # env WITH_META_MODE=yes make installworld >> ! > ! > ! # env WITH_META_MODE=yes make buildworld (again #0) >> ! > ! > ! ## no more rebuilds below? >> ! > ! > ! # env WITH_META_MODE=yes make buildworld (again #1) >> ! > ! > ! # env WITH_META_MODE=yes make buildworld (again #2) >> ! > ! > >> ! > ! > But what is the difference between #0 and #1? >> ! > ! >> ! > ! awk, cp, ln, rm, sed, and many more from >> ! > ! . . ./tmp/legacy/usr/sbin/have new dates >> ! > ! for rebuilds after installworld (that targets >> ! > ! the running system). Not true for #1 and #2. >> ! > ! >> ! > ! The dates on these tools being more recent than >> ! > ! the files that they were involved in producing >> ! > ! leads to rebuilding those files. That in turn >> ! > ! leads to other files being rebuilt. >> ! > ! >> ! > ! make with -dM reports the likes of: >> ! > ! >> ! > !file '. . ./tmp/legacy/usr/sbin/awk' is newer than the target... >> ! > ! >> ! > ! explicitly as it goes. As I remember tmp/legacy/usr/sbin/ >> ! > ! was always part of the path for what I found. >> ! > >> ! > Mark, thanks a lot for the proper input at the right time! >> ! > >> ! > This put me on the right track and I mananged to analyze and >> ! > understand what is actually happening. >> ! > >> ! > It looks like my issue does resolve itself somehow, and things >> ! > start to behave as expected again after four builds. >> ! >> ! Intersting. >> ! >> ! > ! I did not do the analysis of how (e.g.) tmp/legacy/usr/sbin/awk >> ! > ! ended up being newer than such a target and, so, causing a >> ! > ! rebuild of that target. I was going the direction: that >> ! > ! it is newer really is unlikely to justify the rebuild for >> ! > ! the target(s) in question. The other direction about how >> ! > ! it got to be newer is also relevant. >> ! > >> ! > I have now analyzed some parts of it. META_MODE typically finds some >> ! > build-tools to rebuild, but then if the result is not different >> ! > from what was there before, then "install" will not copy it to the >> ! > bin-dir, and so the avalanche gets usually avoided. >> ! > >> ! >> ! The implication is that "install -C" is in use, quoting the >> ! man page: >> ! >> ! -C Copy the file. If the target file already exists and the >> files >> ! are the same, then do not change the modification time of the >> ! target. If the target's file flags and mode need not to be >> ! changed, the target's inode change time is also unchanged. >> ! >> ! -c Copy the file. This is actually the default. The -c option >> is >> ! only included for backwards compatibility. >> ! >> ! -C might have more of an effect in a reproducible-build >> ! style build process than on a non-reproducible-build >> ! style one. >> >> Yepp. "install -p" is used, see /usr/src/tools/install.sh That may be incorrect about what is happening for _bootstap_tools_links and other things. Why do I say that? Several points . . . I do not see "tools" in any PATH= so far, making implicit use unlikely. /usr/main-src/share/mk/sys.mk:INSTALL ?= ${INSTALL_CMD:Uinstall} /usr/main-src/share/mk/src.tools.mk:INSTALL_CMD?= install vs. 
/usr/main-src/Makefile: INSTALL="sh ${.CURDIR}/tools/install.sh" /usr/main-src/Makefile.inc1:BMAKEENV= INSTALL="sh ${.CURDIR}/tools/install.sh" \ /usr/main-src/Makefile.inc1:KTMAKEENV= INSTALL="sh ${.CURDIR}/tools/install.sh" \ Also: # kernel-tools stage KTMAKEENV= INSTALL="sh ${.CURDIR}/tools/install.sh" \ vs. # world stage WMAKEENV= ${CROSSENV} \ INSTALL="${INSTALL_CMD} -U" \ and: .if defined(DB_FROM_SRC) || defined(NO_ROOT) IMAKE_INSTALL= INSTALL="${INSTALL_CMD} ${INSTALLFLAGS}" So: explicitly varying styles for various
Re: 13.2 BETA2: how do debug META_MODE?
On Feb 21, 2023, at 21:53, Mark Millard wrote: > On Feb 21, 2023, at 20:51, Mark Millard wrote: > >> On Feb 21, 2023, at 19:11, Peter wrote: >> >>> On Tue, Feb 21, 2023 at 06:44:09PM -0800, Mark Millard wrote: >>> ! On Feb 21, 2023, at 18:10, Peter wrote: >>> ! >>> ! > On Tue, Feb 21, 2023 at 11:56:13AM -0800, Mark Millard wrote: >>> ! > ! On Feb 21, 2023, at 04:55, Peter wrote: >>> ! > ! >>> ! > ! > ! # cd /usr/src/ >>> ! > ! > ! # env WITH_META_MODE=yes make buildworld >>> ! > ! > ! # env WITH_META_MODE=yes make installworld >>> ! > ! > ! # env WITH_META_MODE=yes make buildworld (again #0) >>> ! > ! > ! ## no more rebuilds below? >>> ! > ! > ! # env WITH_META_MODE=yes make buildworld (again #1) >>> ! > ! > ! # env WITH_META_MODE=yes make buildworld (again #2) >>> ! > ! > >>> ! > ! > But what is the difference between #0 and #1? >>> ! > ! >>> ! > ! awk, cp, ln, rm, sed, and many more from >>> ! > ! . . ./tmp/legacy/usr/sbin/have new dates >>> ! > ! for rebuilds after installworld (that targets >>> ! > ! the running system). Not true for #1 and #2. >>> ! > ! >>> ! > ! The dates on these tools being more recent than >>> ! > ! the files that they were involved in producing >>> ! > ! leads to rebuilding those files. That in turn >>> ! > ! leads to other files being rebuilt. >>> ! > ! >>> ! > ! make with -dM reports the likes of: >>> ! > ! >>> ! > !file '. . ./tmp/legacy/usr/sbin/awk' is newer than the target... >>> ! > ! >>> ! > ! explicitly as it goes. As I remember tmp/legacy/usr/sbin/ >>> ! > ! was always part of the path for what I found. >>> ! > >>> ! > Mark, thanks a lot for the proper input at the right time! >>> ! > >>> ! > This put me on the right track and I mananged to analyze and >>> ! > understand what is actually happening. >>> ! > >>> ! > It looks like my issue does resolve itself somehow, and things >>> ! > start to behave as expected again after four builds. >>> ! >>> ! Intersting. >>> ! >>> ! > ! I did not do the analysis of how (e.g.) tmp/legacy/usr/sbin/awk >>> ! > ! ended up being newer than such a target and, so, causing a >>> ! > ! rebuild of that target. I was going the direction: that >>> ! > ! it is newer really is unlikely to justify the rebuild for >>> ! > ! the target(s) in question. The other direction about how >>> ! > ! it got to be newer is also relevant. >>> ! > >>> ! > I have now analyzed some parts of it. META_MODE typically finds some >>> ! > build-tools to rebuild, but then if the result is not different >>> ! > from what was there before, then "install" will not copy it to the >>> ! > bin-dir, and so the avalanche gets usually avoided. >>> ! > >>> ! >>> ! The implication is that "install -C" is in use, quoting the >>> ! man page: >>> ! >>> ! -C Copy the file. If the target file already exists and the >>> files >>> ! are the same, then do not change the modification time of the >>> ! target. If the target's file flags and mode need not to be >>> ! changed, the target's inode change time is also unchanged. >>> ! >>> ! -c Copy the file. This is actually the default. The -c option >>> is >>> ! only included for backwards compatibility. >>> ! >>> ! -C might have more of an effect in a reproducible-build >>> ! style build process than on a non-reproducible-build >>> ! style one. >>> >>> Yepp. "install -p" is used, see /usr/src/tools/install.sh > > That may be incorrect about what is happening for > _bootstap_tools_links and other things. Why do I > say that? Several points . . . 
I missed looking for an obvious type of evidence:

-rw-r--r-- 1 root wheel 2355 Apr 28 15:20:53 2021 /usr/main-src/tools/install.sh

The script is not executable, which explains the use of an explicit sh in:

INSTALL="sh ${.CURDIR}/tools/install.sh"

This tends to nail down that the likes of:

install -o root -g wheel -m 555 . . .

in the output is not an example of using the script.

> I do not see "tools" in any PATH= so far, making implicit
> use unlike
Re: 13.2 BETA2: how do debug META_MODE?
d64.amd64/tmp/usr/bin/nm' is newer than the target... 1 file '/usr/obj/BUILDs/main-amd64-nodbg-clang/usr/main-src/amd64.amd64/tmp/legacy/usr/sbin/touch' is newer than the target... 1 file '/usr/obj/BUILDs/main-amd64-nodbg-clang/usr/main-src/amd64.amd64/tmp/legacy/usr/sbin/jot' is newer than the target... 1 file '/usr/obj/BUILDs/main-amd64-nodbg-clang/usr/main-src/amd64.amd64/tmp/legacy/usr/sbin/egrep' is newer than the target... 1 file '/usr/obj/BUILDs/main-amd64-nodbg-clang/usr/main-src/amd64.amd64/tmp/legacy/usr/sbin/crunchgen' is newer than the target... 1 file '/usr/obj/BUILDs/main-amd64-nodbg-clang/usr/main-src/amd64.amd64/tmp/legacy/usr/sbin/cap_mkdb' is newer than the target... 1 file '/usr/obj/BUILDs/main-amd64-nodbg-clang/usr/main-src/amd64.amd64/tmp/legacy/usr/sbin/basename' is newer than the target... As for lines unique to the 2nd run (diff output line starts with "+"): # diff -u /usr/obj/BUILDs/main-amd64-nodbg-clang/sys-typescripts/typescript-make-amd64-nodbg-clang-amd64-host-2023-02-22:12:* | grep "^[+].*is newer than the target" | sed -e "s@^.*: file '@file '@" | sort | uniq -c | sort -rn | more 2155 file '/usr/obj/BUILDs/main-amd64-nodbg-clang/usr/main-src/amd64.amd64/tmp/legacy/usr/sbin/realpath' is newer than the target... 1466 file '/usr/obj/BUILDs/main-amd64-nodbg-clang/usr/main-src/amd64.amd64/tmp/usr/lib/Scrt1.o' is newer than the target... 878 file '/usr/obj/BUILDs/main-amd64-nodbg-clang/usr/main-src/amd64.amd64/tmp/legacy/usr/sbin/ln' is newer than the target... 444 file '/usr/obj/BUILDs/main-amd64-nodbg-clang/usr/main-src/amd64.amd64/tmp/usr/lib/crti.o' is newer than the target... 235 file '/usr/obj/BUILDs/main-amd64-nodbg-clang/usr/main-src/amd64.amd64/tmp/usr/lib32/crti.o' is newer than the target... 67 file '/usr/obj/BUILDs/main-amd64-nodbg-clang/usr/main-src/amd64.amd64/tmp/usr/lib/crt1.o' is newer than the target... 41 file '/usr/obj/BUILDs/main-amd64-nodbg-clang/usr/main-src/amd64.amd64/tmp/usr/lib32/libgcc_s.so' is newer than the target... 40 file '/usr/obj/BUILDs/main-amd64-nodbg-clang/usr/main-src/amd64.amd64/tmp/usr/lib/libgcc_s.so' is newer than the target... 3 file '/usr/obj/BUILDs/main-amd64-nodbg-clang/usr/main-src/amd64.amd64/tmp/legacy/usr/sbin/sh' is newer than the target... 2 file '/usr/obj/BUILDs/main-amd64-nodbg-clang/usr/main-src/amd64.amd64/lib/ncurses/tinfo/./ncurses_dll.h' is newer than the target... 1 file '/usr/obj/BUILDs/main-amd64-nodbg-clang/usr/main-src/amd64.amd64/tmp/usr/lib32/libcxxrt.so' is newer than the target... 1 file '/usr/obj/BUILDs/main-amd64-nodbg-clang/usr/main-src/amd64.amd64/tmp/usr/lib32/libcrypto.so' is newer than the target... 1 file '/usr/obj/BUILDs/main-amd64-nodbg-clang/usr/main-src/amd64.amd64/tmp/usr/lib32/libc.so.7' is newer than the target... 1 file '/usr/obj/BUILDs/main-amd64-nodbg-clang/usr/main-src/amd64.amd64/tmp/usr/lib32/Scrt1.o' is newer than the target... 1 file '/usr/obj/BUILDs/main-amd64-nodbg-clang/usr/main-src/amd64.amd64/tmp/usr/lib/libssl.so' is newer than the target... 1 file '/usr/obj/BUILDs/main-amd64-nodbg-clang/usr/main-src/amd64.amd64/tmp/usr/lib/libcxxrt.so' is newer than the target... 1 file '/usr/obj/BUILDs/main-amd64-nodbg-clang/usr/main-src/amd64.amd64/tmp/usr/lib/libctf.so' is newer than the target... 1 file '/usr/obj/BUILDs/main-amd64-nodbg-clang/usr/main-src/amd64.amd64/tmp/usr/lib/libcrypto.so' is newer than the target... 1 file '/usr/obj/BUILDs/main-amd64-nodbg-clang/usr/main-src/amd64.amd64/tmp/lib/libc.so.7' is newer than the target... 
(The list is already short, so no subset listed.) === Mark Millard marklmi at yahoo.com
Re: 13.2 BETA2: how do debug META_MODE?
On Feb 22, 2023, at 14:07, Mark Millard wrote: > This is just an FYI about an experiment. The experiment had a flaw: I did not do the builds as -j1 (or no -j at all) but the parallel activities changes the sequencing of the lines from run to run. Thus the "diff -u" part finds more differences than I was intending. I may try to make a better experiment at some point. > After having done installworld installkernel for other reasons, > I did two -dM buildworld buildkernel sequences in a row (no > source changes, no cleaning activity), producing script files > logging the output. The below provides some comparison/contrast > between the two log files. > > Below I report first on the frequencies of the file paths > reported in the "is newer than the target" lines that were > unique to the first run (diff output line starts with "-"). > > I only show a prefix the full list: > > # diff -u > /usr/obj/BUILDs/main-amd64-nodbg-clang/sys-typescripts/typescript-make-amd64-nodbg-clang-amd64-host-2023-02-22:12:* > | grep "^-.*is newer than the target" | sed -e "s@^.*: file '@file '@" | > sort | uniq -c | sort -rn | more > 4432 file > '/usr/obj/BUILDs/main-amd64-nodbg-clang/usr/main-src/amd64.amd64/tmp/legacy/usr/sbin/gzip' > is newer than the target... > 2692 file > '/usr/obj/BUILDs/main-amd64-nodbg-clang/usr/main-src/amd64.amd64/tmp/legacy/usr/sbin/awk' > is newer than the target... > 2155 file > '/usr/obj/BUILDs/main-amd64-nodbg-clang/usr/main-src/amd64.amd64/tmp/legacy/usr/sbin/realpath' > is newer than the target... > 1395 file > '/usr/obj/BUILDs/main-amd64-nodbg-clang/usr/main-src/amd64.amd64/tmp/usr/lib/Scrt1.o' > is newer than the target... > 1381 file > '/usr/obj/BUILDs/main-amd64-nodbg-clang/usr/main-src/amd64.amd64/secure/lib/libcrypto/openssl/opensslconf.h' > is newer than the target... > 1318 file > '/usr/obj/BUILDs/main-amd64-nodbg-clang/usr/main-src/amd64.amd64/obj-lib32/secure/lib/libcrypto/openssl/opensslconf.h' > is newer than the target... > 1000 file > '/usr/obj/BUILDs/main-amd64-nodbg-clang/usr/main-src/amd64.amd64/tmp/legacy/usr/sbin/cat' > is newer than the target... > 962 file > '/usr/obj/BUILDs/main-amd64-nodbg-clang/usr/main-src/amd64.amd64/tmp/legacy/usr/sbin/rm' > is newer than the target... > 928 file > '/usr/obj/BUILDs/main-amd64-nodbg-clang/usr/main-src/amd64.amd64/tmp/legacy/usr/sbin/sh' > is newer than the target... > 878 file > '/usr/obj/BUILDs/main-amd64-nodbg-clang/usr/main-src/amd64.amd64/tmp/legacy/usr/sbin/ln' > is newer than the target... > 624 file > '/usr/obj/BUILDs/main-amd64-nodbg-clang/usr/main-src/amd64.amd64/lib/clang/libllvm/llvm/IR/Attributes.inc' > is newer than the target... > 553 file > '/usr/obj/BUILDs/main-amd64-nodbg-clang/usr/main-src/amd64.amd64/tmp/legacy/usr/sbin/sed' > is newer than the target... > 437 file > '/usr/obj/BUILDs/main-amd64-nodbg-clang/usr/main-src/amd64.amd64/tmp/legacy/usr/sbin/mv' > is newer than the target... > 417 file > '/usr/obj/BUILDs/main-amd64-nodbg-clang/usr/main-src/amd64.amd64/tmp/legacy/usr/sbin/mkcsmapper' > is newer than the target... > 398 file > '/usr/obj/BUILDs/main-amd64-nodbg-clang/usr/main-src/amd64.amd64/lib/clang/libclang/clang/Basic/DiagnosticCommonKinds.inc' > is newer than the target... > 351 file > '/usr/obj/BUILDs/main-amd64-nodbg-clang/usr/main-src/amd64.amd64/tmp/legacy/usr/sbin/grep' > is newer than the target... > 281 file > '/usr/obj/BUILDs/main-amd64-nodbg-clang/usr/main-src/amd64.amd64/lib/clang/libllvm/llvm/IR/IntrinsicEnums.inc' > is newer than the target... 
> 177 file > '/usr/obj/BUILDs/main-amd64-nodbg-clang/usr/main-src/amd64.amd64/tmp/usr/lib/crti.o' > is newer than the target... > 161 file > '/usr/obj/BUILDs/main-amd64-nodbg-clang/usr/main-src/amd64.amd64/lib/clang/libclang/clang/AST/DeclNodes.inc' > is newer than the target... > 115 file > '/usr/obj/BUILDs/main-amd64-nodbg-clang/usr/main-src/amd64.amd64/tmp/legacy/usr/sbin/llvm-tblgen' > is newer than the target... > 98 file > '/usr/obj/BUILDs/main-amd64-nodbg-clang/usr/main-src/amd64.amd64/tmp/legacy/usr/sbin/crunchide' > is newer than the target... > 86 file > '/usr/obj/BUILDs/main-amd64-nodbg-clang/usr/main-src/amd64.amd64/lib/clang/libclang/clang/StaticAnalyzer/Checkers/Checkers.inc' > is newer than the target... > 75 file > '/usr/obj/BUILDs/main-amd64-nodbg-clang/usr/main-src/amd64.amd64/tmp/legacy/usr/sbin/uudecode' > is newer than the target... >
Re: 13.2 BETA2: how do debug META_MODE?
r/obj/BUILDs/main-amd64-nodbg-clang/usr/main-src/amd64.amd64/tmp/usr/lib32/libcrypto.so' is newer than the target... 1 file '/usr/obj/BUILDs/main-amd64-nodbg-clang/usr/main-src/amd64.amd64/tmp/usr/lib32/libc.so.7' is newer than the target... 1 file '/usr/obj/BUILDs/main-amd64-nodbg-clang/usr/main-src/amd64.amd64/tmp/usr/lib32/Scrt1.o' is newer than the target... 1 file '/usr/obj/BUILDs/main-amd64-nodbg-clang/usr/main-src/amd64.amd64/tmp/usr/lib/libssl.so' is newer than the target... 1 file '/usr/obj/BUILDs/main-amd64-nodbg-clang/usr/main-src/amd64.amd64/tmp/usr/lib/libcxxrt.so' is newer than the target... 1 file '/usr/obj/BUILDs/main-amd64-nodbg-clang/usr/main-src/amd64.amd64/tmp/usr/lib/libctf.so' is newer than the target... 1 file '/usr/obj/BUILDs/main-amd64-nodbg-clang/usr/main-src/amd64.amd64/tmp/usr/lib/libcrypto.so' is newer than the target... 1 file '/usr/obj/BUILDs/main-amd64-nodbg-clang/usr/main-src/amd64.amd64/tmp/lib/libc.so.7' is newer than the target... So the two end up very similar for what activity happens. I've only tried this on the amd64 context that I have access to. I'll set up the aarch64 context as well and see how it goes over time. (That context builds for aarch64 and for armv7 .) Similarly, I've only tried main but will be adding the changes to my releng/13.0 , releng/13.1 , and stable/13 contexts and seeing how it goes. (Not that I'm likely to rebuild releng/13.0 at this point.) For now, I've no plans for investigations related to any of the *.o , *.h , *.so* "is newer than" activity listed above. For reference: # git -C /usr/main-src/ diff share/mk/src.sys.obj.mk diff --git a/share/mk/src.sys.obj.mk b/share/mk/src.sys.obj.mk index 3b48fc3c5514..3c7e570dbdbd 100644 --- a/share/mk/src.sys.obj.mk +++ b/share/mk/src.sys.obj.mk @@ -67,6 +67,9 @@ SB_OBJROOT?= ${SB}/obj/ OBJROOT?= ${SB_OBJROOT} .endif OBJROOT?= ${_default_makeobjdirprefix}${SRCTOP}/ +# save the value before we mess with it +_OBJROOT:= ${OBJROOT:tA} +.export _OBJROOT .if ${OBJROOT:M*/} != "" OBJROOT:= ${OBJROOT:H:tA}/ .else (The change is not specific to main .) The content for the special make.conf has the following block of lines for having META MODE avoid specific . . ./tmp/legacy/usr/sbin/* programs (and 3 tmp/usr/bin/* ones) from causing rebuild activity based on the dates on the programs: # _OBJROOT is an addition to share/mk/src.sys.obj.mk # provided by Simon J. 
Gerraty for my experimentation
# with this avoidance of some unnecessary build
# activity in META MODE:
#
# OBJROOT?= ${_default_makeobjdirprefix}${SRCTOP}/
# +# save the value before we mess with it
# +_OBJROOT:= ${OBJROOT:tA}
# +.export _OBJROOT
#
# TARGET.TARGET_ARCH for amd64 stays as amd64.amd64 for obj-lib32 (correct for the purpose)
# MACHINE.MACHINE_ARCH for amd64 turns into i386.i386 for obj-lib32 (wrong for the purpose)
#
IGNORELEGACY_NOSYMLINKPREFIX= ${_OBJROOT}/${TARGET}.${TARGET_ARCH}/tmp/legacy/usr
IGNOREOTHER_NOSYMLINKPREFIX= ${_OBJROOT}/${TARGET}.${TARGET_ARCH}/tmp/usr/bin
#
.for ignore_legacy_tool in awk basename cap_mkdb cat chmod cmp cp crunchgen crunchide cut date dd dirname echo egrep env expr fgrep file2c find gencat grep gzip head hostname jot lex lb ln ls m4 make mkcsmapper mkdir mktemp mtree mv nawk patch realpath rm sed sh sort touch tr truncate uudecode uuencode wc xargs
.MAKE.META.IGNORE_PATHS+= ${IGNORELEGACY_NOSYMLINKPREFIX}/sbin/${ignore_legacy_tool}
.endfor
#
.for ignore_other_tool in ctfconvert objcopy nm
.MAKE.META.IGNORE_PATHS+= ${IGNOREOTHER_NOSYMLINKPREFIX}/${ignore_other_tool}
.endfor
#
.MAKE.META.IGNORE_PATHS:= ${.MAKE.META.IGNORE_PATHS}

The . . ./tmp/usr/bin/* ones ( ctfconvert objcopy nm ) may be more questionable than the . . ./tmp/legacy/usr/sbin/* ones. This likely will not prevent the likes of a system with clang14 -> system with clang15 transition having clang15 rebuild itself once the clang15 system is running and another buildworld is started.
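To confirm the ignore list is taking effect, the same "is newer than the target" summarizing trick from earlier in the thread works (a sketch; the log file name is a placeholder for whatever typescript the -dM build produced):

# after a build with the block above in place, the ignored tools should stop
# showing up among the "is newer than the target" reports
grep "is newer than the target" typescript-make.log | grep "tmp/legacy/usr/sbin" | sort | uniq -c | sort -rn

=== Mark Millard marklmi at yahoo.com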
Re: git: a28ccb32bf56 - main - machine-id: generate a compact version of the uuid
Mike Karels wrote on Date: Fri, 03 Mar 2023 16:12:50 UTC : > On 3 Mar 2023, at 9:40, Tijl Coosemans wrote: > > > On Wed, 1 Mar 2023 18:18:33 GMT Baptiste Daroussin wrote: > >> The branch main has been updated by bapt: > >> > >> URL: > >> https://cgit.FreeBSD.org/src/commit/?id=a28ccb32bf5678fc401f1602865ee9b37ca4c990 > >> > >> commit a28ccb32bf5678fc401f1602865ee9b37ca4c990 > >> Author: Baptiste Daroussin > >> AuthorDate: 2023-02-28 10:31:06 + > >> Commit: Baptiste Daroussin > >> CommitDate: 2023-03-01 18:16:25 + > >> > >> machine-id: generate a compact version of the uuid > >> > >> dbus and other actually expect an uuid without hyphens > >> > >> Reported by: tijl > >> MFC After: 3 days > >> --- > >> libexec/rc/rc.d/machine_id | 2 +- > >> 1 file changed, 1 insertion(+), 1 deletion(-) > >> > >> diff --git a/libexec/rc/rc.d/machine_id b/libexec/rc/rc.d/machine_id > >> index 7cfd7b2d92f8..8bf3e41d0603 100644 > >> --- a/libexec/rc/rc.d/machine_id > >> +++ b/libexec/rc/rc.d/machine_id > >> @@ -23,7 +23,7 @@ machine_id_start() > >> if [ ! -f ${machine_id_file} ] ; then > >> startmsg -n "Creating ${machine_id_file} " > >> t=$(mktemp -t machine-id) > >> - /bin/uuidgen -r -o $t > >> + /bin/uuidgen -r -c -o $t > >> install -C -o root -g wheel -m ${machine_id_perms} "$t" > >> "${machine_id_file}" > >> rm -f "$t" > >> startmsg 'done.' > > > > I really think this file should be defined to contain the same UUID as > > /etc/hostid such that there's one and only one UUID per machine. Having > > two different IDs needlessly complicates things if they end up in logs > > etc. > > > > It also looks like on Linux virtual machines this file contains the > > SMBIOS UUID just like our /etc/hostid. If /etc/machine-id is supposed > > to be a portable way to obtain that UUID it should be the same as > > /etc/hostid. > > I agree. I had the same reaction when the machine-id was added, but > thought the requirements were different (in particular, the UUID version). > If at all possible, the two should be the same except for hyphens. > > > Please have another look at https://reviews.freebsd.org/D38811. This > > file is supposed to remain constant across updates. If we get this > > wrong in 13.2, applications may have to deal with the complications for > > a very long time. > > This should be resolved for 13.2 if at all possible. What are the properties for the content of /etc/hostid in FreeBSD? Where are they documented? /etc/machine-id has strong property guarnatee requirements in linux and dbus (which linux indicates it has adopted requirements from): https://man7.org/linux/man-pages/man5/machine-id.5.html reports: QUOTE The machine ID does not change based on local or network configuration or when hardware is replaced. Due to this and its greater length, it is a more useful replacement for the gethostid(3) call that POSIX specifies. This machine ID adheres to the same format and logic as the D-Bus machine ID. END QUOTE https://dbus.freedesktop.org/doc/dbus-uuidgen.1.html reports: ( used via dbus-uuidgen --ensure=/etc/machine-id as one way to get a linux-comaptibile /etc/machine-id for at least some types of contexts ) QUOTE The important properties of the machine UUID are that 1) it remains unchanged until the next reboot and 2) it is different for any two running instances of the OS kernel. 
That is, if two processes see the same UUID, they should also see the same shared memory, UNIX domain sockets, local X displays, localhost.localdomain resolution, process IDs, and so forth
END QUOTE

Does /etc/hostid, generated the normal way in FreeBSD, have such properties? (How do I look that up?)

Returning to: https://man7.org/linux/man-pages/man5/machine-id.5.html

QUOTE
This ID uniquely identifies the host. It should be considered "confidential", and must not be exposed in untrusted environments, in particular on the network. If a stable unique identifier that is tied to the machine is needed for some application, the machine ID or any part of it must not be used directly. Instead the machine ID should be hashed with a cryptographic, keyed hash function, using a fixed, application-specific key. That way the ID will be properly unique, and derived in a constant way from the machine ID but there will be no way to retrieve the original machine ID from the application-specific one.
END QUOTE

Is that at least recommended for handling FreeBSD's /etc/hostid content?

Is FreeBSD going to document /etc/machine-id content properties in a similar manner?

If FreeBSD ends up with a /etc/machine-id that does not have the properties and recommended principles of use, it would appear that the /etc/machine-id path would be highly misleading and, so, inappropriate.
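As an aside, the hashing recommendation quoted above amounts to a derivation like the following (a sketch; the key string is a placeholder for an application-specific key):

# derive an application-specific ID instead of exposing the machine ID itself
openssl dgst -sha256 -hmac "example-app-key" /etc/machine-id

=== Mark Millard marklmi at yahoo.com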
Re: git: a28ccb32bf56 - main - machine-id: generate a compact version of the uuid
On Mar 4, 2023, at 06:32, Tijl Coosemans wrote: > > On Fri, 3 Mar 2023 10:36:20 -0800 Mark Millard wrote: >> What are the properties for the content of /etc/hostid >> in FreeBSD? Where are they documented? >> >> /etc/machine-id has strong property guarnatee >> requirements in linux and dbus (which linux indicates >> it has adopted requirements from): >> >> https://man7.org/linux/man-pages/man5/machine-id.5.html >> >> reports: >> >> QUOTE >> The machine ID does not change based on local or network >> configuration or when hardware is replaced. Due to this and its >> greater length, it is a more useful replacement for the >> gethostid(3) call that POSIX specifies. >> >> This machine ID adheres to the same format and logic as the D-Bus >> machine ID. >> END QUOTE > > /etc/hostid is written once. It does not change with network or > hardware changes. > >> https://dbus.freedesktop.org/doc/dbus-uuidgen.1.html reports: >> ( used via dbus-uuidgen --ensure=/etc/machine-id as one way >> to get a linux-comaptibile /etc/machine-id for at least >> some types of contexts ) >> >> QUOTE >> The important properties of the machine UUID are that 1) it remains >> unchanged until the next reboot and 2) it is different for any two >> running instances of the OS kernel. That is, if two processes see >> the same UUID, they should also see the same shared memory, UNIX >> domain sockets, local X displays, localhost.localdomain resolution, >> process IDs, and so forth >> END QUOTE >> >> >> Does /etc/hostid generated the normal way in FreeBSD have such >> properties? (How do I look that up?) > > Yes. It's `kenv smbios.system.uuid` if that's available and generated > by uuidgen otherwise. The code is in /etc/rc.d/hostid and > /etc/rc.d/hostid_save. I probably also should have quoted the below for completeness: QUOTE Also, don't make it the same on two different systems; it needs to be different anytime there are two different kernels running. END QUOTE There are implications for some virtual environments. >> Returning to: >> >> https://man7.org/linux/man-pages/man5/machine-id.5.html >> >> QUOTE >> This ID uniquely identifies the host. It should be considered >> "confidential", and must not be exposed in untrusted >> environments, in particular on the network. If a stable unique >> identifier that is tied to the machine is needed for some >> application, the machine ID or any part of it must not be used >> directly. Instead the machine ID should be hashed with a >> cryptographic, keyed hash function, using a fixed, >> application-specific key. That way the ID will be properly >> unique, and derived in a constant way from the machine ID but >> there will be no way to retrieve the original machine ID from the >> application-specific one. >> END QUOTE >> >> Is that at least recommended for handling FreeBSD's /etc/hostid >> content? > > No, the file is not documented at all, but this is a recommendation on > how to use the file not a restriction on the content like the other > quotes so this isn't an impediment to using the same ID in > /etc/machine-id. That presumes that what FreeBSD does with /etc/hostid content keeps the content confidential by default, such as using hashing to avoid there being a way to "retrieve the original machine ID". (It may well, but that is not documented.) Otherwise following the recommendation would be an impossibility for /etc/hostid content. >> Is FreeBSD going to document /etc/machine-id content properties >> in a similar manor? 
>> >> >> If FreeBSD ends up with a /etc/machine-id that does not have >> the properties and recommended principles of use, it would >> appear that the /etc/machine-id path would be highly misleading >> and, so, inappropriate. Thanks for the notes. === Mark Millard marklmi at yahoo.com
SYSDECODE_ABI_FREEBSD32 for #include : armv7 for aarch64?
https://man.freebsd.org/cgi/man.cgi?query=sysdecode&apropos=0&sektion=3&manpath=FreeBSD+13.2-STABLE&arch=default&format=html reports:

SYSDECODE_ABI_FREEBSD32    32-bit FreeBSD binaries. Supported on amd64 and powerpc64.

But what of contexts with:

# sysctl kern.supported_archs
kern.supported_archs: aarch64 armv7
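For reference, one concrete consumer of these SYSDECODE_ABI_* values is kdump, which uses libsysdecode when decoding a traced process's syscalls. A sketch of the sort of use the question is about (the armv7 program name is a placeholder):

# trace an armv7 binary on the aarch64 host, then decode the trace
ktrace -f armv7-prog.ktrace ./armv7-prog
kdump -f armv7-prog.ktrace | head

=== Mark Millard marklmi at yahoo.com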
I just updated to main-n261544-cee09bda03c8 based (via source) and now /etc/machine-id and /var/db/machine-id disagree ; more
# cat /etc/hostid /etc/machine-id /var/db/machine-id
a4f7fbeb-f668-11de-b280-ebb65474e619
a4f7fbebf66811deb280ebb65474e619
7227cd89727a462186e3ba680d0ee142

(I'll not be keeping these values for the example system.)

# ls -Tld /etc/hostid /etc/machine-id /var/db/machine-id
-rw-r--r-- 1 root wheel 37 Dec 31 16:00:18 2009 /etc/hostid
-rw-r--r-- 1 root wheel 33 Mar 16 15:16:18 2023 /etc/machine-id
-r--r--r-- 1 root wheel 33 Mar 3 23:03:25 2023 /var/db/machine-id

I observed the delete-old-files deleting /etc/machine-id during the upgrade. It did nothing with /var/db/machine-id .

Also, modern hostid generation was switched to random to avoid an exposure. But the update kept the old hostid and propagated it (not "-"s) into /etc/machine-id . So /etc/machine-id now has the same exposure.

Later I'll see if stable/13 also got such behavior for its upgrade.

I've not been dealing with releng/13.2 but upgrades from releng/13.1 and before likely have the same questions for what the handling should be vs. what it might actually be. Different ways of upgrading might not be in agreement, for all I know.

=== Mark Millard marklmi at yahoo.com
Re: I just updated to main-n261544-cee09bda03c8 based (via source) and now /etc/machine-id and /var/db/machine-id disagree ; more
On Mar 16, 2023, at 15:55, Mark Millard wrote: > # cat /etc/hostid /etc/machine-id /var/db/machine-id > a4f7fbeb-f668-11de-b280-ebb65474e619 > a4f7fbebf66811deb280ebb65474e619 > 7227cd89727a462186e3ba680d0ee142 > > (I'll not be keeping these values for the example system.) > > # ls -Tld /etc/hostid /etc/machine-id /var/db/machine-id > -rw-r--r-- 1 root wheel 37 Dec 31 16:00:18 2009 /etc/hostid > -rw-r--r-- 1 root wheel 33 Mar 16 15:16:18 2023 /etc/machine-id > -r--r--r-- 1 root wheel 33 Mar 3 23:03:25 2023 /var/db/machine-id > > I observed the delete-old-files deleting > /etc/machine-id during the upgrade. It did > nothing with /var/db/machine-id . > > Also, modern hostid generation was switched to > random to avoid an exposure. But the update kept > the old hostid and propogated it (not "-"s) into > /etc/machine-id . So /etc/machine-id now has the > same exposure. > > Later I'll see if stable/13 also got such behavior > for its upgrade. > > I've not been dealing with releng/13.2 but upgrades > from releng/13.1 and before likely have the same > questions for what the handling should be vs. what it > might actually be. Different ways of upgrading might > not be in agreement, for all I know. > stable/13 was updated to be stable/13-n254805-4e4e299b0950 based. It got the same type of results. (I'll not list the actual id's for this context.) # ls -Tld /etc/hostid /etc/machine-id /var/db/machine-id -rw-r--r-- 1 root wheel 37 Jul 5 20:08:03 2022 /etc/hostid -rw-r--r-- 1 root wheel 33 Mar 16 13:32:49 2023 /etc/machine-id -r--r--r-- 1 root wheel 33 Mar 3 23:07:55 2023 /var/db/machine-id (I'm not sure of the intent on the permissions.) === Mark Millard marklmi at yahoo.com
Re: I just updated to main-n261544-cee09bda03c8 based (via source) and now /etc/machine-id and /var/db/machine-id disagree ; more
On Mar 16, 2023, at 16:48, Colin Percival wrote: > I think the current situation should be sorted out aside from potential issues > for people who upgraded to a "broken" version before updating to the latest > code -- CCing bapt and tijl just in case since they're more familiar with this > than I am. A question may be if past dbus port related activity might have established a /var/db/machine-id independent of the recent FreeBSD activity. That might not be able to be classified as a "broken version": Before upgrade: /etc/hostid (old style) /var/db/machine-id (via port) After binary or source upgrade to releng/13.2 . . . ? For other source(!) upgrades: Similarly but to a stable/13 (jumping over the middle)? Similarly but to a main [so: 14] (jumping over the middle)? To some extent the "broken" context is somewhat analogous other possible prior history sequences with /var/db/machine-id and /etc/hostid ( but not /etc/machine-id ). > Colin Percival > > On 3/16/23 15:55, Mark Millard wrote: >> # cat /etc/hostid /etc/machine-id /var/db/machine-id >> a4f7fbeb-f668-11de-b280-ebb65474e619 >> a4f7fbebf66811deb280ebb65474e619 >> 7227cd89727a462186e3ba680d0ee142 >> (I'll not be keeping these values for the example system.) >> # ls -Tld /etc/hostid /etc/machine-id /var/db/machine-id >> -rw-r--r-- 1 root wheel 37 Dec 31 16:00:18 2009 /etc/hostid >> -rw-r--r-- 1 root wheel 33 Mar 16 15:16:18 2023 /etc/machine-id >> -r--r--r-- 1 root wheel 33 Mar 3 23:03:25 2023 /var/db/machine-id >> I observed the delete-old-files deleting >> /etc/machine-id during the upgrade. It did >> nothing with /var/db/machine-id . >> Also, modern hostid generation was switched to >> random to avoid an exposure. But the update kept >> the old hostid and propogated it (not "-"s) into >> /etc/machine-id . So /etc/machine-id now has the >> same exposure. >> Later I'll see if stable/13 also got such behavior >> for its upgrade. >> I've not been dealing with releng/13.2 but upgrades >> from releng/13.1 and before likely have the same >> questions for what the handling should be vs. what it >> might actually be. Different ways of upgrading might >> not be in agreement, for all I know. === Mark Millard marklmi at yahoo.com
Re: I just updated to main-n261544-cee09bda03c8 based (via source) and now /etc/machine-id and /var/db/machine-id disagree ; more
On Mar 16, 2023, at 17:27, Mark Millard wrote: > On Mar 16, 2023, at 16:48, Colin Percival wrote: > >> I think the current situation should be sorted out aside from potential >> issues >> for people who upgraded to a "broken" version before updating to the latest >> code -- CCing bapt and tijl just in case since they're more familiar with >> this >> than I am. > > A question may be if past dbus port related activity might > have established a /var/db/machine-id independent of the > recent FreeBSD activity. That might not be able to be > classified as a "broken version": > > Before upgrade: > /etc/hostid (old style) > /var/db/machine-id (via port) Looks like var/db/machine-id is not a dbus default place: # find /var -name machine-id -print | more # dbus-uuidgen --ensure # find /var -name machine-id -print | more /var/lib/dbus/machine-id So the path in my analogy may not be the right one for overall question. > After binary or source upgrade to releng/13.2 . . . ? > > For other source(!) upgrades: > Similarly but to a stable/13 (jumping over the middle)? > Similarly but to a main [so: 14] (jumping over the middle)? > > To some extent the "broken" context is > somewhat analogous other possible prior > history sequences with /var/db/machine-id > and /etc/hostid ( but not /etc/machine-id ). > >> Colin Percival >> >> On 3/16/23 15:55, Mark Millard wrote: >>> # cat /etc/hostid /etc/machine-id /var/db/machine-id >>> a4f7fbeb-f668-11de-b280-ebb65474e619 >>> a4f7fbebf66811deb280ebb65474e619 >>> 7227cd89727a462186e3ba680d0ee142 >>> (I'll not be keeping these values for the example system.) >>> # ls -Tld /etc/hostid /etc/machine-id /var/db/machine-id >>> -rw-r--r-- 1 root wheel 37 Dec 31 16:00:18 2009 /etc/hostid >>> -rw-r--r-- 1 root wheel 33 Mar 16 15:16:18 2023 /etc/machine-id >>> -r--r--r-- 1 root wheel 33 Mar 3 23:03:25 2023 /var/db/machine-id >>> I observed the delete-old-files deleting >>> /etc/machine-id during the upgrade. The above is wrong: it was etcupdate activity, not delete-old-files activity, that did the delete ("D") and did nothing with /var/???/machine-id . >>> It did >>> nothing with /var/db/machine-id . >>> Also, modern hostid generation was switched to >>> random to avoid an exposure. But the update kept >>> the old hostid and propogated it (not "-"s) into >>> /etc/machine-id . So /etc/machine-id now has the >>> same exposure. >>> Later I'll see if stable/13 also got such behavior >>> for its upgrade. >>> I've not been dealing with releng/13.2 but upgrades >>> from releng/13.1 and before likely have the same >>> questions for what the handling should be vs. what it >>> might actually be. Different ways of upgrading might >>> not be in agreement, for all I know. > It might just be that there should be notes someplace about checking and possibly fixing the various machine-id related file relationships, especially if "dbus-uuidgen --ensure" (default path) was part of the prior context. === Mark Millard marklmi at yahoo.com
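For reference, a small sh sketch of the sort of check such notes might suggest. The loop itself is mine; the paths are just the ones that have come up in this thread, including the dbus-uuidgen default location:

# A sketch only: list whichever machine-id style files a system has
# accumulated, with sizes and timestamps, so a prior dbus origin can be
# told apart from the base-system handling discussed above.
for f in /etc/hostid /etc/machine-id /var/db/machine-id /var/lib/dbus/machine-id
do
    [ -e "$f" ] && ls -Tld "$f"
done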
Re: I just updated to main-n261544-cee09bda03c8 based (via source) and now /etc/machine-id and /var/db/machine-id disagree ; more
On Mar 17, 2023, at 10:15, Tijl Coosemans wrote: > On Thu, 16 Mar 2023 16:48:40 -0700 Colin Percival > wrote: >> I think the current situation should be sorted out aside from potential >> issues >> for people who upgraded to a "broken" version before updating to the latest >> code -- CCing bapt and tijl just in case since they're more familiar with >> this >> than I am. >> >> Colin Percival >> >> On 3/16/23 15:55, Mark Millard wrote: >>> # cat /etc/hostid /etc/machine-id /var/db/machine-id >>> a4f7fbeb-f668-11de-b280-ebb65474e619 >>> a4f7fbebf66811deb280ebb65474e619 >>> 7227cd89727a462186e3ba680d0ee142 >>> >>> (I'll not be keeping these values for the example system.) >>> >>> # ls -Tld /etc/hostid /etc/machine-id /var/db/machine-id >>> -rw-r--r-- 1 root wheel 37 Dec 31 16:00:18 2009 /etc/hostid >>> -rw-r--r-- 1 root wheel 33 Mar 16 15:16:18 2023 /etc/machine-id >>> -r--r--r-- 1 root wheel 33 Mar 3 23:03:25 2023 /var/db/machine-id >>> >>> I observed the delete-old-files deleting >>> /etc/machine-id during the upgrade. It did >>> nothing with /var/db/machine-id . > > delete-old deletes /etc/rc.d/machine-id, etcupdate deletes > /etc/machine-id. I suppose delete-old could also delete > /var/db/machine-id but the file is harmless so I don't think this is > important for 13.2. Good to know. I'll remove the /var/db/machine-id that the machines happen to have around. >>> Also, modern hostid generation was switched to >>> random to avoid an exposure. But the update kept >>> the old hostid and propogated it (not "-"s) into >>> /etc/machine-id . So /etc/machine-id now has the >>> same exposure. > > These files are meant to remain constant across reboots, so the update > process cannot change an existing /etc/hostid. For example, it is used > by NFS servers to restore state when a client crashes and reboots. Good to know. Absent man page(s) describing the principles for handling the hostid and machine-id file(s) (and why), what to report vs. not was unclear. So, for example, the historical hostid value takes default precedence over a potential adjustment to be random-based instead. That was not obvious to me prior to the explanation. I'm not aware of any place to find that in the man pages or other documentation. > If nothing relies on the old ID you can generate a new one by running > "uuidgen -r > /etc/hostid" and rebooting the machine. Yea, in my context, it appears that I can freely update the files. >>> Later I'll see if stable/13 also got such behavior >>> for its upgrade. >>> >>> I've not been dealing with releng/13.2 but upgrades >>> from releng/13.1 and before likely have the same >>> questions for what the handling should be vs. what it >>> might actually be. Different ways of upgrading might >>> not be in agreement, for all I know. Thanks for the notes. === Mark Millard marklmi at yahoo.com
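For reference, a minimal sh sketch of the cleanup for a context like mine where nothing relies on the old IDs. Only the uuidgen step is from Tijl's note; removing /etc/machine-id so that the next boot regenerates it is my own assumption, based on the dash-stripped hostid relationship visible in the example values and on a missing /etc/machine-id being established during a reboot:

# Only do this if nothing relies on the old hostid (e.g. NFS server state):
uuidgen -r > /etc/hostid    # per Tijl's note
# Assumption on my part: drop the old machine-id and let the next boot
# regenerate it from the new hostid.
rm -f /etc/machine-id
shutdown -r now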
Re: I just updated to main-n261544-cee09bda03c8 based (via source) and now /etc/machine-id and /var/db/machine-id disagree ; more
The 13.1-RELEASE (snapshot) to 13.2-RC3 freebsd-update upgrade sequence did not go well as far as prompting me to do the right thing to establish /etc/machine-id . After the last reboot (kernel upgrade, presumably) it had me continue with. . . # /usr/sbin/freebsd-update install src component not installed, skipped ZFS filesystem version: 5 ZFS storage pool version: features support (5000) Installing updates... install: ///var/db/etcupdate/current/etc/rc.d/growfs_fstab: No such file or directory install: ///var/db/etcupdate/current/etc/rc.d/var_run: No such file or directory install: ///var/db/etcupdate/current/etc/rc.d/zpoolreguid: No such file or directory Scanning //usr/share/certs/blacklisted for certificates... Scanning //usr/share/certs/trusted for certificates... rmdir: ///usr/tests/usr.bin/timeout: Directory not empty done. root@generic:~ # cat /etc/hostid /etc/mach* cat: No match. It did not indicate the need for another reboot to end up with a /etc/machine-id file. I tried "shutdown -r now" anyway. It did establish an /etc/machine-id file during the reboot: # ls -Tld /etc/hostid /etc/machine-id -rw-r--r-- 1 root wheel 37 May 12 08:46:21 2022 /etc/hostid -rw-r--r-- 1 root wheel 33 May 13 09:46:56 2022 /etc/machine-id So the basic implementation is operational but just lacks an indication of the need to reboot again. The date/time is because it is a RPi4B context (no time of its own) and time is not automatically being established via ntp, apparently. (I did not make such adjustments to the snapshot before starting the upgrade.) I do not know if any of the "install: ///var/db/etcupdate/ . . . " lines or the rmdir line are important. It earlier indicated 5708 patches were fetched and that 377 files were as well. === Mark Millard marklmi at yahoo.com
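For reference, a tiny sh sketch of the check I effectively did by hand at the end of the upgrade. The conditional wrapper is mine, not something freebsd-update itself suggests:

# After the final "freebsd-update install" pass: /etc/machine-id is only
# generated at boot time, so reboot once more if it is still missing.
if [ ! -e /etc/machine-id ]; then
    shutdown -r now
fi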
Re: I just updated to main-n261544-cee09bda03c8 based (via source) and now /etc/machine-id and /var/db/machine-id disagree ; more
On Mar 17, 2023, at 18:24, Mark Millard wrote: > The 13.1-RELEASE (snapshot) to 13.2-RC3 freebsd-update's > upgrade sequence did not go well relative to my being > prompted to do the right thing to establish /etc/machine-id . > After the last reboot (kernel upgrade, presumably) it had me > continue with. . . > > # /usr/sbin/freebsd-update install > src component not installed, skipped > ZFS filesystem version: 5 > ZFS storage pool version: features support (5000) > Installing updates... > install: ///var/db/etcupdate/current/etc/rc.d/growfs_fstab: No such file or > directory > install: ///var/db/etcupdate/current/etc/rc.d/var_run: No such file or > directory > install: ///var/db/etcupdate/current/etc/rc.d/zpoolreguid: No such file or > directory > Scanning //usr/share/certs/blacklisted for certificates... > Scanning //usr/share/certs/trusted for certificates... > rmdir: ///usr/tests/usr.bin/timeout: Directory not empty > done. > root@generic:~ # cat /etc/hostid /etc/mach* > cat: No match. > > It did not indicate the need for another reboot to > end up with a /etc/machine-id file. > > I tried "shutdown -r now" anyway. It did establish > an /etc/machine-id file during the reboot: > > # ls -Tld /etc/hostid /etc/machine-id > -rw-r--r-- 1 root wheel 37 May 12 08:46:21 2022 /etc/hostid > -rw-r--r-- 1 root wheel 33 May 13 09:46:56 2022 /etc/machine-id > > So the basic implementation is operational but just > lacks an indication of the need to reboot again. > > The date/time is because it is a RPi4B context (no > time of its own) and time is not automatically being > established via ntp, apparently. (I did not make such > adjustments to the snapshot before starting the > upgrade.) > > I do not know if any of the "install: ///var/db/etcupdate/ . . . " > lines or the rmdir line are important. > > It earlier indicated 5708 patches were fetched and that 377 > files were as well. Using the likes of: http://ftp3.freebsd.org/pub/FreeBSD/releases/ISO-IMAGES/13.2/FreeBSD-13.2-RC3-arm64-aarch64-RPI.img.xz directly seems to produce installations with a constant: kenv -q smbios.system.uuid 30303031-3030-3030-3265-373238346338 that ends up being what is used for /etc/hostid . It looks like this traces back to the U-Boot involvement in the boot sequence: # kenv | grep smbios hint.smbios.0.mem="0x39c2b000" smbios.bios.reldate="10/01/2022" smbios.bios.revision="22.10" smbios.bios.vendor="U-Boot" smbios.bios.version="2022.10" smbios.chassis.maker="Unknown" smbios.chassis.type="Desktop" smbios.planar.maker="Unknown" smbios.planar.product="Unknown Product" smbios.socket.enabled="1" smbios.system.maker="Unknown" smbios.system.product="Unknown Product" smbios.system.serial="REDACTED" smbios.system.uuid="30303031-3030-3030-3265-373238346338" smbios.version="3.0" === Mark Millard marklmi at yahoo.com
Re: I just updated to main-n261544-cee09bda03c8 based (via source) and now /etc/machine-id and /var/db/machine-id disagree ; more
On Mar 17, 2023, at 19:04, Mark Millard wrote: > On Mar 17, 2023, at 18:24, Mark Millard wrote: > >> The 13.1-RELEASE (snapshot) to 13.2-RC3 freebsd-update's >> upgrade sequence did not go well relative to my being >> prompted to do the right thing to establish /etc/machine-id . >> After the last reboot (kernel upgrade, presumably) it had me >> continue with. . . >> >> # /usr/sbin/freebsd-update install >> src component not installed, skipped >> ZFS filesystem version: 5 >> ZFS storage pool version: features support (5000) >> Installing updates... >> install: ///var/db/etcupdate/current/etc/rc.d/growfs_fstab: No such file or >> directory >> install: ///var/db/etcupdate/current/etc/rc.d/var_run: No such file or >> directory >> install: ///var/db/etcupdate/current/etc/rc.d/zpoolreguid: No such file or >> directory >> Scanning //usr/share/certs/blacklisted for certificates... >> Scanning //usr/share/certs/trusted for certificates... >> rmdir: ///usr/tests/usr.bin/timeout: Directory not empty >> done. >> root@generic:~ # cat /etc/hostid /etc/mach* >> cat: No match. >> >> It did not indicate the need for another reboot to >> end up with a /etc/machine-id file. >> >> I tried "shutdown -r now" anyway. It did establish >> an /etc/machine-id file during the reboot: >> >> # ls -Tld /etc/hostid /etc/machine-id >> -rw-r--r-- 1 root wheel 37 May 12 08:46:21 2022 /etc/hostid >> -rw-r--r-- 1 root wheel 33 May 13 09:46:56 2022 /etc/machine-id >> >> So the basic implementation is operational but just >> lacks an indication of the need to reboot again. >> >> The date/time is because it is a RPi4B context (no >> time of its own) and time is not automatically being >> established via ntp, apparently. (I did not make such >> adjustments to the snapshot before starting the >> upgrade.) >> >> I do not know if any of the "install: ///var/db/etcupdate/ . . . " >> lines or the rmdir line are important. >> >> It earlier indicated 5708 patches were fetched and that 377 >> files were as well. > > Using the likes of: > > http://ftp3.freebsd.org/pub/FreeBSD/releases/ISO-IMAGES/13.2/FreeBSD-13.2-RC3-arm64-aarch64-RPI.img.xz > > directly seems to produce installations with a constant: > > kenv -q smbios.system.uuid > 30303031-3030-3030-3265-373238346338 > > that ends up being what is used for /etc/hostid . > > It looks like this traces back to the U-Boot > involvement in the boot sequence: > > # kenv | grep smbios > hint.smbios.0.mem="0x39c2b000" > smbios.bios.reldate="10/01/2022" > smbios.bios.revision="22.10" > smbios.bios.vendor="U-Boot" > smbios.bios.version="2022.10" > smbios.chassis.maker="Unknown" > smbios.chassis.type="Desktop" > smbios.planar.maker="Unknown" > smbios.planar.product="Unknown Product" > smbios.socket.enabled="1" > smbios.system.maker="Unknown" > smbios.system.product="Unknown Product" > smbios.system.serial="REDACTED" > smbios.system.uuid="30303031-3030-3030-3265-373238346338" > smbios.version="3.0" > Looks like if U-Boot ends up with a system serial number, it uses that as the basis for the system uuid: https://github.com/u-boot/u-boot/blob/master/lib/smbios.c char *serial_str = env_get("serial#"); . . . 
if (serial_str) { t->serial_number = smbios_add_string(ctx, serial_str); strncpy((char *)t->uuid, serial_str, sizeof(t->uuid)); } else { t->serial_number = smbios_add_prop(ctx, "serial"); } For example (some byte reordering also involved someplace): smbios.system.serial="10002e7284c8" smbios.system.uuid="30303031-3030-3030-3265-373238346338" #0 0 0 1- 0 0- 0 0- 2 e- 7 2 8 4 c 8 This explains my seeing the same uuid from 13.1-RELEASE installation as I later saw from an independent 13.2-RC3 installation (not upgrade): I reused the same RPi4B. All media produced on the same RPi4B will get the same hostid and machine-id files by default, given how U-Boot works and that smbios.system.uuid "wins" when present. This may all be fine. But it still leaves me expecting that there should be man page(s) covering these hostid and machine-id files and how they should be handled to match the usages to which they are put, such as the nfs use that was referenced. A note/reminder to look up that material could also be relevant. === Mark Millard marklmi at yahoo.com
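For reference, a quick sh check (my own, not from U-Boot) showing that the uuid bytes are just the ASCII codes of the serial number string that gets strncpy()'d into the uuid field:

# A sketch only: dump the ASCII byte values of the example serial number.
echo -n "10002e7284c8" | od -An -tx1
# shows: 31 30 30 30 32 65 37 32 38 34 63 38
# Compare smbios.system.uuid="30303031-3030-3030-3265-373238346338":
# the leading 4-byte uuid field is displayed byte-reversed (presumably
# the little-endian field encoding), the remaining bytes match in order.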
releng/13.1 amd64 atomic_fcmpset_long parameter order and dst,expect,src (source) vs. src,dst,expect (crash dump report)
Anyone know what to make of the below mismatch between the source and what crash log is reporting about the atomic_fcmpset_long parameter order? A releng/13.1 sys/amd64/include/atomic.h has the likes of: int atomic_fcmpset_long(volatile u_long *dst, u_long *expect, u_long src); Note the order: dst, expect, src. Later it has the implementation: /* * Atomic compare and set, used by the mutex functions. * * cmpset: * if (*dst == expect) * *dst = src * * fcmpset: * if (*dst == *expect) * *dst = src * else * *expect = *dst * * Returns 0 on failure, non-zero on success. */ #define ATOMIC_CMPSET(TYPE) \ static __inline int \ atomic_cmpset_##TYPE(volatile u_##TYPE *dst, u_##TYPE expect, u_##TYPE src) \ { \ u_char res; \ \ __asm __volatile( \ " lock; cmpxchg %3,%1 ; " \ "# atomic_cmpset_" #TYPE " " \ : "=@cce" (res),/* 0 */ \ "+m" (*dst), /* 1 */ \ "+a" (expect) /* 2 */ \ : "r" (src) /* 3 */ \ : "memory", "cc"); \ return (res); \ } \ \ static __inline int \ atomic_fcmpset_##TYPE(volatile u_##TYPE *dst, u_##TYPE *expect, u_##TYPE src) \ { \ u_char res; \ \ __asm __volatile( \ " lock; cmpxchg %3,%1 ; " \ "# atomic_fcmpset_" #TYPE " " \ : "=@cce" (res),/* 0 */ \ "+m" (*dst), /* 1 */ \ "+a" (*expect)/* 2 */ \ : "r" (src) /* 3 */ \ : "memory", "cc"); \ return (res); \ } ATOMIC_CMPSET(char); ATOMIC_CMPSET(short); ATOMIC_CMPSET(int); ATOMIC_CMPSET(long); which still shows dst,expect,src for the order. But a releng/13.1 crash dump log shows the name order: src, dst, expect (in #7 below): #4 0x80c1ba63 in panic (fmt=) at /usr/src/sys/kern/kern_shutdown.c:844 #5 0x810addf5 in trap_fatal (frame=0xfe00b555dae0, eva=0) at /usr/src/sys/amd64/amd64/trap.c:944 #6 #7 0x80c895cb in atomic_fcmpset_long (src=18446741877726026240, dst=, expect=) at /usr/src/sys/amd64/include/atomic.h:225 The atomic_fcmpset_long (from a mtx_lock(?) use) got a: Fatal trap 9: general protection fault while in kernel mode crash. The code was inside nfsd. ( Note: 18446741877726026240 == 0xfe00b52e9a00 ) The crash is not mine. It is a new type of example from an ongoing crash-evidence gathering session. See: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=267028#c147 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=267028#c148 === Mark Millard marklmi at yahoo.com
stable/13 missing 2 Windows Dev Kit 2023 related updates that main [so: 14] has
I had to substitute a FreeBSD EFI loader from main [so: 14] to boot the Windows Dev Kit 2023 via USB3 and a dd'd: http://ftp3.freebsd.org/pub/FreeBSD/snapshots/ISO-IMAGES/13.2/FreeBSD-13.2-STABLE-arm64-aarch64-ROCK64-20230504-7dea7445ba44-255298.img.xz When I looked as a result, the following commits that mention the Windows Dev Kit 2023 were not in stable/13 : Commit message (Expand) Author Age Files Lines * arm64: Disable PAC when booting on a Windows Dev Kit 2023 Mark Johnston 2023-04-23 1 -1/+30 * Add the fixed memory type to the pci ecam driver Andrew Turner 2023-01-18 1 -3/+20 By contrast, I did find: Commit message (Expand) Author Age Files Lines * loader.efi: make sure kernel image is executable Robert Clausecker 2023-01-23 1 -4/+4 * Add Windows Dev Kit 2023 support to if_ure Andrew Turner 2023-01-23 2 -0/+2 * Check for more XHCI ACPI IDs Andrew Turner 2023-01-23 1 -4/+7 in stable/13 . (I have not checked the correspondence of what I found missing vs. the status of the loader. There could be more involved than what I've found --and some or all of what I found missing may not be involved in the EFI loader issue.) === Mark Millard marklmi at yahoo.com
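For reference, a small sh/git sketch of doing the same sort of check from the command line instead of the cgit web listing. It is my own, assumes a src git checkout with the upstream remote named freebsd and both branches fetched, and hard-codes no commit hashes since the listing above does not show them:

# A sketch only: list commits whose messages mention the Windows Dev Kit
# 2023 and that are in main but not yet in stable/13.
git -C /usr/src log --oneline -i --grep='windows dev kit 2023' \
    freebsd/stable/13..freebsd/main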
Re: Possible regression in main causing poor performance
On Aug 18, 2023, at 19:09, Mark Millard wrote: > Glen Barber wrote on > Date: Sat, 19 Aug 2023 00:10:59 UTC : > >> I am somewhat inclined to look in the direction of ZFS here, as two >> things changed: >> >> 1) the build machine in question was recently (as in a week and a half >> ago) upgraded to the tip of main in order to ease the transition from >> this machine from building 14.x to building 15.x; >> 2) there is the recent addition of building ZFS-backed virtual machine >> and cloud images. >> >> . . . >> The first machine runs: >> # uname -a >> FreeBSD releng1.nyi.freebsd.org 14.0-CURRENT FreeBSD 14.0-CURRENT \ >> amd64 1400093 #5 main-n264224-c84617e87a70: Wed Jul 19 19:10:38 UTC 2023 > > I'm confused: > > "the build machine in question was recently (as in a week and a half > ago) upgraded to the tip of main in order to ease the transition from > this machine from building 14.x to building 15.x"? But the above > kernel is from mid July? (-aKU was not used to also get some clue > about world from the pair of 140009? that would show.) > >> Last week's snapshot builds were completed in a reasonable amount of >> time: >> >> r...@releng1.nyi:/releng/scripts-snapshot/scripts # ./thermite.sh -c >> ./builds-14.conf ; echo ^G >> 20230811-00:03:11 INFO: Creating /releng/scripts-snapshot/logs >> 20230811-00:03:11 INFO: Creating /releng/scripts-snapshot/chroots >> 20230811-00:03:12 INFO: Creating /releng/scripts-snapshot/release >> 20230811-00:03:12 INFO: Creating /releng/scripts-snapshot/ports >> 20230811-00:03:12 INFO: Creating /releng/scripts-snapshot/doc >> 20230811-00:03:13 INFO: Checking out https://git.FreeBSD.org//src.git (main) >> to /releng/scripts-snapshot/release >> [...] >> 20230811-15:11:13 INFO: Staging for ftp: 14-i386-GENERIC-snap >> 20230811-16:27:28 INFO: Staging for ftp: 14-amd64-GENERIC-snap >> 20230811-16:33:43 INFO: Staging for ftp: 14-aarch64-GENERIC-snap >> >> Overall, 17 hours, including the time to upload EC2, Vagrant, and GCE. >> >> With no changes to the system, no stale ZFS datasets laying around from >> last week (everything is a pristine environment, etc.), this week's >> builds are taking forever: > > My confusion may extend to this "no changes" status vs. the uname > output identifying the kernel is from mid July. > >> r...@releng1.nyi:/releng/scripts-snapshot/scripts # ./thermite.sh -c >> ./builds-14.conf ; echo ^G >> 20230818-00:15:44 INFO: Creating /releng/scripts-snapshot/logs >> 20230818-00:15:44 INFO: Creating /releng/scripts-snapshot/chroots >> 20230818-00:15:45 INFO: Creating /releng/scripts-snapshot/release >> 20230818-00:15:45 INFO: Creating /releng/scripts-snapshot/ports >> 20230818-00:15:45 INFO: Creating /releng/scripts-snapshot/doc >> 20230818-00:15:46 INFO: Checking out https://git.FreeBSD.org//src.git (main) >> to /releng/scripts-snapshot/release >> [...] >> 20230818-18:46:22 INFO: Staging for ftp: 14-aarch64-ROCKPRO64-snap >> 20230818-20:41:02 INFO: Staging for ftp: 14-riscv64-GENERIC-snap >> 20230818-22:54:49 INFO: Staging for ftp: 14-amd64-GENERIC-snap >> >> Note, it is just about 4 minutes past 00:00 UTC as of this writing, so >> we are about to cross well over the 24-hour mark, and cloud provider >> images have not yet even started. >> >> . . . > > In: > > https://lists.freebsd.org/archives/freebsd-current/2023-August/004314.html > ("HEADS UP: $FreeBSD$ Removed from main", Wed, 16 Aug 2023) > > Warner wrote: > > QUOTE > . . . , but there's no incremental building > with this change, . . . 
> Also: expect long build times, git fetch times, etc > after this. > END QUOTE > > Might this be contributing? How long did those two > "Checking out . . ." take? Similar time frames? > The build process and information is not available. So I looked at something I thought might have a chance of being somewhat invariant and have a limited range of types of (parallel) activity: time differences for the CHECKSUM files that have timestamps after the last *.img* timestamp, as seen via: http://ftp3.freebsd.org/pub/FreeBSD/snapshots/ISO-IMAGES/14.0/?C=M&O=D (so: most recent to oldest as displayed) First today's: CHECKSUM.SHA256-FreeBSD-14.0-ALPHA2-arm64-aarch64-20230819-77013f29d048-264841 1232 2023-Aug-19 00:26 CHECKSUM.SHA512-FreeBSD-14.0-ALPHA2-arm64-aarch64-20230819-77013f29d048-264841 1744 2023-Aug-19 00:25 CHECKSUM.SHA256-FreeBSD-14.0-ALPHA2-amd64-20230818-77013f29d048-264841 1168 2023-Aug-18 22:59 CHECKSUM.SHA512-FreeBSD-14.0-ALPHA2-amd64-20230818-7
Re: Possible regression in main causing poor performance
Has any more been learned about this? Is it still an issue? === Mark Millard marklmi at yahoo.com
Re: Possible regression in main causing poor performance
On Sep 5, 2023, at 08:58, Cy Schubert wrote: > In message <20230830204406.24fd...@slippy.cwsent.com>, Cy Schubert writes: >> In message <20230830184426.gm1...@freebsd.org>, Glen Barber writes: >>> >>> >>> On Mon, Aug 28, 2023 at 06:06:09PM -0700, Mark Millard wrote: >>>> Has any more been learned about this? Is it still an issue? >>>> =20 >>> >>> I rebooted the machine before the ALPHA3 builds with no other changes, >>> and the overall times for 14.x builds went back to normal. I do not >>> like to experiment with builders during a release cycle, but as we are >>> going to have 15.x snapshots available moving forward, I will not reboot >>> that machine next week in hopes to get some useful data. >>> >>> If my memory serves correctly, mm@ has a pending ZFS import from >>> upstream for both main and stable/14 pending. Whether or not that will >>> resolve any issue here, I do not know. >> >> Two of my poudriere builder machines have experienced different panics >> since the ZFS import two days ago. The problems have been documented on the >> -current list. > > Just an update. > > The three pull requests amotin@ pointed to did resolve all my problems. A > subsequent update which included the latest ZFS commits worked just as > well, without any new regressions. AFAIAC this problem has been resolved. > > The random email corruptions have also been resolved. > > > -- > Cheers, > Cy Schubert > FreeBSD UNIX: Web: https://FreeBSD.org > NTP: Web: https://nwtime.org > > e^(i*pi)+1=0 > > > > > 9O8 The just-above quoted line looks like a corruption to me. Otherwise, I'm just reporting more evidence from separate testing on amd64 . . . I will say that my separate-install/boot environment 10hr, 6366 port->package poudriere bulk -a prefix test of: # uname -apKU FreeBSD amd64-ZFS 15.0-CURRENT FreeBSD 15.0-CURRENT amd64 150 #118 main-n265152-f49d6f583e9d-dirty: Mon Sep 4 14:26:56 PDT 2023 root@amd64_ZFS:/usr/obj/BUILDs/main-amd64-nodbg-clang/usr/main-src/amd64.amd64/sys/GENERIC-NODBG amd64 amd64 150 150 did not show any deadlocks. The only oddity that I've noticed is the 1 extra message shown in: . . . [00:03:25] [32] [00:00:00] Builder starting [00:03:43] [01] [00:00:18] Finished print/indexinfo | indexinfo-0.3.1: Success [00:03:43] [01] [00:00:00] Building devel/gettext-runtime | gettext-runtime-0.22_1 [00:05:20] [01] [00:01:37] Finished devel/gettext-runtime | gettext-runtime-0.22_1: Success 23/.p/cleaning/rdeps/gettext-runtime-0.22_1/chemtool-1.6.14_4 copy: open failed: No such file or directory [00:05:23] [01] [00:00:00] Building devel/gmake | gmake-4.3_2 [00:05:55] [02] [00:02:30] Builder started . . . I'm comfortable moving my normal environments forward to include this latest import of openzfs. The effort established a separate environment set up for doing testing of jumping to/past an openzfs import(s) in main. Too many recent imports have dangerous-to-the-file-system and/or had deadlocking issues for me to simply update to include them without first testing on separate media that does not have to stay operational. === Mark Millard marklmi at yahoo.com
main [and, likely, stable/14]: do not set vfs.zfs.bclone_enabled=1 with that zpool feature enabled because it still leads to panics
805d6107380, outoffp=0x811e6eb7, outoffp@entry=0xf819860a2c78, lenp=0x0, lenp@entry=0xfe0352758d50, flags=flags@entry=0, incred=0xf80e32335200, outcred=0xf80e32335200, fsize_td=0xfe03586c0720) at /usr/main-src/sys/kern/vfs_vnops.c:3085 #21 0x80c6b998 in kern_copy_file_range ( td=td@entry=0xfe03586c0720, infd=, inoffp=0xf81910c3c7c8, inoffp@entry=0x0, outfd=, outoffp=0xf819860a2c78, outoffp@entry=0x0, len=9223372036854775807, flags=0) at /usr/main-src/sys/kern/vfs_syscalls.c:4971 #22 0x80c6bab8 in sys_copy_file_range (td=0xfe03586c0720, uap=0xfe03586c0b20) at /usr/main-src/sys/kern/vfs_syscalls.c:5009 #23 0x8104bab9 in syscallenter (td=0xfe03586c0720) at /usr/main-src/sys/amd64/amd64/../../kern/subr_syscall.c:187 #24 amd64_syscall (td=0xfe03586c0720, traced=0) at /usr/main-src/sys/amd64/amd64/trap.c:1197 #25 #26 0x1ce4506d155a in ?? () Backtrace stopped: Cannot access memory at address 0x1ce44ec71e88 (kgdb) Context details follow. Absent a openzfs-2.2 in: ls -C1 /usr/share/zfs/compatibility.d/openzfs-2.* /usr/share/zfs/compatibility.d/openzfs-2.0-freebsd /usr/share/zfs/compatibility.d/openzfs-2.0-linux /usr/share/zfs/compatibility.d/openzfs-2.1-freebsd /usr/share/zfs/compatibility.d/openzfs-2.1-linux I have copied: /usr/main-src/sys/contrib/openzfs/cmd/zpool/compatibility.d/openzfs-2.2 over to: # ls -C1 /etc/zfs/compatibility.d/* /etc/zfs/compatibility.d/openzfs-2.2 and used it: # zpool get compatibility zamd64 NAMEPROPERTY VALUE SOURCE zamd64 compatibility openzfs-2.2local For reference: # zpool upgrade This system supports ZFS pool feature flags. All pools are formatted using feature flags. Some supported features are not enabled on the following pools. Once a feature is enabled the pool may become incompatible with software that does not support the feature. See zpool-features(7) for details. Note that the pool 'compatibility' feature can be used to inhibit feature upgrades. POOL FEATURE --- zamd64 redaction_list_spill which agrees with openzfs-2.2 . I did: # sysctl vfs.zfs.bclone_enabled=1 vfs.zfs.bclone_enabled: 0 -> 1 I also made a snapshot: zamd64@before-bclone-test and I then made a checkpoint. These were establshed just after the above enable. I then did a: zpool trim -w zamd64 The poudriere bulk command was: poudriere bulk -jmain-amd64-bulk_a -a where main-amd64-bulk_a has nothing prebuilt. USE_TMPFS=no is in use. No form of ALLOW_MAKE_JOBS is in use. It is a 32 builder context (32 hardware threads). For reference: # uname -apKU FreeBSD amd64-ZFS 15.0-CURRENT FreeBSD 15.0-CURRENT amd64 150 #118 main-n265152-f49d6f583e9d-dirty: Mon Sep 4 14:26:56 PDT 2023 root@amd64_ZFS:/usr/obj/BUILDs/main-amd64-nodbg-clang/usr/main-src/amd64.amd64/sys/GENERIC-NODBG amd64 amd64 150 150 I'll note that with openzfs-2.1-freebsd compatibility I'd previously let such a bulk -a run for about 10 hr and it had reached 6366 port->package builds. Prior to that I'd done shorter experiments with default zpool features (no explicit compatibility constraint) but vfs.zfs.bclone_enabled=0 and I'd had no problems. (I have a separate M.2 boot media just for such experiments and can reconstruct its content at will.) All these have been based on the same personal main-n265152-f49d6f583e9d-dirty system build. Unfortunately, no appropriate snapshot of main was available to avoid my personal context being involved for the system build used. Similarly, the snapshot(s) of stable/14 predate: Sun, 03 Sep 2023 . . . 
git: f789381671a3 - stable/14 - zfs: merge openzfs/zfs@32949f256 (zfs-2.2-release) into stable/14 that has required fixes for other issues. === Mark Millard marklmi at yahoo.com
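For reference, a condensed sh sketch of the test setup sequence described above. The pool name zamd64 and the file paths are specific to my context, and the zpool set command is my presumption of how the compatibility property shown by zpool get ended up with SOURCE "local":

# A condensed sketch of the steps above (adjust names for another system):
cp /usr/main-src/sys/contrib/openzfs/cmd/zpool/compatibility.d/openzfs-2.2 \
   /etc/zfs/compatibility.d/
zpool set compatibility=openzfs-2.2 zamd64
sysctl vfs.zfs.bclone_enabled=1
zfs snapshot zamd64@before-bclone-test   # the original may have been recursive
zpool checkpoint zamd64
zpool trim -w zamd64
poudriere bulk -jmain-amd64-bulk_a -a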
Re: main [and, likely, stable/14]: do not set vfs.zfs.bclone_enabled=1 with that zpool feature enabled because it still leads to panics
[Drat, the request to rerun my tests did not not mention the more recent change: vfs: copy_file_range() between multiple mountpoints of the same fs type and I'd not noticed on my own and ran the test without updating.] On Sep 7, 2023, at 11:02, Mark Millard wrote: > I was requested to do a test with vfs.zfs.bclone_enabled=1 and > the bulk -a build paniced (having stored 128 *.pkg files in > .building/ first): Unfortunately, rerunning my tests with this set was testing a context predating: Wed, 06 Sep 2023 . . . • git: 969071be938c - main - vfs: copy_file_range() between multiple mountpoints of the same fs type Martin Matuska So the information might be out of date for main and for stable/14 : I've no clue how good of a test it was. May be some of those I've cc'd would know. When I next have time, should I retry based on a more recent vintage of main that includes 969071be938c ? > # more /var/crash/core.txt.3 > . . . > Unread portion of the kernel message buffer: > panic: Solaris(panic): zfs: accessing past end of object 422/1108c16 > (size=2560 access=2560+2560) > cpuid = 15 > time = 1694103674 > KDB: stack backtrace: > db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe0352758590 > vpanic() at vpanic+0x132/frame 0xfe03527586c0 > panic() at panic+0x43/frame 0xfe0352758720 > vcmn_err() at vcmn_err+0xeb/frame 0xfe0352758850 > zfs_panic_recover() at zfs_panic_recover+0x59/frame 0xfe03527588b0 > dmu_buf_hold_array_by_dnode() at dmu_buf_hold_array_by_dnode+0x97/frame > 0xfe0352758960 > dmu_brt_clone() at dmu_brt_clone+0x61/frame 0xfe03527589f0 > zfs_clone_range() at zfs_clone_range+0xa6a/frame 0xfe0352758bc0 > zfs_freebsd_copy_file_range() at zfs_freebsd_copy_file_range+0x1ae/frame > 0xfe0352758c40 > vn_copy_file_range() at vn_copy_file_range+0x11e/frame 0xfe0352758ce0 > kern_copy_file_range() at kern_copy_file_range+0x338/frame 0xfe0352758db0 > sys_copy_file_range() at sys_copy_file_range+0x78/frame 0xfe0352758e00 > amd64_syscall() at amd64_syscall+0x109/frame 0xfe0352758f30 > fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfe0352758f30 > --- syscall (569, FreeBSD ELF64, copy_file_range), rip = 0x1ce4506d155a, rsp > = 0x1ce44ec71e88, rbp = 0x1ce44ec72320 --- > KDB: enter: panic > > __curthread () at /usr/main-src/sys/amd64/include/pcpu_aux.h:57 > 57 __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct > pcpu, > (kgdb) #0 __curthread () at /usr/main-src/sys/amd64/include/pcpu_aux.h:57 > #1 doadump (textdump=textdump@entry=0) > at /usr/main-src/sys/kern/kern_shutdown.c:405 > #2 0x804a442a in db_dump (dummy=, > dummy2=, dummy3=, dummy4=) > at /usr/main-src/sys/ddb/db_command.c:591 > #3 0x804a422d in db_command (last_cmdp=, > cmd_table=, dopager=true) > at /usr/main-src/sys/ddb/db_command.c:504 > #4 0x804a3eed in db_command_loop () > at /usr/main-src/sys/ddb/db_command.c:551 > #5 0x804a7876 in db_trap (type=, code=) > at /usr/main-src/sys/ddb/db_main.c:268 > #6 0x80bb9e57 in kdb_trap (type=type@entry=3, code=code@entry=0, > tf=tf@entry=0xfe03527584d0) at /usr/main-src/sys/kern/subr_kdb.c:790 > #7 0x8104ad3d in trap (frame=0xfe03527584d0) > at /usr/main-src/sys/amd64/amd64/trap.c:608 > #8 > #9 kdb_enter (why=, msg=) > at /usr/main-src/sys/kern/subr_kdb.c:556 > #10 0x80b6aab3 in vpanic (fmt=0x82be52d6 "%s%s", > ap=ap@entry=0xfe0352758700) > at /usr/main-src/sys/kern/kern_shutdown.c:958 > #11 0x80b6a943 in panic ( > fmt=0x820aa2e8 "\312C$\201\377\377\377\377") > at /usr/main-src/sys/kern/kern_shutdown.c:894 > #12 0x82993c5b in vcmn_err (ce=, > fmt=0x82bfdd1f "zfs: 
accessing past end of object %llx/%llx (size=%u > access=%llu+%llu)", adx=0xfe0352758890) > at /usr/main-src/sys/contrib/openzfs/module/os/freebsd/spl/spl_cmn_err.c:60 > #13 0x82a84d69 in zfs_panic_recover ( > fmt=0x12 ) > at /usr/main-src/sys/contrib/openzfs/module/zfs/spa_misc.c:1594 > #14 0x829f8e27 in dmu_buf_hold_array_by_dnode (dn=0xf813dfc48978, > offset=offset@entry=2560, length=length@entry=2560, read=read@entry=0, >tag=0x82bd8175, numbufsp=numbufsp@entry=0xfe03527589bc, > dbpp=0xfe03527589c0, flags=0) > at /usr/main-src/sys/contrib/openzfs/module/zfs/dmu.c:543 > #15 0x829fc6a1 in dmu_buf_hold_array (os=, > object=, read=0, numbufsp=0xfe03527589bc, > dbpp=0xfe03527589c0, offset=, length=, > tag=) > at /usr/main-src/sys/contrib/openzfs/module/zfs/dmu.c:6
Re: main [and, likely, stable/14]: do not set vfs.zfs.bclone_enabled=1 with that zpool feature enabled because it still leads to panics
On Sep 7, 2023, at 11:48, Glen Barber wrote: > On Thu, Sep 07, 2023 at 11:17:22AM -0700, Mark Millard wrote: >> When I next have time, should I retry based on a more recent >> vintage of main that includes 969071be938c ? >> > > Yes, please, if you can. As stands, I rebooted that machine into my normal enviroment, so the after-crash-with-dump-info context is preserved. I'll presume lack of a need to preserve that context unless I hear otherwise. (But I'll work on this until later today.) Even my normal environment predates the commit in question by a few commits. So I'll end up doing a more general round of updates overall. Someone can let me know if there is a preference for debug over non-debug for the next test run. Looking at "git: 969071be938c - main", the relevant part seems to be just (white space possibly not preserved accurately): diff --git a/sys/kern/vfs_vnops.c b/sys/kern/vfs_vnops.c index 9fb5aee6a023..4e4161ef1a7f 100644 --- a/sys/kern/vfs_vnops.c +++ b/sys/kern/vfs_vnops.c @@ -3076,12 +3076,14 @@ vn_copy_file_range(struct vnode *invp, off_t *inoffp, struct vnode *outvp, goto out; /* -* If the two vnode are for the same file system, call +* If the two vnodes are for the same file system type, call * VOP_COPY_FILE_RANGE(), otherwise call vn_generic_copy_file_range() -* which can handle copies across multiple file systems. +* which can handle copies across multiple file system types. */ *lenp = len; - if (invp->v_mount == outvp->v_mount) + if (invp->v_mount == outvp->v_mount || + strcmp(invp->v_mount->mnt_vfc->vfc_name, + outvp->v_mount->mnt_vfc->vfc_name) == 0) error = VOP_COPY_FILE_RANGE(invp, inoffp, outvp, outoffp, lenp, flags, incred, outcred, fsize_td); else That looks to call VOP_COPY_FILE_RANGE in more contexts and vn_generic_copy_file_range in fewer. The backtrace I reported involves: VOP_COPY_FILE_RANGE So it appears this change is unlikely to invalidate my test result, although failure might happen sooner if more VOP_COPY_FILE_RANGE calls happen with the newer code. That in turns means that someone may come up with some other change for me to test by the time I get around to setting up another test. Let me know if so. === Mark Millard marklmi at yahoo.com
Re: main [and, likely, stable/14]: do not set vfs.zfs.bclone_enabled=1 with that zpool feature enabled because it still leads to panics
On Sep 7, 2023, at 13:07, Alexander Motin wrote: > Thanks, Mark. > > On 07.09.2023 15:40, Mark Millard wrote: >> On Sep 7, 2023, at 11:48, Glen Barber wrote: >>> On Thu, Sep 07, 2023 at 11:17:22AM -0700, Mark Millard wrote: >>>> When I next have time, should I retry based on a more recent >>>> vintage of main that includes 969071be938c ? >>> >>> Yes, please, if you can. >> As stands, I rebooted that machine into my normal >> enviroment, so the after-crash-with-dump-info >> context is preserved. I'll presume lack of a need >> to preserve that context unless I hear otherwise. >> (But I'll work on this until later today.) >> Even my normal environment predates the commit in >> question by a few commits. So I'll end up doing a >> more general round of updates overall. >> Someone can let me know if there is a preference >> for debug over non-debug for the next test run. > > It is not unknown when some bugs disappear once debugging is enabled due to > different execution timings, but generally debug may to detect the problem > closer to its origin instead of looking on random consequences. I am only > starting to look on this report (unless Pawel or somebody beat me on it), and > don't have additional requests yet, but if you can repeat the same with debug > kernel (in-base ZFS's ZFS_DEBUG setting follows kernel's INVARIANTS), it may > give us some additional information. So I did a zpool import, rewinding to the checkpoint. (This depends on the questionable zfs doing fully as desired for this. Notably the normal environment has vfs.zfs.bclone_enabled=0 , including when it was doing this activity.) My normal environment reported no problems. Note: the earlier snapshot from my first setup was still in place since it was made just before the original checkpoint used above. However, the rewind did remove the /var/crash/ material that had been added. I did the appropriate zfs mount. I installed a debug kernel and world to the import. Again, no problems reported. I did the appropriate zfs umount. I did the appropriate zpool export. I rebooted with the test media. # sysctl vfs.zfs.bclone_enabled vfs.zfs.bclone_enabled: 1 # zpool trim -w zamd64 # zpool checkpoint zamd64 # uname -apKU FreeBSD amd64-ZFS 15.0-CURRENT FreeBSD 15.0-CURRENT amd64 150 #74 main-n265188-117c54a78ccd-dirty: Tue Sep 5 21:29:53 PDT 2023 root@amd64-ZFS:/usr/obj/BUILDs/main-amd64-dbg-clang/usr/main-src/amd64.amd64/sys/GENERIC-DBG amd64 amd64 150 150 (So, before the 969071be938c vintage, same sources as for my last run but a debug build.) # poudriere bulk -jmain-amd64-bulk_a -a . . . [00:03:23] Building 34214 packages using up to 32 builders [00:03:23] Hit CTRL+t at any time to see build progress and stats [00:03:23] [01] [00:00:00] Builder starting [00:04:19] [01] [00:00:56] Builder started [00:04:20] [01] [00:00:01] Building ports-mgmt/pkg | pkg-1.20.6 [00:05:33] [01] [00:01:14] Finished ports-mgmt/pkg | pkg-1.20.6: Success [00:05:53] [01] [00:00:00] Building print/indexinfo | indexinfo-0.3.1 [00:05:53] [02] [00:00:00] Builder starting . . . 
[00:05:54] [32] [00:00:00] Builder starting [00:06:11] [01] [00:00:18] Finished print/indexinfo | indexinfo-0.3.1: Success [00:06:12] [01] [00:00:00] Building devel/gettext-runtime | gettext-runtime-0.22_1 [00:08:24] [01] [00:02:12] Finished devel/gettext-runtime | gettext-runtime-0.22_1: Success [00:08:31] [01] [00:00:00] Building devel/libtextstyle | libtextstyle-0.22 [00:10:06] [05] [00:04:13] Builder started [00:10:06] [05] [00:00:00] Building devel/autoconf-switch | autoconf-switch-20220527 [00:10:06] [31] [00:04:12] Builder started [00:10:06] [31] [00:00:00] Building devel/libatomic_ops | libatomic_ops-7.8.0 . . . Crashed again, with 158 *.pkg files in .building/All/ after rebooting. The crash is similar to the non-debug one. No extra output from the debug build. For reference: Unread portion of the kernel message buffer: panic: Solaris(panic): zfs: accessing past end of object 422/10b1c02 (size=2560 access=2560+2560) cpuid = 15 time = 1694127988 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe02e783b5a0 vpanic() at vpanic+0x132/frame 0xfe02e783b6d0 panic() at panic+0x43/frame 0xfe02e783b730 vcmn_err() at vcmn_err+0xeb/frame 0xfe02e783b860 zfs_panic_recover() at zfs_panic_recover+0x59/frame 0xfe02e783b8c0 dmu_buf_hold_array_by_dnode() at dmu_buf_hold_array_by_dnode+0xb8/frame 0xfe02e783b970 dmu_brt_clone() at dmu_brt_clone+0x61/frame 0xfe02e783b9f0 zfs_clone_range() at zfs_clone_range+0xaa3/frame 0xfe02e783bbc0 zfs_freebsd_copy_file_range() at zfs_freebsd_copy_file_range+0x18a/frame 0xfe02e783bc40 vn_copy_file_range() at vn_copy_file_range+0x114/frame 0xfe02e783bce0 kern_copy_file_ra
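For reference, a minimal sh sketch of the checkpoint rewind step mentioned at the top of this message, done from the separate, normal boot environment. The exact import options are my reconstruction, not copied from the actual session:

# A sketch only: import the test pool, discarding everything written
# after the checkpoint, under a temporary altroot for inspection/repair.
zpool import --rewind-to-checkpoint -R /mnt zamd64
# ... zfs mount what is needed, install the debug kernel/world, then:
zpool export zamd64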
Re: main [and, likely, stable/14]: do not set vfs.zfs.bclone_enabled=1 with that zpool feature enabled because it still leads to panics
[Today's main-snapshot kernel panics as well.] On Sep 7, 2023, at 16:32, Mark Millard wrote: > On Sep 7, 2023, at 13:07, Alexander Motin wrote: > >> Thanks, Mark. >> >> On 07.09.2023 15:40, Mark Millard wrote: >>> On Sep 7, 2023, at 11:48, Glen Barber wrote: >>>> On Thu, Sep 07, 2023 at 11:17:22AM -0700, Mark Millard wrote: >>>>> When I next have time, should I retry based on a more recent >>>>> vintage of main that includes 969071be938c ? >>>> >>>> Yes, please, if you can. >>> As stands, I rebooted that machine into my normal >>> enviroment, so the after-crash-with-dump-info >>> context is preserved. I'll presume lack of a need >>> to preserve that context unless I hear otherwise. >>> (But I'll work on this until later today.) >>> Even my normal environment predates the commit in >>> question by a few commits. So I'll end up doing a >>> more general round of updates overall. >>> Someone can let me know if there is a preference >>> for debug over non-debug for the next test run. >> >> It is not unknown when some bugs disappear once debugging is enabled due to >> different execution timings, but generally debug may to detect the problem >> closer to its origin instead of looking on random consequences. I am only >> starting to look on this report (unless Pawel or somebody beat me on it), >> and don't have additional requests yet, but if you can repeat the same with >> debug kernel (in-base ZFS's ZFS_DEBUG setting follows kernel's INVARIANTS), >> it may give us some additional information. > > So I did a zpool import, rewinding to the checkpoint. > (This depends on the questionable zfs doing fully as > desired for this. Notably the normal environment has > vfs.zfs.bclone_enabled=0 , including when it was > doing this activity.) My normal environment reported > no problems. > > Note: the earlier snapshot from my first setup was > still in place since it was made just before the > original checkpoint used above. > > However, the rewind did remove the /var/crash/ > material that had been added. > > I did the appropriate zfs mount. > > I installed a debug kernel and world to the import. Again, > no problems reported. > > I did the appropriate zfs umount. > > I did the appropriate zpool export. > > I rebooted with the test media. > > # sysctl vfs.zfs.bclone_enabled > vfs.zfs.bclone_enabled: 1 > > # zpool trim -w zamd64 > > # zpool checkpoint zamd64 > > # uname -apKU > FreeBSD amd64-ZFS 15.0-CURRENT FreeBSD 15.0-CURRENT amd64 150 #74 > main-n265188-117c54a78ccd-dirty: Tue Sep 5 21:29:53 PDT 2023 > root@amd64-ZFS:/usr/obj/BUILDs/main-amd64-dbg-clang/usr/main-src/amd64.amd64/sys/GENERIC-DBG > amd64 amd64 150 150 > > (So, before the 969071be938c vintage, same sources as for > my last run but a debug build.) > > # poudriere bulk -jmain-amd64-bulk_a -a > . . . > [00:03:23] Building 34214 packages using up to 32 builders > [00:03:23] Hit CTRL+t at any time to see build progress and stats > [00:03:23] [01] [00:00:00] Builder starting > [00:04:19] [01] [00:00:56] Builder started > [00:04:20] [01] [00:00:01] Building ports-mgmt/pkg | pkg-1.20.6 > [00:05:33] [01] [00:01:14] Finished ports-mgmt/pkg | pkg-1.20.6: Success > [00:05:53] [01] [00:00:00] Building print/indexinfo | indexinfo-0.3.1 > [00:05:53] [02] [00:00:00] Builder starting > . . . 
> [00:05:54] [32] [00:00:00] Builder starting > [00:06:11] [01] [00:00:18] Finished print/indexinfo | indexinfo-0.3.1: Success > [00:06:12] [01] [00:00:00] Building devel/gettext-runtime | > gettext-runtime-0.22_1 > [00:08:24] [01] [00:02:12] Finished devel/gettext-runtime | > gettext-runtime-0.22_1: Success > [00:08:31] [01] [00:00:00] Building devel/libtextstyle | libtextstyle-0.22 > [00:10:06] [05] [00:04:13] Builder started > [00:10:06] [05] [00:00:00] Building devel/autoconf-switch | > autoconf-switch-20220527 > [00:10:06] [31] [00:04:12] Builder started > [00:10:06] [31] [00:00:00] Building devel/libatomic_ops | libatomic_ops-7.8.0 > . . . > > Crashed again, with 158 *.pkg files in .building/All/ after > rebooting. > > The crash is similar to the non-debug one. No extra output > from the debug build. > > For reference: > > Unread portion of the kernel message buffer: > panic: Solaris(panic): zfs: accessing past end of object 422/10b1c02 > (size=2560 access=2560+2560) > . . . Same world with newer snapshot main kernel that should be compatible with the world: # uname -apKU FreeBSD amd64-ZFS 15.0-CURRENT FreeBSD 15.0-CURREN
Re: main [and, likely, stable/14]: do not set vfs.zfs.bclone_enabled=1 with that zpool feature enabled because it still leads to panics
On Sep 8, 2023, at 06:52, Martin Matuska wrote: > I digged a little and was able to reproduce the panic without poudriere with > a shell script. > > You may want to increase "repeats". > The script causes the panic in dmu_buf_hold_array_by_dnode() on my VirtualBox > with the cat command on 9th iteration. > > Here is the script: > > #!/bin/sh > nl=' > ' > sed_script=s/aaa/b/ > for ac_i in 1 2 3 4 5 6 7; do > sed_script="$sed_script$nl$sed_script" > done > echo "$sed_script" 2>/dev/null | sed 99q >conftest.sed > > repeats=8 > count=0 > echo -n 0123456789 >"conftest.in" > while : > do > cat "conftest.in" "conftest.in" >"conftest.tmp" > mv "conftest.tmp" "conftest.in" > cp "conftest.in" "conftest.nl" > echo '' >> "conftest.nl" > sed -f conftest.sed < "conftest.nl" >"conftest.out" 2>/dev/null || break > diff "conftest.out" "conftest.nl" >/dev/null 2>&1 || break > count=$(($count + 1)) > echo "count: $count" > # 10*(2^10) chars as input seems more than enough > test $count -gt $repeats && break > done > rm -f conftest.in conftest.tmp conftest.nl conftest.out . . . (history removed) . . . # uname -apKU FreeBSD amd64-ZFS 15.0-CURRENT FreeBSD 15.0-CURRENT amd64 150 #0 main-n265205-03a7c36ddbc0: Thu Sep 7 03:10:34 UTC 2023 r...@releng3.nyi.freebsd.org:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64 amd64 150 150 In my test environment with yesterday's snapshot kernel in use and with vfs.zfs.bclone_enabled=1 : # ~/bclone_panic.sh count: 1 count: 2 count: 3 count: 4 count: 5 count: 6 count: 7 count: 8 then panic: no 9. === Mark Millard marklmi at yahoo.com
Re: main [and, likely, stable/14]: do not set vfs.zfs.bclone_enabled=1 with that zpool feature enabled because it still leads to panics
On Sep 8, 2023, at 15:30, Martin Matuska wrote: > I can confirm that the patch fixes the panic caused by the provided script on > my test systems. > Mark, would it be possible to try poudriere on your system with a patched > kernel? . . . On 9. 9. 2023 0:09, Alexander Motin wrote: > On 08.09.2023 09:52, Martin Matuska wrote: >> . . . > > Thank you, Martin. I was able to reproduce the issue with your script and > found the cause. > > I first though the issue is triggered by the `cp`, but it appeared to be > triggered by `cat`. It also got copy_file_range() support, but later than > `cp`. That is probably why it slipped through testing. This patch fixes it > for me: https://github.com/openzfs/zfs/pull/15251 . > > Mark, could you please try the patch? If all goes well, this will end up reporting that the poudriere bulk -a is still running but has gotten past, say, 320+ port->package builds finished (so: more than double observed so far for the panic context). Later would be a report with a larger figure. A normal run I might let go for 6000+ ports and 10 hr or so. Notes as I go . . . Patch applied, built, and installed to the test media. Also, booted: # uname -apKU FreeBSD amd64-ZFS 15.0-CURRENT FreeBSD 15.0-CURRENT amd64 150 #75 main-n265228-c9315099f69e-dirty: Thu Sep 7 13:28:47 PDT 2023 root@amd64-ZFS:/usr/obj/BUILDs/main-amd64-dbg-clang/usr/main-src/amd64.amd64/sys/GENERIC-DBG amd64 amd64 150 150 Note that this is with a debug kernel (-dbg- in path and -DBG in the GENERIC* name). Also, the vintage of what it is based on has: git: 969071be938c - main - vfs: copy_file_range() between multiple mountpoints of the same fs type The usual sort of sequencing previously reported to get to this point. Media update starts with the rewind to the checkpoint in hopes of avoiding oddities from the later failure. . . . : [main-amd64-bulk_a-default] [2023-09-08_16h31m51s] [parallel_build:] Queued: 34588 Built: 414 Failed: 0 Skipped: 39 Ignored: 335 Fetched: 0 Tobuild: 33800 Time: 00:30:41 So 414 built and still building. More later. (It may be a while.) === Mark Millard marklmi at yahoo.com
Re: main [and, likely, stable/14]: do not set vfs.zfs.bclone_enabled=1 with that zpool feature enabled because it still leads to panics
On Sep 8, 2023, at 17:03, Mark Millard wrote: > On Sep 8, 2023, at 15:30, Martin Matuska wrote: > >> I can confirm that the patch fixes the panic caused by the provided script >> on my test systems. >> Mark, would it be possible to try poudriere on your system with a patched >> kernel? > > . . . > > On 9. 9. 2023 0:09, Alexander Motin wrote: >> On 08.09.2023 09:52, Martin Matuska wrote: >>> . . . >> >> Thank you, Martin. I was able to reproduce the issue with your script and >> found the cause. >> >> I first though the issue is triggered by the `cp`, but it appeared to be >> triggered by `cat`. It also got copy_file_range() support, but later than >> `cp`. That is probably why it slipped through testing. This patch fixes it >> for me: https://github.com/openzfs/zfs/pull/15251 . >> >> Mark, could you please try the patch? > > If all goes well, this will end up reporting that the > poudriere bulk -a is still running but has gotten past, > say, 320+ port->package builds finished (so: more than > double observed so far for the panic context). Later > would be a report with a larger figure. A normal run > I might let go for 6000+ ports and 10 hr or so. > > Notes as I go . . . > > Patch applied, built, and installed to the test media. > Also, booted: > > # uname -apKU > FreeBSD amd64-ZFS 15.0-CURRENT FreeBSD 15.0-CURRENT amd64 150 #75 > main-n265228-c9315099f69e-dirty: Thu Sep 7 13:28:47 PDT 2023 > root@amd64-ZFS:/usr/obj/BUILDs/main-amd64-dbg-clang/usr/main-src/amd64.amd64/sys/GENERIC-DBG > amd64 amd64 150 150 > > Note that this is with a debug kernel (-dbg- in path and -DBG in > the GENERIC* name). Also, the vintage of what it is based on has: > > git: 969071be938c - main - vfs: copy_file_range() between multiple > mountpoints of the same fs type > > The usual sort of sequencing previously reported to get to this > point. Media update starts with the rewind to the checkpoint in > hopes of avoiding oddities from the later failure. > > . . . : > > [main-amd64-bulk_a-default] [2023-09-08_16h31m51s] [parallel_build:] Queued: > 34588 Built: 414 Failed: 0 Skipped: 39Ignored: 335 Fetched: 0 > Tobuild: 33800 Time: 00:30:41 > > > So 414 and and still building. > > More later. (It may be a while.) > [main-amd64-bulk_a-default] [2023-09-08_16h31m51s] [parallel_build:] Queued: 34588 Built: 2013 Failed: 2 Skipped: 179 Ignored: 335 Fetched: 0 Tobuild: 32059 Time: 01:42:47 and still going. (FYI: The failures are expected.) After a while I might stop it and start over with a non-debug kernel installed instead. === Mark Millard marklmi at yahoo.com
Re: main [and, likely, stable/14]: do not set vfs.zfs.bclone_enabled=1 with that zpool feature enabled because it still leads to panics
On Sep 8, 2023, at 18:19, Mark Millard wrote: > On Sep 8, 2023, at 17:03, Mark Millard wrote: > >> On Sep 8, 2023, at 15:30, Martin Matuska wrote: >> >>> I can confirm that the patch fixes the panic caused by the provided script >>> on my test systems. >>> Mark, would it be possible to try poudriere on your system with a patched >>> kernel? >> >> . . . >> >> On 9. 9. 2023 0:09, Alexander Motin wrote: >>> On 08.09.2023 09:52, Martin Matuska wrote: >>>> . . . >>> >>> Thank you, Martin. I was able to reproduce the issue with your script and >>> found the cause. >>> >>> I first though the issue is triggered by the `cp`, but it appeared to be >>> triggered by `cat`. It also got copy_file_range() support, but later than >>> `cp`. That is probably why it slipped through testing. This patch fixes >>> it for me: https://github.com/openzfs/zfs/pull/15251 . >>> >>> Mark, could you please try the patch? >> >> If all goes well, this will end up reporting that the >> poudriere bulk -a is still running but has gotten past, >> say, 320+ port->package builds finished (so: more than >> double observed so far for the panic context). Later >> would be a report with a larger figure. A normal run >> I might let go for 6000+ ports and 10 hr or so. >> >> Notes as I go . . . >> >> Patch applied, built, and installed to the test media. >> Also, booted: >> >> # uname -apKU >> FreeBSD amd64-ZFS 15.0-CURRENT FreeBSD 15.0-CURRENT amd64 150 #75 >> main-n265228-c9315099f69e-dirty: Thu Sep 7 13:28:47 PDT 2023 >> root@amd64-ZFS:/usr/obj/BUILDs/main-amd64-dbg-clang/usr/main-src/amd64.amd64/sys/GENERIC-DBG >> amd64 amd64 150 150 >> >> Note that this is with a debug kernel (-dbg- in path and -DBG in >> the GENERIC* name). Also, the vintage of what it is based on has: >> >> git: 969071be938c - main - vfs: copy_file_range() between multiple >> mountpoints of the same fs type >> >> The usual sort of sequencing previously reported to get to this >> point. Media update starts with the rewind to the checkpoint in >> hopes of avoiding oddities from the later failure. >> >> . . . : >> >> [main-amd64-bulk_a-default] [2023-09-08_16h31m51s] [parallel_build:] Queued: >> 34588 Built: 414 Failed: 0 Skipped: 39Ignored: 335 Fetched: 0 >> Tobuild: 33800 Time: 00:30:41 >> >> >> So 414 and and still building. >> >> More later. (It may be a while.) >> > > [main-amd64-bulk_a-default] [2023-09-08_16h31m51s] [parallel_build:] Queued: > 34588 Built: 2013 Failed: 2 Skipped: 179 Ignored: 335 Fetched: 0 > Tobuild: 32059 Time: 01:42:47 > > and still going. (FYI: The failures are expected.) > > After a while I might stop it and start over with a non-debug > kernel installed instead. I did ^C after 2.5 hr (with 2447 built): ^C[02:30:05] Error: Signal SIGINT caught, cleaning up and exiting [main-amd64-bulk_a-default] [2023-09-08_16h31m51s] [sigint:] Queued: 34588 Built: 2447 Failed: 5 Skipped: 226 Ignored: 335 Fetched: 0 Tobuild: 31575 Time: 02:29:59 [02:30:05] Logs: /usr/local/poudriere/data/logs/bulk/main-amd64-bulk_a-default/2023-09-08_16h31m51s [02:30:05] Cleaning up [02:38:04] Unmounting file systems Exiting with status 1 I'll switch it over to a non-debug kernel and, probably, world and setup/run another test. . . . (time goes by) . . . Hmm. This did not get sent when I wrote the above. FYI, non-debug test status: [main-amd64-bulk_a-default] [2023-09-08_19h51m52s] [parallel_build:] Queued: 34588 Built: 2547 Failed: 5 Skipped: 239 Ignored: 335 Fetched: 0 Tobuild: 31462 Time: 01:59:58 I may let it run overnight. === Mark Millard marklmi at yahoo.com
Re: main [and, likely, stable/14]: do not set vfs.zfs.bclone_enabled=1 with that zpool feature enabled because it still leads to panics
On Sep 8, 2023, at 21:54, Mark Millard wrote: > On Sep 8, 2023, at 18:19, Mark Millard wrote: > >> On Sep 8, 2023, at 17:03, Mark Millard wrote: >> >>> On Sep 8, 2023, at 15:30, Martin Matuska wrote: >>> >>>> I can confirm that the patch fixes the panic caused by the provided script >>>> on my test systems. >>>> Mark, would it be possible to try poudriere on your system with a patched >>>> kernel? >>> >>> . . . >>> >>> On 9. 9. 2023 0:09, Alexander Motin wrote: >>>> On 08.09.2023 09:52, Martin Matuska wrote: >>>>> . . . >>>> >>>> Thank you, Martin. I was able to reproduce the issue with your script and >>>> found the cause. >>>> >>>> I first though the issue is triggered by the `cp`, but it appeared to be >>>> triggered by `cat`. It also got copy_file_range() support, but later than >>>> `cp`. That is probably why it slipped through testing. This patch fixes >>>> it for me: https://github.com/openzfs/zfs/pull/15251 . >>>> >>>> Mark, could you please try the patch? >>> >>> If all goes well, this will end up reporting that the >>> poudriere bulk -a is still running but has gotten past, >>> say, 320+ port->package builds finished (so: more than >>> double observed so far for the panic context). Later >>> would be a report with a larger figure. A normal run >>> I might let go for 6000+ ports and 10 hr or so. >>> >>> Notes as I go . . . >>> >>> Patch applied, built, and installed to the test media. >>> Also, booted: >>> >>> # uname -apKU >>> FreeBSD amd64-ZFS 15.0-CURRENT FreeBSD 15.0-CURRENT amd64 150 #75 >>> main-n265228-c9315099f69e-dirty: Thu Sep 7 13:28:47 PDT 2023 >>> root@amd64-ZFS:/usr/obj/BUILDs/main-amd64-dbg-clang/usr/main-src/amd64.amd64/sys/GENERIC-DBG >>> amd64 amd64 150 150 >>> >>> Note that this is with a debug kernel (-dbg- in path and -DBG in >>> the GENERIC* name). Also, the vintage of what it is based on has: >>> >>> git: 969071be938c - main - vfs: copy_file_range() between multiple >>> mountpoints of the same fs type >>> >>> The usual sort of sequencing previously reported to get to this >>> point. Media update starts with the rewind to the checkpoint in >>> hopes of avoiding oddities from the later failure. >>> >>> . . . : >>> >>> [main-amd64-bulk_a-default] [2023-09-08_16h31m51s] [parallel_build:] >>> Queued: 34588 Built: 414 Failed: 0 Skipped: 39Ignored: 335 >>> Fetched: 0 Tobuild: 33800 Time: 00:30:41 >>> >>> >>> So 414 and and still building. >>> >>> More later. (It may be a while.) >>> >> >> [main-amd64-bulk_a-default] [2023-09-08_16h31m51s] [parallel_build:] Queued: >> 34588 Built: 2013 Failed: 2 Skipped: 179 Ignored: 335 Fetched: 0 >> Tobuild: 32059 Time: 01:42:47 >> >> and still going. (FYI: The failures are expected.) >> >> After a while I might stop it and start over with a non-debug >> kernel installed instead. > > I did ^C after 2.5 hr (with 2447 built): > > ^C[02:30:05] Error: Signal SIGINT caught, cleaning up and exiting > [main-amd64-bulk_a-default] [2023-09-08_16h31m51s] [sigint:] Queued: 34588 > Built: 2447 Failed: 5 Skipped: 226 Ignored: 335 Fetched: 0 > Tobuild: 31575 Time: 02:29:59 > [02:30:05] Logs: > /usr/local/poudriere/data/logs/bulk/main-amd64-bulk_a-default/2023-09-08_16h31m51s > [02:30:05] Cleaning up > [02:38:04] Unmounting file systems > Exiting with status 1 > > I'll switch it over to a non-debug kernel and, probably, world > and setup/run another test. > > . . . (time goes by) . . . > > Hmm. This did not get sent when I wrote the above. 
FYI, non-debug > test status: > > [main-amd64-bulk_a-default] [2023-09-08_19h51m52s] [parallel_build:] Queued: > 34588 Built: 2547 Failed: 5 Skipped: 239 Ignored: 335 Fetched: 0 > Tobuild: 31462 Time: 01:59:58 > > I may let it run overnight. I finally stopped it at 7473 built (a little over 13 hrs elapsed): ^C[13:08:30] Error: Signal SIGINT caught, cleaning up and exiting [main-amd64-bulk_a-default] [2023-09-08_19h51m52s] [sigint:] Queued: 34588 Built: 7473 Failed: 23Skipped: 798 Ignored: 335 Fetched: 0 Tobuild: 25959 Time: 13:08:26 [13:08:30] Logs: /usr/local/poudriere/data/logs/bulk/main-amd64-bulk_a-default/2023-09-08
Looks like the kyua zfs tests likely are not used on aarch64 or other contexts with unsigned char
kyua tests that use the: /usr/tests/sys/cddl/zfs/bin/mkfile program like so (for example): mkfile 500M /testpool.1861/bigfile.0 (which should be valid) end up with mkfile instead reporting:

Standard error:
Usage: mkfile [-nv] [e|p|t|g|m|k|b] ...

which prevents the kyua test involved from working. Turns out this is from expecting char to always be signed (so a -1 vs. 255 distinction, here in an aarch64 context):

. . .
(gdb) list
179 /* Options. */
180 while ((ch = getopt(argc, argv, "nv")) != -1) {
181 switch (ch) {
182 case 'n':
183 nofill = 1;
184 break;
185 case 'v':
(gdb) print ch
$16 = 255 '\377'
(gdb) print/x -1
$17 = 0xffffffff
(gdb) print/x ch
$18 = 0xff
. . .

With the mix of unsigned and signed it ends up being a 0xffu != 0xffffffffu comparison, which is always true. So the switch is reached as if a "-" prefix was present (when there is none). Then the "option" is classified as invalid and the usage message is produced.

Apparently no one had noticed. That, in turn, suggests a lack of inspected testing on aarch64, powerpc64, powerpc64le, armv7, powerpc, and powerpcspe. That, in turn, suggests that kyua test inspection for the likes of aarch64 is not historically a part of the release process for openzfs or for operating systems that include openzfs.

=== Mark Millard marklmi at yahoo.com
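For reference, the usual fix is simply to give getopt(3)'s return value an int to land in. A minimal sketch of the idiom (not mkfile.c itself; nofill and verbose here are just stand-ins for its real option handling) that behaves the same whether plain char is signed or unsigned:

#include <stdio.h>
#include <unistd.h>

int
main(int argc, char *argv[])
{
	int ch;		/* int, not char: getopt(3) returns an int and -1 at the end */
	int nofill = 0;
	int verbose = 0;

	while ((ch = getopt(argc, argv, "nv")) != -1) {
		switch (ch) {
		case 'n':
			nofill = 1;
			break;
		case 'v':
			verbose = 1;
			break;
		default:
			fprintf(stderr, "usage: example [-nv]\n");
			return (1);
		}
	}
	printf("nofill=%d verbose=%d\n", nofill, verbose);
	return (0);
}

With char ch on a signed-char target the truncation happens to be harmless; on an unsigned-char target the -1 from getopt() becomes 255, the != -1 test never goes false, and the switch falls into its default/usage handling, which is exactly the misbehavior described above.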
Re: Looks like the kyua zfs tests likely are not used on aarch64 or other contexts with unsigned char
On Sep 10, 2023, at 05:58, Mike Karels wrote: > On 10 Sep 2023, at 2:31, Mark Millard wrote: > >> kyua tests that use the: >> >> /usr/tests/sys/cddl/zfs/bin/mkfile >> >> program like so (for example): >> >> mkfile 500M /testpool.1861/bigfile.0 >> >> (which should be valid) end up with mkfile >> instead reporting: >> >> Standard error: >> Usage: mkfile [-nv] [e|p|t|g|m|k|b] ... >> >> which prevent the kyua test involved from working. >> >> Turns out this is from expecting char to be always >> signed (so a -1 vs. 255 distinction, here in an >> aarch64 context): >> >> . . . >> (gdb) list >> 179 /* Options. */ >> 180 while ((ch = getopt(argc, argv, "nv")) != -1) { >> 181 switch (ch) { >> 182 case 'n': >> 183 nofill = 1; >> 184 break; >> 185 case 'v': >> (gdb) print ch >> $16 = 255 '\377' >> (gdb) print/x -1 >> $17 = 0x >> (gdb) print/x ch >> $18 = 0xff >> . . . >> >> With the mix of unsigned and signed it ends up >> being a 0xffu != 0xu test, which is >> always true. > > mkfile is broken. getopt returns an int, and -1 on end. > It never returns 0xff. But mkfile declares ch as char, > which truncates the return value -1. ch is a bad (misleading) > variable name, although getopt(3) uses it as well (but declared > as int). Yep: for char being signed, the code is still wrong via the char ch use. But the observed behavior is very different than for char being used but being unsigned. In this context, consequences of the unsigned char behavioral results are observable in the kyua run results but went unnoticed. I used to run into examples of the use of unsigned char for holding the getopt result back in my powerpc days as well and dealt with upstreams for a port or 2 for getting it fixed after finding such was the source of odd behavior I'd observed. If I remember right, this is the first example of running into the specific issue in my aarch64 and armv7 time frame. > Mike > >> So the switch is reached as if a "-" prefix was >> present (that is not). Then the "option" is classified >> as invalid and the usage message is produced. >> >> Apparently no one had noticed. That, in turn, suggests a >> lack of inspected testing on aarch64, powerpc64, >> powerpc64le, armv7, powerpc, and powerpcspe. That, in >> turn, suggests that kyua test inspection for the likes >> of aarch64 is not historically a part of the release >> process for openzfs or for operating systems that include >> openzfs. > === Mark Millard marklmi at yahoo.com
Re: Looks like the kyua zfs tests likely are not used on aarch64 or other contexts with unsigned char
On Sep 10, 2023, at 00:31, Mark Millard wrote: > kyua tests that use the: > > /usr/tests/sys/cddl/zfs/bin/mkfile > > program like so (for example): > > mkfile 500M /testpool.1861/bigfile.0 > > (which should be valid) end up with mkfile > instead reporting: > > Standard error: > Usage: mkfile [-nv] [e|p|t|g|m|k|b] ... > > which prevent the kyua test involved from working. > > Turns out this is from expecting char to be always > signed (so a -1 vs. 255 distinction, here in an > aarch64 context): > > . . . > (gdb) list > 179 /* Options. */ > 180 while ((ch = getopt(argc, argv, "nv")) != -1) { > 181 switch (ch) { > 182 case 'n': > 183 nofill = 1; > 184 break; > 185 case 'v': > (gdb) print ch > $16 = 255 '\377' > (gdb) print/x -1 > $17 = 0x > (gdb) print/x ch > $18 = 0xff > . . . > > With the mix of unsigned and signed it ends up > being a 0xffu != 0xu test, which is > always true. > > So the switch is reached as if a "-" prefix was > present (that is not). Then the "option" is classified > as invalid and the usage message is produced. > > Apparently no one had noticed. That, in turn, suggests a > lack of inspected testing on aarch64, powerpc64, > powerpc64le, armv7, powerpc, and powerpcspe. That, in > turn, suggests that kyua test inspection for the likes > of aarch64 is not historically a part of the release > process for openzfs or for operating systems that include > openzfs. > Looks like the mkfile.c traces back to a former port sysutils/mkfile that was unfetchable as of 2019. And, looking around, it seems the kyua zfs tests may be a FreeBSD only thing, not adopted in openzfs. So various implicit assumptions when I wrote the note do not actually hold. FreeBSD would have to do additional testing via kyua, beyond what openzfs does for testing, to discover the unsigned char related mis-behavior in the mkfile that FreeBSD's kyua tests use. Only FreeBSD variants are likely to have a similar status, not general openzfs including operating systems. === Mark Millard marklmi at yahoo.com
Re: Looks like the kyua zfs tests likely are not used on aarch64 or other contexts with unsigned char
On Sep 10, 2023, at 11:21, Warner Losh wrote: > On Sun, Sep 10, 2023, 11:10 AM Mark Millard wrote: >> On Sep 10, 2023, at 00:31, Mark Millard wrote: >> >> > kyua tests that use the: >> > >> > /usr/tests/sys/cddl/zfs/bin/mkfile >> > >> > program like so (for example): >> > >> > mkfile 500M /testpool.1861/bigfile.0 >> > >> > (which should be valid) end up with mkfile >> > instead reporting: >> > >> > Standard error: >> > Usage: mkfile [-nv] [e|p|t|g|m|k|b] ... >> > >> > which prevent the kyua test involved from working. >> > >> > Turns out this is from expecting char to be always >> > signed (so a -1 vs. 255 distinction, here in an >> > aarch64 context): >> > >> > . . . >> > (gdb) list >> > 179 /* Options. */ >> > 180 while ((ch = getopt(argc, argv, "nv")) != -1) { >> > 181 switch (ch) { >> > 182 case 'n': >> > 183 nofill = 1; >> > 184 break; >> > 185 case 'v': >> > (gdb) print ch >> > $16 = 255 '\377' >> > (gdb) print/x -1 >> > $17 = 0x >> > (gdb) print/x ch >> > $18 = 0xff >> > . . . >> > >> > With the mix of unsigned and signed it ends up >> > being a 0xffu != 0xu test, which is >> > always true. >> > >> > So the switch is reached as if a "-" prefix was >> > present (that is not). Then the "option" is classified >> > as invalid and the usage message is produced. >> > >> > Apparently no one had noticed. That, in turn, suggests a >> > lack of inspected testing on aarch64, powerpc64, >> > powerpc64le, armv7, powerpc, and powerpcspe. That, in >> > turn, suggests that kyua test inspection for the likes >> > of aarch64 is not historically a part of the release >> > process for openzfs or for operating systems that include >> > openzfs. >> > >> >> Looks like the mkfile.c traces back to a former port >> sysutils/mkfile that was unfetchable as of 2019. And, >> looking around, it seems the kyua zfs tests may be a >> FreeBSD only thing, not adopted in openzfs. >> >> So various implicit assumptions when I wrote the note >> do not actually hold. >> >> FreeBSD would have to do additional testing via kyua, >> beyond what openzfs does for testing, to discover the >> unsigned char related mis-behavior in the mkfile that >> FreeBSD's kyua tests use. Only FreeBSD variants are >> likely to have a similar status, not general openzfs >> including operating systems. > > I wonder how hard ot would be to look for the char = getopt() pattern with > coccinelle > Unsure. But to be sure that the implication that I was also trying to point out is not lost: kyua testing of zfs (and more?) for aarch64 (tier 1) is apparently not being done (or at least the results are not being inspected). Similarly for armv7 and all the powerpc*'s (not tier 1's, however, so not as surprising). Side note: Via other exchanges that have been going on I learned to look in the likes of: https://ci.freebsd.org/job/FreeBSD-main-amd64-testvm/*/consoleText for what to "pkg install" for kyua test runs to use for normal runs (at least the subset compatible with architecture being tested). I'd only figured out a (large) subset previously for aarch64 and armv7. I'm not aware of there being other documentation for what is appropriate for setting up such for kyua runs. === Mark Millard marklmi at yahoo.com
Re: Looks like the kyua zfs tests likely are not used on aarch64 or other contexts with unsigned char
On Sep 10, 2023, at 23:57, Dag-Erling Smørgrav wrote: > Mark Millard writes: >> I'm not aware of there being other documentation for what >> is appropriate for setting up such for kyua runs. > > https://github.com/freebsd/freebsd-ci/blob/master/scripts/build/build-test_image-head.sh#L69-L84 > Thanks for the reference that does not involve looking at CI log files. Filed away for future references. Side note . . . Turns out that tcptestsuite does not build for aarch64 do to alignment problems via packing in net/packetdrill : In file included from run_packet.c:45: In file included from ./tcp_options_iterator.h:31: ./tcp_options.h:108:2: error: field within 'struct tcp_option' is less aligned than 'union tcp_option::(anonymous at ./tcp_options.h:108:2)' and is usually due to 'struct tcp_option' being packed, which can lead to unaligned accesses [-Werror,-Wunaligned-access] union { ^ --- sctp_iterator.o --- cc -O2 -pipe -mcpu=cortex-a7 -Wno-deprecated -g -fstack-protector-strong -fno-strict-aliasing -mcpu=cortex-a7 -Wall -Werror -g -c sctp_iterator.c -o sctp_iterator.o --- tcp_options.o --- cc -O2 -pipe -mcpu=cortex-a7 -Wno-deprecated -g -fstack-protector-strong -fno-strict-aliasing -mcpu=cortex-a7 -Wall -Werror -g -c tcp_options.c -o tcp_options.o --- run_packet.o --- 1 error generated. *** [run_packet.o] Error code 1 make[1]: stopped in /wrkdirs/usr/ports/net/packetdrill/work/packetdrill-aebdc35/gtests/net/packetdrill --- tcp_options.o --- In file included from tcp_options.c:25: ./tcp_options.h:108:2: error: field within 'struct tcp_option' is less aligned than 'union tcp_option::(anonymous at ./tcp_options.h:108:2)' and is usually due to 'struct tcp_option' being packed, which can lead to unaligned accesses [-Werror,-Wunaligned-access] union { ^ 1 error generated. *** [tcp_options.o] Error code 1 make[1]: stopped in /wrkdirs/usr/ports/net/packetdrill/work/packetdrill-aebdc35/gtests/net/packetdrill 2 errors === Mark Millard marklmi at yahoo.com
Re: Looks like the kyua zfs tests likely are not used on aarch64 or other contexts with unsigned char
On Sep 11, 2023, at 00:03, Mark Millard wrote: > On Sep 10, 2023, at 23:57, Dag-Erling Smørgrav wrote: > >> Mark Millard writes: >>> I'm not aware of there being other documentation for what >>> is appropriate for setting up such for kyua runs. >> >> https://github.com/freebsd/freebsd-ci/blob/master/scripts/build/build-test_image-head.sh#L69-L84 >> > > Thanks for the reference that does not involve looking at > CI log files. Filed away for future references. > > > Side note . . . > > Turns out that tcptestsuite does not build for aarch64 > do to alignment problems via packing in net/packetdrill : > > In file included from run_packet.c:45: > In file included from ./tcp_options_iterator.h:31: > ./tcp_options.h:108:2: error: field within 'struct tcp_option' is less > aligned than 'union tcp_option::(anonymous at ./tcp_options.h:108:2)' and is > usually due to 'struct tcp_option' being packed, which can lead to unaligned > accesses [-Werror,-Wunaligned-access] > union { > ^ > --- sctp_iterator.o --- > cc -O2 -pipe -mcpu=cortex-a7 Looks like I messed up and reported an armv7 context. aarch64 built net/packetdrill and net/tcptestsuite just fine. Sorry for the noise. > -Wno-deprecated -g -fstack-protector-strong -fno-strict-aliasing > -mcpu=cortex-a7 -Wall -Werror -g -c sctp_iterator.c -o sctp_iterator.o > --- tcp_options.o --- > cc -O2 -pipe -mcpu=cortex-a7 -Wno-deprecated -g -fstack-protector-strong > -fno-strict-aliasing -mcpu=cortex-a7 -Wall -Werror -g -c tcp_options.c -o > tcp_options.o > --- run_packet.o --- > 1 error generated. > *** [run_packet.o] Error code 1 > > make[1]: stopped in > /wrkdirs/usr/ports/net/packetdrill/work/packetdrill-aebdc35/gtests/net/packetdrill > --- tcp_options.o --- > In file included from tcp_options.c:25: > ./tcp_options.h:108:2: error: field within 'struct tcp_option' is less > aligned than 'union tcp_option::(anonymous at ./tcp_options.h:108:2)' and is > usually due to 'struct tcp_option' being packed, which can lead to unaligned > accesses [-Werror,-Wunaligned-access] > union { > ^ > 1 error generated. > *** [tcp_options.o] Error code 1 > > make[1]: stopped in > /wrkdirs/usr/ports/net/packetdrill/work/packetdrill-aebdc35/gtests/net/packetdrill > 2 errors > === Mark Millard marklmi at yahoo.com
sys/net/if_lagg_test:status_stress can lead to use-after-free in main (both before and after stable/14 was created), at least on aarch64
See https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=273081#c5 and the backtrace in the prior comment. The test context is aarch64. Kyle Evans provided a kgdb patch for devel/gdb for aarch64 that finally let me track this down to the level of detail of how to interpret the reported register values vs. the code that was using them. I will say that I've not managed to produce the crash with 14.0-BETA1. But I have produced the crash in my personal non-debug kernel builds and with the main snapshots dd'd to media, booted, and used. === Mark Millard marklmi at yahoo.com
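For anyone wanting to try to reproduce it on their own hardware: assuming the test suite is installed under /usr/tests, the specific case named in the subject can be run on its own with something like the following (a sketch; adjust to however you normally drive kyua):

# cd /usr/tests/sys/net
# kyua test if_lagg_test:status_stress
# kyua report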
FYI: RPi4B via ACPI style boot suggests "armv8crypto0: CPU lacks AES instructions" can lead to a hung up boot sequence in 14.0-BETA2
See: https://lists.freebsd.org/archives/freebsd-arm/2023-September/003071.html for details. (It is not my activity.) === Mark Millard marklmi at yahoo.com
Re: How to Boot FreeBSD Using pftf/RPi4 UEFI (I got: "panic: ram_attach: resource 5 failed to attach" from FreeBSD-14.0-BETA3)
[Mitchell H.: I think this has exposed a possibly general issue not specific to RPi*'s, despite the UEFI/ACPI booting of RPi*'s not being officially supported. See the "BOOT -V RELATED MATERIAL" section towards the end, skipping the earlier explorations.] On Sep 22, 2023, at 08:39, Mark Millard wrote: > On Sep 22, 2023, at 01:02, ykla wrote: > >> But who test FreeBSD-14.0-BETA2-arm64-aarch64-disc1.iso on UEFI on rpi4b? > > I might get to this this weekend or tonight (local time). > > But, as I do not normally deal with FreeBSD-14.0-*-arm64-aarch64-disc1.iso > for RPi4B's, could you list step by step instructions so that I'm sure to > test what you tested in reasonable detail? Please make the step-by-step > instructions be for having the serial console working. > > (My use of any FreeBSD-*.iso has been historically rare.) > > Most likely FreeBSD-14.0-BETA3-arm64-aarch64-disc1.iso will be available > by the time I get to this. So that is likely what I'd test. I'll also note that FreeBSD makes no claim to support pftf/RPi4 UEFI : official support is via the U-Boot port that is used for the aarch64 RPI specific images. I'll note that the RPi4B here is a 8 GiByte one, a modern "C0T" one that does not require the special bounce buffering that was used to avoid the wrapper logic error that limited some address ranges in "B0T" parts for specific types of activity. (But, bounce buffering should still work.) As for attempting to use pftf/RPi4 UEFI . . . (I've no clue how well this matches your procedure.) Prepare microsd card to have just pftf/RPi4 UEFI : # gpart show -p da3 => 63 62521281da3 MBR (30G) 63 40897 - free - (20M) 40960102400 da3s1 fat32lba (50M) 143360 62377984 - free - (30G) # mount -onoatime -tmsdosfs /dev/da3s1 /mnt # ls -Tloa /mnt/ total 9 drwxr-xr-x 1 root wheel - 16384 Dec 31 16:00:00 1979 . drwxr-xr-x 63 root wheel uarch70 Sep 21 10:15:27 2023 .. 
# tar -xpf RPi4_UEFI_Firmware_v1.35.zip -C /mnt/ RPI_EFI.fd: Can't set user=1001/group=123 for RPI_EFI.fd: Invalid argument bcm2711-rpi-4-b.dtb: Can't set user=1001/group=123 for bcm2711-rpi-4-b.dtb: Invalid argument bcm2711-rpi-400.dtb: Can't set user=1001/group=123 for bcm2711-rpi-400.dtb: Invalid argument bcm2711-rpi-cm4.dtb: Can't set user=1001/group=123 for bcm2711-rpi-cm4.dtb: Invalid argument config.txt: Can't set user=1001/group=123 for config.txt: Invalid argument fixup4.dat: Can't set user=1001/group=123 for fixup4.dat: Invalid argument start4.elf: Can't set user=1001/group=123 for start4.elf: Invalid argument overlays/: Can't set user=1001/group=123 for overlays: Invalid argument overlays/upstream-pi4.dtbo: Can't set user=1001/group=123 for overlays/upstream-pi4.dtbo: Invalid argument overlays/miniuart-bt.dtbo: Can't set user=1001/group=123 for overlays/miniuart-bt.dtbo: Invalid argument Readme.md: Can't set user=1001/group=123 for Readme.md: Invalid argument firmware/: Can't set user=1001/group=123 for firmware: Invalid argument firmware/Readme.txt: Can't set user=1001/group=123 for firmware/Readme.txt: Invalid argument firmware/brcm/: Can't set user=1001/group=123 for firmware/brcm: Invalid argument firmware/brcm/brcmfmac43455-sdio.txt: Can't set user=1001/group=123 for firmware/brcm/brcmfmac43455-sdio.txt: Invalid argument firmware/brcm/brcmfmac43455-sdio.clm_blob: Can't set user=1001/group=123 for firmware/brcm/brcmfmac43455-sdio.clm_blob: Invalid argument firmware/brcm/brcmfmac43455-sdio.bin: Can't set user=1001/group=123 for firmware/brcm/brcmfmac43455-sdio.bin: Invalid argument firmware/brcm/brcmfmac43455-sdio.Raspberry: Can't set user=1001/group=123 for firmware/brcm/brcmfmac43455-sdio.Raspberry: Invalid argument firmware/LICENCE.txt: Can't set user=1001/group=123 for firmware/LICENCE.txt: Invalid argument tar: Error exit delayed from previous errors. # find -s /mnt/ -print /mnt/ /mnt/RPI_EFI.fd /mnt/Readme.md /mnt/bcm2711-rpi-4-b.dtb /mnt/bcm2711-rpi-400.dtb /mnt/bcm2711-rpi-cm4.dtb /mnt/config.txt /mnt/firmware /mnt/firmware/LICENCE.txt /mnt/firmware/Readme.txt /mnt/firmware/brcm /mnt/firmware/brcm/brcmfmac43455-sdio.Raspberry /mnt/firmware/brcm/brcmfmac43455-sdio.bin /mnt/firmware/brcm/brcmfmac43455-sdio.clm_blob /mnt/firmware/brcm/brcmfmac43455-sdio.txt /mnt/fixup4.dat /mnt/overlays /mnt/overlays/miniuart-bt.dtbo /mnt/overlays/upstream-pi4.dtbo /mnt/start4.elf # umount /mnt/ Prepare separate USB3 media to hold the *.iso content: # dd if=FreeBSD-14.0-BETA3-arm64-aarch64-disc1.iso of=/dev/da0 bs=1m conv=fsync,sync status=progress 855638016 bytes (856 MB, 816 MiB) transferred 7.097s, 121 MB/s 933+0 records in 933+0 records out 978321408 bytes transferred in 7.956494 secs (122958854 bytes/sec) Note: the efi part
RE: nvd->nda switch and blocksize changes for ZFS
Frank Behrens wrote on Date: Sat, 23 Sep 2023 16:31:40 UTC :

> I created a zpool with a FreeBSD-14.0-CURRENT on February. With
> 15.0-CURRENT/14.0-STABLE from now I get the message:
>
> status: One or more devices are configured to use a non-native block size.
> Expect reduced performance.
> action: Replace affected devices with devices that support the
> configured block size, or migrate data to a properly configured pool.
>
> NAME         STATE   READ WRITE CKSUM
> zsys         ONLINE     0     0     0
>   raidz1-0   ONLINE     0     0     0
>     nda0p4   ONLINE     0     0     0  block size: 4096B configured, 16384B native
>     nda1p4   ONLINE     0     0     0  block size: 4096B configured, 16384B native
>     nda2p4   ONLINE     0     0     0  block size: 4096B configured, 16384B native
>
> I use:
> nda0:
> nda0: nvme version 1.4
> nda0: 953869MB (1953525168 512 byte sectors)
>
> I cannot imagine, that the native blocksize changed. Do I really expect
> a reduced performance?
> Is it advisable to switch back to nvd?

Looking at: https://www.techpowerup.com/ssd-specs/samsung-980-1-tb.d58 it reports (in different places on the page):

QUOTE
Page Size: 16 KB
Notes
NAND Die: A Dual-plane Die with 2 sub-planes with 8 KiB pages in order to improve performance through paralellism.
Endurance: Could be from 1.500 to 3.000 P.E.C. depending on NAND binning
END QUOTE

That "A Dual-plane Die with 2 sub-planes with 8 KiB pages", for a total of 16 KB, does suggest to me that the new messages have a chance of being correct about there being a tradeoff. (But I'm no expert in the area.)

=== Mark Millard marklmi at yahoo.com
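For anyone wanting to see what their own pool and vdevs ended up with, a sketch of commands I would expect to work on a 14.x/15.x system (the pool name zsys below is just taken from the quoted output; substitute your own):

# sysctl vfs.zfs.min_auto_ashift
# zpool get ashift zsys
# zdb -C zsys | grep ashift

The first is the minimum ashift ZFS will auto-select for newly added vdevs, the second is the pool-wide property (0 meaning auto-selection was used at vdev creation), and the zdb output shows the ashift actually recorded per vdev.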
15 & 14: ram_attach vs. its using regions_to_avail vs. "bus_alloc_resource" can lead to: panic("ram_attach: resource %d failed to attach", rid)
ram_attach is based on regions_to_avail but that is a problem for its later bus_alloc_resource use --and that can lead to: panic("ram_attach: resource %d failed to attach", rid); Unfortunately, the known example is use of EDK2 on RPi4B class systems, not what is considered the supported way. The panic happens for main [so: 15] and will happen once the cortex-a72 handling in 14.0-* is in a build fixed by: • git: 906bcc44641d - releng/14.0 - arm64: Fix errata workarounds that depend on smccc Andrew Turner The lack of the fix leads to an earlier panic as stands. sys/kern/subr_physmem.c 's regions_to_avail is based on ignoring phys_avail and using only hwregions and exregions. In other words, in part: * Initially dump_avail and phys_avail are identical. Boot time memory * allocations remove extents from phys_avail that may still be included * in dumps. This means that early, dedicated memory allocations are treated as available for general use by regions_to_avail . The distinction is visible in the boot -v output in that: real memory = 3138154496 (2992 MB) Physical memory chunk(s): 0x20 - 0x002b7f, 727711744 bytes (177664 pages) 0x002ce3a000 - 0x003385, 111304704 bytes (27174 pages) 0x00338c - 0x00338c6fff, 28672 bytes (7 pages) 0x0033a3 - 0x0036ef, 55377920 bytes (13520 pages) 0x00372e - 0x003b2f, 67239936 bytes (16416 pages) 0x004000 - 0x00bb3dcfff, 2067648512 bytes (504797 pages) avail memory = 3027378176 (2887 MB) does not list the wider: 0x004000 - 0x00bfff because of phys_avail . But the earlier dump based on hwregions and exregions shows: Physical memory chunk(s): 0x001d - 0x001e, 0 MB ( 32 pages) 0x0020 - 0x338c6fff, 822 MB ( 210631 pages) 0x3392 - 0x3b2f, 121 MB ( 31200 pages) 0x4000 - 0xbfff, 2048 MB ( 524288 pages) Excluded memory regions: 0x001d - 0x001e, 0 MB ( 32 pages) NoAlloc 0x2b80 - 0x2ce39fff,22 MB ( 5690 pages) NoAlloc 0x3386 - 0x338b, 0 MB ( 96 pages) NoAlloc 0x3392 - 0x33a2, 1 MB (272 pages) NoAlloc 0x36f0 - 0x372d, 3 MB (992 pages) NoAlloc which indicates: 0x4000 - 0xbfff is available as far as it is concerned. (Note some code works/displays in terms of: 0x4000 - 0xc000 instead.) For aarch64 , sys/arm64/arm64/nexus.c has a nexus_alloc_resource that is used as bus_alloc_resource . It ends up rejecting the RPi4B boot via using the result of the call in ram_attach: if (bus_alloc_resource(dev, SYS_RES_MEMORY, &rid, start, end, end - start, 0) == NULL) panic("ram_attach: resource %d failed to attach", rid); as shown by the just-prior start/end pair sequence messages: ram0: reserving memory region: 20-2b80 ram0: reserving memory region: 2ce3a000-3386 ram0: reserving memory region: 338c-338c7000 ram0: reserving memory region: 33a3-36f0 ram0: reserving memory region: 372e-3b30 ram0: reserving memory region: 4000-c000 panic: ram_attach: resource 5 failed to attach I do not see anything about this that looks inherently RPi* specific for possibly ending up with an analogous panic. So I expect the example is sufficient context to identify a problem is present, despite EDK2 use not being normal for RPi4B's and the like as far as FreeBSD is concerned. === Mark Millard marklmi at yahoo.com
RE: Base libc++ missing symbol
Joel Bodenmann wrote on Date: Mon, 02 Oct 2023 20:00:29 UTC : > It seems like I finally managed to hose a FreeBSD system. > The machine in question is my workstation at home. It has been running > stable/13 without any problems. Yesterday I've updated to > ef295f69abbffb3447771a30df6906ca56a5d0c0 and since then I'm getting an > undefined symbol on anything using Qt: > > ld-elf.so.1: /usr/local/lib/qt5/libQt5Widgets.so.5: Undefined symbol > "_ZTVNSt3__13pmr25monotonic_buffer_resourceE" > > Unless I'm missing something, it would seem like my base libc++ > is missing the pmr::monotonic_buffer_resource symbol. I do not have a 13.2 context, so you may want to run the analogous steps in your context for confirming/denying the below applies. # llvm-cxxfilt _ZTVNSt3__13pmr25monotonic_buffer_resourceE vtable for std::__1::pmr::monotonic_buffer_resource Using the example "Run this code" source from: https://en.cppreference.com/w/cpp/memory/monotonic_buffer_resource # c++ -std=c++17 -pedantic -O2 monotonic_buffer_resource.cpp # objdump -x a.out | grep _ZTVNSt3__13pmr25monotonic_buffer_resourceE 00204160 g O .bss.rel.ro 0038 _ZTVNSt3__13pmr25monotonic_buffer_resourceE # nm a.out | grep _ZTVNSt3__13pmr25monotonic_buffer_resourceE 00204160 B _ZTVNSt3__13pmr25monotonic_buffer_resourceE # ./a.out t1 (default std alloc): 0.491 sec; t1/t1: 1.000 t2 (default pmr alloc): 0.541 sec; t1/t2: 0.906 t3 (pmr alloc no buf): 0.188 sec; t1/t3: 2.616 t4 (pmr alloc and buf): 0.155 sec; t1/t4: 3.172 Note that the vtable is in the a.out instead of being from a library. It is global but is in the a.out .bss.rel.ro <http://bss.rel.ro/> in the example and is defined. > At first I thought I might have messed up on installworld but rolling > back to the previous boot environment and then performing the same > procedure again lead to the same outcome. If the above works similarly in your context, then I expect that the issue is on the qt5 or port side of things, not the system libraries/headers. As I understand, clang++ 16 is the first vintage with this directly supported, instead of being just in the experimental category/area for libc++. May be tracking that transition is at issue. For reference: # c++ -v FreeBSD clang version 16.0.6 (https://github.com/llvm/llvm-project.git llvmorg-16.0.6-0-g7cbf1a259152) Target: x86_64-unknown-freebsd15.0 Thread model: posix InstalledDir: /usr/bin # uname -apKU FreeBSD amd64-ZFS 15.0-CURRENT FreeBSD 15.0-CURRENT #124 main-n265447-e5236d25f2c0-dirty: Thu Sep 21 09:06:08 PDT 2023 root@amd64-ZFS:/usr/obj/BUILDs/main-amd64-nodbg-clang/usr/main-src/amd64.amd64/sys/GENERIC-NODBG amd64 amd64 151 151 > Any ideas or wild guesses? Anything obvious I'm missing here? > > uname -a > FreeBSD beefy02 13.2-STABLE FreeBSD 13.2-STABLE > stable/13-n256443-ef295f69abbf GENERIC amd64 > > freebsd-version -kru > 13.2-STABLE > 13.2-STABLE > 13.2-STABLE > > clang --version > FreeBSD clang version 16.0.6 > (https://github.com/llvm/llvm-project.git > llvmorg-16.0.6-0-g7cbf1a259152) Target: x86_64-unknown-freebsd13.2 > Thread model: posix > InstalledDir: /usr/bin === Mark Millard marklmi at yahoo.com
Re: Base libc++ missing symbol
On Oct 2, 2023, at 15:56, Mark Millard wrote: > Joel Bodenmann wrote on > Date: Mon, 02 Oct 2023 20:00:29 UTC : > >> It seems like I finally managed to hose a FreeBSD system. >> The machine in question is my workstation at home. It has been running >> stable/13 without any problems. Yesterday I've updated to >> ef295f69abbffb3447771a30df6906ca56a5d0c0 and since then I'm getting an >> undefined symbol on anything using Qt: >> >> ld-elf.so.1: /usr/local/lib/qt5/libQt5Widgets.so.5: Undefined symbol >> "_ZTVNSt3__13pmr25monotonic_buffer_resourceE" >> >> Unless I'm missing something, it would seem like my base libc++ >> is missing the pmr::monotonic_buffer_resource symbol. > > I do not have a 13.2 context, so you may want to run the > analogous steps in your context for confirming/denying > the below applies. > > # llvm-cxxfilt _ZTVNSt3__13pmr25monotonic_buffer_resourceE > vtable for std::__1::pmr::monotonic_buffer_resource > > Using the example "Run this code" source from: > > https://en.cppreference.com/w/cpp/memory/monotonic_buffer_resource > > # c++ -std=c++17 -pedantic -O2 monotonic_buffer_resource.cpp > > # objdump -x a.out | grep _ZTVNSt3__13pmr25monotonic_buffer_resourceE > 00204160 g O .bss.rel.ro 0038 > _ZTVNSt3__13pmr25monotonic_buffer_resourceE > > # nm a.out | grep _ZTVNSt3__13pmr25monotonic_buffer_resourceE > 00204160 B _ZTVNSt3__13pmr25monotonic_buffer_resourceE > > # ./a.out > t1 (default std alloc): 0.491 sec; t1/t1: 1.000 > t2 (default pmr alloc): 0.541 sec; t1/t2: 0.906 > t3 (pmr alloc no buf): 0.188 sec; t1/t3: 2.616 > t4 (pmr alloc and buf): 0.155 sec; t1/t4: 3.172 > > Note that the vtable is in the a.out instead of being from > a library. It is global but is in the a.out .bss.rel.ro <http://bss.rel.ro/> > in > the example and is defined. > >> At first I thought I might have messed up on installworld but rolling >> back to the previous boot environment and then performing the same >> procedure again lead to the same outcome. > > If the above works similarly in your context, then I expect > that the issue is on the qt5 or port side of things, not the > system libraries/headers. > > As I understand, clang++ 16 is the first vintage with this > directly supported, instead of being just in the experimental > category/area for libc++. May be tracking that transition is > at issue. > > For reference: > > # c++ -v > FreeBSD clang version 16.0.6 (https://github.com/llvm/llvm-project.git > llvmorg-16.0.6-0-g7cbf1a259152) > Target: x86_64-unknown-freebsd15.0 > Thread model: posix > InstalledDir: /usr/bin > > # uname -apKU > FreeBSD amd64-ZFS 15.0-CURRENT FreeBSD 15.0-CURRENT #124 > main-n265447-e5236d25f2c0-dirty: Thu Sep 21 09:06:08 PDT 2023 > root@amd64-ZFS:/usr/obj/BUILDs/main-amd64-nodbg-clang/usr/main-src/amd64.amd64/sys/GENERIC-NODBG > amd64 amd64 151 151 > >> Any ideas or wild guesses? Anything obvious I'm missing here? 
>> >> uname -a >> FreeBSD beefy02 13.2-STABLE FreeBSD 13.2-STABLE >> stable/13-n256443-ef295f69abbf GENERIC amd64 >> >> freebsd-version -kru >> 13.2-STABLE >> 13.2-STABLE >> 13.2-STABLE >> >> clang --version >> FreeBSD clang version 16.0.6 >> (https://github.com/llvm/llvm-project.git >> llvmorg-16.0.6-0-g7cbf1a259152) Target: x86_64-unknown-freebsd13.2 >> Thread model: posix >> InstalledDir: /usr/bin > Given Dimitry Andric's notes: # objdump -x /lib/libc++.so.1 | grep _ZTVNSt3__13pmr25monotonic_buffer_resourceE 001006d8 g O .data.rel.ro 0038 _ZTVNSt3__13pmr25monotonic_buffer_resourceE # nm /lib/libc++.so.1 | grep _ZTVNSt3__13pmr25monotonic_buffer_resourceE 001006d8 D _ZTVNSt3__13pmr25monotonic_buffer_resourceE So /lib/libc++.so.1 has a global symbol naming initialized data for this in my context. Reminder for the a.out: # objdump -x a.out | grep _ZTVNSt3__13pmr25monotonic_buffer_resourceE 00204160 g O .bss.rel.ro 0038 _ZTVNSt3__13pmr25monotonic_buffer_resourceE # nm a.out | grep _ZTVNSt3__13pmr25monotonic_buffer_resourceE 00204160 B _ZTVNSt3__13pmr25monotonic_buffer_resourceE My original thinking makes no sense for this. Sorry for the noise. The procedure of seeing if the a.out is produced without complaint might still be useful. === Mark Millard marklmi at yahoo.com
Re: Base libc++ missing symbol
wrote on Date: Sun, 08 Oct 2023 18:13:16 UTC : > > The procedure of seeing if the a.out is produced without complaint > > might still be useful. > > The program compiles & links fine, but then also fails to run: > > ld-elf.so.1: Undefined symbol "_ZTVNSt3__13pmr25monotonic_buffer_resourceE" > referenced from COPY relocation in /usr/home/jbo/junk/a.out Well, for stable/13 's recent snapshot, freshly dd'd to USB3 media, so an official build, not a personal one that might be odd in some way: # uname -apKU FreeBSD generic 13.2-STABLE FreeBSD 13.2-STABLE stable/13-n256505-2464d8c5e296 GENERIC arm64 aarch64 1302508 1302508 (So, after 2023-Oct-01's ef295f69abbf that you originally referenced: 2023-Oct-04's 2464d8c5e296.) # c++ -v FreeBSD clang version 16.0.6 (https://github.com/llvm/llvm-project.git llvmorg-16.0.6-0-g7cbf1a259152) Target: aarch64-unknown-freebsd13.2 Thread model: posix InstalledDir: /usr/bin # c++ -std=c++17 -pedantic -O2 monotonic_buffer_resource.cpp # ./a.out t1 (default std alloc): 1.827 sec; t1/t1: 1.000 t2 (default pmr alloc): 1.818 sec; t1/t2: 1.005 t3 (pmr alloc no buf): 0.920 sec; t1/t3: 1.986 t4 (pmr alloc and buf): 0.606 sec; t1/t4: 3.015 The example is from in an aarch64 context. It does not agree with your report. > I made no progress on this. So far I've never reproduced your problem or anything like it. (I prefer testing official builds for problem isolation. If only my personal builds fail, then it is likely my build's problem.) > I have reinstalled world twice (from different > commits) and I re-installed all packages multiple times (also from different > ports tree commits). I suggest trying the most recent stable/13 snapshot build at the time of the experiment. No packages are used/needed for the monotonic_buffer_resource.cpp test. This fits well with using a snapshot context for such a test. > Any other wild ideas on how to fix this? I've no evidence about your stable/13 build/install. But the official snapshot that I tried worked just fine. > None of my other machines have any > issues whatsoever running on the same or similar stable/13 commit and using > the same poudriere repository. That, with my results, tends to suggest you have one odd ball context that has a problematical FreeBSD build/install. Again, I've no evidence to work with relative to that build/install. > This is certainly not Qt5 related. I run into the exact same issue with > anything that uses Qt6. > Furthermore, the test program we built experiences > the same issue without any involvement of the Qt libraries. There was no problem for my testing of monotonic_buffer_resource.cpp via the recent official snapshot build of stable/13 . I've not tried to test Qt5 or Qt6, sticking with the simpler/smaller context that you also report as failing in the odd context. I suggest avoiding Qt5/Qt6 testing until you have a context with the monotonic_buffer_resource.cpp test working. === Mark Millard marklmi at yahoo.com
RE: git: d2025992ab68 - releng/14.0 - release: update releng/14.0 from BETA to RC
Glen Barber wrote on Date: Fri, 13 Oct 2023 00:00:10 UTC : > The branch releng/14.0 has been updated by gjb: > > URL: > https://cgit.FreeBSD.org/src/commit/?id=d2025992ab6852d2a9ace62006e3a3ffa067364b > > commit d2025992ab6852d2a9ace62006e3a3ffa067364b > Author: Glen Barber > AuthorDate: 2023-10-12 23:55:33 + > Commit: Glen Barber > CommitDate: 2023-10-12 23:55:33 + > > release: update releng/14.0 from BETA to RC I'll note that today's: https://github.com/openzfs/zfs/commit/2bba9fd479f5 is another openzfs data corruption fix, this time involving TRIMs vs. metaslab allocations. === Mark Millard marklmi at yahoo.com
RE: freebsd-update 12.3 to 14.0RC1 takes 12-24 hours (block cloning regression)
Kevin Bowling wrote on Date: Tue, 17 Oct 2023 16:40:37 UTC : > I have two systems with a zpool 2x2 mirror on 7.2k RPM disks. One > system also has a flash SLOG. > > The flash SLOG system took around 12 hours to complete freebsd-update > from 13.2 to 14.0-RC1. The system without the SLOG took nearly 24 > hours. This was the result of ~50k patches, and ~10k files from > freebsd-update and a very pathological 'install' command performance. > > 'ps auxww | grep install': > root 52225 0.0 0.0 12852 2504 0 D+ 20:55 0:00.00 > install -S -o 0 -g 0 -m 0644 > b6850914127c27fe192a41387f5cec04a1d927e6605ff09e8fd88dcd74fdec9d > ///usr/src/sys/netgraph/ng_vlan.h > root 68042 0.0 0.0 13580 3648 0 I+ 02:24 0:01.14 > /bin/sh /usr/sbin/freebsd-update install root 69946 > 0.0 0.0 13580 3632 0 S+ 02:24 0:15.65 /bin/sh > /usr/sbin/freebsd-update install > > 'control+t on freebsd-update': > > load: 0.16 cmd: install 97128 [tx->tx_sync_done_cv] 0.67r 0.00u 0.00s > 0% 2440k > mi_switch+0xc2 _cv_wait+0x113 txg_wait_synced_impl+0xb9 > txg_wait_synced+0xb dmu_offset_next+0x77 zfs_holey+0x137 zfs_fre > ebsd_ioctl+0x4f vn_generic_copy_file_range+0x64b > kern_copy_file_range+0x327 sys_copy_file_range+0x78 > amd64_syscall+0x10c > fast_syscall_common+0xf8 > > I spoke with mjg about this and because my pools do not have block > cloning enabled, copy_file_range turns into a massive pessimization in > 'install'. Block cloning is new. So the past is sort of like Block cloning not being enabled now. This leads me to wonder: prior to block cloning existing, what would analogous times have been like instead of 12 hrs and 24 hrs? (Not that analogous would be easy to identify in history or test now.) Depending on the results, my next question might have been: "What happened for block cloning being disabled now to make it worse than before block cloning existed?" > He suggested a workaround of 'sysctl > vfs.zfs.dmu_offset_next_sync=0' but we should probably sort this out > for 14.0-RELEASE. === Mark Millard marklmi at yahoo.com
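For anyone hitting the same pathological freebsd-update times before a real fix shows up, the mentioned workaround can be applied immediately:

# sysctl vfs.zfs.dmu_offset_next_sync=0

and, to keep it across reboots, the same assignment can be added as a line in /etc/sysctl.conf (a sketch of the usual mechanism, nothing specific to this issue):

vfs.zfs.dmu_offset_next_sync=0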
FYI: 13.2-STABLE stable/13-n256634-c4dfacd0b3c3 snapshot's send mail notices
For reasons of investigating a 13.2-STABLE related bugzilla report I'd dd'd and booted the snapshot that results in: # uname -apKU FreeBSD generic 13.2-STABLE FreeBSD 13.2-STABLE stable/13-n256634-c4dfacd0b3c3 GENERIC arm64 aarch64 1302508 1302508 I noticed the following messages on the console: Oct 27 03:01:00 generic sendmail[2521]: My unqualified host name (generic) unknown; sleeping for retry Oct 27 03:02:00 generic sendmail[2521]: unable to qualify my own domain name (generic) -- using short name Oct 27 03:02:01 generic sendmail[2628]: My unqualified host name (generic) unknown; sleeping for retry Oct 27 03:03:02 generic sendmail[2628]: unable to qualify my own domain name (generic) -- using short name Oct 27 03:03:02 generic sendmail[2633]: My unqualified host name (generic) unknown; sleeping for retry Oct 27 03:04:02 generic sendmail[2633]: unable to qualify my own domain name (generic) -- using short name Oct 27 03:04:05 generic sendmail[2787]: My unqualified host name (generic) unknown; sleeping for retry Oct 27 03:04:05 generic sendmail[2832]: My unqualified host name (generic) unknown; sleeping for retry Oct 27 03:04:05 generic sendmail[2833]: My unqualified host name (generic) unknown; sleeping for retry Oct 27 03:04:05 generic sendmail[2847]: My unqualified host name (generic) unknown; sleeping for retry Oct 27 03:05:05 generic sendmail[2787]: unable to qualify my own domain name (generic) -- using short name Oct 27 03:05:05 generic sendmail[2833]: unable to qualify my own domain name (generic) -- using short name Oct 27 03:05:05 generic sendmail[2832]: unable to qualify my own domain name (generic) -- using short name Oct 27 03:05:05 generic sendmail[2847]: unable to qualify my own domain name (generic) -- using short name Oct 28 03:01:00 generic sendmail[4605]: My unqualified host name (generic) unknown; sleeping for retry Oct 28 03:02:00 generic sendmail[4605]: unable to qualify my own domain name (generic) -- using short name Oct 28 03:02:01 generic sendmail[4713]: My unqualified host name (generic) unknown; sleeping for retry Oct 28 03:03:01 generic sendmail[4713]: unable to qualify my own domain name (generic) -- using short name Oct 28 03:03:01 generic sendmail[4718]: My unqualified host name (generic) unknown; sleeping for retry Oct 28 03:04:01 generic sendmail[4718]: unable to qualify my own domain name (generic) -- using short name Oct 28 03:04:03 generic sendmail[4867]: My unqualified host name (generic) unknown; sleeping for retry Oct 28 03:04:03 generic sendmail[4913]: My unqualified host name (generic) unknown; sleeping for retry Oct 28 03:04:03 generic sendmail[4912]: My unqualified host name (generic) unknown; sleeping for retry Oct 28 03:04:03 generic sendmail[4927]: My unqualified host name (generic) unknown; sleeping for retry Oct 28 03:05:03 generic sendmail[4867]: unable to qualify my own domain name (generic) -- using short name Oct 28 03:05:03 generic sendmail[4913]: unable to qualify my own domain name (generic) -- using short name Oct 28 03:05:03 generic sendmail[4912]: unable to qualify my own domain name (generic) -- using short name Oct 28 03:05:03 generic sendmail[4927]: unable to qualify my own domain name (generic) -- using short name Oct 28 04:15:21 generic sendmail[5154]: My unqualified host name (generic) unknown; sleeping for retry Oct 28 04:16:21 generic sendmail[5154]: unable to qualify my own domain name (generic) -- using short name Oct 29 03:01:00 generic sendmail[6807]: My unqualified host name (generic) unknown; sleeping for retry Oct 
29 03:02:00 generic sendmail[6807]: unable to qualify my own domain name (generic) -- using short name Oct 29 03:02:01 generic sendmail[6916]: My unqualified host name (generic) unknown; sleeping for retry Oct 29 03:03:01 generic sendmail[6916]: unable to qualify my own domain name (generic) -- using short name Oct 29 03:03:01 generic sendmail[6921]: My unqualified host name (generic) unknown; sleeping for retry Oct 29 03:04:01 generic sendmail[6921]: unable to qualify my own domain name (generic) -- using short name Oct 29 03:04:03 generic sendmail[7070]: My unqualified host name (generic) unknown; sleeping for retry Oct 29 03:04:03 generic sendmail[7115]: My unqualified host name (generic) unknown; sleeping for retry Oct 29 03:04:03 generic sendmail[7116]: My unqualified host name (generic) unknown; sleeping for retry Oct 29 03:04:03 generic sendmail[7130]: My unqualified host name (generic) unknown; sleeping for retry It has been up 2 days 17 hr+ at that last. I do not know if that indicates a problem or not. Very little is changed from the snapshot expansion. I was only looking into the status of the system-clang related libc++ and such and the media content is just temporary for that purpose. === Mark Millard marklmi at yahoo.com
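For what it is worth, those sendmail messages are the classic symptom of the host not having a fully qualified name, which is unsurprising for a freshly dd'd snapshot whose hostname is just "generic". If the noise matters on such a scratch system, the usual ways to quiet it are along these lines (generic.example.org is purely a placeholder):

hostname="generic.example.org"     (in /etc/rc.conf, giving the host a qualified name)

127.0.0.1 generic.example.org generic     (or mapping the short name in /etc/hosts)

or keeping sendmail from running at all via sendmail_enable="NONE" in /etc/rc.conf.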
Is 14.0 to be released based on 0 for sysctl vfs.zfs.bclone_enabled ?
It looks to me like releng/14.0 (as of 14.0-RC4) still has:

int zfs_bclone_enabled;
SYSCTL_INT(_vfs_zfs, OID_AUTO, bclone_enabled, CTLFLAG_RWTUN,
    &zfs_bclone_enabled, 0, "Enable block cloning");

leaving block cloning effectively disabled by default, no matter what the pool has enabled.

https://www.freebsd.org/releases/14.0R/relnotes/ also reports:

QUOTE
OpenZFS has been upgraded to version 2.2. New features include:
• block cloning, which allows shallow copies of blocks in file copies. This is optional, and disabled by default; it can be enabled with sysctl vfs.zfs.bclone_enabled=1.
END QUOTE

Just curiosity on my part about the default completeness of openzfs-2.2 support, not an objection either way.

=== Mark Millard marklmi at yahoo.com
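For reference, checking both halves of this on a given system is quick (zroot below is just an example pool name):

# sysctl vfs.zfs.bclone_enabled
# zpool get feature@block_cloning zroot

The first shows the tunable quoted above; the second shows whether the pool itself has the block_cloning feature enabled or active.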
Re: Is 14.0 to be released based on 0 for sysctl vfs.zfs.bclone_enabled ?
On Nov 4, 2023, at 04:38, Mike Karels wrote: > On 4 Nov 2023, at 4:01, Ronald Klop wrote: > >> On 11/4/23 02:39, Mark Millard wrote: >>> It looks to me like releng/14.0 (as of 14.0-RC4) still has: >>> >>> int zfs_bclone_enabled; >>> SYSCTL_INT(_vfs_zfs, OID_AUTO, bclone_enabled, CTLFLAG_RWTUN, >>> &zfs_bclone_enabled, 0, "Enable block cloning"); >>> >>> leaving block cloning effectively disabled by default, no >>> matter what the pool has enabled. >>> >>> https://www.freebsd.org/releases/14.0R/relnotes/ also reports: >>> >>> QUOTE >>> OpenZFS has been upgraded to version 2.2. New features include: >>> • >>> block cloning, which allows shallow copies of blocks in file copies. This >>> is optional, and disabled by default; it can be enabled with sysctl >>> vfs.zfs.bclone_enabled=1. >>> END QUOTE >>> >> >> >> I think this answers your question in the subject. > > I think so too (and I wrote that text). Thanks for the confirmation of the final intent. I believe this makes: QUOTE author Brian Behlendorf 2023-05-25 20:53:08 + committer GitHub 2023-05-25 20:53:08 + commit 91a2325c4a0fbe01d0bf212e44fa9d85017837ce (patch) tree dd01dfce6aeef357ade1775acf18aade535c6271 . . . Update compatibility.d files Add an openzfs-2.2 compatibility file for the next release. Edon-R support has been enabled for FreeBSD removing the need for different FreeBSD and Linux files. Symlinks for the -linux and -freebsd names are created for any scripts expecting that convention. Additionally, a symlink for ubunutu-22.04 was added. Signed-off-by: Brian Behlendorf Closes #14833 END QUOTE technically incorrect in that compatibility.d/openzfs-2.2-freebsd should be distinct in content from compatibility.d/openzfs-2.2 so that block cloning would not be enabled. >>> Just curiousity on my part about the default completeness of >>> openzfs-2.2 support, not an objection either way. >>> >> >> >> I haven't seen new issues with block cloning in the last few weeks mentioned >> on the mailing lists. All known issues are fixed AFAIK. >> But I can imagine that the risk+effect ratio of data corruption is seen as a >> bit too high for a 14.0 release for this particular feature. That does not >> diminish the rest of the completeness of openzfs-2.2. >> >> NB: I'm not involved in developing openzfs or the decision making in the >> release. Just repeating what I read on the lists. > > There was another block cloning fix in 14.0-RC4; see the commit log. > Maybe there will be no more issues, but it seems that corner cases were > still being found recently. >> Looks like I'll stay at openzfs-2.1 pool features until there is a release that no longer has the default status: 0 for sysctl vfs.zfs.bclone_enabled I use main [so: 15 now] but only enable openzfs-2.* pool features supported by default on some FreeBSD release, that has an accurate compatibility.d/openzfs-2.*-freebsd file. === Mark Millard marklmi at yahoo.com
Re: Is 14.0 to be released based on 0 for sysctl vfs.zfs.bclone_enabled ?
On Nov 5, 2023, at 16:27, Martin Matuška wrote: > OpenZFS 2.2.0 in FreeBSD 14 fully supports block cloning. You can work with > pools that have feature@block_cloning enabled. > The sysctl variable vfs.zfs.bclone_enabled affects the behavior of > zfs_clone_range() which is called by copy_file_range(). When it is set to 0, > zfs_clone_range() does not do block cloning. > If it is set to anything else than 0, zfs_clone_range() does block cloning > (if all conditions are met - same ZFS pool, correct data alignment, etc.). Ahh. From the naming and vague memories of the history, I did not understand that vfs.zfs.bclone_enabled has a narrower set of consequences than the name suggests and vfs.zfs.bclone_enabled=0 does not imply any lack of support for pools that have block cloning active. May be the wording at, for example https://www.freebsd.org/releases/14.0R/relnotes/ should be more explicit about the relationships involved when vfs.zfs.bclone_enabled=0 since others may read in the same bad interpretation that I did. Thanks for the note. Very helpful. > In FreeBSD-main, this tunable is enabled and I plan to enable it in stable/14 > somewhere around December 11, 2023. > > As of today I personally use block cloning on all my systems. > > mm > > On 04/11/2023 13:35, Mark Millard wrote: >> On Nov 4, 2023, at 04:38, Mike Karels wrote: >> >>> On 4 Nov 2023, at 4:01, Ronald Klop wrote: >>> >>>> On 11/4/23 02:39, Mark Millard wrote: >>>>> It looks to me like releng/14.0 (as of 14.0-RC4) still has: >>>>> >>>>> int zfs_bclone_enabled; >>>>> SYSCTL_INT(_vfs_zfs, OID_AUTO, bclone_enabled, CTLFLAG_RWTUN, >>>>> &zfs_bclone_enabled, 0, "Enable block cloning"); >>>>> >>>>> leaving block cloning effectively disabled by default, no >>>>> matter what the pool has enabled. >>>>> >>>>> https://www.freebsd.org/releases/14.0R/relnotes/ also reports: >>>>> >>>>> QUOTE >>>>> OpenZFS has been upgraded to version 2.2. New features include: >>>>> • >>>>> block cloning, which allows shallow copies of blocks in file copies. This >>>>> is optional, and disabled by default; it can be enabled with sysctl >>>>> vfs.zfs.bclone_enabled=1. >>>>> END QUOTE >>>>> >>>> >>>> I think this answers your question in the subject. >>> I think so too (and I wrote that text). >> Thanks for the confirmation of the final intent. >> >> I believe this makes: >> >> QUOTE >> author Brian Behlendorf 2023-05-25 20:53:08 + >> committer GitHub 2023-05-25 20:53:08 + >> commit 91a2325c4a0fbe01d0bf212e44fa9d85017837ce (patch) >> tree dd01dfce6aeef357ade1775acf18aade535c6271 >> . . . >> Update compatibility.d files >> >> Add an openzfs-2.2 compatibility file for the next release. Edon-R support >> has been enabled for FreeBSD removing the need for different FreeBSD and >> Linux files. Symlinks for the -linux and -freebsd names are created for any >> scripts expecting that convention. Additionally, a symlink for ubunutu-22.04 >> was added. Signed-off-by: Brian Behlendorf Closes >> #14833 >> END QUOTE >> >> technically incorrect in that compatibility.d/openzfs-2.2-freebsd >> should be distinct in content from compatibility.d/openzfs-2.2 so >> that block cloning would not be enabled. >> >> >>>>> Just curiousity on my part about the default completeness of >>>>> openzfs-2.2 support, not an objection either way. >>>>> >>>> >>>> I haven't seen new issues with block cloning in the last few weeks >>>> mentioned on the mailing lists. All known issues are fixed AFAIK. 
>>>> But I can imagine that the risk+effect ratio of data corruption is seen as >>>> a bit too high for a 14.0 release for this particular feature. That does >>>> not diminish the rest of the completeness of openzfs-2.2. >>>> >>>> NB: I'm not involved in developing openzfs or the decision making in the >>>> release. Just repeating what I read on the lists. >>> There was another block cloning fix in 14.0-RC4; see the commit log. >>> Maybe there will be no more issues, but it seems that corner cases were >>> still being found recently. >> Looks like I'll stay at openzfs-2.1 pool features until there is >> a release that no longer has the default status: >> >> 0 for sysctl vfs.zfs.bclone_enabled >> >> I use main [so: 15 now] but only enable openzfs-2.* pool features >> supported by default on some FreeBSD release, that has an accurate >> compatibility.d/openzfs-2.*-freebsd file. > === Mark Millard marklmi at yahoo.com
RELENG_14 [process] was killed: failed to reclaim memory
mike tancsa wrote on Date: Tue, 14 Nov 2023 13:44:22 UTC : > While testing some new hardware on a recent RELENG_14 image (from Nov > 10th), I noticed some of my ssh sessions would get killed off with the > errors below (twice in 24hrs) > > pid 1697 (sshd), jid 0, uid 1001, was killed: failed to reclaim memory > pid 6274 (sshd), jid 0, uid 1001, was killed: failed to reclaim memory > . . . [My notes below are not specific to releng/14.0 or to stable/14 .] What do you have for ( copied from my /boot/loader.conf ): # # Delay when persistent low free RAM leads to # Out Of Memory killing of processes: vm.pageout_oom_seq=120 The default is 12 (last I knew, anyway). The 120 figure has allowed me and others to do buildworld, buildkernel, and poudriere bulk runs on small arm boards using all cores that otherwise got "failed to reclaim memory" (to use the modern, improved [not misleading] message text). (The units for the 120 are not time units: more like a number of (re)tries to gain at least a target amount of Free RAM before failure handling starts. The comment wording is based on a consequence of the assignment.) The 120 is not a maximum, just a figure that has proved useful in various contexts. Notes: "failed to reclaim memory" can happen even with swap space enabled but no swap in use: sufficiently active pages are just not paged out to swap space. There are some other parameters of possible use for some other modern "was killed" reason texts. === Mark Millard marklmi at yahoo.com
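A small usage sketch, assuming the tunable is the usual loader-tunable-plus-runtime-sysctl style (which matches my experience with it):

# sysctl vm.pageout_oom_seq          (check the current value; the default has been 12)
# sysctl vm.pageout_oom_seq=120      (raise it on the running system)

plus the /boot/loader.conf line shown above to make it the value used from boot onward.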
Re: RELENG_14 [process] was killed: failed to reclaim memory
On Nov 15, 2023, at 08:58, mike tancsa wrote: > On 11/14/2023 8:25 PM, Mark Millard wrote: >> mike tancsa wrote on >> Date: Tue, 14 Nov 2023 13:44:22 UTC : >> >>> While testing some new hardware on a recent RELENG_14 image (from Nov >>> 10th), I noticed some of my ssh sessions would get killed off with the >>> errors below (twice in 24hrs) >>> >>> pid 1697 (sshd), jid 0, uid 1001, was killed: failed to reclaim memory >>> pid 6274 (sshd), jid 0, uid 1001, was killed: failed to reclaim memory >>> . . . >> [My notes below are not specific to releng/14.0 or to >> stable/14 .] >> >> What do you have for ( copied from my /boot/loader.conf ): > > Thanks Mark, no tuning in there other than forcing a particular driver to > attach > > # cat /boot/loader.conf > kern.geom.label.disk_ident.enable="0" > kern.geom.label.gptid.enable="0" > cryptodev_load="YES" > zfs_load="YES" > hw.mfi.mrsas_enable=1 > t5fw_cfg_load="YES" > if_cxgbe_load="YES" > # > > > >> # >> # Delay when persistent low free RAM leads to >> # Out Of Memory killing of processes: >> vm.pageout_oom_seq=120 >> >> The default is 12 (last I knew, anyway). > > Any thoughts for a machine with a lot of RAM, Am I better to limit ARC or > change the default to 120 ? > I have vm.pageout_oom_seq=120 everywhere, from the little arm's to the ThreadRipper 1950X with 128 GiBytes of RAM. I've hit the kills in all the contexts, even UFS based on the 1950X (no ARC competing for RAM). (High load average style of bulk -a test run, using USE_TMPFS=data and using even USE_TMPFS=no .) (My bulk -a testing is rare.) === Mark Millard marklmi at yahoo.com
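For completeness, since limiting ARC came up: when ARC competition is the suspected contributor, the usual knob is vfs.zfs.arc_max, settable from /boot/loader.conf and, as far as I know, also adjustable at runtime. A sketch, with the 8 GiByte figure purely an example:

vfs.zfs.arc_max="8589934592"     (in /boot/loader.conf)

# sysctl vfs.zfs.arc_max=8589934592     (on a running system)

That said, as noted above, I've needed vm.pageout_oom_seq=120 even on UFS-only systems with no ARC in the picture.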
Is releng/13.2 deliberately missing: #15395 1ca531971 Zpool can start allocating from metaslab before TRIMs have completed
zfs: merge openzfs/zfs@d99134be8 (zfs-2.1-release) into stable/13 included a metaslab vs. TRIM related merge: QUOTE OpenZFS release 2.1.14 Notable upstream pull request merges: #15395 1ca531971 Zpool can start allocating from metaslab before TRIMs have completed END QUOTE that does not seem to have been committed into releng/13.2 . Was this deliberate? By contrast, the other 2.1.14 notable upstream pull request merge committed into stable/13: QUOTE #15571 77b0c6f04 dnode_is_dirty: check dnode and its data for dirtiness END QUOTE was also committed into releng/13.2 . === Mark Millard marklmi at yahoo.com
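For anyone wanting to check a local tree themselves, a sketch of the sort of commit-message search I have in mind (the checkout path and branch ref name are illustrative):

# git -C /usr/releng13.2-src log --oneline --grep='TRIMs have completed' releng/13.2
# git -C /usr/releng13.2-src log --oneline --grep='dnode_is_dirty' releng/13.2

An empty result for the first and a hit for the second would match what I describe above.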
aarch64 and armv6 vs. armv7 support: armv6 is not supported, despite what "man arch" reports
man arch reports: QUOTE Some machines support more than one FreeBSD ABI. Typically these are 64-bit machines, where the “native” LP64 execution environment is accompanied by the “legacy” ILP32 environment, which was the historical 32-bit predecessor for 64-bit evolution. Examples are: LP64 ILP32 counterpart amd64 i386 powerpc64 powerpc aarch64 armv6/armv7 aarch64 will support execution of armv6 or armv7 binaries if the CPU implements AArch32 execution state, however older armv4 and armv5 binaries aren't supported. END QUOTE (I take "armv6 or armv7 binaries" as what was built targeting a FreeBSD architecture triple for one of those. FreeBSD keeps them distinct.) However, the armv6 part of that is wrong: The infrastructure supports only one 32-bit alternative for a given kernel, not a family of them at once . . . sys/kern/kern_mib.c : static const char * proc_machine_arch(struct proc *p) { if (p->p_sysent->sv_machine_arch != NULL) return (p->p_sysent->sv_machine_arch(p)); #ifdef COMPAT_FREEBSD32 if (SV_PROC_FLAG(p, SV_ILP32)) return (MACHINE_ARCH32); #endif return (MACHINE_ARCH); } . . . static int sysctl_kern_supported_archs(SYSCTL_HANDLER_ARGS) { const char *supported_archs; supported_archs = #ifdef COMPAT_FREEBSD32 compat_freebsd_32bit ? MACHINE_ARCH " " MACHINE_ARCH32 : #endif MACHINE_ARCH; return (SYSCTL_OUT(req, supported_archs, strlen(supported_archs) + 1)); } sys/arm64/include/param.h : #define MACHINE_ARCHES MACHINE_ARCH " " MACHINE_ARCH32 . . . #define MACHINE_ARCH32 "armv7" (There is no "armv6" alternative present.) But with something like: #define MACHINE_ARCH32 "armv7 armv6" MACHINE_ARCH32 is not interpreted as a list of alternatives, each supported. There is code that would have to be reworked to allow a list of alternatives to work. One can build a custom kernel with: #define MACHINE_ARCH32 "armv6" and then, having booted that kernel, run armv6 on aarch64 --but, then, not armv7. https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=256132 is about this and has my messy notes as I explored and discovered that multiple 32-bit alternatives did not work. I see that I forgot various quote (") symbols. This note was prompted by: https://lists.freebsd.org/archives/freebsd-hackers/2023-December/002728.html that mentions "the list of valid MACHINE_ARCH" that reminded me of this old issue. === Mark Millard marklmi at yahoo.com
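A quick way to see what a given kernel actually claims, without reading the source, is the sysctl that the quoted handler implements. The output shown is what I would expect from a stock aarch64 kernel with COMPAT_FREEBSD32; treat it as illustrative:

# sysctl kern.supported_archs
kern.supported_archs: aarch64 armv7
# sysctl hw.machine_arch
hw.machine_arch: aarch64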
Re: aarch64 and armv6 vs. armv7 support: armv6 is not supported, despite what "man arch" reports
On Dec 7, 2023, at 01:19, Dimitry Andric wrote: > On 7 Dec 2023, at 05:31, Mark Millard wrote: >> >> man arch reports: >> >> QUOTE >>Some machines support more than one FreeBSD ABI. Typically these are >>64-bit machines, where the “native” LP64 execution environment is >>accompanied by the “legacy” ILP32 environment, which was the historical >>32-bit predecessor for 64-bit evolution. Examples are: >> >> LP64 ILP32 counterpart >> amd64i386 >> powerpc64powerpc >> aarch64 armv6/armv7 > > So, this might be replaced with "armv6^armv7" or "armv6 xor armv7", then? Only for folks that build from source. For those folks, a footnote about updating MACHINE_ARCH32 in sys/arm64/include/param.h would be appropriate. It is not exactly obvious or commonly known. Hmm, thinking more about the old bugzilla information . . . I'll also note that my information predated lib32 on aarch64: just chroot/jail sorts of use back then, and I just tested chroot back then. I've never tested a lib32 context for armv6 on aarch64 for an adjusted MACHINE_ARCH32. === Mark Millard marklmi at yahoo.com
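For someone building from source, a minimal sketch of the sequence involved, after changing MACHINE_ARCH32 in sys/arm64/include/param.h to "armv6" (the source path, KERNCONF name, and -j figure are illustrative):

# cd /usr/src
# make -j4 buildkernel KERNCONF=GENERIC
# make installkernel KERNCONF=GENERIC
# shutdown -r now

After the reboot only armv6, not armv7, would be the supported 32-bit alternative on that kernel.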
Re: 15 & 14: ram_attach vs. its using regions_to_avail vs. "bus_alloc_resource" can lead to: panic("ram_attach: resource %d failed to attach", rid)
On Jan 12, 2024, at 09:57, Doug Rabson wrote: > On Sat, 30 Sept 2023 at 08:47, Mark Millard wrote: > ram_attach is based on regions_to_avail but that is a problem for > its later bus_alloc_resource use --and that can lead to: > > panic("ram_attach: resource %d failed to attach", rid); > > Unfortunately, the known example is use of EDK2 on RPi4B > class systems, not what is considered the supported way. > The panic happens for main [so: 15] and will happen once > the cortex-a72 handling in 14.0-* is in a build fixed by: > > • git: 906bcc44641d - releng/14.0 - arm64: Fix errata workarounds that > depend on smccc Andrew Turner > > The lack of the fix leads to an earlier panic as stands. > > > sys/kern/subr_physmem.c 's regions_to_avail is based on ignoring > phys_avail and using only hwregions and exregions. In other words, > in part: > > * Initially dump_avail and phys_avail are identical. Boot time memory > * allocations remove extents from phys_avail that may still be included > * in dumps. > > This means that early, dedicated memory allocations are treated > as available for general use by regions_to_avail . The distinction > is visible in the boot -v output in that: > > real memory = 3138154496 (2992 MB) > Physical memory chunk(s): > 0x20 - 0x002b7f, 727711744 bytes (177664 pages) > 0x002ce3a000 - 0x003385, 111304704 bytes (27174 pages) > 0x00338c - 0x00338c6fff, 28672 bytes (7 pages) > 0x0033a3 - 0x0036ef, 55377920 bytes (13520 pages) > 0x00372e - 0x003b2f, 67239936 bytes (16416 pages) > 0x004000 - 0x00bb3dcfff, 2067648512 bytes (504797 pages) > avail memory = 3027378176 (2887 MB) > > does not list the wider: > > 0x004000 - 0x00bfff > > because of phys_avail . But the earlier dump based on hwregions and > exregions shows: > > Physical memory chunk(s): > 0x001d - 0x001e, 0 MB ( 32 pages) > 0x0020 - 0x338c6fff, 822 MB ( 210631 pages) > 0x3392 - 0x3b2f, 121 MB ( 31200 pages) > 0x4000 - 0xbfff, 2048 MB ( 524288 pages) > Excluded memory regions: > 0x001d - 0x001e, 0 MB ( 32 pages) NoAlloc > 0x2b80 - 0x2ce39fff,22 MB ( 5690 pages) NoAlloc > 0x3386 - 0x338b, 0 MB ( 96 pages) NoAlloc > 0x3392 - 0x33a2, 1 MB (272 pages) NoAlloc > 0x36f0 - 0x372d, 3 MB (992 pages) NoAlloc > > which indicates: > > 0x4000 - 0xbfff > > is available as far as it is concerned. > > (Note some code works/displays in terms of: 0x4000 - 0xc000 > instead.) > > For aarch64 , sys/arm64/arm64/nexus.c has a nexus_alloc_resource > that is used as bus_alloc_resource . It ends up rejecting the > RPi4B boot via using the result of the call in ram_attach: > > if (bus_alloc_resource(dev, SYS_RES_MEMORY, &rid, start, end, > end - start, 0) == NULL) > panic("ram_attach: resource %d failed to attach", > rid); > > as shown by the just-prior start/end pair sequence messages: > > ram0: reserving memory region: 20-2b80 > ram0: reserving memory region: 2ce3a000-3386 > ram0: reserving memory region: 338c-338c7000 > ram0: reserving memory region: 33a3-36f0 > ram0: reserving memory region: 372e-3b30 > ram0: reserving memory region: 4000-c000 > panic: ram_attach: resource 5 failed to attach > > I do not see anything about this that looks inherently RPi* > specific for possibly ending up with an analogous panic. So > I expect the example is sufficient context to identify a > problem is present, despite EDK2 use not being normal for > RPi4B's and the like as far as FreeBSD is concerned. > > I'm not quite clear why phys_avail changes Do not be confused by common labeling to distinct data: Note the "phys_avail" vs. 
"hwregions" despite the label "Physical memory chunk(s):" : static void cpu_startup(void *dummy) { vm_paddr_t size; int i; printf("real memory = %ju (%ju MB)\n", ptoa((uintmax_t)realmem), ptoa((uintmax_t)realmem) / 1024 / 1024); if (bootverbose) { printf("Physical memory chunk(s):\n"); for (i = 0; phys_avail[i + 1] != 0; i += 2) { size = phys_avail[i + 1] - phys_avail[i]; printf("%#016jx - %#016jx, %ju bytes (%ju pages)\n", (uintmax_t)phys_avail[i], (uintmax_t)phys_avail[i + 1] - 1,
RE: Should changes in src/usr.sbin/bhyve/ trigger an llvm rebuild?
David Wolfskill wrote on Date: Sun, 28 Jan 2024 11:50:44 UTC : > Context for this is in-place source-based updates using META_MODE, amd64 > arch. > . . . > But llvm is now being rebuilt. > > Why? The following two sequences are very different: make buildworld make buildworld vs. make buildworld make installworld make buildworld The installworld can update a lot of non-source files that were used to do the first build world. META_MODE notices such updates and does rebuild activity because of them. One more sequence: make buildworld make installworld update some sources make buildworld For that the installworld may be the larger change compared to the source updates as far as contributions to rebuild activity go. This sort of thing is likely what you had happen. === Mark Millard marklmi at yahoo.com
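Spelled out as commands, the distinction looks like the following (the path and -j figure are illustrative; the parenthetical notes are not part of the commands):

# cd /usr/src
# make -j8 buildworld     (build 1)
# make -j8 buildworld     (build 2: little or no rebuild activity)

versus:

# cd /usr/src
# make -j8 buildworld     (build 1)
# make installworld       (tools such as /bin/sh get newer timestamps)
# make -j8 buildworld     (build 2: META_MODE sees newer tools and rebuilds)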
Re: Should changes in src/usr.sbin/bhyve/ trigger an llvm rebuild?
On Jan 28, 2024, at 07:46, David Wolfskill wrote: > On Sun, Jan 28, 2024 at 07:30:53AM -0800, Mark Millard wrote: >> ... >> The following two sequences are very different: >> >> make buildworld >> make buildworld >> >> vs. >> >> make buildworld >> make installworld >> make buildworld >> >> The installworld can update a lot of non-source >> files that were used to do the first build world. >> META_MODE notices such updates and does rebuild >> activity because of them. > > First: Thank you for replying & suggesting the above. > > That said, one of the machines in question is my local "build machine" -- > and for it, in addition to in-place source updates, I also do (weekly) > updates of my "production" machines (at home). > > And for that case, the production machines mount the builder's /usr/src > and /usr/obj (via NFS) read-only. Which machine(s) are doing the llvm rebuild that you were hoping would not happen? What was the context like for the history on that machine? (The below had to be written without understanding of such things.) Here is an example META_MODE line recording a tool used during a particular file's rebuild: E 22961 /bin/sh So installing an update to /bin/sh via isntallworld would lead to the later META_MODE (re)build indicating that the file needs to be rebuilt, just because /bin/sh ends up being newer after the installworld . There are other examples of recorded paths to tools in .meta file, such as (my old context example used in an old E-mail exchange): /usr/obj/amd64_clang/amd64.amd64/usr/fbsd/mm-src/amd64.amd64/tmp/legacy/usr/sbin/awk So if /usr/obj/. . ./tmp/legacy/usr/sbin/awk is newer than the file potentially being rebuilt, make ends up with: file '/usr/obj/amd64_clang/amd64.amd64/usr/fbsd/mm-src/amd64.amd64/tmp/legacy/usr/sbin/awk' is newer than the target... (make has a mode that reports such things. I used it to find out what all contributed to some rebuild activity in order to figure out the general type of thing that was happeneing. Then I used it to find all the "is newer than" material that I expected to be unlikely to contribute to build changes.) It does not matter if: /usr/obj/amd64_clang/amd64.amd64/usr/fbsd/mm-src/amd64.amd64/tmp/legacy/usr/sbin/awk is read-only at the potential-rebuild-of-file time. Only if it is newer. Simon J. Gerraty and I had a long exchange about this in 2023-Feb, that was in turn based on a earlier 2021-Jan report of mine. There are also issues when symbolic links are involved, if I remember right. At the time (2023) I was doing experiments with making some of this "unlikely to cause build differences" material end up being ignored. Ultimately, Simon provided me a patch to share/mk/src.sys.obj.mk to help with my experiments. See "Re: FYI: Why META_MODE rebuilds so much for building again after installworld (no source changes)", starting with the 2023-Feb material at: https://lists.freebsd.org/archives/freebsd-current/2023-February/003239.html I will note that my activity did not involve NFS mounts, only completely self-hosted builds on directly connected media, the boot media. I've no evidence if such NFS involvement makes any additional differences. bectl use can be used to keep around an example "after the build but before the install" place from the most recent build. It can be used for doing the next build to avoid the later installworld consequences on time relationships for the likes of /bin/sh . (It is also a place to revert to if an install went badly.) 
> And without complaints of attempts to > scribble on read-only stuff. :-} Detailed time relationships are what matter. You may have to work out what those are. > So if "make installworld" messes with anything that META_MODE cares > about ... that would appear to be somewhat surprising. See above. > Mind, I've been wrong before, and I do intend to live long enough to be > wrong again :-) > >> One more sequence: >> >> make buildworld >> make installworld >> update some sources >> make buildworld >> >> For that the installworld may be the larger >> change compared to the source updates as far >> as contributions to rebuild activity go. >> >> This sort of thing is likely what you had >> happen. >> > > Hmm Thanks again. === Mark Millard marklmi at yahoo.com
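As a concrete illustration of hunting for such recordings, something like the following can list which .meta files name a given tool (the object tree path is illustrative for an amd64 self-hosted build, and the pattern assumes the recorded lines end with the tool's path, as in the example line earlier):

# grep -rl --include='*.meta' ' /bin/sh$' /usr/obj/usr/src/amd64.amd64 | head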
Re: Should changes in src/usr.sbin/bhyve/ trigger an llvm rebuild?
[Note: your email is rejecting my E-mail: 554: 5.7.1 ] On Jan 28, 2024, at 14:06, David Wolfskill wrote: > On Sun, Jan 28, 2024 at 10:20:30AM -0800, Mark Millard wrote: >> ... >>> That said, one of the machines in question is my local "build machine" -- >>> and for it, in addition to in-place source updates, I also do (weekly) >>> updates of my "production" machines (at home). >>> >>> And for that case, the production machines mount the builder's /usr/src >>> and /usr/obj (via NFS) read-only. >> >> Which machine(s) are doing the llvm rebuild that >> you were hoping would not happen? > > Each of the 3 machines that I update via in-place source updates: the > above-cited "buildl machine" and a couple of laptops. > >> What was the context like for the history on that machine? > > Each of the machines is updated daily (except when I'm away and > off-Net); each is updated to the same commit (as each has a local > private mirror for the FreeBSD git repositories, and after updating the > build machine's mirror, I use rsync to ensure that the laptops' mirrors > are in sync with that). > > Update histories for the build machine and one of the laptops is > available at https://www.catwhisker.org/~david/FreeBSD/history/ > > In each of the 3 cases this morning, the machine was running > stable/14-n266551-63a7e799b32c and updated to > stable/14-n266554-2ee407b6068a, which (as noted earlier) only changed > src/usr.sbin/bhyve/pci_nvme.c. And each machine rebuilt llvm durng > "make buildworld". When you built and then installed stable/14-n266551-63a7e799b32c if you had then simply started another build where you installed, it would have rebuilt llvm at that point --before stable/14-n266554-2ee407b6068a updated source was even present. The install of 63a7e799b32c made various tools used to do builds newer than the files used to do the build of 63a7e799b32c. That is enough for META_MODE to initiate rebuild activity so that things end up synchronized to be based on the updated installed tools. (Some tools might not be updated, others might be. The details depend on which are updated with new timestamps used by makes "newer" checks.) Try running make with the debug mode turned on that reports the "newer than" notices for what leads to rebuild activity (make -dM) after a notable installworld but before any source code updates. You might not like the full range of things checked but you will see why things are rebuilt. META_MODE tests date relationships among more files than you are considering. > >> (The below had to be written without understanding >> of such things.) >> >> Here is an example META_MODE line recording a >> tool used during a particular file's rebuild: >> >> E 22961 /bin/sh >> >> So installing an update to /bin/sh via isntallworld >> would lead to the later META_MODE (re)build >> indicating that the file needs to be rebuilt, just >> because /bin/sh ends up being newer after the >> installworld . > > Perhaps I should rephrase my query to "*Should* an update of (only) > src/usr.sbin/bhyve/pci_nvme.c cause 'make buildworld' using META_MODE to > rebuild llvm?" I seem to have empirical evidence that it does do that. Changes to src/usr.sbin/bhyve/pci_nvme.c are not a cause of the rebuild. The prior installworld of 63a7e799b32c is the cause of the rebuild. If you had tried the build before updating the source tree, it still would have rebuilt llvm. 
> >> There are other examples of recorded paths to tools >> in .meta file, such as (my old context example >> used in an old E-mail exchange): >> >> /usr/obj/amd64_clang/amd64.amd64/usr/fbsd/mm-src/amd64.amd64/tmp/legacy/usr/sbin/awk >> >> So if /usr/obj/. . ./tmp/legacy/usr/sbin/awk is newer than the file >> potentially being rebuilt, make ends up with: >> > > Right; after some discussion with Simon and/or Bryan (back on 08 July > 2017), I augmented /etc/src.conf on the laptops to include: > > .MAKE.META.IGNORE_PATHS += /usr/local/etc/libmap.d > > because I (also) had: > > PORTS_MODULES+=x11/nvidia-driver-390 > > in there, so x11/nvidia-driver-390 was being rebuilt every time the > kernel was being rebuilt, and that caused /usr/local/etc/libmap.d to > get an update. So META_MODE wasn't cutting down on the rebuilds in that > case. > > The above .MAKE.META.IGNORE_PATHS line helped address that issue. > > Perhaps something somewhat similar is wanted to prevent the situation > that catalyzed the initial message in this thread? === Mark Millard marklmi at yahoo.com
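To expand on the make -dM suggestion above, a sketch of capturing and then filtering the output (the file name and -j figure are illustrative; expect the log to be large):

# cd /usr/src
# script /tmp/buildworld-dM.txt make -dM -j8 buildworld
# grep 'is newer than the target' /tmp/buildworld-dM.txt | sort -u | less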
Re: Should changes in src/usr.sbin/bhyve/ trigger an llvm rebuild?
On Jan 28, 2024, at 14:34, Mark Millard wrote: > [Note: your email is rejecting my E-mail: > 554: 5.7.1 ] > > On Jan 28, 2024, at 14:06, David Wolfskill wrote: > >> On Sun, Jan 28, 2024 at 10:20:30AM -0800, Mark Millard wrote: >>> ... >>>> That said, one of the machines in question is my local "build machine" -- >>>> and for it, in addition to in-place source updates, I also do (weekly) >>>> updates of my "production" machines (at home). >>>> >>>> And for that case, the production machines mount the builder's /usr/src >>>> and /usr/obj (via NFS) read-only. >>> >>> Which machine(s) are doing the llvm rebuild that >>> you were hoping would not happen? >> >> Each of the 3 machines that I update via in-place source updates: the >> above-cited "buildl machine" and a couple of laptops. >> >>> What was the context like for the history on that machine? >> >> Each of the machines is updated daily (except when I'm away and >> off-Net); each is updated to the same commit (as each has a local >> private mirror for the FreeBSD git repositories, and after updating the >> build machine's mirror, I use rsync to ensure that the laptops' mirrors >> are in sync with that). >> >> Update histories for the build machine and one of the laptops is >> available at https://www.catwhisker.org/~david/FreeBSD/history/ >> >> In each of the 3 cases this morning, the machine was running >> stable/14-n266551-63a7e799b32c and updated to >> stable/14-n266554-2ee407b6068a, which (as noted earlier) only changed >> src/usr.sbin/bhyve/pci_nvme.c. And each machine rebuilt llvm durng >> "make buildworld". > > When you built and then installed stable/14-n266551-63a7e799b32c > if you had then simply started another build where you installed, > it would have rebuilt llvm at that point --before > stable/14-n266554-2ee407b6068a updated source was even present. > > The install of 63a7e799b32c made various tools used to > do builds newer than the files used to do the build of > 63a7e799b32c. That is enough for META_MODE to initiate > rebuild activity so that things end up synchronized to > be based on the updated installed tools. (Some tools > might not be updated, others might be. The details > depend on which are updated with new timestamps used > by makes "newer" checks.) > > Try running make with the debug mode turned on that > reports the "newer than" notices for what leads to > rebuild activity (make -dM) after a notable installworld > but before any source code updates. You might not like > the full range of things checked but you will see why > things are rebuilt. > > META_MODE tests date relationships among more files > than you are considering. > >> >>> (The below had to be written without understanding >>> of such things.) >>> >>> Here is an example META_MODE line recording a >>> tool used during a particular file's rebuild: >>> >>> E 22961 /bin/sh >>> >>> So installing an update to /bin/sh via isntallworld >>> would lead to the later META_MODE (re)build >>> indicating that the file needs to be rebuilt, just >>> because /bin/sh ends up being newer after the >>> installworld . >> >> Perhaps I should rephrase my query to "*Should* an update of (only) >> src/usr.sbin/bhyve/pci_nvme.c cause 'make buildworld' using META_MODE to >> rebuild llvm?" I seem to have empirical evidence that it does do that. > > Changes to src/usr.sbin/bhyve/pci_nvme.c are not a > cause of the rebuild. The prior installworld of > 63a7e799b32c is the cause of the rebuild. 
> > If you had tried the build before updating the > source tree, it still would have rebuilt llvm. > >> >>> There are other examples of recorded paths to tools >>> in .meta file, such as (my old context example >>> used in an old E-mail exchange): >>> >>> /usr/obj/amd64_clang/amd64.amd64/usr/fbsd/mm-src/amd64.amd64/tmp/legacy/usr/sbin/awk >>> >>> So if /usr/obj/. . ./tmp/legacy/usr/sbin/awk is newer than the file >>> potentially being rebuilt, make ends up with: >>> >> >> Right; after some discussion with Simon and/or Bryan (back on 08 July >> 2017), I augmented /etc/src.conf on the laptops to include: >> >> .MAKE.META.IGNORE_PATHS += /usr/local/etc/libmap.d >> >> because I (also) had: >> >> PORTS_MODULES+=x11/nvidia-driver-390 &g
Re: Should changes in src/usr.sbin/bhyve/ trigger an llvm rebuild?
On Jan 28, 2024, at 16:05, David Wolfskill wrote: > On Sun, Jan 28, 2024 at 03:00:59PM -0800, Mark Millard wrote: >> ... >> To be clear, referencing details of your context: >> >> When you had the stable/14 machines at 1c090bf880bf: >> >> A) You built (META_MODE): 63a7e799b32c >> B) You installed: 63a7e799b32c >> C) You rebooted into: 63a7e799b32c >> >> I'm claiming that next doing: >> >> D) build again (still META_MODE): 63a7e799b32c >> >> would have rebuilt llvm at that point, the >> time-relationship cause(s) being set up >> during (B). > > As it happens, I rather fumble-fingered the (intended) reboot on the 2nd > laptop (and started another rebuild instead). > > And I do these within script(1), as it's handy to have a record. > > Note that this differes from the sequence you cite above, in that I > failed to do the reboot. > > So I powered it back up and -- without updating sources (or the local > repo mirror, for that matter) -- did another rebuild. > I'm having trouble identifying the detailed sequencing being reported below. Doing on one machine: installworld buidlworld buildworld buildworld . . Will only take large times for the first one (potentially). But doing: installworld buidlworld installworld buildworld installworld buildworld . . Can have each buildworld take large times depending the the details involved. I need to understand more about what happened before each buildworld on each machine to know what sort of timestamp relationships are involved for files. installworld can significantly change various timestamp relationships. > Here is an extract of some salient lines from the typescript file: > > g1-48(14.0-S)[4] egrep ' built in |Installing .* (started|completed)|Removing > old libraries| stable/14-n' s1 > FreeBSD g1-48.catwhisker.org 14.0-STABLE FreeBSD 14.0-STABLE #38 > stable/14-n266551-63a7e799b32c: Sat Jan 27 11:40:05 UTC 2024 > r...@g1-48.catwhisker.org:/common/S1/obj/usr/src/amd64.amd64/sys/CANARY amd64 > 1400506 1400506 >>>> World built in 2351 seconds, ncpu: 8, make -j16 Was a prior step (ignoring reboots, say) an installworld of 63a7e799b32c, with no other buidlworlds after the installworld? (I'm wording for major steps or my description the possibilities would get rather complicated and large.) >>>> Kernel(s) CANARY built in 898 seconds, ncpu: 8, make -j16 >>>> Installing kernel CANARY completed on Sun Jan 28 12:25:27 UTC 2024 >>>> Installing everything started on Sun Jan 28 12:25:57 UTC 2024 >>>> Installing everything completed on Sun Jan 28 12:28:01 UTC 2024 > FreeBSD g1-48.catwhisker.org 14.0-STABLE FreeBSD 14.0-STABLE #38 > stable/14-n266551-63a7e799b32c: Sat Jan 27 11:40:05 UTC 2024 > r...@g1-48.catwhisker.org:/common/S1/obj/usr/src/amd64.amd64/sys/CANARY amd64 > 1400506 1400506 >>>> World built in 116 seconds, ncpu: 8, make -j16 Was a prior step (ignoring reboots, say) an installworld of 63a7e799b32c, with no other buidlworlds after the installworld? Is the answer different here? 
>>>> Kernel(s) CANARY built in 920 seconds, ncpu: 8, make -j16 >>>> Installing kernel CANARY completed on Sun Jan 28 12:47:55 UTC 2024 >>>> Installing everything started on Sun Jan 28 12:48:25 UTC 2024 >>>> Installing everything completed on Sun Jan 28 12:50:01 UTC 2024 > FreeBSD g1-48.catwhisker.org 14.0-STABLE FreeBSD 14.0-STABLE #40 > stable/14-n266554-2ee407b6068a: Sun Jan 28 12:39:17 UTC 2024 > r...@g1-48.catwhisker.org:/common/S1/obj/usr/src/amd64.amd64/sys/CANARY amd64 > 1400506 1400506 >>>> Removing old libraries > FreeBSD g1-48.catwhisker.org 14.0-STABLE FreeBSD 14.0-STABLE #40 > stable/14-n266554-2ee407b6068a: Sun Jan 28 12:39:17 UTC 2024 > r...@g1-48.catwhisker.org:/common/S1/obj/usr/src/amd64.amd64/sys/CANARY amd64 > 1400506 1400506 >>>> World built in 124 seconds, ncpu: 8, make -j16 Was a prior step (ignoring reboots, say) an installworld of 63a7e799b32c with no other buidlworlds after the, installworld? >>>> Kernel(s) CANARY built in 901 seconds, ncpu: 8, make -j16 >>>> Installing kernel CANARY completed on Sun Jan 28 23:34:39 UTC 2024 >>>> Installing everything started on Sun Jan 28 23:35:09 UTC 2024 >>>> Installing everything completed on Sun Jan 28 23:37:16 UTC 2024 > FreeBSD g1-48.catwhisker.org 14.0-STABLE FreeBSD 14.0-STABLE #41 > stable/14-n266554-2ee407b6068a: Sun Jan 28 23:26:10 UTC 2024 > r...@g1-48.catwhisker.org:/common/S1/obj/usr/src/amd64.amd64/sys/CANARY amd64 > 1400506 1400506 >>>> Removing old libraries > g1-48(14.0-S)[5]
Re: Should changes in src/usr.sbin/bhyve/ trigger an llvm rebuild?
ch the partial rebuild of 63a7e799b32c contributes to timestamps that would cause more rebuilds. The 116 sec indicates: not much gets new timestamps this time. ~/Downloads/build_typescript.txt:119629: >>> Kernel(s) CANARY built in 920 seconds, ncpu: 8, make -j16 ~/Downloads/build_typescript.txt:119636: >>> Installing kernel CANARY on Sun Jan 28 12:47:27 UTC 2024 ~/Downloads/build_typescript.txt:122450: >>> Installing kernel CANARY completed on Sun Jan 28 12:47:55 UTC 2024 installkernel does not change notable timestamp relationships of tools and such vs. other files. ~/Downloads/build_typescript.txt:123346: >>> Installing everything started on Sun Jan 28 12:48:25 UTC 2024 ~/Downloads/build_typescript.txt:162156: >>> Installing everything completed on Sun Jan 28 12:50:01 UTC 2024 This install's both the partial-63a7e799b32c-rebuild material and the 2ee407b6068a material. The 116 sec figure suggests that there is not man files with updated timestamps. A reboot is involved here (or just below), so 2ee407b6068a will show up. ~/Downloads/build_typescript.txt:162840: To remove old libraries run 'make delete-old-libs'. ~/Downloads/build_typescript.txt:162841: >> make delete-old OK ~/Downloads/build_typescript.txt:162895: FreeBSD g1-48.catwhisker.org 14.0-STABLE FreeBSD 14.0-STABLE #40 stable/14-n266554-2ee407b6068a: Sun Jan 28 12:39:17 UTC 2024 r...@g1-48.catwhisker.org:/common/S1/obj/usr/src/amd64.amd64/sys/CANARY amd64 1400506 1400506 The 2ee407b6068a kernel now shows as being in operation. ~/Downloads/build_typescript.txt:162897: >>> Removing old libraries ~/Downloads/build_typescript.txt:162932: FreeBSD g1-48.catwhisker.org 14.0-STABLE FreeBSD 14.0-STABLE #40 stable/14-n266554-2ee407b6068a: Sun Jan 28 12:39:17 UTC 2024 r...@g1-48.catwhisker.org:/common/S1/obj/usr/src/amd64.amd64/sys/CANARY amd64 1400506 1400506 Still 2ee407b6068a. ~/Downloads/build_typescript.txt:162938: >>> World build started on Sun Jan 28 23:17:05 UTC 2024 ~/Downloads/build_typescript.txt:180497: >>> World built in 124 seconds, ncpu: 8, make -j16 It is possible here that little or no 63a7e799b32c related timestamp changes that lead to rebuild activity were involved in the above buildworld . It depends on the details of what was rebuilt. the 116 sec and 124 sec figures both suggest: no much overall. ~/Downloads/build_typescript.txt:200023: >>> Kernel(s) CANARY built in 901 seconds, ncpu: 8, make -j16 ~/Downloads/build_typescript.txt:200030: >>> Installing kernel CANARY on Sun Jan 28 23:34:11 UTC 2024 ~/Downloads/build_typescript.txt:202844: >>> Installing kernel CANARY completed on Sun Jan 28 23:34:39 UTC 2024 installkernel does not change notable timestamp relationships of tools and such vs. other files. ~/Downloads/build_typescript.txt:203743: >>> Installing everything started on Sun Jan 28 23:35:09 UTC 2024 ~/Downloads/build_typescript.txt:242553: >>> Installing everything completed on Sun Jan 28 23:37:16 UTC 2024 2ee407b6068a will still show up after the the reboot. ~/Downloads/build_typescript.txt:243237: To remove old libraries run 'make delete-old-libs'. ~/Downloads/build_typescript.txt:243238: >> make delete-old OK ~/Downloads/build_typescript.txt:243292: FreeBSD g1-48.catwhisker.org 14.0-STABLE FreeBSD 14.0-STABLE #41 stable/14-n266554-2ee407b6068a: Sun Jan 28 23:26:10 UTC 2024 r...@g1-48.catwhisker.org:/common/S1/obj/usr/src/amd64.amd64/sys/CANARY amd64 1400506 1400506 Yep, still 2ee407b6068a. ~/Downloads/build_typescript.txt:243294: >>> Removing old libraries Overall this sequence fits what I expect. 
The above wording is more detailed than my earlier quick summaries. === Mark Millard marklmi at yahoo.com
Re: Should changes in src/usr.sbin/bhyve/ trigger an llvm rebuild?
On Jan 29, 2024, at 01:50, Alexander Leidinger wrote: > Am 2024-01-29 00:00, schrieb Mark Millard: > >> I would have to see make -dM output from (D) to >> find the specific timing relationships that lead >> to that. There is way to much to analyze the >> specifics manually, especially because dependency >> chains have to be considered. > > Not -stable, but -current Sequence going back to where a commit change was involved and installed/booted? That older commit was what? The newer one? The content of that change contributes to what range of "is newer than" stuff shows up in the first buildworld after the first installworld-then-reboot to the newer commit. A limiting case is doing a buildworld into an empty /usr/obj/ like area so that its later install has everything freshly built (new timestamps) compared to the prior context. Then doing a installworld buildworld sequence may have more "is newer than" notices. (Some cases of updates approximate such a "largely rebuilt" status, others do not.) The list is illustrative as is, just possibly not definitive. > (no change to src, buildworld after installworld to a new BE and booting this > new BE): > # grep newer buildworld_debug.log | grep -E 'amd64.amd64/tmp/(usr|legacy)/' | > cut -d : -f 3 | sort -u > file > '/space/system/usr_obj/space/system/usr_src/amd64.amd64/tmp/legacy/usr/include/roken.h' > is newer than the target... > file > '/space/system/usr_obj/space/system/usr_src/amd64.amd64/tmp/legacy/usr/sbin/asn1_compile' > is newer than the target... > file > '/space/system/usr_obj/space/system/usr_src/amd64.amd64/tmp/legacy/usr/sbin/awk' > is newer than the target... > file > '/space/system/usr_obj/space/system/usr_src/amd64.amd64/tmp/legacy/usr/sbin/basename' > is newer than the target... > file > '/space/system/usr_obj/space/system/usr_src/amd64.amd64/tmp/legacy/usr/sbin/cap_mkdb' > is newer than the target... > file > '/space/system/usr_obj/space/system/usr_src/amd64.amd64/tmp/legacy/usr/sbin/cat' > is newer than the target... > file > '/space/system/usr_obj/space/system/usr_src/amd64.amd64/tmp/legacy/usr/sbin/clang-tblgen' > is newer than the target... > file > '/space/system/usr_obj/space/system/usr_src/amd64.amd64/tmp/legacy/usr/sbin/compile_et' > is newer than the target... > file > '/space/system/usr_obj/space/system/usr_src/amd64.amd64/tmp/legacy/usr/sbin/cp' > is newer than the target... > file > '/space/system/usr_obj/space/system/usr_src/amd64.amd64/tmp/legacy/usr/sbin/crunchgen' > is newer than the target... > file > '/space/system/usr_obj/space/system/usr_src/amd64.amd64/tmp/legacy/usr/sbin/crunchide' > is newer than the target... > file > '/space/system/usr_obj/space/system/usr_src/amd64.amd64/tmp/legacy/usr/sbin/dd' > is newer than the target... > file > '/space/system/usr_obj/space/system/usr_src/amd64.amd64/tmp/legacy/usr/sbin/env' > is newer than the target... > file > '/space/system/usr_obj/space/system/usr_src/amd64.amd64/tmp/legacy/usr/sbin/file2c' > is newer than the target... > file > '/space/system/usr_obj/space/system/usr_src/amd64.amd64/tmp/legacy/usr/sbin/gencat' > is newer than the target... > file > '/space/system/usr_obj/space/system/usr_src/amd64.amd64/tmp/legacy/usr/sbin/grep' > is newer than the target... > file > '/space/system/usr_obj/space/system/usr_src/amd64.amd64/tmp/legacy/usr/sbin/gzip' > is newer than the target... > file > '/space/system/usr_obj/space/system/usr_src/amd64.amd64/tmp/legacy/usr/sbin/jot' > is newer than the target... 
> file > '/space/system/usr_obj/space/system/usr_src/amd64.amd64/tmp/legacy/usr/sbin/lex' > is newer than the target... > file > '/space/system/usr_obj/space/system/usr_src/amd64.amd64/tmp/legacy/usr/sbin/lldb-tblgen' > is newer than the target... > file > '/space/system/usr_obj/space/system/usr_src/amd64.amd64/tmp/legacy/usr/sbin/llvm-min-tblgen' > is newer than the target... > file > '/space/system/usr_obj/space/system/usr_src/amd64.amd64/tmp/legacy/usr/sbin/llvm-tblgen' > is newer than the target... > file > '/space/system/usr_obj/space/system/usr_src/amd64.amd64/tmp/legacy/usr/sbin/ln' > is newer than the target... > file > '/space/system/usr_obj/space/system/usr_src/amd64.amd64/tmp/legacy/usr/sbin/m4' > is newer than the target... > file > '/space/system/usr_obj/space/system/usr_src/amd64.amd64/tmp/legacy/usr/sbin/make-roken' > is newer than the target... > fi
Re: 13-STABLE high idprio load gives poor responsiveness and excessive CPU time per task
Questions include (generic list for reference, even if some have been specified): For /boot/loader.conf (for example) : What value of sysctl vm.pageout_oom_seq is in use? This indirectly adjusts the delay before sustained low free RAM leads to killing processes. Default 12 but 120 is what I use across a wide variety of systems. More is possible. For /etc/sysctl.conf : What values of sysctl vm.swap_enabled and sysctl vm.swap_idle_enabled are in use? (They work as a pair.) Together they can avoid kernel stacks being swapped out. (Processes still can page out inactive pages, but not their kernel stacks.) Processes with their kernel stacks swapped out to storage media do not run until the kernel stacks are swapped back in. Avoiding such for kernel stacks of processes involved in interacting with the system can be important to maintaining control. This is a big hammer that is not limited to such processes. Both being 0 is what leads to kernel stacks not being swapped out. For /usr/local/etc/poudriere.conf : What values of the following are in use? NO_ZFS USE_TMPFS PARALLEL_JOBS ALLOW_MAKE_JOBS MAX_EXECUTION_TIME NOHANG_TIME MAX_EXECUTION_TIME_EXTRACT MAX_EXECUTION_TIME_INSTALL MAX_EXECUTION_TIME_PACKAGE MAX_EXECUTION_TIME_DEINSTALL (Some, of course, may still have the default value so the default value would be the answer in such cases.) Also: Other system tmpfs use outside poudriere? ZFS in use in system even if poudriere has NO_ZFS set? (Such is likely uncommon but is possible.) (Other contexts than poudriere could have some analogous questions.) For /usr/local/etc/poudriere.d/make.conf (for example) : What value of the likes of MAKE_JOBS_NUMBER is in use? Note: PARALLEL_JOBS, ALLOW_MAKE_JOBS, and the likes of MAKE_JOBS_NUMBER have as context the number of hardware threads in the context. The 3 load averages (over different time frames) vs. the hardware threads for the system is relevant information. Note: with various examples of package builds that use 25+ GiBytes of temporary file space, USE_TMPFS can be highly relevant, as is the RAM space, SWAP space, and the resultant RAM+SWAP space. But just the file I/O can be relevant, even if there is no tmpfs use. There are questions like: Spinning rust media usage? (An over-specific but suggestive reference from the more general subject area.) Serial console shows a responsiveness problem? Simple ssh session over local EtherNet? Only if there is a GUI present, even if it is not being actively used? Only GUI interactions show a responsiveness problem? Going in another direction . . . I'm no ZFS tuning expert but I had performance problems that I described on the lists and the person that had increased vfs.zfs.per_txg_dirty_frees_percent had me try setting it back to vfs.zfs.per_txg_dirty_frees_percent=5 . In my context, the change was very helpful --but, to me, it was pure magic. My point is more that you may need judgments from someone with appropriate internal ZFS knowledge if you are to explore tuning ZFS. I've no evidence that the specific setting would be helpful. There has been an effort to deal with arc_prune problems/overhead. See: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=275594 === Mark Millard marklmi at yahoo.com
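For reference, a minimal sketch of the /etc/sysctl.conf entries the swap-related questions are about (the values shown are the ones discussed for keeping kernel stacks from being swapped out, not a general recommendation):

vm.swap_enabled=0
vm.swap_idle_enabled=0

and the ZFS setting mentioned near the end, should someone with the appropriate ZFS knowledge suggest trying it:

vfs.zfs.per_txg_dirty_frees_percent=5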
Re: 13-STABLE high idprio load gives poor responsiveness and excessive CPU time per task
Peter 'PMc' Much wrote on Date: Thu, 29 Feb 2024 13:40:05 UTC : > On 2024-02-27, Edward Sanford Sutton, III wrote: > > More recently looked and see top showing threads+system processes > > shows I have one core getting 100% cpu for kernel{arc_prune} which has > > 21.2 hours over a 2 hour 23 minute uptime. > > Ack. > > > I started looking to see if > > https://www.freebsd.org/security/advisories/FreeBSD-EN-23:18.openzfs.asc > > was available as a fix for 13 but it is not (and doesn't quite sound > > like it was supposed to apply to this issue). Would a kernel thread time > > at 100% cpu for only 1 core explain the system becoming unusually > > unresponsive? > > That depends. This arc_prune issue does usually go alongside with some > other kernel thread (vm-whatever) also blocking, so you have two cores > busy. How many remain? > > There is an updated patch in the PR 275594 (5 pieces), that works for > 13.3; I have it installed, and only with that I am able to build gcc12 > - otherwise the system would just OOM-crash (vm.pageout_oom_seq=5120 > does not help with this). The kernel has multiple, distinct OOM messages. Which type are you seeing? : "failed to reclaim memory" "a thread waited too long to allocate a page" "swblk or swpctrie zone exhausted" "unknown OOM reason %d" Also, but only for boot verbose: "proc %d (%s) failed to alloc page on fault, starting OOM\n" vm.pageout_oom_seq is specific to delaying just: "failed to reclaim memory" === Mark Millard marklmi at yahoo.com
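A quick way to see which of those messages a system actually produced is to look back through the kernel output (illustrative; the same messages normally also end up in /var/log/messages):

# dmesg | grep 'was killed:'
# grep 'was killed:' /var/log/messages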
Re: 13-STABLE high idprio load gives poor responsiveness and excessive CPU time per task
[I grabbed locally modify text for one of those messages.] On Feb 29, 2024, at 08:02, Mark Millard wrote: > Peter 'PMc' Much wrote on > Date: Thu, 29 Feb 2024 13:40:05 UTC : > >> On 2024-02-27, Edward Sanford Sutton, III wrote: >>> More recently looked and see top showing threads+system processes >>> shows I have one core getting 100% cpu for kernel{arc_prune} which has >>> 21.2 hours over a 2 hour 23 minute uptime. >> >> Ack. >> >>> I started looking to see if >>> https://www.freebsd.org/security/advisories/FreeBSD-EN-23:18.openzfs.asc >>> was available as a fix for 13 but it is not (and doesn't quite sound >>> like it was supposed to apply to this issue). Would a kernel thread time >>> at 100% cpu for only 1 core explain the system becoming unusually >>> unresponsive? >> >> That depends. This arc_prune issue does usually go alongside with some >> other kernel thread (vm-whatever) also blocking, so you have two cores >> busy. How many remain? >> >> There is an updated patch in the PR 275594 (5 pieces), that works for >> 13.3; I have it installed, and only with that I am able to build gcc12 >> - otherwise the system would just OOM-crash (vm.pageout_oom_seq=5120 >> does not help with this). > > The kernel has multiple, distinct OOM messages. Which type are you > seeing? : > > "failed to reclaim memory" > "a thread waited too long to allocate a page" Local text: > "swblk or swpctrie zone exhausted" Should have been: "out of swap space" > "unknown OOM reason %d" > > Also, but only for boot verbose: > > "proc %d (%s) failed to alloc page on fault, starting OOM\n" > > > > vm.pageout_oom_seq is specific to delaying just: > "failed to reclaim memory" > === Mark Millard marklmi at yahoo.com
Re: 13-STABLE high idprio load gives poor responsiveness and excessive CPU time per task
On Feb 29, 2024, at 08:21, Peter wrote: > On Thu, Feb 29, 2024 at 08:02:42AM -0800, Mark Millard wrote: > ! Peter 'PMc' Much wrote on > ! Date: Thu, 29 Feb 2024 13:40:05 UTC : > ! > ! > There is an updated patch in the PR 275594 (5 pieces), that works for > ! > 13.3; I have it installed, and only with that I am able to build gcc12 > ! > - otherwise the system would just OOM-crash (vm.pageout_oom_seq=5120 > ! > does not help with this). > ! > ! The kernel has multiple, distinct OOM messages. Which type are you > ! seeing? : > ! > ! "a thread waited too long to allocate a page" > > That one. That explains why vm.pageout_oom_seq=5120 did not make a notable difference in the time frame. If you cause a verbose boot the code: if (bootverbose) printf( "proc %d (%s) failed to alloc page on fault, starting OOM\n", curproc->p_pid, curproc->p_comm); likely will report what process had failed to get a page in a timely manner. There also is control over the criteria for this but it is more complicated. In /boot/loader.conf (I'm using defaults): # # For plenty of swap/paging space (will not # run out), avoid pageout delays leading to # Out Of Memory killing of processes: #vm.pfault_oom_attempts=-1 # # For possibly insufficient swap/paging space # (might run out), increase the pageout delay # that leads to Out Of Memory killing of # processes (showing defaults at the time): #vm.pfault_oom_attempts= 3 #vm.pfault_oom_wait= 10 # (The multiplication is the total but there # are other potential tradeoffs in the factors # multiplied, even for nearly the same total.) If you can be sure of not running out of swap/paging space, you might try vm.pfault_oom_attempts=-1 . If you do run out of swap/paging space, it would deadlock, as I understand. So, if you can tolerate that, the -1 might be an option even if you do run out of swap/paging space. I do not have specific suggestions for alternatives to 3 and 10. It would be exploratory for me if I had to try such. For reference: # sysctl -Td vm.pfault_oom_attempts vm.pfault_oom_wait vm.pfault_oom_attempts: Number of page allocation attempts in page fault handler before it triggers OOM handling vm.pfault_oom_wait: Number of seconds to wait for free pages before retrying the page fault handler === Mark Millard marklmi at yahoo.com
Re: 13-STABLE high idprio load gives poor responsiveness and excessive CPU time per task
On Feb 29, 2024, at 09:40, Mark Millard wrote: > On Feb 29, 2024, at 08:21, Peter wrote: > >> On Thu, Feb 29, 2024 at 08:02:42AM -0800, Mark Millard wrote: >> ! Peter 'PMc' Much wrote on >> ! Date: Thu, 29 Feb 2024 13:40:05 UTC : >> ! >> ! > There is an updated patch in the PR 275594 (5 pieces), that works for >> ! > 13.3; I have it installed, and only with that I am able to build gcc12 >> ! > - otherwise the system would just OOM-crash (vm.pageout_oom_seq=5120 >> ! > does not help with this). >> ! >> ! The kernel has multiple, distinct OOM messages. Which type are you >> ! seeing? : >> ! >> ! "a thread waited too long to allocate a page" >> >> That one. > > That explains why vm.pageout_oom_seq=5120 did not make a > notable difference in the time frame. > > If you cause a verbose boot the code: > > if (bootverbose) > printf( > "proc %d (%s) failed to alloc page on fault, starting OOM\n", > curproc->p_pid, curproc->p_comm); > > likely will report what process had failed to get a > page in a timely manor. > > There also is control over the criteria for this but is > is more complicated. In /boot/loader.conf (I'm using > defaults): > > # > # For plunty of swap/paging space (will not > # run out), avoid pageout delays leading to > # Out Of Memory killing of processes: > #vm.pfault_oom_attempts=-1 > # > # For possibly insufficient swap/paging space > # (might run out), increase the pageout delay > # that leads to Out Of Memory killing of > # processes (showing defaults at the time): > #vm.pfault_oom_attempts= 3 > #vm.pfault_oom_wait= 10 > # (The multiplication is the total but there > # are other potential tradoffs in the factors > # multiplied, even for nearly the same total.) > > If you can be sure of not running out of swap/paging > space, you might try vm.pfault_oom_attempts=-1 . > If you do run out of swap/paging space, it would > deadlock, as I understand. So, if you can tolerate > that the -1 might be an option even if you do run > out of swap/paging space. > > I do not have specific suggestions for alternatives > to 3 and 10. It would be exploratory for me if I had > to try such. > > For reference: > > # sysctl -Td vm.pfault_oom_attempts vm.pfault_oom_wait > vm.pfault_oom_attempts: Number of page allocation attempts in page fault > handler before it triggers OOM handling > vm.pfault_oom_wait: Number of seconds to wait for free pages before retrying > the page fault handler I'll note that vm.pageout_oom_seq , vm.pfault_oom_attempts , and vm.pfault_oom_wait are all live writable, not just boot-time tunables. In other words, all show a line of output in: # sysctl -Wd vm.pageout_oom_seq vm.pfault_oom_attempts vm.pfault_oom_wait vm.pageout_oom_seq: back-to-back calls to oom detector to start OOM vm.pfault_oom_attempts: Number of page allocation attempts in page fault handler before it triggers OOM handling vm.pfault_oom_wait: Number of seconds to wait for free pages before retrying the page fault handler Not just in: # sysctl -Td vm.pageout_oom_seq vm.pfault_oom_attempts vm.pfault_oom_wait vm.pageout_oom_seq: back-to-back calls to oom detector to start OOM vm.pfault_oom_attempts: Number of page allocation attempts in page fault handler before it triggers OOM handling vm.pfault_oom_wait: Number of seconds to wait for free pages before retrying the page fault handler (To see values, to not use the "d".) === Mark Millard marklmi at yahoo.com
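So, for example, experimenting on a live system can be as simple as the following (the values are just the ones from my earlier notes, not recommendations for any particular workload):

# sysctl vm.pageout_oom_seq=120
# sysctl vm.pfault_oom_attempts=-1
# sysctl vm.pageout_oom_seq vm.pfault_oom_attempts vm.pfault_oom_wait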
RE: FreeBSD 14-0 file swapping broken.
Artem Hevorhian wrote on Date: Sun, 09 Jun 2024 15:30:21 UTC : > I would like to report that, likely, in FreeBSD version 14.0-stable, file > swapping is broken. To confirm, here is what I tried to do and what I > achieved. In order to reproduce the problem, please follow the following > steps. > > I was following this tutorial > https://www.cyberciti.biz/faq/create-a-freebsd-swap-file/ > > I created a large swap file (8192 MiB) and saved it to /root/swap.8G.bin. > > After that, I ran > > sudo chmod 0600 /root/swap.8G.bin > > After that, I updated fstab by adding the following line to the end. > > md42 none swap sw,file=/root/swap.8G.bin 0 0 > > On running > > sudo swapon -aq > > I got the swap file working initially, and I saw it after running swapinfo. > But on reboot, it disappeared. Going in a different direction from how to enable use of swap files, consider the following. It is not FreeBSD version specific for any supported version (RELEASE or STABLE) or for main [future: 15.*] and has a long history going back into now long unsupported versions. QUOTE ( of https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=206048#c7 ) On 2017-Feb-13, at 7:20 PM, Konstantin Belousov wrote on the freebsd-arm list: . . swapfile write requires the write request to come through the filesystem write path, which might require the filesystem to allocate more memory and read some data. E.g. it is known that any ZFS write request allocates memory, and that write request on large UFS file might require allocating and reading an indirect block buffer to find the block number of the written block, if the indirect block was not yet read. As result, swapfile swapping is more prone to the trivial and unavoidable deadlocks where the pagedaemon thread, which produces free memory, needs more free memory to make a progress. Swap write on the raw partition over simple partitioning scheme directly over HBA are usually safe, while e.g. zfs over geli over umass is the worst construction. END QUOTE Summary consequence: I recommend only using swap partitions, not swap files. Yes, I have suffered deadlocks from attempted swap file use, with just UFS over umass (USB SSD) being what held the swap file in question. === Mark Millard marklmi at yahoo.com
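For the swap-partition alternative, a minimal sketch (the disk device, label, and size are illustrative and assume a GPT partitioning scheme with free space available):

# gpart add -t freebsd-swap -l swap0 -s 8g ada0
# swapon /dev/gpt/swap0

with the matching /etc/fstab line being:

/dev/gpt/swap0  none  swap  sw  0  0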
RE: New FreeBSD snapshots available: stable/14 (20240606 e77813f7e4a3) [ bad stable/14 info for 21 Jun 2024, empty snapshots/ISO-IMAGES/14.1/ ]
Looking at: https://lists.freebsd.org/archives/freebsd-snapshots/2024-June/000419.html ( Date: Fri, 21 Jun 2024 00:42:00 UTC ) and at: https://lists.freebsd.org/archives/freebsd-snapshots/2024-June/000414.html ( Date: Fri, 07 Jun 2024 00:37:56 UTC ) they both indicate: 20240606 e77813f7e4a3 Also: http://ftp3.freebsd.org/pub/FreeBSD/snapshots/ISO-IMAGES/14.1/ is empty. This prevents me from suggesting a test of whether a bug report is reproducible from an official stable/14 snapshot instead of just from someone's personal build of stable/14 (for a RPi3B failure context). === Mark Millard marklmi at yahoo.com