Geli encryption issue on r362779

2020-06-30 Thread Thomas Laus
This is a repost to this list because my submission email address did
not match the one on record.

==
List:

I just upgraded a couple of computers from r362220 to r362779 and have a
boot issue on my Core2 Duo laptop when unlocking the encrypted partition
with the passphrase.  If I type in the correct passphrase, my laptop
reboots.  If I type in an incorrect one, it prompts me to enter another
one.  On my i5 Skylake desktop, everything works as expected.  I had to
copy an older 'gptzfsboot' file from a distribution CD to allow me to
boot my laptop.

The Core2 Duo doesn't have the hardware encryption feature and falls
back to software AES instead of AESNI.  The i5 has the hardware encryption
feature.  Once I copied the old 'gptzfsboot' file to my laptop, it boots OK.

Tom


-- 
Public Keys:
PGP KeyID = 0x5F22FDC1
GnuPG KeyID = 0x620836CF
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: Geli encryption issue on r362779

2020-06-30 Thread Toomas Soome


> On 30. Jun 2020, at 15:01, Thomas Laus wrote:
>
> This is a repost to this list because my submission email address did
> not match the one on record.
>
> ==
> List:
>
> I just upgraded a couple of computers from r362220 to r362779 and have a
> boot issue on my Core2 Duo laptop when unlocking the encrypted partition
> with the passphrase.  If I type in the correct passphrase, my laptop
> reboots.  If I type in an incorrect one, it prompts me to enter another
> one.  On my i5 Skylake desktop, everything works as expected.  I had to
> copy an older 'gptzfsboot' file from a distribution CD to allow me to
> boot my laptop.
>
> The Core2 Duo doesn't have the hardware encryption feature and falls
> back to software AES instead of AESNI.  The i5 has the hardware encryption
> feature.  Once I copied the old 'gptzfsboot' file to my laptop, it boots OK.
> 
> Tom
> 

The boot bits do not use hardware encryption at all, so it must be something
else. Unfortunately, testing it is a somewhat annoying and time-consuming task,
but we should go through it if we want to get to the bottom of the issue. That
means we need to insert printf calls into the code to see how far we get and
why we end up with a reboot. Please let me know when you can spend time on
testing.

rgds,
toomas
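
The printf tracing described above could look roughly like the sketch
below. It is a userland illustration only, not code from gptzfsboot: the
TRACE macro, the unlock_and_boot function and the stage labels are all
hypothetical, and in the real boot blocks printf would come from libsa
rather than <stdio.h>.

```c
#include <stdio.h>

/*
 * Hypothetical checkpoint macro (not part of the FreeBSD boot code).
 * Prints the source location and a short label for each stage reached.
 */
#define	TRACE(label) \
	printf("TRACE %s:%d: %s\n", __FILE__, __LINE__, (label))

/*
 * Stand-in for the unlock-and-boot path. Returns the number of
 * checkpoints that were passed before returning.
 */
static int
unlock_and_boot(int passphrase_ok)
{
	int reached = 0;

	TRACE("passphrase prompt done");
	reached++;
	if (!passphrase_ok)
		return (reached);	/* wrong passphrase: prompt again */

	TRACE("key accepted, attaching provider");
	reached++;

	TRACE("reading pool labels");
	reached++;

	return (reached);
}
```

The last checkpoint printed before the machine reboots bounds where the
failure happens; moving the checkpoints closer together on each test run
narrows it further.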


Re: Geli encryption issue on r362779

2020-06-30 Thread Thomas Laus
On 2020-06-30 11:31, Toomas Soome wrote:
> 
> hi!
> 
> 362431: https://svnweb.freebsd.org/base?view=revision&revision=362431
> 
> The majority of the code is now shared with loader (libsa/libi386);
> however, we do have some bits in zfsboot.c, which is the common part of
> gptzfsboot and zfsboot.
>
That looks like the problem revision.  My i5 can still boot this
revision, but my Core2 Duo cannot.  At least this narrows things a bit.

Tom

-- 
Public Keys:
PGP KeyID = 0x5F22FDC1
GnuPG KeyID = 0x620836CF


Re: r358252 causes intermittent hangs where processes are stuck sleeping on btalloc

2020-06-30 Thread Ryan Libby
On Sun, Jun 28, 2020 at 9:57 PM Rick Macklem wrote:
>
> Just in case you were waiting for another email, I have now run several
> cycles of the kernel build over NFS on a recent head kernel with the
> one line change and it has not hung.
>
> I don't know if this is the correct fix, but it would be nice to get something
> into head to fix this.
>
> If I don't hear anything in the next few days, I'll put it in a PR so it
> doesn't get forgotten.
>
> rick

Thanks for the follow through on this.

I think the patch is not complete.  It looks like the problem is that
for systems that do not have UMA_MD_SMALL_ALLOC, we do
uma_zone_set_allocf(vmem_bt_zone, vmem_bt_alloc);
but we haven't set an appropriate free function.  This is probably why
UMA_ZONE_NOFREE was originally there.  When NOFREE was removed, it was
appropriate for systems with uma_small_alloc.

So by default we get page_free as our free function.  That calls
kmem_free, which calls vmem_free ... but we do our allocs with
vmem_xalloc.  I'm not positive, but I think the problem is that in
effect we vmem_xalloc -> vmem_free, not vmem_xfree.

Three possible fixes:
 1: The one you tested, but this is not best for systems with
uma_small_alloc.
 2: Pass UMA_ZONE_NOFREE conditional on UMA_MD_SMALL_ALLOC.
 3: Actually provide an appropriate vmem_bt_free function.

I think we should just do option 2 with a comment; it's simple and it's
what we used to do.  I'm not sure how much benefit we would see from
option 3, and it's more work.

Ryan
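
Option 2 above might be sketched as follows. This is a userland
illustration under stated assumptions, not the committed fix: the flag
values are placeholders (the real definitions are in sys/vm/uma.h), and
vmem_bt_zone_flags is a hypothetical helper. In the kernel the resulting
flags would be passed to uma_zcreate() when the vmem btag zone is created.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Placeholder flag values for illustration only; the real definitions
 * live in sys/vm/uma.h and may differ.
 */
#define	UMA_ZONE_VM	0x0001u
#define	UMA_ZONE_NOFREE	0x0002u

/*
 * Hypothetical helper: flags for the "vmem btag" zone under option 2.
 *
 * Without UMA_MD_SMALL_ALLOC the zone uses a custom alloc function
 * (vmem_bt_alloc) but no matching free function, so slabs must never
 * be freed back. With UMA_MD_SMALL_ALLOC the default alloc/free pair
 * is used and NOFREE is not needed.
 */
static uint32_t
vmem_bt_zone_flags(void)
{
	uint32_t flags = UMA_ZONE_VM;

#ifndef UMA_MD_SMALL_ALLOC
	flags |= UMA_ZONE_NOFREE;
#endif
	assert((flags & UMA_ZONE_VM) != 0);
	return (flags);
}
```

Built without UMA_MD_SMALL_ALLOC defined, the helper returns
UMA_ZONE_VM | UMA_ZONE_NOFREE, which matches the one-line change that
was tested.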

>
> 
> From: owner-freebsd-curr...@freebsd.org on behalf of Rick Macklem
> Sent: Thursday, June 18, 2020 11:42 PM
> To: Ryan Libby
> Cc: Konstantin Belousov; Jeff Roberson; freebsd-current@freebsd.org
> Subject: Re: r358252 causes intermittent hangs where processes are stuck sleeping on btalloc
>
> Ryan Libby wrote:
> >On Mon, Jun 15, 2020 at 5:06 PM Rick Macklem wrote:
> >>
> >> Rick Macklem wrote:
> >> >r358098 will hang fairly easily, in 1-3 cycles of the kernel build over NFS.
> >> >I thought this was the culprit, since I did 6 cycles of r358097 without a hang.
> >> >However, I just got a hang with r358097, but it looks rather different.
> >> >The r358097 hang did not have any processes sleeping on btalloc. They
> >> >appeared to be waiting on two different locks in the buffer cache.
> >> >As such, I think it might be a different problem. (I'll admit I should have
> >> >made notes about this one before rebooting, but I was frustrated that
> >> >it happened and rebooted before looking at it in much detail.)
> >> Ok, so I did 10 cycles of the kernel build over NFS for r358096 and never
> >> got a hang.
> >> --> It seems that r358097 is the culprit and r358098 makes it easier
> >>   to reproduce.
> >>   --> Basically runs out of kernel memory.
> >>
> >> It is not obvious if I can revert these two commits without reverting
> >> other ones, since there were a bunch of vm changes after these.
> >>
> >> I'll take a look, but if you guys have any ideas on how to fix this, please
> >> let me know.
> >>
> >> Thanks, rick
> >
> >Interesting.  Could you try re-adding UMA_ZONE_NOFREE to the vmem btag
> >zone to see if that rescues it, on whatever base revision gets you a
> >reliable repro?
> Good catch! That seems to fix it. I've done 8 cycles of kernel build over
> NFS without a hang (normally I'd get one in the first 1-3 cycles).
>
> I don't know if the intent was to delete UMA_ZONE_VM and r358097
> had a typo in it and deleted UMA_ZONE_NOFREE instead, or ???
>
> Anyhow, I just put it back to UMA_ZONE_VM | UMA_ZONE_NOFREE and
> the hangs seem to have gone away.
>
> The small patch I did is attached, in case that isn't what you meant.
>
> I'll run a few more cycles just in case, but I think this fixes it.
>
> Thanks, rick
>
> >
> > Jeff, to fill you in, I have been getting intermittent hangs on a Pentium 4
> > (single core i386) with 1.25Gbytes ram when doing kernel builds using
> > head kernels from this winter. (I also saw one when doing a kernel build
> > on UFS, so they aren't NFS specific, although easier to reproduce that way.)
> > After a typical hang, there will be a bunch of processes sleeping on "btalloc"
> > and several processes holding the following lock:
> > exclusive sx lock @ vm/vm_map.c:4761
> > - I have seen hangs where that is the only lock held by any process except
> >   the interrupt thread.
> > - I have also seen processes waiting on the following locks:
> > kern/subr_vmem.c:1343
> > kern/subr_vmem.c:633
> >
> > I can't be absolutely sure r358098 is the culprit, but it seems to make the
> > problem more reproducible.
> >
> > If anyone has a patch suggestion, I can test it.
> > Otherwise, I will continue to test r358097 and earlier, to try and see what hangs
> > occur. (I've done 8 cycles of testing of r356776 without difficulties, but that
> > doesn't guarantee it isn't broken.)
> >
> > There is a bunch more of

Re: Geli encryption issue on r362779

2020-06-30 Thread Toomas Soome



> On 30. Jun 2020, at 20:19, Thomas Laus wrote:
> 
> On 2020-06-30 11:31, Toomas Soome wrote:
>> 
>> hi!
>> 
>> 362431: https://svnweb.freebsd.org/base?view=revision&revision=362431
>> 
>> The majority of the code is now shared with loader (libsa/libi386);
>> however, we do have some bits in zfsboot.c, which is the common part of
>> gptzfsboot and zfsboot.
>>
> That looks like the problem revision.  My i5 can still boot this
> revision, but my Core2 Duo cannot.  At least this narrows things a bit.
> 
> Tom
> 

Yes, that was the suspect from the start. But let's see where this rabbit hole goes.

rgds,
toomas
