from:"Jason A. Donenfeld"

Re: [PATCH] hw/microblaze: pass random seed to fdt

2022-07-21 Thread Jason A. Donenfeld

Hey Edgar,

On Wed, Jul 20, 2022 at 9:13 AM Edgar E. Iglesias
 wrote:
>
>
> On Tue, Jul 19, 2022 at 2:23 PM Jason A. Donenfeld  wrote:
>>
>> If the FDT contains /chosen/rng-seed, then the Linux RNG will use it to
>> initialize early. Set this using the usual guest random number
>> generation function. This FDT node is part of the DT specification.
>
>
> Reviewed-by: Edgar E. Iglesias 

Thanks for looking at this. Paolo (CC'd) just sent a few similar
changes for different archs in a pull, but not this one, on the
supposition that you'd roll this into your next pull. If this isn't
the case, please pipe up so Paolo can take it instead later.

Jason

Re: [PULL 7/9] hw/guest-loader: pass random seed to fdt

2022-07-21 Thread Jason A. Donenfeld

Hi Alex,

On Thu, Jul 21, 2022 at 08:36:10PM +0100, Alex Bennée wrote:
> 
> Paolo Bonzini  writes:
> 
> > From: "Jason A. Donenfeld" 
> >
> > If the FDT contains /chosen/rng-seed, then the Linux RNG will use it to
> > initialize early. Set this using the usual guest random number
> > generation function. This FDT node is part of the DT specification.
> >
> > Cc: Alex Bennée 
> > Signed-off-by: Jason A. Donenfeld 
> > Message-Id: <20220719121559.135355-1-ja...@zx2c4.com>
> > Signed-off-by: Paolo Bonzini 
> > ---
> >  hw/core/guest-loader.c | 5 +
> >  1 file changed, 5 insertions(+)
> >
> > diff --git a/hw/core/guest-loader.c b/hw/core/guest-loader.c
> > index 391c875a29..4f8572693c 100644
> > --- a/hw/core/guest-loader.c
> > +++ b/hw/core/guest-loader.c
> > @@ -31,6 +31,7 @@
> >  #include "hw/qdev-properties.h"
> >  #include "qapi/error.h"
> >  #include "qemu/module.h"
> > +#include "qemu/guest-random.h"
> >  #include "guest-loader.h"
> >  #include "sysemu/device_tree.h"
> >  #include "hw/boards.h"
> > @@ -46,6 +47,7 @@ static void loader_insert_platform_data(GuestLoaderState 
> > *s, int size,
> >  g_autofree char *node = g_strdup_printf("/chosen/module@0x%08" PRIx64,
> >  s->addr);
> >  uint64_t reg_attr[2] = {cpu_to_be64(s->addr), cpu_to_be64(size)};
> > +uint8_t rng_seed[32];
> >  
> >  if (!fdt) {
> >  error_setg(errp, "Cannot modify FDT fields if the machine has 
> > none");
> > @@ -55,6 +57,9 @@ static void loader_insert_platform_data(GuestLoaderState 
> > *s, int size,
> >  qemu_fdt_add_subnode(fdt, node);
> >  qemu_fdt_setprop(fdt, node, "reg", ®_attr, sizeof(reg_attr));
> >  
> > +qemu_guest_getrandom_nofail(rng_seed, sizeof(rng_seed));
> > +qemu_fdt_setprop(fdt, node, "rng-seed", rng_seed, sizeof(rng_seed));
> > +
> 
> Why are we inserting this here? The guest-loader is only building on
> what the machine type has already constructed which in the case of -M
> virt for riscv and ARM already has code for this.

Wish you would have replied to the list patch before Paolo queued it.

I don't know this mechanism well but I assumed it would pass a unique
seed to each chained loader. Let me know if I'm misunderstanding the
purpose; I have no qualms about dropping this patch.

Jason

Re: [PULL 7/9] hw/guest-loader: pass random seed to fdt

2022-07-22 Thread Jason A. Donenfeld

Hey Alex,

On Fri, Jul 22, 2022 at 10:45:19AM +0100, Alex Bennée wrote:
> All the guest-loader does is add the information about where in memory a
> guest and/or it's initrd have been placed in memory to the DTB. It's
> entirely up to the initial booted code (usually a hypervisor in this
> case) to decide what gets passed up the chain to any subsequent guests.

I think that's also my understanding, but let me tell you what I was
thinking with regards to rng-seed there, and you can tell me if I'm way
off.

The guest-loader puts in memory various loaders in a multistage boot.
Let's call it stage0, stage1, stage2, and finally the kernel. Normally,
rng-seed is only given to one of these stages. That stage may or may not
pass it to the next one, and it most probably does not. And why should
it? The host is in a better position to generate these seeds, rather
than adding finicky and fragile crypto ratcheting code into each stage
bootloader. So, instead, QEMU can just give each stage its own seed, for
it to do whatever with. This way, if stage1 does nothing, at least
there's a fresh unused one available for the kernel when it finally gets
there.

Does what I describe correspond at all with the use of guest-loader? If
so, maybe this patch should stay? If not, discard it as rubbish.

Jason

Re: [PULL 7/9] hw/guest-loader: pass random seed to fdt

2022-07-22 Thread Jason A. Donenfeld

Hi Paolo,

On Fri, Jul 22, 2022 at 02:04:45PM +0200, Paolo Bonzini wrote:
> On 7/21/22 22:20, Jason A. Donenfeld wrote:
> >> Why are we inserting this here? The guest-loader is only building on
> >> what the machine type has already constructed which in the case of -M
> >> virt for riscv and ARM already has code for this.
> > 
> > Wish you would have replied to the list patch before Paolo queued it.
> 
> Come on.
> 
> You posted a couple patches for this work 1 week before soft freeze 
> (which is when maintainer trees should be ready for merge), so that some 
> platforms get the support and some don't depending on how ready they 
> are for the freeze itself.
> 
> Then you post the rest of the implementation on the day of the freeze. 
> This patch has a pretty bad commit message too because any discussion on 
> boot loader chaining belonged there.
> 
> Your own timing was completely off, and the right thing to do would have 
> been to post a single series for all machines.  This way, even if the 
> patches were to go via individual trees, maintainers could coordinate on 
> which version to include, on how to handle migration, and so on.
> 
> Imagine doing the same thing for Linux, you'd be either ignored until 
> the merge window ends, or alternatively shouted at.  Ignoring patches 
> sent so close the soft freeze was my first instinct and it would have 
> been the right call, except that in the meanwhile some architecture had 
> their patches merged and here we are.
> 
> If anything _I_ have to apologize to Alex for picking up the patch in 
> his stead, and for bending the soft freeze rules in an attempt to avoid 
> having half-assed support where some architectures export the seed and 
> some don't.  But you really have no standing to complain to him about 
> not replying timely.

Please simmer down and quit the inane drama.

I don't have any qualms about Alex not replying in the two days before
you sent this pull. What I wish is that this was discussed on the list
before the pull so that we're now not in this awkward situation of
patch review inside of a pull. I don't know the procedures on what
happens now with that. Will this get pulled and now we have to revert?
Do you have to roll a new pull? I just have no idea, as this is all a
new thing for me. So my comment is more about the awkward state of
things than about some kind of failure from Alex. Obviously Alex is fine
here.

Your comments about my timing are also completely unjustified,
ridiculous, and actually a tad bit offensive. For the "high profile"
archs that I really wanted to hit 7.1, I started sending in DTB patches
a good deal of time ago. The only big arch I really wanted to hit 7.1
that wasn't queued up was the i386 patch, which I first posted in June.
Anyway, after it became clear that the i386 work was finally going to be
picked up, I breathed easy and decided to send in patches for the
remaining archs, to be picked up whenever. It was *your* decision that
all the DTB archs get in at the same time, hence picking this up; I had
no particular feelings on it, particularly as I don't know how to test
those remaining architectures like I did with the others. Anyway,
timing-wise, in my own planning, I handled risc-v, or1k, ppc, arm, i386,
and m68k well in advance and have been itching every single day since
posting those patches for them to be queued up somewhere.

So I really find your whole email just obnoxious and unnecessary. I've
been spending time trying to get the rng-seed stuff working on QEMU.
It's been a bit of a learning curve trying to figure out the QEMU
development model, and so I've miss-CC'd a few patches here and there.
But I've definitely tried to get an important subset of those patches in
in a timely manner. As a maintainer, you're definitely having the effect
of turning me off of the project rather than trying to acquaint me with
norms or be helpful.

Please, quit the drama. Enough of this stuff.

Jason

Re: [PULL 7/9] hw/guest-loader: pass random seed to fdt

2022-07-22 Thread Jason A. Donenfeld

Hi Alex,

On Fri, Jul 22, 2022 at 4:37 PM Alex Bennée  wrote:
> That sounds suspiciously like inventing a new ABI between QEMU and
> guests which we generally try to avoid.

Well the ABI is just the "rng-seed" param which is part of the DT
spec. But I can understand why you might find this use a bit "too
creative". So no qualms about dropping it.

Jason

Re: [PATCH v3] target/s390x: support PRNO_TRNG instruction

2022-07-26 Thread Jason A. Donenfeld

Hey David,

On Wed, Jul 20, 2022 at 08:41:48PM +0200, David Hildenbrand wrote:
> I did not review the doc in detail once again, maybe I get to that later
> this week.

Did you ever get around to merging this patch? Is it in some tree
somewhere?

Jason

Re: [PATCH] hw/microblaze: pass random seed to fdt

2022-07-26 Thread Jason A. Donenfeld

Hi Edgar,

On Thu, Jul 21, 2022 at 8:43 PM Edgar E. Iglesias
 wrote:
> Ah OK, Paolo, it would be great if you would take this via your tree!

It looks like Paolo never did this. So you might want to queue this
somewhere, or bug him to take it, or something. I don't know how this
works with 7.1-rc0 just being tagged, but I assume this means this has
to wait until 7.2

Jason

Re: [PATCH] hw/microblaze: pass random seed to fdt

2022-07-27 Thread Jason A. Donenfeld

Hi Richard,

On Tue, Jul 26, 2022 at 08:13:09PM -0700, Richard Henderson wrote:
> On 7/26/22 18:49, Jason A. Donenfeld wrote:
> > Hi Edgar,
> > 
> > On Thu, Jul 21, 2022 at 8:43 PM Edgar E. Iglesias
> >  wrote:
> >> Ah OK, Paolo, it would be great if you would take this via your tree!
> > 
> > It looks like Paolo never did this. So you might want to queue this
> > somewhere, or bug him to take it, or something. I don't know how this
> > works with 7.1-rc0 just being tagged, but I assume this means this has
> > to wait until 7.2
> 
> Yes, it has missed the window by over a week now: soft freeze.
> You really should have kept all of these in one thread.

Shoulda woulda coulda. It doesn't really matter to me much when the
microblaze one goes in, just that it can exist in some maintainer tree.
I'd like to be able to remove this from my "pending patch list" that I
have to monitor lest things be forgotten.

Jason

Re: [PATCH v3] target/s390x: support PRNO_TRNG instruction

2022-07-27 Thread Jason A. Donenfeld

Hey Thomas,

On Wed, Jul 27, 2022 at 08:32:22AM +0200, Thomas Huth wrote:
> On 27/07/2022 03.35, Jason A. Donenfeld wrote:
> > Hey David,
> > 
> > On Wed, Jul 20, 2022 at 08:41:48PM +0200, David Hildenbrand wrote:
> >> I did not review the doc in detail once again, maybe I get to that later
> >> this week.
> > 
> > Did you ever get around to merging this patch? Is it in some tree
> > somewhere?
> 
> QEMU is in the freeze phase now, so new feature won't be merged before the 
> next release, see:

Yea, I understand, that's fine.

> Maybe we could use the time to implement the missing SHA512 stuff to avoid 
> having an inconsistency between the Principles of Operation and the emulated 
> machine in QEMU?

Ooof. You're not /wrong/ of course. This actually makes
a lot of sense. But I was hoping to somehow skip out on this part,
because I don't know much about s390 and wiring up the handlers seems
finicky. But I can learn!

Actually, though, any interest in working together on this? I can work
on the crypto-side of things, fashioning a minimal sha512 implementation
that's small enough it can fit in crypto_helper.c with support for the
incremental block state stuff s390 needs, and then you can work on
wiring in all the instructions and telling me what semantics you need
from the crypto.

Interested? (Offer of working together goes out to David too of course.)
If so, maybe poke me on IRC? I'm zx2c4 on the various networks.

Jason

Re: [PULL 9/9] hw/i386: pass RNG seed via setup_data entry

2022-08-02 Thread Jason A. Donenfeld

Hi,

On Tue, Aug 02, 2022 at 11:28:15AM +0800, Xiaoyao Li wrote:
> >   static void pc_q35_7_0_machine_options(MachineClass *m)
> >   {
> > +PCMachineClass *pcmc = PC_MACHINE_CLASS(m);
> >   pc_q35_7_1_machine_options(m);
> >   m->alias = NULL;
> > +pcmc->legacy_no_rng_seed = true;
> 
> Is making .legacy_no_rng_seed default false and opt-in it for old 
> machines correct?

Not sure I follow what you're saying. ≤7.0 won't pass the RNG seed. It's
only on ≥7.1 that the RNG seed is used.

Either way, this shouldn't cause boot failures.

Jason

Re: [PULL 9/9] hw/i386: pass RNG seed via setup_data entry

2022-08-02 Thread Jason A. Donenfeld

Hi Xiaoyao,

On Tue, Aug 02, 2022 at 10:53:07PM +0800, Xiaoyao Li wrote:
> yes, with >= 7.1, pcmc->legacy_no_rng_seed = false by default, and RNG 
> seed is used.

This is intended behavior. Being on by default is basically the whole
point of it. Otherwise it's useless.

> 
> > Either way, this shouldn't cause boot failures.
> 
> It does fail booting OVMF with #PF. Below diff can fix the #PF for me.

Huh, interesting. Sounds like maybe there's a bug I need to fix. Can you
send me some repro instructions, and I'll look into it right away.

Jason

Re: [PULL 9/9] hw/i386: pass RNG seed via setup_data entry

2022-08-02 Thread Jason A. Donenfeld

Hi Xiaoyao,

On Tue, Aug 2, 2022 at 5:06 PM Jason A. Donenfeld  wrote:
>
> Hi Xiaoyao,
>
> On Tue, Aug 02, 2022 at 10:53:07PM +0800, Xiaoyao Li wrote:
> > yes, with >= 7.1, pcmc->legacy_no_rng_seed = false by default, and RNG
> > seed is used.
>
> This is intended behavior. Being on by default is basically the whole
> point of it. Otherwise it's useless.
>
> >
> > > Either way, this shouldn't cause boot failures.
> >
> > It does fail booting OVMF with #PF. Below diff can fix the #PF for me.
>
> Huh, interesting. Sounds like maybe there's a bug I need to fix. Can you
> send me some repro instructions, and I'll look into it right away.

I just tried booting Fedora using OVMF and didn't have any problems. I
used this command line:

qemu-system-x86_64 -machine q35 -enable-kvm -cpu host,-rdrand,-rdseed
-smp cores=8 -drive file=disk.qcow2,if=virtio -net nic,model=virtio
-net user,hostfwd=tcp::19230-:22 -m 8G -vga qxl -device
virtio-serial-pci -device
virtserialport,chardev=spicechannel0,name=com.redhat.spice.
0 -chardev spicevmc,id=spicechannel0,name=vdagent -spice
unix,addr=/tmp/vm_spice_fedora.socket,disable-ticketing,playback-compression=off,agen
t-mouse=on,seamless-migration,gl=on -device
virtserialport,chardev=spicechannel1,name=org.spice-space.webdav.0
-chardev spiceport,id=spicechan
nel1,name=org.spice-space.webdav.0 -global
driver=cfi.pflash01,property=secure,value=on -drive
if=pflash,format=raw,unit=0,file=OVMF_CODE.secb
oot.fd,readonly=on -drive if=pflash,format=raw,file=OVMF_VARS.secboot.fd

Can you tell me what you're using and give me some links with various
images and such? Doing the straight forward thing doesn't reproduce it
for me.

Thanks,
Jason

Re: [PATCH v3] target/s390x: support PRNO_TRNG instruction

2022-08-02 Thread Jason A. Donenfeld

Hi David, Christian,

While this thread has your attention, I thought I'd reiterate my offer in:
https://lore.kernel.org/qemu-devel/yueouwzdzbqff...@zx2c4.com/

Do either of you want to "take ownership" of this patch to bring it
past the finish line, and I can provide whatever additional crypto
code you need for that?

Jason

Re: [PATCH v3] target/s390x: support PRNO_TRNG instruction

2022-08-02 Thread Jason A. Donenfeld

On Wed, Jul 20, 2022 at 02:08:59PM +0200, Jason A. Donenfeld wrote:
> +case 114:
> +if (r1 & 1 || !r1 || r2 & 1 || !r2)
> +tcg_s390_program_interrupt(env, PGM_SPECIFICATION, ra);

This is already handled in op_msa. I'm going to remove it for v4.

Re: [PATCH v3] target/s390x: support PRNO_TRNG instruction

2022-08-02 Thread Jason A. Donenfeld

On Tue, Aug 02, 2022 at 05:32:26PM +0200, David Hildenbrand wrote:
> On 02.08.22 17:28, Jason A. Donenfeld wrote:
> > Hi David, Christian,
> > 
> > While this thread has your attention, I thought I'd reiterate my offer in:
> > https://lore.kernel.org/qemu-devel/yueouwzdzbqff...@zx2c4.com/
> > 
> > Do either of you want to "take ownership" of this patch to bring it
> > past the finish line, and I can provide whatever additional crypto
> > code you need for that?
> 
> For me the patch is good enough. But sure, having a SHA512
> implementation would be nice ...
> 
> Long story short, I'll wire up whatever crypto stuff you can come up with ;)

Long story short, I started to take you up on that offer, but because I
am an insane person, before I knew it, the whole thing was done... Patch
series incoming.

Jason

[PATCH v4 0/2] MSA EXT 5 for s390x

2022-08-02 Thread Jason A. Donenfeld

In addition to the prior TRNG patch from v3, this v4 adds SHA-512
support.

I know, I know, I know -- I fussed around asking if somebody would help
me implement this because it was "oh so hard", and offered to do the
crypto part if someone would do the rest. But then once I had the crypto
part, I wanted some way to test it and then... and then the
implementation worked and passed the test vectors.

So now these two patches together implement MSA EXT 5, and appear to be
working with Linux's drivers for it.

Cc: Thomas Huth 
Cc: David Hildenbrand 
Cc: Christian Borntraeger 
Cc: Richard Henderson 
Cc: Cornelia Huck 
Cc: Harald Freudenberger 
Cc: Holger Dengler 

Jason A. Donenfeld (2):
  target/s390x: support PRNO_TRNG instruction
  target/s390x: support SHA-512 extensions

 target/s390x/gen-features.c  |   4 +
 target/s390x/tcg/crypto_helper.c | 146 +++
 2 files changed, 150 insertions(+)

-- 
2.35.1

[PATCH v4 1/2] target/s390x: support PRNO_TRNG instruction

2022-08-02 Thread Jason A. Donenfeld

In order for hosts running inside of TCG to initialize the kernel's
random number generator, we should support the PRNO_TRNG instruction,
backed in the usual way with the qemu_guest_getrandom helper. This is
confirmed working on Linux 5.19.

Cc: Thomas Huth 
Cc: David Hildenbrand 
Cc: Christian Borntraeger 
Cc: Richard Henderson 
Cc: Cornelia Huck 
Cc: Harald Freudenberger 
Cc: Holger Dengler 
Signed-off-by: Jason A. Donenfeld 
---
 target/s390x/gen-features.c  |  2 ++
 target/s390x/tcg/crypto_helper.c | 30 ++
 2 files changed, 32 insertions(+)

diff --git a/target/s390x/gen-features.c b/target/s390x/gen-features.c
index ad140184b9..3d333e2789 100644
--- a/target/s390x/gen-features.c
+++ b/target/s390x/gen-features.c
@@ -749,6 +749,8 @@ static uint16_t qemu_V7_0[] = {
  */
 static uint16_t qemu_MAX[] = {
 S390_FEAT_VECTOR_ENH2,
+S390_FEAT_MSA_EXT_5,
+S390_FEAT_PRNO_TRNG,
 };
 
 /** END FEATURE DEFS **/
diff --git a/target/s390x/tcg/crypto_helper.c b/target/s390x/tcg/crypto_helper.c
index 138d9e7ad9..8ad4ef1ace 100644
--- a/target/s390x/tcg/crypto_helper.c
+++ b/target/s390x/tcg/crypto_helper.c
@@ -12,12 +12,38 @@
 
 #include "qemu/osdep.h"
 #include "qemu/main-loop.h"
+#include "qemu/guest-random.h"
 #include "s390x-internal.h"
 #include "tcg_s390x.h"
 #include "exec/helper-proto.h"
 #include "exec/exec-all.h"
 #include "exec/cpu_ldst.h"
 
+static void fill_buf_random(CPUS390XState *env, uintptr_t ra,
+uint64_t *buf_reg, uint64_t *len_reg)
+{
+uint8_t tmp[256];
+uint64_t len = *len_reg;
+int reg_len = 64;
+
+if (!(env->psw.mask & PSW_MASK_64)) {
+len = (uint32_t)len;
+reg_len = (env->psw.mask & PSW_MASK_32) ? 32 : 24;
+}
+
+while (len) {
+size_t block = MIN(len, sizeof(tmp));
+
+qemu_guest_getrandom_nofail(tmp, block);
+for (size_t i = 0; i < block; ++i) {
+cpu_stb_data_ra(env, wrap_address(env, *buf_reg), tmp[i], ra);
+*buf_reg = deposit64(*buf_reg, 0, reg_len, *buf_reg + 1);
+--*len_reg;
+}
+len -= block;
+}
+}
+
 uint32_t HELPER(msa)(CPUS390XState *env, uint32_t r1, uint32_t r2, uint32_t r3,
  uint32_t type)
 {
@@ -52,6 +78,10 @@ uint32_t HELPER(msa)(CPUS390XState *env, uint32_t r1, 
uint32_t r2, uint32_t r3,
 cpu_stb_data_ra(env, param_addr, subfunc[i], ra);
 }
 break;
+case 114: /* CPACF_PRNO_TRNG */
+fill_buf_random(env, ra, &env->regs[r1], &env->regs[r1 + 1]);
+fill_buf_random(env, ra, &env->regs[r2], &env->regs[r2 + 1]);
+break;
 default:
 /* we don't implement any other subfunction yet */
 g_assert_not_reached();
-- 
2.35.1

[PATCH v4 2/2] target/s390x: support SHA-512 extensions

2022-08-02 Thread Jason A. Donenfeld

In order to fully support MSA_EXT_5, we have to also support the SHA-512
special instructions. So implement those.

The implementation began as something TweetNacl-like, and then was
adjusted to be useful here. It's not very beautiful, but it is quite
short and compact, which is what we're going for.

Cc: Thomas Huth 
Cc: David Hildenbrand 
Cc: Christian Borntraeger 
Cc: Richard Henderson 
Cc: Cornelia Huck 
Cc: Harald Freudenberger 
Cc: Holger Dengler 
Signed-off-by: Jason A. Donenfeld 
---
 target/s390x/gen-features.c  |   2 +
 target/s390x/tcg/crypto_helper.c | 116 +++
 2 files changed, 118 insertions(+)

diff --git a/target/s390x/gen-features.c b/target/s390x/gen-features.c
index 3d333e2789..b6d804fa6d 100644
--- a/target/s390x/gen-features.c
+++ b/target/s390x/gen-features.c
@@ -751,6 +751,8 @@ static uint16_t qemu_MAX[] = {
 S390_FEAT_VECTOR_ENH2,
 S390_FEAT_MSA_EXT_5,
 S390_FEAT_PRNO_TRNG,
+S390_FEAT_KIMD_SHA_512,
+S390_FEAT_KLMD_SHA_512,
 };
 
 /** END FEATURE DEFS **/
diff --git a/target/s390x/tcg/crypto_helper.c b/target/s390x/tcg/crypto_helper.c
index 8ad4ef1ace..475627aa83 100644
--- a/target/s390x/tcg/crypto_helper.c
+++ b/target/s390x/tcg/crypto_helper.c
@@ -19,6 +19,112 @@
 #include "exec/exec-all.h"
 #include "exec/cpu_ldst.h"
 
+static uint64_t R(uint64_t x, int c) { return (x >> c) | (x << (64 - c)); }
+static uint64_t Ch(uint64_t x, uint64_t y, uint64_t z) { return (x & y) ^ (~x 
& z); }
+static uint64_t Maj(uint64_t x, uint64_t y, uint64_t z) { return (x & y) ^ (x 
& z) ^ (y & z); }
+static uint64_t Sigma0(uint64_t x) { return R(x, 28) ^ R(x, 34) ^ R(x, 39); }
+static uint64_t Sigma1(uint64_t x) { return R(x, 14) ^ R(x, 18) ^ R(x, 41); }
+static uint64_t sigma0(uint64_t x) { return R(x, 1) ^ R(x, 8) ^ (x >> 7); }
+static uint64_t sigma1(uint64_t x) { return R(x, 19) ^ R(x, 61) ^ (x >> 6); }
+
+static const uint64_t K[80] = {
+0x428a2f98d728ae22ULL, 0x7137449123ef65cdULL, 0xb5c0fbcfec4d3b2fULL,
+0xe9b5dba58189dbbcULL, 0x3956c25bf348b538ULL, 0x59f111f1b605d019ULL,
+0x923f82a4af194f9bULL, 0xab1c5ed5da6d8118ULL, 0xd807aa98a3030242ULL,
+0x12835b0145706fbeULL, 0x243185be4ee4b28cULL, 0x550c7dc3d5ffb4e2ULL,
+0x72be5d74f27b896fULL, 0x80deb1fe3b1696b1ULL, 0x9bdc06a725c71235ULL,
+0xc19bf174cf692694ULL, 0xe49b69c19ef14ad2ULL, 0xefbe4786384f25e3ULL,
+0x0fc19dc68b8cd5b5ULL, 0x240ca1cc77ac9c65ULL, 0x2de92c6f592b0275ULL,
+0x4a7484aa6ea6e483ULL, 0x5cb0a9dcbd41fbd4ULL, 0x76f988da831153b5ULL,
+0x983e5152ee66dfabULL, 0xa831c66d2db43210ULL, 0xb00327c898fb213fULL,
+0xbf597fc7beef0ee4ULL, 0xc6e00bf33da88fc2ULL, 0xd5a79147930aa725ULL,
+0x06ca6351e003826fULL, 0x142929670a0e6e70ULL, 0x27b70a8546d22ffcULL,
+0x2e1b21385c26c926ULL, 0x4d2c6dfc5ac42aedULL, 0x53380d139d95b3dfULL,
+0x650a73548baf63deULL, 0x766a0abb3c77b2a8ULL, 0x81c2c92e47edaee6ULL,
+0x92722c851482353bULL, 0xa2bfe8a14cf10364ULL, 0xa81a664bbc423001ULL,
+0xc24b8b70d0f89791ULL, 0xc76c51a30654be30ULL, 0xd192e819d6ef5218ULL,
+0xd69906245565a910ULL, 0xf40e35855771202aULL, 0x106aa07032bbd1b8ULL,
+0x19a4c116b8d2d0c8ULL, 0x1e376c085141ab53ULL, 0x2748774cdf8eeb99ULL,
+0x34b0bcb5e19b48a8ULL, 0x391c0cb3c5c95a63ULL, 0x4ed8aa4ae3418acbULL,
+0x5b9cca4f7763e373ULL, 0x682e6ff3d6b2b8a3ULL, 0x748f82ee5defb2fcULL,
+0x78a5636f43172f60ULL, 0x84c87814a1f0ab72ULL, 0x8cc702081a6439ecULL,
+0x90befffa23631e28ULL, 0xa4506cebde82bde9ULL, 0xbef9a3f7b2c67915ULL,
+0xc67178f2e372532bULL, 0xca273eceea26619cULL, 0xd186b8c721c0c207ULL,
+0xeada7dd6cde0eb1eULL, 0xf57d4f7fee6ed178ULL, 0x06f067aa72176fbaULL,
+0x0a637dc5a2c898a6ULL, 0x113f9804bef90daeULL, 0x1b710b35131c471bULL,
+0x28db77f523047d84ULL, 0x32caab7b40c72493ULL, 0x3c9ebe0a15c9bebcULL,
+0x431d67c49c100d4cULL, 0x4cc5d4becb3e42b6ULL, 0x597f299cfc657e2aULL,
+0x5fcb6fab3ad6faecULL, 0x6c44198c4a475817ULL
+};
+
+static void kimd_sha512(CPUS390XState *env, uintptr_t ra, uint64_t 
parameter_block,
+uint64_t *message_reg, uint64_t *len_reg, uint8_t 
*stack_buffer)
+{
+uint64_t z[8], b[8], a[8], w[16], t;
+int i, j;
+
+for (i = 0; i < 8; ++i)
+z[i] = a[i] = cpu_ldq_be_data_ra(env, wrap_address(env, 
parameter_block + 8 * i), ra);
+
+while (*len_reg >= 128) {
+for (i = 0; i < 16; ++i) {
+if (message_reg)
+w[i] = cpu_ldq_be_data_ra(env, wrap_address(env, *message_reg 
+ 8 * i), ra);
+else
+w[i] = be64_to_cpu(((uint64_t *)stack_buffer)[i]);
+}
+
+for (i = 0; i < 80; ++i) {
+for (j = 0; j < 8; ++j)
+b[j] = a[j];
+t = a[7] + Sigma1(a[4]) + Ch(a[4], a[5], a[6]) + K[i] + w[i % 16];
+b[7] = t + Sigma0(a[0]) + Maj(a[0], a[1], a[2]);
+b[3] += t;
+for (j = 0; j < 8; ++j)
+a[(j

Re: [PATCH v4 2/2] target/s390x: support SHA-512 extensions

2022-08-03 Thread Jason A. Donenfeld

Hi David,

On Wed, Aug 03, 2022 at 01:55:21PM +0200, David Hildenbrand wrote:
> On 02.08.22 21:00, Jason A. Donenfeld wrote:
> > In order to fully support MSA_EXT_5, we have to also support the SHA-512
> > special instructions. So implement those.
> > 
> > The implementation began as something TweetNacl-like, and then was
> > adjusted to be useful here. It's not very beautiful, but it is quite
> > short and compact, which is what we're going for.
> > 
> 
> Do we have to worry about copyright/authorship of the original code or
> did you write that from scratch?

I actually don't really remember how much of that is leftover from
tweetnacl and how much I've rewritten - I've had some variant of this
code or another kicking around in various projects and repos for a long
time. But the tweetnacl stuff is public domain to begin with, so all
good.

> Are we properly handling the length register (r2 + 1) in the
> 24-bit/31-bit addressing mode?
> Similarly, are we properly handling updates to the message register (r2)
> depending on the addressing mode?

Ugh, probably not... I didn't do any of the deposit_64 stuff. I guess
I'll look into that.

> It's worth noting that we might want to implement (also for PRNO-TRNG):
> 
> "The operation is ended when all
> source bytes in the second operand have been pro-
> cessed (called normal completion), or when a CPU-
> determined number of blocks that is less than the
> length of the second operand have been processed
> (called partial completion). The CPU-determined
> number of blocks depends on the model, and may be
> a different number each time the instruction is exe-
> cuted. The CPU-determined number of blocks is usu-
> ally nonzero. In certain unusual situations, this
> number may be zero, and condition code 3 may be
> set with no progress."
> 
> Otherwise, a large length can make us loop quite a while in QEMU,
> without the chance to deliver any other interrupts.

Hmm, okay. Looking at the Linux code, I see:

s.even = (unsigned long)src;
s.odd  = (unsigned long)src_len;
asm volatile(
"   lgr 0,%[fc]\n"
"   lgr 1,%[pba]\n"
"0: .insn   rre,%[opc] << 16,0,%[src]\n"
"   brc 1,0b\n" /* handle partial completion */
: [src] "+&d" (s.pair)
: [fc] "d" (func), [pba] "d" ((unsigned long)(param)),
  [opc] "i" (CPACF_KIMD)
: "cc", "memory", "0", "1");

So I guess that means it'll just loop until it's done? Or do I need to
return "1" from HELPER(msa)?

Jason

[PATCH v5 1/2] target/s390x: support PRNO_TRNG instruction

2022-08-03 Thread Jason A. Donenfeld

In order for hosts running inside of TCG to initialize the kernel's
random number generator, we should support the PRNO_TRNG instruction,
backed in the usual way with the qemu_guest_getrandom helper. This is
confirmed working on Linux 5.19.

Cc: Thomas Huth 
Cc: David Hildenbrand 
Cc: Christian Borntraeger 
Cc: Richard Henderson 
Cc: Cornelia Huck 
Cc: Harald Freudenberger 
Cc: Holger Dengler 
Signed-off-by: Jason A. Donenfeld 
---
 target/s390x/gen-features.c  |  2 ++
 target/s390x/tcg/crypto_helper.c | 30 ++
 2 files changed, 32 insertions(+)

diff --git a/target/s390x/gen-features.c b/target/s390x/gen-features.c
index ad140184b9..3d333e2789 100644
--- a/target/s390x/gen-features.c
+++ b/target/s390x/gen-features.c
@@ -749,6 +749,8 @@ static uint16_t qemu_V7_0[] = {
  */
 static uint16_t qemu_MAX[] = {
 S390_FEAT_VECTOR_ENH2,
+S390_FEAT_MSA_EXT_5,
+S390_FEAT_PRNO_TRNG,
 };
 
 /** END FEATURE DEFS **/
diff --git a/target/s390x/tcg/crypto_helper.c b/target/s390x/tcg/crypto_helper.c
index 138d9e7ad9..8ad4ef1ace 100644
--- a/target/s390x/tcg/crypto_helper.c
+++ b/target/s390x/tcg/crypto_helper.c
@@ -12,12 +12,38 @@
 
 #include "qemu/osdep.h"
 #include "qemu/main-loop.h"
+#include "qemu/guest-random.h"
 #include "s390x-internal.h"
 #include "tcg_s390x.h"
 #include "exec/helper-proto.h"
 #include "exec/exec-all.h"
 #include "exec/cpu_ldst.h"
 
+static void fill_buf_random(CPUS390XState *env, uintptr_t ra,
+uint64_t *buf_reg, uint64_t *len_reg)
+{
+uint8_t tmp[256];
+uint64_t len = *len_reg;
+int reg_len = 64;
+
+if (!(env->psw.mask & PSW_MASK_64)) {
+len = (uint32_t)len;
+reg_len = (env->psw.mask & PSW_MASK_32) ? 32 : 24;
+}
+
+while (len) {
+size_t block = MIN(len, sizeof(tmp));
+
+qemu_guest_getrandom_nofail(tmp, block);
+for (size_t i = 0; i < block; ++i) {
+cpu_stb_data_ra(env, wrap_address(env, *buf_reg), tmp[i], ra);
+*buf_reg = deposit64(*buf_reg, 0, reg_len, *buf_reg + 1);
+--*len_reg;
+}
+len -= block;
+}
+}
+
 uint32_t HELPER(msa)(CPUS390XState *env, uint32_t r1, uint32_t r2, uint32_t r3,
  uint32_t type)
 {
@@ -52,6 +78,10 @@ uint32_t HELPER(msa)(CPUS390XState *env, uint32_t r1, 
uint32_t r2, uint32_t r3,
 cpu_stb_data_ra(env, param_addr, subfunc[i], ra);
 }
 break;
+case 114: /* CPACF_PRNO_TRNG */
+fill_buf_random(env, ra, &env->regs[r1], &env->regs[r1 + 1]);
+fill_buf_random(env, ra, &env->regs[r2], &env->regs[r2 + 1]);
+break;
 default:
 /* we don't implement any other subfunction yet */
 g_assert_not_reached();
-- 
2.35.1

[PATCH v5 2/2] target/s390x: support SHA-512 extensions

2022-08-03 Thread Jason A. Donenfeld

In order to fully support MSA_EXT_5, we have to also support the SHA-512
special instructions. So implement those.

The implementation began as something TweetNacl-like, and then was
adjusted to be useful here. It's not very beautiful, but it is quite
short and compact, which is what we're going for.

Cc: Thomas Huth 
Cc: David Hildenbrand 
Cc: Christian Borntraeger 
Cc: Richard Henderson 
Cc: Cornelia Huck 
Cc: Harald Freudenberger 
Cc: Holger Dengler 
Signed-off-by: Jason A. Donenfeld 
---
 target/s390x/gen-features.c  |   2 +
 target/s390x/tcg/crypto_helper.c | 154 +++
 2 files changed, 156 insertions(+)

diff --git a/target/s390x/gen-features.c b/target/s390x/gen-features.c
index 3d333e2789..b6d804fa6d 100644
--- a/target/s390x/gen-features.c
+++ b/target/s390x/gen-features.c
@@ -751,6 +751,8 @@ static uint16_t qemu_MAX[] = {
 S390_FEAT_VECTOR_ENH2,
 S390_FEAT_MSA_EXT_5,
 S390_FEAT_PRNO_TRNG,
+S390_FEAT_KIMD_SHA_512,
+S390_FEAT_KLMD_SHA_512,
 };
 
 /** END FEATURE DEFS **/
diff --git a/target/s390x/tcg/crypto_helper.c b/target/s390x/tcg/crypto_helper.c
index 8ad4ef1ace..b5e46342d6 100644
--- a/target/s390x/tcg/crypto_helper.c
+++ b/target/s390x/tcg/crypto_helper.c
@@ -1,10 +1,12 @@
 /*
  *  s390x crypto helpers
  *
+ *  Copyright (C) 2022 Jason A. Donenfeld . All Rights 
Reserved.
  *  Copyright (c) 2017 Red Hat Inc
  *
  *  Authors:
  *   David Hildenbrand 
+ *   Jason A. Donenfeld 
  *
  * This work is licensed under the terms of the GNU GPL, version 2 or later.
  * See the COPYING file in the top-level directory.
@@ -19,6 +21,150 @@
 #include "exec/exec-all.h"
 #include "exec/cpu_ldst.h"
 
+static uint64_t R(uint64_t x, int c) { return (x >> c) | (x << (64 - c)); }
+static uint64_t Ch(uint64_t x, uint64_t y, uint64_t z) { return (x & y) ^ (~x 
& z); }
+static uint64_t Maj(uint64_t x, uint64_t y, uint64_t z) { return (x & y) ^ (x 
& z) ^ (y & z); }
+static uint64_t Sigma0(uint64_t x) { return R(x, 28) ^ R(x, 34) ^ R(x, 39); }
+static uint64_t Sigma1(uint64_t x) { return R(x, 14) ^ R(x, 18) ^ R(x, 41); }
+static uint64_t sigma0(uint64_t x) { return R(x, 1) ^ R(x, 8) ^ (x >> 7); }
+static uint64_t sigma1(uint64_t x) { return R(x, 19) ^ R(x, 61) ^ (x >> 6); }
+
+static const uint64_t K[80] = {
+0x428a2f98d728ae22ULL, 0x7137449123ef65cdULL, 0xb5c0fbcfec4d3b2fULL,
+0xe9b5dba58189dbbcULL, 0x3956c25bf348b538ULL, 0x59f111f1b605d019ULL,
+0x923f82a4af194f9bULL, 0xab1c5ed5da6d8118ULL, 0xd807aa98a3030242ULL,
+0x12835b0145706fbeULL, 0x243185be4ee4b28cULL, 0x550c7dc3d5ffb4e2ULL,
+0x72be5d74f27b896fULL, 0x80deb1fe3b1696b1ULL, 0x9bdc06a725c71235ULL,
+0xc19bf174cf692694ULL, 0xe49b69c19ef14ad2ULL, 0xefbe4786384f25e3ULL,
+0x0fc19dc68b8cd5b5ULL, 0x240ca1cc77ac9c65ULL, 0x2de92c6f592b0275ULL,
+0x4a7484aa6ea6e483ULL, 0x5cb0a9dcbd41fbd4ULL, 0x76f988da831153b5ULL,
+0x983e5152ee66dfabULL, 0xa831c66d2db43210ULL, 0xb00327c898fb213fULL,
+0xbf597fc7beef0ee4ULL, 0xc6e00bf33da88fc2ULL, 0xd5a79147930aa725ULL,
+0x06ca6351e003826fULL, 0x142929670a0e6e70ULL, 0x27b70a8546d22ffcULL,
+0x2e1b21385c26c926ULL, 0x4d2c6dfc5ac42aedULL, 0x53380d139d95b3dfULL,
+0x650a73548baf63deULL, 0x766a0abb3c77b2a8ULL, 0x81c2c92e47edaee6ULL,
+0x92722c851482353bULL, 0xa2bfe8a14cf10364ULL, 0xa81a664bbc423001ULL,
+0xc24b8b70d0f89791ULL, 0xc76c51a30654be30ULL, 0xd192e819d6ef5218ULL,
+0xd69906245565a910ULL, 0xf40e35855771202aULL, 0x106aa07032bbd1b8ULL,
+0x19a4c116b8d2d0c8ULL, 0x1e376c085141ab53ULL, 0x2748774cdf8eeb99ULL,
+0x34b0bcb5e19b48a8ULL, 0x391c0cb3c5c95a63ULL, 0x4ed8aa4ae3418acbULL,
+0x5b9cca4f7763e373ULL, 0x682e6ff3d6b2b8a3ULL, 0x748f82ee5defb2fcULL,
+0x78a5636f43172f60ULL, 0x84c87814a1f0ab72ULL, 0x8cc702081a6439ecULL,
+0x90befffa23631e28ULL, 0xa4506cebde82bde9ULL, 0xbef9a3f7b2c67915ULL,
+0xc67178f2e372532bULL, 0xca273eceea26619cULL, 0xd186b8c721c0c207ULL,
+0xeada7dd6cde0eb1eULL, 0xf57d4f7fee6ed178ULL, 0x06f067aa72176fbaULL,
+0x0a637dc5a2c898a6ULL, 0x113f9804bef90daeULL, 0x1b710b35131c471bULL,
+0x28db77f523047d84ULL, 0x32caab7b40c72493ULL, 0x3c9ebe0a15c9bebcULL,
+0x431d67c49c100d4cULL, 0x4cc5d4becb3e42b6ULL, 0x597f299cfc657e2aULL,
+0x5fcb6fab3ad6faecULL, 0x6c44198c4a475817ULL
+};
+
+static int kimd_sha512(CPUS390XState *env, uintptr_t ra, uint64_t 
parameter_block,
+   uint64_t *message_reg, uint64_t *len_reg, uint8_t 
*stack_buffer)
+{
+enum { MAX_BLOCKS_PER_RUN = 64 }; /* This is arbitrary, just to keep 
interactivity. */
+uint64_t z[8], b[8], a[8], w[16], t;
+uint64_t message = message_reg ? *message_reg : 0, len = *len_reg, 
processed = 0;
+int i, j, reg_len = 64, blocks = 0, cc = 0;
+
+if (!(env->psw.mask & PSW_MASK_64)) {
+len = (uint32_t)len;
+reg_len = (env->psw.mask & PSW_MASK_32) ? 32 : 24;
+}
+
+for (i = 0; i < 8; ++i) {
+

Re: [PATCH v4 2/2] target/s390x: support SHA-512 extensions

2022-08-03 Thread Jason A. Donenfeld

On Wed, Aug 03, 2022 at 02:14:58PM +0200, Jason A. Donenfeld wrote:
> s.even = (unsigned long)src;
> s.odd  = (unsigned long)src_len;
> asm volatile(
> "   lgr 0,%[fc]\n"
> "   lgr 1,%[pba]\n"
> "0: .insn   rre,%[opc] << 16,0,%[src]\n"
> "   brc 1,0b\n" /* handle partial completion */
> : [src] "+&d" (s.pair)
> : [fc] "d" (func), [pba] "d" ((unsigned long)(param)),
>   [opc] "i" (CPACF_KIMD)
> : "cc", "memory", "0", "1");
> 
> So I guess that means it'll just loop until it's done? Or do I need to
> return "1" from HELPER(msa)?

Looks like returning 3 did the trick. v5 incoming...

Jason

Re: [PULL 9/9] hw/i386: pass RNG seed via setup_data entry

2022-08-03 Thread Jason A. Donenfeld

Hi Daniel,

On Wed, Aug 03, 2022 at 11:52:25AM +0100, Daniel P. Berrangé wrote:
> On Tue, Aug 02, 2022 at 05:13:26PM +0200, Jason A. Donenfeld wrote:
> > Hi Xiaoyao,
> > 
> > On Tue, Aug 2, 2022 at 5:06 PM Jason A. Donenfeld  wrote:
> > >
> > > Hi Xiaoyao,
> > >
> > > On Tue, Aug 02, 2022 at 10:53:07PM +0800, Xiaoyao Li wrote:
> > > > yes, with >= 7.1, pcmc->legacy_no_rng_seed = false by default, and RNG
> > > > seed is used.
> > >
> > > This is intended behavior. Being on by default is basically the whole
> > > point of it. Otherwise it's useless.
> > >
> > > >
> > > > > Either way, this shouldn't cause boot failures.
> > > >
> > > > It does fail booting OVMF with #PF. Below diff can fix the #PF for me.
> > >
> > > Huh, interesting. Sounds like maybe there's a bug I need to fix. Can you
> > > send me some repro instructions, and I'll look into it right away.
> > 
> > I just tried booting Fedora using OVMF and didn't have any problems. I
> > used this command line:
> 
> I managed to reproduce on a Fedora 36 host, using QEMU git master from
> today.
> 
>  $ git clone https://gitlab.com/berrange/tiny-vm-tools
>  $ cd tiny-vm-tools
>  $ ./make-tiny-image.py --run date date
>  tiny-initrd.img
>  Copy lib /lib/ld-musl-x86_64.so.1 -> 
> /tmp/make-tiny-imagebcuv8i_b/lib/ld-musl-x86_64.so.1
>  Copy bin /usr/bin/date -> /tmp/make-tiny-imagebcuv8i_b/bin/date
>  Copy lib /lib64/libc.so.6 -> /tmp/make-tiny-imagebcuv8i_b/lib64/libc.so.6
>  Copy lib /lib64/ld-linux-x86-64.so.2 -> 
> /tmp/make-tiny-imagebcuv8i_b/lib64/ld-linux-x86-64.so.2
> 
>  $ cp /usr/share/edk2/ovmf/OVMF_VARS.fd vars.fd
> 
>  $ ~/src/virt/qemu.git/build/qemu-system-x86_64 \
>-blockdev 
> node-name=file_ovmf_code,driver=file,filename=/usr/share/edk2/ovmf/OVMF_CODE.fd,auto-read-only=on,discard=unmap
>  \
>-blockdev 
> node-name=drive_ovmf_code,driver=raw,read-only=on,file=file_ovmf_code \
>-blockdev 
> node-name=file_ovmf_vars,driver=file,filename=vars.fd,auto-read-only=on,discard=unmap
>  \
>-blockdev 
> node-name=drive_ovmf_vars,driver=raw,read-only=off,file=file_ovmf_vars  \
>-machine pc-q35-7.1,pflash0=drive_ovmf_code,pflash1=drive_ovmf_vars \
>-kernel /boot/vmlinuz-5.18.5-200.fc36.x86_64 \
>-initrd tiny-initrd.img \
>-m 8000 \
>-display none \
>-nodefaults \
>-serial stdio \
>-append 'console=ttyS0 quiet'

Thanks for the info. Very helpful. Looking into it now.

Jason

Re: [PULL 9/9] hw/i386: pass RNG seed via setup_data entry

2022-08-03 Thread Jason A. Donenfeld

On Wed, Aug 03, 2022 at 03:11:48PM +0200, Jason A. Donenfeld wrote:
> Thanks for the info. Very helpful. Looking into it now.

So interestingly, this is not a new issue. If you pass any type of setup
data, OVMF appears to be doing something unusual and passing 0x
for all the entries, rather than the actual data. The reason this isn't
new is: try passing `-dtb any/dtb/at/all/from/anywhere` and you get the
same page fault, on all QEMU versions. The thing that passes the DTB is
the thing that passes the RNG seed. Same mechanism, same bug.

I'm looking into it...

Jason

[PATCH RFC v1] hw/i386: place setup_data at fixed place in memory

2022-08-03 Thread Jason A. Donenfeld

The boot parameter header refers to setup_data at an absolute address,
and each setup_data refers to the next setup_data at an absolute address
too. Currently QEMU simply puts the setup_datas right after the kernel
image, and since the kernel_image is loaded at prot_addr -- a fixed
address knowable to QEMU apriori -- the setup_data absolute address
winds up being just `prot_addr + a_fixed_offset_into_kernel_image`.

This mostly works fine, so long as the kernel image really is loaded at
prot_addr. However, OVMF doesn't load the kernel at prot_addr, and
generally EFI doesn't give a good way of predicting where it's going to
load the kernel. So when it loads it at some address != prot_addr, the
absolute addresses in setup_data now point somewhere bogus, causing
crashes when EFI stub tries to follow the next link.

Fix this by placing setup_data at some fixed place in memory, not as
part of the kernel image, and then pointing the setup_data absolute
address to that fixed place in memory. This way, even if OVMF or other
chains relocate the kernel image, the boot parameter still points to the
correct absolute address.

=== NOTE NOTE NOTE NOTE NOTE ===
This commit is currently garbage! It fixes the boot test case, but it
just picks the address 0x1000. That's probably not a good idea. If
somebody with some x86 architectural knowledge could let me know a
better reserved place to put this, that'd be very appreciated.

Fixes: 3cbeb52467 ("hw/i386: add device tree support")
Reported-by: Xiaoyao Li 
Cc: Paolo Bonzini 
Cc: Richard Henderson 
Cc: Peter Maydell 
Cc: Michael S. Tsirkin 
Cc: Daniel P. Berrangé 
Cc: Gerd Hoffmann 
Cc: Ard Biesheuvel 
Cc: linux-...@vger.kernel.org
Signed-off-by: Jason A. Donenfeld 
---
 hw/i386/x86.c | 38 +-
 1 file changed, 21 insertions(+), 17 deletions(-)

diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index 050eedc0c8..0b0083b345 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -773,9 +773,9 @@ void x86_load_linux(X86MachineState *x86ms,
 bool linuxboot_dma_enabled = 
X86_MACHINE_GET_CLASS(x86ms)->fwcfg_dma_enabled;
 uint16_t protocol;
 int setup_size, kernel_size, cmdline_size;
-int dtb_size, setup_data_offset;
+int dtb_size, setup_data_item_len, setup_data_total_len = 0;
 uint32_t initrd_max;
-uint8_t header[8192], *setup, *kernel;
+uint8_t header[8192], *setup, *kernel, *setup_datas = NULL;
 hwaddr real_addr, prot_addr, cmdline_addr, initrd_addr = 0, 
first_setup_data = 0;
 FILE *f;
 char *vmode;
@@ -1048,6 +1048,8 @@ void x86_load_linux(X86MachineState *x86ms,
 }
 fclose(f);
 
+#define SETUP_DATA_PHYS_BASE 0x1000
+
 /* append dtb to kernel */
 if (dtb_filename) {
 if (protocol < 0x209) {
@@ -1062,34 +1064,36 @@ void x86_load_linux(X86MachineState *x86ms,
 exit(1);
 }
 
-setup_data_offset = QEMU_ALIGN_UP(kernel_size, 16);
-kernel_size = setup_data_offset + sizeof(struct setup_data) + dtb_size;
-kernel = g_realloc(kernel, kernel_size);
-
-
-setup_data = (struct setup_data *)(kernel + setup_data_offset);
+setup_data_item_len = sizeof(struct setup_data) + dtb_size;
+setup_datas = g_realloc(setup_datas, setup_data_total_len + 
setup_data_item_len);
+setup_data = (struct setup_data *)(setup_datas + setup_data_total_len);
 setup_data->next = cpu_to_le64(first_setup_data);
-first_setup_data = prot_addr + setup_data_offset;
+first_setup_data = SETUP_DATA_PHYS_BASE + setup_data_total_len;
+setup_data_total_len += setup_data_item_len;
 setup_data->type = cpu_to_le32(SETUP_DTB);
 setup_data->len = cpu_to_le32(dtb_size);
-
 load_image_size(dtb_filename, setup_data->data, dtb_size);
 }
 
 if (!legacy_no_rng_seed) {
-setup_data_offset = QEMU_ALIGN_UP(kernel_size, 16);
-kernel_size = setup_data_offset + sizeof(struct setup_data) + 
RNG_SEED_LENGTH;
-kernel = g_realloc(kernel, kernel_size);
-setup_data = (struct setup_data *)(kernel + setup_data_offset);
+setup_data_item_len = sizeof(struct setup_data) + SETUP_RNG_SEED;
+setup_datas = g_realloc(setup_datas, setup_data_total_len + 
setup_data_item_len);
+setup_data = (struct setup_data *)(setup_datas + setup_data_total_len);
 setup_data->next = cpu_to_le64(first_setup_data);
-first_setup_data = prot_addr + setup_data_offset;
+first_setup_data = SETUP_DATA_PHYS_BASE + setup_data_total_len;
+setup_data_total_len += setup_data_item_len;
 setup_data->type = cpu_to_le32(SETUP_RNG_SEED);
 setup_data->len = cpu_to_le32(RNG_SEED_LENGTH);
 qemu_guest_getrandom_nofail(setup_data->data, RNG_SEED_LENGTH);
 }
 
-/* Offset 0x250 is a pointer to the first setup_data link. */
-stq_p(header + 0x250, first_setup_data);
+if (first_set

Re: [PULL 9/9] hw/i386: pass RNG seed via setup_data entry

2022-08-03 Thread Jason A. Donenfeld

On Wed, Aug 03, 2022 at 03:34:04PM +0200, Jason A. Donenfeld wrote:
> On Wed, Aug 03, 2022 at 03:11:48PM +0200, Jason A. Donenfeld wrote:
> > Thanks for the info. Very helpful. Looking into it now.
> 
> So interestingly, this is not a new issue. If you pass any type of setup
> data, OVMF appears to be doing something unusual and passing 0x
> for all the entries, rather than the actual data. The reason this isn't
> new is: try passing `-dtb any/dtb/at/all/from/anywhere` and you get the
> same page fault, on all QEMU versions. The thing that passes the DTB is
> the thing that passes the RNG seed. Same mechanism, same bug.
> 
> I'm looking into it...

Fixed with: 
https://lore.kernel.org/all/20220803170235.1312978-1-ja...@zx2c4.com/

Feel free to join into the discussion there. I CC'd you.

Jason

[PATCH v6 1/2] target/s390x: support PRNO_TRNG instruction

2022-08-03 Thread Jason A. Donenfeld

In order for hosts running inside of TCG to initialize the kernel's
random number generator, we should support the PRNO_TRNG instruction,
backed in the usual way with the qemu_guest_getrandom helper. This is
confirmed working on Linux 5.19.

Cc: Thomas Huth 
Cc: David Hildenbrand 
Cc: Christian Borntraeger 
Cc: Richard Henderson 
Cc: Cornelia Huck 
Cc: Harald Freudenberger 
Cc: Holger Dengler 
Signed-off-by: Jason A. Donenfeld 
---
 target/s390x/gen-features.c  |  2 ++
 target/s390x/tcg/crypto_helper.c | 30 ++
 2 files changed, 32 insertions(+)

diff --git a/target/s390x/gen-features.c b/target/s390x/gen-features.c
index ad140184b9..3d333e2789 100644
--- a/target/s390x/gen-features.c
+++ b/target/s390x/gen-features.c
@@ -749,6 +749,8 @@ static uint16_t qemu_V7_0[] = {
  */
 static uint16_t qemu_MAX[] = {
 S390_FEAT_VECTOR_ENH2,
+S390_FEAT_MSA_EXT_5,
+S390_FEAT_PRNO_TRNG,
 };
 
 /** END FEATURE DEFS **/
diff --git a/target/s390x/tcg/crypto_helper.c b/target/s390x/tcg/crypto_helper.c
index 138d9e7ad9..8ad4ef1ace 100644
--- a/target/s390x/tcg/crypto_helper.c
+++ b/target/s390x/tcg/crypto_helper.c
@@ -12,12 +12,38 @@
 
 #include "qemu/osdep.h"
 #include "qemu/main-loop.h"
+#include "qemu/guest-random.h"
 #include "s390x-internal.h"
 #include "tcg_s390x.h"
 #include "exec/helper-proto.h"
 #include "exec/exec-all.h"
 #include "exec/cpu_ldst.h"
 
+static void fill_buf_random(CPUS390XState *env, uintptr_t ra,
+uint64_t *buf_reg, uint64_t *len_reg)
+{
+uint8_t tmp[256];
+uint64_t len = *len_reg;
+int reg_len = 64;
+
+if (!(env->psw.mask & PSW_MASK_64)) {
+len = (uint32_t)len;
+reg_len = (env->psw.mask & PSW_MASK_32) ? 32 : 24;
+}
+
+while (len) {
+size_t block = MIN(len, sizeof(tmp));
+
+qemu_guest_getrandom_nofail(tmp, block);
+for (size_t i = 0; i < block; ++i) {
+cpu_stb_data_ra(env, wrap_address(env, *buf_reg), tmp[i], ra);
+*buf_reg = deposit64(*buf_reg, 0, reg_len, *buf_reg + 1);
+--*len_reg;
+}
+len -= block;
+}
+}
+
 uint32_t HELPER(msa)(CPUS390XState *env, uint32_t r1, uint32_t r2, uint32_t r3,
  uint32_t type)
 {
@@ -52,6 +78,10 @@ uint32_t HELPER(msa)(CPUS390XState *env, uint32_t r1, 
uint32_t r2, uint32_t r3,
 cpu_stb_data_ra(env, param_addr, subfunc[i], ra);
 }
 break;
+case 114: /* CPACF_PRNO_TRNG */
+fill_buf_random(env, ra, &env->regs[r1], &env->regs[r1 + 1]);
+fill_buf_random(env, ra, &env->regs[r2], &env->regs[r2 + 1]);
+break;
 default:
 /* we don't implement any other subfunction yet */
 g_assert_not_reached();
-- 
2.35.1

[PATCH 1/2] target/s390x: support PRNO_TRNG instruction

2022-08-03 Thread Jason A. Donenfeld

In order for hosts running inside of TCG to initialize the kernel's
random number generator, we should support the PRNO_TRNG instruction,
backed in the usual way with the qemu_guest_getrandom helper. This is
confirmed working on Linux 5.19.

Cc: Thomas Huth 
Cc: David Hildenbrand 
Cc: Christian Borntraeger 
Cc: Richard Henderson 
Cc: Cornelia Huck 
Cc: Harald Freudenberger 
Cc: Holger Dengler 
Signed-off-by: Jason A. Donenfeld 
---
 target/s390x/gen-features.c  |  2 ++
 target/s390x/tcg/crypto_helper.c | 30 ++
 2 files changed, 32 insertions(+)

diff --git a/target/s390x/gen-features.c b/target/s390x/gen-features.c
index ad140184b9..3d333e2789 100644
--- a/target/s390x/gen-features.c
+++ b/target/s390x/gen-features.c
@@ -749,6 +749,8 @@ static uint16_t qemu_V7_0[] = {
  */
 static uint16_t qemu_MAX[] = {
 S390_FEAT_VECTOR_ENH2,
+S390_FEAT_MSA_EXT_5,
+S390_FEAT_PRNO_TRNG,
 };
 
 /** END FEATURE DEFS **/
diff --git a/target/s390x/tcg/crypto_helper.c b/target/s390x/tcg/crypto_helper.c
index 138d9e7ad9..8ad4ef1ace 100644
--- a/target/s390x/tcg/crypto_helper.c
+++ b/target/s390x/tcg/crypto_helper.c
@@ -12,12 +12,38 @@
 
 #include "qemu/osdep.h"
 #include "qemu/main-loop.h"
+#include "qemu/guest-random.h"
 #include "s390x-internal.h"
 #include "tcg_s390x.h"
 #include "exec/helper-proto.h"
 #include "exec/exec-all.h"
 #include "exec/cpu_ldst.h"
 
+static void fill_buf_random(CPUS390XState *env, uintptr_t ra,
+uint64_t *buf_reg, uint64_t *len_reg)
+{
+uint8_t tmp[256];
+uint64_t len = *len_reg;
+int reg_len = 64;
+
+if (!(env->psw.mask & PSW_MASK_64)) {
+len = (uint32_t)len;
+reg_len = (env->psw.mask & PSW_MASK_32) ? 32 : 24;
+}
+
+while (len) {
+size_t block = MIN(len, sizeof(tmp));
+
+qemu_guest_getrandom_nofail(tmp, block);
+for (size_t i = 0; i < block; ++i) {
+cpu_stb_data_ra(env, wrap_address(env, *buf_reg), tmp[i], ra);
+*buf_reg = deposit64(*buf_reg, 0, reg_len, *buf_reg + 1);
+--*len_reg;
+}
+len -= block;
+}
+}
+
 uint32_t HELPER(msa)(CPUS390XState *env, uint32_t r1, uint32_t r2, uint32_t r3,
  uint32_t type)
 {
@@ -52,6 +78,10 @@ uint32_t HELPER(msa)(CPUS390XState *env, uint32_t r1, 
uint32_t r2, uint32_t r3,
 cpu_stb_data_ra(env, param_addr, subfunc[i], ra);
 }
 break;
+case 114: /* CPACF_PRNO_TRNG */
+fill_buf_random(env, ra, &env->regs[r1], &env->regs[r1 + 1]);
+fill_buf_random(env, ra, &env->regs[r2], &env->regs[r2 + 1]);
+break;
 default:
 /* we don't implement any other subfunction yet */
 g_assert_not_reached();
-- 
2.35.1

[PATCH 2/2] target/s390x: support SHA-512 extensions

2022-08-03 Thread Jason A. Donenfeld

In order to fully support MSA_EXT_5, we have to also support the SHA-512
special instructions. So implement those.

The implementation began as something TweetNacl-like, and then was
adjusted to be useful here. It's not very beautiful, but it is quite
short and compact, which is what we're going for.

Cc: Thomas Huth 
Cc: David Hildenbrand 
Cc: Christian Borntraeger 
Cc: Richard Henderson 
Cc: Cornelia Huck 
Cc: Harald Freudenberger 
Cc: Holger Dengler 
Signed-off-by: Jason A. Donenfeld 
---
 target/s390x/gen-features.c  |   2 +
 target/s390x/tcg/crypto_helper.c | 157 +++
 2 files changed, 159 insertions(+)

diff --git a/target/s390x/gen-features.c b/target/s390x/gen-features.c
index 3d333e2789..b6d804fa6d 100644
--- a/target/s390x/gen-features.c
+++ b/target/s390x/gen-features.c
@@ -751,6 +751,8 @@ static uint16_t qemu_MAX[] = {
 S390_FEAT_VECTOR_ENH2,
 S390_FEAT_MSA_EXT_5,
 S390_FEAT_PRNO_TRNG,
+S390_FEAT_KIMD_SHA_512,
+S390_FEAT_KLMD_SHA_512,
 };
 
 /** END FEATURE DEFS **/
diff --git a/target/s390x/tcg/crypto_helper.c b/target/s390x/tcg/crypto_helper.c
index 8ad4ef1ace..bb4823107c 100644
--- a/target/s390x/tcg/crypto_helper.c
+++ b/target/s390x/tcg/crypto_helper.c
@@ -1,10 +1,12 @@
 /*
  *  s390x crypto helpers
  *
+ *  Copyright (C) 2022 Jason A. Donenfeld . All Rights 
Reserved.
  *  Copyright (c) 2017 Red Hat Inc
  *
  *  Authors:
  *   David Hildenbrand 
+ *   Jason A. Donenfeld 
  *
  * This work is licensed under the terms of the GNU GPL, version 2 or later.
  * See the COPYING file in the top-level directory.
@@ -19,6 +21,153 @@
 #include "exec/exec-all.h"
 #include "exec/cpu_ldst.h"
 
+static uint64_t R(uint64_t x, int c) { return (x >> c) | (x << (64 - c)); }
+static uint64_t Ch(uint64_t x, uint64_t y, uint64_t z) { return (x & y) ^ (~x 
& z); }
+static uint64_t Maj(uint64_t x, uint64_t y, uint64_t z) { return (x & y) ^ (x 
& z) ^ (y & z); }
+static uint64_t Sigma0(uint64_t x) { return R(x, 28) ^ R(x, 34) ^ R(x, 39); }
+static uint64_t Sigma1(uint64_t x) { return R(x, 14) ^ R(x, 18) ^ R(x, 41); }
+static uint64_t sigma0(uint64_t x) { return R(x, 1) ^ R(x, 8) ^ (x >> 7); }
+static uint64_t sigma1(uint64_t x) { return R(x, 19) ^ R(x, 61) ^ (x >> 6); }
+
+static const uint64_t K[80] = {
+0x428a2f98d728ae22ULL, 0x7137449123ef65cdULL, 0xb5c0fbcfec4d3b2fULL,
+0xe9b5dba58189dbbcULL, 0x3956c25bf348b538ULL, 0x59f111f1b605d019ULL,
+0x923f82a4af194f9bULL, 0xab1c5ed5da6d8118ULL, 0xd807aa98a3030242ULL,
+0x12835b0145706fbeULL, 0x243185be4ee4b28cULL, 0x550c7dc3d5ffb4e2ULL,
+0x72be5d74f27b896fULL, 0x80deb1fe3b1696b1ULL, 0x9bdc06a725c71235ULL,
+0xc19bf174cf692694ULL, 0xe49b69c19ef14ad2ULL, 0xefbe4786384f25e3ULL,
+0x0fc19dc68b8cd5b5ULL, 0x240ca1cc77ac9c65ULL, 0x2de92c6f592b0275ULL,
+0x4a7484aa6ea6e483ULL, 0x5cb0a9dcbd41fbd4ULL, 0x76f988da831153b5ULL,
+0x983e5152ee66dfabULL, 0xa831c66d2db43210ULL, 0xb00327c898fb213fULL,
+0xbf597fc7beef0ee4ULL, 0xc6e00bf33da88fc2ULL, 0xd5a79147930aa725ULL,
+0x06ca6351e003826fULL, 0x142929670a0e6e70ULL, 0x27b70a8546d22ffcULL,
+0x2e1b21385c26c926ULL, 0x4d2c6dfc5ac42aedULL, 0x53380d139d95b3dfULL,
+0x650a73548baf63deULL, 0x766a0abb3c77b2a8ULL, 0x81c2c92e47edaee6ULL,
+0x92722c851482353bULL, 0xa2bfe8a14cf10364ULL, 0xa81a664bbc423001ULL,
+0xc24b8b70d0f89791ULL, 0xc76c51a30654be30ULL, 0xd192e819d6ef5218ULL,
+0xd69906245565a910ULL, 0xf40e35855771202aULL, 0x106aa07032bbd1b8ULL,
+0x19a4c116b8d2d0c8ULL, 0x1e376c085141ab53ULL, 0x2748774cdf8eeb99ULL,
+0x34b0bcb5e19b48a8ULL, 0x391c0cb3c5c95a63ULL, 0x4ed8aa4ae3418acbULL,
+0x5b9cca4f7763e373ULL, 0x682e6ff3d6b2b8a3ULL, 0x748f82ee5defb2fcULL,
+0x78a5636f43172f60ULL, 0x84c87814a1f0ab72ULL, 0x8cc702081a6439ecULL,
+0x90befffa23631e28ULL, 0xa4506cebde82bde9ULL, 0xbef9a3f7b2c67915ULL,
+0xc67178f2e372532bULL, 0xca273eceea26619cULL, 0xd186b8c721c0c207ULL,
+0xeada7dd6cde0eb1eULL, 0xf57d4f7fee6ed178ULL, 0x06f067aa72176fbaULL,
+0x0a637dc5a2c898a6ULL, 0x113f9804bef90daeULL, 0x1b710b35131c471bULL,
+0x28db77f523047d84ULL, 0x32caab7b40c72493ULL, 0x3c9ebe0a15c9bebcULL,
+0x431d67c49c100d4cULL, 0x4cc5d4becb3e42b6ULL, 0x597f299cfc657e2aULL,
+0x5fcb6fab3ad6faecULL, 0x6c44198c4a475817ULL
+};
+
+static int kimd_sha512(CPUS390XState *env, uintptr_t ra, uint64_t 
parameter_block,
+   uint64_t *message_reg, uint64_t *len_reg, uint8_t 
*stack_buffer)
+{
+enum { MAX_BLOCKS_PER_RUN = 64 }; /* This is arbitrary, just to keep 
interactivity. */
+uint64_t z[8], b[8], a[8], w[16], t;
+uint64_t message = message_reg ? *message_reg : 0, len = *len_reg, 
processed = 0;
+int i, j, reg_len = 64, blocks = 0, cc = 0;
+
+if (!(env->psw.mask & PSW_MASK_64)) {
+len = (uint32_t)len;
+reg_len = (env->psw.mask & PSW_MASK_32) ? 32 : 24;
+}
+
+for (i = 0; i < 8; ++i) {
+

[PATCH v6 2/2] target/s390x: support SHA-512 extensions

2022-08-03 Thread Jason A. Donenfeld

In order to fully support MSA_EXT_5, we have to also support the SHA-512
special instructions. So implement those.

The implementation began as something TweetNacl-like, and then was
adjusted to be useful here. It's not very beautiful, but it is quite
short and compact, which is what we're going for.

Cc: Thomas Huth 
Cc: David Hildenbrand 
Cc: Christian Borntraeger 
Cc: Richard Henderson 
Cc: Cornelia Huck 
Cc: Harald Freudenberger 
Cc: Holger Dengler 
Signed-off-by: Jason A. Donenfeld 
---
 target/s390x/gen-features.c  |   2 +
 target/s390x/tcg/crypto_helper.c | 157 +++
 2 files changed, 159 insertions(+)

diff --git a/target/s390x/gen-features.c b/target/s390x/gen-features.c
index 3d333e2789..b6d804fa6d 100644
--- a/target/s390x/gen-features.c
+++ b/target/s390x/gen-features.c
@@ -751,6 +751,8 @@ static uint16_t qemu_MAX[] = {
 S390_FEAT_VECTOR_ENH2,
 S390_FEAT_MSA_EXT_5,
 S390_FEAT_PRNO_TRNG,
+S390_FEAT_KIMD_SHA_512,
+S390_FEAT_KLMD_SHA_512,
 };
 
 /** END FEATURE DEFS **/
diff --git a/target/s390x/tcg/crypto_helper.c b/target/s390x/tcg/crypto_helper.c
index 8ad4ef1ace..bb4823107c 100644
--- a/target/s390x/tcg/crypto_helper.c
+++ b/target/s390x/tcg/crypto_helper.c
@@ -1,10 +1,12 @@
 /*
  *  s390x crypto helpers
  *
+ *  Copyright (C) 2022 Jason A. Donenfeld . All Rights 
Reserved.
  *  Copyright (c) 2017 Red Hat Inc
  *
  *  Authors:
  *   David Hildenbrand 
+ *   Jason A. Donenfeld 
  *
  * This work is licensed under the terms of the GNU GPL, version 2 or later.
  * See the COPYING file in the top-level directory.
@@ -19,6 +21,153 @@
 #include "exec/exec-all.h"
 #include "exec/cpu_ldst.h"
 
+static uint64_t R(uint64_t x, int c) { return (x >> c) | (x << (64 - c)); }
+static uint64_t Ch(uint64_t x, uint64_t y, uint64_t z) { return (x & y) ^ (~x 
& z); }
+static uint64_t Maj(uint64_t x, uint64_t y, uint64_t z) { return (x & y) ^ (x 
& z) ^ (y & z); }
+static uint64_t Sigma0(uint64_t x) { return R(x, 28) ^ R(x, 34) ^ R(x, 39); }
+static uint64_t Sigma1(uint64_t x) { return R(x, 14) ^ R(x, 18) ^ R(x, 41); }
+static uint64_t sigma0(uint64_t x) { return R(x, 1) ^ R(x, 8) ^ (x >> 7); }
+static uint64_t sigma1(uint64_t x) { return R(x, 19) ^ R(x, 61) ^ (x >> 6); }
+
+static const uint64_t K[80] = {
+0x428a2f98d728ae22ULL, 0x7137449123ef65cdULL, 0xb5c0fbcfec4d3b2fULL,
+0xe9b5dba58189dbbcULL, 0x3956c25bf348b538ULL, 0x59f111f1b605d019ULL,
+0x923f82a4af194f9bULL, 0xab1c5ed5da6d8118ULL, 0xd807aa98a3030242ULL,
+0x12835b0145706fbeULL, 0x243185be4ee4b28cULL, 0x550c7dc3d5ffb4e2ULL,
+0x72be5d74f27b896fULL, 0x80deb1fe3b1696b1ULL, 0x9bdc06a725c71235ULL,
+0xc19bf174cf692694ULL, 0xe49b69c19ef14ad2ULL, 0xefbe4786384f25e3ULL,
+0x0fc19dc68b8cd5b5ULL, 0x240ca1cc77ac9c65ULL, 0x2de92c6f592b0275ULL,
+0x4a7484aa6ea6e483ULL, 0x5cb0a9dcbd41fbd4ULL, 0x76f988da831153b5ULL,
+0x983e5152ee66dfabULL, 0xa831c66d2db43210ULL, 0xb00327c898fb213fULL,
+0xbf597fc7beef0ee4ULL, 0xc6e00bf33da88fc2ULL, 0xd5a79147930aa725ULL,
+0x06ca6351e003826fULL, 0x142929670a0e6e70ULL, 0x27b70a8546d22ffcULL,
+0x2e1b21385c26c926ULL, 0x4d2c6dfc5ac42aedULL, 0x53380d139d95b3dfULL,
+0x650a73548baf63deULL, 0x766a0abb3c77b2a8ULL, 0x81c2c92e47edaee6ULL,
+0x92722c851482353bULL, 0xa2bfe8a14cf10364ULL, 0xa81a664bbc423001ULL,
+0xc24b8b70d0f89791ULL, 0xc76c51a30654be30ULL, 0xd192e819d6ef5218ULL,
+0xd69906245565a910ULL, 0xf40e35855771202aULL, 0x106aa07032bbd1b8ULL,
+0x19a4c116b8d2d0c8ULL, 0x1e376c085141ab53ULL, 0x2748774cdf8eeb99ULL,
+0x34b0bcb5e19b48a8ULL, 0x391c0cb3c5c95a63ULL, 0x4ed8aa4ae3418acbULL,
+0x5b9cca4f7763e373ULL, 0x682e6ff3d6b2b8a3ULL, 0x748f82ee5defb2fcULL,
+0x78a5636f43172f60ULL, 0x84c87814a1f0ab72ULL, 0x8cc702081a6439ecULL,
+0x90befffa23631e28ULL, 0xa4506cebde82bde9ULL, 0xbef9a3f7b2c67915ULL,
+0xc67178f2e372532bULL, 0xca273eceea26619cULL, 0xd186b8c721c0c207ULL,
+0xeada7dd6cde0eb1eULL, 0xf57d4f7fee6ed178ULL, 0x06f067aa72176fbaULL,
+0x0a637dc5a2c898a6ULL, 0x113f9804bef90daeULL, 0x1b710b35131c471bULL,
+0x28db77f523047d84ULL, 0x32caab7b40c72493ULL, 0x3c9ebe0a15c9bebcULL,
+0x431d67c49c100d4cULL, 0x4cc5d4becb3e42b6ULL, 0x597f299cfc657e2aULL,
+0x5fcb6fab3ad6faecULL, 0x6c44198c4a475817ULL
+};
+
+static int kimd_sha512(CPUS390XState *env, uintptr_t ra, uint64_t 
parameter_block,
+   uint64_t *message_reg, uint64_t *len_reg, uint8_t 
*stack_buffer)
+{
+enum { MAX_BLOCKS_PER_RUN = 64 }; /* This is arbitrary, just to keep 
interactivity. */
+uint64_t z[8], b[8], a[8], w[16], t;
+uint64_t message = message_reg ? *message_reg : 0, len = *len_reg, 
processed = 0;
+int i, j, reg_len = 64, blocks = 0, cc = 0;
+
+if (!(env->psw.mask & PSW_MASK_64)) {
+len = (uint32_t)len;
+reg_len = (env->psw.mask & PSW_MASK_32) ? 32 : 24;
+}
+
+for (i = 0; i < 8; ++i) {
+

Re: [PULL 9/9] hw/i386: pass RNG seed via setup_data entry

2022-08-03 Thread Jason A. Donenfeld

Hi Michael,

On Wed, Aug 03, 2022 at 06:03:20PM -0400, Michael S. Tsirkin wrote:
> On Wed, Aug 03, 2022 at 07:07:52PM +0200, Jason A. Donenfeld wrote:
> > On Wed, Aug 03, 2022 at 03:34:04PM +0200, Jason A. Donenfeld wrote:
> > > On Wed, Aug 03, 2022 at 03:11:48PM +0200, Jason A. Donenfeld wrote:
> > > > Thanks for the info. Very helpful. Looking into it now.
> > > 
> > > So interestingly, this is not a new issue. If you pass any type of setup
> > > data, OVMF appears to be doing something unusual and passing 0x
> > > for all the entries, rather than the actual data. The reason this isn't
> > > new is: try passing `-dtb any/dtb/at/all/from/anywhere` and you get the
> > > same page fault, on all QEMU versions. The thing that passes the DTB is
> > > the thing that passes the RNG seed. Same mechanism, same bug.
> > > 
> > > I'm looking into it...
> > 
> > Fixed with: 
> > https://lore.kernel.org/all/20220803170235.1312978-1-ja...@zx2c4.com/
> > 
> > Feel free to join into the discussion there. I CC'd you.
> > 
> > Jason
> 
> Hmm I don't think this patch will make it in 7.1 given the
> timeframe. I suspect we should revert the patch for now.
> 
> Which is where you maybe begin to see why we generally
> prefer doing it with features - one can then work around
> bugs by turning the feature on and off.

The bug actually precedes this patch. Just boot with -dtb on any qemu
version and you'll trigger it. We're still at rc0; there should be time
enough for a bug fix. Please do chime in on that thread and maybe we can
come up with something reasonable fast enough.

Jason

Re: [PATCH RFC v1] hw/i386: place setup_data at fixed place in memory

2022-08-03 Thread Jason A. Donenfeld

Hi Michael,

On Wed, Aug 03, 2022 at 06:25:39PM -0400, Michael S. Tsirkin wrote:
> > -/* Offset 0x250 is a pointer to the first setup_data link. */
> > -stq_p(header + 0x250, first_setup_data);
> > +if (first_setup_data) {
> > +/* Offset 0x250 is a pointer to the first setup_data link. */
> > +stq_p(header + 0x250, first_setup_data);
> > +rom_add_blob("setup_data", setup_datas, setup_data_total_len, 
> > setup_data_total_len,
> > + SETUP_DATA_PHYS_BASE, NULL, NULL, NULL, NULL, 
> > false);
> > +}
> > +
> >
> 
> Allocating memory on x86 is tricky business.  Can we maybe use 
> bios-linker-loader
> with COMMAND_WRITE_POINTER to get an address from firmware?

Hmm. Is BIOSLinker even available to us at this stage in preparation?

One thing to note is that this memory doesn't really need to be
persistent. It's only used extrmely early in boot. So it could be
somewhere that gets used/remapped later on.

Jason

Re: [PATCH RFC v1] hw/i386: place setup_data at fixed place in memory

2022-08-03 Thread Jason A. Donenfeld

Hey again,

On Thu, Aug 04, 2022 at 12:50:50AM +0200, Jason A. Donenfeld wrote:
> Hi Michael,
> 
> On Wed, Aug 03, 2022 at 06:25:39PM -0400, Michael S. Tsirkin wrote:
> > > -/* Offset 0x250 is a pointer to the first setup_data link. */
> > > -stq_p(header + 0x250, first_setup_data);
> > > +if (first_setup_data) {
> > > +/* Offset 0x250 is a pointer to the first setup_data link. */
> > > +stq_p(header + 0x250, first_setup_data);
> > > +rom_add_blob("setup_data", setup_datas, 
> > > setup_data_total_len, setup_data_total_len,
> > > + SETUP_DATA_PHYS_BASE, NULL, NULL, NULL, NULL, 
> > > false);
> > > +}
> > > +
> > >
> > 
> > Allocating memory on x86 is tricky business.  Can we maybe use 
> > bios-linker-loader
> > with COMMAND_WRITE_POINTER to get an address from firmware?
> 
> Hmm. Is BIOSLinker even available to us at this stage in preparation?
> 
> One thing to note is that this memory doesn't really need to be
> persistent. It's only used extrmely early in boot. So it could be
> somewhere that gets used/remapped later on.

Actually, it's possible there's one place that's already available, and
that this isn't so bad after all. In my tests, this seems to be working
in a wide variety of configurations. I'll send a v2.

Jason

[PATCH v2] hw/i386: place setup_data at fixed place in memory

2022-08-03 Thread Jason A. Donenfeld

The boot parameter header refers to setup_data at an absolute address,
and each setup_data refers to the next setup_data at an absolute address
too. Currently QEMU simply puts the setup_datas right after the kernel
image, and since the kernel_image is loaded at prot_addr -- a fixed
address knowable to QEMU apriori -- the setup_data absolute address
winds up being just `prot_addr + a_fixed_offset_into_kernel_image`.

This mostly works fine, so long as the kernel image really is loaded at
prot_addr. However, OVMF doesn't load the kernel at prot_addr, and
generally EFI doesn't give a good way of predicting where it's going to
load the kernel. So when it loads it at some address != prot_addr, the
absolute addresses in setup_data now point somewhere bogus, causing
crashes when EFI stub tries to follow the next link.

Fix this by placing setup_data at some fixed place in memory, relative
to real_addr, not as part of the kernel image, and then pointing the
setup_data absolute address to that fixed place in memory. This way,
even if OVMF or other chains relocate the kernel image, the boot
parameter still points to the correct absolute address.

Fixes: 3cbeb52467 ("hw/i386: add device tree support")
Reported-by: Xiaoyao Li 
Cc: Paolo Bonzini 
Cc: Richard Henderson 
Cc: Peter Maydell 
Cc: Michael S. Tsirkin 
Cc: Daniel P. Berrangé 
Cc: Gerd Hoffmann 
Cc: Ard Biesheuvel 
Cc: linux-...@vger.kernel.org
Signed-off-by: Jason A. Donenfeld 
---
 hw/i386/x86.c | 38 --
 1 file changed, 20 insertions(+), 18 deletions(-)

diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index 050eedc0c8..8b853abf38 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -760,36 +760,36 @@ static bool load_elfboot(const char *kernel_filename,
 fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ENTRY, pvh_start_addr);
 fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ADDR, mh_load_addr);
 fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_SIZE, elf_kernel_size);
 
 return true;
 }
 
 void x86_load_linux(X86MachineState *x86ms,
 FWCfgState *fw_cfg,
 int acpi_data_size,
 bool pvh_enabled,
 bool legacy_no_rng_seed)
 {
 bool linuxboot_dma_enabled = 
X86_MACHINE_GET_CLASS(x86ms)->fwcfg_dma_enabled;
 uint16_t protocol;
 int setup_size, kernel_size, cmdline_size;
-int dtb_size, setup_data_offset;
+int dtb_size, setup_data_item_len, setup_data_total_len = 0;
 uint32_t initrd_max;
-uint8_t header[8192], *setup, *kernel;
-hwaddr real_addr, prot_addr, cmdline_addr, initrd_addr = 0, 
first_setup_data = 0;
+uint8_t header[8192], *setup, *kernel, *setup_datas = NULL;
+hwaddr real_addr, prot_addr, cmdline_addr, initrd_addr = 0, 
first_setup_data = 0, setup_data_base;
 FILE *f;
 char *vmode;
 MachineState *machine = MACHINE(x86ms);
 struct setup_data *setup_data;
 const char *kernel_filename = machine->kernel_filename;
 const char *initrd_filename = machine->initrd_filename;
 const char *dtb_filename = machine->dtb;
 const char *kernel_cmdline = machine->kernel_cmdline;
 SevKernelLoaderContext sev_load_ctx = {};
 enum { RNG_SEED_LENGTH = 32 };
 
 /* Align to 16 bytes as a paranoia measure */
 cmdline_size = (strlen(kernel_cmdline) + 16) & ~15;
 
 /* load the kernel header */
 f = fopen(kernel_filename, "rb");
@@ -886,32 +886,33 @@ void x86_load_linux(X86MachineState *x86ms,
 if (protocol < 0x200 || !(header[0x211] & 0x01)) {
 /* Low kernel */
 real_addr= 0x9;
 cmdline_addr = 0x9a000 - cmdline_size;
 prot_addr= 0x1;
 } else if (protocol < 0x202) {
 /* High but ancient kernel */
 real_addr= 0x9;
 cmdline_addr = 0x9a000 - cmdline_size;
 prot_addr= 0x10;
 } else {
 /* High and recent kernel */
 real_addr= 0x1;
 cmdline_addr = 0x2;
 prot_addr= 0x10;
 }
+setup_data_base = real_addr + 0x8000;
 
 /* highest address for loading the initrd */
 if (protocol >= 0x20c &&
 lduw_p(header + 0x236) & XLF_CAN_BE_LOADED_ABOVE_4G) {
 /*
  * Linux has supported initrd up to 4 GB for a very long time (2007,
  * long before XLF_CAN_BE_LOADED_ABOVE_4G which was added in 2013),
  * though it only sets initrd_max to 2 GB to "work around bootloader
  * bugs". Luckily, QEMU firmware(which does something like bootloader)
  * has supported this.
  *
  * It's believed that if XLF_CAN_BE_LOADED_ABOVE_4G is set, initrd can
  * be loaded into any address.
  *
  * In addition, initrd_max is uint32_t simply because QEMU doesn't
  * support the 64-bit boot protocol (specifically the ext_ramdisk_image
@@ -1049,60 +1050,61 @@ void x86_load_li

Re: [PATCH v2] hw/i386: place setup_data at fixed place in memory

2022-08-04 Thread Jason A. Donenfeld

Hi Ard,

On Thu, Aug 4, 2022 at 2:16 PM Ard Biesheuvel  wrote:
>
> On Thu, 4 Aug 2022 at 14:11, Daniel P. Berrangé  wrote:
> >
> > On Thu, Aug 04, 2022 at 02:03:29PM +0200, Jason A. Donenfeld wrote:
> > > Hi Daniel,
> > >
> > > On Thu, Aug 04, 2022 at 10:25:36AM +0100, Daniel P. Berrangé wrote:
> > > > Yep, and ultimately the inability to distinguish UEFI vs other firmware
> > > > is arguably correct by design, as the QEMU <-> firmware interface is
> > > > supposed to be arbitrarily pluggable for any firmware implementation
> > > > not  limited to merely UEFI + seabios.
> > >
> > > Indeed, I agree with this.
> > >
> > > >
> > > > > For now I suggest either reverting the original patch, or at least not
> > > > > enabling the knob by default for any machine types. In particular, 
> > > > > when
> > > > > using MicroVM, the user must leave the knob disabled when direct 
> > > > > booting
> > > > > a kernel on OVMF, and the user may or may not enable the knob when
> > > > > direct booting a kernel on SeaBIOS.
> > > >
> > > > Having it opt-in via a knob would defeat Jason's goal of having the seed
> > > > available automatically.
> > >
> > > Yes, adding a knob is absolutely out of the question.
> > >
> > > It also doesn't actually solve the problem: this triggers when QEMU
> > > passes a DTB too. It's not just for the new RNG seed thing. This bug
> > > isn't new.
> >
> > In the other thread I also mentioned that this RNG Seed addition has
> > caused a bug with AMD SEV too, making boot measurement attestation
> > fail because the kernel blob passed to the firmware no longer matches
> > what the tenant expects, due to the injected seed.
> >
>
> I was actually expecting this to be an issue in the
> signing/attestation department as well, and you just confirmed my
> suspicion.
>
> But does this mean that populating the setup_data pointer is out of
> the question altogether? Or only that putting the setup_data linked
> list nodes inside the image is a problem?

If you look at the v2 patch, populating boot_param->setup_data winds
up being a fixed value. So even if that part was a problem (though I
don't think it is), it won't be with the v2 patch, since it's always
the same.

Jason

Re: [PATCH v4 2/2] target/s390x: support SHA-512 extensions

2022-08-04 Thread Jason A. Donenfeld

Hi,

On Thu, Aug 04, 2022 at 10:10:52AM +0200, David Hildenbrand wrote:
> > Hm, you don't really want to implement some kind of particial complete.
> > Qemu is an emulation and you would have to implement some kind of
> > fragmenting this based on machine generation.
> 
> Do we?
> 
> "The
> CPU-determined number of bytes depends on the
> model, and may be a different number each time the
> instruction is executed. The CPU-determined number
> of bytes is usually nonzero. In certain unusual situa-
> tions, this number may be zero, and condition code 3
> may be set with no progress. However, the CPU pro-
> tects against endless recurrence of this no-progress
> case.
> "
> 
> I read that as "do what you want, even on a given model it might be random."

Just FYI, I implemented this, and it works in v6. Please take a look at:
https://lore.kernel.org/qemu-devel/20220803171536.1314717-2-ja...@zx2c4.com/

So we can keep that. Or I can send a v7 that removes it.

It wasn't very hard to implement, and it's not very hard to remove, so
either way, just tell me what you want to do.

Jason

Re: [PATCH v2] hw/i386: place setup_data at fixed place in memory

2022-08-04 Thread Jason A. Donenfeld

Hi Daniel,

On Thu, Aug 04, 2022 at 10:25:36AM +0100, Daniel P. Berrangé wrote:
> Yep, and ultimately the inability to distinguish UEFI vs other firmware
> is arguably correct by design, as the QEMU <-> firmware interface is
> supposed to be arbitrarily pluggable for any firmware implementation
> not  limited to merely UEFI + seabios.

Indeed, I agree with this.

> 
> > For now I suggest either reverting the original patch, or at least not
> > enabling the knob by default for any machine types. In particular, when
> > using MicroVM, the user must leave the knob disabled when direct booting
> > a kernel on OVMF, and the user may or may not enable the knob when
> > direct booting a kernel on SeaBIOS.
> 
> Having it opt-in via a knob would defeat Jason's goal of having the seed
> available automatically.

Yes, adding a knob is absolutely out of the question.

It also doesn't actually solve the problem: this triggers when QEMU
passes a DTB too. It's not just for the new RNG seed thing. This bug
isn't new.

Jason

Re: [PATCH v2] hw/i386: place setup_data at fixed place in memory

2022-08-04 Thread Jason A. Donenfeld

On Thu, Aug 4, 2022 at 2:17 PM Jason A. Donenfeld  wrote:
>
> Hi Ard,
>
> On Thu, Aug 4, 2022 at 2:16 PM Ard Biesheuvel  wrote:
> >
> > On Thu, 4 Aug 2022 at 14:11, Daniel P. Berrangé  wrote:
> > >
> > > On Thu, Aug 04, 2022 at 02:03:29PM +0200, Jason A. Donenfeld wrote:
> > > > Hi Daniel,
> > > >
> > > > On Thu, Aug 04, 2022 at 10:25:36AM +0100, Daniel P. Berrangé wrote:
> > > > > Yep, and ultimately the inability to distinguish UEFI vs other 
> > > > > firmware
> > > > > is arguably correct by design, as the QEMU <-> firmware interface is
> > > > > supposed to be arbitrarily pluggable for any firmware implementation
> > > > > not  limited to merely UEFI + seabios.
> > > >
> > > > Indeed, I agree with this.
> > > >
> > > > >
> > > > > > For now I suggest either reverting the original patch, or at least 
> > > > > > not
> > > > > > enabling the knob by default for any machine types. In particular, 
> > > > > > when
> > > > > > using MicroVM, the user must leave the knob disabled when direct 
> > > > > > booting
> > > > > > a kernel on OVMF, and the user may or may not enable the knob when
> > > > > > direct booting a kernel on SeaBIOS.
> > > > >
> > > > > Having it opt-in via a knob would defeat Jason's goal of having the 
> > > > > seed
> > > > > available automatically.
> > > >
> > > > Yes, adding a knob is absolutely out of the question.
> > > >
> > > > It also doesn't actually solve the problem: this triggers when QEMU
> > > > passes a DTB too. It's not just for the new RNG seed thing. This bug
> > > > isn't new.
> > >
> > > In the other thread I also mentioned that this RNG Seed addition has
> > > caused a bug with AMD SEV too, making boot measurement attestation
> > > fail because the kernel blob passed to the firmware no longer matches
> > > what the tenant expects, due to the injected seed.
> > >
> >
> > I was actually expecting this to be an issue in the
> > signing/attestation department as well, and you just confirmed my
> > suspicion.
> >
> > But does this mean that populating the setup_data pointer is out of
> > the question altogether? Or only that putting the setup_data linked
> > list nodes inside the image is a problem?
>
> If you look at the v2 patch, populating boot_param->setup_data winds
> up being a fixed value. So even if that part was a problem (though I
> don't think it is), it won't be with the v2 patch, since it's always
> the same.

Actually, `setup` isn't even modified if SEV is being used anyway. So
really, the approach of this v2 -- of not modifying the kernel image
-- should fix that issue, no matter what.

Jason

Re: [PATCH v4 2/2] target/s390x: support SHA-512 extensions

2022-08-04 Thread Jason A. Donenfeld

Hi,

On Thu, Aug 04, 2022 at 08:56:19AM +0200, Christian Borntraeger wrote:
> We do not support the esa390 mode, but the 24/31 bit _addressing_ modes are
> totally valid to be used in zarch mode (with sam31 for example). The kernel
> does that for example for some diagnoses under z/VM.
> Nobody in problem state should probably do that, but its possible.

v6 of this series handles 24/31:

https://lore.kernel.org/qemu-devel/20220803171536.1314717-1-ja...@zx2c4.com/ 
[unchanged for a while now]
https://lore.kernel.org/qemu-devel/20220803171536.1314717-2-ja...@zx2c4.com/ 
[the new sha512 thing]

Jason

Re: [PATCH v2] hw/i386: place setup_data at fixed place in memory

2022-08-04 Thread Jason A. Donenfeld

On Thu, Aug 4, 2022 at 2:11 PM Jason A. Donenfeld  wrote:
>
> Hi Laszlo,
>
> On Thu, Aug 04, 2022 at 01:31:36PM +0200, Laszlo Ersek wrote:
> > None of the existing info passing methods seem early enough, generic
> > enough, and secure enough (at the same time)...
>
> Can you look at the v2 patch? It seems to work on every configuration I
> throw at it. Keep in mind that setup_data is only used very, very early.
> I can think of a few other places to put it too, looking at the x86
> memory map, that will survive long enough.
>
> I think this might actually be a straightforwardly solvable problem if
> you think about it more basically.

And just to put things in perspective here... We only need like 48
bytes or something at some easy fixed address. That's not much. That's
*got* to be a fairly tractable problem. If v2 has issues, I can't see
why there wouldn't be a different easy place to put a meger 48 bytes
of stuff that then is allowed to be wiped out after early boot.

Jason

Re: [PULL 9/9] hw/i386: pass RNG seed via setup_data entry

2022-08-04 Thread Jason A. Donenfeld

Hi Daniel,

On Thu, Aug 4, 2022 at 2:01 PM Daniel P. Berrangé  wrote:
>
> On Thu, Jul 21, 2022 at 06:36:21PM +0200, Paolo Bonzini wrote:
> > From: "Jason A. Donenfeld" 
> >
> > Tiny machines optimized for fast boot time generally don't use EFI,
> > which means a random seed has to be supplied some other way. For this
> > purpose, Linux (â‰¥5.20) supports passing a seed in the setup_data table
> > with SETUP_RNG_SEED, specially intended for hypervisors, kexec, and
> > specialized bootloaders. The linked commit shows the upstream kernel
> > implementation.
> >
> > At Paolo's request, we don't pass these to versioned machine types â‰¤7.0.
>
>
> This change has also broken direct kernel measured boot with AMD SEV
> confidential virtualization.
>
> The vmlinuz that we pass in with -kernel is measured by the BIOS and
> since that gets munged with a random seed, the measurement no longer
> matches the expected measurements the person attesting boot will
> have pre-calculated.
>
> The kernel binary passed to the firmware must be 100% unchanged
> from what the user provided in order for boot measurements to
> succeed.
>
> So at the very least this codes needs to be conditionalized to
> not run when AMD SEV is active.

If you look at the v2 patch, I move all of the setup_data stuff
outside of the kernel image, so the kernel image itself doesn't get
modified. So SEV should still work.

Can you test that patch and see?

Jason

Re: [PATCH v2] hw/i386: place setup_data at fixed place in memory

2022-08-04 Thread Jason A. Donenfeld

Hi Laszlo,

On Thu, Aug 4, 2022 at 3:17 PM Laszlo Ersek  wrote:
>
> On 08/04/22 14:47, Jason A. Donenfeld wrote:
> > On Thu, Aug 4, 2022 at 2:11 PM Jason A. Donenfeld  wrote:
> >>
> >> Hi Laszlo,
> >>
> >> On Thu, Aug 04, 2022 at 01:31:36PM +0200, Laszlo Ersek wrote:
> >>> None of the existing info passing methods seem early enough, generic
> >>> enough, and secure enough (at the same time)...
> >>
> >> Can you look at the v2 patch? It seems to work on every configuration I
> >> throw at it. Keep in mind that setup_data is only used very, very early.
> >> I can think of a few other places to put it too, looking at the x86
> >> memory map, that will survive long enough.
> >>
> >> I think this might actually be a straightforwardly solvable problem if
> >> you think about it more basically.
> >
> > And just to put things in perspective here... We only need like 48
> > bytes or something at some easy fixed address. That's not much. That's
> > *got* to be a fairly tractable problem. If v2 has issues, I can't see
> > why there wouldn't be a different easy place to put a meger 48 bytes
> > of stuff that then is allowed to be wiped out after early boot.
>
> I've looked at v2. It still relies on passing information from QEMU to
> the guest kernel through guest RAM such that the whole firmware
> execution takes place in-between, without the firmware knowing anything
> about that particular area -- effectively treating it as free system
> RAM. Such exceptions are time bombs.
>
> We *have* used hard-coded addresses, sometimes they are unavoidable, but
> then they are open-coded in both QEMU and the firmware, and some early
> part of the firmware takes care to either move the data to a "safe"
> place, or to cover it in-place with a kind of reservation that prevents
> other parts of the firmware from trampling over it. I've debugged
> mistakes (memory corruption) when such reservation was forgotten; it's
> not fun.
>
> In short, I have nothing against the QEMU patch, but then the current
> OvmfPkg maintainers should accept a patch for the firmware too, for
> protecting the area from later firmware components, as early as possible.

What you say mostly makes sense. Though, I was wondering if there's
some unused space (maybe a historical low address) that nothing
touches because it's always been considered reserved. Some kind of
ancient video data region, for example. We only need 48 bytes after
all... My hope was that somebody with a lot of deep x86 knowledge
would be able to smell their way to something that's always good like
that. (Alternatively, there's my real_addr thing from v2.)

Jason

Re: [PATCH] pc: add property for Linux setup_data seed

2022-08-04 Thread Jason A. Donenfeld

On Thu, Aug 04, 2022 at 03:13:20PM +0200, Paolo Bonzini wrote:
> Using a property makes it possible to use the normal compat property
> mechanism instead of ad hoc code; it avoids parameter proliferation
> in x86_load_linux; and allows shipping the code even if it is
> disabled by default.

Strong NACK from me here.

If this kind of thing is off by default, it's as good as useless. Indeed
it shouldn't even be a knob at all. Don't do this.

Rather, let's fix the bug. The code as-is -- going back to the 2016 DTB
addition -- is problematic and needs to be fixed. So let's fix that.
Trying to cover up the problem with a default-off knob just ensures this
stuff will never be made to work right.

Please do not commit this patch.

Rather, let's see what a few round of review can do over on the fix I
posted. We're still in rc0. No need to jump to something terrible like
this.

Jason

Re: [PATCH v2] hw/i386: place setup_data at fixed place in memory

2022-08-04 Thread Jason A. Donenfeld

Hi Daniel,

On Thu, Aug 4, 2022 at 2:54 PM Daniel P. Berrangé  wrote:
>
> On Thu, Aug 04, 2022 at 02:44:11AM +0200, Jason A. Donenfeld wrote:
> > The boot parameter header refers to setup_data at an absolute address,
> > and each setup_data refers to the next setup_data at an absolute address
> > too. Currently QEMU simply puts the setup_datas right after the kernel
> > image, and since the kernel_image is loaded at prot_addr -- a fixed
> > address knowable to QEMU apriori -- the setup_data absolute address
> > winds up being just `prot_addr + a_fixed_offset_into_kernel_image`.
> >
> > This mostly works fine, so long as the kernel image really is loaded at
> > prot_addr. However, OVMF doesn't load the kernel at prot_addr, and
> > generally EFI doesn't give a good way of predicting where it's going to
> > load the kernel. So when it loads it at some address != prot_addr, the
> > absolute addresses in setup_data now point somewhere bogus, causing
> > crashes when EFI stub tries to follow the next link.
> >
> > Fix this by placing setup_data at some fixed place in memory, relative
> > to real_addr, not as part of the kernel image, and then pointing the
> > setup_data absolute address to that fixed place in memory. This way,
> > even if OVMF or other chains relocate the kernel image, the boot
> > parameter still points to the correct absolute address.
> >
> > Fixes: 3cbeb52467 ("hw/i386: add device tree support")
> > Reported-by: Xiaoyao Li 
> > Cc: Paolo Bonzini 
> > Cc: Richard Henderson 
> > Cc: Peter Maydell 
> > Cc: Michael S. Tsirkin 
> > Cc: Daniel P. Berrangé 
> > Cc: Gerd Hoffmann 
> > Cc: Ard Biesheuvel 
> > Cc: linux-...@vger.kernel.org
> > Signed-off-by: Jason A. Donenfeld 
> > ---
> >  hw/i386/x86.c | 38 --
> >  1 file changed, 20 insertions(+), 18 deletions(-)
> >
> > diff --git a/hw/i386/x86.c b/hw/i386/x86.c
> > index 050eedc0c8..8b853abf38 100644
> > --- a/hw/i386/x86.c
> > +++ b/hw/i386/x86.c
>
>
> >  if (!legacy_no_rng_seed) {
> > -setup_data_offset = QEMU_ALIGN_UP(kernel_size, 16);
> > -kernel_size = setup_data_offset + sizeof(struct setup_data) + 
> > RNG_SEED_LENGTH;
> > -kernel = g_realloc(kernel, kernel_size);
> > -setup_data = (struct setup_data *)(kernel + setup_data_offset);
> > +setup_data_item_len = sizeof(struct setup_data) + RNG_SEED_LENGTH;
> > +setup_datas = g_realloc(setup_datas, setup_data_total_len + 
> > setup_data_item_len);
> > +setup_data = (struct setup_data *)(setup_datas + 
> > setup_data_total_len);
> >  setup_data->next = cpu_to_le64(first_setup_data);
> > -first_setup_data = prot_addr + setup_data_offset;
> > +first_setup_data = setup_data_base + setup_data_total_len;
> > +setup_data_total_len += setup_data_item_len;
> >  setup_data->type = cpu_to_le32(SETUP_RNG_SEED);
> >  setup_data->len = cpu_to_le32(RNG_SEED_LENGTH);
> >  qemu_guest_getrandom_nofail(setup_data->data, RNG_SEED_LENGTH);
> >  }
> >
> > -/* Offset 0x250 is a pointer to the first setup_data link. */
> > -stq_p(header + 0x250, first_setup_data);
> > +if (first_setup_data) {
> > +/* Offset 0x250 is a pointer to the first setup_data link. */
> > +stq_p(header + 0x250, first_setup_data);
> > +rom_add_blob("setup_data", setup_datas, setup_data_total_len, 
> > setup_data_total_len,
> > + setup_data_base, NULL, NULL, NULL, NULL, false);
> > +}
>
> The boot measurements with AMD SEV now succeed, but I'm a little
> worried about the implications of adding this ROM, when a few lines
> later here we're discarding the 'header' changes for AMD SEV. Is
> this still going to operate correctly in the guest OS if we've
> discarded the header changes below ?

I'll add a !sev_enabled() condition to that block too, so it also
skips adding the ROM, for v3.

Jason

Re: [PATCH v2] hw/i386: place setup_data at fixed place in memory

2022-08-04 Thread Jason A. Donenfeld

On Thu, Aug 4, 2022 at 3:25 PM Laszlo Ersek  wrote:
>
> On 08/04/22 14:16, Ard Biesheuvel wrote:
> > On Thu, 4 Aug 2022 at 14:11, Daniel P. Berrangé  wrote:
> >>
> >> On Thu, Aug 04, 2022 at 02:03:29PM +0200, Jason A. Donenfeld wrote:
> >>> Hi Daniel,
> >>>
> >>> On Thu, Aug 04, 2022 at 10:25:36AM +0100, Daniel P. Berrangé wrote:
> >>>> Yep, and ultimately the inability to distinguish UEFI vs other firmware
> >>>> is arguably correct by design, as the QEMU <-> firmware interface is
> >>>> supposed to be arbitrarily pluggable for any firmware implementation
> >>>> not  limited to merely UEFI + seabios.
> >>>
> >>> Indeed, I agree with this.
> >>>
> >>>>
> >>>>> For now I suggest either reverting the original patch, or at least not
> >>>>> enabling the knob by default for any machine types. In particular, when
> >>>>> using MicroVM, the user must leave the knob disabled when direct booting
> >>>>> a kernel on OVMF, and the user may or may not enable the knob when
> >>>>> direct booting a kernel on SeaBIOS.
> >>>>
> >>>> Having it opt-in via a knob would defeat Jason's goal of having the seed
> >>>> available automatically.
> >>>
> >>> Yes, adding a knob is absolutely out of the question.
> >>>
> >>> It also doesn't actually solve the problem: this triggers when QEMU
> >>> passes a DTB too. It's not just for the new RNG seed thing. This bug
> >>> isn't new.
> >>
> >> In the other thread I also mentioned that this RNG Seed addition has
> >> caused a bug with AMD SEV too, making boot measurement attestation
> >> fail because the kernel blob passed to the firmware no longer matches
> >> what the tenant expects, due to the injected seed.
> >>
> >
> > I was actually expecting this to be an issue in the
> > signing/attestation department as well, and you just confirmed my
> > suspicion.
> >
> > But does this mean that populating the setup_data pointer is out of
> > the question altogether? Or only that putting the setup_data linked
> > list nodes inside the image is a problem?
>
> QEMU already has to inject a whole bunch of stuff into confidential
> computing guests. The way it's done (IIRC) is that the non-compressed,
> trailing part of pflash (basically where the reset vector code lives
> too) is populated at OVMF build time with a chain of GUID-ed structures,
> and fields of those structures are filled in (at OVMF build time) from
> various fixed PCDs. The fixed PCDs in turn are populated from the FD
> files, using various MEMFD regions. When QEMU launches the guest, it can
> parse the GPAs from the on-disk pflash image (traversing the list of
> GUID-ed structs), at which addresses the guest firmware will then expect
> the various crypto artifacts to be injected.
>
> The point is that "who's in control" is reversed. The firmware exposes
> (at build time) at what GPAs it can accept data injections, and QEMU
> follows that. Of course the firmware ensures that nothing else in the
> firmware will try to own those GPAs.
>
> The only thing that needed to be hard-coded when this feature was
> introduced was the "entry point", that is, the flash offset at which
> QEMU starts the GUID-ed structure traversal.
>
> AMD and IBM developers can likely much better describe this mechanism,
> as I've not dealt with it in over a year. The QEMU side code is in
> "hw/i386/pc_sysfw_ovmf.c".
>
> We can make setup_data chaining work with OVMF, but the whole chain
> should be located in a GPA range that OVMF dictates.

It sounds like what you describe is pretty OVMF-specific though,
right? Do we want to tie things together so tightly like that?

Given we only need 48 bytes or so, isn't there a more subtle place we
could just throw this in ram that doesn't need such complex
coordination?

Jason

Re: [PATCH v2] hw/i386: place setup_data at fixed place in memory

2022-08-04 Thread Jason A. Donenfeld

Hi Laszlo,

On Thu, Aug 04, 2022 at 01:31:36PM +0200, Laszlo Ersek wrote:
> None of the existing info passing methods seem early enough, generic
> enough, and secure enough (at the same time)...

Can you look at the v2 patch? It seems to work on every configuration I
throw at it. Keep in mind that setup_data is only used very, very early.
I can think of a few other places to put it too, looking at the x86
memory map, that will survive long enough.

I think this might actually be a straightforwardly solvable problem if
you think about it more basically.

Jason

Re: [PATCH] pc: add property for Linux setup_data seed

2022-08-04 Thread Jason A. Donenfeld

Hi Daniel,

On Thu, Aug 04, 2022 at 02:31:06PM +0100, Daniel P. Berrangé wrote:
> On Thu, Aug 04, 2022 at 03:20:59PM +0200, Jason A. Donenfeld wrote:
> > On Thu, Aug 04, 2022 at 03:13:20PM +0200, Paolo Bonzini wrote:
> > > Using a property makes it possible to use the normal compat property
> > > mechanism instead of ad hoc code; it avoids parameter proliferation
> > > in x86_load_linux; and allows shipping the code even if it is
> > > disabled by default.
> > 
> > Strong NACK from me here.
> > 
> > If this kind of thing is off by default, it's as good as useless. Indeed
> > it shouldn't even be a knob at all. Don't do this.
> 
> You're misunderstanding the patch. This remains on by default for
>  the 7.1 machine type.

Ahhh, I think you're right. Sorry for mis understanding. The "even if it
is disabled by default" of the commit message isn't quite true then,
right?

If I understand correctly, this is a yes/no/auto, which defaults to
auto on newer machine types. And auto triggers if the kernel has a newer
boot header. Is that correct?

if (x86ms->linuxboot_seed != ON_OFF_AUTO_OFF &&
(protocol >= 0x209 || x86ms->linuxboot_seed == ON_OFF_AUTO_ON)) {

I think it's working as described (after applying the below fixup to
this broken patch).

> The patch is merely exposing a knob so that users can override the
> built-in default if they need to. Imagine if we had shipped this
> existing code before today's bugs were discovered.  The knob
> proposed her would allow users to turn off the broken pieces.
> This is a good thing.

I'm still not really keen on adding a knob for this. I understand ARM
has a knob for it for different reasons (better named "dtb-randomness").
If this knob thing is to live on here, maybe it should have
"-randomness" in the name also.

> > Rather, let's fix the bug. The code as-is -- going back to the 2016 DTB
> > addition -- is problematic and needs to be fixed. So let's fix that.
> > Trying to cover up the problem with a default-off knob just ensures this
> > stuff will never be made to work right.
> 
> It isn't covering up the problem, just providing a workaround
> option, should another bug be discovered after release. We
> still need to fix current discussed problems of course.

Thanks for the explanation. I don't like adding a knob. But if it's on
by default for the default machine type, then that's a compromise I
could accept.

Jason


diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index 00c21f6e4d..074571bc03 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -446,6 +446,7 @@ DEFINE_I440FX_MACHINE(v7_1, "pc-i440fx-7.1", NULL,

 static void pc_i440fx_7_0_machine_options(MachineClass *m)
 {
+PCMachineClass *pcmc = PC_MACHINE_CLASS(m);
 pc_i440fx_7_1_machine_options(m);
 m->alias = NULL;
 m->is_default = false;
diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index 5bcf100b35..f3aa4694a2 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -383,6 +383,7 @@ DEFINE_Q35_MACHINE(v7_1, "pc-q35-7.1", NULL,

 static void pc_q35_7_0_machine_options(MachineClass *m)
 {
+PCMachineClass *pcmc = PC_MACHINE_CLASS(m);
 pc_q35_7_1_machine_options(m);
 m->alias = NULL;
 pcmc->enforce_amd_1tb_hole = false;
diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index 3fbab258a9..206ce6c547 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -785,6 +785,7 @@ void x86_load_linux(X86MachineState *x86ms,
 const char *dtb_filename = machine->dtb;
 const char *kernel_cmdline = machine->kernel_cmdline;
 SevKernelLoaderContext sev_load_ctx = {};
+enum { RNG_SEED_LENGTH = 32 };

 /* Align to 16 bytes as a paranoia measure */
 cmdline_size = (strlen(kernel_cmdline) + 16) & ~15;
@@ -1075,7 +1076,7 @@ void x86_load_linux(X86MachineState *x86ms,
 }

 if (x86ms->linuxboot_seed != ON_OFF_AUTO_OFF &&
-(data.protocol >= 0x209 || x86ms->linuxboot_seed == ON_OFF_AUTO_ON)) {
+(protocol >= 0x209 || x86ms->linuxboot_seed == ON_OFF_AUTO_ON)) {
 setup_data_offset = QEMU_ALIGN_UP(kernel_size, 16);
 kernel_size = setup_data_offset + sizeof(struct setup_data) + 
RNG_SEED_LENGTH;
 kernel = g_realloc(kernel, kernel_size);

Re: [PATCH 0/2] vmgenid: add generation counter

2022-08-04 Thread Jason A. Donenfeld

Hi Babis,

On Wed, Aug 03, 2022 at 03:41:45PM +0200, bchal...@amazon.es wrote:
> From: Babis Chalios 
> 
> VM generation ID exposes a GUID inside the VM which changes every time a
> VM restore is happening. Typically, this GUID is used by the guest
> kernel to re-seed its internal PRNG. As a result, this value cannot be
> exposed in guest user-space as a notification mechanism for VM restore
> events.
> 
> This patch set extends vmgenid to introduce a 32 bits generation counter
> whose purpose is to be used as a VM restore notification mechanism for
> the guest user-space.
> 
> It is true that such a counter could be implemented entirely by the
> guest kernel, but this would rely on the vmgenid ACPI notification to
> trigger the counter update, which is inherently racy. Exposing this
> through the monitor allows the updated value to be in-place before
> resuming the vcpus, so interested user-space code can (atomically)
> observe the update without relying on the ACPI notification.

As I wrote on LKML:
https://lore.kernel.org/lkml/yuve4vuanu85m...@zx2c4.com/
you seem to be rehashing something already discussed in earlier threads.
I don't think we should rush to adding something like this to QEMU.

Jason

Re: [PATCH v2] hw/i386: place setup_data at fixed place in memory

2022-08-04 Thread Jason A. Donenfeld

Hey Laszlo,

On Thu, Aug 04, 2022 at 03:56:54PM +0200, Laszlo Ersek wrote:
> - do we want setup_data chaining work generally?
> 
> - or do we want only the random seed injection to stop crashing OVMF guests?

Preferably the first - generally. Which brings us to your point:

> > Given we only need 48 bytes or so, isn't there a more subtle place we
> > could just throw this in ram that doesn't need such complex
> > coordination?
> 
> These tricks add up and go wrong after a while. The pedantic
> reservations in the firmware have proved necessary.
> 
> IIUC, with v2, the setup_data_base address would (most frequently) be 96
> KB. edk2 does have uses for very low memory. If OVMF's PlatformPei does
> not reserve away the area, UefiCpuPkg or other drivers might allocate an
> overlapping chunk, even if only temporarily. That might not break the
> firmware, but it could overwrite the random seed. 

Yea, so we don't want an address that something else might overwrite. So
my question is: isn't there some 48 bytes or so available in some low
address (or maybe a high one?) that is traditionally reserved for some
hardware function, and so software doesn't use it, but it turns out QEMU
doesn't use it for anything either, so we can get away placing it at
that address? It seems like there *ought* to be something like that. I
just don't (yet) know what it is...

Jason

[PATCH v3] hw/i386: place setup_data at fixed place in memory

2022-08-04 Thread Jason A. Donenfeld

The boot parameter header refers to setup_data at an absolute address,
and each setup_data refers to the next setup_data at an absolute address
too. Currently QEMU simply puts the setup_datas right after the kernel
image, and since the kernel_image is loaded at prot_addr -- a fixed
address knowable to QEMU apriori -- the setup_data absolute address
winds up being just `prot_addr + a_fixed_offset_into_kernel_image`.

This mostly works fine, so long as the kernel image really is loaded at
prot_addr. However, OVMF doesn't load the kernel at prot_addr, and
generally EFI doesn't give a good way of predicting where it's going to
load the kernel. So when it loads it at some address != prot_addr, the
absolute addresses in setup_data now point somewhere bogus, causing
crashes when EFI stub tries to follow the next link.

Fix this by placing setup_data at some fixed place in memory, not as
part of the kernel image, and then pointing the setup_data absolute
address to that fixed place in memory. This way, even if OVMF or other
chains relocate the kernel image, the boot parameter still points to the
correct absolute address.

For this, an unused part of the hardware mapped area is used, which
isn't used by anything else.

Fixes: 3cbeb52467 ("hw/i386: add device tree support")
Reported-by: Xiaoyao Li 
Cc: Paolo Bonzini 
Cc: Richard Henderson 
Cc: Peter Maydell 
Cc: Michael S. Tsirkin 
Cc: Daniel P. Berrangé 
Cc: Gerd Hoffmann 
Cc: Ard Biesheuvel 
Cc: Laszlo Ersek 
Cc: linux-...@vger.kernel.org
Signed-off-by: Jason A. Donenfeld 
---
 hw/i386/x86.c | 39 +--
 1 file changed, 21 insertions(+), 18 deletions(-)

diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index 050eedc0c8..3affef3277 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -773,10 +773,10 @@ void x86_load_linux(X86MachineState *x86ms,
 bool linuxboot_dma_enabled = 
X86_MACHINE_GET_CLASS(x86ms)->fwcfg_dma_enabled;
 uint16_t protocol;
 int setup_size, kernel_size, cmdline_size;
-int dtb_size, setup_data_offset;
+int dtb_size, setup_data_item_len, setup_data_total_len = 0;
 uint32_t initrd_max;
-uint8_t header[8192], *setup, *kernel;
-hwaddr real_addr, prot_addr, cmdline_addr, initrd_addr = 0, 
first_setup_data = 0;
+uint8_t header[8192], *setup, *kernel, *setup_datas = NULL;
+hwaddr real_addr, prot_addr, cmdline_addr, initrd_addr = 0, 
first_setup_data = 0, setup_data_base;
 FILE *f;
 char *vmode;
 MachineState *machine = MACHINE(x86ms);
@@ -899,6 +899,8 @@ void x86_load_linux(X86MachineState *x86ms,
 cmdline_addr = 0x2;
 prot_addr= 0x10;
 }
+/* Nothing else uses this part of the hardware mapped region */
+setup_data_base = 0xf - 0x1000;
 
 /* highest address for loading the initrd */
 if (protocol >= 0x20c &&
@@ -1062,34 +1064,35 @@ void x86_load_linux(X86MachineState *x86ms,
 exit(1);
 }
 
-setup_data_offset = QEMU_ALIGN_UP(kernel_size, 16);
-kernel_size = setup_data_offset + sizeof(struct setup_data) + dtb_size;
-kernel = g_realloc(kernel, kernel_size);
-
-
-setup_data = (struct setup_data *)(kernel + setup_data_offset);
+setup_data_item_len = sizeof(struct setup_data) + dtb_size;
+setup_datas = g_realloc(setup_datas, setup_data_total_len + 
setup_data_item_len);
+setup_data = (struct setup_data *)(setup_datas + setup_data_total_len);
 setup_data->next = cpu_to_le64(first_setup_data);
-first_setup_data = prot_addr + setup_data_offset;
+first_setup_data = setup_data_base + setup_data_total_len;
+setup_data_total_len += setup_data_item_len;
 setup_data->type = cpu_to_le32(SETUP_DTB);
 setup_data->len = cpu_to_le32(dtb_size);
-
 load_image_size(dtb_filename, setup_data->data, dtb_size);
 }
 
 if (!legacy_no_rng_seed) {
-setup_data_offset = QEMU_ALIGN_UP(kernel_size, 16);
-kernel_size = setup_data_offset + sizeof(struct setup_data) + 
RNG_SEED_LENGTH;
-kernel = g_realloc(kernel, kernel_size);
-setup_data = (struct setup_data *)(kernel + setup_data_offset);
+setup_data_item_len = sizeof(struct setup_data) + RNG_SEED_LENGTH;
+setup_datas = g_realloc(setup_datas, setup_data_total_len + 
setup_data_item_len);
+setup_data = (struct setup_data *)(setup_datas + setup_data_total_len);
 setup_data->next = cpu_to_le64(first_setup_data);
-first_setup_data = prot_addr + setup_data_offset;
+first_setup_data = setup_data_base + setup_data_total_len;
+setup_data_total_len += setup_data_item_len;
 setup_data->type = cpu_to_le32(SETUP_RNG_SEED);
 setup_data->len = cpu_to_le32(RNG_SEED_LENGTH);
 qemu_guest_getrandom_nofail(setup_data->data, RNG_SEED_LENGTH);
 }
 
-/* Offset 0x250 is a pointer to the first setup

Re: [PATCH v3] hw/i386: place setup_data at fixed place in memory

2022-08-05 Thread Jason A. Donenfeld

Hi Paolo,

On Fri, Aug 05, 2022 at 10:10:02AM +0200, Paolo Bonzini wrote:
> On 8/5/22 01:04, Jason A. Donenfeld wrote:
> > +/* Nothing else uses this part of the hardware mapped region */
> > +setup_data_base = 0xf - 0x1000;
> 
> Isn't this where the BIOS lives?  I don't think this works.

That's the segment dedicated to ROM and hardware mapped addresses. So
that's a place to put ROM material. No actual software will use it.

Jason

Re: [PATCH v6 2/2] target/s390x: support SHA-512 extensions

2022-08-05 Thread Jason A. Donenfeld

Hi David,

On Fri, Aug 05, 2022 at 01:28:18PM +0200, David Hildenbrand wrote:
> On 03.08.22 19:15, Jason A. Donenfeld wrote:
> > In order to fully support MSA_EXT_5, we have to also support the SHA-512
> > special instructions. So implement those.
> > 
> > The implementation began as something TweetNacl-like, and then was
> > adjusted to be useful here. It's not very beautiful, but it is quite
> > short and compact, which is what we're going for.
> > 
> 
> NIT: we could think about reversing the order of patches. IIRC, patch #1
> itself would trigger a warning when starting QEMU. Having this patch
> first make sense logically.

Good idea. Will do.

> > +static int kimd_sha512(CPUS390XState *env, uintptr_t ra, uint64_t 
> > parameter_block,
> > +   uint64_t *message_reg, uint64_t *len_reg, uint8_t 
> > *stack_buffer)
> > +{
> > +enum { MAX_BLOCKS_PER_RUN = 64 }; /* This is arbitrary, just to keep 
> > interactivity. */
> 
> I'd just use a #define outside of the function for that.

Why? What does leaking this into file-level scope do?

> 
> > +uint64_t z[8], b[8], a[8], w[16], t;
> > +uint64_t message = message_reg ? *message_reg : 0, len = *len_reg, 
> > processed = 0;
> > +int i, j, reg_len = 64, blocks = 0, cc = 0;
> > +
> > +if (!(env->psw.mask & PSW_MASK_64)) {
> > +len = (uint32_t)len;
> > +reg_len = (env->psw.mask & PSW_MASK_32) ? 32 : 24;
> > +}
> 
> 
> I'd call that message_reg_len. (same in other function)

Will do.

> 
> 
> > +
> > +for (i = 0; i < 8; ++i) {
> > +z[i] = a[i] = cpu_ldq_be_data_ra(env, wrap_address(env, 
> > parameter_block + 8 * i), ra);
> 
> I assume if we get any exception here, we simply didn't make any progress.
> 
> > +}
> > +
> > +while (len >= 128) {
> > +if (++blocks > MAX_BLOCKS_PER_RUN) {
> > +cc = 3;
> > +break;
> > +}
> > +
> > +for (i = 0; i < 16; ++i) {
> > +if (message) {
> > +w[i] = cpu_ldq_be_data_ra(env, wrap_address(env, message + 
> > 8 * i), ra);
> 
> dito

Right, there's no progress, because it's only ever incremented at the
end. And, more importantly, we only ever update the parameter_block
after having done things successfully.

> > +for (i = 0; i < 8; ++i) {
> > +cpu_stq_be_data_ra(env, wrap_address(env, parameter_block + 8 * 
> > i), z[i], ra);
> 
> I wonder what happens if we get an exception somewhere in the middle
> here ... fortunately we can only involve 2 pages.

If this fails, then message_reg and len_reg won't be updated, so it will
have to start over. If it fails part way through, though, then things
are inconsistent. I don't think we want to hassle with trying to restore
the previous state or something insane though. That seems a bit much.

> > +cc = kimd_sha512(env, ra, parameter_block, message_reg, len_reg, NULL);
> > +if (cc) {
> > +return cc;
> > +}
> 
> Doesn't kimd_sha512() update the length register? And if we return with
> cc=3, we'd be in trouble, no?

cc=3 means partial completion. In that case, klmd also returns with a
partial completion. That's good and expected! It means that the next
time it's called, it'll keep going where it left off.

I've actually tried this with the Linux implementation, and it works as
expected.

> One idea could be to simply only process one block at a time. Read all
> inputs first for that block and handle it completely without any
> register modifications. Perform all memory writes in a single call.

That *is* what already happens. Actually, the memory writes only ever
happen at the very end of kimd_sha512.

> Further, I wonder if we should factor out the core of kimd_sha512() to
> only work on temp buffers without any loading/storing of memory, and let
> only kimd_sha512/klmd_sha512 perform all loading/storing. Then it's much
> cleaner who modifies what.

That's not necessary and will complicate things ultimately. See the
above; this is already working as expected.

Jason

Re: [PATCH v3] hw/i386: place setup_data at fixed place in memory

2022-08-09 Thread Jason A. Donenfeld

Hey Paolo,

On Fri, Aug 05, 2022 at 02:47:27PM +0200, Jason A. Donenfeld wrote:
> Hi Paolo,
> 
> On Fri, Aug 05, 2022 at 10:10:02AM +0200, Paolo Bonzini wrote:
> > On 8/5/22 01:04, Jason A. Donenfeld wrote:
> > > +/* Nothing else uses this part of the hardware mapped region */
> > > +setup_data_base = 0xf - 0x1000;
> > 
> > Isn't this where the BIOS lives?  I don't think this works.
> 
> That's the segment dedicated to ROM and hardware mapped addresses. So
> that's a place to put ROM material. No actual software will use it.
> 
> Jason

Unless I've misread the thread, I don't think there are any remaining
objections, right? Can we try merging this and seeing if it fixes the
issue for good?

Jason

[PATCH v7 1/2] target/s390x: support SHA-512 extensions

2022-08-09 Thread Jason A. Donenfeld

In order to fully support MSA_EXT_5, we have to support the SHA-512
special instructions. So implement those.

The implementation began as something TweetNacl-like, and then was
adjusted to be useful here. It's not very beautiful, but it is quite
short and compact, which is what we're going for.

Cc: Thomas Huth 
Cc: David Hildenbrand 
Cc: Christian Borntraeger 
Cc: Richard Henderson 
Cc: Cornelia Huck 
Cc: Harald Freudenberger 
Cc: Holger Dengler 
Signed-off-by: Jason A. Donenfeld 
---
 target/s390x/gen-features.c  |   3 +
 target/s390x/tcg/crypto_helper.c | 157 +++
 2 files changed, 160 insertions(+)

diff --git a/target/s390x/gen-features.c b/target/s390x/gen-features.c
index ad140184b9..85ab69d04e 100644
--- a/target/s390x/gen-features.c
+++ b/target/s390x/gen-features.c
@@ -749,6 +749,9 @@ static uint16_t qemu_V7_0[] = {
  */
 static uint16_t qemu_MAX[] = {
 S390_FEAT_VECTOR_ENH2,
+S390_FEAT_MSA_EXT_5,
+S390_FEAT_KIMD_SHA_512,
+S390_FEAT_KLMD_SHA_512,
 };
 
 /** END FEATURE DEFS **/
diff --git a/target/s390x/tcg/crypto_helper.c b/target/s390x/tcg/crypto_helper.c
index 138d9e7ad9..4d45de8faa 100644
--- a/target/s390x/tcg/crypto_helper.c
+++ b/target/s390x/tcg/crypto_helper.c
@@ -1,10 +1,12 @@
 /*
  *  s390x crypto helpers
  *
+ *  Copyright (C) 2022 Jason A. Donenfeld . All Rights 
Reserved.
  *  Copyright (c) 2017 Red Hat Inc
  *
  *  Authors:
  *   David Hildenbrand 
+ *   Jason A. Donenfeld 
  *
  * This work is licensed under the terms of the GNU GPL, version 2 or later.
  * See the COPYING file in the top-level directory.
@@ -18,6 +20,153 @@
 #include "exec/exec-all.h"
 #include "exec/cpu_ldst.h"
 
+static uint64_t R(uint64_t x, int c) { return (x >> c) | (x << (64 - c)); }
+static uint64_t Ch(uint64_t x, uint64_t y, uint64_t z) { return (x & y) ^ (~x 
& z); }
+static uint64_t Maj(uint64_t x, uint64_t y, uint64_t z) { return (x & y) ^ (x 
& z) ^ (y & z); }
+static uint64_t Sigma0(uint64_t x) { return R(x, 28) ^ R(x, 34) ^ R(x, 39); }
+static uint64_t Sigma1(uint64_t x) { return R(x, 14) ^ R(x, 18) ^ R(x, 41); }
+static uint64_t sigma0(uint64_t x) { return R(x, 1) ^ R(x, 8) ^ (x >> 7); }
+static uint64_t sigma1(uint64_t x) { return R(x, 19) ^ R(x, 61) ^ (x >> 6); }
+
+static const uint64_t K[80] = {
+0x428a2f98d728ae22ULL, 0x7137449123ef65cdULL, 0xb5c0fbcfec4d3b2fULL,
+0xe9b5dba58189dbbcULL, 0x3956c25bf348b538ULL, 0x59f111f1b605d019ULL,
+0x923f82a4af194f9bULL, 0xab1c5ed5da6d8118ULL, 0xd807aa98a3030242ULL,
+0x12835b0145706fbeULL, 0x243185be4ee4b28cULL, 0x550c7dc3d5ffb4e2ULL,
+0x72be5d74f27b896fULL, 0x80deb1fe3b1696b1ULL, 0x9bdc06a725c71235ULL,
+0xc19bf174cf692694ULL, 0xe49b69c19ef14ad2ULL, 0xefbe4786384f25e3ULL,
+0x0fc19dc68b8cd5b5ULL, 0x240ca1cc77ac9c65ULL, 0x2de92c6f592b0275ULL,
+0x4a7484aa6ea6e483ULL, 0x5cb0a9dcbd41fbd4ULL, 0x76f988da831153b5ULL,
+0x983e5152ee66dfabULL, 0xa831c66d2db43210ULL, 0xb00327c898fb213fULL,
+0xbf597fc7beef0ee4ULL, 0xc6e00bf33da88fc2ULL, 0xd5a79147930aa725ULL,
+0x06ca6351e003826fULL, 0x142929670a0e6e70ULL, 0x27b70a8546d22ffcULL,
+0x2e1b21385c26c926ULL, 0x4d2c6dfc5ac42aedULL, 0x53380d139d95b3dfULL,
+0x650a73548baf63deULL, 0x766a0abb3c77b2a8ULL, 0x81c2c92e47edaee6ULL,
+0x92722c851482353bULL, 0xa2bfe8a14cf10364ULL, 0xa81a664bbc423001ULL,
+0xc24b8b70d0f89791ULL, 0xc76c51a30654be30ULL, 0xd192e819d6ef5218ULL,
+0xd69906245565a910ULL, 0xf40e35855771202aULL, 0x106aa07032bbd1b8ULL,
+0x19a4c116b8d2d0c8ULL, 0x1e376c085141ab53ULL, 0x2748774cdf8eeb99ULL,
+0x34b0bcb5e19b48a8ULL, 0x391c0cb3c5c95a63ULL, 0x4ed8aa4ae3418acbULL,
+0x5b9cca4f7763e373ULL, 0x682e6ff3d6b2b8a3ULL, 0x748f82ee5defb2fcULL,
+0x78a5636f43172f60ULL, 0x84c87814a1f0ab72ULL, 0x8cc702081a6439ecULL,
+0x90befffa23631e28ULL, 0xa4506cebde82bde9ULL, 0xbef9a3f7b2c67915ULL,
+0xc67178f2e372532bULL, 0xca273eceea26619cULL, 0xd186b8c721c0c207ULL,
+0xeada7dd6cde0eb1eULL, 0xf57d4f7fee6ed178ULL, 0x06f067aa72176fbaULL,
+0x0a637dc5a2c898a6ULL, 0x113f9804bef90daeULL, 0x1b710b35131c471bULL,
+0x28db77f523047d84ULL, 0x32caab7b40c72493ULL, 0x3c9ebe0a15c9bebcULL,
+0x431d67c49c100d4cULL, 0x4cc5d4becb3e42b6ULL, 0x597f299cfc657e2aULL,
+0x5fcb6fab3ad6faecULL, 0x6c44198c4a475817ULL
+};
+
+static int kimd_sha512(CPUS390XState *env, uintptr_t ra, uint64_t 
parameter_block,
+   uint64_t *message_reg, uint64_t *len_reg, uint8_t 
*stack_buffer)
+{
+enum { MAX_BLOCKS_PER_RUN = 64 }; /* This is arbitrary, just to keep 
interactivity. */
+uint64_t z[8], b[8], a[8], w[16], t;
+uint64_t message = message_reg ? *message_reg : 0, len = *len_reg, 
processed = 0;
+int i, j, message_reg_len = 64, blocks = 0, cc = 0;
+
+if (!(env->psw.mask & PSW_MASK_64)) {
+len = (uint32_t)len;
+message_reg_len = (env->psw.mask & PSW_MASK_32) ? 32 : 24;
+}
+
+f

[PATCH v7 2/2] target/s390x: support PRNO_TRNG instruction

2022-08-09 Thread Jason A. Donenfeld

In order for hosts running inside of TCG to initialize the kernel's
random number generator, we should support the PRNO_TRNG instruction,
backed in the usual way with the qemu_guest_getrandom helper. This is
confirmed working on Linux 5.19.

Cc: Thomas Huth 
Cc: David Hildenbrand 
Cc: Christian Borntraeger 
Cc: Richard Henderson 
Cc: Cornelia Huck 
Cc: Harald Freudenberger 
Cc: Holger Dengler 
Signed-off-by: Jason A. Donenfeld 
---
 target/s390x/gen-features.c  |  1 +
 target/s390x/tcg/crypto_helper.c | 30 ++
 2 files changed, 31 insertions(+)

diff --git a/target/s390x/gen-features.c b/target/s390x/gen-features.c
index 85ab69d04e..423ae44315 100644
--- a/target/s390x/gen-features.c
+++ b/target/s390x/gen-features.c
@@ -752,6 +752,7 @@ static uint16_t qemu_MAX[] = {
 S390_FEAT_MSA_EXT_5,
 S390_FEAT_KIMD_SHA_512,
 S390_FEAT_KLMD_SHA_512,
+S390_FEAT_PRNO_TRNG,
 };
 
 /** END FEATURE DEFS **/
diff --git a/target/s390x/tcg/crypto_helper.c b/target/s390x/tcg/crypto_helper.c
index 4d45de8faa..e155ae1f54 100644
--- a/target/s390x/tcg/crypto_helper.c
+++ b/target/s390x/tcg/crypto_helper.c
@@ -14,6 +14,7 @@
 
 #include "qemu/osdep.h"
 #include "qemu/main-loop.h"
+#include "qemu/guest-random.h"
 #include "s390x-internal.h"
 #include "tcg_s390x.h"
 #include "exec/helper-proto.h"
@@ -167,6 +168,31 @@ static int klmd_sha512(CPUS390XState *env, uintptr_t ra, 
uint64_t parameter_bloc
 return 0;
 }
 
+static void fill_buf_random(CPUS390XState *env, uintptr_t ra,
+uint64_t *buf_reg, uint64_t *len_reg)
+{
+uint8_t tmp[256];
+uint64_t len = *len_reg;
+int message_reg_len = 64;
+
+if (!(env->psw.mask & PSW_MASK_64)) {
+len = (uint32_t)len;
+message_reg_len = (env->psw.mask & PSW_MASK_32) ? 32 : 24;
+}
+
+while (len) {
+size_t block = MIN(len, sizeof(tmp));
+
+qemu_guest_getrandom_nofail(tmp, block);
+for (size_t i = 0; i < block; ++i) {
+cpu_stb_data_ra(env, wrap_address(env, *buf_reg), tmp[i], ra);
+*buf_reg = deposit64(*buf_reg, 0, message_reg_len, *buf_reg + 1);
+--*len_reg;
+}
+len -= block;
+}
+}
+
 uint32_t HELPER(msa)(CPUS390XState *env, uint32_t r1, uint32_t r2, uint32_t r3,
  uint32_t type)
 {
@@ -209,6 +235,10 @@ uint32_t HELPER(msa)(CPUS390XState *env, uint32_t r1, 
uint32_t r2, uint32_t r3,
 return klmd_sha512(env, ra, env->regs[1], &env->regs[r2], 
&env->regs[r2 + 1]);
 }
 break;
+case 114: /* CPACF_PRNO_TRNG */
+fill_buf_random(env, ra, &env->regs[r1], &env->regs[r1 + 1]);
+fill_buf_random(env, ra, &env->regs[r2], &env->regs[r2 + 1]);
+break;
 default:
 /* we don't implement any other subfunction yet */
 g_assert_not_reached();
-- 
2.35.1

Re: [PATCH qemu v3] x86: don't let decompressed kernel image clobber setup_data

2023-01-10 Thread Jason A. Donenfeld

Hi Michael,

Could you queue up this patch and mark it as a fix for 7.2.1? It is a
straight-up bug fix for a 7.2 regression that's now affected several
users.

- It has two Tested-by tags on the thread.
- hpa, the maintainer of the kernel side of this, confirmed on one of
  the various tributary threads that this approach is a correct one.
- It doesn't introduce any new functionality.

For your convenience, you can grab this out of lore here:

  https://lore.kernel.org/lkml/20221230220725.618763-1-ja...@zx2c4.com/

Or if you want to yolo it:

  curl https://lore.kernel.org/lkml/20221230220725.618763-1-ja...@zx2c4.com/raw 
| git am -s

It's now sat silent on the mailing list for a while. So let's please get
this committed and backported so that the bug reports stop coming in.

Thanks,
Jason

Re: [RFC PATCH 0/1] Implement entropy leak reporting for virtio-rng

2023-04-03 Thread Jason A. Donenfeld

Hi Babis,

Why are you resending this? As I mentioned before, I'm going to move
forward in implementing this feature in a way that actually works with
the RNG. I'll use your RFC patch as a base, but I think beyond that, I
can take it from here.

Thanks,
Jason

Re: [RFC PATCH 0/1] Implement entropy leak reporting for virtio-rng

2023-04-03 Thread Jason A. Donenfeld

On Mon, Apr 3, 2023 at 4:15 PM Jason A. Donenfeld  wrote:
>
> Hi Babis,
>
> Why are you resending this? As I mentioned before, I'm going to move
> forward in implementing this feature in a way that actually works with
> the RNG. I'll use your RFC patch as a base, but I think beyond that, I
> can take it from here.

Grrr, sorry! This is for QEMU! I understand.

(Kernel ends from me are forthcoming.)

Jason

Re: [RFC PATCH 0/1] Implement entropy leak reporting for virtio-rng

2023-04-11 Thread Jason A. Donenfeld

On Tue, Apr 11, 2023 at 6:19 PM Amit Shah  wrote:
>
> Hey Babis,
>
> On Mon, 2023-04-03 at 12:52 +0200, Babis Chalios wrote:
> > This patchset implements the entropy leak reporting feature proposal [1]
> > for virtio-rng devices.
> >
> > Entropy leaking (as defined in the specification proposal) typically
> > happens when we take a snapshot of a VM or while we resume a VM from a
> > snapshot. In these cases, we want to let the guest know so that it can
> > reset state that needs to be uniqueue, for example.
> >
> > This feature is offering functionality similar to what VMGENID does.
> > However, it allows to build mechanisms on the guest side to notify
> > user-space applications, like VMGENID for userspace and additionally for
> > kernel.
> >
> > The new specification describes two request types that the guest might
> > place in the queues for the device to perform, a fill-on-leak request
> > where the device needs to fill with random bytes a buffer and a
> > copy-on-leak request where the device needs to perform a copy between
> > two guest-provided buffers. We currently trigger the handling of guest
> > requests when saving the VM state and when loading a VM from a snapshot
> > file.
> >
> > This is an RFC, since the corresponding specification changes have not
> > yet been merged. It also aims to allow testing a respective patch-set
> > implementing the feature in the Linux front-end driver[2].
> >
> > However, I would like to ask the community's opinion regarding the
> > handling of the fill-on-leak requests. Essentially, these requests are
> > very similar to the normal virtio-rng entropy requests, with the catch
> > that we should complete these requests before resuming the VM, so that
> > we avoid race-conditions in notifying the guest about entropy leak
> > events. This means that we cannot rely on the RngBackend's API, which is
> > asynchronous. At the moment, I have handled that using getrandom(), but
> > I would like a solution which doesn't work only with (relatively new)
> > Linux hosts. I am inclined to solve that by extending the RngBackend API
> > with a synchronous call to request for random bytes and I'd like to hear
> > opinion's on this approach.
>
> The patch looks OK - I suggest you add a new sync call that also probes
> for the availability of getrandom().

qemu_guest_getrandom_nofail?

Re: [PATCH qemu v3] x86: don't let decompressed kernel image clobber setup_data

2023-01-23 Thread Jason A. Donenfeld

On Mon, Jan 23, 2023 at 6:12 AM Michael S. Tsirkin  wrote:
>
> On Sun, Jan 22, 2023 at 08:21:30PM -0800, Eric Biggers wrote:
> > Hi Michael,
> >
> > On Tue, Jan 10, 2023 at 12:50:42PM -0500, Michael S. Tsirkin wrote:
> > > On Tue, Jan 10, 2023 at 04:34:49PM +0100, Jason A. Donenfeld wrote:
> > > > Hi Michael,
> > > >
> > > > Could you queue up this patch and mark it as a fix for 7.2.1? It is a
> > > > straight-up bug fix for a 7.2 regression that's now affected several
> > > > users.
> > >
> > > OK. In the future pls cc me if you want me to merge a patch. Thanks!
> > >
> > > > - It has two Tested-by tags on the thread.
> > > > - hpa, the maintainer of the kernel side of this, confirmed on one of
> > > >   the various tributary threads that this approach is a correct one.
> > > > - It doesn't introduce any new functionality.
> > > >
> > > > For your convenience, you can grab this out of lore here:
> > > >
> > > >   https://lore.kernel.org/lkml/20221230220725.618763-1-ja...@zx2c4.com/
> > > >
> > > > Or if you want to yolo it:
> > > >
> > > >   curl 
> > > > https://lore.kernel.org/lkml/20221230220725.618763-1-ja...@zx2c4.com/raw
> > > >  | git am -s
> > > >
> > > > It's now sat silent on the mailing list for a while. So let's please get
> > > > this committed and backported so that the bug reports stop coming in.
> > > >
> >
> > This patch still isn't on QEMU's master branch.  What happened to it?
> >
> > - Eric
>
> Indeed though I remember picking it up. Tagged again now. Thanks!

Thanks. What branch is this in? I didn't see it on:
https://gitlab.com/mstredhat/qemu/-/branches/active
https://github.com/mstsirkin/qemu/branches

Re: [PULL 10/56] x86: don't let decompressed kernel image clobber setup_data

2023-01-31 Thread Jason A. Donenfeld

On Mon, Jan 30, 2023 at 03:19:59PM -0500, Michael S. Tsirkin wrote:
> From: "Jason A. Donenfeld" 
> 
> The setup_data links are appended to the compressed kernel image. Since
> the kernel image is typically loaded at 0x10, setup_data lives at
> `0x10 + compressed_size`, which does not get relocated during the
> kernel's boot process.
> 
> The kernel typically decompresses the image starting at address
> 0x100 (note: there's one more zero there than the compressed image
> above). This usually is fine for most kernels.
> 
> However, if the compressed image is actually quite large, then
> setup_data will live at a `0x10 + compressed_size` that extends into
> the decompressed zone at 0x100. In other words, if compressed_size
> is larger than `0x100 - 0x10`, then the decompression step will
> clobber setup_data, resulting in crashes.
> 
> Visually, what happens now is that QEMU appends setup_data to the kernel
> image:
> 
>   kernel imagesetup_data
>|--|||
> 0x10  0x10+l1 0x10+l1+l2
> 
> The problem is that this decompresses to 0x100 (one more zero). So
> if l1 is > (0x100-0x10), then this winds up looking like:
> 
>   kernel imagesetup_data
>|--|||
> 0x10  0x10+l1 0x10+l1+l2
> 
>  d e c o m p r e s s e d   k e r n e l
>  
> |-|
> 0x100 
> 0x100+l3
> 
> The decompressed kernel seemingly overwriting the compressed kernel
> image isn't a problem, because that gets relocated to a higher address
> early on in the boot process, at the end of startup_64. setup_data,
> however, stays in the same place, since those links are self referential
> and nothing fixes them up.  So the decompressed kernel clobbers it.
> 
> Fix this by appending setup_data to the cmdline blob rather than the
> kernel image blob, which remains at a lower address that won't get
> clobbered.
> 
> This could have been done by overwriting the initrd blob instead, but
> that poses big difficulties, such as no longer being able to use memory
> mapped files for initrd, hurting performance, and, more importantly, the
> initrd address calculation is hard coded in qboot, and it always grows
> down rather than up, which means lots of brittle semantics would have to
> be changed around, incurring more complexity. In contrast, using cmdline
> is simple and doesn't interfere with anything.
> 
> The microvm machine has a gross hack where it fiddles with fw_cfg data
> after the fact. So this hack is updated to account for this appending,
> by reserving some bytes.
> 
> Fixup-by: Michael S. Tsirkin 
> Cc: x...@kernel.org
> Cc: Philippe Mathieu-Daudé 
> Cc: H. Peter Anvin 
> Cc: Borislav Petkov 
> Cc: Eric Biggers 
> Signed-off-by: Jason A. Donenfeld 
> Message-Id: <20221230220725.618763-1-ja...@zx2c4.com>
> Message-ID: <20230128061015-mutt-send-email-...@kernel.org>
> Reviewed-by: Michael S. Tsirkin 
> Signed-off-by: Michael S. Tsirkin 
> Tested-by: Eric Biggers 
> Tested-by: Mathias Krause 

This one should wind up in the stable point release too. Dunno what the
procedure for that is.

Jason

Re: [PULL 10/56] x86: don't let decompressed kernel image clobber setup_data

2023-01-31 Thread Jason A. Donenfeld

On Tue, Jan 31, 2023, 15:55 H. Peter Anvin  wrote:

> On January 30, 2023 12:19:14 PM PST, "Michael S. Tsirkin" 
> wrote:
> >From: "Jason A. Donenfeld" 
> >
> >The setup_data links are appended to the compressed kernel image. Since
> >the kernel image is typically loaded at 0x10, setup_data lives at
> >`0x10 + compressed_size`, which does not get relocated during the
> >kernel's boot process.
> >
> >The kernel typically decompresses the image starting at address
> >0x100 (note: there's one more zero there than the compressed image
> >above). This usually is fine for most kernels.
> >
> >However, if the compressed image is actually quite large, then
> >setup_data will live at a `0x10 + compressed_size` that extends into
> >the decompressed zone at 0x100. In other words, if compressed_size
> >is larger than `0x100 - 0x10`, then the decompression step will
> >clobber setup_data, resulting in crashes.
> >
> >Visually, what happens now is that QEMU appends setup_data to the kernel
> >image:
> >
> >  kernel imagesetup_data
> >   |--|||
> >0x10  0x10+l1 0x10+l1+l2
> >
> >The problem is that this decompresses to 0x100 (one more zero). So
> >if l1 is > (0x100-0x10), then this winds up looking like:
> >
> >  kernel imagesetup_data
> >   |--|||
> >0x10  0x10+l1 0x10+l1+l2
> >
> > d e c o m p r e s s e d   k e r n e l
> >
>  |-|
> >0x100
>  0x100+l3
> >
> >The decompressed kernel seemingly overwriting the compressed kernel
> >image isn't a problem, because that gets relocated to a higher address
> >early on in the boot process, at the end of startup_64. setup_data,
> >however, stays in the same place, since those links are self referential
> >and nothing fixes them up.  So the decompressed kernel clobbers it.
> >
> >Fix this by appending setup_data to the cmdline blob rather than the
> >kernel image blob, which remains at a lower address that won't get
> >clobbered.
> >
> >This could have been done by overwriting the initrd blob instead, but
> >that poses big difficulties, such as no longer being able to use memory
> >mapped files for initrd, hurting performance, and, more importantly, the
> >initrd address calculation is hard coded in qboot, and it always grows
> >down rather than up, which means lots of brittle semantics would have to
> >be changed around, incurring more complexity. In contrast, using cmdline
> >is simple and doesn't interfere with anything.
> >
> >The microvm machine has a gross hack where it fiddles with fw_cfg data
> >after the fact. So this hack is updated to account for this appending,
> >by reserving some bytes.
> >
> >Fixup-by: Michael S. Tsirkin 
> >Cc: x...@kernel.org
> >Cc: Philippe Mathieu-Daudé 
> >Cc: H. Peter Anvin 
> >Cc: Borislav Petkov 
> >Cc: Eric Biggers 
> >Signed-off-by: Jason A. Donenfeld 
> >Message-Id: <20221230220725.618763-1-ja...@zx2c4.com>
> >Message-ID: <20230128061015-mutt-send-email-...@kernel.org>
> >Reviewed-by: Michael S. Tsirkin 
> >Signed-off-by: Michael S. Tsirkin 
> >Tested-by: Eric Biggers 
> >Tested-by: Mathias Krause 
> >---
> > include/hw/i386/microvm.h |  5 ++--
> > include/hw/nvram/fw_cfg.h |  9 +++
> > hw/i386/microvm.c | 15 +++
> > hw/i386/x86.c | 52 +--
> > hw/nvram/fw_cfg.c |  9 +++
> > 5 files changed, 59 insertions(+), 31 deletions(-)
> >
> >diff --git a/include/hw/i386/microvm.h b/include/hw/i386/microvm.h
> >index fad97a891d..e8af61f194 100644
> >--- a/include/hw/i386/microvm.h
> >+++ b/include/hw/i386/microvm.h
> >@@ -50,8 +50,9 @@
> >  */
> >
> > /* Platform virtio definitions */
> >-#define VIRTIO_MMIO_BASE  0xfeb0
> >-#define VIRTIO_CMDLINE_MAXLEN 64
> >+#define VIRTIO_MMIO_BASE0xfeb0
> >+#define VIRTIO_CMDLINE_MAXLEN   64
> >+#define VIRTIO_CMDLINE_TOTAL_MAX_LEN((VIRTIO_CMDLINE_MAXLEN + 1) *
> 16)
> >
> > #define GED_MMIO_BASE 0xfea0
> > #define GED_MMIO_BASE_MEMHP   (GED_MMIO_BASE + 0x100)
> >diff --git a/include/hw/nvram/fw_cfg.h b/include

Re: [PULL 10/56] x86: don't let decompressed kernel image clobber setup_data

2023-01-31 Thread Jason A. Donenfeld

On Mon, Jan 30, 2023 at 03:19:59PM -0500, Michael S. Tsirkin wrote:
> From: "Jason A. Donenfeld" 
> 
> The setup_data links are appended to the compressed kernel image. Since
> the kernel image is typically loaded at 0x10, setup_data lives at
> `0x10 + compressed_size`, which does not get relocated during the
> kernel's boot process.
> 
> The kernel typically decompresses the image starting at address
> 0x100 (note: there's one more zero there than the compressed image
> above). This usually is fine for most kernels.
> 
> However, if the compressed image is actually quite large, then
> setup_data will live at a `0x10 + compressed_size` that extends into
> the decompressed zone at 0x100. In other words, if compressed_size
> is larger than `0x100 - 0x10`, then the decompression step will
> clobber setup_data, resulting in crashes.
> 
> Visually, what happens now is that QEMU appends setup_data to the kernel
> image:
> 
>   kernel imagesetup_data
>|--|||
> 0x10  0x10+l1 0x10+l1+l2
> 
> The problem is that this decompresses to 0x100 (one more zero). So
> if l1 is > (0x100-0x10), then this winds up looking like:
> 
>   kernel imagesetup_data
>|--|||
> 0x10  0x10+l1 0x10+l1+l2
> 
>  d e c o m p r e s s e d   k e r n e l
>  
> |-|
> 0x100 
> 0x100+l3
> 
> The decompressed kernel seemingly overwriting the compressed kernel
> image isn't a problem, because that gets relocated to a higher address
> early on in the boot process, at the end of startup_64. setup_data,
> however, stays in the same place, since those links are self referential
> and nothing fixes them up.  So the decompressed kernel clobbers it.
> 
> Fix this by appending setup_data to the cmdline blob rather than the
> kernel image blob, which remains at a lower address that won't get
> clobbered.
> 
> This could have been done by overwriting the initrd blob instead, but
> that poses big difficulties, such as no longer being able to use memory
> mapped files for initrd, hurting performance, and, more importantly, the
> initrd address calculation is hard coded in qboot, and it always grows
> down rather than up, which means lots of brittle semantics would have to
> be changed around, incurring more complexity. In contrast, using cmdline
> is simple and doesn't interfere with anything.
> 
> The microvm machine has a gross hack where it fiddles with fw_cfg data
> after the fact. So this hack is updated to account for this appending,
> by reserving some bytes.
> 
> Fixup-by: Michael S. Tsirkin 
> Cc: x...@kernel.org
> Cc: Philippe Mathieu-Daudé 
> Cc: H. Peter Anvin 
> Cc: Borislav Petkov 
> Cc: Eric Biggers 
> Signed-off-by: Jason A. Donenfeld 
> Message-Id: <20221230220725.618763-1-ja...@zx2c4.com>
> Message-ID: <20230128061015-mutt-send-email-...@kernel.org>
> Reviewed-by: Michael S. Tsirkin 
> Signed-off-by: Michael S. Tsirkin 
> Tested-by: Eric Biggers 
> Tested-by: Mathias Krause 
> ---
>  include/hw/i386/microvm.h |  5 ++--
>  include/hw/nvram/fw_cfg.h |  9 +++
>  hw/i386/microvm.c | 15 +++
>  hw/i386/x86.c | 52 +--
>  hw/nvram/fw_cfg.c |  9 +++
>  5 files changed, 59 insertions(+), 31 deletions(-)

Cc: qemu-sta...@nongnu.org
Fixes: 67f7e426 ("hw/i386: pass RNG seed via setup_data entry")

Re: [PATCH] x86: fix q35 kernel measurements broken due to rng seeding

2023-02-01 Thread Jason A. Donenfeld

This is already fixed via the patch that MST just sent in his pull. So wait
a few days for that to be merged and it'll be all set.

No need for this patch here. Do not merge.

On Wed, Feb 1, 2023, 08:57 James Bottomley  wrote:

> The origin commit for rng seeding 67f7e426e5 ("hw/i386: pass RNG seed
> via setup_data entry") modifies the kernel image file to append a
> random seed.  Obviously this makes the hash of the kernel file
> non-deterministic and so breaks both measured and some signed boots.
> The commit notes it's only for non-EFI (because EFI has a different
> RNG seeding mechanism) so, since there are no non-EFI q35 systems, this
> should be disabled for the whole of the q35 machine type to bring back
> deterministic kernel file hashes.
>
> Obviously this still leaves the legacy bios case broken for at least
> measured boot, but I don't think anyone cares about that now.
>
> Signed-off-by: James Bottomley 
> ---
>  hw/i386/pc_q35.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
> index 83c57c6eb1..11e8dd7ca7 100644
> --- a/hw/i386/pc_q35.c
> +++ b/hw/i386/pc_q35.c
> @@ -357,6 +357,7 @@ static void pc_q35_machine_options(MachineClass *m)
>  pcmc->default_nic_model = "e1000e";
>  pcmc->pci_root_uid = 0;
>  pcmc->default_cpu_version = 1;
> +pcmc->legacy_no_rng_seed = true;
>
>  m->family = "pc_q35";
>  m->desc = "Standard PC (Q35 + ICH9, 2009)";
> @@ -394,9 +395,7 @@ DEFINE_Q35_MACHINE(v7_2, "pc-q35-7.2", NULL,
>
>  static void pc_q35_7_1_machine_options(MachineClass *m)
>  {
> -PCMachineClass *pcmc = PC_MACHINE_CLASS(m);
>  pc_q35_7_2_machine_options(m);
> -pcmc->legacy_no_rng_seed = true;
>  compat_props_add(m->compat_props, hw_compat_7_1,
> hw_compat_7_1_len);
>  compat_props_add(m->compat_props, pc_compat_7_1,
> pc_compat_7_1_len);
>  }
> --
> 2.35.3
>
>
>

Re: [PATCH] x86: fix q35 kernel measurements broken due to rng seeding

2023-02-01 Thread Jason A. Donenfeld

This patch is not needed. It is already fixed in a pending pull. Do not
merge.

On Wed, Feb 1, 2023, 09:57 James Bottomley  wrote:

> On Wed, 2023-02-01 at 14:35 +, Daniel P. Berrangé wrote:
> > On Wed, Feb 01, 2023 at 08:57:10AM -0500, James Bottomley wrote:
> > > The origin commit for rng seeding 67f7e426e5 ("hw/i386: pass RNG
> > > seed
> > > via setup_data entry") modifies the kernel image file to append a
> > > random seed.  Obviously this makes the hash of the kernel file
> > > non-deterministic and so breaks both measured and some signed
> > > boots.
> >
> > I recall raising that at the time
> >
> >   https://lists.gnu.org/archive/html/qemu-devel/2022-08/msg00710.html
> >
> > and Jason pointed me to a followup which I tested and believe
> > fixed it for SEV:
> >
> >   https://lists.gnu.org/archive/html/qemu-devel/2022-08/msg00601.html
> >
> > but it doesn't look like that second patch ever merged. We went
> > through so many patches I think it probably got obsoleted by
> > something else, and no one rechecked SEV again.
>
> The kernel file problem is a pretty huge one.  OVMF lays it down on an
> internal file system and without the second patch, it now contains
> random junk at the end.  Anything that hashes the whole file (which
> includes not only the measured direct boot but also grub signatures and
> probably other bootloader signing mechanisms) will have an issue.
>
> > > The commit notes it's only for non-EFI (because EFI has a different
> > > RNG seeding mechanism) so, since there are no non-EFI q35 systems,
> > > this should be disabled for the whole of the q35 machine type to
> > > bring back deterministic kernel file hashes.
> >
> > SeaBIOS is the default firmware for both q35 and i440fx. The
> > majority of systems using q35 will be non-EFI today, and that
> > is what the random seed was intended to address. I don't think
> > we can just disable this for the whole of q35.
> >
> > When you say it breaks measured / signed boots, I presume you
> > are specifically referring to SEV kernel hashes measurements ?
> > Or is there a more general problem to solve ?
>
> No it generally breaks measured/signed boots because it adds random
> junk to the kernel file.  The second patch will fix this if you apply
> it because setup data isn't measured or signed (yet ... however see the
> linux-coco debate about how it should be).
>
> I also note there was a v3 of the patch and considerable discussion
> saying it couldn't work:
>
> https://lore.kernel.org/qemu-devel/20220804230411.17720-1-ja...@zx2c4.com/
>
> Which is likely why it never went in ... although the discussion does
> seem to resolve towards the end.
>
> James
>
>

Re: [PATCH] x86: fix q35 kernel measurements broken due to rng seeding

2023-02-01 Thread Jason A. Donenfeld

It's not a secret, but I have so little internet right now that I can't
even load a webpage, and I'm on my phone, hence the short HTMLified emails.

In brief, though, it gets rid of all modifications to the kernel image all
together, so it should fix your issue.

On Wed, Feb 1, 2023, 10:24 James Bottomley  wrote:

> On Wed, 2023-02-01 at 10:10 -0500, Jason A. Donenfeld wrote:
> > This is already fixed via the patch that MST just sent in his pull.
> > So wait a few days for that to be merged and it'll be all set.
> >
> > No need for this patch here. Do not merge.
>
> If it's not a secret, would it be too much trouble to point to the
> branch so we can actually test it?  It does seem that the biggest
> problem this issue shows is that there wasn't wide enough configuration
> testing done on the prior commits before they were merged.
>
> James
>
>

Re: [PATCH] x86: fix q35 kernel measurements broken due to rng seeding

2023-02-01 Thread Jason A. Donenfeld

Hi James,

On Wed, Feb 1, 2023, 15:39 James Bottomley  wrote:

> On Wed, 2023-02-01 at 12:51 -0500, Jason A. Donenfeld wrote:
> > It's not a secret, but I have so little internet right now that I
> > can't even load a webpage, and I'm on my phone, hence the short
> > HTMLified emails.
> >
> > In brief, though, it gets rid of all modifications to the kernel
> > image all together, so it should fix your issue.
>
> We've already tested it and established it doesn't because you simply
> added your rng data to the end of a different integrity protected file
> which now fails the integrity check instead of the kernel.
>
> I checked the kernel source as well; I thought you'd have done the
> usual thing and bumped the boot protocol version to steal space in
> __pad9, but you didn't apparently.  To fix this up after the fact, I
> recommend that we still steal space in _pad9[] but we make it have
> enough space for a setup_data header as well as the 32 random bytes, so
> we've officially reserved the space, but in earlier kernels than this
> change gets to you can still use the setup_data_offset method, except
> that it now uses the empty space in _pad9 via the setup_data mechanism.
> That should find you space and get you out of having to expand any
> integrity protected files.  The SEV direct boot will still work because
> there's a check further down that doesn't copy the modified header back
> over the kernel because it is ignored on efi stub boot anyway.

Ahh, it looks like there's now an integrity check on the cmdline file. Darn.

The patch in that PULL is still good and necessary and fixed a *different*
bug, though. So we should still move forward on that.

But it sounds like you might now have a concrete suggestion on something
even better. I'm CCing hpa, as this is his wheelhouse, and maybe you two
can divise the next step while I'm away. Maybe the pad9 thing you mentioned
is the super nice solution we've been searching for this whole time. When
I'm home in 10 days and have internet again, I'll take a look at where
thing's are out and try to figure out how I can be productive again with it.

And sorry again for the short HTML emails. A day ago I was using mosh from
my phone to use mutt on a server to reply to emails downloaded from lore.
But today the cloud cover means the best I can do is queue these up in the
Android gmail client and hope they eventually send.

Jason

[PATCH v2] hw/openrisc: virt: pass random seed to fdt

2022-06-22 Thread Jason A. Donenfeld

If the FDT contains /chosen/rng-seed, then the Linux RNG will use it to
initialize early. Set this using the usual guest random number
generation function. This is confirmed to successfully initialize the
RNG on Linux 5.19-rc2.

Cc: Stafford Horne 
Signed-off-by: Jason A. Donenfeld 
---
Changes v1->v2:
- This is rebased on top of your "or1k-virt-2" branch.
- It makes the change to the new "virt" platform, since that's where it
  makes most sense to have.

 hw/openrisc/virt.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/hw/openrisc/virt.c b/hw/openrisc/virt.c
index 13b0a1d7dc..f1d62fcb7d 100644
--- a/hw/openrisc/virt.c
+++ b/hw/openrisc/virt.c
@@ -19,6 +19,7 @@
 
 #include "qemu/osdep.h"
 #include "qemu/error-report.h"
+#include "qemu/guest-random.h"
 #include "qapi/error.h"
 #include "cpu.h"
 #include "exec/address-spaces.h"
@@ -141,6 +142,7 @@ static void openrisc_create_fdt(OR1KVirtState *state,
 void *fdt;
 int cpu;
 char *nodename;
+uint8_t rng_seed[32];
 
 fdt = state->fdt = create_device_tree(&state->fdt_size);
 if (!fdt) {
@@ -197,6 +199,10 @@ static void openrisc_create_fdt(OR1KVirtState *state,
 qemu_fdt_setprop_string(fdt, "/chosen", "bootargs", cmdline);
 }
 
+/* Pass seed to RNG. */
+qemu_guest_getrandom_nofail(rng_seed, sizeof(rng_seed));
+qemu_fdt_setprop(fdt, "/chosen", "rng-seed", rng_seed, sizeof(rng_seed));
+
 /* Create aliases node for use by devices. */
 qemu_fdt_add_subnode(fdt, "/aliases");
 }
-- 
2.35.1

[PATCH] m68k: use correct variable name in boot info string macro

2022-06-25 Thread Jason A. Donenfeld

Every time this macro is used, the caller is passing in
"parameters_base", so this bug wasn't spotted. But the actual macro
variable name is "base", so use that instead.

Signed-off-by: Jason A. Donenfeld 
---
 hw/m68k/bootinfo.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/m68k/bootinfo.h b/hw/m68k/bootinfo.h
index adbf0c5521..ff4e155a3c 100644
--- a/hw/m68k/bootinfo.h
+++ b/hw/m68k/bootinfo.h
@@ -54,6 +54,6 @@
 stb_phys(as, base++, string[i]); \
 } \
 stb_phys(as, base++, 0); \
-base = (parameters_base + 1) & ~1; \
+base = (base + 1) & ~1; \
 } while (0)
 #endif
-- 
2.35.1

[PATCH qemu] m68k: virt: pass RNG seed via bootinfo block

2022-06-25 Thread Jason A. Donenfeld

This commit wires up bootinfo's RNG seed attribute so that Linux VMs can
have their RNG seeded from the earliest possible time in boot, just like
the "rng-seed" device tree property on those platforms. The link
contains the corresponding Linux patch.

Link: https://lore.kernel.org/lkml/20220625153841.143928-1-ja...@zx2c4.com/
Signed-off-by: Jason A. Donenfeld 
---
This requires this trivial cleanup commit first:
https://lore.kernel.org/qemu-devel/20220625152318.120849-1-ja...@zx2c4.com/

 hw/m68k/bootinfo.h   | 16 
 hw/m68k/virt.c   |  7 +++
 .../standard-headers/asm-m68k/bootinfo-virt.h|  1 +
 3 files changed, 24 insertions(+)

diff --git a/hw/m68k/bootinfo.h b/hw/m68k/bootinfo.h
index ff4e155a3c..2f31c13b6e 100644
--- a/hw/m68k/bootinfo.h
+++ b/hw/m68k/bootinfo.h
@@ -56,4 +56,20 @@
 stb_phys(as, base++, 0); \
 base = (base + 1) & ~1; \
 } while (0)
+
+#define BOOTINFODATA(as, base, id, data, len) \
+do { \
+int i; \
+stw_phys(as, base, id); \
+base += 2; \
+stw_phys(as, base, \
+ (sizeof(struct bi_record) + len + 5) & ~1); \
+base += 2; \
+stl_phys(as, base, len); \
+base += 4; \
+for (i = 0; i < len; ++i) { \
+stb_phys(as, base++, data[i]); \
+} \
+base = (base + 1) & ~1; \
+} while (0)
 #endif
diff --git a/hw/m68k/virt.c b/hw/m68k/virt.c
index e215aa3d42..0aa383fa6b 100644
--- a/hw/m68k/virt.c
+++ b/hw/m68k/virt.c
@@ -9,6 +9,7 @@
 
 #include "qemu/osdep.h"
 #include "qemu/units.h"
+#include "qemu/guest-random.h"
 #include "sysemu/sysemu.h"
 #include "cpu.h"
 #include "hw/boards.h"
@@ -120,6 +121,7 @@ static void virt_init(MachineState *machine)
 hwaddr io_base;
 int i;
 ResetInfo *reset_info;
+uint8_t rng_seed[32];
 
 if (ram_size > 3399672 * KiB) {
 /*
@@ -245,6 +247,11 @@ static void virt_init(MachineState *machine)
 kernel_cmdline);
 }
 
+   /* Pass seed to RNG. */
+   qemu_guest_getrandom_nofail(rng_seed, sizeof(rng_seed));
+   BOOTINFODATA(cs->as, parameters_base, BI_VIRT_RNG_SEED,
+rng_seed, sizeof(rng_seed));
+
 /* load initrd */
 if (initrd_filename) {
 initrd_size = get_image_size(initrd_filename);
diff --git a/include/standard-headers/asm-m68k/bootinfo-virt.h 
b/include/standard-headers/asm-m68k/bootinfo-virt.h
index 81be1e0924..1b1ffd4705 100644
--- a/include/standard-headers/asm-m68k/bootinfo-virt.h
+++ b/include/standard-headers/asm-m68k/bootinfo-virt.h
@@ -12,6 +12,7 @@
 #define BI_VIRT_GF_TTY_BASE0x8003
 #define BI_VIRT_VIRTIO_BASE0x8004
 #define BI_VIRT_CTRL_BASE  0x8005
+#define BI_VIRT_RNG_SEED   0x8006
 
 #define VIRT_BOOTI_VERSION MK_BI_VERSION(2, 0)
 
-- 
2.35.1

[PATCH v2] m68k: virt: pass RNG seed via bootinfo block

2022-06-26 Thread Jason A. Donenfeld

This commit wires up bootinfo's RNG seed attribute so that Linux VMs can
have their RNG seeded from the earliest possible time in boot, just like
the "rng-seed" device tree property on those platforms. The link
contains the corresponding Linux patch.

Link: https://lore.kernel.org/lkml/20220626111509.330159-1-ja...@zx2c4.com/
Based-on: <20220625152318.120849-1-ja...@zx2c4.com>
Reviewed-by: Laurent Vivier 
Signed-off-by: Jason A. Donenfeld 
---
 hw/m68k/bootinfo.h   | 16 
 hw/m68k/virt.c   |  7 +++
 .../standard-headers/asm-m68k/bootinfo-virt.h|  1 +
 3 files changed, 24 insertions(+)

diff --git a/hw/m68k/bootinfo.h b/hw/m68k/bootinfo.h
index ff4e155a3c..bd8b212fd3 100644
--- a/hw/m68k/bootinfo.h
+++ b/hw/m68k/bootinfo.h
@@ -56,4 +56,20 @@
 stb_phys(as, base++, 0); \
 base = (base + 1) & ~1; \
 } while (0)
+
+#define BOOTINFODATA(as, base, id, data, len) \
+do { \
+int i; \
+stw_phys(as, base, id); \
+base += 2; \
+stw_phys(as, base, \
+ (sizeof(struct bi_record) + len + 3) & ~1); \
+base += 2; \
+stw_phys(as, base, len); \
+base += 2; \
+for (i = 0; i < len; ++i) { \
+stb_phys(as, base++, data[i]); \
+} \
+base = (base + 1) & ~1; \
+} while (0)
 #endif
diff --git a/hw/m68k/virt.c b/hw/m68k/virt.c
index e215aa3d42..0aa383fa6b 100644
--- a/hw/m68k/virt.c
+++ b/hw/m68k/virt.c
@@ -9,6 +9,7 @@
 
 #include "qemu/osdep.h"
 #include "qemu/units.h"
+#include "qemu/guest-random.h"
 #include "sysemu/sysemu.h"
 #include "cpu.h"
 #include "hw/boards.h"
@@ -120,6 +121,7 @@ static void virt_init(MachineState *machine)
 hwaddr io_base;
 int i;
 ResetInfo *reset_info;
+uint8_t rng_seed[32];
 
 if (ram_size > 3399672 * KiB) {
 /*
@@ -245,6 +247,11 @@ static void virt_init(MachineState *machine)
 kernel_cmdline);
 }
 
+   /* Pass seed to RNG. */
+   qemu_guest_getrandom_nofail(rng_seed, sizeof(rng_seed));
+   BOOTINFODATA(cs->as, parameters_base, BI_VIRT_RNG_SEED,
+rng_seed, sizeof(rng_seed));
+
 /* load initrd */
 if (initrd_filename) {
 initrd_size = get_image_size(initrd_filename);
diff --git a/include/standard-headers/asm-m68k/bootinfo-virt.h 
b/include/standard-headers/asm-m68k/bootinfo-virt.h
index 81be1e0924..1b1ffd4705 100644
--- a/include/standard-headers/asm-m68k/bootinfo-virt.h
+++ b/include/standard-headers/asm-m68k/bootinfo-virt.h
@@ -12,6 +12,7 @@
 #define BI_VIRT_GF_TTY_BASE0x8003
 #define BI_VIRT_VIRTIO_BASE0x8004
 #define BI_VIRT_CTRL_BASE  0x8005
+#define BI_VIRT_RNG_SEED   0x8006
 
 #define VIRT_BOOTI_VERSION MK_BI_VERSION(2, 0)
 
-- 
2.35.1

[PATCH] hw/arm/virt: dt: add rng-seed property

2022-06-27 Thread Jason A. Donenfeld

In 60592cfed2 ("hw/arm/virt: dt: add kaslr-seed property"), the
kaslr-seed property was added, but the equally as important rng-seed
property was forgotten about, which has identical semantics for a
similar purpose. This commit implements it in exactly the same way as
kaslr-seed.

Cc: Peter Maydell 
Signed-off-by: Jason A. Donenfeld 
---
 hw/arm/virt.c | 40 
 include/hw/arm/virt.h |  1 +
 2 files changed, 41 insertions(+)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 097238faa7..8a3436a370 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -221,6 +221,16 @@ static bool cpu_type_valid(const char *cpu)
 return false;
 }
 
+static void create_rng_seed(MachineState *ms, const char *node)
+{
+uint8_t seed[32];
+
+if (qemu_guest_getrandom(&seed, sizeof(seed), NULL)) {
+return;
+}
+qemu_fdt_setprop(ms->fdt, node, "rng-seed", seed, sizeof(seed));
+}
+
 static void create_kaslr_seed(MachineState *ms, const char *node)
 {
 uint64_t seed;
@@ -251,6 +261,9 @@ static void create_fdt(VirtMachineState *vms)
 
 /* /chosen must exist for load_dtb to fill in necessary properties later */
 qemu_fdt_add_subnode(fdt, "/chosen");
+if (vms->dtb_rng_seed) {
+create_rng_seed(ms, "/chosen");
+}
 if (vms->dtb_kaslr_seed) {
 create_kaslr_seed(ms, "/chosen");
 }
@@ -260,6 +273,9 @@ static void create_fdt(VirtMachineState *vms)
 if (vms->dtb_kaslr_seed) {
 create_kaslr_seed(ms, "/secure-chosen");
 }
+if (vms->dtb_rng_seed) {
+create_rng_seed(ms, "/secure-chosen");
+}
 }
 
 /* Clock node, for the benefit of the UART. The kernel device tree
@@ -2348,6 +2364,20 @@ static void virt_set_its(Object *obj, bool value, Error 
**errp)
 vms->its = value;
 }
 
+static bool virt_get_dtb_rng_seed(Object *obj, Error **errp)
+{
+VirtMachineState *vms = VIRT_MACHINE(obj);
+
+return vms->dtb_rng_seed;
+}
+
+static void virt_set_dtb_rng_seed(Object *obj, bool value, Error **errp)
+{
+VirtMachineState *vms = VIRT_MACHINE(obj);
+
+vms->dtb_rng_seed = value;
+}
+
 static bool virt_get_dtb_kaslr_seed(Object *obj, Error **errp)
 {
 VirtMachineState *vms = VIRT_MACHINE(obj);
@@ -2988,6 +3018,13 @@ static void virt_machine_class_init(ObjectClass *oc, 
void *data)
   "Set on/off to enable/disable "
   "ITS instantiation");
 
+object_class_property_add_bool(oc, "dtb-rng-seed",
+   virt_get_dtb_rng_seed,
+   virt_set_dtb_rng_seed);
+object_class_property_set_description(oc, "dtb-rng-seed",
+  "Set off to disable passing of 
rng-seed "
+  "dtb node to guest");
+
 object_class_property_add_bool(oc, "dtb-kaslr-seed",
virt_get_dtb_kaslr_seed,
virt_set_dtb_kaslr_seed);
@@ -3061,6 +3098,9 @@ static void virt_instance_init(Object *obj)
 /* MTE is disabled by default.  */
 vms->mte = false;
 
+/* Supply a rng-seed by default */
+vms->dtb_rng_seed = true;
+
 /* Supply a kaslr-seed by default */
 vms->dtb_kaslr_seed = true;
 
diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index 15feabac63..cf652f1f3d 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -152,6 +152,7 @@ struct VirtMachineState {
 bool virt;
 bool ras;
 bool mte;
+bool dtb_rng_seed;
 bool dtb_kaslr_seed;
 OnOffAuto acpi;
 VirtGICType gic_version;
-- 
2.35.1

Re: [PATCH] hw/arm/virt: dt: add rng-seed property

2022-06-27 Thread Jason A. Donenfeld

On 6/27/22, Peter Maydell  wrote:
> On Mon, 27 Jun 2022 at 17:07, Jason A. Donenfeld  wrote:
>>
>> In 60592cfed2 ("hw/arm/virt: dt: add kaslr-seed property"), the
>> kaslr-seed property was added, but the equally as important rng-seed
>> property was forgotten about, which has identical semantics for a
>> similar purpose. This commit implements it in exactly the same way as
>> kaslr-seed.
>
> Not an objection, since if this is what the dtb spec says we need
> to provide then I guess we need to provide it, but:
> Why do we need to give the kernel two separate random seeds?
> Isn't one sufficient for the kernel to seed its RNG and generate
> whatever randomness it needs for whatever purposes it wants it?
>

Seems a bit silly to me too. `rng-seed` alone ought to be sufficient.
After the kernel calls add_bootloader_randomness() on it,
get_random_long() can be used for kaslr'ing and everything else too.
So I'm not sure what's up, but here we are. Maybe down the line I'll
look into the details and formulate a plan to remove `kaslr-seed` if
my supposition is correct.

Jason

Re: [PATCH] hw/arm/virt: dt: add rng-seed property

2022-06-28 Thread Jason A. Donenfeld

On 6/27/22, Jason A. Donenfeld  wrote:
> On 6/27/22, Peter Maydell  wrote:
>> On Mon, 27 Jun 2022 at 17:07, Jason A. Donenfeld  wrote:
>>>
>>> In 60592cfed2 ("hw/arm/virt: dt: add kaslr-seed property"), the
>>> kaslr-seed property was added, but the equally as important rng-seed
>>> property was forgotten about, which has identical semantics for a
>>> similar purpose. This commit implements it in exactly the same way as
>>> kaslr-seed.
>>
>> Not an objection, since if this is what the dtb spec says we need
>> to provide then I guess we need to provide it, but:
>> Why do we need to give the kernel two separate random seeds?
>> Isn't one sufficient for the kernel to seed its RNG and generate
>> whatever randomness it needs for whatever purposes it wants it?
>>
>
> Seems a bit silly to me too. `rng-seed` alone ought to be sufficient.
> After the kernel calls add_bootloader_randomness() on it,
> get_random_long() can be used for kaslr'ing and everything else too.
> So I'm not sure what's up, but here we are. Maybe down the line I'll
> look into the details and formulate a plan to remove `kaslr-seed` if
> my supposition is correct.
>
> Jason
>

Was wondering if you planned to queue this up?

Jason

Re: [PATCH] hw/arm/virt: dt: add rng-seed property

2022-06-29 Thread Jason A. Donenfeld

On Wed, Jun 29, 2022 at 11:18:23AM +0100, Alex Bennée wrote:
> 
> Peter Maydell  writes:
> 
> > On Tue, 28 Jun 2022 at 19:45, Jason A. Donenfeld  wrote:
> >>
> >> On 6/27/22, Jason A. Donenfeld  wrote:
> >> > On 6/27/22, Peter Maydell  wrote:
> >> >> On Mon, 27 Jun 2022 at 17:07, Jason A. Donenfeld  
> >> >> wrote:
> >> >>>
> >> >>> In 60592cfed2 ("hw/arm/virt: dt: add kaslr-seed property"), the
> >> >>> kaslr-seed property was added, but the equally as important rng-seed
> >> >>> property was forgotten about, which has identical semantics for a
> >> >>> similar purpose. This commit implements it in exactly the same way as
> >> >>> kaslr-seed.
> >> >>
> >> >> Not an objection, since if this is what the dtb spec says we need
> >> >> to provide then I guess we need to provide it, but:
> >> >> Why do we need to give the kernel two separate random seeds?
> >> >> Isn't one sufficient for the kernel to seed its RNG and generate
> >> >> whatever randomness it needs for whatever purposes it wants it?
> >> >>
> >> >
> >> > Seems a bit silly to me too. `rng-seed` alone ought to be sufficient.
> >> > After the kernel calls add_bootloader_randomness() on it,
> >> > get_random_long() can be used for kaslr'ing and everything else too.
> >> > So I'm not sure what's up, but here we are. Maybe down the line I'll
> >> > look into the details and formulate a plan to remove `kaslr-seed` if
> >> > my supposition is correct.
> 
> Sorry now I've had my coffee and read properly I see you are already
> aware of kaslr-seed. However my point about suppression would still
> stand because for the secure boot flow you need checksum-able DTBs.

Please read the patch. Maybe take a sip of coffee first. There's a knob
for this too.

The code is exactly the same for kaslr-seed and rng-seed. Everytime
there's some kaslr-seed thing, there is now the same rng-seed thing.

Jason

Re: [PATCH] hw/arm/virt: dt: add rng-seed property

2022-06-29 Thread Jason A. Donenfeld

Hi Alex,

On Wed, Jun 29, 2022 at 04:24:20PM +0100, Alex Bennée wrote:
> > The code is exactly the same for kaslr-seed and rng-seed. Everytime
> > there's some kaslr-seed thing, there is now the same rng-seed thing.
> 
> The duplication is annoying but specs are specs - where is this written
> by the way?

The same place as all the ordinary specs:
https://github.com/devicetree-org/dt-schema/blob/main/dtschema/schemas/chosen.yaml

> Given the use case for the dtb-kaslr-seed knob I wonder if we should
> have a common property and deprecate the kaslr one? As of this patch
> existing workflows will break until command lines are updated to suppress
> the second source of randomness.
> 
> Maybe it would be better to have a single a new property
> (dtb-rng-seeds?) which suppresses both dtb entries and make
> dtb-kaslr-seed an alias and mark it as deprecated.

No, I don't think so. If anything, I'll try to get rid of kaslr-seed
upstream at some point if that makes sense. But until that happens --
that is, until I have the conversations with people who added these and
care about their semantics -- assume that there's granularity for some
good reason. No need to put the cart before the horse.

This is a simple patch doing a simple thing in exactly the way that
things are already being done. I really don't want to do much more than
that here. If you want to bikeshed it further, send a follow up patch.

Jason

Re: [PATCH] hw/arm/virt: dt: add rng-seed property

2022-06-30 Thread Jason A. Donenfeld

On Thu, Jun 30, 2022 at 10:15:29AM +0100, Peter Maydell wrote:
> On Wed, 29 Jun 2022 at 16:56, Jason A. Donenfeld  wrote:
> > On Wed, Jun 29, 2022 at 04:24:20PM +0100, Alex Bennée wrote:
> > > Given the use case for the dtb-kaslr-seed knob I wonder if we should
> > > have a common property and deprecate the kaslr one? As of this patch
> > > existing workflows will break until command lines are updated to suppress
> > > the second source of randomness.
> > >
> > > Maybe it would be better to have a single a new property
> > > (dtb-rng-seeds?) which suppresses both dtb entries and make
> > > dtb-kaslr-seed an alias and mark it as deprecated.
> >
> > No, I don't think so. If anything, I'll try to get rid of kaslr-seed
> > upstream at some point if that makes sense. But until that happens --
> > that is, until I have the conversations with people who added these and
> > care about their semantics -- assume that there's granularity for some
> > good reason. No need to put the cart before the horse.
> >
> > This is a simple patch doing a simple thing in exactly the way that
> > things are already being done. I really don't want to do much more than
> > that here. If you want to bikeshed it further, send a follow up patch.
> 
> It's adding a command line option, though. Those we have to get
> right the first time, because for QEMU they're kind of like ABI
> to our users. We *can* clean them up if we find we've made a mistake,
> but we have to go through a multi-release deprecation process to do it,
> so it's much less effort overall to make sure we have the command line
> syntax right to start with.
> 
> If there's a good use case for the two seeds to be separately
> controllable, that's fine. But I'd rather we find that out for
> certain before we put a second control knob and make all our
> users with workflows where they want non-random dtb blobs find
> out about it and flip it.

Okay. Do you want me to just make this controllable by dtb-kaslr-seed
for now, then, and we can rename that in a follow-up commit? I'll send a
patch for that.

Jason

[PATCH v2] hw/arm/virt: dt: add rng-seed property

2022-06-30 Thread Jason A. Donenfeld

In 60592cfed2 ("hw/arm/virt: dt: add kaslr-seed property"), the
kaslr-seed property was added, but the equally as important rng-seed
property was forgotten about, which has identical semantics for a
similar purpose. This commit implements it in exactly the same way as
kaslr-seed. It then changes the name of the disabling option to reflect
that this has more to do with randomness vs determinism, rather than
something particular about kaslr.

Cc: Peter Maydell 
Signed-off-by: Jason A. Donenfeld 
---
 docs/system/arm/virt.rst | 17 ++--
 hw/arm/virt.c| 44 
 include/hw/arm/virt.h|  2 +-
 3 files changed, 39 insertions(+), 24 deletions(-)

diff --git a/docs/system/arm/virt.rst b/docs/system/arm/virt.rst
index 3d1058a80c..3b6ba69a9a 100644
--- a/docs/system/arm/virt.rst
+++ b/docs/system/arm/virt.rst
@@ -126,13 +126,18 @@ ras
   Set ``on``/``off`` to enable/disable reporting host memory errors to a guest
   using ACPI and guest external abort exceptions. The default is off.
 
+dtb-randomness
+  Set ``on``/``off`` to pass random seeds via the guest DTB
+  rng-seed and kaslr-seed nodes (in both "/chosen" and
+  "/secure-chosen") to use for features like the random number
+  generator and address space randomisation. The default is
+  ``on``. You will want to disable it if your trusted boot chain
+  will verify the DTB it is passed, since this option causes the
+  DTB to be non-deterministic. It would be the responsibility of
+  the firmware to come up with a seed and pass it on if it wants to.
+
 dtb-kaslr-seed
-  Set ``on``/``off`` to pass a random seed via the guest dtb
-  kaslr-seed node (in both "/chosen" and /secure-chosen) to use
-  for features like address space randomisation. The default is
-  ``on``. You will want to disable it if your trusted boot chain will
-  verify the DTB it is passed. It would be the responsibility of the
-  firmware to come up with a seed and pass it on if it wants to.
+  A deprecated synonym for dtb-randomness.
 
 Linux guest kernel configuration
 """"""""""""""""""""""""""""""""
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 097238faa7..924ded7f85 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -221,14 +221,18 @@ static bool cpu_type_valid(const char *cpu)
 return false;
 }
 
-static void create_kaslr_seed(MachineState *ms, const char *node)
+static void create_randomness(MachineState *ms, const char *node)
 {
-uint64_t seed;
+struct {
+uint64_t kaslr;
+uint8_t rng[32];
+} seed;
 
 if (qemu_guest_getrandom(&seed, sizeof(seed), NULL)) {
 return;
 }
-qemu_fdt_setprop_u64(ms->fdt, node, "kaslr-seed", seed);
+qemu_fdt_setprop_u64(ms->fdt, node, "kaslr-seed", seed.kaslr);
+qemu_fdt_setprop(ms->fdt, node, "rng-seed", seed.rng, sizeof(seed.rng));
 }
 
 static void create_fdt(VirtMachineState *vms)
@@ -251,14 +255,14 @@ static void create_fdt(VirtMachineState *vms)
 
 /* /chosen must exist for load_dtb to fill in necessary properties later */
 qemu_fdt_add_subnode(fdt, "/chosen");
-if (vms->dtb_kaslr_seed) {
-create_kaslr_seed(ms, "/chosen");
+if (vms->dtb_randomness) {
+create_randomness(ms, "/chosen");
 }
 
 if (vms->secure) {
 qemu_fdt_add_subnode(fdt, "/secure-chosen");
-if (vms->dtb_kaslr_seed) {
-create_kaslr_seed(ms, "/secure-chosen");
+if (vms->dtb_randomness) {
+create_randomness(ms, "/secure-chosen");
 }
 }
 
@@ -2348,18 +2352,18 @@ static void virt_set_its(Object *obj, bool value, Error 
**errp)
 vms->its = value;
 }
 
-static bool virt_get_dtb_kaslr_seed(Object *obj, Error **errp)
+static bool virt_get_dtb_randomness(Object *obj, Error **errp)
 {
 VirtMachineState *vms = VIRT_MACHINE(obj);
 
-return vms->dtb_kaslr_seed;
+return vms->dtb_randomness;
 }
 
-static void virt_set_dtb_kaslr_seed(Object *obj, bool value, Error **errp)
+static void virt_set_dtb_randomness(Object *obj, bool value, Error **errp)
 {
 VirtMachineState *vms = VIRT_MACHINE(obj);
 
-vms->dtb_kaslr_seed = value;
+vms->dtb_randomness = value;
 }
 
 static char *virt_get_oem_id(Object *obj, Error **errp)
@@ -2988,12 +2992,18 @@ static void virt_machine_class_init(ObjectClass *oc, 
void *data)
   "Set on/off to enable/disable "
   "ITS instantiation");
 
+object_class_property_add_bool(oc, "dtb-randomness",
+   virt_get_dtb_ra

[PATCH] hw/i386: pass RNG seed to e820 setup table

2022-06-30 Thread Jason A. Donenfeld

Tiny machines optimized for fast boot time generally don't use EFI,
which means a random seed has to be supplied some other way, in this
case by the e820 setup table, which supplies a place for one. This
commit adds passing this random seed via the table. It is confirmed to
be working with the Linux patch in the link.

Link: https://lore.kernel.org/lkml/20220630113300.1892799-1-ja...@zx2c4.com/
Signed-off-by: Jason A. Donenfeld 
---
 hw/i386/x86.c| 19 ++-
 include/standard-headers/asm-x86/bootparam.h |  1 +
 2 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index 6003b4b2df..0724759eec 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -26,6 +26,7 @@
 #include "qemu/cutils.h"
 #include "qemu/units.h"
 #include "qemu/datadir.h"
+#include "qemu/guest-random.h"
 #include "qapi/error.h"
 #include "qapi/qmp/qerror.h"
 #include "qapi/qapi-visit-common.h"
@@ -1045,6 +1046,16 @@ void x86_load_linux(X86MachineState *x86ms,
 }
 fclose(f);
 
+setup_data_offset = QEMU_ALIGN_UP(kernel_size, 16);
+kernel_size = setup_data_offset + sizeof(struct setup_data) + 32;
+kernel = g_realloc(kernel, kernel_size);
+stq_p(header + 0x250, prot_addr + setup_data_offset);
+setup_data = (struct setup_data *)(kernel + setup_data_offset);
+setup_data->next = 0;
+setup_data->type = cpu_to_le32(SETUP_RNG_SEED);
+setup_data->len = cpu_to_le32(32);
+qemu_guest_getrandom_nofail(setup_data->data, 32);
+
 /* append dtb to kernel */
 if (dtb_filename) {
 if (protocol < 0x209) {
@@ -1059,13 +1070,11 @@ void x86_load_linux(X86MachineState *x86ms,
 exit(1);
 }
 
-setup_data_offset = QEMU_ALIGN_UP(kernel_size, 16);
-kernel_size = setup_data_offset + sizeof(struct setup_data) + dtb_size;
+kernel_size += sizeof(struct setup_data) + dtb_size;
 kernel = g_realloc(kernel, kernel_size);
 
-stq_p(header + 0x250, prot_addr + setup_data_offset);
-
-setup_data = (struct setup_data *)(kernel + setup_data_offset);
+setup_data->next = prot_addr + setup_data_offset + sizeof(*setup_data) 
+ setup_data->len;
+++setup_data;
 setup_data->next = 0;
 setup_data->type = cpu_to_le32(SETUP_DTB);
 setup_data->len = cpu_to_le32(dtb_size);
diff --git a/include/standard-headers/asm-x86/bootparam.h 
b/include/standard-headers/asm-x86/bootparam.h
index 072e2ed546..b8cb1fa313 100644
--- a/include/standard-headers/asm-x86/bootparam.h
+++ b/include/standard-headers/asm-x86/bootparam.h
@@ -10,6 +10,7 @@
 #define SETUP_EFI  4
 #define SETUP_APPLE_PROPERTIES 5
 #define SETUP_JAILHOUSE6
+#define SETUP_RNG_SEED 8
 
 #define SETUP_INDIRECT (1<<31)
 
-- 
2.35.1

Re: [PATCH v2] hw/arm/virt: dt: add rng-seed property

2022-07-04 Thread Jason A. Donenfeld

Hi Peter,

On Mon, Jul 04, 2022 at 03:42:55PM +0100, Peter Maydell wrote:
> We should also add a section to docs/about/deprecated.rst
> noting that the old name is deprecated in favour of the new one.
> I'll fold that in when I add the patch to target-arm.next, to
> save you doing a respin:

Thanks for doing that. Your text looks good to me.

Jason

Re: [PATCH v2 03/11] goldfish_rtc: Add endianness property

2022-07-04 Thread Jason A. Donenfeld

On Tue, Jul 05, 2022 at 05:40:31AM +0900, Stafford Horne wrote:
>   riscv{LE}--->goldfish_rtc{LE}
>   mips-longsoon3{LE}-->goldfish_rtc{LE}
>   openrisc{BE}>goldfish_rtc{LE} (LE to BE conversion done in 
> driver)
>   m68k{BE}>goldfish_rtc{BE} (only big-endian user)

I wish the powers that be would lighten up a little bit and let us
change m68k to be LE, and then we could avoid all this...

Just a last grumble, I guess.

Jason

Re: [PATCH] hw/riscv: virt: pass random seed to fdt

2022-07-04 Thread Jason A. Donenfeld

Hi Alistair,

On Wed, Jun 29, 2022 at 4:09 AM Alistair Francis  wrote:
> I have a Linux 5.8 test case that is failing due to this patch.

Before I started fixing things in random.c, there were a lot of early
boot bugs with the RNG in Linux. I backported the fixes for these to
all stable kernels. It's a bummer that risc-v got hit by these bugs,
but I think that's just the way things go unfortunately.

Jason

Re: [PATCH v7 2/2] drivers/virt: vmgenid: add vm generation id driver

2022-02-22 Thread Jason A. Donenfeld

Hi Adrian,

This thread seems to be long dead, but I couldn't figure out what
happened to the ideas in it. I'm specifically interested in this part:

On Wed, Feb 24, 2021 at 9:48 AM Adrian Catangiu  wrote:
> +static void vmgenid_acpi_notify(struct acpi_device *device, u32 event)
> +{
> +   uuid_t old_uuid;
> +
> +   if (!device || acpi_driver_data(device) != &vmgenid_data) {
> +   pr_err("VMGENID notify with unexpected driver private 
> data\n");
> +   return;
> +   }
> +
> +   /* update VM Generation UUID */
> +   old_uuid = vmgenid_data.uuid;
> +   memcpy_fromio(&vmgenid_data.uuid, vmgenid_data.uuid_iomap, 
> sizeof(uuid_t));
> +
> +   if (memcmp(&old_uuid, &vmgenid_data.uuid, sizeof(uuid_t))) {
> +   /* HW uuid updated */
> +   sysgenid_bump_generation();
> +   add_device_randomness(&vmgenid_data.uuid, sizeof(uuid_t));
> +   }
> +}

As Jann mentioned in an earlier email, we probably want this to
immediately reseed the crng, not just dump it into
add_device_randomness alone. But either way, the general idea seems
interesting to me. As far as I can tell, QEMU still supports this. Was
it not deemed to be sufficiently interesting?

Thanks,
Jason

Re: [PATCH v7 2/2] drivers/virt: vmgenid: add vm generation id driver

2022-02-22 Thread Jason A. Donenfeld

Hey again,

On Tue, Feb 22, 2022 at 10:24 PM Jason A. Donenfeld  wrote:
> This thread seems to be long dead, but I couldn't figure out what
> happened to the ideas in it. I'm specifically interested in this part:
>
> On Wed, Feb 24, 2021 at 9:48 AM Adrian Catangiu  wrote:
> > +static void vmgenid_acpi_notify(struct acpi_device *device, u32 event)
> > +{
> > +   uuid_t old_uuid;
> > +
> > +   if (!device || acpi_driver_data(device) != &vmgenid_data) {
> > +   pr_err("VMGENID notify with unexpected driver private 
> > data\n");
> > +   return;
> > +   }
> > +
> > +   /* update VM Generation UUID */
> > +   old_uuid = vmgenid_data.uuid;
> > +   memcpy_fromio(&vmgenid_data.uuid, vmgenid_data.uuid_iomap, 
> > sizeof(uuid_t));
> > +
> > +   if (memcmp(&old_uuid, &vmgenid_data.uuid, sizeof(uuid_t))) {
> > +   /* HW uuid updated */
> > +   sysgenid_bump_generation();
> > +   add_device_randomness(&vmgenid_data.uuid, sizeof(uuid_t));
> > +   }
> > +}
>
> As Jann mentioned in an earlier email, we probably want this to
> immediately reseed the crng, not just dump it into
> add_device_randomness alone. But either way, the general idea seems
> interesting to me. As far as I can tell, QEMU still supports this. Was
> it not deemed to be sufficiently interesting?
>
> Thanks,
> Jason

Well I cleaned up this v7 and refactored it into something along the
lines of what I'm thinking. I don't yet know enough about this general
problem space to propose the patch and I haven't tested it either, but
in case you're curious, something along the lines of what I'm thinking
about lives at 
https://git.kernel.org/pub/scm/linux/kernel/git/crng/random.git/commit/?h=jd/vmgenid
if you (or somebody else) feels inclined to pick this up.

Looking forward to learning more from you in general, though, about
what the deal is with the VM gen ID, and if this is a real thing or
not.

Regards,
Jason

[PATCH RFC v1 0/2] VM fork detection for RNG

2022-02-23 Thread Jason A. Donenfeld

This small series picks up work from Amazon that seems to have stalled
out later year around this time: listening for the vmgenid ACPI
notification, and using it to "do something." Last year, that something
involved a complicated userspace mmap chardev, which seems frought with
difficulty. This year, I have something much simpler in mind: simply
using those ACPI notifications to tell the RNG to reinitialize safely,
so we don't repeat random numbers in cloned, forked, or rolled-back VM
instances.

This series consists of two patches. The first is a rather
straightforward addition to random.c, which I feel fine about. The
second patch is the reason this is just an RFC: it's a cleanup of the
ACPI driver from last year, and I don't really have much experience
writing, testing, debugging, or maintaining these types of drivers.
Ideally this thread would yield somebody saying, "I see the intent of
this; I'm happy to take over ownership of this part." That way, I can
focus on the RNG part, and whoever steps up for the paravirt ACPI part
can focus on that.

As a final note, this series intentionally does _not_ focus on
notification of these events to userspace or to other kernel consumers.
Since these VM fork detection events first need to hit the RNG, we can
later talk about what sorts of notifications or mmap'd counters the RNG
should be making accessible to elsewhere. But that's a different sort of
project and ties into a lot of more complicated concerns beyond this
more basic patchset. So hopefully we can keep the discussion rather
focused here to this ACPI business.

Cc: d...@amazon.co.uk
Cc: aca...@amazon.com
Cc: g...@amazon.com
Cc: colmm...@amazon.com
Cc: sbl...@amazon.com
Cc: raduw...@amazon.com
Cc: ja...@google.com
Cc: gre...@linuxfoundation.org
Cc: ty...@mit.edu

Jason A. Donenfeld (2):
  random: add mechanism for VM forks to reinitialize crng
  drivers/virt: add vmgenid driver for reinitializing RNG

 drivers/char/random.c  |  58 ++
 drivers/virt/Kconfig   |   8 +++
 drivers/virt/Makefile  |   1 +
 drivers/virt/vmgenid.c | 133 +
 include/linux/random.h |   1 +
 5 files changed, 201 insertions(+)
 create mode 100644 drivers/virt/vmgenid.c

-- 
2.35.1

[PATCH RFC v1 1/2] random: add mechanism for VM forks to reinitialize crng

2022-02-23 Thread Jason A. Donenfeld

When a VM forks, we must immediately mix in additional information to
the stream of random output so that two forks or a rollback don't
produce the same stream of random numbers, which could have catastrophic
cryptographic consequences. This commit adds a simple API, add_vmfork_
randomness(), for that.

Cc: Theodore Ts'o 
Cc: Jann Horn 
Signed-off-by: Jason A. Donenfeld 
---
 drivers/char/random.c  | 58 ++
 include/linux/random.h |  1 +
 2 files changed, 59 insertions(+)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 536237a0f073..29d6ce484d15 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -344,6 +344,46 @@ static void crng_reseed(void)
}
 }
 
+/*
+ * This mixes unique_vm_id directly into the base_crng key as soon as
+ * possible, similarly to crng_pre_init_inject(), even if the crng is
+ * already running, in order to immediately branch streams from prior
+ * VM instances.
+ */
+static void crng_vm_fork_inject(const void *unique_vm_id, size_t len)
+{
+   unsigned long flags, next_gen;
+   struct blake2s_state hash;
+
+   /*
+* Unlike crng_reseed(), we take the lock as early as possible,
+* since we don't want the RNG to be used until it's updated.
+*/
+   spin_lock_irqsave(&base_crng.lock, flags);
+
+   /*
+* Also update the generation, while locked, as early as
+* possible. This will mean unlocked reads of the generation
+* will cause a reseeding of per-cpu crngs, and those will
+* spin on the base_crng lock waiting for the rest of this
+* operation to complete, which achieves the goal of blocking
+* the production of new output until this is done.
+*/
+   next_gen = base_crng.generation + 1;
+   if (next_gen == ULONG_MAX)
+   ++next_gen;
+   WRITE_ONCE(base_crng.generation, next_gen);
+   WRITE_ONCE(base_crng.birth, jiffies);
+
+   /* This is the same formulation used by crng_pre_init_inject(). */
+   blake2s_init(&hash, sizeof(base_crng.key));
+   blake2s_update(&hash, base_crng.key, sizeof(base_crng.key));
+   blake2s_update(&hash, unique_vm_id, len);
+   blake2s_final(&hash, base_crng.key);
+
+   spin_unlock_irqrestore(&base_crng.lock, flags);
+}
+
 /*
  * This generates a ChaCha block using the provided key, and then
  * immediately overwites that key with half the block. It returns
@@ -935,6 +975,7 @@ static bool drain_entropy(void *buf, size_t nbytes)
  * void add_hwgenerator_randomness(const void *buffer, size_t count,
  * size_t entropy);
  * void add_bootloader_randomness(const void *buf, size_t size);
+ * void add_vmfork_randomness(const void *unique_vm_id, size_t size);
  * void add_interrupt_randomness(int irq);
  *
  * add_device_randomness() adds data to the input pool that
@@ -966,6 +1007,11 @@ static bool drain_entropy(void *buf, size_t nbytes)
  * add_device_randomness(), depending on whether or not the configuration
  * option CONFIG_RANDOM_TRUST_BOOTLOADER is set.
  *
+ * add_vmfork_randomness() adds a unique (but not neccessarily secret) ID
+ * representing the current instance of a VM to the pool, without crediting,
+ * and then immediately mixes that ID into the current base_crng key, so
+ * that it takes effect prior to a reseeding.
+ *
  * add_interrupt_randomness() uses the interrupt timing as random
  * inputs to the entropy pool. Using the cycle counters and the irq source
  * as inputs, it feeds the input pool roughly once a second or after 64
@@ -1195,6 +1241,18 @@ void add_bootloader_randomness(const void *buf, size_t 
size)
 }
 EXPORT_SYMBOL_GPL(add_bootloader_randomness);
 
+/*
+ * Handle a new unique VM ID, which is unique, not secret, so we
+ * don't credit it, but we do mix it into the entropy pool and
+ * inject it into the crng.
+ */
+void add_vmfork_randomness(const void *unique_vm_id, size_t size)
+{
+   add_device_randomness(unique_vm_id, size);
+   crng_vm_fork_inject(unique_vm_id, size);
+}
+EXPORT_SYMBOL_GPL(add_vmfork_randomness);
+
 struct fast_pool {
union {
u32 pool32[4];
diff --git a/include/linux/random.h b/include/linux/random.h
index 6148b8d1ccf3..51b8ed797732 100644
--- a/include/linux/random.h
+++ b/include/linux/random.h
@@ -34,6 +34,7 @@ extern void add_input_randomness(unsigned int type, unsigned 
int code,
 extern void add_interrupt_randomness(int irq) __latent_entropy;
 extern void add_hwgenerator_randomness(const void *buffer, size_t count,
   size_t entropy);
+extern void add_vmfork_randomness(const void *unique_vm_id, size_t size);
 
 extern void get_random_bytes(void *buf, size_t nbytes);
 extern int wait_for_random_bytes(void);
-- 
2.35.1

[PATCH RFC v1 2/2] drivers/virt: add vmgenid driver for reinitializing RNG

2022-02-23 Thread Jason A. Donenfeld

VM Generation ID is a feature from Microsoft, described at
<https://go.microsoft.com/fwlink/?LinkId=260709>, and supported by
Hyper-V and QEMU. Its usage is described in Microsoft's RNG whitepaper,
<https://aka.ms/win10rng>, as:

If the OS is running in a VM, there is a problem that most
hypervisors can snapshot the state of the machine and later rewind
the VM state to the saved state. This results in the machine running
a second time with the exact same RNG state, which leads to serious
security problems.  To reduce the window of vulnerability, Windows
10 on a Hyper-V VM will detect when the VM state is reset, retrieve
a unique (not random) value from the hypervisor, and reseed the root
RNG with that unique value.  This does not eliminate the
vulnerability, but it greatly reduces the time during which the RNG
system will produce the same outputs as it did during a previous
instantiation of the same VM state.

Linux has the same issue, and given that vmgenid is supported already by
multiple hypervisors, we can implement more or less the same solution.
So this commit wires up the vmgenid ACPI notification to the RNG's newly
added add_vmfork_randomness() function.

This driver builds on prior work from Adrian Catangiu at Amazon, and it
is my hope that that team can resume maintenance of this driver.

Cc: Adrian Catangiu 
Signed-off-by: Jason A. Donenfeld 
---
 drivers/virt/Kconfig   |   8 +++
 drivers/virt/Makefile  |   1 +
 drivers/virt/vmgenid.c | 133 +
 3 files changed, 142 insertions(+)
 create mode 100644 drivers/virt/vmgenid.c

diff --git a/drivers/virt/Kconfig b/drivers/virt/Kconfig
index 8061e8ef449f..e7f9c1bca02b 100644
--- a/drivers/virt/Kconfig
+++ b/drivers/virt/Kconfig
@@ -13,6 +13,14 @@ menuconfig VIRT_DRIVERS
 
 if VIRT_DRIVERS
 
+config VMGENID
+   bool "Virtual Machine Generation ID driver"
+   default y
+   depends on ACPI
+   help
+ Say Y here to use the hypervisor provided Virtual Machine Generation 
ID
+ to reseed the RNG when the VM is cloned.
+
 config FSL_HV_MANAGER
tristate "Freescale hypervisor management driver"
depends on FSL_SOC
diff --git a/drivers/virt/Makefile b/drivers/virt/Makefile
index 3e272ea60cd9..108d0ffcc9aa 100644
--- a/drivers/virt/Makefile
+++ b/drivers/virt/Makefile
@@ -4,6 +4,7 @@
 #
 
 obj-$(CONFIG_FSL_HV_MANAGER)   += fsl_hypervisor.o
+obj-$(CONFIG_VMGENID)  += vmgenid.o
 obj-y  += vboxguest/
 
 obj-$(CONFIG_NITRO_ENCLAVES)   += nitro_enclaves/
diff --git a/drivers/virt/vmgenid.c b/drivers/virt/vmgenid.c
new file mode 100644
index ..11ed7f668bb1
--- /dev/null
+++ b/drivers/virt/vmgenid.c
@@ -0,0 +1,133 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Virtual Machine Generation ID driver
+ *
+ * Copyright (C) 2022 Jason A. Donenfeld . All Rights 
Reserved.
+ * Copyright (C) 2020 Amazon. All rights reserved.
+ * Copyright (C) 2018 Red Hat Inc. All rights reserved.
+ */
+
+#define pr_fmt(fmt) "vmgenid: " fmt
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define DEV_NAME "vmgenid"
+ACPI_MODULE_NAME(DEV_NAME);
+
+static struct {
+   uuid_t uuid;
+   void *uuid_iomap;
+} vmgenid_data;
+
+static int vmgenid_acpi_map(acpi_handle handle)
+{
+   phys_addr_t phys_addr = 0;
+   struct acpi_buffer buffer = { ACPI_ALLOCATE_BUFFER, NULL };
+   acpi_status status;
+   union acpi_object *pss;
+   union acpi_object *element;
+   int i;
+
+   status = acpi_evaluate_object(handle, "ADDR", NULL, &buffer);
+   if (ACPI_FAILURE(status)) {
+   ACPI_EXCEPTION((AE_INFO, status, "Evaluating ADDR"));
+   return -ENODEV;
+   }
+   pss = buffer.pointer;
+   if (!pss || pss->type != ACPI_TYPE_PACKAGE || pss->package.count != 2)
+   return -EINVAL;
+
+   for (i = 0; i < pss->package.count; ++i) {
+   element = &pss->package.elements[i];
+   if (element->type != ACPI_TYPE_INTEGER)
+   return -EINVAL;
+   phys_addr |= element->integer.value << i * 32;
+   }
+
+   vmgenid_data.uuid_iomap = acpi_os_map_memory(phys_addr, 
sizeof(vmgenid_data.uuid));
+   if (!vmgenid_data.uuid_iomap) {
+   pr_err("failed to map memory at %pa, size %zu\n",
+   &phys_addr, sizeof(vmgenid_data.uuid));
+   return -ENOMEM;
+   }
+
+   memcpy_fromio(&vmgenid_data.uuid, vmgenid_data.uuid_iomap, 
sizeof(vmgenid_data.uuid));
+
+   return 0;
+}
+
+static int vmgenid_acpi_add(struct acpi_device *device)
+{
+   int ret;
+
+   if (!device)
+   return -EINVAL;
+   ret = vmgenid_acpi_map(device->handle);
+   if (ret < 0) {
+   pr_err("failed

Re: [PATCH v7 2/2] drivers/virt: vmgenid: add vm generation id driver

2022-02-23 Thread Jason A. Donenfeld

On Tue, Feb 22, 2022 at 11:17 PM Jason A. Donenfeld  wrote:
> Well I cleaned up this v7 and refactored it into something along the
> lines of what I'm thinking. I don't yet know enough about this general
> problem space to propose the patch and I haven't tested it either

A little further along, there's now this series:
https://lore.kernel.org/lkml/20220223131231.403386-1-ja...@zx2c4.com/T/
We can resume discussion there.

Jason

Re: [PATCH RFC v1 0/2] VM fork detection for RNG

2022-02-23 Thread Jason A. Donenfeld

On Wed, Feb 23, 2022 at 2:12 PM Jason A. Donenfeld  wrote:
> second patch is the reason this is just an RFC: it's a cleanup of the
> ACPI driver from last year, and I don't really have much experience
> writing, testing, debugging, or maintaining these types of drivers.
> Ideally this thread would yield somebody saying, "I see the intent of
> this; I'm happy to take over ownership of this part." That way, I can
> focus on the RNG part, and whoever steps up for the paravirt ACPI part
> can focus on that.

I actually managed to test this in QEMU, and it seems to work quite well. Steps:

$ qemu-system-x86_64 ... -device vmgenid,guid=auto -monitor stdio
(qemu) savevm blah
(qemu) quit
$ qemu-system-x86_64 ... -device vmgenid,guid=auto -monitor stdio
(qemu) loadvm blah

Doing this successfully triggers the function to reinitialize the RNG
with the new GUID. (It appears there's a bug in QEMU which prevents
the GUID from being reinitialized when running `loadvm` without
quitting first; I suppose this should be discussed with QEMU
upstream.)

So that's very positive. But I would appreciate hearing from some
ACPI/Virt/Amazon people about this.

Jason

Re: [PATCH RFC v1 0/2] VM fork detection for RNG

2022-02-23 Thread Jason A. Donenfeld

On Wed, Feb 23, 2022 at 5:08 PM Jason A. Donenfeld  wrote:
>
> On Wed, Feb 23, 2022 at 2:12 PM Jason A. Donenfeld  wrote:
> > second patch is the reason this is just an RFC: it's a cleanup of the
> > ACPI driver from last year, and I don't really have much experience
> > writing, testing, debugging, or maintaining these types of drivers.
> > Ideally this thread would yield somebody saying, "I see the intent of
> > this; I'm happy to take over ownership of this part." That way, I can
> > focus on the RNG part, and whoever steps up for the paravirt ACPI part
> > can focus on that.
>
> I actually managed to test this in QEMU, and it seems to work quite well. 
> Steps:
>
> $ qemu-system-x86_64 ... -device vmgenid,guid=auto -monitor stdio
> (qemu) savevm blah
> (qemu) quit
> $ qemu-system-x86_64 ... -device vmgenid,guid=auto -monitor stdio
> (qemu) loadvm blah
>
> Doing this successfully triggers the function to reinitialize the RNG
> with the new GUID. (It appears there's a bug in QEMU which prevents
> the GUID from being reinitialized when running `loadvm` without
> quitting first; I suppose this should be discussed with QEMU
> upstream.)
>
> So that's very positive. But I would appreciate hearing from some
> ACPI/Virt/Amazon people about this.

Because something something picture thousand words something, here's a
gif to see this working as expected:
https://data.zx2c4.com/vmgenid-appears-to-work.gif

Jason

Re: [PATCH RFC v1 2/2] drivers/virt: add vmgenid driver for reinitializing RNG

2022-02-23 Thread Jason A. Donenfeld

Adding the Hyper-V people to this:

On Wed, Feb 23, 2022 at 2:13 PM Jason A. Donenfeld  wrote:
>
> VM Generation ID is a feature from Microsoft, described at
> <https://go.microsoft.com/fwlink/?LinkId=260709>, and supported by
> Hyper-V and QEMU. Its usage is described in Microsoft's RNG whitepaper,
> <https://aka.ms/win10rng>, as:
>
> If the OS is running in a VM, there is a problem that most
> hypervisors can snapshot the state of the machine and later rewind
> the VM state to the saved state. This results in the machine running
> a second time with the exact same RNG state, which leads to serious
> security problems.  To reduce the window of vulnerability, Windows
> 10 on a Hyper-V VM will detect when the VM state is reset, retrieve
> a unique (not random) value from the hypervisor, and reseed the root
> RNG with that unique value.  This does not eliminate the
> vulnerability, but it greatly reduces the time during which the RNG
> system will produce the same outputs as it did during a previous
> instantiation of the same VM state.
>
> Linux has the same issue, and given that vmgenid is supported already by
> multiple hypervisors, we can implement more or less the same solution.
> So this commit wires up the vmgenid ACPI notification to the RNG's newly
> added add_vmfork_randomness() function.
>
> This driver builds on prior work from Adrian Catangiu at Amazon, and it
> is my hope that that team can resume maintenance of this driver.

If any of you have some experience with the Hyper-V side of this
protocol, could you take a look at this and see if it matches the way
this is supposed to work? It appears to work fine with QEMU's
behavior, at least, but I know Hyper-V has a lot of additional
complexities.

Thanks,
Jason

[PATCH v2 0/2] VM fork detection for RNG

2022-02-23 Thread Jason A. Donenfeld

This small series picks up work from Amazon that seems to have stalled
out last year around this time: listening for the vmgenid ACPI
notification, and using it to "do something." Last year, folks proposed
a complicated userspace mmap chardev, which was frought with
difficulty and evidently abandoned. This year, instead, I have something
much simpler in mind: simply using those ACPI notifications to tell the
RNG to reinitialize safely, so we don't repeat random numbers in cloned,
forked, or rolled-back VM instances.

This series consists of two patches. The first one adds the right hooks
into the actual RNG, and the second is a driver for the ACPI
notification.

I had posted an RFC v1 earlier today, thinking I really needed to
request comments, lacking much experience with ACPI drivers. But having
spent all day reworking this driver, and then testing and debugging it
in a variety of circumstances, I feel fairly confident that it works
well, so this is now the real thing. Please review! Here's a little
screencast showing it in action: 
https://data.zx2c4.com/vmgenid-appears-to-work.gif

As a side note, this series intentionally does _not_ focus on
notification of these events to userspace or to other kernel consumers.
Since these VM fork detection events first need to hit the RNG, we can
later talk about what sorts of notifications or mmap'd counters the RNG
should be making accessible to elsewhere. But that's a different sort of
project and ties into a lot of more complicated concerns beyond this
more basic patchset. So hopefully we can keep the discussion rather
focused here to this ACPI business.

Changes v1->v2:
- [Ard] Correct value of MODULE_LICENSE().
- [Ard] Use ordinary memory accesses instead of memcpy_fromio.
- [Ard] Make module a tristate and set MODULE_DEVICE_TABLE().
- [Ard] Free buffer after using.
- Use { } instead of { "", 0 }.
- Clean up interface into RNG.
- Minimize ACPI driver a bit.

In addition to the usual suspects, I'm CCing the original team from
Amazon who proposed this last year and the QEMU developers who added it
there, as well as the kernel Hyper-V maintainers, since this is
technically a Microsoft-proposed thing, though QEMU now implements it.

Cc: adr...@parity.io
Cc: d...@amazon.co.uk
Cc: g...@amazon.com
Cc: colmm...@amazon.com
Cc: raduw...@amazon.com
Cc: imamm...@redhat.com
Cc: ehabk...@redhat.com
Cc: b...@skyportsystems.com
Cc: m...@redhat.com
Cc: k...@microsoft.com
Cc: haiya...@microsoft.com
Cc: sthem...@microsoft.com
Cc: wei@kernel.org
Cc: de...@microsoft.com
Cc: li...@dominikbrodowski.net
Cc: a...@kernel.org
Cc: ja...@google.com
Cc: gre...@linuxfoundation.org
Cc: ty...@mit.edu

Jason A. Donenfeld (2):
  random: add mechanism for VM forks to reinitialize crng
  virt: vmgenid: introduce driver for reinitializing RNG on VM fork

 drivers/char/random.c  |  53 ++
 drivers/virt/Kconfig   |   9 
 drivers/virt/Makefile  |   1 +
 drivers/virt/vmgenid.c | 120 +
 include/linux/random.h |   1 +
 5 files changed, 184 insertions(+)
 create mode 100644 drivers/virt/vmgenid.c

-- 
2.35.1

[PATCH v2 1/2] random: add mechanism for VM forks to reinitialize crng

2022-02-23 Thread Jason A. Donenfeld

When a VM forks, we must immediately mix in additional information to
the stream of random output so that two forks or a rollback don't
produce the same stream of random numbers, which could have catastrophic
cryptographic consequences. This commit adds a simple API, add_vmfork_
randomness(), for that.

Cc: Dominik Brodowski 
Cc: Theodore Ts'o 
Cc: Jann Horn 
Signed-off-by: Jason A. Donenfeld 
---
 drivers/char/random.c  | 53 ++
 include/linux/random.h |  1 +
 2 files changed, 54 insertions(+)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 536237a0f073..95584f6646f9 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -508,6 +508,40 @@ static size_t crng_pre_init_inject(const void *input, 
size_t len,
return len;
 }
 
+/*
+ * This mixes unique_vm_id directly into the base_crng key as soon as
+ * possible, similarly to crng_pre_init_inject(), even if the crng is
+ * already running, in order to immediately branch streams from prior
+ * VM instances.
+ */
+static void crng_vmfork_inject(const void *unique_vm_id, size_t len)
+{
+   unsigned long flags, next_gen;
+   struct blake2s_state hash;
+
+   spin_lock_irqsave(&base_crng.lock, flags);
+
+   /*
+* Update the generation, while locked, as early as possible
+* This will mean unlocked reads of the generation will
+* cause a reseeding of per-cpu crngs, and those will spin
+* on the base_crng lock waiting for the rest of this function
+* to complete, which achieves the goal of blocking the
+* production of new output until this is done.
+*/
+   next_gen = base_crng.generation + 1;
+   if (next_gen == ULONG_MAX)
+   ++next_gen;
+   WRITE_ONCE(base_crng.generation, next_gen);
+
+   blake2s_init(&hash, sizeof(base_crng.key));
+   blake2s_update(&hash, base_crng.key, sizeof(base_crng.key));
+   blake2s_update(&hash, unique_vm_id, len);
+   blake2s_final(&hash, base_crng.key);
+
+   spin_unlock_irqrestore(&base_crng.lock, flags);
+}
+
 static void _get_random_bytes(void *buf, size_t nbytes)
 {
u32 chacha_state[CHACHA_STATE_WORDS];
@@ -935,6 +969,7 @@ static bool drain_entropy(void *buf, size_t nbytes)
  * void add_hwgenerator_randomness(const void *buffer, size_t count,
  * size_t entropy);
  * void add_bootloader_randomness(const void *buf, size_t size);
+ * void add_vmfork_randomness(const void *unique_vm_id, size_t size);
  * void add_interrupt_randomness(int irq);
  *
  * add_device_randomness() adds data to the input pool that
@@ -966,6 +1001,11 @@ static bool drain_entropy(void *buf, size_t nbytes)
  * add_device_randomness(), depending on whether or not the configuration
  * option CONFIG_RANDOM_TRUST_BOOTLOADER is set.
  *
+ * add_vmfork_randomness() adds a unique (but not neccessarily secret) ID
+ * representing the current instance of a VM to the pool, without crediting,
+ * and then immediately mixes that ID into the current base_crng key, so
+ * that it takes effect prior to a reseeding.
+ *
  * add_interrupt_randomness() uses the interrupt timing as random
  * inputs to the entropy pool. Using the cycle counters and the irq source
  * as inputs, it feeds the input pool roughly once a second or after 64
@@ -1195,6 +1235,19 @@ void add_bootloader_randomness(const void *buf, size_t 
size)
 }
 EXPORT_SYMBOL_GPL(add_bootloader_randomness);
 
+/*
+ * Handle a new unique VM ID, which is unique, not secret, so we
+ * don't credit it, but we do mix it into the entropy pool and
+ * inject it into the crng.
+ */
+void add_vmfork_randomness(const void *unique_vm_id, size_t size)
+{
+   add_device_randomness(unique_vm_id, size);
+   if (crng_ready())
+   crng_vmfork_inject(unique_vm_id, size);
+}
+EXPORT_SYMBOL_GPL(add_vmfork_randomness);
+
 struct fast_pool {
union {
u32 pool32[4];
diff --git a/include/linux/random.h b/include/linux/random.h
index 6148b8d1ccf3..51b8ed797732 100644
--- a/include/linux/random.h
+++ b/include/linux/random.h
@@ -34,6 +34,7 @@ extern void add_input_randomness(unsigned int type, unsigned 
int code,
 extern void add_interrupt_randomness(int irq) __latent_entropy;
 extern void add_hwgenerator_randomness(const void *buffer, size_t count,
   size_t entropy);
+extern void add_vmfork_randomness(const void *unique_vm_id, size_t size);
 
 extern void get_random_bytes(void *buf, size_t nbytes);
 extern int wait_for_random_bytes(void);
-- 
2.35.1

[PATCH v2 2/2] virt: vmgenid: introduce driver for reinitializing RNG on VM fork

2022-02-23 Thread Jason A. Donenfeld

VM Generation ID is a feature from Microsoft, described at
<https://go.microsoft.com/fwlink/?LinkId=260709>, and supported by
Hyper-V and QEMU. Its usage is described in Microsoft's RNG whitepaper,
<https://aka.ms/win10rng>, as:

If the OS is running in a VM, there is a problem that most
hypervisors can snapshot the state of the machine and later rewind
the VM state to the saved state. This results in the machine running
a second time with the exact same RNG state, which leads to serious
security problems.  To reduce the window of vulnerability, Windows
10 on a Hyper-V VM will detect when the VM state is reset, retrieve
a unique (not random) value from the hypervisor, and reseed the root
RNG with that unique value.  This does not eliminate the
vulnerability, but it greatly reduces the time during which the RNG
system will produce the same outputs as it did during a previous
instantiation of the same VM state.

Linux has the same issue, and given that vmgenid is supported already by
multiple hypervisors, we can implement more or less the same solution.
So this commit wires up the vmgenid ACPI notification to the RNG's newly
added add_vmfork_randomness() function.

It can be used from qemu via the `-device vmgenid,guid=auto` parameter.
After setting that, use `savevm` in the monitor to save the VM state,
then quit QEMU, start it again, and use `loadvm`. That will trigger this
driver's notify function, which hands the new UUID to the RNG.

This driver builds on prior work from Adrian Catangiu at Amazon, and it
is my hope that that team can resume maintenance of this driver.

Cc: Adrian Catangiu 
Cc: Dominik Brodowski 
Cc: Ard Biesheuvel 
Signed-off-by: Jason A. Donenfeld 
---
 drivers/virt/Kconfig   |   9 
 drivers/virt/Makefile  |   1 +
 drivers/virt/vmgenid.c | 120 +
 3 files changed, 130 insertions(+)
 create mode 100644 drivers/virt/vmgenid.c

diff --git a/drivers/virt/Kconfig b/drivers/virt/Kconfig
index 8061e8ef449f..d3276dc2095c 100644
--- a/drivers/virt/Kconfig
+++ b/drivers/virt/Kconfig
@@ -13,6 +13,15 @@ menuconfig VIRT_DRIVERS
 
 if VIRT_DRIVERS
 
+config VMGENID
+   tristate "Virtual Machine Generation ID driver"
+   default y
+   depends on ACPI
+   help
+ Say Y here to use the hypervisor-provided Virtual Machine Generation 
ID
+ to reseed the RNG when the VM is cloned. This is highly recommended if
+ you intend to do any rollback / cloning / snapshotting of VMs.
+
 config FSL_HV_MANAGER
tristate "Freescale hypervisor management driver"
depends on FSL_SOC
diff --git a/drivers/virt/Makefile b/drivers/virt/Makefile
index 3e272ea60cd9..108d0ffcc9aa 100644
--- a/drivers/virt/Makefile
+++ b/drivers/virt/Makefile
@@ -4,6 +4,7 @@
 #
 
 obj-$(CONFIG_FSL_HV_MANAGER)   += fsl_hypervisor.o
+obj-$(CONFIG_VMGENID)  += vmgenid.o
 obj-y  += vboxguest/
 
 obj-$(CONFIG_NITRO_ENCLAVES)   += nitro_enclaves/
diff --git a/drivers/virt/vmgenid.c b/drivers/virt/vmgenid.c
new file mode 100644
index ..c2255ea6be59
--- /dev/null
+++ b/drivers/virt/vmgenid.c
@@ -0,0 +1,120 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Virtual Machine Generation ID driver
+ *
+ * Copyright (C) 2022 Jason A. Donenfeld . All Rights 
Reserved.
+ * Copyright (C) 2020 Amazon. All rights reserved.
+ * Copyright (C) 2018 Red Hat Inc. All rights reserved.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+ACPI_MODULE_NAME("vmgenid");
+
+static struct {
+   uuid_t this_uuid;
+   uuid_t *next_uuid;
+} state;
+
+static int vmgenid_acpi_add(struct acpi_device *device)
+{
+   struct acpi_buffer buffer = { ACPI_ALLOCATE_BUFFER };
+   union acpi_object *pss;
+   phys_addr_t phys_addr;
+   acpi_status status;
+   int ret = 0;
+
+   if (!device)
+   return -EINVAL;
+
+   status = acpi_evaluate_object(device->handle, "ADDR", NULL, &buffer);
+   if (ACPI_FAILURE(status)) {
+   ACPI_EXCEPTION((AE_INFO, status, "Evaluating ADDR"));
+   return -ENODEV;
+   }
+   pss = buffer.pointer;
+   if (!pss || pss->type != ACPI_TYPE_PACKAGE || pss->package.count != 2 ||
+   pss->package.elements[0].type != ACPI_TYPE_INTEGER ||
+   pss->package.elements[1].type != ACPI_TYPE_INTEGER) {
+   ret = -EINVAL;
+   goto out;
+   }
+
+   phys_addr = (pss->package.elements[0].integer.value << 0) |
+   (pss->package.elements[1].integer.value << 32);
+   state.next_uuid = acpi_os_map_memory(phys_addr, 
sizeof(*state.next_uuid));
+   if (!state.next_uuid) {
+   ret = -ENOMEM;
+   goto out;
+   }
+
+   state.this_uuid = *state.next_uuid;
+   device->driver_data = &

Re: [PATCH RFC v1 1/2] random: add mechanism for VM forks to reinitialize crng

2022-02-23 Thread Jason A. Donenfeld

On 2/24/22, Eric Biggers  wrote:
> I think we should be removing cases where the base_crng key is changed
> directly
> besides extraction from the input_pool, not adding new ones.  Why not
> implement
> this as add_device_randomness() followed by crng_reseed(force=true), where
> the
> 'force' argument forces a reseed to occur even if the entropy_count is too
> low?

Because that induces a "premature next" condition which can let that
entropy, potentially newly acquired by a storm of IRQs at power-on, be
bruteforced by unprivileged userspace. I actually had it exactly the
way you describe at first, but decided that this here is the lesser of
evils and doesn't really complicate things the way an intentional
premature next would. The only thing we care about here is branching
the crng stream, and so this does explicitly that, without having to
interfere with how we collect entropy. Of course we *also* add it as
non-credited "device randomness" so that it's part of the next
reseeding, whenever that might occur.

Re: [PATCH RFC v1 0/2] VM fork detection for RNG

2022-02-24 Thread Jason A. Donenfeld

Hi Lazlo,

Thanks for your reply.

On Thu, Feb 24, 2022 at 9:23 AM Laszlo Ersek  wrote:
> QEMU's related design is documented in
> .

I'll link to this document on the 2/2 patch next to the other ones.

> "they can also use the data provided in the 128-bit identifier as a high
> entropy random data source"
>
> So reinitializing an RNG from it is an express purpose.

It seems like this is indeed meant to be used for RNG purposes, but
the Windows 10 RNG document says: "Windows 10 on a Hyper-V VM will
detect when the VM state is reset, retrieve a unique (not random)
value from the hypervisor." I gather from that that it's not totally
clear what the "quality" of those 128 bits are. So this patchset mixes
them into the entropy pool, but does not credit it, which is
consistent with how the RNG deals with other data where the conclusion
is, "probably pretty good but maybe not," erring on the side of
caution. Either way, it's certainly being used -- and combined with
what was there before -- to reinitialize the RNG following a VM fork.

>
> More info in the libvirt docs (see "genid"):
>
> https://libvirt.org/formatdomain.html#general-metadata

Thanks, noted in the 2/2 patch too.

> QEMU's interpretation of the VMGENID specifically as a UUID (which I
> believe comes from me) has received (valid) criticism since:
>
> https://github.com/libguestfs/virt-v2v/blob/master/docs/vm-generation-id-across-hypervisors.txt
>
> (This document also investigates VMGENID on other hypervisors, which I
> think pertains to your other message.)

Thank you very much for this reference! You're absolutely right here.
v3 will treat this as just an opaque 128-bit binary blob. There's no
point, anyway, in treating it as a UUID in the kernel, since it never
should be printed or exposed to anywhere except random.c (and my gifs,
of course :-P).

>
> > (It appears there's a bug in QEMU which prevents
> > the GUID from being reinitialized when running `loadvm` without
> > quitting first; I suppose this should be discussed with QEMU
> > upstream.)
>
> That's not (necessarily) a bug; see the end of the above-linked QEMU
> document:
>
> "There are no known use cases for changing the GUID once QEMU is
> running, and adding this capability would greatly increase the complexity."

I read that, and I think I might disagree? If you're QEMUing with the
monitor and are jumping back and forth and all around between saved
snapshots, probably those snapshots should have their RNG
reinitialized through this mechanism, right? It seems like doing that
would be the proper behavior for `guid=auto`, but not for
`guid={some-fixed-thing}`.

> > So that's very positive. But I would appreciate hearing from some
> > ACPI/Virt/Amazon people about this.
>
> I've only made some random comments; I didn't see a question so I
> couldn't attempt to answer :)

"Am I on the right track," I guess, and your reply has been very
informative. Thanks for your feedback. I'll have a v3 sent out not
before long.

Jason

Re: [PATCH RFC v1 0/2] VM fork detection for RNG

2022-02-24 Thread Jason A. Donenfeld

Hi Alex,

Strangely your message never made it to me, and I had to pull this out
of Lore after seeing Daniel's reply to it. I wonder what's up.

On Thu, Feb 24, 2022 at 09:53:59AM +0100, Alexander Graf wrote:
> The main problem with VMGenID is that it is inherently racy. There will 
> always be a (short) amount of time where the ACPI notification is not 
> processed, but the VM could use its RNG to for example establish TLS 
> connections.
> 
> Hence we as the next step proposed a multi-stage quiesce/resume 
> mechanism where the system is aware that it is going into suspend - can 
> block network connections for example - and only returns to a fully 
> functional state after an unquiesce phase:
> 
>    https://github.com/systemd/systemd/issues/20222
> 
> Looking at the issue again, it seems like we completely missed to follow 
> up with a PR to implement that functionality :(.
> 
> What exact use case do you have in mind for the RNG/VMGenID update? Can 
> you think of situations where the race is not an actual concern?

No, I think the race is something that remains a problem for the
situations I care about. There are simpler ways of fixing that -- just
expose a single incrementing integer so that it can be checked every
time the RNG does something, without being expensive, via the same
mechanism -- and then you don't need any complexity. But anyway, that
doesn't exist right now, so this series tries to implement something for
what does exist and is already supported by multiple hypervisors. I'd
suggest sending a proposal for an improved mechanism as part of a
different thread, and pull the various parties into that, and we can
make something good for the future. I'm happy to implement whatever the
virtual hardware exposes.

Jason

Re: [PATCH RFC v1 0/2] VM fork detection for RNG

2022-02-24 Thread Jason A. Donenfeld

On Thu, Feb 24, 2022 at 11:56 AM Daniel P. Berrangé  wrote:
> IIRC this part of the QEMU doc was making an implicit assumption
> about the way QEMU is to be used by mgmt apps doing snapshots.
>
> Instead of using the 'loadvm' command on the existing running QEMU
> process, the doc seems to tacitly expect the management app will
> throwaway the existing QEMU process and spawn a brand new QEMU
> process to load the snapshot into, thus getting the new GUID on
> the QEMU command line.

Right, exactly. The "there are no known use cases" bit I think just
forgot about one very common use case that perhaps just wasn't in use
by the original author. So I'm pretty sure this remains a QEMU bug.

Jason

Re: [PATCH RFC v1 1/2] random: add mechanism for VM forks to reinitialize crng

2022-02-24 Thread Jason A. Donenfeld

Hey Eric,

On Thu, Feb 24, 2022 at 2:27 AM Eric Biggers  wrote:
>
> On Thu, Feb 24, 2022 at 01:54:54AM +0100, Jason A. Donenfeld wrote:
> > On 2/24/22, Eric Biggers  wrote:
> > > I think we should be removing cases where the base_crng key is changed
> > > directly
> > > besides extraction from the input_pool, not adding new ones.  Why not
> > > implement
> > > this as add_device_randomness() followed by crng_reseed(force=true), where
> > > the
> > > 'force' argument forces a reseed to occur even if the entropy_count is too
> > > low?
> >
> > Because that induces a "premature next" condition which can let that
> > entropy, potentially newly acquired by a storm of IRQs at power-on, be
> > bruteforced by unprivileged userspace. I actually had it exactly the
> > way you describe at first, but decided that this here is the lesser of
> > evils and doesn't really complicate things the way an intentional
> > premature next would. The only thing we care about here is branching
> > the crng stream, and so this does explicitly that, without having to
> > interfere with how we collect entropy. Of course we *also* add it as
> > non-credited "device randomness" so that it's part of the next
> > reseeding, whenever that might occur.
>
> Can you make sure to properly explain this in the code?

The carousel keeps turning, and after I wrote to you last night I kept
thinking about the matter. Here's how it breaks down:

Injection method:
- Assumes existing pool of entropy is still sacred.
- Assumes base_crng timestamp is representative of pool age.
- Result: Mixes directly into base_crng to avoid premature-next of pool.

Input pool method:
- Assumes existing pool of entropy is old / out of date / used by a
different fork, so not sacred.
- Assumes base_crng timestamp is tied to irrelevant entropy pool.
- Result: Force-drains input pool, causing intentional premature-next.

Which of these assumptions better models the situation? I started in
the input pool method camp, then by the time I posted v1, was
concerned about power-on IRQs, but now I think relying at all on
snapshotted entropy represents the biggest issue. And judging from
your email, it appears that you do too. So v3 of this patchset will
switch back to the input pool method, per your suggestion.

As a plus, it means we go through the RDSEED'd extraction algorithm
too, which always helps.

Jason

[PATCH v3 0/2] VM fork detection for RNG

2022-02-24 Thread Jason A. Donenfeld

This small series picks up work from Amazon that seems to have stalled
out last year around this time: listening for the vmgenid ACPI
notification, and using it to "do something." Last year, folks proposed
a complicated userspace mmap chardev, which was frought with difficulty
and evidently abandoned. This year, instead, I have something much
simpler in mind: simply using those ACPI notifications to tell the RNG
to reinitialize safely, so we don't repeat random numbers in cloned,
forked, or rolled-back VM instances.

This series consists of two patches. The first one adds the right hooks
into the actual RNG, and the second is a driver for the ACPI notification.

Here's a little screencast showing it in action: 
https://data.zx2c4.com/vmgenid-appears-to-work.gif

As a side note, this series intentionally does _not_ focus on
notification of these events to userspace or to other kernel consumers.
Since these VM fork detection events first need to hit the RNG, we can
later talk about what sorts of notifications or mmap'd counters the RNG
should be making accessible to elsewhere. But that's a different sort of
project and ties into a lot of more complicated concerns beyond this
more basic patchset. So hopefully we can keep the discussion rather
focused here to this ACPI business.

Changes v2->v3:
- [Eric] Always put generation ID through the input pool, and then re-extract.
- [Lazlo] The ACPI bytes are just an opaque binary blob, rather than a real 
UUID.
Changes v1->v2:
- [Ard] Correct value of MODULE_LICENSE().
- [Ard] Use ordinary memory accesses instead of memcpy_fromio.
- [Ard] Make module a tristate and set MODULE_DEVICE_TABLE().
- [Ard] Free buffer after using.
- Use { } instead of { "", 0 }.
- Clean up interface into RNG.
- Minimize ACPI driver a bit.

In addition to the usual suspects, I'm CCing the original team from
Amazon who proposed this last year and the QEMU developers who added it
there, as well as the kernel Hyper-V maintainers, since this is
technically a Microsoft-proposed thing, though QEMU now implements it.

Cc: adr...@parity.io
Cc: d...@amazon.co.uk
Cc: g...@amazon.com
Cc: colmm...@amazon.com
Cc: raduw...@amazon.com
Cc: berra...@redhat.com
Cc: ler...@redhat.com
Cc: imamm...@redhat.com
Cc: ehabk...@redhat.com
Cc: b...@skyportsystems.com
Cc: m...@redhat.com
Cc: k...@microsoft.com
Cc: haiya...@microsoft.com
Cc: sthem...@microsoft.com
Cc: wei@kernel.org
Cc: de...@microsoft.com
Cc: li...@dominikbrodowski.net
Cc: ebigg...@kernel.org
Cc: a...@kernel.org
Cc: ja...@google.com
Cc: gre...@linuxfoundation.org
Cc: ty...@mit.edu

Jason A. Donenfeld (2):
  random: add mechanism for VM forks to reinitialize crng
  virt: vmgenid: introduce driver for reinitializing RNG on VM fork

 drivers/char/random.c  |  50 -
 drivers/virt/Kconfig   |   9 +++
 drivers/virt/Makefile  |   1 +
 drivers/virt/vmgenid.c | 121 +
 include/linux/random.h |   1 +
 5 files changed, 167 insertions(+), 15 deletions(-)
 create mode 100644 drivers/virt/vmgenid.c

-- 
2.35.1

1 2 3 4 5 >

1 - 100 of 418 matches

Mail list logo