Re: [PATCH v1 6/6] x86: remove memory hotplug support on X86_32
On 07.10.21 11:15, Oscar Salvador wrote: On Wed, Sep 29, 2021 at 04:36:00PM +0200, David Hildenbrand wrote: CONFIG_MEMORY_HOTPLUG was marked BROKEN over one year and we just restricted it to 64 bit. Let's remove the unused x86 32bit implementation and simplify the Kconfig. Signed-off-by: David Hildenbrand Reviewed-by: Oscar Salvador Thanks for the review Oscar! -- Thanks, David / dhildenb ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 1/1] virtio: write back F_VERSION_1 before validate
On Wed, Oct 06 2021, Halil Pasic wrote: > The virtio specification virtio-v1.1-cs01 states: Transitional devices > MUST detect Legacy drivers by detecting that VIRTIO_F_VERSION_1 has not > been acknowledged by the driver. This is exactly what QEMU as of 6.1 > has done relying solely on VIRTIO_F_VERSION_1 for detecting that. > > However, the specification also says: driver MAY read (but MUST NOT > write) the device-specific configuration fields to check that it can > support the device before setting FEATURES_OK. Suggest to put the citations from the spec into quotes, so that they are distinguishable from the rest of the text. > > In that case, any transitional device relying solely on > VIRTIO_F_VERSION_1 for detecting legacy drivers will return data in > legacy format. In particular, this implies that it is in big endian > format for big endian guests. This naturally confuses the driver which > expects little endian in the modern mode. > > It is probably a good idea to amend the spec to clarify that > VIRTIO_F_VERSION_1 can only be relied on after the feature negotiation > is complete. However, we already have regression so let's try to address s/regression/a regression/ > it. Maybe mention what the regression is? Also mention that we use this workaround for modern on BE only? > > Signed-off-by: Halil Pasic > Fixes: 82e89ea077b9 ("virtio-blk: Add validation for block size in config > space") > Fixes: fe36cbe0671e ("virtio_net: clear MTU when out of range") > Reported-by: mark...@us.ibm.com > --- > drivers/virtio/virtio.c | 10 ++ > 1 file changed, 10 insertions(+) > > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c > index 0a5b54034d4b..494cfecd3376 100644 > --- a/drivers/virtio/virtio.c > +++ b/drivers/virtio/virtio.c > @@ -239,6 +239,16 @@ static int virtio_dev_probe(struct device *_d) > driver_features_legacy = driver_features; > } > > + /* > + * Some devices detect legacy solely via F_VERSION_1. Write > + * F_VERSION_1 to force LE for these when needed. "...to force LE config space accesses before FEATURES_OK for these when needed (BE)." ? > + */ > + if (drv->validate && !virtio_legacy_is_little_endian() > + && BIT_ULL(VIRTIO_F_VERSION_1) & device_features) { Nit: putting device_features first would read more naturally to me. > + dev->features = BIT_ULL(VIRTIO_F_VERSION_1); > + dev->config->finalize_features(dev); > + } > + > if (device_features & (1ULL << VIRTIO_F_VERSION_1)) > dev->features = driver_features & device_features; > else Patch LGTM. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: virtio-net: kernel panic in virtio_net.c
On Thu, Oct 07, 2021 at 02:04:22PM +0200, Corentin Noël wrote: > I've been experiencing crashes with 5.14-rc1 and above that do not > occur with 5.13, > > here is the crash trace: > > [ 61.346677] skbuff: skb_over_panic: text:881ae2c7 len:3762 > put:3762 head:8a5ec8c22000 data:8a5ec8c22010 tail:0xec2 > end:0xec0 dev: > [ 61.369192] kernel BUG at net/core/skbuff.c:111! > [ 61.372840] invalid opcode: [#1] SMP PTI > [ 61.374892] CPU: 5 PID: 0 Comm: swapper/5 Not tainted 5.14.0- > rc1linux-v5.14-rc1-for-mesa-ci.tar.bz2 #1 > [ 61.376450] Hardware name: ChromiumOS crosvm, BIOS 0 > [ 61.377222] RIP: 0010:skb_panic+0x43/0x45 > [ 61.377833] Code: 4f 70 50 8b 87 bc 00 00 00 50 8b 87 b8 00 00 00 50 > ff b7 c8 00 00 00 4c 8b 8f c0 00 00 00 48 c7 c7 18 f1 cf 88 e8 6a 43 fb > ff <0f> 0b 48 8b 14 24 48 c7 c1 20 35 b1 88 e8 ab ff ff ff 48 c7 c6 60 > [ 61.380566] RSP: 0018:ae258017cce0 EFLAGS: 00010246 > [ 61.381267] RAX: 008b RBX: 0010 RCX: > dfff > [ 61.382246] RDX: RSI: ffea RDI: > > [ 61.383376] RBP: de6a80230880 R08: 88f45568 R09: > 9ffb > [ 61.384494] R10: e000 R11: 3fff R12: > 8a5ec7461200 > [ 61.385696] R13: 8a5ec8c22000 R14: R15: > 0eb2 > [ 61.386825] FS: () GS:8a5febd4() > knlGS: > [ 61.388055] CS: 0010 DS: ES: CR0: 80050033 > [ 61.389221] CR2: 0148a060 CR3: 00011ae0e005 CR4: > 00370ee0 > [ 61.390871] DR0: DR1: DR2: > > [ 61.392335] DR3: DR6: fffe0ff0 DR7: > 0400 > [ 61.393635] Call Trace: > [ 61.394127] > [ 61.394488] skb_put.cold+0x10/0x10 > [ 61.395095] page_to_skb+0xf7/0x410 > [ 61.395689] receive_buf+0x81/0x1660 > [ 61.396228] ? netif_receive_skb_list_internal+0x1ad/0x2b0 > [ 61.397180] ? napi_gro_flush+0x97/0xe0 > [ 61.397896] ? detach_buf_split+0x67/0x120 > [ 61.398573] virtnet_poll+0x2cf/0x420 > [ 61.399197] __napi_poll+0x25/0x150 > [ 61.399764] net_rx_action+0x22f/0x280 > [ 61.400394] __do_softirq+0xba/0x257 > [ 61.401012] irq_exit_rcu+0x8e/0xb0 > [ 61.401618] common_interrupt+0x7b/0xa0 > [ 61.402270] > [ 61.402620] asm_common_interrupt+0x1e/0x40 > [ 61.403302] RIP: 0010:default_idle+0xb/0x10 > [ 61.404018] Code: 8b 04 25 00 6d 01 00 f0 80 60 02 df c3 0f ae f0 0f > ae 38 0f ae f0 eb b9 0f 1f 80 00 00 00 00 eb 07 0f 00 2d df 3e 44 00 fb > f4 cc cc cc cc 65 8b 15 31 2f a4 77 89 d2 48 8b 05 d0 a1 0c 01 48 > [ 61.407636] RSP: 0018:ae258008fef8 EFLAGS: 0202 > [ 61.408394] RAX: 885ce620 RBX: 0005 RCX: > 8a5febd56f80 > [ 61.409451] RDX: 00c1ec32 RSI: 7ff1b7a1e726 RDI: > 8a5febd5dd00 > [ 61.410530] RBP: 8a5fc01f8000 R08: 00c1ec32 R09: > > [ 61.411715] R10: 0006 R11: 0002 R12: > > [ 61.412984] R13: R14: R15: > > [ 61.414183] ? mwait_idle+0x70/0x70 > [ 61.414805] ? mwait_idle+0x70/0x70 > [ 61.415592] default_idle_call+0x2a/0xa0 > [ 61.416216] do_idle+0x1e8/0x250 > [ 61.416722] cpu_startup_entry+0x14/0x20 > [ 61.417347] secondary_startup_64_no_verify+0xc2/0xcb > [ 61.418144] Modules linked in: > [ 61.418622] ---[ end trace 3741c3e580a52bbd ]--- > [ 61.419399] RIP: 0010:skb_panic+0x43/0x45 > [ 61.420054] Code: 4f 70 50 8b 87 bc 00 00 00 50 8b 87 b8 00 00 00 50 > ff b7 c8 00 00 00 4c 8b 8f c0 00 00 00 48 c7 c7 18 f1 cf 88 e8 6a 43 fb > ff <0f> 0b 48 8b 14 24 48 c7 c1 20 35 b1 88 e8 ab ff ff ff 48 c7 c6 60 > [ 61.422606] RSP: 0018:ae258017cce0 EFLAGS: 00010246 > [ 61.423865] RAX: 008b RBX: 0010 RCX: > dfff > [ 61.425031] RDX: RSI: ffea RDI: > > [ 61.426229] RBP: de6a80230880 R08: 88f45568 R09: > 9ffb > [ 61.427439] R10: e000 R11: 3fff R12: > 8a5ec7461200 > [ 61.428615] R13: 8a5ec8c22000 R14: R15: > 0eb2 > [ 61.429799] FS: () GS:8a5febd4() > knlGS: > [ 61.431048] CS: 0010 DS: ES: CR0: 80050033 > [ 61.431997] CR2: 0148a060 CR3: 00011ae0e005 CR4: > 00370ee0 > [ 61.433206] DR0: DR1: DR2: > > [ 61.434502] DR3: DR6: fffe0ff0 DR7: > 0400 > [ 61.435799] Kernel panic - not syncing: Fatal exception in interrupt > [ 61.439250] Kernel Offset: 0x6a0 from 0x8100 > (relocation range: 0x8000-0xbfff) > > Here is my kernel config: > https://gitlab.freedesktop.org/tintou/mes
Re: virtio-net: kernel panic in virtio_net.c
On Thu, Oct 07, 2021 at 04:02:10PM +0200, Corentin Noël wrote: > Le jeudi 07 octobre 2021 à 06:51 -0700, Eric Dumazet a écrit : > > On Thu, Oct 7, 2021 at 6:11 AM Michael S. Tsirkin > > wrote: > > > On Thu, Oct 07, 2021 at 02:04:22PM +0200, Corentin Noël wrote: > > > > I've been experiencing crashes with 5.14-rc1 and above that do > > > > not > > > > occur with 5.13, > > > > What about 5.14 ? > > > > 5.14-rc1 has many bugs we do not want to spend time rediscovering > > them... > > > > I've tested on 5.14, 5.15-rc4 and 5.15-rc4 with latest netdev and could > reproduce the crash on them all. Great, any chance you can use 'git bisect' to find the offending commit? thanks, greg k-h ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 1/1] virtio: write back F_VERSION_1 before validate
On Thu, 07 Oct 2021 13:52:24 +0200 Cornelia Huck wrote: > On Wed, Oct 06 2021, Halil Pasic wrote: > > > The virtio specification virtio-v1.1-cs01 states: Transitional devices > > MUST detect Legacy drivers by detecting that VIRTIO_F_VERSION_1 has not > > been acknowledged by the driver. This is exactly what QEMU as of 6.1 > > has done relying solely on VIRTIO_F_VERSION_1 for detecting that. > > > > However, the specification also says: driver MAY read (but MUST NOT > > write) the device-specific configuration fields to check that it can > > support the device before setting FEATURES_OK. > > Suggest to put the citations from the spec into quotes, so that they are > distinguishable from the rest of the text. For the record: I basically took Michael's description, the one which you said you prefer, with some minor changes. This is one of the changes, which renders this a paraphrase and not a quote. Michael didn't use quotation marks so I was not sure it is was a word by word quote anyway. It was. But the spec depends on "During this step" which does not make any sense without the context. That is why I made the end of step explicit. I think we are fine without quotation marks. Those who care can read the spec. > > > > > In that case, any transitional device relying solely on > > VIRTIO_F_VERSION_1 for detecting legacy drivers will return data in > > legacy format. In particular, this implies that it is in big endian > > format for big endian guests. This naturally confuses the driver which > > expects little endian in the modern mode. > > > > It is probably a good idea to amend the spec to clarify that > > VIRTIO_F_VERSION_1 can only be relied on after the feature negotiation > > is complete. However, we already have regression so let's try to address > > s/regression/a regression/ > Yes. Was like this in the original. Will change > > it. > > Maybe mention what the regression is? How about the following? The regressions affect the VIRTIO_NET_F_MTU feature of virtio-net and the VIRTIO_BLK_F_BLK_SIZE feature of virtio-blk for BE guests when virtio 1.0 is used on both sides. The latter renders virtio-blk unusable with DASD backing, because things simply don't work with the default. > > Also mention that we use this workaround for modern on BE only? We have that already, don't we. The sentence that starts with "In particular". The regression description should reinforce that sufficiently IMHO. > > > > > Signed-off-by: Halil Pasic > > Fixes: 82e89ea077b9 ("virtio-blk: Add validation for block size in config > > space") > > Fixes: fe36cbe0671e ("virtio_net: clear MTU when out of range") > > Reported-by: mark...@us.ibm.com > > --- > > drivers/virtio/virtio.c | 10 ++ > > 1 file changed, 10 insertions(+) > > > > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c > > index 0a5b54034d4b..494cfecd3376 100644 > > --- a/drivers/virtio/virtio.c > > +++ b/drivers/virtio/virtio.c > > @@ -239,6 +239,16 @@ static int virtio_dev_probe(struct device *_d) > > driver_features_legacy = driver_features; > > } > > > > + /* > > +* Some devices detect legacy solely via F_VERSION_1. Write > > +* F_VERSION_1 to force LE for these when needed. > > "...to force LE config space accesses before FEATURES_OK for these when > needed (BE)." > > ? Can do, but I would rather omit the (BE) at the end. All the conditions are necessary: * have validate callback * device offered VERSION_1 * virtio legacy is be > > > +*/ > > + if (drv->validate && !virtio_legacy_is_little_endian() > > + && BIT_ULL(VIRTIO_F_VERSION_1) & device_features) { > > Nit: putting device_features first would read more naturally to me. > Can do. > > + dev->features = BIT_ULL(VIRTIO_F_VERSION_1); > > + dev->config->finalize_features(dev); > > + } > > + > > if (device_features & (1ULL << VIRTIO_F_VERSION_1)) > > dev->features = driver_features & device_features; > > else > > Patch LGTM. > > Thanks for having a look. If you are fine with the proposed solution please tell me, so I can send out a v2. If not let us work towards an acceptable solution. Regards, Halil ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: virtio-net: kernel panic in virtio_net.c
On Thu, 07 Oct 2021 14:04:22 +0200, Corentin Noël wrote: > I've been experiencing crashes with 5.14-rc1 and above that do not > occur with 5.13, I should have fixed this problem before. I don't know why, I just looked at the latest net code, and this commit seems to be lost. 1a8024239dacf53fcf39c0f07fbf2712af22864f virtio-net: fix for skb_over_panic inside big mode Can you test this patch again? Thanks. > > here is the crash trace: > > [ 61.346677] skbuff: skb_over_panic: text:881ae2c7 len:3762 > put:3762 head:8a5ec8c22000 data:8a5ec8c22010 tail:0xec2 > end:0xec0 dev: > [ 61.369192] kernel BUG at net/core/skbuff.c:111! > [ 61.372840] invalid opcode: [#1] SMP PTI > [ 61.374892] CPU: 5 PID: 0 Comm: swapper/5 Not tainted 5.14.0- > rc1linux-v5.14-rc1-for-mesa-ci.tar.bz2 #1 > [ 61.376450] Hardware name: ChromiumOS crosvm, BIOS 0 > [ 61.377222] RIP: 0010:skb_panic+0x43/0x45 > [ 61.377833] Code: 4f 70 50 8b 87 bc 00 00 00 50 8b 87 b8 00 00 00 50 > ff b7 c8 00 00 00 4c 8b 8f c0 00 00 00 48 c7 c7 18 f1 cf 88 e8 6a 43 fb > ff <0f> 0b 48 8b 14 24 48 c7 c1 20 35 b1 88 e8 ab ff ff ff 48 c7 c6 60 > [ 61.380566] RSP: 0018:ae258017cce0 EFLAGS: 00010246 > [ 61.381267] RAX: 008b RBX: 0010 RCX: > dfff > [ 61.382246] RDX: RSI: ffea RDI: > > [ 61.383376] RBP: de6a80230880 R08: 88f45568 R09: > 9ffb > [ 61.384494] R10: e000 R11: 3fff R12: > 8a5ec7461200 > [ 61.385696] R13: 8a5ec8c22000 R14: R15: > 0eb2 > [ 61.386825] FS: () GS:8a5febd4() > knlGS: > [ 61.388055] CS: 0010 DS: ES: CR0: 80050033 > [ 61.389221] CR2: 0148a060 CR3: 00011ae0e005 CR4: > 00370ee0 > [ 61.390871] DR0: DR1: DR2: > > [ 61.392335] DR3: DR6: fffe0ff0 DR7: > 0400 > [ 61.393635] Call Trace: > [ 61.394127] > [ 61.394488] skb_put.cold+0x10/0x10 > [ 61.395095] page_to_skb+0xf7/0x410 > [ 61.395689] receive_buf+0x81/0x1660 > [ 61.396228] ? netif_receive_skb_list_internal+0x1ad/0x2b0 > [ 61.397180] ? napi_gro_flush+0x97/0xe0 > [ 61.397896] ? detach_buf_split+0x67/0x120 > [ 61.398573] virtnet_poll+0x2cf/0x420 > [ 61.399197] __napi_poll+0x25/0x150 > [ 61.399764] net_rx_action+0x22f/0x280 > [ 61.400394] __do_softirq+0xba/0x257 > [ 61.401012] irq_exit_rcu+0x8e/0xb0 > [ 61.401618] common_interrupt+0x7b/0xa0 > [ 61.402270] > [ 61.402620] asm_common_interrupt+0x1e/0x40 > [ 61.403302] RIP: 0010:default_idle+0xb/0x10 > [ 61.404018] Code: 8b 04 25 00 6d 01 00 f0 80 60 02 df c3 0f ae f0 0f > ae 38 0f ae f0 eb b9 0f 1f 80 00 00 00 00 eb 07 0f 00 2d df 3e 44 00 fb > f4 cc cc cc cc 65 8b 15 31 2f a4 77 89 d2 48 8b 05 d0 a1 0c 01 48 > [ 61.407636] RSP: 0018:ae258008fef8 EFLAGS: 0202 > [ 61.408394] RAX: 885ce620 RBX: 0005 RCX: > 8a5febd56f80 > [ 61.409451] RDX: 00c1ec32 RSI: 7ff1b7a1e726 RDI: > 8a5febd5dd00 > [ 61.410530] RBP: 8a5fc01f8000 R08: 00c1ec32 R09: > > [ 61.411715] R10: 0006 R11: 0002 R12: > > [ 61.412984] R13: R14: R15: > > [ 61.414183] ? mwait_idle+0x70/0x70 > [ 61.414805] ? mwait_idle+0x70/0x70 > [ 61.415592] default_idle_call+0x2a/0xa0 > [ 61.416216] do_idle+0x1e8/0x250 > [ 61.416722] cpu_startup_entry+0x14/0x20 > [ 61.417347] secondary_startup_64_no_verify+0xc2/0xcb > [ 61.418144] Modules linked in: > [ 61.418622] ---[ end trace 3741c3e580a52bbd ]--- > [ 61.419399] RIP: 0010:skb_panic+0x43/0x45 > [ 61.420054] Code: 4f 70 50 8b 87 bc 00 00 00 50 8b 87 b8 00 00 00 50 > ff b7 c8 00 00 00 4c 8b 8f c0 00 00 00 48 c7 c7 18 f1 cf 88 e8 6a 43 fb > ff <0f> 0b 48 8b 14 24 48 c7 c1 20 35 b1 88 e8 ab ff ff ff 48 c7 c6 60 > [ 61.422606] RSP: 0018:ae258017cce0 EFLAGS: 00010246 > [ 61.423865] RAX: 008b RBX: 0010 RCX: > dfff > [ 61.425031] RDX: RSI: ffea RDI: > > [ 61.426229] RBP: de6a80230880 R08: 88f45568 R09: > 9ffb > [ 61.427439] R10: e000 R11: 3fff R12: > 8a5ec7461200 > [ 61.428615] R13: 8a5ec8c22000 R14: R15: > 0eb2 > [ 61.429799] FS: () GS:8a5febd4() > knlGS: > [ 61.431048] CS: 0010 DS: ES: CR0: 80050033 > [ 61.431997] CR2: 0148a060 CR3: 00011ae0e005 CR4: > 00370ee0 > [ 61.433206] DR0: DR1: DR2: > > [ 61.434502] DR3: DR6: fffe0ff0 DR7: > 0400 > [ 61.
Re: virtio-net: kernel panic in virtio_net.c
On Thu, Oct 07, 2021 at 11:06:12PM +0800, Xuan Zhuo wrote: > On Thu, 07 Oct 2021 14:04:22 +0200, Corentin Noël > wrote: > > I've been experiencing crashes with 5.14-rc1 and above that do not > > occur with 5.13, > > I should have fixed this problem before. I don't know why, I just looked at > the > latest net code, and this commit seems to be lost. > > 1a8024239dacf53fcf39c0f07fbf2712af22864f virtio-net: fix for > skb_over_panic inside big mode > > Can you test this patch again? That commit showed up in 5.13-rc5, so 5.14-rc1 and 5.13 should have had it in it, right? thanks, greg k-h ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 1/1] virtio: write back F_VERSION_1 before validate
On Thu, Oct 07 2021, Halil Pasic wrote: > On Thu, 07 Oct 2021 13:52:24 +0200 > Cornelia Huck wrote: > >> On Wed, Oct 06 2021, Halil Pasic wrote: >> >> > The virtio specification virtio-v1.1-cs01 states: Transitional devices >> > MUST detect Legacy drivers by detecting that VIRTIO_F_VERSION_1 has not >> > been acknowledged by the driver. This is exactly what QEMU as of 6.1 >> > has done relying solely on VIRTIO_F_VERSION_1 for detecting that. >> > >> > However, the specification also says: driver MAY read (but MUST NOT >> > write) the device-specific configuration fields to check that it can >> > support the device before setting FEATURES_OK. >> >> Suggest to put the citations from the spec into quotes, so that they are >> distinguishable from the rest of the text. > > For the record: I basically took Michael's description, the one which you > said you prefer, with some minor changes. Well I did look at what the text said, not the details in the formatting... > > This is one of the changes, which renders this a paraphrase and not a > quote. Michael didn't use quotation marks so I was not sure it is was > a word by word quote anyway. It was. But the spec depends on "During this > step" which does not make any sense without the context. That is why I made > the end of step explicit. I still think that would be nicer while using some quotation marks, even if you are just doing a partial quote. In the first paragraph, however, we really should mark the quote properly. It gave me a stop when I first read it. > > I think we are fine without quotation marks. Those who care can read the > spec. > >> >> > >> > In that case, any transitional device relying solely on >> > VIRTIO_F_VERSION_1 for detecting legacy drivers will return data in >> > legacy format. In particular, this implies that it is in big endian >> > format for big endian guests. This naturally confuses the driver which >> > expects little endian in the modern mode. >> > >> > It is probably a good idea to amend the spec to clarify that >> > VIRTIO_F_VERSION_1 can only be relied on after the feature negotiation >> > is complete. However, we already have regression so let's try to address >> >> s/regression/a regression/ >> > > Yes. Was like this in the original. Will change > >> > it. >> >> Maybe mention what the regression is? > > How about the following? > > The regressions affect the VIRTIO_NET_F_MTU feature of virtio-net and the > VIRTIO_BLK_F_BLK_SIZE feature of virtio-blk for BE guests when virtio > 1.0 is used on both sides. The latter renders virtio-blk unusable with > DASD backing, because things simply don't work with the default. Sounds good to me. > >> >> Also mention that we use this workaround for modern on BE only? > > We have that already, don't we. The sentence that starts with "In > particular". The regression description should reinforce that > sufficiently IMHO. No strong opinion here. Anyone else? > >> >> > >> > Signed-off-by: Halil Pasic >> > Fixes: 82e89ea077b9 ("virtio-blk: Add validation for block size in config >> > space") >> > Fixes: fe36cbe0671e ("virtio_net: clear MTU when out of range") >> > Reported-by: mark...@us.ibm.com >> > --- >> > drivers/virtio/virtio.c | 10 ++ >> > 1 file changed, 10 insertions(+) >> > >> > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c >> > index 0a5b54034d4b..494cfecd3376 100644 >> > --- a/drivers/virtio/virtio.c >> > +++ b/drivers/virtio/virtio.c >> > @@ -239,6 +239,16 @@ static int virtio_dev_probe(struct device *_d) >> >driver_features_legacy = driver_features; >> >} >> > >> > + /* >> > + * Some devices detect legacy solely via F_VERSION_1. Write >> > + * F_VERSION_1 to force LE for these when needed. >> >> "...to force LE config space accesses before FEATURES_OK for these when >> needed (BE)." >> >> ? > > Can do, but I would rather omit the (BE) at the end. All the conditions > are necessary: > * have validate callback > * device offered VERSION_1 > * virtio legacy is be > Ok, let's use that without the trailing BE. >> >> > + */ >> > + if (drv->validate && !virtio_legacy_is_little_endian() >> > +&& BIT_ULL(VIRTIO_F_VERSION_1) & device_features) { >> >> Nit: putting device_features first would read more naturally to me. >> > > Can do. > >> > + dev->features = BIT_ULL(VIRTIO_F_VERSION_1); >> > + dev->config->finalize_features(dev); >> > + } >> > + >> >if (device_features & (1ULL << VIRTIO_F_VERSION_1)) >> >dev->features = driver_features & device_features; >> >else >> >> Patch LGTM. >> >> > > Thanks for having a look. If you are fine with the proposed solution > please tell me, so I can send out a v2. No further comments other than what I wrote above, but maybe others have comments as well? ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.or
Re: [PATCH 1/1] virtio: write back F_VERSION_1 before validate
On Thu, 07 Oct 2021 17:25:52 +0200 Cornelia Huck wrote: > On Thu, Oct 07 2021, Halil Pasic wrote: > > > On Thu, 07 Oct 2021 13:52:24 +0200 > > Cornelia Huck wrote: > > > >> On Wed, Oct 06 2021, Halil Pasic wrote: > >> > >> > The virtio specification virtio-v1.1-cs01 states: "Transitional devices > >> > MUST detect Legacy drivers by detecting that VIRTIO_F_VERSION_1 has not > >> > been acknowledged by the driver." This is exactly what QEMU as of 6.1 > >> > has done relying solely on VIRTIO_F_VERSION_1 for detecting that. > >> > > >> > However, the specification also says: "... driver MAY read (but MUST NOT > >> > write) the device-specific configuration fields to check that it can > >> > support the device ..." before setting FEATURES_OK. > >> > >> Suggest to put the citations from the spec into quotes, so that they are > >> distinguishable from the rest of the text. > > > > For the record: I basically took Michael's description, the one which you > > said you prefer, with some minor changes. > > Well I did look at what the text said, not the details in the formatting... > > > > > This is one of the changes, which renders this a paraphrase and not a > > quote. Michael didn't use quotation marks so I was not sure it is was > > a word by word quote anyway. It was. But the spec depends on "During this > > step" which does not make any sense without the context. That is why I made > > the end of step explicit. > > I still think that would be nicer while using some quotation marks, even > if you are just doing a partial quote. > > In the first paragraph, however, we really should mark the quote > properly. It gave me a stop when I first read it. I've added in some quotation marks and ellipsis marks. Does that look good for you? > > > > > I think we are fine without quotation marks. Those who care can read the > > spec. > > > >> > >> > > >> > In that case, any transitional device relying solely on > >> > VIRTIO_F_VERSION_1 for detecting legacy drivers will return data in > >> > legacy format. In particular, this implies that it is in big endian > >> > format for big endian guests. This naturally confuses the driver which > >> > expects little endian in the modern mode. > >> > > >> > It is probably a good idea to amend the spec to clarify that > >> > VIRTIO_F_VERSION_1 can only be relied on after the feature negotiation > >> > is complete. However, we already have regression so let's try to address > >> > > >> > >> s/regression/a regression/ > >> > > > > Yes. Was like this in the original. Will change > > > >> > it. > >> > >> Maybe mention what the regression is? > > > > How about the following? > > > > The regressions affect the VIRTIO_NET_F_MTU feature of virtio-net and the > > VIRTIO_BLK_F_BLK_SIZE feature of virtio-blk for BE guests when virtio > > 1.0 is used on both sides. The latter renders virtio-blk unusable with > > DASD backing, because things simply don't work with the default. > > Sounds good to me. Will add it to the end. > > > > >> > >> Also mention that we use this workaround for modern on BE only? > > > > We have that already, don't we. The sentence that starts with "In > > particular". The regression description should reinforce that > > sufficiently IMHO. > > No strong opinion here. Anyone else? > > > > >> > >> > > >> > Signed-off-by: Halil Pasic > >> > Fixes: 82e89ea077b9 ("virtio-blk: Add validation for block size in > >> > config space") > >> > Fixes: fe36cbe0671e ("virtio_net: clear MTU when out of range") > >> > Reported-by: mark...@us.ibm.com > >> > --- > >> > drivers/virtio/virtio.c | 10 ++ > >> > 1 file changed, 10 insertions(+) > >> > > >> > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c > >> > index 0a5b54034d4b..494cfecd3376 100644 > >> > --- a/drivers/virtio/virtio.c > >> > +++ b/drivers/virtio/virtio.c > >> > @@ -239,6 +239,16 @@ static int virtio_dev_probe(struct device *_d) > >> > driver_features_legacy = driver_features; > >> > } > >> > > >> > +/* > >> > + * Some devices detect legacy solely via F_VERSION_1. Write > >> > + * F_VERSION_1 to force LE for these when needed. > >> > >> "...to force LE config space accesses before FEATURES_OK for these when > >> needed (BE)." > >> > >> ? > > > > Can do, but I would rather omit the (BE) at the end. All the conditions > > are necessary: > > * have validate callback > > * device offered VERSION_1 > > * virtio legacy is be > > > > Ok, let's use that without the trailing BE. > Nod. > >> > >> > + */ > >> > +if (drv->validate && !virtio_legacy_is_little_endian() > >> > + && BIT_ULL(VIRTIO_F_VERSION_1) & > >> > device_features) { > >> > >> Nit: putting device_features first would read more naturally to me. > >> > > > > Can do. > > > >> > +dev->features = BIT_ULL(VIRTIO_F_VERSION_1); > >> > +de
Re: virtio-net: kernel panic in virtio_net.c
On Thu, 7 Oct 2021 17:25:02 +0200, Greg KH wrote: > On Thu, Oct 07, 2021 at 11:06:12PM +0800, Xuan Zhuo wrote: > > On Thu, 07 Oct 2021 14:04:22 +0200, Corentin Noël > > wrote: > > > I've been experiencing crashes with 5.14-rc1 and above that do not > > > occur with 5.13, > > > > I should have fixed this problem before. I don't know why, I just looked at > > the > > latest net code, and this commit seems to be lost. > > > > 1a8024239dacf53fcf39c0f07fbf2712af22864f virtio-net: fix for > > skb_over_panic inside big mode > > > > Can you test this patch again? > > That commit showed up in 5.13-rc5, so 5.14-rc1 and 5.13 should have had > it in it, right? > Yes, it may be lost due to conflicts during a certain merge. Thanks. > thanks, > > greg k-h ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 1/1] virtio: write back F_VERSION_1 before validate
On Thu, Oct 07 2021, Halil Pasic wrote: > On Thu, 07 Oct 2021 17:25:52 +0200 > Cornelia Huck wrote: > >> On Thu, Oct 07 2021, Halil Pasic wrote: >> >> > On Thu, 07 Oct 2021 13:52:24 +0200 >> > Cornelia Huck wrote: >> > >> >> On Wed, Oct 06 2021, Halil Pasic wrote: >> >> >> >> > The virtio specification virtio-v1.1-cs01 states: "Transitional devices >> >> > MUST detect Legacy drivers by detecting that VIRTIO_F_VERSION_1 has not >> >> > been acknowledged by the driver." This is exactly what QEMU as of 6.1 >> >> > has done relying solely on VIRTIO_F_VERSION_1 for detecting that. >> >> > >> >> > However, the specification also says: "... driver MAY read (but MUST NOT >> >> > write) the device-specific configuration fields to check that it can >> >> > support the device ..." before setting FEATURES_OK. >> >> >> >> Suggest to put the citations from the spec into quotes, so that they are >> >> distinguishable from the rest of the text. >> > >> > For the record: I basically took Michael's description, the one which you >> > said you prefer, with some minor changes. >> >> Well I did look at what the text said, not the details in the formatting... >> >> > >> > This is one of the changes, which renders this a paraphrase and not a >> > quote. Michael didn't use quotation marks so I was not sure it is was >> > a word by word quote anyway. It was. But the spec depends on "During this >> > step" which does not make any sense without the context. That is why I made >> > the end of step explicit. >> >> I still think that would be nicer while using some quotation marks, even >> if you are just doing a partial quote. >> >> In the first paragraph, however, we really should mark the quote >> properly. It gave me a stop when I first read it. > > I've added in some quotation marks and ellipsis marks. Does that look > good for you? Yep, works for me. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH V4 0/8] Use copy_process/create_io_thread in vhost layer
The following patches apply over Linus's, Jens's or mst's trees, but were made over linux-next because patch 6: io_uring: switch to kernel_worker has merge conflicts with Jens Axboe's for-next branch and Paul Moore's selinux tree's next branch. This is V4 of the patchset. It should handle all the review comments posted in V1 - V3. If I missed a comment, please let me know. This patchset allows the vhost layer to do a copy_process on the thread that does the VHOST_SET_OWNER ioctl like how io_uring does a copy_process against its userspace app (Jens, the patches make create_io_thread more generic so that's why you are cc'd). This allows the vhost layer's worker threads to inherit cgroups, namespaces, address space, etc and this worker thread will also be accounted for against that owner/parent process's RLIMIT_NPROC limit. If you are not familiar with qemu and vhost here is more detailed problem description: Qemu will create vhost devices in the kernel which perform network, SCSI, etc IO and management operations from worker threads created by the kthread API. Because the kthread API does a copy_process on the kthreadd thread, the vhost layer has to use kthread_use_mm to access the Qemu thread's memory and cgroup_attach_task_all to add itself to the Qemu thread's cgroups. The problem with this approach is that we then have to add new functions/ args/functionality for every thing we want to inherit. I started doing that here: https://lkml.org/lkml/2021/6/23/1233 for the RLIMIT_NPROC check, but it seems it might be easier to just inherit everything from the beginning, becuase I'd need to do something like that patch several times. For example, the current approach does not support cgroups v2 so commands like virsh emulatorpin do not work. The qemu process can go over its RLIMIT_NPROC. And for future vhost interfaces where we export the vhost thread pid we will want the namespace info. V4: - Drop NO_SIG patch and replaced with Christian's SIG_IGN patch. - Merged Christian's kernel_worker_flags_valid helpers into patch 5 that added the new kernel worker functions. - Fixed extra "i" issue. - Added PF_USER_WORKER flag and added check that kernel_worker_start users had that flag set. Also dropped patches that passed worker flags to copy_thread and replaced with PF_USER_WORKER check. V3: - Add parentheses in p->flag and work_flags check in copy_thread. - Fix check in arm/arm64 which was doing the reverse of other archs where it did likely(!flags) instead of unlikely(flags). V2: - Rename kernel_copy_process to kernel_worker. - Instead of exporting functions, make kernel_worker() a proper function/API that does common work for the caller. - Instead of adding new fields to kernel_clone_args for each option make it flag based similar to CLONE_*. - Drop unused completion struct in vhost. - Fix compile warnings by merging vhost cgroup cleanup patch and vhost conversion patch. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH V4 1/8] fork: Make IO worker options flag based
This patchset adds a couple new options to kernel_clone_args for IO thread like/related users. Instead of adding new fields to kernel_clone_args for each option, this moves us to a flags based approach by first converting io_thread. Signed-off-by: Mike Christie Suggested-by: Christian Brauner Acked-by: Christian Brauner --- include/linux/sched/task.h | 4 +++- kernel/fork.c | 4 ++-- 2 files changed, 5 insertions(+), 3 deletions(-) diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h index ef02be869cf2..48417c735438 100644 --- a/include/linux/sched/task.h +++ b/include/linux/sched/task.h @@ -18,8 +18,11 @@ struct css_set; /* All the bits taken by the old clone syscall. */ #define CLONE_LEGACY_FLAGS 0xULL +#define KERN_WORKER_IO BIT(0) + struct kernel_clone_args { u64 flags; + u32 worker_flags; int __user *pidfd; int __user *child_tid; int __user *parent_tid; @@ -31,7 +34,6 @@ struct kernel_clone_args { /* Number of elements in *set_tid */ size_t set_tid_size; int cgroup; - int io_thread; struct cgroup *cgrp; struct css_set *cset; }; diff --git a/kernel/fork.c b/kernel/fork.c index 38681ad44c76..3988106e9609 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -2026,7 +2026,7 @@ static __latent_entropy struct task_struct *copy_process( p = dup_task_struct(current, node); if (!p) goto fork_out; - if (args->io_thread) { + if (args->worker_flags & KERN_WORKER_IO) { /* * Mark us an IO worker, and block any signal that isn't * fatal or STOP @@ -2526,7 +2526,7 @@ struct task_struct *create_io_thread(int (*fn)(void *), void *arg, int node) .exit_signal= (lower_32_bits(flags) & CSIGNAL), .stack = (unsigned long)fn, .stack_size = (unsigned long)arg, - .io_thread = 1, + .worker_flags = KERN_WORKER_IO, }; return copy_process(NULL, 0, node, &args); -- 2.25.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH V4 3/8] fork: add option to not clone or dup files
Each vhost device gets a thread that is used to perform IO and management operations. Instead of a thread that is accessing a device, the thread is part of the device, so when it calls the kernel_worker() function added in the next patch we can't dup or clone the parent's files/FDS because it would do an extra increment on ourself. Later, when we do: Qemu process exits: do_exit -> exit_files -> put_files_struct -> close_files we would leak the device's resources because of that extra refcount on the fd or file_struct. This patch adds a no_files option so these worker threads can prevent taking an extra refcount on themselves. Signed-off-by: Mike Christie Acked-by: Christian Brauner --- include/linux/sched/task.h | 1 + kernel/fork.c | 11 +-- 2 files changed, 10 insertions(+), 2 deletions(-) diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h index 53599a99d7e0..1153f9e5d10e 100644 --- a/include/linux/sched/task.h +++ b/include/linux/sched/task.h @@ -20,6 +20,7 @@ struct css_set; #define KERN_WORKER_IO BIT(0) #define KERN_WORKER_USER BIT(1) +#define KERN_WORKER_NO_FILES BIT(2) struct kernel_clone_args { u64 flags; diff --git a/kernel/fork.c b/kernel/fork.c index 4f780424de46..3161edac1236 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -1532,7 +1532,8 @@ static int copy_fs(unsigned long clone_flags, struct task_struct *tsk) return 0; } -static int copy_files(unsigned long clone_flags, struct task_struct *tsk) +static int copy_files(unsigned long clone_flags, struct task_struct *tsk, + int no_files) { struct files_struct *oldf, *newf; int error = 0; @@ -1544,6 +1545,11 @@ static int copy_files(unsigned long clone_flags, struct task_struct *tsk) if (!oldf) goto out; + if (no_files) { + tsk->files = NULL; + goto out; + } + if (clone_flags & CLONE_FILES) { atomic_inc(&oldf->count); goto out; @@ -2181,7 +2187,8 @@ static __latent_entropy struct task_struct *copy_process( retval = copy_semundo(clone_flags, p); if (retval) goto bad_fork_cleanup_security; - retval = copy_files(clone_flags, p); + retval = copy_files(clone_flags, p, + args->worker_flags & KERN_WORKER_NO_FILES); if (retval) goto bad_fork_cleanup_semundo; retval = copy_fs(clone_flags, p); -- 2.25.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH V4 5/8] fork: add helper to clone a process
The vhost layer has similar requirements as io_uring where its worker threads need to access the userspace thread's memory, want to inherit the parents's cgroups and namespaces, and be checked against the parent's RLIMITs. Right now, the vhost layer uses the kthread API which has kthread_use_mm for mem access, and those threads can use cgroup_attach_task_all for v1 cgroups, but there are no helpers for the other items. This adds a helper to clone a process so we can inherit everything we want in one call. It's a more generic version of create_io_thread which will be used by the vhost layer and io_uring in later patches in this set. [added flag validation code from Christian Brauner's patch] Signed-off-by: Mike Christie Acked-by: Christian Brauner --- include/linux/sched/task.h | 4 +++ kernel/fork.c | 71 ++ 2 files changed, 75 insertions(+) diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h index 5f3928fe0544..2bfc0629c868 100644 --- a/include/linux/sched/task.h +++ b/include/linux/sched/task.h @@ -89,6 +89,10 @@ extern void exit_itimers(struct signal_struct *); extern pid_t kernel_clone(struct kernel_clone_args *kargs); struct task_struct *create_io_thread(int (*fn)(void *), void *arg, int node); +struct task_struct *kernel_worker(int (*fn)(void *), void *arg, int node, + unsigned long clone_flags, u32 worker_flags); +__printf(2, 3) +void kernel_worker_start(struct task_struct *tsk, const char namefmt[], ...); struct task_struct *fork_idle(int); struct mm_struct *copy_init_mm(void); extern pid_t kernel_thread(int (*fn)(void *), void *arg, unsigned long flags); diff --git a/kernel/fork.c b/kernel/fork.c index 07f9e410fb64..b04e61a965e2 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -2544,6 +2544,77 @@ struct task_struct *create_io_thread(int (*fn)(void *), void *arg, int node) return copy_process(NULL, 0, node, &args); } +static bool kernel_worker_flags_valid(struct kernel_clone_args *kargs) +{ + /* Verify that no unknown flags are passed along. */ + if (kargs->worker_flags & ~(KERN_WORKER_IO | KERN_WORKER_USER | + KERN_WORKER_NO_FILES | KERN_WORKER_SIG_IGN)) + return false; + + /* +* If we're ignoring all signals don't allow sharing struct sighand and +* don't bother clearing signal handlers. +*/ + if ((kargs->flags & (CLONE_SIGHAND | CLONE_CLEAR_SIGHAND)) && + (kargs->worker_flags & KERN_WORKER_SIG_IGN)) + return false; + + return true; +} + +/** + * kernel_worker - create a copy of a process to be used by the kernel + * @fn: thread stack + * @arg: data to be passed to fn + * @node: numa node to allocate task from + * @clone_flags: CLONE flags + * @worker_flags: KERN_WORKER flags + * + * This returns a created task, or an error pointer. The returned task is + * inactive, and the caller must fire it up through kernel_worker_start(). If + * this is an PF_IO_WORKER all singals but KILL and STOP are blocked. + */ +struct task_struct *kernel_worker(int (*fn)(void *), void *arg, int node, + unsigned long clone_flags, u32 worker_flags) +{ + struct kernel_clone_args args = { + .flags = ((lower_32_bits(clone_flags) | CLONE_VM | + CLONE_UNTRACED) & ~CSIGNAL), + .exit_signal= (lower_32_bits(clone_flags) & CSIGNAL), + .stack = (unsigned long)fn, + .stack_size = (unsigned long)arg, + .worker_flags = KERN_WORKER_USER | worker_flags, + }; + + if (!kernel_worker_flags_valid(&args)) + return ERR_PTR(-EINVAL); + + return copy_process(NULL, 0, node, &args); +} +EXPORT_SYMBOL_GPL(kernel_worker); + +/** + * kernel_worker_start - Start a task created with kernel_worker + * @tsk: task to wake up + * @namefmt: printf-style format string for the thread name + * @arg: arguments for @namefmt + */ +void kernel_worker_start(struct task_struct *tsk, const char namefmt[], ...) +{ + char name[TASK_COMM_LEN]; + va_list args; + + WARN_ON(!(tsk->flags & PF_USER_WORKER)); + + va_start(args, namefmt); + vsnprintf(name, sizeof(name), namefmt, args); + set_task_comm(tsk, name); + va_end(args); + + wake_up_new_task(tsk); +} +EXPORT_SYMBOL_GPL(kernel_worker_start); + /* * Ok, this is the main fork-routine. * -- 2.25.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH V4 4/8] fork: Add KERNEL_WORKER flag to ignore signals
From: Christian Brauner Since this is mirroring kthread's sig ignore api introduced in commit 10ab825bdef8 ("change kernel threads to ignore signals instead of blocking them") this patch adds an option flag, KERNEL_WORKER_SIG_IGN, handled in copy_process() after copy_sighand() and copy_signals() to ease the transition from kthreads to this new api. Signed-off-by: Christian Brauner Signed-off-by: Mike Christie --- include/linux/sched/task.h | 1 + kernel/fork.c | 3 +++ 2 files changed, 4 insertions(+) diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h index 1153f9e5d10e..5f3928fe0544 100644 --- a/include/linux/sched/task.h +++ b/include/linux/sched/task.h @@ -21,6 +21,7 @@ struct css_set; #define KERN_WORKER_IO BIT(0) #define KERN_WORKER_USER BIT(1) #define KERN_WORKER_NO_FILES BIT(2) +#define KERN_WORKER_SIG_IGNBIT(3) struct kernel_clone_args { u64 flags; diff --git a/kernel/fork.c b/kernel/fork.c index 3161edac1236..07f9e410fb64 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -2213,6 +2213,9 @@ static __latent_entropy struct task_struct *copy_process( if (retval) goto bad_fork_cleanup_io; + if (args->worker_flags & KERN_WORKER_SIG_IGN) + ignore_signals(p); + stackleak_task_init(p); if (pid != &init_struct_pid) { -- 2.25.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH V4 6/8] io_uring: switch to kernel_worker
Convert io_uring and io-wq to use kernel_worker. Signed-off-by: Mike Christie Reviewed-by: Christian Brauner Reviewed-by: Jens Axboe --- Note: To avoid patch application conflicts this patch was made over linux-next which has Jens Axboe's block tree's for-next branch and Paul Moore's selinux tree's next branch because: commit: 5bd2182d58e9 ("audit,io_uring,io-wq: add some basic audit support to io_uring") from Paul's tree had conflicts. fs/io-wq.c | 15 --- fs/io_uring.c | 11 +-- include/linux/sched/task.h | 1 - 3 files changed, 13 insertions(+), 14 deletions(-) diff --git a/fs/io-wq.c b/fs/io-wq.c index 962952951126..32b140754496 100644 --- a/fs/io-wq.c +++ b/fs/io-wq.c @@ -70,6 +70,9 @@ struct io_worker { #define IO_WQ_NR_HASH_BUCKETS (1u << IO_WQ_HASH_ORDER) +#define IO_WQ_CLONE_FLAGS (CLONE_FS | CLONE_FILES | CLONE_SIGHAND | \ +CLONE_THREAD | CLONE_IO) + struct io_wqe_acct { unsigned nr_workers; unsigned max_workers; @@ -550,13 +553,9 @@ static int io_wqe_worker(void *data) struct io_wqe *wqe = worker->wqe; struct io_wq *wq = wqe->wq; bool last_timeout = false; - char buf[TASK_COMM_LEN]; worker->flags |= (IO_WORKER_F_UP | IO_WORKER_F_RUNNING); - snprintf(buf, sizeof(buf), "iou-wrk-%d", wq->task->pid); - set_task_comm(current, buf); - audit_alloc_kernel(current); while (!test_bit(IO_WQ_BIT_EXIT, &wq->state)) { @@ -654,7 +653,7 @@ static void io_init_new_worker(struct io_wqe *wqe, struct io_worker *worker, list_add_tail_rcu(&worker->all_list, &wqe->all_list); worker->flags |= IO_WORKER_F_FREE; raw_spin_unlock(&wqe->lock); - wake_up_new_task(tsk); + kernel_worker_start(tsk, "iou-wrk-%d", wqe->wq->task->pid); } static bool io_wq_work_match_all(struct io_wq_work *work, void *data) @@ -684,7 +683,8 @@ static void create_worker_cont(struct callback_head *cb) worker = container_of(cb, struct io_worker, create_work); clear_bit_unlock(0, &worker->create_state); wqe = worker->wqe; - tsk = create_io_thread(io_wqe_worker, worker, wqe->node); + tsk = kernel_worker(io_wqe_worker, worker, wqe->node, + IO_WQ_CLONE_FLAGS, KERN_WORKER_IO); if (!IS_ERR(tsk)) { io_init_new_worker(wqe, worker, tsk); io_worker_release(worker); @@ -754,7 +754,8 @@ static bool create_io_worker(struct io_wq *wq, struct io_wqe *wqe, int index) if (index == IO_WQ_ACCT_BOUND) worker->flags |= IO_WORKER_F_BOUND; - tsk = create_io_thread(io_wqe_worker, worker, wqe->node); + tsk = kernel_worker(io_wqe_worker, worker, wqe->node, IO_WQ_CLONE_FLAGS, + KERN_WORKER_IO); if (!IS_ERR(tsk)) { io_init_new_worker(wqe, worker, tsk); } else if (!io_should_retry_thread(PTR_ERR(tsk))) { diff --git a/fs/io_uring.c b/fs/io_uring.c index 488aa14da287..5e21ae6d4ff4 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -7440,12 +7440,8 @@ static int io_sq_thread(void *data) struct io_sq_data *sqd = data; struct io_ring_ctx *ctx; unsigned long timeout = 0; - char buf[TASK_COMM_LEN]; DEFINE_WAIT(wait); - snprintf(buf, sizeof(buf), "iou-sqp-%d", sqd->task_pid); - set_task_comm(current, buf); - if (sqd->sq_cpu != -1) set_cpus_allowed_ptr(current, cpumask_of(sqd->sq_cpu)); else @@ -8669,6 +8665,8 @@ static int io_sq_offload_create(struct io_ring_ctx *ctx, fdput(f); } if (ctx->flags & IORING_SETUP_SQPOLL) { + unsigned long flags = CLONE_FS | CLONE_FILES | CLONE_SIGHAND | + CLONE_THREAD | CLONE_IO; struct task_struct *tsk; struct io_sq_data *sqd; bool attached; @@ -8710,7 +8708,8 @@ static int io_sq_offload_create(struct io_ring_ctx *ctx, sqd->task_pid = current->pid; sqd->task_tgid = current->tgid; - tsk = create_io_thread(io_sq_thread, sqd, NUMA_NO_NODE); + tsk = kernel_worker(io_sq_thread, sqd, NUMA_NO_NODE, + flags, KERN_WORKER_IO); if (IS_ERR(tsk)) { ret = PTR_ERR(tsk); goto err_sqpoll; @@ -8718,7 +8717,7 @@ static int io_sq_offload_create(struct io_ring_ctx *ctx, sqd->thread = tsk; ret = io_uring_alloc_task_context(tsk, ctx); - wake_up_new_task(tsk); + kernel_worker_start(tsk, "iou-sqp-%d", sqd->task_pid); if (ret) goto err; } else if (p->flags & IORING_SETUP_SQ_AFF) { diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h index 2bfc0629c868..c8ab90e28f11 10
[PATCH V4 2/8] fork: move PF_IO_WORKER's kernel frame setup to new flag
The vhost worker threads need the same frame setup as io_uring's worker threads, but handle signals differently and do not need the same scheduling behavior. This patch separate's the frame setup parts of PF_IO_WORKER into a new PF flag PF_USER_WORKER. Signed-off-by: Mike Christie --- arch/alpha/kernel/process.c | 2 +- arch/arc/kernel/process.c| 2 +- arch/arm/kernel/process.c| 2 +- arch/arm64/kernel/process.c | 2 +- arch/csky/kernel/process.c | 2 +- arch/h8300/kernel/process.c | 2 +- arch/hexagon/kernel/process.c| 2 +- arch/ia64/kernel/process.c | 2 +- arch/m68k/kernel/process.c | 2 +- arch/microblaze/kernel/process.c | 2 +- arch/mips/kernel/process.c | 2 +- arch/nds32/kernel/process.c | 2 +- arch/nios2/kernel/process.c | 2 +- arch/openrisc/kernel/process.c | 2 +- arch/parisc/kernel/process.c | 2 +- arch/powerpc/kernel/process.c| 2 +- arch/riscv/kernel/process.c | 2 +- arch/s390/kernel/process.c | 2 +- arch/sh/kernel/process_32.c | 2 +- arch/sparc/kernel/process_32.c | 2 +- arch/sparc/kernel/process_64.c | 2 +- arch/um/kernel/process.c | 2 +- arch/x86/kernel/process.c| 2 +- arch/xtensa/kernel/process.c | 2 +- include/linux/sched.h| 1 + include/linux/sched/task.h | 1 + kernel/fork.c| 4 +++- 27 files changed, 29 insertions(+), 25 deletions(-) diff --git a/arch/alpha/kernel/process.c b/arch/alpha/kernel/process.c index a5123ea426ce..e350fff2ea14 100644 --- a/arch/alpha/kernel/process.c +++ b/arch/alpha/kernel/process.c @@ -249,7 +249,7 @@ int copy_thread(unsigned long clone_flags, unsigned long usp, childti->pcb.ksp = (unsigned long) childstack; childti->pcb.flags = 1; /* set FEN, clear everything else */ - if (unlikely(p->flags & (PF_KTHREAD | PF_IO_WORKER))) { + if (unlikely(p->flags & (PF_KTHREAD | PF_USER_WORKER))) { /* kernel thread */ memset(childstack, 0, sizeof(struct switch_stack) + sizeof(struct pt_regs)); diff --git a/arch/arc/kernel/process.c b/arch/arc/kernel/process.c index 3793876f42d9..c3f4952cce17 100644 --- a/arch/arc/kernel/process.c +++ b/arch/arc/kernel/process.c @@ -191,7 +191,7 @@ int copy_thread(unsigned long clone_flags, unsigned long usp, childksp[0] = 0;/* fp */ childksp[1] = (unsigned long)ret_from_fork; /* blink */ - if (unlikely(p->flags & (PF_KTHREAD | PF_IO_WORKER))) { + if (unlikely(p->flags & (PF_KTHREAD | PF_USER_WORKER))) { memset(c_regs, 0, sizeof(struct pt_regs)); c_callee->r13 = kthread_arg; diff --git a/arch/arm/kernel/process.c b/arch/arm/kernel/process.c index 0e2d3051741e..449c9db3942a 100644 --- a/arch/arm/kernel/process.c +++ b/arch/arm/kernel/process.c @@ -247,7 +247,7 @@ int copy_thread(unsigned long clone_flags, unsigned long stack_start, thread->cpu_domain = get_domain(); #endif - if (likely(!(p->flags & (PF_KTHREAD | PF_IO_WORKER { + if (likely(!(p->flags & (PF_KTHREAD | PF_USER_WORKER { *childregs = *current_pt_regs(); childregs->ARM_r0 = 0; if (stack_start) diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c index 40adb8cdbf5a..e2fe88a3ae90 100644 --- a/arch/arm64/kernel/process.c +++ b/arch/arm64/kernel/process.c @@ -333,7 +333,7 @@ int copy_thread(unsigned long clone_flags, unsigned long stack_start, ptrauth_thread_init_kernel(p); - if (likely(!(p->flags & (PF_KTHREAD | PF_IO_WORKER { + if (likely(!(p->flags & (PF_KTHREAD | PF_USER_WORKER { *childregs = *current_pt_regs(); childregs->regs[0] = 0; diff --git a/arch/csky/kernel/process.c b/arch/csky/kernel/process.c index 3d0ca22cd0e2..509f2bfe4ace 100644 --- a/arch/csky/kernel/process.c +++ b/arch/csky/kernel/process.c @@ -49,7 +49,7 @@ int copy_thread(unsigned long clone_flags, /* setup thread.sp for switch_to !!! */ p->thread.sp = (unsigned long)childstack; - if (unlikely(p->flags & (PF_KTHREAD | PF_IO_WORKER))) { + if (unlikely(p->flags & (PF_KTHREAD | PF_USER_WORKER))) { memset(childregs, 0, sizeof(struct pt_regs)); childstack->r15 = (unsigned long) ret_from_kernel_thread; childstack->r10 = kthread_arg; diff --git a/arch/h8300/kernel/process.c b/arch/h8300/kernel/process.c index 2ac27e4248a4..11baf058b6c5 100644 --- a/arch/h8300/kernel/process.c +++ b/arch/h8300/kernel/process.c @@ -112,7 +112,7 @@ int copy_thread(unsigned long clone_flags, unsigned long usp, childregs = (struct pt_regs *) (THREAD_SIZE + task_stack_page(p)) - 1; - if (unlikely(p->flags & (PF_KTHREAD | PF_IO_WORKER))) { + if (unlikely(p->flags & (PF_KTHREAD | PF_USER_WORKER))) { memset(childregs, 0, s
[PATCH V4 8/8] vhost: use kernel_worker to check RLIMITs and inherit v2 cgroups
For vhost workers we use the kthread API which inherit's its values from and checks against the kthreadd thread. This results in cgroups v2 not working and the wrong RLIMITs being checked. This patch has us use the kernel_copy_process function which will inherit its values/checks from the thread that owns the device. Signed-off-by: Mike Christie Acked-by: Michael S. Tsirkin --- drivers/vhost/vhost.c | 65 +++ drivers/vhost/vhost.h | 7 - 2 files changed, 28 insertions(+), 44 deletions(-) diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index c9a1f706989c..9aa04fcdf210 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -22,7 +22,6 @@ #include #include #include -#include #include #include #include @@ -344,17 +343,14 @@ static void vhost_vq_reset(struct vhost_dev *dev, static int vhost_worker(void *data) { struct vhost_worker *worker = data; - struct vhost_dev *dev = worker->dev; struct vhost_work *work, *work_next; struct llist_node *node; - kthread_use_mm(dev->mm); - for (;;) { /* mb paired w/ kthread_stop */ set_current_state(TASK_INTERRUPTIBLE); - if (kthread_should_stop()) { + if (test_bit(VHOST_WORKER_FLAG_STOP, &worker->flags)) { __set_current_state(TASK_RUNNING); break; } @@ -376,8 +372,9 @@ static int vhost_worker(void *data) schedule(); } } - kthread_unuse_mm(dev->mm); - return 0; + + complete(worker->exit_done); + do_exit(0); } static void vhost_vq_free_iovecs(struct vhost_virtqueue *vq) @@ -517,31 +514,6 @@ long vhost_dev_check_owner(struct vhost_dev *dev) } EXPORT_SYMBOL_GPL(vhost_dev_check_owner); -struct vhost_attach_cgroups_struct { - struct vhost_work work; - struct task_struct *owner; - int ret; -}; - -static void vhost_attach_cgroups_work(struct vhost_work *work) -{ - struct vhost_attach_cgroups_struct *s; - - s = container_of(work, struct vhost_attach_cgroups_struct, work); - s->ret = cgroup_attach_task_all(s->owner, current); -} - -static int vhost_attach_cgroups(struct vhost_dev *dev) -{ - struct vhost_attach_cgroups_struct attach; - - attach.owner = current; - vhost_work_init(&attach.work, vhost_attach_cgroups_work); - vhost_work_queue(dev, &attach.work); - vhost_work_dev_flush(dev); - return attach.ret; -} - /* Caller should have device mutex */ bool vhost_dev_has_owner(struct vhost_dev *dev) { @@ -579,6 +551,16 @@ static void vhost_detach_mm(struct vhost_dev *dev) dev->mm = NULL; } +static void vhost_worker_stop(struct vhost_worker *worker) +{ + DECLARE_COMPLETION_ONSTACK(exit_done); + + worker->exit_done = &exit_done; + set_bit(VHOST_WORKER_FLAG_STOP, &worker->flags); + wake_up_process(worker->task); + wait_for_completion(worker->exit_done); +} + static void vhost_worker_free(struct vhost_dev *dev) { struct vhost_worker *worker = dev->worker; @@ -588,7 +570,7 @@ static void vhost_worker_free(struct vhost_dev *dev) dev->worker = NULL; WARN_ON(!llist_empty(&worker->work_list)); - kthread_stop(worker->task); + vhost_worker_stop(worker); kfree(worker); } @@ -603,27 +585,24 @@ static int vhost_worker_create(struct vhost_dev *dev) return -ENOMEM; dev->worker = worker; - worker->dev = dev; worker->kcov_handle = kcov_common_handle(); init_llist_head(&worker->work_list); - task = kthread_create(vhost_worker, worker, "vhost-%d", current->pid); + /* +* vhost used to use the kthread API which ignores all signals by +* default and the drivers expect this behavior. +*/ + task = kernel_worker(vhost_worker, worker, NUMA_NO_NODE, CLONE_FS, +KERN_WORKER_NO_FILES | KERN_WORKER_SIG_IGN); if (IS_ERR(task)) { ret = PTR_ERR(task); goto free_worker; } worker->task = task; - wake_up_process(task); /* avoid contributing to loadavg */ - - ret = vhost_attach_cgroups(dev); - if (ret) - goto stop_worker; - + kernel_worker_start(task, "vhost-%d", current->pid); return 0; -stop_worker: - kthread_stop(worker->task); free_worker: kfree(worker); dev->worker = NULL; diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h index 102ce25e4e13..09748694cb66 100644 --- a/drivers/vhost/vhost.h +++ b/drivers/vhost/vhost.h @@ -25,11 +25,16 @@ struct vhost_work { unsigned long flags; }; +enum { + VHOST_WORKER_FLAG_STOP, +}; + struct vhost_worker { struct task_struct *task; + struct completion *exit_done; struct ll
[PATCH V4 7/8] vhost: move worker thread fields to new struct
This is just a prep patch. It moves the worker related fields to a new vhost_worker struct and moves the code around to create some helpers that will be used in the next patches. Signed-off-by: Mike Christie Reviewed-by: Stefan Hajnoczi Acked-by: Michael S. Tsirkin --- drivers/vhost/vhost.c | 98 --- drivers/vhost/vhost.h | 11 +++-- 2 files changed, 72 insertions(+), 37 deletions(-) diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index 59edb5a1ffe2..c9a1f706989c 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -263,8 +263,8 @@ void vhost_work_queue(struct vhost_dev *dev, struct vhost_work *work) * sure it was not in the list. * test_and_set_bit() implies a memory barrier. */ - llist_add(&work->node, &dev->work_list); - wake_up_process(dev->worker); + llist_add(&work->node, &dev->worker->work_list); + wake_up_process(dev->worker->task); } } EXPORT_SYMBOL_GPL(vhost_work_queue); @@ -272,7 +272,7 @@ EXPORT_SYMBOL_GPL(vhost_work_queue); /* A lockless hint for busy polling code to exit the loop */ bool vhost_has_work(struct vhost_dev *dev) { - return !llist_empty(&dev->work_list); + return dev->worker && !llist_empty(&dev->worker->work_list); } EXPORT_SYMBOL_GPL(vhost_has_work); @@ -343,7 +343,8 @@ static void vhost_vq_reset(struct vhost_dev *dev, static int vhost_worker(void *data) { - struct vhost_dev *dev = data; + struct vhost_worker *worker = data; + struct vhost_dev *dev = worker->dev; struct vhost_work *work, *work_next; struct llist_node *node; @@ -358,7 +359,7 @@ static int vhost_worker(void *data) break; } - node = llist_del_all(&dev->work_list); + node = llist_del_all(&worker->work_list); if (!node) schedule(); @@ -368,7 +369,7 @@ static int vhost_worker(void *data) llist_for_each_entry_safe(work, work_next, node, node) { clear_bit(VHOST_WORK_QUEUED, &work->flags); __set_current_state(TASK_RUNNING); - kcov_remote_start_common(dev->kcov_handle); + kcov_remote_start_common(worker->kcov_handle); work->fn(work); kcov_remote_stop(); if (need_resched()) @@ -487,7 +488,6 @@ void vhost_dev_init(struct vhost_dev *dev, dev->byte_weight = byte_weight; dev->use_worker = use_worker; dev->msg_handler = msg_handler; - init_llist_head(&dev->work_list); init_waitqueue_head(&dev->wait); INIT_LIST_HEAD(&dev->read_list); INIT_LIST_HEAD(&dev->pending_list); @@ -579,10 +579,60 @@ static void vhost_detach_mm(struct vhost_dev *dev) dev->mm = NULL; } +static void vhost_worker_free(struct vhost_dev *dev) +{ + struct vhost_worker *worker = dev->worker; + + if (!worker) + return; + + dev->worker = NULL; + WARN_ON(!llist_empty(&worker->work_list)); + kthread_stop(worker->task); + kfree(worker); +} + +static int vhost_worker_create(struct vhost_dev *dev) +{ + struct vhost_worker *worker; + struct task_struct *task; + int ret; + + worker = kzalloc(sizeof(*worker), GFP_KERNEL_ACCOUNT); + if (!worker) + return -ENOMEM; + + dev->worker = worker; + worker->dev = dev; + worker->kcov_handle = kcov_common_handle(); + init_llist_head(&worker->work_list); + + task = kthread_create(vhost_worker, worker, "vhost-%d", current->pid); + if (IS_ERR(task)) { + ret = PTR_ERR(task); + goto free_worker; + } + + worker->task = task; + wake_up_process(task); /* avoid contributing to loadavg */ + + ret = vhost_attach_cgroups(dev); + if (ret) + goto stop_worker; + + return 0; + +stop_worker: + kthread_stop(worker->task); +free_worker: + kfree(worker); + dev->worker = NULL; + return ret; +} + /* Caller should have device mutex */ long vhost_dev_set_owner(struct vhost_dev *dev) { - struct task_struct *worker; int err; /* Is there an owner already? */ @@ -593,36 +643,21 @@ long vhost_dev_set_owner(struct vhost_dev *dev) vhost_attach_mm(dev); - dev->kcov_handle = kcov_common_handle(); if (dev->use_worker) { - worker = kthread_create(vhost_worker, dev, - "vhost-%d", current->pid); - if (IS_ERR(worker)) { - err = PTR_ERR(worker); - goto err_worker; - } - - dev->worker = worker; - wake_up_process(worker); /* avoid contributing to loadav
Re: [PATCH V4 6/8] io_uring: switch to kernel_worker
Hi Mike, Thank you for the patch! Perhaps something to improve: [auto build test WARNING on next-20211007] [cannot apply to mst-vhost/linux-next vgupta-arc/for-next arm64/for-next/core uclinux-h8/h8300-next geert-m68k/for-next openrisc/for-next deller-parisc/for-next powerpc/next s390/features linus/master v5.15-rc4 v5.15-rc3 v5.15-rc2 v5.15-rc4] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch] url: https://github.com/0day-ci/linux/commits/Mike-Christie/Use-copy_process-create_io_thread-in-vhost-layer/20211008-093610 base:f8dc23b3dc0cc5b32dfd0c446e59377736d073a7 config: hexagon-randconfig-r045-20211007 (attached as .config) compiler: clang version 14.0.0 (https://github.com/llvm/llvm-project b1a45c62f03ecbeb4544b0c65a01ee4586235a61) reproduce (this is a W=1 build): wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # https://github.com/0day-ci/linux/commit/19238ec927cb55bbd6fd6bdf64bac6a99f457b8c git remote add linux-review https://github.com/0day-ci/linux git fetch --no-tags linux-review Mike-Christie/Use-copy_process-create_io_thread-in-vhost-layer/20211008-093610 git checkout 19238ec927cb55bbd6fd6bdf64bac6a99f457b8c # save the attached .config to linux build tree COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 ARCH=hexagon If you fix the issue, kindly add following tag as appropriate Reported-by: kernel test robot All warnings (new ones prefixed by >>): kernel/fork.c:161:13: warning: no previous prototype for function 'arch_release_task_struct' [-Wmissing-prototypes] void __weak arch_release_task_struct(struct task_struct *tsk) ^ kernel/fork.c:161:1: note: declare 'static' if the function is not intended to be used outside of this translation unit void __weak arch_release_task_struct(struct task_struct *tsk) ^ static kernel/fork.c:814:20: warning: no previous prototype for function 'arch_task_cache_init' [-Wmissing-prototypes] void __init __weak arch_task_cache_init(void) { } ^ kernel/fork.c:814:1: note: declare 'static' if the function is not intended to be used outside of this translation unit void __init __weak arch_task_cache_init(void) { } ^ static kernel/fork.c:909:12: warning: no previous prototype for function 'arch_dup_task_struct' [-Wmissing-prototypes] int __weak arch_dup_task_struct(struct task_struct *dst, ^ kernel/fork.c:909:1: note: declare 'static' if the function is not intended to be used outside of this translation unit int __weak arch_dup_task_struct(struct task_struct *dst, ^ static >> kernel/fork.c:2581:21: warning: no previous prototype for function >> 'create_io_thread' [-Wmissing-prototypes] struct task_struct *create_io_thread(int (*fn)(void *), void *arg, int node) ^ kernel/fork.c:2581:1: note: declare 'static' if the function is not intended to be used outside of this translation unit struct task_struct *create_io_thread(int (*fn)(void *), void *arg, int node) ^ static 4 warnings generated. vim +/create_io_thread +2581 kernel/fork.c 13585fa0668c72 Nadav Amit2019-04-25 2574 cc440e8738e5c8 Jens Axboe2021-03-04 2575 /* cc440e8738e5c8 Jens Axboe2021-03-04 2576 * This is like kernel_clone(), but shaved down and tailored to just cc440e8738e5c8 Jens Axboe2021-03-04 2577 * creating io_uring workers. It returns a created task, or an error pointer. cc440e8738e5c8 Jens Axboe2021-03-04 2578 * The returned task is inactive, and the caller must fire it up through cc440e8738e5c8 Jens Axboe2021-03-04 2579 * wake_up_new_task(p). All signals are blocked in the created task. cc440e8738e5c8 Jens Axboe2021-03-04 2580 */ cc440e8738e5c8 Jens Axboe2021-03-04 @2581 struct task_struct *create_io_thread(int (*fn)(void *), void *arg, int node) cc440e8738e5c8 Jens Axboe2021-03-04 2582 { cc440e8738e5c8 Jens Axboe2021-03-04 2583 unsigned long flags = CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD| cc440e8738e5c8 Jens Axboe2021-03-04 2584 CLONE_IO; cc440e8738e5c8 Jens Axboe2021-03-04 2585 struct kernel_clone_args args = { cc440e8738e5c8 Jens Axboe2021-03-04 2586 .flags = ((lower_32_bits(flags) | CLONE_VM | cc440e8738e5c8 Jens Axboe2021-03-04 2587 CLONE_UNTRACED) & ~CSIGNAL), cc440e8738e5c8 Jens Axboe2021-03-04 2588 .exit_signal= (lower_32_bits(flags) & CSIGNAL), cc440e8738e5c8 Jens Axboe2021-03-04 2589 .stack