Re: [RFC]: mm,power: introduce MADV_WIPEONSUSPEND

2020-07-07 Thread Michal Hocko
On Fri 03-07-20 15:29:22, Jann Horn wrote:
> On Fri, Jul 3, 2020 at 1:30 PM Michal Hocko  wrote:
> > On Fri 03-07-20 10:34:09, Catangiu, Adrian Costin wrote:
> > > This patch adds logic to the kernel power code to zero out contents of
> > > all MADV_WIPEONSUSPEND VMAs present in the system during its transition
> > > to any suspend state equal or greater/deeper than Suspend-to-memory,
> > > known as S3.
> >
> > How does the application learn that its memory got wiped? S2disk is an
> > async operation and it can happen at any time during the task execution.
> > So how does the application work to prevent from corrupted state - e.g.
> > when suspended between two memory loads?
> 
> You can do it seqlock-style, kind of - you reserve the first byte of
> the page or so as a "is this page initialized" marker, and after every
> read from the page, you do a compiler barrier and check whether that
> byte has been cleared.

This is certainly possible yet wery awkwar interface to use IMHO.
MADV_EXTERNALY_VOLATILE would express the actual semantic much better.
I might not still understand the expected usecase but if the target
application has to be changed anyway then why not simply use a
transparent and proper signaling mechanism like poll on a fd. That would
be certainly a more natural and less error prone programming interface.

-- 
Michal Hocko
SUSE Labs
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [RFC]: mm,power: introduce MADV_WIPEONSUSPEND

2020-07-07 Thread Michal Hocko
On Fri 03-07-20 18:45:06, Colm MacCárthaigh wrote:
> 
> 
> On 3 Jul 2020, at 4:30, Michal Hocko wrote:
> 
> > On Fri 03-07-20 10:34:09, Catangiu, Adrian Costin wrote:
> > > This patch adds logic to the kernel power code to zero out contents
> > > of
> > > all MADV_WIPEONSUSPEND VMAs present in the system during its
> > > transition
> > > to any suspend state equal or greater/deeper than Suspend-to-memory,
> > > known as S3.
> > 
> > How does the application learn that its memory got wiped? S2disk is an
> > async operation and it can happen at any time during the task execution.
> > So how does the application work to prevent from corrupted state - e.g.
> > when suspended between two memory loads?
> 
> The usual trick when using MADV_WIPEONFORK, or BSD’s MAP_INHERIT_ZERO, is to
> store a guard variable in the page and to check the variable any time that
> random data is generated.

Well, MADV_WIPEONFORK is a completely different beast because the
forking is under a full control of the parent process and the
information about the fork can be forwarded to child process. It is
not like the child would reborn into a new world in the middle of the
execution.

-- 
Michal Hocko
SUSE Labs
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [RFC]: mm,power: introduce MADV_WIPEONSUSPEND

2020-07-07 Thread Michal Hocko
On Mon 06-07-20 14:52:07, Jann Horn wrote:
> On Mon, Jul 6, 2020 at 2:27 PM Alexander Graf  wrote:
> > Unless we create a vsyscall that returns both the PID as well as the
> > epoch and thus handles fork *and* suspend. I need to think about this a
> > bit more :).
> 
> You can't reliably detect forking by checking the PID if it is
> possible for multiple forks to be chained before the reuse check runs:
> 
>  - pid 1000 remembers its PID
>  - pid 1000 forks, creating child pid 1001
>  - pid 1000 exits and is waited on by init
>  - the pid allocator wraps around
>  - pid 1001 forks, creating child pid 1000
>  - child with pid 1000 tries to check for forking, determines that its
> PID is 1000, and concludes that it is still the original process

I must be really missing something here because I really fail to see why
there has to be something new even invented. Sure, checking for pid is
certainly a suboptimal solution because pids are terrible tokens to work
with. We do have a concept of file descriptors which a much better and
supports signaling. There is a clear source of the signal IIUC
(migration) and there are consumers to act upon that (e.g. crypto
backends). So what does really prevent to use a standard signal delivery
over fd for this usecase?
-- 
Michal Hocko
SUSE Labs
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [RFC]: mm,power: introduce MADV_WIPEONSUSPEND

2020-07-07 Thread Pavel Machek
Hi!

> > > > This patch adds logic to the kernel power code to zero out contents of
> > > > all MADV_WIPEONSUSPEND VMAs present in the system during its transition
> > > > to any suspend state equal or greater/deeper than Suspend-to-memory,
> > > > known as S3.
> > >
> > > How does the application learn that its memory got wiped? S2disk is an
> > > async operation and it can happen at any time during the task execution.
> > > So how does the application work to prevent from corrupted state - e.g.
> > > when suspended between two memory loads?
> > 
> > You can do it seqlock-style, kind of - you reserve the first byte of
> > the page or so as a "is this page initialized" marker, and after every
> > read from the page, you do a compiler barrier and check whether that
> > byte has been cleared.
> 
> This is certainly possible yet wery awkwar interface to use IMHO.
> MADV_EXTERNALY_VOLATILE would express the actual semantic much better.
> I might not still understand the expected usecase but if the target
> application has to be changed anyway then why not simply use a
> transparent and proper signaling mechanism like poll on a fd. That

The goal is to have cryprographically-safe get_random_number() with 0
syscalls.

You'd need to do:

   if (!poll(did_i_migrate)) {
 use_prng_seed();
 if (poll(did_i_migrate)) {
   /* oops_they_migrated_me_in_middle_of_computation,
  lets_redo_it() */
  goto retry:
 }
   }

Which means two syscalls..

Best regards,


Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


signature.asc
Description: Digital signature
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

[PATCH v4 2/2] s390: virtio: PV needs VIRTIO I/O device protection

2020-07-07 Thread Pierre Morel
S390, protecting the guest memory against unauthorized host access
needs to enforce VIRTIO I/O device protection through the use of
VIRTIO_F_VERSION_1 and VIRTIO_F_IOMMU_PLATFORM.

Signed-off-by: Pierre Morel 
---
 arch/s390/kernel/uv.c | 25 +
 1 file changed, 25 insertions(+)

diff --git a/arch/s390/kernel/uv.c b/arch/s390/kernel/uv.c
index c296e5c8dbf9..106330f6eda1 100644
--- a/arch/s390/kernel/uv.c
+++ b/arch/s390/kernel/uv.c
@@ -14,6 +14,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -413,3 +414,27 @@ static int __init uv_info_init(void)
 }
 device_initcall(uv_info_init);
 #endif
+
+/*
+ * arch_validate_virtio_iommu_platform
+ * @dev: the VIRTIO device being added
+ *
+ * Return value: returns -ENODEV if any features of the
+ *   device breaks the protected virtualization
+ *   0 otherwise.
+ */
+int arch_validate_virtio_features(struct virtio_device *dev)
+{
+   if (!virtio_has_feature(dev, VIRTIO_F_VERSION_1)) {
+   dev_warn(&dev->dev, "device must provide VIRTIO_F_VERSION_1\n");
+   return is_prot_virt_guest() ? -ENODEV : 0;
+   }
+
+   if (!virtio_has_feature(dev, VIRTIO_F_IOMMU_PLATFORM)) {
+   dev_warn(&dev->dev,
+"device must provide VIRTIO_F_IOMMU_PLATFORM\n");
+   return is_prot_virt_guest() ? -ENODEV : 0;
+   }
+
+   return 0;
+}
-- 
2.25.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH v4 1/2] virtio: let arch validate VIRTIO features

2020-07-07 Thread Pierre Morel
An architecture may need to validate the VIRTIO devices features
based on architecture specificities.

Signed-off-by: Pierre Morel 
---
 drivers/virtio/virtio.c   | 19 +++
 include/linux/virtio_config.h |  1 +
 2 files changed, 20 insertions(+)

diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
index a977e32a88f2..3179a8aa76f5 100644
--- a/drivers/virtio/virtio.c
+++ b/drivers/virtio/virtio.c
@@ -167,6 +167,21 @@ void virtio_add_status(struct virtio_device *dev, unsigned 
int status)
 }
 EXPORT_SYMBOL_GPL(virtio_add_status);
 
+/*
+ * arch_needs_virtio_iommu_platform - provide arch specific hook when 
finalizing
+ *   features for VIRTIO device dev
+ * @dev: the VIRTIO device being added
+ *
+ * Permits the platform to provide architecture specific functionality when
+ * devices features are finalized. This is the default implementation.
+ * Architecture implementations can override this.
+ */
+
+int __weak arch_validate_virtio_features(struct virtio_device *dev)
+{
+   return 0;
+}
+
 int virtio_finalize_features(struct virtio_device *dev)
 {
int ret = dev->config->finalize_features(dev);
@@ -176,6 +191,10 @@ int virtio_finalize_features(struct virtio_device *dev)
if (ret)
return ret;
 
+   ret = arch_validate_virtio_features(dev);
+   if (ret)
+   return ret;
+
if (!virtio_has_feature(dev, VIRTIO_F_VERSION_1))
return 0;
 
diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
index bb4cc4910750..3f4117adf311 100644
--- a/include/linux/virtio_config.h
+++ b/include/linux/virtio_config.h
@@ -459,4 +459,5 @@ static inline void virtio_cwrite64(struct virtio_device 
*vdev,
_r; \
})
 
+int arch_validate_virtio_features(struct virtio_device *dev);
 #endif /* _LINUX_VIRTIO_CONFIG_H */
-- 
2.25.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH v4 0/2] s390: virtio: let arch validate VIRTIO features

2020-07-07 Thread Pierre Morel
Hi all,

I changed the patch subject to reflect the content, becoming more
general.

1) I removed the ack from Christian and Jason even far as
   I understand they gave it for the functionality more than for the
   implementation.
   @Jason, @Christian, please can I get back your acked-by with these changes?

2) previous patch had another name:
   [PATCH v3 0/1] s390: virtio: let arch choose to accept devices without IOMMU 
feature
   id: Message-Id: <1592390637-17441-2-git-send-email-pmo...@linux.ibm.com>

3) The new version generalize the validation of the features by the
   architecture, making it not IOMMU_PLATFORM specific anymore inside
   virtio.c

   The architecture specific code for s390 is now testing the virtio
   features.

4) Since I reworked the patch I also moved the arch specific code
   from arch/s390/mm/init.c to arch/s390/kernel/to uv.c

5) Finaly, I splitted the patch into generic virtio and arch
   specific code.

Regards,
Pierre

Pierre Morel (2):
  virtio: let arch validate VIRTIO features
  s390: virtio: PV needs VIRTIO I/O device protection

 arch/s390/kernel/uv.c | 25 +
 drivers/virtio/virtio.c   | 19 +++
 include/linux/virtio_config.h |  1 +
 3 files changed, 45 insertions(+)

-- 
2.25.1

Changelog

to v4:

- separate virtio and arch code
  (Pierre)

- moved code from arch/s390/mm/init.c to arch/s390/kernel/uv.c
  (Heiko)

- moved validation inside the arch code
  (Connie)

- moved the call to arch validation before VIRTIO_F_1 test
  (Michael)

to v3:

- add warning
  (Connie, Christian)

- add comment
  (Connie)

- change hook name
  (Halil, Connie)

to v2:

- put the test in virtio_finalize_features()
  (Connie)

- put the test inside VIRTIO core
  (Jason)

- pass a virtio device as parameter
  (Halil)


___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [RFC]: mm,power: introduce MADV_WIPEONSUSPEND

2020-07-07 Thread Michal Hocko
On Tue 07-07-20 10:07:26, Pavel Machek wrote:
> Hi!
> 
> > > > > This patch adds logic to the kernel power code to zero out contents of
> > > > > all MADV_WIPEONSUSPEND VMAs present in the system during its 
> > > > > transition
> > > > > to any suspend state equal or greater/deeper than Suspend-to-memory,
> > > > > known as S3.
> > > >
> > > > How does the application learn that its memory got wiped? S2disk is an
> > > > async operation and it can happen at any time during the task execution.
> > > > So how does the application work to prevent from corrupted state - e.g.
> > > > when suspended between two memory loads?
> > > 
> > > You can do it seqlock-style, kind of - you reserve the first byte of
> > > the page or so as a "is this page initialized" marker, and after every
> > > read from the page, you do a compiler barrier and check whether that
> > > byte has been cleared.
> > 
> > This is certainly possible yet wery awkwar interface to use IMHO.
> > MADV_EXTERNALY_VOLATILE would express the actual semantic much better.
> > I might not still understand the expected usecase but if the target
> > application has to be changed anyway then why not simply use a
> > transparent and proper signaling mechanism like poll on a fd. That
> 
> The goal is to have cryprographically-safe get_random_number() with 0
> syscalls.
> 
> You'd need to do:
> 
>if (!poll(did_i_migrate)) {
>  use_prng_seed();
>if (poll(did_i_migrate)) {
>  /* oops_they_migrated_me_in_middle_of_computation,
> lets_redo_it() */
> goto retry:
>}
>}
> 
> Which means two syscalls..

Is this a real problem though? Do we have any actual numbers? E.g. how
often does the migration happen so that 2 syscalls would be visible in
actual workloads?
-- 
Michal Hocko
SUSE Labs
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [RFC]: mm,power: introduce MADV_WIPEONSUSPEND

2020-07-07 Thread Michal Hocko
On Tue 07-07-20 10:01:23, Alexander Graf wrote:
> On 07.07.20 09:44, Michal Hocko wrote:
> > On Mon 06-07-20 14:52:07, Jann Horn wrote:
> > > On Mon, Jul 6, 2020 at 2:27 PM Alexander Graf  wrote:
> > > > Unless we create a vsyscall that returns both the PID as well as the
> > > > epoch and thus handles fork *and* suspend. I need to think about this a
> > > > bit more :).
> > > 
> > > You can't reliably detect forking by checking the PID if it is
> > > possible for multiple forks to be chained before the reuse check runs:
> > > 
> > >   - pid 1000 remembers its PID
> > >   - pid 1000 forks, creating child pid 1001
> > >   - pid 1000 exits and is waited on by init
> > >   - the pid allocator wraps around
> > >   - pid 1001 forks, creating child pid 1000
> > >   - child with pid 1000 tries to check for forking, determines that its
> > > PID is 1000, and concludes that it is still the original process
> > 
> > I must be really missing something here because I really fail to see why
> > there has to be something new even invented. Sure, checking for pid is
> > certainly a suboptimal solution because pids are terrible tokens to work
> > with. We do have a concept of file descriptors which a much better and
> > supports signaling. There is a clear source of the signal IIUC
> > (migration) and there are consumers to act upon that (e.g. crypto
> > backends). So what does really prevent to use a standard signal delivery
> > over fd for this usecase?
> 
> I wasn't part of the discussions on why things like WIPEONFORK were invented
> instead of just using signalling mechanisms, but the main reason I can think
> of are libraries.

Well, I would argue that WIPEONFORK is conceptually different. It is
one time initialization mechanism with a very clear life time semantic.
So any programming model is really as easy as, the initial state is
always 0 for a new task without any surprises later on because you own
the memory (essentially an extension to initialized .data section on
exec to any new task).

Compare that to a completely async nature of this interface. Any read
would essentially have to be properly synchronized with the external
event otherwise the state could have been corrupted. Such a consistency
model is really cumbersome to work with.

> As a library, you are under no control of the main loop usually, which means
> you just don't have a way to poll for an fd. As a library author, I would
> usually try to avoid very hard to create such a dependency, because it makes
> it really hard to glue pieces together.
> 
> The same applies to signals btw, which would also be a possible way to
> propagate such events.

Just to clarify I didn't really mean posix signals here. Those would be
quite clumsy indeed. But I can imagine that a library registers to a
system wide means to get a notification. There are many examples for
that, including a lot of usage inside libraries. All different *bus
interfaces.

-- 
Michal Hocko
SUSE Labs
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v4 1/2] virtio: let arch validate VIRTIO features

2020-07-07 Thread Cornelia Huck
On Tue,  7 Jul 2020 10:44:36 +0200
Pierre Morel  wrote:

> An architecture may need to validate the VIRTIO devices features
> based on architecture specificities.

s/specifities/specifics/

> 
> Signed-off-by: Pierre Morel 
> ---
>  drivers/virtio/virtio.c   | 19 +++
>  include/linux/virtio_config.h |  1 +
>  2 files changed, 20 insertions(+)
> 
> diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> index a977e32a88f2..3179a8aa76f5 100644
> --- a/drivers/virtio/virtio.c
> +++ b/drivers/virtio/virtio.c
> @@ -167,6 +167,21 @@ void virtio_add_status(struct virtio_device *dev, 
> unsigned int status)
>  }
>  EXPORT_SYMBOL_GPL(virtio_add_status);
>  
> +/*
> + * arch_needs_virtio_iommu_platform - provide arch specific hook when 
> finalizing

s/arch_needs_virtio_iommu_platform/arch_validate_virtio_features/

:)

> + * features for VIRTIO device dev
> + * @dev: the VIRTIO device being added
> + *
> + * Permits the platform to provide architecture specific functionality when

s/provide architecture specific functionality/handle architecture-specific 
requirements/

?

> + * devices features are finalized. This is the default implementation.

s/devices/device/

> + * Architecture implementations can override this.
> + */
> +
> +int __weak arch_validate_virtio_features(struct virtio_device *dev)
> +{
> + return 0;
> +}
> +
>  int virtio_finalize_features(struct virtio_device *dev)
>  {
>   int ret = dev->config->finalize_features(dev);
> @@ -176,6 +191,10 @@ int virtio_finalize_features(struct virtio_device *dev)
>   if (ret)
>   return ret;
>  
> + ret = arch_validate_virtio_features(dev);
> + if (ret)
> + return ret;
> +
>   if (!virtio_has_feature(dev, VIRTIO_F_VERSION_1))
>   return 0;
>  
> diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
> index bb4cc4910750..3f4117adf311 100644
> --- a/include/linux/virtio_config.h
> +++ b/include/linux/virtio_config.h
> @@ -459,4 +459,5 @@ static inline void virtio_cwrite64(struct virtio_device 
> *vdev,
>   _r; \
>   })
>  
> +int arch_validate_virtio_features(struct virtio_device *dev);
>  #endif /* _LINUX_VIRTIO_CONFIG_H */

With the wording fixed,

Reviewed-by: Cornelia Huck 

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v4 2/2] s390: virtio: PV needs VIRTIO I/O device protection

2020-07-07 Thread Cornelia Huck
On Tue,  7 Jul 2020 10:44:37 +0200
Pierre Morel  wrote:

> S390, protecting the guest memory against unauthorized host access
> needs to enforce VIRTIO I/O device protection through the use of
> VIRTIO_F_VERSION_1 and VIRTIO_F_IOMMU_PLATFORM.

Hm... what about:

"If protected virtualization is active on s390, the virtio queues are
not accessible to the host, unless VIRTIO_F_IOMMU_PLATFORM has been
negotiated. Use the new arch_validate_virtio_features() interface to
enforce this."

> 
> Signed-off-by: Pierre Morel 
> ---
>  arch/s390/kernel/uv.c | 25 +
>  1 file changed, 25 insertions(+)
> 
> diff --git a/arch/s390/kernel/uv.c b/arch/s390/kernel/uv.c
> index c296e5c8dbf9..106330f6eda1 100644
> --- a/arch/s390/kernel/uv.c
> +++ b/arch/s390/kernel/uv.c
> @@ -14,6 +14,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -413,3 +414,27 @@ static int __init uv_info_init(void)
>  }
>  device_initcall(uv_info_init);
>  #endif
> +
> +/*
> + * arch_validate_virtio_iommu_platform

s/arch_validate_virtio_iommu_platform/arch_validate_virtio_features/

> + * @dev: the VIRTIO device being added
> + *
> + * Return value: returns -ENODEV if any features of the
> + *   device breaks the protected virtualization
> + *   0 otherwise.

I don't think you need to specify the contract here: that belongs to
the definition in the virtio core. What about simply adding a sentence
"Return an error if required features are missing on a guest running
with protected virtualization." ?

> + */
> +int arch_validate_virtio_features(struct virtio_device *dev)
> +{

Maybe jump out immediately if the guest is not protected?

> + if (!virtio_has_feature(dev, VIRTIO_F_VERSION_1)) {
> + dev_warn(&dev->dev, "device must provide VIRTIO_F_VERSION_1\n");
> + return is_prot_virt_guest() ? -ENODEV : 0;
> + }
> +
> + if (!virtio_has_feature(dev, VIRTIO_F_IOMMU_PLATFORM)) {
> + dev_warn(&dev->dev,
> +  "device must provide VIRTIO_F_IOMMU_PLATFORM\n");
> + return is_prot_virt_guest() ? -ENODEV : 0;
> + }

if (!is_prot_virt_guest())
return 0;

if (!virtio_has_feature(dev, VIRTIO_F_VERSION_1)) {
dev_warn(&dev->dev,
 "legacy virtio is incompatible with protected guests");
return -ENODEV;
}

if (!virtio_has_feature(dev, VIRTIO_F_IOMMU_PLATFORM)) {
dev_warn(&dev->dev,
 "device does not work with limited memory access in protected 
guests");
return -ENODEV;
}

> +
> + return 0;
> +}

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [RFC PATCH 00/22] Enhance VHOST to enable SoC-to-SoC communication

2020-07-07 Thread Jason Wang


On 2020/7/6 下午5:32, Kishon Vijay Abraham I wrote:

Hi Jason,

On 7/3/2020 12:46 PM, Jason Wang wrote:

On 2020/7/2 下午9:35, Kishon Vijay Abraham I wrote:

Hi Jason,

On 7/2/2020 3:40 PM, Jason Wang wrote:

On 2020/7/2 下午5:51, Michael S. Tsirkin wrote:

On Thu, Jul 02, 2020 at 01:51:21PM +0530, Kishon Vijay Abraham I wrote:

This series enhances Linux Vhost support to enable SoC-to-SoC
communication over MMIO. This series enables rpmsg communication between
two SoCs using both PCIe RC<->EP and HOST1-NTB-HOST2

1) Modify vhost to use standard Linux driver model
2) Add support in vring to access virtqueue over MMIO
3) Add vhost client driver for rpmsg
4) Add PCIe RC driver (uses virtio) and PCIe EP driver (uses vhost) for
  rpmsg communication between two SoCs connected to each other
5) Add NTB Virtio driver and NTB Vhost driver for rpmsg communication
  between two SoCs connected via NTB
6) Add configfs to configure the components

UseCase1 :

    VHOST RPMSG VIRTIO RPMSG
     +   +
     |   |
     |   |
     |   |
     |   |
+-v--+ +--v---+
|   Linux    | | Linux    |
|  Endpoint  | | Root Complex |
|    <->  |
|    | |  |
|    SOC1    | | SOC2 |
++ +--+

UseCase 2:

    VHOST RPMSG  VIRTIO RPMSG
     + +
     | |
     | |
     | |
     | |
  +--v--+   +--v--+
  | |   | |
  |    HOST1    |   |    HOST2    |
  | |   | |
  +--^--+   +--^--+
     | |
     | |
+-+
|  +--v--+   +--v--+  |
|  | |   | |  |
|  | EP  |   | EP  |  |
|  | CONTROLLER1 |   | CONTROLLER2 |  |
|  | <---> |  |
|  | |   | |  |
|  | |   | |  |
|  | |  SoC With Multiple EP Instances   | |  |
|  | |  (Configured using NTB Function)  | |  |
|  +-+   +-+  |
+-+

Software Layering:

The high-level SW layering should look something like below. This series
adds support only for RPMSG VHOST, however something similar should be
done for net and scsi. With that any vhost device (PCI, NTB, Platform
device, user) can use any of the vhost client driver.


   ++  +---+  ++  +--+
   |  RPMSG VHOST   |  | NET VHOST |  | SCSI VHOST |  |    X |
   +---^+  +-^-+  +-^--+  +^-+
   | |  |  |
   | |  |  |
   | |  |  |
+---v-v--v--v--+
|    VHOST CORE    |
+^---^^--^-+
    |   |    |  |
    |   |    |  |
    |   |    |  |
+v---+  +v--+  +--v--+  +v-+
|  PCI EPF VHOST |  | NTB VHOST |  |PLATFORM DEVICE VHOST|  |    X |
++  +---+  +-+  +--+

This was initially proposed here [1]

[1] -> https://lore.kernel.org/r/2cf00ec4-1ed6-f66e-6897-006d1a5b6...@ti.com

I find this very interesting. A huge patchset so will take a bit
to review, but I certainly plan to do that. Thanks!

Yes, it would be better if there's a git branch for us to have a look.

I've pushed the br

Re: [PATCH v4 1/2] virtio: let arch validate VIRTIO features

2020-07-07 Thread Pierre Morel




On 2020-07-07 11:26, Cornelia Huck wrote:

On Tue,  7 Jul 2020 10:44:36 +0200
Pierre Morel  wrote:


An architecture may need to validate the VIRTIO devices features
based on architecture specificities.


s/specifities/specifics/


OK





Signed-off-by: Pierre Morel 
---
  drivers/virtio/virtio.c   | 19 +++
  include/linux/virtio_config.h |  1 +
  2 files changed, 20 insertions(+)

diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
index a977e32a88f2..3179a8aa76f5 100644
--- a/drivers/virtio/virtio.c
+++ b/drivers/virtio/virtio.c
@@ -167,6 +167,21 @@ void virtio_add_status(struct virtio_device *dev, unsigned 
int status)
  }
  EXPORT_SYMBOL_GPL(virtio_add_status);
  
+/*

+ * arch_needs_virtio_iommu_platform - provide arch specific hook when 
finalizing


s/arch_needs_virtio_iommu_platform/arch_validate_virtio_features/

:)


grrr... yes.




+ *   features for VIRTIO device dev
+ * @dev: the VIRTIO device being added
+ *
+ * Permits the platform to provide architecture specific functionality when


s/provide architecture specific functionality/handle architecture-specific 
requirements/

?


better, thanks.




+ * devices features are finalized. This is the default implementation.


s/devices/device/


yes.




+ * Architecture implementations can override this.
+ */
+
+int __weak arch_validate_virtio_features(struct virtio_device *dev)
+{
+   return 0;
+}
+
  int virtio_finalize_features(struct virtio_device *dev)
  {
int ret = dev->config->finalize_features(dev);
@@ -176,6 +191,10 @@ int virtio_finalize_features(struct virtio_device *dev)
if (ret)
return ret;
  
+	ret = arch_validate_virtio_features(dev);

+   if (ret)
+   return ret;
+
if (!virtio_has_feature(dev, VIRTIO_F_VERSION_1))
return 0;
  
diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h

index bb4cc4910750..3f4117adf311 100644
--- a/include/linux/virtio_config.h
+++ b/include/linux/virtio_config.h
@@ -459,4 +459,5 @@ static inline void virtio_cwrite64(struct virtio_device 
*vdev,
_r; \
})
  
+int arch_validate_virtio_features(struct virtio_device *dev);

  #endif /* _LINUX_VIRTIO_CONFIG_H */


With the wording fixed,

Reviewed-by: Cornelia Huck 



Thanks for the review.

regards,
Pierre


--
Pierre Morel
IBM Lab Boeblingen
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v4 2/2] s390: virtio: PV needs VIRTIO I/O device protection

2020-07-07 Thread Pierre Morel




On 2020-07-07 11:46, Cornelia Huck wrote:

On Tue,  7 Jul 2020 10:44:37 +0200
Pierre Morel  wrote:


S390, protecting the guest memory against unauthorized host access
needs to enforce VIRTIO I/O device protection through the use of
VIRTIO_F_VERSION_1 and VIRTIO_F_IOMMU_PLATFORM.


Hm... what about:

"If protected virtualization is active on s390, the virtio queues are
not accessible to the host, unless VIRTIO_F_IOMMU_PLATFORM has been
negotiated. Use the new arch_validate_virtio_features() interface to
enforce this."


Yes, thanks.






Signed-off-by: Pierre Morel 
---
  arch/s390/kernel/uv.c | 25 +
  1 file changed, 25 insertions(+)

diff --git a/arch/s390/kernel/uv.c b/arch/s390/kernel/uv.c
index c296e5c8dbf9..106330f6eda1 100644
--- a/arch/s390/kernel/uv.c
+++ b/arch/s390/kernel/uv.c
@@ -14,6 +14,7 @@
  #include 
  #include 
  #include 
+#include 
  #include 
  #include 
  #include 
@@ -413,3 +414,27 @@ static int __init uv_info_init(void)
  }
  device_initcall(uv_info_init);
  #endif
+
+/*
+ * arch_validate_virtio_iommu_platform


s/arch_validate_virtio_iommu_platform/arch_validate_virtio_features/


+ * @dev: the VIRTIO device being added
+ *
+ * Return value: returns -ENODEV if any features of the
+ *   device breaks the protected virtualization
+ *   0 otherwise.


I don't think you need to specify the contract here: that belongs to
the definition in the virtio core. What about simply adding a sentence
"Return an error if required features are missing on a guest running
with protected virtualization." ?


OK, right.




+ */
+int arch_validate_virtio_features(struct virtio_device *dev)
+{


Maybe jump out immediately if the guest is not protected?


+   if (!virtio_has_feature(dev, VIRTIO_F_VERSION_1)) {
+   dev_warn(&dev->dev, "device must provide VIRTIO_F_VERSION_1\n");
+   return is_prot_virt_guest() ? -ENODEV : 0;
+   }
+
+   if (!virtio_has_feature(dev, VIRTIO_F_IOMMU_PLATFORM)) {
+   dev_warn(&dev->dev,
+"device must provide VIRTIO_F_IOMMU_PLATFORM\n");
+   return is_prot_virt_guest() ? -ENODEV : 0;
+   }


if (!is_prot_virt_guest())
return 0;

if (!virtio_has_feature(dev, VIRTIO_F_VERSION_1)) {
dev_warn(&dev->dev,
  "legacy virtio is incompatible with protected guests");
return -ENODEV;
}

if (!virtio_has_feature(dev, VIRTIO_F_IOMMU_PLATFORM)) {
dev_warn(&dev->dev,
 "device does not work with limited memory access in protected 
guests");
return -ENODEV;
}


Yes, easier to read.

Thanks,
Pierre


--
Pierre Morel
IBM Lab Boeblingen
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v4 1/2] virtio: let arch validate VIRTIO features

2020-07-07 Thread Christian Borntraeger



On 07.07.20 11:26, Cornelia Huck wrote:
> On Tue,  7 Jul 2020 10:44:36 +0200
> Pierre Morel  wrote:
> 
>> An architecture may need to validate the VIRTIO devices features
>> based on architecture specificities.
> 
> s/specifities/specifics/
> 
>>
>> Signed-off-by: Pierre Morel 
>> ---
>>  drivers/virtio/virtio.c   | 19 +++
>>  include/linux/virtio_config.h |  1 +
>>  2 files changed, 20 insertions(+)
>>
>> diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
>> index a977e32a88f2..3179a8aa76f5 100644
>> --- a/drivers/virtio/virtio.c
>> +++ b/drivers/virtio/virtio.c
>> @@ -167,6 +167,21 @@ void virtio_add_status(struct virtio_device *dev, 
>> unsigned int status)
>>  }
>>  EXPORT_SYMBOL_GPL(virtio_add_status);
>>  
>> +/*
>> + * arch_needs_virtio_iommu_platform - provide arch specific hook when 
>> finalizing
> 
> s/arch_needs_virtio_iommu_platform/arch_validate_virtio_features/

With the things from Conny fixed,

Acked-by: Christian Borntraeger 
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v4 2/2] s390: virtio: PV needs VIRTIO I/O device protection

2020-07-07 Thread Christian Borntraeger



On 07.07.20 10:44, Pierre Morel wrote:
> S390, protecting the guest memory against unauthorized host access
> needs to enforce VIRTIO I/O device protection through the use of
> VIRTIO_F_VERSION_1 and VIRTIO_F_IOMMU_PLATFORM.
> 
> Signed-off-by: Pierre Morel 
> ---
>  arch/s390/kernel/uv.c | 25 +
>  1 file changed, 25 insertions(+)
> 
> diff --git a/arch/s390/kernel/uv.c b/arch/s390/kernel/uv.c
> index c296e5c8dbf9..106330f6eda1 100644
> --- a/arch/s390/kernel/uv.c
> +++ b/arch/s390/kernel/uv.c
> @@ -14,6 +14,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -413,3 +414,27 @@ static int __init uv_info_init(void)
>  }
>  device_initcall(uv_info_init);
>  #endif
> +
> +/*
> + * arch_validate_virtio_iommu_platform
> + * @dev: the VIRTIO device being added
> + *
> + * Return value: returns -ENODEV if any features of the
> + *   device breaks the protected virtualization
> + *   0 otherwise.
> + */
> +int arch_validate_virtio_features(struct virtio_device *dev)
> +{
> + if (!virtio_has_feature(dev, VIRTIO_F_VERSION_1)) {
> + dev_warn(&dev->dev, "device must provide VIRTIO_F_VERSION_1\n");

I think you only want to warn if is_prot_virt_guest is true? We certainly
want to be able to run as a guest of older hypervisors with virtio 0.95, no?


> + return is_prot_virt_guest() ? -ENODEV : 0;
> + }
> +
> + if (!virtio_has_feature(dev, VIRTIO_F_IOMMU_PLATFORM)) {
> + dev_warn(&dev->dev,
> +  "device must provide VIRTIO_F_IOMMU_PLATFORM\n");

same here. 
> + return is_prot_virt_guest() ? -ENODEV : 0;
> + }
> +
> + return 0;
> +}
> 
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v4 2/2] s390: virtio: PV needs VIRTIO I/O device protection

2020-07-07 Thread Michael S. Tsirkin
On Tue, Jul 07, 2020 at 11:46:33AM +0200, Cornelia Huck wrote:
> On Tue,  7 Jul 2020 10:44:37 +0200
> Pierre Morel  wrote:
> 
> > S390, protecting the guest memory against unauthorized host access
> > needs to enforce VIRTIO I/O device protection through the use of
> > VIRTIO_F_VERSION_1 and VIRTIO_F_IOMMU_PLATFORM.
> 
> Hm... what about:
> 
> "If protected virtualization is active on s390, the virtio queues are
> not accessible to the host, unless VIRTIO_F_IOMMU_PLATFORM has been
> negotiated. Use the new arch_validate_virtio_features() interface to
> enforce this."

s/enforce this/fail probe if that's not the case, preventing a host error on 
access attempt/



> > 
> > Signed-off-by: Pierre Morel 
> > ---
> >  arch/s390/kernel/uv.c | 25 +
> >  1 file changed, 25 insertions(+)
> > 
> > diff --git a/arch/s390/kernel/uv.c b/arch/s390/kernel/uv.c
> > index c296e5c8dbf9..106330f6eda1 100644
> > --- a/arch/s390/kernel/uv.c
> > +++ b/arch/s390/kernel/uv.c
> > @@ -14,6 +14,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  #include 
> >  #include 
> >  #include 
> > @@ -413,3 +414,27 @@ static int __init uv_info_init(void)
> >  }
> >  device_initcall(uv_info_init);
> >  #endif
> > +
> > +/*
> > + * arch_validate_virtio_iommu_platform
> 
> s/arch_validate_virtio_iommu_platform/arch_validate_virtio_features/
> 
> > + * @dev: the VIRTIO device being added
> > + *
> > + * Return value: returns -ENODEV if any features of the
> > + *   device breaks the protected virtualization
> > + *   0 otherwise.
> 
> I don't think you need to specify the contract here: that belongs to
> the definition in the virtio core. What about simply adding a sentence
> "Return an error if required features are missing on a guest running
> with protected virtualization." ?
> 
> > + */
> > +int arch_validate_virtio_features(struct virtio_device *dev)
> > +{
> 
> Maybe jump out immediately if the guest is not protected?
> 
> > +   if (!virtio_has_feature(dev, VIRTIO_F_VERSION_1)) {
> > +   dev_warn(&dev->dev, "device must provide VIRTIO_F_VERSION_1\n");
> > +   return is_prot_virt_guest() ? -ENODEV : 0;
> > +   }
> > +
> > +   if (!virtio_has_feature(dev, VIRTIO_F_IOMMU_PLATFORM)) {
> > +   dev_warn(&dev->dev,
> > +"device must provide VIRTIO_F_IOMMU_PLATFORM\n");
> > +   return is_prot_virt_guest() ? -ENODEV : 0;
> > +   }
> 
> if (!is_prot_virt_guest())
>   return 0;
> 
> if (!virtio_has_feature(dev, VIRTIO_F_VERSION_1)) {
>   dev_warn(&dev->dev,
>  "legacy virtio is incompatible with protected guests");
>   return -ENODEV;
> }
> 
> if (!virtio_has_feature(dev, VIRTIO_F_IOMMU_PLATFORM)) {
>   dev_warn(&dev->dev,
>"device does not work with limited memory access in protected 
> guests");
>   return -ENODEV;
> }
> 
> > +
> > +   return 0;
> > +}

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v4 2/2] s390: virtio: PV needs VIRTIO I/O device protection

2020-07-07 Thread Pierre Morel




On 2020-07-07 13:12, Christian Borntraeger wrote:



On 07.07.20 10:44, Pierre Morel wrote:

S390, protecting the guest memory against unauthorized host access
needs to enforce VIRTIO I/O device protection through the use of
VIRTIO_F_VERSION_1 and VIRTIO_F_IOMMU_PLATFORM.

Signed-off-by: Pierre Morel 
---
  arch/s390/kernel/uv.c | 25 +
  1 file changed, 25 insertions(+)

diff --git a/arch/s390/kernel/uv.c b/arch/s390/kernel/uv.c
index c296e5c8dbf9..106330f6eda1 100644
--- a/arch/s390/kernel/uv.c
+++ b/arch/s390/kernel/uv.c
@@ -14,6 +14,7 @@
  #include 
  #include 
  #include 
+#include 
  #include 
  #include 
  #include 
@@ -413,3 +414,27 @@ static int __init uv_info_init(void)
  }
  device_initcall(uv_info_init);
  #endif
+
+/*
+ * arch_validate_virtio_iommu_platform
+ * @dev: the VIRTIO device being added
+ *
+ * Return value: returns -ENODEV if any features of the
+ *   device breaks the protected virtualization
+ *   0 otherwise.
+ */
+int arch_validate_virtio_features(struct virtio_device *dev)
+{
+   if (!virtio_has_feature(dev, VIRTIO_F_VERSION_1)) {
+   dev_warn(&dev->dev, "device must provide VIRTIO_F_VERSION_1\n");


I think you only want to warn if is_prot_virt_guest is true? We certainly
want to be able to run as a guest of older hypervisors with virtio 0.95, no?


clear, yes.
I will first check for PV as Connie sugested.





+   return is_prot_virt_guest() ? -ENODEV : 0;
+   }
+
+   if (!virtio_has_feature(dev, VIRTIO_F_IOMMU_PLATFORM)) {
+   dev_warn(&dev->dev,
+"device must provide VIRTIO_F_IOMMU_PLATFORM\n");


same here.


Yes,
Thanks,

Pierre

--
Pierre Morel
IBM Lab Boeblingen
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v4 1/2] virtio: let arch validate VIRTIO features

2020-07-07 Thread Pierre Morel




On 2020-07-07 13:09, Christian Borntraeger wrote:



On 07.07.20 11:26, Cornelia Huck wrote:

On Tue,  7 Jul 2020 10:44:36 +0200
Pierre Morel  wrote:


An architecture may need to validate the VIRTIO devices features
based on architecture specificities.


s/specifities/specifics/


yes





Signed-off-by: Pierre Morel 
---
  drivers/virtio/virtio.c   | 19 +++
  include/linux/virtio_config.h |  1 +
  2 files changed, 20 insertions(+)

diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
index a977e32a88f2..3179a8aa76f5 100644
--- a/drivers/virtio/virtio.c
+++ b/drivers/virtio/virtio.c
@@ -167,6 +167,21 @@ void virtio_add_status(struct virtio_device *dev, unsigned 
int status)
  }
  EXPORT_SYMBOL_GPL(virtio_add_status);
  
+/*

+ * arch_needs_virtio_iommu_platform - provide arch specific hook when 
finalizing


s/arch_needs_virtio_iommu_platform/arch_validate_virtio_features/


With the things from Conny fixed,

Acked-by: Christian Borntraeger 



Thanks,
Pierre

--
Pierre Morel
IBM Lab Boeblingen
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v4 2/2] s390: virtio: PV needs VIRTIO I/O device protection

2020-07-07 Thread Pierre Morel




On 2020-07-07 13:14, Michael S. Tsirkin wrote:

On Tue, Jul 07, 2020 at 11:46:33AM +0200, Cornelia Huck wrote:

On Tue,  7 Jul 2020 10:44:37 +0200
Pierre Morel  wrote:


S390, protecting the guest memory against unauthorized host access
needs to enforce VIRTIO I/O device protection through the use of
VIRTIO_F_VERSION_1 and VIRTIO_F_IOMMU_PLATFORM.


Hm... what about:

"If protected virtualization is active on s390, the virtio queues are
not accessible to the host, unless VIRTIO_F_IOMMU_PLATFORM has been
negotiated. Use the new arch_validate_virtio_features() interface to
enforce this."


s/enforce this/fail probe if that's not the case, preventing a host error on 
access attempt/



yes, more complete, thanks.

regards,
Pierre

--
Pierre Morel
IBM Lab Boeblingen
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v4 2/2] s390: virtio: PV needs VIRTIO I/O device protection

2020-07-07 Thread Halil Pasic
On Tue, 7 Jul 2020 12:38:17 +0200
Pierre Morel  wrote:

> 
> 
> On 2020-07-07 11:46, Cornelia Huck wrote:
> > On Tue,  7 Jul 2020 10:44:37 +0200
> > Pierre Morel  wrote:
> > 
> >> S390, protecting the guest memory against unauthorized host access
> >> needs to enforce VIRTIO I/O device protection through the use of
> >> VIRTIO_F_VERSION_1 and VIRTIO_F_IOMMU_PLATFORM.
> > 
> > Hm... what about:
> > 
> > "If protected virtualization is active on s390, the virtio queues are
> > not accessible to the host, unless VIRTIO_F_IOMMU_PLATFORM has been
> > negotiated. Use the new arch_validate_virtio_features() interface to
> > enforce this."
> 
> Yes, thanks.
> 
> 
> > 
> >>
> >> Signed-off-by: Pierre Morel 
> >> ---
> >>   arch/s390/kernel/uv.c | 25 +

Is this the right place to put this stuff? This file seems to be about
implementing the interface for interacting with the ultravisor. I would
rather expect something like arch/s390/kernel/virtio.c 

Should we ever get arch hooks for balloon those could go in
arch/s390/kernel/virtio.c as well.

> >>   1 file changed, 25 insertions(+)
> >>
> >> diff --git a/arch/s390/kernel/uv.c b/arch/s390/kernel/uv.c
> >> index c296e5c8dbf9..106330f6eda1 100644
> >> --- a/arch/s390/kernel/uv.c
> >> +++ b/arch/s390/kernel/uv.c
> >> @@ -14,6 +14,7 @@
> >>   #include 
> >>   #include 
> >>   #include 
> >> +#include 
> >>   #include 
> >>   #include 
> >>   #include 
> >> @@ -413,3 +414,27 @@ static int __init uv_info_init(void)
> >>   }
> >>   device_initcall(uv_info_init);
> >>   #endif
> >> +
> >> +/*
> >> + * arch_validate_virtio_iommu_platform
> > 
> > s/arch_validate_virtio_iommu_platform/arch_validate_virtio_features/
> > 
> >> + * @dev: the VIRTIO device being added
> >> + *
> >> + * Return value: returns -ENODEV if any features of the
> >> + *   device breaks the protected virtualization
> >> + *   0 otherwise.
> > 
> > I don't think you need to specify the contract here: that belongs to
> > the definition in the virtio core. What about simply adding a sentence
> > "Return an error if required features are missing on a guest running
> > with protected virtualization." ?
> 
> OK, right.
> 
> > 
> >> + */
> >> +int arch_validate_virtio_features(struct virtio_device *dev)
> >> +{
> > 
> > Maybe jump out immediately if the guest is not protected?
> > 
> >> +  if (!virtio_has_feature(dev, VIRTIO_F_VERSION_1)) {
> >> +  dev_warn(&dev->dev, "device must provide VIRTIO_F_VERSION_1\n");
> >> +  return is_prot_virt_guest() ? -ENODEV : 0;
> >> +  }
> >> +
> >> +  if (!virtio_has_feature(dev, VIRTIO_F_IOMMU_PLATFORM)) {
> >> +  dev_warn(&dev->dev,
> >> +   "device must provide VIRTIO_F_IOMMU_PLATFORM\n");
> >> +  return is_prot_virt_guest() ? -ENODEV : 0;
> >> +  }
> > 
> > if (!is_prot_virt_guest())
> > return 0;
> > 
> > if (!virtio_has_feature(dev, VIRTIO_F_VERSION_1)) {
> > dev_warn(&dev->dev,
> >   "legacy virtio is incompatible with protected guests");
> > return -ENODEV;
> > }
> > 
> > if (!virtio_has_feature(dev, VIRTIO_F_IOMMU_PLATFORM)) {
> > dev_warn(&dev->dev,
> >  "device does not work with limited memory access in protected 
> > guests");
> > return -ENODEV;
> > }
> 
> Yes, easier to read.
> 

Not only easier to read but does not produce warnings
if !is_prot_virt_guest(). I strongly prefer the variant proposed by
Connie.

Otherwise LGTM.

Regards,
Halil
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [RFC]: mm,power: introduce MADV_WIPEONSUSPEND

2020-07-07 Thread Pavel Machek
Hi!

> > > > You can do it seqlock-style, kind of - you reserve the first byte of
> > > > the page or so as a "is this page initialized" marker, and after every
> > > > read from the page, you do a compiler barrier and check whether that
> > > > byte has been cleared.
> > > 
> > > This is certainly possible yet wery awkwar interface to use IMHO.
> > > MADV_EXTERNALY_VOLATILE would express the actual semantic much better.
> > > I might not still understand the expected usecase but if the target
> > > application has to be changed anyway then why not simply use a
> > > transparent and proper signaling mechanism like poll on a fd. That
> > 
> > The goal is to have cryprographically-safe get_random_number() with 0
> > syscalls.
> > 
> > You'd need to do:
> > 
> >if (!poll(did_i_migrate)) {
> >  use_prng_seed();
> >  if (poll(did_i_migrate)) {
> >/* oops_they_migrated_me_in_middle_of_computation,
> >   lets_redo_it() */
> >   goto retry:
> >  }
> >}
> > 
> > Which means two syscalls..
> 
> Is this a real problem though? Do we have any actual numbers? E.g. how
> often does the migration happen so that 2 syscalls would be visible in
> actual workloads?

Please go through the thread and try to understand it.

You'd need syscalls per get_randomness(), not per migration.

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


signature.asc
Description: Digital signature
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH 18/18] arm64: lto: Strengthen READ_ONCE() to acquire when CLANG_LTO=y

2020-07-07 Thread Nick Desaulniers via Virtualization
I'm trying to put together a Micro Conference for Linux Plumbers
conference focused on "make LLVM slightly less shitty."  Do you all
plan on attending the conference? Would it be worthwhile to hold a
session focused on discussing this (LTO and memory models) be
worthwhile?


On Tue, Jul 7, 2020 at 3:51 PM Paul E. McKenney  wrote:
>
> On Tue, Jul 07, 2020 at 11:29:15AM +0100, Dave Martin wrote:
> > On Mon, Jul 06, 2020 at 10:36:28AM -0700, Paul E. McKenney wrote:
> > > On Mon, Jul 06, 2020 at 06:05:57PM +0100, Dave Martin wrote:
>
> [ . . . ]
>
> > > > The underlying problem here seems to be that the necessary ordering
> > > > rule is not part of what passes for the C memory model prior to C11.
> > > > If we want to control the data flow, don't we have to wrap the entire
> > > > dereference in a macro?
> > >
> > > Yes, exactly.  Because we are relying on things that are not guaranteed
> > > by the C memory model, we need to pay attention to the implementations.
> > > As I have said elsewhere, the price of control dependencies is eternal
> > > vigilance.
> > >
> > > And this also applies, to a lesser extent, to address and data
> > > dependencies, which are also not well supported by the C standard.
> > >
> > > There is one important case in which the C memory model -does- support
> > > control dependencies, and that is when the dependent write is a normal
> > > C-language write that is not involved in a data race.  In that case,
> > > if the compiler broke the control dependency, it might have introduced
> > > a data race, which it is forbidden to do.  However, this rule can also
> > > be broken when the compiler knows too much, as it might be able to prove
> > > that breaking the dependency won't introduce a data race.  In that case,
> > > according to the standard, it is free to break the dependency.
> >
> > Which only matters because the C abstract machine may not match reality.
> >
> > LTO has no bearing on the abstract machine though.
> >
> > If specific compiler options etc. can be added to inhibit the
> > problematic optimisations, that would be ideal.  I guess that can't
> > happen overnight though.
>
> Sadly, I must agree.
>
> > > > > > > We likely won't realise if/when this goes wrong, other than 
> > > > > > > impossible to
> > > > > > > debug, subtle breakage that crops up seemingly randomly. Ideally, 
> > > > > > > we'd be
> > > > > > > able to detect this sort of thing happening at build time, and 
> > > > > > > perhaps
> > > > > > > even prevent it with compiler options or annotations, but none of 
> > > > > > > that is
> > > > > > > close to being available and I'm keen to progress the LTO patches 
> > > > > > > in the
> > > > > > > meantime because they are a requirement for CFI.
> > > > > >
> > > > > > My concern was not so much why LTO makes things dangerous, as why 
> > > > > > !LTO
> > > > > > makes things safe...
> > > > >
> > > > > Because ignorant compilers are safe compilers!  ;-)
> > > >
> > > > AFAICT ignorance is no gurantee of ordering in general -- the compiler
> > > > is free to speculatively invent knowledge any place that the language
> > > > spec allows it to.  !LTO doesn't stop this happening.
> > >
> > > Agreed, according to the standard, the compiler has great freedom.
> > >
> > > We have two choices: (1) Restrict ourselves to live within the confines of
> > > the standard or (2) Pay continued close attention to the implementation.
> > > We have made different choices at different times, but for many ordering
> > > situations we have gone with door #2.
> > >
> > > Me, I have been working to get the standard to better support our
> > > use case.  This is at best slow going.  But don't take my word for it,
> > > ask Will.
> >
> > I can believe it.  They want to enable optimisations rather than prevent
> > them...
>
> Right in one!  ;-)
>
> > > > Hopefully some of the knowledge I invented in my reply is valid...
> > >
> > > It is.  It is just that there are multiple valid strategies, and the
> > > Linux kernel is currently taking a mixed-strategy approach.
> >
> > Ack.  The hope that there is a correct way to fix everything dies
> > hard ;)
>
> Either that, or one slowly degrades ones definition of "correct".  :-/
>
> > Life was cosier before I started trying to reason about language specs.
>
> Same here!
>
> Thanx, Paul



-- 
Thanks,
~Nick Desaulniers
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v3 0/6] powerpc: queued spinlocks and rwlocks

2020-07-07 Thread Waiman Long

On 7/7/20 1:57 AM, Nicholas Piggin wrote:

Yes, powerpc could certainly get more performance out of the slow
paths, and then there are a few parameters to tune.

We don't have a good alternate patching for function calls yet, but
that would be something to do for native vs pv.

And then there seem to be one or two tunable parameters we could
experiment with.

The paravirt locks may need a bit more tuning. Some simple testing
under KVM shows we might be a bit slower in some cases. Whether this
is fairness or something else I'm not sure. The current simple pv
spinlock code can do a directed yield to the lock holder CPU, whereas
the pv qspl here just does a general yield. I think we might actually
be able to change that to also support directed yield. Though I'm
not sure if this is actually the cause of the slowdown yet.


Regarding the paravirt lock, I have taken a further look into the 
current PPC spinlock code. There is an equivalent of pv_wait() but no 
pv_kick(). Maybe PPC doesn't really need that. Attached are two 
additional qspinlock patches that adds a CONFIG_PARAVIRT_QSPINLOCKS_LITE 
option to not require pv_kick(). There is also a fixup patch to be 
applied after your patchset.


I don't have access to a PPC LPAR with shared processor at the moment, 
so I can't test the performance of the paravirt code. Would you mind 
adding my patches and do some performance test on your end to see if it 
gives better result?


Thanks,
Longman

>From 161e545523a7eb4c42c145c04e9a5a15903ba3d9 Mon Sep 17 00:00:00 2001
From: Waiman Long 
Date: Tue, 7 Jul 2020 20:46:51 -0400
Subject: [PATCH 1/9] locking/pvqspinlock: Code relocation and extraction

Move pv_kick_node() and the unlock functions up and extract out the hash
and lock code from pv_wait_head_or_lock() into pv_hash_lock(). There
is no functional change.

Signed-off-by: Waiman Long 
---
 kernel/locking/qspinlock_paravirt.h | 302 ++--
 1 file changed, 156 insertions(+), 146 deletions(-)

diff --git a/kernel/locking/qspinlock_paravirt.h b/kernel/locking/qspinlock_paravirt.h
index e84d21aa0722..8eec58320b85 100644
--- a/kernel/locking/qspinlock_paravirt.h
+++ b/kernel/locking/qspinlock_paravirt.h
@@ -55,6 +55,7 @@ struct pv_node {
 
 /*
  * Hybrid PV queued/unfair lock
+ * 
  *
  * By replacing the regular queued_spin_trylock() with the function below,
  * it will be called once when a lock waiter enter the PV slowpath before
@@ -259,6 +260,156 @@ static struct pv_node *pv_unhash(struct qspinlock *lock)
 	BUG();
 }
 
+/*
+ * Insert lock into hash and set _Q_SLOW_VAL.
+ * Return true if lock acquired.
+ */
+static inline bool pv_hash_lock(struct qspinlock *lock, struct pv_node *node)
+{
+	struct qspinlock **lp = pv_hash(lock, node);
+
+	/*
+	 * We must hash before setting _Q_SLOW_VAL, such that
+	 * when we observe _Q_SLOW_VAL in __pv_queued_spin_unlock()
+	 * we'll be sure to be able to observe our hash entry.
+	 *
+	 *   [S]  [Rmw] l->locked == _Q_SLOW_VAL
+	 *   MB   RMB
+	 * [RmW] l->locked = _Q_SLOW_VAL  [L] 
+	 *
+	 * Matches the smp_rmb() in __pv_queued_spin_unlock().
+	 */
+	if (xchg(&lock->locked, _Q_SLOW_VAL) == 0) {
+		/*
+		 * The lock was free and now we own the lock.
+		 * Change the lock value back to _Q_LOCKED_VAL
+		 * and unhash the table.
+		 */
+		WRITE_ONCE(lock->locked, _Q_LOCKED_VAL);
+		WRITE_ONCE(*lp, NULL);
+		return true;
+	}
+	return false;
+}
+
+/*
+ * Called after setting next->locked = 1 when we're the lock owner.
+ *
+ * Instead of waking the waiters stuck in pv_wait_node() advance their state
+ * such that they're waiting in pv_wait_head_or_lock(), this avoids a
+ * wake/sleep cycle.
+ */
+static void pv_kick_node(struct qspinlock *lock, struct mcs_spinlock *node)
+{
+	struct pv_node *pn = (struct pv_node *)node;
+
+	/*
+	 * If the vCPU is indeed halted, advance its state to match that of
+	 * pv_wait_node(). If OTOH this fails, the vCPU was running and will
+	 * observe its next->locked value and advance itself.
+	 *
+	 * Matches with smp_store_mb() and cmpxchg() in pv_wait_node()
+	 *
+	 * The write to next->locked in arch_mcs_spin_unlock_contended()
+	 * must be ordered before the read of pn->state in the cmpxchg()
+	 * below for the code to work correctly. To guarantee full ordering
+	 * irrespective of the success or failure of the cmpxchg(),
+	 * a relaxed version with explicit barrier is used. The control
+	 * dependency will order the reading of pn->state before any
+	 * subsequent writes.
+	 */
+	smp_mb__before_atomic();
+	if (cmpxchg_relaxed(&pn->state, vcpu_halted, vcpu_hashed)
+	!= vcpu_halted)
+		return;
+
+	/*
+	 * Put the lock into the hash table and set the _Q_SLOW_VAL.
+	 *
+	 * As this is the same vCPU that will check the _Q_SLOW_VAL value and
+	 * the hash table later on at unlock time, no atomic instruction is
+	 * needed.
+	 */
+	WRITE_ONCE(lock->locked, _Q_SLOW_VAL);
+	(void)pv_hash(lock, pn);
+}
+
+/*
+ * PV v

Re: [PATCH v3 0/6] powerpc: queued spinlocks and rwlocks

2020-07-07 Thread Nicholas Piggin
Excerpts from Waiman Long's message of July 8, 2020 1:33 pm:
> On 7/7/20 1:57 AM, Nicholas Piggin wrote:
>> Yes, powerpc could certainly get more performance out of the slow
>> paths, and then there are a few parameters to tune.
>>
>> We don't have a good alternate patching for function calls yet, but
>> that would be something to do for native vs pv.
>>
>> And then there seem to be one or two tunable parameters we could
>> experiment with.
>>
>> The paravirt locks may need a bit more tuning. Some simple testing
>> under KVM shows we might be a bit slower in some cases. Whether this
>> is fairness or something else I'm not sure. The current simple pv
>> spinlock code can do a directed yield to the lock holder CPU, whereas
>> the pv qspl here just does a general yield. I think we might actually
>> be able to change that to also support directed yield. Though I'm
>> not sure if this is actually the cause of the slowdown yet.
> 
> Regarding the paravirt lock, I have taken a further look into the 
> current PPC spinlock code. There is an equivalent of pv_wait() but no 
> pv_kick(). Maybe PPC doesn't really need that.

So powerpc has two types of wait, either undirected "all processors" or 
directed to a specific processor which has been preempted by the 
hypervisor.

The simple spinlock code does a directed wait, because it knows the CPU 
which is holding the lock. In this case, there is a sequence that is 
used to ensure we don't wait if the condition has become true, and the
target CPU does not need to kick the waiter it will happen automatically
(see splpar_spin_yield). This is preferable because we only wait as 
needed and don't require the kick operation.

The pv spinlock code I did uses the undirected wait, because we don't
know the CPU number which we are waiting on. This is undesirable because 
it's higher overhead and the wait is not so accurate.

I think perhaps we could change things so we wait on the correct CPU 
when queued, which might be good enough (we could also put the lock
owner CPU in the spinlock word, if we add another format).

> Attached are two 
> additional qspinlock patches that adds a CONFIG_PARAVIRT_QSPINLOCKS_LITE 
> option to not require pv_kick(). There is also a fixup patch to be 
> applied after your patchset.
> 
> I don't have access to a PPC LPAR with shared processor at the moment, 
> so I can't test the performance of the paravirt code. Would you mind 
> adding my patches and do some performance test on your end to see if it 
> gives better result?

Great, I'll do some tests. Any suggestions for what to try?

Thanks,
Nick
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization