[PATCH v3 0/2] iommu: fix the failure of deferred attach for iommu attach device

2021-01-26 Thread Lianbo Jiang
This patchset fixes the failure of deferred attach for IOMMU-attached
devices. It includes the following two patches (a condensed sketch of
their combined effect follows the list):

[1] [PATCH 1/2] dma-iommu: use static-key to minimize the impact in the
fast-path
This is a preparation patch for the second one: move the is_kdump_kernel()
check out of iommu_dma_deferred_attach() and into iommu_dma_init(), and use
a static key in the fast path to minimize the impact in the normal case.

[2] [PATCH 2/2] iommu: use the __iommu_attach_device() directly for deferred
attach
Move the handling currently in iommu_dma_deferred_attach() into the
iommu core code so that it can call __iommu_attach_device() directly
instead of iommu_attach_device(). The external interface
iommu_attach_device() is not suitable for handling this situation.
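
As a quick illustration, the combined effect of the two patches is roughly
the following, condensed from the diffs below (not a verbatim hunk; error
handling and the full call sites are omitted):

/* Patch 1: the deferred-attach check in the DMA fast-path is gated by a
 * static key that is only enabled for kdump kernels, so the normal case
 * pays only a patched-out branch.
 */
static DEFINE_STATIC_KEY_FALSE(iommu_deferred_attach_enabled);

static int iommu_dma_init(void)
{
	if (is_kdump_kernel())
		static_branch_enable(&iommu_deferred_attach_enabled);

	return iova_cache_get();
}

/* Typical fast-path call site, e.g. in __iommu_dma_map():
 *
 *	if (static_branch_unlikely(&iommu_deferred_attach_enabled) &&
 *	    iommu_deferred_attach(dev, domain))
 *		return DMA_MAPPING_ERROR;
 */

/* Patch 2: the deferred attach itself moves into the IOMMU core and uses
 * __iommu_attach_device(), which is not subject to the one-device-per-group
 * restriction of the external iommu_attach_device() interface.
 */
int iommu_deferred_attach(struct device *dev, struct iommu_domain *domain)
{
	const struct iommu_ops *ops = domain->ops;

	if (ops->is_attach_deferred && ops->is_attach_deferred(domain, dev))
		return __iommu_attach_device(domain, dev);

	return 0;
}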

Changes since v1:
[1] use the __iommu_attach_device() directly for deferred attach
[2] use static-key to minimize the impact in the fast-path

Changes since v2:
[1] remove the underscores for the variable "__deferred_attach", and change
its name to iommu_deferred_attach_enabled [Suggested by Christoph Hellwig]
[2] remove the "do_" from the iommu_do_deferred_attach(), and change its
name to iommu_deferred_attach()
[3] remove the "extern" from the definition of iommu_deferred_attach() in
include/linux/iommu.h

Lianbo Jiang (2):
  dma-iommu: use static-key to minimize the impact in the fast-path
  iommu: use the __iommu_attach_device() directly for deferred attach

 drivers/iommu/dma-iommu.c | 29 +++--
 drivers/iommu/iommu.c | 10 ++
 include/linux/iommu.h |  1 +
 3 files changed, 22 insertions(+), 18 deletions(-)

-- 
2.17.1



[PATCH v3 1/2] dma-iommu: use static-key to minimize the impact in the fast-path

2021-01-26 Thread Lianbo Jiang
Let's move the is_kdump_kernel() check out of iommu_dma_deferred_attach()
and into iommu_dma_init(), and use a static key in the fast path to
minimize the impact in the normal case.

Signed-off-by: Lianbo Jiang 
Co-developed-by: Robin Murphy 
Signed-off-by: Robin Murphy 
---
 drivers/iommu/dma-iommu.c | 17 +++--
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 4078358ed66e..c80056f6c9f9 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -51,6 +51,8 @@ struct iommu_dma_cookie {
struct iommu_domain *fq_domain;
 };
 
+static DEFINE_STATIC_KEY_FALSE(iommu_deferred_attach_enabled);
+
 void iommu_dma_free_cpu_cached_iovas(unsigned int cpu,
struct iommu_domain *domain)
 {
@@ -383,9 +385,6 @@ static int iommu_dma_deferred_attach(struct device *dev,
 {
const struct iommu_ops *ops = domain->ops;
 
-   if (!is_kdump_kernel())
-   return 0;
-
if (unlikely(ops->is_attach_deferred &&
ops->is_attach_deferred(domain, dev)))
return iommu_attach_device(domain, dev);
@@ -535,7 +534,8 @@ static dma_addr_t __iommu_dma_map(struct device *dev, 
phys_addr_t phys,
size_t iova_off = iova_offset(iovad, phys);
dma_addr_t iova;
 
-   if (unlikely(iommu_dma_deferred_attach(dev, domain)))
+   if (static_branch_unlikely(&iommu_deferred_attach_enabled) &&
+   iommu_dma_deferred_attach(dev, domain))
return DMA_MAPPING_ERROR;
 
size = iova_align(iovad, size + iova_off);
@@ -693,7 +693,8 @@ static void *iommu_dma_alloc_remap(struct device *dev, 
size_t size,
 
*dma_handle = DMA_MAPPING_ERROR;
 
-   if (unlikely(iommu_dma_deferred_attach(dev, domain)))
+   if (static_branch_unlikely(&iommu_deferred_attach_enabled) &&
+   iommu_dma_deferred_attach(dev, domain))
return NULL;
 
min_size = alloc_sizes & -alloc_sizes;
@@ -976,7 +977,8 @@ static int iommu_dma_map_sg(struct device *dev, struct 
scatterlist *sg,
unsigned long mask = dma_get_seg_boundary(dev);
int i;
 
-   if (unlikely(iommu_dma_deferred_attach(dev, domain)))
+   if (static_branch_unlikely(&iommu_deferred_attach_enabled) &&
+   iommu_dma_deferred_attach(dev, domain))
return 0;
 
if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
@@ -1424,6 +1426,9 @@ void iommu_dma_compose_msi_msg(struct msi_desc *desc,
 
 static int iommu_dma_init(void)
 {
+   if (is_kdump_kernel())
+   static_branch_enable(&iommu_deferred_attach_enabled);
+
return iova_cache_get();
 }
 arch_initcall(iommu_dma_init);
-- 
2.17.1



[PATCH v3 2/2] iommu: use the __iommu_attach_device() directly for deferred attach

2021-01-26 Thread Lianbo Jiang
Currently, the domain attach may be deferred from the IOMMU driver to the
device driver, and when the IOMMU initializes, the devices on the bus are
scanned and the default groups are allocated.

Because of this, several devices can end up in the same group, as below:

[    3.859417] pci 0000:01:00.0: Adding to iommu group 16
[    3.864572] pci 0000:01:00.1: Adding to iommu group 16
[    3.869738] pci 0000:02:00.0: Adding to iommu group 17
[    3.874892] pci 0000:02:00.1: Adding to iommu group 17

But when attaching these devices, iommu_attach_device() does not allow a
group to contain more than one device and returns an error otherwise. This
conflicts with the deferred attaching. Unfortunately, on my machine there
are two devices in the same group, for example:

[    9.627014] iommu_group_device_count(): device name[0]:0000:01:00.0
[    9.633545] iommu_group_device_count(): device name[1]:0000:01:00.1
...
[   10.255609] iommu_group_device_count(): device name[0]:0000:02:00.0
[   10.262144] iommu_group_device_count(): device name[1]:0000:02:00.1

This finally causes the tg3 driver to fail when it calls
dma_alloc_coherent() to allocate coherent memory in tg3_test_dma():

[    9.660310] tg3 0000:01:00.0: DMA engine test failed, aborting
[    9.754085] tg3: probe of 0000:01:00.0 failed with error -12
[    9.997512] tg3 0000:01:00.1: DMA engine test failed, aborting
[   10.043053] tg3: probe of 0000:01:00.1 failed with error -12
[   10.288905] tg3 0000:02:00.0: DMA engine test failed, aborting
[   10.334070] tg3: probe of 0000:02:00.0 failed with error -12
[   10.578303] tg3 0000:02:00.1: DMA engine test failed, aborting
[   10.622629] tg3: probe of 0000:02:00.1 failed with error -12

In addition, similar failures also occur in other drivers, such as the
bnxt_en driver. This can be reproduced easily in a kdump kernel when SME
is active.

Let's move the handling currently in iommu_dma_deferred_attach() into
the iommu core code so that it can call __iommu_attach_device() directly
instead of iommu_attach_device(). The external interface
iommu_attach_device() is not suitable for handling this situation.

Signed-off-by: Lianbo Jiang 
---
 drivers/iommu/dma-iommu.c | 18 +++---
 drivers/iommu/iommu.c | 10 ++
 include/linux/iommu.h |  1 +
 3 files changed, 14 insertions(+), 15 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index c80056f6c9f9..f659395e7959 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -380,18 +380,6 @@ static int iommu_dma_init_domain(struct iommu_domain 
*domain, dma_addr_t base,
return iova_reserve_iommu_regions(dev, domain);
 }
 
-static int iommu_dma_deferred_attach(struct device *dev,
-   struct iommu_domain *domain)
-{
-   const struct iommu_ops *ops = domain->ops;
-
-   if (unlikely(ops->is_attach_deferred &&
-   ops->is_attach_deferred(domain, dev)))
-   return iommu_attach_device(domain, dev);
-
-   return 0;
-}
-
 /**
  * dma_info_to_prot - Translate DMA API directions and attributes to IOMMU API
  *page flags.
@@ -535,7 +523,7 @@ static dma_addr_t __iommu_dma_map(struct device *dev, 
phys_addr_t phys,
dma_addr_t iova;
 
if (static_branch_unlikely(&iommu_deferred_attach_enabled) &&
-   iommu_dma_deferred_attach(dev, domain))
+   iommu_deferred_attach(dev, domain))
return DMA_MAPPING_ERROR;
 
size = iova_align(iovad, size + iova_off);
@@ -694,7 +682,7 @@ static void *iommu_dma_alloc_remap(struct device *dev, 
size_t size,
*dma_handle = DMA_MAPPING_ERROR;
 
if (static_branch_unlikely(&iommu_deferred_attach_enabled) &&
-   iommu_dma_deferred_attach(dev, domain))
+   iommu_deferred_attach(dev, domain))
return NULL;
 
min_size = alloc_sizes & -alloc_sizes;
@@ -978,7 +966,7 @@ static int iommu_dma_map_sg(struct device *dev, struct 
scatterlist *sg,
int i;
 
if (static_branch_unlikely(&iommu_deferred_attach_enabled) &&
-   iommu_dma_deferred_attach(dev, domain))
+   iommu_deferred_attach(dev, domain))
return 0;
 
if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index ffeebda8d6de..15b5fd6bd554 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1980,6 +1980,16 @@ int iommu_attach_device(struct iommu_domain *domain, 
struct device *dev)
 }
 EXPORT_SYMBOL_GPL(iommu_attach_device);
 
+int iommu_deferred_attach(struct device *dev, struct iommu_domain *domain)
+{
+   const struct iommu_ops *ops = domain->ops;
+
+   if (ops->is_attach_deferred && ops->is_attach_deferred(domain, dev))
+   return __iommu_attach_device(domain, dev);
+
+   return 0;
+}
+
 /*
  * Check flag
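
The rest of this hunk, and the one-line addition to include/linux/iommu.h
that the diffstat promises, are truncated in this archive. Based on the v3
change note above (the "extern" keyword was dropped from the declaration),
the header change is presumably just the declaration below; this is an
assumption for illustration, not the verbatim hunk:

/* include/linux/iommu.h (assumed): declaration of the new core helper. */
int iommu_deferred_attach(struct device *dev, struct iommu_domain *domain);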

[PATCH] iommu: check for the deferred attach when attaching a device

2020-12-25 Thread Lianbo Jiang
Currently, the domain attach may be deferred from the IOMMU driver to the
device driver, and when the IOMMU initializes, the devices on the bus are
scanned and the default groups are allocated.

Because of this, several devices can end up in the same group, as below:

[    3.859417] pci 0000:01:00.0: Adding to iommu group 16
[    3.864572] pci 0000:01:00.1: Adding to iommu group 16
[    3.869738] pci 0000:02:00.0: Adding to iommu group 17
[    3.874892] pci 0000:02:00.1: Adding to iommu group 17

But when attaching these devices, iommu_attach_device() does not allow a
group to contain more than one device and returns an error otherwise. This
conflicts with the deferred attaching. Unfortunately, on my machine there
are two devices in the same group, for example:

[    9.627014] iommu_group_device_count(): device name[0]:0000:01:00.0
[    9.633545] iommu_group_device_count(): device name[1]:0000:01:00.1
...
[   10.255609] iommu_group_device_count(): device name[0]:0000:02:00.0
[   10.262144] iommu_group_device_count(): device name[1]:0000:02:00.1

This finally causes the tg3 driver to fail when it calls
dma_alloc_coherent() to allocate coherent memory in tg3_test_dma():

[    9.660310] tg3 0000:01:00.0: DMA engine test failed, aborting
[    9.754085] tg3: probe of 0000:01:00.0 failed with error -12
[    9.997512] tg3 0000:01:00.1: DMA engine test failed, aborting
[   10.043053] tg3: probe of 0000:01:00.1 failed with error -12
[   10.288905] tg3 0000:02:00.0: DMA engine test failed, aborting
[   10.334070] tg3: probe of 0000:02:00.0 failed with error -12
[   10.578303] tg3 0000:02:00.1: DMA engine test failed, aborting
[   10.622629] tg3: probe of 0000:02:00.1 failed with error -12

In addition, similar failures also occur in other drivers, such as the
bnxt_en driver. This can be reproduced easily in a kdump kernel when SME
is active.

Add a check for the deferred attach in iommu_attach_device() and allow
the deferred device to be attached regardless of how many devices are in
its group.

Signed-off-by: Lianbo Jiang 
---
 drivers/iommu/iommu.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index ffeebda8d6de..dccab7b133fb 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1967,8 +1967,11 @@ int iommu_attach_device(struct iommu_domain *domain, 
struct device *dev)
 */
mutex_lock(&group->mutex);
ret = -EINVAL;
-   if (iommu_group_device_count(group) != 1)
+   if (!iommu_is_attach_deferred(domain, dev) &&
+   iommu_group_device_count(group) != 1) {
+   dev_err_ratelimited(dev, "Group has more than one device\n");
goto out_unlock;
+   }
 
ret = __iommu_attach_group(domain, group);
 
-- 
2.17.1



[PATCH 0/2 v2] iommu: fix the failure of deferred attach for iommu attach device

2021-01-19 Thread Lianbo Jiang
This patchset fixes the failure of deferred attach for IOMMU-attached
devices. It includes the following two patches:

[1] [PATCH 1/2] dma-iommu: use static-key to minimize the impact in the
fast-path
This is a preparation patch for the second one: move the is_kdump_kernel()
check out of iommu_dma_deferred_attach() and into iommu_dma_init(), and use
a static key in the fast path to minimize the impact in the normal case.

[2] [PATCH 2/2] iommu: use the __iommu_attach_device() directly for deferred
attach
Move the handling currently in iommu_dma_deferred_attach() into the
iommu core code so that it can call __iommu_attach_device() directly
instead of iommu_attach_device(). The external interface
iommu_attach_device() is not suitable for handling this situation.

Changes since v1:
[1] use the __iommu_attach_device() directly for deferred attach
[2] use static-key to minimize the impact in the fast-path

Lianbo Jiang (2):
  dma-iommu: use static-key to minimize the impact in the fast-path
  iommu: use the __iommu_attach_device() directly for deferred attach

 drivers/iommu/dma-iommu.c | 29 +++--
 drivers/iommu/iommu.c | 12 
 include/linux/iommu.h |  2 ++
 3 files changed, 25 insertions(+), 18 deletions(-)

-- 
2.17.1



[PATCH 1/2 v2] dma-iommu: use static-key to minimize the impact in the fast-path

2021-01-19 Thread Lianbo Jiang
Let's move the is_kdump_kernel() check out of iommu_dma_deferred_attach()
and into iommu_dma_init(), and use a static key in the fast path to
minimize the impact in the normal case.

Signed-off-by: Lianbo Jiang 
Co-developed-by: Robin Murphy 
Signed-off-by: Robin Murphy 
---
 drivers/iommu/dma-iommu.c | 17 +++--
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index f0305e6aac1b..3711b4a6e4f9 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -51,6 +51,8 @@ struct iommu_dma_cookie {
struct iommu_domain *fq_domain;
 };
 
+static DEFINE_STATIC_KEY_FALSE(__deferred_attach);
+
 void iommu_dma_free_cpu_cached_iovas(unsigned int cpu,
struct iommu_domain *domain)
 {
@@ -383,9 +385,6 @@ static int iommu_dma_deferred_attach(struct device *dev,
 {
const struct iommu_ops *ops = domain->ops;
 
-   if (!is_kdump_kernel())
-   return 0;
-
if (unlikely(ops->is_attach_deferred &&
ops->is_attach_deferred(domain, dev)))
return iommu_attach_device(domain, dev);
@@ -535,7 +534,8 @@ static dma_addr_t __iommu_dma_map(struct device *dev, 
phys_addr_t phys,
size_t iova_off = iova_offset(iovad, phys);
dma_addr_t iova;
 
-   if (unlikely(iommu_dma_deferred_attach(dev, domain)))
+   if (static_branch_unlikely(&__deferred_attach) &&
+   iommu_dma_deferred_attach(dev, domain))
return DMA_MAPPING_ERROR;
 
size = iova_align(iovad, size + iova_off);
@@ -693,7 +693,8 @@ static void *iommu_dma_alloc_remap(struct device *dev, 
size_t size,
 
*dma_handle = DMA_MAPPING_ERROR;
 
-   if (unlikely(iommu_dma_deferred_attach(dev, domain)))
+   if (static_branch_unlikely(&__deferred_attach) &&
+   iommu_dma_deferred_attach(dev, domain))
return NULL;
 
min_size = alloc_sizes & -alloc_sizes;
@@ -1003,7 +1004,8 @@ static int iommu_dma_map_sg(struct device *dev, struct 
scatterlist *sg,
unsigned long mask = dma_get_seg_boundary(dev);
int i;
 
-   if (unlikely(iommu_dma_deferred_attach(dev, domain)))
+   if (static_branch_unlikely(&__deferred_attach) &&
+   iommu_dma_deferred_attach(dev, domain))
return 0;
 
if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
@@ -1451,6 +1453,9 @@ void iommu_dma_compose_msi_msg(struct msi_desc *desc,
 
 static int iommu_dma_init(void)
 {
+   if (is_kdump_kernel())
+   static_branch_enable(&__deferred_attach);
+
return iova_cache_get();
 }
 arch_initcall(iommu_dma_init);
-- 
2.17.1



[PATCH 2/2 v2] iommu: use the __iommu_attach_device() directly for deferred attach

2021-01-19 Thread Lianbo Jiang
Currently, the domain attach may be deferred from the IOMMU driver to the
device driver, and when the IOMMU initializes, the devices on the bus are
scanned and the default groups are allocated.

Because of this, several devices can end up in the same group, as below:

[    3.859417] pci 0000:01:00.0: Adding to iommu group 16
[    3.864572] pci 0000:01:00.1: Adding to iommu group 16
[    3.869738] pci 0000:02:00.0: Adding to iommu group 17
[    3.874892] pci 0000:02:00.1: Adding to iommu group 17

But when attaching these devices, iommu_attach_device() does not allow a
group to contain more than one device and returns an error otherwise. This
conflicts with the deferred attaching. Unfortunately, on my machine there
are two devices in the same group, for example:

[    9.627014] iommu_group_device_count(): device name[0]:0000:01:00.0
[    9.633545] iommu_group_device_count(): device name[1]:0000:01:00.1
...
[   10.255609] iommu_group_device_count(): device name[0]:0000:02:00.0
[   10.262144] iommu_group_device_count(): device name[1]:0000:02:00.1

This finally causes the tg3 driver to fail when it calls
dma_alloc_coherent() to allocate coherent memory in tg3_test_dma():

[    9.660310] tg3 0000:01:00.0: DMA engine test failed, aborting
[    9.754085] tg3: probe of 0000:01:00.0 failed with error -12
[    9.997512] tg3 0000:01:00.1: DMA engine test failed, aborting
[   10.043053] tg3: probe of 0000:01:00.1 failed with error -12
[   10.288905] tg3 0000:02:00.0: DMA engine test failed, aborting
[   10.334070] tg3: probe of 0000:02:00.0 failed with error -12
[   10.578303] tg3 0000:02:00.1: DMA engine test failed, aborting
[   10.622629] tg3: probe of 0000:02:00.1 failed with error -12

In addition, similar failures also occur in other drivers, such as the
bnxt_en driver. This can be reproduced easily in a kdump kernel when SME
is active.

Let's move the handling currently in iommu_dma_deferred_attach() into
the iommu core code so that it can call __iommu_attach_device() directly
instead of iommu_attach_device(). The external interface
iommu_attach_device() is not suitable for handling this situation.

Signed-off-by: Lianbo Jiang 
---
 drivers/iommu/dma-iommu.c | 18 +++---
 drivers/iommu/iommu.c | 12 
 include/linux/iommu.h |  2 ++
 3 files changed, 17 insertions(+), 15 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 3711b4a6e4f9..fa6f9098e77d 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -380,18 +380,6 @@ static int iommu_dma_init_domain(struct iommu_domain 
*domain, dma_addr_t base,
return iova_reserve_iommu_regions(dev, domain);
 }
 
-static int iommu_dma_deferred_attach(struct device *dev,
-   struct iommu_domain *domain)
-{
-   const struct iommu_ops *ops = domain->ops;
-
-   if (unlikely(ops->is_attach_deferred &&
-   ops->is_attach_deferred(domain, dev)))
-   return iommu_attach_device(domain, dev);
-
-   return 0;
-}
-
 /**
  * dma_info_to_prot - Translate DMA API directions and attributes to IOMMU API
  *page flags.
@@ -535,7 +523,7 @@ static dma_addr_t __iommu_dma_map(struct device *dev, 
phys_addr_t phys,
dma_addr_t iova;
 
if (static_branch_unlikely(&__deferred_attach) &&
-   iommu_dma_deferred_attach(dev, domain))
+   iommu_do_deferred_attach(dev, domain))
return DMA_MAPPING_ERROR;
 
size = iova_align(iovad, size + iova_off);
@@ -694,7 +682,7 @@ static void *iommu_dma_alloc_remap(struct device *dev, 
size_t size,
*dma_handle = DMA_MAPPING_ERROR;
 
if (static_branch_unlikely(&__deferred_attach) &&
-   iommu_dma_deferred_attach(dev, domain))
+   iommu_do_deferred_attach(dev, domain))
return NULL;
 
min_size = alloc_sizes & -alloc_sizes;
@@ -1005,7 +993,7 @@ static int iommu_dma_map_sg(struct device *dev, struct 
scatterlist *sg,
int i;
 
if (static_branch_unlikely(&__deferred_attach) &&
-   iommu_dma_deferred_attach(dev, domain))
+   iommu_do_deferred_attach(dev, domain))
return 0;
 
if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index ffeebda8d6de..32164d355d2e 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1980,6 +1980,18 @@ int iommu_attach_device(struct iommu_domain *domain, 
struct device *dev)
 }
 EXPORT_SYMBOL_GPL(iommu_attach_device);
 
+int iommu_do_deferred_attach(struct device *dev,
+struct iommu_domain *domain)
+{
+   const struct iommu_ops *ops = domain->ops;
+
+   if (unlikely(ops->is_attach_deferred &&
+ops->is_attach_deferred(domain, dev)))
+   return __iommu_attach_device(domain, dev);
+
+ 

[PATCH 1/4 V3] Add a function(ioremap_encrypted) for kdump when AMD sme enabled

2018-06-16 Thread Lianbo Jiang
It is convenient to remap the old (encrypted) memory into the second
kernel by calling ioremap_encrypted().

Signed-off-by: Lianbo Jiang 
---
Some changes:
1. remove the sme_active() check in __ioremap_caller().
2. put some logic into the early_memremap_pgprot_adjust() for
early memremap.

 arch/x86/include/asm/io.h |  3 +++
 arch/x86/mm/ioremap.c | 28 
 2 files changed, 23 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
index f6e5b93..989d60b 100644
--- a/arch/x86/include/asm/io.h
+++ b/arch/x86/include/asm/io.h
@@ -192,6 +192,9 @@ extern void __iomem *ioremap_cache(resource_size_t offset, 
unsigned long size);
 #define ioremap_cache ioremap_cache
 extern void __iomem *ioremap_prot(resource_size_t offset, unsigned long size, 
unsigned long prot_val);
 #define ioremap_prot ioremap_prot
+extern void __iomem *ioremap_encrypted(resource_size_t phys_addr,
+   unsigned long size);
+#define ioremap_encrypted ioremap_encrypted
 
 /**
  * ioremap -   map bus memory into CPU space
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index c63a545..e365fc4 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -24,6 +24,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "physaddr.h"
 
@@ -131,7 +132,8 @@ static void __ioremap_check_mem(resource_size_t addr, 
unsigned long size,
  * caller shouldn't need to know that small detail.
  */
 static void __iomem *__ioremap_caller(resource_size_t phys_addr,
-   unsigned long size, enum page_cache_mode pcm, void *caller)
+   unsigned long size, enum page_cache_mode pcm,
+   void *caller, bool encrypted)
 {
unsigned long offset, vaddr;
resource_size_t last_addr;
@@ -199,7 +201,7 @@ static void __iomem *__ioremap_caller(resource_size_t 
phys_addr,
 * resulting mapping.
 */
prot = PAGE_KERNEL_IO;
-   if (sev_active() && mem_flags.desc_other)
+   if ((sev_active() && mem_flags.desc_other) || encrypted)
prot = pgprot_encrypted(prot);
 
switch (pcm) {
@@ -291,7 +293,7 @@ void __iomem *ioremap_nocache(resource_size_t phys_addr, 
unsigned long size)
enum page_cache_mode pcm = _PAGE_CACHE_MODE_UC_MINUS;
 
return __ioremap_caller(phys_addr, size, pcm,
-   __builtin_return_address(0));
+   __builtin_return_address(0), false);
 }
 EXPORT_SYMBOL(ioremap_nocache);
 
@@ -324,7 +326,7 @@ void __iomem *ioremap_uc(resource_size_t phys_addr, 
unsigned long size)
enum page_cache_mode pcm = _PAGE_CACHE_MODE_UC;
 
return __ioremap_caller(phys_addr, size, pcm,
-   __builtin_return_address(0));
+   __builtin_return_address(0), false);
 }
 EXPORT_SYMBOL_GPL(ioremap_uc);
 
@@ -341,7 +343,7 @@ EXPORT_SYMBOL_GPL(ioremap_uc);
 void __iomem *ioremap_wc(resource_size_t phys_addr, unsigned long size)
 {
return __ioremap_caller(phys_addr, size, _PAGE_CACHE_MODE_WC,
-   __builtin_return_address(0));
+   __builtin_return_address(0), false);
 }
 EXPORT_SYMBOL(ioremap_wc);
 
@@ -358,14 +360,21 @@ EXPORT_SYMBOL(ioremap_wc);
 void __iomem *ioremap_wt(resource_size_t phys_addr, unsigned long size)
 {
return __ioremap_caller(phys_addr, size, _PAGE_CACHE_MODE_WT,
-   __builtin_return_address(0));
+   __builtin_return_address(0), false);
 }
 EXPORT_SYMBOL(ioremap_wt);
 
+void __iomem *ioremap_encrypted(resource_size_t phys_addr, unsigned long size)
+{
+   return __ioremap_caller(phys_addr, size, _PAGE_CACHE_MODE_WB,
+   __builtin_return_address(0), true);
+}
+EXPORT_SYMBOL(ioremap_encrypted);
+
 void __iomem *ioremap_cache(resource_size_t phys_addr, unsigned long size)
 {
return __ioremap_caller(phys_addr, size, _PAGE_CACHE_MODE_WB,
-   __builtin_return_address(0));
+   __builtin_return_address(0), false);
 }
 EXPORT_SYMBOL(ioremap_cache);
 
@@ -374,7 +383,7 @@ void __iomem *ioremap_prot(resource_size_t phys_addr, 
unsigned long size,
 {
return __ioremap_caller(phys_addr, size,
pgprot2cachemode(__pgprot(prot_val)),
-   __builtin_return_address(0));
+   __builtin_return_address(0), false);
 }
 EXPORT_SYMBOL(ioremap_prot);
 
@@ -688,6 +697,9 @@ pgprot_t __init 
early_memremap_pgprot_adjust(resource_size_t phys_addr,
if (encrypted_prot && memremap_should_map_decrypted(phys_addr, size))
encrypted_prot = false;
 
+   if (sme_active() && is_kdump_kernel())
+   encrypted_prot = false;
+
  

[PATCH 0/4 V3] Support kdump for AMD secure memory encryption(SME)

2018-06-16 Thread Lianbo Jiang
It is convenient to remap the old (encrypted) memory into the second kernel
by calling ioremap_encrypted().

When SME is enabled on an AMD server, we also need to support kdump. Because
the memory is encrypted in the first kernel, we remap the old memory as
encrypted into the second kernel (crash kernel), and SME must also be enabled
in the second kernel, otherwise the old encrypted memory cannot be decrypted.
Simply changing the value of the C-bit on a page does not automatically
encrypt the existing contents of the page, and any data in the page prior to
the C-bit modification becomes unintelligible. A page of memory that is
marked encrypted is automatically decrypted when read from DRAM and
automatically encrypted when written to DRAM.

For kdump, it is necessary to distinguish whether the memory is encrypted,
and furthermore to know which parts of the memory are encrypted or
unencrypted. We then remap the memory appropriately for each situation in
order to tell the CPU how to deal with the data (encrypted or unencrypted).
For example, when SME is enabled and the old memory is encrypted, we remap
the old memory in an encrypted way, which automatically decrypts the old
memory when we read the data through the remapped address (a minimal sketch
of this follows the support matrix below).

 -------------------------------------------------------------
| first kernel         | second kernel        | kdump support |
| (mem_encrypt=on|off) | (mem_encrypt=on|off) |   (yes|no)    |
|----------------------+----------------------+---------------|
| on                   | on                   | yes           |
| off                  | off                  | yes           |
| on                   | off                  | no            |
| off                  | on                   | no            |
|______________________|______________________|_______________|

This patchset only covers SME kdump; it does not support SEV kdump.
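
As an illustration of remapping the old memory in an encrypted way, here is
a minimal sketch of reading one page of the first kernel's memory from the
kdump kernel. It is condensed from the copy_oldmem_page_encrypted() code in
patch 4/4 below; the helper name read_old_page() is made up for this sketch:

static ssize_t read_old_page(unsigned long pfn, char *buf, size_t csize,
			     unsigned long offset)
{
	void *vaddr;

	/*
	 * Map the old page with the encryption bit set, so that the
	 * hardware transparently decrypts it on read.
	 */
	vaddr = (__force void *)ioremap_encrypted(pfn << PAGE_SHIFT, PAGE_SIZE);
	if (!vaddr)
		return -ENOMEM;

	memcpy(buf, vaddr + offset, csize);	/* data arrives decrypted */

	iounmap((void __iomem *)vaddr);
	return csize;
}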

Test tools:
makedumpfile[v1.6.3]: https://github.com/LianboJ/makedumpfile
commit e1de103eca8f (A draft for kdump vmcore about AMD SME)
Author: Lianbo Jiang 
Date:   Mon May 14 17:02:40 2018 +0800
Note: This patch can only dump vmcore in the case of SME enabled.

crash-7.2.1: https://github.com/crash-utility/crash.git
commit 1e1bd9c4c1be (Fix for the "bpf" command display on Linux 4.17-rc1)
Author: Dave Anderson 
Date:   Fri May 11 15:54:32 2018 -0400

Test environment:
HP ProLiant DL385Gen10 AMD EPYC 7251
8-Core Processor
32768 MB memory
600 GB disk space

Linux 4.17-rc7:
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
commit b04e217704b7 ("Linux 4.17-rc7")
Author: Linus Torvalds 
Date:   Sun May 27 13:01:47 2018 -0700

Reference:
AMD64 Architecture Programmer's Manual
https://support.amd.com/TechDocs/24593.pdf

Some changes:
1. remove the sme_active() check in __ioremap_caller().
2. remove the '#ifdef' stuff throughout this patch.
3. put some logic into the early_memremap_pgprot_adjust() and clean the
previous unnecessary changes, for example: arch/x86/include/asm/dmi.h,
arch/x86/kernel/acpi/boot.c, drivers/acpi/tables.c.
4. add a new file and modify Makefile.
5. clean compile warning in copy_device_table() and some compile error.
6. split the original patch into four patches, it will be better for
review.

Some known issues:
1. about SME
The upstream kernel doesn't work when we use kexec with the following
command; the system will hang.
(This issue is unrelated to the kdump patches.)

Reproduce steps:
 # kexec -l /boot/vmlinuz-4.17.0-rc7+ --initrd=/boot/initramfs-4.17.0-rc7+.img 
--command-line="root=/dev/mapper/rhel_hp--dl385g10--03-root ro mem_encrypt=on 
rd.lvm.lv=rhel_hp-dl385g10-03/root rd.lvm.lv=rhel_hp-dl385g10-03/swap 
console=ttyS0,115200n81 LANG=en_US.UTF-8 earlyprintk=serial debug nokaslr"
 # kexec -e (or reboot)

The system will hang:
[ 1248.932239] kexec_core: Starting new kernel
early console in extract_kernel
input_data: 0x00087e91c3b4
input_len: 0x0067fcbd
output: 0x00087d40
output_len: 0x01b6fa90
kernel_total_size: 0x01a9d000
trampoline_32bit: 0x00099000

Decompressing Linux...
Parsing ELF...[-here the system will hang]

2. about SEV
The upstream kernel (host OS) doesn't work on the host side; some SEV-related
drivers always go wrong on the host side, so we can't boot an SEV guest OS to
test the kdump patch. It is probably more reasonable to improve SEV in a
separate patch: once those drivers work on the host side and a virtual
machine (SEV guest OS) can be booted, it will be the right time to fix SEV
for kdump.

[  369.426131] INFO: task systemd-udevd:865 blocked for more than 120 seconds.
[  369.433177]   Not tainted 4.17.0-rc5+ #60
[  369.437585] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[  369.445783] systemd-udevd   D0   865813 0x8004
[  369.451323] Call Trace:
[  369.453815]  ? __schedule+0x290/0x870
[  369.457523]  schedule+0x32/0x80
[  369.460714]  __sev_do_cmd_locked+0x1f6/0x2a0 [ccp]
[  369.465556]  ? cleanup_uevent_env+0x10/0x10
[  369.470084]  ? 

[PATCH 2/4 V3] Allocate pages for kdump without encryption when SME is enabled

2018-06-16 Thread Lianbo Jiang
When SME is enabled in the first kernel, we will allocate pages
for kdump without encryption in order to be able to boot the
second kernel in the same manner as kexec, which helps to keep
the same code style.
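
For reference, the arch_kexec_post_alloc_pages()/arch_kexec_pre_free_pages()
helpers that the diff below calls are what actually clear and restore the
encryption attribute; on x86 they look roughly like the sketch below (based
on the mainline implementation of that era, not part of this patch):

/* arch/x86/kernel/machine_kexec_64.c (sketch): keep kexec/kdump pages
 * unencrypted while in use, and re-encrypt them before they are freed.
 */
int arch_kexec_post_alloc_pages(void *vaddr, unsigned int pages, gfp_t gfp)
{
	/*
	 * If SME is active, kexec/kdump pages must not be encrypted,
	 * because the new kernel initially accesses them unencrypted.
	 */
	return set_memory_decrypted((unsigned long)vaddr, pages);
}

void arch_kexec_pre_free_pages(void *vaddr, unsigned int pages)
{
	/* Reset the pages to an encrypted mapping before freeing them. */
	set_memory_encrypted((unsigned long)vaddr, pages);
}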

Signed-off-by: Lianbo Jiang 
---
 kernel/kexec_core.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index 20fef1a..3c22a9b 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -471,6 +471,16 @@ static struct page 
*kimage_alloc_crash_control_pages(struct kimage *image,
}
}
 
+   if (pages) {
+   unsigned int count, i;
+
+   pages->mapping = NULL;
+   set_page_private(pages, order);
+   count = 1 << order;
+   for (i = 0; i < count; i++)
+   SetPageReserved(pages + i);
+   arch_kexec_post_alloc_pages(page_address(pages), 1 << order, 0);
+   }
return pages;
 }
 
@@ -865,6 +875,7 @@ static int kimage_load_crash_segment(struct kimage *image,
result  = -ENOMEM;
goto out;
}
+   arch_kexec_post_alloc_pages(page_address(page), 1, 0);
ptr = kmap(page);
ptr += maddr & ~PAGE_MASK;
mchunk = min_t(size_t, mbytes,
@@ -882,6 +893,7 @@ static int kimage_load_crash_segment(struct kimage *image,
result = copy_from_user(ptr, buf, uchunk);
kexec_flush_icache_page(page);
kunmap(page);
+   arch_kexec_pre_free_pages(page_address(page), 1);
if (result) {
result = -EFAULT;
goto out;
-- 
2.9.5



[PATCH 3/4 V3] Remap the device table of IOMMU in encrypted manner for kdump

2018-06-16 Thread Lianbo Jiang
In kdump mode, the kernel copies the IOMMU device table from the old
device table, which is encrypted when SME is enabled in the first
kernel. So we must remap it in an encrypted manner so that it is
automatically decrypted when read.

Signed-off-by: Lianbo Jiang 
---
Some changes:
1. add some comments
2. clean compile warning.

 drivers/iommu/amd_iommu_init.c | 15 ++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index 904c575..a20af4c 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -889,11 +889,24 @@ static bool copy_device_table(void)
}
 
old_devtb_phys = entry & PAGE_MASK;
+
+   /*
+*  When SME is enabled in the first kernel, old_devtb_phys includes the
+*  memory encryption mask (sme_me_mask); we must remove the memory
+*  encryption mask to obtain the true physical address in kdump mode.
+*/
+   if (mem_encrypt_active() && is_kdump_kernel())
+   old_devtb_phys = __sme_clr(old_devtb_phys);
+
	if (old_devtb_phys >= 0x100000000ULL) {
pr_err("The address of old device table is above 4G, not 
trustworthy!\n");
return false;
}
-   old_devtb = memremap(old_devtb_phys, dev_table_size, MEMREMAP_WB);
+   old_devtb = (mem_encrypt_active() && is_kdump_kernel())
+   ? (__force void *)ioremap_encrypted(old_devtb_phys,
+   dev_table_size)
+   : memremap(old_devtb_phys, dev_table_size, MEMREMAP_WB);
+
if (!old_devtb)
return false;
 
-- 
2.9.5



[PATCH 4/4 V3] Help to dump the old memory encrypted into vmcore file

2018-06-16 Thread Lianbo Jiang
In kdump mode, we need to dump the old memory into the vmcore file. If
SME is enabled in the first kernel, we must remap the old memory in an
encrypted manner so that it is automatically decrypted when read from
DRAM. This helps tools parse the vmcore.

Signed-off-by: Lianbo Jiang 
---
Some changes:
1. add a new file and modify Makefile.
2. remove some code in sev_active().

 arch/x86/kernel/Makefile |  1 +
 arch/x86/kernel/crash_dump_encrypt.c | 53 
 fs/proc/vmcore.c | 20 ++
 include/linux/crash_dump.h   | 11 
 4 files changed, 79 insertions(+), 6 deletions(-)
 create mode 100644 arch/x86/kernel/crash_dump_encrypt.c

diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 02d6f5c..afb5bad 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -96,6 +96,7 @@ obj-$(CONFIG_KEXEC_CORE)  += machine_kexec_$(BITS).o
 obj-$(CONFIG_KEXEC_CORE)   += relocate_kernel_$(BITS).o crash.o
 obj-$(CONFIG_KEXEC_FILE)   += kexec-bzimage64.o
 obj-$(CONFIG_CRASH_DUMP)   += crash_dump_$(BITS).o
+obj-$(CONFIG_AMD_MEM_ENCRYPT)  += crash_dump_encrypt.o
 obj-y  += kprobes/
 obj-$(CONFIG_MODULES)  += module.o
 obj-$(CONFIG_DOUBLEFAULT)  += doublefault.o
diff --git a/arch/x86/kernel/crash_dump_encrypt.c 
b/arch/x86/kernel/crash_dump_encrypt.c
new file mode 100644
index 000..e44ef33
--- /dev/null
+++ b/arch/x86/kernel/crash_dump_encrypt.c
@@ -0,0 +1,53 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Memory preserving reboot related code.
+ *
+ * Created by: Lianbo Jiang (liji...@redhat.com)
+ * Copyright (C) RedHat Corporation, 2018. All rights reserved
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+/**
+ * copy_oldmem_page_encrypted - copy one page from "oldmem encrypted"
+ * @pfn: page frame number to be copied
+ * @buf: target memory address for the copy; this can be in kernel address
+ * space or user address space (see @userbuf)
+ * @csize: number of bytes to copy
+ * @offset: offset in bytes into the page (based on pfn) to begin the copy
+ * @userbuf: if set, @buf is in user address space, use copy_to_user(),
+ * otherwise @buf is in kernel address space, use memcpy().
+ *
+ * Copy a page from "oldmem encrypted". For this page, there is no pte
+ * mapped in the current kernel. We stitch up a pte, similar to
+ * kmap_atomic.
+ */
+
+ssize_t copy_oldmem_page_encrypted(unsigned long pfn, char *buf,
+   size_t csize, unsigned long offset, int userbuf)
+{
+   void  *vaddr;
+
+   if (!csize)
+   return 0;
+
+   vaddr = (__force void *)ioremap_encrypted(pfn << PAGE_SHIFT,
+ PAGE_SIZE);
+   if (!vaddr)
+   return -ENOMEM;
+
+   if (userbuf) {
+   if (copy_to_user((void __user *)buf, vaddr + offset, csize)) {
+   iounmap((void __iomem *)vaddr);
+   return -EFAULT;
+   }
+   } else
+   memcpy(buf, vaddr + offset, csize);
+
+   set_iounmap_nonlazy();
+   iounmap((void __iomem *)vaddr);
+   return csize;
+}
diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index a45f0af..5200266 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -25,6 +25,8 @@
 #include 
 #include 
 #include "internal.h"
+#include 
+#include 
 
 /* List representing chunks of contiguous memory areas and their offsets in
  * vmcore file.
@@ -86,7 +88,8 @@ static int pfn_is_ram(unsigned long pfn)
 
 /* Reads a page from the oldmem device from given offset. */
 static ssize_t read_from_oldmem(char *buf, size_t count,
-   u64 *ppos, int userbuf)
+   u64 *ppos, int userbuf,
+   bool encrypted)
 {
unsigned long pfn, offset;
size_t nr_bytes;
@@ -108,8 +111,11 @@ static ssize_t read_from_oldmem(char *buf, size_t count,
if (pfn_is_ram(pfn) == 0)
memset(buf, 0, nr_bytes);
else {
-   tmp = copy_oldmem_page(pfn, buf, nr_bytes,
-   offset, userbuf);
+   tmp = encrypted ? copy_oldmem_page_encrypted(pfn,
+   buf, nr_bytes, offset, userbuf)
+   : copy_oldmem_page(pfn, buf, nr_bytes,
+  offset, userbuf);
+
if (tmp < 0)
return tmp;
}
@@ -143,7 +149,7 @@ void __weak elfcorehdr_free(unsigned long long addr)
  */
 ssize_t __weak elfcorehdr_read(char *buf, size_t count, u64 *ppos)
 {
-   return read_from_oldmem(buf, count, ppos, 0);
+   return read_from_oldmem(buf, count, ppos, 0, false);
 }
 

[PATCH 0/5 V4] Support kdump for AMD secure memory encryption(SME)

2018-06-28 Thread Lianbo Jiang
When SME is enabled on an AMD server, we also need to support kdump. Because
the memory is encrypted in the first kernel, we remap the old memory as
encrypted into the second kernel (crash kernel), and SME must also be enabled
in the second kernel, otherwise the old encrypted memory cannot be decrypted.
Simply changing the value of the C-bit on a page does not automatically
encrypt the existing contents of the page, and any data in the page prior to
the C-bit modification becomes unintelligible. A page of memory that is
marked encrypted is automatically decrypted when read from DRAM and
automatically encrypted when written to DRAM.

For kdump, it is necessary to distinguish whether the memory is encrypted,
and furthermore to know which parts of the memory are encrypted or
unencrypted. We then remap the memory appropriately for each situation in
order to tell the CPU how to deal with the data (encrypted or unencrypted).
For example, when SME is enabled and the old memory is encrypted, we remap
the old memory in an encrypted way, which automatically decrypts the old
memory when we read the data through the remapped address.

 -------------------------------------------------------------
| first kernel         | second kernel        | kdump support |
| (mem_encrypt=on|off) | (mem_encrypt=on|off) |   (yes|no)    |
|----------------------+----------------------+---------------|
| on                   | on                   | yes           |
| off                  | off                  | yes           |
| on                   | off                  | no            |
| off                  | on                   | no            |
|______________________|______________________|_______________|

This patchset only covers SME kdump; it does not support SEV kdump.

For kdump with SME, there are two cases that are not supported:
1. SME is enabled in the first kernel, but SME is disabled in the
second kernel
Because the old memory is encrypted, we can't decrypt the old memory
if SME is off in the second kernel.

2. SME is disabled in the first kernel, but SME is enabled in the
second kernel
Maybe it is unnecessary to support this case: the old memory is
unencrypted and can be dumped as usual, we don't need to enable SME in
the second kernel, and the requirement is rare in actual deployment.
Moreover, if we had to support this scenario, it would increase the
complexity of the code: we would have to consider how to transfer the
SME flag from the first kernel to the second kernel, so that the second
kernel knows whether the old memory is encrypted.
There are two ways to transfer the SME flag to the second kernel. The
first is to modify the assembly code, which touches some common code and
makes the path too long. The second is to use the kexec tool, which would
require the SME flag to be exported by the first kernel via "proc" or
"sysfs"; kexec would read the SME flag from "proc" or "sysfs" when loading
the image, and the flag would then be saved in boot_params, so we could
properly remap the old memory according to the previously saved SME flag.
Although we could fix this issue, it is probably too expensive to do so,
and we won't fix it unless someone thinks it is necessary.

Test tools:
makedumpfile[v1.6.3]: https://github.com/LianboJ/makedumpfile
commit e1de103eca8f (A draft for kdump vmcore about AMD SME)
Author: Lianbo Jiang 
Date:   Mon May 14 17:02:40 2018 +0800
Note: This patch can only dump vmcore in the case of SME enabled.

crash-7.2.1: https://github.com/crash-utility/crash.git
commit 1e1bd9c4c1be (Fix for the "bpf" command display on Linux 4.17-rc1)
Author: Dave Anderson 
Date:   Fri May 11 15:54:32 2018 -0400

Test environment:
HP ProLiant DL385Gen10 AMD EPYC 7251
8-Core Processor
32768 MB memory
600 GB disk space

Linux 4.18-rc2:
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
commit 7daf201d7fe8 ("Linux 4.18-rc2")
Author: Linus Torvalds 
Date:   Sun Jun 24 20:54:29 2018 +0800

Reference:
AMD64 Architecture Programmer's Manual
https://support.amd.com/TechDocs/24593.pdf

Some changes:
1. remove the sme_active() check in __ioremap_caller().
2. remove the '#ifdef' stuff throughout this patch.
3. put some logic into the early_memremap_pgprot_adjust() and clean the
previous unnecessary changes, for example: arch/x86/include/asm/dmi.h,
arch/x86/kernel/acpi/boot.c, drivers/acpi/tables.c.
4. add a new file and modify Makefile.
5. clean compile warning in copy_device_table() and some compile error.
6. split the original patch into five patches, it will be better for
review.
7. modify elfcorehdr_read().
8. add some comments.

Some known issues:
1. about SME
The upstream kernel doesn't work when we use kexec with the following
command. The system will hang on 'HP ProLiant DL385Gen10 AMD EPYC 7251',
but it cannot be reproduced on speedway.
(This issue is unrelated to the kdump patches.)

Reproduce steps:
 # kexec -l /b

[PATCH 1/5 V4] Add a function(ioremap_encrypted) for kdump when AMD sme enabled

2018-06-28 Thread Lianbo Jiang
It is convenient to remap the old (encrypted) memory into the second
kernel by calling ioremap_encrypted().

Signed-off-by: Lianbo Jiang 
---
Some changes:
1. remove the sme_active() check in __ioremap_caller().
2. revert some logic in the early_memremap_pgprot_adjust() for
early memremap and make it separate a new patch.

 arch/x86/include/asm/io.h |  3 +++
 arch/x86/mm/ioremap.c | 25 +
 2 files changed, 20 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
index 6de6484..f8795f9 100644
--- a/arch/x86/include/asm/io.h
+++ b/arch/x86/include/asm/io.h
@@ -192,6 +192,9 @@ extern void __iomem *ioremap_cache(resource_size_t offset, 
unsigned long size);
 #define ioremap_cache ioremap_cache
 extern void __iomem *ioremap_prot(resource_size_t offset, unsigned long size, 
unsigned long prot_val);
 #define ioremap_prot ioremap_prot
+extern void __iomem *ioremap_encrypted(resource_size_t phys_addr,
+   unsigned long size);
+#define ioremap_encrypted ioremap_encrypted
 
 /**
  * ioremap -   map bus memory into CPU space
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index c63a545..e01e6c6 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -24,6 +24,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "physaddr.h"
 
@@ -131,7 +132,8 @@ static void __ioremap_check_mem(resource_size_t addr, 
unsigned long size,
  * caller shouldn't need to know that small detail.
  */
 static void __iomem *__ioremap_caller(resource_size_t phys_addr,
-   unsigned long size, enum page_cache_mode pcm, void *caller)
+   unsigned long size, enum page_cache_mode pcm,
+   void *caller, bool encrypted)
 {
unsigned long offset, vaddr;
resource_size_t last_addr;
@@ -199,7 +201,7 @@ static void __iomem *__ioremap_caller(resource_size_t 
phys_addr,
 * resulting mapping.
 */
prot = PAGE_KERNEL_IO;
-   if (sev_active() && mem_flags.desc_other)
+   if ((sev_active() && mem_flags.desc_other) || encrypted)
prot = pgprot_encrypted(prot);
 
switch (pcm) {
@@ -291,7 +293,7 @@ void __iomem *ioremap_nocache(resource_size_t phys_addr, 
unsigned long size)
enum page_cache_mode pcm = _PAGE_CACHE_MODE_UC_MINUS;
 
return __ioremap_caller(phys_addr, size, pcm,
-   __builtin_return_address(0));
+   __builtin_return_address(0), false);
 }
 EXPORT_SYMBOL(ioremap_nocache);
 
@@ -324,7 +326,7 @@ void __iomem *ioremap_uc(resource_size_t phys_addr, 
unsigned long size)
enum page_cache_mode pcm = _PAGE_CACHE_MODE_UC;
 
return __ioremap_caller(phys_addr, size, pcm,
-   __builtin_return_address(0));
+   __builtin_return_address(0), false);
 }
 EXPORT_SYMBOL_GPL(ioremap_uc);
 
@@ -341,7 +343,7 @@ EXPORT_SYMBOL_GPL(ioremap_uc);
 void __iomem *ioremap_wc(resource_size_t phys_addr, unsigned long size)
 {
return __ioremap_caller(phys_addr, size, _PAGE_CACHE_MODE_WC,
-   __builtin_return_address(0));
+   __builtin_return_address(0), false);
 }
 EXPORT_SYMBOL(ioremap_wc);
 
@@ -358,14 +360,21 @@ EXPORT_SYMBOL(ioremap_wc);
 void __iomem *ioremap_wt(resource_size_t phys_addr, unsigned long size)
 {
return __ioremap_caller(phys_addr, size, _PAGE_CACHE_MODE_WT,
-   __builtin_return_address(0));
+   __builtin_return_address(0), false);
 }
 EXPORT_SYMBOL(ioremap_wt);
 
+void __iomem *ioremap_encrypted(resource_size_t phys_addr, unsigned long size)
+{
+   return __ioremap_caller(phys_addr, size, _PAGE_CACHE_MODE_WB,
+   __builtin_return_address(0), true);
+}
+EXPORT_SYMBOL(ioremap_encrypted);
+
 void __iomem *ioremap_cache(resource_size_t phys_addr, unsigned long size)
 {
return __ioremap_caller(phys_addr, size, _PAGE_CACHE_MODE_WB,
-   __builtin_return_address(0));
+   __builtin_return_address(0), false);
 }
 EXPORT_SYMBOL(ioremap_cache);
 
@@ -374,7 +383,7 @@ void __iomem *ioremap_prot(resource_size_t phys_addr, 
unsigned long size,
 {
return __ioremap_caller(phys_addr, size,
pgprot2cachemode(__pgprot(prot_val)),
-   __builtin_return_address(0));
+   __builtin_return_address(0), false);
 }
 EXPORT_SYMBOL(ioremap_prot);
 
-- 
2.9.5



[PATCH 2/5 V4] Allocate pages for kdump without encryption when SME is enabled

2018-06-28 Thread Lianbo Jiang
When SME is enabled in the first kernel, we will allocate pages
for kdump without encryption in order to be able to boot the
second kernel in the same manner as kexec, which helps to keep
the same code style.

Signed-off-by: Lianbo Jiang 
---
Some changes:
1. remove some redundant codes for crash control pages.
2. add some comments.

 kernel/kexec_core.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index 23a83a4..e7efcd1 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -471,6 +471,16 @@ static struct page 
*kimage_alloc_crash_control_pages(struct kimage *image,
}
}
 
+   if (pages) {
+   /*
+* For kdump, we need to ensure that these pages are
+* unencrypted pages if SME is enabled.
+* By the way, it is unnecessary to call the arch_
+* kexec_pre_free_pages(), which will make the code
+* become more simple.
+*/
+   arch_kexec_post_alloc_pages(page_address(pages), 1 << order, 0);
+   }
return pages;
 }
 
@@ -867,6 +877,7 @@ static int kimage_load_crash_segment(struct kimage *image,
result  = -ENOMEM;
goto out;
}
+   arch_kexec_post_alloc_pages(page_address(page), 1, 0);
ptr = kmap(page);
ptr += maddr & ~PAGE_MASK;
mchunk = min_t(size_t, mbytes,
@@ -884,6 +895,7 @@ static int kimage_load_crash_segment(struct kimage *image,
result = copy_from_user(ptr, buf, uchunk);
kexec_flush_icache_page(page);
kunmap(page);
+   arch_kexec_pre_free_pages(page_address(page), 1);
if (result) {
result = -EFAULT;
goto out;
-- 
2.9.5



[PATCH 3/5 V4] Remap the device table of IOMMU in encrypted manner for kdump

2018-06-28 Thread Lianbo Jiang
In kdump mode, the kernel copies the IOMMU device table from the old
device table, which is encrypted when SME is enabled in the first
kernel. So we must remap it in an encrypted manner so that it is
automatically decrypted when read.

Signed-off-by: Lianbo Jiang 
---
Some changes:
1. add some comments
2. clean compile warning.
3. remove unnecessary code when we clear sme mask bit.

 drivers/iommu/amd_iommu_init.c | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index 904c575..4cebb00 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -888,12 +888,22 @@ static bool copy_device_table(void)
}
}
 
-   old_devtb_phys = entry & PAGE_MASK;
+   /*
+* When SME is enabled in the first kernel, the entry includes the
+* memory encryption mask(sme_me_mask), we must remove the memory
+* encryption mask to obtain the true physical address in kdump mode.
+*/
+   old_devtb_phys = __sme_clr(entry) & PAGE_MASK;
+
	if (old_devtb_phys >= 0x100000000ULL) {
pr_err("The address of old device table is above 4G, not 
trustworthy!\n");
return false;
}
-   old_devtb = memremap(old_devtb_phys, dev_table_size, MEMREMAP_WB);
+   old_devtb = (sme_active() && is_kdump_kernel())
+   ? (__force void *)ioremap_encrypted(old_devtb_phys,
+   dev_table_size)
+   : memremap(old_devtb_phys, dev_table_size, MEMREMAP_WB);
+
if (!old_devtb)
return false;
 
-- 
2.9.5



[PATCH 4/5 V4] Adjust some permanent mappings in unencrypted ways for kdump when SME is enabled.

2018-06-28 Thread Lianbo Jiang
For kdump, the ACPI and DMI tables need to be remapped unencrypted during
early init. They have just a simple wrapper around early_memremap(), but
early_memremap() maps the memory encrypted by default when SME is enabled,
so we put some logic into early_memremap_pgprot_adjust(), which has an
opportunity to adjust it.

Signed-off-by: Lianbo Jiang 
---
 arch/x86/mm/ioremap.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index e01e6c6..3c1c8c4 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -689,8 +689,17 @@ pgprot_t __init 
early_memremap_pgprot_adjust(resource_size_t phys_addr,
encrypted_prot = true;
 
if (sme_active()) {
+   /*
+* In kdump mode, the acpi table and dmi table will need to
+* be remapped in unencrypted ways during early init when
+* SME is enabled. They have just a simple wrapper around
+* early_memremap(), but the early_memremap() remaps the
+* memory in encrypted ways by default when SME is enabled,
+* so we must adjust it.
+*/
if (early_memremap_is_setup_data(phys_addr, size) ||
-   memremap_is_efi_data(phys_addr, size))
+   memremap_is_efi_data(phys_addr, size) ||
+   is_kdump_kernel())
encrypted_prot = false;
}
 
-- 
2.9.5



[PATCH 5/5 V4] Help to dump the old memory encrypted into vmcore file

2018-06-28 Thread Lianbo Jiang
In kdump mode, we need to dump the old memory into the vmcore file. If
SME is enabled in the first kernel, we must remap the old memory in an
encrypted manner so that it is automatically decrypted when read from
DRAM. This helps tools parse the vmcore.

Signed-off-by: Lianbo Jiang 
---
Some changes:
1. add a new file and modify Makefile.
2. revert some code about previously using sev_active().
3. modify elfcorehdr_read().

 arch/x86/kernel/Makefile |  1 +
 arch/x86/kernel/crash_dump_encrypt.c | 53 
 fs/proc/vmcore.c | 45 +-
 include/linux/crash_dump.h   | 12 
 4 files changed, 104 insertions(+), 7 deletions(-)
 create mode 100644 arch/x86/kernel/crash_dump_encrypt.c

diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 02d6f5c..afb5bad 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -96,6 +96,7 @@ obj-$(CONFIG_KEXEC_CORE)  += machine_kexec_$(BITS).o
 obj-$(CONFIG_KEXEC_CORE)   += relocate_kernel_$(BITS).o crash.o
 obj-$(CONFIG_KEXEC_FILE)   += kexec-bzimage64.o
 obj-$(CONFIG_CRASH_DUMP)   += crash_dump_$(BITS).o
+obj-$(CONFIG_AMD_MEM_ENCRYPT)  += crash_dump_encrypt.o
 obj-y  += kprobes/
 obj-$(CONFIG_MODULES)  += module.o
 obj-$(CONFIG_DOUBLEFAULT)  += doublefault.o
diff --git a/arch/x86/kernel/crash_dump_encrypt.c 
b/arch/x86/kernel/crash_dump_encrypt.c
new file mode 100644
index 000..e1b1a57
--- /dev/null
+++ b/arch/x86/kernel/crash_dump_encrypt.c
@@ -0,0 +1,53 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Memory preserving reboot related code.
+ *
+ * Created by: Lianbo Jiang (liji...@redhat.com)
+ * Copyright (C) RedHat Corporation, 2018. All rights reserved
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+/**
+ * copy_oldmem_page_encrypted - copy one page from "oldmem encrypted"
+ * @pfn: page frame number to be copied
+ * @buf: target memory address for the copy; this can be in kernel address
+ * space or user address space (see @userbuf)
+ * @csize: number of bytes to copy
+ * @offset: offset in bytes into the page (based on pfn) to begin the copy
+ * @userbuf: if set, @buf is in user address space, use copy_to_user(),
+ * otherwise @buf is in kernel address space, use memcpy().
+ *
+ * Copy a page from "oldmem encrypted". For this page, there is no pte
+ * mapped in the current kernel. We stitch up a pte, similar to
+ * kmap_atomic.
+ */
+
+ssize_t copy_oldmem_page_encrypted(unsigned long pfn, char *buf,
+   size_t csize, unsigned long offset, int userbuf)
+{
+   void  *vaddr;
+
+   if (!csize)
+   return 0;
+
+   vaddr = (__force void *)ioremap_encrypted(pfn << PAGE_SHIFT,
+ PAGE_SIZE);
+   if (!vaddr)
+   return -ENOMEM;
+
+   if (userbuf) {
+   if (copy_to_user((void __user *)buf, vaddr + offset, csize)) {
+   iounmap((void __iomem *)vaddr);
+   return -EFAULT;
+   }
+   } else
+   memcpy(buf, vaddr + offset, csize);
+
+   set_iounmap_nonlazy();
+   iounmap((void __iomem *)vaddr);
+   return csize;
+}
diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index cfb6674..5fef489 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -26,6 +26,8 @@
 #include 
 #include 
 #include "internal.h"
+#include 
+#include 
 
 /* List representing chunks of contiguous memory areas and their offsets in
  * vmcore file.
@@ -98,7 +100,8 @@ static int pfn_is_ram(unsigned long pfn)
 
 /* Reads a page from the oldmem device from given offset. */
 static ssize_t read_from_oldmem(char *buf, size_t count,
-   u64 *ppos, int userbuf)
+   u64 *ppos, int userbuf,
+   bool encrypted)
 {
unsigned long pfn, offset;
size_t nr_bytes;
@@ -120,8 +123,11 @@ static ssize_t read_from_oldmem(char *buf, size_t count,
if (pfn_is_ram(pfn) == 0)
memset(buf, 0, nr_bytes);
else {
-   tmp = copy_oldmem_page(pfn, buf, nr_bytes,
-   offset, userbuf);
+   tmp = encrypted ? copy_oldmem_page_encrypted(pfn,
+   buf, nr_bytes, offset, userbuf)
+   : copy_oldmem_page(pfn, buf, nr_bytes,
+  offset, userbuf);
+
if (tmp < 0)
return tmp;
}
@@ -151,11 +157,34 @@ void __weak elfcorehdr_free(unsigned long long addr)
 {}
 
 /*
- * Architectures may override this function to read from ELF header
+ * Architectures may override this function to read fr

[PATCH 0/5 V5] Support kdump for AMD secure memory encryption(SME)

2018-07-02 Thread Lianbo Jiang
When SME is enabled on an AMD server, we also need to support kdump. Because
the memory is encrypted in the first kernel, we remap the old memory as
encrypted into the second kernel (crash kernel), and SME must also be enabled
in the second kernel, otherwise the old encrypted memory cannot be decrypted.
Simply changing the value of the C-bit on a page does not automatically
encrypt the existing contents of the page, and any data in the page prior to
the C-bit modification becomes unintelligible. A page of memory that is
marked encrypted is automatically decrypted when read from DRAM and
automatically encrypted when written to DRAM.

For kdump, it is necessary to distinguish whether the memory is encrypted,
and furthermore to know which parts of the memory are encrypted or
unencrypted. We then remap the memory appropriately for each situation in
order to tell the CPU how to deal with the data (encrypted or unencrypted).
For example, when SME is enabled and the old memory is encrypted, we remap
the old memory in an encrypted way, which automatically decrypts the old
memory when we read the data through the remapped address.

 -------------------------------------------------------------
| first kernel         | second kernel        | kdump support |
| (mem_encrypt=on|off) | (mem_encrypt=on|off) |   (yes|no)    |
|----------------------+----------------------+---------------|
| on                   | on                   | yes           |
| off                  | off                  | yes           |
| on                   | off                  | no            |
| off                  | on                   | no            |
|______________________|______________________|_______________|

This patchset only covers SME kdump; it does not support SEV kdump.

For kdump with SME, there are two cases that are not supported:
1. SME is enabled in the first kernel, but SME is disabled in the
second kernel
Because the old memory is encrypted, we can't decrypt the old memory
if SME is off in the second kernel.

2. SME is disabled in the first kernel, but SME is enabled in the
second kernel
Maybe it is unnecessary to support this case, because the old memory
is unencrypted, the old memory can be dumped as usual, we don't need
to enable sme in the second kernel, furthermore the requirement is
rare in actual deployment. Another, If we must support the scenario,
it will increase the complexity of the code, we will have to consider
how to transfer the sme flag from the first kernel to the second kernel,
in order to let the second kernel know that whether the old memory is
encrypted.
There are two manners to transfer the SME flag to the second kernel, the
first way is to modify the assembly code, which includes some common
code and the path is too long. The second way is to use kexec tool,
which could require the sme flag to be exported in the first kernel
by "proc" or "sysfs", kexec will read the sme flag from "proc" or
"sysfs" when we use kexec tool to load image, subsequently the sme flag
will be saved in boot_params, we can properly remap the old memory
according to the previously saved sme flag. Although we can fix this
issue, maybe it is too expensive to do this. By the way, we won't fix
the problem unless someone thinks it is necessary to do it.

Test tools:
makedumpfile[v1.6.3]: https://github.com/LianboJ/makedumpfile
commit e1de103eca8f (A draft for kdump vmcore about AMD SME)
Author: Lianbo Jiang 
Date:   Mon May 14 17:02:40 2018 +0800
Note: This patch can only dump vmcore in the case of SME enabled.

crash-7.2.1: https://github.com/crash-utility/crash.git
commit 1e1bd9c4c1be (Fix for the "bpf" command display on Linux 4.17-rc1)
Author: Dave Anderson 
Date:   Fri May 11 15:54:32 2018 -0400

Test environment:
HP ProLiant DL385Gen10 AMD EPYC 7251
8-Core Processor
32768 MB memory
600 GB disk space

Linux 4.18-rc3:
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
commit 021c91791a5e7e85c567452f1be3e4c2c6cb6063
Author: Linus Torvalds 
Date:   Sun Jul 1 16:04:53 2018 -0700

Reference:
AMD64 Architecture Programmer's Manual
https://support.amd.com/TechDocs/24593.pdf

Some changes:
1. remove the sme_active() check in __ioremap_caller().
2. remove the '#ifdef' stuff throughout this patch.
3. put some logic into the early_memremap_pgprot_adjust() and clean the
previous unnecessary changes, for example: arch/x86/include/asm/dmi.h,
arch/x86/kernel/acpi/boot.c, drivers/acpi/tables.c.
4. add a new file and modify Makefile.
5. clean up a compile warning in copy_device_table() and some compile errors.
6. split the original patch into five patches to make review easier.
7. add some comments.

Some known issues:
1. about SME
The upstream kernel does not work when kexec is used with the following
command; the system hangs on 'HP ProLiant DL385Gen10 AMD EPYC 7251', but
the issue cannot be reproduced on speedway.
(This issue is unrelated to the kdump patches.)

Reproduce steps:
 # kexec -l /boot/vmlinuz-4.18.0-rc3+ --initrd=

[PATCH 1/5 V5] Add a function(ioremap_encrypted) for kdump when AMD sme enabled

2018-07-02 Thread Lianbo Jiang
Add ioremap_encrypted() so that the encrypted old memory can conveniently
be remapped with the encryption mask in the second kernel.

Signed-off-by: Lianbo Jiang 
---
Some changes:
1. remove the sme_active() check in __ioremap_caller().
2. revert some logic in the early_memremap_pgprot_adjust() for
early memremap and make it separate a new patch.

 arch/x86/include/asm/io.h |  3 +++
 arch/x86/mm/ioremap.c | 25 +
 2 files changed, 20 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
index 6de6484..f8795f9 100644
--- a/arch/x86/include/asm/io.h
+++ b/arch/x86/include/asm/io.h
@@ -192,6 +192,9 @@ extern void __iomem *ioremap_cache(resource_size_t offset, 
unsigned long size);
 #define ioremap_cache ioremap_cache
 extern void __iomem *ioremap_prot(resource_size_t offset, unsigned long size, 
unsigned long prot_val);
 #define ioremap_prot ioremap_prot
+extern void __iomem *ioremap_encrypted(resource_size_t phys_addr,
+   unsigned long size);
+#define ioremap_encrypted ioremap_encrypted
 
 /**
  * ioremap -   map bus memory into CPU space
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index c63a545..e01e6c6 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -24,6 +24,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "physaddr.h"
 
@@ -131,7 +132,8 @@ static void __ioremap_check_mem(resource_size_t addr, 
unsigned long size,
  * caller shouldn't need to know that small detail.
  */
 static void __iomem *__ioremap_caller(resource_size_t phys_addr,
-   unsigned long size, enum page_cache_mode pcm, void *caller)
+   unsigned long size, enum page_cache_mode pcm,
+   void *caller, bool encrypted)
 {
unsigned long offset, vaddr;
resource_size_t last_addr;
@@ -199,7 +201,7 @@ static void __iomem *__ioremap_caller(resource_size_t 
phys_addr,
 * resulting mapping.
 */
prot = PAGE_KERNEL_IO;
-   if (sev_active() && mem_flags.desc_other)
+   if ((sev_active() && mem_flags.desc_other) || encrypted)
prot = pgprot_encrypted(prot);
 
switch (pcm) {
@@ -291,7 +293,7 @@ void __iomem *ioremap_nocache(resource_size_t phys_addr, 
unsigned long size)
enum page_cache_mode pcm = _PAGE_CACHE_MODE_UC_MINUS;
 
return __ioremap_caller(phys_addr, size, pcm,
-   __builtin_return_address(0));
+   __builtin_return_address(0), false);
 }
 EXPORT_SYMBOL(ioremap_nocache);
 
@@ -324,7 +326,7 @@ void __iomem *ioremap_uc(resource_size_t phys_addr, 
unsigned long size)
enum page_cache_mode pcm = _PAGE_CACHE_MODE_UC;
 
return __ioremap_caller(phys_addr, size, pcm,
-   __builtin_return_address(0));
+   __builtin_return_address(0), false);
 }
 EXPORT_SYMBOL_GPL(ioremap_uc);
 
@@ -341,7 +343,7 @@ EXPORT_SYMBOL_GPL(ioremap_uc);
 void __iomem *ioremap_wc(resource_size_t phys_addr, unsigned long size)
 {
return __ioremap_caller(phys_addr, size, _PAGE_CACHE_MODE_WC,
-   __builtin_return_address(0));
+   __builtin_return_address(0), false);
 }
 EXPORT_SYMBOL(ioremap_wc);
 
@@ -358,14 +360,21 @@ EXPORT_SYMBOL(ioremap_wc);
 void __iomem *ioremap_wt(resource_size_t phys_addr, unsigned long size)
 {
return __ioremap_caller(phys_addr, size, _PAGE_CACHE_MODE_WT,
-   __builtin_return_address(0));
+   __builtin_return_address(0), false);
 }
 EXPORT_SYMBOL(ioremap_wt);
 
+void __iomem *ioremap_encrypted(resource_size_t phys_addr, unsigned long size)
+{
+   return __ioremap_caller(phys_addr, size, _PAGE_CACHE_MODE_WB,
+   __builtin_return_address(0), true);
+}
+EXPORT_SYMBOL(ioremap_encrypted);
+
 void __iomem *ioremap_cache(resource_size_t phys_addr, unsigned long size)
 {
return __ioremap_caller(phys_addr, size, _PAGE_CACHE_MODE_WB,
-   __builtin_return_address(0));
+   __builtin_return_address(0), false);
 }
 EXPORT_SYMBOL(ioremap_cache);
 
@@ -374,7 +383,7 @@ void __iomem *ioremap_prot(resource_size_t phys_addr, 
unsigned long size,
 {
return __ioremap_caller(phys_addr, size,
pgprot2cachemode(__pgprot(prot_val)),
-   __builtin_return_address(0));
+   __builtin_return_address(0), false);
 }
 EXPORT_SYMBOL(ioremap_prot);
 
-- 
2.9.5



[PATCH 2/5 V5] Allocate pages for kdump without encryption when SME is enabled

2018-07-02 Thread Lianbo Jiang
When SME is enabled in the first kernel, allocate the kdump control pages
without encryption so that the second kernel can be booted in the same
manner as kexec, which also keeps the code consistent.
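
For context (not part of this patch), the arch hooks used below essentially
toggle the encryption attribute of the mapping on x86; a rough sketch of
their behavior, based on arch/x86/kernel/machine_kexec_64.c as of these
kernels, shown here for orientation only:

/* Rough sketch, for context only: the x86 helpers clear or restore the
 * C-bit on the given pages via the set_memory_*() interfaces. */
int arch_kexec_post_alloc_pages(void *vaddr, unsigned int pages, gfp_t gfp)
{
        /* Make the pages accessible unencrypted. */
        return set_memory_decrypted((unsigned long)vaddr, pages);
}

void arch_kexec_pre_free_pages(void *vaddr, unsigned int pages)
{
        /* Restore the encrypted mapping before the pages are freed. */
        set_memory_encrypted((unsigned long)vaddr, pages);
}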

Signed-off-by: Lianbo Jiang 
---
Some changes:
1. remove some redundant codes for crash control pages.
2. add some comments.

 kernel/kexec_core.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index 23a83a4..e7efcd1 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -471,6 +471,16 @@ static struct page 
*kimage_alloc_crash_control_pages(struct kimage *image,
}
}
 
+   if (pages) {
+   /*
+* For kdump, ensure that these pages are unencrypted when
+* SME is enabled. Note that there is no need to call
+* arch_kexec_pre_free_pages() here, which keeps the code
+* simpler.
+*/
+   arch_kexec_post_alloc_pages(page_address(pages), 1 << order, 0);
+   }
return pages;
 }
 
@@ -867,6 +877,7 @@ static int kimage_load_crash_segment(struct kimage *image,
result  = -ENOMEM;
goto out;
}
+   arch_kexec_post_alloc_pages(page_address(page), 1, 0);
ptr = kmap(page);
ptr += maddr & ~PAGE_MASK;
mchunk = min_t(size_t, mbytes,
@@ -884,6 +895,7 @@ static int kimage_load_crash_segment(struct kimage *image,
result = copy_from_user(ptr, buf, uchunk);
kexec_flush_icache_page(page);
kunmap(page);
+   arch_kexec_pre_free_pages(page_address(page), 1);
if (result) {
result = -EFAULT;
goto out;
-- 
2.9.5



[PATCH 3/5 V5] Remap the device table of IOMMU in encrypted manner for kdump

2018-07-02 Thread Lianbo Jiang
In kdump mode, the IOMMU driver copies the device table from the old
device table, which is encrypted when SME is enabled in the first
kernel. So the old table must be remapped with the encryption mask so
that reads through the mapping return the decrypted data.
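
The old device-table pointer read back from the IOMMU registers still
carries the encryption mask when SME was enabled, so it has to be cleared
before it can be used as a physical address. __sme_clr(), as defined in
the mem_encrypt headers, is essentially:

/* Sketch: clear the SME encryption bit (sme_me_mask) so the value can
 * be treated as a true physical address. */
#define __sme_clr(x)    ((x) & ~sme_me_mask)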

Signed-off-by: Lianbo Jiang 
---
Some changes:
1. add some comments
2. clean compile warning.
3. remove unnecessary code when we clear sme mask bit.

 drivers/iommu/amd_iommu_init.c | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index 904c575..4cebb00 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -888,12 +888,22 @@ static bool copy_device_table(void)
}
}
 
-   old_devtb_phys = entry & PAGE_MASK;
+   /*
+* When SME is enabled in the first kernel, the entry includes the
+* memory encryption mask (sme_me_mask); it must be cleared to
+* obtain the true physical address in kdump mode.
+*/
+   old_devtb_phys = __sme_clr(entry) & PAGE_MASK;
+
if (old_devtb_phys >= 0x1ULL) {
pr_err("The address of old device table is above 4G, not 
trustworthy!\n");
return false;
}
-   old_devtb = memremap(old_devtb_phys, dev_table_size, MEMREMAP_WB);
+   old_devtb = (sme_active() && is_kdump_kernel())
+   ? (__force void *)ioremap_encrypted(old_devtb_phys,
+   dev_table_size)
+   : memremap(old_devtb_phys, dev_table_size, MEMREMAP_WB);
+
if (!old_devtb)
return false;
 
-- 
2.9.5



[PATCH 4/5 V5] Adjust some permanent mappings in unencrypted ways for kdump when SME is enabled.

2018-07-02 Thread Lianbo Jiang
For kdump, the ACPI and DMI tables need to be remapped unencrypted during
early init. They are accessed through simple wrappers around
early_memremap(), but early_memremap() maps memory encrypted by default
when SME is enabled, so some logic is added to
early_memremap_pgprot_adjust(), which gets the opportunity to adjust the
protection.
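
For context, the generic early_memremap() path consults this hook before
the early fixmap mapping is created; a simplified sketch (based on
mm/early_ioremap.c as of these kernels, shown here for orientation only):

void __init *early_memremap(resource_size_t phys_addr, unsigned long size)
{
        /* Let the architecture adjust the protection, e.g. drop the
         * encryption mask for kdump, before the mapping is created. */
        pgprot_t prot = early_memremap_pgprot_adjust(phys_addr, size,
                                                     FIXMAP_PAGE_NORMAL);

        return (__force void *)__early_ioremap(phys_addr, size, prot);
}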

Signed-off-by: Lianbo Jiang 
---
 arch/x86/mm/ioremap.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index e01e6c6..3c1c8c4 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -689,8 +689,17 @@ pgprot_t __init 
early_memremap_pgprot_adjust(resource_size_t phys_addr,
encrypted_prot = true;
 
if (sme_active()) {
+   /*
+* In kdump mode, the ACPI and DMI tables need to be remapped
+* unencrypted during early init when SME is enabled. They use
+* a simple wrapper around early_memremap(), which maps memory
+* encrypted by default when SME is enabled, so the protection
+* must be adjusted here.
+*/
if (early_memremap_is_setup_data(phys_addr, size) ||
-   memremap_is_efi_data(phys_addr, size))
+   memremap_is_efi_data(phys_addr, size) ||
+   is_kdump_kernel())
encrypted_prot = false;
}
 
-- 
2.9.5



[PATCH 5/5 V5] Help to dump the old memory encrypted into vmcore file

2018-07-02 Thread Lianbo Jiang
In kdump mode, the old memory has to be dumped into the vmcore file. If
SME is enabled in the first kernel, the old memory must be remapped with
the encryption mask so that it is automatically decrypted when read from
DRAM. This allows tools to parse the vmcore correctly.
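
This patch adds an "encrypted" parameter to read_from_oldmem() (see the
diff below), so callers in fs/proc/vmcore.c can select the encrypted copy
path based on whether SME is active. A hypothetical caller sketch (the
wrapper name is made up for illustration; the real call sites are in the
diff below, which is partly truncated in this archive):

/* Hypothetical sketch: pass sme_active() as the "encrypted" flag so
 * old-memory reads go through copy_oldmem_page_encrypted() when SME
 * was enabled in the first kernel. */
static ssize_t read_oldmem_sme(char *buf, size_t count, u64 *ppos,
                               int userbuf)
{
        return read_from_oldmem(buf, count, ppos, userbuf, sme_active());
}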

Signed-off-by: Lianbo Jiang 
---
Some changes:
1. add a new file and modify Makefile.
2. revert some code about previously using sev_active().

 arch/x86/kernel/Makefile |  1 +
 arch/x86/kernel/crash_dump_encrypt.c | 53 
 fs/proc/vmcore.c | 21 ++
 include/linux/crash_dump.h   | 12 
 4 files changed, 81 insertions(+), 6 deletions(-)
 create mode 100644 arch/x86/kernel/crash_dump_encrypt.c

diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 02d6f5c..afb5bad 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -96,6 +96,7 @@ obj-$(CONFIG_KEXEC_CORE)  += machine_kexec_$(BITS).o
 obj-$(CONFIG_KEXEC_CORE)   += relocate_kernel_$(BITS).o crash.o
 obj-$(CONFIG_KEXEC_FILE)   += kexec-bzimage64.o
 obj-$(CONFIG_CRASH_DUMP)   += crash_dump_$(BITS).o
+obj-$(CONFIG_AMD_MEM_ENCRYPT)  += crash_dump_encrypt.o
 obj-y  += kprobes/
 obj-$(CONFIG_MODULES)  += module.o
 obj-$(CONFIG_DOUBLEFAULT)  += doublefault.o
diff --git a/arch/x86/kernel/crash_dump_encrypt.c 
b/arch/x86/kernel/crash_dump_encrypt.c
new file mode 100644
index 000..e1b1a57
--- /dev/null
+++ b/arch/x86/kernel/crash_dump_encrypt.c
@@ -0,0 +1,53 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Memory preserving reboot related code.
+ *
+ * Created by: Lianbo Jiang (liji...@redhat.com)
+ * Copyright (C) RedHat Corporation, 2018. All rights reserved
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+/**
+ * copy_oldmem_page_encrypted - copy one page from "oldmem encrypted"
+ * @pfn: page frame number to be copied
+ * @buf: target memory address for the copy; this can be in kernel address
+ * space or user address space (see @userbuf)
+ * @csize: number of bytes to copy
+ * @offset: offset in bytes into the page (based on pfn) to begin the copy
+ * @userbuf: if set, @buf is in user address space, use copy_to_user(),
+ * otherwise @buf is in kernel address space, use memcpy().
+ *
+ * Copy a page from "oldmem encrypted". For this page, there is no pte
+ * mapped in the current kernel. We stitch up a pte, similar to
+ * kmap_atomic.
+ */
+
+ssize_t copy_oldmem_page_encrypted(unsigned long pfn, char *buf,
+   size_t csize, unsigned long offset, int userbuf)
+{
+   void  *vaddr;
+
+   if (!csize)
+   return 0;
+
+   vaddr = (__force void *)ioremap_encrypted(pfn << PAGE_SHIFT,
+ PAGE_SIZE);
+   if (!vaddr)
+   return -ENOMEM;
+
+   if (userbuf) {
+   if (copy_to_user((void __user *)buf, vaddr + offset, csize)) {
+   iounmap((void __iomem *)vaddr);
+   return -EFAULT;
+   }
+   } else
+   memcpy(buf, vaddr + offset, csize);
+
+   set_iounmap_nonlazy();
+   iounmap((void __iomem *)vaddr);
+   return csize;
+}
diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index cfb6674..07c1934 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -25,6 +25,9 @@
 #include 
 #include 
 #include 
+#include 
+#include 
+#include 
 #include "internal.h"
 
 /* List representing chunks of contiguous memory areas and their offsets in
@@ -98,7 +101,8 @@ static int pfn_is_ram(unsigned long pfn)
 
 /* Reads a page from the oldmem device from given offset. */
 static ssize_t read_from_oldmem(char *buf, size_t count,
-   u64 *ppos, int userbuf)
+   u64 *ppos, int userbuf,
+   bool encrypted)
 {
unsigned long pfn, offset;
size_t nr_bytes;
@@ -120,8 +124,11 @@ static ssize_t read_from_oldmem(char *buf, size_t count,
if (pfn_is_ram(pfn) == 0)
memset(buf, 0, nr_bytes);
else {
-   tmp = copy_oldmem_page(pfn, buf, nr_bytes,
-   offset, userbuf);
+   tmp = encrypted ? copy_oldmem_page_encrypted(pfn,
+   buf, nr_bytes, offset, userbuf)
+   : copy_oldmem_page(pfn, buf, nr_bytes,
+  offset, userbuf);
+
if (tmp < 0)
return tmp;
}
@@ -155,7 +162,7 @@ void __weak elfcorehdr_free(unsigned long long addr)
  */
 ssize_t __weak elfcorehdr_read(char *buf, size_t count, u64 *ppos)
 {
-   return read_from_oldmem(buf, count, ppos, 0);
+   return read_from_oldmem(buf, count, ppo

[PATCH 0/5 V6] Support kdump for AMD secure memory encryption(SME)

2018-08-31 Thread Lianbo Jiang
9.433177]   Not tainted 4.17.0-rc5+ #60
[  369.437585] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[  369.445783] systemd-udevd   D0   865813 0x8004
[  369.451323] Call Trace:
[  369.453815]  ? __schedule+0x290/0x870
[  369.457523]  schedule+0x32/0x80
[  369.460714]  __sev_do_cmd_locked+0x1f6/0x2a0 [ccp]
[  369.465556]  ? cleanup_uevent_env+0x10/0x10
[  369.470084]  ? remove_wait_queue+0x60/0x60
[  369.474219]  ? 0xc0247000
[  369.477572]  __sev_platform_init_locked+0x2b/0x70 [ccp]
[  369.482843]  sev_platform_init+0x1d/0x30 [ccp]
[  369.487333]  psp_pci_init+0x40/0xe0 [ccp]
[  369.491380]  ? 0xc0247000
[  369.494936]  sp_mod_init+0x18/0x1000 [ccp]
[  369.499071]  do_one_initcall+0x4e/0x1d4
[  369.502944]  ? _cond_resched+0x15/0x30
[  369.506728]  ? kmem_cache_alloc_trace+0xae/0x1d0
[  369.511386]  ? do_init_module+0x22/0x220
[  369.515345]  do_init_module+0x5a/0x220
[  369.519444]  load_module+0x21cb/0x2a50
[  369.523227]  ? m_show+0x1c0/0x1c0
[  369.526571]  ? security_capable+0x3f/0x60
[  369.530611]  __do_sys_finit_module+0x94/0xe0
[  369.534915]  do_syscall_64+0x5b/0x180
[  369.538607]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  369.543698] RIP: 0033:0x7f708e6311b9
[  369.547536] RSP: 002b:79d32aa8 EFLAGS: 0246 ORIG_RAX: 
0139
[  369.555162] RAX: ffda RBX: 55602a04c2d0 RCX: 7f708e6311b9
[  369.562346] RDX:  RSI: 7f708ef52039 RDI: 0008
[  369.569801] RBP: 7f708ef52039 R08:  R09: 55602a048b20
[  369.576988] R10: 0008 R11: 0246 R12: 00000000
[  369.584177] R13: 55602a075260 R14: 0002 R15: 

Lianbo Jiang (5):
  x86/ioremap: add a function ioremap_encrypted() to remap kdump old
memory
  x86/ioremap: strengthen the logic in early_memremap_pgprot_adjust() to
adjust encryption mask
  kexec: allocate unencrypted control pages for kdump in case SME is
enabled
  iommu/amd_iommu: remap the device table of IOMMU with the memory
encryption mask for kdump
  kdump/vmcore: support encrypted old memory with SME enabled

 arch/x86/include/asm/io.h|  3 ++
 arch/x86/kernel/Makefile |  1 +
 arch/x86/kernel/crash_dump_encrypt.c | 53 
 arch/x86/mm/ioremap.c| 34 +-
 drivers/iommu/amd_iommu_init.c   | 14 ++--
 fs/proc/vmcore.c | 21 +++
 include/linux/crash_dump.h   | 12 +++
 kernel/kexec_core.c  | 12 +++
 8 files changed, 133 insertions(+), 17 deletions(-)
 create mode 100644 arch/x86/kernel/crash_dump_encrypt.c

-- 
2.17.1



[PATCH 4/5 V6] iommu/amd_iommu: remap the device table of IOMMU with the memory encryption mask for kdump

2018-08-31 Thread Lianbo Jiang
In the kdump kernel, the IOMMU driver copies the device table from the old
device table, which is encrypted when SME is enabled in the first kernel.
So the old device table has to be remapped with the memory encryption mask.

Signed-off-by: Lianbo Jiang 
---
 drivers/iommu/amd_iommu_init.c | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index 84b3e4445d46..3931c7de7c69 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -902,12 +902,22 @@ static bool copy_device_table(void)
}
}
 
-   old_devtb_phys = entry & PAGE_MASK;
+   /*
+* When SME is enabled in the first kernel, the entry includes the
+* memory encryption mask (sme_me_mask); it must be cleared to
+* obtain the true physical address in the kdump kernel.
+*/
+   old_devtb_phys = __sme_clr(entry) & PAGE_MASK;
+
if (old_devtb_phys >= 0x1ULL) {
pr_err("The address of old device table is above 4G, not 
trustworthy!\n");
return false;
}
-   old_devtb = memremap(old_devtb_phys, dev_table_size, MEMREMAP_WB);
+   old_devtb = (sme_active() && is_kdump_kernel())
+   ? (__force void *)ioremap_encrypted(old_devtb_phys,
+   dev_table_size)
+   : memremap(old_devtb_phys, dev_table_size, MEMREMAP_WB);
+
if (!old_devtb)
return false;
 
-- 
2.17.1



[PATCH 1/5 V6] x86/ioremap: add a function ioremap_encrypted() to remap kdump old memory

2018-08-31 Thread Lianbo Jiang
When SME is enabled on an AMD machine, the memory is encrypted in the first
kernel. In this case, SME also needs to be enabled in the kdump kernel, and
the old memory has to be remapped with the memory encryption mask.

Signed-off-by: Lianbo Jiang 
---
 arch/x86/include/asm/io.h |  3 +++
 arch/x86/mm/ioremap.c | 25 +
 2 files changed, 20 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
index 6de64840dd22..f8795f9581c7 100644
--- a/arch/x86/include/asm/io.h
+++ b/arch/x86/include/asm/io.h
@@ -192,6 +192,9 @@ extern void __iomem *ioremap_cache(resource_size_t offset, 
unsigned long size);
 #define ioremap_cache ioremap_cache
 extern void __iomem *ioremap_prot(resource_size_t offset, unsigned long size, 
unsigned long prot_val);
 #define ioremap_prot ioremap_prot
+extern void __iomem *ioremap_encrypted(resource_size_t phys_addr,
+   unsigned long size);
+#define ioremap_encrypted ioremap_encrypted
 
 /**
  * ioremap -   map bus memory into CPU space
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index c63a545ec199..e01e6c695add 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -24,6 +24,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "physaddr.h"
 
@@ -131,7 +132,8 @@ static void __ioremap_check_mem(resource_size_t addr, 
unsigned long size,
  * caller shouldn't need to know that small detail.
  */
 static void __iomem *__ioremap_caller(resource_size_t phys_addr,
-   unsigned long size, enum page_cache_mode pcm, void *caller)
+   unsigned long size, enum page_cache_mode pcm,
+   void *caller, bool encrypted)
 {
unsigned long offset, vaddr;
resource_size_t last_addr;
@@ -199,7 +201,7 @@ static void __iomem *__ioremap_caller(resource_size_t 
phys_addr,
 * resulting mapping.
 */
prot = PAGE_KERNEL_IO;
-   if (sev_active() && mem_flags.desc_other)
+   if ((sev_active() && mem_flags.desc_other) || encrypted)
prot = pgprot_encrypted(prot);
 
switch (pcm) {
@@ -291,7 +293,7 @@ void __iomem *ioremap_nocache(resource_size_t phys_addr, 
unsigned long size)
enum page_cache_mode pcm = _PAGE_CACHE_MODE_UC_MINUS;
 
return __ioremap_caller(phys_addr, size, pcm,
-   __builtin_return_address(0));
+   __builtin_return_address(0), false);
 }
 EXPORT_SYMBOL(ioremap_nocache);
 
@@ -324,7 +326,7 @@ void __iomem *ioremap_uc(resource_size_t phys_addr, 
unsigned long size)
enum page_cache_mode pcm = _PAGE_CACHE_MODE_UC;
 
return __ioremap_caller(phys_addr, size, pcm,
-   __builtin_return_address(0));
+   __builtin_return_address(0), false);
 }
 EXPORT_SYMBOL_GPL(ioremap_uc);
 
@@ -341,7 +343,7 @@ EXPORT_SYMBOL_GPL(ioremap_uc);
 void __iomem *ioremap_wc(resource_size_t phys_addr, unsigned long size)
 {
return __ioremap_caller(phys_addr, size, _PAGE_CACHE_MODE_WC,
-   __builtin_return_address(0));
+   __builtin_return_address(0), false);
 }
 EXPORT_SYMBOL(ioremap_wc);
 
@@ -358,14 +360,21 @@ EXPORT_SYMBOL(ioremap_wc);
 void __iomem *ioremap_wt(resource_size_t phys_addr, unsigned long size)
 {
return __ioremap_caller(phys_addr, size, _PAGE_CACHE_MODE_WT,
-   __builtin_return_address(0));
+   __builtin_return_address(0), false);
 }
 EXPORT_SYMBOL(ioremap_wt);
 
+void __iomem *ioremap_encrypted(resource_size_t phys_addr, unsigned long size)
+{
+   return __ioremap_caller(phys_addr, size, _PAGE_CACHE_MODE_WB,
+   __builtin_return_address(0), true);
+}
+EXPORT_SYMBOL(ioremap_encrypted);
+
 void __iomem *ioremap_cache(resource_size_t phys_addr, unsigned long size)
 {
return __ioremap_caller(phys_addr, size, _PAGE_CACHE_MODE_WB,
-   __builtin_return_address(0));
+   __builtin_return_address(0), false);
 }
 EXPORT_SYMBOL(ioremap_cache);
 
@@ -374,7 +383,7 @@ void __iomem *ioremap_prot(resource_size_t phys_addr, 
unsigned long size,
 {
return __ioremap_caller(phys_addr, size,
pgprot2cachemode(__pgprot(prot_val)),
-   __builtin_return_address(0));
+   __builtin_return_address(0), false);
 }
 EXPORT_SYMBOL(ioremap_prot);
 
-- 
2.17.1



[PATCH 2/5 V6] x86/ioremap: strengthen the logic in early_memremap_pgprot_adjust() to adjust encryption mask

2018-08-31 Thread Lianbo Jiang
For the kdump kernel, when SME is enabled, the ACPI and DMI tables need to
be remapped without the memory encryption mask. So the logic in
early_memremap_pgprot_adjust() is strengthened to give it an opportunity to
drop the memory encryption mask.

Signed-off-by: Lianbo Jiang 
---
 arch/x86/mm/ioremap.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index e01e6c695add..f9d9a39955f3 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -689,8 +689,15 @@ pgprot_t __init 
early_memremap_pgprot_adjust(resource_size_t phys_addr,
encrypted_prot = true;
 
if (sme_active()) {
+   /*
+* In the kdump kernel, the ACPI and DMI tables need to be
+* remapped without the memory encryption mask, so drop the
+* encryption mask here.
+*/
if (early_memremap_is_setup_data(phys_addr, size) ||
-   memremap_is_efi_data(phys_addr, size))
+   memremap_is_efi_data(phys_addr, size) ||
+   is_kdump_kernel())
encrypted_prot = false;
}
 
-- 
2.17.1



[PATCH 3/5 V6] kexec: allocate unencrypted control pages for kdump in case SME is enabled

2018-08-31 Thread Lianbo Jiang
When SME is enabled in the first kernel, allocate unencrypted control pages
for kdump so that the kdump kernel can be booted in the same way as kexec.

Signed-off-by: Lianbo Jiang 
---
 kernel/kexec_core.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index 23a83a4da38a..e7efcd1a977b 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -471,6 +471,16 @@ static struct page 
*kimage_alloc_crash_control_pages(struct kimage *image,
}
}
 
+   if (pages) {
+   /*
+* For kdump, ensure that these pages are unencrypted when
+* SME is enabled. Note that there is no need to call
+* arch_kexec_pre_free_pages() here, which keeps the code
+* simpler.
+*/
+   arch_kexec_post_alloc_pages(page_address(pages), 1 << order, 0);
+   }
return pages;
 }
 
@@ -867,6 +877,7 @@ static int kimage_load_crash_segment(struct kimage *image,
result  = -ENOMEM;
goto out;
}
+   arch_kexec_post_alloc_pages(page_address(page), 1, 0);
ptr = kmap(page);
ptr += maddr & ~PAGE_MASK;
mchunk = min_t(size_t, mbytes,
@@ -884,6 +895,7 @@ static int kimage_load_crash_segment(struct kimage *image,
result = copy_from_user(ptr, buf, uchunk);
kexec_flush_icache_page(page);
kunmap(page);
+   arch_kexec_pre_free_pages(page_address(page), 1);
if (result) {
result = -EFAULT;
goto out;
-- 
2.17.1



[PATCH 5/5 V6] kdump/vmcore: support encrypted old memory with SME enabled

2018-08-31 Thread Lianbo Jiang
In the kdump kernel, the old memory has to be dumped into the vmcore file.
If SME is enabled in the first kernel, the old memory must be remapped with
the memory encryption mask, so that it is automatically decrypted when read
from DRAM.

For SME kdump, two configurations are not supported:

 ----------------------------------------------------------------
| first kernel          | second kernel          | kdump support |
| (mem_encrypt=on|off)  | (mem_encrypt=on|off)   | (yes|no)      |
|-----------------------+------------------------+---------------|
| on                    | on                     | yes           |
| off                   | off                    | yes           |
| on                    | off                    | no            |
| off                   | on                     | no            |
 ----------------------------------------------------------------

1. SME is enabled in the first kernel, but SME is disabled in the kdump kernel.
In this case the old memory is encrypted, so it cannot be decrypted.

2. SME is disabled in the first kernel, but SME is enabled in the kdump kernel.
On the one hand, the old memory is unencrypted and can be dumped as usual, so
there is no need to enable SME in the kdump kernel. On the other hand,
supporting this would increase the complexity of the code, because the SME
flag would have to be passed from the first kernel to the kdump kernel, which
is really too expensive.

These patches are only for SME kdump; they do not support SEV kdump.

Signed-off-by: Lianbo Jiang 
---
 arch/x86/kernel/Makefile |  1 +
 arch/x86/kernel/crash_dump_encrypt.c | 53 
 fs/proc/vmcore.c | 21 +++
 include/linux/crash_dump.h   | 12 +++
 4 files changed, 81 insertions(+), 6 deletions(-)
 create mode 100644 arch/x86/kernel/crash_dump_encrypt.c

diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 8824d01c0c35..dfbeae0e35ce 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -97,6 +97,7 @@ obj-$(CONFIG_KEXEC_CORE)  += machine_kexec_$(BITS).o
 obj-$(CONFIG_KEXEC_CORE)   += relocate_kernel_$(BITS).o crash.o
 obj-$(CONFIG_KEXEC_FILE)   += kexec-bzimage64.o
 obj-$(CONFIG_CRASH_DUMP)   += crash_dump_$(BITS).o
+obj-$(CONFIG_AMD_MEM_ENCRYPT)  += crash_dump_encrypt.o
 obj-y  += kprobes/
 obj-$(CONFIG_MODULES)  += module.o
 obj-$(CONFIG_DOUBLEFAULT)  += doublefault.o
diff --git a/arch/x86/kernel/crash_dump_encrypt.c 
b/arch/x86/kernel/crash_dump_encrypt.c
new file mode 100644
index ..e1b1a577f197
--- /dev/null
+++ b/arch/x86/kernel/crash_dump_encrypt.c
@@ -0,0 +1,53 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Memory preserving reboot related code.
+ *
+ *     Created by: Lianbo Jiang (liji...@redhat.com)
+ * Copyright (C) RedHat Corporation, 2018. All rights reserved
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+/**
+ * copy_oldmem_page_encrypted - copy one page from "oldmem encrypted"
+ * @pfn: page frame number to be copied
+ * @buf: target memory address for the copy; this can be in kernel address
+ * space or user address space (see @userbuf)
+ * @csize: number of bytes to copy
+ * @offset: offset in bytes into the page (based on pfn) to begin the copy
+ * @userbuf: if set, @buf is in user address space, use copy_to_user(),
+ * otherwise @buf is in kernel address space, use memcpy().
+ *
+ * Copy a page from "oldmem encrypted". For this page, there is no pte
+ * mapped in the current kernel. We stitch up a pte, similar to
+ * kmap_atomic.
+ */
+
+ssize_t copy_oldmem_page_encrypted(unsigned long pfn, char *buf,
+   size_t csize, unsigned long offset, int userbuf)
+{
+   void  *vaddr;
+
+   if (!csize)
+   return 0;
+
+   vaddr = (__force void *)ioremap_encrypted(pfn << PAGE_SHIFT,
+ PAGE_SIZE);
+   if (!vaddr)
+   return -ENOMEM;
+
+   if (userbuf) {
+   if (copy_to_user((void __user *)buf, vaddr + offset, csize)) {
+   iounmap((void __iomem *)vaddr);
+   return -EFAULT;
+   }
+   } else
+   memcpy(buf, vaddr + offset, csize);
+
+   set_iounmap_nonlazy();
+   iounmap((void __iomem *)vaddr);
+   return csize;
+}
diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index cbde728f8ac6..3065c8bada6a 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -25,6 +25,9 @@
 #include 
 #include 
 #include 
+#include 
+#include 
+#include 
 #include "internal.h"
 
 /* List representing chunks of contiguous memory areas and their offsets in
@@ -98,7 +101,8 @@ static int pfn_is_ram(unsigned long pfn)
 
 /* Reads a page from the oldmem device from given offset. */
 static ssize_t read_from_oldmem(char *buf, size_t count,
-   u64 *ppos, int userbuf)
+   u64 *ppos, int userbuf,
+   bool encrypted)
 {
  

[PATCH 1/4 v7] x86/ioremap: add a function ioremap_encrypted() to remap kdump old memory

2018-09-07 Thread Lianbo Jiang
When SME is enabled on an AMD machine, the memory is encrypted in the first
kernel. In this case, SME also needs to be enabled in the kdump kernel, and
the old memory has to be remapped with the memory encryption mask.

Signed-off-by: Lianbo Jiang 
---
 arch/x86/include/asm/io.h |  3 +++
 arch/x86/mm/ioremap.c | 25 +
 2 files changed, 20 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
index 6de64840dd22..f8795f9581c7 100644
--- a/arch/x86/include/asm/io.h
+++ b/arch/x86/include/asm/io.h
@@ -192,6 +192,9 @@ extern void __iomem *ioremap_cache(resource_size_t offset, 
unsigned long size);
 #define ioremap_cache ioremap_cache
 extern void __iomem *ioremap_prot(resource_size_t offset, unsigned long size, 
unsigned long prot_val);
 #define ioremap_prot ioremap_prot
+extern void __iomem *ioremap_encrypted(resource_size_t phys_addr,
+   unsigned long size);
+#define ioremap_encrypted ioremap_encrypted
 
 /**
  * ioremap -   map bus memory into CPU space
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index c63a545ec199..e01e6c695add 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -24,6 +24,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "physaddr.h"
 
@@ -131,7 +132,8 @@ static void __ioremap_check_mem(resource_size_t addr, 
unsigned long size,
  * caller shouldn't need to know that small detail.
  */
 static void __iomem *__ioremap_caller(resource_size_t phys_addr,
-   unsigned long size, enum page_cache_mode pcm, void *caller)
+   unsigned long size, enum page_cache_mode pcm,
+   void *caller, bool encrypted)
 {
unsigned long offset, vaddr;
resource_size_t last_addr;
@@ -199,7 +201,7 @@ static void __iomem *__ioremap_caller(resource_size_t 
phys_addr,
 * resulting mapping.
 */
prot = PAGE_KERNEL_IO;
-   if (sev_active() && mem_flags.desc_other)
+   if ((sev_active() && mem_flags.desc_other) || encrypted)
prot = pgprot_encrypted(prot);
 
switch (pcm) {
@@ -291,7 +293,7 @@ void __iomem *ioremap_nocache(resource_size_t phys_addr, 
unsigned long size)
enum page_cache_mode pcm = _PAGE_CACHE_MODE_UC_MINUS;
 
return __ioremap_caller(phys_addr, size, pcm,
-   __builtin_return_address(0));
+   __builtin_return_address(0), false);
 }
 EXPORT_SYMBOL(ioremap_nocache);
 
@@ -324,7 +326,7 @@ void __iomem *ioremap_uc(resource_size_t phys_addr, 
unsigned long size)
enum page_cache_mode pcm = _PAGE_CACHE_MODE_UC;
 
return __ioremap_caller(phys_addr, size, pcm,
-   __builtin_return_address(0));
+   __builtin_return_address(0), false);
 }
 EXPORT_SYMBOL_GPL(ioremap_uc);
 
@@ -341,7 +343,7 @@ EXPORT_SYMBOL_GPL(ioremap_uc);
 void __iomem *ioremap_wc(resource_size_t phys_addr, unsigned long size)
 {
return __ioremap_caller(phys_addr, size, _PAGE_CACHE_MODE_WC,
-   __builtin_return_address(0));
+   __builtin_return_address(0), false);
 }
 EXPORT_SYMBOL(ioremap_wc);
 
@@ -358,14 +360,21 @@ EXPORT_SYMBOL(ioremap_wc);
 void __iomem *ioremap_wt(resource_size_t phys_addr, unsigned long size)
 {
return __ioremap_caller(phys_addr, size, _PAGE_CACHE_MODE_WT,
-   __builtin_return_address(0));
+   __builtin_return_address(0), false);
 }
 EXPORT_SYMBOL(ioremap_wt);
 
+void __iomem *ioremap_encrypted(resource_size_t phys_addr, unsigned long size)
+{
+   return __ioremap_caller(phys_addr, size, _PAGE_CACHE_MODE_WB,
+   __builtin_return_address(0), true);
+}
+EXPORT_SYMBOL(ioremap_encrypted);
+
 void __iomem *ioremap_cache(resource_size_t phys_addr, unsigned long size)
 {
return __ioremap_caller(phys_addr, size, _PAGE_CACHE_MODE_WB,
-   __builtin_return_address(0));
+   __builtin_return_address(0), false);
 }
 EXPORT_SYMBOL(ioremap_cache);
 
@@ -374,7 +383,7 @@ void __iomem *ioremap_prot(resource_size_t phys_addr, 
unsigned long size,
 {
return __ioremap_caller(phys_addr, size,
pgprot2cachemode(__pgprot(prot_val)),
-   __builtin_return_address(0));
+   __builtin_return_address(0), false);
 }
 EXPORT_SYMBOL(ioremap_prot);
 
-- 
2.17.1



[PATCH 2/4 v7] kexec: allocate unencrypted control pages for kdump in case SME is enabled

2018-09-07 Thread Lianbo Jiang
When SME is enabled in the first kernel, allocate unencrypted control pages
for kdump so that the kdump kernel can be booted in the same way as kexec.

Signed-off-by: Lianbo Jiang 
---
 kernel/kexec_core.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index 23a83a4da38a..e7efcd1a977b 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -471,6 +471,16 @@ static struct page 
*kimage_alloc_crash_control_pages(struct kimage *image,
}
}
 
+   if (pages) {
+   /*
+* For kdump, ensure that these pages are unencrypted when
+* SME is enabled. Note that there is no need to call
+* arch_kexec_pre_free_pages() here, which keeps the code
+* simpler.
+*/
+   arch_kexec_post_alloc_pages(page_address(pages), 1 << order, 0);
+   }
return pages;
 }
 
@@ -867,6 +877,7 @@ static int kimage_load_crash_segment(struct kimage *image,
result  = -ENOMEM;
goto out;
}
+   arch_kexec_post_alloc_pages(page_address(page), 1, 0);
ptr = kmap(page);
ptr += maddr & ~PAGE_MASK;
mchunk = min_t(size_t, mbytes,
@@ -884,6 +895,7 @@ static int kimage_load_crash_segment(struct kimage *image,
result = copy_from_user(ptr, buf, uchunk);
kexec_flush_icache_page(page);
kunmap(page);
+   arch_kexec_pre_free_pages(page_address(page), 1);
if (result) {
result = -EFAULT;
goto out;
-- 
2.17.1



[PATCH 0/4 v7] Support kdump for AMD secure memory encryption(SME)

2018-09-07 Thread Lianbo Jiang
levant to my posted patches.

The kernel log:
[ 1248.932239] kexec_core: Starting new kernel
early console in extract_kernel
input_data: 0x00087e91c3b4
input_len: 0x0067fcbd
output: 0x00087d40
output_len: 0x01b6fa90
kernel_total_size: 0x01a9d000
trampoline_32bit: 0x00099000

Decompressing Linux...
Parsing ELF...[---Here the system will hang]


Lianbo Jiang (4):
  x86/ioremap: add a function ioremap_encrypted() to remap kdump old
memory
  kexec: allocate unencrypted control pages for kdump in case SME is
enabled
  amd_iommu: remap the device table of IOMMU with the memory encryption
mask for kdump
  kdump/vmcore: support encrypted old memory with SME enabled

 arch/x86/include/asm/io.h|  3 ++
 arch/x86/kernel/Makefile |  1 +
 arch/x86/kernel/crash_dump_encrypt.c | 53 
 arch/x86/mm/ioremap.c| 25 -
 drivers/iommu/amd_iommu_init.c   | 14 ++--
 fs/proc/vmcore.c | 21 +++
 include/linux/crash_dump.h   | 12 +++
 kernel/kexec_core.c  | 12 +++
 8 files changed, 125 insertions(+), 16 deletions(-)
 create mode 100644 arch/x86/kernel/crash_dump_encrypt.c

-- 
2.17.1



[PATCH 4/4 v7] kdump/vmcore: support encrypted old memory with SME enabled

2018-09-07 Thread Lianbo Jiang
In the kdump kernel, the old memory has to be dumped into the vmcore file.
If SME is enabled in the first kernel, the old memory must be remapped with
the memory encryption mask, so that it is automatically decrypted when read
from DRAM.

For SME kdump, two configurations are not supported:

 ----------------------------------------------------------------
| first kernel          | second kernel          | kdump support |
| (mem_encrypt=on|off)  | (mem_encrypt=on|off)   | (yes|no)      |
|-----------------------+------------------------+---------------|
| on                    | on                     | yes           |
| off                   | off                    | yes           |
| on                    | off                    | no            |
| off                   | on                     | no            |
 ----------------------------------------------------------------

1. SME is enabled in the first kernel, but SME is disabled in the kdump kernel.
In this case the old memory is encrypted, so it cannot be decrypted.

2. SME is disabled in the first kernel, but SME is enabled in the kdump kernel.
On the one hand, the old memory is unencrypted and can be dumped as usual, so
there is no need to enable SME in the kdump kernel. On the other hand,
supporting this would increase the complexity of the code, because the SME
flag would have to be passed from the first kernel to the kdump kernel, which
is really too expensive.

These patches are only for SME kdump; they do not support SEV kdump.

Signed-off-by: Lianbo Jiang 
---
 arch/x86/kernel/Makefile |  1 +
 arch/x86/kernel/crash_dump_encrypt.c | 53 
 fs/proc/vmcore.c | 21 +++
 include/linux/crash_dump.h   | 12 +++
 4 files changed, 81 insertions(+), 6 deletions(-)
 create mode 100644 arch/x86/kernel/crash_dump_encrypt.c

diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 8824d01c0c35..dfbeae0e35ce 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -97,6 +97,7 @@ obj-$(CONFIG_KEXEC_CORE)  += machine_kexec_$(BITS).o
 obj-$(CONFIG_KEXEC_CORE)   += relocate_kernel_$(BITS).o crash.o
 obj-$(CONFIG_KEXEC_FILE)   += kexec-bzimage64.o
 obj-$(CONFIG_CRASH_DUMP)   += crash_dump_$(BITS).o
+obj-$(CONFIG_AMD_MEM_ENCRYPT)  += crash_dump_encrypt.o
 obj-y  += kprobes/
 obj-$(CONFIG_MODULES)  += module.o
 obj-$(CONFIG_DOUBLEFAULT)  += doublefault.o
diff --git a/arch/x86/kernel/crash_dump_encrypt.c 
b/arch/x86/kernel/crash_dump_encrypt.c
new file mode 100644
index ..e1b1a577f197
--- /dev/null
+++ b/arch/x86/kernel/crash_dump_encrypt.c
@@ -0,0 +1,53 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Memory preserving reboot related code.
+ *
+ *     Created by: Lianbo Jiang (liji...@redhat.com)
+ * Copyright (C) RedHat Corporation, 2018. All rights reserved
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+/**
+ * copy_oldmem_page_encrypted - copy one page from "oldmem encrypted"
+ * @pfn: page frame number to be copied
+ * @buf: target memory address for the copy; this can be in kernel address
+ * space or user address space (see @userbuf)
+ * @csize: number of bytes to copy
+ * @offset: offset in bytes into the page (based on pfn) to begin the copy
+ * @userbuf: if set, @buf is in user address space, use copy_to_user(),
+ * otherwise @buf is in kernel address space, use memcpy().
+ *
+ * Copy a page from "oldmem encrypted". For this page, there is no pte
+ * mapped in the current kernel. We stitch up a pte, similar to
+ * kmap_atomic.
+ */
+
+ssize_t copy_oldmem_page_encrypted(unsigned long pfn, char *buf,
+   size_t csize, unsigned long offset, int userbuf)
+{
+   void  *vaddr;
+
+   if (!csize)
+   return 0;
+
+   vaddr = (__force void *)ioremap_encrypted(pfn << PAGE_SHIFT,
+ PAGE_SIZE);
+   if (!vaddr)
+   return -ENOMEM;
+
+   if (userbuf) {
+   if (copy_to_user((void __user *)buf, vaddr + offset, csize)) {
+   iounmap((void __iomem *)vaddr);
+   return -EFAULT;
+   }
+   } else
+   memcpy(buf, vaddr + offset, csize);
+
+   set_iounmap_nonlazy();
+   iounmap((void __iomem *)vaddr);
+   return csize;
+}
diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index cbde728f8ac6..3065c8bada6a 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -25,6 +25,9 @@
 #include 
 #include 
 #include 
+#include 
+#include 
+#include 
 #include "internal.h"
 
 /* List representing chunks of contiguous memory areas and their offsets in
@@ -98,7 +101,8 @@ static int pfn_is_ram(unsigned long pfn)
 
 /* Reads a page from the oldmem device from given offset. */
 static ssize_t read_from_oldmem(char *buf, size_t count,
-   u64 *ppos, int userbuf)
+   u64 *ppos, int userbuf,
+   bool encrypted)
 {
  

[PATCH 3/4 v7] amd_iommu: remap the device table of IOMMU with the memory encryption mask for kdump

2018-09-07 Thread Lianbo Jiang
In the kdump kernel, the IOMMU driver copies the device table from the old
device table, which is encrypted when SME is enabled in the first kernel.
So the old device table has to be remapped with the memory encryption mask.

Signed-off-by: Lianbo Jiang 
---
 drivers/iommu/amd_iommu_init.c | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index 84b3e4445d46..3931c7de7c69 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -902,12 +902,22 @@ static bool copy_device_table(void)
}
}
 
-   old_devtb_phys = entry & PAGE_MASK;
+   /*
+* When SME is enabled in the first kernel, the entry includes the
+* memory encryption mask (sme_me_mask); it must be cleared to
+* obtain the true physical address in the kdump kernel.
+*/
+   old_devtb_phys = __sme_clr(entry) & PAGE_MASK;
+
if (old_devtb_phys >= 0x1ULL) {
pr_err("The address of old device table is above 4G, not 
trustworthy!\n");
return false;
}
-   old_devtb = memremap(old_devtb_phys, dev_table_size, MEMREMAP_WB);
+   old_devtb = (sme_active() && is_kdump_kernel())
+   ? (__force void *)ioremap_encrypted(old_devtb_phys,
+   dev_table_size)
+   : memremap(old_devtb_phys, dev_table_size, MEMREMAP_WB);
+
if (!old_devtb)
return false;
 
-- 
2.17.1
