Re: [PATCH v2] acpi: apei: fix the wrongly iterate generic error status block

2017-08-16 Thread gengdongjiu
Hello Tyler,
  I have already updated a new version patch to adds this usage to 
cper_estatus_check(), please re-test.

Hello Boris,
  The original macro of apei_estatus_for_each_section has two issues:
one is the iteration loop termination condition; another is the iteration 
steps. please review it.
thanks.



On 2017/8/16 7:26, Baicar, Tyler wrote:
> On 8/15/2017 3:34 PM, gengdongjiu wrote:
>> Hi Tyler ,
>>
>>> Hello Boris,
>>>
>>> His patch fixes the define for apei_estatus_for_each_section which in turn
>>> should fix ghes_do_proc(). So my patch should no longer be needed. I'm going
>>> to test this out just to verify if fixes the issue I found.
>> I have verified the issue about the iteration for the revision 0x300
>> generic error data,
>> it works well. it is good that you will verify that in your  platform.
> I've verified that this resolves the issue as well! I'll re-test with the 
> next version that adds this usage to cper_estatus_check() and add my 
> tested-by after that.
> 
> Thanks,
> Tyler
> 



Re: [PATCH v3] acpi: apei: fix the wrongly iterate generic error status block

2017-08-16 Thread gengdongjiu
CC Will and Jonathan


On 2017/8/16 21:55, Baicar, Tyler wrote:
> On 8/16/2017 2:14 AM, Dongjiu Geng wrote:
>> The revision 0x300 generic error data entry is different
>> from the old version, but currently iterating through the
>> GHES estatus blocks does not take into account this difference.
>> This will lead to failure to get the right data entry if GHES
>> has revision 0x300 error data entry.
>>
>> Update the GHES estatus iteration to properly increment using
>> acpi_hest_get_next, and correct the iteration termination condition
>> because the status block data length only includes error data length.
>> Clear the CPER estatus printing iteration logic to use same macro.
>>
>> Signed-off-by: Dongjiu Geng 
>> CC: Tyler Baicar 
> Tested-by: Tyler Baicar 
> 
> Works good for me!
> 
> Thanks,
> Tyler
>> ---
>>   drivers/acpi/apei/apei-internal.h |  5 -
>>   drivers/firmware/efi/cper.c   | 12 ++--
>>   include/acpi/ghes.h   |  5 +
>>   3 files changed, 7 insertions(+), 15 deletions(-)
>>
>> diff --git a/drivers/acpi/apei/apei-internal.h 
>> b/drivers/acpi/apei/apei-internal.h
>> index 6e9f14c0a71b..cb4126051f62 100644
>> --- a/drivers/acpi/apei/apei-internal.h
>> +++ b/drivers/acpi/apei/apei-internal.h
>> @@ -120,11 +120,6 @@ int apei_exec_collect_resources(struct 
>> apei_exec_context *ctx,
>>   struct dentry;
>>   struct dentry *apei_get_debugfs_dir(void);
>>   -#define apei_estatus_for_each_section(estatus, section)\
>> -for (section = (struct acpi_hest_generic_data *)(estatus + 1);\
>> - (void *)section - (void *)estatus < estatus->data_length;\
>> - section = (void *)(section+1) + section->error_data_length)
>> -
>>   static inline u32 cper_estatus_len(struct acpi_hest_generic_status 
>> *estatus)
>>   {
>>   if (estatus->raw_data_length)
>> diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
>> index 48a8f69da42a..bf3672a81e49 100644
>> --- a/drivers/firmware/efi/cper.c
>> +++ b/drivers/firmware/efi/cper.c
>> @@ -606,7 +606,6 @@ void cper_estatus_print(const char *pfx,
>>   const struct acpi_hest_generic_status *estatus)
>>   {
>>   struct acpi_hest_generic_data *gdata;
>> -unsigned int data_len;
>>   int sec_no = 0;
>>   char newpfx[64];
>>   __u16 severity;
>> @@ -617,14 +616,10 @@ void cper_estatus_print(const char *pfx,
>>  "It has been corrected by h/w "
>>  "and requires no further action");
>>   printk("%s""event severity: %s\n", pfx, cper_severity_str(severity));
>> -data_len = estatus->data_length;
>> -gdata = (struct acpi_hest_generic_data *)(estatus + 1);
>>   snprintf(newpfx, sizeof(newpfx), "%s%s", pfx, INDENT_SP);
>>   -while (data_len >= acpi_hest_get_size(gdata)) {
>> +apei_estatus_for_each_section(estatus, gdata) {
>>   cper_estatus_print_section(newpfx, gdata, sec_no);
>> -data_len -= acpi_hest_get_record_size(gdata);
>> -gdata = acpi_hest_get_next(gdata);
>>   sec_no++;
>>   }
>>   }
>> @@ -653,15 +648,12 @@ int cper_estatus_check(const struct 
>> acpi_hest_generic_status *estatus)
>>   if (rc)
>>   return rc;
>>   data_len = estatus->data_length;
>> -gdata = (struct acpi_hest_generic_data *)(estatus + 1);
>>   -while (data_len >= acpi_hest_get_size(gdata)) {
>> +apei_estatus_for_each_section(estatus, gdata) {
>>   gedata_len = acpi_hest_get_error_length(gdata);
>>   if (gedata_len > data_len - acpi_hest_get_size(gdata))
>>   return -EINVAL;
>> -
>>   data_len -= acpi_hest_get_record_size(gdata);
>> -gdata = acpi_hest_get_next(gdata);
>>   }
>>   if (data_len)
>>   return -EINVAL;
>> diff --git a/include/acpi/ghes.h b/include/acpi/ghes.h
>> index 9f26e01186ae..9061c5c743b3 100644
>> --- a/include/acpi/ghes.h
>> +++ b/include/acpi/ghes.h
>> @@ -113,6 +113,11 @@ static inline void *acpi_hest_get_next(struct 
>> acpi_hest_generic_data *gdata)
>>   return (void *)(gdata) + acpi_hest_get_record_size(gdata);
>>   }
>>   +#define apei_estatus_for_each_section(estatus, section)\
>> +for (section = (struct acpi_hest_generic_data *)(estatus + 1);\
>> + (void *)section - (void *)(estatus + 1) < estatus->data_length; \
>> + section = acpi_hest_get_next(section))
>> +
>>   int ghes_notify_sea(void);
>> #endif /* GHES_H */
> 



Re: [PATCH v3] acpi: apei: fix the wrongly iterate generic error status block

2017-08-17 Thread gengdongjiu
Borislav,
  thanks for the review,

On 2017/8/17 17:25, Borislav Petkov wrote:
> On Wed, Aug 16, 2017 at 04:14:50PM +0800, Dongjiu Geng wrote:
>> The revision 0x300 generic error data entry is different
>> from the old version, but currently iterating through the
>> GHES estatus blocks does not take into account this difference.
>> This will lead to failure to get the right data entry if GHES
>> has revision 0x300 error data entry.
>>
>> Update the GHES estatus iteration to properly increment using
> iteration macro
> 
>> acpi_hest_get_next, and correct the iteration termination condition
> Please end function names with parentheses.
> 
>> because the status block data length only includes error data length.
> < newline here.
> 
>> Clear the CPER estatus printing iteration logic to use same macro.
> s/Clear ... /Convert ... to the same macro./
> 
>> Signed-off-by: Dongjiu Geng 
>> CC: Tyler Baicar 
>> ---
>>  drivers/acpi/apei/apei-internal.h |  5 -
>>  drivers/firmware/efi/cper.c   | 12 ++--
>>  include/acpi/ghes.h   |  5 +
>>  3 files changed, 7 insertions(+), 15 deletions(-)
> With those addressed you can add:
  Ok, I will add.

> 
> Reviewed-by: Borislav Petkov 
> 
> -- 



Re: [PATCH] arm64: KVM: set right LR register value for 32 bit guest when inject abort

2017-10-14 Thread gengdongjiu
Hi Marc,

On 2017/10/13 23:12, Marc Zyngier wrote:
> On 13/10/17 15:29, gengdongjiu wrote:
>> Hi Marc,
>> Thank you very much for your time to review it.
>>
>>> On 12/10/17 17:44, Dongjiu Geng wrote:
>>>> When a exception is trapped to EL2, hardware uses  ELR_ELx to hold the
>>>> current fault instruction address. If KVM wants to inject a abort to
>>>> 32 bit guest, it needs to set the LR register for the guest to emulate
>>>> this abort  happened in the guest. Because ARM32 architecture is
>>>> Multi-pipeline, so the LR value has an offset to
>>>
>>> What does "Multi-pipeline" mean?
>>
>> I mean the ARM's single-cycle instruction 3-stage pipeline operation, as 
>> shown below:
>>
>> fetch   decode   execute
>> fetchdecode   execute
>>  fetchdecode   execute
>>
>> when CPU finish instructions fetch,  PC=PC + 4
>> when CPU finish instructions decode, PC=PC + 8
>> when CPU finish instructions execution, PC=PC+12
> 
> Yeah, and that's called pipelined execution. "Multi-pipeline" doesn't 
> mean anything. Also, that's an artefact of the original ARM1 
> implementation, and not how modern CPUs work anymore.
Ok, thanks for the clarification.

> 
>>
>> that is to say, when happen data abort, 
>> the PC = fault instruction address + 12, LR_abt = fault instruction address 
>> + 8
>>
>> In order to emulate this abort for KVM, LR_abt needs to add an offset 8 when 
>> inject data abort. 
>>
>>>
>>>> the fault instruction address.
>>>>
>>>> The offsets applied to Link value for exceptions as shown below, which
>>>> should be added for the ARM32 link register(LR).
>>>>
>>>> Exception  Offset, for PE state of:
>>>>A32   T32
>>>> Undefined Instruction  +4+2
>>>> Prefetch Abort +4+4
>>>> Data Abort +8+8
>>>> IRQ or FIQ +4+4
>>>
>>> Please document where this table is coming from.
>>
>>
>> Thanks for pointing out. Will add it.
>> It come from:  DDI0487A_k_armv8_arm_iss10775, "G1.12.3 Overview of exception 
>> entry", Table G1-10 Offsets applied to Link value for exceptions taken to 
>> non-EL2 modes
>>
>>>
>>>>
>>>> Signed-off-by: Dongjiu Geng 
>>>> Signed-off-by: Haibin Zhang 
>>>>
>>>> ---
>>>> For example, to the undefined instruction injection:
>>>>
>>>> 1. Guest OS call SMC(Secure Monitor Call) instruction in the address
>>>> 0xc025405c, then Guest traps to hypervisor
>>>>
>>>> c0254050:   e59d5028ldr r5, [sp, #40]   ; 0x28
>>>> c0254054:   e3a03001mov r3, #1
>>>> c0254058:   e1a01003mov r1, r3
>>>> c025405c:   e1600070smc 0
>>>> c0254060:   e30a0270movwr0, #41584  ; 0xa270
>>>> c0254064:   e34c00bfmovtr0, #49343  ; 0xc0bf
>>>>
>>>> 2. KVM  injects undefined abort to guest 3. We will find the fault PC
>>>> is 0xc0254058, not 0xc025405c.
>>>>
>>>> [   12.348072] Internal error: Oops - undefined instruction: 0 [#1] SMP ARM
>>>> [   12.349786] Modules linked in:
>>>> [   12.350563] CPU: 1 PID: 71 Comm: cat Not tainted 4.1.0-dirty #25
>>>> [   12.352061] Hardware name: Generic DT based system
>>>> [   12.353275] task: d9d08000 ti: d9cfe000 task.ti: d9cfe000
>>>> [   12.354637] PC is at proc_dointvec+0x20/0x60
>>>> [   12.355717] LR is at proc_sys_call_handler+0xb0/0xc4
>>>> [   12.356972] pc : []lr : []psr: a0060013
>>>> [   12.356972] sp : d9cffe90  ip : c0254038  fp : 0001
>>>> [   12.359824] r10: d9cfff80  r9 : 0004  r8 : 
>>>> [   12.361132] r7 : bec21cb0  r6 : d9cffec4  r5 : d9cfff80  r4 : c0e82de0
>>>> [   12.362766] r3 : 0001  r2 : bec21cb0  r1 : 0001  r0 : c0e82de0
>>>> [   12.364400] Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  
>>>> Segment user
>>>> [   12.366183] Control: 10c5383d  Table: 59d3406a  DAC: 0015
>>>> [   12.367623] Process cat (pid: 71, stack limit = 0xd9cfe220)
>>>>
>>>> 4. After correct the LR register, it will have right value
>>>>
>>>> [  125.763370] Internal er

Re: [PATCH v3 1/2] acpi: apei: remove the unused dead-code for SEA notification type

2017-10-15 Thread gengdongjiu
Borislav,
  Thank you for your time to review it.

On 2017/10/13 21:21, Borislav Petkov wrote:
>>  .notifier_call = ghes_notify_hed,
>>  };
>>  
>> -#ifdef CONFIG_ACPI_APEI_SEA
>>  static LIST_HEAD(ghes_sea);
> But now those get compiled in on x86 where there's no SEA and where we
> don't need them. So no, I don't think this patch is correct.

If have updated the patch for the x86, you can review the version 4 patches.
thanks.




Re: [PATCH v6 5/7] arm64: kvm: route synchronous external abort exceptions to el2

2017-10-16 Thread gengdongjiu
Hi James,

>
>> Today I added the support to do some minimal emulation for
>> RAS-Error-Record registers, thanks
>> for the good suggestion.
>
> Where can I find this patch?
> I'd like to repost it as part of the SError_rework/RAS/IESB series: this is
> one
> of the bits KVM needs but I didn't touch as it looks like your updated
> version
> of this patch should cover it.

I have updated this patch according your suggestion that do some
emulation for the ERR* trap. and have verified it. but still not sent
it out. Tomorrow I will send it out for your review(now it is Chinese
midnight), if you think it is ok, you can add it as part of the
SError_rework/RAS/IESB series. thanks for your reminder and good
review comments.


>
>
> Thanks,
>
> James
> ___
> kvmarm mailing list
> kvm...@lists.cs.columbia.edu
> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
>


Re: [PATCH] arm64: KVM: set right LR register value for 32 bit guest when inject abort

2017-10-16 Thread gengdongjiu
Hi Marc,

> 
> Please also update the 32bit code accordingly, as it looks broken too.

I have updated the 32 bit code according, in my hand, there is no arm32 host 
environment,
So there is no method to verify it in the arm32 host, only verify the patch in 
the arm64 host.

Anyway I firstly send the patch out for review. Thanks.

> 
> Thanks,
> 
>   M.
> --
> Jazz is not dead. It just smells funny...


Re: [PATCH v4 2/2] acpi: apei: Add SEI notification type support for ARMv8

2017-10-17 Thread gengdongjiu
Have fixed it in the patch v5.

On 2017/10/17 18:20, kbuild test robot wrote:
> Hi Dongjiu,
> 
> [auto build test ERROR on pm/linux-next]
> [also build test ERROR on v4.14-rc5 next-20171016]
> [if your patch is applied to the wrong git tree, please drop us a note to 
> help improve the system]
> 
> url:
> https://github.com/0day-ci/linux/commits/Dongjiu-Geng/acpi-apei-remove-the-unused-dead-code-for-SEA-NMI-notification-type/20171017-141237
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git 
> linux-next
> config: x86_64-kexec (attached as .config)
> compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
> reproduce:
> # save the attached .config to linux build tree
> make ARCH=x86_64 
> 
> All errors (new ones prefixed by >>):
> 
>drivers/acpi/apei/ghes.c: In function 'ghes_probe':
>>> drivers/acpi/apei/ghes.c:1191:3: error: implicit declaration of function 
>>> 'ghes_abort_add' [-Werror=implicit-function-declaration]
>   ghes_abort_add(ghes);
>   ^~
>drivers/acpi/apei/ghes.c: In function 'ghes_remove':
>>> drivers/acpi/apei/ghes.c:1245:3: error: implicit declaration of function 
>>> 'ghes_abort_remove' [-Werror=implicit-function-declaration]
>   ghes_abort_remove(ghes);
>   ^
>cc1: some warnings being treated as errors
> 
> vim +/ghes_abort_add +1191 drivers/acpi/apei/ghes.c
> 
>   1085
>   1086static int ghes_probe(struct platform_device *ghes_dev)
>   1087{
>   1088struct acpi_hest_generic *generic;
>   1089struct ghes *ghes = NULL;
>   1090
>   1091int rc = -EINVAL;
>   1092
>   1093generic = *(struct acpi_hest_generic 
> **)ghes_dev->dev.platform_data;
>   1094if (!generic->enabled)
>   1095return -ENODEV;
>   1096
>   1097switch (generic->notify.type) {
>   1098case ACPI_HEST_NOTIFY_POLLED:
>   1099case ACPI_HEST_NOTIFY_EXTERNAL:
>   1100case ACPI_HEST_NOTIFY_SCI:
>   1101case ACPI_HEST_NOTIFY_GSIV:
>   1102case ACPI_HEST_NOTIFY_GPIO:
>   1103break;
>   1104
>   1105case ACPI_HEST_NOTIFY_SEA:
>   1106if (!IS_ENABLED(CONFIG_ACPI_APEI_SEA)) {
>   1107pr_warn(GHES_PFX "Generic hardware 
> error source: %d notified via SEA is not supported\n",
>   1108generic->header.source_id);
>   1109rc = -ENOTSUPP;
>   1110goto err;
>   }
>   1112break;
>   1113case ACPI_HEST_NOTIFY_SEI:
>   1114if (!IS_ENABLED(CONFIG_ACPI_APEI_SEI)) {
>   1115pr_warn(GHES_PFX "Generic hardware 
> error source: %d notified via SEI is not supported!\n",
>   1116generic->header.source_id);
>   1117goto err;
>   1118}
>   1119break;
>   1120case ACPI_HEST_NOTIFY_NMI:
>   1121if (!IS_ENABLED(CONFIG_HAVE_ACPI_APEI_NMI)) {
>   1122pr_warn(GHES_PFX "Generic hardware 
> error source: %d notified via NMI interrupt is not supported!\n",
>   1123generic->header.source_id);
>   1124goto err;
>   1125}
>   1126break;
>   1127case ACPI_HEST_NOTIFY_LOCAL:
>   1128pr_warning(GHES_PFX "Generic hardware error 
> source: %d notified via local interrupt is not supported!\n",
>   1129   generic->header.source_id);
>   1130goto err;
>   1131default:
>   1132pr_warning(FW_WARN GHES_PFX "Unknown 
> notification type: %u for generic hardware error source: %d\n",
>   1133   generic->notify.type, 
> generic->header.source_id);
>   1134goto err;
>   1135}
>   1136
>   1137rc = -EIO;
>   1138if (generic->error_block_length <
>   1139sizeof(struct acpi_hest_generic_status)) {
>   1140pr_warning(FW_BUG GHES_PFX "Invalid error block 
> length: %u for generic hardware error source: %d\n",
>   1141   generic->error_block_length,
>   1142   generic->header.source_id);
>   1143goto err;
>   1144}
>   1145ghes = ghes_new(generic);
>   1146if (IS_ERR(ghes)) {
>   1147rc = PTR_ERR(g

Re: [PATCH] arm64: KVM: set right LR register value for 32 bit guest when inject abort

2017-10-17 Thread gengdongjiu
Hi Christoffer

On 2017/10/17 3:59, Christoffer Dall wrote:
> On Mon, Oct 16, 2017 at 04:10:01PM +0000, gengdongjiu wrote:
>> Hi Marc,
>>
>>>
>>> Please also update the 32bit code accordingly, as it looks broken too.
>>
>> I have updated the 32 bit code according, in my hand, there is no arm32 host 
>> environment,
>> So there is no method to verify it in the arm32 host, only verify the patch 
>> in the arm64 host.
>>
>> Anyway I firstly send the patch out for review. Thanks.
>>
> In this case, if you just clearly specify in the patches you send out
> that the 32-bit one is untested, you can ask someone to test it for you.
> 

Thanks for your reminder, today I found a arm32 board and have tested it.
So I have tested in both arm32 and arm64. please review it. thanks.


> Thanks,
> -Christoffer
> 
> .
> 



Re: [Qemu-devel] [PATCH v3 1/4] ACPI: Add APEI GHES Table Generation support

2017-05-29 Thread gengdongjiu
Dear Laszlo,
  Thank your very much for your review and detailed comment. and very sorry for 
the late response due to recently debug the wholes RAS solution. 

On 2017/5/22 22:23, Laszlo Ersek wrote:
> Keeping some context:
> 
> On 05/12/17 23:00, Laszlo Ersek wrote:
>> On 04/30/17 07:35, Dongjiu Geng wrote:
>>> This implements APEI GHES Table by passing the error cper info to 
>>> the guest via a fw_cfg_blob. After a CPER info is added, an SEA/SEI 
>>> exception will be injected into the guest OS.
>>>
>>> Below is the table layout, the max number of error soure is 11, 
>>> which is classified by notification type.
>>>
>>> etc/acpi/tables etc/hardware_errors
>>>  ==
>>>  +---+
>>> +--+ | address   | +-> +--+
>>> |HEST  + | registers | |   | Error Status |
>>> + ++ | +-+ |   | Data Block 1 |
>>> | | GHES1  | --> | |address1 | +   | ++
>>> | | GHES2  | --> | |address2 | --+ | |  CPER  |
>>> | | GHES3  | --> | |address3 | + | | |  CPER  |
>>> | |    | --> | | ... | | | | |  CPER  |
>>> | | GHES10 | --> | |address10| -+  | | | |  CPER  |
>>> +-++ +-+-+  |  | | +-++
>>> |  | |
>>> |  | +---> +--+
>>> |  |   | Error Status |
>>> |  |   | Data Block 2 |
>>> |  |   | ++
>>> |  |   | |  CPER  |
>>> |  |   | |  CPER  |
>>> |  |   +-++
>>> |  |
>>> |  +-> +--+
>>> |  | Error Status |
>>> |  | Data Block 3 |
>>> |  | ++
>>> |  | |  CPER  |
>>> |  +-++
>>> |...
>>> +> +--+
>>>| Error Status |
>>>| Data Block 10|
>>>| ++
>>>| |  CPER  |
>>>| |  CPER  |
>>>| |  CPER  |
>>>+-++
>>>
>>> Signed-off-by: Dongjiu Geng 
>>> ---
>>>  default-configs/arm-softmmu.mak |   1 +
>>>  hw/acpi/Makefile.objs   |   1 +
>>>  hw/acpi/aml-build.c |   2 +
>>>  hw/acpi/hest_ghes.c | 203 +++
>>>  hw/arm/virt-acpi-build.c|   6 ++
>>>  include/hw/acpi/acpi-defs.h | 227 
>>> 
>>>  include/hw/acpi/aml-build.h |   1 +
>>>  include/hw/acpi/hest_ghes.h |  43 
>>>  8 files changed, 484 insertions(+)
>>>  create mode 100644 hw/acpi/hest_ghes.c  create mode 100644 
>>> include/hw/acpi/hest_ghes.h
> 
>> Next file:
>>
>>> diff --git a/include/hw/acpi/hest_ghes.h 
>>> b/include/hw/acpi/hest_ghes.h new file mode 100644 index 
>>> 000..0cadc2b
>>> --- /dev/null
>>> +++ b/include/hw/acpi/hest_ghes.h
>>> @@ -0,0 +1,43 @@
>>> +#ifndef ACPI_GHES_H
>>> +#define ACPI_GHES_H
>>> +
>>> +#include "hw/acpi/bios-linker-loader.h"
>>> +
>>> +#define GHES_ERRORS_FW_CFG_FILE  "etc/hardware_errors"
>>> +#define GHES_DATA_ADDR_FW_CFG_FILE  "etc/hardware_errors_addr"
>>> +
>>> +#define GAS_ADDRESS_OFFSET  4
>>> +#define ERROR_STATUS_ADDRESS_OFFSET 20
>>> +#define NOTIFICATION_STRUCTURE  32
>>> +
>>> +#define BFAPEI_OK   0
>>> +#define BFAPEI_FAIL 1
>>> +
>>> +/* The max number of error source, the error sources
>>> + * are classified by notification type, below is the definition
>>> + * 0 - Polled
>>> + * 1 - External Interrupt
>>> + * 2 - Local Interrupt
>>> + * 3 - SCI
>>> + * 4 - NMI
>>> + * 5 - CMCI
>>> + * 6 - MCE
>>> + * 7 - GPIO-Signal
>>> + * 8 - ARMv8 SEA
>>> + * 9 - ARMv8 SEI
>>> + * 10 - External Interrupt - GSIV
>>> + */
>>> +#define MAX_ERROR_SOURCE_COUNT_V6   11
>>
>> I'll have to review this header file more thoroughly, once I see the 
>> code that references these macros. For now, I have one comment:
>>
>> (42) I think the notification type list should be removed from this 
>> location. Also, the open-coded value 11 

Re: [RFC/RFT PATCH 3/6] ACPI / APEI: Replace ioremap_page_range() with fixmap

2017-10-31 Thread gengdongjiu
On 2017/10/31 23:38, James Morse wrote:
> CC'd people I've seen posting CPER log fragments, could you give this a
> test on your platforms?
Thanks for the fixing, not found obviously issue.



Re: [PATCH v1 0/3] manually add Error Synchronization Barrier at exception handler entry and exit

2017-11-01 Thread gengdongjiu

On 2017/11/1 19:32, James Morse wrote:
>> RAS&IESB for firmware first support". In Huawei's platform, we do not
>> support IESB, so software needs to insert that.
> Surely you don't implement it because your CPU doesn't need it. Can
> unrecoverable errors really cross an exception without becoming an SError?

CC lishuo and liujun.

Hi James,

we will needs that if we can ensure that the Error is not propagated.
The unrecoverable errors can cross an different exception, so hardware use IESB 
to isolate it.
Afterwards we may also not support IESB(about the definition as shown in [3] ), 
the reason that
we are not support IESB is that hardware colleague think software insert it in 
exception entry/exit can be flexibly.
And ARM spec allow that:
ARM spec allow software inset ESB instead of hardware insert as shown in [1].
About how to use the ESB in the software, the SPEC even give an example[2]:

[1]
AArch_v8A_v8.2_Extension_ARM-ECM-0326763-20-0:
"ARMv8.2 adds an extension to RAS to allow system software to insert implicit 
Error Synchronization Barrier operations at exception handler entry and exit".

[2]
RAS_Extension_PRD03-PRDC-010953-39-0:

Example exception entry sequence
Example 1 shows a typical sequence for exception entry.

Vector: ESB ;; Error Synchronization Barrier
SUB SP,SP,#34*8
STP X0,X1,[SP]  ;; Save all general purpose 
registers
STP X2,X3,[SP,#2*8]
...
STP X28,X29,[SP,#28*8]
MRS X21,SP_EL0
MRS X22,ELR_EL1 ;; Save exception syndrome 
registers
MRS X23,SPSR_EL1
STP X30,X21,[SP,#30*8]
STP X22,X23,[SP,#32*8]
...
MRS X21,DISR_EL1;; Check for deferred error
TBNZ X21,#DISR_A_bit,Error_Handler
MSR DAIFClr,#0xF;; Clear flags
Example 1: Exception entry sequence


[3]
Below is the SCTLR_ELx.IESB definition.
SCTLR_ELx[21]: IESB – Implicit Error Synchronization Barrier enable.
0: Disabled.
1: An implicit ErrorSynchronizationBarrier() call is added:
   After each exception taken to ELx.
   Before the operational pseudocode of each ERET instruction executed at ELx.


Hi lishuo/liujun,
   could you explain more to James about the reason that why we need software 
inserts ESB instead of hardware support IESB?

> 
> The ESB instruction does the barrier, but it also consumes any pending SError.
> As it is this series will silently consume-and-discard uncontainable errors.

In the firmware-first solution, if ESB consumed a SError, it will trap to 
firmware.
firmware will handle it and record the Error. the DISR_EL1 will not record, so 
no need to Check for deferred error

But if it is non-firmware solution, it will not trap, it needs to check the 
defered error through DISR_EL1 and
call the Error_Handler, as shown in [2].
In the kernel patchset, I do not are check the defered error. If you think it 
should be added, I can add it.

welcome comments.




Re: [PATCH v1 1/3] arm64: add a macro for SError synchronization

2017-11-01 Thread gengdongjiu
Hi Robin,

On 2017/11/1 19:24, Robin Murphy wrote:
>> +esb
>> +alternative_else_nop_endif
>> +1:
>> +.endm
> Having a branch in here is pretty horrible, and furthermore using label
> number 1 has a pretty high chance of subtly breaking code where this
> macro is inserted.
> 
> Can we not somehow nest or combine the alternative conditions here?

I found it will report error if combine the alternative conditions here.

For example:

+   .macro  error_synchronize
+alternative_if ARM64_HAS_IESB
+alternative_if ARM64_HAS_RAS_EXTN
+   esb
+alternative_else_nop_endif
+alternative_else_nop_endif
+   .endm

And even using b.eq/cbz instruction in the alternative instruction in 
arch/arm64/kernel/entry.S,
it will report Error.

For example below

alternative_if ARM64_HAS_PAN

b.eqx
alternative_else_nop_endif

I do not dig it deeply, do you know the reason about it or good suggestion 
about that?
Thanks a lot in advance.


> 
> Robin.
> 
>>  #endif  /* __ASM_ASSEMBLER_H */



Re: [PATCH v1 1/3] arm64: add a macro for SError synchronization

2017-11-01 Thread gengdongjiu
> 
> On 01/11/17 12:54, gengdongjiu wrote:
> > Hi Robin,
> >
> > On 2017/11/1 19:24, Robin Murphy wrote:
> >>> + esb
> >>> +alternative_else_nop_endif
> >>> +1:
> >>> + .endm
> >> Having a branch in here is pretty horrible, and furthermore using
> >> label number 1 has a pretty high chance of subtly breaking code where
> >> this macro is inserted.
> >>
> >> Can we not somehow nest or combine the alternative conditions here?
> >
> > I found it will report error if combine the alternative conditions here.
> >
> > For example:
> >
> > +   .macro  error_synchronize
> > +alternative_if ARM64_HAS_IESB
> > +alternative_if ARM64_HAS_RAS_EXTN
> > +   esb
> > +alternative_else_nop_endif
> > +alternative_else_nop_endif
> > +   .endm
> >
> > And even using b.eq/cbz instruction in the alternative instruction in
> > arch/arm64/kernel/entry.S, it will report Error.
> >
> > For example below
> >
> > alternative_if ARM64_HAS_PAN
> > 
> > b.eqx
> > alternative_else_nop_endif
> >
> > I do not dig it deeply, do you know the reason about it or good suggestion 
> > about that?
> > Thanks a lot in advance.
> 
> Actually, on second look ARM64_HAS_RAS_EXTN doesn't even matter - ESB is a 
> hint, so if the CPU doesn't have RAS it should behave as a
> NOP anyway.


Yes, you are right. It is "HINT #16"

So in fact it can be written below:

+   .macro  error_synchronize
+alternative_if_not ARM64_HAS_IESB
+   esb
+alternative_else_nop_endif
+   .endm

If written to that, whether it will be strange? although ESB should behave as a
NOP anyway if the CPU doesn't have RAS. 

> 
> On which note, since I don't see one here - are any of those other patches 
> defining an "esb" assembly macro similar to the inline asm case?
> If not then this isn't going to build with older toolchains - perhaps we 
> should just use the raw hint syntax directly.


Sorry for that I do not push the dependent patch[1].
The "ESB" is defined as a macro 

/*
+ * RAS Error Synchronization barrier
+ */
+   .macro  esb
+   hint#16
+   .endm
+
+/*

[1]
https://www.spinics.net/lists/arm-kernel/msg612884.html

> 
> Robin.


Re: [PATCH v1 1/3] arm64: add a macro for SError synchronization

2017-11-02 Thread gengdongjiu

On 2017/11/1 22:16, Mark Rutland wrote:
>> it will report Error.
> Alternatives cannot be nested. You need to define a cap like:
> 
>   ARM64_HAS_RAS_NOT_IESB
> 
> ... which is set when ARM64_HAS_RAS_EXTN && !ARM64_HAS_IESB.
> 
> Then you can do:
> 
> alternative_if ARM64_HAS_RAS_NOT_IESB
>   esb
> alternative_else_nop_endif
> 
> See ARM64_ALT_PAN_NOT_UAO for an example.
> 
> That said, as Robin points out we may not even need the alternative.

Ok, got it. thank you very much for your point and opinion


> 
> Thanks,
> Mark.



Re: [RFC/RFT PATCH 3/6] ACPI / APEI: Replace ioremap_page_range() with fixmap

2017-11-02 Thread gengdongjiu
Hi James,

> 
> Can I take that as a 'Tested-by:'?
> 
> These tags also let us record who has a system that can test changes to this 
> driver.

sure.
Thanks for the fixing.
Qiang Zheng who is my colleague have tested it.

CC  Qiang.

Tested-by:  Qiang Zheng 

> 
> 
> Thanks,
> 
> James
> 
> 
> 
> .
> 



consult a question about action_result() in memory_failure()

2017-10-24 Thread gengdongjiu
Hi Naoya,
   very sorry to disturb you, I want to consult you about the handing to error 
page type in memory_failure().
If the error page is the current task's page table, will the memory_failure not 
handling that?
>From my test, I found the memory_failure() consider the error page table 
>physical address as unknown page.
why it does not handling the page table page error? Thanks a lot.

commit 64d37a2baf5e5c0f1009c0ef290a9027de721d66
Author: Naoya Horiguchi 
Date:   Wed Apr 15 16:13:05 2015 -0700

mm/memory-failure.c: define page types for action_result() in one place

This cleanup patch moves all strings passed to action_result() into a
singl= e array action_page_type so that a reader can easily find which
kind of actio= n results are possible.  And this patch also fixes the
odd lines to be printed out, like "unknown page state page" or "free
buddy, 2nd try page".

[a...@linux-foundation.org: rename messages, per David]
[a...@linux-foundation.org: s/DIRTY_UNEVICTABLE_LRU/CLEAN_UNEVICTABLE_LRU', 
per Andi]
Signed-off-by: Naoya Horiguchi 
Reviewed-by: Andi Kleen 
Cc: Tony Luck 
Cc: "Xie XiuQi" 
Cc: Steven Rostedt 
Cc: Chen Gong 
Cc: David Rientjes 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index d487f8d..5fd8931 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -521,6 +521,52 @@ static const char *action_name[] = {
[RECOVERED] = "Recovered",
 };

+enum action_page_type {
+   MSG_KERNEL,
+   MSG_KERNEL_HIGH_ORDER,
+   MSG_SLAB,
+   MSG_DIFFERENT_COMPOUND,
+   MSG_POISONED_HUGE,
+   MSG_HUGE,
+   MSG_FREE_HUGE,
+   MSG_UNMAP_FAILED,
+   MSG_DIRTY_SWAPCACHE,
+   MSG_CLEAN_SWAPCACHE,
+   MSG_DIRTY_MLOCKED_LRU,
+   MSG_CLEAN_MLOCKED_LRU,
+   MSG_DIRTY_UNEVICTABLE_LRU,
+   MSG_CLEAN_UNEVICTABLE_LRU,
+   MSG_DIRTY_LRU,
+   MSG_CLEAN_LRU,
+   MSG_TRUNCATED_LRU,
+   MSG_BUDDY,
+   MSG_BUDDY_2ND,
+   MSG_UNKNOWN,
+};
+
+static const char * const action_page_types[] = {
+   [MSG_KERNEL]= "reserved kernel page",
+   [MSG_KERNEL_HIGH_ORDER] = "high-order kernel page",
+   [MSG_SLAB]  = "kernel slab page",
+   [MSG_DIFFERENT_COMPOUND]= "different compound page after 
locking",
+   [MSG_POISONED_HUGE] = "huge page already hardware poisoned",
+   [MSG_HUGE]  = "huge page",
+   [MSG_FREE_HUGE] = "free huge page",
+   [MSG_UNMAP_FAILED]  = "unmapping failed page",
+   [MSG_DIRTY_SWAPCACHE]   = "dirty swapcache page",
+   [MSG_CLEAN_SWAPCACHE]   = "clean swapcache page",
+   [MSG_DIRTY_MLOCKED_LRU] = "dirty mlocked LRU page",
+   [MSG_CLEAN_MLOCKED_LRU] = "clean mlocked LRU page",
+   [MSG_DIRTY_UNEVICTABLE_LRU] = "dirty unevictable LRU page",
+   [MSG_CLEAN_UNEVICTABLE_LRU] = "clean unevictable LRU page",
+   [MSG_DIRTY_LRU] = "dirty LRU page",
+   [MSG_CLEAN_LRU] = "clean LRU page",
+   [MSG_TRUNCATED_LRU] = "already truncated LRU page",
+   [MSG_BUDDY] = "free buddy page",
+   [MSG_BUDDY_2ND] = "free buddy page (2nd try)",
+   [MSG_UNKNOWN]   = "unknown page",
+};



Re: [PATCH v5 1/2] acpi: apei: remove the unused dead-code for SEA/NMI notification type

2017-10-17 Thread gengdongjiu
Hi,Borislav

On 2017/10/18 0:43, Borislav Petkov wrote:
>> -}
>> -
> So GHES NMI notification method is x86-only AFAIK and HAVE_ACPI_APEI_NMI
> is selected only on x86. Why are you removing those guards? Does ARM
> have ACPI_HEST_NOTIFY_NMI notification type now too?

ARM does not have ACPI_HEST_NOTIFY_NMI notification, which should only used by 
x86.
In the code, I see those guards are never used. As you see, if the 
'CONFIG_HAVE_ACPI_APEI_NMI'
does not defined in [1], it will print error info and goto [2], in the [2], it 
will return error,
then the probe for GHES NMI is failed. so those guards( ghes_nmi_add() and 
ghes_nmi_remove()) have no chance
to execute. so I redefine them to NULL for compiling[3].

static int ghes_probe(struct platform_device *ghes_dev)
{
  struct acpi_hest_generic *generic;
  struct ghes *ghes = NULL;

  int rc = -EINVAL;

   switch (generic->notify.type) {
   ...
   case ACPI_HEST_NOTIFY_NMI:[1]
  if (!IS_ENABLED(CONFIG_HAVE_ACPI_APEI_NMI)) {
  pr_warn(GHES_PFX "Generic hardware error source: %d notified via 
NMI interrupt is not supported!\n",
  generic->header.source_id);
  goto err;
  }
   ..
  }
  switch (generic->notify.type) {
  ...
  case ACPI_HEST_NOTIFY_NMI:
 ghes_nmi_add(ghes);
 break;
  }
  ..
 err:[2]
  if (ghes) {
  ghes_fini(ghes);
  kfree(ghes);
  }
  return rc;
}

[3]:
-static inline void ghes_nmi_add(struct ghes *ghes)
-{
-   pr_err(GHES_PFX "ID: %d, trying to add NMI notification which is not 
supported!\n",
-  ghes->generic->header.source_id);
-   BUG();
-}
-
-static inline void ghes_nmi_remove(struct ghes *ghes)
-{
-   pr_err(GHES_PFX "ID: %d, trying to remove NMI notification which is not 
supported!\n",
-  ghes->generic->header.source_id);
-   BUG();
-}
-
-static inline void ghes_nmi_init_cxt(void)
-{
-}
+static inline void ghes_nmi_add(struct ghes *ghes) { }
+static inline void ghes_nmi_remove(struct ghes *ghes) { }
+static inline void ghes_nmi_init_cxt(void) { }






Re: [PATCH v5 2/2] acpi: apei: Add SEI notification type support for ARMv8

2017-10-17 Thread gengdongjiu
Hi Borislav,

On 2017/10/18 1:06, Borislav Petkov wrote:
> On Tue, Oct 17, 2017 at 04:02:21PM +0800, Dongjiu Geng wrote:
>> ARMv8.2 requires implementation of the RAS extension, in
>> this extension it adds SEI(SError Interrupt) notification
>> type, this patch adds new GHES error source SEI handling
>> functions. Because this error source parsing and handling
>> methods are similar with the SEA. So share some SEA handling
>> functions with the SEI
>>
>> Expose one API ghes_notify_abort() to external users. External
>> modules can call this exposed API to parse and handle the
>> SEA or SEI.
>>
>> Note: For the SEI(SError Interrupt), it is asynchronous external
>> abort, the error address recorded by firmware may be not accurate.
>> If not accurate, EL3 firmware needs to identify the address to a
>> invalid value.
>>
>> Cc: Borislav Petkov 
>> Cc: James Morse 
>> Signed-off-by: Dongjiu Geng 
>> Tested-by: Tyler Baicar 
>> Tested-by: Dongjiu Geng 
>> ---
>>  arch/arm64/mm/fault.c |  4 +--
>>  drivers/acpi/apei/Kconfig | 15 ++
>>  drivers/acpi/apei/ghes.c  | 71 
>> ++-
>>  include/acpi/ghes.h   |  2 +-
>>  4 files changed, 70 insertions(+), 22 deletions(-)
>>
>> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
>> index 2509e4f..c98c1b3 100644
>> --- a/arch/arm64/mm/fault.c
>> +++ b/arch/arm64/mm/fault.c
>> @@ -585,7 +585,7 @@ static int do_sea(unsigned long addr, unsigned int esr, 
>> struct pt_regs *regs)
>>  if (interrupts_enabled(regs))
>>  nmi_enter();
>>  
>> -ret = ghes_notify_sea();
>> +ret = ghes_notify_abort(ACPI_HEST_NOTIFY_SEA);
>>  
>>  if (interrupts_enabled(regs))
>>  nmi_exit();
>> @@ -682,7 +682,7 @@ int handle_guest_sea(phys_addr_t addr, unsigned int esr)
>>  int ret = -ENOENT;
>>  
>>  if (IS_ENABLED(CONFIG_ACPI_APEI_SEA))
>> -ret = ghes_notify_sea();
>> +ret = ghes_notify_abort(ACPI_HEST_NOTIFY_SEA);
>>  
>>  return ret;
>>  }
>> diff --git a/drivers/acpi/apei/Kconfig b/drivers/acpi/apei/Kconfig
>> index de14d49..47fcb0c 100644
>> --- a/drivers/acpi/apei/Kconfig
>> +++ b/drivers/acpi/apei/Kconfig
>> @@ -54,6 +54,21 @@ config ACPI_APEI_SEA
>>option allows the OS to look for such hardware error record, and
>>take appropriate action.
>>  
>> +config ACPI_APEI_SEI
>> +bool "APEI Asynchronous SError Interrupt logging/recovering support"
> 
> What is "SError" ?
SError is System Error, which is a asynchronous exception in the Internal CPU.

In the ARM RAS Extension, there are mainly two type abort for CPU:
SEA(Synchronous External Abort)
SEI(SError Interrupt)

> 
>> +depends on ARM64 && ACPI_APEI_GHES
>> +default y
>> +help
>> +  This option should be enabled if the system supports
>> +  firmware first handling of SEI (asynchronous SError interrupt).
>> +
>> +  SEI happens with asynchronous external abort for errors on device
>> +  memory reads on ARMv8 systems. If a system supports firmware first
>> +  handling of SEI, the platform analyzes and handles hardware error
>> +  notifications from SEI, and it may then form a HW error record for
>> +  the OS to parse and handle. This option allows the OS to look for
>> +  such hardware error record, and take appropriate action.
>> +
>>  config ACPI_APEI_MEMORY_FAILURE
>>  bool "APEI memory error recovering support"
>>  depends on ACPI_APEI && MEMORY_FAILURE
>> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
>> index 3eee30a..24b4233 100644
>> --- a/drivers/acpi/apei/ghes.c
>> +++ b/drivers/acpi/apei/ghes.c
>> @@ -815,43 +815,67 @@ static struct notifier_block ghes_notifier_hed = {
>>  
>>  #ifdef CONFIG_ACPI_APEI_SEA
>>  static LIST_HEAD(ghes_sea);
>> +#endif
>> +
>> +#ifdef CONFIG_ACPI_APEI_SEI
>> +static LIST_HEAD(ghes_sei);
>> +#endif
>>  
>> +#if defined(CONFIG_ACPI_APEI_SEA) || defined(CONFIG_ACPI_APEI_SEI)
>>  /*
>> - * Return 0 only if one of the SEA error sources successfully reported an 
>> error
>> - * record sent from the firmware.
>> + * Return 0 only if one of the SEA or SEI error sources successfully
>> + * reported an error record sent from the firmware.
>>   */
>> -int ghes_notify_sea(void)
>> +int ghes_notify_abort(u8 type)
> 
> Adding "abort" everywhere makes it worse: what does this function now
> do, notify or abort or both?
This function is used to notify APEI driver to parse APEI table and do some
recovery, such as calling memrory_failure() to identify the address to a
poisoned memory and deliver SIGBUS to related application.

> 
> Ditto for the remaining ones. Please think of a better name.
Ok, thanks for your good suggestion, I will consider to use a better name.

> 
> 
> ghes_notify_sei() sounds much better to me, for example. And then you
> can add a whole set of *_sei() functions similar to the *_sea() ones and
> then you don't have to do all that 

Re: [PATCH v5 2/2] acpi: apei: Add SEI notification type support for ARMv8

2017-10-18 Thread gengdongjiu

On 2017/10/18 17:06, Borislav Petkov wrote:
> On Wed, Oct 18, 2017 at 01:00:44PM +0800, gengdongjiu wrote:
>> SError is System Error, which is a asynchronous exception in the Internal 
>> CPU.
>>
>> In the ARM RAS Extension, there are mainly two type abort for CPU:
>> SEA(Synchronous External Abort)
>> SEI(SError Interrupt)
> And you're not writing it out as "System Error" because ...?

Thanks Borislav, can I write it as asynchronous exception or asynchronous abort?



Re: [PATCH v5 2/2] acpi: apei: Add SEI notification type support for ARMv8

2017-10-18 Thread gengdongjiu
On 2017/10/18 17:44, James Morse wrote:
>> The thing is abbreviated as "SEI" and apparently means "System Error
>> Interrupt". Nothing else.
> ARM has 'external abort', which are either synchronous or asynchronous, both 
> are
> delivered as different types of exception.
> 
> Asynchronous external abort is treated as a special kind of interrupt, 'SError
> Interrupt', (where SError stands for System Error, but its rarely written like
> that). 'SEI' is a relatively new abbreviation for SError interrupt.
> 
> 
> What should we call this thing? In the ACPI code I'd prefer 'SEI' as that is
> what the ACPI spec calls it. Here we are talking about an GHES notification.
> 
> But in the arm64 arch code this should be called SError Interrupt as this is
> what the ARM-ARM calls it. This code cares about exception routing and 
> interrupt
> masking.
> 
> 
> But, I don't really care.

Thanks very much James's clear explanation.
I agree with James.

In the ACPI sepc, we usually call SEI as SError Interrupt, we rarely call 
SError to System Error,
Anyway I will explain clearly about the abbreviations in my next version patch.



Re: [PATCH v5 2/2] acpi: apei: Add SEI notification type support for ARMv8

2017-10-18 Thread gengdongjiu
On 2017/10/18 18:04, Borislav Petkov wrote:
> On Wed, Oct 18, 2017 at 10:44:48AM +0100, James Morse wrote:
>> What should we call this thing?
> My only pet peeve is having abbreviations everywhere and nothing
> explaining them.
> 
> So whatever you guys decide upon and as long as there's an explanation
> what those things mean and you stick with that name, is perfectly fine
> with me.

surely, we will, thanks Borislav's reminder.



Re: [PATCH v5 2/2] acpi: apei: Add SEI notification type support for ARMv8

2017-10-18 Thread gengdongjiu
Hi james,

On 2017/10/18 18:26, James Morse wrote:
> Hi Dongjiu Geng,
> 
> On 17/10/17 09:02, Dongjiu Geng wrote:
>> ARMv8.2 requires implementation of the RAS extension, in
>> this extension it adds SEI(SError Interrupt) notification
>> type, this patch adds new GHES error source SEI handling
>> functions.
> 
> This paragraph is merging two things that aren't related.
> The 'ARM v8.2 architecture extensions' have some RAS bits, which if your CPU
> implements v8.2 are required.
> 
> ACPIv6.1 added NOTIFY_SEI as a notification type for ARMv8 systems.
> 
> This patch adds a GHES function for NOTIFY_SEI. Please leave the CPU RAS
> extensions out of it.
Ok, thanks

> 
> 
>> Because this error source parsing and handling
>> methods are similar with the SEA. So share some SEA handling
>> functions with the SEI
>>
>> Expose one API ghes_notify_abort() to external users. External
>> modules can call this exposed API to parse and handle the
>> SEA or SEI.
> 
> This series doesn't add a caller/user for this new API, so why do we need to 
> do
> this now?
 there is caller and user, it is in another series(RAS virtualization series), 
not included in this series

As shown:

+int handle_guest_sei(unsigned int esr)
+{
+   int ret = -ENOENT;
+
+   if (IS_ENABLED(CONFIG_ACPI_APEI_SEI))
+   ret = ghes_notify_abort(ACPI_HEST_NOTIFY_SEI);
+
+   return ret;
+}

> 
> (I still haven't had a usable answer for 'what does your firmware do when 
> SError
> is masked', but I'll go beat that drum on the other thread).
sorry for my late response due to resent busy, I will answer your question in 
another thread.

May be tomorrow.


in short, regardless the physical SError is masked or unmasked, firmware will 
jump to
the corresponding SEA/SEI exception vector entry. there is only one PSTATE.DAIF 
which will be shared by different EL,
regardless EL1,EL2, EL3.

> 
> 
> More important for the APEI code is: How do SEA and SEI interact?
> 
> As far as I can see they can both interrupt each other, which isn't something
> the single in_nmi() path in APEI can handle. I thinks we should fix this 
> first.
> (I'll try and polish my RFC that had a stab at that...)
if you have fix patch, you CC me. thanks.

> 
> 
> SEA gets away with a lot of things because its synchronous. SEI isn't. Xie 
> XiuQi
> pointed to the memory_failure_queue() code. We can use this directly from SEA,
> but not SEI. (what happens if an SError arrives while we are queueing
> memory_failure work from an IRQ).
do you mean SError can interrupt memory_failure work from an IRQ?
memory_failure is in an process context, and in a work queue, not IRQ context.


> 
> The one that scares me is the trace-point reporting stuff. What happens if an
> SError arrives while we are enabling a trace point? (these are static-keys 
> right?)
For the trace-point issue, may be we can consider it in the next step.
Now I am not consider the trace-point issue.


> 
> 
> I don't think we can just plumb SEI in like this and be done with it.
> (I'm looking at teasing out the estatus cache code from being x86:NMI only. 
> This
> way we solve the same 'cant do this from NMI context' with the same code'.)
> 
> 
> Thanks,
> 
> James
> 
> 
> 
> boring nits below:
> 
>> Note: For the SEI(SError Interrupt), it is asynchronous external
>> abort, the error address recorded by firmware may be not accurate.
>> If not accurate, EL3 firmware needs to identify the address to a
>> invalid value.
> 
> This paragraph keeps cropping up. Who expects an address with an SError?
> We don't get one for IRQs, but that never needs stating.
> 
> 
>> Cc: Borislav Petkov 
>> Cc: James Morse 
>> Signed-off-by: Dongjiu Geng 
>> Tested-by: Tyler Baicar 
> 
>> Tested-by: Dongjiu Geng 
> (It's expected you test your own code)

Ok

> 
> 
> 
>> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
>> index 2509e4f..c98c1b3 100644
>> --- a/arch/arm64/mm/fault.c
>> +++ b/arch/arm64/mm/fault.c
>> @@ -585,7 +585,7 @@ static int do_sea(unsigned long addr, unsigned int esr, 
>> struct pt_regs *regs)
>>  if (interrupts_enabled(regs))
>>  nmi_enter();
>>  
>> -ret = ghes_notify_sea();
>> +ret = ghes_notify_abort(ACPI_HEST_NOTIFY_SEA);
>>  
>>  if (interrupts_enabled(regs))
>>  nmi_exit();
>> @@ -682,7 +682,7 @@ int handle_guest_sea(phys_addr_t addr, unsigned int esr)
>>  int ret = -ENOENT;
>>  
>>  if (IS_ENABLED(CONFIG_ACPI_APEI_SEA))
>> -ret = ghes_notify_sea();
>> +ret = ghes_notify_abort(ACPI_HEST_NOTIFY_SEA);
>>  
>>  return ret;
>>  }
>> diff --git a/drivers/acpi/apei/Kconfig b/drivers/acpi/apei/Kconfig
>> index de14d49..47fcb0c 100644
>> --- a/drivers/acpi/apei/Kconfig
>> +++ b/drivers/acpi/apei/Kconfig
>> @@ -54,6 +54,21 @@ config ACPI_APEI_SEA
>>option allows the OS to look for such hardware error record, and
>>take appropriate action.
>>  
>> +config ACPI_APEI_SEI
>> 

Re: [PATCH v5 1/2] acpi: apei: remove the unused dead-code for SEA/NMI notification type

2017-10-18 Thread gengdongjiu

On 2017/10/18 18:17, Borislav Petkov wrote:
> On Wed, Oct 18, 2017 at 11:04:28AM +0800, gengdongjiu wrote:
>> ARM does not have ACPI_HEST_NOTIFY_NMI notification, which should only
>> used by x86. In the code, I see those guards are never used.
> Yeah, I see it now.

Borislav/Rafael,
  For this patch(the first one), whether it can be firstly applied?
This patch is code clean and not related with the second one.
For the second, I may discuss more with James.
Thanks so much in advance.

> 
> Thx.
> 
> -- 



Re: [PATCH v5 2/2] acpi: apei: Add SEI notification type support for ARMv8

2017-10-18 Thread gengdongjiu
HI James,
   Thanks for the mail, for your some question, I think that is how to use 
NOTIFY_SEI, this patch just add a API to other user calling(such as KVM), 
because KVM needs to use that.

> 
> Hi Dongjiu Geng,
> 
> On 17/10/17 09:02, Dongjiu Geng wrote:
> > ARMv8.2 requires implementation of the RAS extension, in this
> > extension it adds SEI(SError Interrupt) notification type, this patch
> > adds new GHES error source SEI handling functions.
> 
> This paragraph is merging two things that aren't related.
> The 'ARM v8.2 architecture extensions' have some RAS bits, which if your CPU 
> implements v8.2 are required.
> 
> ACPIv6.1 added NOTIFY_SEI as a notification type for ARMv8 systems.
> 
> This patch adds a GHES function for NOTIFY_SEI. Please leave the CPU RAS 
> extensions out of it.
> 
> 
> > Because this error source parsing and handling methods are similar
> > with the SEA. So share some SEA handling functions with the SEI
> >
> > Expose one API ghes_notify_abort() to external users. External modules
> > can call this exposed API to parse and handle the SEA or SEI.
> 
> This series doesn't add a caller/user for this new API, so why do we need to 
> do this now?

In RAS virtualization patch, for SEI, firstly I does not call APEI GHES driver, 
you suggest me calling APEI code. So I add NOTIFY_SEI support.


> 
> (I still haven't had a usable answer for 'what does your firmware do when 
> SError is masked', but I'll go beat that drum on the other thread).
> 
> 
> More important for the APEI code is: How do SEA and SEI interact?

The SEA and SEI can interact each other, in some case, it should be allow 
nesting

maybe we need to consider a better way. But I think the method should be 
implemented in another place, not in this pace.
That is say, it is implemented in the caller or user. 

this patch just add a NOTIFY_SEI support, so that other user can call such as 
KVM.

> 
> As far as I can see they can both interrupt each other, which isn't something 
> the single in_nmi() path in APEI can handle. I thinks we should
> fix this first.
> (I'll try and polish my RFC that had a stab at that...)

Your fixing method use some lock? but I think it is not a good method. For 
example, when we handling SEI, then happen SEA, I think we need to handle the 
SEA immediately instead of waiting SEI finish handling.


> 
> 
> SEA gets away with a lot of things because its synchronous. SEI isn't. Xie 
> XiuQi pointed to the memory_failure_queue() code. We can use
> this directly from SEA, but not SEI. (what happens if an SError arrives while 
> we are queueing memory_failure work from an IRQ).

From my understand, xiexiuqi's code mainly solve 
memory_failure() execution later than SEA/SEI handling, memory_failure() 
execution is in a workqueue process context,not in a SEA/SEI context. They are 
in different contex.
For example, after the SEA/SEI handing finished, the memory_failure() workqueue 
does not start running。so he add a flag to make sure this workqueue can be 
timely scheduled.

For your question: "what happens if an SError arrives while we are queueing 
memory_failure work from an IRQ",
I think SEA have the same issue, For example, when we are queueing 
memory_failure work from an IRQ, it happen SEA.
Do you mean xiexiuqi's patches can solve this problem?

> 
> The one that scares me is the trace-point reporting stuff. What happens if an 
> SError arrives while we are enabling a trace point? (these are
> static-keys right?)
> 
> 
> I don't think we can just plumb SEI in like this and be done with it.
> (I'm looking at teasing out the estatus cache code from being x86:NMI only. 
> This way we solve the same 'cant do this from NMI context'
> with the same code'.)

Do you mean like this for caller/user to make sure it is from NMI context?  Or 
do something in the GHES driver code to make sure it is in NMI context.

.
if (IS_ENABLED(CONFIG_ACPI_APEI_SEI)) {
if (interrupts_enabled(regs))
nmi_enter();

ret = ghes_notify_seI();

if (interrupts_enabled(regs))
nmi_exit();
}
..

> 
> 
> Thanks,
> 
> James
> 
> 
> 
> boring nits below:
> 
> > Note: For the SEI(SError Interrupt), it is asynchronous external
> > abort, the error address recorded by firmware may be not accurate.
> > If not accurate, EL3 firmware needs to identify the address to a
> > invalid value.
> 
> This paragraph keeps cropping up. Who expects an address with an SError?
> We don't get one for IRQs, but that never needs stating.

I do not check more code about the IRQ. Do you mean there is no address for IRQ?
If so,  why you have question "what happens if an SError arrives while we are 
queueing memory_failure work from an IRQ"?
Only when have address, the memory_failure work can be called.


> 
> 
> > Cc: Borislav Petkov 
> > Cc: James Morse 
> > Signed-off-by: Dongjiu Geng 
> > Tested-b

Re: [PATCH v5 1/2] acpi: apei: remove the unused dead-code for SEA/NMI notification type

2017-10-18 Thread gengdongjiu
> 
> On Wed, Oct 18, 2017 at 08:27:00PM +0800, gengdongjiu wrote:
> >   For this patch(the first one), whether it can be firstly applied?
> 
> Sure:
> 
> Reviewed-by: Borislav Petkov 

Thanks Boris.

Hi Rafael,
   If you think this patch is no problem, you can firstly apply this one. Boris 
have reviewed it. The first one is not related with the second. For the second 
one, I may need discuss more with James. Thanks so much.


> 
> --
> Regards/Gruss,
> Boris.
> 
> SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 
> 21284 (AG Nürnberg)
> --


Re: [PATCH v6 6/7] KVM: arm64: allow get exception information from userspace

2017-10-19 Thread gengdongjiu


On 2017/10/7 1:31, James Morse wrote:
> Hi gengdongjiu,
> 
> On 27/09/17 12:07, gengdongjiu wrote:
>> On 2017/9/23 0:51, James Morse wrote:
>>> If this wasn't a firmware-first notification, then you're right KVM hands 
>>> the
>>> guest an asynchronous external abort. This could be considered a bug in 
>>> KVM. (we
>>> can discuss with Marc and Christoffer what it should do), but:
>>>
>>> I'm not sure what scenario you could see this in: surely all your
>>> CPU:external-aborts are taken to EL3 by SCR_EL3.EA and become firmware-first
>>> notifications. So they should always be claimed by APEI.
> 
>> Yes, if it is firmware-first we should not exist such issue.
> 
> [...]
> 
>>> What you may be seeing is some awkwardness with the change in the SError ESR
>>> with v8.2. Previously the VSE mechanism injected an impdef SError, (but they
>>> were all impdef so it didn't matter).
>>> With VSESR_EL2 KVM has to specify one, and all-zeros is a bad choice as this
>>> means 'classified as a RAS error ... unknown!'.
> 
>>> I have a patch in the upcoming SError/RAS series that changes KVMs 
>>> virtual-abort
>>> code to specify an impdef ESR for this path.
> 
> https://www.spinics.net/lists/arm-kernel/msg609589.html
> 
> 
>> Before I remember Marc and you suggest me specify the an impdef ESR (set the 
>> vsesr_el2) in
>> the userspace,
> 
> If Qemu/kvmtool wants to emulate a machine that notifies the guest about
> firmware-first RAS Errors using SError/SEI, it needs to decide when to send
> these SError and what ESR to specify.
yes, it is. I agree.

> 
> 
>> now do you want to specify an impdef ESR in KVM instead of usrspace?
> 
> No, that patch is just trying to fixup the existing cases where KVM already
> injects an SError. I'm just trying to keep the behaviour the-same:
> 
> Before the RAS Extensions the guest would always get an impdef SError ESR.
> After the RAS Extensions KVM has to pick an ESR, as otherwise the guest gets 
> the
> hardware-reset value of VSESR_EL2. On the FVP this is all-zeros, which is a 
> RAS
> encoding. That patch changes it to still be an impdef SError ESR.
> 
> 
>> if setting the vsesr_el2 in the KVM, whether user-space need to specify?
> 
> I think we need a KVM CAP API to do this. With that patch it can be wired into
> pend_guest_serror(), which takes the ESR to make pending if the CPU supports 
> it.

For this CAP API, I have made a patch in the new series patches.
> 
> It's not clear to me whether its useful for user-space to make an SError 
> pending
> even if it can't set the ESR
why it can not set the ESR?
In the KVM, we can return a encoding fault info to userspace, then user space 
can
specify its own ESR for guest.
I already made a patch for it.


> 
> [...]
> 
>>> Because the SEI notification depends on v8.2 I'd like to get the SError/RAS
>>> series posted (currently re-testing), then I'll pick up enough of the 
>>> patches
>>> you've posted for a consolidated version of the series, and we can take the
>>> discussion from there.
> 
>> James, it is great, we can make a consolidated version of the series.
> 
> We need to get some of the wider issues fixed first, the big-ugly one is
> memory_failure_queue() not being NMI safe. (this isn't a problem for SEA as 
> this
> would only become re-entrant if the kernel text was corrupt). It is a problem
> for SEI and SDEI.
 For memory_failure_queue(), I think the big problem is it is in a process 
context, not error handling context.
there are two context. and the memory_failure_queue() is scheduled later than 
the error handling.


> 
> 
>>> I'd still like to know what your firmware does if the normal-world believes 
>>> its
>>> masked physical-SError and you want to hand it an SEI notification.
> 
> Aha, thanks for this:
> 
>> firstly the physical-SError that happened in the EL2/EL1/EL1 can not be 
>> masked if SCR_EL3.EA is set.
> 
> Masked for the CPU because the CPU can deliver the SError to EL3.
> 
> What about software? Code at EL1 and EL2 each have a PSTATE.A bit they may 
> have
> set. HCR_EL2.VSE respects EL1's PSTATE.A ... the question is does your 
> firmware
> respect the PSTATE.A value of the exception level that SError are routed to?

Before route to the target EL, software set the spsr_el3.A to 1, then "eret", 
the PSTATE.A will be to 1.

Note:
PSTATE.A is shared by different EL, in the hardware, it is one register, not 
many registers.
spsr_elx has more registers, such as 

re: [PATCH v2] KVM: arm/arm64: Fix external abort type matching

2017-10-27 Thread gengdongjiu
> 
> On Thu, Oct 26 2017 at  6:07:01 pm BST, Dongjiu Geng  
> wrote:
> > For this matching, current code using the {I,D}FSC range to match
> > kvm_vcpu_trap_get_fault_type() return value, but
> > kvm_vcpu_trap_get_fault_type() only return the part {I,D}FSC instead
> > of whole, so fix this issue
> >
> > Value   Type
> > 0x10FSC_SEA
> > 0x14FSC_SEA_TTW0 FSC_SEA_TTW1 FSC_SEA_TTW2 FSC_SEA_TTW3
> > 0x18FSC_SECC
> > 0x1cFSC_SECC_TTW0 FSC_SECC_TTW1 FSC_SECC_TTW2 FSC_SECC_TTW3
> >
> > CC: James Morse 
> > CC: Tyler Baicar 
> > Signed-off-by: Dongjiu Geng 
> >
> > ---
> > As shown below code:
> >
> > The kvm_vcpu_trap_get_fault_type() only return {I,D}FSC bit[5]:bit[2],
> > not the whole {I,D}FSC, but FSC_SEA_TTWx and FSC_SECC_TTWx are the
> > whole {I,D}FSC value.
> >
> > static inline u8 kvm_vcpu_trap_get_fault_type(const struct kvm_vcpu
> > *vcpu) {
> > return kvm_vcpu_get_hsr(vcpu) & ESR_ELx_FSC_TYPE; }
> >
> > static inline bool kvm_vcpu_dabt_isextabt(const struct kvm_vcpu *vcpu)
> > {
> > switch (kvm_vcpu_trap_get_fault_type(vcpu)) {
> > case FSC_SEA:
> > case FSC_SEA_TTW0:
> > case FSC_SEA_TTW1:
> > case FSC_SEA_TTW2:
> > case FSC_SEA_TTW3:
> > case FSC_SECC:
> > case FSC_SECC_TTW0:
> > case FSC_SECC_TTW1:
> > case FSC_SECC_TTW2:
> > case FSC_SECC_TTW3:
> > return true;
> > default:
> > return false;
> > }
> > }
> > ---
> >  arch/arm/include/asm/kvm_arm.h   | 10 ++
> >  arch/arm/include/asm/kvm_emulate.h   | 10 ++
> >  arch/arm64/include/asm/kvm_arm.h | 10 ++
> >  arch/arm64/include/asm/kvm_emulate.h | 10 ++
> >  4 files changed, 8 insertions(+), 32 deletions(-)
> >
> > diff --git a/arch/arm/include/asm/kvm_arm.h
> > b/arch/arm/include/asm/kvm_arm.h index c878145..b4b155c 100644
> > --- a/arch/arm/include/asm/kvm_arm.h
> > +++ b/arch/arm/include/asm/kvm_arm.h
> > @@ -188,15 +188,9 @@
> >  #define FSC_ACCESS (0x08)
> >  #define FSC_PERM   (0x0c)
> >  #define FSC_SEA(0x10)
> > -#define FSC_SEA_TTW0   (0x14)
> > -#define FSC_SEA_TTW1   (0x15)
> > -#define FSC_SEA_TTW2   (0x16)
> > -#define FSC_SEA_TTW3   (0x17)
> > +#define FSC_SEA_TTW(0x14)
> >  #define FSC_SECC   (0x18)
> > -#define FSC_SECC_TTW0  (0x1c)
> > -#define FSC_SECC_TTW1  (0x1d)
> > -#define FSC_SECC_TTW2  (0x1e)
> > -#define FSC_SECC_TTW3  (0x1f)
> > +#define FSC_SECC_TTW   (0x1c)
> >
> >  /* Hyp Prefetch Fault Address Register (HPFAR/HDFAR) */
> >  #define HPFAR_MASK (~0xf)
> > diff --git a/arch/arm/include/asm/kvm_emulate.h
> > b/arch/arm/include/asm/kvm_emulate.h
> > index 98089ff..aed8279 100644
> > --- a/arch/arm/include/asm/kvm_emulate.h
> > +++ b/arch/arm/include/asm/kvm_emulate.h
> > @@ -205,15 +205,9 @@ static inline bool kvm_vcpu_dabt_isextabt(struct
> > kvm_vcpu *vcpu)  {
> > switch (kvm_vcpu_trap_get_fault_type(vcpu)) {
> > case FSC_SEA:
> > -   case FSC_SEA_TTW0:
> > -   case FSC_SEA_TTW1:
> > -   case FSC_SEA_TTW2:
> > -   case FSC_SEA_TTW3:
> > +   case FSC_SEA_TTW:
> > case FSC_SECC:
> > -   case FSC_SECC_TTW0:
> > -   case FSC_SECC_TTW1:
> > -   case FSC_SECC_TTW2:
> > -   case FSC_SECC_TTW3:
> > +   case FSC_SECC_TTW:
> > return true;
> > default:
> > return false;
> > diff --git a/arch/arm64/include/asm/kvm_arm.h
> > b/arch/arm64/include/asm/kvm_arm.h
> > index 61d694c..c0f1798 100644
> > --- a/arch/arm64/include/asm/kvm_arm.h
> > +++ b/arch/arm64/include/asm/kvm_arm.h
> > @@ -205,15 +205,9 @@
> >  #define FSC_ACCESS ESR_ELx_FSC_ACCESS
> >  #define FSC_PERM   ESR_ELx_FSC_PERM
> >  #define FSC_SEAESR_ELx_FSC_EXTABT
> > -#define FSC_SEA_TTW0   (0x14)
> > -#define FSC_SEA_TTW1   (0x15)
> > -#define FSC_SEA_TTW2   (0x16)
> > -#define FSC_SEA_TTW3   (0x17)
> > +#define FSC_SEA_TTW(0x14)
> >  #define FSC_SECC   (0x18)
> > -#define FSC_SECC_TTW0  (0x1c)
> > -#define FSC_SECC_TTW1  (0x1d)
> > -#define FSC_SECC_TTW2  (0x1e)
> > -#define FSC_SECC_TTW3  (0x1f)
> > +#define FSC_SECC_TTW   (0x1c)
> >
> >  /* Hyp Prefetch Fault Address Register (HPFAR/HDFAR) */
> >  #define HPFAR_MASK (~UL(0xf))
> > diff --git a/arch/arm64/include/asm/kvm_emulate.h
> > b/arch/arm64/include/asm/kvm_emulate.h
> > index e5df3fc..5cde311 100644
> > --- a/arch/arm64/include/asm/kvm_emulate.h
> > +++ b/arch/arm64/include/asm/kvm_emulate.h
> > @@ -239,15 +239,9 @@ static inline bool kvm_vcpu_dabt_isextabt(const
> > struct kvm_vcpu *vcpu)  {
> > switch (kvm_vcpu_trap_get_fault_type(vcpu)) {
> > case FSC_SEA:
> > -   case FSC_SEA_TTW0:
> > -   case FSC_SEA_TTW1:
> > -   case FSC_SEA_TTW2:
> > -   case FSC_SEA_TTW3:
> > +   case FSC_SEA_TTW:
> > case FSC_SECC:
> > -   case FSC_SECC_TTW0:
> > -   case FSC_SECC_TTW1:
> > -   case FSC_SECC_TTW2:
> > -   case FSC_SECC_TTW3:
> > +   case FSC_SECC_TTW:
> > return true;
> > default:
> > retur

Re: [PATCH v2] KVM: arm/arm64: Fix external abort type matching

2017-10-27 Thread gengdongjiu

On 2017/10/28 2:28, Marc Zyngier wrote:
>> kvm_vcpu_trap_get_fault_type() will clear the low two bits to zero. So
>> I use FSC_SEA_TTW represent "Synchronous external abort on translation
>> table walk"
> I understand that, and I certainly not keen on adding another "fault
> type" for this.
 Ok,  thanks.

> 
>> As we can see the Translation fault type "FSC_FAULT", which does not
>> define the "FSC_FAULT0" "FSC_FAULT1" "FSC_FAULT2" "FSC_FAULT3".
>> Because the fault type " FSC_FAULT " include the four cases. 
> Indeed, and I'm still not convinced this is the best thing we have in
> the code.
Ok, thanks.


> 
>> static inline u8 kvm_vcpu_trap_get_fault(const struct kvm_vcpu *vcpu)
>> {
>>  return kvm_vcpu_get_hsr(vcpu) & 0x3f;
>> }
> Here you go. This should give you a pretty good idea of how to provide a
> proper fix.
OK, got it, thanks

> 
> Thanks,



Re: [PATCH v3] KVM: arm/arm64: fix the incompatible matching for external abort

2017-10-29 Thread gengdongjiu

On 2017/10/29 9:12, Marc Zyngier wrote:
> I'm sorry, but I can't manage to parse this commit message. How about
> something like this?
> 
> "kvm_vcpu_dabt_isextabt() tries to match a full fault syndrome, but
>  calls kvm_vcpu_trap_get_fault_type() that only returns the fault class,
>  thus reducing the scope of the check. This doesn't cause any observable
>  bug yet as we end-up matching a closely related syndrome for which we
>  return the same value.
> 
>  Using kvm_vcpu_trap_get_fault() instead fixes it for good"

Marc, thank you very much for your pointing out and good suggestion,
will update it. Thanks a lot.



Re: [PATCH] arm64: clean the additional checks before calling ghes_notify_sea()

2018-08-04 Thread gengdongjiu
2018-07-27 18:06 GMT+08:00 Will Deacon :
> On Thu, Jul 26, 2018 at 05:01:47PM -0400, Dongjiu Geng wrote:
>> In order to remove the additional check before calling the
>> ghes_notify_sea(), make stub definition when !CONFIG_ACPI_APEI_SEA.
>>
>> Signed-off-by: Dongjiu Geng 
>> ---
>
> Acked-by: Will Deacon 

Will,
 This patch will be applied, right? thanks

>
> Will
>
>> This cleanup is ever mentioned by Mark Rutland in [1]
>>
>> [1]:
>> https://lkml.org/lkml/2018/5/31/289
>> ---
>>  arch/arm64/mm/fault.c | 7 +--
>>  include/acpi/ghes.h   | 4 
>>  2 files changed, 5 insertions(+), 6 deletions(-)
>>
>> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
>> index b8eecc7..9ffe01d 100644
>> --- a/arch/arm64/mm/fault.c
>> +++ b/arch/arm64/mm/fault.c
>> @@ -727,12 +727,7 @@ static int do_sea(unsigned long addr, unsigned int esr, 
>> struct pt_regs *regs)
>>
>>  int handle_guest_sea(phys_addr_t addr, unsigned int esr)
>>  {
>> - int ret = -ENOENT;
>> -
>> - if (IS_ENABLED(CONFIG_ACPI_APEI_SEA))
>> - ret = ghes_notify_sea();
>> -
>> - return ret;
>> + return ghes_notify_sea();
>>  }
>>
>>  asmlinkage void __exception do_mem_abort(unsigned long addr, unsigned int 
>> esr,
>> diff --git a/include/acpi/ghes.h b/include/acpi/ghes.h
>> index 1624e2b..82cb4eb 100644
>> --- a/include/acpi/ghes.h
>> +++ b/include/acpi/ghes.h
>> @@ -118,6 +118,10 @@ static inline void *acpi_hest_get_next(struct 
>> acpi_hest_generic_data *gdata)
>>(void *)section - (void *)(estatus + 1) < estatus->data_length; \
>>section = acpi_hest_get_next(section))
>>
>> +#ifdef CONFIG_ACPI_APEI_SEA
>>  int ghes_notify_sea(void);
>> +#else
>> +static inline int ghes_notify_sea(void) { return -ENOENT; }
>> +#endif
>>
>>  #endif /* GHES_H */
>> --
>> 1.9.1
>>


Re: [PATCH] arm64: clean the additional checks before calling ghes_notify_sea()

2018-08-06 Thread gengdongjiu



On 2018/8/6 22:26, Will Deacon wrote:
>> Will,
>>  This patch will be applied, right? thanks
> I haven't queued it in the arm64 tree, since it touches include/acpi/ghes.h
> and you don't have an ack from the acpi folks. I acked it so that you could
> route it via the acpi tree without me holding you up.

Thanks the explanation.
yes, this patch touches the "include/acpi/ghes.h", I will repost this patch to 
let acpi
folks review it.


> 
> Will
> 
 This cleanup is ever mentioned by Mark Rutland in [1]



Re: [PATCH RESEND v2] arm64: clean the additional checks before calling ghes_notify_sea()

2018-08-09 Thread gengdongjiu
CC Borislav


2018-08-08 0:26 GMT+08:00 Dongjiu Geng :
> In order to remove the additional check before calling the
> ghes_notify_sea(), make stub definition when !CONFIG_ACPI_APEI_SEA.
>
> After this cleanup, we can simply call the ghes_notify_sea() to let
> APEI driver handle the SEA notification.
>
> CC: Tyler Baicar 
> CC: James Morse 
> Signed-off-by: Dongjiu Geng 
> Acked-by: Will Deacon 
> ---
> This cleanup is ever mentioned by Mark Rutland in [1]
>
> [1]:
> https://lkml.org/lkml/2018/5/31/289
>
> Change since v1:
> 1. Update the commit messages
> 2. CC Tyler and James
> 3. Add Acked-by of Will
> ---
>  arch/arm64/mm/fault.c | 7 +--
>  include/acpi/ghes.h   | 4 
>  2 files changed, 5 insertions(+), 6 deletions(-)
>
> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
> index b8eecc7..9ffe01d 100644
> --- a/arch/arm64/mm/fault.c
> +++ b/arch/arm64/mm/fault.c
> @@ -727,12 +727,7 @@ static int do_sea(unsigned long addr, unsigned int esr, 
> struct pt_regs *regs)
>
>  int handle_guest_sea(phys_addr_t addr, unsigned int esr)
>  {
> -   int ret = -ENOENT;
> -
> -   if (IS_ENABLED(CONFIG_ACPI_APEI_SEA))
> -   ret = ghes_notify_sea();
> -
> -   return ret;
> +   return ghes_notify_sea();
>  }
>
>  asmlinkage void __exception do_mem_abort(unsigned long addr, unsigned int 
> esr,
> diff --git a/include/acpi/ghes.h b/include/acpi/ghes.h
> index 1624e2b..82cb4eb 100644
> --- a/include/acpi/ghes.h
> +++ b/include/acpi/ghes.h
> @@ -118,6 +118,10 @@ static inline void *acpi_hest_get_next(struct 
> acpi_hest_generic_data *gdata)
>  (void *)section - (void *)(estatus + 1) < estatus->data_length; \
>  section = acpi_hest_get_next(section))
>
> +#ifdef CONFIG_ACPI_APEI_SEA
>  int ghes_notify_sea(void);
> +#else
> +static inline int ghes_notify_sea(void) { return -ENOENT; }
> +#endif
>
>  #endif /* GHES_H */
> --
> 1.9.1
>


Re: [PATCH RESEND v2] arm64: clean the additional checks before calling ghes_notify_sea()

2018-08-09 Thread gengdongjiu
2018-08-10 5:05 GMT+08:00 Tyler Baicar :
> On Thu, Aug 9, 2018 at 8:32 AM, gengdongjiu  wrote:
>>
>> 2018-08-08 0:26 GMT+08:00 Dongjiu Geng :
>> > In order to remove the additional check before calling the
>> > ghes_notify_sea(), make stub definition when !CONFIG_ACPI_APEI_SEA.
>> >
>> > After this cleanup, we can simply call the ghes_notify_sea() to let
>> > APEI driver handle the SEA notification.
>> >
>> > CC: Tyler Baicar 
>> > CC: James Morse 
>> > Signed-off-by: Dongjiu Geng 
>> > Acked-by: Will Deacon 
>> > ---
>> > This cleanup is ever mentioned by Mark Rutland in [1]
>> >
>> > [1]:
>> > https://lkml.org/lkml/2018/5/31/289
>> >
>
> FWIW - Looks good to me!

Thanks very much for this comments and review,
whether I can get your "Acked-by" or "Reviewed-by"? thanks

>
> Thanks,
> Tyler


Re: [PATCH v9 3/7] acpi: apei: Add SEI notification type support for ARMv8

2018-01-23 Thread gengdongjiu
Hi James,

On 2018/1/23 3:39, James Morse wrote:
> Hi Dongjiu Geng,
> 
> (versions of patches 1,2 and 4 have been queued by Catalin)
> 
> (Nit 'ACPI / APEI:' is the normal subject prefix for ghes.c, this helps the
> maintainers know which patches they need to pay attention to when you are
> touching multiple trees)
> 
> On 06/01/18 16:02, Dongjiu Geng wrote:
>> ARMv8.2 requires implementation of the RAS extension.
> 
>> In
>> this extension, it adds SEI(SError Interrupt) notification
>> type, this patch adds new GHES error source SEI handling
>> functions. 
> 
> This reads as if this patch is handling SError RAS notifications generated by 
> a
> CPU with the RAS extensions. These are about CPU->Software notifications. APEI
> and GHES are a firmware first mechanism which is Software->Software.
> Reading the v8.2 documents won't help anyone with the APEI/GHES code.
> 
> Please describe this from the ACPI view, "ACPI 6.x adds support for NOTIFY_SEI
> as a GHES notification mechanism... ",  its up to the arch code to spot a v8.2
> RAS Error based on the cpu caps.Ok, I will modify it.

> 
> 
>> This error source parsing and handling method
>> is similar with the SEA.
> 
> There are problems with doing this:
> 
> Oct. 18, 2017, 10:26 a.m. James Morse wrote:
> | How do SEA and SEI interact?
> |
> | As far as I can see they can both interrupt each other, which isn't 
> something
> | the single in_nmi() path in APEI can handle. I thinks we should fix this
> | first.
> 
> [..]
> 
> | SEA gets away with a lot of things because its synchronous. SEI isn't. Xie
> | XiuQi pointed to the memory_failure_queue() code. We can use this directly
> | from SEA, but not SEI. (what happens if an SError arrives while we are
> | queueing memory_failure work from an IRQ).
> |
> | The one that scares me is the trace-point reporting stuff. What happens if 
> an
> | SError arrives while we are enabling a trace point? (these are static-keys
> | right?)
> |
> |  I don't think we can just plumb SEI in like this and be done with it.
> |  (I'm looking at teasing out the estatus cache code from being x86:NMI only.
> |  This way we solve the same 'cant do this from NMI context' with the same
> |  code'.)
> 
> 
> I will post what I've got for this estatus-cache thing as an RFC, its not 
> ready
> to be considered yet.Yes, I know you are dong that. Your serial's patch will 
> consider all above things, right?
If your patch can be consider that, this patch can based on your patchset. 
thanks.

> 
> 
>> Expose API ghes_notify_sei() to external users. External
>> modules can call this exposed API to parse APEI table and
>> handle the SEI notification.
> 
> external modules? You mean called by the arch code when it gets this 
> NOTIFY_SEI?
yes, called by kernel ARCH code, such as below, I remember I have discussed 
with you.

 asmlinkage void do_serror(struct pt_regs *regs, unsigned int esr)
 {
nmi_enter();


if (!ghes_notify_sei())
return;



/* non-RAS errors are not containable */
if (!arm64_is_ras_serror(esr) || arm64_is_fatal_ras_serror(regs, esr))
arm64_serror_panic(regs, esr);

nmi_exit();
}

> 
> 
> Thanks,
> 
> James
> 
> .
> 



Re: [PATCH v9 3/7] acpi: apei: Add SEI notification type support for ARMv8

2018-01-23 Thread gengdongjiu
sorry fix a typo.

On 2018/1/23 17:23, gengdongjiu wrote:
>> There are problems with doing this:
>>
>> Oct. 18, 2017, 10:26 a.m. James Morse wrote:
>> | How do SEA and SEI interact?
>> |
>> | As far as I can see they can both interrupt each other, which isn't 
>> something
>> | the single in_nmi() path in APEI can handle. I thinks we should fix this
>> | first.
>>
>> [..]
>>
>> | SEA gets away with a lot of things because its synchronous. SEI isn't. Xie
>> | XiuQi pointed to the memory_failure_queue() code. We can use this directly
>> | from SEA, but not SEI. (what happens if an SError arrives while we are
>> | queueing memory_failure work from an IRQ).
>> |
>> | The one that scares me is the trace-point reporting stuff. What happens if 
>> an
>> | SError arrives while we are enabling a trace point? (these are static-keys
>> | right?)
>> |
>> |  I don't think we can just plumb SEI in like this and be done with it.
>> |  (I'm looking at teasing out the estatus cache code from being x86:NMI 
>> only.
>> |  This way we solve the same 'cant do this from NMI context' with the same
>> |  code'.)
>>
>>
>> I will post what I've got for this estatus-cache thing as an RFC, its not 
>> ready
>> to be considered yet.

Yes, I know you are dong that. Your serial's patch will consider all above 
things, right?
If your patch can be consider that, this patch can based on your patchset. 
thanks.

> 
>>



Re: [PATCH v9 5/7] arm64: kvm: Introduce KVM_ARM_SET_SERROR_ESR ioctl

2018-01-24 Thread gengdongjiu
Hi James,
   Thanks a lot for your review and comments.

> 
> Hi Dongjiu Geng,
> 
> On 06/01/18 16:02, Dongjiu Geng wrote:
> > The ARM64 RAS SError Interrupt(SEI) syndrome value is specific to the
> > guest and user space needs a way to tell KVM this value. So we add a
> > new ioctl. Before user space specifies the Exception Syndrome Register
> > ESR(ESR), it firstly checks that whether KVM has the capability to set
> > the guest ESR, If has, will set it. Otherwise, nothing to do.
> >
> > For this ESR specifying, Only support for AArch64, not support AArch32.
> 
> After this patch user-space can trigger an SError in the guest. If it wants 
> to migrate the guest, how does the pending SError get migrated?
> 
> I think we need to fix migration first. Andrew Jones suggested using
> KVM_GET/SET_VCPU_EVENTS:
> https://www.spinics.net/lists/arm-kernel/msg616846.html
> 
> Given KVM uses kvm_inject_vabt() on v8.0 hardware too, we should cover 
> systems without the v8.2 RAS Extensions with the same API. I
> think this means a bit to read/write whether SError is pending, and another 
> to indicate the ESR should be set/read.
> CPUs without the v8.2 RAS Extensions can reject pending-SError that had an 
> ESR.

For the CPUs without the v8.2 RAS Extensions, its ESR is always 0, we only can 
inject a SError with ESR 0 to guest, cannot set its ESR.
About how about to use the KVM_GET/SET_VCPU_EVENTS, I will check the code, and 
consider your suggestion at the same time. 
The IOCTL KVM_GET/SET_VCPU_EVENTS has been used by X86.

> 
> user-space can then use the 'for migration' calls to make a 'new' SError 
> pending.
> 
> Now that the cpufeature bits are queued, I think this can be split up into 
> two separate series for v4.16-rc1, one to tackle NOTIFY_SEI and
> the associated plumbing. The second for the KVM 'make SError pending' API.

Ok, thanks for your suggestion, will split it.

> 
> 
> > diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c index
> > 5c7f657..738ae90 100644
> > --- a/arch/arm64/kvm/guest.c
> > +++ b/arch/arm64/kvm/guest.c
> > @@ -277,6 +277,11 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu 
> > *vcpu,
> > return -EINVAL;
> >  }
> >
> > +int kvm_arm_set_sei_esr(struct kvm_vcpu *vcpu, u32 *syndrome) {
> > +   return -EINVAL;
> > +}
> 
> Does nothing in the patch that adds the support? This is a bit odd.
> (oh, its hiding in patch 6...)

To make this patch simple and small, I add it in patch 6.

> 
> 
> Thanks,
> 
> James



Re: [PATCH v9 6/7] arm64: kvm: Set Virtual SError Exception Syndrome for guest

2018-01-25 Thread gengdongjiu
Hi James,
  thanks for the review.

On 2018/1/24 3:07, James Morse wrote:
> Hi Dongjiu Geng,
> 
> On 06/01/18 16:02, Dongjiu Geng wrote:
>> RAS Extension add a VSESR_EL2 register which can provide
>> the syndrome value reported to software on taking a virtual
>> SError interrupt exception. This patch supports to specify
>> this Syndrome.
>>
>> In the RAS Extensions we can not set all-zero syndrome value
>> for SError, which means 'RAS error: Uncategorized' instead of
>> 'no valid ISS'. So set it to IMPLEMENTATION DEFINED syndrome
>> by default.
>>
>> We also need to support userspace to specify a valid syndrome
>> value, Because in some case, the recovery is driven by userspace.
>> This patch can support that userspace specify it.
>>
>> In the guest/host world switch, restore this value to VSESR_EL2
>> only when HCR_EL2.VSE is set. This value no need to be saved
>> because it is stale vale when guest exit.
> 
> A version of this patch has been queued by Catalin.
yeah, I have seen that.

> 
> Now that the cpufeature bits are queued, I think this can be split up into two
> separate series for v4.16-rc1, one to tackle NOTIFY_SEI and the associated
> plumbing. The second for the KVM 'make SError pending' API.
> 
> 
>> Signed-off-by: Dongjiu Geng 
>> [Set an impdef ESR for Virtual-SError]
>> Signed-off-by: James Morse 
> 
> I didn't sign-off this patch. If you pick some bits from another version and
> want to credit someone else you can 'CC:' them or just mention it in the
> commit-message.

I pick your modification of setting an impdef ESR for Virtual-SError, so add 
your name,
I change it to 'CC'

> 
> 
>> diff --git a/arch/arm64/include/asm/sysreg.h 
>> b/arch/arm64/include/asm/sysreg.h
>> index 47b967d..3b035cc 100644
>> --- a/arch/arm64/include/asm/sysreg.h
>> +++ b/arch/arm64/include/asm/sysreg.h
>> @@ -86,6 +86,9 @@
>>  #define REG_PSTATE_PAN_IMM  sys_reg(0, 0, 4, 0, 4)
>>  #define REG_PSTATE_UAO_IMM  sys_reg(0, 0, 4, 0, 3)
>>  
>> +/* virtual SError exception syndrome register */
>> +#define REG_VSESR_EL2  sys_reg(3, 4, 5, 2, 3)
> 
> Irrelevant-Nit: sys-regs usually have a 'SYS_' prefix, and are in instruction
> encoding order lower down the file.
will follow that.

> 
> (These PSTATE PAN things are a bit odd as they were used to generate and
> instruction before the fancy {read,write}_sysreg() helpers were added).>
> 
>>  #define SET_PSTATE_PAN(x) __emit_inst(0xd500 | REG_PSTATE_PAN_IMM | 
>> \
>>(!!x)<<8 | 0x1f)
>>  #define SET_PSTATE_UAO(x) __emit_inst(0xd500 | REG_PSTATE_UAO_IMM | 
>> \
>> diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
>> index 738ae90..ffad42b 100644
>> --- a/arch/arm64/kvm/guest.c
>> +++ b/arch/arm64/kvm/guest.c
>> @@ -279,7 +279,16 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
>>  
>>  int kvm_arm_set_sei_esr(struct kvm_vcpu *vcpu, u32 *syndrome)
> 
> Bits of this are spread between patches 5 and 6. If you put them in the other
> order this wouldn't happen.

Ok, I will adjust that.

> 
> (but after a rebase most of this patch should disappear)
> 
>>  {
>> -return -EINVAL;
>> +u64 reg = *syndrome;
>> +
>> +/* inject virtual system Error or asynchronous abort */
>> +kvm_inject_vabt(vcpu);
> 
> So this writes an impdef ESR, because its the existing code-path in KVM.
> 
> 
>> +if (reg)
>> +/* set vsesr_el2[24:0] with value that user space specified */
>> +kvm_vcpu_set_vsesr(vcpu, reg & ESR_ELx_ISS_MASK);
> 
> And then you overwrite it. Which is a bit odd as there is a helper to do both 
> in
> one go:
thanks, I will directly call pend_guest_serror() in this function.

> 
> 
>> +
>> +return 0;
>>  }
> 
>>  int __attribute_const__ kvm_target_cpu(void)
> 
>> diff --git a/arch/arm64/kvm/inject_fault.c b/arch/arm64/kvm/inject_fault.c
>> index 3556715..fb94b5e 100644
>> --- a/arch/arm64/kvm/inject_fault.c
>> +++ b/arch/arm64/kvm/inject_fault.c
>> @@ -246,14 +246,25 @@ void kvm_inject_undefined(struct kvm_vcpu *vcpu)
>>  inject_undef64(vcpu);
>>  }
>>  
>> +static void pend_guest_serror(struct kvm_vcpu *vcpu, u64 esr)
>> +{
>> +kvm_vcpu_set_vsesr(vcpu, esr);
>> +vcpu_set_hcr(vcpu, vcpu_get_hcr(vcpu) | HCR_VSE);
>> +}
> 
> How come you don't use this in kvm_arm_set_sei_esr()?
thanks, I will call pend_guest_serror() in the kvm_arm_set_sei_esr().

> 
> 
> 
> Thanks,
> 
> James
> 
> .
> 



Re: [PATCH v4] perf tools: Add ARM Statistical Profiling Extensions (SPE) support

2018-01-11 Thread gengdongjiu
On 2018/1/11 22:17, Adrian Hunter wrote:
>>   (e.g., via 'perf inject --itrace'), are also not supported
>>
>> - technically both cs-etm and spe can be used simultaneously, however
>>   disabled for simplicity in this release
>>
>> Signed-off-by: Kim Phillips 
> For what is there now, it looks fine from the auxtrace point of view.  There
> are a couple of minor points below but nevertheless:
> 
> Acked-by: Adrian Hunter 

This patch is good to me.
Reviewed-by: gengdong...@huawei.com

> 
>> ---
>> v4: rebased onto acme's perf/core, whitespace fixes.



Re: [PATCH v8 7/7] arm64: kvm: handle SError Interrupt by categorization

2018-01-20 Thread gengdongjiu
Hi James,
   Sorry for my late response due to out of office recently.

2018-01-13 2:05 GMT+08:00 James Morse :
> Hi gengdongjiu,
>
> On 15/12/17 03:30, gengdongjiu wrote:
>> On 2017/12/7 14:37, gengdongjiu wrote:
>>>> We need to tackle (1) and (3) separately. For (3) we need some API that 
>>>> lets
>>>> Qemu _trigger_ an SError in the guest, with a specified ESR. But, we don't 
>>>> have
>>>> a way of migrating pending SError yet... which is where I got stuck last 
>>>> time I
>>>> was looking at this.
>>> I understand you most idea.
>>>
>>> But In the Qemu one signal type can only correspond to one behavior, can 
>>> not correspond to two behaviors,
>>> otherwise Qemu will do not know how to do.
>>>
>>> For the Qemu, if it receives the SIGBUS_MCEERR_AR signal, it will populate 
>>> the CPER
>>> records and inject a SEA to guest through KVM IOCTL "KVM_SET_ONE_REG"; if 
>>> receives the SIGBUS_MCEERR_AO
>>> signal, it will record the CPER and trigger a IRQ to notify guest, as shown 
>>> below:
>>>
>>> SIGBUS_MCEERR_AR trigger Synchronous External Abort.
>>> SIGBUS_MCEERR_AO trigger GPIO IRQ.
>>>
>>> For the SIGBUS_MCEERR_AO and SIGBUS_MCEERR_AR, we have already specify 
>>> trigger method, which all
>>>
>>> not involve _trigger_ an SError.
>>>
>>> so there is no chance for Qemu to trigger the SError when gets the 
>>> SIGBUS_MCEERR_A{O,R}.
>>
>> As I explained above:
>>
>> If Qemu received SIGBUS_MCEERR_AR, it will record CPER and trigger 
>> Synchronous External Abort;
>> If Qemu received SIGBUS_MCEERR_AO, it will record CPER and trigger GPIO IRQ;
>
>> So Qemu does not know when to _trigger_ an SError.
>
> There is no answer to this. How the CPU decides is specific to the CPU design.
> How Qemu decides is going to be specific to the machine it emulates.
>
> My understanding is there is some overlap for which RAS errors are reported as
> synchronous external abort, and which use SError. (Obviously the imprecise 
> ones
> are all SError). Which one the CPU uses depends on how the CPU is designed.
>
> When you take an SIGBUS_MCEERR_AR from KVM, its because KVM can't complete a
> stage2 fault because the page is marked with PG_poisoned. These started out 
> as a
> synchronous exception, but you could still report these with SError.

yes, I agree, it is policy choice.

>
> We don't have a way to signal user-space about imprecise exceptions, this 
> isn't
> a KVM specific problem.
>
>
>> so here I "return a error" to Qemu if ghes_notify_sei() return failure in 
>> [1], if you opposed KVM "return error",
>> do you have a better idea about it? thanks
>
> If ghes_notify_sei() fails to claim the error, we should drop through to
> kernel-first-handling. We don't have that yet, just the stub that ignores 
> errors
> where we can make progress.
>
> If neither firmware-first nor kernel-first claim a RAS error, we're in 
> trouble.
> I'd like to panic() as we got a RAS notification but no description of the
> error. We can't do this until we have kernel-first support, hence that stub.
>
>
>> About the way of migrating pending SError, I think it is a separate case, 
>> because Qemu still does not know
>> how and when to trigger the SError.
>
> I agree, but I think we should fix this first before we add another user of 
> this
> unmigratable hypervisor state.
>
> (I recall someone saying migration is needed for any new KVM/cpu features, 
> but I
> can't find the thread)
>
>
>> [1]:
>> static int kvm_handle_guest_sei(struct kvm_vcpu *vcpu, struct kvm_run *run)
>> {
>> ...
>> +   case ESR_ELx_AET_UER:   /* The error has not been propagated */
>> +   /*
>> +* Userspace only handle the guest SError Interrupt(SEI) if 
>> the
>> +* error has not been propagated
>> +*/
>> +   run->exit_reason = KVM_EXIT_EXCEPTION;
>> +   run->ex.exception = ESR_ELx_EC_SERROR;
>
> I'm against telling user space RAS errors ever happened, only the final
> user-visible error when the kernel can't fix it.

thanks for the explanation.
For the ESR_ELx_AET_UER, this exception is precise, closing the VM may
be better[1].
But if you think panic is better until we support kernel-first, it is
also OK to me.


+static int kvm_handle_guest_sei(struct kvm_vcp

Re: [PATCH v8 7/7] arm64: kvm: handle SError Interrupt by categorization

2018-01-20 Thread gengdongjiu
2018-01-13 2:05 GMT+08:00 James Morse :
> Hi gengdongjiu,
>
> On 16/12/17 04:47, gengdongjiu wrote:
>> [...]
>>>
>>>> + case ESR_ELx_AET_UER:   /* The error has not been propagated */
>>>> + /*
>>>> +  * Userspace only handle the guest SError Interrupt(SEI) if 
>>>> the
>>>> +  * error has not been propagated
>>>> +  */
>>>> + run->exit_reason = KVM_EXIT_EXCEPTION;
>>>> + run->ex.exception = ESR_ELx_EC_SERROR;
>>>> + run->ex.error_code = KVM_SEI_SEV_RECOVERABLE;
>>>> + return 0;
>>>
>>> We should not pass RAS notifications to user space. The kernel either 
>>> handles
>>> them, or it panics(). User space shouldn't even know if the kernel supports 
>>> RAS
>>
>> For the  ESR_ELx_AET_UER(Recoverable error), let us see its definition
>> below, which get from [0]
>
> [..]
>
>> so we can see the  exception is precise and PE can recover execution
>> from the preferred return address of the exception,
>
>> so let guest handling it is
>> better, for example, if it is guest application RAS error, we can kill
>> the guest application instead of panic whole OS; if it is guest kernel
>> RAS error, guest will panic.
>
> If the kernel takes an unhandled RAS error it should panic - we don't know 
> where
> the error is.
OK, here I will panic.

>
> I understand you want to kill-off guest tasks as a result of RAS errors, but
> this needs to go through the whole APEI->memory_failure()->sigbus machinery so
> that the kernel knows the kernel can keep running.
>
> This saves us signalling user-space when we don't need to. An example:
> code-corruption. Linux can happily re-read affected user-space executables 
> from
> disk, there is absolutely nothing user-space can do about it.
> Handling errors first in the kernel allows us to do recovery for all the
> affected processes, not just the one that happens to be running right now.
>
>
>> Host does not know which application of guest has error, so host can
>> not handle it,
>
> It has to work this out, otherwise the errors we can handle never get a 
> chance.
>
> This kernel is expected to look at the error description, (which for some 
> reason
> we aren't talking about here), e.g. the CPER records, and determine what
> recovery action is necessary for this error.
> For memory errors this may be re-reading from disk, or at the worst case,
> unmapping from all user-space users (including KVM's stage2) and raining 
> signals
> on all affected processes.
>
> For a memory error the important piece of information is the physical address.
> Only the kernel can do anything with this, it determines who owns the affected
> memory and what needs doing to recover from the error.
>
> If you pass the notification to user-space, all it can do is signal the guest 
> to
> "stop doing whatever it is you're doing". The guest may have been able to
> re-read pages from disk, or otherwise handle the error.
> Has the error been handled? No: The error remains latent in the system.
>
>
>> panic OS is not a good choice for the Recoverable error.
>
> If we don't know where the error is, and we can't make progress, its the only
> sane choice.
Ok, I will panic here.

>
> This code is never expected to run! (why are we arguing about it?) We should 
> get
> RAS errors as GHES notifications from firmware via some mechanism. If those 
> are
> NOTIFY_SEI then APEI should claim the notification and kick off the 
> appropriate
> handling based on the CPER records. If/when we get kernel-first, that can 
> claim
> the SError. What we're left with is RAS notifications that no-one claimed
> because there was no error-description found.
>
>
>
> James


Re: [PATCH v8 7/7] arm64: kvm: handle SError Interrupt by categorization

2018-01-20 Thread gengdongjiu
2018-01-15 16:33 GMT+08:00 Christoffer Dall :
> On Fri, Jan 12, 2018 at 06:05:23PM +, James Morse wrote:
>> On 15/12/17 03:30, gengdongjiu wrote:
>> > On 2017/12/7 14:37, gengdongjiu wrote:
>
> [...]
>
>>
>> (I recall someone saying migration is needed for any new KVM/cpu features, 
>> but I
>> can't find the thread)
>>
>
> I don't know of any hard set-in-stone rule for this, but I have
> certainly argued that since migration is a popular technique in data
> centers and often a key motivation behind using virtual machines as it
> provides both load-balancing and high availability, we should think
> about migration support for all features and state.  Further, experience
> has shown that retroactively trying to support migration can result in
> really complex interfaces for saving/restoring state (see the ITS
> ordering requirements in
> Documentation/virtual/kvm/devices/arm-vgic-its.txt as an example) so
> thinking about this problem when introducing functionality is a good
> idea.
yes, agree it.

>
> Of course, if there are really good arguments for having some state that
> simply cannot be migrated, then that's fine, and we should just make
> sure that userspace (e.g. QEMU) and higher level components in the
> stack (libvirt, openstack, etc.) can detect this state being used, and
> ideally enable/disable it, so that it can predict that a particular VM
> cannot be migrated off a particular host, or between a particular set of
> two hosts.  As an example, migration is typically prohibited when using
> VFIO direct device assignment, but userspace etc. are already aware of
> this.
Ok,  I think this problem is similar to migrating a VM that uses an irqchip in
 userspace and has set the IRQ or FIQ lines using KVM_IRQ_LINE.

>
> As a final note, if we add support for some architectural feature, which
> may be present on some particular hardware and/or implementation, if the
> KVM support for said feature is automatically enabled (and not
> selectively from userspace), I would push back quite strongly on
> something that doesn't support migration, because it would effectively
> prevent migration of VMs on ARM.
Ok, got it.

>
> Thanks,
> -Christoffer
> ___
> kvmarm mailing list
> kvm...@lists.cs.columbia.edu
> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v3] arm64: v8.4: Support for new floating point multiplication instructions

2018-01-04 Thread gengdongjiu
Hi will/catalin

On 2017/12/13 18:09, Suzuki K Poulose wrote:
> On 13/12/17 10:13, Dongjiu Geng wrote:
>> ARM v8.4 extensions add new neon instructions for performing a
>> multiplication of each FP16 element of one vector with the corresponding
>> FP16 element of a second vector, and to add or subtract this without an
>> intermediate rounding to the corresponding FP32 element in a third vector.
>>
>> This patch detects this feature and let the userspace know about it via a
>> HWCAP bit and MRS emulation.
>>
>> Cc: Dave Martin 
>> Cc: Suzuki K Poulose 
>> Signed-off-by: Dongjiu Geng 
>> Reviewed-by: Dave Martin 
> 
> Looks good to me.
> 
> Reviewed-by: Suzuki K Poulose 

 sorry to disturb you. Reminder, hope this patch can be applied to Linux 
4.15-rc7.
 Thanks a lot in advance.

> 
> 
> .
> 



Re: [PATCH v3] arm64: v8.4: Support for new floating point multiplication instructions

2018-01-05 Thread gengdongjiu
On 2018/1/5 15:57, Greg KH wrote:
> On Fri, Jan 05, 2018 at 09:22:54AM +0800, gengdongjiu wrote:
>> Hi will/catalin
>>
>> On 2017/12/13 18:09, Suzuki K Poulose wrote:
>>> On 13/12/17 10:13, Dongjiu Geng wrote:
>>>> ARM v8.4 extensions add new neon instructions for performing a
>>>> multiplication of each FP16 element of one vector with the corresponding
>>>> FP16 element of a second vector, and to add or subtract this without an
>>>> intermediate rounding to the corresponding FP32 element in a third vector.
>>>>
>>>> This patch detects this feature and let the userspace know about it via a
>>>> HWCAP bit and MRS emulation.
>>>>
>>>> Cc: Dave Martin 
>>>> Cc: Suzuki K Poulose 
>>>> Signed-off-by: Dongjiu Geng 
>>>> Reviewed-by: Dave Martin 
>>>
>>> Looks good to me.
>>>
>>> Reviewed-by: Suzuki K Poulose 
>>
>>  sorry to disturb you. Reminder, hope this patch can be applied to Linux 
>> 4.15-rc7.
> 
> New features should not be going into 4.15-rc, that should be a 4.16-rc1
> thing, right?

It is also great if it can be applied to 4.16-rc1. Thanks a lot!


> 
> thanks,
> 
> greg k-h
> 
> .
> 



Re: [PATCH v3] arm64: v8.4: Support for new floating point multiplication instructions

2018-01-05 Thread gengdongjiu
> > > New features should not be going into 4.15-rc, that should be a
> > > 4.16-rc1 thing, right?
> >
> > It is also great if it can be applied to 4.16-rc1. Thanks a lot!
> 
> I will queue it for 4.16-rc1.

Thanks very much to Catalin.

> 
> --
> Catalin


Re: [PATCH v8 7/7] arm64: kvm: handle SError Interrupt by categorization

2018-01-16 Thread gengdongjiu
Hi Christoffer

On 2018/1/15 16:33, Christoffer Dall wrote:
> On Fri, Jan 12, 2018 at 06:05:23PM +, James Morse wrote:
>> On 15/12/17 03:30, gengdongjiu wrote:
>>> On 2017/12/7 14:37, gengdongjiu wrote:
> 
> [...]
> 
>>
>> (I recall someone saying migration is needed for any new KVM/cpu features, 
>> but I
>> can't find the thread)
>>
> 
> I don't know of any hard set-in-stone rule for this, but I have
> certainly argued that since migration is a popular technique in data
> centers and often a key motivation behind using virtual machines as it
> provides both load-balancing and high availability, we should think
> about migration support for all features and state.  Further, experience
> has shown that retroactively trying to support migration can result in
> really complex interfaces for saving/restoring state (see the ITS
> ordering requirements in
> Documentation/virtual/kvm/devices/arm-vgic-its.txt as an example) so
> thinking about this problem when introducing functionality is a good
> idea.
> 
> Of course, if there are really good arguments for having some state that
> simply cannot be migrated, then that's fine, and we should just make
> sure that userspace (e.g. QEMU) and higher level components in the
> stack (libvirt, openstack, etc.) can detect this state being used, and
> ideally enable/disable it, so that it can predict that a particular VM
> cannot be migrated off a particular host, or between a particular set of
> two hosts.  As an example, migration is typically prohibited when using
> VFIO direct device assignment, but userspace etc. are already aware of
> this.
> 
> As a final note, if we add support for some architectural feature, which
> may be present on some particular hardware and/or implementation, if the
> KVM support for said feature is automatically enabled (and not
> selectively from userspace), I would push back quite strongly on
> something that doesn't support migration, because it would effectively
> prevent migration of VMs on ARM.
Thanks very much for this mail and reply, I will check it, please give me some 
time due to
recently busy with other things.

> 
> Thanks,
> -Christoffer
> 
> .
> 



Re: [PATCH v8 7/7] arm64: kvm: handle SError Interrupt by categorization

2018-01-16 Thread gengdongjiu
Hi James,
  thanks very much for your mail and reply, I will check it ASAP. Due to 
recently busy with other thing, so reply may be late.

On 2018/1/13 2:05, James Morse wrote:
> Hi gengdongjiu,
> 
> On 16/12/17 04:47, gengdongjiu wrote:
>> [...]
>>>
>>>> + case ESR_ELx_AET_UER:   /* The error has not been propagated */
>>>> + /*
>>>> +  * Userspace only handle the guest SError Interrupt(SEI) if 
>>>> the
>>>> +  * error has not been propagated
>>>> +  */
>>>> + run->exit_reason = KVM_EXIT_EXCEPTION;
>>>> + run->ex.exception = ESR_ELx_EC_SERROR;
>>>> + run->ex.error_code = KVM_SEI_SEV_RECOVERABLE;
>>>> + return 0;
>>>
>>> We should not pass RAS notifications to user space. The kernel either 
>>> handles
>>> them, or it panics(). User space shouldn't even know if the kernel supports 
>>> RAS
>>
>> For the  ESR_ELx_AET_UER(Recoverable error), let us see its definition
>> below, which get from [0]
> 
> [..]
> 
>> so we can see the  exception is precise and PE can recover execution
>> from the preferred return address of the exception, 
> 
>> so let guest handling it is
>> better, for example, if it is guest application RAS error, we can kill
>> the guest application instead of panic whole OS; if it is guest kernel
>> RAS error, guest will panic.
> 
> If the kernel takes an unhandled RAS error it should panic - we don't know 
> where
> the error is.
> 
> I understand you want to kill-off guest tasks as a result of RAS errors, but
> this needs to go through the whole APEI->memory_failure()->sigbus machinery so
> that the kernel knows the kernel can keep running.
> 
> This saves us signalling user-space when we don't need to. An example:
> code-corruption. Linux can happily re-read affected user-space executables 
> from
> disk, there is absolutely nothing user-space can do about it.
> Handling errors first in the kernel allows us to do recovery for all the
> affected processes, not just the one that happens to be running right now.
> 
> 
>> Host does not know which application of guest has error, so host can
>> not handle it,
> 
> It has to work this out, otherwise the errors we can handle never get a 
> chance.
> 
> This kernel is expected to look at the error description, (which for some 
> reason
> we aren't talking about here), e.g. the CPER records, and determine what
> recovery action is necessary for this error.
> For memory errors this may be re-reading from disk, or at the worst case,
> unmapping from all user-space users (including KVM's stage2) and raining 
> signals
> on all affected processes.
> 
> For a memory error the important piece of information is the physical address.
> Only the kernel can do anything with this, it determines who owns the affected
> memory and what needs doing to recover from the error.
> 
> If you pass the notification to user-space, all it can do is signal the guest 
> to
> "stop doing whatever it is you're doing". The guest may have been able to
> re-read pages from disk, or otherwise handle the error.
> Has the error been handled? No: The error remains latent in the system.
> 
> 
>> panic OS is not a good choice for the Recoverable error.
> 
> If we don't know where the error is, and we can't make progress, its the only
> sane choice.
> 
> This code is never expected to run! (why are we arguing about it?) We should 
> get
> RAS errors as GHES notifications from firmware via some mechanism. If those 
> are
> NOTIFY_SEI then APEI should claim the notification and kick off the 
> appropriate
> handling based on the CPER records. If/when we get kernel-first, that can 
> claim
> the SError. What we're left with is RAS notifications that no-one claimed
> because there was no error-description found.
> 
> 
> 
> James
> 
> .
> 



Re: [PATCH v9 3/7] acpi: apei: Add SEI notification type support for ARMv8

2018-04-11 Thread gengdongjiu
Dear James,
Thanks for this mail and sorry for my late response.


2018-02-16 1:55 GMT+08:00 James Morse :
> Hi gengdongjiu, liu jun
>
> On 05/02/18 11:24, gengdongjiu wrote:
[]
>>
>>> Is the emulated SError routed following the routing rules for HCR_EL2.{AMO,
>>> TGE}?
>>
>> Yes, it is.
>
> ... and yet ...
>
>
>>> What does your firmware do when it wants to emulate SError but its masked?
>>> (e.g.1: The physical-SError interrupted EL2 and the SPSR shows EL2 had
>>> PSTATE.A  set.
>>>  e.g.2: The physical-SError interrupted EL2 but HCR_EL2 indicates the
>>> emulated  SError should go to EL1. This effectively masks SError.)
>>
>> Currently we does not consider much about the mask status(SPSR).
>
> .. this is a problem.
>
> If you ignore SPSR_EL3 you may deliver an SError to EL1 when the exception
> interrupted EL2. Even if you setup the EL1 register correctly, EL1 can't eret 
> to
> EL2. This should never happen, SError is effectively masked if you are running
> at an EL higher than the one its routed to.
>
> More obviously: if the exception came from the EL that SError should be routed
> to, but PSTATE.A was set, you can't deliver SError. Masking SError is the only

James, I  summarized the masking and routing rules for SError to
confirm with you for the firmware first solution,

1. If the HCR_EL2.{AMO,TGE} is set, which means the SError should route to EL2,
When system happens SError and trap to EL3,   If EL3 find
HCR_EL2.{AMO,TGE} and SPSR_EL3.A are both set,
and find this SError come from EL2, it will not deliver an SError:
store the RAS error in the BERT and 'reboot'; but if
it find that this SError come from EL1 or EL0, it also need to deliver
an SError, right?


2. If the HCR_EL2.{AMO,TGE} is not set, which means the SError should
route to EL1,
When system happens SError and trap to EL3, If EL3 find
HCR_EL2.{AMO,TGE} and SPSR_EL3.A are both not set,
and find this SError come from EL1, it will not deliver an SError:
store the RAS error in the BERT and 'reboot'; but if
it find that this SError come from EL0, it also need to deliver an
SError, right?


> way the OS has to indicate it can't take an exception right now. VBAR_EL1 may 
> be
> 'wrong' if we're doing some power-management, the FAR/ELR/ESR/SPSR registers 
> may
> contain live values that the OS would lose if you deliver another exception 
> over
> the top.
>
> If you deliver an emulated-SError as the OS eret's, your new ELR will point at
> the eret instruction and the CPU will spin on this instruction forever.
>
> You have to honour the masking and routing rules for SError, otherwise no OS 
> can
> run safely with this firmware.
>
>
>> I remember that you ever suggested firmware should reboot if the mask status
>> is set(SPSR), right?
>
> Yes, this is my suggestion of what to do if you can't deliver an SError: store
> the RAS error in the BERT and 'reboot'.
>
>
>> I CC "liu jun"  who is our UEFI firmware Architect,
>> if you have firmware requirements, you can raise again.
>
> (UEFI? I didn't think there was any of that at EL3, but I'm not familiar with
> all the 'PI' bits).
>
> The requirement is your emulated-SError from EL3 looks exactly like a
> physical-SError as if EL3 wasn't implemented.
> Your CPU has to handle cases where it can't deliver an SError, your emulation
> has to do the same.
>
> This is not something any OS can work around.
>
>
>>> Answers to these let us determine whether a bug is in the firmware or the
>>> kernel. If firmware is expecting the OS to do something special, I'd like 
>>> to know
>>> about it from the beginning!
>>
>> I know your meaning, thanks for raising it again.
>
>
> Happy new year,
>
> James
> ___
> kvmarm mailing list
> kvm...@lists.cs.columbia.edu
> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v11 0/4] set VSESR_EL2 by user space and support NOTIFY_SEI notification

2018-04-11 Thread gengdongjiu
Hi James,
   thanks for this mail.

On 2018/4/10 22:15, James Morse wrote:
> Hi Dongjiu Geng,
> 
> On 09/04/18 22:36, Dongjiu Geng wrote:
>> 1. Detect whether KVM can set set guest SError syndrome
>> 2. Support to Set VSESR_EL2 and inject SError by user space.
>> 3. Support live migration to keep SError pending state and VSESR_EL2 value.
>> 4. ACPI 6.1 adds support for NOTIFY_SEI as a GHES notification mechanism, so 
>> support this
>>notification in software, KVM or kernel ARCH code call handle_guest_sei() 
>> to let ACP driver
>>to handle this notification.
> 
> Please don't post code during the merge-window, will this apply to v4.17-rc1? 
> We
> can't know until its tagged.
I do not know when it is merge-window. About the apply version, it does not 
have limited.

> 
> 
> This series is doing two separate things, please split it into two series.
OK, thanks!

> 
> But on the ACPI front: I don't see how any OS can support your NOTIFY_SEI when
> firmware is ignoring the normal world's PSTATE.A.
> 
> The latest lobe of that discussion was on the list here:
> https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1611496.html
I have replied the mail.
I still have some questions that need to clarify with you.
After clarification, we will follow that.
The question is in the reply of this mail 
"https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1611496.html";

> 
> 
> As it is, we would need to spot SError being delivered while SError is masked,
> spray nasty messages about firmware being horrifically buggy, then panic(). 
> For
> a corrected error, this looks bad, but its preferable to letting firmware
> silently overwrite the exception registers, causing linux to spin through the
> vectors 'eret' with all exceptions masked.
> I still think its best to wait for firmware that does the right thing.
Let us  discuss that in another mail.
In a summary, I think firmware follow below rule can be OK, right?
1. The exception came from the EL that SError should be routed to(according to 
hcr_EL2.{AMO, TGE}),but PSTATE.A was set, EL3 firmware can't deliver SError;
2. The exception came from the EL that SError should not be routed to(according 
to hcr_EL2.{AMO, TGE}),even though the PSTATE.A was set,EL3 firmware still 
deliver SError

> 
> 
> Thanks,
> 
> James
> 
> .
> 



Re: [PATCH v11 1/4] arm64: KVM: export the capability to set guest SError syndrome

2018-04-12 Thread gengdongjiu
HI James,
  Thanks for the review.


2018-04-10 22:15 GMT+08:00, James Morse :
> Hi Dongjiu Geng,
>
> On 09/04/18 22:36, Dongjiu Geng wrote:
>> Before user space injects a SError, it needs to know whether it can
>> specify the guest Exception Syndrome, so KVM should tell user space
>> whether it has such capability.
>
> (you could improve the commit message by briefly explaining how/why
> user-space
> would want to do this. As this is patch 1, you don't have the context of
> the
> previous patch to say that some systems can provide an ESR with
> virtual-SError)
Exactly, thanks for the good comments.

>
>
>> diff --git a/Documentation/virtual/kvm/api.txt
>> b/Documentation/virtual/kvm/api.txt
>> index fc3ae95..8a3d708 100644
>> --- a/Documentation/virtual/kvm/api.txt
>> +++ b/Documentation/virtual/kvm/api.txt
>> @@ -4415,3 +4415,14 @@ Parameters: none
>>  This capability indicates if the flic device will be able to get/set the
>>  AIS states for migration via the KVM_DEV_FLIC_AISM_ALL attribute and
>> allows
>>  to discover this without having to create a flic device.
>> +
>> +8.14 KVM_CAP_ARM_SET_SERROR_ESR
>> +
>> +Architectures: arm, arm64
>> +
>> +This capability indicates that userspace can specify syndrome value
>> reported to
>
> (Nit: 'the syndrome value')
will fix it.

>
>> +guest OS when guest takes a virtual SError interrupt exception.
>
> (Nit: 'the guest')
will fix it.

>
>> +If KVM has this capability, userspace can only specify the ISS field for
>> the ESR
>> +syndrome, can not specify the EC field which is not under control by
>> KVM.
>
> (Nit: 'it can not specify...')
will fix it.

>
>> +If this virtual SError is taken to EL1 using AArch64, this value will be
>> reported
>> +into ISS filed of ESR_EL1.
>
> (Nit: 'in the ISS field')
will fix it.

>
>
>> diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
>> index 3256b92..38c8a64 100644
>> --- a/arch/arm64/kvm/reset.c
>> +++ b/arch/arm64/kvm/reset.c
>> @@ -77,6 +77,9 @@ int kvm_arch_dev_ioctl_check_extension(struct kvm *kvm,
>> long ext)
>>  case KVM_CAP_ARM_PMU_V3:
>>  r = kvm_arm_support_pmu_v3();
>>  break;
>> +case KVM_CAP_ARM_INJECT_SERROR_ESR:
>> +r = cpus_have_const_cap(ARM64_HAS_RAS_EXTN);
>> +break;
>>  case KVM_CAP_SET_GUEST_DEBUG:
>>  case KVM_CAP_VCPU_ATTRIBUTES:
>>  r = 1;
>
> 'dev_ioctl' feels a bit weird, but we already have cpu_has_32bit_el1() in
> here.

Yes, although the name is "dev_ioctl", it does not have relationship
with the device.
here it mainly check vcpu capability, such as PMU, 32bit EL1 etc.

>
>
>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>> index 8fb90a0..3587b33 100644
>> --- a/include/uapi/linux/kvm.h
>> +++ b/include/uapi/linux/kvm.h
>> @@ -934,6 +934,7 @@ struct kvm_ppc_resize_hpt {
>>  #define KVM_CAP_S390_AIS_MIGRATION 150
>>  #define KVM_CAP_PPC_GET_CPU_CHAR 151
>>  #define KVM_CAP_S390_BPB 152
>> +#define KVM_CAP_ARM_INJECT_SERROR_ESR 153
>>
>>  #ifdef KVM_CAP_IRQ_ROUTING
>
> (patch 1&2 should probably be swapped around, as on its own this does
> thing).
ok, I will do it.

>
> Reviewed-by: James Morse 
thanks this Reviewed-by

>
>
> Thanks,
>
> James
> ___
> kvmarm mailing list
> kvm...@lists.cs.columbia.edu
> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
>


Re: [PATCH v11 2/4] arm/arm64: KVM: Add KVM_GET/SET_VCPU_EVENTS

2018-04-12 Thread gengdongjiu
Hi James,
  Thanks for the comments.

2018-04-10 22:15 GMT+08:00, James Morse :
> Hi Dongjiu Geng,
>
> On 09/04/18 22:36, Dongjiu Geng wrote:
>> This new IOCTL exports user-invisible states related to SError.
>> Together with appropriate user space changes, it can inject
>> SError with specified syndrome to guest by setup kvm_vcpu_events
>> value.
>
>> Also it can support live migration.
>
> Could you explain what user-space is expected to do for this?
> (this is also relevant for snapshot-ing/suspending VMs)
Ok.

>
> It's probably worth noting that this solves an existing problem: KVM may
> make an
> SError pending, but user-space has no way to discover/migrate this.

if KVM make an SError pending, when user-space do migration, it get the
kvm_vcpu_events through KVM_GET_VCPU_EVENTS, then can find that pending status.
What are the things you're worried about?

>
>
>> diff --git a/Documentation/virtual/kvm/api.txt
>> b/Documentation/virtual/kvm/api.txt
>> index 8a3d708..45719b4 100644
>> --- a/Documentation/virtual/kvm/api.txt
>> +++ b/Documentation/virtual/kvm/api.txt
>> @@ -819,11 +819,13 @@ struct kvm_clock_data {
>>
>>  Capability: KVM_CAP_VCPU_EVENTS
>>  Extended by: KVM_CAP_INTR_SHADOW
>> -Architectures: x86
>> +Architectures: x86, arm, arm64
>>  Type: vm ioctl
>>  Parameters: struct kvm_vcpu_event (out)
>>  Returns: 0 on success, -1 on error
>>
>> +X86:
>> +
>>  Gets currently pending exceptions, interrupts, and NMIs as well as
>> related
>>  states of the vcpu.
>>
>> @@ -865,15 +867,31 @@ Only two fields are defined in the flags field:
>>  - KVM_VCPUEVENT_VALID_SMM may be set in the flags field to signal that
>>smi contains a valid state.
>>
>> +ARM, ARM64:
>> +
>> +Gets currently pending SError exceptions as well as related states of the
>> vcpu.
>> +
>> +struct kvm_vcpu_events {
>> +struct {
>> +__u8 serror_pending;
>> +__u8 serror_has_esr;
>> +/* Align it to 4 bytes */
>> +__u8 pad[2];
>> +__u64 serror_esr;
>> +} exception;
>> +};
>> +
>
> I'm not convinced we should change this struct from the layout/size x86 has.
> Its
> confusing for the documentation, is this API call really the same on all
> architectures?
>
> What if we want to add some future interrupt, NMI or related state? We've
> found
> ourselves needing to add this API, it seems odd to remove its other uses on
> x86.
> We can't put them back in the future.
>
> Having a different layout would force user-space to ifdef/duplicate any
> code
> that accesses this between architectures.
 In x86 and arm64 user space code, the handling logic of
KVM_GET/SET_VCPU_EVENTS is in different ARCH folder,  maybe it is not
necessary to share the handling code in the user space.

>
>
>
> The compiler will want that __u64 to be naturally aligned to 8-bytes, so
> your
> 4-byte padding still causes some secret compiler-padding to be inserted.
> Different versions of the compiler may put it in different places.
>
>
>>  4.32 KVM_SET_VCPU_EVENTS
>>
>>  Capability: KVM_CAP_VCPU_EVENTS
>>  Extended by: KVM_CAP_INTR_SHADOW
>> -Architectures: x86
>> +Architectures: x86, arm, arm64
>>  Type: vm ioctl
>>  Parameters: struct kvm_vcpu_event (in)
>>  Returns: 0 on success, -1 on error
>>
>> +X86:
>> +
>>  Set pending exceptions, interrupts, and NMIs as well as related states of
>> the
>>  vcpu.
>>
>> @@ -894,6 +912,12 @@ shall be written into the VCPU.
>>
>>  KVM_VCPUEVENT_VALID_SMM can only be set if KVM_CAP_X86_SMM is available.
>>
>> +ARM, ARM64:
>> +
>> +Set pending SError exceptions as well as related states of the vcpu.
>> +
>> +See KVM_GET_VCPU_EVENTS for the data structure.
>> +
>>
>>  4.33 KVM_GET_DEBUGREGS
>>
>
>
>> diff --git a/arch/arm64/include/uapi/asm/kvm.h
>> b/arch/arm64/include/uapi/asm/kvm.h
>> index 9abbf30..855cc9a 100644
>> --- a/arch/arm64/include/uapi/asm/kvm.h
>> +++ b/arch/arm64/include/uapi/asm/kvm.h
>> @@ -39,6 +39,7 @@
>>  #define __KVM_HAVE_GUEST_DEBUG
>>  #define __KVM_HAVE_IRQ_LINE
>>  #define __KVM_HAVE_READONLY_MEM
>> +#define __KVM_HAVE_VCPU_EVENTS
>>
>>  #define KVM_COALESCED_MMIO_PAGE_OFFSET 1
>>
>> @@ -153,6 +154,17 @@ struct kvm_sync_regs {
>>  struct kvm_arch_memory_slot {
>>  };
>>
>> +/* for KVM_GET/SET_VCPU_EVENTS */
>> +struct kvm_vcpu_events {
>> +struct {
>> +__u8 serror_pending;
>> +__u8 serror_has_esr;
>
>> +/* Align it to 4 bytes */
>> +__u8 pad[2];
>
> (padding noted above)
>
>
>> +__u64 serror_esr;
>> +} exception;
>> +};
>> +
>>  /* If you need to interpret the index values, here is the key: */
>>  #define KVM_REG_ARM_COPROC_MASK 0x0FFF
>>  #define KVM_REG_ARM_COPROC_SHIFT16
>
>
>> diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
>> index 5c7f657..42e1222 100644
>> --- a/arch/arm64/kvm/guest.c
>> +++ b/arch/arm64/kvm/guest.c
>> @@ -277,6 +277,37 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu
>> *vcpu,
>>  return -EINVAL;
>>  }
>>

Re: [PATCH v9 3/7] acpi: apei: Add SEI notification type support for ARMv8

2018-04-13 Thread gengdongjiu
James,
   Thanks for this mail.

On 2018/4/13 0:14, James Morse wrote:
> Hi gengdongjiu,
> 
> On 12/04/18 06:00, gengdongjiu wrote:
>> 2018-02-16 1:55 GMT+08:00 James Morse :
>>> On 05/02/18 11:24, gengdongjiu wrote:
>>>>> Is the emulated SError routed following the routing rules for 
>>>>> HCR_EL2.{AMO,
>>>>> TGE}?
>>>>
>>>> Yes, it is.
>>>
>>> ... and yet ...
>>>
>>>
>>>>> What does your firmware do when it wants to emulate SError but its masked?
>>>>> (e.g.1: The physical-SError interrupted EL2 and the SPSR shows EL2 had
>>>>> PSTATE.A  set.
>>>>>  e.g.2: The physical-SError interrupted EL2 but HCR_EL2 indicates the
>>>>> emulated  SError should go to EL1. This effectively masks SError.)
>>>>
>>>> Currently we does not consider much about the mask status(SPSR).
>>>
>>> .. this is a problem.
>>>
>>> If you ignore SPSR_EL3 you may deliver an SError to EL1 when the exception
>>> interrupted EL2. Even if you setup the EL1 register correctly, EL1 can't 
>>> eret to
>>> EL2. This should never happen, SError is effectively masked if you are 
>>> running
>>> at an EL higher than the one its routed to.
>>>
>>> More obviously: if the exception came from the EL that SError should be 
>>> routed
>>> to, but PSTATE.A was set, you can't deliver SError. Masking SError is the 
>>> only
> 
>> James, I  summarized the masking and routing rules for SError to
>> confirm with you for the firmware first solution,
> 
> You also said "Currently we does not consider much about the mask 
> status(SPSR)."
Yes, we currently do not consider much it. After clarification with you, we 
want to modify the EL3 firmware to follow this rule.

> 
> 
>> 1. If the HCR_EL2.{AMO,TGE} is set,
> 
> If one or the other of these bits is set: (AMO==1 || TGE==1)
> 
>> which means the SError should route to EL2,
>> When system happens SError and trap to EL3,   If EL3 find
>> HCR_EL2.{AMO,TGE} and SPSR_EL3.A are both set,
>> and find this SError come from EL2, it will not deliver an SError:
>> store the RAS error in the BERT and 'reboot'; but if
>> it find that this SError come from EL1 or EL0, it also need to deliver
>> an SError, right?
> 
> Yes.
> 
> 
>> 2. If the HCR_EL2.{AMO,TGE} is not set,
> 
> If neither of these bits is set: (AMO==0 && TGE == 0)
> 
>> which means the SError should route to EL1,
>> When system happens SError and trap to EL3, If EL3 find
>> HCR_EL2.{AMO,TGE} and SPSR_EL3.A are both not set,
> 
> (I'm reading this as all three of these bits are clear)
sorry, it is a typo issue.
it should be HCR_EL2.AMO and HCR_EL2.TGE are both clear, but SPSR_EL3.A is set.

> 
>> and find this SError come from EL1, it will not deliver an SError:
>> store the RAS error in the BERT and 'reboot'; 
> 
> No, (AMO==0 && TGE == 0) means SError is routed to EL1, this exception
> interrupted EL1 and the A bit was clear, so EL1 can take an SError.

Agree.

> 
> The two cases here are:
> AMO==0,TGE==0 means SError should be routed to EL1. If SPSR_EL3 says the
> exception interrupted EL1 and the A bit was set, you need to do the BERT 
> trick.

> 
> If SPSR_EL3 says the exception interrupted EL2, you need to do the BERT trick
"BERT trick" is storing the RAS error in the BERT and 'reboot, right?

> regardless of the A bit, as SError is implicitly masked by running at a higher
> exception level than it was routed to.


> 
> 
>>From your v11 reply:
>> 2. The exception came from the EL that SError should not be routed
>> to(according to hcr_EL2.{AMO, TGE}),even though the PSTATE.A was set,EL3
>> firmware still deliver SError
> 
> (this is re-iterating the two-cases above:)
> 'not be routed to' is one of two things: Route-to-EL2+interruted-EL1, or
> Route-to-EL1+interrupted-EL2.
> 
> Route-to-EL2+interrupted-EL1 is fine, regardless of SPSR_EL3.A the emulated
> SError can be delivered to EL2, as EL2 can't mask SError when executing at a
> lower EL.
Agree.

> 
> Route-to-EL1+interrupted-EL2 is the problem. SError is implicitly masked by
> running at a higher EL. Regardless of SPSR_EL3.A, the emulated SError can not 
> be
> delivered.
"can not be delivered" means storing the RAS error in the BERT and 'reboot, 
right?
In the Table D1-15 in "D1.14.2 Asynchronous exception masking", for the case, 
it is "C"
"C"means SError is 

Re: [PATCH]arm64:defconfig:enable ACPI_APEI_SEA

2018-03-20 Thread gengdongjiu
Hi Shiju,
   The configuration "CONFIG_ACPI_APEI_SEA" is needed to manually enable?
In the "drivers/acpi/apei/Kconfig" file, the default value of ACPI_APEI_SEA is 
"y"

config ACPI_APEI_SEA
bool "APEI Synchronous External Abort logging/recovering support"
depends on ARM64 && ACPI_APEI_GHES
default y
help
  This option should be enabled if the system supports
  firmware first handling of SEA (Synchronous External Abort).
  SEA happens with certain faults of data abort or instruction
  abort synchronous exceptions on ARMv8 systems. If a system
  supports firmware first handling of SEA, the platform analyzes
  and handles hardware error notifications from SEA, and it may then
  form a HW error record for the OS to parse and handle. This
  option allows the OS to look for such hardware error record, and
  take appropriate action.


On 2018/3/20 20:23, Shiju Jose wrote:
> Enable ACPI APEI SEA option for arm64, to handle
> ARMv8 SEA(Synchronous External Abort).
> 
> Signed-off-by: Shiju Jose 
> Cc: Tyler Baicar 
> Cc: James Morse 
> Cc: Dongjiu Geng 
> Cc: Xie XiuQi 
> Cc: Qiang Zheng 
> ---
>  arch/arm64/configs/defconfig | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig
> index 634b373..5ddf25c 100644
> --- a/arch/arm64/configs/defconfig
> +++ b/arch/arm64/configs/defconfig
> @@ -574,6 +574,7 @@ CONFIG_ACPI_APEI_GHES=y
>  CONFIG_ACPI_APEI_PCIEAER=y
>  CONFIG_ACPI_APEI_MEMORY_FAILURE=y
>  CONFIG_ACPI_APEI_EINJ=y
> +CONFIG_ACPI_APEI_SEA=y
>  CONFIG_EXT2_FS=y
>  CONFIG_EXT3_FS=y
>  CONFIG_EXT4_FS_POSIX_ACL=y
> 



Re: [PATCH v9 5/7] arm64: kvm: Introduce KVM_ARM_SET_SERROR_ESR ioctl

2018-03-07 Thread gengdongjiu
Hi James,
   sorry for my late response due to chines new year.

2018-02-16 1:55 GMT+08:00 James Morse :
> Hi gengdongjiu,
>
> On 12/02/18 10:19, gengdongjiu wrote:
>> On 2018/2/10 1:44, James Morse wrote:
>>> The point? We can't know what a CPU without the RAS extensions puts in 
>>> there.
>>>
>>> Why Does this matter? When migrating a pending SError we have to know the
>>> difference between 'use this 64bit value', and 'the CPU will generate it'.
>>> If I make an SError pending with ESR=0 on a CPU with VSESR, I can't 
>>> migrated to
>>> a system that generates an impdef SError-ESR, because I can't know it will 
>>> be 0.
>
>> For the target system, before taking the SError, no one can know whether its 
>> syndrome value
>> is IMPLEMENTATION DEFINED or architecturally defined.
>
> For a virtual-SError, the hypervisor knows what it generated. (do I have
> VSESR_EL2? What did I put in there?).
>
>
>> when the virtual SError is taken, the ESR_ELx.IDS will be updated, then we 
>> can know
>> whether the ESR value is impdef or architecturally defined.
>
> True, the guest can't know anything about a pending virtual SError until it
> takes it. Why is this a problem?
>
>
>> It seems migration is only allowed only when target system and source system 
>> all support
>> RAS extension, because we do not know whether its syndrome is IMPLEMENTATION 
>> DEFINED or
>> architecturally defined.
>
> I don't think Qemu allows migration between hosts with differing guest-ID
> registers. But we shouldn't depend on this, and we may want to hide the v8.2 
> RAS
> features from the guest's ID register, but still use them from the host.
>
> The way I imagined it working was we would pack the following information into
> that events struct:
> {
> bool serror_pending;
> bool serror_has_esr;
> u64  serror_esr;
> }

I have used your suggestion struct

>
> The problem I was trying to describe is because there is no value of 
> serror_esr
> we can use to mean 'Ignore this, I'm a v8.0 CPU'. VSESR_EL2 is a 64bit 
> register,
> any bits we abuse may get a meaning we want to use in the future.
>
> When it comes to migration, v8.{0,1} systems can only GET/SET events where
> serror_has_esr == false, they can't use the serror_esr. On v8.2 systems we
> should require serror_has_esr to be true.
yes, I agreed.

>
> If we need to support migration from v8.{0,1} to v8.2, we can make up an 
> impdef
> serror_esr.

For the Qemu migration, I need to check more the QEMU code.


Hi Andrew,
  I use KVM_GET/SET_VCPU_EVENTS IOCTL to migrate the Serror
exception status of VM,
The even struct is shown below:

{
  bool serror_pending;
  bool serror_has_esr;
 u64  serror_esr;
}

Only when the target machine is armv8.2, it needs to set the
serror_esr(SError Exception Syndrome Register).
for the armv8.0,  software can not set the serror_esr(SError Exception
Syndrome Register).
so when migration from v8.{0,1} to v8.2, QEMU should make up an impdef
serror_esr for the v8.2 target.
can you give me some suggestion how to set that register in the QEMU?
I do not familar with the QEMU migration.
Thanks very much.

>
> We will need to decide what KVM does when SET is called but an SError was
> already pending. 2.5.3 "Multiple SError interrupts" of [0] has something to 
> say.

how about KVM set again to the same VCPU?

>
>
> Happy new year,
thanks!

>
> James
>
>
> [0]
> https://static.docs.arm.com/ddi0587/a/RAS%20Extension-release%20candidate_march_29.pdf
> ___
> kvmarm mailing list
> kvm...@lists.cs.columbia.edu
> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 2/2] arm/arm64: KVM: share the check for vcpu events capability

2018-09-27 Thread gengdongjiu
Hi christoffer/marc,
   Could you review this simple patch to enable the 32 bit KVM vcpu event 
supports?  Because below user space patch depended on it. thanks
   https://patchwork.kernel.org/patch/10617601/


> subject: [PATCH 2/2] arm/arm64: KVM: share the check for vcpu events 
> capability
> 
> The commit 539aee0edb9f ("KVM: arm64: Share the parts of get/set events 
> useful to 32bit") shares the get/set events helper for arm64
> and arm32, it is better also share the check for vcpu events capability.
> 
> User space will check whether KVM supports vcpu events through 
> KVM_CAP_VCPU_EVENTS IOCTL.
> 
> Signed-off-by: Dongjiu Geng 
> ---
> For the 32 bit guest migration, it needs to enable the vcpu events
> 
> ---
>  arch/arm64/kvm/reset.c | 1 -
>  virt/kvm/arm/arm.c | 1 +
>  2 files changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c index 
> fd37c53..e50245e 100644
> --- a/arch/arm64/kvm/reset.c
> +++ b/arch/arm64/kvm/reset.c
> @@ -82,7 +82,6 @@ int kvm_arch_vm_ioctl_check_extension(struct kvm *kvm, long 
> ext)
>   break;
>   case KVM_CAP_SET_GUEST_DEBUG:
>   case KVM_CAP_VCPU_ATTRIBUTES:
> - case KVM_CAP_VCPU_EVENTS:
>   r = 1;
>   break;
>   default:
> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c index 40e79ea..64e5d97 
> 100644
> --- a/virt/kvm/arm/arm.c
> +++ b/virt/kvm/arm/arm.c
> @@ -212,6 +212,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long 
> ext)
>   case KVM_CAP_READONLY_MEM:
>   case KVM_CAP_MP_STATE:
>   case KVM_CAP_IMMEDIATE_EXIT:
> + case KVM_CAP_VCPU_EVENTS:
>   r = 1;
>   break;
>   case KVM_CAP_ARM_SET_DEVICE_ADDR:
> --
> 1.9.1
> 



Re: Resend: How to handle the SMMU RAS Error in the kernel

2018-11-21 Thread gengdongjiu
Hi James,
   Thanks for the mail.

On 2018/11/20 2:05, James Morse wrote:
> Hi gengdongjiu,
> 
> On 17/11/2018 15:41, gengdongjiu wrote:
>> In the current kernel, it only handles three kinds of error, which is
>> memory error, PCIE device and ARM process. But now the SMMU already
>> support the RAS, how to handle the SMMU RAS error in the kernel?
> 
> What errors are being detected here?
> 
> I don't know much about the SMMU, but I think we should start with a list of
> errors that we want to handle.

In our platform, the SMMU RAS error mainly include below which flow the SMMU 
spec:
1. one bit ECC error, reported as CE.
2. two bits ECC error, reported as UEU.
3. fetch error in the SMMUv3 spec, reported as UER.

The 2 and 3 should be handled, but I do not know how do recovery to it.

> 
> 
> Is this a v8.2 fault handling interrupt from the SMMU taken to EL3?
> Or a cpu-access that was returned as external-abort? or a device access that 
> was
> told external-abort?
it flows v8.2 RAS spec, it is a v8.2 fault handling interrupt from the SMMU 
taken to EL3.

> 
> What do we intend to do with this error information? Does the DMA layer have
> error handling we can hook this into?

we can get the DMA layer error from the RAS registers, such as DMA read errors.
may be the handle method is disabling this SMMU to avoid propagation.

> 
> Is this just another interface for memory-errors? (e.g the SMMU provides a
> device/address pair and the kernel works out the physical page to run
> memory_failure() on)
I need to check more.

> 
> 
>> I check the UEFI_SPEC_2.7, the ACPI's CPER have the IOMMU type, but it
>> seems the IOMMU type only are specific to AMD’s IOMMU specification,
> 
> ... and Intel VT-d. It looks like UEFI generalises all these as types of 
> 'DMAr'.
yes, it is.

> 
> 
>> not have the ARM’s IOMMU section type, can we reuse this IOMMU section
>> type for the ARM SMMU?
> 
> The architecture specific records for AMD? No. Even if the information was the
> same, the presence of this record tells you its an AMD IOMMU, which its not.
> 
> The generic error section? Maybe.
> 
> Assuming the 'fault reason' list in Table 285 is sufficient to cover our list 
> or
> errors, we can use the 'DMAr Generic Errors' section, (N.2.11.1), to describe
> the generic bits of the error ... but SMMU doesn't have an 'Architecture 
> Type',
> so we at least need to get one allocated.
> 
> We will probably need an architecture specific 'DMAr Error Section'.
> 
> 
> I think adding the UEFI bits is probably the 'easy' bit. We should start with 
> a
> list of errors, and the error handling code. This way we know what we need to
> add to the spec.

The list of SMMU RAS error is shown below, but I still do not know how to 
handle it.
1. one bit ECC error
2. two bits ECC error
3. fetch error in the SMMUv3 spec

> 
> 
> 
> Thanks,
> 
> James
> 
> .
> 



Resend: How to handle the SMMU RAS Error in the kernel

2018-11-17 Thread gengdongjiu
Hi robin/will/James


In the current kernel, it only handles three kinds of error, which is
memory error, PCIE device and ARM process. But now the SMMU already
support the RAS, how to handle the SMMU RAS error in the kernel?

I check the UEFI_SPEC_2.7, the ACPI's CPER have the IOMMU type, but it
seems the IOMMU type only are specific to AMD’s IOMMU specification,
not have the ARM’s IOMMU section type, can we reuse this IOMMU section
type for the ARM SMMU?



N.2.11.3 IOMMU specific DMAr Error Section

Type: {0x036F84E1, 0x7F37, 0x428c, {0xA7, 0x9E, 0x57, 0x5F, 0xDF,
0xAA, 0x84, 0xEC}}

All fields in this error record are specific to AMD’s IOMMU
specification. This error section has a fixed size.


Re: [PATCH v5 2/2] arm64: KVM: export the capability to set guest SError syndrome

2018-07-01 Thread gengdongjiu
Hi James,

On 2018/6/29 23:58, James Morse wrote:
> Hi Dongjiu Geng,
> 
> This patch doesn't apply on v4.18-rc2.
> 
> Documentation/virtual/kvm/api.txt already has a 8.18 section. I guess you 
> based
> this on v4.17.

Yes, indeed I based on v4.17.

> 
> For posting patches, please use the latest 'rc' from Linus' tree, (or the
> maintainer's tree listed in MAINTAINERS for the tree you are targeting if the
> maintainer has started to pick up patches).
Ok, I will rebase it using the latest 'rc' from Linus' tree. thanks for the 
reminder.


> 
> 
> Thanks,
> 
> James
> 
> 
> On 25/06/18 21:58, Dongjiu Geng wrote:
>> For the arm64 RAS Extension, user space can inject a virtual-SError
>> with specified ESR. So user space needs to know whether KVM support
>> to inject such SError, this interface adds this query for this capability.
>>
>> KVM will check whether system support RAS Extension, if supported, KVM
>> returns true to user space, otherwise returns false.
> 
> 
>> diff --git a/Documentation/virtual/kvm/api.txt 
>> b/Documentation/virtual/kvm/api.txt
>> index 3732097..86b3808 100644
>> --- a/Documentation/virtual/kvm/api.txt
>> +++ b/Documentation/virtual/kvm/api.txt
>> @@ -4628,3 +4628,14 @@ Architectures: s390
>>  This capability indicates that kvm will implement the interfaces to handle
>>  reset, migration and nested KVM for branch prediction blocking. The stfle
>>  facility 82 should not be provided to the guest without this capability.
>> +
>> +8.14 KVM_CAP_ARM_SET_SERROR_ESR
>> +
>> +Architectures: arm, arm64
>> +
>> +This capability indicates that userspace can specify the syndrome value 
>> reported
>> +to the guest OS when guest takes a virtual SError interrupt exception.
>> +If KVM has this capability, userspace can only specify the ISS field for 
>> the ESR
>> +syndrome, it can not specify the EC field which is not under control by KVM.
>> +If this virtual SError is taken to EL1 using AArch64, this value will be 
>> reported
>> +in ISS filed of ESR_EL1.
> 
> 
> 
> 
> .
> 



hrtimer become inaccurate with RT patch

2018-07-02 Thread gengdongjiu
Hi Thomas/Anna/John,

  Recently I found that the hrtimer become inaccurate when there is a RT
process runs on the same cpu core, and the kernel has applied preempt_rt
patch.
  The Linux kernel version is v4.1.46, and the preempt_rt patch is
patch-4.1.46-rt52.patch.
  I know that in the preempt_rt environment the interrupt handlers no
longer run in interrupt context but in process context, so that RT
process will not be interrupt. But if the hrtimer is also runs in
process context the timer is useless when it's inaccurate. so I want to
consult you whether this is expected behavior? whether is reasonable to move 
the timer IRQ
handling to a thread?

  Check the patch-4.1.46-rt52.patch will found in function
'hrtimer_interrupt' the modify below:

@@ -1296,7 +1539,10 @@ retry:
if (basenow.tv64 < hrtimer_get_softexpires_tv64(timer))
break;

-   __run_hrtimer(timer, &basenow);
+   if (!hrtimer_rt_defer(timer))
+   __run_hrtimer(timer, &basenow);
+   else
+   raise = 1;
}
}
/* Reevaluate the clock bases for the next expiry */

@@ -1357,6 +1603,9 @@ retry:
tick_program_event(expires_next, 1);
printk_once(KERN_WARNING "hrtimer: interrupt took %llu ns\n",
ktime_to_ns(delta));
+out:
+   if (raise)
+   raise_softirq_irqoff(HRTIMER_SOFTIRQ);
 }

I think this is why hrtimer is run as a thread, I tried to set
hrtimer.irqsafe to 1, but the timer still seemed not right. Could anyone
give some advise? Thanks.



Re: hrtimer become inaccurate with RT patch

2018-07-02 Thread gengdongjiu
Hi Sebastian ,
   Thanks for the answer.

On 2018/7/2 18:14, Sebastian Andrzej Siewior wrote:
> On 2018-07-02 17:34:34 [+0800], gengdongjiu wrote:
>>   The Linux kernel version is v4.1.46, and the preempt_rt patch is
>> patch-4.1.46-rt52.patch.
> 
> the 4.1 series is no longer supported (neither RT wise nor non-RT,
> https://www.kernel.org/category/releases.html). I suggest to move away.
> If you notice this problem now it is hardly a long running project.
yes, I Know, but we found the latest RT 4.14 series also has the same problem,
so this is common issue.

> 
>> process will not be interrupt. But if the hrtimer is also runs in
>> process context the timer is useless when it's inaccurate. so I want to
>> consult you whether this is expected behavior? whether is reasonable to move 
>> the timer IRQ
>> handling to a thread?
> 
> This depends on your expectations. The timer is defined not to fire
> before the programmed time. So it fires as soon as possible _after_ the
> programmed time.
It is reasonable that the timer is defined not to fire before the programmed 
time.
but we found it fires long _after_ the programmed time. For example, we define 
it to
fire after 2s, but it will fire after 5s, so it is very later than the 
expectations. I think the reason may be
that the timer handler thread is preempted by another higher priority thread. 
so from for this issue,
the timer handler should be in IRQ context instead of the process context or 
increase the timer handler thread priority, right?

> 
>> I think this is why hrtimer is run as a thread, I tried to set
>> hrtimer.irqsafe to 1, but the timer still seemed not right. Could anyone
>> give some advise? Thanks.
> 
> By setting irqsafe to 1 you ensure taht the timer will fire from the
> timer interrupt and before doing so you should ensure that the timer is
> indeed IRQ safe.
> Depending on what you do it is possible that the timer fires early but
> the application notices it later (the scheduler will first handle RT
> tasks according to their priorities followed by non RT tasks).

(I will continue check this comments, thanks)

> 
> Sebastian
> 
> .
> 



Re: [PATCH RESEND v4 2/2] arm/arm64: KVM: Add KVM_GET/SET_VCPU_EVENTS

2018-06-18 Thread gengdongjiu
> On 12/06/18 15:50, gengdongjiu wrote:
> > On 2018/6/11 21:36, James Morse wrote:
> >> On 08/06/18 20:48, Dongjiu Geng wrote:
> >>> For the migrating VMs, user space may need to know the exception
> >>> state. For example, in the machine A, KVM make an SError pending,
> >>> when migrate to B, KVM also needs to pend an SError.
> >>>
> >>> This new IOCTL exports user-invisible states related to SError.
> >>> Together with appropriate user space changes, user space can get/set
> >>> the SError exception state to do migrate/snapshot/suspend.
> 
> 
> >>> diff --git a/arch/arm/include/uapi/asm/kvm.h
> >>> b/arch/arm/include/uapi/asm/kvm.h index caae484..c3e6975 100644
> >>> --- a/arch/arm/include/uapi/asm/kvm.h
> >>> +++ b/arch/arm/include/uapi/asm/kvm.h
> >>> @@ -124,6 +124,18 @@ struct kvm_sync_regs {  struct
> >>> kvm_arch_memory_slot {  };
> >>>
> >>> +/* for KVM_GET/SET_VCPU_EVENTS */
> >>> +struct kvm_vcpu_events {
> >>> + struct {
> >>> + __u8 serror_pending;
> >>> + __u8 serror_has_esr;
> >>> + /* Align it to 8 bytes */
> >>> + __u8 pad[6];
> >>> + __u64 serror_esr;
> >>> + } exception;
> >>> + __u32 reserved[12];
> >>> +};
> >>> +
> >>
> >> You haven't defined __KVM_HAVE_VCPU_EVENTS for 32bit, so presumably
> >> this struct will never be used. Why is it here?
> 
> >   if not add it for 32 bits. the 32 arm platform will build Fail, whether 
> > you have good
> >idea to avoid this Failure if not add this struct for the 32 bit?
> 
> How does this 32bit code build without this patch?
> If do you provide the struct, how will that code build with older headers?
> 
> As far as I can see, this is what the __KVM_HAVE_VCPU_EVENTS define is for.
> 
> This should be both, or neither. Having just the struct is useless.
> 
> 
> >>> +int kvm_arm_vcpu_set_events(struct kvm_vcpu *vcpu,
> >>> + struct kvm_vcpu_events *events)
> >>> +{
> >>> + bool serror_pending = events->exception.serror_pending;
> >>> + bool has_esr = events->exception.serror_has_esr;
> >>> +
> >>> + if (serror_pending && has_esr) {
> >>> + if (!cpus_have_const_cap(ARM64_HAS_RAS_EXTN))
> >>> + return -EINVAL;
> >>> +
> >>> + kvm_set_sei_esr(vcpu, events->exception.serror_esr);
> >>
> >> kvm_set_sei_esr() will silently discard the top 40 bits of
> >> serror_esr, (which is correct, we shouldn't copy them into hardware 
> >> without know what they do).
> >>
> >> Could we please force user-space to zero these bits, we can advertise
> >> extra CAPs if new features turn up in that space, instead of
> >> user-space passing  and relying on the kernel to remove it.
> >
> >   yes, I can zero these bits in the  user-space and not depend on kernel to 
> > remove it.
> 
> But the kernel must check that user-space did zero those bits. Otherwise 
> user-space may start using them when a future version of the

For this comments, how about add below kernel check that user-space did zero 
those bits? Thanks.

+   if (!((events->exception.serror_esr) & ~ESR_ELx_ISS_MASK))
+   kvm_set_sei_esr(vcpu, events->exception.serror_esr);
+   else
+   return -EINVAL;


> architecture gives them a meaning, but an older kernel version doesn't know 
> it has to do extra work, but still lets the bits from user-space
> through into the hardware.
> 
> If new bits do turn up, we can advertise a CAP that says that KVM supports 
> whatever that feature is.
> 
> 
> >> (Background: VSESR is a 64bit register that holds the value to go in
> >> a 32bit register. I suspect the top-half could get re-used for
> >> control values or something we don't want to give user-space)
> 
> >   do you mean when user-space get the VSESR value through
> > KVM_GET_VCPU_EVENTS it only return the low-half 32 bits?
> 
> No, the kernel will only ever set a 24bit value here. If we force user-space 
> to only provide a 24bit value then we don't need to check it on
> read. We never read the value back from hardware.
> 
> These high bits are RES0 at the moment, they may get used for something in 
> the future. As we are exposing this via a user-space ABI we
> need to make sure we only expose the bits we understand today.

Ok

> 
> 
> Thanks,
> 
> James


Re: [PATCH v1 2/2] arm64: handle NOTIFY_SEI notification by the APEI driver

2018-06-01 Thread gengdongjiu
Hi Mark, James,
   Thanks the review.

On 2018/6/1 0:51, James Morse wrote:
> Hi Mark, Dongjiu Geng,
> 
> On 31/05/18 12:01, Mark Rutland wrote:
>> In do_serror() we already handle nmi_{enter,exit}(), so there's no need
>> for that here.
> 
> Even better: nmi_enter() has a BUG_ON(in_nmi()).

There are two places call the arm64_is_fatal_ras_serror():
1. do_serror()
2. kvm_handle_guest_serror()

Yes, the do_serror() already handle nmi_{enter,exit}(), so the 
arm64_is_fatal_ras_serror() does not need to handle it again.
For the kvm_handle_guest_serror(), it does not handle the nmi_{enter,exit}(), 
so should we add nmi_{enter,exit}() in  kvm_handle_guest_serror() before 
calling arm64_is_fatal_ras_serror()?

For the NOTIFY_SEA, I do not know why handle_guest_sea() does not handle the 
nmi_{enter,exit}(). James, does we miss it in handle_guest_sea()?

> 
> 
>> TBH, I don't understand why do_sea() does that conditionally today.
>> Unless there's some constraint I'm missing, 
> 
> APEI uses a different fixmap entry and locks when in_nmi(). This was because 
> we
> may interrupt the irq-masked region in APEI that was using the regular memory.
> (e.g. the 'polled' notification, or something backed by an interrupt.) But,
> Borislav has spotted other things in here that are broken[0]. I'm working on
> rolling all that into 'v5' of the in_nmi() rework stuff.
> 
> We currently get away with this on arm because 'SEA' is the only NMI-like 
> thing,
> and it occurs synchronously. The problem cases are all also cases where the
> kernel text is corrupt, which we can't possibly hope to handle.
> 
> For NOTIFY_SDEI and NOTIFY_SEI this is the wrong pattern as these are
> asynchronous. do_serror() has already done most of the work for NOTIFY_SEI, 
> but
> we need to use the estatus queue in APEI, which is currently x86 only.
I think this patch can based on you 'v5' of the in_nmi() rework stuff

> 
> 
>> I think it would make more
>> sense to do that regardless of whether the interrupted context had
>> interrupts enabled. James -- does that make sense to you?
>>
>> If you update the prior patch with a stub for !CONFIG_ACPI_APEI_SEI, you
>> can simplify all of the above additions down to:
>>
>>  if (!ghes_notify_sei())
>>  return;
>>
>> ... which would look a lot nicer.
> 
> The code that calls ghes_notify_sei() may need calling by KVM too, but its
> default action to an 'unclaimed' SError will be different.
> Because of the race between memory_failure() and return-to-userspace, we may
> need to kick the irq work queue (if we can), as we return from do_serror(). 
> [1]
> and [2] provide an example for NOTIFY_SEA. SDEI does this by returning to the
> kernel through the IRQ handler, (which handles the KVM case too).
I can kick the IRQ work queue as you do for the NOTIFY_SEA and NOTIFY_SDEI.

> 
> 
> I think this series is unsafe until we can use the estatus queue in APEI. Its
> also missing the handling for an SError interrupting a KVM guest.

how about this series is based on your patches that uses the estatus queue in 
APEI to make it safe?

when an SError interrupting a KVM guest, it will trap to hypervisor, hypervisor 
will call
below software stack to handle it:
kvm_handle_guest_serror()->arm64_is_fatal_ras_serror()->ghes_notify_sei()

so it already handles the case that an SError interrupting a KVM guest.

> 
> 
> Thanks,
> 
> James
> 
> [0] https://www.spinics.net/lists/arm-kernel/msg653332.html
> [1] https://www.spinics.net/lists/arm-kernel/msg649237.html
> [2] https://www.spinics.net/lists/arm-kernel/msg649239.html
> 
> .
> 



Re: [PATCH v1 1/2] ACPI / APEI: Add SEI notification type support for ARMv8

2018-06-01 Thread gengdongjiu
Hi Mark,
   Thanks for the comments.

On 2018/5/31 18:52, Mark Rutland wrote:
> On Thu, May 31, 2018 at 08:41:45PM +0800, Dongjiu Geng wrote:
>> +#ifdef CONFIG_ACPI_APEI_SEI
>> +static LIST_HEAD(ghes_sei);
>> +
>> +/*
>> + * Return 0 only if one of the SEI error sources successfully reported an 
>> error
>> + * record sent from the firmware.
>> + */
>> +int ghes_notify_sei(void)
>> +{
>> +struct ghes *ghes;
>> +int ret = -ENOENT;
>> +
>> +rcu_read_lock();
>> +list_for_each_entry_rcu(ghes, &ghes_sei, list) {
>> +if (!ghes_proc(ghes))
>> +ret = 0;
>> +}
>> +rcu_read_unlock();
>> +return ret;
>> +}
>> +
>> +static void ghes_sei_add(struct ghes *ghes)
>> +{
>> +mutex_lock(&ghes_list_mutex);
>> +list_add_rcu(&ghes->list, &ghes_sei);
>> +mutex_unlock(&ghes_list_mutex);
>> +}
>> +
>> +static void ghes_sei_remove(struct ghes *ghes)
>> +{
>> +mutex_lock(&ghes_list_mutex);
>> +list_del_rcu(&ghes->list);
>> +mutex_unlock(&ghes_list_mutex);
>> +synchronize_rcu();
>> +}
>> +#else /* CONFIG_ACPI_APEI_SEI */
>> +static inline void ghes_sei_add(struct ghes *ghes) { }
>> +static inline void ghes_sei_remove(struct ghes *ghes) { }
>> +#endif /* CONFIG_ACPI_APEI_SEI */
>> +
>>  #ifdef CONFIG_HAVE_ACPI_APEI_NMI
>>  /*
> 
>> index 8feb0c8..9ba59e2 100644
>> --- a/include/acpi/ghes.h
>> +++ b/include/acpi/ghes.h
>> @@ -120,5 +120,6 @@ static inline void *acpi_hest_get_next(struct 
>> acpi_hest_generic_data *gdata)
>>   section = acpi_hest_get_next(section))
>>  
>>  int ghes_notify_sea(void);
>> +int ghes_notify_sei(void);
> 
> It would be nice to have a stub definition when !CONFIG_ACPI_APEI_SEI,
> e.g.
> 
> #ifdef CONFIG_ACPI_APEI_SEI
> int ghes_notify_sei(void);
> #else
> static in int ghes_notify_sei(void) { return -ENOENT; }
> #endif
> 
> ... as callers could simply call this without additional checks.
> 
> As a cleanup, similar would be nice for ghes_notify_sea() with
> CONFIG_ACPI_APEI_SEA.

I think your suggestion is better, I will do the cleanup include the 
ghes_notify_sea().
thanks again.

> 
> Thanks,
> Mark.
> 
> .
> 



Re: [PATCH RESEND v4 2/2] arm/arm64: KVM: Add KVM_GET/SET_VCPU_EVENTS

2018-06-12 Thread gengdongjiu
Christoffer,
   Thanks for the review.

On 2018/6/9 19:17, Christoffer Dall wrote:
> On Sat, Jun 09, 2018 at 03:48:40AM +0800, Dongjiu Geng wrote:
>> For the migrating VMs, user space may need to know the exception
>> state. For example, in the machine A, KVM make an SError pending,
>> when migrate to B, KVM also needs to pend an SError.
>>
>> This new IOCTL exports user-invisible states related to SError.
>> Together with appropriate user space changes, user space can get/set
>> the SError exception state to do migrate/snapshot/suspend.
>>
>> Signed-off-by: Dongjiu Geng 
>> ---
>> change since v3:
>> 1. Fix the memset() issue in the kvm_arm_vcpu_get_events()
>>
>> change since v2:
>> 1. Add kvm_vcpu_events structure definition for arm platform to avoid the 
>> build errors.
>>
>> change since v1:
>> Address Marc's comments, thanks Marc's review
>> 1. serror_has_esr always true when ARM64_HAS_RAS_EXTN is set
>> 2. remove Spurious blank line in kvm_arm_vcpu_set_events()
>> 3. rename pend_guest_serror() to kvm_set_sei_esr()
>> 4. Make kvm_arm_vcpu_get_events() did all the work rather than having this 
>> split responsibility.
>> 5.  using sizeof(events) instead of sizeof(struct kvm_vcpu_events)
>>
>> this series patch is separated from 
>> https://www.spinics.net/lists/kvm/msg168917.html
>> The user space patch is here: 
>> https://lists.gnu.org/archive/html/qemu-devel/2018-05/msg06965.html
>>
>> change since V12:
>> 1. change (vcpu->arch.hcr_el2 & HCR_VSE) to !!(vcpu->arch.hcr_el2 & HCR_VSE) 
>> in kvm_arm_vcpu_get_events()
>>
>> Change since V11:
>> Address James's comments, thanks James
>> 1. Align the struct of kvm_vcpu_events to 64 bytes
>> 2. Avoid exposing the stale ESR value in the kvm_arm_vcpu_get_events()
>> 3. Change variables 'injected' name to 'serror_pending' in the 
>> kvm_arm_vcpu_set_events()
>> 4. Change to sizeof(events) from sizeof(struct kvm_vcpu_events) in 
>> kvm_arch_vcpu_ioctl()
>>
>> Change since V10:
>> Address James's comments, thanks James
>> 1. Merge the helper function with the user.
>> 2. Move the ISS_MASK into pend_guest_serror() to clear top bits
>> 3. Make kvm_vcpu_events struct align to 4 bytes
>> 4. Add something check in the kvm_arm_vcpu_set_events()
>> 5. Check kvm_arm_vcpu_get/set_events()'s return value.
>> 6. Initialise kvm_vcpu_events to 0 so that padding transferred to user-space 
>> doesn't
>>contain kernel stack.
>> ---
>>  Documentation/virtual/kvm/api.txt| 31 ---
>>  arch/arm/include/asm/kvm_host.h  |  6 ++
>>  arch/arm/include/uapi/asm/kvm.h  | 12 
>>  arch/arm/kvm/guest.c | 12 
>>  arch/arm64/include/asm/kvm_emulate.h |  5 +
>>  arch/arm64/include/asm/kvm_host.h|  7 +++
>>  arch/arm64/include/uapi/asm/kvm.h| 13 +
>>  arch/arm64/kvm/guest.c   | 36 
>> 
>>  arch/arm64/kvm/inject_fault.c|  6 +++---
>>  arch/arm64/kvm/reset.c   |  1 +
>>  virt/kvm/arm/arm.c   | 19 +++
>>  11 files changed, 142 insertions(+), 6 deletions(-)
>>
>> diff --git a/Documentation/virtual/kvm/api.txt 
>> b/Documentation/virtual/kvm/api.txt
>> index fdac969..8896737 100644
>> --- a/Documentation/virtual/kvm/api.txt
>> +++ b/Documentation/virtual/kvm/api.txt
>> @@ -835,11 +835,13 @@ struct kvm_clock_data {
>>  
>>  Capability: KVM_CAP_VCPU_EVENTS
>>  Extended by: KVM_CAP_INTR_SHADOW
>> -Architectures: x86
>> +Architectures: x86, arm, arm64
>>  Type: vm ioctl
>>  Parameters: struct kvm_vcpu_event (out)
>>  Returns: 0 on success, -1 on error
>>  
>> +X86:
>> +
>>  Gets currently pending exceptions, interrupts, and NMIs as well as related
>>  states of the vcpu.
>>  
>> @@ -881,15 +883,32 @@ Only two fields are defined in the flags field:
>>  - KVM_VCPUEVENT_VALID_SMM may be set in the flags field to signal that
>>smi contains a valid state.
>>  
>> +ARM, ARM64:
>> +
>> +Gets currently pending SError exceptions as well as related states of the 
>> vcpu.
>> +
>> +struct kvm_vcpu_events {
>> +struct {
>> +__u8 serror_pending;
>> +__u8 serror_has_esr;
>> +/* Align it to 8 bytes */
>> +__u8 pad[6];
>> +__u64 serror_esr;
>> +} exception;
>> +__u32 reserved[12];
>> +};
>> +
>>  4.32 KVM_SET_VCPU_EVENTS
>>  
>> -Capability: KVM_CAP_VCPU_EVENTS
>> +Capebility: KVM_CAP_VCPU_EVENTS
> 
> nit: unintended change?
> 
>>  Extended by: KVM_CAP_INTR_SHADOW
>> -Architectures: x86
>> +Architectures: x86, arm, arm64
>>  Type: vm ioctl
>>  Parameters: struct kvm_vcpu_event (in)
>>  Returns: 0 on success, -1 on error
>>  
>> +X86:
>> +
>>  Set pending exceptions, interrupts, and NMIs as well as related states of 
>> the
>>  vcpu.
>>  
>> @@ -910,6 +929,12 @@ shall be written into the VCPU.
>>  
>>  KVM_VCPUEVENT_VALID_SMM can only be set if KVM_CAP_X86_SMM is available.
>>  
>> +ARM, ARM64:
>> +
>> +Set pending SError exceptions

Re: [PATCH RESEND v4 2/2] arm/arm64: KVM: Add KVM_GET/SET_VCPU_EVENTS

2018-06-12 Thread gengdongjiu
Hi Marc
   thanks for the review.

On 2018/6/9 20:40, Marc Zyngier wrote:
> On Fri, 08 Jun 2018 20:48:40 +0100,
> Dongjiu Geng wrote:
>>
>> For the migrating VMs, user space may need to know the exception
>> state. For example, in the machine A, KVM make an SError pending,
>> when migrate to B, KVM also needs to pend an SError.
>>
>> This new IOCTL exports user-invisible states related to SError.
>> Together with appropriate user space changes, user space can get/set
>> the SError exception state to do migrate/snapshot/suspend.
>>
>> Signed-off-by: Dongjiu Geng 
>> ---
>> change since v3:
>> 1. Fix the memset() issue in the kvm_arm_vcpu_get_events()
>>
>> change since v2:
>> 1. Add kvm_vcpu_events structure definition for arm platform to avoid the 
>> build errors.
>>
>> change since v1:
>> Address Marc's comments, thanks Marc's review
>> 1. serror_has_esr always true when ARM64_HAS_RAS_EXTN is set
>> 2. remove Spurious blank line in kvm_arm_vcpu_set_events()
>> 3. rename pend_guest_serror() to kvm_set_sei_esr()
>> 4. Make kvm_arm_vcpu_get_events() did all the work rather than having this 
>> split responsibility.
>> 5.  using sizeof(events) instead of sizeof(struct kvm_vcpu_events)
>>
>> this series patch is separated from 
>> https://www.spinics.net/lists/kvm/msg168917.html
>> The user space patch is here: 
>> https://lists.gnu.org/archive/html/qemu-devel/2018-05/msg06965.html
>>
>> change since V12:
>> 1. change (vcpu->arch.hcr_el2 & HCR_VSE) to !!(vcpu->arch.hcr_el2 & HCR_VSE) 
>> in kvm_arm_vcpu_get_events()
>>
>> Change since V11:
>> Address James's comments, thanks James
>> 1. Align the struct of kvm_vcpu_events to 64 bytes
>> 2. Avoid exposing the stale ESR value in the kvm_arm_vcpu_get_events()
>> 3. Change variables 'injected' name to 'serror_pending' in the 
>> kvm_arm_vcpu_set_events()
>> 4. Change to sizeof(events) from sizeof(struct kvm_vcpu_events) in 
>> kvm_arch_vcpu_ioctl()
>>
>> Change since V10:
>> Address James's comments, thanks James
>> 1. Merge the helper function with the user.
>> 2. Move the ISS_MASK into pend_guest_serror() to clear top bits
>> 3. Make kvm_vcpu_events struct align to 4 bytes
>> 4. Add something check in the kvm_arm_vcpu_set_events()
>> 5. Check kvm_arm_vcpu_get/set_events()'s return value.
>> 6. Initialise kvm_vcpu_events to 0 so that padding transferred to user-space 
>> doesn't
>>contain kernel stack.
>> ---
>>  Documentation/virtual/kvm/api.txt| 31 ---
>>  arch/arm/include/asm/kvm_host.h  |  6 ++
>>  arch/arm/include/uapi/asm/kvm.h  | 12 
>>  arch/arm/kvm/guest.c | 12 
>>  arch/arm64/include/asm/kvm_emulate.h |  5 +
>>  arch/arm64/include/asm/kvm_host.h|  7 +++
>>  arch/arm64/include/uapi/asm/kvm.h| 13 +
>>  arch/arm64/kvm/guest.c   | 36 
>> 
>>  arch/arm64/kvm/inject_fault.c|  6 +++---
>>  arch/arm64/kvm/reset.c   |  1 +
>>  virt/kvm/arm/arm.c   | 19 +++
>>  11 files changed, 142 insertions(+), 6 deletions(-)
>>
>> diff --git a/Documentation/virtual/kvm/api.txt 
>> b/Documentation/virtual/kvm/api.txt
>> index fdac969..8896737 100644
>> --- a/Documentation/virtual/kvm/api.txt
>> +++ b/Documentation/virtual/kvm/api.txt
>> @@ -835,11 +835,13 @@ struct kvm_clock_data {
>>  
>>  Capability: KVM_CAP_VCPU_EVENTS
>>  Extended by: KVM_CAP_INTR_SHADOW
>> -Architectures: x86
>> +Architectures: x86, arm, arm64
>>  Type: vm ioctl
>>  Parameters: struct kvm_vcpu_event (out)
>>  Returns: 0 on success, -1 on error
>>  
>> +X86:
>> +
>>  Gets currently pending exceptions, interrupts, and NMIs as well as related
>>  states of the vcpu.
>>  
>> @@ -881,15 +883,32 @@ Only two fields are defined in the flags field:
>>  - KVM_VCPUEVENT_VALID_SMM may be set in the flags field to signal that
>>smi contains a valid state.
>>  
>> +ARM, ARM64:
>> +
>> +Gets currently pending SError exceptions as well as related states of the 
>> vcpu.
>> +
>> +struct kvm_vcpu_events {
>> +struct {
>> +__u8 serror_pending;
>> +__u8 serror_has_esr;
>> +/* Align it to 8 bytes */
>> +__u8 pad[6];
>> +__u64 serror_esr;
>> +} exception;
>> +__u32 reserved[12];
>> +};
>> +
>>  4.32 KVM_SET_VCPU_EVENTS
>>  
>> -Capability: KVM_CAP_VCPU_EVENTS
>> +Capebility: KVM_CAP_VCPU_EVENTS
>>  Extended by: KVM_CAP_INTR_SHADOW
>> -Architectures: x86
>> +Architectures: x86, arm, arm64
>>  Type: vm ioctl
>>  Parameters: struct kvm_vcpu_event (in)
>>  Returns: 0 on success, -1 on error
>>  
>> +X86:
>> +
>>  Set pending exceptions, interrupts, and NMIs as well as related states of 
>> the
>>  vcpu.
>>  
>> @@ -910,6 +929,12 @@ shall be written into the VCPU.
>>  
>>  KVM_VCPUEVENT_VALID_SMM can only be set if KVM_CAP_X86_SMM is available.
>>  
>> +ARM, ARM64:
>> +
>> +Set pending SError exceptions as well as related states of the vcpu.
>>

Re: [PATCH RESEND v4 2/2] arm/arm64: KVM: Add KVM_GET/SET_VCPU_EVENTS

2018-06-12 Thread gengdongjiu
Hi James,
  thanks for the review.

On 2018/6/11 21:36, James Morse wrote:
> Hi Dongjiu Geng,
> 
> Please only put 'RESEND' in the subject if the patch content is identical.
> This patch is not the same as v4.

Yes, it should

> 
> On 08/06/18 20:48, Dongjiu Geng wrote:
>> For the migrating VMs, user space may need to know the exception
>> state. For example, in the machine A, KVM make an SError pending,
>> when migrate to B, KVM also needs to pend an SError.
>>
>> This new IOCTL exports user-invisible states related to SError.
>> Together with appropriate user space changes, user space can get/set
>> the SError exception state to do migrate/snapshot/suspend.
> 
>> diff --git a/Documentation/virtual/kvm/api.txt 
>> b/Documentation/virtual/kvm/api.txt
>> index fdac969..8896737 100644
>> --- a/Documentation/virtual/kvm/api.txt
>> +++ b/Documentation/virtual/kvm/api.txt
>> @@ -835,11 +835,13 @@ struct kvm_clock_data {
>>  
>>  Capability: KVM_CAP_VCPU_EVENTS
>>  Extended by: KVM_CAP_INTR_SHADOW
>> -Architectures: x86
>> +Architectures: x86, arm, arm64
>>  Type: vm ioctl
> 
> Isn't this actually a per-vcpu ioctl? Can we fix the documentation?
  I will modify the original documentation
> 
> 
>>  Parameters: struct kvm_vcpu_event (out)
>>  Returns: 0 on success, -1 on error
>>  
>> +X86:
>> +
>>  Gets currently pending exceptions, interrupts, and NMIs as well as related
>>  states of the vcpu.
>>  
>> @@ -881,15 +883,32 @@ Only two fields are defined in the flags field:
>>  - KVM_VCPUEVENT_VALID_SMM may be set in the flags field to signal that
>>smi contains a valid state.
>>  
>> +ARM, ARM64:
>> +
>> +Gets currently pending SError exceptions as well as related states of the 
>> vcpu.
>> +
>> +struct kvm_vcpu_events {
>> +struct {
>> +__u8 serror_pending;
>> +__u8 serror_has_esr;
>> +/* Align it to 8 bytes */
>> +__u8 pad[6];
>> +__u64 serror_esr;
>> +} exception;
>> +__u32 reserved[12];
>> +};
>> +
>>  4.32 KVM_SET_VCPU_EVENTS
>>  
>> -Capability: KVM_CAP_VCPU_EVENTS
>> +Capebility: KVM_CAP_VCPU_EVENTS
> 
> (please fix this)
   Ok, will fix this
> 
> 
>>  Extended by: KVM_CAP_INTR_SHADOW
>> -Architectures: x86
>> +Architectures: x86, arm, arm64
>>  Type: vm ioctl
> 
> (this is also a vcpu ioctl)
  will fix

> 
> 
>>  Parameters: struct kvm_vcpu_event (in)
>>  Returns: 0 on success, -1 on error
>>  
>> +X86:
>> +
>>  Set pending exceptions, interrupts, and NMIs as well as related states of 
>> the
>>  vcpu.
>>  
>> @@ -910,6 +929,12 @@ shall be written into the VCPU.
>>  
>>  KVM_VCPUEVENT_VALID_SMM can only be set if KVM_CAP_X86_SMM is available.
>>  
>> +ARM, ARM64:
>> +
>> +Set pending SError exceptions as well as related states of the vcpu.
> 
> There are some deliberate choices here I think we should document:
> | This API can't be used to clear a pending SError.
> 
> If there already was an SError pending, this API just overwrites it with the 
> new
> one. The architecture has some rules about merging multiple SError. (details 
> in
> 2.5.3 Multiple SError interrupts of [0])
> 
> I don't think KVM needs to enforce these, as they are implementation-defined 
> if
> one of the ESR is implementation-defined... the part that matters is reporting
> the 'most severe' RAS ESR if there are multiple pending. As only user-space 
> ever
> sets these, let's make it user-spaces problem to do.
> 
> I think we should recommend user-space always reads the pending values and
> applies its merging-multiple-SError logic. (I assume your Qemu patches do 
> this).


 I will check whether QEMU can be possible to do such things, anyway this patch
 not need to do such merging.

> Something like:
> | User-space should first use KVM_GET_VCPU_EVENTS in case KVM has made an 
> SError
> | pending as part of its device emulation. When both values are architected 
> RAS
> | SError ESR values, the new ESR should reflect the combined effect of both
> | errors.>
> 
>> diff --git a/arch/arm/include/uapi/asm/kvm.h 
>> b/arch/arm/include/uapi/asm/kvm.h
>> index caae484..c3e6975 100644
>> --- a/arch/arm/include/uapi/asm/kvm.h
>> +++ b/arch/arm/include/uapi/asm/kvm.h
>> @@ -124,6 +124,18 @@ struct kvm_sync_regs {
>>  struct kvm_arch_memory_slot {
>>  };
>>  
>> +/* for KVM_GET/SET_VCPU_EVENTS */
>> +struct kvm_vcpu_events {
>> +struct {
>> +__u8 serror_pending;
>> +__u8 serror_has_esr;
>> +/* Align it to 8 bytes */
>> +__u8 pad[6];
>> +__u64 serror_esr;
>> +} exception;
>> +__u32 reserved[12];
>> +};
>> +
> 
> You haven't defined __KVM_HAVE_VCPU_EVENTS for 32bit, so presumably this 
> struct
> will never be used. Why is it here?
  if not add it for 32 bits. the 32 arm platform will build Fail, whether you 
have good
   idea to avoid this Failure if not add this struct for the 32 bit?
> (I agree if we ever provide it on 32bit, the struct layout should be the same.
> Is this only here to force that to h

Re: [RFC RESEND PATCH] kvm: arm64: export memory error recovery capability to user space

2018-12-14 Thread gengdongjiu
HI James,

  Thanks for the mail and comments, I will reply to you in the next mail.

2018-12-14 21:55 GMT+08:00, James Morse :
> Hi Dongjiu Geng,
>
> On 14/12/2018 10:15, Dongjiu Geng wrote:
>> When user space do memory recovery, it will check whether KVM and
>> guest support the error recovery, only when both of them support,
>> user space will do the error recovery. This patch exports this
>> capability of KVM to user space.
>
> I can understand user-space only wanting to do the work if host and guest
> support the feature. But 'error recovery' isn't a KVM feature, its a Linux
> kernel feature.
>
> KVM will send it's user-space a SIGBUS with MCEERR code whenever its trying
> to
> map a page at stage2 that the kernel-mm code refuses this because its
> poisoned.
> (e.g. check_user_page_hwpoison(), get_user_pages() returns -EHWPOISON)
>
> This is exactly the same as happens to a normal user-space process.
>
> I think you really want to know if the host kernel was built with
> CONFIG_MEMORY_FAILURE. The not-at-all-portable way to tell this from
> user-space
> is the presence of /proc/sys/vm/memory_failure_* files.
> (It looks like the prctl():PR_MCE_KILL/PR_MCE_KILL_GET options silently
> update
> an ignored policy if the kernel isn't built with CONFIG_MEMORY_FAILURE, so
> they
> aren't helpful)
>
>
>> diff --git a/Documentation/virtual/kvm/api.txt
>> b/Documentation/virtual/kvm/api.txt
>> index cd209f7..241e2e2 100644
>> --- a/Documentation/virtual/kvm/api.txt
>> +++ b/Documentation/virtual/kvm/api.txt
>> @@ -4895,3 +4895,12 @@ Architectures: x86
>>  This capability indicates that KVM supports paravirtualized Hyper-V IPI
>> send
>>  hypercalls:
>>  HvCallSendSyntheticClusterIpi, HvCallSendSyntheticClusterIpiEx.
>> +
>> +8.21 KVM_CAP_ARM_MEMORY_ERROR_RECOVERY
>> +
>> +Architectures: arm, arm64
>> +
>> +This capability indicates that guest memory error can be detected by the
>> KVM which
>> +supports the error recovery.
>
> KVM doesn't detect these errors.
> The hardware detects them and notifies the OS via one of a number of
> mechanisms.
> This gets plumbed into memory_failure(), which sets a flag that the mm code
> uses
> to prevent the page being used again.
>
> KVM is only involved when it tries to map a page at stage2 and the mm code
> rejects it with -EHWPOISON. This is the same as the architectures
> do_page_fault() checking for (fault & VM_FAULT_HWPOISON) out of
> handle_mm_fault(). We don't have a KVM cap for this, nor do we need one.
>
>
>> diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
>> index b72a3dd..90d1d9a 100644
>> --- a/arch/arm64/kvm/reset.c
>> +++ b/arch/arm64/kvm/reset.c
>> @@ -82,6 +82,7 @@ int kvm_arch_vm_ioctl_check_extension(struct kvm *kvm,
>> long ext)
>>  r = kvm_arm_support_pmu_v3();
>>  break;
>>  case KVM_CAP_ARM_INJECT_SERROR_ESR:
>> +case KVM_CAP_ARM_MEMORY_ERROR_RECOVERY:
>>  r = cpus_have_const_cap(ARM64_HAS_RAS_EXTN);
>>  break;
>
> The CPU RAS Extensions are not at all relevant here. It is perfectly
> possible to
> support memory-failure without them, AMD-Seattle and APM-X-Gene do this.
> These
> systems would report not-supported here, but the kernel does support this
> stuff.
> Just because the CPU supports this, doesn't mean the kernel was built with
> CONFIG_MEMORY_FAILURE. The CPU reports may be ignored, or upgraded to
> SIGKILL.
>
>
>
> Thanks,
>
> James
>


Re: [RFC RESEND PATCH] kvm: arm64: export memory error recovery capability to user space

2018-12-14 Thread gengdongjiu
> 
> On Fri, 14 Dec 2018 at 13:56, James Morse  wrote:
> >
> > Hi Dongjiu Geng,
> >
> > On 14/12/2018 10:15, Dongjiu Geng wrote:
> > > When user space do memory recovery, it will check whether KVM and
> > > guest support the error recovery, only when both of them support,
> > > user space will do the error recovery. This patch exports this
> > > capability of KVM to user space.
> >
> > I can understand user-space only wanting to do the work if host and
> > guest support the feature. But 'error recovery' isn't a KVM feature,
> > its a Linux kernel feature.
> >
> > KVM will send it's user-space a SIGBUS with MCEERR code whenever its
> > trying to map a page at stage2 that the kernel-mm code refuses this because 
> > its poisoned.
> > (e.g. check_user_page_hwpoison(), get_user_pages() returns -EHWPOISON)
> >
> > This is exactly the same as happens to a normal user-space process.
> >
> > I think you really want to know if the host kernel was built with
> > CONFIG_MEMORY_FAILURE.
> 
> Does userspace need to care about that? Presumably if the host kernel wasn't 
> built with that support then it will simply never deliver any
> memory failure events to QEMU, which is fine.
> 
> The point I was trying to make in the email Dongjiu references
> (https://patchwork.codeaurora.org/patch/652261/) is simply that "QEMU gets 
> memory-failure notifications from the host kernel"
> does not imply "the guest is prepared to receive memory failure 
> notifications", and so the code path which handles the SIGBUS must do
> some kind of check for whether the guest CPU is a type which expects them and 
> that the board code set up the ACPI tables that it wants to
> fill in.

Thanks Peter's explanation. Frankly speaking, I agree Peter's suggestion.

To James, I explain more to you, as peter said QEMU needs to check whether the 
guest CPU is a type which can handle the error though guest ACPI table. Let us 
see the X86's QEMU logic:
1. Before the vCPU created, it will set a default env->mcg_cap value with 
MCE_CAP_DEF flag, MCG_SER_P means it expected the guest CPU model supports RAS 
error recovery.[1]
2. when the vCPU initialize, it will check whether host kernel support this 
feature[2]. Only when host kernel and default env->mcg_cap value all expected 
this feature, then it will setup vCPU support RAS error recovery[3].
So I add this IOCTL "KVM_CAP_ARM_MEMORY_ERROR_RECOVERY" to Let QEMU check 
whether host/KVM support RAS error detection and recovery, only when this 
supports, QEMU will do the error recovery for the guest memory. 

[1]
#define MCE_CAP_DEF (MCG_CTL_P|MCG_SER_P)
cenv->mcg_cap = MCE_CAP_DEF | MCE_BANKS_DEF |
(cpu->enable_lmce ? MCG_LMCE_P : 0);

[2] ret = kvm_get_mce_cap_supported(cs->kvm_state, &mcg_cap, &banks);

[3]
env->mcg_cap &= mcg_cap | MCG_CAP_BANKS_MASK;
ret = kvm_vcpu_ioctl(cs, KVM_X86_SETUP_MCE, &env->mcg_cap);

-For James's 
comments-
> KVM doesn't detect these errors.
> The hardware detects them and notifies the OS via one of a number of 
> mechanisms.
> This gets plumbed into memory_failure(), which sets a flag that the mm code 
> uses
> to prevent the page being used again.

> KVM is only involved when it tries to map a page at stage2 and the mm code
> rejects it with -EHWPOISON. This is the same as the architectures
> do_page_fault() checking for (fault & VM_FAULT_HWPOISON) out of
> handle_mm_fault(). We don't have a KVM cap for this, nor do we need one.
--
James, for your above comments, I completed understand, but KVM also delivered 
the SIGBUS, which means KVM supports guest memory RAS error recovery, so maybe 
we need to tell user space this capability.

-- For James's comments 
---
> The CPU RAS Extensions are not at all relevant here. It is perfectly possible 
> to
> support memory-failure without them, AMD-Seattle and APM-X-Gene do this. These
> systems would report not-supported here, but the kernel does support this 
> stuff.
> Just because the CPU supports this, doesn't mean the kernel was built with
> CONFIG_MEMORY_FAILURE. The CPU reports may be ignored, or upgraded to SIGKILL.
--
James, for your above comments[4], if you think we should not check the 
"cpus_have_const_cap(ARM64_HAS_RAS_EXTN)", which do you prefer we should check?
In the X86 KVM code, it uses hardcode to tell use space the host/KVM support 
RAS error software recovery. If KVM does not check the " 
cpus_have_const_cap(ARM64_HAS_RAS_EXTN)", we have to check the hardcode as 
X86's method.

[4]:
u64 __read_mostly kvm_mce_cap_supported = MCG_CTL_P | MCG_S

撤回: [RFC RESEND PATCH] kvm: arm64: export memory error recovery capability to user space

2018-12-14 Thread gengdongjiu
gengdongjiu 将撤回邮件“[RFC RESEND PATCH] kvm: arm64: export memory error recovery 
capability to user space”。

Re: [RFC RESEND PATCH] kvm: arm64: export memory error recovery capability to user space

2018-12-14 Thread gengdongjiu
> 
> On Fri, 14 Dec 2018 at 13:56, James Morse  wrote:
> >
> > Hi Dongjiu Geng,
> >
> > On 14/12/2018 10:15, Dongjiu Geng wrote:
> > > When user space do memory recovery, it will check whether KVM and 
> > > guest support the error recovery, only when both of them support, 
> > > user space will do the error recovery. This patch exports this 
> > > capability of KVM to user space.
> >
> > I can understand user-space only wanting to do the work if host and 
> > guest support the feature. But 'error recovery' isn't a KVM feature, 
> > its a Linux kernel feature.
> >
> > KVM will send it's user-space a SIGBUS with MCEERR code whenever its 
> > trying to map a page at stage2 that the kernel-mm code refuses this because 
> > its poisoned.
> > (e.g. check_user_page_hwpoison(), get_user_pages() returns 
> > -EHWPOISON)
> >
> > This is exactly the same as happens to a normal user-space process.
> >
> > I think you really want to know if the host kernel was built with 
> > CONFIG_MEMORY_FAILURE.
> 
> Does userspace need to care about that? Presumably if the host kernel 
> wasn't built with that support then it will simply never deliver any memory 
> failure events to QEMU, which is fine.
> 
> The point I was trying to make in the email Dongjiu references
> (https://patchwork.codeaurora.org/patch/652261/) is simply that "QEMU gets 
> memory-failure notifications from the host kernel"
> does not imply "the guest is prepared to receive memory failure 
> notifications", and so the code path which handles the SIGBUS must do 
> some kind of check for whether the guest CPU is a type which expects them and 
> that the board code set up the ACPI tables that it wants to fill in.

Thanks Peter's explanation. Frankly speaking, I agree Peter's suggestion.

To James, I explain more to you, as peter said QEMU needs to check whether the 
guest CPU is a type which can handle the error though guest ACPI table. Let us 
see the X86's QEMU logic:
1. Before the vCPU created, it will set a default env->mcg_cap value with 
MCE_CAP_DEF flag, MCG_SER_P means it expected the guest CPU model supports RAS 
error recovery.[1] 2. when the vCPU initialize, it will check whether host 
kernel support this feature[2]. Only when host kernel and default env->mcg_cap 
value all expected this feature, then it will setup vCPU support RAS error 
recovery[3].
So I add this IOCTL "KVM_CAP_ARM_MEMORY_ERROR_RECOVERY" to Let QEMU check 
whether host/KVM support RAS error detection and recovery, only when this 
supports, QEMU will do the error recovery for the guest memory. 

[1]
#define MCE_CAP_DEF (MCG_CTL_P|MCG_SER_P)
cenv->mcg_cap = MCE_CAP_DEF | MCE_BANKS_DEF |
(cpu->enable_lmce ? MCG_LMCE_P : 0);

[2] ret = kvm_get_mce_cap_supported(cs->kvm_state, &mcg_cap, &banks);

[3]
env->mcg_cap &= mcg_cap | MCG_CAP_BANKS_MASK;
ret = kvm_vcpu_ioctl(cs, KVM_X86_SETUP_MCE, &env->mcg_cap);

-For James's 
comments-
> KVM doesn't detect these errors.
> The hardware detects them and notifies the OS via one of a number of 
> mechanisms.
> This gets plumbed into memory_failure(), which sets a flag that the mm 
> code uses to prevent the page being used again.

> KVM is only involved when it tries to map a page at stage2 and the mm 
> code rejects it with -EHWPOISON. This is the same as the architectures
> do_page_fault() checking for (fault & VM_FAULT_HWPOISON) out of 
> handle_mm_fault(). We don't have a KVM cap for this, nor do we need one.
--
James, for your above comments, I completed understand, but KVM also delivered 
the SIGBUS, which means KVM supports guest memory RAS error recovery, so maybe 
we need to tell user space this capability.

-- For James's comments 
---
> The CPU RAS Extensions are not at all relevant here. It is perfectly 
> possible to support memory-failure without them, AMD-Seattle and 
> APM-X-Gene do this. These systems would report not-supported here, but the 
> kernel does support this stuff.
> Just because the CPU supports this, doesn't mean the kernel was built 
> with CONFIG_MEMORY_FAILURE. The CPU reports may be ignored, or upgraded to 
> SIGKILL.
--
James, for your above comments, if you think we should not check the 
"cpus_have_const_cap(ARM64_HAS_RAS_EXTN)", which do you prefer we should check?
In the X86 KVM code, it uses hardcode to tell use space the host/KVM support 
RAS error software recovery[4]. If KVM does not check the " 
cpus_have_const_cap(ARM64_HAS_RAS_EXTN)", we have to check the hardcode as 
X86's method.

[4]:
u64 __read_mostly kvm_mce_cap_supported = M

Re: Disable RT_GROUP_SCHED in PREEMPT_RT_FULL

2018-12-09 Thread gengdongjiu
Loop RT kernel Mailing lists.


On 2018/12/10 1:04, zhang jianwei wrote:
> hi Thomas and Mike??
> i say you disabled the RT_GROUP_SCHED in RT kernel because of strange cpu 
> stall was observed. currently, in my projects, I need 
> enable?0?2RT_GROUP_SCHED in RT kernel. I??d like to know how to reproduce the 
> strange cpu stall in order to fix this bug. can you guys give me some advice 
> ?0?2about how to fix this problem.
> Thanks in advance!
> 
> 
> patched mentioned?0?2
> https://lore.kernel.org/patchwork/patch/313215/
> 
> 



Re: [PATCH v2 2/2] arm/arm64: KVM: enable 32 bits kvm vcpu events support

2018-10-09 Thread gengdongjiu
Hi Suzuki

On 2018/10/10 1:22, Suzuki K Poulose wrote:
> 
> 
> On 08/10/18 13:34, Dongjiu Geng wrote:
>> The commit 539aee0edb9f ("KVM: arm64: Share the parts of
>> get/set events useful to 32bit") shares the get/set events
>> helper for arm64 and arm32, it is better also share the check
>> for vcpu events capability to enable 32 bit kvm vcpu events
>> support.
>>
>> User space will check whether KVM supports vcpu events through
>> KVM_CAP_VCPU_EVENTS IOCTL.
> 
> nit: User space will check whether KVM supports vcpu events by checking
> the KVM_CAP_VCPU_EVENTS extension.

  Thanks for the pointing out, I will update it.

> 
>>
>> Cc: James Morse 
>> Signed-off-by: Dongjiu Geng 
> 
> Reviewed-by : Suzuki K Poulose 
> 
>> ---
>> For the 32 bits kvm migration, it needs to enable the vcpu events,
>> this patch will enable it. The user space QEMU patch is here:
>> https://patchwork.ozlabs.org/patch/975615/
>> ---
>>   arch/arm64/kvm/reset.c | 1 -
>>   virt/kvm/arm/arm.c | 1 +
>>   2 files changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
>> index fd37c53..e50245e 100644
>> --- a/arch/arm64/kvm/reset.c
>> +++ b/arch/arm64/kvm/reset.c
>> @@ -82,7 +82,6 @@ int kvm_arch_vm_ioctl_check_extension(struct kvm *kvm, 
>> long ext)
>>   break;
>>   case KVM_CAP_SET_GUEST_DEBUG:
>>   case KVM_CAP_VCPU_ATTRIBUTES:
>> -    case KVM_CAP_VCPU_EVENTS:
>>   r = 1;
>>   break;
>>   default:
>> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
>> index 40e79ea..64e5d97 100644
>> --- a/virt/kvm/arm/arm.c
>> +++ b/virt/kvm/arm/arm.c
>> @@ -212,6 +212,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long 
>> ext)
>>   case KVM_CAP_READONLY_MEM:
>>   case KVM_CAP_MP_STATE:
>>   case KVM_CAP_IMMEDIATE_EXIT:
>> +    case KVM_CAP_VCPU_EVENTS:
>>   r = 1;
>>   break;
>>   case KVM_CAP_ARM_SET_DEVICE_ADDR:
>>
> 
> .
> 



Re: [PATCH] usb: xhci: remove the code build warning

2018-06-05 Thread gengdongjiu



On 2018/6/5 16:40, Greg KH wrote:
> On Wed, Jun 06, 2018 at 12:35:00AM +0800, Dongjiu Geng wrote:
>> Initialize the 'err' variate to remove the build warning,
>> the warning is shown as below:
>>
>> drivers/usb/host/xhci-tegra.c: In function ‘tegra_xusb_mbox_thread’:
>> drivers/usb/host/xhci-tegra.c:552:6: warning: ‘err’ may be used 
>> uninitialized in this function [-Wuninitialized]
>> drivers/usb/host/xhci-tegra.c:482:6: note: ‘err’ was declared here
>>
>> Signed-off-by: Dongjiu Geng 
>> ---
> 
> Any hint as to what commit caused this warning to show up?

It seems below commit:

commit e84fce0f8837496a48d11086829bdbe170358b7a
Author: Thierry Reding 
Date:   Thu Feb 11 18:10:48 2016 +0100

usb: xhci: Add NVIDIA Tegra XUSB controller driver

Add support for the on-chip XUSB controller present on Tegra SoCs. This
controller, when loaded with external firmware, exposes an interface
compliant with xHCI. This driver loads the firmware, starts the
controller, and is able to service host-specific messages sent by the
controller's firmware.

The controller also supports USB device mode as well as powergating
of the SuperSpeed and host-controller logic when not in use, but
support for these is not yet implemented.

Based on work by:
  Ajay Gupta 
  Bharath Yadav 
  Andrew Bresticker 

Cc: Mathias Nyman 
Cc: Greg Kroah-Hartman 
Acked-by: Mathias Nyman 
Signed-off-by: Thierry Reding 


> 
> thanks,
> 
> greg k-h
> 
> .
> 



Re: [RFC RESEND PATCH] kvm: arm64: export memory error recovery capability to user space

2019-01-10 Thread gengdongjiu
Hi James/Peter,
   thanks for this discussion, and sorry for my late response due to vacation.

On 2018/12/22 2:17, James Morse wrote:
> Hi Peter,
> 
> On 19/12/2018 19:02, Peter Maydell wrote:
>> On Mon, 17 Dec 2018 at 15:56, James Morse  wrote:
>>> I don't think this really matters. Its only the NMIlike notifications that 
>>> the
>>> guest doesn't have to register or poll. The ones we support today extend the
>>> architectures existing behaviour: you would have taken an external-abort on 
>>> a
>>> real system, whether you know about the additional metadata doesn't matter 
>>> to Qemu.
>>
>> Consider the case where we booted the guest using a DTB and no ACPI
>> table at all -- we certainly can't just call QEMU code that tries to
>> add entries to a nonexistent table.
> 
> Sure, because you know which of the two sets of firmware-table you're 
> providing.
> 
> I'm taking the behaviour of physical machines as the template for what we 
> should
> do here. I can boot a DT-only kernel on Seattle. Firmware has no idea I did
> this, it will still take DRAM uncorrected-error IRQs in firmware, and generate
> CPER records in the POLLed areas. But the kernel will never look, because it
> booted with DT.
> What happens if the kernel goes on to access the corrupt location? It either
> gets corrupt values back, or an external abort, depending on the design of the
> memory-controller.
> 
> X-gene uses an IRQ for its firmware-first notification. Booted with DT that
> interrupt can be asserted, but as the OS has didn't know to register it, its
> never taken. We eventually get the same corrupt-values/external-abort 
> behaviour.
> 
> KVM/Linux is acting as the memory controller using stage2. When an error is
> detected by the host it unmaps the page from stage2, and refuses to map it 
> again
> until its fixed up in Qemu's memory map (which can happen automatically). If 
> the
> kernel can't fix it itself, the AO signal is like the DRAM-IRQ above, and the 
> AR
> like the external abort.
> We don't have a parallel to the 'gets corrupt values back' behaviour as Linux
> will always unmap hwpoison pages from user-space/guests.
> 
> If the host-kernel wasn't build with CONFIG_MEMORY_FAILURE, its like the 
> memory
> controller doesn't support any of the above. I think knowing this is the 
> closest
> to what you want.
> 
> 
>> My main point is that there
>> needs to be logic in Dongjiu's QEMU patches that checks more than
>> just "does this KVM feature exist". I'm not sufficiently familiar
>> with all this RAS stuff to be certain what those checks should
>> be and what the right choices are; I just know we need to check
>> *something*...
> 
> I think this is the crux of where we don't see this the same way.
> The v8.2 RAS stuff is new, RAS support on arm64 is not. Kernel support arrived
> at roughly the same time, but not CPU support. There are v8.0 systems that
> support RAS. There are DT systems that can do the same with edac drivers.
> The physical v8.0 systems that do this, are doing it without any extra CPU 
> support.
> 
> I think x86's behaviour here includes some history, which we don't have.
>>From the order of the HEST entries, it looks like the machine-check stuff came
> first, then firmware-first using a 'GHES' entry in that table.
> I think Qemu on x86 only supports the emulated machine check stuff, so it 
> needs
> to check KVM has the widget to do this.
> If Qemu on x86 supported firmware-first, I don't think there would be anything
> to check. (details below)

Peter, I summarize James's main idea, James think QEMU does not needs to check 
*something* if Qemu support firmware-first.
What do we do for your comments?

> 
> 
 Let us see the X86's QEMU logic:
 1. Before the vCPU created, it will set a default env->mcg_cap value with
>>>
 MCE_CAP_DEF flag, MCG_SER_P means it expected the guest CPU model supports
 RAS error recovery.[1] 2. when the vCPU initialize, it will check whether 
 host
 kernel support this feature[2]. Only when host kernel and default 
 env->mcg_cap
 value all expected this feature, then it will setup vCPU support RAS error
 recovery[3].
>>>
>>> This looks like KVM exposing a CPU capability to Qemu, which then 
>>> configures the
>>> behaviour KVM gives to the guest. This doesn't tell you anything about what 
>>> the
>>> guest supports.
>>
>> It tells you what the *guest CPU* supports, which for x86 is a combination
>> of (a) what did the user/machine model ask for and (b) what can KVM
>> actually implement. I don't much care whether the guest OS supports
>> anything or not, that's its business... but it does seem odd to me
>> that the equivalent Arm code is not similarly saying "what were we
>> asked for, and what can we do?".
> 
> The flow is something like:
> For AO, generate CPER records, and notify the OS via NOTIFY_POLL (which isn't
> really a notification) or some flavour of IRQ.
> To do this, Qemu needs to be able to write to its reserved area of g

Re: [RFC RESEND PATCH] kvm: arm64: export memory error recovery capability to user space

2019-01-10 Thread gengdongjiu
> 
> On Thu, 10 Jan 2019 at 12:09, gengdongjiu  wrote:
> > Peter, I summarize James's main idea, James think QEMU does not needs
> > to check *something* if Qemu support firmware-first.
> > What do we do for your comments?
> 
> Unless I'm missing something, the code in your most recent patchset attempts 
> to update an ACPI table when it gets the SIGBUS from the
> host kernel without doing anything to check whether it has ever created the 
> ACPI table (and set up the QEMU global variable that tells the
> code where it is in the guest memory) in the first place.

when QEMU version is greater than some version, it will default create the APEI 
table. But only when the guest is booted by UEFI, it will support to record the 
CPER to guest memory. 
In my test, I boot guest using UEFI, so it is no problem, I will check whether 
this booting uses UEFI before update the ACPI table.

> I don't see how that can work.
> 
> > >> I think one question here which it would be good to answer is:
> > >> if we are modelling a guest and we haven't specifically provided it
> > >> an ACPI table to tell it about memory errors, what do we do when we
> > >> get a sigbus from the host? We have basically two choices:
> > >>  (1) send the guest an SError (aka asynchronous external abort)
> > >>  anyway (with no further info about what the memory error is)
> > >
> > > For an AR signal an external abort is valid. Its up to the
> > > implementation whether these are synchronous or asynchronous. Qemu
> > > can only take a signal for something that was synchronous, so you can 
> > > choose between the two.
> > > Synchronous external abort is marginally better as an unaware OS
> > > knows its affects this thread, and may be able to kill it.
> > > SError with an imp-def ESR is indistinguishable from 'part of the
> > > soc fell out', and should always result in a panic().
> > >
> > >
> > >>  (2) just stop QEMU (as we would for a memory error in QEMU's
> > >>  own memory)
> > >
> > > This is also valid. A machine may take external-abort to EL3 and
> > > then reboot/crash/burn.
> 
> We should decide which of these we want to do, and have a comment explaining 
> what we're doing. If I'm reading your current patchset
> correctly, it does neither -- if it can't record the fault in the ACPI table 
> it just ignores it without either stopping QEMU or delivering an SError.

James may not know my detailed implementation in the QEMU. In my patch, I only 
handle the BUS_MCEERR_AR SIGBUS signal(synchronous signal). when the SIGBUS is 
BUS_MCEERR_AR, it will deliver a synchronous exception abort.
James said it needs to deliver an SError when the BUS_MCEERR_OR SIGBUS 
signal(synchronous signal), but I do not handle the this case because QEMU main 
thread will mask this asynchronous signal.

If the memory error is belong to QEMU itself, I just print an error log[2]. If 
you think, it should stop QEMU for this case, I will change it.

void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr) {
..
if (code == BUS_MCEERR_AR) {
kvm_cpu_synchronize_state(c);
if (ghes_record_errors(ACPI_HEST_NOTIFY_SEA, paddr)) {
kvm_inject_arm_sea(c);
} else {
fprintf(stderr, "failed to record the error\n");
}
}
[2] fprintf(stderr, "Hardware memory error for memory used by "
"QEMU itself instead of guest system!\n");
}
> 
> I think I favour option (2) here.
> 
> thanks
> -- PMM


撤回: [RFC RESEND PATCH] kvm: arm64: export memory error recovery capability to user space

2019-01-10 Thread gengdongjiu
gengdongjiu 将撤回邮件“[RFC RESEND PATCH] kvm: arm64: export memory error recovery 
capability to user space”。

Re: [RFC RESEND PATCH] kvm: arm64: export memory error recovery capability to user space

2019-01-10 Thread gengdongjiu
> 
> On Thu, 10 Jan 2019 at 12:09, gengdongjiu  wrote:
> > Peter, I summarize James's main idea, James think QEMU does not 
> > needs to check *something* if Qemu support firmware-first.
> > What do we do for your comments?
> 
> Unless I'm missing something, the code in your most recent patchset 
> attempts to update an ACPI table when it gets the SIGBUS from the host 
> kernel without doing anything to check whether it has ever created the ACPI 
> table (and set up the QEMU global variable that tells the code where it is in 
> the guest memory) in the first place.

when QEMU version is greater than some version, it will default create the APEI 
table. But only when the guest is booted by UEFI, it will support to record the 
CPER to guest memory. 
In my test, I boot guest using UEFI, so it is no problem, I will check whether 
this booting uses UEFI before update the ACPI table.

> I don't see how that can work.
> 
> > >> I think one question here which it would be good to answer is:
> > >> if we are modelling a guest and we haven't specifically provided 
> > >> it an ACPI table to tell it about memory errors, what do we do 
> > >> when we get a sigbus from the host? We have basically two choices:
> > >>  (1) send the guest an SError (aka asynchronous external abort)
> > >>  anyway (with no further info about what the memory error is)
> > >
> > > For an AR signal an external abort is valid. Its up to the 
> > > implementation whether these are synchronous or asynchronous. Qemu 
> > > can only take a signal for something that was synchronous, so you can 
> > > choose between the two.
> > > Synchronous external abort is marginally better as an unaware OS 
> > > knows its affects this thread, and may be able to kill it.
> > > SError with an imp-def ESR is indistinguishable from 'part of the 
> > > soc fell out', and should always result in a panic().
> > >
> > >
> > >>  (2) just stop QEMU (as we would for a memory error in QEMU's
> > >>  own memory)
> > >
> > > This is also valid. A machine may take external-abort to EL3 and 
> > > then reboot/crash/burn.
> 
> We should decide which of these we want to do, and have a comment 
> explaining what we're doing. If I'm reading your current patchset correctly, 
> it does neither -- if it can't record the fault in the ACPI table it just 
> ignores it without either stopping QEMU or delivering an SError.

James may not know my detailed implementation in the QEMU. In my patch, I only 
handle the BUS_MCEERR_AR SIGBUS signal(synchronous signal). when the SIGBUS is 
BUS_MCEERR_AR, it will deliver a synchronous exception abort.
James said it needs to deliver an SError when the BUS_MCEERR_OR SIGBUS 
signal(asynchronous signal), but I do not handle the this case because QEMU 
main thread will mask this asynchronous signal.

If the memory error is belong to QEMU itself, I just print an error log[2]. If 
you think, it should stop QEMU for this case, I will change it.

void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr) {
..
if (code == BUS_MCEERR_AR) {
kvm_cpu_synchronize_state(c);
if (ghes_record_errors(ACPI_HEST_NOTIFY_SEA, paddr)) {
kvm_inject_arm_sea(c);
} else {
fprintf(stderr, "failed to record the error\n");
}
}
[2] fprintf(stderr, "Hardware memory error for memory used by "
"QEMU itself instead of guest system!\n"); 
}

> 
> I think I favour option (2) here.
> 
> thanks
> -- PMM


Re: [PATCH v2] usb: xhci: remove the code build warning

2018-06-11 Thread gengdongjiu
Hi Mathias,
   This patch is ready to apply, It is great that this patch can be applied to 
4.18. thanks a lot.

On 2018/6/6 22:36, Dongjiu Geng wrote:
> Initialize the 'err' variate to remove the build warning,
> the warning is shown as below:
> 
> drivers/usb/host/xhci-tegra.c: In function 'tegra_xusb_mbox_thread':
> drivers/usb/host/xhci-tegra.c:552:6: warning: 'err' may be used uninitialized 
> in this function [-Wuninitialized]
> drivers/usb/host/xhci-tegra.c:482:6: note: 'err' was declared here
> 
> Signed-off-by: Dongjiu Geng 
> Acked-by: Thierry Reding 
> ---
> How to reproduce:
> 1. make defconfig ARCH=arm
> 2. make -j100 CROSS_COMPILE=arm-linux-gnueabi- ARCH=arm
> 
> Then you can see below warnings:
> drivers/usb/host/xhci-tegra.c: In function 'tegra_xusb_mbox_thread':
> drivers/usb/host/xhci-tegra.c:552:6: warning: 'err' may be used uninitialized 
> in this function [-Wuninitialized]
> drivers/usb/host/xhci-tegra.c:482:6: note: 'err' was declared here
> ---
>  drivers/usb/host/xhci-tegra.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/usb/host/xhci-tegra.c b/drivers/usb/host/xhci-tegra.c
> index a8c1d07..d50549f 100644
> --- a/drivers/usb/host/xhci-tegra.c
> +++ b/drivers/usb/host/xhci-tegra.c
> @@ -481,7 +481,7 @@ static void tegra_xusb_mbox_handle(struct tegra_xusb 
> *tegra,
>   unsigned long mask;
>   unsigned int port;
>   bool idle, enable;
> - int err;
> + int err = 0;
>  
>   memset(&rsp, 0, sizeof(rsp));
>  
> 



Re: [PATCH] arm64: rename the function arm64_is_ras_serror() to avoid confusion

2018-03-17 Thread gengdongjiu
Hi James,

> Hi gengdongjiu,
> 
> On 26/02/18 16:13, gengdongjiu wrote:
> > 2018-02-24 1:58 GMT+08:00 James Morse :
> >> On 22/02/18 18:02, Dongjiu Geng wrote:
> >>> The RAS SError Syndrome can be Implementation-Defined,
> >>> arm64_is_ras_serror() is used to judge whether it is RAS SError, but
> >>> arm64_is_ras_serror() does not include this judgement. In order to
> >>> avoid function name confusion, we rename the arm64_is_ras_serror()
> >>> to arm64_is_categorized_ras_serror(), this function is used to judge
> >>> whether it is categorized RAS Serror.
> >>
> >> I don't see how 'categorized' is relevant. The most significant ISS
> >> bit is used to determine if this is an IMP-DEF ESR, or one that uses the 
> >> architected layout.
> >
> > From the name arm64_is_ras_serror(), it used to judge whether this is
> > RAS Serror, but arm64_is_ras_serror() think the IMP-DEF SError is not
> > RAS SError, as shown the code note and code in[1].
> 
> > In fact the IMP-DEF SError is also RAS SError, so when I read the
> > code, it looks like
> 
> This is just you then. No-one else has your imp-def:RAS error ESR values.
> 
> This would be like me adding some impdef branch instruction, then claiming
> aarch64_insn_is_branch() doesn't take account of my private additions.
> 
> I agree the name is assuming all architected ESR are RAS-errors, and that 
> impdef ESR are just that: impdef, that's all we know about them.
> Unless this causes us to do the wrong thing, I don't think it matters.
> Obviously we would need to change it if a new architected ESR is added.

Ok, let us keep the current code and not change it until a new architected ESR 
is added


Re: [PATCH v10 1/5] arm64: KVM: Prepare set virtual SEI syndrome value

2018-03-17 Thread gengdongjiu
Hi James,

> 
> Hi Dongjiu Geng,
> 
> On 03/03/18 16:09, Dongjiu Geng wrote:
> > Export one API to specify virtual SEI syndrome value for guest, and
> > add a helper to get the VSESR_EL2 value.
> 
> This patch adds two helpers that nothing calls... its not big, please merge 
> it with the patch that uses these.

OK, thanks, I will merge them.

> 
> 
> > diff --git a/arch/arm64/include/asm/kvm_emulate.h
> > b/arch/arm64/include/asm/kvm_emulate.h
> > index 413dc82..3294885 100644
> > --- a/arch/arm64/include/asm/kvm_emulate.h
> > +++ b/arch/arm64/include/asm/kvm_emulate.h
> > @@ -71,6 +71,11 @@ static inline void vcpu_set_hcr(struct kvm_vcpu *vcpu, 
> > unsigned long hcr)
> > vcpu->arch.hcr_el2 = hcr;
> >  }
> >
> > +static inline unsigned long vcpu_get_vsesr(struct kvm_vcpu *vcpu) {
> > +   return vcpu->arch.vsesr_el2;
> > +}
> > +
> >  static inline void vcpu_set_vsesr(struct kvm_vcpu *vcpu, u64 vsesr)
> > {
> > vcpu->arch.vsesr_el2 = vsesr;
> > diff --git a/arch/arm64/include/asm/kvm_host.h
> > b/arch/arm64/include/asm/kvm_host.h
> > index a73f63a..3dc49b7 100644
> > --- a/arch/arm64/include/asm/kvm_host.h
> > +++ b/arch/arm64/include/asm/kvm_host.h
> > @@ -354,6 +354,8 @@ void handle_exit_early(struct kvm_vcpu *vcpu,
> > struct kvm_run *run,  int kvm_perf_init(void);  int
> > kvm_perf_teardown(void);
> >
> > +void kvm_set_sei_esr(struct kvm_vcpu *vcpu, u64 syndrome);
> > +
> >  struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long
> > mpidr);
> >
> >  static inline void __cpu_init_hyp_mode(phys_addr_t pgd_ptr, diff
> > --git a/arch/arm64/kvm/inject_fault.c b/arch/arm64/kvm/inject_fault.c
> > index 60666a0..78ecb28 100644
> > --- a/arch/arm64/kvm/inject_fault.c
> > +++ b/arch/arm64/kvm/inject_fault.c
> > @@ -186,3 +186,8 @@ void kvm_inject_vabt(struct kvm_vcpu *vcpu)  {
> > pend_guest_serror(vcpu, ESR_ELx_ISV);  }
> > +
> > +void kvm_set_sei_esr(struct kvm_vcpu *vcpu, u64 syndrome) {
> > +   pend_guest_serror(vcpu, syndrome & ESR_ELx_ISS_MASK);
> 
> If you move the ISS_MASK into pend_guest_serror(), you wouldn't need this at 
> all.
> 
> It would be better if any validation were in the user-space helpers so we can 
> check user-space hasn't put something funny in the top bits.

Yes, I agree your idea, thanks for the good suggestion.

> 
> > +}
> >
> 
> 
> Thanks,
> 
> James


Re: [PATCH v10 3/5] arm/arm64: KVM: Introduce set and get per-vcpu event

2018-03-17 Thread gengdongjiu
Hi James,
   Thanks for your review and good suggestion.

> 
> Hi Dongjiu Geng,
> 
> On 03/03/18 16:09, Dongjiu Geng wrote:
> > RAS Extension provides VSESR_EL2 register to specify virtual SError
> > syndrome value, this patch adds a new IOCTL to export user-invisible
> > states related to SError exceptions. User space can setup the
> > kvm_vcpu_events to inject specified SError, also it can support live
> > migration.
> 
> > diff --git a/Documentation/virtual/kvm/api.txt
> > b/Documentation/virtual/kvm/api.txt
> > index 8a3d708..26ae151 100644
> > --- a/Documentation/virtual/kvm/api.txt
> > +++ b/Documentation/virtual/kvm/api.txt
> > @@ -819,11 +819,13 @@ struct kvm_clock_data {
> >
> >  Capability: KVM_CAP_VCPU_EVENTS
> >  Extended by: KVM_CAP_INTR_SHADOW
> > -Architectures: x86
> > +Architectures: x86, arm, arm64
> >  Type: vm ioctl
> >  Parameters: struct kvm_vcpu_event (out)
> >  Returns: 0 on success, -1 on error
> >
> > +X86:
> > +
> >  Gets currently pending exceptions, interrupts, and NMIs as well as
> > related  states of the vcpu.
> >
> > @@ -865,15 +867,29 @@ Only two fields are defined in the flags field:
> >  - KVM_VCPUEVENT_VALID_SMM may be set in the flags field to signal that
> >smi contains a valid state.
> >
> > +ARM, ARM64:
> > +
> > +Gets currently pending SError exceptions as well as related states of the 
> > vcpu.
> > +
> > +struct kvm_vcpu_events {
> > +   struct {
> > +   bool serror_pending;
> > +   bool serror_has_esr;
> > +   u64 serror_esr;
> > +   } exception;
> > +};
> 
> Don't put bool in an ABI struct. The encoding is up to the compiler.
> The compiler will insert padding in this struct to make serror_esr naturally 
> aligned. Different compilers may do it differently. You'll see that
> the existing struct kvm_vcpu_events has 'pad' fields to ensure each element 
> in the struct is naturally aligned.

I checked the exited x86 strut kvm_vcpu_events definition, it aligned to 32 
bits, so how about using below kvm_vcpu_events struct definition for arm64?
struct kvm_vcpu_events {
   struct {
   __u8_8 serror_pending;
   __u8 serror_has_esr;
   __u8 pad[2];
   __u64 serror_esr;
   } exception;
};

> 
> serror_pending and serror_has_esr need to be in a flags field.
How about this definition?
struct kvm_vcpu_events {
   struct {
   __u8_8 serror_pending;
   __u8 serror_has_esr;
   __u8 pad[2];
   __u64 serror_esr;
   } exception;
};

> 
> I thought the logic for re-using the CAP was so user-space could re-use 
> save/restore code to transfer whatever we put in here during
> migration. If the struct is a different size the code has to be different 
> anyway.
> My understanding of Drew and Christoffer's comments was that we should re-use 
> the existing struct. (but now that I look at it, its not so
> clear).
> 
> (If we reuse the struct, we can put the esr in exception.error_code, if we 
> can get away with it: It would be good to union exception up with
> a u64, then use that. This would let us transfer anything we need in those 
> RES0 bits of the 64bit VSESR_EL2).

It seems Drew and Christoffer's comments suggested to use the 
KVM_GET/SET_VCPU_EVENTS ABI, not suggested arm64 must use the same struct
kvm_vcpu_events definition with x86. 
> 
> 
> >  4.32 KVM_SET_VCPU_EVENTS
> >
> >  Capability: KVM_CAP_VCPU_EVENTS
> >  Extended by: KVM_CAP_INTR_SHADOW
> > -Architectures: x86
> > +Architectures: x86, arm, arm64
> >  Type: vm ioctl
> >  Parameters: struct kvm_vcpu_event (in)
> >  Returns: 0 on success, -1 on error
> >
> > +X86:
> > +
> >  Set pending exceptions, interrupts, and NMIs as well as related
> > states of the  vcpu.
> >
> > @@ -894,6 +910,12 @@ shall be written into the VCPU.
> >
> >  KVM_VCPUEVENT_VALID_SMM can only be set if KVM_CAP_X86_SMM is available.
> >
> > +ARM, ARM64:
> > +
> > +Set pending SError exceptions as well as related states of the vcpu.
> > +
> > +See KVM_GET_VCPU_EVENTS for the data structure.
> > +
> >
> >  4.33 KVM_GET_DEBUGREGS
> >
> 
> > diff --git a/arch/arm64/include/uapi/asm/kvm.h
> > b/arch/arm64/include/uapi/asm/kvm.h
> > index 9abbf30..32c0eae 100644
> > --- a/arch/arm64/include/uapi/asm/kvm.h
> > +++ b/arch/arm64/include/uapi/asm/kvm.h
> > @@ -39,6 +39,7 @@
> >  #define __KVM_HAVE_GUEST_DEBUG
> >  #define __KVM_HAVE_IRQ_LINE
> >  #define __KVM_HAVE_READONLY_MEM
> > +#define __KVM_HAVE_VCPU_EVENTS
> >
> >  #define KVM_COALESCED_MMIO_PAGE_OFFSET 1
> >
> > @@ -153,6 +154,15 @@ struct kvm_sync_regs {  struct
> > kvm_arch_memory_slot {  };
> >
> > +/* for KVM_GET/SET_VCPU_EVENTS */
> > +struct kvm_vcpu_events {
> > +   struct {
> > +   bool serror_pending;
> > +   bool serror_has_esr;
> > +   u64 serror_esr;
> > +   } exception;
> > +};
> > +
> 
> >  /* If you need to interpret the index values, here is the key: */
> >  #define KVM_REG_ARM_COPROC_MASK0x00

Re: [PATCH v10 2/5] arm64: KVM: export the capability to set guest SError syndrome

2018-03-18 Thread gengdongjiu
Hi James,
   Thanks for your time to review and give comments.

[...]
> > +
> > +8.14 KVM_CAP_ARM_SET_SERROR_ESR
> > +
> > +Architectures: arm, arm64
> > +
> > +This capability indicates that userspace can specify syndrome value
> > +reported to guest OS when guest takes a virtual SError interrupt exception.
> 
> "when userspace triggers a virtual SError"... how?

In the user space(QEMU), it will call kvm_arch_put_registers() or 
kvm_arch_get_registers() to set or get KVM registers through KVM_SET_ONE_REG/ 
KVM_GET_ONE_REG IOCTL, at the same time the two functions will separately call 
kvm_arm_vcpu_get_events() and kvm_arm_vcpu_set_events() to get/set vcpu events. 
If user space want to trigger a virtual SError with specified ESR, it only need 
to setup the kvm_vcpu_events struct(exception.serror_pending = 1; 
exception.serror_has_esr=1; serror_esr=x), then KVM will trigger this 
virtual SError.

userspace can trigger it at any time, for example, for debug purpose.  Or 
simulate a SError after recording a CPER for guest. But before triggering a 
virtual SError, it needs to know whether KVM has such capability, so KVM needs 
to export this capability to user space. If has this capability, User space 
will call kvm_arm_vcpu_set_events() to trigger a virtual SError.

> 
> 
> > +If KVM has this capability, userspace can only specify the ISS field
> > +for the ESR syndrome, can not specify the EC field which is not under 
> > control by KVM.
> 
> Where do I put the ESR?
> If you re-order this after the patch that adds the API, you can describe how 
> this can be used.

Ok, thank a lot for your suggestion.

> 
> 
> Thanks,
> 
> James
> 
> 
> 
> > +If this virtual SError is taken to EL1 using AArch64, this value will
> > +be reported into ISS filed of ESR_EL1.
> > diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c index
> > 3256b92..38c8a64 100644
> > --- a/arch/arm64/kvm/reset.c
> > +++ b/arch/arm64/kvm/reset.c
> > @@ -77,6 +77,9 @@ int kvm_arch_dev_ioctl_check_extension(struct kvm *kvm, 
> > long ext)
> > case KVM_CAP_ARM_PMU_V3:
> > r = kvm_arm_support_pmu_v3();
> > break;
> > +   case KVM_CAP_ARM_INJECT_SERROR_ESR:
> > +   r = cpus_have_const_cap(ARM64_HAS_RAS_EXTN);
> > +   break;
> > case KVM_CAP_SET_GUEST_DEBUG:
> > case KVM_CAP_VCPU_ATTRIBUTES:
> > r = 1;
> > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index
> > 8fb90a0..3587b33 100644
> > --- a/include/uapi/linux/kvm.h
> > +++ b/include/uapi/linux/kvm.h
> > @@ -934,6 +934,7 @@ struct kvm_ppc_resize_hpt {  #define
> > KVM_CAP_S390_AIS_MIGRATION 150  #define KVM_CAP_PPC_GET_CPU_CHAR 151
> > #define KVM_CAP_S390_BPB 152
> > +#define KVM_CAP_ARM_INJECT_SERROR_ESR 153
> >
> >  #ifdef KVM_CAP_IRQ_ROUTING
> >
> >



Re: [PATCH v10 1/5] arm64: KVM: Prepare set virtual SEI syndrome value

2018-03-16 Thread gengdongjiu
Hi James,
   Thank you very much for this mail and your time to review this patch. 
Appreciate that.
I will check it and reply.


On 2018/3/16 4:37, James Morse wrote:
> Hi Dongjiu Geng,
> 
> On 03/03/18 16:09, Dongjiu Geng wrote:
>> Export one API to specify virtual SEI syndrome value
>> for guest, and add a helper to get the VSESR_EL2 value.
> 
> This patch adds two helpers that nothing calls... its not big, please merge it
> with the patch that uses these.
> 
> 
>> diff --git a/arch/arm64/include/asm/kvm_emulate.h 
>> b/arch/arm64/include/asm/kvm_emulate.h
>> index 413dc82..3294885 100644
>> --- a/arch/arm64/include/asm/kvm_emulate.h
>> +++ b/arch/arm64/include/asm/kvm_emulate.h
>> @@ -71,6 +71,11 @@ static inline void vcpu_set_hcr(struct kvm_vcpu *vcpu, 
>> unsigned long hcr)
>>  vcpu->arch.hcr_el2 = hcr;
>>  }
>>  
>> +static inline unsigned long vcpu_get_vsesr(struct kvm_vcpu *vcpu)
>> +{
>> +return vcpu->arch.vsesr_el2;
>> +}
>> +
>>  static inline void vcpu_set_vsesr(struct kvm_vcpu *vcpu, u64 vsesr)
>>  {
>>  vcpu->arch.vsesr_el2 = vsesr;
>> diff --git a/arch/arm64/include/asm/kvm_host.h 
>> b/arch/arm64/include/asm/kvm_host.h
>> index a73f63a..3dc49b7 100644
>> --- a/arch/arm64/include/asm/kvm_host.h
>> +++ b/arch/arm64/include/asm/kvm_host.h
>> @@ -354,6 +354,8 @@ void handle_exit_early(struct kvm_vcpu *vcpu, struct 
>> kvm_run *run,
>>  int kvm_perf_init(void);
>>  int kvm_perf_teardown(void);
>>  
>> +void kvm_set_sei_esr(struct kvm_vcpu *vcpu, u64 syndrome);
>> +
>>  struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
>>  
>>  static inline void __cpu_init_hyp_mode(phys_addr_t pgd_ptr,
>> diff --git a/arch/arm64/kvm/inject_fault.c b/arch/arm64/kvm/inject_fault.c
>> index 60666a0..78ecb28 100644
>> --- a/arch/arm64/kvm/inject_fault.c
>> +++ b/arch/arm64/kvm/inject_fault.c
>> @@ -186,3 +186,8 @@ void kvm_inject_vabt(struct kvm_vcpu *vcpu)
>>  {
>>  pend_guest_serror(vcpu, ESR_ELx_ISV);
>>  }
>> +
>> +void kvm_set_sei_esr(struct kvm_vcpu *vcpu, u64 syndrome)
>> +{
>> +pend_guest_serror(vcpu, syndrome & ESR_ELx_ISS_MASK);
> 
> If you move the ISS_MASK into pend_guest_serror(), you wouldn't need this at 
> all.
> 
> It would be better if any validation were in the user-space helpers so we can
> check user-space hasn't put something funny in the top bits.
> 
>> +}
>>
> 
> 
> Thanks,
> 
> James
> 
> .
> 



Re: [PATCH v11 0/4] set VSESR_EL2 by user space and support NOTIFY_SEI notification

2018-04-18 Thread gengdongjiu
James,

> 
>> I do not know when it is merge-window. About the apply version, it does not 
>> have limited.
> 
> 'git fetch' Linus' tree and look at the tags. 'v4.16' lost its '-rc' suffixes,
> and there isn't a 'v4.17-rc1' yet, so we are still in the merge window.
> 
> Linus sends a message to LKML. eg:
> https://lkml.org/lkml/2018/4/1/175
> 
> net-next closes shortly before the merge window, and re-opens afterwards. 
> There
> is a handy web page:
> http://vger.kernel.org/~davem/net-next.html

 Thanks for this information, got it.



Re: [PATCH] kvm: pass the virtual SEI syndrome to guest OS

2017-03-21 Thread gengdongjiu
Hi kbuild test robot,

  Thank you.
  The build error is due to "vsesr_el2" is armv8.2 register, I will change 
"vsesr_el2" to sysreg usage


On 2017/3/21 21:51, kbuild test robot wrote:
> Hi Dongjiu,
> 
> [auto build test ERROR on arm64/for-next/core]
> [also build test ERROR on v4.11-rc3 next-20170321]
> [if your patch is applied to the wrong git tree, please drop us a note to 
> help improve the system]
> 
> url:
> https://github.com/0day-ci/linux/commits/Dongjiu-Geng/kvm-pass-the-virtual-SEI-syndrome-to-guest-OS/20170321-152433
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git 
> for-next/core
> config: arm64-allmodconfig (attached as .config)
> compiler: aarch64-linux-gnu-gcc (Debian 6.1.1-9) 6.1.1 20160705
> reproduce:
> wget 
> https://raw.githubusercontent.com/01org/lkp-tests/master/sbin/make.cross -O 
> ~/bin/make.cross
> chmod +x ~/bin/make.cross
> # save the attached .config to linux build tree
> make.cross ARCH=arm64 
> 
> All errors (new ones prefixed by >>):
> 
>/tmp/ccWmLqCE.s: Assembler messages:
>>> /tmp/ccWmLqCE.s:677: Error: selected processor does not support system 
>>> register name 'vsesr_el2'
> 
> ---
> 0-DAY kernel test infrastructureOpen Source Technology Center
> https://lists.01.org/pipermail/kbuild-all   Intel Corporation
> 



Re: [PATCH] kvm: pass the virtual SEI syndrome to guest OS

2017-03-22 Thread gengdongjiu
Hi James,
  Thank you very much for your detailed comment and answer.

On 2017/3/21 21:10, James Morse wrote:
> Hi,
> 
> On 21/03/17 06:32, gengdongjiu wrote:
>> On 2017/3/20 23:08, James Morse wrote:
>>> On 20/03/17 13:58, Marc Zyngier wrote:
>>>> On 20/03/17 12:28, gengdongjiu wrote:
>>>>> On 2017/3/20 19:24, Marc Zyngier wrote:
>>>>>> On 20/03/17 07:55, Dongjiu Geng wrote:
>>>>>>> In the RAS implementation, hardware pass the virtual SEI
>>>>>>> syndrome information through the VSESR_EL2, so set the virtual
>>>>>>> SEI syndrome using physical SEI syndrome el2_elr to pass to
>>>>>>> the guest OS
> 
> (I've juggled the order of your replies:)
> 
>> so for both SEA and SEI, do you prefer to below steps?
>> EL0/EL1 SEI/SEA ---> EL3 firmware first handle --> EL2 hypervisor notify 
>> >
> the Qemu to inject SEI/SEA-->Qemu call KVM API to inject SEA/SEI>KVM >
> inject SEA/SEI to guest OS
> 
> Yes, to expand your EL2 hypervisor notify Qemu step:
> 1 The host should call its APEI code to parse the CPER records.
> 2 User space processes are then notified via SIGBUS (or for rasdaemon, trace
>   points).
> 3 Qemu can take the address delivered via SIGBUS and generate CPER records for
>   the guest. It knows how to convert host addresses to guest IPAs, and it 
> knows
>   where in guest memory to write the CPER records.
> 4 Qemu can then notify the guest via whatever mechanism it advertised via the
>   HEST/GHES table. It might not be the same mechanism that the host received
>   the notification through.
> 
> Steps 1 and 2 are the same even if no guest is running, so we don't have to 
> add
> any special case for KVM. This is existing code that x86 uses.
> We can test the Qemu parts without any firmware support and the APEI path in 
> the
> host and guest is the same.
   here do you mean map host APEI table to guest for steps 1 and 2 test? so 
that the APEI path in the
  host and guest is the same.

> 
> 
>>> Is anyone from Huawei looking at adding RAS support for Qemu?
>>  yes, I am looking at Qemu and want to add RAS support.
> 
> Great, support in Qemu is one of the missing pieces. On x86 it looks like it
> emulates machine-check-exceptions, which is how x86 did this before
> firmware-first and APEI became the standard.
> 
> 
>>  do you mean let Qemu inject both the SEA and SEI?
> 
> To do the notification, yes. It needs to happen after the CPER records have 
> been
> written, and the mechanism and CPER memory location need to match what the 
> guest
> was told via the HEST/GHES table.
> 
> If Qemu didn't tell the guest about firmware-first, it can still deliver the
> guest an SError Interrupt.
> 
> 
> SEA should be possible to do with the KVM_SET_REG API, GPIO/GSIV and the other
> kind of interrupts can use irqfd. For SEI we may need to add an API call to 
> KVM
> to let it pend SError with a specific ESR.
> 
> 
> 
>>> How does this work with firmware first?
> 
>> when the Guest OS triggers an SEI, it will firstly trap to EL3 firmware, El3 
>> firmware records the error
>> info to the APEI table, 
> 
> These are CPER records in a memory area pointed to by one of HEST's GHES 
> entries?
> 
> 
>> then copy the ESR_EL3 ELR_EL3 to ESR_EL2 ELR_EL2 and transfers control to the
>> hypervisor, hypervisor delegates the error exception to EL1 guest
> 
> This is a problem, just because the error occurred while the guest was running
> doesn't mean we should deliver it directly to the guest. Some of these errors
> will be fatal for the CPU and the host should try and power it off to contain
yes, some of error does not need to deliver to guest OS directly. for example 
if the error is guest kernel fault error,
hypervisor can directly power off the whole guest OS

> the fault. For example: CPER's 'micro-architectural error', should the guest
> power-off the vCPU? All that really does is return to the hypervisor, the 
> error
for this example, I think it is better hypervisor directly close the whole 
guest OS, instead of
guest power-off the vCPU.

> hasn't been contained.


> 
> Firmware should handle the error first, then the host, finally the guest via 
> Qemu.
> 
> 
>> OS by setting HCR_EL2.VSE to 1 and pass the virtual SEI syndrome through 
>> vsesr_el2. 
>> The EL1 guest OS check the DISR_EL1 syndrome information to decide to
>> terminate the application, or do some other recovery action. because the 
>> HCR_EL2.AMO is set, so in fact, read
>> DISR_EL1, it returns the VDISR_EL2. and

Re: [PATCH] kvm: pass the virtual SEI syndrome to guest OS

2017-03-20 Thread gengdongjiu


On 2017/3/20 19:24, Marc Zyngier wrote:
> Please include James Morse on anything RAS related, as he's already
> looking at related patches.
> 
> On 20/03/17 07:55, Dongjiu Geng wrote:
>> In the RAS implementation, hardware pass the virtual SEI
>> syndrome information through the VSESR_EL2, so set the virtual
>> SEI syndrome using physical SEI syndrome el2_elr to pass to
>> the guest OS
>>
>> Signed-off-by: Dongjiu Geng 
>> Signed-off-by: Quanming wu 
>> ---
>>  arch/arm64/Kconfig   |  8 
>>  arch/arm64/include/asm/esr.h |  1 +
>>  arch/arm64/include/asm/kvm_emulate.h | 12 
>>  arch/arm64/include/asm/kvm_host.h|  4 
>>  arch/arm64/kvm/hyp/switch.c  | 15 ++-
>>  arch/arm64/kvm/inject_fault.c| 10 ++
>>  6 files changed, 49 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>> index 8c7c244247b6..ea62170a3b75 100644
>> --- a/arch/arm64/Kconfig
>> +++ b/arch/arm64/Kconfig
>> @@ -908,6 +908,14 @@ endmenu
>>  
>>  menu "ARMv8.2 architectural features"
>>  
>> +config HAS_RAS_EXTENSION
>> +bool "Support arm64 RAS extension"
>> +default n
>> +help
>> +  Reliability, Availability, Serviceability(RAS; part of the ARMv8.2 
>> Extensions).
>> +
>> +  Selecting this option OS will try to recover the error that RAS 
>> hardware node detected.
>> +
> 
> As this is an architectural extension, this should be controlled by the
> CPU feature mechanism, and not be chosen at compile time. What you have
> here will break horribly when booted on a CPU that doesn't implement RAS.

thanks very much for your review, yes, it is, you are right.

> 
>>  config ARM64_UAO
>>  bool "Enable support for User Access Override (UAO)"
>>  default y
>> diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
>> index d14c478976d0..e38d32b2bdad 100644
>> --- a/arch/arm64/include/asm/esr.h
>> +++ b/arch/arm64/include/asm/esr.h
>> @@ -111,6 +111,7 @@
>>  #define ESR_ELx_COND_MASK   (UL(0xF) << ESR_ELx_COND_SHIFT)
>>  #define ESR_ELx_WFx_ISS_WFE (UL(1) << 0)
>>  #define ESR_ELx_xVC_IMM_MASK((1UL << 16) - 1)
>> +#define VSESR_ELx_IDS_ISS_MASK((1UL << 25) - 1)
>>  
>>  /* ESR value templates for specific events */
>>  
>> diff --git a/arch/arm64/include/asm/kvm_emulate.h 
>> b/arch/arm64/include/asm/kvm_emulate.h
>> index f5ea0ba70f07..20d4da7f5dce 100644
>> --- a/arch/arm64/include/asm/kvm_emulate.h
>> +++ b/arch/arm64/include/asm/kvm_emulate.h
>> @@ -148,6 +148,18 @@ static inline u32 kvm_vcpu_get_hsr(const struct 
>> kvm_vcpu *vcpu)
>>  return vcpu->arch.fault.esr_el2;
>>  }
>>  
>> +#ifdef CONFIG_HAS_RAS_EXTENSION
>> +static inline u32 kvm_vcpu_get_vsesr(const struct kvm_vcpu *vcpu)
>> +{
>> +return vcpu->arch.fault.vsesr_el2;
>> +}
>> +
>> +static inline void kvm_vcpu_set_vsesr(struct kvm_vcpu *vcpu, unsigned long 
>> val)
>> +{
>> +vcpu->arch.fault.vsesr_el2 = val;
>> +}
>> +#endif
>> +
>>  static inline int kvm_vcpu_get_condition(const struct kvm_vcpu *vcpu)
>>  {
>>  u32 esr = kvm_vcpu_get_hsr(vcpu);
>> diff --git a/arch/arm64/include/asm/kvm_host.h 
>> b/arch/arm64/include/asm/kvm_host.h
>> index e7705e7bb07b..f9e3bb57c461 100644
>> --- a/arch/arm64/include/asm/kvm_host.h
>> +++ b/arch/arm64/include/asm/kvm_host.h
>> @@ -83,6 +83,10 @@ struct kvm_mmu_memory_cache {
>>  };
>>  
>>  struct kvm_vcpu_fault_info {
>> +#ifdef CONFIG_HAS_RAS_EXTENSION
>> +/* Virtual SError Exception Syndrome Register */
>> +u32 vsesr_el2;
>> +#endif
>>  u32 esr_el2;/* Hyp Syndrom Register */
>>  u64 far_el2;/* Hyp Fault Address Register */
>>  u64 hpfar_el2;  /* Hyp IPA Fault Address Register */
>> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
>> index aede1658aeda..770a153fb6ba 100644
>> --- a/arch/arm64/kvm/hyp/switch.c
>> +++ b/arch/arm64/kvm/hyp/switch.c
>> @@ -86,6 +86,13 @@ static void __hyp_text __activate_traps(struct kvm_vcpu 
>> *vcpu)
>>  isb();
>>  }
>>  write_sysreg(val, hcr_el2);
>> +#ifdef CONFIG_HAS_RAS_EXTENSION
>> +/* If virtual System Error or Asynchronous Abort is pending. set
>> + * the virtual exception syndrome information
>> + */
>> +if (vcpu->arch.hcr_el2 & HCR_VSE)
>> +write_sysreg(vcpu->arch.fault.vsesr_el2, vsesr_el2);
>> +#endif
>>  /* Trap on AArch32 cp15 c15 accesses (EL1 or EL0) */
>>  write_sysreg(1 << 15, hstr_el2);
>>  /*
>> @@ -139,8 +146,14 @@ static void __hyp_text __deactivate_traps(struct 
>> kvm_vcpu *vcpu)
>>   * the crucial bit is "On taking a vSError interrupt,
>>   * HCR_EL2.VSE is cleared to 0."
>>   */
>> -if (vcpu->arch.hcr_el2 & HCR_VSE)
>> +if (vcpu->arch.hcr_el2 & HCR_VSE) {
>>  vcpu->arch.hcr_el2 = read_sysreg(hcr_el2);
>> +#ifdef CONFIG_HAS_RAS_EXTENSION
>> +/* set vsesr_el2[24:0] with esr_el2[24:0] */
>> +kvm_vcpu_set_vsesr(vcpu

Re: [PATCH] kvm: pass the virtual SEI syndrome to guest OS

2017-03-20 Thread gengdongjiu
Hi Marc,
  Thank you very much for your review.


On 2017/3/20 21:58, Marc Zyngier wrote:
> On 20/03/17 12:28, gengdongjiu wrote:
>>
>>
>> On 2017/3/20 19:24, Marc Zyngier wrote:
>>> Please include James Morse on anything RAS related, as he's already
>>> looking at related patches.
>>>
>>> On 20/03/17 07:55, Dongjiu Geng wrote:
>>>> In the RAS implementation, hardware pass the virtual SEI
>>>> syndrome information through the VSESR_EL2, so set the virtual
>>>> SEI syndrome using physical SEI syndrome el2_elr to pass to
>>>> the guest OS
>>>>
>>>> Signed-off-by: Dongjiu Geng 
>>>> Signed-off-by: Quanming wu 
>>>> ---
>>>>  arch/arm64/Kconfig   |  8 
>>>>  arch/arm64/include/asm/esr.h |  1 +
>>>>  arch/arm64/include/asm/kvm_emulate.h | 12 
>>>>  arch/arm64/include/asm/kvm_host.h|  4 
>>>>  arch/arm64/kvm/hyp/switch.c  | 15 ++-
>>>>  arch/arm64/kvm/inject_fault.c| 10 ++
>>>>  6 files changed, 49 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>>>> index 8c7c244247b6..ea62170a3b75 100644
>>>> --- a/arch/arm64/Kconfig
>>>> +++ b/arch/arm64/Kconfig
>>>> @@ -908,6 +908,14 @@ endmenu
>>>>  
>>>>  menu "ARMv8.2 architectural features"
>>>>  
>>>> +config HAS_RAS_EXTENSION
>>>> +  bool "Support arm64 RAS extension"
>>>> +  default n
>>>> +  help
>>>> +Reliability, Availability, Serviceability(RAS; part of the ARMv8.2 
>>>> Extensions).
>>>> +
>>>> +Selecting this option OS will try to recover the error that RAS 
>>>> hardware node detected.
>>>> +
>>>
>>> As this is an architectural extension, this should be controlled by the
>>> CPU feature mechanism, and not be chosen at compile time. What you have
>>> here will break horribly when booted on a CPU that doesn't implement RAS.
>>
>> thanks very much for your review, yes, it is, you are right.
>>
>>>
>>>>  config ARM64_UAO
>>>>bool "Enable support for User Access Override (UAO)"
>>>>default y
>>>> diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
>>>> index d14c478976d0..e38d32b2bdad 100644
>>>> --- a/arch/arm64/include/asm/esr.h
>>>> +++ b/arch/arm64/include/asm/esr.h
>>>> @@ -111,6 +111,7 @@
>>>>  #define ESR_ELx_COND_MASK (UL(0xF) << ESR_ELx_COND_SHIFT)
>>>>  #define ESR_ELx_WFx_ISS_WFE   (UL(1) << 0)
>>>>  #define ESR_ELx_xVC_IMM_MASK  ((1UL << 16) - 1)
>>>> +#define VSESR_ELx_IDS_ISS_MASK((1UL << 25) - 1)
>>>>  
>>>>  /* ESR value templates for specific events */
>>>>  
>>>> diff --git a/arch/arm64/include/asm/kvm_emulate.h 
>>>> b/arch/arm64/include/asm/kvm_emulate.h
>>>> index f5ea0ba70f07..20d4da7f5dce 100644
>>>> --- a/arch/arm64/include/asm/kvm_emulate.h
>>>> +++ b/arch/arm64/include/asm/kvm_emulate.h
>>>> @@ -148,6 +148,18 @@ static inline u32 kvm_vcpu_get_hsr(const struct 
>>>> kvm_vcpu *vcpu)
>>>>return vcpu->arch.fault.esr_el2;
>>>>  }
>>>>  
>>>> +#ifdef CONFIG_HAS_RAS_EXTENSION
>>>> +static inline u32 kvm_vcpu_get_vsesr(const struct kvm_vcpu *vcpu)
>>>> +{
>>>> +  return vcpu->arch.fault.vsesr_el2;
>>>> +}
>>>> +
>>>> +static inline void kvm_vcpu_set_vsesr(struct kvm_vcpu *vcpu, unsigned 
>>>> long val)
>>>> +{
>>>> +  vcpu->arch.fault.vsesr_el2 = val;
>>>> +}
>>>> +#endif
>>>> +
>>>>  static inline int kvm_vcpu_get_condition(const struct kvm_vcpu *vcpu)
>>>>  {
>>>>u32 esr = kvm_vcpu_get_hsr(vcpu);
>>>> diff --git a/arch/arm64/include/asm/kvm_host.h 
>>>> b/arch/arm64/include/asm/kvm_host.h
>>>> index e7705e7bb07b..f9e3bb57c461 100644
>>>> --- a/arch/arm64/include/asm/kvm_host.h
>>>> +++ b/arch/arm64/include/asm/kvm_host.h
>>>> @@ -83,6 +83,10 @@ struct kvm_mmu_memory_cache {
>>>>  };
>>>>  
>>>>  struct kvm_vcpu_fault_info {
>>>> +#ifdef CONFIG_HAS_RAS_EXTENSION
>&

Re: [PATCH] kvm: pass the virtual SEI syndrome to guest OS

2017-03-20 Thread gengdongjiu


On 2017/3/20 23:08, James Morse wrote:
> Hi Dongjiu Geng,
> 
> On 20/03/17 13:58, Marc Zyngier wrote:
>> On 20/03/17 12:28, gengdongjiu wrote:
>>> On 2017/3/20 19:24, Marc Zyngier wrote:
>>>> Please include James Morse on anything RAS related, as he's already
>>>> looking at related patches.
> 
> (Thanks Marc,)
> 
>>>> On 20/03/17 07:55, Dongjiu Geng wrote:
>>>>> In the RAS implementation, hardware pass the virtual SEI
>>>>> syndrome information through the VSESR_EL2, so set the virtual
>>>>> SEI syndrome using physical SEI syndrome el2_elr to pass to
>>>>> the guest OS
> 
> How does this work with firmware first?

I explained it in previous mail about the work flow.

when the Guest OS triggers an SEI, it will firstly trap to EL3 firmware, El3 
firmware records the error
info to the APEI table, then copy the ESR_EL3 ELR_EL3 to ESR_EL2 ELR_EL2 and 
transfers control to the
hypervisor, hypervisor delegates the error exception to EL1 guest OS by setting 
HCR_EL2.VSE to 1 and pass the
virtual SEI syndrome through vsesr_el2. The EL1 guest OS check the DISR_EL1 
syndrome information to decide to
terminate the application, or do some other recovery action. because the 
HCR_EL2.AMO is set, so in fact, read
DISR_EL1, it returns the VDISR_EL2. and VDISR_EL2 is loaded from VSESR_EL2, so 
here I pass the virtual SEI
syndrome vsesr_el2.

> If we took a Physical SError Interrupt the CPER records are in the hosts 
> memory.
> To deliver a RAS event to the guest something needs to generate CPER records 
> and
> put them in the guest memory. Only Qemu knows where these memory regions are.
> 
> Put another way, what is the guest expected to do with this SError interrupt?
No, we do not only panic,if it is EL0 application SEI. the OS error recovery
agent will terminate the EL0 application to isolate the error; If it is EL1 
guest
OS SError, guest OS can see whether it can recover. if the error was in a 
read-only file cache buffer, guest OS
can invalidate the page and reload the data from disk.

if all of the above are failed, OS will panic.


> The only choice is panic(). We should send this via Qemu so that we can add
> proper guest RAS support later. Once Qemu has written the CPER records into
> guest memory, it can notify the guest.
> 
> Is anyone from Huawei looking at adding RAS support for Qemu?
 yes, I am looking at Qemu and want to add RAS support.
 do you mean let Qemu inject both the SEA and SEI?

> 
> 
> It looks like we should save/restore VSESR_EL2 as part of the guest CPU state,
> but this needs doing with the cpufeature framework so that the single-image
> kernel works on platforms with and without these features.
 yes, you are right, we will follow cpufeature framework.


> 
> Xie XiuQi's series for SEI also touches the cpufeature framework.
> 
> 
>>>>> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
>>>>> index aede1658aeda..770a153fb6ba 100644
>>>>> --- a/arch/arm64/kvm/hyp/switch.c
>>>>> +++ b/arch/arm64/kvm/hyp/switch.c
>>>>> @@ -86,6 +86,13 @@ static void __hyp_text __activate_traps(struct 
>>>>> kvm_vcpu *vcpu)
>>>>>   isb();
>>>>>   }
>>>>>   write_sysreg(val, hcr_el2);
>>>>> +#ifdef CONFIG_HAS_RAS_EXTENSION
>>>>> + /* If virtual System Error or Asynchronous Abort is pending. set
>>>>> +  * the virtual exception syndrome information
>>>>> +  */
>>>>> + if (vcpu->arch.hcr_el2 & HCR_VSE)
> 
>>>>> + write_sysreg(vcpu->arch.fault.vsesr_el2, vsesr_el2);
> 
> This won't build with versions of binutils that don't recognise vsesr_el2.
> Is there another patch out there that adds a sysreg definition for vsesr_el2?
> 
> 
>>>>> diff --git a/arch/arm64/kvm/inject_fault.c b/arch/arm64/kvm/inject_fault.c
>>>>> index da6a8cfa54a0..08a13dfe28a8 100644
>>>>> --- a/arch/arm64/kvm/inject_fault.c
>>>>> +++ b/arch/arm64/kvm/inject_fault.c
>>>>> @@ -242,4 +242,14 @@ void kvm_inject_undefined(struct kvm_vcpu *vcpu)
>>>>>  void kvm_inject_vabt(struct kvm_vcpu *vcpu)
>>>>>  {
>>>>>   vcpu_set_hcr(vcpu, vcpu_get_hcr(vcpu) | HCR_VSE);
>>>>> +#ifdef CONFIG_HAS_RAS_EXTENSION
>>>>> + /* If virtual System Error or Asynchronous Abort is set. set
>>>>> +  * the virtual exception syndrome information
>>>>> +  */
>>>>> + kvm_vcpu_set_vsesr(vcpu, ((kvm_vcpu_get_vsesr(vcpu)
>>>>> + & (~VSESR_ELx_IDS_IS

Re: [PATCH] arm/arm64: KVM: send SIGBUS error to qemu

2017-03-24 Thread gengdongjiu
Hi James,
   thanks for your review.

On 2017/3/23 23:06, James Morse wrote:
> Hi Dongjiu Geng,
> 
> On 23/03/17 13:01, Dongjiu Geng wrote:
>> when the pfn is KVM_PFN_ERR_HWPOISON, it indicates to send
>> SIGBUS signal from KVM's fault-handling code to qemu, qemu
>> can handle this signal according to the fault address.
> 
> I'm afraid I beat you to it on this one:
> https://www.spinics.net/lists/arm-kernel/msg568919.html
> 
> (Are you the same gengdj who ask me to post that patch?:
>  https://lkml.org/lkml/2017/3/5/187 )

Oh, yes, it is me. recently I do not check my gmail and think you are not reply 
mail.
it is great that you upstream this patch.

> 
> We don't need upstream KVM to do this until either arm or arm64 has
> ARCH_SUPPORTS_MEMORY_FAILURE. Punit and Tyler have discovered problems with 
> the
> way arm64's hugepage and hwpoison interact:
> https://www.spinics.net/lists/arm-kernel/msg568995.html
  ok, thanks James. do you know when the arm or arm64 will have 
ARCH_SUPPORTS_MEMORY_FAILURE?
> 
> 
> Some comments on the differences:
> 
>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>> index 962616fd4ddd..1307ec400de3 100644
>> --- a/arch/arm/kvm/mmu.c
>> +++ b/arch/arm/kvm/mmu.c
>> @@ -1237,6 +1237,20 @@ static void coherent_cache_guest_page(struct kvm_vcpu 
>> *vcpu, kvm_pfn_t pfn,
>>  __coherent_cache_guest_page(vcpu, pfn, size);
>>  }
>>  
>> +static void kvm_send_hwpoison_signal(unsigned long address,
>> +struct task_struct *tsk)
>> +{
>> +siginfo_t info;
>> +
>> +info.si_signo   = SIGBUS;
>> +info.si_errno   = 0;
>> +info.si_code= BUS_MCEERR_AR;
>> +info.si_addr= (void __user *)address;
>> +info.si_addr_lsb = PAGE_SHIFT;
> 
> Any version of this patch should handle hugepage for the sizes KVM uses in its
> stage2 mappings. By just passing PAGE_SHIFT you let the guest fault for each
> page that makes up the hugepage.
> 
> 
>> +
>> +send_sig_info(SIGBUS, &info, tsk);
>> +}
>> +
>>  static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>struct kvm_memory_slot *memslot, unsigned long hva,
>>unsigned long fault_status)
>> @@ -1309,6 +1323,12 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, 
>> phys_addr_t fault_ipa,
>>  if (is_error_noslot_pfn(pfn))
>>  return -EFAULT;
>>  
>> +if (is_error_hwpoison_pfn(pfn)) {
>> +kvm_send_hwpoison_signal(kvm_vcpu_gfn_to_hva(vcpu, gfn),
>> +current);
>> +return -EFAULT;
> 
> This will return -EFAULT from the KVM_RUN ioctl(). Is Qemu expected to know it
> should try again? This is indistinguishable from the is_error_noslot_pfn() 
> error
> above.
> 
> x86 returns 0 from this path, kvm_handle_bad_page() in arch/x86/kvm/mmu.c as 
> the
> SIGBUS should arrive first. If the SIGBUS is handled the error has been 
> resolved
> and Qemu can call KVM_RUN again. Returning an error and sending SIGBUS 
> suggests
> there are two problems.
  thanks for that. I think your Statement is reasonable.
> 
> 
>> +}
>> +
>>  if (kvm_is_device_pfn(pfn)) {
>>  mem_type = PAGE_S2_DEVICE;
>>  flags |= KVM_S2PTE_FLAG_IS_IOMAP;
> 
> 
> 
> Thanks,
> 
> James
> 
> 
> .
> 



[PATCH] irqdomain: handle the per-CPU irq trigger type settings

2017-03-09 Thread gengdongjiu
when devices parse and map an per-cpu interrupt into linux virq space
using irq_of_parse_and_map API, it will always be failed if needs to set
the specified irq trigger type, because irq_set_irq_type is only for 1-N
mode interrupt source, not for per-cpu interrupt source. so handle per-cpu
IRQs for this failure.

Signed-off-by: Dongjiu Geng 
   Zhanghai bin 

diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
index 9fd618d..8116cf2 100755
--- a/kernel/irq/irqdomain.c
+++ b/kernel/irq/irqdomain.c
@@ -542,8 +542,16 @@ unsigned int irq_create_of_mapping(struct of_phandle_args 
*irq_data)

/* Set type if specified and different than the current one */
if (type != IRQ_TYPE_NONE &&
-   type != irq_get_trigger_type(virq))
-   irq_set_irq_type(virq, type);
+   type != irq_get_trigger_type(virq)) {
+   int ret = 0;
+   struct irq_data *irq_data = irq_get_irq_data(virq);
+
+   ret = irq_set_irq_type(virq, type);
+
+/* Handle per-cpu IRQ: just save type in irq_data */
+   if (-EINVAL == ret && irq_data)
+   irqd_set_trigger_type(irq_data, type);
+   }
return virq;
 }
 EXPORT_SYMBOL_GPL(irq_create_of_mapping);



Re: [PATCH] irqdomain: handle the per-CPU irq trigger type settings

2017-03-10 Thread gengdongjiu
Hi Gleixner,
  Thank you very much for your comment and review, I will update it later.

> 
> 
> 
> On Fri, 10 Mar 2017, gengdongjiu wrote:
> 
>> when devices parse and map an per-cpu interrupt into linux virq space
>> using irq_of_parse_and_map API, it will always be failed if needs to set
>> the specified irq trigger type, because irq_set_irq_type is only for 1-N
>> mode interrupt source, not for per-cpu interrupt source. so handle 
>> per-cpu
>> IRQs for this failure.
> 
> Please format your changelogs proper into sections:
> 
> 1) Context
> 
> 2) Problem
> 
> 3) Solution
> 
> Writing one big lump of a sentence is just unreadable.
> 
> Aside of that:
> 
> 1) No indentation of the changelog
> 2) Sentences start with upper case letters
> 3) Function references want () after the function name
> 
> Aside
> 
>>
>> Signed-off-by: Dongjiu Geng 
>>Zhanghai bin 
> 
> That SOB is bogus.
> 
>> diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
>> index 9fd618d..8116cf2 100755
>> --- a/kernel/irq/irqdomain.c
>> +++ b/kernel/irq/irqdomain.c
>> @@ -542,8 +542,16 @@ unsigned int irq_create_of_mapping(struct 
>> of_phandle_args *irq_data)
>>
>> /* Set type if specified and different than the current one */
>> if (type != IRQ_TYPE_NONE &&
>> -   type != irq_get_trigger_type(virq))
>> -   irq_set_irq_type(virq, type);
>> +   type != irq_get_trigger_type(virq)) {
>> +   int ret = 0;
>> +   struct irq_data *irq_data = irq_get_irq_data(virq);
>> +
>> +   ret = irq_set_irq_type(virq, type);
>> +
>> +/* Handle per-cpu IRQ: just save type in irq_data */
>> +   if (-EINVAL == ret && irq_data)
>> +   irqd_set_trigger_type(irq_data, type);
> 
> This is completely broken. That stores a trigger type for any interrupt if
> the set type function fails.
> 
> You fail to explain
> 
> - WHY irq_set_irq_type() fails for these per cpu interrupts
> 
> - WHY storing the type in irqdata solves anything
> 
> - WHAT sets the type in the actual interrupt hardware
> 
> That information should be in the changelog 
> 
> Thanks,
> 
> tglx
> 
> .
> 



Re: [PATCH V11 10/10] arm/arm64: KVM: add guest SEA support

2017-03-05 Thread gengdongjiu
Hi James,


> Hi Wang Xiongfeng,
>
> On 25/02/17 07:15, Xiongfeng Wang wrote:
>> On 2017/2/22 5:22, Tyler Baicar wrote:
>>> Currently external aborts are unsupported by the guest abort
>>> handling. Add handling for SEAs so that the host kernel reports
>>> SEAs which occur in the guest kernel.
>
>>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>>> index a5265ed..04f1dd50 100644
>>> --- a/arch/arm/kvm/mmu.c
>>> +++ b/arch/arm/kvm/mmu.c
>>> @@ -1444,8 +1445,21 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, 
>>> struct kvm_run *run)
>>>
>>>  /* Check the stage-2 fault is trans. fault or write fault */
>>>  fault_status = kvm_vcpu_trap_get_fault_type(vcpu);
>>> -if (fault_status != FSC_FAULT && fault_status != FSC_PERM &&
>>> -fault_status != FSC_ACCESS) {
>>> +
>>> +/* The host kernel will handle the synchronous external abort. There
>>> + * is no need to pass the error into the guest.
>>> + */
>
>> Can we inject an sea into the guest, so that the guest can kill the
>> application which causes the error if the guest won't be terminated
>> later. I'm not sure whether ghes_handle_memory_failure() called in
>> ghes_do_proc() will kill the qemu process. I think it only kill user
>> processes marked with PF_MCE_PROCESS & PF_MCE_EARLY.
>
> My understanding is the pages will get unmapped and recovered where possible
> (e.g. re-read from disk), the user space process will get SIGBUS/SIGSEV when 
> it
> next tries to access that page, which could be some time later.
> These flags in find_early_kill_thread() are a way to make the memory-failure
> code signal the process early, before it does any recovery. The 'MCE' makes me
> think its x86 specific.
> (early and late are described more in [0])
>
>
> Guests are a special case as QEMU may never access the faulty memory itself, 
> so
> it won't receive the 'late' signal. It looks like ARM/arm64 KVM lacks support
> for KVM_PFN_ERR_HWPOISON which sends SIGBUS from KVM's fault-handling code. I
> have patches to add support for this which I intend to send at rc1.

could you push this patch to opensource?


>
> [0] suggests 'KVM qemu' sets these MCE flags to take the 'early' path, but 
> given
> x86s KVM_PFN_ERR_HWPOISON, this may be out of date.
>
>
> Either way, once QEMU gets a signal indicating the virtual address, it can
> generate its own APEI CPER records and use the KVM APIs to mock up an
> Synchronous External Abort, (or inject an IRQ or run the vcpu waiting for the
> guest's polling thread to come round, whichever was described to the guest via
> the HEST/GHES tables).
>
> We can't hand the APEI CPER records we have in the kernel to the guest, as 
> they
> hold a host physical address, and maybe a host virtual address. We don't know
> where in guest memory we could write new APEI CPER records as these locations
> have to be reserved in the guests-UEFI memory map, and only QEMU knows where
> they are.
>
> To deliver RAS events to a guest we have to get QEMU involved.
>
>
> Thanks,
>
> James
>
>
> [0] https://www.kernel.org/doc/Documentation/vm/hwpoison.txt
>


Re: [PATCH V11 10/10] arm/arm64: KVM: add guest SEA support

2017-02-27 Thread gengdongjiu
@@ -1444,8 +1445,21 @@ int kvm_handle_guest_abort(struct kvm_vcpu
*vcpu, struct kvm_run *run)

/* Check the stage-2 fault is trans. fault or write fault */
fault_status = kvm_vcpu_trap_get_fault_type(vcpu);
-   if (fault_status != FSC_FAULT && fault_status != FSC_PERM &&
-   fault_status != FSC_ACCESS) {
+
+   /* The host kernel will handle the synchronous external abort. There
+* is no need to pass the error into the guest.
+*/
+   if (fault_status == FSC_EXTABT) {
+   if(handle_guest_sea((unsigned long)fault_ipa,
+   kvm_vcpu_get_hsr(vcpu))) {
+   kvm_err("Failed to handle guest SEA, FSC:
EC=%#x xFSC=%#lx ESR_EL2=%#lx\n",
+   kvm_vcpu_trap_get_class(vcpu),
+   (unsigned long)kvm_vcpu_trap_get_fault(vcpu),
+   (unsigned long)kvm_vcpu_get_hsr(vcpu));
+   return -EFAULT;
+   }
+   } else if (fault_status != FSC_FAULT && fault_status != FSC_PERM &&
+  fault_status != FSC_ACCESS) {
kvm_err("Unsupported FSC: EC=%#x xFSC=%#lx ESR_EL2=%#lx\n",
kvm_vcpu_trap_get_class(vcpu),
(unsigned long)kvm_vcpu_trap_get_fault(vcpu),



if the error is SEA and we want to inject the sea to guest OK, after
finish the handle, whether we need to directly return? instead of
continuation? as shown below:

   if (fault_status == FSC_EXTABT) {
   if(handle_guest_sea((unsigned long)fault_ipa,
   kvm_vcpu_get_hsr(vcpu))) {
   kvm_err("Failed to handle guest SEA, FSC:
EC=%#x xFSC=%#lx ESR_EL2=%#lx\n",
   kvm_vcpu_trap_get_class(vcpu),
   (unsigned long)kvm_vcpu_trap_get_fault(vcpu),
   (unsigned long)kvm_vcpu_get_hsr(vcpu));
   return -EFAULT;
  } else
   return 1;






2017-02-24 18:42 GMT+08:00 James Morse :
> Hi Tyler,
>
> On 21/02/17 21:22, Tyler Baicar wrote:
>> Currently external aborts are unsupported by the guest abort
>> handling. Add handling for SEAs so that the host kernel reports
>> SEAs which occur in the guest kernel.
>
>> diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
>> index e22089f..33a77509 100644
>> --- a/arch/arm/include/asm/kvm_arm.h
>> +++ b/arch/arm/include/asm/kvm_arm.h
>> @@ -187,6 +187,7 @@
>>  #define FSC_FAULT(0x04)
>>  #define FSC_ACCESS   (0x08)
>>  #define FSC_PERM (0x0c)
>> +#define FSC_EXTABT   (0x10)
>
> arm64 has ESR_ELx_FSC_EXTABT which is used in inject_abt64(), but for matching
> an external abort coming from hardware the range is wider.
>
> Looking at the ARM-ARMs 'ISS encoding for an exception from an Instruction
> Abort' in 'D7.2.27 ESR_ELx, Exception Syndrome Register (ELx)' (page D7-1954 
> of
> version 'k'...iss10775), the ten flavours of you Synchronous abort you hooked
> with do_sea() in patch 4 occupy 0x10 to 0x1f...
>
>
>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>> index a5265ed..04f1dd50 100644
>> --- a/arch/arm/kvm/mmu.c
>> +++ b/arch/arm/kvm/mmu.c
>> @@ -29,6 +29,7 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include 
>>
>>  #include "trace.h"
>>
>> @@ -1444,8 +1445,21 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, 
>> struct kvm_run *run)
>>
>>   /* Check the stage-2 fault is trans. fault or write fault */
>>   fault_status = kvm_vcpu_trap_get_fault_type(vcpu);
>
> ... kvm_vcpu_trap_get_fault_type() on both arm and arm64 masks the HSR/ESR_EL2
> with 0x3c ...
>
>
>> - if (fault_status != FSC_FAULT && fault_status != FSC_PERM &&
>> - fault_status != FSC_ACCESS) {
>> +
>> + /* The host kernel will handle the synchronous external abort. There
>> +  * is no need to pass the error into the guest.
>> +  */
>> + if (fault_status == FSC_EXTABT) {
>
> ... but here we only check for 'Synchronous external abort, not on a 
> translation
> table walk'. Are the other types relevant?
>
> If so we need some helper as this range is sparse and 'all other values are
> reserved'. The aarch32 HSR format is slightly different. (G6-4411 ISS encoding
> from an exception from a Data Abort).
>
> If not, can we change patch 4 to check this type too so we don't call out to
> APEI for a fault type we know isn't relevant.
>
>
>> + if(handle_guest_sea((unsigned long)fault_ipa,
>> + kvm_vcpu_get_hsr(vcpu))) {
>> + kvm_err("Failed to handle guest SEA, FSC: EC=%#x 
>> xFSC=%#lx ESR_EL2=%#lx\n",
>> + kvm_vcpu_trap_get_class(vcpu),
>> + (unsigned long)kvm_vcpu_trap_get_fault(vcpu),
>> + (unsigned long)kvm_vcpu_get_hsr(vcpu));
>> + return -EFAULT;
>> +  

RE:[PATCH] sched: Add trace for task wake up latency and leave running time

2020-09-02 Thread gengdongjiu
Hi Peter,
Sorry for the late response.

> 
> > diff --git a/include/linux/sched.h b/include/linux/sched.h index
> > 93ecd930efd3..edb622c40a90 100644
> > --- a/include/linux/sched.h
> > +++ b/include/linux/sched.h
> > @@ -1324,6 +1324,13 @@ struct task_struct {
> > /* CPU-specific state of this task: */
> > struct thread_structthread;
> >
> > +   /* Task wake up time stamp */
> > +   u64 ts_wakeup;
> > +   /* Previous task switch out time stamp */
> > +   u64 pre_ts_end;
> > +   /* Next task switch in time stamp */
> > +   u64 next_ts_start;
> > +   boolwakeup_state;
> > /*
> >  * WARNING: on x86, 'thread_struct' contains a variable-sized
> >  * structure.  It *MUST* be at the end of 'task_struct'.
> 
> ^^^ did you read that comment?
   Sorry for my carelessness.

> 
> > diff --git a/include/trace/events/sched.h
> > b/include/trace/events/sched.h index fec25b9cfbaf..e99c6d573a42 100644
> > --- a/include/trace/events/sched.h
> > +++ b/include/trace/events/sched.h
> > @@ -183,6 +183,72 @@ TRACE_EVENT(sched_switch,
> > __entry->next_comm, __entry->next_pid, __entry->next_prio)  );
> >
> > +DECLARE_EVENT_CLASS(sched_latency_template,
> > +
> > +   TP_PROTO(bool preempt,
> > +struct task_struct *prev,
> > +struct task_struct *next,
> > +u64 time),
> > +
> > +   TP_ARGS(preempt, prev, next, time),
> > +
> > +   TP_STRUCT__entry(
> > +   __array(char,   prev_comm,  TASK_COMM_LEN   )
> > +   __field(pid_t,  prev_pid)
> > +   __field(int,prev_prio   )
> > +   __field(long,   prev_state  )
> > +   __array(char,   next_comm,  TASK_COMM_LEN   )
> > +   __field(pid_t,  next_pid)
> > +   __field(int,next_prio   )
> > +   __field(u64,time)
> > +   ),
> > +
> > +   TP_fast_assign(
> > +   memcpy(__entry->next_comm, next->comm, TASK_COMM_LEN);
> > +   __entry->prev_pid   = prev->pid;
> > +   __entry->prev_prio  = prev->prio;
> > +   __entry->prev_state = __trace_sched_switch_state(preempt, 
> > prev);
> > +   memcpy(__entry->prev_comm, prev->comm, TASK_COMM_LEN);
> > +   __entry->next_pid   = next->pid;
> > +   __entry->next_prio  = next->prio;
> > +   __entry->time   = time;
> > +   /* XXX SCHED_DEADLINE */
> > +   ),
> > +
> > +   TP_printk("prev_comm=%s prev_pid=%d prev_prio=%d prev_state=%s%s ==> 
> > next_comm=%s next_pid=%d next_prio=%d passed
> time=%llu (ns)",
> > +   __entry->prev_comm, __entry->prev_pid, __entry->prev_prio,
> > +
> > +   (__entry->prev_state & (TASK_REPORT_MAX - 1)) ?
> > + __print_flags(__entry->prev_state & (TASK_REPORT_MAX - 1), 
> > "|",
> > +   { TASK_INTERRUPTIBLE, "S" },
> > +   { TASK_UNINTERRUPTIBLE, "D" },
> > +   { __TASK_STOPPED, "T" },
> > +   { __TASK_TRACED, "t" },
> > +   { EXIT_DEAD, "X" },
> > +   { EXIT_ZOMBIE, "Z" },
> > +   { TASK_PARKED, "P" },
> > +   { TASK_DEAD, "I" }) :
> > + "R",
> > +
> > +   __entry->prev_state & TASK_REPORT_MAX ? "+" : "",
> > +   __entry->next_comm, __entry->next_pid, __entry->next_prio,
> > +   __entry->time)
> > +);
> 
> NAK, that tracepoint is already broken, we don't want to proliferate the 
> broken.
  
Sorry, What the meaning that tracepoint is already broken? 
Maybe I need to explain the reason that why I add two trace point. 
when using perf tool or Ftrace sysfs to capture the task wake-up latency and 
the task leaving running queue time, usually the trace data is too large and 
the CPU utilization rate is too high in the process due to a lot of disk write. 
Sometimes even the disk is full, the issue still does not reproduced that above 
two time exceed a certain threshold.  So I added two trace points, using filter 
we can only record the abnormal trace that includes wakeup latency and leaving 
running time larger than an threshold. 
Or do you have better solution?

> 
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c index
> > 8471a0f7eb32..b5a1928dc948 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -2464,6 +2464,8 @@ static void ttwu_do_wakeup(struct rq *rq, struct
> > task_struct *p, int wake_flags,  {
> > check_preempt_curr(rq, p, wake_flags);
> > p->state = TASK_RUNNING;
> > +   p->ts_wakeup = local_clock();
> > +   p->wakeup_state = true;
> > trace_sched_wakeup(p);
> 

Re: [PATCH V13 10/10] arm/arm64: KVM: add guest SEA support

2017-03-27 Thread gengdongjiu
Hi Tyler,

I have a question for below code.

On 2017/3/25 0:01, Christoffer Dall wrote:
>   is_iabt = kvm_vcpu_trap_is_iabt(vcpu);
> - if (unlikely(!is_iabt && kvm_vcpu_dabt_isextabt(vcpu))) {
> + if (unlikely(!is_iabt && kvm_vcpu_dabt_isextabt(vcpu)) && sea_status) {
>   kvm_inject_vabt(vcpu);
when it is SEA synchronized abort, why here inject a asynchronous abort through 
kvm_inject_vabt(vcpu) instead of synchronized abort? seem the original kvm 
source code is also do that, so I am confused about that.
thanks .

>   return 1;
>   }



Re: [PATCH V13 10/10] arm/arm64: KVM: add guest SEA support

2017-03-28 Thread gengdongjiu
Hi,

On 2017/3/22 6:47, Tyler Baicar wrote:
> + fault_ipa = kvm_vcpu_get_fault_ipa(vcpu);
> +
> + /* The host kernel will handle the synchronous external abort. There
> +  * is no need to pass the error into the guest.
> +  */
> + if (is_abort_synchronous(fault_status))
> + sea_status = handle_guest_sea((unsigned long)fault_ipa,
> + kvm_vcpu_get_hsr(vcpu));
>  
>   is_iabt = kvm_vcpu_trap_is_iabt(vcpu);
> - if (unlikely(!is_iabt && kvm_vcpu_dabt_isextabt(vcpu))) {
> + if (unlikely(!is_iabt && kvm_vcpu_dabt_isextabt(vcpu)) && sea_status) {
>   kvm_inject_vabt(vcpu);
>   return 1;
>   }
   After the host kernel correctly handle the synchronous external abort, 
the sea_status
   will return 0, so the code logical will be continue go-no, whether it is 
better directly return
   after correctly handle the SEA? such as below.

if (unlikely(!is_iabt && kvm_vcpu_dabt_isextabt(vcpu)) && sea_status) {
kvm_inject_vabt(vcpu);
return 1;
} else
return 1;

>  
> - fault_ipa = kvm_vcpu_get_fault_ipa(vcpu);
> -
>   trace_kvm_guest_fault(*vcpu_pc(vcpu), kvm_vcpu_get_hsr(vcpu),
> kvm_vcpu_get_hfar(vcpu), fault_ipa);
>  
> - /* Check the stage-2 fault is trans. fault or write fault */
> - fault_status = kvm_vcpu_trap_get_fault_type(vcpu);
>   if (fault_status != FSC_FAULT && fault_status != FSC_PERM &&



Re: [PATCH] kvm: pass the virtual SEI syndrome to guest OS

2017-03-28 Thread gengdongjiu
Hi all,

On 2017/3/28 19:54, Achin Gupta wrote:
> On Tue, Mar 28, 2017 at 01:23:28PM +0200, Christoffer Dall wrote:
>> On Tue, Mar 28, 2017 at 11:48:08AM +0100, James Morse wrote:
>>> Hi Christoffer,
>>>
>>> (CC: Leif and Achin who know more about how UEFI fits into this picture)
>>>
>>> On 21/03/17 19:39, Christoffer Dall wrote:
>>>> On Tue, Mar 21, 2017 at 07:11:44PM +, James Morse wrote:
>>>>> On 21/03/17 11:34, Christoffer Dall wrote:
>>>>>> On Tue, Mar 21, 2017 at 02:32:29PM +0800, gengdongjiu wrote:
>>>>>>> On 2017/3/20 23:08, James Morse wrote:
>>>>>>>>>>> On 20/03/17 07:55, Dongjiu Geng wrote:
>>>>>>>>>>>> In the RAS implementation, hardware pass the virtual SEI
>>>>>>>>>>>> syndrome information through the VSESR_EL2, so set the virtual
>>>>>>>>>>>> SEI syndrome using physical SEI syndrome el2_elr to pass to
>>>>>>>>>>>> the guest OS
>>>>>>>>
>>>>>>>> How does this work with firmware first?
>>>>>>>
>>>>>>> I explained it in previous mail about the work flow.
>>>>>>
>>>>>> When delivering and reporting SEIs to the VM, should this happen
>>>>>> directly to the OS running in the VM, or to the guest firmware (e.g.
>>>>>> UEFI) running in the VM as well?
>>>>>
>>>>> 'firmware first' is the ACPI specs name for x86's BIOS or management-mode
>>>>> handling the error. On arm64 we have multiple things called firmware, so 
>>>>> the
>>>>> name might be more confusing than helpful.
>>>>>
>>>>> As far as I understand it, firmware here refers to the secure-world and 
>>>>> EL3.
>>>>> Something like ATF can use SCR_EL3.EA to claim SErrors and external 
>>>>> aborts,
>>>>> routing them to EL3 where secure platform specific firmware generates 
>>>>> CPER records.
>>>>> For a guest, Qemu takes the role of this EL3-firmware.
> 
> +1
> 
>>>>>
>>>> Thanks for the clarification.  So UEFI in the VM would not be involved
>>>> in this at all?
>>>
>>> On the host, part of UEFI is involved to generate the CPER records.
>>> In a guest?, I don't know.
>>> Qemu could generate the records, or drive some other component to do it.
>>
>> I think I am beginning to understand this a bit.  Since the guet UEFI
>> instance is specifically built for the machine it runs on, QEMU's virt
>> machine in this case, they could simply agree (by some contract) to
>> place the records at some specific location in memory, and if the guest
>> kernel asks its guest UEFI for that location, things should just work by
>> having logic in QEMU to process error reports and populate guest memory.
>>
>> Is this how others see the world too?
> 
> I think so!
> 
> AFAIU, the memory where CPERs will reside should be specified in a GHES entry 
> in
> the HEST. Is this not the case with a guest kernel i.e. the guest UEFI 
> creates a
> HEST for the guest Kernel?
> 
> If so, then the question is how the guest UEFI finds out where QEMU (acting as
> EL3 firmware) will populate the CPERs. This could either be a contract between
> the two or a guest DXE driver uses the MM_COMMUNICATE call (see [1]) to ask 
> QEMU
> where the memory is.

whether invoke the guest UEFI will be complex? not see the advantage. it seems 
x86 Qemu directly generate the ACPI table, but I am not sure, we are checking 
the qemu logical.
let Qemu generate CPER record may be clear.

when qemu can take the address delivered via SIGBUS and generate CPER records 
for
the guest, seems the CPER table is simple because there is only one address 
passed by SIGBUS.



> 
> This is the way I expect it to work at the EL3/EL2 boundary. So I am
> extrapolating it to the guest/hypervisor boundary. Do shout if I am missing
> anything.
> 
> hth,
> Achin
> 
>>
>>>
>>> Leif and Achin are the people with the UEFI/bigger picture.
>>>
>>>
>>
>> Thanks!
>> -Christoffer
> 
> [1] 
> http://infocenter.arm.com/help/topic/com.arm.doc.den0060a/DEN0060A_ARM_MM_Interface_Specification.pdf
> 
> .
> 



Re: [PATCH] kvm: pass the virtual SEI syndrome to guest OS

2017-03-29 Thread gengdongjiu

Hi Laszlo/Biesheuvel/Qemu developer,

   Now I encounter a issue and want to consult with you in ARM64 platform, as 
described below:

   when guest OS happen synchronous or asynchronous abort, kvm needs to send 
the error address to Qemu or UEFI through sigbus to dynamically generate APEI 
table. from my investigation, there are two ways:

   (1) Qemu get the error address, and generate the APEI table, then notify 
UEFI to know this generation, then inject abort error to guest OS, guest OS 
read the APEI table.
   (2) Qemu get the error address, and let UEFI to generate the APEI table, 
then inject abort error to guest OS, guest OS read the APEI table.


   Do you think which modules generates the APEI table is better? UEFI or Qemu?




On 2017/3/28 21:40, James Morse wrote:
> Hi gengdongjiu,
> 
> On 28/03/17 13:16, gengdongjiu wrote:
>> On 2017/3/28 19:54, Achin Gupta wrote:
>>> On Tue, Mar 28, 2017 at 01:23:28PM +0200, Christoffer Dall wrote:
>>>> On Tue, Mar 28, 2017 at 11:48:08AM +0100, James Morse wrote:
>>>>> On the host, part of UEFI is involved to generate the CPER records.
>>>>> In a guest?, I don't know.
>>>>> Qemu could generate the records, or drive some other component to do it.
>>>>
>>>> I think I am beginning to understand this a bit.  Since the guet UEFI
>>>> instance is specifically built for the machine it runs on, QEMU's virt
>>>> machine in this case, they could simply agree (by some contract) to
>>>> place the records at some specific location in memory, and if the guest
>>>> kernel asks its guest UEFI for that location, things should just work by
>>>> having logic in QEMU to process error reports and populate guest memory.
>>>>
>>>> Is this how others see the world too?
>>>
>>> I think so!
>>>
>>> AFAIU, the memory where CPERs will reside should be specified in a GHES 
>>> entry in
>>> the HEST. Is this not the case with a guest kernel i.e. the guest UEFI 
>>> creates a
>>> HEST for the guest Kernel?
>>>
>>> If so, then the question is how the guest UEFI finds out where QEMU (acting 
>>> as
>>> EL3 firmware) will populate the CPERs. This could either be a contract 
>>> between
>>> the two or a guest DXE driver uses the MM_COMMUNICATE call (see [1]) to ask 
>>> QEMU
>>> where the memory is.
>>
>> whether invoke the guest UEFI will be complex? not see the advantage. it 
>> seems x86 Qemu
>> directly generate the ACPI table, but I am not sure, we are checking the qemu
> logical.
>> let Qemu generate CPER record may be clear.
> 
> At boot UEFI in the guest will need to make sure the areas of memory that may 
> be
> used for CPER records are reserved. Whether UEFI or Qemu decides where these 
> are
> needs deciding, (but probably not here)...
> 
> At runtime, when an error has occurred, I agree it would be simpler (fewer
> components involved) if Qemu generates the CPER records. But if UEFI made the
> memory choice above they need to interact and it gets complicated again. The
> CPER records are defined in the UEFI spec, so I would expect UEFI to contain
> code to generate/parse them.
> 
> 
> Thanks,
> 
> James
> 
> 
> .
> 



Re: [PATCH] kvm: pass the virtual SEI syndrome to guest OS

2017-03-29 Thread gengdongjiu
Hi Achin,
  Thanks for your mail and  answer.

2017-03-29 18:36 GMT+08:00, Achin Gupta :
> Hi gengdongjiu,
>
> On Wed, Mar 29, 2017 at 05:36:37PM +0800, gengdongjiu wrote:
>>
>> Hi Laszlo/Biesheuvel/Qemu developer,
>>
>>Now I encounter a issue and want to consult with you in ARM64 platform,
>> as described below:
>>
>>when guest OS happen synchronous or asynchronous abort, kvm needs to
>> send the error address to Qemu or UEFI through sigbus to dynamically
>> generate APEI table. from my investigation, there are two ways:
>>
>>(1) Qemu get the error address, and generate the APEI table, then
>> notify UEFI to know this generation, then inject abort error to guest OS,
>> guest OS read the APEI table.
>>(2) Qemu get the error address, and let UEFI to generate the APEI
>> table, then inject abort error to guest OS, guest OS read the APEI table.

The description may be not precise, I update it

   (1) Qemu get the error address, and generate the CPER table, then
notify guest UEFI to place this CPER table to Guest OS memory, then
Qemu let KVM inject abort error to guest OS, guest OS read the CPER
table.

 (2) Qemu get the error address, and let guest UEFI to directly
generate the CPER
table and place this table to the guest OS memory, not let Qemu gerate
it. then KVM inject abort error to guest OS, guest OS read the CPER
table.

>
> Just being pedantic! I don't think we are talking about creating the APEI
> table
> dynamically here. The issue is: Once KVM has received an error that is
> destined
> for a guest it will raise a SIGBUS to Qemu. Now before Qemu can inject the
> error
> into the guest OS, a CPER (Common Platform Error Record) has to be
> generated
> corresponding to the error source (GHES corresponding to memory subsystem,
> processor etc) to allow the guest OS to do anything meaningful with the
> error. So who should create the CPER is the question.
>
> At the EL3/EL2 interface (Secure Firmware and OS/Hypervisor), an error
> arrives
> at EL3 and secure firmware (at EL3 or a lower secure exception level) is
> responsible for creating the CPER. ARM is experimenting with using a
> Standalone
> MM EDK2 image in the secure world to do the CPER creation. This will avoid
> adding the same code in ARM TF in EL3 (better for security). The error will
> then
> be injected into the OS/Hypervisor (through SEA/SEI/SDEI) through ARM
> Trusted
> Firmware.
>
> Qemu is essentially fulfilling the role of secure firmware at the EL2/EL1
> interface (as discussed with Christoffer below). So it should generate the
> CPER
> before injecting the error.
>
> This is corresponds to (1) above apart from notifying UEFI (I am assuming
> you
> mean guest UEFI). At this time, the guest OS already knows where to pick up
> the
> CPER from through the HEST. Qemu has to create the CPER and populate its
> address
> at the address exported in the HEST. Guest UEFI should not be involved in
> this
> flow. Its job was to create the HEST at boot and that has been done by this
> stage.

 Sorry,  As I understand it, after Qemu generate the CPER table, it
should pass the CPER table to the guest UEFI, then Guest UEFI  place
this CPER table to the guest OS memory. In this flow, the Guest UEFI
should be involved, else the Guest OS can not see the CPER table.

>
> Qemu folk will be able to add but it looks like support for CPER generation
> will
> need to be added to Qemu. We need to resolve this.
>
> Do shout if I am missing anything above.
>
> cheers,
> Achin
>
>
>>
>>
>>Do you think which modules generates the APEI table is better? UEFI or
>> Qemu?
>>
>>
>>
>>
>> On 2017/3/28 21:40, James Morse wrote:
>> > Hi gengdongjiu,
>> >
>> > On 28/03/17 13:16, gengdongjiu wrote:
>> >> On 2017/3/28 19:54, Achin Gupta wrote:
>> >>> On Tue, Mar 28, 2017 at 01:23:28PM +0200, Christoffer Dall wrote:
>> >>>> On Tue, Mar 28, 2017 at 11:48:08AM +0100, James Morse wrote:
>> >>>>> On the host, part of UEFI is involved to generate the CPER records.
>> >>>>> In a guest?, I don't know.
>> >>>>> Qemu could generate the records, or drive some other component to do
>> >>>>> it.
>> >>>>
>> >>>> I think I am beginning to understand this a bit.  Since the guet
>> >>>> UEFI
>> >>>> instance is specifically built for the machine it runs on, QEMU's
>> >>>> virt
>> >>>> machine in this case, they could simply agree (by some contract) to
>> >>>> place the records at 

  1   2   3   >