[PATCH] arm64: Allow CONFIG_AUTOFDO_CLANG to be selected

2024-11-14 Thread Yabin Cui
Select ARCH_SUPPORTS_AUTOFDO_CLANG to allow AUTOFDO_CLANG to be
selected.

On ARM64, ETM traces can be recorded and converted to AutoFDO profiles.
Experiments on Android show 4% improvement in cold app startup time
and 13% improvement in binder benchmarks.

Signed-off-by: Yabin Cui 
---
 Documentation/dev-tools/autofdo.rst | 18 +-
 arch/arm64/Kconfig  |  1 +
 2 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/Documentation/dev-tools/autofdo.rst 
b/Documentation/dev-tools/autofdo.rst
index 1f0a451e9ccd..f0952e3e8490 100644
--- a/Documentation/dev-tools/autofdo.rst
+++ b/Documentation/dev-tools/autofdo.rst
@@ -55,7 +55,7 @@ process consists of the following steps:
workload to gather execution frequency data. This data is
collected using hardware sampling, via perf. AutoFDO is most
effective on platforms supporting advanced PMU features like
-   LBR on Intel machines.
+   LBR on Intel machines, ETM traces on ARM machines.
 
 #. AutoFDO profile generation: Perf output file is converted to
the AutoFDO profile via offline tools.
@@ -141,6 +141,22 @@ Here is an example workflow for AutoFDO kernel:
 
   $ perf record --pfm-events RETIRED_TAKEN_BRANCH_INSTRUCTIONS:k -a -N -b 
-c  -o  -- 
 
+   - For ARM platforms:
+
+ Follow the instructions in the `Linaro OpenCSD document
+ 
https://github.com/Linaro/OpenCSD/blob/master/decoder/tests/auto-fdo/autofdo.md`_
+ to record ETM traces for AutoFDO::
+
+  $ perf record -e cs_etm/@tmc_etr0/k -a -o  -- 
+  $ perf inject -i  -o  --itrace=i59il
+
+ For ARM platforms running Android, follow the instructions in the
+ `Android simpleperf document
+ 
`_
+ to record ETM traces for AutoFDO::
+
+  $ simpleperf record -e cs-etm:k -a -o  -- 
+
 4) (Optional) Download the raw perf file to the host machine.
 
 5) To generate an AutoFDO profile, two offline tools are available:
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index fd9df6dcc593..c3814df5e391 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -103,6 +103,7 @@ config ARM64
select ARCH_SUPPORTS_PER_VMA_LOCK
select ARCH_SUPPORTS_HUGE_PFNMAP if TRANSPARENT_HUGEPAGE
select ARCH_SUPPORTS_RT
+   select ARCH_SUPPORTS_AUTOFDO_CLANG
select ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH
select ARCH_WANT_COMPAT_IPC_PARSE_VERSION if COMPAT
select ARCH_WANT_DEFAULT_BPF_JIT
-- 
2.47.0.338.g60cca15819-goog




Re: [PATCH v8 24/29] riscv: enable kernel access to shadow stack memory via FWFT sbi call

2024-11-14 Thread Nick Hu
Hi Deepak

On Thu, Nov 14, 2024 at 11:50 PM Deepak Gupta  wrote:
>
>
> Hi Nick,
>
> Thanks for reviewing and helping.
>
> On Thu, Nov 14, 2024 at 02:17:30PM +0800, Nick Hu wrote:
> >Hi Deepak
> >
> >On Thu, Nov 14, 2024 at 9:25 AM Deepak Gupta  wrote:
> >>
> >> On Thu, Nov 14, 2024 at 09:20:14AM +0800, Nick Hu wrote:
> >> >Hi Deepak
> >> >
> >> >On Thu, Nov 14, 2024 at 9:06 AM Deepak Gupta  wrote:
> >> >> >> diff --git a/arch/riscv/kernel/head.S b/arch/riscv/kernel/head.S
> >> >> >> index 356d5397b2a2..6244408ca917 100644
> >> >> >> --- a/arch/riscv/kernel/head.S
> >> >> >> +++ b/arch/riscv/kernel/head.S
> >> >> >> @@ -164,6 +164,12 @@ secondary_start_sbi:
> >> >> >> call relocate_enable_mmu
> >> >> >>  #endif
> >> >> >> call .Lsetup_trap_vector
> >> >> >> +   li a7, SBI_EXT_FWFT
> >> >> >> +   li a6, SBI_EXT_FWFT_SET
> >> >> >> +   li a0, SBI_FWFT_SHADOW_STACK
> >> >> >> +   li a1, 1 /* enable supervisor to access shadow stack access 
> >> >> >> */
> >> >> >> +   li a2, SBI_FWFT_SET_FLAG_LOCK
> >> >> >> +   ecall
> >> >> >> scs_load_current
> >> >> >> call smp_callin
> >> >> >>  #endif /* CONFIG_SMP */
> >> >> >> @@ -320,6 +326,12 @@ SYM_CODE_START(_start_kernel)
> >> >> >> la tp, init_task
> >> >> >> la sp, init_thread_union + THREAD_SIZE
> >> >> >> addi sp, sp, -PT_SIZE_ON_STACK
> >> >> >> +   li a7, SBI_EXT_FWFT
> >> >> >> +   li a6, SBI_EXT_FWFT_SET
> >> >> >> +   li a0, SBI_FWFT_SHADOW_STACK
> >> >> >> +   li a1, 1 /* enable supervisor to access shadow stack access 
> >> >> >> */
> >> >> >> +   li a2, SBI_FWFT_SET_FLAG_LOCK
> >> >> >> +   ecall
> >> >> >> scs_load_current
> >> >> >>
> >> >> >>  #ifdef CONFIG_KASAN
> >> >> >>
> >> >> >> --
> >> >> >> 2.45.0
> >> >> >>
> >> >> >Should we clear the SBI_FWFT_SET_FLAG_LOCK before the cpu hotplug
> >> >> >otherwise the menvcfg.sse won't be set by the fwft set sbi call when
> >> >> >the hotplug cpu back to kernel?
> >> >>
> >> >> Hmm...
> >> >>
> >> >> An incoming hotplug CPU has no features setup on it.
> >> >> I see that `sbi_cpu_start` will supply `secondary_start_sbi` as start
> >> >> up code for incoming CPU. `secondary_start_sbi` is in head.S which 
> >> >> converges
> >> >> in `.Lsecondary_start_common`. And thus hotplugged CPU should be
> >> >> issuing shadow stack set FWFT sbi as well.
> >> >>
> >> >> Am I missing something ?
> >> >>
> >> >This is the correct flow. However the opensbi will deny it due to the
> >> >SBI_FWFT_SET_FLAG_LOCK already being set.
> >> >So the menvcfg.sse will not set by this flow.
> >> >
> >> >if (conf->flags & SBI_FWFT_SET_FLAG_LOCK)
> >> >return SBI_EDENIED;
> >> >
> >>
> >> hmm... Why?
> >>
> >> `conf` is pointing to per-hart state in firmware.
> >>
> >> On this incoming cpu, opensbi (or equivalent) firmware must have
> >> ensured that this per-hart state doesn't have lock set.
> >>
> >> Am I missing something?
> >>
> >Current OpenSBI doesn't clear the lock in the warm init of the hotplug path.
> >It seems like we need a patch to address it.
>
> Got it thanks.
> Since you already know what's the problem, can you send a patch to opensbi.
> If you want rather have me do it, let me know. Thanks.
>
No problem. I'll send a patch to opensbi.

Regards,
Nick
> >
> >Regards,
> >Nick
> >> >Regards,
> >> >Nick
> >> >> >
> >> >> >Regards,
> >> >> >Nick
> >> >> >>
> >> >> >> ___
> >> >> >> linux-riscv mailing list
> >> >> >> linux-ri...@lists.infradead.org
> >> >> >> http://lists.infradead.org/mailman/listinfo/linux-riscv



[PATCH v4] Documentation/CoC: spell out enforcement for unacceptable behaviors

2024-11-14 Thread Shuah Khan
The Code of Conduct committee's goal first and foremost is to bring about
change to ensure our community continues to foster respectful discussions.

In the interest of transparency, the CoC enforcement policy is formalized
for unacceptable behaviors.

Update the Code of Conduct Interpretation document with the enforcement
information.

Acked-by: Linus Torvalds 
Acked-by: Greg Kroah-Hartman 
Acked-by: Miguel Ojeda 
Acked-by: Dave Hansen 
Acked-by: Jonathan Corbet 
Acked-by: Steven Rostedt 
Acked-by: Dan Williams 
Acked-by: Theodore Ts'o 
Acked-by: Konstantin Ryabitsev 
Signed-off-by: Shuah Khan 
---

Changes since v3:
- Modifies kernel.org actions as per Konstantin's comments
- Adds Konstantin's Ack

Changes since v2:
- Adds details on the process leading up to the proposed
  measures to seek public apology and bans for serious
  unacceptable behaviors.

- Hope this addresses your comments, Daniel Vetter,
  Laurent Pinchart, and Mark Brown.

- Would like to get this into 6.12 if at all possible.

Changes since v1:
- Updates Acks with Ted's ack.
- Fixes subsection formatting as per Randy's suggestion.
- Fixes a spelling error.

 .../code-of-conduct-interpretation.rst| 87 +++
 1 file changed, 87 insertions(+)

diff --git a/Documentation/process/code-of-conduct-interpretation.rst 
b/Documentation/process/code-of-conduct-interpretation.rst
index 66b07f14714c..1d1150954be3 100644
--- a/Documentation/process/code-of-conduct-interpretation.rst
+++ b/Documentation/process/code-of-conduct-interpretation.rst
@@ -156,3 +156,90 @@ overridden decisions including complete and identifiable 
voting details.
 Because how we interpret and enforce the Code of Conduct will evolve over
 time, this document will be updated when necessary to reflect any
 changes.
+
+Enforcement for Unacceptable Behavior Code of Conduct Violations
+
+
+The Code of Conduct committee works to ensure that our community continues
+to be inclusive and fosters diverse discussions and viewpoints, and works
+to improve those characteristics over time. A majority of the reports the
+Code of Conduct Committee receives stem from incorrect understanding regarding
+the development process and maintainers' roles, responsibilities, and their
+right to make decisions on code acceptance. These are resolved through
+clarification of the development process and the scope of the Code of Conduct.
+
+Unacceptable behaviors could interrupt respectful collaboration for a short
+period of time and negatively impact the health of the community longer term.
+Unacceptable behaviors often get resolved when individuals acknowledge their
+behavior and make amends for it in the setting the violation has taken place.
+
+The Code of Conduct Committee receives reports about unacceptable behaviors
+when they don't get resolved through community discussions. The Code of
+Conduct committee takes measures to restore productive and respectful
+collaboration when an unacceptable behavior has negatively impacted that
+relationship.
+
+The Code of Conduct Committee has the obligation to keep the reports and
+reporters' information private. Reports could come from injured parties
+and community members who are observers of unacceptable behaviors. The
+Code of Conduct Committee has the responsibility to investigate and resolve
+these reports, working with all involved parties.
+
+The Code of Conduct Committee works with the individual to bring about
+change in their understanding of the importance to repair the damage caused
+by their behavior to the injured party and the long term negative impact
+on the community.
+
+The goal is to reach a resolution which is agreeable to all parties. If
+working with the individual fails to bring about the desired outcome, the
+Code of Conduct Committee will evaluate other measures such as seeking
+public apology to repair the damage.
+
+Seek public apology for the violation
+~
+
+The Code of Conduct Committee publicly calls out the behavior in the
+setting in which the violation has taken place, seeking public apology
+for the violation.
+
+A public apology for the violation is the first step towards rebuilding
+the trust. Trust is essential for the continued success and health of the
+community which operates on trust and respect.
+
+Remedial measures if there is no public apology for the violation
+~
+
+The Code of Conduct Committee determines the next course of action to restore
+the healthy collaboration by recommending remedial measure(s) to the TAB for
+approval.
+
+- Ban violator from participating in the kernel development process for
+  a period of up to a full kernel development cycle. The Code of Conduct
+  Committee could require public apology as a condition for lifting the
+  ban.
+
+The scope of the ban for a period of time could include:
+
+a. denyi

Re: [PATCH] arm64: Allow CONFIG_AUTOFDO_CLANG to be selected

2024-11-14 Thread Rong Xu
Hi Yabin,

Thanks for working to enable this on ARM64 and test the performance on Android!
Please see my comments below.

Thanks,

-Rong

On Thu, Nov 14, 2024 at 9:11 PM Yabin Cui  wrote:
>
> Select ARCH_SUPPORTS_AUTOFDO_CLANG to allow AUTOFDO_CLANG to be
> selected.
>
> On ARM64, ETM traces can be recorded and converted to AutoFDO profiles.
> Experiments on Android show 4% improvement in cold app startup time
> and 13% improvement in binder benchmarks.
>
> Signed-off-by: Yabin Cui 
> ---
>  Documentation/dev-tools/autofdo.rst | 18 +-
>  arch/arm64/Kconfig  |  1 +
>  2 files changed, 18 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/dev-tools/autofdo.rst 
> b/Documentation/dev-tools/autofdo.rst
> index 1f0a451e9ccd..f0952e3e8490 100644
> --- a/Documentation/dev-tools/autofdo.rst
> +++ b/Documentation/dev-tools/autofdo.rst
> @@ -55,7 +55,7 @@ process consists of the following steps:
> workload to gather execution frequency data. This data is
> collected using hardware sampling, via perf. AutoFDO is most
> effective on platforms supporting advanced PMU features like
> -   LBR on Intel machines.
> +   LBR on Intel machines, ETM traces on ARM machines.
>
>  #. AutoFDO profile generation: Perf output file is converted to
> the AutoFDO profile via offline tools.
> @@ -141,6 +141,22 @@ Here is an example workflow for AutoFDO kernel:
>
>$ perf record --pfm-events RETIRED_TAKEN_BRANCH_INSTRUCTIONS:k -a -N 
> -b -c  -o  -- 
>
> +   - For ARM platforms:

The instructions for SPE might be different. Can we change to "- For
ARM platforms with ETM trace:"

> +
> + Follow the instructions in the `Linaro OpenCSD document
> + 
> https://github.com/Linaro/OpenCSD/blob/master/decoder/tests/auto-fdo/autofdo.md`_
> + to record ETM traces for AutoFDO::
> +
> +  $ perf record -e cs_etm/@tmc_etr0/k -a -o  -- 
> +  $ perf inject -i  -o  --itrace=i59il
> +
> + For ARM platforms running Android, follow the instructions in the
> + `Android simpleperf document
> + 
> `_
> + to record ETM traces for AutoFDO::

The instructions in "Step 3: Convert ETM data to AutoFDO profile"
currently use create_llvm_prof to generate
 a "binary" profile format. This is incompatible with the default
FSAFDO format used for the kernel, which
 requires an "extbinary" format.

To correct this, please update the instructions to include the flag
"-format extbinary" in the
create_llvm_prof command.

Using a non-FSAFDO profile with FSAFDO can negatively impact performance.
Therefore, I recommend rerunning the test with the updated flag to
potentially achieve better results.

> +
> +  $ simpleperf record -e cs-etm:k -a -o  -- 
> +
>  4) (Optional) Download the raw perf file to the host machine.
>
>  5) To generate an AutoFDO profile, two offline tools are available:
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index fd9df6dcc593..c3814df5e391 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -103,6 +103,7 @@ config ARM64
> select ARCH_SUPPORTS_PER_VMA_LOCK
> select ARCH_SUPPORTS_HUGE_PFNMAP if TRANSPARENT_HUGEPAGE
> select ARCH_SUPPORTS_RT
> +   select ARCH_SUPPORTS_AUTOFDO_CLANG
> select ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH
> select ARCH_WANT_COMPAT_IPC_PARSE_VERSION if COMPAT
> select ARCH_WANT_DEFAULT_BPF_JIT
> --
> 2.47.0.338.g60cca15819-goog
>



Re: [PATCH v3] Documentation/CoC: spell out enforcement for unacceptable behaviors

2024-11-14 Thread Shuah Khan

On 11/14/24 07:38, Konstantin Ryabitsev wrote:

On Wed, Nov 13, 2024 at 04:25:57PM -0700, Shuah Khan wrote:

+The scope of the ban for a period of time could include:
+
+a. denying patch contributions and pull requests
+b. pausing collaboration with the violator by ignoring their
+   contributions and/or blocking their email account(s)
+c. blocking their access to kernel.org accounts and mailing lists


Can we change this to:

 c. restricting their ability to communicate via kernel.org platforms,
such as mailing lists and social media sites

It makes more sense to phrase it this way, because it's really the
communication that is the focus of this policy, not general access like git,
patchwork, etc.



Thank you Konstantin. Correct. The intent is to restrict communication.
The way you phrased it makes perfect sense. I will make the change.

thanks,
-- Shuah




Re: [PATCH v3] Documentation/CoC: spell out enforcement for unacceptable behaviors

2024-11-14 Thread Konstantin Ryabitsev
On Thu, Nov 14, 2024 at 08:01:34AM -0700, Shuah Khan wrote:
> >  c. restricting their ability to communicate via kernel.org platforms,
> > such as mailing lists and social media sites
> > 
> > It makes more sense to phrase it this way, because it's really the
> > communication that is the focus of this policy, not general access like git,
> > patchwork, etc.
> > 
> 
> Thank you Konstantin. Correct. The intent is to restrict communication.
> The way you phrased it makes perfect sense. I will make the change.

Thank you!

Acked-by: Konstantin Ryabitsev 

-K



Re: [PATCH 2/3] KVM: x86: Add support for VMware guest specific hypercalls

2024-11-14 Thread Doug Covelli
On Wed, Nov 13, 2024 at 12:59 PM Paolo Bonzini  wrote:
>
> On 11/13/24 17:24, Doug Covelli wrote:
> >> No worries, you're not hijacking :) The only reason is that it would
> >> be more code for a seldom used feature and anyway with worse performance.
> >> (To be clear, CR8 based accesses are allowed, but stores cause an exit
> >> in order to check the new TPR against IRR. That's because KVM's API
> >> does not have an equivalent of the TPR threshold as you point out below).
> >
> > I have not really looked at the code but it seems like it could also
> > simplify things as CR8 would be handled more uniformly regardless of
> > who is virtualizing the local APIC.
>
> Not much because CR8 basically does not exist at all (it's just a byte
> in memory) with userspace APIC.  So it's not easy to make it simpler, even
> though it's less uniform.
>
> That said, there is an optimization: you only get KVM_EXIT_SET_TPR if
> CR8 decreases.
>
> >>> Also I could not find these documented anywhere but with MSFT's APIC our 
> >>> monitor
> >>> relies on extensions for trapping certain events such as INIT/SIPI plus 
> >>> LINT0
> >>> and SVR writes:
> >>>
> >>> UINT64 X64ApicInitSipiExitTrap: 1; // 
> >>> WHvRunVpExitReasonX64ApicInitSipiTrap
> >>> UINT64 X64ApicWriteLint0ExitTrap  : 1; // 
> >>> WHvRunVpExitReasonX64ApicWriteTrap
> >>> UINT64 X64ApicWriteLint1ExitTrap  : 1; // 
> >>> WHvRunVpExitReasonX64ApicWriteTrap
> >>> UINT64 X64ApicWriteSvrExitTrap: 1; // 
> >>> WHvRunVpExitReasonX64ApicWriteTrap
> >>
> >> There's no need for this in KVM's in-kernel APIC model. INIT and
> >> SIPI are handled in the hypervisor and you can get the current
> >> state of APs via KVM_GET_MPSTATE. LINT0 and LINT1 are injected
> >> with KVM_INTERRUPT and KVM_NMI respectively, and they obey IF/PPR
> >> and NMI blocking respectively, plus the interrupt shadow; so
> >> there's no need for userspace to know when LINT0/LINT1 themselves
> >> change. The spurious interrupt vector register is also handled
> >> completely in kernel.
> >
> > I realize that KVM can handle LINT0/SVR updates themselves but our
> > interrupt subsystem relies on knowing the current values of these
> > registers even when not virtualizing the local APIC.  I suppose we
> > could use KVM_GET_LAPIC to sync things up on demand but that seems
> > like it might nor be great from a performance point of view.
>
> Ah no, you're right---you want to track the CPU that has ExtINT enabled
> and send KVM_INTERRUPT to that one, I guess?  And you need the spurious
> vector registers because writes can set the mask bit in LINTx, but
> essentially you want to trap LINT0 changes.
>
> Something like this (missing the KVM_ENABLE_CAP and KVM_CHECK_EXTENSION
> code) is good, feel free to include it in your v2 (Co-developed-by
> and Signed-off-by me):
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 5fb29ca3263b..b7dd89c99613 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -122,6 +122,7 @@
>   #define KVM_REQ_HV_TLB_FLUSH \
> KVM_ARCH_REQ_FLAGS(32, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
>   #define KVM_REQ_UPDATE_PROTECTED_GUEST_STATE  KVM_ARCH_REQ(34)
> +#define KVM_REQ_REPORT_LINT0_ACCESSKVM_ARCH_REQ(35)
>
>   #define CR0_RESERVED_BITS   \
> (~(unsigned long)(X86_CR0_PE | X86_CR0_MP | X86_CR0_EM | X86_CR0_TS \
> @@ -775,6 +776,7 @@ struct kvm_vcpu_arch {
> u64 smi_count;
> bool at_instruction_boundary;
> bool tpr_access_reporting;
> +   bool lint0_access_reporting;
> bool xfd_no_write_intercept;
> u64 ia32_xss;
> u64 microcode_version;
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 88dc43660d23..0e070f447aa2 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -1561,6 +1561,21 @@ static u32 apic_get_tmcct(struct kvm_lapic *apic)
>   apic->divide_count));
>   }
>
> +static void __report_lint0_access(struct kvm_lapic *apic, u32 value)
> +{
> +   struct kvm_vcpu *vcpu = apic->vcpu;
> +   struct kvm_run *run = vcpu->run;
> +
> +   kvm_make_request(KVM_REQ_REPORT_LINT0_ACCESS, vcpu);
> +   run->lint0_access.value = value;
> +}
> +
> +static inline void report_lint0_access(struct kvm_lapic *apic, u32 value)
> +{
> +   if (apic->vcpu->arch.lint0_access_reporting)
> +   __report_lint0_access(apic, value);
> +}
> +
>   static void __report_tpr_access(struct kvm_lapic *apic, bool write)
>   {
> struct kvm_vcpu *vcpu = apic->vcpu;
> @@ -2312,8 +2327,10 @@ static int kvm_lapic_reg_write(struct kvm_lapic *apic, 
> u32 reg, u32 val)
> int i;
>
> for (i = 0; i < apic->nr_lvt_entries; i++) {
> -   kvm_lapic_set_reg(apic, APIC_LVTx(i),
> -   kvm_lapic_get_reg(apic, APIC_LVTx(i)) 
> | APIC_L

Re: [PATCH v3] Documentation/CoC: spell out enforcement for unacceptable behaviors

2024-11-14 Thread Konstantin Ryabitsev
On Wed, Nov 13, 2024 at 04:25:57PM -0700, Shuah Khan wrote:
> +The scope of the ban for a period of time could include:
> +
> +a. denying patch contributions and pull requests
> +b. pausing collaboration with the violator by ignoring their
> +   contributions and/or blocking their email account(s)
> +c. blocking their access to kernel.org accounts and mailing lists

Can we change this to:

c. restricting their ability to communicate via kernel.org platforms,
   such as mailing lists and social media sites

It makes more sense to phrase it this way, because it's really the
communication that is the focus of this policy, not general access like git,
patchwork, etc.

-K



Re: [PATCH v8 24/29] riscv: enable kernel access to shadow stack memory via FWFT sbi call

2024-11-14 Thread Deepak Gupta



Hi Nick,

Thanks for reviewing and helping.

On Thu, Nov 14, 2024 at 02:17:30PM +0800, Nick Hu wrote:

Hi Deepak

On Thu, Nov 14, 2024 at 9:25 AM Deepak Gupta  wrote:


On Thu, Nov 14, 2024 at 09:20:14AM +0800, Nick Hu wrote:
>Hi Deepak
>
>On Thu, Nov 14, 2024 at 9:06 AM Deepak Gupta  wrote:
>> >> diff --git a/arch/riscv/kernel/head.S b/arch/riscv/kernel/head.S
>> >> index 356d5397b2a2..6244408ca917 100644
>> >> --- a/arch/riscv/kernel/head.S
>> >> +++ b/arch/riscv/kernel/head.S
>> >> @@ -164,6 +164,12 @@ secondary_start_sbi:
>> >> call relocate_enable_mmu
>> >>  #endif
>> >> call .Lsetup_trap_vector
>> >> +   li a7, SBI_EXT_FWFT
>> >> +   li a6, SBI_EXT_FWFT_SET
>> >> +   li a0, SBI_FWFT_SHADOW_STACK
>> >> +   li a1, 1 /* enable supervisor to access shadow stack access */
>> >> +   li a2, SBI_FWFT_SET_FLAG_LOCK
>> >> +   ecall
>> >> scs_load_current
>> >> call smp_callin
>> >>  #endif /* CONFIG_SMP */
>> >> @@ -320,6 +326,12 @@ SYM_CODE_START(_start_kernel)
>> >> la tp, init_task
>> >> la sp, init_thread_union + THREAD_SIZE
>> >> addi sp, sp, -PT_SIZE_ON_STACK
>> >> +   li a7, SBI_EXT_FWFT
>> >> +   li a6, SBI_EXT_FWFT_SET
>> >> +   li a0, SBI_FWFT_SHADOW_STACK
>> >> +   li a1, 1 /* enable supervisor to access shadow stack access */
>> >> +   li a2, SBI_FWFT_SET_FLAG_LOCK
>> >> +   ecall
>> >> scs_load_current
>> >>
>> >>  #ifdef CONFIG_KASAN
>> >>
>> >> --
>> >> 2.45.0
>> >>
>> >Should we clear the SBI_FWFT_SET_FLAG_LOCK before the cpu hotplug
>> >otherwise the menvcfg.sse won't be set by the fwft set sbi call when
>> >the hotplug cpu back to kernel?
>>
>> Hmm...
>>
>> An incoming hotplug CPU has no features setup on it.
>> I see that `sbi_cpu_start` will supply `secondary_start_sbi` as start
>> up code for incoming CPU. `secondary_start_sbi` is in head.S which converges
>> in `.Lsecondary_start_common`. And thus hotplugged CPU should be
>> issuing shadow stack set FWFT sbi as well.
>>
>> Am I missing something ?
>>
>This is the correct flow. However the opensbi will deny it due to the
>SBI_FWFT_SET_FLAG_LOCK already being set.
>So the menvcfg.sse will not set by this flow.
>
>if (conf->flags & SBI_FWFT_SET_FLAG_LOCK)
>return SBI_EDENIED;
>

hmm... Why?

`conf` is pointing to per-hart state in firmware.

On this incoming cpu, opensbi (or equivalent) firmware must have
ensured that this per-hart state doesn't have lock set.

Am I missing something?


Current OpenSBI doesn't clear the lock in the warm init of the hotplug path.
It seems like we need a patch to address it.


Got it thanks.
Since you already know what's the problem, can you send a patch to opensbi.
If you want rather have me do it, let me know. Thanks.



Regards,
Nick

>Regards,
>Nick
>> >
>> >Regards,
>> >Nick
>> >>
>> >> ___
>> >> linux-riscv mailing list
>> >> linux-ri...@lists.infradead.org
>> >> http://lists.infradead.org/mailman/listinfo/linux-riscv




Re: [PATCH v7 13/13] Documentation: userspace-api: iommufd: Update vIOMMU

2024-11-14 Thread Jason Gunthorpe
On Wed, Nov 13, 2024 at 07:18:42PM -0800, Nicolin Chen wrote:
> > so the user would try to create vDevices with a given viommu_obj until
> > failure, then it would allocate another viommu_obj for the failed device.
> > is it? sounds reasonable.
> 
> Yes. It is the same as previously dealing with a nesting parent:
> test and allocate if fails. The virtual IOMMU driver in VMM can
> keep a list of the vIOMMU objects for each device to test.

The viommu object should be tied to the VMM's vIOMMU vHW object that
it is paravirtualizing toward the VM.

So we shouldn't be creating viommu objects on demand, it should be
created when the vIOMMU is created, and the presumably the qemu
command line will describe how to link vPCI/VFIO functions to vIOMMU
instances. If they kernel won't allow the user's configuration then it
should fail, IMHO.

Some try-and-fail might be interesting to auto-provision vIOMMU's and
provision vPCI functions. Though I suspect we will be providing
information in other ioctls so something like libvirt can construct
the correct configuration directly.

Jason