[thanks for the extensive answers]

On Mon, Dec 6, 2021 at 10:13 PM Kent Mcleod <[email protected]> wrote:

> On Wed, Dec 1, 2021 at 9:59 AM Sam Leffler via Devel <[email protected]>
> wrote:
> >
> > Some updates inline. Especially interested in answers for #5 and #8.
> >
> > On Wed, Nov 24, 2021 at 9:12 PM Sam Leffler <[email protected]> wrote:
> >
> > > I've been investigating how capd-loader works and have some questions.
> My
> > > apologies if these were previously discussed.
> > >
> > > 1. CDL_Model is mutable. The capDL specification looks like a perfect
> > > candidate for being in .rodata but it's stored+treated as mutable.
> Why? I
> > > see capdl-loader uses the mutability to store mapped page frame caps
> and,
> > > for risc-v, VSpace roots, but both can be easily handled w/o modifying
> the
> > > spec. Having this immutable has many benefits.
> > >
>
> It's currently mutable because of the updates to it as it is
> processed. If these updates were removed, then it would be a good idea
> to make it immutable. How would you handle the mapped_frame_cap
> bookkeeping without storing it in the spec as a single frame could
> have multiple shared mappings which each require a cap?
>

I handled the 2 cases on risc-v by:

1. Instead of marking vspace roots in the model, keeping a separate set of
obj id's. I collect the set on the stack while creating objects and then
record it "elsewhere" for later use. I actually use a data structure that
has a fixed amount of storage & automatically spills to the heap on
overflow. 32 entries works for me w/o spilling but obviously it's
spec-dependent.
2. For the shared mappings I re-use the equivalent of
the capdl_to_sel4_copy array. Since that array is only used for TCB & CNode
dup'd caps there appear to be no conflicts/overlaps.

I think collecting vspace roots by checking for CDL_TCB_VTable_Slot in the
TCB works for all platforms. The benefit of tracking them separately is you
can avoid walking the object list in init_pd_asids & init_vspace.


> > > 2. All page frames are treated as shared. CAPDL_SHARED_FRAMES is
> #define'd
> > > in the code so every frame is treated as shared. This inflates the
> size of
> > > the orig_caps array and the kernel capabilities table but seems
> > > unnecessary. Why? I've verified that instead of doing the Copy op on
> each
> > > page you can just check the return of the seL4_Page_Map call and if
> > > necessary clone the cap. Better would be for camkes to identify shared
> page
> > > frames so this doesn't require 2x syscalls (which spam's the console w/
> > > CONFIG_PRINTING enabled).
> > >
>
> What you suggest would be possible.  For some object types (like an
> endpoint) the original cap is special and can be used to revoke all
> children caps that have been derived from it. All frame types don't
> have this property which means that the original cap is no different
> from any copies of it. Embedding whether the frame is shared into the
> object as an attribute would make this easier to optimize as I agree
> that it becomes pretty inefficient in typical systems with many more
> private frames than shared.
>

Thanks for pointing out the issue with endpoints. I was only suggesting
doing this copy-on-write style handling for CDL_Frame objects.  This
approach lets me halve the size of capdl_to_sel4_copy & capdl_to_sel4_orig.

>
> > > 3. Why does copy_addr_with_pt exist? There are two memory regions
> created
> > > for mapping page frames into the rootserver address space for doing
> fill
> > > ops. One is 64KB and the other is 1 page (nominally 4KB). init_frame
> tries
> > > the 4KB region first then falls back to the 64K region if mapping into
> the
> > > 4KB region fails. Why (i.e. why would the 1st mapping call fail)? On my
> > > risc-v system I never use the 64K region and I'm not sure when it
> might be
> > > used? Even if it were used, why is 64K always allocated; seems like the
> > > region should be sized according to the target platform. Having this
> extra
> > > region has a cost that can be significant on small systems.
> > >
> >
> > I got this backwards; on risc-v at least the 64KB region is used but not
> > the page immediately after the ipc buffer. On my system I eliminated the
> > latter and shrank the bss region to 1 page. Looking for feedback on
> whether
> > this will fail on other platforms.
> >
>
> Short answer: For copy_addr_with_pt, yes we could make the 64kiB
> definition aarch32 specific and 4kiB on other platforms.
> Long answer: It looks like copy_addr_with_pt is defined to be an
> address range that is guaranteed to have mapped page tables down to
> the last level of the page table hierarchy and is therefore suitable
> for mapping 4kiB pages into (or 64k pages only on aarch32 which is the
> only supported arch which allows larger pages to also be mapped in at
> the lowest level page table entries.  copy_addr on the other hand is
> guaranteed to not have a mapped page table at the lowest level of the
> page table hierarchy which allows for 1MiB, 2MiB, 4MiB and 16MiB page
> sizes to be mapped here and achieves this by picking a virtual address
> with alignment of 16MiB above the end of the image but still expecting
> this to be within the page table that is one level up from the lowest
> level page table in the hierarchy.
> Then init_frame tries either region when trying to initialize a frame
> of varying size between 4kiB and 16MiB.  Frames the next size up,
> 1GiB, would not be able to be mapped by the loader without adding a
> new copy_addr variable that has larger alignment to guarantee it
> doesn't collide with a page table at the next level up.
>

Thank you. Still not sure whether this means it's possible to eliminate the
copy_addr support (sounds like it might be arch-dependent?). Is the
multiple page sizes for an x86 target? (so I can test my approaches).


> > >
> > > 4. console output. Why is sel4_DebugPutChar (or similar) not a standard
> > > part of the kernel? This forces capdl-loader to depend on
> > > libsel4platsupport for console output which in turn affects cap space.
> I
> > > can't imagine an seL4 system that cannot write to the console and I
> think
> > > it's entirely reasonable to have a critical system service like the
> > > rootserver use that.
> > >
> >
>
> The verified kernel configurations don't have a uart driver in the
> kernel.  This decision follows from an L4 design principle that the
> kernel doesn't provide an implementation for anything that could be
> provided with an implementation outside of the kernel.  In practice,
> if you know that the loader is only going to be used on a specific
> platform, then you could write a simple specialized uart driver that
> you link in instead of the entirety of libplatsupport.
>
> > > 5. duplicate_caps.  The scheme for constructing a running system from a
> > > spec is straightforward except for duplicating caps. Why are all TCB's
> and
> > > CNode caps dup'd? Why does init_tcbs call init_tcb on the original
> object
> > > but immediately configure_tcb on the dup? There are other places where
> the
> > > dup cap is used instead of the original cap (e.g. init_cnode_slot).
> None of
> > > this seems to matter on risc-v; maybe this matters on some other
> > > architectures?
> > >
>
> I don't know the reason for this and it's older than the git history I
> have access to.  It looks like its unnecessary for TCBs, and is likely
> still necessary for CNodes when loading specs that have CNodes that
> are moved into slots in other CNodes when constructing a multi-cnode
> CSpace.
>

Is there an example in the tree of a nested CNode spec? I'm still not clear
on why the dup is required so being able to repro this would be helpful.


> > > 6. On risc-v all pages appear to be marked executable--including ipc
> > > buffers! This seems just wrong and a potential security hole.
> > >
> >
> > This looks to be a bug. Marking the NX bit on risc-v when the grant right
> > is not set gives me what I want. arm already did this.
> >
>
> It's a bug because it doesn't do it the same way as Arm, but the way
> arm does it is a bit misleading. The kernel itself only uses the read
> and write bits from cap rights for frame mappings, and a cap with
> write-only rights gets downgraded to kernel-only mapping, so really
> the kernel only lets you describe read-write, read-only or none.  For
> both read-write and read-only cap abilities, the frame could be mapped
> exec or non-exec.  The read right implies executable rights from the
> kernel's perspective.  The capdl loader does add a little bit of extra
> security with using the grant right in the spec to imply whether the
> frame has executable permissions and tries to create the mapping
> accordingly (except on riscv which is a bug), but if the frame cap for
> this mapping is also given to the component in order to do things like
> cache maintenance invocations, then the component could use that
> capability to convert it's non-exec mappings to executable mappings.
>
> There's probably a good argument for overhauling how the kernel
> handles frame rights so that you can express executeNever as a
> capability right and associate it with the grant or grantreply bit in
> the existing capRights.  I think this wasn't done originally because
> either there were no spare bits in the cap structure at the time, or
> armv6 didn't support executeNever mappings (but you'd have to
> fact-check me on that).
>
>
> > >
> > > 7. Why does seL4_TCB_WriteRegisters take a _mutable_ seL4_UserContext?
> I'm
> > > guessing this is a mistake in the xml spec (but haven't looked).
> > >
> >
> > This looks to be a shortcoming in the python script that generates the
> > syscall stubs.
> >
> Correct.
>
> > >
> > > 8. init_cspaces mystifies me. There are two loops over all CNodes; one
> > > that does "COPY" operations and one that does "MOVE" operations. I see
> > > nothing explaining what's happening. I've read this code many times and
> > > still don't get it; please explain.
> > >
>
> As I mentioned higher up, some caps are special as originals and allow
> additional operations to be performed on them compared to their copies
> even if the cap rights are the same for both.  Initializing all of the
> cnodes in the system involves moving capabilities (seL4_CNode_Move) or
> deriving capabilities from source capabilities either with the same
> rights (seL4_CNode_Copy) or lesser rights (seL4_CNode_Mint).  Its
> split into two passes because the derivation of capabilities need to
> happen before the source capabilities are moved into the destination
> CNode.
>
>
> > > 9. According to comments, the use of a global static to setup the
> > > UserContext for a new TCB is done because one cannot take the address
> of a
> > > local variable. On what architecture is this true? On risc-v the
> context is
> > > copied into "registers" before calling the kernel so this doesn't
> appear to
> > > be an issue with crossing the user-kernel protection boundary.
> > >
>
> This is true for the kernel with the proofs.  The very first
> implementation of this loader was as part of a project that wanted to
> verify it using the same tooling as the kernel proofs.  So that
> comment is probably documenting a requirement that is no longer
> present.
>
> > > 10. Memory reclamation. I'm unclear what happens to the memory used by
> > > capdl-loader. The process does a TCB_Suspend call when it's done. Does
> this
> > > cause the CSpace & VSpace to be reclaimed? For CAmkES this doesn't
> matter
> > > as there's no support for allocating memory but I'm building a system
> where
> > > this isn't true. How do I release memory (in particular) back to the
> > > untyped pool?
> > >
> >
> > This was discussed on the developers zoom call so no need to belabor it.
> > Sounds like I've got my work cut out for me...
> >
>
> Yea, this one is a bit more involved.
>

It appears I need this so may be back w/ more questions :).

>
> >
> > > I have a bunch of niggly code complaints/questions but let's start with
> > > the above.
> > >
> > > -Sam
> > >
> > _______________________________________________
> > Devel mailing list -- [email protected]
> > To unsubscribe send an email to [email protected]
>
_______________________________________________
Devel mailing list -- [email protected]
To unsubscribe send an email to [email protected]

Reply via email to