[PATCH] add some qemu debugging notes

2022-01-26 Thread Luca Dariz
Signed-off-by: Luca Dariz 
---
 microkernel/mach/gnumach/debugging.mdwn | 48 -
 1 file changed, 47 insertions(+), 1 deletion(-)

diff --git a/microkernel/mach/gnumach/debugging.mdwn 
b/microkernel/mach/gnumach/debugging.mdwn
index 9534c758..a134b618 100644
--- a/microkernel/mach/gnumach/debugging.mdwn
+++ b/microkernel/mach/gnumach/debugging.mdwn
@@ -77,7 +77,53 @@ and then type continue, to let Mach continue execution. The 
debugger will be ent
 
 When you're [[running_a_system_in_QEMU|hurd/running/qemu]] you can directly
 [use GDB on the running
-kernel](http://www.nongnu.org/qemu/qemu-doc.html#SEC48).
+kernel](https://www.qemu.org/docs/master/system/gdb.html).
+
+When debugging 32-bit gnumach, you can specify the kernel file in the
+command line with the `-kernel` option and the boot modules with
+`-initrd`, as described in [[hurd/running/qemu]].  This however does
+not work for 64-bit gnumach, due to a [limitation in
+qemu](https://gitlab.com/qemu-project/qemu/-/issues/243).  To overcome
+this, you can either patch qemu to enable multiboot also for 64-bit
+ELF, or build a bootable ISO image with `grub-mkrescue`.
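+
+As a rough sketch of the ISO approach (file names and paths are just an
+example), put the kernel in a directory tree with a minimal grub config
+and run `grub-mkrescue` on it:
+
+$ mkdir -p iso/boot/grub
+$ cp gnumach iso/boot/
+$ cat > iso/boot/grub/grub.cfg << 'EOF'
+menuentry "GNU Mach" {
+    multiboot /boot/gnumach console=com0
+}
+EOF
+$ grub-mkrescue -o gnumach.iso iso
+
+A real Hurd system additionally needs `module` lines for the bootstrap
+servers, as in a normal grub.cfg.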
+
+To enable the gdbserver on a running instance, you need to access the
+qemu monitor and use the `gdbserver` command. For example, with
+libvirt/virt-manager
+
$ virsh --connect qemu:///session qemu-monitor-command --domain hurd --hmp --cmd gdbserver
+
+Otherwise, if you start qemu manually, you can use the `-s` and `-S`
+shortcuts: `-s` opens a gdb server on TCP port 1234, and `-S` makes
+qemu wait for gdb to attach before starting the VM.
+
+If you don't need a graphical interface, e.g. when working on the
+boot process, you can have the emulated serial port use stdio with
+`-nographic`, and append `console=com0` to the kernel command line,
+either in grub or with the `-append` option.
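+
+For example, a minimal invocation for a 32-bit gnumach image could look
+like this (memory size and file names are just placeholders):
+
+$ qemu-system-i386 -m 1G -kernel gnumach -append "console=com0" -nographic -s -S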
+
+Once qemu has started, you can connect to the gdbserver with
+
+$ gdb gnumach
+...
+(gdb) target remote :1234
+(gdb) c
+
+You can also automate some steps with a `.gdbinit` file in your
+working directory. For example:
+
+set print pretty
+target remote :1234
+# let's set some breakpoints
+b Panic
+b c_boot_entry
+b user_bootstrap
+b ../i386/intel/pmap.c:1981
+# we can also refer to virtual addresses in userspace
+b *0x804901d
+# this shows the instruction being executed
+display/i $pc
+layout asm
 
 
 ## [[open_issues/debugging_gnumach_startup_qemu_gdb]]
-- 
2.30.2




Re: Asking help for a little project

2022-01-27 Thread Luca dariz

Hi!

On 27/01/22 09:49, Alessandro Sangiuliano wrote:
So, assuming the situation where my custom name server boots alongside 
the Hurd as a module from grub, how do I get the other tasks in order to resume them?


I think there could be two ways:
* pass the task ports of all the other tasks on the command line, like 
exec-task in the usual boot script
* retrieve the task list from the host's processor set, if you want 
to be more flexible (see the sketch below).
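
For the second option, a minimal (untested) sketch using the standard
Mach host/processor-set interfaces could look like the following; the
exact headers and the host_priv port (assumed to be obtained elsewhere,
e.g. from the bootstrap) are illustrative:

#include <stdio.h>
#include <mach.h>
#include <mach/mach_host.h>

static void list_all_tasks(mach_port_t host_priv)
{
    processor_set_name_array_t psets;
    mach_msg_type_number_t npsets;

    if (host_processor_sets(host_priv, &psets, &npsets) != KERN_SUCCESS)
        return;

    for (mach_msg_type_number_t i = 0; i < npsets; i++) {
        processor_set_t pset;
        task_array_t tasks;
        mach_msg_type_number_t ntasks;

        /* get the control port for the processor set, then its tasks */
        if (host_processor_set_priv(host_priv, psets[i], &pset) != KERN_SUCCESS)
            continue;
        if (processor_set_tasks(pset, &tasks, &ntasks) == KERN_SUCCESS)
            printf("processor set %u: %u tasks\n", i, ntasks);
        /* (vm_deallocate of the returned arrays omitted for brevity) */
    }
}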


Actually I also have a small question: what is your goal with this 
name server? One possible use, I think, could be to simplify the boot 
process by acting as an initial rendez-vous point for all boot servers 
(I'm thinking for example of the case of booting with a rump-based disk 
server, pci-arbiter and so on).


Luca



[PATCH 2/6] cleanup multiboot

2022-01-28 Thread Luca Dariz
* use _raw_ structs where we refer to the bootloader-provided data
* remove unused structures
* fix 64 bit boot

Signed-off-by: Luca Dariz 
---
 Makefrag.am|   1 -
 i386/i386at/model_dep.c|  23 +++---
 i386/include/mach/i386/multiboot.h | 108 +
 include/mach/multiboot.h   |  82 --
 kern/bootstrap.c   |  20 +-
 5 files changed, 31 insertions(+), 203 deletions(-)
 delete mode 100644 include/mach/multiboot.h

diff --git a/Makefrag.am b/Makefrag.am
index fef1e000..6e74697e 100644
--- a/Makefrag.am
+++ b/Makefrag.am
@@ -404,7 +404,6 @@ include_mach_HEADERS = \
include/mach/message.h \
include/mach/mig_errors.h \
include/mach/msg_type.h \
-   include/mach/multiboot.h \
include/mach/notify.h \
include/mach/pc_sample.h \
include/mach/policy.h \
diff --git a/i386/i386at/model_dep.c b/i386/i386at/model_dep.c
index 21a36bf2..b2a22a42 100644
--- a/i386/i386at/model_dep.c
+++ b/i386/i386at/model_dep.c
@@ -122,7 +122,7 @@ unsigned long *pfn_list = (void*) PFN_LIST;
 unsigned long la_shift = VM_MIN_KERNEL_ADDRESS;
 #endif
 #else  /* MACH_XEN */
-struct multiboot_info boot_info;
+struct multiboot_raw_info boot_info;
 #endif /* MACH_XEN */
 
 /* Command line supplied to kernel.  */
@@ -403,7 +403,7 @@ i386at_init(void)
}
 
if (boot_info.flags & MULTIBOOT_MODS && boot_info.mods_count) {
-   struct multiboot_module *m;
+   struct multiboot_raw_module *m;
int i;
 
if (! init_alloc_aligned(
@@ -591,13 +591,14 @@ void c_boot_entry(vm_offset_t bi)
 * so that the symbol table's memory won't be stomped on.
 */
if ((boot_info.flags & MULTIBOOT_AOUT_SYMS)
-   && boot_info.syms.a.addr)
+   && boot_info.shdr_addr)
{
vm_size_t symtab_size, strtab_size;
 
-   kern_sym_start = (vm_offset_t)phystokv(boot_info.syms.a.addr);
-   symtab_size = (vm_offset_t)phystokv(boot_info.syms.a.tabsize);
-   strtab_size = (vm_offset_t)phystokv(boot_info.syms.a.strsize);
+/* For simplicity we just use a simple boot_info_raw structure 
for elf */
+   kern_sym_start = (vm_offset_t)phystokv(boot_info.shdr_addr);
+   symtab_size = (vm_offset_t)phystokv(boot_info.shdr_num);
+   strtab_size = (vm_offset_t)phystokv(boot_info.shdr_size);
kern_sym_end = kern_sym_start + 4 + symtab_size + strtab_size;
 
printf("kernel symbol table at %08lx-%08lx (%ld,%ld)\n",
@@ -606,12 +607,12 @@ void c_boot_entry(vm_offset_t bi)
}
 
if ((boot_info.flags & MULTIBOOT_ELF_SHDR)
-   && boot_info.syms.e.num)
+   && boot_info.shdr_num)
{
-   elf_shdr_num = boot_info.syms.e.num;
-   elf_shdr_size = boot_info.syms.e.size;
-   elf_shdr_addr = (vm_offset_t)phystokv(boot_info.syms.e.addr);
-   elf_shdr_shndx = boot_info.syms.e.shndx;
+   elf_shdr_num = boot_info.shdr_num;
+   elf_shdr_size = boot_info.shdr_size;
+   elf_shdr_addr = (vm_offset_t)phystokv(boot_info.shdr_addr);
+   elf_shdr_shndx = boot_info.shdr_strndx;
 
printf("ELF section header table at %08lx\n", elf_shdr_addr);
}
diff --git a/i386/include/mach/i386/multiboot.h 
b/i386/include/mach/i386/multiboot.h
index 5a532576..40522d96 100644
--- a/i386/include/mach/i386/multiboot.h
+++ b/i386/include/mach/i386/multiboot.h
@@ -25,31 +25,6 @@
 
 #include 
 
-/* For a.out kernel boot images, the following header must appear
-   somewhere in the first 8192 bytes of the kernel image file.  */
-struct multiboot_header
-{
-   /* Must be MULTIBOOT_MAGIC */
-   unsignedmagic;
-
-   /* Feature flags - see below.  */
-   unsignedflags;
-
-   /*
-* Checksum
-*
-* The above fields plus this one must equal 0 mod 2^32.
-*/
-   unsignedchecksum;
-
-   /* These are only valid if MULTIBOOT_AOUT_KLUDGE is set.  */
-   vm_offset_t header_addr;
-   vm_offset_t load_addr;
-   vm_offset_t load_end_addr;
-   vm_offset_t bss_end_addr;
-   vm_offset_t entry;
-};
-
 /* The entire multiboot_header must be contained
within the first MULTIBOOT_SEARCH bytes of the kernel image.  */
 #define MULTIBOOT_SEARCH   8192
@@ -78,61 +53,7 @@ struct multiboot_header
that the multiboot method is being used */
 #define MULTIBOOT_VALID 0x2badb002
 
-/* The boot loader passes this data structure to the kernel in
-   register EBX on entry.  */
-struct multiboot_info
-{
-   /* These flags indicate which parts of the 

[PATCH 6/6] fix Task State Segment layout for 64 bit

2022-01-28 Thread Luca Dariz
Signed-off-by: Luca Dariz 
---
 i386/i386/i386asm.sym |  4 
 i386/i386/ktss.c  |  8 ++--
 i386/i386/pcb.c   |  4 
 i386/i386/tss.h   | 24 ++--
 4 files changed, 36 insertions(+), 4 deletions(-)

diff --git a/i386/i386/i386asm.sym b/i386/i386/i386asm.sym
index 0662aea0..cfe5549c 100644
--- a/i386/i386/i386asm.sym
+++ b/i386/i386/i386asm.sym
@@ -95,8 +95,12 @@ offset   i386_interrupt_statei   eip
 offset i386_interrupt_statei   cs
 offset i386_interrupt_statei   efl
 
+#ifdef __x86_64__
+offset i386_tsstss rsp0
+#else
 offset i386_tsstss esp0
 offset i386_tsstss ss0
+#endif
 
 offset machine_slotsub_typecpu_type
 
diff --git a/i386/i386/ktss.c b/i386/i386/ktss.c
index 917e6305..24e12cf4 100644
--- a/i386/i386/ktss.c
+++ b/i386/i386/ktss.c
@@ -55,11 +55,15 @@ ktss_init(void)
fill_gdt_sys_descriptor(KERNEL_TSS,
kvtolin(&ktss), sizeof(struct task_tss) - 1,
ACC_PL_K|ACC_TSS, 0);
-
/* Initialize the master TSS.  */
+#ifdef __x86_64__
+   ktss.tss.rsp0 = (unsigned long)(exception_stack+1024);
+   ktss.tss.io_bit_map_offset = IOPB_INVAL;
+#else /* ! __x86_64__ */
ktss.tss.ss0 = KERNEL_DS;
ktss.tss.esp0 = (unsigned long)(exception_stack+1024);
-   ktss.tss.io_bit_map_offset = IOPB_INVAL;

+   ktss.tss.io_bit_map_offset = IOPB_INVAL;
+#endif /* __x86_64__ */
/* Set the last byte in the I/O bitmap to all 1's.  */
ktss.barrier = 0xff;
 
diff --git a/i386/i386/pcb.c b/i386/i386/pcb.c
index 23585323..23b734e3 100644
--- a/i386/i386/pcb.c
+++ b/i386/i386/pcb.c
@@ -153,7 +153,11 @@ void switch_ktss(pcb_t pcb)
if (hyp_stack_switch(KERNEL_DS, pcb_stack_top))
panic("stack_switch");
 #else  /* MACH_RING1 */
+#ifdef __x86_64__
+curr_ktss(mycpu)->tss.rsp0 = pcb_stack_top;
+#else /* __x86_64__ */
curr_ktss(mycpu)->tss.esp0 = pcb_stack_top;
+#endif /* __x86_64__ */
 #endif /* MACH_RING1 */
 }
 
diff --git a/i386/i386/tss.h b/i386/i386/tss.h
index ff25f217..31e1f5cb 100644
--- a/i386/i386/tss.h
+++ b/i386/i386/tss.h
@@ -27,13 +27,33 @@
 #ifndef_I386_TSS_H_
 #define_I386_TSS_H_
 
+#include 
 #include 
 
 #include 
 
 /*
- * i386 Task State Segment
+ * x86 Task State Segment
  */
+#ifdef __x86_64__
+struct i386_tss {
+  uint32_t _reserved0;
+  uint64_t rsp0;
+  uint64_t rsp1;
+  uint64_t rsp2;
+  uint64_t _reserved1;
+  uint64_t ist1;
+  uint64_t ist2;
+  uint64_t ist3;
+  uint64_t ist4;
+  uint64_t ist5;
+  uint64_t ist6;
+  uint64_t ist7;
+  uint64_t _reserved2;
+  uint16_t _reserved3;
+  uint16_t io_bit_map_offset;
+} __attribute__((__packed__));
+#else /* ! __x86_64__ */
 struct i386_tss {
int back_link;  /* segment number of previous task,
   if nested */
@@ -67,7 +87,7 @@ struct i386_tss {
/* offset to start of IO permission
   bit map */
 };
-
+#endif /* __x86_64__ */
 
 /* The structure extends the above TSS structure by an I/O permission bitmap
and the barrier.  */
-- 
2.30.2




[PATCH 0/6] Add initial support for booting x86_64 from grub

2022-01-28 Thread Luca Dariz
These patches enable basic support for booting the x86_64 build on
qemu. It's possible to load the bootstrap modules (32-bit for now) but
they can't do much yet. This seems more or less in line with the xen
port.

Next steps (in no particular order):
* test simple syscalls, e.g. mach_task_self(), mach_print()
* move kernel to higher addresses, even just beyond 4G would be a good
  start
* enhance pmap module to actually use the L4/L3 tables, for now we're
  limited to the first 8G
* fix mig types and test mach_msg() for some rpc.

Note that due to recent changes in mig, compilation for x86_64 seems
to fail currently, and we should fix mig types for this. To test these
patches, you can revert for example to this commit:

63ed32b1 mach_i386: include MACH_I386_IMPORTS

I think however it's better to have this part reviewed, and address
the mig issues in another round.

Luca Dariz (6):
  add support for booting from grub with x86_64
  cleanup multiboot
  fix register corruption in irq on qemu
  fix console setting from cmdline
  enable user access
  fix Task State Segment layout for 64 bit

 Makefrag.am|   1 -
 configure.ac   |   3 +-
 i386/configfrag.ac |   2 +
 i386/i386/i386asm.sym  |   4 +
 i386/i386/ktss.c   |   8 +-
 i386/i386/pcb.c|   4 +
 i386/i386/tss.h|  24 +++-
 i386/i386/vm_param.h   |   4 +-
 i386/i386at/com.c  |   2 +-
 i386/i386at/model_dep.c|  23 ++--
 i386/include/mach/i386/multiboot.h | 108 +--
 i386/intel/pmap.c  |   8 +-
 i386/intel/pmap.h  |   4 +
 include/mach/multiboot.h   |  82 ---
 kern/bootstrap.c   |  20 ++-
 x86_64/Makefrag.am |  18 ++-
 x86_64/boothdr.S   | 214 +
 x86_64/interrupt.S |   6 +-
 x86_64/ldscript|  28 ++--
 x86_64/locore.S|   4 +-
 20 files changed, 336 insertions(+), 231 deletions(-)
 delete mode 100644 include/mach/multiboot.h
 create mode 100644 x86_64/boothdr.S

-- 
2.30.2




[PATCH 1/6] add support for booting from grub with x86_64

2022-01-28 Thread Luca Dariz
* link the kernel at 0x400 as in the xen version; higher values cause
  linker errors.
* we can't use full segmentation in long mode, so we need to create a
  temporary mapping during early boot to be able to jump to high
  addresses
* build a direct map for the first 4G in boothdr (seems required by
  Linux drivers)
* also enable the write page access check in kernel mode

Signed-off-by: Luca Dariz 
---
 configure.ac |   3 +-
 i386/configfrag.ac   |   2 +
 i386/i386/vm_param.h |   4 +-
 i386/intel/pmap.c|   4 +-
 i386/intel/pmap.h|   4 +
 x86_64/Makefrag.am   |  18 +++-
 x86_64/boothdr.S | 214 +++
 x86_64/interrupt.S   |   4 +-
 x86_64/ldscript  |  28 --
 x86_64/locore.S  |   4 +-
 10 files changed, 264 insertions(+), 21 deletions(-)
 create mode 100644 x86_64/boothdr.S

diff --git a/configure.ac b/configure.ac
index 019842db..3aaa935c 100644
--- a/configure.ac
+++ b/configure.ac
@@ -56,8 +56,7 @@ case $host_platform:$host_cpu in
   default:i?86)
 host_platform=at;;
   default:x86_64)]
-AC_MSG_WARN([Platform set to Xen by default, this can not boot on non-Xen 
systems, you currently need a 32bit build for that.])
-[host_platform=xen;;
+[host_platform=at;;
   at:i?86 | xen:i?86 | at:x86_64 | xen:x86_64)
 :;;
   *)]
diff --git a/i386/configfrag.ac b/i386/configfrag.ac
index f697e277..f07a98ca 100644
--- a/i386/configfrag.ac
+++ b/i386/configfrag.ac
@@ -106,6 +106,8 @@ AC_ARG_ENABLE([apic],
 enable_pae=${enable_pae-yes};;
   *:i?86)
 :;;
+  *:x86_64)
+enable_pae=${enable_pae-yes};;
   *)
 if [ x"$enable_pae" = xyes ]; then]
   AC_MSG_ERROR([can only enable the `PAE' feature on ix86.])
diff --git a/i386/i386/vm_param.h b/i386/i386/vm_param.h
index edd9522c..c00c05b2 100644
--- a/i386/i386/vm_param.h
+++ b/i386/i386/vm_param.h
@@ -36,7 +36,7 @@
  * for better trace support in kdb; the _START symbol has to be offset by the
  * same amount. */
 #ifdef __x86_64__
-#define VM_MIN_KERNEL_ADDRESS  0x4000UL
+#define VM_MIN_KERNEL_ADDRESS  KERNEL_MAP_BASE
 #else
 #define VM_MIN_KERNEL_ADDRESS  0xC000UL
 #endif
@@ -73,7 +73,7 @@
 /* This is the kernel address range in linear addresses.  */
 #ifdef __x86_64__
 #define LINEAR_MIN_KERNEL_ADDRESS  VM_MIN_KERNEL_ADDRESS
-#define LINEAR_MAX_KERNEL_ADDRESS  (0x7fffUL)
+#define LINEAR_MAX_KERNEL_ADDRESS  (0xUL)
 #else
 /* On x86, the kernel virtual address space is actually located
at high linear addresses. */
diff --git a/i386/intel/pmap.c b/i386/intel/pmap.c
index 3bf00659..91835b30 100644
--- a/i386/intel/pmap.c
+++ b/i386/intel/pmap.c
@@ -655,7 +655,7 @@ void pmap_bootstrap(void)
  pa_to_pte(_kvtophys((void *) kernel_page_dir
  + i * INTEL_PGBYTES))
  | INTEL_PTE_VALID
-#ifdef MACH_PV_PAGETABLES
+#if defined(MACH_PV_PAGETABLES) || defined(__x86_64__)
  | INTEL_PTE_WRITE
 #endif
  );
@@ -1297,7 +1297,7 @@ pmap_t pmap_create(vm_size_t size)
WRITE_PTE(&p->pdpbase[i],
  pa_to_pte(kvtophys((vm_offset_t) page_dir[i]))
  | INTEL_PTE_VALID
-#ifdef MACH_PV_PAGETABLES
+#if defined(MACH_PV_PAGETABLES) || defined(__x86_64__)
  | INTEL_PTE_WRITE
 #endif
  );
diff --git a/i386/intel/pmap.h b/i386/intel/pmap.h
index f24b3a71..d9222e95 100644
--- a/i386/intel/pmap.h
+++ b/i386/intel/pmap.h
@@ -156,7 +156,11 @@ typedef phys_addr_t pt_entry_t;
 #endif /* MACH_PV_PAGETABLES */
 #define INTEL_PTE_WIRED0x0200
 #ifdef PAE
+#ifdef __x86_64__
+#define INTEL_PTE_PFN  0xf000ULL
+#else /* __x86_64__ */
 #define INTEL_PTE_PFN  0x7000ULL
+#endif/* __x86_64__ */
 #else
 #define INTEL_PTE_PFN  0xf000
 #endif
diff --git a/x86_64/Makefrag.am b/x86_64/Makefrag.am
index 40b50bc9..5da734de 100644
--- a/x86_64/Makefrag.am
+++ b/x86_64/Makefrag.am
@@ -207,11 +207,27 @@ nodist_libkernel_a_SOURCES += \
 
 EXTRA_DIST += \
x86_64/ldscript
+
 if PLATFORM_at
+# This should probably be 0x8000 for mcmodel=kernel, but let's try
+# to stay in the first 8G first, otherwise we have to fix the pmap module to
+# actually use the l4 page level
+#KERNEL_MAP_BASE=0x1
+# but for now try with < 4G, otherwise we have linker errors
+KERNEL_MAP_BASE=0x4000
 gnumach_LINKFLAGS += \
--defsym _START_MAP=$(_START_MAP) \
-   --defsym _START=_START_MAP+0x4000 \
+   --defsym _START=_START_MAP \
+   --defsym KERNEL_MAP_BASE=$(KERNEL_MAP_BASE) \
-T '$(srcdir)'/x86_64/ldscript
+
+AM_CFLAGS += -D_START_MAP=$(_START_MAP) \
+   -DKERNEL_MAP_BASE=$(KERNEL_MAP_BAS

[PATCH 5/6] enable user access

2022-01-28 Thread Luca Dariz
The pmap module is a bit limited on 64 bit paging, so this should be
refined when we'll be able to use addresses over 4G.

Signed-off-by: Luca Dariz 
---
 i386/intel/pmap.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/i386/intel/pmap.c b/i386/intel/pmap.c
index 91835b30..278ffb97 100644
--- a/i386/intel/pmap.c
+++ b/i386/intel/pmap.c
@@ -1298,7 +1298,7 @@ pmap_t pmap_create(vm_size_t size)
  pa_to_pte(kvtophys((vm_offset_t) page_dir[i]))
  | INTEL_PTE_VALID
 #if defined(MACH_PV_PAGETABLES) || defined(__x86_64__)
- | INTEL_PTE_WRITE
+ | INTEL_PTE_WRITE | INTEL_PTE_USER
 #endif
  );
}
@@ -1309,7 +1309,7 @@ pmap_t pmap_create(vm_size_t size)
!= KERN_SUCCESS)
panic("pmap_create");
memset(p->l4base, 0, INTEL_PGBYTES);
-   WRITE_PTE(&p->l4base[0], pa_to_pte(kvtophys((vm_offset_t) p->pdpbase)) 
| INTEL_PTE_VALID | INTEL_PTE_WRITE);
+   WRITE_PTE(&p->l4base[0], pa_to_pte(kvtophys((vm_offset_t) p->pdpbase)) 
| INTEL_PTE_VALID | INTEL_PTE_WRITE | INTEL_PTE_USER);
 #ifdef MACH_PV_PAGETABLES
// FIXME: use kmem_cache_alloc instead
if (kmem_alloc_wired(kernel_map,
-- 
2.30.2




[PATCH 3/6] fix register corruption in irq on qemu

2022-01-28 Thread Luca Dariz
It seems rbx is corrupted during interrupt handlers.
This appears on the first call to thread_continue().

Signed-off-by: Luca Dariz 
---
 x86_64/interrupt.S | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/x86_64/interrupt.S b/x86_64/interrupt.S
index eab643a5..4ea849af 100644
--- a/x86_64/interrupt.S
+++ b/x86_64/interrupt.S
@@ -35,6 +35,7 @@ ENTRY(interrupt)
cmpl$255,%eax   /* was this a spurious intr? */
je  _no_eoi /* if so, just return */
 #endif
+   pushq   %rbx
pushq   %rax/* save irq number */
callspl7/* set ipl */
pushq   %rax/* save previous ipl */
@@ -89,6 +90,7 @@ ENTRY(interrupt)
movlEXT(curr_pic_mask),%eax /* restore original mask */
outb%al,$(PIC_MASTER_OCW)   /* unmask master */
 2:
+   popq%rbx
ret
 #else
cmpl$16,%ecx/* was this a low ISA intr? */
-- 
2.30.2




[PATCH 4/6] fix console setting from cmdline

2022-01-28 Thread Luca Dariz
The leading space prevents it from working when console=comX is the
only argument.

Signed-off-by: Luca Dariz 
---
 i386/i386at/com.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/i386/i386at/com.c b/i386/i386at/com.c
index 3402a025..d1de51c0 100644
--- a/i386/i386at/com.c
+++ b/i386/i386at/com.c
@@ -183,7 +183,7 @@ comcnprobe(struct consdev *cp)
struct  bus_device *b;
int maj, unit, pri;
 
-#define CONSOLE_PARAMETER " console=com"
+#define CONSOLE_PARAMETER "console=com"
u_char *console = (u_char *) strstr(kernel_cmdline, CONSOLE_PARAMETER);
 
if (console)
-- 
2.30.2




[PATCH v2 0/6] Add initial support for booting x86_64 from grub

2022-02-05 Thread Luca Dariz
Updates from previous submission:
* added more description of the changes in the commit message
* removed some minor changes needed only when running the kernel from
  high addresses (>4G); currently it fails to link, and this will be
  handled in another patch
* updated pre-processor condition for write access in PDP table, it's
  not impacting current 32-bit builds (non-PAE)
* updated command line fix for console=comX
* better fix for rbx corruption in x86_64/interrupt.S
* improved x86_64/boothdr.S, removing most magic numbers or adding an
  explanatory comment

Luca Dariz (6):
  add support for booting from grub with x86_64
  cleanup multiboot
  fix register corruption in irq on qemu
  fix console setting from cmdline
  enable user access
  fix Task State Segment layout for 64 bit

 Makefrag.am|   1 -
 configure.ac   |   3 +-
 i386/configfrag.ac |   2 +
 i386/i386/i386asm.sym  |   5 +
 i386/i386/ktss.c   |   8 +-
 i386/i386/pcb.c|   4 +
 i386/i386/tss.h|  24 ++-
 i386/i386/vm_param.h   |   2 +-
 i386/i386at/com.c  |   5 +
 i386/i386at/model_dep.c|  23 +--
 i386/include/mach/i386/multiboot.h | 108 +
 i386/intel/pmap.c  |   8 +-
 i386/intel/pmap.h  |   1 +
 include/mach/multiboot.h   |  82 --
 kern/bootstrap.c   |  20 ++-
 x86_64/Makefrag.am |  18 ++-
 x86_64/boothdr.S   | 238 +
 x86_64/interrupt.S |  12 +-
 x86_64/ldscript|  28 ++--
 19 files changed, 361 insertions(+), 231 deletions(-)
 delete mode 100644 include/mach/multiboot.h
 create mode 100644 x86_64/boothdr.S

-- 
2.30.2




[PATCH 1/6] add support for booting from grub with x86_64

2022-02-05 Thread Luca Dariz
* configure: compile for native x86_64 by default instead of xen
* x86_64/Makefrag.am: introduce KERNEL_MAP_BASE to reuse the constant
  in both code and linker script
* x86_64/ldscript: use a .boot section for the very first operations,
  until we reach long mode. This section is not really allocated, so
  it doesn't need to be freed later. The vm system is later
  initialized starting from .text and not including .boot
* link the kernel at 0x400 as in the xen version; higher values cause
  linker errors
* we can't use full segmentation in long mode, so we need to create a
  temporary mapping during early boot to be able to jump to high
  addresses
* build direct map for first 4G in boothdr, it seems required by Linux
  drivers
* add INTEL_PTE_PS bit definition to enable 2MB pages during bootstrap
* ensure the write bit is set in the PDP entry access rights. This only
  applies to PAE-enabled kernels, and is mandatory for x86_64. On the
  xen platform it seems to be handled differently

Signed-off-by: Luca Dariz 
---
 configure.ac  |   3 +-
 i386/configfrag.ac|   2 +
 i386/i386/i386asm.sym |   1 +
 i386/i386/vm_param.h  |   2 +-
 i386/intel/pmap.c |   4 +-
 i386/intel/pmap.h |   1 +
 x86_64/Makefrag.am|  18 +++-
 x86_64/boothdr.S  | 238 ++
 x86_64/ldscript   |  28 +++--
 9 files changed, 281 insertions(+), 16 deletions(-)
 create mode 100644 x86_64/boothdr.S

diff --git a/configure.ac b/configure.ac
index 019842db..3aaa935c 100644
--- a/configure.ac
+++ b/configure.ac
@@ -56,8 +56,7 @@ case $host_platform:$host_cpu in
   default:i?86)
 host_platform=at;;
   default:x86_64)]
-AC_MSG_WARN([Platform set to Xen by default, this can not boot on non-Xen 
systems, you currently need a 32bit build for that.])
-[host_platform=xen;;
+[host_platform=at;;
   at:i?86 | xen:i?86 | at:x86_64 | xen:x86_64)
 :;;
   *)]
diff --git a/i386/configfrag.ac b/i386/configfrag.ac
index f697e277..f07a98ca 100644
--- a/i386/configfrag.ac
+++ b/i386/configfrag.ac
@@ -106,6 +106,8 @@ AC_ARG_ENABLE([apic],
 enable_pae=${enable_pae-yes};;
   *:i?86)
 :;;
+  *:x86_64)
+enable_pae=${enable_pae-yes};;
   *)
 if [ x"$enable_pae" = xyes ]; then]
   AC_MSG_ERROR([can only enable the `PAE' feature on ix86.])
diff --git a/i386/i386/i386asm.sym b/i386/i386/i386asm.sym
index 0662aea0..9e1d13d7 100644
--- a/i386/i386/i386asm.sym
+++ b/i386/i386/i386asm.sym
@@ -122,6 +122,7 @@ exprsizeof(pt_entry_t)  
PTE_SIZE
 expr   INTEL_PTE_PFN   PTE_PFN
 expr   INTEL_PTE_VALID PTE_V
 expr   INTEL_PTE_WRITE PTE_W
+expr   INTEL_PTE_PSPTE_S
 expr   ~INTEL_PTE_VALIDPTE_INVALID
 expr   NPTES   PTES_PER_PAGE
 expr   INTEL_PTE_VALID|INTEL_PTE_WRITE INTEL_PTE_KERNEL
diff --git a/i386/i386/vm_param.h b/i386/i386/vm_param.h
index edd9522c..314fdb35 100644
--- a/i386/i386/vm_param.h
+++ b/i386/i386/vm_param.h
@@ -36,7 +36,7 @@
  * for better trace support in kdb; the _START symbol has to be offset by the
  * same amount. */
 #ifdef __x86_64__
-#define VM_MIN_KERNEL_ADDRESS  0x4000UL
+#define VM_MIN_KERNEL_ADDRESS  KERNEL_MAP_BASE
 #else
 #define VM_MIN_KERNEL_ADDRESS  0xC000UL
 #endif
diff --git a/i386/intel/pmap.c b/i386/intel/pmap.c
index 3bf00659..d0bd3b5d 100644
--- a/i386/intel/pmap.c
+++ b/i386/intel/pmap.c
@@ -655,7 +655,7 @@ void pmap_bootstrap(void)
  pa_to_pte(_kvtophys((void *) kernel_page_dir
  + i * INTEL_PGBYTES))
  | INTEL_PTE_VALID
-#ifdef MACH_PV_PAGETABLES
+#if !defined(MACH_HYP) || defined(MACH_PV_PAGETABLES)
  | INTEL_PTE_WRITE
 #endif
  );
@@ -1297,7 +1297,7 @@ pmap_t pmap_create(vm_size_t size)
WRITE_PTE(&p->pdpbase[i],
  pa_to_pte(kvtophys((vm_offset_t) page_dir[i]))
  | INTEL_PTE_VALID
-#ifdef MACH_PV_PAGETABLES
+#if !defined(MACH_HYP) || defined(MACH_PV_PAGETABLES)
  | INTEL_PTE_WRITE
 #endif
  );
diff --git a/i386/intel/pmap.h b/i386/intel/pmap.h
index f24b3a71..b93c4ad4 100644
--- a/i386/intel/pmap.h
+++ b/i386/intel/pmap.h
@@ -148,6 +148,7 @@ typedef phys_addr_t pt_entry_t;
 #define INTEL_PTE_NCACHE   0x0010
 #define INTEL_PTE_REF  0x0020
 #define INTEL_PTE_MOD  0x0040
+#define INTEL_PTE_PS   0x0080
 #ifdef MACH_PV_PAGETABLES
 /* Not supported */
 #define INTEL_PTE_GLOBAL   0x
diff --git a/x86_64/Makefrag.am b/x86_64/Makefrag.am
index 40b50bc9..5da734de 100644
--- a/x86_64/Make

[PATCH 3/6] fix register corruption in irq on qemu

2022-02-05 Thread Luca Dariz
rbx was used to compute the irq index into the iunit and ivect arrays;
however, it would then have to be preserved by pushing it onto the
stack.  As a solution, we use rax instead, which is preserved across
function calls and is not used as a function argument.

Signed-off-by: Luca Dariz 
---
 x86_64/interrupt.S | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/x86_64/interrupt.S b/x86_64/interrupt.S
index fccf6e28..73151b06 100644
--- a/x86_64/interrupt.S
+++ b/x86_64/interrupt.S
@@ -38,15 +38,15 @@ ENTRY(interrupt)
pushq   %rax/* save irq number */
callspl7/* set ipl */
pushq   %rax/* save previous ipl */
-   movl8(%esp),%edx/* set irq number as 3rd arg */
-   movl%edx,%ebx   /* copy irq number */
-   shll$2,%ebx /* irq * 4 */
-   movlEXT(iunit)(%ebx),%edi   /* get device unit number as 1st arg */
movl%eax, %esi  /* previous ipl as 2nd arg */
+   movl8(%esp),%edx/* set irq number as 3rd arg */
+   movl%edx,%eax   /* copy irq number */
+   shll$2,%eax /* irq * 4 */
+   movlEXT(iunit)(%eax),%edi   /* get device unit number as 1st arg */
movq16(%esp), %rcx  /* return address as 4th arg */
movq24(%esp), %r8   /* address of interrupted registers as 
5th arg */
-   shll$1,%ebx /* irq * 8 */
-   call*EXT(ivect)(%ebx)   /* call interrupt handler */
+   shll$1,%eax /* irq * 8 */
+   call*EXT(ivect)(%eax)   /* call interrupt handler */
popq%rdi/* restore previous ipl */
callsplx_cli/* restore previous ipl */
 
-- 
2.30.2




[PATCH 6/6] fix Task State Segment layout for 64 bit

2022-02-05 Thread Luca Dariz
Signed-off-by: Luca Dariz 
---
 i386/i386/i386asm.sym |  4 
 i386/i386/ktss.c  |  8 ++--
 i386/i386/pcb.c   |  4 
 i386/i386/tss.h   | 24 ++--
 4 files changed, 36 insertions(+), 4 deletions(-)

diff --git a/i386/i386/i386asm.sym b/i386/i386/i386asm.sym
index 9e1d13d7..417c040d 100644
--- a/i386/i386/i386asm.sym
+++ b/i386/i386/i386asm.sym
@@ -95,8 +95,12 @@ offset   i386_interrupt_statei   eip
 offset i386_interrupt_statei   cs
 offset i386_interrupt_statei   efl
 
+#ifdef __x86_64__
+offset i386_tsstss rsp0
+#else
 offset i386_tsstss esp0
 offset i386_tsstss ss0
+#endif
 
 offset machine_slotsub_typecpu_type
 
diff --git a/i386/i386/ktss.c b/i386/i386/ktss.c
index 917e6305..24e12cf4 100644
--- a/i386/i386/ktss.c
+++ b/i386/i386/ktss.c
@@ -55,11 +55,15 @@ ktss_init(void)
fill_gdt_sys_descriptor(KERNEL_TSS,
kvtolin(&ktss), sizeof(struct task_tss) - 1,
ACC_PL_K|ACC_TSS, 0);
-
/* Initialize the master TSS.  */
+#ifdef __x86_64__
+   ktss.tss.rsp0 = (unsigned long)(exception_stack+1024);
+   ktss.tss.io_bit_map_offset = IOPB_INVAL;
+#else /* ! __x86_64__ */
ktss.tss.ss0 = KERNEL_DS;
ktss.tss.esp0 = (unsigned long)(exception_stack+1024);
-   ktss.tss.io_bit_map_offset = IOPB_INVAL;

+   ktss.tss.io_bit_map_offset = IOPB_INVAL;
+#endif /* __x86_64__ */
/* Set the last byte in the I/O bitmap to all 1's.  */
ktss.barrier = 0xff;
 
diff --git a/i386/i386/pcb.c b/i386/i386/pcb.c
index 23585323..23b734e3 100644
--- a/i386/i386/pcb.c
+++ b/i386/i386/pcb.c
@@ -153,7 +153,11 @@ void switch_ktss(pcb_t pcb)
if (hyp_stack_switch(KERNEL_DS, pcb_stack_top))
panic("stack_switch");
 #else  /* MACH_RING1 */
+#ifdef __x86_64__
+curr_ktss(mycpu)->tss.rsp0 = pcb_stack_top;
+#else /* __x86_64__ */
curr_ktss(mycpu)->tss.esp0 = pcb_stack_top;
+#endif /* __x86_64__ */
 #endif /* MACH_RING1 */
 }
 
diff --git a/i386/i386/tss.h b/i386/i386/tss.h
index ff25f217..31e1f5cb 100644
--- a/i386/i386/tss.h
+++ b/i386/i386/tss.h
@@ -27,13 +27,33 @@
 #ifndef_I386_TSS_H_
 #define_I386_TSS_H_
 
+#include 
 #include 
 
 #include 
 
 /*
- * i386 Task State Segment
+ * x86 Task State Segment
  */
+#ifdef __x86_64__
+struct i386_tss {
+  uint32_t _reserved0;
+  uint64_t rsp0;
+  uint64_t rsp1;
+  uint64_t rsp2;
+  uint64_t _reserved1;
+  uint64_t ist1;
+  uint64_t ist2;
+  uint64_t ist3;
+  uint64_t ist4;
+  uint64_t ist5;
+  uint64_t ist6;
+  uint64_t ist7;
+  uint64_t _reserved2;
+  uint16_t _reserved3;
+  uint16_t io_bit_map_offset;
+} __attribute__((__packed__));
+#else /* ! __x86_64__ */
 struct i386_tss {
int back_link;  /* segment number of previous task,
   if nested */
@@ -67,7 +87,7 @@ struct i386_tss {
/* offset to start of IO permission
   bit map */
 };
-
+#endif /* __x86_64__ */
 
 /* The structure extends the above TSS structure by an I/O permission bitmap
and the barrier.  */
-- 
2.30.2




[PATCH 4/6] fix console setting from cmdline

2022-02-05 Thread Luca Dariz
The leading space prevents it from working when console=comX is the
only argument, so handle this case separately.

Signed-off-by: Luca Dariz 
---
 i386/i386at/com.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/i386/i386at/com.c b/i386/i386at/com.c
index 3402a025..fb291b87 100644
--- a/i386/i386at/com.c
+++ b/i386/i386at/com.c
@@ -189,6 +189,11 @@ comcnprobe(struct consdev *cp)
if (console)
mach_atoi(console + strlen(CONSOLE_PARAMETER), &rcline);
 
+   if (strncmp(kernel_cmdline, CONSOLE_PARAMETER + 1,
+   strlen(CONSOLE_PARAMETER) - 1) == 0)
+   mach_atoi(kernel_cmdline + strlen(CONSOLE_PARAMETER) - 1,
+ &rcline);
+
maj = 0;
unit = -1;
pri = CN_DEAD;
-- 
2.30.2




[PATCH 2/6] cleanup multiboot

2022-02-05 Thread Luca Dariz
* use _raw_ structs where we refer to the bootloader-provided data
* remove unused structures
* fix 64 bit boot

Signed-off-by: Luca Dariz 
---
 Makefrag.am|   1 -
 i386/i386at/model_dep.c|  23 +++---
 i386/include/mach/i386/multiboot.h | 108 +
 include/mach/multiboot.h   |  82 --
 kern/bootstrap.c   |  20 +-
 5 files changed, 31 insertions(+), 203 deletions(-)
 delete mode 100644 include/mach/multiboot.h

diff --git a/Makefrag.am b/Makefrag.am
index fef1e000..6e74697e 100644
--- a/Makefrag.am
+++ b/Makefrag.am
@@ -404,7 +404,6 @@ include_mach_HEADERS = \
include/mach/message.h \
include/mach/mig_errors.h \
include/mach/msg_type.h \
-   include/mach/multiboot.h \
include/mach/notify.h \
include/mach/pc_sample.h \
include/mach/policy.h \
diff --git a/i386/i386at/model_dep.c b/i386/i386at/model_dep.c
index 21a36bf2..b2a22a42 100644
--- a/i386/i386at/model_dep.c
+++ b/i386/i386at/model_dep.c
@@ -122,7 +122,7 @@ unsigned long *pfn_list = (void*) PFN_LIST;
 unsigned long la_shift = VM_MIN_KERNEL_ADDRESS;
 #endif
 #else  /* MACH_XEN */
-struct multiboot_info boot_info;
+struct multiboot_raw_info boot_info;
 #endif /* MACH_XEN */
 
 /* Command line supplied to kernel.  */
@@ -403,7 +403,7 @@ i386at_init(void)
}
 
if (boot_info.flags & MULTIBOOT_MODS && boot_info.mods_count) {
-   struct multiboot_module *m;
+   struct multiboot_raw_module *m;
int i;
 
if (! init_alloc_aligned(
@@ -591,13 +591,14 @@ void c_boot_entry(vm_offset_t bi)
 * so that the symbol table's memory won't be stomped on.
 */
if ((boot_info.flags & MULTIBOOT_AOUT_SYMS)
-   && boot_info.syms.a.addr)
+   && boot_info.shdr_addr)
{
vm_size_t symtab_size, strtab_size;
 
-   kern_sym_start = (vm_offset_t)phystokv(boot_info.syms.a.addr);
-   symtab_size = (vm_offset_t)phystokv(boot_info.syms.a.tabsize);
-   strtab_size = (vm_offset_t)phystokv(boot_info.syms.a.strsize);
+/* For simplicity we just use a simple boot_info_raw structure 
for elf */
+   kern_sym_start = (vm_offset_t)phystokv(boot_info.shdr_addr);
+   symtab_size = (vm_offset_t)phystokv(boot_info.shdr_num);
+   strtab_size = (vm_offset_t)phystokv(boot_info.shdr_size);
kern_sym_end = kern_sym_start + 4 + symtab_size + strtab_size;
 
printf("kernel symbol table at %08lx-%08lx (%ld,%ld)\n",
@@ -606,12 +607,12 @@ void c_boot_entry(vm_offset_t bi)
}
 
if ((boot_info.flags & MULTIBOOT_ELF_SHDR)
-   && boot_info.syms.e.num)
+   && boot_info.shdr_num)
{
-   elf_shdr_num = boot_info.syms.e.num;
-   elf_shdr_size = boot_info.syms.e.size;
-   elf_shdr_addr = (vm_offset_t)phystokv(boot_info.syms.e.addr);
-   elf_shdr_shndx = boot_info.syms.e.shndx;
+   elf_shdr_num = boot_info.shdr_num;
+   elf_shdr_size = boot_info.shdr_size;
+   elf_shdr_addr = (vm_offset_t)phystokv(boot_info.shdr_addr);
+   elf_shdr_shndx = boot_info.shdr_strndx;
 
printf("ELF section header table at %08lx\n", elf_shdr_addr);
}
diff --git a/i386/include/mach/i386/multiboot.h 
b/i386/include/mach/i386/multiboot.h
index 5a532576..40522d96 100644
--- a/i386/include/mach/i386/multiboot.h
+++ b/i386/include/mach/i386/multiboot.h
@@ -25,31 +25,6 @@
 
 #include 
 
-/* For a.out kernel boot images, the following header must appear
-   somewhere in the first 8192 bytes of the kernel image file.  */
-struct multiboot_header
-{
-   /* Must be MULTIBOOT_MAGIC */
-   unsignedmagic;
-
-   /* Feature flags - see below.  */
-   unsignedflags;
-
-   /*
-* Checksum
-*
-* The above fields plus this one must equal 0 mod 2^32.
-*/
-   unsignedchecksum;
-
-   /* These are only valid if MULTIBOOT_AOUT_KLUDGE is set.  */
-   vm_offset_t header_addr;
-   vm_offset_t load_addr;
-   vm_offset_t load_end_addr;
-   vm_offset_t bss_end_addr;
-   vm_offset_t entry;
-};
-
 /* The entire multiboot_header must be contained
within the first MULTIBOOT_SEARCH bytes of the kernel image.  */
 #define MULTIBOOT_SEARCH   8192
@@ -78,61 +53,7 @@ struct multiboot_header
that the multiboot method is being used */
 #define MULTIBOOT_VALID 0x2badb002
 
-/* The boot loader passes this data structure to the kernel in
-   register EBX on entry.  */
-struct multiboot_info
-{
-   /* These flags indicate which parts of the 

[PATCH 5/6] enable user access

2022-02-05 Thread Luca Dariz
The pmap module is a bit limited on 64 bit paging, so this should be
refined when we'll be able to use addresses over 4G.

Signed-off-by: Luca Dariz 
---
 i386/intel/pmap.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/i386/intel/pmap.c b/i386/intel/pmap.c
index d0bd3b5d..19b7b51c 100644
--- a/i386/intel/pmap.c
+++ b/i386/intel/pmap.c
@@ -1298,7 +1298,7 @@ pmap_t pmap_create(vm_size_t size)
  pa_to_pte(kvtophys((vm_offset_t) page_dir[i]))
  | INTEL_PTE_VALID
 #if !defined(MACH_HYP) || defined(MACH_PV_PAGETABLES)
- | INTEL_PTE_WRITE
+ | INTEL_PTE_WRITE | INTEL_PTE_USER
 #endif
  );
}
@@ -1309,7 +1309,7 @@ pmap_t pmap_create(vm_size_t size)
!= KERN_SUCCESS)
panic("pmap_create");
memset(p->l4base, 0, INTEL_PGBYTES);
-   WRITE_PTE(&p->l4base[0], pa_to_pte(kvtophys((vm_offset_t) p->pdpbase)) 
| INTEL_PTE_VALID | INTEL_PTE_WRITE);
+   WRITE_PTE(&p->l4base[0], pa_to_pte(kvtophys((vm_offset_t) p->pdpbase)) 
| INTEL_PTE_VALID | INTEL_PTE_WRITE | INTEL_PTE_USER);
 #ifdef MACH_PV_PAGETABLES
// FIXME: use kmem_cache_alloc instead
if (kmem_alloc_wired(kernel_map,
-- 
2.30.2




[PATCH 0/2] refine architecture-specific data types on gnumach

2022-04-03 Thread Luca Dariz
These patches introduce some new data types that allow for more
flexibility.  This is just the first step, once the new data types are
added they need to be used in various places
* syscall interface
* mig server routines
* functions that use port names instead of ports. Note that without
  this adaptation the code will still work thanks to the implicit
  32-to-64 extension, but it is a nice cleanup.

Luca Dariz (2):
  add port name types
  add rpc_versions for vm types

 i386/include/mach/i386/vm_types.h | 37 +++
 include/mach/mach_port.defs   |  6 ++---
 include/mach/mach_types.defs  | 31 +++---
 include/mach/port.h   | 36 ++
 include/mach/std_types.defs   |  6 ++---
 include/mach_debug/vm_info.h  | 24 ++--
 kern/thread.c |  4 ++--
 kern/thread.h |  4 ++--
 x86_64/configfrag.ac  | 12 +-
 9 files changed, 109 insertions(+), 51 deletions(-)

-- 
2.30.2




[PATCH 1/2] add port name types

2022-04-03 Thread Luca Dariz
* include/mach/mach_port.defs
  - use C type mach_port_name_array_t
* include/mach/port.h:
  - add new types mach_port_name_t and mach_port_name_array_t
  - refine mach_port_t type for user and kernel space
  - use port names in mach_port_status to allow compilation of 64-bit
  - use port name to have uniform sizes and remove the
old_mach_port_status_t as it's unused
* include/mach/std_types.defs
  - use C type mach_port_name_array_t
* kern/thread.{h,c}
  - fix the prototype to use port names. So far this seems to be the
    only RPC that causes a conflict between the mig-generated header
    and the regular header, making compilation fail.

Signed-off-by: Luca Dariz 
---
 include/mach/mach_port.defs |  6 ++
 include/mach/port.h | 36 +---
 include/mach/std_types.defs |  6 ++
 kern/thread.c   |  4 ++--
 kern/thread.h   |  4 ++--
 5 files changed, 25 insertions(+), 31 deletions(-)

diff --git a/include/mach/mach_port.defs b/include/mach/mach_port.defs
index c21c34bc..7cb8a659 100644
--- a/include/mach/mach_port.defs
+++ b/include/mach/mach_port.defs
@@ -53,8 +53,7 @@ subsystem
 routine mach_port_names(
task: ipc_space_t;
out names   : mach_port_name_array_t =
-   ^array[] of mach_port_name_t
-   ctype: mach_port_array_t;
+   ^array[] of mach_port_name_t;
out types   : mach_port_type_array_t =
^array[] of mach_port_type_t);
 
@@ -209,8 +208,7 @@ routine mach_port_get_set_status(
task: ipc_space_t;
name: mach_port_name_t;
out members : mach_port_name_array_t =
-   ^array[] of mach_port_name_t
-   ctype: mach_port_array_t);
+   ^array[] of mach_port_name_t);
 
 /*
  * Puts the member port (the task must have receive rights)
diff --git a/include/mach/port.h b/include/mach/port.h
index e77e5c38..3c226f6c 100644
--- a/include/mach/port.h
+++ b/include/mach/port.h
@@ -38,8 +38,24 @@
 #include 
 #include 
 
+/*
+ * Port names are the type used by userspace, they are always 32-bit wide.
+ */
+typedef unsigned int mach_port_name_t;
+typedef mach_port_name_t *mach_port_name_array_t;
 
+/*
+ * A port is represented
+ * - by a port name in userspace
+ * - by a pointer in kernel space
+ * While in userspace mach_port_name_t and mach_port_t are interchangeable,
+ * in kernelspace they need to be different and appropriately converted.
+ */
+#ifdef KERNEL
 typedef vm_offset_t mach_port_t;
+#else /* KERNEL */
+typedef mach_port_name_t mach_port_t;
+#endif
 typedef mach_port_t *mach_port_array_t;
 typedef const mach_port_t *const_mach_port_array_t;
 typedef int *rpc_signature_info_t;
@@ -121,7 +137,7 @@ typedef unsigned int mach_port_msgcount_t;  /* number of 
msgs */
 typedef unsigned int mach_port_rights_t;   /* number of rights */
 
 typedef struct mach_port_status {
-   mach_port_t mps_pset;   /* containing port set */
+   mach_port_name_tmps_pset;   /* containing port set */
mach_port_seqno_t   mps_seqno;  /* sequence number */
 /*mach_port_mscount_t*/natural_t mps_mscount;  /* make-send count */
 /*mach_port_msgcount_t*/natural_t mps_qlimit;  /* queue limit */
@@ -135,22 +151,4 @@ typedef struct mach_port_status {
 #define MACH_PORT_QLIMIT_DEFAULT   ((mach_port_msgcount_t) 5)
 #define MACH_PORT_QLIMIT_MAX   ((mach_port_msgcount_t) 16)
 
-/*
- *  Compatibility definitions, for code written
- *  before there was an mps_seqno field.
- *
- *  XXX: Remove this before releasing Gnumach 1.6.
- */
-
-typedef struct old_mach_port_status {
-   mach_port_t mps_pset;   /* containing port set */
-/*mach_port_mscount_t*/natural_t mps_mscount;  /* make-send count */
-/*mach_port_msgcount_t*/natural_t mps_qlimit;  /* queue limit */
-/*mach_port_msgcount_t*/natural_t mps_msgcount;/* number in the queue 
*/
-/*mach_port_rights_t*/natural_tmps_sorights;   /* how many send-once 
rights */
-/*boolean_t*/natural_t mps_srights;/* do send rights exist? */
-/*boolean_t*/natural_t mps_pdrequest;  /* port-deleted requested? */
-/*boolean_t*/natural_t mps_nsrequest;  /* no-senders requested? */
-} old_mach_port_status_t;
-
 #endif /* _MACH_PORT_H_ */
diff --git a/include/mach/std_types.defs b/include/mach/std_types.defs
index 5d95ab42..46987380 100644
--- a/include/mach/std_types.defs
+++ b/include/mach/std_types.defs
@@ -58,10 +58,8 @@ type mach_port_t = MACH_MSG_TYPE_COPY_SEND
 ;
 type mach_port_array_t = array[] of mach_port_t;
 
-type mach_port_name_t = MACH_MSG_TYPE_PORT_NAME
-   ctype: mach_port_t;
-type mach_port_name_array_t = array

[PATCH 1/2] add separate port_size and mach_port_name_size definitions

2022-04-03 Thread Luca Dariz
* cpu.sym: retrieve size of vm_offset_t and mach_port_name_t from
  gnumach headers at compile time.
* global.{c,h}: add port size as a variable and initialize it to the
  port name size.
* lexxer.l: apply port or port name size to the corresponding types,
  instead of using the word size.
* parser.y: update port size if we're generating for kernel-space
  (server or client). Also re-initialize default port types to account
  for this change.
* type.c: use port size instead of word size in default port types and
  runtime checks.

There are many assumptions about mach_port_t:
 - on kernel side, its size is the same as a pointer. This allows
   replacing the port name with the address of the corresponding data
   structure during copyin in mach_msg()
 - in mig, this is also the "word size", which is derived from gnumach
   headers as the size of integer_t
 - its size is also the same as natural_t, so it's possible to model
   structures like mach_port_status_t as an array of integer_t in
   mig. This is convenient since arrays and structures can't have
   mixed types.
 - its size is always the same as the port name size

This patch does not change the current behaviour on 32-bit kernels,
but allows for some of these assumptions to be broken on 64-bit
kernels. This is needed to have 32-bit port names on 64-bit kernels
and be able to support a 32-bit userspace.  It still leaves the choice
for a 64-bit userspace, if all integer_t and natural_t are to be
extended to 64 bit.

However keeping 32-bit port names seems to be the right thing, based on
previous discussions [1], even for a 64-bit kernel.

The only assumption kept is that in kernel-space ports are always the
size of a pointer, as they refer to a data structure and not to a
specific port name.  To ensure this is true for various user/kernel
combinations, we dynamically change the port size if we're generating
code for kernel-space servers or clients, and keep the size of a port
the same as a port name for user-space servers and clients.

[1] https://lists.gnu.org/archive/html/bug-hurd/2012-04/msg00000.html

Signed-off-by: Luca Dariz 
---
 cpu.sym  |  4 
 global.c |  4 
 global.h |  3 +++
 lexxer.l | 24 
 parser.y |  7 +++
 type.c   | 10 +-
 6 files changed, 35 insertions(+), 17 deletions(-)

diff --git a/cpu.sym b/cpu.sym
index fcf6241..5e34074 100644
--- a/cpu.sym
+++ b/cpu.sym
@@ -106,3 +106,7 @@ expr sizeof(double) sizeof_double
 expr sizeof(mach_msg_header_t) sizeof_mach_msg_header_t
 expr sizeof(mach_msg_type_long_t)  sizeof_mach_msg_type_long_t
 expr sizeof(mach_msg_type_t)   sizeof_mach_msg_type_t
+expr sizeof(vm_offset_t)   vm_offset_size
+expr (sizeof(vm_offset_t)*8)   vm_offset_size_in_bits
+expr sizeof(mach_port_name_t)  port_name_size
+expr (sizeof(mach_port_name_t)*8)  port_name_size_in_bits
diff --git a/global.c b/global.c
index 5685186..e2eb76e 100644
--- a/global.c
+++ b/global.c
@@ -24,6 +24,7 @@
  * rights to redistribute these changes.
  */
 
+#include "cpu.h"
 #include "error.h"
 #include "global.h"
 
@@ -65,6 +66,9 @@ string_t InternalHeaderFileName = strNULL;
 string_t UserFileName = strNULL;
 string_t ServerFileName = strNULL;
 
+int port_size = port_name_size;
+int port_size_in_bits = port_name_size_in_bits;
+
 void
 more_global(void)
 {
diff --git a/global.h b/global.h
index 8dbb6fd..cadd7e7 100644
--- a/global.h
+++ b/global.h
@@ -67,6 +67,9 @@ extern string_t InternalHeaderFileName;
 extern string_t UserFileName;
 extern string_t ServerFileName;
 
+extern int port_size;
+extern int port_size_in_bits;
+
 extern void more_global(void);
 
 #ifndef NULL
diff --git a/lexxer.l b/lexxer.l
index 48dda4a..71f43b2 100644
--- a/lexxer.l
+++ b/lexxer.l
@@ -160,7 +160,7 @@ static void doSharp(const char *body); /* process body of # 
directives */
 (?i:countinout)FRETURN(flCountInOut);
 (?i:retcode)   FRETURN(flNone);
 
-(?i:polymorphic)   
TRETURN(MACH_MSG_TYPE_POLYMORPHIC,word_size_in_bits);
+(?i:polymorphic)   
TRETURN(MACH_MSG_TYPE_POLYMORPHIC,port_size_in_bits);
 
 "MACH_MSG_TYPE_UNSTRUCTURED"   TRETURN(MACH_MSG_TYPE_UNSTRUCTURED,0);
 "MACH_MSG_TYPE_BIT"TRETURN(MACH_MSG_TYPE_BIT,1);
@@ -175,17 +175,17 @@ static void doSharp(const char *body); /* process body of 
# directives */
 "MACH_MSG_TYPE_STRING" TRETURN(MACH_MSG_TYPE_STRING,0);
 "MACH_MSG_TYPE_STRING_C"   TRETURN(MACH_MSG_TYPE_STRING_C,0);
 
-"MACH_MSG_TYPE_MOVE_RECEIVE"   
TPRETURN(MACH_MSG_TYPE_MOVE_RECEIVE,MACH_MSG_TYPE_PORT_RECEIVE,word_size_in_bits);
-"MACH_MSG_TYPE_COPY_SEND"  
TPRETURN(MACH_MSG_TYPE_COPY_SEND,MACH_MSG_TYPE_PORT_SEND,word_size_in_bits);
-"MACH_MSG_TYPE_MAKE_SEND"  
TPRETURN(MACH_MSG_TYPE_MAKE_SEND,MACH_MSG_TYPE_PORT_SEND,word_size_in_bits);
-"MAC

[PATCH 2/2] add rpc_versions for vm types

2022-04-03 Thread Luca Dariz
* vm_types.h: add new types and conversion functions
* mach_types.defs: adapt vm types depending on kernel user/server
* vm_info.h: adapt rpc structure to have uniformly-sized members also
  on 64-bit
* x86_64/configfrag.ac: add a new option to select the user-space variant.

Note that with this change the user-space interface is essentially fixed,
i.e. it can't support 32-bit and 64-bit tasks at the same time.
If that is ever needed, this change will have to be reworked.

Signed-off-by: Luca Dariz 
---
 i386/include/mach/i386/vm_types.h | 37 +++
 include/mach/mach_types.defs  | 31 +++---
 include/mach_debug/vm_info.h  | 24 ++--
 x86_64/configfrag.ac  | 12 +-
 4 files changed, 84 insertions(+), 20 deletions(-)

diff --git a/i386/include/mach/i386/vm_types.h 
b/i386/include/mach/i386/vm_types.h
index f49a95a1..16aedc44 100644
--- a/i386/include/mach/i386/vm_types.h
+++ b/i386/include/mach/i386/vm_types.h
@@ -37,6 +37,12 @@
 #ifdef __ASSEMBLER__
 #else  /* __ASSEMBLER__ */
 
+#include 
+
+#ifdef MACH_KERNEL
+#include 
+#endif
+
 /*
  * A natural_t is the type for the native
  * integer type, e.g. 32 or 64 or.. whatever
@@ -88,13 +94,36 @@ typedef unsigned long long rpc_phys_addr_t;
  * expressing the difference between two
  * vm_offset_t entities.
  */
-#ifdef __x86_64__
 typedefunsigned long   vm_size_t;
-#else
-typedefnatural_t   vm_size_t;
-#endif
 typedefvm_size_t * vm_size_array_t;
 
+/*
+ * rpc_types are for user/kernel interfaces. On kernel side they may differ 
from
+ * the native types, while on user space they shall be the same.
+ * These three types are always of the same size, so we can reuse the 
conversion
+ * functions.
+ */
+#if defined(MACH_KERNEL) && defined(USER32)
+typedef uint32_t   rpc_vm_address_t;
+typedef uint32_t   rpc_vm_offset_t;
+typedef uint32_t   rpc_vm_size_t;
+static inline uint64_t convert_vm_from_user(uint32_t uaddr)
+{
+return (uint64_t)uaddr;
+}
+static inline uint32_t convert_vm_to_user(uint64_t kaddr)
+{
+assert(kaddr <= 0x);
+return (uint32_t)kaddr;
+}
+#else /* MACH_KERNEL */
+typedef vm_offset_trpc_vm_address_t;
+typedef vm_offset_trpc_vm_offset_t;
+typedef vm_size_t  rpc_vm_size_t;
+#define convert_vm_to_user null_conversion
+#define convert_vm_from_user null_conversion
+#endif /* MACH_KERNEL */
+
 #endif /* __ASSEMBLER__ */
 
 /*
diff --git a/include/mach/mach_types.defs b/include/mach/mach_types.defs
index a0e9241c..a271d597 100644
--- a/include/mach/mach_types.defs
+++ b/include/mach/mach_types.defs
@@ -110,9 +110,34 @@ type ipc_space_t = mach_port_t
 #endif /* KERNEL_SERVER */
;
 
-type vm_address_t = natural_t;
-type vm_offset_t = natural_t;
-type vm_size_t = natural_t;
+#if defined(KERNEL_SERVER) && defined(USER32)
+type rpc_vm_size_t = uint32_t;
+#else /* KERNEL_SERVER and USER32 */
+#if defined(__x86_64__)
+type rpc_vm_size_t = uint64_t;
+#else /* __x86_64__ */
+type rpc_vm_size_t = uint32_t;
+#endif /* __x86_64__ */
+#endif /* KERNEL_SERVER and USER32 */
+
+type vm_address_t = rpc_vm_size_t
+#if defined(KERNEL_SERVER) || defined(KERNEL_USER)
+intran: vm_address_t convert_vm_from_user(rpc_vm_address_t)
+outtran: rpc_vm_address_t convert_vm_to_user(vm_address_t)
+#endif
+;
+type vm_offset_t = rpc_vm_size_t
+#if defined(KERNEL_SERVER) || defined(KERNEL_USER)
+intran: vm_offset_t convert_vm_from_user(rpc_vm_offset_t)
+outtran: rpc_vm_offset_t convert_vm_to_user(vm_offset_t)
+#endif
+;
+type vm_size_t = rpc_vm_size_t
+#if defined(KERNEL_SERVER) || defined(KERNEL_USER)
+intran: vm_size_t convert_vm_from_user(rpc_vm_size_t)
+outtran: rpc_vm_size_t convert_vm_to_user(vm_size_t)
+#endif
+;
 type vm_prot_t = int;
 type vm_inherit_t = int;
 type vm_statistics_data_t = struct[13] of integer_t;
diff --git a/include/mach_debug/vm_info.h b/include/mach_debug/vm_info.h
index b50fb92d..e68bb1d5 100644
--- a/include/mach_debug/vm_info.h
+++ b/include/mach_debug/vm_info.h
@@ -46,8 +46,8 @@
  */
 
 typedef struct vm_region_info {
-   vm_offset_t vri_start;  /* start of region */
-   vm_offset_t vri_end;/* end of region */
+   rpc_vm_offset_t vri_start;  /* start of region */
+   rpc_vm_offset_t vri_end;/* end of region */
 
 /*vm_prot_t*/natural_t vri_protection; /* protection code */
 /*vm_prot_t*/natural_t vri_max_protection; /* maximum protection */
@@ -55,8 +55,8 @@ typedef struct vm_region_info {
natural_t vri_wired_count;  /* number of times wired */
natural_t vri_user_wired_count; /* number of times user has wired */
 
-   vm_offset_t vri_object; /* the mapped object */
-   vm_offset_t vri_offset; /* offset into object */
+   rpc_vm_offset_t vri_object; /* the mapped object */
+   rp

[PATCH 2/2] improve error message

2022-04-03 Thread Luca Dariz
* parser.y: add information about type names

Signed-off-by: Luca Dariz 
---
 parser.y | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/parser.y b/parser.y
index 8d6b2aa..8521a84 100644
--- a/parser.y
+++ b/parser.y
@@ -526,8 +526,8 @@ IPCType :   PrimIPCType
$$.size = $1.size;
else
{
-   error("sizes in IPCTypes (%d, %d) aren't equal",
- $1.size, $3.size);
+   error("sizes in IPCTypes (%s %s %d, %s %s %d) aren't equal",
+ $1.instr, $1.outstr, $1.size, $3.instr, $3.outstr, $3.size);
$$.size = 0;
}
 }
-- 
2.30.2




[PATCH 0/2] refine architecture-specific data types on mig

2022-04-03 Thread Luca Dariz
These patches need the mach_port_name_t type defined in the gnumach
headers in the other patch set.

Luca Dariz (2):
  add separate port_size and mach_port_name_size definitions
  improve error message

 cpu.sym  |  4 
 global.c |  4 
 global.h |  3 +++
 lexxer.l | 24 
 parser.y | 11 +--
 type.c   | 10 +-
 6 files changed, 37 insertions(+), 19 deletions(-)

-- 
2.30.2




[PATCH 2/3] add check for whole message size

2022-06-28 Thread Luca Dariz
* user.c: ensure fixed-length messages have the correct size. In
  addition to the per-field checks, this also includes padding.

Signed-off-by: Luca Dariz 
---
 user.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/user.c b/user.c
index 9aff07c..9a84fe4 100644
--- a/user.c
+++ b/user.c
@@ -275,7 +275,10 @@ WriteMsgSend(FILE *file, const routine_t *rt)
 char SendSize[24];
 
 if (rt->rtNumRequestVar == 0)
+{
 sprintf(SendSize, "%d", rt->rtRequestSize);
+fprintf(file, "\t_Static_assert(sizeof(Request) == %s, \"Request 
expected to be %s bytes\");\n", SendSize, SendSize);
+}
 else
strcpy(SendSize, "msgh_size");
 
@@ -339,8 +342,10 @@ WriteMsgRPC(FILE *file, const routine_t *rt)
 char SendSize[24];
 
 if (rt->rtNumRequestVar == 0)
+{
 sprintf(SendSize, "%d", rt->rtRequestSize);
-else
+fprintf(file, "\t_Static_assert(sizeof(Request) == %s, \"Request 
expected to be %s bytes\");\n", SendSize, SendSize);
+} else
strcpy(SendSize, "msgh_size");
 
 if (IsKernelUser)
-- 
2.30.2




[PATCH 3/3] fill msg size in the header for user stubs

2022-06-28 Thread Luca Dariz
* user.c:
  - adjust comment in generated file
  - set msgh_size with the same value passed to mach_msg()

Signed-off-by: Luca Dariz 
---
 user.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/user.c b/user.c
index 9a84fe4..886198b 100644
--- a/user.c
+++ b/user.c
@@ -159,7 +159,7 @@ WriteRequestHead(FILE *file, const routine_t *rt)
WriteHeaderPortType(rt->rtUReplyPort));
 }
 
-fprintf(file, "\t/* msgh_size passed as argument */\n");
+fprintf(file, "\t/* msgh_size filled below */\n");
 
 /*
  * KernelUser stubs need to cast the request and reply ports
@@ -282,6 +282,8 @@ WriteMsgSend(FILE *file, const routine_t *rt)
 else
strcpy(SendSize, "msgh_size");
 
+fprintf(file, "\tInP->Head.msgh_size = %s;\n\n", SendSize);
+
 if (IsKernelUser)
 {
fprintf(file, "\t%s %smach_msg_send_from_kernel(",
@@ -348,6 +350,8 @@ WriteMsgRPC(FILE *file, const routine_t *rt)
 } else
strcpy(SendSize, "msgh_size");
 
+fprintf(file, "\tInP->Head.msgh_size = %s;\n\n", SendSize);
+
 if (IsKernelUser)
fprintf(file, "\tmsg_result = %smach_msg_rpc_from_kernel(&InP->Head, 
%s, sizeof(Reply));\n",
SubrPrefix,
-- 
2.30.2




[PATCH 0/3] enforce alignment of message body

2022-06-28 Thread Luca Dariz
The patches to MIG improve support for running on a 64-bit kernel by
keeping the same alignment as on 32-bit kernels.
They are an extension of the previous patch set:

  https://mail.gnu.org/archive/html/bug-hurd/2022-04/msg00010.html

Luca Dariz (3):
  fix message fields alignment for 64 bit
  add check for whole message size
  fill msg size in the header for user stubs

 routine.c |  3 +--
 user.c| 13 +++--
 utils.c   |  2 ++
 3 files changed, 14 insertions(+), 4 deletions(-)

-- 
2.30.2




[PATCH 1/3] fix message fields alignment for 64 bit

2022-06-28 Thread Luca Dariz
On x86_64 the alignment of structures is different, as the pointer
size is different.
For simplicity we keep the same 4-byte alignment as used on
32-bit. This simplifies the support for 32-bit rpc on 64-bit kernels,
and the natural 8-byte alignment doesn't seem worthwhile as an
optimization, as we would need to add more code in the ipc_kmsg*
routines.

* routine.c: align both short and long descriptors
* utils.c: use a fixed alignment for data fields in structures
  representing messages.
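
As a minimal illustration of what the packing changes (types and field
names here are only an example, not taken from the generated code):

#include <stdint.h>
#include <stdio.h>

/* a 32-bit type descriptor followed by a 64-bit payload, as in a
   typical message structure */
struct natural_layout {
    uint32_t type;
    uint64_t value;   /* naturally 8-byte aligned: 4 bytes of padding */
};

#pragma pack(push, 4)
struct packed_layout {
    uint32_t type;
    uint64_t value;   /* no padding: same layout as on 32-bit */
};
#pragma pack(pop)

int main(void)
{
    /* prints "natural: 16 packed: 12" on x86_64 */
    printf("natural: %zu packed: %zu\n",
           sizeof(struct natural_layout), sizeof(struct packed_layout));
    return 0;
}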

Signed-off-by: Luca Dariz 
---
 routine.c | 3 +--
 utils.c   | 2 ++
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/routine.c b/routine.c
index 0edc6b9..e6e18c3 100644
--- a/routine.c
+++ b/routine.c
@@ -321,9 +321,8 @@ rtFindSize(const argument_t *args, u_int mask)
{
ipc_type_t *it = arg->argType;
 
+   size = (size + word_size-1) & ~(word_size-1);
if (arg->argLongForm) {
-   /* might need proper alignment on 64bit archies */
-   size = (size + word_size-1) & ~(word_size-1);
size += sizeof_mach_msg_type_long_t;
} else {
size += sizeof_mach_msg_type_t;
diff --git a/utils.c b/utils.c
index bdc39b7..a8ebc6b 100644
--- a/utils.c
+++ b/utils.c
@@ -338,10 +338,12 @@ void
 WriteStructDecl(FILE *file, const argument_t *args, write_list_fn_t *func,
u_int mask, const char *name)
 {
+fprintf(file, "#pragma pack(push,%d)\n", word_size);
 fprintf(file, "\ttypedef struct {\n");
 fprintf(file, "\t\tmach_msg_header_t Head;\n");
 WriteList(file, args, func, mask, "\n", "\n");
 fprintf(file, "\t} %s;\n", name);
+fprintf(file, "#pragma pack(pop)\n");
 fprintf(file, "\n");
 }
 
-- 
2.30.2




[PATCH 01/15] fix rpc types for KERNEL_USER stubs

2022-06-28 Thread Luca Dariz
* include/mach/mach_types.defs: use rpc_ vm types for KERNEL_USER stubs.

This change fixes two use cases:
* internal rpc, e.g. when a memory object is initialized as a
  consequence of vm_map(); for example, the bootstrap modules use the
  "time" kernel device and memory-map it during startup. This
  triggers a kernel-side rpc to initialize the memory object and
  install the map, which is completely transparent from user-space.
* notifications from kernel to user-space

Signed-off-by: Luca Dariz 
---
 include/mach/mach_types.defs | 16 +++-
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/include/mach/mach_types.defs b/include/mach/mach_types.defs
index a271d597..5c9fb2a8 100644
--- a/include/mach/mach_types.defs
+++ b/include/mach/mach_types.defs
@@ -110,9 +110,9 @@ type ipc_space_t = mach_port_t
 #endif /* KERNEL_SERVER */
;
 
-#if defined(KERNEL_SERVER) && defined(USER32)
+#if defined(KERNEL) && defined(USER32)
 type rpc_vm_size_t = uint32_t;
-#else /* KERNEL_SERVER and USER32 */
+#else /* KERNEL and USER32 */
 #if defined(__x86_64__)
 type rpc_vm_size_t = uint64_t;
 #else /* __x86_64__ */
@@ -121,21 +121,27 @@ type rpc_vm_size_t = uint32_t;
 #endif /* KERNEL_SERVER and USER32 */
 
 type vm_address_t = rpc_vm_size_t
-#if defined(KERNEL_SERVER) || defined(KERNEL_USER)
+#if defined(KERNEL_SERVER)
 intran: vm_address_t convert_vm_from_user(rpc_vm_address_t)
 outtran: rpc_vm_address_t convert_vm_to_user(vm_address_t)
+#elif defined(KERNEL_USER)
+ctype: rpc_vm_address_t
 #endif
 ;
 type vm_offset_t = rpc_vm_size_t
-#if defined(KERNEL_SERVER) || defined(KERNEL_USER)
+#if defined(KERNEL_SERVER)
 intran: vm_offset_t convert_vm_from_user(rpc_vm_offset_t)
 outtran: rpc_vm_offset_t convert_vm_to_user(vm_offset_t)
+#elif defined(KERNEL_USER)
+ctype: rpc_vm_offset_t
 #endif
 ;
 type vm_size_t = rpc_vm_size_t
-#if defined(KERNEL_SERVER) || defined(KERNEL_USER)
+#if defined(KERNEL_SERVER)
 intran: vm_size_t convert_vm_from_user(rpc_vm_size_t)
 outtran: rpc_vm_size_t convert_vm_to_user(vm_size_t)
+#elif defined(KERNEL_USER)
+ctype: rpc_vm_size_t
 #endif
 ;
 type vm_prot_t = int;
-- 
2.30.2




[PATCH 03/15] fix argument passing to bootstrap modules

2022-06-28 Thread Luca Dariz
* kern/bootstrap.c: use rpc_ vm types to put the bootstrap module
  arguments on the stack, make it consistent with user-space types.
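
As a rough standalone illustration of why the pointer-vector size matters (types reduced to plain stand-ins; the arithmetic mirrors build_args_and_stack() below):

#include <stdint.h>
#include <stdio.h>

typedef uint32_t rpc_vm_offset_t;   /* 32-bit user pointer slot */
typedef int32_t  integer_t;

int main(void)
{
    int arg_count = 2, envc = 1;
    /* argc word plus argv[] + NULL + envp[] + NULL slots */
    size_t with_rpc_types = sizeof(integer_t)
        + (arg_count + 1 + envc + 1) * sizeof(rpc_vm_offset_t);
    size_t with_kernel_ptrs = sizeof(integer_t)
        + (arg_count + 1 + envc + 1) * sizeof(char *);
    printf("user layout: %zu bytes, old kernel-pointer layout: %zu bytes\n",
           with_rpc_types, with_kernel_ptrs);
    /* on a 64-bit kernel the two differ, so a 32-bit user process would
       read its argv/envp vectors at the wrong offsets */
    return 0;
}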

Signed-off-by: Luca Dariz 
---
 kern/bootstrap.c | 22 +-
 1 file changed, 9 insertions(+), 13 deletions(-)

diff --git a/kern/bootstrap.c b/kern/bootstrap.c
index 60648c9d..259821ae 100644
--- a/kern/bootstrap.c
+++ b/kern/bootstrap.c
@@ -612,7 +612,7 @@ build_args_and_stack(struct exec_info *boot_exec_info,
 *  and align to integer boundary
 */
arg_len += (sizeof(integer_t)
-   + (arg_count + 1 + envc + 1) * sizeof(char *));
+   + (arg_count + 1 + envc + 1) * sizeof(rpc_vm_offset_t));
arg_len = (arg_len + sizeof(integer_t) - 1) & ~(sizeof(integer_t)-1);
 
/*
@@ -633,7 +633,7 @@ build_args_and_stack(struct exec_info *boot_exec_info,
 */
string_pos = (arg_pos
  + sizeof(integer_t)
- + (arg_count + 1 + envc + 1) * sizeof(char *));
+ + (arg_count + 1 + envc + 1) * sizeof(rpc_vm_offset_t));
 
/*
 * first the argument count
@@ -651,10 +651,8 @@ build_args_and_stack(struct exec_info *boot_exec_info,
arg_item_len = strlen(arg_ptr) + 1; /* include trailing 0 */
 
/* set string pointer */
-   (void) copyout(&string_pos,
-   arg_pos,
-   sizeof (char *));
-   arg_pos += sizeof(char *);
+(void) copyout(&string_pos, arg_pos, sizeof (rpc_vm_offset_t));
+   arg_pos += sizeof(rpc_vm_offset_t);
 
/* copy string */
(void) copyout(arg_ptr, string_pos, arg_item_len);
@@ -664,8 +662,8 @@ build_args_and_stack(struct exec_info *boot_exec_info,
/*
 * Null terminator for argv.
 */
-   (void) copyout(&zero, arg_pos, sizeof(char *));
-   arg_pos += sizeof(char *);
+   (void) copyout(&zero, arg_pos, sizeof(rpc_vm_offset_t));
+   arg_pos += sizeof(rpc_vm_offset_t);
 
/*
 * Then the strings and string pointers for each environment variable
@@ -675,10 +673,8 @@ build_args_and_stack(struct exec_info *boot_exec_info,
arg_item_len = strlen(arg_ptr) + 1; /* include trailing 0 */
 
/* set string pointer */
-   (void) copyout(&string_pos,
-   arg_pos,
-   sizeof (char *));
-   arg_pos += sizeof(char *);
+(void) copyout(&string_pos, arg_pos, sizeof (rpc_vm_offset_t));
+   arg_pos += sizeof(rpc_vm_offset_t);
 
/* copy string */
(void) copyout(arg_ptr, string_pos, arg_item_len);
@@ -688,7 +684,7 @@ build_args_and_stack(struct exec_info *boot_exec_info,
/*
 * Null terminator for envp.
 */
-   (void) copyout(&zero, arg_pos, sizeof(char *));
+   (void) copyout(&zero, arg_pos, sizeof(rpc_vm_offset_t));
 }
 
 
-- 
2.30.2




[PATCH 02/15] simplify ipc_kmsg_copyout_body() usage

2022-06-28 Thread Luca Dariz
* ipc/ipc_kmsg.h: change prototype of ipc_kmsg_copyout_body()
* ipc/ipc_kmsg.c: change prototype and usage of
  ipc_kmsg_copyout_body() by incorporating common code
* ipc/mach_msg.c: change usage of ipc_kmsg_copyout_body()

Signed-off-by: Luca Dariz 
---
 ipc/ipc_kmsg.c | 24 
 ipc/ipc_kmsg.h |  2 +-
 ipc/mach_msg.c |  4 +---
 3 files changed, 10 insertions(+), 20 deletions(-)

diff --git a/ipc/ipc_kmsg.c b/ipc/ipc_kmsg.c
index 28ed23c6..b9d29853 100644
--- a/ipc/ipc_kmsg.c
+++ b/ipc/ipc_kmsg.c
@@ -2336,13 +2336,17 @@ ipc_kmsg_copyout_object(
 
 mach_msg_return_t
 ipc_kmsg_copyout_body(
-   vm_offset_t saddr, 
-   vm_offset_t eaddr,
+ipc_kmsg_t kmsg,
ipc_space_t space,
vm_map_tmap)
 {
mach_msg_return_t mr = MACH_MSG_SUCCESS;
kern_return_t kr;
+vm_offset_t saddr, eaddr;
+
+saddr = (vm_offset_t) (&kmsg->ikm_header + 1);
+eaddr = (vm_offset_t) &kmsg->ikm_header +
+kmsg->ikm_header.msgh_size;
 
while (saddr < eaddr) {
vm_offset_t taddr = saddr;
@@ -2502,13 +2506,7 @@ ipc_kmsg_copyout(
return mr;
 
if (mbits & MACH_MSGH_BITS_COMPLEX) {
-   vm_offset_t saddr, eaddr;
-
-   saddr = (vm_offset_t) (&kmsg->ikm_header + 1);
-   eaddr = (vm_offset_t) &kmsg->ikm_header +
-   kmsg->ikm_header.msgh_size;
-
-   mr = ipc_kmsg_copyout_body(saddr, eaddr, space, map);
+   mr = ipc_kmsg_copyout_body(kmsg, space, map);
if (mr != MACH_MSG_SUCCESS)
mr |= MACH_RCV_BODY_ERROR;
}
@@ -2560,13 +2558,7 @@ ipc_kmsg_copyout_pseudo(
kmsg->ikm_header.msgh_local_port = reply_name;
 
if (mbits & MACH_MSGH_BITS_COMPLEX) {
-   vm_offset_t saddr, eaddr;
-
-   saddr = (vm_offset_t) (&kmsg->ikm_header + 1);
-   eaddr = (vm_offset_t) &kmsg->ikm_header +
-   kmsg->ikm_header.msgh_size;
-
-   mr |= ipc_kmsg_copyout_body(saddr, eaddr, space, map);
+   mr |= ipc_kmsg_copyout_body(kmsg, space, map);
}
 
return mr;
diff --git a/ipc/ipc_kmsg.h b/ipc/ipc_kmsg.h
index c6cd77f0..2d75b173 100644
--- a/ipc/ipc_kmsg.h
+++ b/ipc/ipc_kmsg.h
@@ -270,7 +270,7 @@ ipc_kmsg_copyout_object(ipc_space_t, ipc_object_t,
mach_msg_type_name_t, mach_port_t *);
 
 extern mach_msg_return_t
-ipc_kmsg_copyout_body(vm_offset_t, vm_offset_t, ipc_space_t, vm_map_t);
+ipc_kmsg_copyout_body(ipc_kmsg_t, ipc_space_t, vm_map_t);
 
 extern mach_msg_return_t
 ipc_kmsg_copyout(ipc_kmsg_t, ipc_space_t, vm_map_t, mach_port_t);
diff --git a/ipc/mach_msg.c b/ipc/mach_msg.c
index fe0c43e3..0ae8fe0c 100644
--- a/ipc/mach_msg.c
+++ b/ipc/mach_msg.c
@@ -1148,9 +1148,7 @@ mach_msg_trap(
kmsg->ikm_header.msgh_remote_port = MACH_PORT_NULL;
 
mr = ipc_kmsg_copyout_body(
-   (vm_offset_t) (&kmsg->ikm_header + 1),
-   (vm_offset_t) &kmsg->ikm_header
-   + kmsg->ikm_header.msgh_size,
+kmsg,
space,
current_map());
 
-- 
2.30.2




[PATCH 07/15] fix host_info structure definition

2022-06-28 Thread Luca Dariz
* include/mach/host_info.h: replace vm_size_t with the rpc_ version for
  64-bit compatibility. Ideally it should use phys_addr_t or another
  unit such as KB or MB.

Signed-off-by: Luca Dariz 
---
 include/mach/host_info.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/mach/host_info.h b/include/mach/host_info.h
index 60a6aefd..82f3faac 100644
--- a/include/mach/host_info.h
+++ b/include/mach/host_info.h
@@ -60,7 +60,7 @@ typedef char  kernel_boot_info_t[KERNEL_BOOT_INFO_MAX];
 struct host_basic_info {
integer_t   max_cpus;   /* max number of cpus possible */
integer_t   avail_cpus; /* number of cpus now available */
-   vm_size_t   memory_size;/* size of memory in bytes */
+   rpc_vm_size_t   memory_size;/* size of memory in bytes */
cpu_type_t  cpu_type;   /* cpu type */
cpu_subtype_t   cpu_subtype;/* cpu subtype */
 };
-- 
2.30.2




[PATCH 06/15] kmsg: fix msg body alignment

2022-06-28 Thread Luca Dariz
* ipc/ipc_kmsg.c: align msg body to 4 bytes as done in mig
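
A tiny standalone check of what the new macros compute (definitions copied from the hunk below, with vm_offset_t replaced by uintptr_t so it builds in user space):

#include <stdint.h>
#include <stdio.h>

#define msg_is_misaligned(x)   ( ((uintptr_t)(x)) & (sizeof(uint32_t)-1) )
#define msg_align(x)   \
    ( ( ((uintptr_t)(x)) + (sizeof(uint32_t)-1) ) & ~(sizeof(uint32_t)-1) )

int main(void)
{
    /* offsets are rounded up to the next 4-byte boundary, never to 8 */
    printf("%lu %lu %d\n",
           (unsigned long) msg_align(5),
           (unsigned long) msg_align(8),
           (int) msg_is_misaligned(6));    /* prints 8 8 2 */
    return 0;
}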

Signed-off-by: Luca Dariz 
---
 ipc/ipc_kmsg.c | 49 ++---
 1 file changed, 22 insertions(+), 27 deletions(-)

diff --git a/ipc/ipc_kmsg.c b/ipc/ipc_kmsg.c
index b9d29853..09801924 100644
--- a/ipc/ipc_kmsg.c
+++ b/ipc/ipc_kmsg.c
@@ -68,9 +68,10 @@
 #include 
 #endif
 
-#define is_misaligned(x)   ( ((vm_offset_t)(x)) & (sizeof(vm_offset_t)-1) )
-#define ptr_align(x)   \
-   ( ( ((vm_offset_t)(x)) + (sizeof(vm_offset_t)-1) ) & 
~(sizeof(vm_offset_t)-1) )
+/* msg body is always aligned to 4 bytes */
+#define msg_is_misaligned(x)   ( ((vm_offset_t)(x)) & (sizeof(uint32_t)-1) )
+#define msg_align(x)   \
+   ( ( ((vm_offset_t)(x)) + (sizeof(uint32_t)-1) ) & ~(sizeof(uint32_t)-1) 
)
 
 ipc_kmsg_t ipc_kmsg_cache[NCPUS];
 
@@ -232,8 +233,8 @@ ipc_kmsg_clean_body(
if (((mach_msg_type_t*)type)->msgt_longform) {
/* This must be aligned */
if ((sizeof(natural_t) > sizeof(mach_msg_type_t)) &&
-   (is_misaligned(type))) {
-   saddr = ptr_align(saddr);
+   (msg_is_misaligned(type))) {
+   saddr = msg_align(saddr);
continue;
}
name = type->msgtl_name;
@@ -250,7 +251,7 @@ ipc_kmsg_clean_body(
/* padding (ptrs and ports) ? */
if ((sizeof(natural_t) > sizeof(mach_msg_type_t)) &&
((size >> 3) == sizeof(natural_t)))
-   saddr = ptr_align(saddr);
+   saddr = msg_align(saddr);
 
/* calculate length of data in bytes, rounding up */
 
@@ -393,8 +394,8 @@ xxx:type = (mach_msg_type_long_t *) eaddr;
if (((mach_msg_type_t*)type)->msgt_longform) {
/* This must be aligned */
if ((sizeof(natural_t) > sizeof(mach_msg_type_t)) &&
-   (is_misaligned(type))) {
-   eaddr = ptr_align(eaddr);
+   (msg_is_misaligned(type))) {
+   eaddr = msg_align(eaddr);
goto xxx;
}
name = type->msgtl_name;
@@ -411,7 +412,7 @@ xxx:type = (mach_msg_type_long_t *) eaddr;
/* padding (ptrs and ports) ? */
if ((sizeof(natural_t) > sizeof(mach_msg_type_t)) &&
((size >> 3) == sizeof(natural_t)))
-   eaddr = ptr_align(eaddr);
+   eaddr = msg_align(eaddr);
 
/* calculate length of data in bytes, rounding up */
 
@@ -1324,8 +1325,8 @@ ipc_kmsg_copyin_body(
if (longform) {
/* This must be aligned */
if ((sizeof(natural_t) > sizeof(mach_msg_type_t)) &&
-   (is_misaligned(type))) {
-   saddr = ptr_align(saddr);
+   (msg_is_misaligned(type))) {
+   saddr = msg_align(saddr);
continue;
}
name = type->msgtl_name;
@@ -1354,7 +1355,7 @@ ipc_kmsg_copyin_body(
/* padding (ptrs and ports) ? */
if ((sizeof(natural_t) > sizeof(mach_msg_type_t)) &&
((size >> 3) == sizeof(natural_t)))
-   saddr = ptr_align(saddr);
+   saddr = msg_align(saddr);
 
/* calculate length of data in bytes, rounding up */
 
@@ -1376,9 +1377,6 @@ ipc_kmsg_copyin_body(
} else {
vm_offset_t addr;
 
-   if (sizeof(vm_offset_t) > sizeof(mach_msg_type_t))
-   saddr = ptr_align(saddr);
-
if ((eaddr - saddr) < sizeof(vm_offset_t)) {
ipc_kmsg_clean_partial(kmsg, taddr, FALSE, 0);
return MACH_SEND_MSG_TOO_SMALL;
@@ -1591,8 +1589,8 @@ ipc_kmsg_copyin_from_kernel(ipc_kmsg_t kmsg)
if (longform) {
/* This must be aligned */
if ((sizeof(natural_t) > sizeof(mach_msg_type_t)) &&
-   (is_misaligned(type))) {
-   saddr = ptr_align(saddr);
+   (msg_is_misaligned(type))) {
+   saddr = msg_align(saddr);
continue;
}
name = type->msgtl_name;
@@ -1609,7 +1607,7 @@ ipc_kmsg_copyin_from_k

[PATCH 05/15] sign-extend mask in vm_map() with 32-bit userspace

2022-06-28 Thread Luca Dariz
* vm/vm_user.c: sign-extend mask with USER32
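
For reference, a small standalone sketch of the intended sign extension, with a hypothetical mask as passed by a 32-bit caller:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint64_t mask = 0xfff00000u;          /* 32-bit value with bit 31 set */
    if (mask & 0x80000000u)
        mask |= 0xffffffff00000000ull;    /* extend into the upper half */
    printf("%#llx\n", (unsigned long long) mask);  /* 0xfffffffffff00000 */
    return 0;
}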

Signed-off-by: Luca Dariz 
---
 vm/vm_user.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/vm/vm_user.c b/vm/vm_user.c
index ad1fa75d..81c87d78 100644
--- a/vm/vm_user.c
+++ b/vm/vm_user.c
@@ -338,6 +338,11 @@ kern_return_t vm_map(
if (size == 0)
return KERN_INVALID_ARGUMENT;
 
+#ifdef USER32
+if (mask & 0x80000000)
+mask |= 0xffffffff00000000;
+#endif
+
*address = trunc_page(*address);
size = round_page(size);
 
-- 
2.30.2




[PATCH 09/15] x86_64: fix exception stack alignment

2022-06-28 Thread Luca Dariz
* i386/i386/pcb.c:
  - increase alignment of pcb cache to 16
  - ensure the stack is properly aligned when switching ktss
* i386/i386/thread.h:
  - add padding to make the iss field end aligned to 16 bytes
* i386/i386/trap.c:
  - ensure the state we get after the trap points to the correct place
in the pcb structure

When handling exceptions from IA-32e compatibility mode in user space,
on a 64-bit kernel, the exception stack where error info is pushed
needs to be aligned to 16 bytes (see Intel System Programming guide,
$6.14.2)

The exception stack frame is set in the middle of pcb->iss, but it's not always
16-byte aligned; to make sure it is, we increase the alignment of the
pcb cache and add a padding field in the pcb structure.

This issue resulted in a general protection fault due to CS being
corrupted after a page fault.  The corruption was happening when the
exception stack frame was not properly aligned and a page fault
happened; the error info was then pushed after re-aligning the stack,
so the value of eflags was actually written in place of CS and other
fields were shifted too.

It also makes sense to ensure this by adding two assertions, although
these were primarily useful during debugging.

Signed-off-by: Luca Dariz 
---
 i386/i386/pcb.c| 10 +-
 i386/i386/thread.h |  3 +++
 i386/i386/trap.c   |  4 
 3 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/i386/i386/pcb.c b/i386/i386/pcb.c
index 6e215b3e..4a6cbdb0 100644
--- a/i386/i386/pcb.c
+++ b/i386/i386/pcb.c
@@ -147,6 +147,9 @@ void switch_ktss(pcb_t pcb)
pcb_stack_top = (pcb->iss.efl & EFL_VM)
? (long) (&pcb->iss + 1)
: (long) (&pcb->iss.v86_segs);
+#ifdef __x86_64__
+   assert((pcb_stack_top & 0xF) == 0);
+#endif
 
 #ifdef MACH_RING1
/* No IO mask here */
@@ -375,7 +378,12 @@ thread_t switch_context(
 
 void pcb_module_init(void)
 {
-   kmem_cache_init(&pcb_cache, "pcb", sizeof(struct pcb), 0,
+   kmem_cache_init(&pcb_cache, "pcb", sizeof(struct pcb),
+#ifdef __x86_64__
+   16,
+#else
+   0,
+#endif
NULL, 0);
 
fpu_module_init();
diff --git a/i386/i386/thread.h b/i386/i386/thread.h
index 4a9c1987..cb317bee 100644
--- a/i386/i386/thread.h
+++ b/i386/i386/thread.h
@@ -200,6 +200,9 @@ struct i386_machine_state {
 
 typedef struct pcb {
struct i386_interrupt_state iis[2]; /* interrupt and NMI */
+#ifdef __x86_64__
+   unsigned long pad; /* ensure exception stack is aligned to 16 */
+#endif
struct i386_saved_state iss;
struct i386_machine_state ims;
decl_simple_lock_data(, lock)
diff --git a/i386/i386/trap.c b/i386/i386/trap.c
index 4f8612bc..23cb9f17 100644
--- a/i386/i386/trap.c
+++ b/i386/i386/trap.c
@@ -361,6 +361,10 @@ int user_trap(struct i386_saved_state *regs)
int type;
thread_t thread = current_thread();
 
+#ifdef __x86_64__
+   assert(regs == &thread->pcb->iss);
+#endif
+
type = regs->trapno;
code = 0;
subcode = 0;
-- 
2.30.2




[PATCH 11/15] update syscall signature with rpc_vm_* and mach_port_name_t

2022-06-28 Thread Luca Dariz
* i386/i386/copy_user.h: add copyin/copyout helpers for ports and vm
  addresses.
* include/mach/mach_traps.h: replace mach_port_t with mach_port_name_t
* kern/ipc_host.c: Likewise
* kern/ipc_tt.c: Likewise
* kern/ipc_tt.h: Likewise
* kern/syscall_sw.c Likewise
* kern/syscall_subr.c Likewise
* kern/syscall_subr.h: Likewise
* kern/ipc_mig.h: replace mach_port_t with mach_port_name_t and vm_*
  types with rpc_vm_* counterpart.
* kern/ipc_mig.c: update syscall prototypes and adapt to kernel types
  - for vm types use copyin_address() and copyout_address() helpers
  - for mach_port_name_t use copyin_port() and copyout_port() helpers

Signed-off-by: Luca Dariz 
---
 i386/i386/copy_user.h | 64 
 include/mach/mach_traps.h | 18 ++---
 kern/ipc_host.c   |  2 +-
 kern/ipc_mig.c| 78 ---
 kern/ipc_mig.h| 58 ++---
 kern/ipc_tt.c |  6 +--
 kern/ipc_tt.h |  2 +-
 kern/syscall_subr.c   |  2 +-
 kern/syscall_subr.h   |  2 +-
 kern/syscall_sw.c |  2 +-
 10 files changed, 145 insertions(+), 89 deletions(-)

diff --git a/i386/i386/copy_user.h b/i386/i386/copy_user.h
index ab932401..20487529 100644
--- a/i386/i386/copy_user.h
+++ b/i386/i386/copy_user.h
@@ -2,11 +2,75 @@
 #ifndef COPY_USER_H
 #define COPY_USER_H
 
+#include 
+
 #include 
 
 #include 
 #include 
 
+/*
+ * The copyin_32to64() and copyout_64to32() routines are meant for data types
+ * that have different size in kernel and user space. They should be 
independent
+ * of endianness and hopefully can be reused in the future on other archs.
+ * These types are e.g.:
+ * - port names vs port pointers, on a 64-bit kernel
+ * - memory addresses, on a 64-bit kernel and 32-bit user
+ */
+
+static inline int copyin_32to64(uint32_t *uaddr, uint64_t *kaddr)
+{
+  uint32_t rkaddr;
+  int ret;
+  ret = copyin(uaddr, &rkaddr, sizeof(uint32_t));
+  if (ret)
+return ret;
+  *kaddr = rkaddr;
+  return 0;
+}
+
+static inline int copyout_64to32(uint64_t *kaddr, uint32_t *uaddr)
+{
+  uint32_t rkaddr=*kaddr;
+  return copyout(&rkaddr, uaddr, sizeof(uint32_t));
+}
+
+static inline int copyin_address(rpc_vm_offset_t *uaddr, vm_offset_t *kaddr)
+{
+#ifdef __x86_64
+  return copyin_32to64(uaddr, kaddr);
+#else
+  return copyin(uaddr, kaddr, sizeof(*uaddr));
+#endif
+}
+
+static inline int copyout_address(vm_offset_t *kaddr, rpc_vm_offset_t *uaddr)
+{
+#ifdef __x86_64
+  return copyout_64to32(kaddr, uaddr);
+#else
+  return copyout(kaddr, uaddr, sizeof(*kaddr));
+#endif
+}
+
+static inline int copyin_port(mach_port_name_t *uaddr, mach_port_t *kaddr)
+{
+#ifdef __x86_64
+  return copyin_32to64(uaddr, kaddr);
+#else
+  return copyin(uaddr, kaddr, sizeof(*uaddr));
+#endif
+}
+
+static inline int copyout_port(mach_port_t *kaddr, mach_port_name_t *uaddr)
+{
+#ifdef __x86_64
+  return copyout_64to32(kaddr, uaddr);
+#else
+  return copyout(kaddr, uaddr, sizeof(*kaddr));
+#endif
+}
+
 // XXX we could add another field to kmsg to store the user-side size, but 
then we
 // should check if we can  obtain it for rpc and notifications originating from
 // the kernel
diff --git a/include/mach/mach_traps.h b/include/mach/mach_traps.h
index 0433707a..2a87f62a 100644
--- a/include/mach/mach_traps.h
+++ b/include/mach/mach_traps.h
@@ -35,19 +35,9 @@
 
 #include 
 
-mach_port_tmach_reply_port
-   (void);
-
-mach_port_tmach_thread_self
-   (void);
-
-#ifdef __386BSD__
-#undef mach_task_self
-#endif
-mach_port_tmach_task_self
-   (void);
-
-mach_port_tmach_host_self
-   (void);
+mach_port_name_t mach_reply_port (void);
+mach_port_name_t mach_thread_self (void);
+mach_port_name_t mach_task_self (void);
+mach_port_name_t mach_host_self (void);
 
 #endif /* _MACH_MACH_TRAPS_H_ */
diff --git a/kern/ipc_host.c b/kern/ipc_host.c
index a02eb6f6..6163beff 100644
--- a/kern/ipc_host.c
+++ b/kern/ipc_host.c
@@ -94,7 +94,7 @@ void ipc_host_init(void)
  * or other errors.
  */
 
-mach_port_t
+mach_port_name_t
 mach_host_self(void)
 {
ipc_port_t sright;
diff --git a/kern/ipc_mig.c b/kern/ipc_mig.c
index 22dac420..f77c189b 100644
--- a/kern/ipc_mig.c
+++ b/kern/ipc_mig.c
@@ -29,6 +29,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -619,13 +620,13 @@ kern_return_t thread_set_state_KERNEL(
 
 kern_return_t
 syscall_vm_map(
-   mach_port_t target_map,
-   vm_offset_t *address,
-   vm_size_t   size,
-   vm_offset_t mask,
+   mach_port_name_ttarget_map,
+   rpc_vm_offset_t *address,
+   rpc_vm_size_t   size,
+   rpc_vm_offset_t mask,
boolean_t   anywhere,
-   mach_port_t memory_object,
-   vm_offset_t offset,
+   mach_port_name_tmemory_object,
+   rpc_vm_offset_t offset,
boolean_t   

[PATCH 10/15] x86_64: expand and shrink messages in copy{in, out}msg routines

2022-06-28 Thread Luca Dariz
* i386/i386/copy_user.h: new file to handle 32/64 bit differences
  (currently only msg_usize())
* include/mach/message.h: add mach_msg_user_header_t using port names
  instead of ports
* ipc/ipc_kmsg.c:
  - use mach_msg_user_header_t
* ipc/ipc_mqueue.c: use msg_usize() to check if we can actually
  receive the message
* ipc/mach_msg.c: Likewise for continuations in receive path
* x86_64/Makefrag.am: add x86_64/copy_user.c
* x86_64/copy_user.c: new file to handle message expansion and
  shrinking during copyinmsg/copyoutmsg for 64 bit kernels.
  - port names -> port pointers on all 64-bit builds
  - 32-bit pointer -> 64 bit pointer when using 32-bit userspace
* x86_64/locore.S: remove copyinmsg() and copyoutmsg()

Note that this depends on the following change in mig to get the correct size in msgh_size:

* fill msg size in the header for user stubs
---
 i386/i386/copy_user.h  |  22 
 include/mach/message.h |  11 ++
 ipc/ipc_kmsg.c |  30 -
 ipc/ipc_mqueue.c   |   5 +-
 ipc/mach_msg.c |  19 ++-
 x86_64/Makefrag.am |   1 +
 x86_64/copy_user.c | 280 +
 x86_64/locore.S|  79 
 8 files changed, 351 insertions(+), 96 deletions(-)
 create mode 100644 i386/i386/copy_user.h
 create mode 100644 x86_64/copy_user.c

diff --git a/i386/i386/copy_user.h b/i386/i386/copy_user.h
new file mode 100644
index ..ab932401
--- /dev/null
+++ b/i386/i386/copy_user.h
@@ -0,0 +1,22 @@
+
+#ifndef COPY_USER_H
+#define COPY_USER_H
+
+#include 
+
+#include 
+#include 
+
+// XXX we could add another field to kmsg to store the user-side size, but 
then we
+// should check if we can  obtain it for rpc and notifications originating from
+// the kernel
+#ifndef __x86_64__
+static inline size_t msg_usize(const mach_msg_header_t *kmsg)
+{
+  return kmsg->msgh_size;
+}
+#else /* __x86_64__ */
+size_t msg_usize(const mach_msg_header_t *kmsg);
+#endif /* __x86_64__ */
+
+#endif /* COPY_USER_H */
diff --git a/include/mach/message.h b/include/mach/message.h
index 0a7297e1..e1a8d663 100644
--- a/include/mach/message.h
+++ b/include/mach/message.h
@@ -132,6 +132,7 @@ typedef unsigned int mach_msg_size_t;
 typedef natural_t mach_msg_seqno_t;
 typedef integer_t mach_msg_id_t;
 
+/* full header structure, may have different size in user/kernel spaces*/
 typedefstruct {
 mach_msg_bits_tmsgh_bits;
 mach_msg_size_tmsgh_size;
@@ -144,6 +145,16 @@ typedefstruct {
 mach_msg_id_t  msgh_id;
 } mach_msg_header_t;
 
+/* user-side header format, needed in the kernel */
+typedefstruct {
+mach_msg_bits_tmsgh_bits;
+mach_msg_size_tmsgh_size;
+mach_port_name_t   msgh_remote_port;
+mach_port_name_t   msgh_local_port;
+mach_port_seqno_t  msgh_seqno;
+mach_msg_id_t  msgh_id;
+} mach_msg_user_header_t;
+
 /*
  *  There is no fixed upper bound to the size of Mach messages.
  */
diff --git a/ipc/ipc_kmsg.c b/ipc/ipc_kmsg.c
index 09801924..8f7045ee 100644
--- a/ipc/ipc_kmsg.c
+++ b/ipc/ipc_kmsg.c
@@ -42,6 +42,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -503,7 +504,7 @@ ipc_kmsg_get(
 {
ipc_kmsg_t kmsg;
 
-   if ((size < sizeof(mach_msg_header_t)) || (size & 3))
+   if ((size < sizeof(mach_msg_user_header_t)) || (size & 3))
return MACH_SEND_MSG_TOO_SMALL;
 
if (size <= IKM_SAVED_MSG_SIZE) {
@@ -529,7 +530,6 @@ ipc_kmsg_get(
return MACH_SEND_INVALID_DATA;
}
 
-   kmsg->ikm_header.msgh_size = size;
*kmsgp = kmsg;
return MACH_MSG_SUCCESS;
 }
@@ -1393,7 +1393,19 @@ ipc_kmsg_copyin_body(
if (data == 0)
goto invalid_memory;
 
-   if (copyinmap(map, (char *) addr,
+   if (sizeof(mach_port_name_t) != 
sizeof(mach_port_t))
+   {
+   mach_port_name_t *src = 
(mach_port_name_t*)addr;
+   mach_port_t *dst = (mach_port_t*)data;
+   for (int i=0; i
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -540,7 +541,7 @@ ipc_mqueue_receive(
if (kmsg != IKM_NULL) {
/* check space requirements */
 
-   if (kmsg->ikm_header.msgh_size > max_size) {
+   if (msg_usize(&kmsg->ikm_header) > max_size) {
* (mach_msg_size_t *) kmsgp =
kmsg->ikm_header.msgh_size;
imq_unlock(mqueue);
@@ -649,7 +650,7 @@ ipc_mqueue_receive(
/* we have a kmsg; unlock the msg queue */
 
imq_unlock(mqueue);
-   assert(kmsg->ikm_header.msgh_size <= max_size);
+   assert(msg_usize(&kmsg->ikm_header) <= max_size);
 }
 
 {
diff --git a/ipc/mach_msg.c b/ipc/mach_msg.c

[PATCH 13/15] cleanup headers in printf.c

2022-06-28 Thread Luca Dariz
* kern/printf.c: remove unnecessary #include and reorder

This allows the file to be reused for minimal user-space tests.

Signed-off-by: Luca Dariz 
---
 kern/printf.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kern/printf.c b/kern/printf.c
index 50f23623..cbc27ae6 100644
--- a/kern/printf.c
+++ b/kern/printf.c
@@ -116,12 +116,12 @@
  * (compatibility)
  */
 
+#include 
 #include 
 #include 
 #include 
 #include 
-#include 
-#include 
+
 
 #define isdigit(d) ((d) >= '0' && (d) <= '9')
 #define Ctod(c) ((c) - '0')
-- 
2.30.2




[PATCH 14/15] hack vm memory object proxy creation for vm arrays

2022-06-28 Thread Luca Dariz
* vm/memory_object_proxy.c: truncate vm array arguments as if they were
  the rpc_ version, since MIG can't perform that conversion for
  arrays. This rpc can't handle more than one element anyway.

Note that the same issue with vm arrays is present at least with
syscall emulation, but that functionality seems unused for now.

A better fix could be to add a vm descriptor type in include/mach/message.h,
but then we probably wouldn't need the rpc_ types in MIG anymore;
they would be needed only for the syscall definitions.

Signed-off-by: Luca Dariz 
---
 vm/memory_object_proxy.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/vm/memory_object_proxy.c b/vm/memory_object_proxy.c
index 4d50bab4..f397351e 100644
--- a/vm/memory_object_proxy.c
+++ b/vm/memory_object_proxy.c
@@ -155,6 +155,11 @@ memory_object_create_proxy (ipc_space_t space, vm_prot_t 
max_protection,
   if (!IP_VALID(object[0]))
 return KERN_INVALID_NAME;
 
+  /* FIXME: fix mig or add a new VM data type in message.h */
+  *offset &= 0xffffffff;
+  *start &= 0xffffffff;
+  *len &= 0xffffffff;
+
   /* FIXME: Support a different offset from 0.  */
   if (offset[0] != 0)
 return KERN_INVALID_ARGUMENT;
-- 
2.30.2




[PATCH 12/15] fix warnings for 32 bit builds

2022-06-28 Thread Luca Dariz
Signed-off-by: Luca Dariz 
---
 device/cirbuf.c| 4 ++--
 i386/i386/debug_i386.c | 1 +
 i386/i386at/biosmem.c  | 2 --
 i386/i386at/com.c  | 2 +-
 i386/i386at/mem.c  | 1 +
 kern/boot_script.c | 1 +
 kern/bootstrap.h   | 5 -
 kern/exception.c   | 1 +
 vm/vm_debug.c  | 1 +
 vm/vm_map.c| 2 +-
 vm/vm_page.c   | 2 --
 vm/vm_pageout.c| 2 --
 12 files changed, 13 insertions(+), 11 deletions(-)

diff --git a/device/cirbuf.c b/device/cirbuf.c
index 391297ce..a3c9407a 100644
--- a/device/cirbuf.c
+++ b/device/cirbuf.c
@@ -51,10 +51,10 @@ void
 cb_check(struct cirbuf *cb)
 {
if (!(cb->c_cf >= cb->c_start && cb->c_cf < cb->c_end))
-   panic("cf %x out of range [%x..%x)",
+   panic("cf %p out of range [%p..%p)",
cb->c_cf, cb->c_start, cb->c_end);
if (!(cb->c_cl >= cb->c_start && cb->c_cl < cb->c_end))
-   panic("cl %x out of range [%x..%x)",
+   panic("cl %p out of range [%p..%p)",
cb->c_cl, cb->c_start, cb->c_end);
if (cb->c_cf <= cb->c_cl) {
if (!(cb->c_cc == cb->c_cl - cb->c_cf))
diff --git a/i386/i386/debug_i386.c b/i386/i386/debug_i386.c
index 233caa72..4b8804cd 100644
--- a/i386/i386/debug_i386.c
+++ b/i386/i386/debug_i386.c
@@ -26,6 +26,7 @@
 #include "thread.h"
 #include "trap.h"
 #include "debug.h"
+#include "spl.h"
 
 void dump_ss(const struct i386_saved_state *st)
 {
diff --git a/i386/i386at/biosmem.c b/i386/i386at/biosmem.c
index 78e7bb21..fafdc048 100644
--- a/i386/i386at/biosmem.c
+++ b/i386/i386at/biosmem.c
@@ -29,8 +29,6 @@
 #include 
 #include 
 
-#define DEBUG 0
-
 #define __boot
 #define __bootdata
 #define __init
diff --git a/i386/i386at/com.c b/i386/i386at/com.c
index 1f305b23..d5842d8f 100644
--- a/i386/i386at/com.c
+++ b/i386/i386at/com.c
@@ -191,7 +191,7 @@ comcnprobe(struct consdev *cp)
 
if (strncmp(kernel_cmdline, CONSOLE_PARAMETER + 1,
strlen(CONSOLE_PARAMETER) - 1) == 0)
-   mach_atoi(kernel_cmdline + strlen(CONSOLE_PARAMETER) - 1,
+mach_atoi((u_char*)kernel_cmdline + strlen(CONSOLE_PARAMETER) - 1,
  &rcline);
 
maj = 0;
diff --git a/i386/i386at/mem.c b/i386/i386at/mem.c
index 07acc169..ac0fd301 100644
--- a/i386/i386at/mem.c
+++ b/i386/i386at/mem.c
@@ -26,6 +26,7 @@
 
 #include 
 #include 
+#include 
 
 /* This provides access to any memory that is not main RAM */
 
diff --git a/kern/boot_script.c b/kern/boot_script.c
index 9e8f60a7..7e31075f 100644
--- a/kern/boot_script.c
+++ b/kern/boot_script.c
@@ -6,6 +6,7 @@
 #include 
 #include 
 #include "boot_script.h"
+#include "bootstrap.h"
 
 
 /* This structure describes a symbol.  */
diff --git a/kern/bootstrap.h b/kern/bootstrap.h
index b8ed8d9f..edf1f7f4 100644
--- a/kern/bootstrap.h
+++ b/kern/bootstrap.h
@@ -19,6 +19,9 @@
 #ifndef _KERN_BOOTSTRAP_H_
 #define _KERN_BOOTSTRAP_H_
 
-extern void bootstrap_create(void);
+#include 
+
+void bootstrap_create(void);
+int boot_script_ramdisk_create(struct cmd *cmd, char **name);
 
 #endif /* _KERN_BOOTSTRAP_H_ */
diff --git a/kern/exception.c b/kern/exception.c
index 246c1419..6a812490 100644
--- a/kern/exception.c
+++ b/kern/exception.c
@@ -45,6 +45,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
diff --git a/vm/vm_debug.c b/vm/vm_debug.c
index 2dff2296..4b5c1521 100644
--- a/vm/vm_debug.c
+++ b/vm/vm_debug.c
@@ -48,6 +48,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 
diff --git a/vm/vm_map.c b/vm/vm_map.c
index 7fe3e141..9f5eb13d 100644
--- a/vm/vm_map.c
+++ b/vm/vm_map.c
@@ -737,7 +737,7 @@ restart:
max_size = size + mask;
 
if (max_size < size) {
-   printf("max_size %x got smaller than size %x with mask %lx\n",
+   printf("max_size %lx got smaller than size %lx with mask %lx\n",
   max_size, size, mask);
goto error;
}
diff --git a/vm/vm_page.c b/vm/vm_page.c
index 06d62c97..f8b9afb1 100644
--- a/vm/vm_page.c
+++ b/vm/vm_page.c
@@ -48,8 +48,6 @@
 #include 
 #include 
 
-#define DEBUG 0
-
 #define __init
 #define __initdata
 #define __read_mostly
diff --git a/vm/vm_pageout.c b/vm/vm_pageout.c
index 575a9f5d..5e7bcff7 100644
--- a/vm/vm_pageout.c
+++ b/vm/vm_pageout.c
@@ -55,8 +55,6 @@
 #include 
 #include 
 
-#define DEBUG 0
-
 /*
  * Maximum delay, in milliseconds, between two pageout scans.
  */
-- 
2.30.2




[PATCH 00/15] Add preliminary support for 32-bit userspace on a x86_64 kernel

2022-06-28 Thread Luca Dariz
This patch set contains three kinds of changes:
* changes for IPC on x86_64 (e.g. msg alignment, copyin/copyout)
* 32-bit userland support on 64-bit kernel (e.g. exception stack
  alignment)
* minor fixes and cleanup

This is just a preliminary version; not everything is working yet.

I've tested this with a very minimal ramdisk I created, and I can see
that /hurd/startup is launched, together with the auth and proc
servers.  Rumpdisk is not yet working, as the
ds_device_intr_register() implementation is still missing for x86_64.
I also tried using the ramdisk of a netinstall image, and the init
scripts seem to hang at some point with messages like:

task /bin/sh(1) deallocating a bogus port 4294967295, most probably a bug.
task mkdir(10) deallocating a bogus port 4294967295, most probably a bug.

I think most IPC issues should be addressed, although maybe it's not
very optimized yet (especially the new copyinmsg/copyoutmsg).

These patches are based on the previous work:

https://mail.gnu.org/archive/html/bug-hurd/2022-04/msg7.html
https://mail.gnu.org/archive/html/bug-hurd/2022-02/msg00006.html

Luca Dariz (15):
  fix rpc types for KERNEL_USER stubs
  simplify ipc_kmsg_copyout_body() usage
  fix argument passing to bootstrap modules
  compute mach port size from the corresponding type
  sign-extend mask in vm_map() with 32-bit userspace
  kmsg: fix msg body alignment
  fix host_info structure definition
  use port name type in  mach_port_names()
  x86_64: fix exception stack alignment
  x86_64: expand and shrink messages in copy{in,out}msg routines
  update syscall signature with rpc_vm_* and mach_port_name_t
  fix warnings for 32 bit builds
  cleanup headers in printf.c
  hack vm memory object proxy creation for vm arrays
  enable syscalls on x86_64

 device/cirbuf.c  |   4 +-
 i386/i386/copy_user.h|  86 +++
 i386/i386/debug_i386.c   |   1 +
 i386/i386/pcb.c  |  10 +-
 i386/i386/thread.h   |   3 +
 i386/i386/trap.c |   4 +
 i386/i386at/biosmem.c|   2 -
 i386/i386at/com.c|   2 +-
 i386/i386at/mem.c|   1 +
 include/mach/host_info.h |   2 +-
 include/mach/mach_traps.h|  18 +--
 include/mach/mach_types.defs |  16 +-
 include/mach/message.h   |  11 ++
 ipc/ipc_kmsg.c   | 103 +++--
 ipc/ipc_kmsg.h   |   2 +-
 ipc/ipc_machdep.h|  12 +-
 ipc/ipc_mqueue.c |   5 +-
 ipc/mach_msg.c   |  23 ++-
 ipc/mach_port.c  |  12 +-
 kern/boot_script.c   |   1 +
 kern/bootstrap.c |  22 ++-
 kern/bootstrap.h |   5 +-
 kern/exception.c |   1 +
 kern/ipc_host.c  |   2 +-
 kern/ipc_mig.c   |  78 +-
 kern/ipc_mig.h   |  58 
 kern/ipc_tt.c|   6 +-
 kern/ipc_tt.h|   2 +-
 kern/printf.c|   4 +-
 kern/syscall_subr.c  |   2 +-
 kern/syscall_subr.h  |   2 +-
 kern/syscall_sw.c|   2 +-
 vm/memory_object_proxy.c |   5 +
 vm/vm_debug.c|   1 +
 vm/vm_map.c  |   2 +-
 vm/vm_page.c |   2 -
 vm/vm_pageout.c  |   2 -
 vm/vm_user.c |   5 +
 x86_64/Makefrag.am   |   1 +
 x86_64/copy_user.c   | 280 +++
 x86_64/locore.S  |  82 --
 41 files changed, 599 insertions(+), 283 deletions(-)
 create mode 100644 i386/i386/copy_user.h
 create mode 100644 x86_64/copy_user.c

-- 
2.30.2




[PATCH 04/15] compute mach port size from the corresponding type

2022-06-28 Thread Luca Dariz
* ipc/ipc_machdep.h: re-define PORT_T_SIZE_IN_BITS to be computed from
  mach_port_t instead of being hardcoded.

Signed-off-by: Luca Dariz 
---
 ipc/ipc_machdep.h | 12 +++-
 1 file changed, 3 insertions(+), 9 deletions(-)

diff --git a/ipc/ipc_machdep.h b/ipc/ipc_machdep.h
index c205ba45..29878dc9 100755
--- a/ipc/ipc_machdep.h
+++ b/ipc/ipc_machdep.h
@@ -27,18 +27,12 @@
 #ifndef _IPC_IPC_MACHDEP_H_
 #define _IPC_IPC_MACHDEP_H_
 
+#include 
+
 /*
  * At times, we need to know the size of a port in bits
  */
 
-/* 64 bit machines */
-#ifdefined(__alpha)
-#definePORT_T_SIZE_IN_BITS 64
-#endif
-
-/* default, 32 bit machines */
-#if!defined(PORT_T_SIZE_IN_BITS)
-#definePORT_T_SIZE_IN_BITS 32
-#endif
+#define PORT_T_SIZE_IN_BITS (sizeof(mach_port_t)*8)
 
 #endif /* _IPC_IPC_MACHDEP_H_ */
-- 
2.30.2




[PATCH 08/15] use port name type in mach_port_names()

2022-06-28 Thread Luca Dariz
* ipc/mach_port.c: use mach_port_name_t instead of mach_port_t, since
  they could have different sizes. Fortunately we can keep the same
  optimization when allocating memory, since mach_port_type_t has
  the same size as a name.

Signed-off-by: Luca Dariz 
---
 ipc/mach_port.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/ipc/mach_port.c b/ipc/mach_port.c
index 0757bb84..54e2e09f 100644
--- a/ipc/mach_port.c
+++ b/ipc/mach_port.c
@@ -74,7 +74,7 @@ mach_port_names_helper(
ipc_port_timestamp_ttimestamp,
ipc_entry_t entry,
mach_port_t name,
-   mach_port_t *names,
+   mach_port_name_t*names,
mach_port_type_t*types,
ipc_entry_num_t *actualp)
 {
@@ -145,14 +145,14 @@ mach_port_names_helper(
 kern_return_t
 mach_port_names(
ipc_space_t space,
-   mach_port_t **namesp,
+   mach_port_name_t**namesp,
mach_msg_type_number_t  *namesCnt,
mach_port_type_t**typesp,
mach_msg_type_number_t  *typesCnt)
 {
ipc_entry_num_t actual; /* this many names */
ipc_port_timestamp_t timestamp; /* logical time of this operation */
-   mach_port_t *names;
+   mach_port_name_t *names;
mach_port_type_t *types;
kern_return_t kr;
 
@@ -163,7 +163,7 @@ mach_port_names(
vm_map_copy_t memory2;  /* copied-in memory, for types */
 
/* safe simplifying assumption */
-   assert_static(sizeof(mach_port_t) == sizeof(mach_port_type_t));
+   assert_static(sizeof(mach_port_name_t) == sizeof(mach_port_type_t));
 
if (space == IS_NULL)
return KERN_INVALID_TASK;
@@ -225,7 +225,7 @@ mach_port_names(
}
/* space is read-locked and active */
 
-   names = (mach_port_t *) addr1;
+   names = (mach_port_name_t *) addr1;
types = (mach_port_type_t *) addr2;
actual = 0;
 
@@ -287,7 +287,7 @@ mach_port_names(
}
}
 
-   *namesp = (mach_port_t *) memory1;
+   *namesp = (mach_port_name_t *) memory1;
*namesCnt = actual;
*typesp = (mach_port_type_t *) memory2;
*typesCnt = actual;
-- 
2.30.2




[PATCH 15/15] enable syscalls on x86_64

2022-06-28 Thread Luca Dariz
Signed-off-by: Luca Dariz 
---
 x86_64/locore.S | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/x86_64/locore.S b/x86_64/locore.S
index 198ac40a..615a2105 100644
--- a/x86_64/locore.S
+++ b/x86_64/locore.S
@@ -1056,9 +1056,6 @@ syscall_entry_2:
pushq   %rax/* save system call number */
pushq   $0  /* clear trap number slot */
 
-// TODO: test it before dropping ud2
-   ud2
-
pusha   /* save the general registers */
movq%ds,%rdx/* and the segment registers */
pushq   %rdx
-- 
2.30.2




Re: [PATCH 10/15] x86_64: expand and shrink messages in copy{in, out}msg routines

2022-08-30 Thread Luca Dariz
> On 30/08/2022 08:17 CEST, Samuel Thibault  wrote:
> Luca, on Tue 30 Aug 2022 07:57:23 +0200, wrote:
> > On 28/08/22 15:13, Samuel Thibault wrote:
> > > This was breaking the 32bit kernel case. I have pushed a fix for that,
> > > that does this move of setting msgh_size to copyinmsg itself.
> > 
> > The 32-bit case was breaking because it needed an updated MIG,
> 
> ? You mean that the kernel would have to trust userland to set msgh_size
> properly? We cannot do that :)

The kernel already takes the send size as a syscall parameter; what I mean is
that the same value could be taken from msgh_size, but MIG only uses the
syscall parameter.

The other option, i.e. deprecating msgh_size, would also be ok; I was just
thinking about a more uniform interface, now that messages can have a different
size in kernel and user space.

About trusting this value, maybe the kernel should check whether the whole
incoming message is in a valid range for the task (the same validation would be
useful for all syscalls and ipc). I didn't see any upper bound on the message
size; maybe there could be one for inline data (4K?).

> 
> > As far as I understand, these routines should use stac/clac if the SMAP cpu
> > feature is supported on x86 as the Linux counterparts, so we would catch
> > these cases earlier.
> 
> Yes.
> 
> > I didn't find anything related to cpu features yet,
> 
> git grep -i feature i386/

silly me, I did see CPU_HAS_FEATURE in pmap, but then I forgot...

> > Is there a  minimum that we can assume to have?
> 
> I'd rather not. And particularly not SMAP which is very recent :)

Ok. So a good way to test the worst case could be using qemu with -cpu base.
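
For example, something along these lines (an illustrative invocation; the image name and the rest of the options are placeholders):

    qemu-system-x86_64 -cpu base -m 1G -drive file=hurd.img,format=raw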


Luca



[PATCH 2/7] x86_64: expand and shrink messages in copy{in, out}msg routines

2023-01-16 Thread Luca Dariz
* i386/i386/copy_user.h: new file to handle 32/64 bit differences
  - add msg_usize() to reconstruct the user-space message size
  - add copyin/copyout helpers for addresses and ports
* include/mach/message.h: add msg alignment macros
* ipc/ipc_kmsg.c:
  - copyin/out ports names instead of using pointer magic
* ipc/ipc_mqueue.c: use msg_usize() to check if we can actually
  receive the message
* ipc/mach_msg.c: Likewise for continuations in receive path
* x86_64/Makefrag.am: add x86_64/copy_user.c
* x86_64/copy_user.c: new file to handle message expansion and
  shrinking during copyinmsg/copyoutmsg for 64 bit kernels.
  - port names -> port pointers on all 64-bit builds
  - 32-bit pointer -> 64 bit pointer when using 32-bit userspace
* x86_64/locore.S: remove copyinmsg() and copyoutmsg()
---
 i386/i386/copy_user.h  | 102 
 include/mach/message.h |  14 +-
 ipc/ipc_kmsg.c |  47 --
 ipc/ipc_mqueue.c   |   5 +-
 ipc/mach_msg.c |  17 +-
 x86_64/Makefrag.am |   1 +
 x86_64/copy_user.c | 362 +
 x86_64/locore.S|  81 -
 8 files changed, 522 insertions(+), 107 deletions(-)
 create mode 100644 i386/i386/copy_user.h
 create mode 100644 x86_64/copy_user.c

diff --git a/i386/i386/copy_user.h b/i386/i386/copy_user.h
new file mode 100644
index ..82b3a56e
--- /dev/null
+++ b/i386/i386/copy_user.h
@@ -0,0 +1,102 @@
+/*
+ *  Copyright (C) 2023 Free Software Foundation
+ *
+ * This program is free software ; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation ; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY ; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with the program ; if not, write to the Free Software
+ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+
+#ifndef COPY_USER_H
+#define COPY_USER_H
+
+#include 
+#include 
+
+#include 
+#include 
+
+/*
+ * The copyin_32to64() and copyout_64to32() routines are meant for data types
+ * that have different size in kernel and user space. They should be 
independent
+ * of endianness and hopefully can be reused in the future on other archs.
+ * These types are e.g.:
+ * - port names vs port pointers, on a 64-bit kernel
+ * - memory addresses, on a 64-bit kernel and 32-bit user
+ */
+
+static inline int copyin_32to64(const uint32_t *uaddr, uint64_t *kaddr)
+{
+  uint32_t rkaddr;
+  int ret;
+  ret = copyin(uaddr, &rkaddr, sizeof(uint32_t));
+  if (ret)
+return ret;
+  *kaddr = rkaddr;
+  return 0;
+}
+
+static inline int copyout_64to32(const uint64_t *kaddr, uint32_t *uaddr)
+{
+  uint32_t rkaddr=*kaddr;
+  return copyout(&rkaddr, uaddr, sizeof(uint32_t));
+}
+
+static inline int copyin_address(const rpc_vm_offset_t *uaddr, vm_offset_t 
*kaddr)
+{
+#ifdef __x86_64
+  return copyin_32to64(uaddr, kaddr);
+#else /* __x86_64__ */
+  return copyin(uaddr, kaddr, sizeof(*uaddr));
+#endif /* __x86_64__ */
+}
+
+static inline int copyout_address(const vm_offset_t *kaddr, rpc_vm_offset_t 
*uaddr)
+{
+#ifdef __x86_64
+  return copyout_64to32(kaddr, uaddr);
+#else /* __x86_64__ */
+  return copyout(kaddr, uaddr, sizeof(*kaddr));
+#endif /* __x86_64__ */
+}
+
+static inline int copyin_port(const mach_port_name_t *uaddr, mach_port_t 
*kaddr)
+{
+#ifdef __x86_64
+  return copyin_32to64(uaddr, kaddr);
+#else /* __x86_64__ */
+  return copyin(uaddr, kaddr, sizeof(*uaddr));
+#endif /* __x86_64__ */
+}
+
+static inline int copyout_port(const mach_port_t *kaddr, mach_port_name_t 
*uaddr)
+{
+#ifdef __x86_64
+  return copyout_64to32(kaddr, uaddr);
+#else /* __x86_64__ */
+  return copyout(kaddr, uaddr, sizeof(*kaddr));
+#endif /* __x86_64__ */
+}
+
+// XXX we could add another field to kmsg to store the user-side size, but 
then we
+// should check if we can  obtain it for rpc and notifications originating from
+// the kernel
+#ifndef __x86_64__
+static inline size_t msg_usize(const mach_msg_header_t *kmsg)
+{
+  return kmsg->msgh_size;
+}
+#else /* __x86_64__ */
+size_t msg_usize(const mach_msg_header_t *kmsg);
+#endif /* __x86_64__ */
+
+#endif /* COPY_USER_H */
diff --git a/include/mach/message.h b/include/mach/message.h
index c3081e66..16788fef 100644
--- a/include/mach/message.h
+++ b/include/mach/message.h
@@ -316,6 +316,19 @@ typedef integer_t mach_msg_option_t;
 
 #define MACH_SEND_ALWAYS   0x0001  /* internal use only */
 
+/* This is the alignment of msg descriptors and the actual data.
+ *
+ * On x86 it is made equal to the default structure alignment on
+ * 32-bit, so we can easily maintain compatibility with 32-bit user
+ * space on a 64-bit kernel.

[PATCH 3/7] update syscall signature with rpc_vm_* and mach_port_name_t

2023-01-16 Thread Luca Dariz
* include/mach/mach_traps.h: use mach port names
* kern/ipc_mig.c:  update vm types and use copyin/copyout helpers
* kern/ipc_mig.h: Likewise

Signed-off-by: Luca Dariz 
---
 include/mach/mach_traps.h | 18 -
 kern/ipc_mig.c| 41 +--
 kern/ipc_mig.h| 30 ++--
 3 files changed, 41 insertions(+), 48 deletions(-)

diff --git a/include/mach/mach_traps.h b/include/mach/mach_traps.h
index 0433707a..2a87f62a 100644
--- a/include/mach/mach_traps.h
+++ b/include/mach/mach_traps.h
@@ -35,19 +35,9 @@
 
 #include 
 
-mach_port_tmach_reply_port
-   (void);
-
-mach_port_tmach_thread_self
-   (void);
-
-#ifdef __386BSD__
-#undef mach_task_self
-#endif
-mach_port_tmach_task_self
-   (void);
-
-mach_port_tmach_host_self
-   (void);
+mach_port_name_t mach_reply_port (void);
+mach_port_name_t mach_thread_self (void);
+mach_port_name_t mach_task_self (void);
+mach_port_name_t mach_host_self (void);
 
 #endif /* _MACH_MACH_TRAPS_H_ */
diff --git a/kern/ipc_mig.c b/kern/ipc_mig.c
index a9e3f53b..afda1016 100644
--- a/kern/ipc_mig.c
+++ b/kern/ipc_mig.c
@@ -29,6 +29,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -576,12 +577,12 @@ port_name_to_space(mach_port_name_t name)
 kern_return_t
 syscall_vm_map(
mach_port_name_ttarget_map,
-   vm_offset_t *address,
-   vm_size_t   size,
-   vm_offset_t mask,
+   rpc_vm_offset_t *address,
+   rpc_vm_size_t   size,
+   rpc_vm_offset_t mask,
boolean_t   anywhere,
mach_port_name_tmemory_object,
-   vm_offset_t offset,
+   rpc_vm_offset_t offset,
boolean_t   copy,
vm_prot_t   cur_protection,
vm_prot_t   max_protection,
@@ -607,12 +608,12 @@ syscall_vm_map(
} else
port = (ipc_port_t) memory_object;
 
-   copyin(address, &addr, sizeof(vm_offset_t));
+   copyin_address(address, &addr);
result = vm_map(map, &addr, size, mask, anywhere,
port, offset, copy,
cur_protection, max_protection, inheritance);
if (result == KERN_SUCCESS)
-   copyout(&addr, address, sizeof(vm_offset_t));
+   copyout_address(&addr, address);
if (IP_VALID(port))
ipc_port_release_send(port);
vm_map_deallocate(map);
@@ -621,9 +622,9 @@ syscall_vm_map(
 }
 
 kern_return_t syscall_vm_allocate(
-   mach_port_name_ttarget_map,
-   vm_offset_t *address,
-   vm_size_t   size,
+   mach_port_name_ttarget_map,
+   rpc_vm_offset_t *address,
+   rpc_vm_size_t   size,
boolean_t   anywhere)
 {
vm_map_tmap;
@@ -634,19 +635,19 @@ kern_return_t syscall_vm_allocate(
if (map == VM_MAP_NULL)
return MACH_SEND_INTERRUPTED;
 
-   copyin(address, &addr, sizeof(vm_offset_t));
+   copyin_address(address, &addr);
result = vm_allocate(map, &addr, size, anywhere);
if (result == KERN_SUCCESS)
-   copyout(&addr, address, sizeof(vm_offset_t));
+   copyout_address(&addr, address);
vm_map_deallocate(map);
 
return result;
 }
 
 kern_return_t syscall_vm_deallocate(
-   mach_port_name_ttarget_map,
-   vm_offset_t start,
-   vm_size_t   size)
+   mach_port_name_ttarget_map,
+   rpc_vm_offset_t start,
+   rpc_vm_size_t   size)
 {
vm_map_tmap;
kern_return_t   result;
@@ -682,7 +683,7 @@ kern_return_t syscall_task_create(
(void) ipc_kmsg_copyout_object(current_space(),
   (ipc_object_t) port,
   MACH_MSG_TYPE_PORT_SEND, &name);
-   copyout(&name, child_task, sizeof(mach_port_name_t));
+   copyout_port(&name, child_task);
}
task_deallocate(t);
 
@@ -767,7 +768,9 @@ syscall_mach_port_allocate(
 
kr = mach_port_allocate(space, right, &name);
if (kr == KERN_SUCCESS)
-   copyout(&name, namep, sizeof(mach_port_name_t));
+   {
+   copyout_port(&name, namep);
+   }
is_release(space);
 
return kr;
@@ -873,8 +876,8 @@ syscall_device_write_request(mach_port_name_t   
device_name,
 mach_port_name_t   reply_name,
 dev_mode_t mode,
 recnum_t   recnum,
-vm_offset_tdata,
-vm_size_t  

[PATCH 7/7] replace mach_port_t with mach_port_name_t

2023-01-16 Thread Luca Dariz
This is a cleanup following the introduction of mach_port_name_t.
The same set of changes is applied to all files:
- rename mach_port_t to mach_port_name_t where a port name is used,
- use MACH_PORT_NAME_NULL and MACH_PORT_NAME_DEAD where appropriate,
- use invalid_port_to_name() and invalid_name_to_port() for conversion
  where appropriate,
- use regular copyout() instead of copyout_port() when we deal with
  mach_port_name_t already before copyout,
- use the new helper ipc_kmsg_copyout_object_to_port() when we really
  want to place a port name in the space of a mach_port_t.

* include/mach/notify.h: Likewise
* ipc/ipc_entry.c: Likewise
* ipc/ipc_kmsg.c: Likewise
* ipc/ipc_kmsg.h: Likewise, and add ipc_kmsg_copyout_object_to_port()
* ipc/ipc_marequest.c: Likewise
* ipc/ipc_object.c: Likewise
* ipc/ipc_port.c: Likewise
* ipc/ipc_space.h: Likewise
* ipc/mach_msg.c: Likewise
* ipc/mach_port.c: Likewise
* kern/exception.c: Likewise
* kern/ipc_mig.c: Likewise
---
 include/mach/notify.h |  6 +++---
 ipc/ipc_entry.c   |  2 +-
 ipc/ipc_kmsg.c| 40 +++-
 ipc/ipc_kmsg.h| 11 +++
 ipc/ipc_marequest.c   |  4 ++--
 ipc/ipc_object.c  |  4 ++--
 ipc/ipc_port.c|  6 +++---
 ipc/ipc_space.h   |  2 +-
 ipc/mach_msg.c|  2 +-
 ipc/mach_port.c   | 14 +++---
 kern/exception.c  | 12 ++--
 kern/ipc_mig.c| 16 
 12 files changed, 64 insertions(+), 55 deletions(-)

diff --git a/include/mach/notify.h b/include/mach/notify.h
index 6d783dde..14bcd6f6 100644
--- a/include/mach/notify.h
+++ b/include/mach/notify.h
@@ -58,13 +58,13 @@
 typedef struct {
 mach_msg_header_t  not_header;
 mach_msg_type_tnot_type;   /* MACH_MSG_TYPE_PORT_NAME */
-mach_port_tnot_port;
+mach_port_name_t   not_port;
 } mach_port_deleted_notification_t;
 
 typedef struct {
 mach_msg_header_t  not_header;
 mach_msg_type_tnot_type;   /* MACH_MSG_TYPE_PORT_NAME */
-mach_port_tnot_port;
+mach_port_name_t   not_port;
 } mach_msg_accepted_notification_t;
 
 typedef struct {
@@ -86,7 +86,7 @@ typedef struct {
 typedef struct {
 mach_msg_header_t  not_header;
 mach_msg_type_tnot_type;   /* MACH_MSG_TYPE_PORT_NAME */
-mach_port_tnot_port;
+mach_port_name_t   not_port;
 } mach_dead_name_notification_t;
 
 #endif /* _MACH_NOTIFY_H_ */
diff --git a/ipc/ipc_entry.c b/ipc/ipc_entry.c
index c24ea46c..f13c442f 100644
--- a/ipc/ipc_entry.c
+++ b/ipc/ipc_entry.c
@@ -127,7 +127,7 @@ ipc_entry_alloc_name(
kern_return_t kr;
ipc_entry_t entry, e, *prevp;
void **slot;
-   assert(MACH_PORT_VALID(name));
+   assert(MACH_PORT_NAME_VALID(name));
 
if (!space->is_active) {
return KERN_INVALID_TASK;
diff --git a/ipc/ipc_kmsg.c b/ipc/ipc_kmsg.c
index 495c4672..2477c576 100644
--- a/ipc/ipc_kmsg.c
+++ b/ipc/ipc_kmsg.c
@@ -44,6 +44,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -1078,7 +1079,7 @@ ipc_kmsg_copyin_header(
reply_soright = soright;
}
}
-   } else if (!MACH_PORT_VALID(reply_name)) {
+   } else if (!MACH_PORT_NAME_VALID(reply_name)) {
ipc_entry_t entry;
 
/*
@@ -1101,7 +1102,7 @@ ipc_kmsg_copyin_header(
if (IE_BITS_TYPE(entry->ie_bits) == MACH_PORT_TYPE_NONE)
ipc_entry_dealloc(space, dest_name, entry);
 
-   reply_port = (ipc_object_t) reply_name;
+   reply_port = (ipc_object_t) invalid_name_to_port(reply_name);
reply_soright = IP_NULL;
} else {
ipc_entry_t dest_entry, reply_entry;
@@ -1461,10 +1462,10 @@ ipc_kmsg_copyin_body(
((mach_msg_type_t*)type)->msgt_name = newname;
 
for (i = 0; i < number; i++) {
-   mach_port_name_t port = (mach_port_name_t) 
objects[i];
+   mach_port_name_t port = ((mach_port_t*)data)[i];
ipc_object_t object;
 
-   if (!MACH_PORT_VALID(port))
+   if (!MACH_PORT_NAME_VALID(port))
continue;
 
kr = ipc_object_copyin(space, port,
@@ -1846,7 +1847,7 @@ ipc_kmsg_copyout_header(
entry->ie_bits = gen | (MACH_PORT_TYPE_SEND_ONCE | 1);
}
 
-   assert(MACH_PORT_VALID(reply_name));
+   assert(MACH_PORT_NAME_VALID(reply_name));
entry->ie_object = (ipc_object_t) reply;
is_write_unlock(space);
 
@@ -2021,7 +2022,7 @@ ipc_kmsg_copyout_header(
is_write_unlock(space);
 
reply = IP_DEAD;
-  

[PATCH 5/7] adjust rdxtree key to the correct size

2023-01-16 Thread Luca Dariz
* Makefile.am: define RDXTREE_KEY_32
---
 Makefile.am | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/Makefile.am b/Makefile.am
index fb557ba6..54fcf685 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -82,6 +82,9 @@ endif
 # We do not support or need position-independent
 AM_CFLAGS += \
-no-pie -fno-PIE -fno-pie -fno-pic
+
+# This must be the same size as port names, see e.g. ipc/ipc_entry.c
+AM_CFLAGS += -DRDXTREE_KEY_32
 
 #
 # Silent build support.
-- 
2.30.2




[PATCH 6/7] add conversion helpers for invalid mach port names

2023-01-16 Thread Luca Dariz
* include/mach/port.h: add _NAME_ variants for port NULL and DEAD and
  add helpers to check for invalid  port names
* ipc/port.h: add helpers to properly convert to/from invalid mach
  port names.
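
A small standalone exercise of the conversion helpers (macros and helper copied from the patch, with simplified typedefs and panic() replaced by abort() so it builds outside the kernel):

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

typedef uintptr_t mach_port_t;          /* kernel-side: pointer-sized */
typedef uint32_t  mach_port_name_t;     /* user-side: 32-bit name */

#define MACH_PORT_NULL          0
#define MACH_PORT_DEAD          ((mach_port_t) ~0)
#define MACH_PORT_NAME_NULL     ((mach_port_name_t) 0)
#define MACH_PORT_NAME_DEAD     ((mach_port_name_t) ~0)

static mach_port_name_t invalid_port_to_name(mach_port_t port)
{
    if (port == MACH_PORT_NULL)
        return MACH_PORT_NAME_NULL;
    if (port == MACH_PORT_DEAD)
        return MACH_PORT_NAME_DEAD;
    abort();    /* the kernel version panics on a valid port */
}

int main(void)
{
    printf("%x %x\n", invalid_port_to_name(MACH_PORT_DEAD),
           invalid_port_to_name(MACH_PORT_NULL));   /* ffffffff 0 */
    return 0;
}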
---
 include/mach/port.h |  8 ++--
 ipc/port.h  | 20 
 2 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/include/mach/port.h b/include/mach/port.h
index e38be614..c9bbcf17 100644
--- a/include/mach/port.h
+++ b/include/mach/port.h
@@ -72,9 +72,13 @@ typedef int *rpc_signature_info_t;
 
 #define MACH_PORT_NULL 0 /* works with both user and kernel ports */
 #define MACH_PORT_DEAD ((mach_port_t) ~0)
+#define MACH_PORT_NAME_NULL((mach_port_name_t) 0)
+#define MACH_PORT_NAME_DEAD((mach_port_name_t) ~0)
 
-#defineMACH_PORT_VALID(name)   \
-   (((name) != MACH_PORT_NULL) && ((name) != MACH_PORT_DEAD))
+#defineMACH_PORT_VALID(port)   \
+   (((port) != MACH_PORT_NULL) && ((port) != MACH_PORT_DEAD))
+#defineMACH_PORT_NAME_VALID(name)  \
+   (((name) != MACH_PORT_NAME_NULL) && ((name) != 
MACH_PORT_NAME_DEAD))
 
 /*
  *  These are the different rights a task may have.
diff --git a/ipc/port.h b/ipc/port.h
index 9ef586c1..c85685d7 100644
--- a/ipc/port.h
+++ b/ipc/port.h
@@ -39,6 +39,7 @@
 #ifndef_IPC_PORT_H_
 #define _IPC_PORT_H_
 
+#include 
 #include 
 
 /*
@@ -83,4 +84,23 @@ typedef mach_port_name_t mach_port_gen_t;/* generation 
numbers */
 #defineMACH_PORT_UREFS_UNDERFLOW(urefs, delta) 
\
(((delta) < 0) && (-(delta) > (urefs)))
 
+
+static inline mach_port_t invalid_name_to_port(mach_port_name_t name)
+{
+  if (name == MACH_PORT_NAME_NULL)
+return MACH_PORT_NULL;
+  if (name == MACH_PORT_NAME_DEAD)
+return MACH_PORT_DEAD;
+  panic("invalid_name_to_port() called with a valid port");
+}
+
+static inline mach_port_name_t invalid_port_to_name(mach_port_t port)
+{
+  if (port == MACH_PORT_NULL)
+return MACH_PORT_NAME_NULL;
+  if (port == MACH_PORT_DEAD)
+return MACH_PORT_NAME_DEAD;
+  panic("invalid_port_to_name() called with a valid name");
+}
+
 #endif /* _IPC_PORT_H_ */
-- 
2.30.2




[PATCH 1/7] add msg_user_header_t for user-side msg structure

2023-01-16 Thread Luca Dariz
* include/mach/message.h: use mach_msg_user_header_t only in KERNEL,
  and define it as mach_msg_header_t for user space
* ipc/ipc_kmsg.c: use mach_msg_user_header_t where appropriate
* ipc/ipc_kmsg.h: Likewise
* ipc/mach_msg.c: Likewise
* ipc/mach_msg.h: Likewise
* kern/thread.h: Likewise
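
A rough standalone sketch of why the two layouts differ in size on a 64-bit kernel (simplified stand-in types; the real definitions are in include/mach/message.h, where the kernel-side port fields are pointer-sized while the user-side names stay 32-bit):

#include <stdint.h>
#include <stdio.h>

typedef struct {        /* kernel-side header: ports are pointer-sized */
    uint32_t bits, size;
    uint64_t remote_port, local_port;
    uint32_t seqno;
    int32_t  id;
} khdr_t;

typedef struct {        /* user-side header: ports are 32-bit names */
    uint32_t bits, size, remote_port, local_port, seqno;
    int32_t  id;
} uhdr_t;

int main(void)
{
    printf("kernel header %zu bytes, user header %zu bytes\n",
           sizeof(khdr_t), sizeof(uhdr_t));  /* 32 vs 24 for these stand-ins */
    return 0;
}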
---
 include/mach/message.h | 17 -
 ipc/ipc_kmsg.c |  6 +++---
 ipc/ipc_kmsg.h |  4 ++--
 ipc/mach_msg.c | 10 +-
 ipc/mach_msg.h |  4 ++--
 kern/thread.h  |  2 +-
 6 files changed, 29 insertions(+), 14 deletions(-)

diff --git a/include/mach/message.h b/include/mach/message.h
index 798b47b4..c3081e66 100644
--- a/include/mach/message.h
+++ b/include/mach/message.h
@@ -132,6 +132,7 @@ typedef unsigned int mach_msg_size_t;
 typedef natural_t mach_msg_seqno_t;
 typedef integer_t mach_msg_id_t;
 
+/* full header structure, may have different size in user/kernel spaces */
 typedefstruct mach_msg_header {
 mach_msg_bits_tmsgh_bits;
 mach_msg_size_tmsgh_size;
@@ -144,6 +145,20 @@ typedefstruct mach_msg_header {
 mach_msg_id_t  msgh_id;
 } mach_msg_header_t;
 
+#ifdef KERNEL
+/* user-side header format, needed in the kernel */
+typedefstruct {
+mach_msg_bits_tmsgh_bits;
+mach_msg_size_tmsgh_size;
+mach_port_name_t   msgh_remote_port;
+mach_port_name_t   msgh_local_port;
+mach_port_seqno_t  msgh_seqno;
+mach_msg_id_t  msgh_id;
+} mach_msg_user_header_t;
+#else
+typedef mach_msg_header_t mach_msg_user_header_t;
+#endif
+
 /*
  *  There is no fixed upper bound to the size of Mach messages.
  */
@@ -389,7 +404,7 @@ typedef kern_return_t mach_msg_return_t;
 
 extern mach_msg_return_t
 mach_msg_trap
-   (mach_msg_header_t *msg,
+   (mach_msg_user_header_t *msg,
 mach_msg_option_t option,
 mach_msg_size_t send_size,
 mach_msg_size_t rcv_size,
diff --git a/ipc/ipc_kmsg.c b/ipc/ipc_kmsg.c
index 62e138c7..d00c67d4 100644
--- a/ipc/ipc_kmsg.c
+++ b/ipc/ipc_kmsg.c
@@ -499,13 +499,13 @@ ipc_kmsg_free(ipc_kmsg_t kmsg)
 
 mach_msg_return_t
 ipc_kmsg_get(
-   mach_msg_header_t   *msg,
+   mach_msg_user_header_t  *msg,
mach_msg_size_t size,
ipc_kmsg_t  *kmsgp)
 {
ipc_kmsg_t kmsg;
 
-   if ((size < sizeof(mach_msg_header_t)) || (size & 3))
+   if ((size < sizeof(mach_msg_user_header_t)) || (size & 3))
return MACH_SEND_MSG_TOO_SMALL;
 
if (size <= IKM_SAVED_MSG_SIZE) {
@@ -587,7 +587,7 @@ ipc_kmsg_get_from_kernel(
 
 mach_msg_return_t
 ipc_kmsg_put(
-   mach_msg_header_t   *msg,
+   mach_msg_user_header_t  *msg,
ipc_kmsg_t  kmsg,
mach_msg_size_t size)
 {
diff --git a/ipc/ipc_kmsg.h b/ipc/ipc_kmsg.h
index ffda9b5e..16df31f5 100644
--- a/ipc/ipc_kmsg.h
+++ b/ipc/ipc_kmsg.h
@@ -242,13 +242,13 @@ extern void
 ipc_kmsg_free(ipc_kmsg_t);
 
 extern mach_msg_return_t
-ipc_kmsg_get(mach_msg_header_t *, mach_msg_size_t, ipc_kmsg_t *);
+ipc_kmsg_get(mach_msg_user_header_t *, mach_msg_size_t, ipc_kmsg_t *);
 
 extern mach_msg_return_t
 ipc_kmsg_get_from_kernel(mach_msg_header_t *, mach_msg_size_t, ipc_kmsg_t *);
 
 extern mach_msg_return_t
-ipc_kmsg_put(mach_msg_header_t *, ipc_kmsg_t, mach_msg_size_t);
+ipc_kmsg_put(mach_msg_user_header_t *, ipc_kmsg_t, mach_msg_size_t);
 
 extern void
 ipc_kmsg_put_to_kernel(mach_msg_header_t *, ipc_kmsg_t, mach_msg_size_t);
diff --git a/ipc/mach_msg.c b/ipc/mach_msg.c
index f15164a3..221ea975 100644
--- a/ipc/mach_msg.c
+++ b/ipc/mach_msg.c
@@ -89,7 +89,7 @@
 
 mach_msg_return_t
 mach_msg_send(
-   mach_msg_header_t   *msg,
+   mach_msg_user_header_t  *msg,
mach_msg_option_t   option,
mach_msg_size_t send_size,
mach_msg_timeout_t  time_out,
@@ -171,7 +171,7 @@ mach_msg_send(
 
 mach_msg_return_t
 mach_msg_receive(
-   mach_msg_header_t   *msg,
+   mach_msg_user_header_t  *msg,
mach_msg_option_t   option,
mach_msg_size_t rcv_size,
mach_port_name_trcv_name,
@@ -286,7 +286,7 @@ mach_msg_receive_continue(void)
ipc_thread_t self = current_thread();
ipc_space_t space = current_space();
vm_map_t map = current_map();
-   mach_msg_header_t *msg = self->ith_msg;
+   mach_msg_user_header_t *msg = self->ith_msg;
mach_msg_option_t option = self->ith_option;
mach_msg_size_t rcv_size = self->ith_rcv_size;
mach_msg_timeout_t time_out = self->ith_timeout;
@@ -380,7 +380,7 @@ mach_msg_receive_continue(void)
 
 mach_msg_return_t
 mach_msg_trap(
-   mach_msg_header_t   *msg,
+   mach_msg_user_header_t  *msg,
mach_msg_option_t   option,
mach_msg_size_t send_size,
mach_msg_size_t rcv_size,
@@ -1609,7 +1609,7 @@ mach_msg_continue(void)
task_t task = thread->task;
ipc_space_t space = task->itk_space;
vm_map_t map = t

[PATCH 4/7] update writev syscall signature with rpc types

2023-01-16 Thread Luca Dariz
* device/device_emul.h: write/writev: update trap argument types
* device/ds_routines.c: update argument types and adjust copyin
* device/ds_routines.h: write/writev: update trap argument type
* include/device/device_types.h: add rpc_io_buf_vec_t type
* kern/ipc_mig.c: write/writev: update trap argument type
* kern/ipc_mig.h: Likewise
---
 device/device_emul.h  |  4 ++--
 device/ds_routines.c  | 22 +-
 device/ds_routines.h  | 12 ++--
 include/device/device_types.h |  4 
 kern/ipc_mig.c|  6 +++---
 kern/ipc_mig.h|  6 +++---
 6 files changed, 31 insertions(+), 23 deletions(-)

diff --git a/device/device_emul.h b/device/device_emul.h
index 957bd505..683fc802 100644
--- a/device/device_emul.h
+++ b/device/device_emul.h
@@ -56,9 +56,9 @@ struct device_emulation_ops
  vm_size_t, ipc_port_t *, boolean_t);
   void (*no_senders) (mach_no_senders_notification_t *);
   io_return_t (*write_trap) (void *, dev_mode_t,
-recnum_t, vm_offset_t, vm_size_t);
+rpc_recnum_t, rpc_vm_offset_t, rpc_vm_size_t);
   io_return_t (*writev_trap) (void *, dev_mode_t,
- recnum_t, io_buf_vec_t *, vm_size_t);
+ rpc_recnum_t, rpc_io_buf_vec_t *, rpc_vm_size_t);
 };
 
 #endif /* _I386AT_DEVICE_EMUL_H_ */
diff --git a/device/ds_routines.c b/device/ds_routines.c
index 11589d63..07cfd85b 100644
--- a/device/ds_routines.c
+++ b/device/ds_routines.c
@@ -412,7 +412,7 @@ ds_notify (mach_msg_header_t *msg)
 
 io_return_t
 ds_device_write_trap (device_t dev, dev_mode_t mode,
- recnum_t recnum, vm_offset_t data, vm_size_t count)
+ rpc_recnum_t recnum, rpc_vm_offset_t data, rpc_vm_size_t 
count)
 {
   /* Refuse if device is dead or not completely open.  */
   if (dev == DEVICE_NULL)
@@ -427,7 +427,7 @@ ds_device_write_trap (device_t dev, dev_mode_t mode,
 
 io_return_t
 ds_device_writev_trap (device_t dev, dev_mode_t mode,
-  recnum_t recnum, io_buf_vec_t *iovec, vm_size_t count)
+  rpc_recnum_t recnum, rpc_io_buf_vec_t *iovec, 
rpc_vm_size_t count)
 {
   /* Refuse if device is dead or not completely open.  */
   if (dev == DEVICE_NULL)
@@ -1713,7 +1713,7 @@ ds_trap_write_done(const io_req_t ior)
  */
 static io_return_t
 device_write_trap (mach_device_t device, dev_mode_t mode,
-  recnum_t recnum, vm_offset_t data, vm_size_t data_count)
+  rpc_recnum_t recnum, rpc_vm_offset_t data, rpc_vm_size_t 
data_count)
 {
io_req_t ior;
io_return_t result;
@@ -1752,7 +1752,7 @@ device_write_trap (mach_device_t device, dev_mode_t mode,
 * Copy the data from user space.
 */
if (data_count > 0)
-   copyin((void *)data, ior->io_data, data_count);
+   copyin((void*)(vm_offset_t)data, ior->io_data, data_count);
 
/*
 * The ior keeps an extra reference for the device.
@@ -1781,7 +1781,7 @@ device_write_trap (mach_device_t device, dev_mode_t mode,
 
 static io_return_t
 device_writev_trap (mach_device_t device, dev_mode_t mode,
-   recnum_t recnum, io_buf_vec_t *iovec, vm_size_t iocount)
+   rpc_recnum_t recnum, rpc_io_buf_vec_t *iovec, rpc_vm_size_t 
iocount)
 {
io_req_t ior;
io_return_t result;
@@ -1799,11 +1799,15 @@ device_writev_trap (mach_device_t device, dev_mode_t 
mode,
 */
if (iocount > 16)
return KERN_INVALID_VALUE; /* lame */
-   copyin(iovec,
-  stack_iovec,
-  iocount * sizeof(io_buf_vec_t));
-   for (data_count = 0, i = 0; i < iocount; i++)
+
+   for (data_count = 0, i=0; i

[PATCH 0/7] update rpc for x86_64

2023-01-16 Thread Luca Dariz
These patches address the comments raised in the previous submission
and add support for 32-bit rpc and syscalls on a 64-bit kernel.

Luca Dariz (7):
  add msg_user_header_t for user-side msg structure
  x86_64: expand and shrink messages in copy{in,out}msg routines
  update syscall signature with rpc_vm_* and mach_port_name_t
  update writev syscall signature with rpc types
  adjust rdxtree key to the correct size
  add conversion helpers for invalid mach port names
  replace mach_port_t with mach_port_name_t

 Makefile.am   |   3 +
 device/device_emul.h  |   4 +-
 device/ds_routines.c  |  22 ++-
 device/ds_routines.h  |  12 +-
 i386/i386/copy_user.h | 102 ++
 include/device/device_types.h |   4 +
 include/mach/mach_traps.h |  18 +-
 include/mach/message.h|  31 ++-
 include/mach/notify.h |   6 +-
 include/mach/port.h   |   8 +-
 ipc/ipc_entry.c   |   2 +-
 ipc/ipc_kmsg.c|  93 +
 ipc/ipc_kmsg.h|  15 +-
 ipc/ipc_marequest.c   |   4 +-
 ipc/ipc_mqueue.c  |   5 +-
 ipc/ipc_object.c  |   4 +-
 ipc/ipc_port.c|   6 +-
 ipc/ipc_space.h   |   2 +-
 ipc/mach_msg.c|  29 +--
 ipc/mach_msg.h|   4 +-
 ipc/mach_port.c   |  14 +-
 ipc/port.h|  20 ++
 kern/exception.c  |  12 +-
 kern/ipc_mig.c|  55 +++---
 kern/ipc_mig.h|  34 ++--
 kern/thread.h |   2 +-
 x86_64/Makefrag.am|   1 +
 x86_64/copy_user.c| 362 ++
 x86_64/locore.S   |  81 
 29 files changed, 711 insertions(+), 244 deletions(-)
 create mode 100644 i386/i386/copy_user.h
 create mode 100644 x86_64/copy_user.c

-- 
2.30.2




[PATCH 3/4] remove unused file ipc/mach_rpc.c

2023-01-16 Thread Luca Dariz
* Makefrag.am: remove ipc/mach_rpc.c
* ipc/mach_rpc.c: remove file, all functions here seem unused.
---
 Makefrag.am|   1 -
 ipc/mach_rpc.c | 150 -
 2 files changed, 151 deletions(-)
 delete mode 100644 ipc/mach_rpc.c

diff --git a/Makefrag.am b/Makefrag.am
index cb5651a2..9da44d55 100644
--- a/Makefrag.am
+++ b/Makefrag.am
@@ -114,7 +114,6 @@ libkernel_a_SOURCES += \
ipc/mach_msg.h \
ipc/mach_port.c \
ipc/mach_port.h \
-   ipc/mach_rpc.c \
ipc/mach_debug.c \
ipc/port.h
 EXTRA_DIST += \
diff --git a/ipc/mach_rpc.c b/ipc/mach_rpc.c
deleted file mode 100644
index 7b747f79..
--- a/ipc/mach_rpc.c
+++ /dev/null
@@ -1,150 +0,0 @@
-/* 
- * Copyright (c) 1994 The University of Utah and
- * the Computer Systems Laboratory (CSL).  All rights reserved.
- *
- * Permission to use, copy, modify and distribute this software is hereby
- * granted provided that (1) source code retains these copyright, permission,
- * and disclaimer notices, and (2) redistributions including binaries
- * reproduce the notices in supporting documentation, and (3) all advertising
- * materials mentioning features or use of this software display the following
- * acknowledgement: ``This product includes software developed by the
- * Computer Systems Laboratory at the University of Utah.''
- *
- * THE UNIVERSITY OF UTAH AND CSL ALLOW FREE USE OF THIS SOFTWARE IN ITS "AS
- * IS" CONDITION.  THE UNIVERSITY OF UTAH AND CSL DISCLAIM ANY LIABILITY OF
- * ANY KIND FOR ANY DAMAGES WHATSOEVER RESULTING FROM THE USE OF THIS SOFTWARE.
- *
- * CSL requests users of this software to return to csl-d...@cs.utah.edu any
- * improvements that they make and grant CSL redistribution rights.
- *
- */
-
-#ifdef MIGRATING_THREADS
-
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-
-#undef DEBUG_MPRC
-
-/*
- * XXX need to identify if one endpoint of an RPC is the kernel to
- * ensure proper port name translation (or lack of).  This is bogus.
- */
-#define ISKERNELACT(act)   ((act)->task == kernel_task)
-
-/*
- * Copy the indicated port from the task associated with the source
- * activation into the task associated with the destination activation.
- *
- * XXX on errors we should probably clear the portp to avoid leaking
- * info to the other side.
- */
-kern_return_t
-mach_port_rpc_copy(
-   struct rpc_port_desc*portp,
-   struct Act  *sact, 
-   struct Act  *dact)
-{
-   ipc_space_t sspace, dspace;
-   mach_msg_type_name_t tname;
-   ipc_object_t iname;
-   kern_return_t kr;
-
-#ifdef DEBUG_MPRC
-   printf("m_p_rpc_copy(portp=%x/%x, sact=%x, dact=%x): ",
-  portp->name, portp->msgt_name, sact, dact);
-#endif
-   sspace = sact->task->itk_space;
-   dspace = dact->task->itk_space;
-   if (sspace == IS_NULL || dspace == IS_NULL) {
-#ifdef DEBUG_MPRC
-   printf("bogus src (%x) or dst (%x) space\n", sspace, dspace);
-#endif
-   return KERN_INVALID_TASK;
-   }
-
-   if (!MACH_MSG_TYPE_PORT_ANY(portp->msgt_name)) {
-#ifdef DEBUG_MPRC
-   printf("invalid port type\n");
-#endif
-   return KERN_INVALID_VALUE;
-   }
-
-   if (ISKERNELACT(sact)) {
-   iname = (ipc_object_t) portp->name;
-   ipc_object_copyin_from_kernel(iname, portp->msgt_name);
-   kr = KERN_SUCCESS;
-   } else {
-   kr = ipc_object_copyin(sspace, portp->name, portp->msgt_name,
-  &iname);
-   }
-   if (kr != KERN_SUCCESS) {
-#ifdef DEBUG_MPRC
-   printf("copyin returned %x\n", kr);
-#endif
-   return kr;
-   }
-
-   tname = ipc_object_copyin_type(portp->msgt_name);
-   if (!IO_VALID(iname)) {
-   portp->name = (mach_port_name_t) iname;
-   portp->msgt_name = tname;
-#ifdef DEBUG_MPRC
-   printf("iport %x invalid\n", iname);
-#endif
-   return KERN_SUCCESS;
-   }
-
-   if (ISKERNELACT(dact)) {
-   portp->name = (mach_port_name_t) iname;
-   kr = KERN_SUCCESS;
-   } else {
-   kr = ipc_object_copyout(dspace, iname, tname, TRUE,
-   &portp->name);
-   }
-   if (kr != KERN_SUCCESS) {
-   ipc_object_destroy(iname, tname);
-
-   if (kr == KERN_INVALID_CAPABILITY)
-   portp->name = MACH_PORT_DEAD;
-   else {
-   portp->name = MACH_PORT_NULL;
-#ifdef DEBUG_MPRC
-   printf("copyout iport %x returned %x\n", iname);
-#endif
-   return kr;
-   }
-   }
-
-   portp->msgt_name = tname;
-#ifdef DEBUG_MPR

[PATCH 2/4] add required include

2023-01-16 Thread Luca Dariz
* kern/syscall_sw.h: add missing include

Signed-off-by: Luca Dariz 
---
 kern/syscall_sw.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kern/syscall_sw.h b/kern/syscall_sw.h
index 34eaf90b..9e76fc60 100644
--- a/kern/syscall_sw.h
+++ b/kern/syscall_sw.h
@@ -27,6 +27,8 @@
 #ifndef_KERN_SYSCALL_SW_H_
 #define_KERN_SYSCALL_SW_H_
 
+#include 
+
 /*
  * mach_trap_stack indicates the trap may discard
  * its kernel stack.  Some architectures may need
-- 
2.30.2




[PATCH 4/4] fix warnings

2023-01-16 Thread Luca Dariz
* ipc/ipc_kmsg.c: fix cast to the correct pointer type
* ipc/ipc_port.c: upcast rpc_vm_offset_t to full vm_offset_t
* kern/pc_sample.c: Likewise
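
The ipc_port.c and pc_sample.c changes follow the usual 64-bit pattern for
moving between a fixed-width RPC integer and a pointer: widening through the
native vm_offset_t first avoids gcc's integer/pointer size-mismatch warnings.
A standalone sketch, with the typedefs assumed here rather than taken from
the headers:

#include <stdint.h>

typedef uint32_t  rpc_vm_offset_t;     /* fixed-width wire type (assumed) */
typedef uintptr_t vm_offset_t;         /* native word size (assumed) */

void *payload_to_pointer(rpc_vm_offset_t off)
{
        /* (void *) off would warn on 64-bit:
         * "cast to pointer from integer of different size" */
        return (void *) (vm_offset_t) off;   /* widen first, then convert */
}

rpc_vm_offset_t pointer_to_sample_id(const void *p)
{
        /* the opposite direction, as in pc_sample.c */
        return (rpc_vm_offset_t) (vm_offset_t) p;
}
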
---
 ipc/ipc_kmsg.c   | 5 ++---
 ipc/ipc_port.c   | 2 +-
 kern/pc_sample.c | 2 +-
 3 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/ipc/ipc_kmsg.c b/ipc/ipc_kmsg.c
index 2477c576..2ebd729b 100644
--- a/ipc/ipc_kmsg.c
+++ b/ipc/ipc_kmsg.c
@@ -2434,11 +2434,10 @@ ipc_kmsg_copyout_body(
/* copyout port rights carried in the message */
 
for (i = 0; i < number; i++) {
-   ipc_object_t object =
-   (ipc_object_t) objects[i];
+   ipc_object_t object = objects[i];
 
mr |= ipc_kmsg_copyout_object_to_port(space, 
object,
-  name, 
&objects[i]);
+  name, 
(mach_port_t*)&objects[i]);
}
}
 
diff --git a/ipc/ipc_port.c b/ipc/ipc_port.c
index be6e06ac..f9ccc290 100644
--- a/ipc/ipc_port.c
+++ b/ipc/ipc_port.c
@@ -1283,7 +1283,7 @@ ipc_port_print(port)
printf(", sndrs=0x%x", port->ip_blocked.ithq_base);
printf(", kobj=0x%x\n", port->ip_kobject);
 
-   iprintf("protected_payload=%p\n", (void *) port->ip_protected_payload);
+   iprintf("protected_payload=%p\n", (void *) (vm_offset_t) 
port->ip_protected_payload);
 
indent -= 2;
 }
diff --git a/kern/pc_sample.c b/kern/pc_sample.c
index 280d8b54..d13beb07 100644
--- a/kern/pc_sample.c
+++ b/kern/pc_sample.c
@@ -61,7 +61,7 @@ void take_pc_sample(
 
 cp->seqno++;
 sample = &((sampled_pc_t *)cp->buffer)[cp->seqno % MAX_PC_SAMPLES];
-sample->id = (rpc_vm_offset_t)t;
+sample->id = (rpc_vm_offset_t)(vm_offset_t)t;
 sample->pc = (rpc_vm_offset_t)pc;
 sample->sampletype = flavor;
 }
-- 
2.30.2




[PATCH 1/4] add missing argument names

2023-01-16 Thread Luca Dariz
* ddb/db_break.c: add the argument name; compilation fails on Debian/Linux
  stable with gcc 10.2 otherwise. For some reason a simple test program
  without an argument name succeeds on Debian/Hurd, unless I force
  -std=c11 or similar; I suppose because newer gcc versions have different
  defaults. Gnumach seems to still require c89 for some older code,
  otherwise we could explicitly use gnu99/c99 or gnu11/c11 (a minimal
  reproducer is sketched below).
* ddb/db_cond.c: Likewise
* ddb/db_examine.c: Likewise
* ddb/db_macro.c: Likewise
* ddb/db_watch.c: Likewise
* device/dev_name.c: Likewise
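
A minimal reproducer of the unnamed-parameter issue (hypothetical file, not
part of the tree); whether the commented-out form compiles depends on the gcc
version and the selected -std= level:

/* unnamed.c -- build with: gcc -c unnamed.c */

/* void db_cmd_stub(int) { }        -- an unnamed parameter in a function
 *                                     definition is only accepted by newer
 *                                     C standards / compiler defaults    */

void db_cmd_stub(int have_addr)     /* portable: just name the argument */
{
        (void) have_addr;           /* parameter deliberately unused */
}
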
---
 ddb/db_break.c| 16 
 ddb/db_cond.c |  8 
 ddb/db_examine.c  |  8 
 ddb/db_macro.c| 24 
 ddb/db_watch.c|  8 
 device/dev_name.c |  6 +++---
 6 files changed, 35 insertions(+), 35 deletions(-)

diff --git a/ddb/db_break.c b/ddb/db_break.c
index 780c1ccc..676f5ca3 100644
--- a/ddb/db_break.c
+++ b/ddb/db_break.c
@@ -598,10 +598,10 @@ db_list_breakpoints(void)
 /*ARGSUSED*/
 void
 db_delete_cmd(
-   db_expr_t,
-   boolean_t,
-   db_expr_t,
-   const char *)
+   db_expr_t   addr_,
+   int have_addr,
+   db_expr_t   count,
+   const char *modif)
 {
int n;
thread_t thread;
@@ -735,10 +735,10 @@ db_breakpoint_cmd(
 /* list breakpoints */
 void
 db_listbreak_cmd(
-   db_expr_t,
-   boolean_t,
-   db_expr_t,
-   const char *)
+   db_expr_t   addr,
+   int have_addr,
+   db_expr_t   count,
+   const char *modif)
 {
db_list_breakpoints();
 }
diff --git a/ddb/db_cond.c b/ddb/db_cond.c
index 8f0c8b30..0644cf24 100644
--- a/ddb/db_cond.c
+++ b/ddb/db_cond.c
@@ -122,10 +122,10 @@ db_cond_print(bkpt)
 
 void
 db_cond_cmd(
-   db_expr_t,
-   boolean_t,
-   db_expr_t,
-   const char *)
+   db_expr_t   addr,
+   int have_addr,
+   db_expr_t   count,
+   const char *modif)
 {
int c;
struct db_cond *cp;
diff --git a/ddb/db_examine.c b/ddb/db_examine.c
index 62a887ad..30799360 100644
--- a/ddb/db_examine.c
+++ b/ddb/db_examine.c
@@ -347,10 +347,10 @@ db_strcpy(char *dst, const char *src)
  */
 void
 db_search_cmd(
-   db_expr_t,
-   boolean_t,
-   db_expr_t,
-   const char *)
+   db_expr_t e,
+   boolean_t b,
+   db_expr_t e2,
+   const char * cc)
 {
int t;
db_addr_t   addr;
diff --git a/ddb/db_macro.c b/ddb/db_macro.c
index d417abe1..7f3300dc 100644
--- a/ddb/db_macro.c
+++ b/ddb/db_macro.c
@@ -73,10 +73,10 @@ db_lookup_macro(const char *name)
 
 void
 db_def_macro_cmd(
-   db_expr_t,
-   boolean_t,
-   db_expr_t,
-   const char *)
+   db_expr_t   addr,
+   int have_addr,
+   db_expr_t   count,
+   const char *modif)
 {
char *p;
int c;
@@ -108,10 +108,10 @@ db_def_macro_cmd(
 
 void
 db_del_macro_cmd(
-   db_expr_t,
-   boolean_t,
-   db_expr_t,
-   const char *)
+   db_expr_t   addr,
+   int have_addr,
+   db_expr_t   count,
+   const char *modif)
 {
struct db_user_macro *mp;
 
@@ -128,10 +128,10 @@ db_del_macro_cmd(
 
 void
 db_show_macro(
-   db_expr_t,
-   boolean_t,
-   db_expr_t,
-   const char *)
+   db_expr_t   addr,
+   int have_addr,
+   db_expr_t   count,
+   const char *modif)
 {
struct db_user_macro *mp;
int  t;
diff --git a/ddb/db_watch.c b/ddb/db_watch.c
index 5db3f300..aa0f47b7 100644
--- a/ddb/db_watch.c
+++ b/ddb/db_watch.c
@@ -249,10 +249,10 @@ db_watchpoint_cmd(
 /* list watchpoints */
 void
 db_listwatch_cmd(
-   db_expr_t,
-   boolean_t,
-   db_expr_t,
-   const char *)
+   db_expr_t   addr,
+   int have_addr,
+   db_expr_t   count,
+   const char *modif)
 {
db_list_watchpoints();
 }
diff --git a/device/dev_name.c b/device/dev_name.c
index 66e6eafe..4595d31c 100644
--- a/device/dev_name.c
+++ b/device/dev_name.c
@@ -39,7 +39,7 @@
 /*
  * Routines placed in empty entries in the device tables
  */
-int nulldev_reset(dev_t)
+int nulldev_reset(dev_t dev)
 {
return (D_SUCCESS);
 }
@@ -78,12 +78,12 @@ int nulldev_portdeath(dev_t dev, mach_port_t port)
return (D_SUCCESS);
 }
 
-int nodev_async_in(dev_t, const ipc_port_t, int, filter_t*, unsigned int)
+int nodev_async_in(dev_t dev, const ipc_port_t port, int x, filter_t* filter, 
unsigned int j)
 {
return (D_INVALID_OPERATION);
 }
 
-int nodev_info(dev_t, int, int*)
+int nodev_info(dev_t dev, int a, int* b)
 {
return (D_INVALID_OPERATION);
 }
-- 
2.30.2




Mixing 32 and 64 bit userspace tasks (was: Re: [PATCH gnumach] Define rpc_vm_size_array_t and rpc_vm_offset_array_t)

2023-02-03 Thread Luca Dariz

Il 01/02/23 10:36, Sergey Bugaev ha scritto:

Note that another way to handle the size conversion between rpc_* and
regular types would be to add some new VM types to
include/mach/message.h; in this case, the shrink/expand would happen in
copyinmsg()/copyoutmsg() instead of the mig-generated code (as for mach
ports), but this approach would require to plan for a staged
introduction of this change.


Actually, yes, that's what I've been thinking about. Currently
vm_size_t & friends are conditionally (but statically) defined to
either MACH_MSG_TYPE_INTEGER_32 or MACH_MSG_TYPE_INTEGER_64. But what
if we also had MACH_MSG_TYPE_INTEGER_PTRSIZE (name TBD), which would
be equivalent to MACH_MSG_TYPE_INTEGER_{32,64} for {32,64}-bit tasks;
but the kernel would know about this and change its size accordingly
when transferring messages between 64- and 32-bit tasks. A value sent
as MACH_MSG_TYPE_INTEGER_32 is always received as
MACH_MSG_TYPE_INTEGER_32, no matter if the intent was to send a
pointer-sized value, whereas a 32-bit value sent by a 32-bit task as
MACH_MSG_TYPE_INTEGER_PTRSIZE would get received as a 64-bit value,
still MACH_MSG_TYPE_INTEGER_PTRSIZE, by a 64-bit task.


While this shouldn't be a problem for normal data transfers, where the 
kernel already copies the data from one task to the other adjusting the 
size of vm fields, I wonder if there are cases where the virtual address 
is not supposed to be translated between tasks, e.g. if we really need 
to send a task-specific virtual address. For example, are there cases 
where a task allocates memory on behalf of another task? (except for the 
exec task)




This would allow 32- and 64- bit tasks (including the kernel task) to
communicate transparently, without requiring separate rpc_* versions
of all the pointer-sized types. Possibly. Maybe. Unless there are a lot
of subtler details to this, which there of course are.
I think a more difficult case is the rpc_time_value_t type, but this 
case could be solved by adding yet another MACH_MSG_TYPE_TIME_VALUE value.


However the problem exists for any struct type (which are just int[] of
fixed size in mig), because there could be differences due to both
alignment and pointer-type fields. So either the kernel knows all the
data structures exchanged by user-space tasks, or the task needs to
handle a message differently depending on whether the sender is a 32- or
64-bit task. Maybe this could be handled by the mig stubs, I'm not sure.
The information about the sending task could be encoded in one of the
unused bits of the msg header.
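
To make the alignment and pointer-size point concrete, here is a tiny
standalone example (a hypothetical struct, not one of the mig-generated
ones) whose size and field offsets differ between a 32-bit and a 64-bit
build:

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* A message body mixing a 32-bit field with a pointer-sized one: size,
 * alignment and the offset of 'addr' all change between ILP32 and LP64,
 * so the raw bytes are not directly compatible between the two ABIs. */
struct sample_body {
        uint32_t  count;
        uintptr_t addr;      /* 4 bytes on i386, 8 bytes on x86_64 */
        uint32_t  flags;
};

int main(void)
{
        printf("sizeof=%zu offsetof(addr)=%zu offsetof(flags)=%zu\n",
               sizeof(struct sample_body),
               offsetof(struct sample_body, addr),
               offsetof(struct sample_body, flags));
        return 0;
}

With the usual i386 ABI this typically prints 12/4/8, while an x86_64 build
prints 24/8/16, which is exactly the kind of mismatch that either the kernel
or the receiving stub would have to compensate for.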


Something similar to rpc_ types would still be needed for regular 
syscalls, but then they could be renamed to vm_offset_32_t and similar, 
since syscalls would have different entry points.



But I'm sure it has been a conscious decision to go with the current
design and not this way?


For the parts that I wrote, the choice was based on some information 
that I found in the wiki, on irc and to some extent it seemed faster to 
implement. Considering the above issue with struct, I'm not sure how 
much a new vm type would simplify message handling.



I'm also not arguing that mixing 32- and 64-bit tasks is worth
supporting. This was a hard requirement for OS X, since they had to be
able to run existing proprietary binaries unmodified. We build
everything from source, so just having all of the userland be 64-bit
(and fixing any issues we find on the way) sounds very viable. Using
32-bit win32 software via Wine has been the single reason to run
32-bit processes on my GNU/Linux system, but apparently that's now
changing too (WoW64). And I don't know how relevant using Wine on the
Hurd is anyway.


Maybe one advantage could be a reduced memory usage for 32-bit subhurds,
if we also want 64-bit subhurds on the same machine.



Luca




[PATCH 4/6] fix rpc time value for 64 bit

2023-02-12 Thread Luca Dariz
* include/mach/task_info.h: use rpc variant of time_value_t
* include/mach/thread_info.h: Likewise
* kern/mach_clock.c: use rpc variant of time_value_t in
  read_time_stamp()
* kern/mach_clock.h: Likewise
* kern/thread.c: use rpc variant of thread_read_times()
* kern/timer.h: add thread_read_times_rpc() by converting time_value_t
  to the corresponding rpc structures inline.
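
A standalone sketch of what that inline conversion amounts to; the field
widths here are assumptions for illustration, the real definitions live in
the time_value headers and in the kern/timer.h hunk of this patch:

#include <stdint.h>

/* Assumed layouts: the kernel-internal value may use native-width fields,
 * while the rpc_ variant keeps a fixed 32-bit layout for the wire format. */
typedef struct { long    seconds; long    microseconds; } time_value_t;
typedef struct { int32_t seconds; int32_t microseconds; } rpc_time_value_t;

static inline void
time_value_to_rpc(const time_value_t *in, rpc_time_value_t *out)
{
        out->seconds      = (int32_t) in->seconds;
        out->microseconds = (int32_t) in->microseconds;
}

Presumably thread_read_times_rpc() just wraps thread_read_times() into two
local time_value_t values and copies the fields over in this way.
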
---
 include/mach/task_info.h   | 10 +-
 include/mach/thread_info.h |  6 +++---
 kern/mach_clock.c  |  2 +-
 kern/mach_clock.h  |  2 +-
 kern/thread.c  |  2 +-
 kern/timer.h   | 12 
 6 files changed, 23 insertions(+), 11 deletions(-)

diff --git a/include/mach/task_info.h b/include/mach/task_info.h
index 3aaa7cd6..f448ee04 100644
--- a/include/mach/task_info.h
+++ b/include/mach/task_info.h
@@ -56,11 +56,11 @@ struct task_basic_info {
integer_t   base_priority;  /* base scheduling priority */
rpc_vm_size_t   virtual_size;   /* number of virtual pages */
rpc_vm_size_t   resident_size;  /* number of resident pages */
-   time_value_tuser_time;  /* total user run time for
+   rpc_time_value_tuser_time;  /* total user run time for
   terminated threads */
-   time_value_tsystem_time;/* total system run time for
+   rpc_time_value_tsystem_time;/* total system run time for
   terminated threads */
-   time_value_tcreation_time;  /* creation time stamp */
+   rpc_time_value_tcreation_time;  /* creation time stamp */
 };
 
 typedef struct task_basic_info task_basic_info_data_t;
@@ -89,9 +89,9 @@ typedef struct task_events_info   
*task_events_info_t;
   only accurate if suspended */
 
 struct task_thread_times_info {
-   time_value_tuser_time;  /* total user run time for
+   rpc_time_value_tuser_time;  /* total user run time for
   live threads */
-   time_value_tsystem_time;/* total system run time for
+   rpc_time_value_tsystem_time;/* total system run time for
   live threads */
 };
 
diff --git a/include/mach/thread_info.h b/include/mach/thread_info.h
index 569c8c84..46c1ceca 100644
--- a/include/mach/thread_info.h
+++ b/include/mach/thread_info.h
@@ -55,8 +55,8 @@ typedef   integer_t   
thread_info_data_t[THREAD_INFO_MAX];
 #define THREAD_BASIC_INFO  1   /* basic information */
 
 struct thread_basic_info {
-   time_value_tuser_time;  /* user run time */
-   time_value_tsystem_time;/* system run time */
+   rpc_time_value_tuser_time;  /* user run time */
+   rpc_time_value_tsystem_time;/* system run time */
integer_t   cpu_usage;  /* scaled cpu usage percentage */
integer_t   base_priority;  /* base scheduling priority */
integer_t   cur_priority;   /* current scheduling priority */
@@ -65,7 +65,7 @@ struct thread_basic_info {
integer_t   suspend_count;  /* suspend count for thread */
integer_t   sleep_time; /* number of seconds that thread
   has been sleeping */
-   time_value_tcreation_time;  /* time stamp of creation */
+   rpc_time_value_tcreation_time;  /* time stamp of creation */
 };
 
 typedef struct thread_basic_info   thread_basic_info_data_t;
diff --git a/kern/mach_clock.c b/kern/mach_clock.c
index 09717d16..ed38c76b 100644
--- a/kern/mach_clock.c
+++ b/kern/mach_clock.c
@@ -429,7 +429,7 @@ record_time_stamp(time_value_t *stamp)
  * real-time clock frame.
  */
 void
-read_time_stamp (const time_value_t *stamp, time_value_t *result)
+read_time_stamp (const time_value_t *stamp, rpc_time_value_t *result)
 {
time_value64_t result64;
TIME_VALUE_TO_TIME_VALUE64(stamp, &result64);
diff --git a/kern/mach_clock.h b/kern/mach_clock.h
index 7e8d3046..9a670011 100644
--- a/kern/mach_clock.h
+++ b/kern/mach_clock.h
@@ -98,7 +98,7 @@ extern void record_time_stamp (time_value_t *stamp);
  * Read a timestamp in STAMP into RESULT.  Returns values in the
  * real-time clock frame.
  */
-extern void read_time_stamp (const time_value_t *stamp, time_value_t *result);
+extern void read_time_stamp (const time_value_t *stamp, rpc_time_value_t 
*result);
 
 extern void mapable_time_init (void);
 
diff --git a/kern/thread.c b/kern/thread.c
index 17cc458c..4a6b9eda 100644
--- a/kern/thread.c
+++ b/kern/thread.c
@@ -1522,7 +1522,7 @@ kern_return_t thread_info(
 
/* fill in info */
 
-   thread_read_times(thread,
+   thread_read_times_rpc(thread,
&basic_info->user_time,
&basic_info->system_time);
basic_info->base_

[PATCH 3/6] add L4 kmem cache for x86_64

2023-02-12 Thread Luca Dariz
* i386/intel/pmap.c: allocate the L4 page table from a dedicated kmem
  cache instead of the generic kernel map.
  Also improve readability of nested ifdef's.
---
 i386/intel/pmap.c | 34 +++---
 1 file changed, 19 insertions(+), 15 deletions(-)

diff --git a/i386/intel/pmap.c b/i386/intel/pmap.c
index ccbb03fc..615b0fff 100644
--- a/i386/intel/pmap.c
+++ b/i386/intel/pmap.c
@@ -397,13 +397,14 @@ boolean_t cpu_update_needed[NCPUS];
 struct pmapkernel_pmap_store;
 pmap_t kernel_pmap;
 
-struct kmem_cache  pmap_cache; /* cache of pmap structures */
-struct kmem_cache  pd_cache;   /* cache of page directories */
+struct kmem_cache pmap_cache;  /* cache of pmap structures */
+struct kmem_cache pd_cache;/* cache of page directories */
 #if PAE
-struct kmem_cache  pdpt_cache; /* cache of page
-  directory pointer
-  tables */
-#endif
+struct kmem_cache pdpt_cache;  /* cache of page directory pointer tables */
+#ifdef __x86_64__
+struct kmem_cache l4_cache;/* cache of L4 tables */
+#endif /* __x86_64__ */
+#endif /* PAE */
 
 boolean_t  pmap_debug = FALSE; /* flag for debugging prints */
 
@@ -1046,7 +1047,12 @@ void pmap_init(void)
kmem_cache_init(&pdpt_cache, "pdpt",
INTEL_PGBYTES, INTEL_PGBYTES, NULL,
KMEM_CACHE_PHYSMEM);
-#endif
+#ifdef __x86_64__
+   kmem_cache_init(&l4_cache, "L4",
+   INTEL_PGBYTES, INTEL_PGBYTES, NULL,
+   KMEM_CACHE_PHYSMEM);
+#endif /* __x86_64__ */
+#endif /* PAE */
s = (vm_size_t) sizeof(struct pv_entry);
kmem_cache_init(&pv_list_cache, "pv_entry", s, 0, NULL, 0);
 
@@ -1287,10 +1293,8 @@ pmap_t pmap_create(vm_size_t size)
  );
}
 #ifdef __x86_64__
-   // FIXME: use kmem_cache_alloc instead
-   if (kmem_alloc_wired(kernel_map,
-(vm_offset_t *)&p->l4base, INTEL_PGBYTES)
-   != KERN_SUCCESS)
+   p->l4base = (pt_entry_t *) kmem_cache_alloc(&l4_cache);
+   if (p->l4base == NULL)
panic("pmap_create");
memset(p->l4base, 0, INTEL_PGBYTES);
WRITE_PTE(&p->l4base[0], pa_to_pte(kvtophys((vm_offset_t) p->pdpbase)) 
| INTEL_PTE_VALID | INTEL_PTE_WRITE | INTEL_PTE_USER);
@@ -1426,16 +1430,16 @@ void pmap_destroy(pmap_t p)
pmap_set_page_readwrite(p->l4base);
pmap_set_page_readwrite(p->user_l4base);
pmap_set_page_readwrite(p->user_pdpbase);
-#endif
+#endif /* __x86_64__ */
pmap_set_page_readwrite(p->pdpbase);
 #endif /* MACH_PV_PAGETABLES */
 #ifdef __x86_64__
-   kmem_free(kernel_map, (vm_offset_t)p->l4base, INTEL_PGBYTES);
+kmem_cache_free(&l4_cache, (vm_offset_t) p->l4base);
 #ifdef MACH_PV_PAGETABLES
kmem_free(kernel_map, (vm_offset_t)p->user_l4base, INTEL_PGBYTES);
kmem_free(kernel_map, (vm_offset_t)p->user_pdpbase, INTEL_PGBYTES);
-#endif
-#endif
+#endif /* MACH_PV_PAGETABLES */
+#endif /* __x86_64__ */
kmem_cache_free(&pdpt_cache, (vm_offset_t) p->pdpbase);
 #endif /* PAE */
kmem_cache_free(&pmap_cache, (vm_offset_t) p);
-- 
2.30.2




[PATCH gnumach 0/6] minor fixes and last 32on64 compatibility issues

2023-02-12 Thread Luca Dariz
This series contains some minor fixes
  set unused members of thread state to 0
  fix hardcoded physical address
  add L4 kmem cache for x86_64
and the last two rpc compatibility issues
  fix rpc time value for 64 bit
  fix port name size in notifications
then at this stage it seems fine to enable syscalls.

I tested this by booting a ramdisk (manually adding debian patches),
and the system can boot until reaching a shell. User-space drivers do
not work yet (intr rpc are not yet implemented for x86_64) and there
are various strange things, for example:
* the reported memory amount is not accurate, e.g. for 8G I see only
  1GB of available memory, with 3GB of free memory. This might be
  related to the memory amount type being fixed to 32-bit in host_info().
* the startup task fails to set the args vector on the kernel task
  (invalid address), but this doesn't seem fatal

There could still be issues with internal mach devices, which I didn't
test thoroughly.

Luca Dariz (6):
  set unused members of thread state to 0
  fix hardcoded physical address
  add L4 kmem cache for x86_64
  fix rpc time value for 64 bit
  fix port name size in notifications
  enable syscalls on x86_64

 i386/i386/pcb.c|  1 +
 i386/i386at/com.c  |  2 +-
 i386/intel/pmap.c  | 34 +++---
 include/mach/task_info.h   | 10 +-
 include/mach/thread_info.h |  6 +++---
 ipc/ipc_machdep.h  |  1 +
 ipc/ipc_notify.c   |  8 
 kern/mach_clock.c  |  2 +-
 kern/mach_clock.h  |  2 +-
 kern/thread.c  |  2 +-
 kern/timer.h   | 12 
 x86_64/locore.S|  3 ---
 12 files changed, 49 insertions(+), 34 deletions(-)

-- 
2.30.2




[PATCH 6/6] enable syscalls on x86_64

2023-02-12 Thread Luca Dariz
Signed-off-by: Luca Dariz 
---
 x86_64/locore.S | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/x86_64/locore.S b/x86_64/locore.S
index 5b9c8ef4..95ece3cc 100644
--- a/x86_64/locore.S
+++ b/x86_64/locore.S
@@ -1075,9 +1075,6 @@ syscall_entry_2:
pushq   %rax/* save system call number */
pushq   $0  /* clear trap number slot */
 
-// TODO: test it before dropping ud2
-   ud2
-
pusha   /* save the general registers */
movq%ds,%rdx/* and the segment registers */
pushq   %rdx
-- 
2.30.2




[PATCH 1/6] set unused members of thread state to 0

2023-02-12 Thread Luca Dariz
* i386/i386/pcb.c: always set esp to 0, as it seems unused.
---
 i386/i386/pcb.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/i386/i386/pcb.c b/i386/i386/pcb.c
index 9ac55a1c..924ed08b 100644
--- a/i386/i386/pcb.c
+++ b/i386/i386/pcb.c
@@ -706,6 +706,7 @@ kern_return_t thread_getstatus(
state->eip = saved_state->eip;
state->efl = saved_state->efl;
state->uesp = saved_state->uesp;
+   state->esp = 0;  /* unused */
 
state->cs = saved_state->cs;
state->ss = saved_state->ss;
-- 
2.30.2




[PATCH 2/6] fix hardcoded physical address

2023-02-12 Thread Luca Dariz
* i386/i386at/com.c: use proper helper to convert physical to virtual
  address.
---
 i386/i386at/com.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/i386/i386at/com.c b/i386/i386at/com.c
index 000475db..de21206c 100644
--- a/i386/i386at/com.c
+++ b/i386/i386at/com.c
@@ -276,7 +276,7 @@ comcninit(struct consdev *cp)
 
{
charmsg[128];
-   volatile unsigned char *p = (volatile unsigned char *)0xb8000;
+   volatile unsigned char *p = (volatile unsigned char 
*)phystokv(0xb8000);
int i;
 
sprintf(msg, " using COM port %d for console ",
-- 
2.30.2




[PATCH 5/6] fix port name size in notifications

2023-02-12 Thread Luca Dariz
* ipc/ipc_machdep.h: define PORT_NAME_T_SIZE_IN_BITS
* ipc/ipc_notify.c: fix port name size in notification message
  templates
---
 ipc/ipc_machdep.h | 1 +
 ipc/ipc_notify.c  | 8 
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/ipc/ipc_machdep.h b/ipc/ipc_machdep.h
index 29878dc9..2871fc31 100755
--- a/ipc/ipc_machdep.h
+++ b/ipc/ipc_machdep.h
@@ -34,5 +34,6 @@
  */
 
 #define PORT_T_SIZE_IN_BITS (sizeof(mach_port_t)*8)
+#define PORT_NAME_T_SIZE_IN_BITS (sizeof(mach_port_name_t)*8)
 
 #endif /* _IPC_IPC_MACHDEP_H_ */
diff --git a/ipc/ipc_notify.c b/ipc/ipc_notify.c
index eea60116..d0b71cf2 100644
--- a/ipc/ipc_notify.c
+++ b/ipc/ipc_notify.c
@@ -72,7 +72,7 @@ ipc_notify_init_port_deleted(mach_port_deleted_notification_t 
*n)
m->msgh_id = MACH_NOTIFY_PORT_DELETED;
 
t->msgt_name = MACH_MSG_TYPE_PORT_NAME;
-   t->msgt_size = PORT_T_SIZE_IN_BITS;
+   t->msgt_size = PORT_NAME_T_SIZE_IN_BITS;
t->msgt_number = 1;
t->msgt_inline = TRUE;
t->msgt_longform = FALSE;
@@ -102,7 +102,7 @@ 
ipc_notify_init_msg_accepted(mach_msg_accepted_notification_t *n)
m->msgh_id = MACH_NOTIFY_MSG_ACCEPTED;
 
t->msgt_name = MACH_MSG_TYPE_PORT_NAME;
-   t->msgt_size = PORT_T_SIZE_IN_BITS;
+   t->msgt_size = PORT_NAME_T_SIZE_IN_BITS;
t->msgt_number = 1;
t->msgt_inline = TRUE;
t->msgt_longform = FALSE;
@@ -164,7 +164,7 @@ ipc_notify_init_no_senders(
m->msgh_id = MACH_NOTIFY_NO_SENDERS;
 
t->msgt_name = MACH_MSG_TYPE_INTEGER_32;
-   t->msgt_size = PORT_T_SIZE_IN_BITS;
+   t->msgt_size = 32;
t->msgt_number = 1;
t->msgt_inline = TRUE;
t->msgt_longform = FALSE;
@@ -215,7 +215,7 @@ ipc_notify_init_dead_name(
m->msgh_id = MACH_NOTIFY_DEAD_NAME;
 
t->msgt_name = MACH_MSG_TYPE_PORT_NAME;
-   t->msgt_size = PORT_T_SIZE_IN_BITS;
+   t->msgt_size = PORT_NAME_T_SIZE_IN_BITS;
t->msgt_number = 1;
t->msgt_inline = TRUE;
t->msgt_longform = FALSE;
-- 
2.30.2




[PATCH 6/9] add more explicit names for user space virtual space limits

2023-02-12 Thread Luca Dariz
* i386/i386/vm_param.h: add VM_MAX/MIN_USER_ADDRESS to kernel headers.
* i386/i386/db_interface.c
* i386/i386/ldt.c
* i386/i386/pcb.c
* i386/intel/pmap.c
* kern/task.c: replace VM_MAX/MIN_ADDRESS with VM_MAX/MIN_USER_ADDRESS
---
 i386/i386/db_interface.c |  4 ++--
 i386/i386/ldt.c  |  8 
 i386/i386/pcb.c  |  6 +++---
 i386/i386/vm_param.h |  6 +-
 i386/intel/pmap.c| 18 +-
 kern/task.c  |  4 ++--
 6 files changed, 25 insertions(+), 21 deletions(-)

diff --git a/i386/i386/db_interface.c b/i386/i386/db_interface.c
index 3a331490..8e2dc6a2 100644
--- a/i386/i386/db_interface.c
+++ b/i386/i386/db_interface.c
@@ -119,8 +119,8 @@ kern_return_t db_set_debug_state(
int i;
 
for (i = 0; i <= 3; i++)
-   if (state->dr[i] < VM_MIN_ADDRESS
-|| state->dr[i] >= VM_MAX_ADDRESS)
+   if (state->dr[i] < VM_MIN_USER_ADDRESS
+|| state->dr[i] >= VM_MAX_USER_ADDRESS)
return KERN_INVALID_ARGUMENT;
 
pcb->ims.ids = *state;
diff --git a/i386/i386/ldt.c b/i386/i386/ldt.c
index 3f9ac8ff..70fa24e2 100644
--- a/i386/i386/ldt.c
+++ b/i386/i386/ldt.c
@@ -64,13 +64,13 @@ ldt_fill(struct real_descriptor *myldt, struct 
real_descriptor *mygdt)
  (vm_offset_t)&syscall, KERNEL_CS,
  ACC_PL_U|ACC_CALL_GATE, 0);
fill_ldt_descriptor(myldt, USER_CS,
-   VM_MIN_ADDRESS,
-   VM_MAX_ADDRESS-VM_MIN_ADDRESS-4096,
+   VM_MIN_USER_ADDRESS,
+   VM_MAX_USER_ADDRESS-VM_MIN_USER_ADDRESS-4096,
/* XXX LINEAR_... */
ACC_PL_U|ACC_CODE_R, SZ_32);
fill_ldt_descriptor(myldt, USER_DS,
-   VM_MIN_ADDRESS,
-   VM_MAX_ADDRESS-VM_MIN_ADDRESS-4096,
+   VM_MIN_USER_ADDRESS,
+   VM_MAX_USER_ADDRESS-VM_MIN_USER_ADDRESS-4096,
ACC_PL_U|ACC_DATA_W, SZ_32);
 
/* Activate the LDT.  */
diff --git a/i386/i386/pcb.c b/i386/i386/pcb.c
index 924ed08b..3ae9e095 100644
--- a/i386/i386/pcb.c
+++ b/i386/i386/pcb.c
@@ -622,10 +622,10 @@ kern_return_t thread_setstatus(
int_table = state->int_table;
int_count = state->int_count;
 
-   if (int_table >= VM_MAX_ADDRESS ||
+   if (int_table >= VM_MAX_USER_ADDRESS ||
int_table +
int_count * sizeof(struct v86_interrupt_table)
-   > VM_MAX_ADDRESS)
+   > VM_MAX_USER_ADDRESS)
return KERN_INVALID_ARGUMENT;
 
thread->pcb->ims.v86s.int_table = int_table;
@@ -834,7 +834,7 @@ thread_set_syscall_return(
 vm_offset_t
 user_stack_low(vm_size_t stack_size)
 {
-   return (VM_MAX_ADDRESS - stack_size);
+   return (VM_MAX_USER_ADDRESS - stack_size);
 }
 
 /*
diff --git a/i386/i386/vm_param.h b/i386/i386/vm_param.h
index 314fdb35..5e7f149a 100644
--- a/i386/i386/vm_param.h
+++ b/i386/i386/vm_param.h
@@ -31,6 +31,10 @@
 #include 
 #endif
 
+/* To avoid ambiguity in kernel code, make the name explicit */
+#define VM_MIN_USER_ADDRESS VM_MIN_ADDRESS
+#define VM_MAX_USER_ADDRESS VM_MAX_ADDRESS
+
 /* The kernel address space is usually 1GB, usually starting at virtual 
address 0.  */
 /* This can be changed freely to separate kernel addresses from user addresses
  * for better trace support in kdb; the _START symbol has to be offset by the
@@ -77,7 +81,7 @@
 #else
 /* On x86, the kernel virtual address space is actually located
at high linear addresses. */
-#define LINEAR_MIN_KERNEL_ADDRESS  (VM_MAX_ADDRESS)
+#define LINEAR_MIN_KERNEL_ADDRESS  (VM_MAX_USER_ADDRESS)
 #define LINEAR_MAX_KERNEL_ADDRESS  (0xUL)
 #endif
 
diff --git a/i386/intel/pmap.c b/i386/intel/pmap.c
index 9e9f91db..a9ff6f3e 100644
--- a/i386/intel/pmap.c
+++ b/i386/intel/pmap.c
@@ -1341,7 +1341,7 @@ pmap_t pmap_create(vm_size_t size)
  );
}
 #ifdef __x86_64__
-   // TODO alloc only PDPTE for the user range VM_MIN_ADDRESS, 
VM_MAX_ADDRESS
+   // TODO alloc only PDPTE for the user range VM_MIN_USER_ADDRESS, 
VM_MAX_USER_ADDRESS
// and keep the same for kernel range, in l4 table we have different 
entries
p->l4base = (pt_entry_t *) kmem_cache_alloc(&l4_cache);
if (p->l4base == NULL)
@@ -1349,7 +1349,7 @@ pmap_t pmap_create(vm_size_t size)
memset(p->l4base, 0, INTEL_PGBYTES);
WRITE_PTE(&p->l4base[lin2l4num(VM_MIN_KERNEL_ADDRESS)],
  pa_to_pte(kvtophys((vm_offset_t) pdp_kernel)) | 
INTEL_PTE_VALID | INTEL_PTE_WRITE | INTEL_PTE_USER);
-#if lin2l4num(VM_MIN_KERNEL_ADDRESS) != lin2l4num(VM_MAX_ADDRESS)
+#if lin2l4num(VM_MIN_KERNEL_ADDRESS) != lin2l4num(VM_MAX_USER_ADDRESS)
// TODO k

[PATCH 2/9] fix x86_64 asm for higher kernel addresses

2023-02-12 Thread Luca Dariz
* x86_64/interrupt.S: use 64-bit registers, as variables could be
  stored at high addresses
* x86_64/locore.S: Likewise
---
 x86_64/interrupt.S | 4 ++--
 x86_64/locore.S| 6 ++
 2 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/x86_64/interrupt.S b/x86_64/interrupt.S
index fe2b3858..31f386ec 100644
--- a/x86_64/interrupt.S
+++ b/x86_64/interrupt.S
@@ -59,10 +59,10 @@ ENTRY(interrupt)
 
movlS_IRQ,%eax  /* copy irq number */
shll$2,%eax /* irq * 4 */
-   movlEXT(iunit)(%eax),%edi   /* get device unit number as 1st arg */
+   movlEXT(iunit)(%rax),%edi   /* get device unit number as 1st arg */
 
shll$1,%eax /* irq * 8 */
-   call*EXT(ivect)(%eax)   /* call interrupt handler */
+   call*EXT(ivect)(%rax)   /* call interrupt handler */
 
movlS_IPL,%edi  /* restore previous ipl */
callsplx_cli/* restore previous ipl */
diff --git a/x86_64/locore.S b/x86_64/locore.S
index 95ece3cc..c54b5cd8 100644
--- a/x86_64/locore.S
+++ b/x86_64/locore.S
@@ -1152,7 +1152,7 @@ syscall_native:
 #endif
shll$5,%eax /* manual indexing of mach_trap_t */
xorq%r10,%r10
-   movlEXT(mach_trap_table)(%eax),%r10d
+   mov EXT(mach_trap_table)(%rax),%r10
/* get number of arguments */
andq%r10,%r10
jz  mach_call_call  /* skip argument copy if none */
@@ -1199,9 +1199,7 @@ mach_call_call:
/* will return with syscallofs still (or again) in eax */
 0:
 #endif /* DEBUG */
-
-   call*EXT(mach_trap_table)+8(%eax)
-   /* call procedure */
+   call*EXT(mach_trap_table)+8(%rax)  /* call procedure */
movq%rsp,%rcx   /* get kernel stack */
or  $(KERNEL_STACK_SIZE-1),%rcx
movq-7-IKS_SIZE(%rcx),%rsp  /* switch back to PCB stack */
-- 
2.30.2




[PATCH 9/9] move kernel virtual address space to upper addresses

2023-02-12 Thread Luca Dariz
* i386/i386/vm_param.h: adjust constants to the new kernel map
  - the boothdr.S code already sets up a temporary map to higher
addresses, so we can use INIT_VM_MIN_KERNEL_ADDRESS as in xen
  - increase the kernel map size to accommodate bigger structures
and more memory
  - adjust kernel max address and directmap limit
* i386/i386at/biosmem.c: enable directmap check also on x86_64
* i386/include/mach/i386/vm_param.h: increase user virtual memory
  limit as it's not conflicting with the kernel's anymore
* i386/intel/pmap.h: adjust lin2pdenum_cont() and INTEL_PTE_PFN to the
  new kernel map
* x86_64/Makefrag.am: change KERNEL_MAP_BASE to be above 4G, in
  accordance with -mcmodel=kernel. This will allow using the full
  memory address space.
---
 i386/i386/vm_param.h  | 20 
 i386/i386at/biosmem.c |  2 --
 i386/include/mach/i386/vm_param.h |  2 +-
 i386/intel/pmap.h | 12 ++--
 x86_64/Makefrag.am| 12 ++--
 5 files changed, 33 insertions(+), 15 deletions(-)

diff --git a/i386/i386/vm_param.h b/i386/i386/vm_param.h
index c2e623a6..8264ea11 100644
--- a/i386/i386/vm_param.h
+++ b/i386/i386/vm_param.h
@@ -45,7 +45,7 @@
 #define VM_MIN_KERNEL_ADDRESS  0xC000UL
 #endif
 
-#ifdef MACH_XEN
+#if defined(MACH_XEN) || defined (__x86_64__)
 /* PV kernels can be loaded directly to the target virtual address */
 #define INIT_VM_MIN_KERNEL_ADDRESS VM_MIN_KERNEL_ADDRESS
 #else  /* MACH_XEN */
@@ -72,12 +72,22 @@
  * Reserve mapping room for the kernel map, which includes
  * the device I/O map and the IPC map.
  */
+#ifdef __x86_64__
+/*
+ * Vm structures are quite bigger on 64 bit.
+ * This should be well enough for 8G of physical memory; on the other hand,
+ * maybe not all of them need to be in directly-mapped memory, see the parts
+ * allocated with pmap_steal_memory().
+ */
+#define VM_KERNEL_MAP_SIZE (512 * 1024 * 1024)
+#else
 #define VM_KERNEL_MAP_SIZE (152 * 1024 * 1024)
+#endif
 
 /* This is the kernel address range in linear addresses.  */
 #ifdef __x86_64__
 #define LINEAR_MIN_KERNEL_ADDRESS  VM_MIN_KERNEL_ADDRESS
-#define LINEAR_MAX_KERNEL_ADDRESS  (0xUL)
+#define LINEAR_MAX_KERNEL_ADDRESS  (0xUL)
 #else
 /* On x86, the kernel virtual address space is actually located
at high linear addresses. */
@@ -141,8 +151,10 @@
 #else /* MACH_XEN */
 #ifdef __LP64__
 #define VM_PAGE_MAX_SEGS 4
-#define VM_PAGE_DMA32_LIMIT DECL_CONST(0x1, UL)
-#define VM_PAGE_DIRECTMAP_LIMIT DECL_CONST(0x4000, UL)
+#define VM_PAGE_DMA32_LIMIT DECL_CONST(0x1000, UL)
+#define VM_PAGE_DIRECTMAP_LIMIT (VM_MAX_KERNEL_ADDRESS \
+- VM_MIN_KERNEL_ADDRESS \
+- VM_KERNEL_MAP_SIZE + 1)
 #define VM_PAGE_HIGHMEM_LIMIT   DECL_CONST(0x10, UL)
 #else /* __LP64__ */
 #define VM_PAGE_DIRECTMAP_LIMIT (VM_MAX_KERNEL_ADDRESS \
diff --git a/i386/i386at/biosmem.c b/i386/i386at/biosmem.c
index 78e7bb21..880989fe 100644
--- a/i386/i386at/biosmem.c
+++ b/i386/i386at/biosmem.c
@@ -637,10 +637,8 @@ biosmem_setup_allocator(const struct multiboot_raw_info 
*mbi)
  */
 end = vm_page_trunc((mbi->mem_upper + 1024) << 10);
 
-#ifndef __LP64__
 if (end > VM_PAGE_DIRECTMAP_LIMIT)
 end = VM_PAGE_DIRECTMAP_LIMIT;
-#endif /* __LP64__ */
 
 max_heap_start = 0;
 max_heap_end = 0;
diff --git a/i386/include/mach/i386/vm_param.h 
b/i386/include/mach/i386/vm_param.h
index a684ed97..e98f032c 100644
--- a/i386/include/mach/i386/vm_param.h
+++ b/i386/include/mach/i386/vm_param.h
@@ -74,7 +74,7 @@
*/
 #define VM_MIN_ADDRESS (0)
 #ifdef __x86_64__
-#define VM_MAX_ADDRESS (0x4000UL)
+#define VM_MAX_ADDRESS (0xC000UL)
 #else
 #define VM_MAX_ADDRESS (0xc000UL)
 #endif
diff --git a/i386/intel/pmap.h b/i386/intel/pmap.h
index 34c7cc89..78d27bc8 100644
--- a/i386/intel/pmap.h
+++ b/i386/intel/pmap.h
@@ -77,10 +77,10 @@ typedef phys_addr_t pt_entry_t;
 #define PDPNUM_KERNEL  (((VM_MAX_KERNEL_ADDRESS - VM_MIN_KERNEL_ADDRESS) >> 
PDPSHIFT) + 1)
 #define PDPNUM_USER(((VM_MAX_USER_ADDRESS - VM_MIN_USER_ADDRESS) >> 
PDPSHIFT) + 1)
 #define PDPMASK0x1ff   /* mask for page directory pointer 
index */
-#else
+#else /* __x86_64__ */
 #define PDPNUM 4   /* number of page directory pointers */
 #define PDPMASK3   /* mask for page directory pointer 
index */
-#endif
+#endif /* __x86_64__ */
 #define PDPSHIFT   30  /* page directory pointer */
 #define PDESHIFT   21  /* page descriptor shift */
 #define PDEMASK0x1ff   /* mask for page descriptor index */
@@ -109,7 +109,11 @@ typedef phys_addr_t pt_entry_t;
 #if PAE
 /* Special version assuming contiguous page directories.  Making it
include the page directory pointer table index too.  */
+#ifdef __x86_64__
+#define lin2pdenum_cont(a) (((a

[PATCH 5/9] use L4 page table directly on x86_64 instead of short-circuiting to pdpbase

2023-02-12 Thread Luca Dariz
This is a preparation to run the kernel on high addresses, where the
user vm region and the kernel vm region will use different L3 page
tables.

* i386/intel/pmap.c: on x86_64, retrieve the value of pdpbase from the
  L4 table, and add the pmap_ptp() helper (useful also for PAE).
* i386/intel/pmap.h: remove pdpbase on x86_64.
---
 i386/intel/pmap.c | 97 ---
 i386/intel/pmap.h |  7 ++--
 2 files changed, 78 insertions(+), 26 deletions(-)

diff --git a/i386/intel/pmap.c b/i386/intel/pmap.c
index 470be744..9e9f91db 100644
--- a/i386/intel/pmap.c
+++ b/i386/intel/pmap.c
@@ -430,14 +430,11 @@ pt_entry_t *kernel_page_dir;
 static pmap_mapwindow_t mapwindows[PMAP_NMAPWINDOWS];
 def_simple_lock_data(static, pmapwindows_lock)
 
+#ifdef PAE
 static inline pt_entry_t *
-pmap_pde(const pmap_t pmap, vm_offset_t addr)
+pmap_ptp(const pmap_t pmap, vm_offset_t addr)
 {
-   pt_entry_t *page_dir;
-   if (pmap == kernel_pmap)
-   addr = kvtolin(addr);
-#if PAE
-   pt_entry_t *pdp_table, pdp, pde;
+   pt_entry_t *pdp_table, pdp;
 #ifdef __x86_64__
pdp = pmap->l4base[lin2l4num(addr)];
if ((pdp & INTEL_PTE_VALID) == 0)
@@ -446,6 +443,19 @@ pmap_pde(const pmap_t pmap, vm_offset_t addr)
 #else /* __x86_64__ */
pdp_table = pmap->pdpbase;
 #endif /* __x86_64__ */
+   return pdp_table;
+}
+#endif
+
+static inline pt_entry_t *
+pmap_pde(const pmap_t pmap, vm_offset_t addr)
+{
+   pt_entry_t *page_dir;
+   if (pmap == kernel_pmap)
+   addr = kvtolin(addr);
+#if PAE
+   pt_entry_t *pdp_table, pde;
+   pdp_table = pmap_ptp(pmap, addr);
pde = pdp_table[lin2pdpnum(addr)];
if ((pde & INTEL_PTE_VALID) == 0)
return PT_ENTRY_NULL;
@@ -585,6 +595,7 @@ vm_offset_t pmap_map_bd(
 static void pmap_bootstrap_pae(void)
 {
vm_offset_t addr;
+   pt_entry_t *pdp_kernel;
 
 #ifdef __x86_64__
 #ifdef MACH_HYP
@@ -595,13 +606,15 @@ static void pmap_bootstrap_pae(void)
memset(kernel_pmap->l4base, 0, INTEL_PGBYTES);
 #endif /* x86_64 */
 
+   // TODO: allocate only the PDPTE for kernel virtual space
+   // this means all directmap and the stupid limit above it
init_alloc_aligned(PDPNUM * INTEL_PGBYTES, &addr);
kernel_page_dir = (pt_entry_t*)phystokv(addr);
 
-   kernel_pmap->pdpbase = (pt_entry_t*)phystokv(pmap_grab_page());
-   memset(kernel_pmap->pdpbase, 0, INTEL_PGBYTES);
+   pdp_kernel = (pt_entry_t*)phystokv(pmap_grab_page());
+   memset(pdp_kernel, 0, INTEL_PGBYTES);
for (int i = 0; i < PDPNUM; i++)
-   WRITE_PTE(&kernel_pmap->pdpbase[i],
+   WRITE_PTE(&pdp_kernel[i],
  pa_to_pte(_kvtophys((void *) kernel_page_dir
  + i * INTEL_PGBYTES))
  | INTEL_PTE_VALID
@@ -611,10 +624,14 @@ static void pmap_bootstrap_pae(void)
);
 
 #ifdef __x86_64__
-   WRITE_PTE(&kernel_pmap->l4base[0], 
pa_to_pte(_kvtophys(kernel_pmap->pdpbase)) | INTEL_PTE_VALID | INTEL_PTE_WRITE);
+/* only fill the kernel pdpte during bootstrap */
+   WRITE_PTE(&kernel_pmap->l4base[lin2l4num(VM_MIN_KERNEL_ADDRESS)],
+  pa_to_pte(_kvtophys(pdp_kernel)) | INTEL_PTE_VALID | 
INTEL_PTE_WRITE);
 #ifdef MACH_PV_PAGETABLES
pmap_set_page_readonly_init(kernel_pmap->l4base);
-#endif
+#endif /* MACH_PV_PAGETABLES */
+#else  /* x86_64 */
+kernel_pmap->pdpbase = pdp_kernel;
 #endif /* x86_64 */
 }
 #endif /* PAE */
@@ -1243,7 +1260,7 @@ pmap_page_table_page_dealloc(vm_offset_t pa)
  */
 pmap_t pmap_create(vm_size_t size)
 {
-   pt_entry_t  *page_dir[PDPNUM];
+   pt_entry_t  *page_dir[PDPNUM], *pdp_kernel;
int i;
pmap_t  p;
pmap_statistics_t   stats;
@@ -1301,34 +1318,40 @@ pmap_t pmap_create(vm_size_t size)
 #endif /* MACH_PV_PAGETABLES */
 
 #if PAE
-   p->pdpbase = (pt_entry_t *) kmem_cache_alloc(&pdpt_cache);
-   if (p->pdpbase == NULL) {
+   pdp_kernel = (pt_entry_t *) kmem_cache_alloc(&pdpt_cache);
+   if (pdp_kernel == NULL) {
for (i = 0; i < PDPNUM; i++)
kmem_cache_free(&pd_cache, (vm_address_t) page_dir[i]);
kmem_cache_free(&pmap_cache, (vm_address_t) p);
return PMAP_NULL;
}
 
-   memset(p->pdpbase, 0, INTEL_PGBYTES);
+   memset(pdp_kernel, 0, INTEL_PGBYTES);
{
for (i = 0; i < PDPNUM; i++)
-   WRITE_PTE(&p->pdpbase[i],
+   WRITE_PTE(&pdp_kernel[i],
  pa_to_pte(kvtophys((vm_offset_t) page_dir[i]))
  | INTEL_PTE_VALID
 #if (defined(__x86_64__) && !defined(MACH_HYP)) || defined(MACH_PV_PAGETABLES)
  | INTEL_PTE_WRITE
 #ifdef __x86_64__
 

[PATCH 4/9] factor out PAE-specific bootstrap

2023-02-12 Thread Luca Dariz
* i386/intel/pmap.c: move it to pmap_bootstrap_pae()
---
 i386/intel/pmap.c | 72 ++-
 1 file changed, 40 insertions(+), 32 deletions(-)

diff --git a/i386/intel/pmap.c b/i386/intel/pmap.c
index 15577a09..470be744 100644
--- a/i386/intel/pmap.c
+++ b/i386/intel/pmap.c
@@ -581,8 +581,46 @@ vm_offset_t pmap_map_bd(
return(virt);
 }
 
+#ifdef PAE
+static void pmap_bootstrap_pae(void)
+{
+   vm_offset_t addr;
+
+#ifdef __x86_64__
+#ifdef MACH_HYP
+   kernel_pmap->user_l4base = NULL;
+   kernel_pmap->user_pdpbase = NULL;
+#endif
+   kernel_pmap->l4base = (pt_entry_t*)phystokv(pmap_grab_page());
+   memset(kernel_pmap->l4base, 0, INTEL_PGBYTES);
+#endif /* x86_64 */
+
+   init_alloc_aligned(PDPNUM * INTEL_PGBYTES, &addr);
+   kernel_page_dir = (pt_entry_t*)phystokv(addr);
+
+   kernel_pmap->pdpbase = (pt_entry_t*)phystokv(pmap_grab_page());
+   memset(kernel_pmap->pdpbase, 0, INTEL_PGBYTES);
+   for (int i = 0; i < PDPNUM; i++)
+   WRITE_PTE(&kernel_pmap->pdpbase[i],
+ pa_to_pte(_kvtophys((void *) kernel_page_dir
+ + i * INTEL_PGBYTES))
+ | INTEL_PTE_VALID
+#if (defined(__x86_64__) && !defined(MACH_HYP)) || defined(MACH_PV_PAGETABLES)
+ | INTEL_PTE_WRITE
+#endif
+   );
+
+#ifdef __x86_64__
+   WRITE_PTE(&kernel_pmap->l4base[0], 
pa_to_pte(_kvtophys(kernel_pmap->pdpbase)) | INTEL_PTE_VALID | INTEL_PTE_WRITE);
+#ifdef MACH_PV_PAGETABLES
+   pmap_set_page_readonly_init(kernel_pmap->l4base);
+#endif
+#endif /* x86_64 */
+}
+#endif /* PAE */
+
 #ifdef MACH_PV_PAGETABLES
-void pmap_bootstrap_xen()
+static void pmap_bootstrap_xen(void)
 {
/* We don't actually deal with the CR3 register content at all */
hyp_vm_assist(VMASST_CMD_enable, VMASST_TYPE_pae_extended_cr3);
@@ -691,37 +729,7 @@ void pmap_bootstrap(void)
/* Note: initial Xen mapping holds at least 512kB free mapped page.
 * We use that for directly building our linear mapping. */
 #if PAE
-   {
-   vm_offset_t addr;
-   init_alloc_aligned(PDPNUM * INTEL_PGBYTES, &addr);
-   kernel_page_dir = (pt_entry_t*)phystokv(addr);
-   }
-   kernel_pmap->pdpbase = (pt_entry_t*)phystokv(pmap_grab_page());
-   memset(kernel_pmap->pdpbase, 0, INTEL_PGBYTES);
-   {
-   int i;
-   for (i = 0; i < PDPNUM; i++)
-   WRITE_PTE(&kernel_pmap->pdpbase[i],
- pa_to_pte(_kvtophys((void *) kernel_page_dir
- + i * INTEL_PGBYTES))
- | INTEL_PTE_VALID
-#if (defined(__x86_64__) && !defined(MACH_HYP)) || defined(MACH_PV_PAGETABLES)
- | INTEL_PTE_WRITE
-#endif
- );
-   }
-#ifdef __x86_64__
-#ifdef MACH_HYP
-   kernel_pmap->user_l4base = NULL;
-   kernel_pmap->user_pdpbase = NULL;
-#endif
-   kernel_pmap->l4base = (pt_entry_t*)phystokv(pmap_grab_page());
-   memset(kernel_pmap->l4base, 0, INTEL_PGBYTES);
-   WRITE_PTE(&kernel_pmap->l4base[0], 
pa_to_pte(_kvtophys(kernel_pmap->pdpbase)) | INTEL_PTE_VALID | INTEL_PTE_WRITE);
-#ifdef MACH_PV_PAGETABLES
-   pmap_set_page_readonly_init(kernel_pmap->l4base);
-#endif
-#endif /* x86_64 */
+   pmap_bootstrap_pae();
 #else  /* PAE */
kernel_pmap->dirbase = kernel_page_dir = 
(pt_entry_t*)phystokv(pmap_grab_page());
 #endif /* PAE */
-- 
2.30.2




[PATCH 8/9] separate initialization of kernel and user PTP tables

2023-02-12 Thread Luca Dariz
* i386/i386/vm_param.h: temporarily fix kernel upper address
* i386/intel/pmap.c: split kernel and user L3 map initialization. For
  simplicity in handling the different configurations, on 32-bit
  (+PAE) the name PDPNUM_KERNEL is used in place of PDPNUM, while only
  on x86_64 PDPNUM_USER and PDPNUM_KERNEL are treated differently.
  Also, change the iteration over PTP tables in case the kernel map is
  not right after the user map.
* i386/intel/pmap.h: define PDPNUM_USER and PDPNUM_KERNEL and move
  PDPSHIFT to simplify ifdefs.
---
 i386/i386/vm_param.h |  2 +-
 i386/intel/pmap.c| 62 ++--
 i386/intel/pmap.h|  8 +++---
 3 files changed, 52 insertions(+), 20 deletions(-)

diff --git a/i386/i386/vm_param.h b/i386/i386/vm_param.h
index 5e7f149a..c2e623a6 100644
--- a/i386/i386/vm_param.h
+++ b/i386/i386/vm_param.h
@@ -77,7 +77,7 @@
 /* This is the kernel address range in linear addresses.  */
 #ifdef __x86_64__
 #define LINEAR_MIN_KERNEL_ADDRESS  VM_MIN_KERNEL_ADDRESS
-#define LINEAR_MAX_KERNEL_ADDRESS  (0x7fffUL)
+#define LINEAR_MAX_KERNEL_ADDRESS  (0xUL)
 #else
 /* On x86, the kernel virtual address space is actually located
at high linear addresses. */
diff --git a/i386/intel/pmap.c b/i386/intel/pmap.c
index a9ff6f3e..7d4ad341 100644
--- a/i386/intel/pmap.c
+++ b/i386/intel/pmap.c
@@ -604,17 +604,22 @@ static void pmap_bootstrap_pae(void)
 #endif
kernel_pmap->l4base = (pt_entry_t*)phystokv(pmap_grab_page());
memset(kernel_pmap->l4base, 0, INTEL_PGBYTES);
+#else
+   const int PDPNUM_KERNEL = PDPNUM;
 #endif /* x86_64 */
 
-   // TODO: allocate only the PDPTE for kernel virtual space
-   // this means all directmap and the stupid limit above it
-   init_alloc_aligned(PDPNUM * INTEL_PGBYTES, &addr);
+   init_alloc_aligned(PDPNUM_KERNEL * INTEL_PGBYTES, &addr);
kernel_page_dir = (pt_entry_t*)phystokv(addr);
+   memset(kernel_page_dir, 0, PDPNUM_KERNEL * INTEL_PGBYTES);
 
pdp_kernel = (pt_entry_t*)phystokv(pmap_grab_page());
memset(pdp_kernel, 0, INTEL_PGBYTES);
-   for (int i = 0; i < PDPNUM; i++)
-   WRITE_PTE(&pdp_kernel[i],
+   for (int i = 0; i < PDPNUM_KERNEL; i++) {
+   int pdp_index = i;
+#ifdef __x86_64__
+   pdp_index += lin2pdpnum(VM_MIN_KERNEL_ADDRESS);
+#endif
+   WRITE_PTE(&pdp_kernel[pdp_index],
  pa_to_pte(_kvtophys((void *) kernel_page_dir
  + i * INTEL_PGBYTES))
  | INTEL_PTE_VALID
@@ -622,6 +627,7 @@ static void pmap_bootstrap_pae(void)
  | INTEL_PTE_WRITE
 #endif
);
+   }
 
 #ifdef __x86_64__
 /* only fill the kernel pdpte during bootstrap */
@@ -749,12 +755,12 @@ void pmap_bootstrap(void)
pmap_bootstrap_pae();
 #else  /* PAE */
kernel_pmap->dirbase = kernel_page_dir = 
(pt_entry_t*)phystokv(pmap_grab_page());
-#endif /* PAE */
{
unsigned i;
for (i = 0; i < NPDES; i++)
kernel_page_dir[i] = 0;
}
+#endif /* PAE */
 
 #ifdef MACH_PV_PAGETABLES
pmap_bootstrap_xen()
@@ -1260,6 +1266,10 @@ pmap_page_table_page_dealloc(vm_offset_t pa)
  */
 pmap_t pmap_create(vm_size_t size)
 {
+#ifdef __x86_64__
+   // needs to be reworked if we want to dynamically allocate PDPs
+   const int PDPNUM = PDPNUM_KERNEL;
+#endif
pt_entry_t  *page_dir[PDPNUM], *pdp_kernel;
int i;
pmap_t  p;
@@ -1328,8 +1338,12 @@ pmap_t pmap_create(vm_size_t size)
 
memset(pdp_kernel, 0, INTEL_PGBYTES);
{
-   for (i = 0; i < PDPNUM; i++)
-   WRITE_PTE(&pdp_kernel[i],
+   for (i = 0; i < PDPNUM; i++) {
+   int pdp_index = i;
+#ifdef __x86_64__
+   pdp_index += lin2pdpnum(VM_MIN_KERNEL_ADDRESS);
+#endif
+   WRITE_PTE(&pdp_kernel[pdp_index],
  pa_to_pte(kvtophys((vm_offset_t) page_dir[i]))
  | INTEL_PTE_VALID
 #if (defined(__x86_64__) && !defined(MACH_HYP)) || defined(MACH_PV_PAGETABLES)
@@ -1339,19 +1353,39 @@ pmap_t pmap_create(vm_size_t size)
 #endif /* __x86_64__ */
 #endif
  );
+   }
}
 #ifdef __x86_64__
-   // TODO alloc only PDPTE for the user range VM_MIN_USER_ADDRESS, 
VM_MAX_USER_ADDRESS
-   // and keep the same for kernel range, in l4 table we have different 
entries
p->l4base = (pt_entry_t *) kmem_cache_alloc(&l4_cache);
if (p->l4base == NULL)
panic("pmap_create");
memset(p->l4base, 0, INTEL_PGBYTES);
WRITE_PTE(&p->l4base[lin2l4num(VM_MIN_KERNEL_ADDRESS)],
- pa_to_pte(kvtophys((vm_offset

[PATCH 0/9 gnumach] move kernel vm map to high addresses on x86_64

2023-02-12 Thread Luca Dariz
The kernel vm region is moved to the last 2GB of the 64-bit address
space. With -mcmodel=kernel the code must be placed in this range, but
other addresses can probably be used for other data structures, direct
memory mapping and so on (as in Linux).

This is just the first step towards being able to use more memory: for
now user space also gets the same 3GB as on a 32-bit kernel, and the
memory structure has not changed much otherwise. The kernel map no
longer immediately follows the user map, so the main changes are in the
pmap module.

Luca Dariz (9):
  prepare pmap helpers for full 64 bit memory map
  fix x86_64 asm for higher kernel addresses
  factor out xen-specific bootstrap
  factor out PAE-specific bootstrap
  use L4 page table directly on x86_64 instead of short-circuiting to
pdpbase
  add more explicit names for user space virtual space limits
  extend data types to hold a 64-bit address
  separate initialization of kernel and user PTP tables
  move kernel virtual address space to upper addresses

 i386/i386/db_interface.c  |   4 +-
 i386/i386/ldt.c   |   8 +-
 i386/i386/pcb.c   |   6 +-
 i386/i386/trap.c  |  12 +-
 i386/i386/vm_param.h  |  26 ++-
 i386/i386at/biosmem.c |   2 -
 i386/include/mach/i386/vm_param.h |   2 +-
 i386/intel/pmap.c | 328 --
 i386/intel/pmap.h |  27 ++-
 kern/task.c   |   4 +-
 x86_64/Makefrag.am|  12 +-
 x86_64/interrupt.S|   4 +-
 x86_64/locore.S   |  10 +-
 13 files changed, 290 insertions(+), 155 deletions(-)

-- 
2.30.2




[PATCH 3/9] factor out xen-specific bootstrap

2023-02-12 Thread Luca Dariz
* i386/intel/pmap.c: move it to pmap_bootstrap_xen()
---
 i386/intel/pmap.c | 107 --
 1 file changed, 56 insertions(+), 51 deletions(-)

diff --git a/i386/intel/pmap.c b/i386/intel/pmap.c
index 9fe16368..15577a09 100644
--- a/i386/intel/pmap.c
+++ b/i386/intel/pmap.c
@@ -581,6 +581,61 @@ vm_offset_t pmap_map_bd(
return(virt);
 }
 
+#ifdef MACH_PV_PAGETABLES
+void pmap_bootstrap_xen()
+{
+   /* We don't actually deal with the CR3 register content at all */
+   hyp_vm_assist(VMASST_CMD_enable, VMASST_TYPE_pae_extended_cr3);
+   /*
+* Xen may only provide as few as 512KB extra bootstrap linear memory,
+* which is far from enough to map all available memory, so we need to
+* map more bootstrap linear memory. We here map 1 (resp. 4 for PAE)
+* other L1 table(s), thus 4MiB extra memory (resp. 8MiB), which is
+* enough for a pagetable mapping 4GiB.
+*/
+#ifdef PAE
+#define NSUP_L1 4
+#else
+#define NSUP_L1 1
+#endif
+   pt_entry_t *l1_map[NSUP_L1];
+   vm_offset_t la;
+   int n_l1map;
+   for (n_l1map = 0, la = VM_MIN_KERNEL_ADDRESS; la >= 
VM_MIN_KERNEL_ADDRESS; la += NPTES * PAGE_SIZE) {
+   pt_entry_t *base = (pt_entry_t*) boot_info.pt_base;
+#ifdef PAE
+#ifdef __x86_64__
+   base = (pt_entry_t*) ptetokv(base[0]);
+#endif /* x86_64 */
+   pt_entry_t *l2_map = (pt_entry_t*) 
ptetokv(base[lin2pdpnum(la)]);
+#else  /* PAE */
+   pt_entry_t *l2_map = base;
+#endif /* PAE */
+   /* Like lin2pdenum, but works with non-contiguous boot L3 */
+   l2_map += (la >> PDESHIFT) & PDEMASK;
+   if (!(*l2_map & INTEL_PTE_VALID)) {
+   struct mmu_update update;
+   unsigned j, n;
+
+   l1_map[n_l1map] = (pt_entry_t*) 
phystokv(pmap_grab_page());
+   for (j = 0; j < NPTES; j++)
+   l1_map[n_l1map][j] = 
(((pt_entry_t)pfn_to_mfn(lin2pdenum(la - VM_MIN_KERNEL_ADDRESS) * NPTES + j)) 
<< PAGE_SHIFT) | INTEL_PTE_VALID | INTEL_PTE_WRITE;
+   pmap_set_page_readonly_init(l1_map[n_l1map]);
+   if (!hyp_mmuext_op_mfn (MMUEXT_PIN_L1_TABLE, kv_to_mfn 
(l1_map[n_l1map])))
+   panic("couldn't pin page %p(%lx)", 
l1_map[n_l1map], (vm_offset_t) kv_to_ma (l1_map[n_l1map]));
+   update.ptr = kv_to_ma(l2_map);
+   update.val = kv_to_ma(l1_map[n_l1map]) | 
INTEL_PTE_VALID | INTEL_PTE_WRITE;
+   hyp_mmu_update(kv_to_la(&update), 1, kv_to_la(&n), 
DOMID_SELF);
+   if (n != 1)
+   panic("couldn't complete bootstrap map");
+   /* added the last L1 table, can stop */
+   if (++n_l1map >= NSUP_L1)
+   break;
+   }
+   }
+}
+#endif /* MACH_PV_PAGETABLES */
+
 /*
  * Bootstrap the system enough to run with virtual memory.
  * Allocate the kernel page directory and page tables,
@@ -677,57 +732,7 @@ void pmap_bootstrap(void)
}
 
 #ifdef MACH_PV_PAGETABLES
-   /* We don't actually deal with the CR3 register content at all */
-   hyp_vm_assist(VMASST_CMD_enable, VMASST_TYPE_pae_extended_cr3);
-   /*
-* Xen may only provide as few as 512KB extra bootstrap linear memory,
-* which is far from enough to map all available memory, so we need to
-* map more bootstrap linear memory. We here map 1 (resp. 4 for PAE)
-* other L1 table(s), thus 4MiB extra memory (resp. 8MiB), which is
-* enough for a pagetable mapping 4GiB.
-*/
-#ifdef PAE
-#define NSUP_L1 4
-#else
-#define NSUP_L1 1
-#endif
-   pt_entry_t *l1_map[NSUP_L1];
-   {
-   vm_offset_t la;
-   int n_l1map;
-   for (n_l1map = 0, la = VM_MIN_KERNEL_ADDRESS; la >= 
VM_MIN_KERNEL_ADDRESS; la += NPTES * PAGE_SIZE) {
-   pt_entry_t *base = (pt_entry_t*) boot_info.pt_base;
-#ifdef PAE
-#ifdef __x86_64__
-   base = (pt_entry_t*) ptetokv(base[0]);
-#endif /* x86_64 */
-   pt_entry_t *l2_map = (pt_entry_t*) 
ptetokv(base[lin2pdpnum(la)]);
-#else  /* PAE */
-   pt_entry_t *l2_map = base;
-#endif /* PAE */
-   /* Like lin2pdenum, but works with non-contiguous boot 
L3 */
-   l2_map += (la >> PDESHIFT) & PDEMASK;
-   if (!(*l2_map & INTEL_PTE_VALID)) {
-   struct mmu_update update;
-   unsigned j, n;
-
-   l1_map[n_l1map] = (pt_entry_t*) 
phystokv(pmap_grab_page());
-   for (j = 0; j < NPTES; j++)
-   l1_map[n_l1map][j] = 
(((pt_entry_t)pfn_to_mfn(lin2pden

[PATCH 7/9] extend data types to hold a 64-bit address

2023-02-12 Thread Luca Dariz
* i386/i386/trap.c: change from int to a proper type to hold a
  register value
* x86_64/locore.S: use 64-bit register to avoid address truncation
---
 i386/i386/trap.c | 12 ++--
 x86_64/locore.S  |  4 ++--
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/i386/i386/trap.c b/i386/i386/trap.c
index 1e04ae7d..9a35fb42 100644
--- a/i386/i386/trap.c
+++ b/i386/i386/trap.c
@@ -154,9 +154,9 @@ char *trap_name(unsigned int trapnum)
  */
 void kernel_trap(struct i386_saved_state *regs)
 {
-   int code;
-   int subcode;
-   int type;
+   unsigned long   code;
+   unsigned long   subcode;
+   unsigned long   type;
vm_map_tmap;
kern_return_t   result;
thread_tthread;
@@ -357,9 +357,9 @@ dump_ss(regs);
 int user_trap(struct i386_saved_state *regs)
 {
int exc = 0;/* Suppress gcc warning */
-   int code;
-   int subcode;
-   int type;
+   unsigned long   code;
+   unsigned long   subcode;
+   unsigned long   type;
thread_t thread = current_thread();
 
 #ifdef __x86_64__
diff --git a/x86_64/locore.S b/x86_64/locore.S
index c54b5cd8..a2663aff 100644
--- a/x86_64/locore.S
+++ b/x86_64/locore.S
@@ -590,7 +590,7 @@ trap_from_kernel:
 ENTRY(thread_exception_return)
 ENTRY(thread_bootstrap_return)
movq%rsp,%rcx   /* get kernel stack */
-   or  $(KERNEL_STACK_SIZE-1),%ecx
+   or  $(KERNEL_STACK_SIZE-1),%rcx
movq-7-IKS_SIZE(%rcx),%rsp  /* switch back to PCB stack */
jmp _return_from_trap
 
@@ -603,7 +603,7 @@ ENTRY(thread_bootstrap_return)
 ENTRY(thread_syscall_return)
movqS_ARG0,%rax /* get return value */
movq%rsp,%rcx   /* get kernel stack */
-   or  $(KERNEL_STACK_SIZE-1),%ecx
+   or  $(KERNEL_STACK_SIZE-1),%rcx
movq-7-IKS_SIZE(%rcx),%rsp  /* switch back to PCB stack */
movq%rax,R_EAX(%rsp)/* save return value */
jmp _return_from_trap
-- 
2.30.2




[PATCH 1/9] prepare pmap helpers for full 64 bit memory map

2023-02-12 Thread Luca Dariz
* i386/intel/pmap.c: start walking the page table tree from the L4
  table instead of the PDP table in pmap_pte() and pmap_pde(),
  preparing for the kernel to run on high addresses.
---
 i386/intel/pmap.c | 28 +++-
 1 file changed, 23 insertions(+), 5 deletions(-)

diff --git a/i386/intel/pmap.c b/i386/intel/pmap.c
index 615b0fff..9fe16368 100644
--- a/i386/intel/pmap.c
+++ b/i386/intel/pmap.c
@@ -437,10 +437,22 @@ pmap_pde(const pmap_t pmap, vm_offset_t addr)
if (pmap == kernel_pmap)
addr = kvtolin(addr);
 #if PAE
-   page_dir = (pt_entry_t *) ptetokv(pmap->pdpbase[lin2pdpnum(addr)]);
-#else
+   pt_entry_t *pdp_table, pdp, pde;
+#ifdef __x86_64__
+   pdp = pmap->l4base[lin2l4num(addr)];
+   if ((pdp & INTEL_PTE_VALID) == 0)
+   return PT_ENTRY_NULL;
+   pdp_table = (pt_entry_t *) ptetokv(pdp);
+#else /* __x86_64__ */
+   pdp_table = pmap->pdpbase;
+#endif /* __x86_64__ */
+   pde = pdp_table[lin2pdpnum(addr)];
+   if ((pde & INTEL_PTE_VALID) == 0)
+   return PT_ENTRY_NULL;
+   page_dir = (pt_entry_t *) ptetokv(pde);
+#else /* PAE */
page_dir = pmap->dirbase;
-#endif
+#endif /* PAE */
return &page_dir[lin2pdenum(addr)];
 }
 
@@ -457,14 +469,20 @@ pmap_pte(const pmap_t pmap, vm_offset_t addr)
pt_entry_t  *ptp;
pt_entry_t  pte;
 
-#if PAE
+#ifdef __x86_64__
+   if (pmap->l4base == 0)
+   return(PT_ENTRY_NULL);
+#elif PAE
if (pmap->pdpbase == 0)
return(PT_ENTRY_NULL);
 #else
if (pmap->dirbase == 0)
return(PT_ENTRY_NULL);
 #endif
-   pte = *pmap_pde(pmap, addr);
+   ptp = pmap_pde(pmap, addr);
+   if (ptp == 0)
+   return(PT_ENTRY_NULL);
+   pte = *ptp;
if ((pte & INTEL_PTE_VALID) == 0)
return(PT_ENTRY_NULL);
ptp = (pt_entry_t *)ptetokv(pte);
-- 
2.30.2




Re: [PATCH mig] Make MIG work for pure 64 bit kernel and userland.

2023-02-16 Thread Luca Dariz

Hi Sergey,

thanks a lot for the detailed explanation!

On 12/02/23 19:16, Sergey Bugaev wrote:

But look at what Apple MIG does:

typedef struct {
 mach_msg_header_t Head;
 /* start of the kernel processed data */
 mach_msg_body_t msgh_body;
 mach_msg_port_descriptor_t port_arg;
 /* end of the kernel processed data */
 NDR_record_t NDR;
 int int_arg;
} Request

typedef struct {
 mach_msg_header_t Head;
 /* start of the kernel processed data */
 mach_msg_body_t msgh_body;
 mach_msg_port_descriptor_t out_port_arg;
 /* end of the kernel processed data */
 mach_msg_trailer_t trailer;
} Reply;


...


After the descriptors, we once again have an uninterpreted message
body where MIG puts all the data arguments into (here, NDR and the
int_arg).


With this layout the alignment issues seem much simpler to solve: we can
basically let the compiler find the best layout for the second part of
the message, and if the user and kernel archs are the same, the struct
will have the same layout in the sending and receiving mig stubs.


One last thing though: we should still make sure that the struct's
starting address is correctly aligned, right? Otherwise this could
defeat all the compiler's effort to find the correct alignment for the
internal fields.


For example, if we have this struct on x86_64 (not using mig types for 
simplicity):


struct {
int a;// 4 bytes
char *b;  // 8 bytes
};

and let's say that the compiler places 4 bytes of padding between the
fields "a" and "b", so that the "b" field is aligned to 8 bytes. This
assumes that the starting address (where "a" is stored) is also aligned
to at least 8 bytes.


If we declare a structure, e.g. as a local variable, the compiler will
also align its starting address, but if we cast a buffer to a pointer
to that struct, it's our responsibility to make sure that the buffer is
properly aligned.
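
To make the concern concrete, here is a minimal sketch (plain C11, not
gnumach code; the struct and variable names are only illustrative):

#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct msg_body {
  int a;     /* 4 bytes, then 4 bytes of padding on x86_64 */
  char *b;   /* 8 bytes, wants 8-byte alignment */
};

int main(void)
{
  char buf[64];

  printf("size=%zu align=%zu offsetof(b)=%zu\n",
         sizeof(struct msg_body), alignof(struct msg_body),
         offsetof(struct msg_body, b));

  /* Casting an arbitrary buffer is only safe if the buffer itself is
     aligned to alignof(struct msg_body); buf+1 is not, so the padding
     inserted by the compiler no longer guarantees that "b" lands on an
     8-byte boundary. */
  if ((uintptr_t)(buf + 1) % alignof(struct msg_body) != 0)
    printf("misaligned start: \"b\" would not be 8-byte aligned\n");

  return 0;
}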


In gnumach, in particular in the ipc_kmsg_* functions, this means that
the ipc_kmsg_t struct must have the ikm_header field at a correct
alignment, which could be 16 bytes or more, because the message is
copied there with copyinmsg(), and for rpcs handled by the kernel it is
then passed to the mig stubs.


I see that a kmsg is allocated with kalloc, so I guess we somehow need
to make sure that kalloc returns a memory chunk with the correct
alignment. But from a quick look, it seems that no particular alignment
is enforced (i.e. where available, align=0).


Is this correct? Or did I miss something?



Attaching: two PostScript documents that describe this in some more
detail. It's not very easy to find Mach documentation on the web
nowadays. You can spend hours searching Google for "Untyped MIG" and
still would not find much.


Indeed, I couldn't find much information about this topic without a deep 
search. Thanks!


Luca



[PATCH 2/4] x86_64: load Elf64 bootstrap modules if ! USER32

2023-02-16 Thread Luca Dariz
* i386/include/mach/i386/exec/elf.h: add Elf64 definitions and define
  common Elf structures, corresponding to 32/64 bit variants at
  compile time.
* include/mach/exec/elf.h: add Elf64 definitions
* kern/elf-load.c: use common Elf structures
---
 i386/include/mach/i386/exec/elf.h | 20 -
 include/mach/exec/elf.h   | 36 +++
 kern/elf-load.c   | 10 -
 3 files changed, 60 insertions(+), 6 deletions(-)

diff --git a/i386/include/mach/i386/exec/elf.h 
b/i386/include/mach/i386/exec/elf.h
index cfa988d2..582f8767 100644
--- a/i386/include/mach/i386/exec/elf.h
+++ b/i386/include/mach/i386/exec/elf.h
@@ -29,8 +29,26 @@ typedef unsigned int Elf32_Off;
 typedef signed int Elf32_Sword;
 typedef unsigned int   Elf32_Word;
 
-/* Architecture identification parameters for i386.  */
+typedef uint64_t   Elf64_Addr;
+typedef uint64_t   Elf64_Off;
+typedef int32_tElf64_Shalf;
+typedef int32_tElf64_Sword;
+typedef uint32_t   Elf64_Word;
+typedef int64_tElf64_Sxword;
+typedef uint64_t   Elf64_Xword;
+typedef uint32_t   Elf64_Half;
+typedef uint16_t   Elf64_Quarter;
+
+
+/* Architecture identification parameters for x86.  */
+#if defined(__x86_64__) && ! defined(USER32)
+#define MY_ELF_CLASS   ELFCLASS64
+#define MY_EI_DATA ELFDATA2LSB
+#define MY_E_MACHINE   EM_X86_64
+#else
+#define MY_ELF_CLASS   ELFCLASS32
 #define MY_EI_DATA ELFDATA2LSB
 #define MY_E_MACHINE   EM_386
+#endif
 
 #endif /* _MACH_I386_EXEC_ELF_H_ */
diff --git a/include/mach/exec/elf.h b/include/mach/exec/elf.h
index 81989309..3b545104 100644
--- a/include/mach/exec/elf.h
+++ b/include/mach/exec/elf.h
@@ -48,6 +48,22 @@ typedef struct {
   Elf32_Half   e_shstrndx;
 } Elf32_Ehdr;
 
+typedef struct {
+  unsigned chare_ident[EI_NIDENT]; /* Id bytes */
+  Elf64_Quartere_type; /* file type */
+  Elf64_Quartere_machine;  /* machine type */
+  Elf64_Half   e_version;  /* version number */
+  Elf64_Addr   e_entry;/* entry point */
+  Elf64_Offe_phoff;/* Program hdr offset */
+  Elf64_Offe_shoff;/* Section hdr offset */
+  Elf64_Half   e_flags;/* Processor flags */
+  Elf64_Quartere_ehsize;   /* sizeof ehdr */
+  Elf64_Quartere_phentsize;/* Program header entry size */
+  Elf64_Quartere_phnum;/* Number of program headers */
+  Elf64_Quartere_shentsize;/* Section header entry size */
+  Elf64_Quartere_shnum;/* Number of section headers */
+  Elf64_Quartere_shstrndx; /* String table index */
+} Elf64_Ehdr;
 
 /* e_ident[] identification indexes - figure 4-4, page 4-7 */
   
@@ -104,6 +120,7 @@ typedef struct {
 #define EM_SPARC64 11
 #define EM_PARISC  15
 #define EM_PPC 20
+#define EM_X86_64  62
 
 /* version - page 4-6 */
 
@@ -233,6 +250,17 @@ typedef struct {
   Elf32_Word   p_align;
 } Elf32_Phdr;
 
+typedef struct {
+  Elf64_Half   p_type; /* entry type */
+  Elf64_Half   p_flags;/* flags */
+  Elf64_Offp_offset;   /* offset */
+  Elf64_Addr   p_vaddr;/* virtual address */
+  Elf64_Addr   p_paddr;/* physical address */
+  Elf64_Xword  p_filesz;   /* file size */
+  Elf64_Xword  p_memsz;/* memory size */
+  Elf64_Xword  p_align;/* memory & file alignment */
+} Elf64_Phdr;
+
 /* segment types - page 5-3, figure 5-2 */
 
 #define PT_NULL0
@@ -291,6 +319,14 @@ typedef struct {
 #define DT_TEXTREL 22
 #define DT_JMPREL  23
 
+#if defined(__x86_64__) && ! defined(USER32)
+typedef Elf64_Ehdr Elf_Ehdr;
+typedef Elf64_Phdr Elf_Phdr;
+#else
+typedef Elf32_Ehdr Elf_Ehdr;
+typedef Elf32_Phdr Elf_Phdr;
+#endif
+
 /*
  * Bootstrap doesn't need machine dependent extensions.
  */
diff --git a/kern/elf-load.c b/kern/elf-load.c
index 3e80edfe..ce86327c 100644
--- a/kern/elf-load.c
+++ b/kern/elf-load.c
@@ -31,8 +31,8 @@ int exec_load(exec_read_func_t *read, exec_read_exec_func_t 
*read_exec,
  void *handle, exec_info_t *out_info)
 {
vm_size_t actual;
-   Elf32_Ehdr x;
-   Elf32_Phdr *phdr, *ph;
+   Elf_Ehdr x;
+   Elf_Phdr *phdr, *ph;
vm_size_t phsize;
int i;
int result;
@@ -51,7 +51,7 @@ int exec_load(exec_read_func_t *read, exec_read_exec_func_t 
*read_exec,
return EX_NOT_EXECUTABLE;
 
/* Make sure the file is of the right architecture.  */
-   if ((x.e_ident[EI_CLASS] != ELFCLASS32) ||
+   if ((x.e_ident[EI_CLASS] != MY_ELF_CLASS) ||
(x.e_ident[EI_DATA] != MY_EI_DATA) ||
(x.e_machine != MY_E_MACHINE))
return EX_WRONG_ARCH;
@@ -65,7 +65,7 @@ int exec_load(exec_read_func_t *read, exec_read_exec_func_t 
*r

[PATCH 1/4] x86_64: fix some compiler warnings

2023-02-16 Thread Luca Dariz
* i386/include/mach/i386/vm_param.h: extend the vm constants to ULL on
  x86_64 to avoid a shift overflow warning
* i386/intel/pmap.c: fix cast and unused variables
---
 i386/include/mach/i386/vm_param.h | 4 ++--
 i386/intel/pmap.c | 8 +++-
 2 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/i386/include/mach/i386/vm_param.h 
b/i386/include/mach/i386/vm_param.h
index e98f032c..f09049a5 100644
--- a/i386/include/mach/i386/vm_param.h
+++ b/i386/include/mach/i386/vm_param.h
@@ -72,9 +72,9 @@
not be increased to more than 3GB as glibc and hurd servers would not cope
with that.
*/
-#define VM_MIN_ADDRESS (0)
+#define VM_MIN_ADDRESS (0ULL)
 #ifdef __x86_64__
-#define VM_MAX_ADDRESS (0xC000UL)
+#define VM_MAX_ADDRESS (0xc000ULL)
 #else
 #define VM_MAX_ADDRESS (0xc000UL)
 #endif
diff --git a/i386/intel/pmap.c b/i386/intel/pmap.c
index 67c55e7d..302a60cb 100644
--- a/i386/intel/pmap.c
+++ b/i386/intel/pmap.c
@@ -1376,7 +1376,7 @@ pmap_t pmap_create(vm_size_t size)
pt_entry_t *user_page_dir = (pt_entry_t *) 
kmem_cache_alloc(&pd_cache);
memset(user_page_dir, 0, INTEL_PGBYTES);
WRITE_PTE(&pdp_user[i + lin2pdpnum(VM_MIN_USER_ADDRESS)],  // 
pdp_user
- pa_to_pte(kvtophys(user_page_dir))
+ pa_to_pte(kvtophys((vm_offset_t)user_page_dir))
  | INTEL_PTE_VALID
 #if (defined(__x86_64__) && !defined(MACH_HYP)) || defined(MACH_PV_PAGETABLES)
  | INTEL_PTE_WRITE | INTEL_PTE_USER
@@ -3136,14 +3136,13 @@ pmap_unmap_page_zero (void)
 void
 pmap_make_temporary_mapping(void)
 {
-   int i;
-
/*
 * We'll have to temporarily install a direct mapping
 * between physical memory and low linear memory,
 * until we start using our new kernel segment descriptors.
 */
 #if INIT_VM_MIN_KERNEL_ADDRESS != LINEAR_MIN_KERNEL_ADDRESS
+   int i;
vm_offset_t delta = INIT_VM_MIN_KERNEL_ADDRESS - 
LINEAR_MIN_KERNEL_ADDRESS;
if ((vm_offset_t)(-delta) < delta)
delta = (vm_offset_t)(-delta);
@@ -3191,9 +3190,8 @@ pmap_set_page_dir(void)
 void
 pmap_remove_temporary_mapping(void)
 {
-   int i;
-
 #if INIT_VM_MIN_KERNEL_ADDRESS != LINEAR_MIN_KERNEL_ADDRESS
+   int i;
vm_offset_t delta = INIT_VM_MIN_KERNEL_ADDRESS - 
LINEAR_MIN_KERNEL_ADDRESS;
if ((vm_offset_t)(-delta) < delta)
delta = (vm_offset_t)(-delta);
-- 
2.30.2




[PATCH 3/4] x86_64: fix argument passing to bootstrap modules if ! USER32

2023-02-16 Thread Luca Dariz
* kern/bootstrap.c: replace integers with long/vm_offset_t
---
 kern/bootstrap.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/kern/bootstrap.c b/kern/bootstrap.c
index 91f4241e..8f66a4b5 100644
--- a/kern/bootstrap.c
+++ b/kern/bootstrap.c
@@ -579,7 +579,7 @@ build_args_and_stack(struct exec_info *boot_exec_info,
vm_offset_t stack_base;
vm_size_t   stack_size;
char *  arg_ptr;
-   int arg_count, envc;
+   longarg_count, envc;
int arg_len;
char *  arg_pos;
int arg_item_len;
@@ -612,7 +612,7 @@ build_args_and_stack(struct exec_info *boot_exec_info,
 *  trailing 0 pointer
 *  and align to integer boundary
 */
-   arg_len += (sizeof(integer_t)
+   arg_len += (sizeof(rpc_vm_offset_t)
+ (arg_count + 1 + envc + 1) * sizeof(rpc_vm_offset_t));
arg_len = (arg_len + sizeof(integer_t) - 1) & ~(sizeof(integer_t)-1);
 
@@ -633,7 +633,7 @@ build_args_and_stack(struct exec_info *boot_exec_info,
 * Start the strings after the arg-count and pointers
 */
string_pos = (arg_pos
- + sizeof(integer_t)
+ + sizeof(rpc_vm_offset_t)
  + (arg_count + 1 + envc + 1) * sizeof(rpc_vm_offset_t));
 
/*
@@ -641,8 +641,8 @@ build_args_and_stack(struct exec_info *boot_exec_info,
 */
(void) copyout(&arg_count,
arg_pos,
-   sizeof(integer_t));
-   arg_pos += sizeof(integer_t);
+   sizeof(rpc_vm_offset_t));
+   arg_pos += sizeof(rpc_vm_offset_t);
 
/*
 * Then the strings and string pointers for each argument
-- 
2.30.2




[PATCH 4/4] x86_64: set user segments as 64-bit if ! USER32

2023-02-16 Thread Luca Dariz
* i386/i386/ldt.c: set the L bit if user-space is 64-bit
---
 i386/i386/ldt.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/i386/i386/ldt.c b/i386/i386/ldt.c
index 70fa24e2..b86a0e3c 100644
--- a/i386/i386/ldt.c
+++ b/i386/i386/ldt.c
@@ -45,6 +45,12 @@ extern
 #endif /* MACH_PV_DESCRIPTORS */
 struct real_descriptor ldt[LDTSZ];
 
+#if defined(__x86_64__) && ! defined(USER32)
+#define USER_SEGMENT_SIZEBITS SZ_64
+#else
+#define USER_SEGMENT_SIZEBITS SZ_32
+#endif
+
 void
 ldt_fill(struct real_descriptor *myldt, struct real_descriptor *mygdt)
 {
@@ -67,11 +73,11 @@ ldt_fill(struct real_descriptor *myldt, struct 
real_descriptor *mygdt)
VM_MIN_USER_ADDRESS,
VM_MAX_USER_ADDRESS-VM_MIN_USER_ADDRESS-4096,
/* XXX LINEAR_... */
-   ACC_PL_U|ACC_CODE_R, SZ_32);
+   ACC_PL_U|ACC_CODE_R, USER_SEGMENT_SIZEBITS);
fill_ldt_descriptor(myldt, USER_DS,
VM_MIN_USER_ADDRESS,
VM_MAX_USER_ADDRESS-VM_MIN_USER_ADDRESS-4096,
-   ACC_PL_U|ACC_DATA_W, SZ_32);
+   ACC_PL_U|ACC_DATA_W, USER_SEGMENT_SIZEBITS);
 
/* Activate the LDT.  */
 #ifdef MACH_PV_DESCRIPTORS
-- 
2.30.2




Re: [PATCH 6/6] enable syscalls on x86_64

2023-02-22 Thread Luca Dariz

Hi,

On 22/02/23 15:44, Sergey Bugaev wrote:

On Sun, Feb 12, 2023 at 8:04 PM Luca Dariz  wrote:

-// TODO: test it before dropping ud2
-   ud2


Hi; could you please tell me how you are testing this?

What userland code do you run? How does it make syscalls, does it use
lcall $7, $0 like on i386, or what?


Yes, this is the entry point through the call gate; it works with any
32-bit binary that you can compile on 32-bit hurd. For testing you can
use a ramdisk, like the installer, or some simple executable (I have my
own scripts [0] that I would like to polish and integrate somewhere).


Note that on a 64-bit kernel the Linux drivers are not integrated, and
user-space drivers currently don't work because of missing interrupt
support (and maybe something else).


As far as I understand from Intel's datasheets, it should be possible to
also use a far call from 64-bit userspace, but I couldn't find the right
way so far, so I was looking at the syscall/sysret mechanism instead.


Luca

[0] https://gitlab.com/luckyd/gnumach/-/tree/prepare64_wip/tests



[PATCH 3/5] fix port name copyin

2023-02-27 Thread Luca Dariz
* x86_64/copy_user.c: in mach_msg_user_header_t there are some holes
  that need to be cleared, to adapt to the different layout of
  mach_msg_header_t.
---
 x86_64/copy_user.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/x86_64/copy_user.c b/x86_64/copy_user.c
index 86d23525..ae17c368 100644
--- a/x86_64/copy_user.c
+++ b/x86_64/copy_user.c
@@ -194,6 +194,8 @@ int copyinmsg (const void *userbuf, void *kernelbuf, const 
size_t usize)
 "mach_msg_header_t and mach_msg_user_header_t expected to be 
of the same size");
   if (copyin(umsg, kmsg, sizeof(mach_msg_header_t)))
 return 1;
+  kmsg->msgh_remote_port &= 0x; // FIXME: still have port names here
+  kmsg->msgh_local_port &= 0x;  // also, this assumes little-endian
 #endif
 
   vm_offset_t usaddr, ueaddr, ksaddr;
-- 
2.30.2




[PATCH 4/5] x86_64: fix user trap during syscall with an invalid user stack

2023-02-27 Thread Luca Dariz
* i386/i386/locore.h: user vm_offset_t in the recovery_table
* x86_64/locore.S: fix RECOVERY() location and keep user regs in %rbx,
  as it seems the convention. This only applies to 32-bit userspace.
---
 i386/i386/locore.h |  4 ++--
 x86_64/locore.S| 20 ++--
 2 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/i386/i386/locore.h b/i386/i386/locore.h
index 00da07ad..a8807dbf 100644
--- a/i386/i386/locore.h
+++ b/i386/i386/locore.h
@@ -27,8 +27,8 @@
  * Fault recovery in copyin/copyout routines.
  */
 struct recovery {
-   int fault_addr;
-   int recover_addr;
+   vm_offset_t fault_addr;
+   vm_offset_t recover_addr;
 };
 
 extern struct recovery recover_table[];
diff --git a/x86_64/locore.S b/x86_64/locore.S
index a2663aff..47d9085c 100644
--- a/x86_64/locore.S
+++ b/x86_64/locore.S
@@ -1157,18 +1157,18 @@ syscall_native:
andq%r10,%r10
jz  mach_call_call  /* skip argument copy if none */
 
-   movqR_UESP(%rbx),%rbx   /* get user stack pointer */
-   addq$4,%rbx /* Skip user return address */
-
movq$USER_DS,%rdx   /* use user data segment for accesses */
mov %dx,%fs
movq%rsp,%r11   /* save kernel ESP for error recovery */
 
+   movqR_UESP(%rbx),%rbp   /* get user stack pointer */
+   addq$4,%rbp /* Skip user return address */
+
 #define PARAM(reg,ereg) \
-   RECOVER(mach_call_addr_push) \
xorq%reg,%reg   ;\
-   movl%fs:(%rbx),%ereg/* 1st parameter */ ;\
-   addq$4,%rbx ;\
+   RECOVER(mach_call_addr_push) \
+   movl%fs:(%rbp),%ereg/* 1st parameter */ ;\
+   addq$4,%rbp ;\
dec %r10;\
jz  mach_call_call
 
@@ -1179,12 +1179,12 @@ syscall_native:
PARAM(r8,r8d)   /* 5th parameter */
PARAM(r9,r9d)   /* 6th parameter */
 
-   lea (%rbx,%r10,4),%rbx  /* point past last argument */
+   lea (%rbp,%r10,4),%rbp  /* point past last argument */
xorq%r12,%r12
 
-0: subq$4,%rbx
+0: subq$4,%rbp
RECOVER(mach_call_addr_push)
-   movl%fs:(%rbx),%r12d
+   movl%fs:(%rbp),%r12d
pushq   %r12/* push argument on stack */
dec %r10
jnz 0b  /* loop for all arguments */
@@ -1208,7 +1208,7 @@ mach_call_call:
 
 /*
  * Address out of range.  Change to page fault.
- * %esi holds failing address.
+ * %rsi holds failing address.
  */
 mach_call_addr_push:
movq%r11,%rsp   /* clean parameters from stack */
-- 
2.30.2




[PATCH 0/5] basic syscall support on x86_64

2023-02-27 Thread Luca Dariz
This enables simple user-space programs compiled for x86_64. Regular
syscalls should work, RPCs are not yet properly working:
* the commit "fix port name copyin" is just to make simple rpc work
* To test the syscalls I patched mig to not add the _Static_assert()
  statements, otherwise gnumach fails to build.

Luca Dariz (5):
  x86_64: allow compilation if ! USER32
  fix copyin/outmsg header for ! USER32
  fix port name copyin
  x86_64: fix user trap during syscall with an invalid user stack
  x86_64: add 64-bit syscall entry point

 i386/i386/i386asm.sym   |  11 ++
 i386/i386/ldt.c |  15 ++-
 i386/i386/ldt.h |   7 +-
 i386/i386/locore.h  |  33 +-
 i386/include/mach/i386/syscall_sw.h |  16 +--
 i386/intel/pmap.c   |   6 +-
 x86_64/copy_user.c  |   6 +-
 x86_64/locore.S | 156 ++--
 8 files changed, 224 insertions(+), 26 deletions(-)

-- 
2.30.2




[PATCH 5/5] x86_64: add 64-bit syscall entry point

2023-02-27 Thread Luca Dariz
While theoretically we could still use the same call gate as for
32-bit userspace, that doesn't seem very common, and gcc does not seem
to encode the instruction properly. Instead we use syscall/sysret, as
other kernels do (e.g. XNU, Linux). This version still has some
limitations, but should be enough to start working on the 64-bit user
space.

* i386/i386/i386asm.sym: add more constants to fill pcb->iss
* i386/i386/ldt.c: configure 64-bit syscall entry point
* i386/i386/ldt.h: swap CS/DS segments order if !USER32 as required by
  sysret
* i386/i386/locore.h: add syscall64 and MSR definitions
* i386/include/mach/i386/syscall_sw.h: add simple entry point from
  user space. This is just for simple tests, it seems glibc doesn't
  use this
* x86_64/locore.S: implement syscall64 entry point
---
 i386/i386/i386asm.sym   |  11 +++
 i386/i386/ldt.c |  15 ++-
 i386/i386/ldt.h |   7 +-
 i386/i386/locore.h  |  29 ++
 i386/include/mach/i386/syscall_sw.h |  16 ++--
 x86_64/locore.S | 136 
 6 files changed, 204 insertions(+), 10 deletions(-)

diff --git a/i386/i386/i386asm.sym b/i386/i386/i386asm.sym
index 8317db6c..733cc4eb 100644
--- a/i386/i386/i386asm.sym
+++ b/i386/i386/i386asm.sym
@@ -52,6 +52,8 @@ expr  CALL_SINGLE_FUNCTION_BASE
 
 offset ApicLocalUnit   lu  apic_id APIC_ID
 
+offset pcb pcb iss
+
 offset thread  th  pcb
 offset thread  th  task
 offset thread  th  recover
@@ -82,9 +84,15 @@ size i386_kernel_state   iks
 
 size   i386_exception_link iel
 
+offset i386_saved_stater   gs
+offset i386_saved_stater   fs
 offset i386_saved_stater   cs
 offset i386_saved_stater   uesp
 offset i386_saved_stater   eax
+offset i386_saved_stater   ebx
+offset i386_saved_stater   ecx
+offset i386_saved_stater   edx
+offset i386_saved_stater   ebp
 offset i386_saved_stater   trapno
 offset i386_saved_stater   err
 offset i386_saved_stater   efl R_EFLAGS
@@ -92,6 +100,9 @@ offset   i386_saved_stater   eip
 offset i386_saved_stater   cr2
 offset i386_saved_stater   edi
 #ifdef __x86_64__
+offset i386_saved_stater   r12
+offset i386_saved_stater   r13
+offset i386_saved_stater   r14
 offset i386_saved_stater   r15
 #endif
 
diff --git a/i386/i386/ldt.c b/i386/i386/ldt.c
index b86a0e3c..61a03d65 100644
--- a/i386/i386/ldt.c
+++ b/i386/i386/ldt.c
@@ -31,6 +31,7 @@
 #include 
 
 #include 
+#include 
 
 #include "vm_param.h"
 #include "seg.h"
@@ -65,10 +66,22 @@ ldt_fill(struct real_descriptor *myldt, struct 
real_descriptor *mygdt)
ACC_PL_K|ACC_LDT, 0);
 #endif /* MACH_PV_DESCRIPTORS */
 
-   /* Initialize the 32bit LDT descriptors.  */
+   /* Initialize the syscall entry point */
+#if defined(__x86_64__) && ! defined(USER32)
+if (!(CPU_HAS_FEATURE(CPU_FEATURE_MSR) && 
CPU_HAS_FEATURE(CPU_FEATURE_SEP)))
+panic("syscall support is missing on 64 bit");
+/* Enable 64-bit syscalls */
+wrmsr(MSR_REG_EFER, rdmsr(MSR_REG_EFER) | MSR_EFER_SCE);
+wrmsr(MSR_REG_LSTAR, syscall64);
+wrmsr(MSR_REG_STAR, long)USER_CS - 16) << 16) | (long)KERNEL_CS) 
<< 32);
+wrmsr(MSR_REG_FMASK, 0);  // ?
+#else /* defined(__x86_64__) && ! defined(USER32) */
fill_ldt_gate(myldt, USER_SCALL,
  (vm_offset_t)&syscall, KERNEL_CS,
  ACC_PL_U|ACC_CALL_GATE, 0);
+#endif /* defined(__x86_64__) && ! defined(USER32) */
+
+   /* Initialize the 32bit LDT descriptors.  */
fill_ldt_descriptor(myldt, USER_CS,
VM_MIN_USER_ADDRESS,
VM_MAX_USER_ADDRESS-VM_MIN_USER_ADDRESS-4096,
diff --git a/i386/i386/ldt.h b/i386/i386/ldt.h
index b15f11a5..4490f99f 100644
--- a/i386/i386/ldt.h
+++ b/i386/i386/ldt.h
@@ -45,9 +45,14 @@
 #defineUSER_SCALL  0x07/* system call gate */
 #ifdef __x86_64__
 /* Call gate needs two entries */
-#endif
+
+/* The sysret instruction puts some constraints on the user segment indexes */
+#defineUSER_CS 0x1f/* user code segment */
+#defineUSER_DS 0x17/* user data segment */
+#else
 #defineUSER_CS 0x17/* user code segment */
 #defineUSER_DS 0x1f/* user data segment */
+#endif
 
 #defineLDTSZ   4
 
diff --git a/i386/i386/locore.h b/i386/i386/locore.h
index a8807dbf..39545ff5 100644
--- a/i386/i386/locore.h
+++ b/i386/i386/locore.h
@@ -57,6 +57,7 @@ extern int inst_fetch (int eip, int cs);
 extern void cpu_shutdown (void);
 
 extern int syscal

[PATCH 2/5] fix copyin/outmsg header for ! USER32

2023-02-27 Thread Luca Dariz
* x86_64/copy_user.c: fix copyin/out, we already have a pointer to
  user/kernel buffers
---
 x86_64/copy_user.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/x86_64/copy_user.c b/x86_64/copy_user.c
index dd9fe2d7..86d23525 100644
--- a/x86_64/copy_user.c
+++ b/x86_64/copy_user.c
@@ -192,7 +192,7 @@ int copyinmsg (const void *userbuf, void *kernelbuf, const 
size_t usize)
   /* The 64 bit interface ensures the header is the same size, so it does not 
need any resizing. */
   _Static_assert(sizeof(mach_msg_header_t) == sizeof(mach_msg_user_header_t),
 "mach_msg_header_t and mach_msg_user_header_t expected to be 
of the same size");
-  if (copyin(&umsg, &kmsg, sizeof(mach_msg_header_t)))
+  if (copyin(umsg, kmsg, sizeof(mach_msg_header_t)))
 return 1;
 #endif
 
@@ -290,7 +290,7 @@ int copyoutmsg (const void *kernelbuf, void *userbuf, const 
size_t ksize)
  sizeof(kmsg->msgh_seqno) + sizeof(kmsg->msgh_id)))
 return 1;
 #else
-  if (copyout(&kmsg, &umsg, sizeof(mach_msg_header_t)))
+  if (copyout(kmsg, umsg, sizeof(mach_msg_header_t)))
 return 1;
 #endif  /* USER32 */
 
-- 
2.30.2




[PATCH 1/5] x86_64: allow compilation if ! USER32

2023-02-27 Thread Luca Dariz
* i386/intel/pmap.c: remove #error and allow compilation, keeping a
  reminder to fix the pmap module.
---
 i386/intel/pmap.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/i386/intel/pmap.c b/i386/intel/pmap.c
index 302a60cb..40f672b5 100644
--- a/i386/intel/pmap.c
+++ b/i386/intel/pmap.c
@@ -1473,7 +1473,8 @@ void pmap_destroy(pmap_t p)
/* In this case we know we have one PDP for user space */
pt_entry_t *pdp = (pt_entry_t *) 
ptetokv(p->l4base[lin2l4num(VM_MIN_USER_ADDRESS)]);
 #else
-#error "TODO do 64-bit userspace need more that 512G?"
+#warning "TODO do 64-bit userspace need more that 512G?"
+   pt_entry_t *pdp = (pt_entry_t *) 
ptetokv(p->l4base[lin2l4num(VM_MIN_USER_ADDRESS)]);
 #endif /* USER32 */
page_dir = (pt_entry_t *) ptetokv(pdp[i]);
 #else /* __x86_64__ */
@@ -2482,7 +2483,8 @@ void pmap_collect(pmap_t p)
/* In this case we know we have one PDP for user space */
pdp = (pt_entry_t *) 
ptetokv(p->l4base[lin2l4num(VM_MIN_USER_ADDRESS)]);
 #else
-#error "TODO do 64-bit userspace need more that 512G?"
+#warning "TODO do 64-bit userspace need more that 512G?"
+   pdp = (pt_entry_t *) 
ptetokv(p->l4base[lin2l4num(VM_MIN_USER_ADDRESS)]);
 #endif /* USER32 */
page_dir = (pt_entry_t *) ptetokv(pdp[i]);
 #else /* __x86_64__ */
-- 
2.30.2




Re: [PATCH 5/5] x86_64: add 64-bit syscall entry point

2023-02-28 Thread Luca Dariz

On 27/02/23 23:02, Samuel Thibault wrote:

Luca Dariz, on Mon, 27 Feb 2023 21:45:01 +0100, wrote:

diff --git a/i386/i386/ldt.h b/i386/i386/ldt.h
index b15f11a5..4490f99f 100644
--- a/i386/i386/ldt.h
+++ b/i386/i386/ldt.h
@@ -45,9 +45,14 @@
  #define   USER_SCALL  0x07/* system call gate */
  #ifdef __x86_64__
  /* Call gate needs two entries */
-#endif
+
+/* The sysret instruction puts some constraints on the user segment indexes */
+#defineUSER_CS 0x1f/* user code segment */
+#defineUSER_DS 0x17/* user data segment */


I'd say we'd rather avoid changing them for the x86_64 && USER32 case?


Right, I forgot to add ! USER32 here


+#else
  #define   USER_CS 0x17/* user code segment */
  #define   USER_DS 0x1f/* user data segment */
+#endif
  
  #define	LDTSZ		4
  
diff --git a/i386/include/mach/i386/syscall_sw.h b/i386/include/mach/i386/syscall_sw.h

index 86f6ff2f..20ef7c13 100644
--- a/i386/include/mach/i386/syscall_sw.h
+++ b/i386/include/mach/i386/syscall_sw.h
@@ -29,16 +29,16 @@
  
  #include 
  
-#if BSD_TRAP

-#define kernel_trap(trap_name,trap_number,number_args) \
-ENTRY(trap_name) \
-   movl$ trap_number,%eax; \
-   SVC; \
-   jb LCL(cerror); \
-   ret; \
+#if defined(__x86_64__) && ! defined(USER32)
+#define kernel_trap(trap_name,trap_number,number_args)  \
+ENTRY(trap_name)   \
+   movq$ trap_number,%rax; \



+   movq%rcx,%r10;  \


What is that for?


The syscall instruction automatically stores RIP in RCX, but RCX is also 
the place for the 4th arg passed to a function, so we need another 
register to store it. In this case R10 is the only non-callee-preserved 
register remaining. In the syscall64 code below, this value is moved 
back to RCX after saving the thread state.



+   syscall;\
+   ret;\
  END(trap_name)
  #else
-#define kernel_trap(trap_name,trap_number,number_args) \
+#define kernel_trap(trap_name,trap_number,number_args)  \
  ENTRY(trap_name) \
movl$ trap_number,%eax; \
SVC; \
diff --git a/x86_64/locore.S b/x86_64/locore.S
index 47d9085c..fdf7300b 100644
--- a/x86_64/locore.S
+++ b/x86_64/locore.S
@@ -1281,6 +1281,142 @@ DATA(cpu_features_ecx)
  
  END(syscall)
  
+

+/* Entry point for 64-bit syscalls.
+ * On entry we're still on the user stack, so better not use it. Instead we
+ * save the thread state immediately in thread->pcb->iss, then try to invoke
+ * the syscall.
+ * TODO:
+ - for now we assume the return address is canonical, but apparently there
+   can be cases where it's not (see how Linux handles this). Does it apply
+   here?
+ - do we need to check for ast on syscalls? Maybe on interrupts is enough
+ - check that the case where a task is suspended, and later returns via
+   iretq from return_from_trap, works fine in all combinations
+ - emulated syscalls - are they used anywhere?


Not that I know of.


Ok, I'll update the comment about emulated syscalls.


+ */
+ENTRY(syscall64)
+   /* RFLAGS[32:63] are reserved, so combine syscall num (32 bit) and
+* eflags in RAX to allow using r11 as temporary register */
+   shlq$32,%r11
+   shlq$32,%rax/* make sure bits 32:63 of %rax are zero */
+   shrq$32,%rax
+   or  %r11,%rax
+
+   /* Save thread state in pcb->iss, as on exception entry.
+* Since this is triggered synchronously from userspace, we can
+* save only the callee-preserved status according to the C ABI,
+* plus RIP and EFLAGS for sysret */
+   CPU_NUMBER(%r11)
+   movqCX(EXT(active_threads),%r11),%r11 /* point to current thread */
+   movqTH_PCB(%r11),%r11   /* point to pcb */
+   addq$ PCB_ISS,%r11  /* point to saved state */
+
+   mov %gs,R_GS(%r11)
+   mov %fs,R_FS(%r11)
+   mov %rsp,R_UESP(%r11)   /* callee-preserved register */
+   mov %rcx,R_EIP(%r11)/* syscall places user RIP in RCX */
+   mov %rbx,R_EBX(%r11)/* callee-preserved register */
+   mov %rax,%rbx   /* Now we can unpack eflags again */
+   shr $32,%rbx
+   mov %rbx,R_EFLAGS(%r11) /* ... and save them in pcb as well */
+   mov %rbp,R_EBP(%r11)/* callee-preserved register */
+   mov %r12,R_R12(%r11)/* callee-preserved register */
+   mov %r13,R_R13(%r11)/* callee-preserved register */
+   mov %r14,R_R14(%r11)/* callee-preserved register */
+   mov %r15,R_R15(%r11)/* callee-preserved register */
+   mov %r11,%rbx   /* prepa

Re: [PATCH 5/5] x86_64: add 64-bit syscall entry point

2023-02-28 Thread Luca Dariz

On 28/02/23 07:39, Sergey Bugaev wrote:

On Mon, Feb 27, 2023 at 11:46 PM Luca Dariz  wrote:

+static inline void wrmsr(uint32_t regaddr, uint64_t value)
+{
+uint32_t low=(uint32_t)value, high=((uint32_t)(value >> 32));


I think it'd be more idiomatic in both GNU and Mach styles to put more
spaces here, like this:

uint32_t low = (uint32_t) value, high = (uint32_t) (value >> 32);


+asm volatile("wrmsr\n"  \


I don't think you even need the \n for a single instruction.

Does this really need the backslashes? They are needed in macros, but why here?


I'll clean up the formatting, \n and backslashes


+ :  \
+ : "c" (regaddr), "a" (low), "d" (high) \
+ : "memory" \
+);
+}


Why "memory" here? Can wrmsr clobber unrelated memory? (I don't know,
maybe it can -- if so, perhaps add a comment?)


I admit I added "memory" just because I saw it in Linux, but looking
deeper it turns out that wrmsr is a serializing instruction, which is
even more than a memory barrier, so I think it's better to explicitly
tell the compiler about it (see e.g. §8.3 of Intel's System Programming
Guide, vol. 3 [0]).



+static inline uint64_t rdmsr(uint32_t regaddr)
+{
+uint32_t low, high;
+asm volatile("rdmsr\n"  \
+ : "=a" (low), "=d" (high)  \
+ : "c" (regaddr)\
+);
+return ((uint64_t)high << 32) | low;


Ditto about spacing -- and does this need volatile? As in, does
reading from a MSR have side effects that we're interested in, or does
it only output the read value? (Again, I don't know!)


I'd say that both reading and writing an MSR have side effects, as its
value could potentially change in ways not predictable by the compiler,
and the MSR itself is not directly "seen" by the compiler. It's a kind
of I/O in this sense.
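
For reference, with the formatting remarks applied the helpers could
look roughly like this (just a sketch of the shape being discussed, not
necessarily the final gnumach version):

#include <stdint.h>

static inline void wrmsr(uint32_t regaddr, uint64_t value)
{
    uint32_t low = (uint32_t) value, high = (uint32_t) (value >> 32);

    asm volatile("wrmsr"
                 :
                 : "c" (regaddr), "a" (low), "d" (high)
                 : "memory");  /* wrmsr usage needs serialization */
}

static inline uint64_t rdmsr(uint32_t regaddr)
{
    uint32_t low, high;

    asm volatile("rdmsr"
                 : "=a" (low), "=d" (high)
                 : "c" (regaddr));
    return ((uint64_t) high << 32) | low;
}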



diff --git a/i386/include/mach/i386/syscall_sw.h 
b/i386/include/mach/i386/syscall_sw.h
index 86f6ff2f..20ef7c13 100644
--- a/i386/include/mach/i386/syscall_sw.h
+++ b/i386/include/mach/i386/syscall_sw.h
@@ -29,16 +29,16 @@

  #include 

-#if BSD_TRAP
-#define kernel_trap(trap_name,trap_number,number_args) \
-ENTRY(trap_name) \
-   movl$ trap_number,%eax; \
-   SVC; \
-   jb LCL(cerror); \
-   ret; \
+#if defined(__x86_64__) && ! defined(USER32)
+#define kernel_trap(trap_name,trap_number,number_args)  \
+ENTRY(trap_name)   \
+   movq$ trap_number,%rax; \
+   movq%rcx,%r10;  \
+   syscall;\
+   ret;\
  END(trap_name)


OK, so the x86_64 syscall definition stays in i386/syscall_sw.h, and
not in a separate x86_64/syscall_sw.h file? That's what I thought. In
this case, we do want that mach-machine patch in glibc. Samuel, does
this make sense to you?


I'll try to change the installed file depending on i386/x86_64/USER32.


Predicating on USER32 is not really going to work here. This header is
AFAICS not used by Mach itself, it's the UAPI header, one that is
installed into the system's include dir for the userspace to include &
use. And surely userspace is not going to define USER32 either way.


Right, I'll remove USER32.


Thanks!

Luca


[0] here's the combined version, with volumes 1-4 
https://cdrdv2.intel.com/v1/dl/getContent/671200




Re: [PATCH 5/5] x86_64: add 64-bit syscall entry point

2023-02-28 Thread Luca Dariz

On 28/02/23 15:14, Sergey Bugaev wrote:

On Tue, Feb 28, 2023 at 4:26 PM Luca Dariz  wrote:

+/* check if we need to place some arguments on the stack */
+_syscall64_args_stack:
+mov EXT(mach_trap_table)(%rax),%r10 /* get number of arguments */
+subq$6,%r10 /* the first 6 args are already in place */
+jl  _syscall64_call /* skip argument copy if >6 args */


jle?


Right, I didn't test a 6-arg syscall.


+
+movqR_UESP(%rbx),%r11   /* get user stack pointer */
+addq$8,%r11 /* Skip user return address */
+
+mov $USER_DS,%r12   /* use user data segment for accesses */
+mov %r12,%fs
+
+lea (%r11,%r10,8),%r11  /* point past last argument */


Do I understand it right that for the most interesting syscall (which
takes 7 args!), I *am* supposed to pass the 7th arg on the stack (in
mem[rsp + 8]) -- unlike on Linux?


I think on Linux all syscalls have <= 6 arguments, so they will never
use the stack for the 7th arg. This use of the stack follows the x86_64
calling conventions, and since we have a syscall with 11 args
(syscall_vm_map) I currently don't see a better way to pass the extra
args (but I don't exclude that there is one :)).



Or in other words: do I understand it right that the ABI here is:

- syscall number in rax
- arguments are passed just as per x86_64 calling convention, except
the 4th arg is in r10 and not rcx
- return value is in rax


correct


- rcx and r11 are additionally clobbered -- or not?


They will still contain RIP and EFLAGS on return, so they have a special 
treatment compared to the usual calling conventions, but they can 
probably be considered clobbered for this purpose.



- nothing else is clobbered, in particular not rflags (or is the
"reserved" half of rflags clobbered?) and not the registers that
contain args


If we follow the usual calling conventions, the registers containing
the args are clobbered. In fact, in the code I set them to 0 before
sysret, to avoid the risk of them leaking sensitive information from
the syscall execution.


In the syscall invoker in syscall_sw.h the special syscall conventions
should be hidden behind a regular function, so from userspace mach_msg()
or syscall_vm_allocate() or any other syscall should look just like a
regular function call. I've done some testing but I didn't test it with
glibc, so this stub might need to be improved.
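
To illustrate the convention above from the user side, a hand-written
stub could look like the sketch below. The name mach_trap3 is made up
and the real stubs come from the syscall_sw.h template; arguments
beyond the 6th would additionally have to be pushed on the user stack.

/* Sketch of a 64-bit Mach trap invocation: trap number in %rax, args as
   in the x86_64 calling convention except that the 4th one goes in %r10,
   return value in %rax, %rcx and %r11 treated as clobbered by the
   syscall instruction. */
static inline long mach_trap3(long trap_number, long a1, long a2, long a3)
{
    long ret;

    asm volatile("syscall"
                 : "=a" (ret)
                 : "0" (trap_number), "D" (a1), "S" (a2), "d" (a3)
                 : "rcx", "r11", "memory");
    return ret;
}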



Luca



[PATCH 5/5 v2 gnumach] x86_64: add 64-bit syscall entry point

2023-03-01 Thread Luca Dariz
While theoretically we could still use the same call gate as for
32-bit userspace, that doesn't seem very common, and gcc does not seem
to encode the instruction properly. Instead we use syscall/sysret, as
other kernels do (e.g. XNU, Linux). This version still has some
limitations, but should be enough to start working on the 64-bit user
space.

* i386/i386/i386asm.sym: add more constants to fill pcb->iss
* i386/i386/ldt.c: configure 64-bit syscall entry point. We can just
  check for the SEP bit as MSR are always available on x86_64.
* i386/i386/ldt.h: swap CS/DS segments order if !USER32 as required by
  sysret
* i386/i386/locore.h: add syscall64 and MSR definitions
* i386/include/mach/i386/syscall_sw.h: remove old BSD_TRAP
* x86_64/Makefrag.am: install syscall_sw.h depending on USER32
* x86_64/include/syscall_sw.h: add entry point from user space
* x86_64/locore.S: implement syscall64 entry point
---
 i386/i386/i386asm.sym   |  11 +++
 i386/i386/ldt.c |  15 ++-
 i386/i386/ldt.h |   9 +-
 i386/i386/locore.h  |  29 ++
 i386/include/mach/i386/syscall_sw.h |  12 +--
 x86_64/Makefrag.am  |   7 +-
 x86_64/include/syscall_sw.h |  40 
 x86_64/locore.S | 136 
 8 files changed, 244 insertions(+), 15 deletions(-)
 create mode 100644 x86_64/include/syscall_sw.h

diff --git a/i386/i386/i386asm.sym b/i386/i386/i386asm.sym
index 8317db6c..733cc4eb 100644
--- a/i386/i386/i386asm.sym
+++ b/i386/i386/i386asm.sym
@@ -52,6 +52,8 @@ expr  CALL_SINGLE_FUNCTION_BASE
 
 offset ApicLocalUnit   lu  apic_id APIC_ID
 
+offset pcb pcb iss
+
 offset thread  th  pcb
 offset thread  th  task
 offset thread  th  recover
@@ -82,9 +84,15 @@ size i386_kernel_state   iks
 
 size   i386_exception_link iel
 
+offset i386_saved_stater   gs
+offset i386_saved_stater   fs
 offset i386_saved_stater   cs
 offset i386_saved_stater   uesp
 offset i386_saved_stater   eax
+offset i386_saved_stater   ebx
+offset i386_saved_stater   ecx
+offset i386_saved_stater   edx
+offset i386_saved_stater   ebp
 offset i386_saved_stater   trapno
 offset i386_saved_stater   err
 offset i386_saved_stater   efl R_EFLAGS
@@ -92,6 +100,9 @@ offset   i386_saved_stater   eip
 offset i386_saved_stater   cr2
 offset i386_saved_stater   edi
 #ifdef __x86_64__
+offset i386_saved_stater   r12
+offset i386_saved_stater   r13
+offset i386_saved_stater   r14
 offset i386_saved_stater   r15
 #endif
 
diff --git a/i386/i386/ldt.c b/i386/i386/ldt.c
index b86a0e3c..8b7add38 100644
--- a/i386/i386/ldt.c
+++ b/i386/i386/ldt.c
@@ -31,6 +31,7 @@
 #include 
 
 #include 
+#include 
 
 #include "vm_param.h"
 #include "seg.h"
@@ -65,10 +66,22 @@ ldt_fill(struct real_descriptor *myldt, struct 
real_descriptor *mygdt)
ACC_PL_K|ACC_LDT, 0);
 #endif /* MACH_PV_DESCRIPTORS */
 
-   /* Initialize the 32bit LDT descriptors.  */
+   /* Initialize the syscall entry point */
+#if defined(__x86_64__) && ! defined(USER32)
+if (!CPU_HAS_FEATURE(CPU_FEATURE_SEP))
+panic("syscall support is missing on 64 bit");
+/* Enable 64-bit syscalls */
+wrmsr(MSR_REG_EFER, rdmsr(MSR_REG_EFER) | MSR_EFER_SCE);
+wrmsr(MSR_REG_LSTAR, (vm_offset_t)syscall64);
+wrmsr(MSR_REG_STAR, long)USER_CS - 16) << 16) | (long)KERNEL_CS) 
<< 32);
+wrmsr(MSR_REG_FMASK, 0);  // ?
+#else /* defined(__x86_64__) && ! defined(USER32) */
fill_ldt_gate(myldt, USER_SCALL,
  (vm_offset_t)&syscall, KERNEL_CS,
  ACC_PL_U|ACC_CALL_GATE, 0);
+#endif /* defined(__x86_64__) && ! defined(USER32) */
+
+   /* Initialize the 32bit LDT descriptors.  */
fill_ldt_descriptor(myldt, USER_CS,
VM_MIN_USER_ADDRESS,
VM_MAX_USER_ADDRESS-VM_MIN_USER_ADDRESS-4096,
diff --git a/i386/i386/ldt.h b/i386/i386/ldt.h
index b15f11a5..51867f47 100644
--- a/i386/i386/ldt.h
+++ b/i386/i386/ldt.h
@@ -43,11 +43,16 @@
  * User descriptors for Mach - 32-bit flat address space
  */
 #defineUSER_SCALL  0x07/* system call gate */
-#ifdef __x86_64__
+#if defined(__x86_64__) && ! defined(USER32)
 /* Call gate needs two entries */
-#endif
+
+/* The sysret instruction puts some constraints on the user segment indexes */
+#defineUSER_CS 0x1f/* user code segment */
+#defineUSER_DS 0x17/* user data segment */
+#else
 #defineUSER_CS 0x17/* user code segment */
 #defineUSER_D

Re: [PATCH 5/5 v2 gnumach] x86_64: add 64-bit syscall entry point

2023-03-01 Thread Luca Dariz

On 01/03/23 21:18, Samuel Thibault wrote:

Luca Dariz, on Wed, 01 Mar 2023 18:40:37 +0100, wrote:

+asm volatile("wrmsr"
+ :
+ : "c" (regaddr), "a" (low), "d" (high)
+ : "memory"  /* wrmsr is a serializing instruction */


The comment could be misleading.

The fact that it's a serialization instruction does not *require* to
express it to the compiler.

But the fact that wrmsr needs to be a serialization instruction (because
it may depend on other writes etc.) means that one *also* wants to make
the asm snippet serialized by the compiler thanks to the memory clobber.

So I'd rather see:

  : "memory"  /* wrmsr usage needs serialization */


The comment comes directly from the intel doc about WRMSR:

The WRMSR instruction is a serializing instruction (see “Serializing 
Instructions” in Chapter 8 of the Intel® 64 and IA-32 Architectures 
Software Developer’s Manual, Volume 3A)


and in chapter 8, sec. 3:

The Intel 64 and IA-32 architectures define several serializing 
instructions. These instructions force the processor to complete all 
modifications to flags, registers, and memory by previous instructions 
and to drain all buffered writes to memory before the next instruction 
is fetched and executed.


so in my understanding the serialization is a side effect of the wrmsr 
instruction rather than a requirement, and we want to make sure the 
compiler is aware of this and the optimizers do not assume otherwise.


I'll try to make the comment more accurate.


+   /* avoid leaking information in callee-clobbered registers */
+   xorq$0,%rdi
+   xorq$0,%rsi
+   xorq$0,%rdx
+   xorq$0,%r10
+   xorq$0,%r9
+   xorq$0,%r8


No, that's a no-op :)


argh, silly mistake... Thanks for checking this!


Luca




Re: [PATCH 5/5 v2 gnumach] x86_64: add 64-bit syscall entry point

2023-03-02 Thread Luca Dariz

On 02/03/23 09:00, Samuel Thibault wrote:

Luca Dariz, on Thu, 02 Mar 2023 08:55:38 +0100, wrote:

On 01/03/23 21:18, Samuel Thibault wrote:

Luca Dariz, on Wed, 01 Mar 2023 18:40:37 +0100, wrote:

+asm volatile("wrmsr"
+ :
+ : "c" (regaddr), "a" (low), "d" (high)
+ : "memory"  /* wrmsr is a serializing instruction */


The comment could be misleading.

The fact that it's a serialization instruction does not *require* to
express it to the compiler.

But the fact that wrmsr needs to be a serialization instruction (because
it may depend on other writes etc.) means that one *also* wants to make
the asm snippet serialized by the compiler thanks to the memory clobber.

So I'd rather see:

   : "memory"  /* wrmsr usage needs serialization */


The comment comes directly from the intel doc about WRMSR:

The WRMSR instruction is a serializing instruction


I'm not saying it's not a serializing instruction.

I'm saying that the compiler does not have to *care* about the
instruction being serializing.

But I'm also saying that the very reason why the instruction is
serialized is also the reason why which should give a memory clobber to
the compiler.


I think we agree on this


The Intel 64 and IA-32 architectures define several serializing
instructions. These instructions force the processor to complete all
modifications to flags, registers, and memory by previous instructions and
to drain all buffered writes to memory before the next instruction is
fetched and executed.


Yes, and that is *completely* fine with the compiler not flushing
variables etc. to buffers before that.

But the very reason why all these flushes are done is that wrmsr can
have side effect which *requires* that to be done (e.g. structure
preparation and whatnot), and thus we should *also* tell the compiler to
do the same.


and on this as well.


Luca



Re: [PATCH 5/5 v2 gnumach] x86_64: add 64-bit syscall entry point

2023-03-02 Thread Luca Dariz

On 02/03/23 14:14, Samuel Thibault wrote:

Luca Dariz, on Thu, 02 Mar 2023 09:20:15 +0100, wrote:

On 02/03/23 09:00, Samuel Thibault wrote:

Luca Dariz, on Thu, 02 Mar 2023 08:55:38 +0100, wrote:

On 01/03/23 21:18, Samuel Thibault wrote:

Luca Dariz, on Wed, 01 Mar 2023 18:40:37 +0100, wrote:

+asm volatile("wrmsr"
+ :
+ : "c" (regaddr), "a" (low), "d" (high)
+ : "memory"  /* wrmsr is a serializing instruction */


The comment could be misleading.

The fact that it's a serialization instruction does not *require* to
express it to the compiler.

But the fact that wrmsr needs to be a serialization instruction (because
it may depend on other writes etc.) means that one *also* wants to make
the asm snippet serialized by the compiler thanks to the memory clobber.

So I'd rather see:

: "memory"  /* wrmsr usage needs serialization */


The comment comes directly from the intel doc about WRMSR:

The WRMSR instruction is a serializing instruction


I'm not saying it's not a serializing instruction.

I'm saying that the compiler does not have to *care* about the
instruction being serializing.

But I'm also saying that the very reason why the instruction is
serialized is also the reason why which should give a memory clobber to
the compiler.


I think we agree on this


The Intel 64 and IA-32 architectures define several serializing
instructions. These instructions force the processor to complete all
modifications to flags, registers, and memory by previous instructions and
to drain all buffered writes to memory before the next instruction is
fetched and executed.


Yes, and that is *completely* fine with the compiler not flushing
variables etc. to buffers before that.

But the very reason why all these flushes are done is that wrmsr can
have side effect which *requires* that to be done (e.g. structure
preparation and whatnot), and thus we should *also* tell the compiler to
do the same.


and on this as well.


Ok, then perhaps

  "memory"  /* wrmsr usage needs serialization from the compiler too */


we could even sum up the discussion above with:

 : "memory"  /* wrmsr is a Serializing Instruction,
  * therefore we need serialization from the
  * compiler too  */

or is it too redundant? I used "Serializing" as it is a keyword that
the reader can look up if needed.



Luca




[PATCH v3 gnumach] x86_64: add 64-bit syscall entry point

2023-03-08 Thread Luca Dariz
While theoretically we could still use the same call gate as for
32-bit userspace, that doesn't seem very common, and gcc does not seem
to encode the instruction properly. Instead we use syscall/sysret, as
other kernels do (e.g. XNU, Linux). This version still has some
limitations, but should be enough to start working on the 64-bit user
space.

* i386/i386/i386asm.sym: add more constants to fill pcb->iss
* i386/i386/ldt.c: configure 64-bit syscall entry point. We can just
  check for the SEP bit as MSR are always available on x86_64.
* i386/i386/ldt.h: swap CS/DS segments order if !USER32 as required by
  sysret
* i386/i386/locore.h: add syscall64 and MSR definitions
* i386/include/mach/i386/syscall_sw.h: remove old BSD_TRAP
* x86_64/Makefrag.am: selectively install syscall_sw.h depending on
  USER32
* x86_64/include/syscall_sw.h: add entry point template from user
  space
* x86_64/locore.S: implement syscall64 entry point
---
 i386/i386/i386asm.sym   |  15 +++
 i386/i386/ldt.c |  15 ++-
 i386/i386/ldt.h |   9 +-
 i386/i386/locore.h  |  30 ++
 i386/include/mach/i386/syscall_sw.h |  12 +--
 x86_64/Makefrag.am  |   7 +-
 x86_64/include/syscall_sw.h |  40 
 x86_64/locore.S | 150 
 8 files changed, 263 insertions(+), 15 deletions(-)
 create mode 100644 x86_64/include/syscall_sw.h

diff --git a/i386/i386/i386asm.sym b/i386/i386/i386asm.sym
index 8317db6c..1b9b40bb 100644
--- a/i386/i386/i386asm.sym
+++ b/i386/i386/i386asm.sym
@@ -52,6 +52,8 @@ expr  CALL_SINGLE_FUNCTION_BASE
 
 offset ApicLocalUnit   lu  apic_id APIC_ID
 
+offset pcb pcb iss
+
 offset thread  th  pcb
 offset thread  th  task
 offset thread  th  recover
@@ -82,16 +84,29 @@ sizei386_kernel_state   iks
 
 size   i386_exception_link iel
 
+offset i386_saved_stater   gs
+offset i386_saved_stater   fs
 offset i386_saved_stater   cs
 offset i386_saved_stater   uesp
 offset i386_saved_stater   eax
+offset i386_saved_stater   ebx
+offset i386_saved_stater   ecx
+offset i386_saved_stater   edx
+offset i386_saved_stater   ebp
 offset i386_saved_stater   trapno
 offset i386_saved_stater   err
 offset i386_saved_stater   efl R_EFLAGS
 offset i386_saved_stater   eip
 offset i386_saved_stater   cr2
 offset i386_saved_stater   edi
+offset i386_saved_stater   esi
 #ifdef __x86_64__
+offset i386_saved_stater   r8
+offset i386_saved_stater   r9
+offset i386_saved_stater   r10
+offset i386_saved_stater   r12
+offset i386_saved_stater   r13
+offset i386_saved_stater   r14
 offset i386_saved_stater   r15
 #endif
 
diff --git a/i386/i386/ldt.c b/i386/i386/ldt.c
index b86a0e3c..8b7add38 100644
--- a/i386/i386/ldt.c
+++ b/i386/i386/ldt.c
@@ -31,6 +31,7 @@
 #include 
 
 #include 
+#include 
 
 #include "vm_param.h"
 #include "seg.h"
@@ -65,10 +66,22 @@ ldt_fill(struct real_descriptor *myldt, struct 
real_descriptor *mygdt)
ACC_PL_K|ACC_LDT, 0);
 #endif /* MACH_PV_DESCRIPTORS */
 
-   /* Initialize the 32bit LDT descriptors.  */
+   /* Initialize the syscall entry point */
+#if defined(__x86_64__) && ! defined(USER32)
+if (!CPU_HAS_FEATURE(CPU_FEATURE_SEP))
+panic("syscall support is missing on 64 bit");
+/* Enable 64-bit syscalls */
+wrmsr(MSR_REG_EFER, rdmsr(MSR_REG_EFER) | MSR_EFER_SCE);
+wrmsr(MSR_REG_LSTAR, (vm_offset_t)syscall64);
+wrmsr(MSR_REG_STAR, long)USER_CS - 16) << 16) | (long)KERNEL_CS) 
<< 32);
+wrmsr(MSR_REG_FMASK, 0);  // ?
+#else /* defined(__x86_64__) && ! defined(USER32) */
fill_ldt_gate(myldt, USER_SCALL,
  (vm_offset_t)&syscall, KERNEL_CS,
  ACC_PL_U|ACC_CALL_GATE, 0);
+#endif /* defined(__x86_64__) && ! defined(USER32) */
+
+   /* Initialize the 32bit LDT descriptors.  */
fill_ldt_descriptor(myldt, USER_CS,
VM_MIN_USER_ADDRESS,
VM_MAX_USER_ADDRESS-VM_MIN_USER_ADDRESS-4096,
diff --git a/i386/i386/ldt.h b/i386/i386/ldt.h
index b15f11a5..51867f47 100644
--- a/i386/i386/ldt.h
+++ b/i386/i386/ldt.h
@@ -43,11 +43,16 @@
  * User descriptors for Mach - 32-bit flat address space
  */
 #defineUSER_SCALL  0x07/* system call gate */
-#ifdef __x86_64__
+#if defined(__x86_64__) && ! defined(USER32)
 /* Call gate needs two entries */
-#endif
+
+/* The sysret instruction puts some constraints on the user segment indexes */
+#defineUSER_CS 0x1f/* user code s

[PATCH 5/5] add setting gs/fsbase

2023-04-19 Thread Luca Dariz
* i386/i386/i386asm.sym: add offsets for asm
* i386/i386/pcb.c: switch FSBASE/GSBASE on context switch and
  implement accessors in thread setstatus/getstatus (see the sketch
  after this list)
* i386/i386/thread.h: add new state to thread saved state
* kern/thread.c: add i386_FSGS_BASE_STATE handler
* x86_64/locore.S: fix fs/gs handling, skipping the base address and
  avoid resetting it by manually re-loading fs/gs
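
The accessors added in pcb.c can be exercised from user space; a rough
sketch (assuming the i386_FSGS_BASE_STATE flavor, structure and count
introduced by this series, and the usual Hurd headers):

  #include <stdio.h>
  #include <mach.h>
  #include <mach/thread_status.h>

  int main (void)
  {
    struct i386_fsgs_base_state st;
    mach_msg_type_number_t count = i386_FSGS_BASE_STATE_COUNT;
    /* Read back the fs/gs base of the current thread. */
    kern_return_t kr = thread_get_state (mach_thread_self (),
                                         i386_FSGS_BASE_STATE,
                                         (thread_state_t) &st,
                                         &count);
    if (kr != KERN_SUCCESS)
      return 1;
    printf ("fs_base=%#lx gs_base=%#lx\n",
            (unsigned long) st.fs_base, (unsigned long) st.gs_base);
    return 0;
  }
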
---
 i386/i386/i386asm.sym |  2 +
 i386/i386/pcb.c   | 39 +--
 i386/i386/thread.h|  4 ++
 kern/thread.c |  3 ++
 x86_64/locore.S   | 89 ++-
 5 files changed, 116 insertions(+), 21 deletions(-)

diff --git a/i386/i386/i386asm.sym b/i386/i386/i386asm.sym
index 1b9b40bb..fd0be557 100644
--- a/i386/i386/i386asm.sym
+++ b/i386/i386/i386asm.sym
@@ -108,6 +108,8 @@ offset  i386_saved_stater   r12
 offset i386_saved_stater   r13
 offset i386_saved_stater   r14
 offset i386_saved_stater   r15
+offset i386_saved_stater   fsbase
+offset i386_saved_stater   gsbase
 #endif
 
 offset i386_interrupt_statei   eip
diff --git a/i386/i386/pcb.c b/i386/i386/pcb.c
index 61125fe8..8a9e3bf4 100644
--- a/i386/i386/pcb.c
+++ b/i386/i386/pcb.c
@@ -51,6 +51,7 @@
 #include "eflags.h"
 #include "gdt.h"
 #include "ldt.h"
+#include "msr.h"
 #include "ktss.h"
 #include "pcb.h"
 
@@ -372,7 +373,10 @@ thread_t switch_context(
 *  Load the rest of the user state for the new thread
 */
switch_ktss(new->pcb);
-
+#if defined(__x86_64__) && !defined(USER32)
+wrmsr(MSR_REG_FSBASE, new->pcb->iss.fsbase);
+wrmsr(MSR_REG_GSBASE, new->pcb->iss.gsbase);
+#endif
return Switch_context(old, continuation, new);
 }
 
@@ -667,7 +671,23 @@ kern_return_t thread_setstatus(
return ret;
break;
}
-
+#if defined(__x86_64__) && !defined(USER32)
+   case i386_FSGS_BASE_STATE:
+{
+struct i386_fsgs_base_state *state;
+if (count < i386_FSGS_BASE_STATE_COUNT)
+return KERN_INVALID_ARGUMENT;
+
+state = (struct i386_fsgs_base_state *) tstate;
+thread->pcb->iss.fsbase = state->fs_base;
+thread->pcb->iss.gsbase = state->gs_base;
+if (thread == current_thread()) {
+wrmsr(MSR_REG_FSBASE, state->fs_base);
+wrmsr(MSR_REG_GSBASE, state->gs_base);
+}
+break;
+}
+#endif
default:
return(KERN_INVALID_ARGUMENT);
}
@@ -843,7 +863,20 @@ kern_return_t thread_getstatus(
*count = i386_DEBUG_STATE_COUNT;
break;
}
-
+#if defined(__x86_64__) && !defined(USER32)
+   case i386_FSGS_BASE_STATE:
+{
+struct i386_fsgs_base_state *state;
+if (*count < i386_FSGS_BASE_STATE_COUNT)
+return KERN_INVALID_ARGUMENT;
+
+state = (struct i386_fsgs_base_state *) tstate;
+state->fs_base = thread->pcb->iss.fsbase;
+state->gs_base = thread->pcb->iss.gsbase;
+*count = i386_FSGS_BASE_STATE_COUNT;
+break;
+}
+#endif
default:
return(KERN_INVALID_ARGUMENT);
}
diff --git a/i386/i386/thread.h b/i386/i386/thread.h
index 933b43d8..b5fc5ffb 100644
--- a/i386/i386/thread.h
+++ b/i386/i386/thread.h
@@ -51,6 +51,10 @@
  */
 
 struct i386_saved_state {
+#ifdef __x86_64__
+   unsigned long   fsbase;
+   unsigned long   gsbase;
+#endif
unsigned long   gs;
unsigned long   fs;
unsigned long   es;
diff --git a/kern/thread.c b/kern/thread.c
index 392d38f8..f0cea804 100644
--- a/kern/thread.c
+++ b/kern/thread.c
@@ -1472,6 +1472,9 @@ kern_return_t thread_set_state(
if (flavor == i386_DEBUG_STATE && thread == current_thread())
		/* This state can be set directly for the current thread.  */
return thread_setstatus(thread, flavor, new_state, 
new_state_count);
+   if (flavor == i386_FSGS_BASE_STATE && thread == current_thread())
+		/* This state can be set directly for the current thread.  */
+   return thread_setstatus(thread, flavor, new_state, 
new_state_count);
 #endif
 
if (thread == THREAD_NULL || thread == current_thread())
diff --git a/x86_64/locore.S b/x86_64/locore.S
index 1b17d921..ba0400a3 100644
--- a/x86_64/locore.S
+++ b/x86_64/locore.S
@@ -34,6 +34,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -41,6 +42,46 @@
 #define pusha pushq %rax ; pushq %rcx ; pushq %rdx ; pushq %rbx ; subq $8,%rsp 
; pushq %rbp ; pushq %rsi ; pushq %rdi ; pushq %r8 ; pushq %r9 ; pushq %r

[PATCH 1/5] fix address fault for 32-on-64-bit syscall

2023-04-19 Thread Luca Dariz
* x86_64/locore.S: the faulty address is found in %rbp and not in
  %rsi, so copy that in CR2
---
 x86_64/locore.S | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/x86_64/locore.S b/x86_64/locore.S
index 47d9085c..ea5c71d6 100644
--- a/x86_64/locore.S
+++ b/x86_64/locore.S
@@ -1213,7 +1213,7 @@ mach_call_call:
 mach_call_addr_push:
movq%r11,%rsp   /* clean parameters from stack */
 mach_call_addr:
-   movq%rsi,R_CR2(%rbx)/* set fault address */
+   movq%rbp,R_CR2(%rbx)/* set fault address */
movq$(T_PAGE_FAULT),R_TRAPNO(%rbx)
/* set page-fault trap */
movq$(T_PF_USER),R_ERR(%rbx)
-- 
2.30.2




[PATCH 3/5] fix exception message format for 64-bit userspace

2023-04-19 Thread Luca Dariz
* kern/exception.c: message fields need to be aligned to 8 bytes for a
  64-bit userspace, so add the required padding where needed, as done
  by MIG (see the layout sketch after this list).
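
To make the alignment requirement concrete (a standalone sketch, not
code from this patch): on x86_64 a 64-bit field that follows a 4-byte
field only lands on an 8-byte boundary if padding is inserted, and MIG
spells that padding out explicitly in the message layout.

  #include <stddef.h>
  #include <stdint.h>

  struct layout_example {
    uint32_t type_desc[2];  /* stand-in for a 4-byte-aligned mach_msg_type_t */
    int32_t  code;
    int64_t  subcode;       /* 8-byte field */
  };

  /* The compiler inserts 4 bytes after 'code' so that 'subcode' is
     naturally aligned; an explicit pad field just makes that visible. */
  _Static_assert (offsetof (struct layout_example, subcode) == 16,
                  "4 bytes of implicit padding after 'code'");
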
---
 kern/exception.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/kern/exception.c b/kern/exception.c
index 10435b5c..757f793e 100644
--- a/kern/exception.c
+++ b/kern/exception.c
@@ -274,8 +274,14 @@ struct mach_exception {
mach_port_t task;
mach_msg_type_t exceptionType;
integer_t   exception;
+#if defined(__x86_64__) && ! defined(USER32)
+   char exceptionPad[4];
+#endif
mach_msg_type_t codeType;
integer_t   code;
+#if defined(__x86_64__) && ! defined(USER32)
+   char codePad[4];
+#endif
mach_msg_type_t subcodeType;
rpc_long_integer_t  subcode;
 };
-- 
2.30.2




[PATCH 2/5] fix copyoutmsg for 64-bit userspace

2023-04-19 Thread Luca Dariz
* x86_64/copy_user.c: use the correct user/kernel msg structure
---
 x86_64/copy_user.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/x86_64/copy_user.c b/x86_64/copy_user.c
index b5084996..f76e44c9 100644
--- a/x86_64/copy_user.c
+++ b/x86_64/copy_user.c
@@ -430,7 +430,7 @@ int copyoutmsg (const void *kernelbuf, void *userbuf, const 
size_t ksize)
   usaddr = (vm_offset_t)(umsg + 1);
   keaddr = ksaddr + ksize - sizeof(mach_msg_header_t);
 
-  if (ksize > sizeof(mach_msg_user_header_t))
+  if (ksize > sizeof(mach_msg_header_t))
 {
   while (ksaddr < keaddr)
 {
@@ -484,8 +484,7 @@ int copyoutmsg (const void *kernelbuf, void *userbuf, const 
size_t ksize)
 
   mach_msg_size_t usize;
   usize = sizeof(mach_msg_user_header_t) + usaddr - (vm_offset_t)(umsg + 1);
-  usize = usize;
-  if (copyout(&usize, &umsg->msgh_size, sizeof(kmsg->msgh_size)))
+  if (copyout(&usize, &umsg->msgh_size, sizeof(umsg->msgh_size)))
 return 1;
 
   return 0;
-- 
2.30.2




[PATCH 4/5 (v4)] x86_64: add 64-bit syscall entry point

2023-04-19 Thread Luca Dariz
While in theory we could still use the same call gate as for 32-bit
userspace, that approach doesn't seem very common, and gcc does not
seem to encode the instruction properly. Instead we use syscall/sysret,
as other kernels do (e.g. XNU, Linux). This version still has some
limitations, but it should be enough to start working on the 64-bit
user space.
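
As an illustration of the STAR programming done in ldt.c below (the
selector values are taken from the gdt.h/ldt.h definitions in this
series and are stated here as an assumption, not introduced by this
patch): on syscall the CPU loads CS from STAR[47:32] and SS from
STAR[47:32] + 8, while sysret to 64-bit mode loads CS from
STAR[63:48] + 16 and SS from STAR[63:48] + 8.  That fixed +8/+16 layout
forces the user data segment to sit at USER_CS - 8, which is why ldt.h
swaps the user CS/DS order and why "USER_CS - 16" appears below.

  #include <stdint.h>

  #define KERNEL_CS 0x08   /* assumed, as in i386/i386/gdt.h */
  #define USER_CS   0x1f   /* as in the ldt.h change of this series */

  /* Sketch of the value written to MSR_REG_STAR in ldt.c. */
  static inline uint64_t star_value (void)
  {
    return ((uint64_t) (USER_CS - 16) << 48) | ((uint64_t) KERNEL_CS << 32);
  }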

* i386/i386/i386asm.sym: add more constants to fill pcb->iss
* i386/i386/ldt.c: configure 64-bit syscall entry point. We can just
  check for the SEP bit as MSR are always available on x86_64.
* i386/i386/ldt.h: swap CS/DS segments order if !USER32 as required by
  sysret
* i386/i386/locore.h: add syscall64 prototype
* i386/i386/msr.h: add MSR definitions and C read/write helpers (see
  the sketch after this list)
* i386/include/mach/i386/syscall_sw.h: remove old BSD_TRAP
* x86_64/Makefrag.am: selectively install syscall_sw.h depending on
  USER32
* x86_64/include/syscall_sw.h: add entry point template from user
  space
* x86_64/locore.S: implement syscall64 entry point and use it when a
  64-bit user-space is configured
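
The msr.h read/write helpers are not shown in the truncated diff; a
typical shape for them (an assumption, not a copy of the new file)
would be:

  #include <stdint.h>

  static inline uint64_t rdmsr (uint32_t msr)
  {
    uint32_t lo, hi;
    __asm__ volatile ("rdmsr" : "=a" (lo), "=d" (hi) : "c" (msr));
    return ((uint64_t) hi << 32) | lo;
  }

  static inline void wrmsr (uint32_t msr, uint64_t val)
  {
    __asm__ volatile ("wrmsr"
                      : /* no outputs */
                      : "c" (msr), "a" ((uint32_t) val),
                        "d" ((uint32_t) (val >> 32))
                      : "memory");
  }
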
---
 i386/i386/i386asm.sym   |  15 +++
 i386/i386/ldt.c |  16 ++-
 i386/i386/ldt.h |   9 +-
 i386/i386/locore.h  |   1 +
 i386/i386/msr.h |  56 ++
 i386/include/mach/i386/syscall_sw.h |  12 +--
 x86_64/Makefrag.am  |   7 +-
 x86_64/include/syscall_sw.h |  40 +++
 x86_64/locore.S | 158 +++-
 9 files changed, 294 insertions(+), 20 deletions(-)
 create mode 100644 i386/i386/msr.h
 create mode 100644 x86_64/include/syscall_sw.h

diff --git a/i386/i386/i386asm.sym b/i386/i386/i386asm.sym
index 8317db6c..1b9b40bb 100644
--- a/i386/i386/i386asm.sym
+++ b/i386/i386/i386asm.sym
@@ -52,6 +52,8 @@ expr  CALL_SINGLE_FUNCTION_BASE
 
 offset ApicLocalUnit   lu  apic_id APIC_ID
 
+offset pcb pcb iss
+
 offset thread  th  pcb
 offset thread  th  task
 offset thread  th  recover
@@ -82,16 +84,29 @@ sizei386_kernel_state   iks
 
 size   i386_exception_link iel
 
+offset i386_saved_stater   gs
+offset i386_saved_stater   fs
 offset i386_saved_stater   cs
 offset i386_saved_stater   uesp
 offset i386_saved_stater   eax
+offset i386_saved_stater   ebx
+offset i386_saved_stater   ecx
+offset i386_saved_stater   edx
+offset i386_saved_stater   ebp
 offset i386_saved_stater   trapno
 offset i386_saved_stater   err
 offset i386_saved_stater   efl R_EFLAGS
 offset i386_saved_stater   eip
 offset i386_saved_stater   cr2
 offset i386_saved_stater   edi
+offset i386_saved_stater   esi
 #ifdef __x86_64__
+offset i386_saved_stater   r8
+offset i386_saved_stater   r9
+offset i386_saved_stater   r10
+offset i386_saved_stater   r12
+offset i386_saved_stater   r13
+offset i386_saved_stater   r14
 offset i386_saved_stater   r15
 #endif
 
diff --git a/i386/i386/ldt.c b/i386/i386/ldt.c
index b86a0e3c..4d7ec19a 100644
--- a/i386/i386/ldt.c
+++ b/i386/i386/ldt.c
@@ -31,6 +31,7 @@
 #include 
 
 #include 
+#include 
 
 #include "vm_param.h"
 #include "seg.h"
@@ -38,6 +39,7 @@
 #include "ldt.h"
 #include "locore.h"
 #include "mp_desc.h"
+#include "msr.h"
 
 #ifdef MACH_PV_DESCRIPTORS
 /* It is actually defined in xen_boothdr.S */
@@ -65,10 +67,22 @@ ldt_fill(struct real_descriptor *myldt, struct 
real_descriptor *mygdt)
ACC_PL_K|ACC_LDT, 0);
 #endif /* MACH_PV_DESCRIPTORS */
 
-   /* Initialize the 32bit LDT descriptors.  */
+   /* Initialize the syscall entry point */
+#if defined(__x86_64__) && ! defined(USER32)
+if (!CPU_HAS_FEATURE(CPU_FEATURE_SEP))
+panic("syscall support is missing on 64 bit");
+/* Enable 64-bit syscalls */
+wrmsr(MSR_REG_EFER, rdmsr(MSR_REG_EFER) | MSR_EFER_SCE);
+wrmsr(MSR_REG_LSTAR, (vm_offset_t)syscall64);
+wrmsr(MSR_REG_STAR, ((((long)USER_CS - 16) << 16) | (long)KERNEL_CS) << 32);
+wrmsr(MSR_REG_FMASK, 0);  // ?
+#else /* defined(__x86_64__) && ! defined(USER32) */
fill_ldt_gate(myldt, USER_SCALL,
  (vm_offset_t)&syscall, KERNEL_CS,
  ACC_PL_U|ACC_CALL_GATE, 0);
+#endif /* defined(__x86_64__) && ! defined(USER32) */
+
+   /* Initialize the 32bit LDT descriptors.  */
fill_ldt_descriptor(myldt, USER_CS,
VM_MIN_USER_ADDRESS,
VM_MAX_USER_ADDRESS-VM_MIN_USER_ADDRESS-4096,
diff --git a/i386/i386/ldt.h b/i386/i386/ldt.h
index b15f11a5..51867f47 100644
--- a/i386/i386/ldt.h
+++ b/i386/i386/ldt.h
@@ -43,11 +43,16 @@
  * User descriptors f

Re: [PATCH 3/5] fix exception message format for 64-bit userspace

2023-04-20 Thread Luca Dariz

Hi Flavio,

On 20/04/23 04:04, Flávio Cruz wrote:
On Wed, Apr 19, 2023 at 3:47 PM Luca Dariz <l...@orpolo.org> wrote:


* kern/exception.c: message fields need to be aligned to 8 bytes for a
   64-bit userspace, so add the required padding if needed, as done by
   MIG.

I believe this shouldn't be necessary due to
https://git.savannah.gnu.org/cgit/hurd/gnumach.git/commit/?id=8e5e86fc13732b60d2e4d14152d92db1f1ae73f9
which forces mach_msg_type_t to always be 8 byte aligned. If you check
the size before and after your patch, it will be the same.


you're right, I must have made this change before this commit.
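
For reference, the effect of that forced alignment can be checked at
compile time with something along these lines (a sketch, not code from
the tree):

  #include <stddef.h>
  #include <stdint.h>

  /* Stand-in for mach_msg_type_t after the referenced commit: the
     type itself carries 8-byte alignment. */
  typedef struct {
    uint32_t word[2];
  } __attribute__ ((aligned (8))) msg_type_t;

  struct exc_like {
    msg_type_t codeType;
    int32_t    code;
    /* no explicit pad here */
    msg_type_t subcodeType;
    int64_t    subcode;
  };

  /* The compiler already places subcodeType on an 8-byte boundary,
     so an explicit 4-byte pad after 'code' would not change the
     layout. */
  _Static_assert (offsetof (struct exc_like, subcodeType) % 8 == 0,
                  "padding is implicit");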


Luca



Re: [PATCH 5/5] add setting gs/fsbase

2023-04-22 Thread Luca Dariz

Hi Sergey,

On 20/04/23 13:51, Sergey Bugaev wrote:

On Wed, Apr 19, 2023 at 11:52 PM Sergey Bugaev  wrote:
We do reach the call to __thread_set_state (), but then it uses
__mig_memcpy (), which is just 'return memcpy (...)'. And (because of
the static build?), that memcpy () is the full real ifunc-selected
memcpy. So it jumps to the memcpy@plt, which jumps to
*(mem...@got.plt), which is supposed to jump into the rtld and then
run the ifunc resolver and do its smarts and eventually jump to the
right memcpy...


I don't know if you have already found a good solution for this, but if
not, maybe we could add an x86_64-specific plain syscall to set
fsbase/gsbase during bootstrap, if that would make things more
manageable.
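
For context, the RPC path under discussion boils down to roughly the
following (a sketch built on the i386_FSGS_BASE_STATE flavor from this
series; the exact glibc entry points and the gs_base handling are
assumptions):

  #include <mach.h>
  #include <mach/thread_status.h>

  /* Sketch: install a TCB pointer for the current thread.  Even this
     short path goes through a MIG stub (and thus __mig_memcpy), which
     is where the ifunc problem described above bites; a dedicated
     plain syscall would bypass the RPC machinery entirely. */
  static kern_return_t set_fs_base (void *tcb)
  {
    struct i386_fsgs_base_state st;
    st.fs_base = (unsigned long) tcb;
    st.gs_base = 0;
    return thread_set_state (mach_thread_self (),
                             i386_FSGS_BASE_STATE,
                             (thread_state_t) &st,
                             i386_FSGS_BASE_STATE_COUNT);
  }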



Luca


