Boot issue with e1000 netdde driver

2020-11-29 Thread Richard Braun
On Tue, Sep 22, 2020 at 01:28:42PM +0200, Richard Braun wrote:
> On Fri, Aug 28, 2020 at 06:39:29PM +0200, Samuel Thibault wrote:
> > One way to avoid the bug is to use the e1000 hardware network type. I
> > guess the rtl8139 driver that we ship in netdde has a bug in its irq
> > handling (we had already seen that kind of bug a long time ago in the
> > IDE driver). Moving to rump-based network drivers will probably help
> > fixing this kind of issue long-term :)
> 
> It does seem to help. Thanks.
> 
> For information, the virtual machines were using the pcnet device,
> not the rtl8139 one.

Actually, with the e1000 driver, the network is very often not functional
at boot time, and I have to manually kill netdde and trigger some activity
for it to work.

-- 
Richard Braun



Re: PTHREAD_MUTEX_ADAPTIVE_NP undefined in Gecko

2021-02-09 Thread Richard Braun
On Tue, Feb 09, 2021 at 10:36:28AM +0100, Riccardo Mottola wrote:
> how is compilation of Gecko related browsers going on? I am trying to
> compile ArcticFox and I get this:
> 
>  3:43.44
> /home/multix/code/Arctic-Fox/nsprpub/pr/src/pthreads/ptsynch.c:60:48: error:
> ‘PTHREAD_MUTEX_ADAPTIVE_NP’ undeclared (first use in this function); did you
> mean ‘PTHREAD_MUTEX_FAST_NP’?
>  3:43.44    60 | rv = pthread_mutexattr_settype(&_pt_mattr,
> PTHREAD_MUTEX_ADAPTIVE_NP);
>  3:43.44   | ^
>  3:43.44   | PTHREAD_MUTEX_FAST_NP
> 
> 
> The code is almost unchanged with current gecko, so I wonder if there were
> any workarounds attempted and/or upliftable patches for that.

The _NP suffix means non-POSIX, and is generally used for system-specific
features. The use of "adaptive" mutexes is probably just an optimization,
so it should be completely safe to just compile it out on systems where
the macro doesn't exist (assuming it's a macro).
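
Something along these lines should do (a minimal, untested sketch reusing
the rv and _pt_mattr names from the quoted code):

  #ifdef PTHREAD_MUTEX_ADAPTIVE_NP
    /* Non-portable optimization: spin a bit before sleeping.  */
    rv = pthread_mutexattr_settype(&_pt_mattr, PTHREAD_MUTEX_ADAPTIVE_NP);
  #else
    rv = 0;  /* keep the default mutex type */
  #endif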

-- 
Richard Braun



Re: Hurd Security vulnerabilities, please upgrade!

2021-08-10 Thread Richard Braun
On Tue, Aug 10, 2021 at 02:26:54PM +0300, Sergey Bugaev wrote:
> On Tue, Aug 10, 2021 at 5:04 AM Samuel Thibault  wrote:
> > In the past months, Sergey Bugaev has been working on fixing some
> > Hurd security vulnerabilities.
> 
> I urge everybody to upgrade (and reboot!) their systems as soon as
> possible. I have already updated mine, and can confirm that all my
> exploits fail now.

I have issues mounting /home (or rather, keeping it mounted) on
darnassus.sceen.net since the reboot. Samuel already has root access
so he can look at it when he has time.

I haven't looked at the work, but if Samuel merged it, I trust it's
good. Thanks :).

-- 
Richard Braun



Re: Hurd wiki in https

2021-08-17 Thread Richard Braun
On Mon, Aug 16, 2021 at 11:23:23PM +0200, Denis 'GNUtoo' Carikli wrote:
> While it's relatively easy to create an account on the Hurd wiki[1] and
> to edit it, everything is in http.
> 
> So at the account creation and during the login, the password is most
> probably transmitted in clear.
> 
> Would it be possible to fix that?

On vacation right now, but I'll get to it eventually.

-- 
Richard Braun



Re: directmap vs highmem usage in gnu mach?

2021-08-25 Thread Richard Braun
On Mon, Aug 23, 2021 at 01:56:52PM +0200, Samuel Thibault wrote:
> Do you know if it is on purpose that all vm_page_grab() calls use the
> directmem segment, and not the highmem segment?  Notably vm_page_fault()
> ends up filling the directmem segment first, I even wonder how the
> highmem segment ends up getting filled :)
> 
> Is that perhaps to rather try allocating in directmem that has faster
> access from kernel than highmem? (although for e.g. all kmsg purpose the
> kernel just uses the userland page table so it's not faster, right?)
> 
> I was thinking about adding a vm_page_grab_high() that can be used when
> there are no directmap constraints, that will probably solve the
> (relatively rare) OOM issues on the buildds.

Hello,

I can't recall exactly, but that's why we write comments, right?
See [1].

Fixing this requires changing all callers to not assume that allocated
memory can always be directly mapped, in particular some kvtophys calls.

-- 
Richard Braun

[1] 
https://git.savannah.gnu.org/cgit/hurd/gnumach.git/tree/vm/vm_resident.c?id=4dbaa718038fadc51d9b95d2383868a229d91457#n776



Re: directmap vs highmem usage in gnu mach?

2021-08-25 Thread Richard Braun
On Wed, Aug 25, 2021 at 04:23:23PM +0200, Samuel Thibault wrote:
> Richard Braun, le mer. 25 août 2021 15:59:12 +0200, a ecrit:
> > I can't recall exactly, but that's why we write comments, right ?
> > See [1].
> > 
> > Fixing this requires changing all callers to not assume that allocated
> > memory can always be directly mapped, in particular some kvtophys calls.
> 
> Ok, I was not sure which way to interpret the comment: it says "Instead
> of updating all users" and talks about wiring, so I wasn't sure it was
> really talking about directmap vs highmem (which AIUI doesn't matter for
> wiring), or something else, and whether it was already done/optimized.
> 
> I have a patch that makes the vm_fault_page, vm_map_copy_steal_pages,
> vm_page_convert, and vm_page_alloc use highmem. It boots fine, can build
> packages without problems, I guess that's a good-enough test?

For practical purposes, sure, but in the long run, there must be
some kind of very clear (preferably explicit) policy about all that.

Personally, I'd do the same as Linux, with all callers explicitly
passing flags, instead of having various versions of vm_page_grab.
These flags may become useful at multiple layers of the "memory stack".
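
Roughly like this (a sketch only; the names and flag values are
illustrative, not a concrete proposal):

  /* Illustrative allocation selectors, in the spirit of Linux GFP flags.  */
  #define VM_PAGE_SEL_DIRECTMAP  0x01  /* result must be directly mappable  */
  #define VM_PAGE_SEL_HIGHMEM    0x02  /* result may come from highmem      */

  /* A single allocation entry point; callers state their constraint
     explicitly instead of picking among several grab variants.  */
  extern vm_page_t vm_page_grab_flags(int flags);

  /* A caller that will use kvtophys()/direct-mapped access:      */
  /*   page = vm_page_grab_flags(VM_PAGE_SEL_DIRECTMAP);          */
  /* A pager filling memory only ever seen through mappings:      */
  /*   page = vm_page_grab_flags(VM_PAGE_SEL_HIGHMEM);            */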

-- 
Richard Braun



Re: Hurd wiki in https

2021-09-07 Thread Richard Braun
On Tue, Aug 17, 2021 at 10:29:42AM +0200, Richard Braun wrote:
> On Mon, Aug 16, 2021 at 11:23:23PM +0200, Denis 'GNUtoo' Carikli wrote:
> > While it's relatively easy to create an account on the Hurd wiki[1] and
> > to edit it, everything is in http.
> > 
> > So at the account creation and during the login, the password is most
> > probably transmitted in clear.
> > 
> > Would it be possible to fix that?
> 
> On vacation right now, but I'll get to it eventually.

Done.

-- 
Richard Braun



Re: date bug

2021-11-13 Thread Richard Braun
On Sat, Nov 13, 2021 at 11:13:13AM +0100, Samuel Thibault wrote:
> There used to be a bug that was making GNU Mach erroneously jump to year
> 2032 or something like that. In april 2021 I had committed

> Did anybody notice the bug again since then? If not, we can probably
> revert 

I'm happy to report it's indeed been a long time since I've seen this.
Thanks!

-- 
Richard Braun



[PATCH] libps: Fix memory leak

2022-11-05 Thread Richard Braun
---
 libps/procstat.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/libps/procstat.c b/libps/procstat.c
index d0423410..131d3a4b 100644
--- a/libps/procstat.c
+++ b/libps/procstat.c
@@ -1097,6 +1097,7 @@ _proc_stat_free (ps)
0, &ps->task_events_info_buf, char);
   MFREEMEM (PSTAT_THREAD_WAITS, thread_waits, ps->thread_waits_len,
ps->thread_waits_vm_alloced, 0, char);
+  MFREEMEM (PSTAT_EXE, exe, sizeof(string_t), ps->exe_vm_alloced, 0, char);
 
   FREE (ps);
 }
-- 
2.37.2




Re: GNU Mach: disabling all network device drivers

2010-12-10 Thread Richard Braun
On Wed, Dec 08, 2010 at 02:52:10AM -0300, Diego Nieto Cid wrote:
> Writting CSR0_NORMAL to pcnet's CSR0 register sets the 'interruption enabled'
> bit allowing the hardware to raise interruptions.
> 
> So, why would such an amount of interruptions be triggered?
> Their alternating pattern is also interesting. What's interrupt 7 and
> how is it related to 11?

That's some nice debugging work. But I don't understand why you're
asking the question. You have a patched qemu and you seem to know how to
deal with gdb, so is there something preventing you from inspecting the
kernel (and more specifically, the IDT) to determine what interrupts 7
and 11 are?

-- 
Richard Braun



Migration threads

2010-12-26 Thread Richard Braun
Hello.

From what I understand of the Hurd history, GNU Mach is based on Mach 4.
I read that this version was intended to include the results of research
work done at the University of Utah, the most important being thread
migration (for those not familiar with the concept, it means that
threads running in client tasks "migrate" into server tasks when
performing sync RPC). There is actually code about that in GNU Mach,
but I'm wondering if it's actually used in the Hurd. I couldn't find
anything related to this in the userspace servers, but the interface
changes could be light, and I might have missed them. Does anyone know
if the Hurd servers actually use thread migration?

-- 
Richard Braun



Re: device_set_filter documentation update

2011-01-17 Thread Richard Braun
On Sun, Jan 16, 2011 at 10:20:35PM -0300, Diego Nieto Cid wrote:
> Hello,
> 
> Commit 1ca2a1632d7325ee26b2c701b38c1d2e2fcb6f80 in gnumach changed
> the interface with the
> packet filter and the GNU Mach Reference Manual got outdated.
> 
> Here's the email where the patch was submitted:
> 
> http://lists.gnu.org/archive/html/bug-hurd/2006-04/msg00032.html
> 
> It contains a good reference with examples but I don't think they
> fit well
> with the rest of the document.
> 
> I'd like to propose the following update. Everyone is invited to
> rephrase it :)

Hello,

Now that was a long time ago. I don't have network access at home right
now, and I'm quite busy at work, but give me some time to refresh my memory
of what the code actually does and I'll comment on your changes soon.

Thanks for noticing this.

-- 
Richard Braun



Re: device_set_filter documentation update

2011-01-20 Thread Richard Braun
On Sun, Jan 16, 2011 at 10:20:35PM -0300, Diego Nieto Cid wrote:
> Hello,
> 
> Commit 1ca2a1632d7325ee26b2c701b38c1d2e2fcb6f80 in gnumach changed
> the interface with the
> packet filter and the GNU Mach Reference Manual got outdated.
> 
> Here's the email where the patch was submitted:
> 
> http://lists.gnu.org/archive/html/bug-hurd/2006-04/msg00032.html
> 
> It contains a good reference with examples but I don't think they
> fit well
> with the rest of the document.
> 
> I'd like to propose the following update. Everyone is invited to
> rephrase it :)

1/
"The type defaults to native NETF filter and may be omitted"

contradicts

"NETF filters are considered to be native filters, so there is no macro
for them".

There really is no macro, so the type cannot be anything other than
omitted. Later in your patch, you give the "name" 0 to the native type,
which is a bit confusing. To make things clear, state that the default
type is the native one, and that NETF_BPF must be passed to use the
alternate type.

2/
"Data ingressing the device will be subject to the filter"
"Data egressing from the device will be subject to the filter"

Sorry for being annoying, but isn't it "ingressing to" or "in"? Also,
more common expressions like "received" and "transmitted by the device"
sound more appropriate. This is just a detail, though.

3/
You should probably document that, when a packet is received by a
listener, the flags are *either* NETF_IN or NETF_OUT, never a
combination of both, even though common sense would make it obvious.

-- 
Richard Braun



Re: [PATCH] gnumach: vm: zone_gc: Got rid of useless function

2011-04-19 Thread Richard Braun
On Mon, Apr 18, 2011 at 10:05:57PM +0200, Samuel Thibault wrote:
> I don't think this patch can be applied. Building a list and then
> freeing it was done for some reason, at least because kmem_free, (thus
> vm_map_delete), may actually need to use zalloc in order to split
> some areas (which is actually the reason for the double map_entry
> allocation/release), which thus needs to lock the zone, and thus ends up
> with a deadlock.

This change seems rather dangerous. It brings no real improvement while
adding a potential deadlock.

In addition to avoiding deadlocks, freeing list elements in two steps
is also a technique commonly used to reduce contention.
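
The usual pattern looks roughly like this (a generic sketch; the lock and
helper names are placeholders, not actual gnumach code):

  struct elem {
      struct elem *next;
      /* ... payload ... */
  };

  void flush_list(struct elem **headp, lock_t *lock)
  {
      struct elem *batch, *e;

      /* Step 1: detach the whole list while holding the lock, O(1).  */
      lock_acquire(lock);
      batch = *headp;
      *headp = NULL;
      lock_release(lock);

      /* Step 2: free with no lock held, so the release path may itself
         allocate or take locks, and the critical section stays short.  */
      while (batch != NULL) {
          e = batch;
          batch = e->next;
          free_elem(e);
      }
  }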

-- 
Richard Braun



Re: port leak when starting a translator

2011-05-05 Thread Richard Braun
On Thu, May 05, 2011 at 12:55:46AM +0200, Samuel Thibault wrote:
> I've noticed a port leak on buildds when starting a translator. The
> backtrace of the port allocation is this:
> 
> _ports_create_port_internal
> ports_create_port_noinstall
> diskfs_start_protid
> diskfs_create_protid
> diskfs_S_dir_lookup (libdiskfs/dir-lookup.c:260)
> _Xdir_lookup
> 
> What I understand from the diskfs_S_dir_lookup code with my poor
> libports knowledge:
> - diskfs_make_peropen (dnp, 0, dircred->po, &newpo) creates a peropen
>   structure "newpo". Its reference counter is initialized to 0.
> - diskfs_create_portid(newpo, user, &newpi) creates a libport port with 
> receive right. Its
>   reference count is initialized to 1. newpo is made to reference it,
>   its reference counter gets increased to 1. The port right is added to
>   the diskfs_port_bucket portset.
> - dirport = ports_get_send_right(newpi) adds a port send right to newpi, whose
>   reference counter gets increased to 2.
> - ports_port_deref(newpi) drops a reference, the counter thus gets back
>   to 1. As I understand it, it's supposed to be the send one.
> - newpi=0 forgets about the receive part of newpi, only the send part
>   dirport remains.
> - dirport is given to fshelp_fetch_root(), which copies its send right
>   only.
> - mach_port_deallocate (mach_task_self(), dirport); release a send
>   right.
> 
> I'm unsure how the receive part of the kernel port in newpi port is
> supposed to get deallocated in that scenario, I guess the send right
> release at the end is supposed to trigger NOTIFY_NO_SENDERS, and
> libports handles deallocation from that?  The thing is that it doesn't
> seem to happen, thus leaving a port leak here.
> 
> Any idea?

Does this leak appear often? It seems to be triggered only by starting a
new translator. I only took a quick look, but are you sure the dirport
(which is copied as dotdot in fshelp_fetch_root()) isn't kept around in
the "child" translator as long as it runs, as a reference to the
translator managing .. (that is, ports[INIT_PORT_CWDIR])? In addition,
is this a Mach port leak or a libports leak?

-- 
Richard Braun



Re: port leak when starting a translator

2011-05-05 Thread Richard Braun
On Thu, May 05, 2011 at 01:54:30PM +0200, Samuel Thibault wrote:
> It's the big leak I have on the buildds. 50k ports after like 10h
> building stuff.

Assuming this leak only happens when starting a new translator, this
implies building stuff starts a lot of them. Identifying which ones
may help narrow down the origin of the leak. Also, does portinfo print
lots of receive-only (or maybe receive/send-once) rights (I assume the
leak leaves no send right in ext2fs)?

> > It seems to be triggered by starting a new translator only.
> 
> So you are actually observing it too?

No, but that's what the code seems to do.

> the ports array is given to fshelp_start_translator_long, which just
> passes it to the file_exec() RPC. AIUI, that won't make a copy inside
> ext2fs itself.

Check the status and the number of user references of newpi/dirport
using mach_port_get_receive_status() and mach_port_get_refs() right
before the call to mach_port_deallocate on dirport. The number of urefs
should be 1, and status->mps_nsrequest should be true. You should also
find how no-sender notifications are received, since that's how such
receive rights get deallocated: once the child translator terminates,
all its rights on the port are destroyed, which generates a no-sender
notification, on which the receive port is destroyed. I hope the child
translator doesn't copy this right to other tasks, in which case we
would have to track where and how references are passed.
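
Something like this, right before the mach_port_deallocate() call
(mach_port_get_refs() and mach_port_get_receive_status() are the real
calls; the rest is throwaway debugging scaffolding):

  mach_port_urefs_t urefs;
  mach_port_status_t status;
  kern_return_t err;

  err = mach_port_get_refs (mach_task_self (), dirport,
                            MACH_PORT_RIGHT_SEND, &urefs);
  if (!err)
    fprintf (stderr, "dirport send urefs: %u\n", (unsigned) urefs);

  err = mach_port_get_receive_status (mach_task_self (), dirport, &status);
  if (!err)
    fprintf (stderr, "nsrequest: %d srights: %d\n",
             (int) status.mps_nsrequest, (int) status.mps_srights);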

-- 
Richard Braun



Re: user-level drivers

2011-05-09 Thread Richard Braun
On Mon, May 09, 2011 at 12:07:16AM +0200, Samuel Thibault wrote:
> I've started having a look at Zheng Da's user-level driver integration.
> I've cleaned his tree a bit, and now considering adding patches to
> the debian packages for wider testing.  The patches however add a few
> kernel RPCs, which we should probably agree on first, at the minimum
> that their existence makes sense, so we can reserve slots in upstream
> gnumach.  Basically, it's about allocating physically-contiguous memory
> for DMAs, and getting IRQ events:
> 
> /*
>  *  This routine is created for allocating DMA buffers.
>  *  We are going to get a contiguous physical memory
>  *  and its physical address in addition to the virtual address.
>  */
> routine vm_allocate_contiguous(
> host_priv   : host_priv_t;
> target_task : vm_task_t;
> out vaddr   : vm_address_t;
> out paddr   : vm_address_t;
> size: vm_size_t);

This RPC lacks a few additional constraints like boundaries, alignment
and maybe phase. We may not use them now, but they're important for
portability (e.g. if GNU Mach supports PAE, drivers that can't use
physical memory beyond the 4 GiB limit must be able to express it).
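
To make that concrete, the routine could grow constraints along these
lines (a sketch only, expressed as a C prototype; the extra parameter
names are purely illustrative):

  kern_return_t vm_allocate_contiguous(
      host_priv_t   host_priv,
      vm_task_t     target_task,
      vm_address_t  *vaddr,
      vm_address_t  *paddr,
      vm_size_t     size,
      vm_address_t  pmin,    /* lowest acceptable physical address        */
      vm_address_t  pmax,    /* highest acceptable physical address       */
      vm_size_t     align);  /* required physical alignment, power of two */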

> /* Requesting IRQ events on a port */
> 
> routine device_intr_notify(
>master_port : mach_port_t;
>in  irq : int;
>in  id  : int;
>in  flags   : int;
>in  receive_port: mach_port_send_t
>);
> 
> The actual event:
> 
> simpleroutine mach_notify_irq(
>notify  : notify_port_t;
>name: int);
> 
> And a way to mask/unmask irqs:
> 
> /*
>  * enable/disable the specified irq.
>  */
> routine device_irq_enable(
>master_port : mach_port_t;
>irq : int;
>status  : char);
> 
> Should this be rather split into device_irq_enable/disable and drop the
> status paramter?

Naming a function that can disable something "xxx_enable" is confusing.
By the way, I also think using both "intr" and "irq" is confusing. We
should stick with "intr". device_intr_notify() isn't a suitable name as
it doesn't make it obvious that it's a registration request. It should
rather be named device_intr_establish() or device_intr_register().
Parameters named "irq" could be renamed "line" instead. And we should
also follow the way Mach names are built (i.e. module_object_method).
mach_notify_irq() could be renamed mach_intr_notify().
device_irq_enable() would be named device_intr_enable().

-- 
Richard Braun



Re: user-level drivers

2011-05-09 Thread Richard Braun
On Mon, May 09, 2011 at 12:17:51PM +0200, Samuel Thibault wrote:
> > Hmm, I guess we don't have anything that is better than using
> > vm_address_t for physical addresses?  At least not in
> > include/mach/std_types.h, i386/include/mach/i386/vm_types.h.  Should we?
> > (phys_address_t based on natural_t?)
> 
> Maybe we should, indeed, else we can't do PAE.

I'd suggest using natural_t (or unsigned long) too. But then, it can't
be used to address >4 GiB physical memory. Consider expressing physical
memory in page frame numbers.
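
For illustration (assuming the usual 4 KiB base page size):

  /* A 32-bit frame number then covers 2^32 * 4 KiB = 16 TiB of physical
     memory, which is plenty for PAE, without widening the RPC types.  */
  #define PFN_SHIFT           12   /* log2(4 KiB) */
  #define pfn_to_paddr(pfn)   ((unsigned long long)(pfn) << PFN_SHIFT)
  #define paddr_to_pfn(pa)    ((unsigned long)((pa) >> PFN_SHIFT))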

-- 
Richard Braun



Re: user-level drivers

2011-05-09 Thread Richard Braun
On Mon, May 09, 2011 at 01:27:32PM +0200, Thomas Schwinge wrote:
> > I'd suggest using natural_t (or unsigned long) too. But then, it can't
> > be used to address >4 GiB physical memory. Consider expressing physical
> > memory in page frame numbers.
> 
> Good idea!  But: what about differently sized frames (4 KiB/2
> MiB/whatever amd64 allows)?  (In case it'd make sense to support these at
> some point?)  Or is this over-engineering already?

These are virtual frames. And in any case, they are multiples of the base
page size. This interface is suitable for allocating large pages. It
would just require the requested size to be a multiple of the large page
size to indicate large pages are wanted. This "large page" size would be
provided through a new macro sitting next to PAGE_SIZE.

-- 
Richard Braun



Re: user-level drivers

2011-05-10 Thread Richard Braun
On Tue, May 10, 2011 at 03:29:11AM +0200, Samuel Thibault wrote:
> There are archs which have varying page size.  I'd rather avoid
> introducing the page size, can't mig cope with 64bit values on 32bit
> archs?

Right. Even mmap2() forces multiples of 4096 and not the page size. So
using 64 bits seems reasonable. MIG should have no problem with 64-bit
values.

-- 
Richard Braun



Re: user-level drivers

2011-05-10 Thread Richard Braun
On Tue, May 10, 2011 at 03:26:26AM +0200, Samuel Thibault wrote:
> What do you mean by "phase"?

Offset from the alignment. But I don't think it's useful at all in this
case. Minimum and maximum addresses and alignment should do. Maybe
boundary crossing too, but usually specifying the right alignment and size
is enough.

-- 
Richard Braun



Re: Race problem in Mach/Hurd?

2011-05-10 Thread Richard Braun
On Mon, May 09, 2011 at 06:33:14PM +0200, Svante Signell wrote:
> Continuing the struggle with the ghc and ruby build problems I stumbled
> into finding two kernel (signal) threads in addition to the main (user)
> thread. According to Neal on IRC there shouldn't be more than one kernel
> thread running. This could be due to some race condition (don't know yet
> where). 

What makes you think there are two signal threads?

-- 
Richard Braun



Re: Race condition (was problem) in Mach/Hurd?

2011-05-10 Thread Richard Braun
On Tue, May 10, 2011 at 10:53:00AM +0200, Svante Signell wrote:
> On Tue, 2011-05-10 at 10:11 +0200, Richard Braun wrote:
> > What makes you think there are two signal threads ?
> 
> Did you look at the pasted output?

Are you seriously asking that?

> I would call threads 5 and 6 signal
> threads and thread 4 the main thrread!? Is there any way to find out??
> Thread 4 is the main thread, I just didn't set the code directory path
> in gdb to see that.
>  
> gdb) thread apply all bt
>  
> Thread 6 (Thread 19821.6):
> #0  0x010f7f4c in mach_msg_trap ()
> at 
> /home/buildd/build/chroot-sid/home/buildd/byhand/eglibc-2.11.2/build-tree/hurd-i386-libc/mach/mach_msg_trap.S:2
> #1  0x010f8749 in __mach_msg (msg=0x29fd4, option=1282, send_size=0, 
> rcv_size=0, rcv_name=127, timeout=20, notify=0) at msg.c:110
> #2  0x01195608 in timer_thread () at ../sysdeps/mach/hurd/setitimer.c:91
> Cannot access memory at address 0x2a000
> 
> Thread 5 (Thread 19821.5):
> #0  0x010f7f4c in mach_msg_trap ()
> at 
> /home/buildd/build/chroot-sid/home/buildd/byhand/eglibc-2.11.2/build-tree/hurd-i386-libc/mach/mach_msg_trap.S:2
> #1  0x010f8749 in __mach_msg (msg=0x17fef30, option=2, send_size=0, 
> rcv_size=4096, rcv_name=119, timeout=0, notify=0) at msg.c:110
> #2  0x010f8e04 in __mach_msg_server_timeout (demux=0x1109980
> , max_size=4096, rcv_name=119, option=0, timeout=0) at
> msgserver.c:101
> #3  0x010f8f4b in __mach_msg_server (demux=0x1109980 , 
> max_size=4096, rcv_name=119) at msgserver.c:196
> #4  0x0110994f in _hurd_msgport_receive () at msgportdemux.c:68
> #5  0x01274c3c in entry_point (
> start_routine=0x11098e0 <_hurd_msgport_receive>, arg=0x0)
> at 
> /home/buildd/build/chroot-sid/home/buildd/byhand/hurd/./libpthread/pthread/pt-create.c:50
> #6  0x in ?? ()
>  
> Thread 4 (Thread 19821.4):
> #0  0x01113970 in trampoline () from /lib/libc.so.0.3
> #1  0x000e in ?? ()
> #2  0x in ?? ()

I just see threads calling mach_msg() ... One seems to be related to
POSIX timers (timer_thread) and the other is a POSIX thread receiving
a message. I'm not even sure you can see the signal thread with gdb
(I'll check that). So, again, what makes you think there are two signal
threads?

-- 
Richard Braun



Re: Race condition (was problem) in Mach/Hurd?

2011-05-10 Thread Richard Braun
On Tue, May 10, 2011 at 11:13:29AM +0200, Svante Signell wrote:
> Sorry, but your question was not referring to the pasted code at all,
> just a direct question. You know better than me of course, I'm just
> trying to understand things and admittedly do some guess work.

I wonder what else it could refer to.

> > > (gdb) thread apply all bt
> > >  
> > > Thread 6 (Thread 19821.6):
> ..
> > > #2  0x01195608 in timer_thread () at ../sysdeps/mach/hurd/setitimer.c:91
> > > Cannot access memory at address 0x2a000
> 
> > I just see threads calling mach_msg() ... One seems to be related to
> > POSIX timers (timer_thread) and the other is a POSIX thread receiving
> > a message. I'm not even sure you can see the signal thread with gdb
> > (I'll check that). So, again, what makes you think there are two signal
> > threads ?
> See above.

Again, I just see a thread related to POSIX timers.

> BTW: What is triggering a new thread, a fork or?
> Can you explain why thread number 6 points to an invalid memory address?

I'll check how glibc deals with POSIX timers. It could simply be that
timer_create() is called with SIGEV_THREAD.
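
For reference, this is the kind of call that makes the C library spawn
such a helper thread (standard POSIX, shown only to illustrate the guess
above):

  #include <signal.h>
  #include <string.h>
  #include <time.h>

  static void expired (union sigval sv)
  {
    /* Runs in a thread created by the library, not in a signal handler.  */
  }

  static int make_timer (timer_t *id)
  {
    struct sigevent sev;

    memset (&sev, 0, sizeof sev);
    sev.sigev_notify = SIGEV_THREAD;           /* notify via a new thread */
    sev.sigev_notify_function = expired;
    return timer_create (CLOCK_REALTIME, &sev, id);
  }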

The invalid memory address is likely garbage at the top of the stack.
I wouldn't worry about it.

-- 
Richard Braun



Re: Race condition (was problem) in Mach/Hurd?

2011-05-10 Thread Richard Braun
On Tue, May 10, 2011 at 11:18:21AM +0200, Richard Braun wrote:
> I'll check how glibc deals with POSIX timers. It could simply be that
> timer_create() is called with SIGEV_THREAD.
> 
> The invalid memory address is likely garbage at the top of the stack.
> I wouldn't worry about it.

So, after a quick glance at glibc, and also based on Thomas Bushnell's
reply, the output you pasted shows the signal thread (Thread 5), which
you can easily recognize because it calls _hurd_msgport_receive(), and
a POSIX timer thread, which you can also easily identify because it
calls timer_thread(). There aren't two signal threads. Everything
actually looks fine in your trace.

Also, in your original post, you mentioned finding "kernel threads" in
addition to the main thread. You *can't* see kernel threads with gdb.
Kernel threads run in the kernel. What you saw are really user space
threads.

Be as careful and rigorous as possible in your reports, in particular
when it involves system internals many people are not familiar with.
Don't hesitate to ask about stuff you're not sure about, though.

-- 
Richard Braun



Re: IRC meeting 2011-05-25 (Wednesday!)

2011-05-24 Thread Richard Braun
On Sun, May 22, 2011 at 08:46:50PM -0500, Oz wrote:
> do you all ever hold meetings on saturdays?
> 
> what day of the week are most of you off from work?

Personally, when I'm not working, I'm not on IRC either, nor reading
mail. So Saturdays or Sundays would be problematic too. I guess it may
be the same for other people as well.

-- 
Richard Braun



Bash and virtual mappings

2011-07-14 Thread Richard Braun
Hello,

I've noticed something strange with the shell recently:

$ vminfo $$ | wc -l
1038

Why would bash use so many map entries?

-- 
Richard Braun



Re: Real-time operation of Mach microkernel?

2011-07-21 Thread Richard Braun
On Mon, Jul 18, 2011 at 03:58:40PM -0400, John Wason wrote:
> I am a robotics engineer and find Hurd's design interesting.  It is  
> difficult to run robotic software on a full operating system because the  
> kernel is way to complicated to specialize for a real-time robot system.  
> Real-time linux does exist, but you still end up dealing with all the 
> complexity of linux at the low-level control loop.  Are there any plans 
> for doing a real-time version of the Mach kernel?  I think it would 
> actually be a "relatively minor" change to a microkernel when compared to 
> the full linux kernel.  Basically certain callback function would need to 
> be executed at exactly the right time, and have the ability to read and 
> write to hardware very quickly at a low level.  I curretly use a program 
> that is more or less dos, and uses raw pci equivalents of "inp" and 
> "outp".  Some DMA control may be useful as well.

First, don't confuse Mach and the Hurd; what you're asking actually has
little to do with the latter.

There already are real-time versions of Mach (see [1] for an example),
but they're mostly abandoned. Concerning GNU Mach, the current priority
is userspace drivers through DDE (which makes real-time hardware
access even harder), and there are no plans to make it real-time in
the near future.

-- 
Richard Braun

[1] http://www.rtmach.org/index-e.html



Re: Degradation of GNU/Hurd ``system performance''

2011-09-06 Thread Richard Braun
On Mon, Sep 05, 2011 at 06:42:22AM +0200, olafbuddenha...@gmx.net wrote:
> The slowdown might be a result of the growing memory usage, causing a
> need to swap all the time. There is no audible thrashing though while
> the system behaves sluggish... And also, it wouldn't explain the crash
> while there is still plenty of free swap.
> 
> Another possible explanation would be some kind of fragmentation, making
> VM operations increasingly more costly, and finally causing some kind of
> resource exhaustion. (The disappearing memory however wouldn't be
> explained by this either I guess...)
> 
> Either way, I have no idea how to go about narrowing down the problem
> :-(

Here is a simple suggestion: try to measure the depth of the shadow
object chains.
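
Something as simple as this would do for a first measurement (a rough
sketch: locking is ignored, and it assumes each vm_object points to the
object it shadows through its shadow field):

  static int
  shadow_chain_depth (vm_object_t object)
  {
      int depth = 0;

      while (object != VM_OBJECT_NULL) {
          depth++;
          object = object->shadow;
      }

      return depth;
  }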

-- 
Richard Braun



Re: COPY_DELAY could perform worse than COPY_NONE

2011-10-08 Thread Richard Braun
On Sat, Oct 08, 2011 at 01:55:42AM +0200, Sergio López wrote:
> Nowadays, the copy strategy is decided per-object, but I think the
> desired type of access to an object should also be considered. For
> instance, the operation generated by vm_copy should almost always use
> COPY_NONE, while vm_map, when doing things such as providing a private
> map, should use COPY_DELAY.

I really don't see why vm_copy() should use COPY_NONE. Could you explain
it briefly? (Sorry if I'm asking about something you may already have
described, but I feel it would be better with a clear, simple,
centralized explanation.)

-- 
Richard Braun



Re: Calling vm_object_deactivate_pages for each vm_object_deallocate kills performance.

2011-10-08 Thread Richard Braun
On Sat, Oct 08, 2011 at 02:06:04AM +0200, Sergio López wrote:
> Reducing lock contention would surely help.

vm_page_queue_lock is a simple lock if I'm not mistaken, which means
it's a nop in our case. We shouldn't change anything we can't measure
and test easily unless it's really simple enough.

-- 
Richard Braun



Re: Disk drivers at SPL6 rather than 5?

2011-10-10 Thread Richard Braun
On Mon, Oct 10, 2011 at 12:58:50AM +0200, Samuel Thibault wrote:
> The sym53c8xx vs rtl8139 issue is actually partly due to block drivers
> running at spl5, while net drivers running at spl6. Is there a reason
> for not raising block drivers to spl6, to just drop that issue?  I can't
> find any reason in our current code.

Looks fine, yes. I believe the reason was mainly historical anyway.

-- 
Richard Braun



Requirements concerning GNU Mach additions

2011-10-10 Thread Richard Braun
Hello,

What are the requirements concerning additions (specifically, new files)
in GNU Mach? In particular, what coding style should be used for them?

-- 
Richard Braun



Re: COPY_DELAY could perform worse than COPY_NONE

2011-10-14 Thread Richard Braun
On Fri, Oct 14, 2011 at 01:01:35AM +0200, Sergio López wrote:
> vm_object_copy_delayed(), which is being used when copying from an
> object with the MEMORY_OBJECT_COPY_DELAY attribute, is not designed
> for doing small, repetitive reads from an object (which is what we're
> usually doing with vm_copy). Using it this way is slower than copying
> the pages without any optimization (as with MEMORY_OBJECT_COPY_NONE).

I assume you're comparing copies with no further modification to the
transferred buffers. In this case, it seems more like another effect of
the Mach VM issues. I agree that trying to fix those is tedious, and
even dangerous, but the workaround you're suggesting doesn't look
appropriate either. The speed of raw and virtual copies is very
dependent on both the size of the transfers and the hardware used. We
shouldn't change that behaviour until at least solid measurements are
available. Even then, it's still a hack. I agree, however, that
the access type should be considered. In order to avoid changing the
existing interface, a new call, say vm_copy_raw(), could be added and
unconditionally use the COPY_NONE strategy, for applications which
know they'll modify the received buffer, or frequently transfer small
amounts of data. The same could be done for vm_read/vm_write of course.

-- 
Richard Braun



Re: Braun/Planeta slab allocator (was: GNU Hurd development blog: 2011-q3)

2011-11-24 Thread Richard Braun
On Thu, Nov 24, 2011 at 04:02:25PM +0200, Maksym Planeta wrote:
> And thanks, Richard, for mentoring me. You helped me a lot, especially
> with your code style corrections. I'll try to keep all your suggestions.

You're welcome, although I hope I didn't inadvertently make you value
form more than content :-).

> Difference isn't big, but I disagree that it's just a noise, because
> this difference is systematic and constant in some range. At least
> according to tests that I made in September, but your results don't
> disprove it.

Still, people really shouldn't expect any difference in performance. The
pros are really 1/ reduced fragmentation, 2/ clean, maintainable code
and 3/ debugging features (although being SMP-ready is nice too). The
slab allocator may well be *slower* for some workloads because of
the effort it makes to keep fragmentation low. But as seen when
observing the allocator statistics, the few arbitrary limits still
present, most importantly the hard limit on the maximum number of
objects retained in the page cache, prevent the kernel from having "too
many" allocated objects.

For those interested in getting statistics from a kernel using the slab
allocator, a slabinfo [1] tool is publicly available (although it will
take some time to get it into the Hurd, because the mach_debug
interface is not exported; I'll open a discussion on that soon).

-- 
Richard Braun

[1] http://git.sceen.net/rbraun/slabinfo.git/



Re: GNUnet News: vfork and the signal race

2011-11-26 Thread Richard Braun
On Sat, Nov 26, 2011 at 11:35:29AM +0100, Samuel Thibault wrote:
> > (Naturally, the result would be non-portable for systems where fork==vfork,
> > but then maybe implementing vfork as fork is the bug? ;-))
> 
> It's what the norm says.

I think you meant the norm explicitly says vfork can be implemented as
fork.

> Now, about the problem you mention on the webpage, isn't it possible to
> use a semaphore to tell the hypervisor when the "just-forked" process
> has changed its signal handler for something that is fine with SIGTERM?
> Or use a signal back (IIRC Xorg does this). That would implement the
> "make father wait" in a portable way.

How about blocking signals until the child is actually able to handle
SIGTERM correctly? POSIX clearly states signal masks are inherited across
fork. AIUI, this case doesn't need the parent to wait, only to avoid
the kill/exec race.
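
Roughly (plain POSIX; install_sigterm_handler() is a hypothetical helper
for whatever the child needs to set up):

  #include <signal.h>
  #include <unistd.h>

  pid_t spawn (void)
  {
    sigset_t set, old;
    pid_t pid;

    sigemptyset (&set);
    sigaddset (&set, SIGTERM);
    sigprocmask (SIG_BLOCK, &set, &old);  /* mask is inherited by the child */

    pid = fork ();
    if (pid == 0)
      {
        install_sigterm_handler ();             /* hypothetical helper */
        sigprocmask (SIG_SETMASK, &old, NULL);  /* now safe to be killed */
        /* ... continue or exec ... */
      }
    else if (pid > 0)
      sigprocmask (SIG_SETMASK, &old, NULL);    /* parent restores its mask */

    return pid;
  }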

-- 
Richard Braun



Slab allocation in GNU Mach - git integration branch ready

2011-12-17 Thread Richard Braun
Hello,

The master-slab branch of GNU Mach has been pushed to the public
repository. This branch replaces the old zone allocator with an improved
implementation for kernel allocations, based on work by Maksym Planeta
and myself. It should be readily mergeable into the master branch,
provided enough positive feedback is obtained, so feel free to test
and report your comments.

Thanks.

-- 
Richard Braun



Packet capture

2012-01-28 Thread Richard Braun
Hello,

I'm currently working on getting libpcap to a decent state on the Hurd.
As already discussed on IRC, permission checking will be performed by
the new devopen translator (devnode in the incubator/dde branch), and
libpcap will include a new Hurd-specific capture module. For now, I've
written a (Debian-specific) patch that includes a first experimental
version of this module. It currently uses the embedded BPF filter code
in libpcap so that it can be used with both GNU Mach and DDE drivers
(I know GNU Mach supports BPF filters in the kernel, but I'm not sure
about DDE, and not sure either that it's worth the effort, as probably
no one uses the Hurd for any performance-critical application; comments
are welcome).

The Debian packages are available at my brand new repository. Add these
to your sources.list file (ftp:// access should also work) :
deb http://ftp.sceen.net/debian-hurd experimental/
deb-src http://ftp.sceen.net/debian-hurd experimental/

For the curious, darnassus.sceen.net is already using the experimental
libpcap0.8 code. Both tcpdump and wireshark were tested with success.

If anyone knows an application that makes use of pcap_inject(), I'd be
happy to hear about it.

Enjoy.

-- 
Richard Braun



Re: Packet capture

2012-01-30 Thread Richard Braun
On Sat, Jan 28, 2012 at 08:56:29PM +0100, Richard Braun wrote:
> The Debian packages are available at my brand new repository. Add these
> to your sources.list file (ftp:// access should also work) :
> deb http://ftp.sceen.net/debian-hurd experimental/
> deb-src http://ftp.sceen.net/debian-hurd experimental/

For those who tried, the repository setup has been fixed :).

-- 
Richard Braun



Re: Packet capture

2012-03-03 Thread Richard Braun
On Sat, Jan 28, 2012 at 08:56:29PM +0100, Richard Braun wrote:
> I'm currently working on getting libpcap to a decent state on the Hurd.

> The Debian packages are available at my brand new repository. Add these
> to your sources.list file (ftp:// access should also work) :
> deb http://ftp.sceen.net/debian-hurd experimental/
> deb-src http://ftp.sceen.net/debian-hurd experimental/

> If anyone knows an application that makes use of pcap_inject(), I'd be
> happy to hear about it.

The latest libpcap packages now include support for pcap_inject(), and
a new tcpreplay package has been uploaded next to them. Feedback is
welcome.

-- 
Richard Braun



Re: 64bit GNU Mach

2012-04-02 Thread Richard Braun
On Sun, Apr 01, 2012 at 05:45:19PM +0200, Samuel Thibault wrote:
> I don't really see why there should be a "size of a pointer" property
> anyway: AIUI, vm_offset_t and vm_size_t are there for that. I rather
> see natural_t and integer_t as a "nice integer" type, i.e. something
> that can hold big enough values (e.g. >32bit), but not too big either
> (there is usually no point in having 64bit values), and thus the int
> type is "just fine". It is currently essentially used in RPCs to provide
> statistics, and for the RPC mechanism itself.

Kernel port names are addresses of their target IPC port object. There
are numerous occurrences of this in e.g. ipc/mach_msg.c or
ipc/ipc_kmsg.c. But it's an optimization and can be worked around. As
a simple solution, the kernel allocator could be tuned to return only
low addresses, so that natural_t could remain 32-bits (since userspace
is 32-bits), and keep the optmization inside the kernel. It would also
require some work on the IPC space table, since it's allocated using
VM functions.

> - The actual RPC mechanism however needs some decisions, notably the
> mach_port_t type.  For now it is a vm_offset_t type, and the kernel
> ipc mechanism will assume pointer alignment. Of course, 32bit userland
> will use a 32bit type and 32bit alignment there, so conversion will
> be needed. I've however noticed that there, xnu also simply used the
> natural_t type for port numbers, which is thus the same for 32 and 64bit
> builds, and saves a lot of headaches. I guess we don't really really
> need >2^32 ports anyway, so wouldn't it make sense to use a 32bit type
> too for a 64bit port?

I guess it's the preferable way. I'm not sure Xnu "chose" anything
there, as port names were intended early on to be natural_t (because of
the optimization described above).

-- 
Richard Braun



Re: 64bit GNU Mach

2012-04-02 Thread Richard Braun
On Mon, Apr 02, 2012 at 12:35:25PM +0200, Samuel Thibault wrote:
> Richard Braun, le Mon 02 Apr 2012 12:03:34 +0200, a écrit :
> > Kernel port names are addresses of their target IPC port object.
> 
> Inside the kernel, yes. In the userland processes, no. That's the
> difference that xnu makes between mach_port_t (kernelland) and
> mach_port_name_t (userland) (sorry I forgot to mention that in my
> previous mail).

[...]

> We can also rather simply fix the code that converts from names to
> pointers into using different sizes, like xnu does.

How do they convert names from/to pointers? Regular IPC space lookups?

-- 
Richard Braun



Re: 64bit GNU Mach

2012-04-02 Thread Richard Braun
On Mon, Apr 02, 2012 at 01:57:31PM +0200, Samuel Thibault wrote:
> Richard Braun, le Mon 02 Apr 2012 13:46:08 +0200, a écrit :
> > How do they convert names from/to pointers ? Regular IPC space lookups ?
> 
> Well, yes, just like GNU Mach does, in ipc_kmsg_copyin_header etc.

So we'd lose the optimization, which doesn't seem to be a problem
considering it doesn't concern our main performance issues, and we'd
gain a cleaner interface between ports and names. From what I've seen,
the Hurd doesn't rely on the size of port names either, as it always
uses some lookup mechanism (actually only ihash, if I'm right). Now, is
there a point in having two types for kernel/user port names, or can't we
just stick to mach_port_t having the right size (the one which matters
for user tasks)?

-- 
Richard Braun



Re: 64bit GNU Mach

2012-04-02 Thread Richard Braun
On Mon, Apr 02, 2012 at 02:18:34PM +0200, Samuel Thibault wrote:
> But then it needs some trick to convert to pointer everywhere needed.
> Trying to manage low-4GiB allocation to avoid the trick would make
> things nasty: remember that x86_64 does not have segmentation any more,
> only flat addressing space, so we'd still need a 4GiB User/Kernel
> separation, which was precisely what I wanted to avoid.

I'm not following you there. The conversion to pointer would be a simple
lookup (changing all occurrences in the kernel can be tedious, agreed).
I don't see the relation with segmentation and the 4 GiB split. What is
the layout you expect for the kernel space? First 4 GiB user, then
kernel? And you thought of segmentation to implicitly shift
addresses? IMHO, changing GNU Mach to cleanly convert port names where
needed remains the sane choice.

-- 
Richard Braun



Re: 64bit GNU Mach

2012-04-02 Thread Richard Braun
On Mon, Apr 02, 2012 at 02:39:05PM +0200, Samuel Thibault wrote:
> > I don't see the relation with segmentation and the 4GiB split.
> 
> I said: "to avoid the trick", i.e. just use 32bit pointers, to just use
> the same type as in userland as you suggested.
> 
> > What is the layout you expect for the kernel space ? First 4 GiB user
> > then kernel ?
> 
> First 4GiB user, last 4GiB kernel.
> 
> > And you thought of segmentation to implicitely shift addresses ?
> 
> Again, there is no segmentation in x86_64.

I was wondering why you referred to segmentation in the first place.
I guess segmentation is what you referred to as "the trick". It wasn't
a suggestion :).

> > IMHO, changing GNU Mach to cleanly convert port names where needed
> > remains the sane choice.
> 
> That is, use 32bit port names for userland, and convert to 64bit port
> addresses for kernelland. But that can only work if using different
> types.

I believe we're thinking of two different things here. My current idea
of the solution is to directly convert names (32 bits) to ports or other
IPC objects (e.g. port sets, 64 bits). You may be thinking of converting
user names (32 bits) to kernel names (64 bits), then kernel names to IPC
objects. Am I right?

-- 
Richard Braun



Re: 64bit GNU Mach

2012-04-02 Thread Richard Braun
On Mon, Apr 02, 2012 at 02:54:17PM +0200, Samuel Thibault wrote:
> > I believe we're thinking of two different things here. My current idea
> > of the solution is to directly convert names (32-bits) to ports or other
> > IPC objects (e.g. port sets, 64-bits).
> 
> That's what I'm suggesting from the beginning. That needs two differents
> types (64bit mach_port_t and 32bit mach_port_name_t), where there is
> currently just one (vm_offset_t mach_port_t).

Then I don't see why you want two different types. There would be one
32-bit mach_port_t, and no mach_port_name_t.

-- 
Richard Braun



Re: 64bit GNU Mach

2012-04-02 Thread Richard Braun
On Mon, Apr 02, 2012 at 03:21:06PM +0200, Samuel Thibault wrote:
> Because there is code in GNU Mach that uses mach_port_t as a pointer,
> see e.g. kdportdeath(), which just casts mach_port_t into ipc_port_t, or
> ipc_kmsg_copyin_header which casts back the just-resolved object into
> mach_port_t. Such code might have to simply use the ipc_port_t type
> instead, I don't know. What I know for sure, however, is that xnu does
> typedef ipc_port_t mach_port_t, which makes me tend to think that in the
> kernel, mach_port_t is supposed to be a pointer, while mach_port_name_t
> is for the port names.

Keeping names-to-pointer casts wouldn't be the cleanest way, but it
may save some effort. Using ipc_port_t seems better to me, but as it's
completely internal to the kernel, it doesn't matter much. To answer
your initial question, yes, it doesn't hurt much and makes sense to use
a 32-bit type for port names, whatever the word size.

-- 
Richard Braun



Re: Setting behavior for clustered IO

2012-04-02 Thread Richard Braun
On Mon, Apr 02, 2012 at 12:23:03AM +0300, Maksym Planeta wrote:
> The project I suppose to work on is not new for GSoC. Two years ago a
> student, whose nick on irc channel is kam, tried to implement readahead,
> but didn't manage to do that for some reason. I looked through patch he
> has sent and there he appended function vm_behavior_set for setting
> behavior of page fault handler. As parameter, this function  accepts
> constant, that specifies behavior in way that is similar to the way is
> implemented in madvise.

This function would better be named vm_advise() (think of vm_map and
vm_protect, which are named mmap and mprotect on BSD/POSIX systems).

> But I think, that if difference between behavior lies only in size of
> chunk to be read it will be more flexible to accept as parameter of
> vm_behavior_set not constant, that represents behavior, but size of
> chunk. And madvise implementation in library will convert behavior name
> to size of chunk (and for sure do other work). This will allow do more
> work in user space and add to kernel less modifications.

I don't think it's the right way to achieve that. Fault handling
parameters should be handled by the same code. If it is extracted from
the kernel some day, then several alternative implementations could use
different policies for readahead. But putting such decisions in the
kernel or libc makes little difference in practice. Also, don't forget
that it's not simply forward readahead: most implementations also read
pages preceding the fault target. Our vm_advise() should follow the
existing POSIX interface, taking into account any Mach-specific detail.
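
Something along these lines, in other words (a sketch; the name and the
advice values are illustrative only, mirroring madvise(addr, length,
advice)):

  kern_return_t vm_advise (vm_task_t     target_task,
                           vm_address_t  address,
                           vm_size_t     size,
                           int           advice);  /* e.g. VM_ADVICE_DEFAULT,
                                                      VM_ADVICE_SEQUENTIAL,
                                                      VM_ADVICE_RANDOM */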

> Also kam takes into account the direction of reading for sequential
> behavior, but I doubt if this really needed, because it seems to be
> unusual way of reading. Additionally madvise doesn't have appropriate
> parameters which could be used to inform kernel about direction of
> reading.
> 
> Am I right or is it better to do like kam was doing?

I don't think it's needed, no. Again, make it easy to implement the
POSIX call. That's plenty for a first attempt.

-- 
Richard Braun



Re: Setting behavior for clustered IO

2012-04-06 Thread Richard Braun
On Mon, Apr 02, 2012 at 08:43:41PM +0300, Maksym Planeta wrote:
> Sergio Lopez  writes:
> > Instead of implementing a new call, I think
> > memory_object_change_attributes should be extended to support setting
> > the cluster size for a given object.
> 
> Indeed, this seems to be better.

No, as 1/ the call applies to a memory range, like vm_protect, and
2/ it's about more than a cluster size.

As a side note, I'd like to point out that it's more important to focus on
the kernel/pagers protocol than applications/POSIX support. For now, you
could even mostly ignore that readahead policy can be changed (keep it
in mind to isolate its implementation, but adding a default policy only
would still do). The vm_advise call should be added later, when actually
allowing applications to change the policy.

-- 
Richard Braun



[PATCH] vm_map: augment VM maps with a red-black tree

2012-04-23 Thread Richard Braun
Because of heavy map entry fragmentation and a poor map entry merging
implementation, the kernel can sometimes need thousands of map entries
to describe the address space of a task (most often a pager). This
change introduces a red-black tree in the VM maps with the purpose of
speeding up entry lookups.

Feedback on performance improvements using this patch is welcome.
---
 version.m4  |2 +-
 vm/vm_map.c |  112 --
 vm/vm_map.h |   11 --
 3 files changed, 71 insertions(+), 54 deletions(-)

diff --git a/version.m4 b/version.m4
index 3bf4275..4185e45 100644
--- a/version.m4
+++ b/version.m4
@@ -1,4 +1,4 @@
 m4_define([AC_PACKAGE_NAME],[GNU Mach])
-m4_define([AC_PACKAGE_VERSION],[1.3.99])
+m4_define([AC_PACKAGE_VERSION],[1.3.99_redblackify_vmmap])
 m4_define([AC_PACKAGE_BUGREPORT],[bug-hurd@gnu.org])
 m4_define([AC_PACKAGE_TARNAME],[gnumach])
diff --git a/vm/vm_map.c b/vm/vm_map.c
index 8012bcf..487f8bb 100644
--- a/vm/vm_map.c
+++ b/vm/vm_map.c
@@ -42,6 +42,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -213,6 +214,7 @@ void vm_map_setup(map, pmap, min, max, pageable)
vm_map_last_entry(map)  = vm_map_to_entry(map);
map->hdr.nentries = 0;
map->hdr.entries_pageable = pageable;
+   rbtree_init(&map->hdr.tree);
 
map->size = 0;
map->ref_count = 1;
@@ -307,9 +309,40 @@ void _vm_map_entry_dispose(map_header, entry)
 }
 
 /*
+ * Red-black tree lookup/insert comparison functions
+ */
+static inline int vm_map_entry_cmp_lookup(vm_offset_t addr,
+  const struct rbtree_node *node)
+{
+struct vm_map_entry *entry;
+
+entry = rbtree_entry(node, struct vm_map_entry, tree_node);
+
+if (addr >= entry->vme_end)
+return 1;
+
+if (addr >= entry->vme_start)
+return 0;
+
+return -1;
+}
+
+static inline int vm_map_entry_cmp_insert(const struct rbtree_node *a,
+  const struct rbtree_node *b)
+{
+struct vm_map_entry *entry;
+
+entry = rbtree_entry(a, struct vm_map_entry, tree_node);
+return vm_map_entry_cmp_lookup(entry->vme_start, b);
+}
+
+/*
  * vm_map_entry_{un,}link:
  *
  * Insert/remove entries from maps (or map copies).
+ *
+ * The start and end addresses of the entries must be properly set
+ * before using these macros.
  */
 #define vm_map_entry_link(map, after_where, entry) \
_vm_map_entry_link(&(map)->hdr, after_where, entry)
@@ -324,6 +357,8 @@ void _vm_map_entry_dispose(map_header, entry)
(entry)->vme_next = (after_where)->vme_next;\
(entry)->vme_prev->vme_next =   \
 (entry)->vme_next->vme_prev = (entry); \
+   rbtree_insert(&(hdr)->tree, &(entry)->tree_node,\
+ vm_map_entry_cmp_insert); \
MACRO_END
 
 #define vm_map_entry_unlink(map, entry)\
@@ -337,6 +372,7 @@ void _vm_map_entry_dispose(map_header, entry)
(hdr)->nentries--;  \
(entry)->vme_next->vme_prev = (entry)->vme_prev; \
(entry)->vme_prev->vme_next = (entry)->vme_next; \
+   rbtree_remove(&(hdr)->tree, &(entry)->tree_node);   \
MACRO_END
 
 /*
@@ -413,70 +449,41 @@ boolean_t vm_map_lookup_entry(map, address, entry)
register vm_offset_taddress;
vm_map_entry_t  *entry; /* OUT */
 {
-   register vm_map_entry_t cur;
-   register vm_map_entry_t last;
+   register struct rbtree_node *node;
+   register vm_map_entry_t hint;
 
/*
-*  Start looking either from the head of the
-*  list, or from the hint.
+*  First, make a quick check to see if we are already
+*  looking at the entry we want (which is often the case).
 */
 
simple_lock(&map->hint_lock);
-   cur = map->hint;
+   hint = map->hint;
simple_unlock(&map->hint_lock);
 
-   if (cur == vm_map_to_entry(map))
-   cur = cur->vme_next;
-
-   if (address >= cur->vme_start) {
-   /*
-*  Go from hint to end of list.
-*
-*  But first, make a quick check to see if
-*  we are already looking at the entry we
-*  want (which is usually the case).
-*  Note also that we don't need to save the hint
-*  here... it is the same hint (unless we are
-*  at the header, in which case the hint didn't
-*  buy us anything anyway).
-*/
-   last = vm_map_to_entry(map);
-   if ((cur != last) && (cur->vme_end > address)) {
-   *entry = cur;
-   return(TRUE);
-   }
-   }
-   else {
-   

[PATCH v2] vm_map: augment VM maps with a red-black tree

2012-04-25 Thread Richard Braun
Because of heavy map entry fragmentation and a poor map entry merging
implementation, the kernel can sometimes need thousands of map entries
to describe the address space of a task (most often a pager). This
change introduces a red-black tree in the VM maps with the purpose of
speeding up entry lookups.

This new version adds code to check whether the hint immediately precedes
an address (in addition to containing the address), which increases the
hint hit ratio by around 2%, a value decreasing as the number of entries
gets larger.

Feedback on performance improvements using this patch is welcome.
---
 version.m4  |2 +-
 vm/vm_map.c |  117 ++-
 vm/vm_map.h |   11 --
 3 files changed, 77 insertions(+), 53 deletions(-)

diff --git a/version.m4 b/version.m4
index 3bf4275..e4e0d6c 100644
--- a/version.m4
+++ b/version.m4
@@ -1,4 +1,4 @@
 m4_define([AC_PACKAGE_NAME],[GNU Mach])
-m4_define([AC_PACKAGE_VERSION],[1.3.99])
+m4_define([AC_PACKAGE_VERSION],[1.3.99_redblackify_vmmap_v2])
 m4_define([AC_PACKAGE_BUGREPORT],[bug-hurd@gnu.org])
 m4_define([AC_PACKAGE_TARNAME],[gnumach])
diff --git a/vm/vm_map.c b/vm/vm_map.c
index 8012bcf..c46afc0 100644
--- a/vm/vm_map.c
+++ b/vm/vm_map.c
@@ -42,6 +42,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -95,7 +96,7 @@ MACRO_END
  * Synchronization is required prior to most operations.
  *
  * Maps consist of an ordered doubly-linked list of simple
- * entries; a single hint is used to speed up lookups.
+ * entries; a hint and a red-black tree are used to speed up lookups.
  *
  * Sharing maps have been deleted from this version of Mach.
  * All shared objects are now mapped directly into the respective
@@ -213,6 +214,7 @@ void vm_map_setup(map, pmap, min, max, pageable)
vm_map_last_entry(map)  = vm_map_to_entry(map);
map->hdr.nentries = 0;
map->hdr.entries_pageable = pageable;
+   rbtree_init(&map->hdr.tree);
 
map->size = 0;
map->ref_count = 1;
@@ -307,9 +309,39 @@ void _vm_map_entry_dispose(map_header, entry)
 }
 
 /*
+ * Red-black tree lookup/insert comparison functions
+ */
+static inline int vm_map_entry_cmp_lookup(vm_offset_t addr,
+  const struct rbtree_node *node)
+{
+   struct vm_map_entry *entry;
+
+   entry = rbtree_entry(node, struct vm_map_entry, tree_node);
+
+   if (addr < entry->vme_start)
+   return -1;
+   else if (addr < entry->vme_end)
+   return 0;
+   else
+   return 1;
+}
+
+static inline int vm_map_entry_cmp_insert(const struct rbtree_node *a,
+  const struct rbtree_node *b)
+{
+   struct vm_map_entry *entry;
+
+   entry = rbtree_entry(a, struct vm_map_entry, tree_node);
+   return vm_map_entry_cmp_lookup(entry->vme_start, b);
+}
+
+/*
  * vm_map_entry_{un,}link:
  *
  * Insert/remove entries from maps (or map copies).
+ *
+ * The start and end addresses of the entries must be properly set
+ * before using these macros.
  */
 #define vm_map_entry_link(map, after_where, entry) \
_vm_map_entry_link(&(map)->hdr, after_where, entry)
@@ -324,6 +356,8 @@ void _vm_map_entry_dispose(map_header, entry)
(entry)->vme_next = (after_where)->vme_next;\
(entry)->vme_prev->vme_next =   \
 (entry)->vme_next->vme_prev = (entry); \
+   rbtree_insert(&(hdr)->tree, &(entry)->tree_node,\
+ vm_map_entry_cmp_insert); \
MACRO_END
 
 #define vm_map_entry_unlink(map, entry)\
@@ -337,6 +371,7 @@ void _vm_map_entry_dispose(map_header, entry)
(hdr)->nentries--;  \
(entry)->vme_next->vme_prev = (entry)->vme_prev; \
(entry)->vme_prev->vme_next = (entry)->vme_next; \
+   rbtree_remove(&(hdr)->tree, &(entry)->tree_node);   \
MACRO_END
 
 /*
@@ -413,70 +448,49 @@ boolean_t vm_map_lookup_entry(map, address, entry)
register vm_offset_taddress;
vm_map_entry_t  *entry; /* OUT */
 {
-   register vm_map_entry_t cur;
-   register vm_map_entry_t last;
+   register struct rbtree_node *node;
+   register vm_map_entry_t hint;
 
/*
-*  Start looking either from the head of the
-*  list, or from the hint.
+*  First, make a quick check to see if we are already
+*  looking at the entry we want (which is often the case).
 */
 
simple_lock(&map->hint_lock);
-   cur = map->hint;
+   hint = map->hint;
simple_unlock(&map->hint_lock);
 
-   if (cur == vm_map_to_entry(map))
-   cur = cur->vme_next;
-
-   if (address >= cur->vme_start) {
-   /*
-*  Go from hint 

Re: »No more PTYs« message

2012-05-25 Thread Richard Braun
On Fri, May 25, 2012 at 11:40:47AM +0200, Samuel Thibault wrote:
> Also, it only happens on some boxes, on buildds and my hurd box, I don't
> have the issue. It's very simple to test actually: log off from the
> machine, log on again, and check whether ssh was able to reuse the tty
> you had.

On darnassus, this bug was seen faster when syslogd was enabled.

-- 
Richard Braun



Re: Moving ufs to pthreads

2012-06-17 Thread Richard Braun
On Sun, Jun 17, 2012 at 02:58:06PM -0700, Thomas Thomas wrote:
> PS - Does anyone use UFS?

Probably not.

-- 
Richard Braun



Page cache issues

2012-07-03 Thread Richard Braun
Hello,

I'm currently working on the VM cache as implemented by GNU Mach, and
more specifically on the maximum cached objects limit, the goal being
to remove it and let memory pressure be the only flush trigger.

But when removing the limit, I experience far more ext2fs stability
issues than usual. After digging a bit into the problem, I've traced it
to pager_sync (called from diskfs_sync_everything). There doesn't seem
to be any obvious locking issue there, and the completion function looks
right too. What happens between these two functions ?

In addition, this problem may be related to the libpager deadlock issue
reported some years ago [1] [2], although it happens with both the
Debian hurd package and the upstream Hurd code. I regularly get
segmentation faults, but also deadlocks or infinite loops (which makes
me think of stack corruptions). If someone with more experience with
this part of the Hurd would take a look, it would be appreciated.

-- 
Richard Braun

[1] http://lists.gnu.org/archive/html/bug-hurd/2010-03/msg00127.html
[2] http://lists.gnu.org/archive/html/bug-hurd/2010-04/msg00012.html



Re: procfs, separate repo?

2012-07-03 Thread Richard Braun
On Sun, Jul 01, 2012 at 12:59:47PM +0700, Ivan Shmakov wrote:
> >>>>> Samuel Thibault  writes:
> 
>  > Should we keep procfs in a separate repository, or merge it into the
>  > main hurd repository?
> 
>  > Generally enough, did we write a guideline somewhere as to what
>  > should be in the main hurd repository, or should be separate?
> 
>   Given that Git has support submodules, but not (AIUI) for
>   repository merging and splitting, my opinion would be to keep
>   all but the bare minimum off the main Hurd repository.
> 
>   There could be a kind of hurd-full.git repository, which has all
>   the relevant submodules' configuration to tie all the Hurd
>   repositories together, though.

Unless it's very easy to use submodules, we should use one repository.
Other projects with much more content and history have shown it's
perfectly sane to keep that much in one place, and it simplifies keeping
the tightly coupled modules of the Hurd in sync.

-- 
Richard Braun



[PATCH] Fix stack corruption in ext2fs server

2012-07-03 Thread Richard Braun
* ext2fs/inode.c (diskfs_node_iterate): allocate the temporary node
table from the heap instead of the stack.

Signed-off-by: Richard Braun 
---
 ext2fs/inode.c |   12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/ext2fs/inode.c b/ext2fs/inode.c
index f25cc1f..2da8a95 100644
--- a/ext2fs/inode.c
+++ b/ext2fs/inode.c
@@ -552,7 +552,16 @@ diskfs_node_iterate (error_t (*fun)(struct node *))
 for (node = nodehash[n]; node; node = node->dn->hnext)
   num_nodes++;
 
-  node_list = alloca (num_nodes * sizeof (struct node *));
+  /* TODO This method doesn't scale beyond a few dozen nodes and should be
+ replaced.  */
+  node_list = malloc (num_nodes * sizeof (struct node *));
+  if (node_list == NULL)
+{
+  spin_unlock (&diskfs_node_refcnt_lock);
+  ext2_debug ("unable to allocate temporary node table");
+  return ENOMEM;
+}
+
   p = node_list;
   for (n = 0; n < INOHSZ; n++)
 for (node = nodehash[n]; node; node = node->dn->hnext)
@@ -576,6 +585,7 @@ diskfs_node_iterate (error_t (*fun)(struct node *))
   diskfs_nrele (node);
 }
 
+  free (node_list);
   return err;
 }
 
-- 
1.7.10.4




Re: [PATCH] Fix stack corruption in ext2fs server

2012-07-03 Thread Richard Braun
On Tue, Jul 03, 2012 at 09:13:36PM -0300, Samuel Thibault wrote:
> Richard Braun, le Tue 03 Jul 2012 23:13:26 +0200, a écrit :
> > * ext2fs/inode.c (diskfs_node_iterate): allocate the temporary node
> > table from the heap instead of the stack.
> 
> That can be quite a lot for the stack indeed, thanks!

It's also quite a lot of heap too, and every 5 seconds. We should
consider increasing the interval between global syncs to reduce the
impact of this potentially heavy operation.

-- 
Richard Braun



Re: procfs, separate repo?

2012-07-03 Thread Richard Braun
On Wed, Jul 04, 2012 at 08:29:13AM +0200, Thomas Schwinge wrote:
> On Tue, 3 Jul 2012 21:19:12 +0200, Richard Braun  wrote:
> > Unless it's very easy to use submodules, we should use one repository.
> > Other projects with much more content and history have showed it's
> > perfectly sane to keep that much in one place, and it simplifies keeping
> > the tightly coupled modules of the Hurd in sync.
> 
> On the other Hand, one of the Hurd's unique characteristics is that it is
> *not* a big monolothic blob like other systems, but instead does have
> clearly defined interfaces allowing (and encouraging!) for separation of
> components.  I think that keeping our operating system modules separate
> helps to highlight this fact, which is why I already years ago favored
> separation over putting it all into one repository.  Of course,
> decoupling stuff too much comes with additional maintenance burden, so we
> have to set a limit somewhere.  But, packaging procfs as its own Debian
> package (like other translators are too) from its own source tree
> shouldn't need much maintenance effort, I think?

It's also one of the things that make it tedious to change things, and
can keep developers away (me being one of them). Also, I consider procfs
part of the core servers as it is required for some often used coreutils
and psmisc tools to work. It's fine to keep the less often used (and
maintained) extra Hurd programs (such as xmlfs) external, and we could
even remove those that got to this state (such as ufs and bsdfsck) from
the main repository, but anything that is used and actively maintained
(or very simple to keep in sync) should be close to the core code.

-- 
Richard Braun



Re: [PATCH] Fix stack corruption in ext2fs server

2012-07-04 Thread Richard Braun
On Wed, Jul 04, 2012 at 07:58:13AM -0300, Samuel Thibault wrote:
> Richard Braun, le Wed 04 Jul 2012 08:22:13 +0200, a écrit :
> > It's also quite a lot of heap too, and every 5 seconds. We should
> > consider increasing the interval between global syncs to reduce the
> > impact of this potentially heavy operation.
> 
> It would increase the amount of work you loose on crash. Did you measure
> the CPU use that you get with a typical filesystem, by e.g. touching a
> file every 4 seconds to trigger the sync all the time?

No.

> I guess the real fix would be to be better at knowing which inodes need
> to be recorded, and filter on that.

Yes, hence the comment in my patch, but I leave that as an exercise for
another motivated hacker.

-- 
Richard Braun



Weekly meetings during the GSoC

2012-07-09 Thread Richard Braun
Hello,

Following our discussions on IRC, the weekly meetings will be reinstated
during the Google Summer of Code, to better follow the progress of the
students (although other subjects are of course welcome). The meeting is
scheduled on Thursday, 19:00 UTC.

-- 
Richard Braun



[PATCH] VM cache policy change.

2012-07-09 Thread Richard Braun
This patch lets the kernel unconditionally cache non-empty unreferenced
objects instead of using a fixed arbitrary limit. As the pageout daemon
evicts pages, it collects cached objects that have become empty. The
effective result is a graceful adjustment of the number of objects
related to memory management (virtual memory objects, their associated
ports, and potentially objects maintained in the external memory
managers). Physical memory can now be almost entirely filled up with
cached pages. In addition, these cached pages are not automatically
deactivated as objects can quickly be referenced again.

There are problems with this patch however. The first is that, on
machines with a large amount of physical memory (above 1 GiB but it also
depends on usage patterns), scalability issues are exposed. For example,
file systems which don't throttle their writeback requests can create
thread storms, strongly reducing system responsiveness. Other issues
such as linear scans of memory objects also add visible CPU overhead.

The second is that, as most memory is used, it increases the chances of
swapping deadlocks. The pageout daemon priority and the thresholds used
to wake it have been increased to help with that, and in practice, this
hack works well for most workloads. Applications that map large objects
and quickly cause lots of page faults can still easily bring the system
to its knees.

This patch should improve performance a lot on real hardware, but it
can slightly reduce it for virtualized systems with much physical memory
where disk accesses use the host cache.

Note that you need up-to-date versions of both GNU Mach [1] and the
Hurd [2] to test this patch, as it needs important bug fixes to avoid
file system crashes and kernel panics.

Remember that, in addition to vmstat, you can monitor the state of the
slab allocator with slabinfo [3].

Feedback is very welcome.

[1] git commit 8d219eab0dcfbdcf464340630d568c4e16d7acbd
[2] git commit 2f4f65ce9137aab6acaf1004bacc09d3a975d881
[3] http://git.sceen.net/rbraun/slabinfo.git/
---
 vm/vm_object.c   |  177 --
 vm/vm_object.h   |9 ++-
 vm/vm_pageout.c  |   12 +++-
 vm/vm_resident.c |2 +
 4 files changed, 71 insertions(+), 129 deletions(-)

diff --git a/vm/vm_object.c b/vm/vm_object.c
index f101708..15c9ac0 100644
--- a/vm/vm_object.c
+++ b/vm/vm_object.c
@@ -61,8 +61,6 @@ void memory_object_release(
pager_request_t pager_request,
ipc_port_t  pager_name); /* forward */
 
-void vm_object_deactivate_pages(vm_object_t);
-
 /*
  * Virtual memory objects maintain the actual data
  * associated with allocated virtual memory.  A given
@@ -141,7 +139,11 @@ void vm_object_deactivate_pages(vm_object_t);
  * ZZZ Continue this comment.
  */
 
-struct kmem_cache  vm_object_cache; /* vm backing store cache */
+/*
+ * Cache used for vm_object allocations, not to be confused
+ * with the cache of persisting virtual memory objects.
+ */
+struct kmem_cache  vm_object_cache;
 
 /*
  * All wired-down kernel memory belongs to a single virtual
@@ -163,8 +165,9 @@ vm_object_t kernel_object = &kernel_object_store;
  *
  * The kernel may choose to terminate objects from this
  * queue in order to reclaim storage.  The current policy
- * is to permit a fixed maximum number of unreferenced
- * objects (vm_object_cached_max).
+ * is to let memory pressure dynamically adjust the number
+ * of unreferenced objects. The pageout daemon attempts to
+ * collect objects after removing pages from them.
  *
  * A simple lock (accessed by routines
  * vm_object_cache_{lock,lock_try,unlock}) governs the
@@ -179,8 +182,6 @@ vm_object_t kernel_object = &kernel_object_store;
  * not be held to make simple references.
  */
 queue_head_t   vm_object_cached_list;
-intvm_object_cached_count;
-intvm_object_cached_max = 4000;/* may be patched*/
 
 decl_simple_lock_data(,vm_object_cached_lock_data)
 
@@ -334,6 +335,33 @@ void vm_object_init(void)
IKOT_PAGING_NAME);
 }
 
+void vm_object_collect(
+   register vm_object_tobject)
+{
+   vm_object_unlock(object);
+
+   /*
+*  The cache lock must be acquired in the proper order.
+*/
+
+   vm_object_cache_lock();
+   vm_object_lock(object);
+
+   /*
+*  If the object was referenced while the lock was
+*  dropped, cancel the termination.
+*/
+
+   if (!vm_object_collectable(object)) {
+   vm_object_unlock(object);
+   vm_object_cache_unlock();
+   return;
+   }
+
+   queue_remove(&vm_object_cached_list, object, vm_object_t, cached_list);
+   vm_object_terminate(object);
+}
+
 /*
  * vm_object_reference:
  *
@@ -394,102 +422,31 @@ void vm_object_deallocate(
 
/*
 *  See whether this obj

Re: [PATCH] VM cache policy change.

2012-07-09 Thread Richard Braun
On Mon, Jul 09, 2012 at 04:44:57PM +0200, Richard Braun wrote:
> Note that you need up-to-date versions of both GNU Mach [1] and the
> Hurd [2] to test this patch, as it needs important bug fixes to avoid
> file system crashes and kernel panics.

Debian packages with the patch and the required bug fixes are available
at my repository :

deb http://ftp.sceen.net/debian-hurd experimental/
deb-src http://ftp.sceen.net/debian-hurd experimental/

They are currently used on the public hurd box darnassus.sceen.net.



Re: [PATCH] VM cache policy change.

2012-07-09 Thread Richard Braun
On Tue, Jul 10, 2012 at 03:15:00AM +0200, Samuel Thibault wrote:
> Richard Braun, le Tue 10 Jul 2012 03:03:09 +0200, a écrit :
> > Debian packages with the patch and the required bug fixes are available
> > at my repository :
> 
> I guess they are all already in upstream gits?
> 
> I'm currently uploading new hurd, gnumach & netdde packages.

The bug fixes have been committed, but not the VM cache patch, which is
included in my gnumach package. I'll rebase it soon.

-- 
Richard Braun



Re: [PATCH] VM cache policy change.

2012-07-10 Thread Richard Braun
On Tue, Jul 10, 2012 at 08:24:30AM +0200, Richard Braun wrote:
> On Tue, Jul 10, 2012 at 03:15:00AM +0200, Samuel Thibault wrote:
> > I'm currently uploading new hurd, gnumach & netdde packages.
> 
> The bug fixes have been committed, but not the VM cache patch, which is
> included in my gnumach package. I'll rebase it soon.

An up-to-date gnumach package including the patch is now available.

-- 
Richard Braun



Re: New procfs implementation

2012-07-12 Thread Richard Braun
On Thu, Jul 12, 2012 at 04:03:55PM +0200, Thomas Schwinge wrote:
> I have now promoted this branch from jkoenig/master to master -- it is
> the procfs variant that we've been using ever since, and the two
> different branches recently confused Richard.

That was quick, thanks.

-- 
Richard Braun



[PATCH] Page cache accounting

2012-07-14 Thread Richard Braun
If no major objection is raised (particularly about the new Mig
routine), I'll commit this patch in a few days.

The decision to add the call to the mach4 interface stems from
pragmatism. It doesn't break existing compatibility, and it doesn't add
the burden of a new interface.
---
 include/mach/mach4.defs|   10 ++
 include/mach/mach_types.h  |1 +
 include/mach/vm_cache_statistics.h |   33 +
 vm/vm_object.c |   14 ++
 vm/vm_object.h |   19 +++
 vm/vm_resident.c   |   13 +
 vm/vm_user.c   |   14 ++
 vm/vm_user.h   |2 ++
 8 files changed, 106 insertions(+), 0 deletions(-)
 create mode 100644 include/mach/vm_cache_statistics.h

diff --git a/include/mach/mach4.defs b/include/mach/mach4.defs
index 114edf4..8ac49e2 100644
--- a/include/mach/mach4.defs
+++ b/include/mach/mach4.defs
@@ -110,3 +110,13 @@ routine memory_object_create_proxy(
start   : vm_offset_array_t;
len : vm_offset_array_t;
out proxy   : mach_port_t);
+
+type vm_cache_statistics_data_t = struct[16] of integer_t;
+
+/*
+ * Return page cache statistics for the host on which the target task
+ * resides.
+ */
+routine vm_cache_statistics(
+   target_task : vm_task_t;
+   out vm_cache_stats  : vm_cache_statistics_data_t);
diff --git a/include/mach/mach_types.h b/include/mach/mach_types.h
index f6ceac3..8768482 100644
--- a/include/mach/mach_types.h
+++ b/include/mach/mach_types.h
@@ -52,6 +52,7 @@
 #include 
 #include 
 #include 
+#include <mach/vm_cache_statistics.h>
 
 #ifdef MACH_KERNEL
 #include  /* for task_array_t */
diff --git a/include/mach/vm_cache_statistics.h b/include/mach/vm_cache_statistics.h
new file mode 100644
index 000..6328019
--- /dev/null
+++ b/include/mach/vm_cache_statistics.h
@@ -0,0 +1,33 @@
+/*
+ * Copyright (C) 2012 Free Software Foundation
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+#ifndef _MACH_VM_CACHE_STATISTICS_H_
+#define _MACH_VM_CACHE_STATISTICS_H_
+
+#include <mach/machine/vm_types.h>
+
+struct vm_cache_statistics {
+   integer_t   object_count;   /* # of cached objects */
+   integer_t   page_count; /* # of cached pages */
+   integer_t   _reserved[14];
+};
+
+typedef struct vm_cache_statistics *vm_cache_statistics_t;
+typedef struct vm_cache_statistics vm_cache_statistics_data_t;
+
+#endif /* _MACH_VM_CACHE_STATISTICS_H_ */
diff --git a/vm/vm_object.c b/vm/vm_object.c
index f101708..7eae3d7 100644
--- a/vm/vm_object.c
+++ b/vm/vm_object.c
@@ -192,6 +192,15 @@ decl_simple_lock_data(,vm_object_cached_lock_data)
simple_unlock(&vm_object_cached_lock_data)
 
 /*
+ * Number of physical pages referenced by cached objects.
+ * This counter is protected by its own lock to work around
+ * lock ordering issues.
+ */
+intvm_object_cached_pages;
+
+decl_simple_lock_data(,vm_object_cached_pages_lock_data)
+
+/*
  * Virtual memory objects are initialized from
  * a template (see vm_object_allocate).
  *
@@ -410,6 +419,7 @@ void vm_object_deallocate(
queue_enter(&vm_object_cached_list, object,
vm_object_t, cached_list);
overflow = (++vm_object_cached_count > vm_object_cached_max);
+   vm_object_cached_pages_update(object->resident_page_count);
vm_object_cache_unlock();
 
vm_object_deactivate_pages(object);
@@ -1860,6 +1870,7 @@ vm_object_t vm_object_lookup(
queue_remove(&vm_object_cached_list, object,
 vm_object_t, cached_list);
vm_object_cached_count--;
+   vm_object_cached_pages_update(-object->resident_page_count);
}
 
object->ref_count++;
@@ -1891,6 +1902,7 @@ vm_object_t vm_object_lookup_name(
queue_remove(&vm_object_cached_list, object,
 vm_object_t, cached_list);
  

Page cache accounting and policy, new Debian packages

2012-07-15 Thread Richard Braun
On Tue, Jul 10, 2012 at 07:26:54PM +0200, Richard Braun wrote:
> An up-to-date gnumach package including the patch is now available.

Patched versions of the gnumach and hurd packages are available at my
repository. They include the page cache accounting and policy change
patches (gnumach), and modifications to make vmstat and procfs report
cache statistics (hurd). There also are eglibc packages, rebuilt to
include the new vm_cache_statistics call.
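
For anyone who wants to poke at the new routine directly, here is a
minimal sketch of a caller (the header locations are assumptions, and
error handling is trimmed):

#include <stdio.h>
#include <mach.h>
#include <mach/vm_cache_statistics.h>

int
main (void)
{
  vm_cache_statistics_data_t stats;

  if (vm_cache_statistics (mach_task_self (), &stats) != KERN_SUCCESS)
    return 1;

  /* object_count and page_count are the two fields currently filled in.  */
  printf ("cached objects: %d, cached pages: %d\n",
          (int) stats.object_count, (int) stats.page_count);
  return 0;
}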

These new versions have been installed on the public hurd box
darnassus.sceen.net.

-- 
Richard Braun



Re: [PATCH] Page cache accounting

2012-07-15 Thread Richard Braun
On Sun, Jul 15, 2012 at 06:39:05PM +0200, Neal H. Walfield wrote:
> I'd encourage you to use a more self-describing data structure.  In
> the very least, please consider a version number and a bitmask
> indicating which fields are valid.

What about versioning the call itself, e.g. vm_cache_statistics1, etc.?

-- 
Richard Braun



Re: [PATCH] VM cache policy change.

2012-07-16 Thread Richard Braun
On Mon, Jul 16, 2012 at 10:30:23PM +0200, Thomas Schwinge wrote:
> I have logs for »ps -w 0 -AF hurd-long«, »vmstat«, »slabinfo« for all
> runs, directly after booting, and some time after the glibc »make check«,
> please tell me which you'd like to see; it's a lot of data already.

Thanks for this report. The results of vmstat and slabinfo (vm_object,
thread and ipc_port lines) are what I'm mostly interested in. I suspect
there are quite a lot of them, and the performance regression probably
comes from the scalability issues previously described.

-- 
Richard Braun



Re: [PATCH] Page cache accounting

2012-07-17 Thread Richard Braun
On Sat, Jul 14, 2012 at 05:06:37PM +0200, Richard Braun wrote:
> If no major objection is raised (particularly about the new Mig
> routine), I'll commit this patch in a few days.

As I intend to continue working on the page cache, and because the
policy patch alone can't be included now, I have created a branch
in the gnumach and hurd repositories. For now they contain the changes
concerning the page cache statistics, ready to be merged. A separate
patch for procfs will follow.

-- 
Richard Braun



[PATCH] Report page cache size

2012-07-17 Thread Richard Braun
* rootdir.c: Add #include <mach/vm_cache_statistics.h>.
(rootdir_gc_meminfo): Call vm_cache_statistics() and report the result.
---
 rootdir.c |8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/rootdir.c b/rootdir.c
index 1fa71b0..c24585e 100644
--- a/rootdir.c
+++ b/rootdir.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include <mach/vm_cache_statistics.h>
 #include 
 #include 
 #include 
@@ -263,6 +264,7 @@ rootdir_gc_meminfo (void *hook, char **contents, ssize_t *contents_len)
   host_basic_info_data_t hbi;
   mach_msg_type_number_t cnt;
   struct vm_statistics vmstats;
+  struct vm_cache_statistics cachestats;
   default_pager_info_t swap;
   error_t err;
 
@@ -270,6 +272,10 @@ rootdir_gc_meminfo (void *hook, char **contents, ssize_t *contents_len)
   if (err)
 return EIO;
 
+  err = vm_cache_statistics (mach_task_self (), &cachestats);
+  if (err)
+return EIO;
+
   cnt = HOST_BASIC_INFO_COUNT;
  err = host_info (mach_host_self (), HOST_BASIC_INFO, (host_info_t) &hbi, &cnt);
   if (err)
@@ -294,7 +300,7 @@ rootdir_gc_meminfo (void *hook, char **contents, ssize_t *contents_len)
   (long unsigned) hbi.memory_size / 1024,
   (long unsigned) vmstats.free_count * PAGE_SIZE / 1024,
   0,
-  0,
+  (long unsigned) cachestats.page_count * PAGE_SIZE / 1024,
   (long unsigned) vmstats.active_count * PAGE_SIZE / 1024,
   (long unsigned) vmstats.inactive_count * PAGE_SIZE / 1024,
   (long unsigned) vmstats.wire_count * PAGE_SIZE / 1024,
-- 
1.7.10.4




Re: Concerning pthreads and such

2012-07-18 Thread Richard Braun
On Tue, Jul 17, 2012 at 04:46:37PM -0400, Barry deFreese wrote:
> Have you gotten any feedback on this?  Are you still working on it/testing 
> it?  Great stuff!

Agreed. Although, if you could set up a git repository with a dedicated
branch, it would make it easier for us to see and test your progress.

-- 
Richard Braun



About the io_select call

2012-07-22 Thread Richard Braun
Hello,

In the hurd/io.defs file, which defines the Hurd I/O interface, the
io_select call includes a timeout. However, none of the libraries and
servers implementing this call receive it. What makes this argument
"disappear" during stub code generation ?

-- 
Richard Braun



Re: About the io_select call

2012-07-22 Thread Richard Braun
On Sun, Jul 22, 2012 at 02:20:21PM +0200, Ludovic Courtès wrote:
> It is a “waittime” parameter, which gets special treatment from MIG.
> This is somewhat documented in
> <http://www.cs.cmu.edu/afs/cs/project/mach/public/doc/unpublished/mig.ps>.

All right, I got confused because I've been looking for the raw string
"waittime" in the MiG code, whereas the parser seems to be case
insensitive... Thanks for your reply.
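
For reference, here is a purely illustrative .defs fragment (not the
actual io.defs text) showing how such an argument is declared; MIG
consumes the waittime parameter on the user side, where it becomes the
mach_msg timeout, so it never reaches the server work function:

subsystem example 424242;

#include <mach/std_types.defs>

routine example_select (
	object		: mach_port_t;
	waittime timeout : natural_t;	/* user-side mach_msg timeout, in milliseconds */
	inout select_type : int);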

-- 
Richard Braun



Re: Hurd build system Makeconf

2012-07-25 Thread Richard Braun
On Tue, Jul 24, 2012 at 11:12:11PM -0400, Barry deFreese wrote:
> Now the question is, what to do with cancel-cond.c?  I'm thinking of just 
> sticking it in
> libshouldbeinlibc for now?  Is there a more appropriate place?

Why not in libpthread, near the pthread_cond_wait and
pthread_cond_timedwait functions, considering how related they are ?

-- 
Richard Braun



Fixing select

2012-07-25 Thread Richard Braun
Hello,

A while ago [1], issues were found with the use of select along with
a timeout. In an attempt to fix those, I have created a branch in the
Hurd repository which adds a new call to the io interface,
io_select_timeout, and adjusts the servers to support this new call.
The core of the implementation lies in the new hurd_condition_timedwait
call, and the consistent use of the new Hurd-specific time_data_t type.
Comments on these changes are welcome.

For now, this branch only adds support for server-side timeouts. The
select implementation in glibc still needs to be changed to use the new
call when appropriate.

Thanks.

-- 
Richard Braun

[1] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=79358



Re: Fixing select

2012-07-27 Thread Richard Braun
On Wed, Jul 25, 2012 at 06:34:02PM +0200, Richard Braun wrote:
> For now, this branch only adds support for server-side timeouts. The
> select implementation in glibc still needs to be changed to use the new
> call when appropriate.

The change I want to apply means io_select calls must be asynchronous,
as replies will normally be processed by _hurd_select through a port
set. But io_select is declared as a routine in io.defs. While it's
relevant to keep it a routine for server side stub code generation, it
should be a simpleroutine when built in glibc (instead of a routine with
a timeout of 0). What's the recommended way to do that ? Is there a
macro I can use to cleanly identify what the stub code is built for,
e.g. a macro indicating that user stub code is generated ?

-- 
Richard Braun



Re: Hurd_condition_wait in glibc libpthreads in Debian

2012-08-01 Thread Richard Braun
On Wed, Aug 01, 2012 at 08:44:20AM -0700, Thomas DiModica wrote:
> Most of what I understand is from what Marcus has to say in this thread here:
> http://lists.gnu.org/archive/html/hurd-devel/2002-07/msg00010.html

That link explains the problem very well. It's better to keep the
current calls to hurd_condition_wait (i.e. through plain error handling)
than try to replace that with complicated cleanup callbacks.

> My understanding is that hurd_condition_wait is used with hurd_thread_cancel
> in libc, as libc uses a different cancellation semantics than is provided by
> POSIX in pthreads.

No, the semantics are the same. The internal implementation may slightly
differ, I haven't looked in detail. The point is how to handle
cancellation from a cancelled thread, not how to mark a thread as being
cancelled. The hurd_thread_cancel function merely exists because there
isn't any in the cthreads package. You should be able to replace it with
pthread_cancel without further effort.

-- 
Richard Braun



Re: Hurd_condition_wait in glibc libpthreads in Debian

2012-08-02 Thread Richard Braun
On Wed, Aug 01, 2012 at 06:40:05PM -0700, Thomas DiModica wrote:
> 
> 
> >No, the semantics are the same. The internal implementation may slightly
> >differ, I haven't looked in detail. The point is how to handle
> >cancellation from a cancelled thread, not how to mark a thread as being
> >cancelled. The hurd_thread_cancel function merely exists because there
> >isn't any in the cthreads package. You should be able to replace it with
> >pthread_cancel without further effort.
> 
> >-- 
> >Richard Braun
> 
> I don't think that the semantics are the same. From what I can tell,
> hurd_thread_cancel kindly informs the thread that it has been canceled
> and should take an appropriate action, while pthread_cancel has the
> thread call pthread_exit at the next cancellation point. How they mark a
> thread as canceled differs too: hurd cancellation is flagged within the
> thread's struct hurd_sigstate, while pthread cancellation is flagged
> within the thread's struct __pthread.

No, the semantics are the same. And you're saying it yourself :
"hurd_thread_cancel kindly informs the thread that it has been
canceled". The description of pthread_cancel is "The pthread_cancel()
function shall request that thread be canceled. [...] The cancellation
processing in the target thread shall run asynchronously with respect
to the calling thread returning from pthread_cancel()." [1]. The Linux
man page goes as far as to say "pthread_cancel - send a cancellation
request to a thread".

The difference, again, is in how cancellation requests are handled by a
thread noticing it has been cancelled (don't even consider asynchronous
cancellation in the Hurd). And as I said earlier, even if the
implementation varies, it does only slightly, as you confirmed it. The
only relevance of this difference is for the new hurd_condition_wait
function, obviously.

> Here is the declaration of hurd_thread_cancel from glibc's hurd.h:
> /* Cancel pending operations on THREAD.  If it is doing an interruptible RPC,
>    that RPC will now return EINTR; otherwise, the "cancelled" flag will be
>    set, causing the next `hurd_check_cancel' call to return nonzero or the
>    next interruptible RPC to return EINTR (whichever is called first).  */
> extern error_t hurd_thread_cancel (thread_t thread);

It should probably be replaced with pthread_cancel completely, since
they do the same thing. The difficulty will be adjusting the Hurd so
that the server threads never stop because of an unexpected cancellation
point. One way that comes to my mind is to disable cancellation by
default in Hurd server threads, and making pthread_cancel ignore that
state (i.e. still mark such threads as being cancelled). Normal
cancellation points will simply ignore those requests, whereas
hurd_condition_wait will report it. For more clarity, you could
introduce a Hurd-specific cancellation state/type with that effect.
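
To make that concrete, here is a rough sketch of what a server thread
could look like under this scheme (the helper below is hypothetical,
and so is the modified pthread_cancel behaviour):

#include <pthread.h>

/* Hypothetical: report (and clear) a pending cancellation request.  */
extern int thread_was_cancelled (void);

static void *
server_thread (void *arg)
{
  /* Standard cancellation points become no-ops for this thread, so a
     library call inside a server routine cannot kill it mid-request.  */
  pthread_setcancelstate (PTHREAD_CANCEL_DISABLE, NULL);

  for (;;)
    {
      /* ... receive and handle one message ... */

      /* Only an explicit check (or the hurd_condition_wait equivalent)
         observes a pending pthread_cancel, letting the thread wind down
         after it has replied to its client.  */
      if (thread_was_cancelled ())
        break;
    }

  return NULL;
}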

-- 
Richard Braun

[1] http://pubs.opengroup.org/onlinepubs/009696799/functions/pthread_cancel.html



Re: Hurd_condition_wait in glibc libpthreads in Debian

2012-08-02 Thread Richard Braun
On Thu, Aug 02, 2012 at 03:13:12PM -0700, Thomas DiModica wrote:
> I get what you're saying: I'm confusing the semantics of how cancellation
> is HANDLED with the semantics of how it is SIGNALED. The signaling
> semantics are the same.

That's it.

> You mean like a PTHREAD_CANCEL_GNU? It would be doable.

I would personally name it PTHREAD_CANCEL_HURD to avoid increasing the
already existing confusion.

> If that were the case, we could merge hurd_condition_wait into
> phread_cond_wait as a special case. Maybe: streamio
> intermingles the two, pflocal did.
> That would also handle hurd_check_cancel.

Why not.

-- 
Richard Braun



Re: PTHREAD_CANCEL_HURD

2012-08-03 Thread Richard Braun
On Fri, Aug 03, 2012 at 12:50:29PM -0700, Roland McGrath wrote:
> Why add a new cancellation type?  PTHREAD_CANCEL_DEFERRED is already fine.
> You just need an extension function that is like pthread_testcancel but
> clears and returns the state instead of triggering cancellation handlers.

The problem with that is, if a Hurd server internally calls a function
that is a cancellation point, the calling thread will automatically exit
before having a chance to reply to the client. A Hurd-specific type
should allow the safe use of more standard functions.

-- 
Richard Braun



Re: PTHREAD_CANCEL_HURD

2012-08-03 Thread Richard Braun
On Fri, Aug 03, 2012 at 12:20:09PM -0700, Thomas DiModica wrote:
> There are interesting implications for this, however. pthread_join becomes a
> function that may eat a signal to cancel. The main implication is, 
> essentially,
> that every occurrence of condition_wait becomes a call to hurd_condition_wait.
> I don't know if any of the calls to condition_wait weren't cancellation 
> points for
> good reasons. Secondly, would this conflict with the proper functioning of the
> libraries? Does a utility that links to a hurd library have to declare itself 
> a
> server to pthreads for the library to work properly (if the library expects 
> these
> cancellation semantics)?

For any code unaware of this type, it would simply act as if
cancellation was disabled.

> PS. In timedwait, we dequeue the thread, but the cleanup handler (which is
> always run) ensures that the thread is dequeued also. Is this necessary, or
> just an artifact that is due to how all of the timed functions were 
> implemented?

The timing of your question is interesting. My current work on fixing
select has led me to introduce a hurd_condition_timedwait function in
the cthreads library. It proved more difficult than I initially expected
precisely because of the issues around the condition queue (they also
apply to pthread_mutex_timedlock). In short, this is a bug. You can take
a look at my work directly on the Hurd repository [1].

-- 
Richard Braun

[1] http://git.savannah.gnu.org/cgit/hurd/hurd.git/log/?h=rbraun/select_timeout



Re: PTHREAD_CANCEL_HURD

2012-08-03 Thread Richard Braun
On Fri, Aug 03, 2012 at 01:18:43PM -0700, Roland McGrath wrote:
> I see.  So the meaning of the new type is that normal cancellation
> handling is never triggered, instead the "cancelled" flag can only be
> polled by the explicit check API.  What do cancellable functions do,
> then?  Do they just fail with ECANCELED while leaving the cancelled
> flag set?

How would that make Hurd servers behave ? Would clients receive the
ECANCELED error ? Isn't it better to just completely ignore the
cancellation everywhere except where hurd_condition_wait is called, as
it is done currently ? Apart from the increased latencies (which can
easily be fixed by adding new hurd_condition_wait calls), I don't see
any issue with that, while I suspect some applications could react
angrily when meeting unexpected ECANCELED results.

> I'd call this new type by a name that's descriptive of what it means,
> rather than what programs are expected to use it.  For example,
> PTHREAD_CANCEL_POLLED.

Could it hurt portability (not that I know of any other system using
glibc that uses such tricks, but still) ?

-- 
Richard Braun



Re: PTHREAD_CANCEL_HURD

2012-08-03 Thread Richard Braun
On Fri, Aug 03, 2012 at 01:44:54PM -0700, Roland McGrath wrote:
> That's certainly not what's done currently.  hurd_thread_cancel causes
> all "interruptible" RPC stubs to return EINTR, either by preventing
> the RPC from being made or by sending interrupt_operation RPCs to
> servers.  This is the effect of the INTR_INTERFACE macro that appears
> in hurd/*.defs.  (See libc/hurd/intr-msg.c, _hurdsig_abort_rpcs, etc.)
> This is the moral equivalent to the POSIX calls that are cancellation
> points.  (The Hurd code predates the specification of pthreads by a
> few years, but it was designed according to the same concept of
> cancellation that lead to that part of the pthreads specification.)

Right, EINTR is indeed the expected error in such cases.

> I can't really imagine what issue you have in mind.  To fit with the
> convention in NPTL for extensions, it should be PTHREAD_CANCEL_POLLED_NP.
> (The "_NP" suffix indicates it's a non-POSIX feature.)

Nothing in particular in mind; it's just that I initially saw this type as
being Hurd-specific, but there is no reason to tie it to the Hurd.

-- 
Richard Braun



Re: Fixing select

2012-08-05 Thread Richard Braun
On Wed, Jul 25, 2012 at 06:34:02PM +0200, Richard Braun wrote:
> A while ago [1], issues were found with the use of select along with
> a timeout. In an attempt to fix those, I have created a branch in the
> Hurd repository which adds a new call to the io interface,
> io_select_timeout, and adjusts the servers to support this new call.
> The core of the implementation lies in the new hurd_condition_timedwait
> call, and the consistent use of the new Hurd-specific time_data_t type.
> Comments on these changes are welcome.

Debian packages are available on my repository [1]. The Hurd packages
provide the new io_select_timeout call, while glibc includes a modified
_hurd_select implementation. They are currently being tested on the
darnassus.sceen.net public hurdbox. I haven't adjusted the dependencies,
so you should install the Hurd packages first and reboot, in order to
avoid any trouble.

Feedback is, as usual, very welcome.

-- 
Richard Braun

[1] deb http://ftp.sceen.net/debian-hurd experimental/



Re: PTHREAD_CANCEL_HURD

2012-08-07 Thread Richard Braun
On Mon, Aug 06, 2012 at 07:07:23PM -0700, Thomas DiModica wrote:
> I understand now: we want to immediately dequeue ourselves, even if it means
> wasting cycles later by checking to ensure that we were dequeued. The last
> thing we want is to return ETIMEDOUT when another thread has intervened,
> and deququed us to wake for a condition_signal. But... why are we using the
> mutex's lock, if we're queued on the condition? That's the bug.

One of them only. There are various subtleties concerning thread
queueing and wakeup. For example, the comment in __pthread_cond_broadcast
states that the queue can be walked without holding a lock, but when a
timeout occurs, a thread will dequeue itself from that same queue,
possibly concurrently (and as you can imagine, unsafely). Another
problem concerns the wakeup itself : as stated in
__pthread_cond_timedwait_internal, messages could be queued after a
thread timed out, which can have various effects, the most obvious one
being spurious wakeups (there is a chance for filled message queues
blocking sending threads but I'm not sure it can happen in practice).

My work on cthreads solves these issues by moving the wakeup operation
in critical sections, using a per-thread flag to avoid sending more than
one message. When a thread is about to return from condition_timedwait,
a few cases must be tested. Obviously, the message delivered flag must
be checked, as it also tells if the thread has already been dequeued by
another one. But it doesn't mean the current thread didn't time out (it
could have timed out, and then received a wakeup message before grabbing
the condition lock). So if a timeout occurred, the message queue must
be drained to avoid spurious wakeups. If the message delivered flag is
unset, no other thread acted on the condition, so the thread must remove
itself from the condition queue. It gets a bit more complicated with
cancellation as a cancellation request can suspend the target thread, so
the sigstate lock must be held before getting the condition lock.
Otherwise, the thread could be suspended after having acquired the
condition lock, and when the sender thread then tries to wake the target
thread (by using condition_broadcast), it would deadlock on the
condition lock. This is all explained in libthreads/cancel-cond.c.
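
In pseudo-C, the return path looks roughly like this (a sketch with
illustrative names, not the actual cancel-cond.c code):

spin_lock (&cond->lock);

if (self->wakeup_sent)
  {
    /* Another thread already dequeued us and sent exactly one wakeup
       message.  If we also timed out, drain that message now so it
       cannot spuriously wake a later wait.  */
    if (timed_out)
      drain_wakeup_message (self);		/* hypothetical helper */
  }
else
  {
    /* Nobody acted on the condition: dequeue ourselves.  */
    remove_self_from_queue (&cond->queue, self);	/* hypothetical helper */
  }

spin_unlock (&cond->lock);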

I'll apply those fixes to libpthread soon as they are perfectly
relevant, since the algorithms are very similar (if not exactly the
same).

-- 
Richard Braun



Re: PTHREAD_CANCEL_HURD

2012-08-07 Thread Richard Braun
On Fri, Aug 03, 2012 at 12:20:09PM -0700, Thomas DiModica wrote:
> I was thinking about what Richard Braun said about removing 
> hurd_thread_cancel.
> Attached is an exploration into implementing PTHREAD_CANCEL_HURD as a
> cancellation state extension. It compiles, but I haven't tested if the 
> resulting library
> still works.
> 
> A server would call pthread_hurd_server_np upon start-up, which irreversibly 
> puts
> pthreads into hurd_server mode. All this really does is set the default 
> cancellation
> state for new threads to be the new cancellation state. The function
> pthread_check_and_clear_cancel_np replaces hurd_check_cancel. The name is
> more verbose, but describes what the function does. pthread_testcancel doesn't
> return a value and I didn't want to change that. Disgustingly, I've reused
> state_lock to act as the lock for cancel_pending, so we don't accidentally 
> lose
> any signals to cancel.

After some thinking, I don't think adding the new Hurd-specific calls is
a good thing. For one, having libports set the appropriate state is very
easy. Although I doubt it, we could imagine some servers might actually
use pthread cancellation the standard way (e.g. ported FUSE servers), so
making it the default for all threads doesn't sound good. And as I told
you privately, making it explicit (for custom threads, not libports
ones) is better for readability.

I'd say pthread_testcancel_np, returning a simple int, would be a better
name for the replacement of hurd_check_cancel. I suppose (but haven't
checked thoroughly) that holding the state_lock is mandatory and you
shouldn't feel disgust at having reused it.
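
Something along these lines, reusing the names from your patch (a
sketch against the proposed fields, not current libpthread code):

int
pthread_testcancel_np (void)
{
  struct __pthread *self = _pthread_self ();
  int cancelled;

  /* Return and clear the pending request without acting on it.  */
  __pthread_mutex_lock (&self->state_lock);
  cancelled = self->cancel_pending;
  self->cancel_pending = 0;
  __pthread_mutex_unlock (&self->state_lock);

  return cancelled;
}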

-- 
Richard Braun



Question about peropens in the term server

2012-08-09 Thread Richard Braun
Hello,

While investigating an issue in the term server, I think I came across
a weird problem. The issue is that, after logging in and out through
ssh, a pty master will still consider itself open and refuse new
sessions, leading to the exhaustion of all the precreated pairs after
some time.

The GDB trace I've obtained [1] shows a couple of matching
create/destroy hooks, but the first (create) call, whose trivfs control
matches ptyctl (po->cntl == ptyctl), is never destroyed. At the same
time, the last call shows a peropen address never seen before. How is
it possible to see a peropen at release time that was never seen at
creation time ?

-- 
Richard Braun



Re: Question about peropens in the term server

2012-08-09 Thread Richard Braun
On Fri, Aug 10, 2012 at 12:49:05AM +0200, Richard Braun wrote:
> The GDB traced I've obtained [1] shows a couple of matching
> create/destroy hooks, but the first (create) call, which trivfs control
> matches ptyctl (po->cntl == ptyctl), is never destroyed. At the same
> time, the last call shows a peropen address never seen before. How is
> it possible to see a peropen at release time never seen at creation
> time ?

[1] http://www.sceen.net/~rbraun/gdb.txt

-- 
Richard Braun



Re: PTHREAD_CANCEL_HURD

2012-08-11 Thread Richard Braun
On Fri, Aug 03, 2012 at 01:44:54PM -0700, Roland McGrath wrote:
> I can't really imagine what issue you have in mind.  To fit with the
> convention in NPTL for extensions, it should be PTHREAD_CANCEL_POLLED_NP.
> (The "_NP" suffix indicates it's a non-POSIX feature.)

What would you recommend we name the matching pthread_cond_wait-like
function ? Something like pthread_cond_poll_np ?

-- 
Richard Braun



Re: PTHREAD_CANCEL_HURD

2012-08-11 Thread Richard Braun
On Sat, Aug 11, 2012 at 07:42:27PM +0200, Richard Braun wrote:
> On Fri, Aug 03, 2012 at 01:44:54PM -0700, Roland McGrath wrote:
> > I can't really imagine what issue you have in mind.  To fit with the
> > convention in NPTL for extensions, it should be PTHREAD_CANCEL_POLLED_NP.
> > (The "_NP" suffix indicates it's a non-POSIX feature.)
> 
> What would you recommend we name the matching pthread_cond_wait-like
> function ? Something like pthread_cond_poll_np ?

Or rather, should we make pthread_cond_wait return ECANCELED/EINTR as
discussed ?

-- 
Richard Braun



Re: Question about your libpthread

2012-08-13 Thread Richard Braun
On Mon, Aug 13, 2012 at 11:38:02AM +0200, Neal H. Walfield wrote:
> The problem has to do with the recycling of stacks.  The problem is
> that a thread cannot easily free its own stack when it exits.  If it
> frees its stack, it can't free it's kernel thread and vice versa.  To
> work around this, the thread does neither.  Instead, later threads
> reuse the stack and kernel thread.  If the stacks had a non-standard,
> we would need some additional code to find an appropriately sized
> stack or implement a mechanism to clean up exited threads
> periodically.

It seems __pthread_create_internal does a bit of both, i.e. it
attempts to "find" a suitable stack by checking the size of the recycled
thread, and if the size doesn't fit, it releases it, leaving itself
in a state that tells later functions to allocate a new stack. "Fit"
here means the stack size is the default. Why not check against the real
stack size as stored in the __pthread structure ?

In addition, here you only mention the recycling problem, not the
threadvar alignment problem. Or is there no such problem ?

-- 
Richard Braun



Re: Compiling ext2fs.static with pthreads

2012-08-15 Thread Richard Braun
On Tue, May 08, 2012 at 07:16:09PM -0700, Thomas Thomas wrote:
> So, it runs as a translator. Spews out some unexpected debugging info that 
> Barry
> added, but I can add files, delete files, list directories, etc. I haven't 
> tested it under
> load, however. Thank you for the link on debugging translators. I tried to 
> start a
> sub-hurd using it, but for some reason it gets to somewhere in libc while 
> trying to
> create its first thread and derefs a null pointer. I don't have complete 
> debugging info
> for where it falls apart. Here's the backtrace:

I have also reached this state with the branch we're working on. With
the fixed pthread_hurd_cond_wait_np, and the hack to reduce stack sizes,
I could also test the ext2fs server under load, without any particular
issue.

> #0 0x080d0366 in _hurd_sigstate_lock ()
> #1 0x0805a570 in __pthread_sigstate ()
> #2 0x080598d8 in __pthread_create_internal ()
> #3 0x08059b67 in pthread_create ()
> #4 0x0805b522 in diskfs_start_disk_pager (upi=0x81e0d48,
>     pager_bucket=0x81e0d58, may_cache=1, notify_on_evict=1, size=268435456,
>     image=0x81c1ec4) at disk-pager.c:60
> #5 0x08054036 in create_disk_pager () at pager.c:1203
> #6 0x0804826b in main (argc=7, argv=0x3ffe78) at ext2fs.c:186
> 
> and the offending instruction is (x/i $pc)
> 
> => 0x80d0366 <_hurd_sigstate_lock+70>: xchg %eax,0x4(%edx)

I'm also currently working on solving this.

> So, with (x/i $pc-8), I find
> 0x80d035e <_hurd_sigstate_lock+62>: mov $0x1,%eax
> This looks to me to be the mutex lock code, though I am not good at x86 
> assembly,
> and don't know where the previous instruction could be (but then again, no 
> one can).
> I want to blame cthreads dependencies in libc, because that is the easy 
> solution.

It's a spin lock actually. Here is _hurd_sigstate_lock :

void
_hurd_sigstate_lock (struct hurd_sigstate *ss)
{
  if (sigstate_is_global_rcv (ss))
__spin_lock (&_hurd_global_sigstate->lock);
  __spin_lock (&ss->lock);
}

The problem is that _hurd_global_sigstate is NULL at this moment. It's
part of the global signal dispositions patch, and migrating Hurd servers
to pthreads means they now have to behave more POSIXely. The global
signal state is normally created by _hurdsig_init, but only if the proc
server exists (see hurd/hurdinit.c). Obviously, when used as a root file
system, this isn't the case.

What prevents _hurdsig_init from being called unconditionally and
earlier ?

-- 
Richard Braun



Re: Compiling ext2fs.static with pthreads

2012-08-15 Thread Richard Braun
On Wed, Aug 15, 2012 at 11:24:20AM +0200, Richard Braun wrote:
> What prevents _hurdsig_init from being called unconditionally and
> earlier ?

In the meantime, I merely made sigstate_is_global_rcv check that
_hurd_global_sigstate isn't NULL. Now I have a subhurd completely
populated by pthreads-enabled servers (libthreads.so.0.3 doesn't even
exist any more).
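
The check itself is trivial; a sketch of the idea (the real helper's
body may differ):

static int
sigstate_is_global_rcv (const struct hurd_sigstate *ss)
{
  /* Before _hurdsig_init has run (e.g. when ext2fs is the root file
     system and there is no proc server yet), there simply is no global
     sigstate: fall back to per-thread signal state.  */
  if (_hurd_global_sigstate == NULL)
    return 0;

  return ss_uses_global_dispositions (ss);	/* hypothetical: the existing test */
}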

-- 
Richard Braun



Re: The workings of the pthreads ufs patch

2012-08-15 Thread Richard Braun
On Tue, Aug 14, 2012 at 01:01:03PM -0700, Thomas DiModica wrote:
> Richard, if you wish to remove this optimization for the sake of clarity, be
> my guest. I was just trying to be faithful to the working of the original
> code.

I won't remove it for two reasons: first, it does look faithful to
the cthreads version, and your explanation makes sense. Second, ufs
is planned to be phased out as no one uses it.

-- 
Richard Braun



Re: Question about your libpthread

2012-08-15 Thread Richard Braun
On Wed, Aug 15, 2012 at 03:03:55PM +0200, Samuel Thibault wrote:
> The alignment issue is for threadvars: as long as we have them, we
> need the stack to be aligned, and some constant size.  Fortunately,
> threadvars are to be dropped, but it's still not complete yet.

That's also the idea I got from reading the code. The workaround I'm
currently using is to have a weak __pthread_stack_default_size symbol
in libpthread. If defined at runtime, this size will be used instead
of the 2 MiB default. For Hurd servers, this symbol is defined in
libports/manage-multithread.c, and set to 64 KiB, as for cthreads.
These parts are of course covered with FIXMEs everywhere :-).
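
The hack roughly looks like this (a sketch, not the exact code):

#include <stddef.h>

/* In libpthread: a weak reference, whose address is null unless some
   other object in the program provides a definition.  */
extern size_t __pthread_stack_default_size __attribute__ ((weak));

static size_t
stack_size_for_new_thread (void)
{
  if (&__pthread_stack_default_size != NULL)
    return __pthread_stack_default_size;

  return 2 * 1024 * 1024;	/* usual 2 MiB default */
}

/* In libports/manage-multithread.c: give server threads small stacks,
   as they had with cthreads.  */
size_t __pthread_stack_default_size = 64 * 1024;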



Re: The semantics of pthread_cond_wait

2012-08-16 Thread Richard Braun
On Wed, Aug 15, 2012 at 05:47:47PM -0700, Thomas DiModica wrote:
> My understanding is that pthread_cond_wait is a cancellation point.
> It achieves this by entering asynchronous cancellation mode before blocking.
> 
> I don't see, however, any code that checks for a pending cancellation when
> we enter the function. As far as I can tell, the implementation is that
> pthread_cond_wait is a cancellation point if and only if the thread is
> canceled while it is blocked.

It seems so, yes.

> Right now, I can't think of any solution involving checking for cancellation
> that doesn't include a potential gap between checking for cancellation and
> entering asynchronous cancellation mode.

My opinion is that the implementation shouldn't rely on switching to
async mode, and rather add explicit cancellation tests. This would
require pthread_cancel be modified to wake the target thread in deferred
mode.

> PS Non-informative: setcanceltype in the cleanup function appears to
> be in the wrong place. Shouldn't it be the first thing to happen? Spin lock
> and spin unlock aren't async-cancel-safe, right? So, if we are in cleanup,
> lock the lock, and someone cancels us: async-cancellation injects a
> call to pthread_exit, where we call cleanup again and deadlock trying
> to lock a lock we hold.

That's not possible. The cleanup function is called from pthread_exit,
which disables cancellation, whichever type is set. This call in the
cleanup callback is meant to restore the type on entry in case no
cancellation occurred.

-- 
Richard Braun



Re: The semantics of pthread_cond_wait

2012-08-16 Thread Richard Braun
On Thu, Aug 16, 2012 at 11:48:35AM +0200, Richard Braun wrote:
> On Wed, Aug 15, 2012 at 05:47:47PM -0700, Thomas DiModica wrote:
> > My understanding is that pthread_cond_wait is a cancellation point.
> > It achieves this by entering asynchronous cancellation mode before blocking.
> > 
> > I don't see, however, any code that checks for a pending cancellation when
> > we enter the function. As far as I can tell, the implementation is that
> > pthread_cond_wait is a cancellation point if and only if the thread is
> > canceled while it is blocked.
> 
> It seems so, yes.

This tiny test shows the bug :

#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>

static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
int predicate;
int cancelled;

static void *
run(void *arg)
{
printf("spin until cancellation is sent\n");

while (!cancelled)
sched_yield();

printf("waiting for predicate\n");

pthread_mutex_lock(&mutex);

while (!predicate)
pthread_cond_wait(&cond, &mutex);

pthread_mutex_unlock(&mutex);

printf("exiting thread\n");
return NULL;
}

int
main(int argc, char *argv[])
{
pthread_t thread;

pthread_create(&thread, NULL, run, NULL);
printf("sending cancellation\n");
pthread_cancel(thread);
cancelled = 1;

printf("joining thread\n");
pthread_join(thread, NULL);
return EXIT_SUCCESS;
}

-- 
Richard Braun



[PATCH] libports: reduce thread starvation on message floods

2012-08-20 Thread Richard Braun
libports/manage-multithread.c: Add #include <mach/thread_switch.h>.
(ports_manage_port_operations_multithread): Make threads depress
their priority on startup.
---
 libports/manage-multithread.c |   11 +++
 1 file changed, 11 insertions(+)

diff --git a/libports/manage-multithread.c b/libports/manage-multithread.c
index 82fa2ac..2c690d2 100644
--- a/libports/manage-multithread.c
+++ b/libports/manage-multithread.c
@@ -23,6 +23,7 @@
 #include 
 #include 
 #include 
+#include <mach/thread_switch.h>
 
 void
 ports_manage_port_operations_multithread (struct port_bucket *bucket,
@@ -123,6 +124,16 @@ ports_manage_port_operations_multithread (struct port_bucket *bucket,
   int timeout;
   error_t err;
 
+  /* XXX To reduce starvation, the priority of new threads is initially
+depressed. This helps already existing threads complete their job
+and be recycled to handle new messages. The duration of this
+depression is made a function of the total number of threads because
+more threads implies more contention, and the priority of threads
+blocking on a contended spin lock is also implicitly depressed.
+The lock isn't needed, since an approximation is sufficient. */
+  timeout = (((totalthreads - 1) / 100) + 1) * 10;
+  thread_switch(MACH_PORT_NULL, SWITCH_OPTION_DEPRESS, timeout);
+
   if (hook)
(*hook) ();
 
-- 
1.7.10.4




Status report on the conversion of the Hurd to pthreads

2012-08-21 Thread Richard Braun
Hello,

With the help of Barry deFreese and Thomas DiModica, a few machines are
currently running Hurd systems with no code dependency on the previous
cthreads library. Concerning upstream code, the appropriate branches can
be found there :
http://git.savannah.gnu.org/cgit/hurd/glibc.git/log/?h=rbraun/hurd_with_pthreads
http://git.savannah.gnu.org/cgit/hurd/libpthread.git/log/?h=rbraun/hurd_with_pthreads
http://git.savannah.gnu.org/cgit/hurd/hurd.git/log/?h=rbraun/hurd_with_pthreads

The Debian-specific branches are available at :
http://git.sceen.net/rbraun/debian_hurd.git/
http://git.sceen.net/rbraun/debian_netdde.git/

Debian packages are available for testing using this repository :
deb http://ftp.sceen.net/debian-hurd experimental/
deb-src http://ftp.sceen.net/debian-hurd experimental/

Upgrade glibc packages first, then the hurd and netdde ones.


The tests run so far show that almost everything behaves as it used to
with cthreads. Very few problems have been seen and most of them already
exist in the original implementation as well. The only major issue faced
concerns the "aggressive multithreading" of most Hurd servers. Some
slight differences between cthreads and libpthread cause servers to
suffer from starvation a lot more than they did. Although various
workarounds have been tried, this problem appears to have no real
solution without a design change. It happens when messages are received
faster than they can be processed, which cause servers to spawn new
threads to handle those messages. Limiting the number of threads isn't
an appropriate solution since some servers block until an expected
message arrives. Throttling thread creation (either based on time or
with thread priority as it is done in the current implementation) only
reduces the severity of the problem without solving it. The libports
ports_manage_port_operations_multithread function accepts timeouts to
regulate the lifetime of worker threads, but it only partially helps
since threads are detached, which means their stack remains allocated
(although this could be limited with some form of garbage collection in
libpthread).

On the other hand, one major source of "message flood" is the select
call. When select returns, either because of an event or a timeout,
reply ports are destroyed, which in turn causes as many dead-name
notifications to be received by the server. It's a bit surprising to
discover that the reply port for such a call is passed as a normal
send right (using MACH_MSG_TYPE_MAKE_SEND). A send-once right would
probably be more appropriate. Future work will attempt to reduce the
message floods at their source so that libpthread based Hurd servers
can be used until a redesign allows really fixing this issue.

Thanks.

-- 
Richard Braun


