from:"Arun Sharma"

TBB support for FreeBSD

2007-09-29 Thread Arun Sharma

FreeBSD support - tested on 7.0-CURRENT/amd64.
Should apply cleanly to tbb20_20070815oss_src.tar.gz.

Signed-off-by: Arun Sharma <[EMAIL PROTECTED]>

diff -r 627751b671bb -r ac2c116b7cee build/common.inc
--- a/build/common.inc  Sat Sep 29 16:18:03 2007 -0700
+++ b/build/common.inc  Sat Sep 29 16:51:17 2007 -0700
@@ -37,6 +37,9 @@ ifndef tbb_os
   endif
   ifeq ($(OS), Darwin)
    export tbb_os=macos
+  endif
+  ifeq ($(OS), FreeBSD)
+   export tbb_os=freebsd
   endif
  endif
 endif
diff -r 627751b671bb -r ac2c116b7cee build/freebsd.gcc.inc
--- /dev/null   Thu Jan 01 00:00:00 1970 +0000
+++ b/build/freebsd.gcc.inc Sat Sep 29 16:51:17 2007 -0700
@@ -0,0 +1,98 @@
+# Copyright 2005-2007 Intel Corporation.  All Rights Reserved.
+#
+# This file is part of Threading Building Blocks.
+#
+# Threading Building Blocks is free software; you can redistribute it
+# and/or modify it under the terms of the GNU General Public License
+# version 2 as published by the Free Software Foundation.
+#
+# Threading Building Blocks is distributed in the hope that it will be
+# useful, but WITHOUT ANY WARRANTY; without even the implied warranty
+# of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with Threading Building Blocks; if not, write to the Free Software
+# Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#
+# As a special exception, you may use this file as part of a free software
+# library without restriction.  Specifically, if other files instantiate
+# templates or use macros or inline functions from this file, or you compile
+# this file and link it with other files to produce an executable, this
+# file does not by itself cause the resulting executable to be covered by
+# the GNU General Public License.  This exception does not however
+# invalidate any other reasons why the executable file might be covered by
+# the GNU General Public License.
+
+COMPILE_ONLY = -c -MMD
+PREPROC_ONLY = -E -MMD
+INCLUDE_KEY = -I
+DEFINE_KEY = -D
+OUTPUT_KEY = -o #
+OUTPUTOBJ_KEY = -o #
+PIC_KEY = -fPIC
+WARNING_KEY = -Wall -Werror
+DYLIB_KEY = -shared
+LIBDL =
+
+TBB_NOSTRICT = 1
+
+CPLUS = g++ 
+INCLUDES += -I$(tbb_root)/src/tbb -I$(tbb_root)/include -I$(tbb_root)/src
+LIB_LINK_FLAGS = -shared
+LIBS = -lpthread -lrt
+C_FLAGS = $(CPLUS_FLAGS) -x c
+
+ifeq ($(cfg), release)
+CPLUS_FLAGS = -DDO_ITT_NOTIFY -O2 -DUSE_PTHREAD
+endif
+ifeq ($(cfg), debug)
+CPLUS_FLAGS = -DTBB_DO_ASSERT -DDO_ITT_NOTIFY -g -O0 -DUSE_PTHREAD
+endif
+
+ASM=
+ASM_FLAGS=
+
+TBB_ASM.OBJ=
+
+ifeq (itanium,$(arch))
+# Position-independent code (PIC) is a must for IA-64
+CPLUS_FLAGS += $(PIC_KEY)
+$(PIC_KEY) = 
+endif 
+
+ifeq (em64t,$(arch))
+CPLUS_FLAGS += -m64
+LIB_LINK_FLAGS += -m64
+endif 
+
+ifeq (ia32,$(arch))
+CPLUS_FLAGS += -m32
+LIB_LINK_FLAGS += -m32
+endif 
+
+#--
+# Setting assembler data.
+#--
+%.$(OBJ): %.s
+   cpp $(ASM_FLAGS) <$< | grep -v '^#' >$*.tmp
+   $(ASM) -o $@ $*.tmp
+   rm $*.tmp
+
+ASSEMBLY_SOURCE=$(arch)-gas
+ifeq (itanium,$(arch))
+ASM=ias
+TBB_ASM.OBJ = atomic_support.o lock_byte.o log2.o pause.o
+endif 
+#--
+# End of setting assembler data.
+#--
+
+#--
+# Setting tbbmalloc data.
+#--
+M_INCLUDES = $(INCLUDES) -I$(MALLOC_ROOT) -I$(MALLOC_SOURCE_ROOT)
+M_CPLUS_FLAGS = $(CPLUS_FLAGS) -fno-rtti -fno-exceptions -fno-schedule-insns2
+#--
+# End of setting tbbmalloc data.
+#--
diff -r 627751b671bb -r ac2c116b7cee build/freebsd.inc
--- /dev/null   Thu Jan 01 00:00:00 1970 +0000
+++ b/build/freebsd.inc Sat Sep 29 16:51:17 2007 -0700
@@ -0,0 +1,86 @@
+# Copyright 2005-2007 Intel Corporation.  All Rights Reserved.
+#
+# This file is part of Threading Building Blocks.
+#
+# Threading Building Blocks is free software; you can redistribute it
+# and/or modify it under the terms of the GNU General Public License
+# version 2 as published by the Free Software Foundation.
+#
+# Threading Building Blocks is distributed in the hope that it will be
+# useful, but WITHOUT ANY WARRANTY; without even the implied warranty
+# of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public Licens

Re: TBB support for FreeBSD

2007-09-30 Thread Arun Sharma

On 9/29/07, Mike Meyer <[EMAIL PROTECTED]> wrote:
>
> Any chance of getting this packaged as a FreeBSD port, which can apply
> the patch until it gets rolled into the distributed tarball? I don't
> see a TBB port.

I just send-pr'ed it. You can also get it from:

http://www.sharma-home.net/people/arun/misc/tbb.shar

 -Arun
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "[EMAIL PROTECTED]"

Re: Results of investigating optimizing calloc()...

1999-08-04 Thread Arun Sharma

On Wed, Aug 04, 1999 at 01:20:59PM +0200, Dag-Erling Smorgrav wrote:
> "Kelly Yancey"  writes:
> > [...]
> 
> Which reminds me - has anyone thought of using DMA for zeroing pages,
> to avoid cache invalidation? The idea is to keep a chunk of zeroes on
> disk and DMA it into memory instead of clearing pages "manually". This
> assumes your disk supports DMA, of course.

On a Pentium III, you can use the new instructions to do page zero'ing
without allocating cache lines.
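
A minimal user-space sketch of the idea, assuming the "new instructions"
are the SSE non-temporal stores (MOVNTPS) that arrived with the Pentium III.
The names zero_page_nt and PAGE_BYTES are made up for illustration, and the
memset fallback for non-SSE targets is an addition for portability, not
something from this thread:

```c
#include <stddef.h>
#include <string.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

#define PAGE_BYTES 4096  /* assumed page size for this sketch */

/* Zero one page. `page` must be 16-byte aligned. */
void zero_page_nt(void *page)
{
#if defined(__SSE__)
    float *p = (float *)page;
    __m128 zero = _mm_setzero_ps();
    size_t i;

    /* Each streaming store writes 16 bytes (4 floats) with a
     * non-temporal hint, asking the CPU not to allocate a cache line. */
    for (i = 0; i < PAGE_BYTES / sizeof(float); i += 4)
        _mm_stream_ps(p + i, zero);

    /* Order the streaming stores before any later stores. */
    _mm_sfence();
#else
    /* Hypothetical fallback: plain cached stores. */
    memset(page, 0, PAGE_BYTES);
#endif
}
```

Whether this beats a cached memset depends on whether the page is touched
again soon; that trade-off is not measured here.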

-Arun



To Unsubscribe: send mail to majord...@freebsd.org
with "unsubscribe freebsd-hackers" in the body of the message

Excessive assembly code ?

1999-08-05 Thread Arun Sharma

Taking a quick look at /usr/src/sys/i386:

find . -name '*.s' | xargs wc -l
  44 ./svr4/svr4_locore.s
 216 ./apm/apm_setup.s
  24 ./linux/linux_locore.s
 461 ./isa/apic_ipl.s
1057 ./isa/apic_vector.s
 168 ./isa/icu_ipl.s
 224 ./isa/icu_vector.s
 387 ./isa/ipl.s
 113 ./isa/vector.s
  59 ./i386/bioscall.s
 340 ./i386/exception.s
 192 ./i386/globals.s
1000 ./i386/locore.s
 319 ./i386/mpboot.s
 555 ./i386/mplock.s
 310 ./i386/simplelock.s
1636 ./i386/support.s
 833 ./i386/swtch.s
 190 ./i386/vm86bios.s
8128 total  

I wonder if so much assembly code is really necessary for FreeBSD. One
argument for minimal usage of assembly code is that it is easier to code
non trivial algorithms in C.

One such example is the scheduler. Since the decision about which process
is going to run next is decided in assembly code, it is restricted to a
relatively dumb algorithm of scanning the runqs and picking one. If the
mechanism (i.e nuts and bolts of the context switch) is coded in assembly
and the policy (which process to pick next) is done in C, the code would
be much more maintainable, IMO.
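
To make the split concrete, here is a rough sketch of what the policy half
might look like in C. It assumes 32 run queues tracked by a bitmap; the
struct layouts and the names rq_add and rq_choose are hypothetical and are
not taken from the FreeBSD sources. The register save/restore would stay in
assembly and simply call rq_choose():

```c
#include <stddef.h>
#include <stdint.h>

#define NQS 32  /* number of run queues; an assumption for this sketch */

struct proc {
    int pid;
    struct proc *next;   /* next process on the same run queue */
};

struct runqs {
    uint32_t nonempty;         /* bit i set => queue[i] has a process */
    struct proc *queue[NQS];   /* queue 0 is the highest priority */
};

/* Add p at the tail of run queue pri. */
void rq_add(struct runqs *rq, struct proc *p, int pri)
{
    struct proc **pp = &rq->queue[pri];

    while (*pp != NULL)
        pp = &(*pp)->next;
    p->next = NULL;
    *pp = p;
    rq->nonempty |= (uint32_t)1 << pri;
}

/* Policy: remove and return the head of the best non-empty queue. */
struct proc *rq_choose(struct runqs *rq)
{
    int pri;
    struct proc *p;

    for (pri = 0; pri < NQS; pri++) {
        if (rq->nonempty & ((uint32_t)1 << pri)) {
            p = rq->queue[pri];
            rq->queue[pri] = p->next;
            if (rq->queue[pri] == NULL)
                rq->nonempty &= ~((uint32_t)1 << pri);
            p->next = NULL;
            return p;
        }
    }
    return NULL;  /* nothing runnable; the caller would idle */
}
```

With the policy isolated like this, trying a different selection rule means
changing one C function rather than the context switch path.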

How do people feel about it here ?

-Arun




Usenix 93 paper on hardware profiling of 386BSD

1999-08-06 Thread Arun Sharma

Does anyone have a copy of Andrew McRae's Usenix 93 paper ?

The URL: ftp://ftp.cisco.com/amcrae/hardprof.PS doesn't
seem to be valid any more.

Thanks!

-Arun





Re: mmap bug

1999-08-12 Thread Arun Sharma

On Thu, Aug 12, 1999 at 12:02:19PM +0100, Tony Finch wrote:
> Matthew Dillon  wrote:
> >
> >One solution would be to map clean R+W pages RO and force a write fault
> >to occur, allowing the system to recognize that there are too many dirty
> >pages in vm_fault before it is too late and flush some of them.  The
> >downside of this is that, of course, we take unnecessary faults.
> 
> Surely they aren't unnecessary faults if they are required for correctness?

They _are_ unnecessary faults, if other correct solutions exist. 
The second alternative - to mark system daemons as special
sounds much more attractive.

-Arun




Re: mmap bug

1999-08-12 Thread Arun Sharma

On Fri, Aug 13, 1999 at 03:04:43PM +0930, Mark Newton wrote:
> Arun Sharma wrote:
> 
>  > The second alternative - to mark system daemons as special
>  > sounds much more attractive.
> 
> Ok, now define the difference between "system daemons" and any other
> daemon (or, for that matter, any other process).

That's easy. 

$ ps aux | head
USER   PID %CPU %MEM   VSZ   RSS  TT  STAT STARTED       TIME COMMAND
root 23924  5.0 30.2 41312 38716  ??  S    Sat05PM  191:41.92 /usr/X11R6/bin/
root     0  0.0  0.0     0     0  ??  DLs  31Jul99    0:02.30  (swapper)
root     1  0.0  0.2   504   200  ??  ILs  31Jul99    0:00.05 /sbin/init --
root     2  0.0  0.0     0     0  ??  DL   31Jul99    0:03.18  (pagedaemon)
root     3  0.0  0.0     0     0  ??  DL   31Jul99    0:00.00  (vmdaemon)
root     4  0.0  0.0     0     0  ??  DL   31Jul99    0:03.55  (bufdaemon)
root     5  0.0  0.0     0     0  ??  DL   31Jul99   12:06.17  (syncer)

The daemons which are involved in freeing up pages during low memory
conditions qualify as system daemons. Making sure that these daemons
don't block avoids the deadlock.

-Arun




Re: cache-friendly scheduling for SMP

1999-09-16 Thread Arun Sharma

On Thu, Sep 16, 1999 at 12:25:52PM +0000, greg wrote:

> Can anybody point me to a paper, mailing list discussion, etc. that discusses 
> scheduling processes to not thrash the cpu caches?  Or if there's anything in 
> place, how I can take advantage of it, etc.  I got stumped on the idea
> a while ago, so I'm really curious...

In -current, there is already code to do trivial CPU affinity. Basically,
given multiple processes in the same priority queue to choose from, the
scheduler will pick the one that last ran on the same CPU.
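
Roughly, the selection within one priority queue could look like the sketch
below. The struct and the name choose_affine are invented for illustration
and do not match the code in -current:

```c
#include <stddef.h>

struct aproc {
    int pid;
    int lastcpu;           /* CPU this process last ran on */
    struct aproc *next;    /* next process in the same priority queue */
};

/*
 * Pick from one priority queue: prefer the first process that last ran
 * on `cpu`; otherwise fall back to the head of the queue.
 */
struct aproc *choose_affine(struct aproc *head, int cpu)
{
    struct aproc *p;

    for (p = head; p != NULL; p = p->next)
        if (p->lastcpu == cpu)
            return p;
    return head;
}
```

Note that the priority order is never violated here; affinity only breaks
ties among processes that are already in the same queue.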

-Arun


pv_table/pv_entry

1999-06-01 Thread Arun Sharma

Going through the 4.4 BSD book, I learnt that the purpose of the pv_table
is to be able to locate all the mappings to a given physical page.

However, comparing this to the Linux approach, which chains vm_area_struct
(analogous to vm_map_entry in FreeBSD) together to locate the shared
mappings, it appears to me that the Linux approach is more space efficient.

So why not eliminate pv_table and chain vm_map_entries together to represent
the sharing information ?

-Arun
 



Re: pv_table/pv_entry

1999-06-02 Thread Arun Sharma

On Wed, Jun 02, 1999 at 11:16:32AM -0700, Jason Thorpe wrote:
> On Tue, 1 Jun 1999 18:08:35 -0700 
>  Arun Sharma  wrote:
> 
>  > Going through the 4.4 BSD book, I learnt that the purpose of the pv_table
>  > is to be able to locate all the mappings to a given physical page.
>  > 
>  > However, comparing this to the Linux approach, which chains vm_area_struct
>  > (analogous to vm_map_entry in FreeBSD) together to locate the shared
>  > mappings, it appears to me that the Linux approach is more space efficient.
>  > 
>  > So why not eliminate pv_table and chain vm_map_entries together to 
> represent
>  > the sharing information ?
> 
> because in the Mach VM system (which is what FreeBSD is derived from),
> map entries may represent several virtual (and thus physical) pages.
> 

That's right. But by chaining vm_map_entries, you don't lose any sharing
information. You can _infer_ the same information by walking the 
vm_map_entries. By keeping the pv_table, you're making it explicit, which
makes certain operations very fast - at the cost of some space.

The overhead seems to be a constant of 0.4% + sizeof(pv_entry) * degree
of sharing. Sounds like a good trade off to me.
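
A toy model of the explicit reverse mapping, to show where the two cost
terms come from: one list head per physical page (the fixed part) plus one
entry per mapping (the part that grows with sharing). The field names and
the helpers pv_insert and pv_count are hypothetical simplifications, not
the real pmap structures:

```c
#include <stddef.h>
#include <stdlib.h>

/* One mapping of a physical page: which address space, at which VA. */
struct pv_entry {
    int              pmap_id;   /* stand-in for a pmap pointer */
    unsigned long    va;
    struct pv_entry *next;
};

/* One list head per physical page: the fixed per-page cost. */
struct pv_head {
    struct pv_entry *list;
};

/* Record that (pmap_id, va) maps physical page `page`. */
int pv_insert(struct pv_head *table, size_t page, int pmap_id,
              unsigned long va)
{
    struct pv_entry *pv = malloc(sizeof(*pv));

    if (pv == NULL)
        return -1;
    pv->pmap_id = pmap_id;
    pv->va = va;
    pv->next = table[page].list;
    table[page].list = pv;
    return 0;
}

/* Degree of sharing: walk only this page's list, not every map entry. */
size_t pv_count(const struct pv_head *table, size_t page)
{
    size_t n = 0;
    const struct pv_entry *pv;

    for (pv = table[page].list; pv != NULL; pv = pv->next)
        n++;
    return n;
}
```

Finding all mappings of a page costs time proportional to its degree of
sharing, which is the speed the extra space buys.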

-Arun


Re: problem for the VM gurus

1999-06-06 Thread Arun Sharma

Brian Feldman  writes:

>   In the long-standing tradition of deadlocks, I present to you all
>   a new one. This one locks in getblk, and causes other processes to
>   lock in inode. It's easy to induce, but I have no idea how I'd go
>   about fixing it myself (being very new to that part of the
>   kernel.)  Here's the program which induces the deadlock:

I could reproduce it with 4.0-current. The stack trace was:

tsleep
getblk
bread
ffs_read
ffs_getpages
vnode_pager_getpages
vm_fault
---
slow_copyin
ffs_write
vn_write
dofilewrite
write
syscall 

getblk finds that the buffer is marked B_BUSY and sleeps on it. But I
can't figure out who marked it busy.

-Arun

PS: Does anyone know how to get the stack trace by pid in ddb ? I can
manually type trace <p_addr>. But is there an easier way ?


Re: linux and freebsd kernels conceptually different?

1999-06-07 Thread Arun Sharma

Christoph Kukulies  writes:

Comments from someone who's studied Linux for a while and has started
studying FreeBSD only recently.

> Could one say that Linux vs. FreeBSD kernels are conceptually
> different what task scheduling, queueing, interrupt handling,
> driver architecture, buffer caching, vm etc. is concerned?

- task scheduling

- Linux uses a linear linked list of runnable processes and
  recalculates priorities on every reschedule.

- FreeBSD uses multilevel runqueues

- Buffer caching

- Linux hasn't achieved the perfect integration between the
  page cache and the buffer cache yet. write(2) goes through
  the buffer cache, read(2) goes through the page cache. Also,
  buffers are cached based on  basis. But
  an IOlite style buffer caching scheme is in the works.

- FreeBSD seems to avoid data replication by a better
  integration of the page and buffer caches. Also, buffers are
  cached based on  basis

- VM

- Linux uses a 3 level page table as a generic data structure
  at the low level and data structures similar to SVR4 for
  high level mapping info. Also, Linux avoids chaining of ptes
  mapping the same data completely.

- FreeBSD separates machine dependent and independent data
  using the pmap abstraction. FreeBSD also uses pv_table to
  keep track of multiple ptes mapping same data. FreeBSD VM is
  based on Mach.

But for the most part, they are based on the same principles
documented in early UNIX internals text books. So it would be unfair
to say they are conceptually very different.

I'd say most of the differences are in implementation and development
methodology. Linux camp seems to be proud of breaking traditions and
concepts invented after lengthy research. I haven't seen that many
iconoclasts in my short encounter with FreeBSD.

Hope that helps,

-Arun



Re: High syscall overhead?

1999-06-11 Thread Arun Sharma

"David E. Cross"  writes:

> Looking through the exception.s it appears that on entry to the
> kernel an MP lock is obtained...  I thought we had splX(); to
> protect concurrency in the kernel.

Can someone explain to me why is SYSCALL_LOCK necessary ? It certainly
seems to hurt system call performance on a MP machine.

Also, is there any data on lock contention in FreeBSD ? Is anyone
working on decomposing some of the giant locks ?

-Arun


Re: High syscall overhead?

1999-06-12 Thread Arun Sharma

Aaron Smith  writes:

> I'm still trying to figure out the deal with "lockmgr".

I found the following doc useful:

http://www.freebsd.org/~fsmp/SMP/Locking.html

-Arun



Re: High syscall overhead?

1999-06-12 Thread Arun Sharma

"Christopher R. Bowman"  writes:

> 
> I can't speak authoritatively since I don't know specifically what
> SYSCALL_LOCK is, but if it is what is often referred to on this list
> as the Giant Kernel Lock(tm) then the following should generally
> apply.
> 

You're right. The SYSCALL_LOCK is the same as the giant lock. The name
kinda misled me to assume that it's a different lock.

i386/i386/lock.h:

/*
 * Some handy macros to allow logical organization and
 * convenient reassignment of various locks.
 */

#define FPU_LOCK        call_get_fpu_lock
#define ALIGN_LOCK      call_get_align_lock
#define SYSCALL_LOCK    call_get_syscall_lock
#define ALTSYSCALL_LOCK call_get_altsyscall_lock

All of the above routines seem to be identical. But the code is
duplicated for some reason.

Also, it might be beneficial to define these locks in a header file
and inline them, instead of generating a call for each simple_lock.
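
Something along these lines is what I have in mind for a header-only lock.
This is a sketch using C11 atomics rather than the kernel's own primitives,
and the names s_lock, s_lock_try and s_unlock are made up for the example:

```c
#include <stdatomic.h>

struct simple_lock {
    atomic_flag held;
};

#define SIMPLE_LOCK_INIT { ATOMIC_FLAG_INIT }

/* Spin until the lock is acquired. Inlined, so no call is generated. */
static inline void s_lock(struct simple_lock *l)
{
    while (atomic_flag_test_and_set_explicit(&l->held,
                                             memory_order_acquire))
        ;  /* spin */
}

/* Single attempt: returns 1 if the lock was acquired, 0 otherwise. */
static inline int s_lock_try(struct simple_lock *l)
{
    return !atomic_flag_test_and_set_explicit(&l->held,
                                              memory_order_acquire);
}

static inline void s_unlock(struct simple_lock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

The uncontended path then compiles down to the atomic operation itself,
without the call/return pair around it.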

-Arun


Re: High syscall overhead?

1999-06-12 Thread Arun Sharma

"John S. Dyson"  writes:

> Finegrained locking either requires developers with IQ's of 200 or higher,
> or a different kernel structure.  I suggest that finegrained locking is cool,
> and can be intelligently used to mitigate (but not solve) the effects of
> lots of problems 

Fine grained locking is hard - but it isn't exactly rocket
science. It's been tackled in a number of OSes, papers have been
written about it.

> -- however, it would be unwise to embark on an effort to make
> the FreeBSD kernel into an efficient 16way SMP kernel by using finegrained
> locking all over the place.

Sure. But 2 and 4-way boxes are becoming more and more mainstream. And
any IO bound job is not going to perform well on FreeBSD because of
giant locking.

One way of tackling the problem is - to implement per lock profiling
and detect which locks are being contested heavily and try breaking
them down. That would be a practical way of doing things. 

An alternative way, which requires a good understanding of both the
theory and implementation of the kernel is - 

(a) Implement per subsystem locking
(b) Figure out in a "typical" workload, how much time is being spent
in which subsystem and try increasing parallelism (i.e. finer
grained locking) in subsystems where more time is being spent. 

The result of this approach should be more logical, cleaner and
possibly better performing than the previous one.
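
For the first approach, per lock profiling could be as simple as two
counters next to each lock. The sketch below uses C11 atomics; the struct
prof_lock and its functions are hypothetical names, not an existing kernel
interface:

```c
#include <stdatomic.h>

struct prof_lock {
    atomic_flag   held;
    atomic_ulong  acquisitions;  /* successful acquires */
    atomic_ulong  contended;     /* acquires that had to wait, plus
                                    failed single attempts */
    const char   *name;          /* label for reporting */
};

void prof_lock_init(struct prof_lock *l, const char *name)
{
    atomic_flag_clear(&l->held);
    atomic_init(&l->acquisitions, 0);
    atomic_init(&l->contended, 0);
    l->name = name;
}

/* Spin until acquired; count the acquire as contended if it waited. */
void prof_lock_acquire(struct prof_lock *l)
{
    int waited = 0;

    while (atomic_flag_test_and_set_explicit(&l->held,
                                             memory_order_acquire))
        waited = 1;
    atomic_fetch_add(&l->acquisitions, 1);
    if (waited)
        atomic_fetch_add(&l->contended, 1);
}

/* Single attempt; a failure is recorded as contention. Returns 1 on
 * success, 0 if the lock was already held. */
int prof_lock_try(struct prof_lock *l)
{
    if (atomic_flag_test_and_set_explicit(&l->held,
                                          memory_order_acquire)) {
        atomic_fetch_add(&l->contended, 1);
        return 0;
    }
    atomic_fetch_add(&l->acquisitions, 1);
    return 1;
}

void prof_lock_release(struct prof_lock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

Dumping contended versus acquisitions per lock name after a benchmark run
would show which locks are worth breaking down first.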

-Arun


Re: High syscall overhead?

1999-06-12 Thread Arun Sharma

Brian Feldman  writes:

> > One way of tackling the problem is - to implement per lock profiling
> > and detect which locks are being contested heavily and try breaking
> > them down. That would be a practical way of doing things. 
> 
> But you can't generalize FreeBSD's usage, can you?
> 

While it's true that no one can see all possible uses of FreeBSD, one
has to make assumptions about the typical usage -  web server, file
server etc and use it as the design center, while making sure that it
doesn't perform too badly on other less common workloads.

-Arun


Re: Inactive vs. free Memory

1999-06-15 Thread Arun Sharma

"James E. Housley"  writes:

> Just for my information.  What is the difference between "Inactive" and
> "Free" memory.  Right now top says I have 157M Inact and 3260K Free.

Inactive means the page contains valid data belonging to some file,
but is not mapped into any address space. Free means, the page doesn't
contain valid data.

-Arun


Re: [Call for review] init(8): new feature

1999-06-15 Thread Arun Sharma


While we're on the init topic, is there any strong feeling here about
BSD /etc/rc* scripts Vs SysV ? The nice thing about SysV initscripts
is the ability to start and stop any service that I like.

-Arun



Re: [Call for review] init(8): new feature

1999-06-15 Thread Arun Sharma

Mark Newton  writes:

> Arun Sharma wrote:
> 
>  > While we're on the init topic, is there any strong feeling here about
>  > BSD /etc/rc* scripts Vs SysV ? The nice thing about SysV initscripts
>  > is the ability to start and stop any service that I like.
> 
> That's fine -- there are lots of ways to start and stop any service you
> like without involving SysV init.

Like sending a signal to the process providing the service ? The
problem with that approach is, the signal you send and the clean up
you do is non-standard for each service and having a standard
interface:

/etc/rc.d/ stop|start|restart

makes it standard. 

-Arun


SMP locking (Was Re: Microsoft performance (was: ...))

1999-06-24 Thread Arun Sharma

On Thu, Jun 24, 1999 at 10:56:07PM -0700, Julian Elischer wrote:
> Alan Cox has just started passing around some code that starts on the
> breakdown of the GKL
> 
> I suggest that all interested parties go to the SMP list
> if they wish to take part in this action.

You mean freebsd-...@freebsd.org ? I've been reading the list for a while,
but haven't seen any code there. Am I missing something ?

-Arun




Re: implementing poll() in a device driver (fwd)

1999-07-16 Thread Arun Sharma

Vasudha Ramnath  writes:

> 
> I'm running FreeBSD 3.1-RELEASE.
> 
> Could someone explain what the poll() function in a device driver should
> do ?
> 
> Can it return POLLERR or POLLHUP ?
> 
> I have a test driver that returns these values from the poll() function.
> However, the application
> that called the select() is not getting an error. Instead, the select
> is returning that the particular file descriptor is, in this case, 
> 'readable' !

Take a look at "selscan" algorithm in /usr/src/sys/kern/sys_generic.c
if you wish to learn more.

Basically, if your driver doesn't implement the poll() functionality,
it can always return 0. This will ensure that select never wakes up
because of a file descriptor associated with your driver.
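
As a user-space illustration of the return value convention (not the real
driver entry point, whose signature and arguments differ), a poll handler
reports back the subset of requested events that are ready right now. The
softc struct and both function names are invented for this sketch:

```c
#include <poll.h>
#include <stddef.h>

/* Hypothetical per-device state for this sketch. */
struct mydev_softc {
    size_t rx_bytes;   /* bytes waiting to be read */
    int    tx_ready;   /* non-zero if a write would not block */
};

/* Driver with no poll support: never report any event. */
int nopoll_poll(struct mydev_softc *sc, int events)
{
    (void)sc;
    (void)events;
    return 0;
}

/* Driver with poll support: report only requested events that are
 * ready. A real driver would also arrange a wakeup (selrecord() in the
 * kernel) when nothing is ready; that part is omitted here. */
int mydev_poll(struct mydev_softc *sc, int events)
{
    int revents = 0;

    if ((events & POLLIN) && sc->rx_bytes > 0)
        revents |= POLLIN;
    if ((events & POLLOUT) && sc->tx_ready)
        revents |= POLLOUT;
    return revents;
}
```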

-Arun


Re: Setting memory allocators for library functions.

2001-02-26 Thread Arun Sharma


On 26 Feb 2001 18:56:18 +0100, Matt Dillon <[EMAIL PROTECTED]> wrote:
> Ha.  Right.  Go through any piece of significant code and just see how
> much goes flying out the window because the code wants to simply assume
> things work.  Then try coding conditionals all the way through to fix
> it... and don't forget you need to propagate the error condition back
> up the procedure chain too so the original caller knows why it failed.

So, it all comes down to reimplementing the UNIX kernel in a language
that supports exceptions, just like Linus suggested :) 

-Arun


Re: Setting memory allocators for library functions.

2001-02-28 Thread Arun Sharma

On Tue, Feb 27, 2001 at 10:39:13PM -0800, Julian Elischer wrote:
> no, something specifically designed around kernel type of actions.
> declarations of "physical pointer", "kvm pointer" "User Pointer"
> for example, and being able to declare a structure (not 'struct') 
> and say "this list is 'per process'"  and have the list head 
> automatically in the proc struct
> without having to add it there.. i.e backwards from today..

Rumor has it that MS has several compiler extensions, just for supporting
their kernel. Some of what you say above could be built on top of the
compiler, declaratively. Language support works well in cases where writing
the same code by hand is tedious and error prone or downright ugly - like
several hundred if (foo == NULL) return blah checks.

-Arun


http://www.freebsd.org/send-pr.html

2001-05-05 Thread Arun Sharma


Doesn't allow me to attach a diff. The attached diff adds that capability
to the HTML, but more changes will be needed to the CGI script that
handles the form.

If someone can point me to the CGI source, I can change that too. 

-Arun


--- send-pr.html.orig   Sat May  5 23:11:00 2001
+++ send-pr.htmlSat May  5 23:13:33 2001
@@ -93,12 +93,14 @@
   Fix to the problem if known: 
   
 
+  Attachments: 
+   
+
   
   
+
 
 
-Note: copy/paste will destroy TABs and spacing, and this web
-  form should not be used to submit code as plain text.
   
   
 [EMAIL PROTECTED]


FreeBSD ld.so performance ?

2001-05-09 Thread Arun Sharma


http://www.suse.de/~bastian/Export/linking.txt

Has anyone done a comparative study ?

-Arun


MxN threads on Linux

2001-05-16 Thread Arun Sharma


Ran into this on freshmeat today:

http://oss.software.ibm.com/developerworks/opensource/pthreads/

Why isn't the FreeBSD equivalent happening on a public cvs
branch ? I'm not demanding that it should happen that way,
just curious about the reasons :)

-Arun


RE: MxN threads on Linux

2001-05-16 Thread Arun Sharma


> 
> In the last episode (May 16), Arun Sharma said:
> > Ran into this on freshmeat today:
> > 
> > http://oss.software.ibm.com/developerworks/opensource/pthreads/
> > 

For those interested, it took me about an hour to write up
pth_native_freebsd.c (http://sharmas.dhs.org/~adsharma/pth_native_freebsd.c)
to use rfork_thread().

It creates multiple threads just fine, runs for a while and then core
dumps.

#0  0x2810badd in isatty () from /usr/lib/libc.so.5
#1  0x30 in ?? ()
#2  0x1 in ?? ()
#3  0x28079f08 in __DTOR_END__ ()
   from /usr.current/home/adsharma/src/ngpt-0.9.4/.libs/libpthread.so.9
#4  0xe82075c0 in ?? ()

Any hints on what this could be about ? I think getting a working implementation
of thread aware truss/gdb is pretty critical.

-Arun



libc threadsafe ?

2001-05-20 Thread Arun Sharma



I see some changes to -current as of Jan 2001, that attempt to make libc
threadsafe without -pthread and _THREAD_SAFE.

http://groups.google.com/groups?q=Daniel+Eischen&hl=en&lr=&safe=off&scoring=d&as_drrb=b&as_mind=1&as_minm=1&as_miny=2001&as_maxd=20&as_maxm=1&as_maxy=2001&rnum=4&ic=1&selm=94amg1%242fnu%241%40FreeBSD.csie.NCTU.edu.tw

I'm attempting to port IBM's NGPT to freebsd and most of the things
seem to be working fine. The following is the C file needed to make
it work, apart from some minor work arounds for Makefiles.

http://sharmas.dhs.org/~adsharma/pth_native_freebsd.c

I'm trying to hunt down a stack corruption that I'm seeing after
a sigsetjmp and siglongjmp. It could be due to a bug in NGPT or
it could be due to the fact that I'm linking -lc and not -lc_r
and -lc is not completely thread safe. The stack in question was
malloc'ed and passed as an argument to rfork_thread.

My question is, do I need to do anything else (apart from incrementing
__isthreaded and providing strong references to locking routines)
to get -lc to work in a MT environment ?

-Arun



_SC_NPROCESSORS_CONF

2001-05-20 Thread Arun Sharma


Single UNIX spec doesn't include the above sysconf(3) argument, but 
many UNIX variants do. What's the BSD way of doing this ? 

-Arun


Re: [pthreads-devel] Bug in pth_native.c ? + FreeBSD port

2001-05-20 Thread Arun Sharma

On Sun, May 20, 2001 at 08:05:19AM -0400, Bill Abt wrote:
> Yeah, you're right about slot.  It should be allocated off the heap...  Hmm,
> that would probably explain a few inconsistencies we've seen as well.
> Thanks
> 
> As far as incorporating your changes into the release, sure!!!  Another
> platform/os would be great.
> 

Ok, the patch is here:

http://sharmas.dhs.org/~adsharma/ngpt-freebsd.patch.txt

Rough edges:

(a) @NATIVE@ needs to be substituted with pth_native.c or 
pth_native_freebsd.c depending on the platform. I'm not
too good at autoconf.

(b) The changes to pth_lib.c can probably be ignored. They're there to
fix compilation errors on FreeBSD and it's not clear to me what the
correct solution is.

(c) This is a mysterious bug that I'm not able to solve even after
fighting with it for a couple of days:

-void (* volatile mctx_starting_func)(void);
+static void (* volatile mctx_starting_func)(void);

   This variable gets corrupted on FreeBSD after a context switch.
   I suspect that this could be a compiler issue, but haven't been
   able to pin point the problem. I'm using:

$ gcc -v
Using builtin specs.
gcc version 2.95.3 20010315 (release)

   Datapoints:

   1. Increasing the stack size, didn't help. It also makes it unlikely
  that someone is accidentally stepping on the malloc'ed stack.

   2. The problem disappeared after I put some debug statements in the
  surrounding code. This might have tickled the compiler in such a
  way that the problem got masked.

   Making the variable static makes the problem go away. This shouldn't
   be a problem, since all threads get bootstrapped the same way ?

-Arun


Re: _SC_NPROCESSORS_CONF

2001-05-20 Thread Arun Sharma


On Sun, May 20, 2001 at 04:57:17PM -0400, Andrew Gallatin wrote:
> 
> Arun Sharma writes:
>  > Single UNIX spec doesn't include the above sysconf(3) argument, but 
>  > many UNIX variants do. What's the BSD way of doing this ? 
> 
> How about the hw.ncpu sysctl?

Any objections to a patch implementing 
sysconf(_SC_NPROCESSORS_CONF) in terms of hw.ncpu ?
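
In application code, the two interfaces can be wrapped like this in the
meantime. This is only a sketch: ncpu_conf is a made-up helper name, and
the non-FreeBSD branch assumes the platform already provides
_SC_NPROCESSORS_CONF:

```c
#include <stddef.h>
#include <unistd.h>
#if defined(__FreeBSD__)
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

/* Return the number of configured CPUs, or -1 if it cannot be found. */
long ncpu_conf(void)
{
#if defined(__FreeBSD__)
    int ncpu;
    size_t len = sizeof(ncpu);

    /* Read-only query of the hw.ncpu sysctl. */
    if (sysctlbyname("hw.ncpu", &ncpu, &len, NULL, 0) == -1)
        return -1;
    return ncpu;
#elif defined(_SC_NPROCESSORS_CONF)
    return sysconf(_SC_NPROCESSORS_CONF);
#else
    return -1;
#endif
}
```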

-Arun


Re: _SC_NPROCESSORS_CONF

2001-05-20 Thread Arun Sharma


On Sun, May 20, 2001 at 01:56:55PM -0700, Arun Sharma wrote:
> On Sun, May 20, 2001 at 04:57:17PM -0400, Andrew Gallatin wrote:
> > 
> > Arun Sharma writes:
> >  > Single UNIX spec doesn't include the above sysconf(3) argument, but 
> >  > many UNIX variants do. What's the BSD way of doing this ? 
> > 
> > How about the hw.ncpu sysctl?
> 
> Any objections to a patch implementing 
> sysconf(_SC_NPROCESSORS_CONF) in terms of hw.ncpu ?

http://www.freebsd.org/cgi/query-pr.cgi?pr=27489

-Arun


Re: http://phk.freebsd.dk/Gnats/

2001-05-28 Thread Arun Sharma


On 29 May 2001 00:46:42 +0200, Poul-Henning Kamp <[EMAIL PROTECTED]> wrote:
> 
> It seems that my little plot of our abysmal performance when it comes
> to our PR database actually helped spur some activity, at least the
> end of the graph points in the right direction now.
> 
> But we are far from done yet, so find a couple of PR's and close them,
> there are 3000 to choose from...

It might get worse, if this goes in :)

http://www.freebsd.org/cgi/query-pr.cgi?pr=27653
http://www.freebsd.org/cgi/query-pr.cgi?pr=27654

-Arun


libwi and KWireless

2001-06-25 Thread Arun Sharma


KWireless is a KDE kicker applet to display the signal quality of an IEEE
802.11b wireless network.

http://www.sharma-home.net/~adsharma/projects/KWireless/

It depends on libwi, a library version of wicontrol(8).

http://www.sharma-home.net/~adsharma/projects/libwi/

I know this is not in a committable state and would appreciate some
feedback on what I need to do before it can be committed.

Enjoy!

-Arun


Re: libwi and KWireless

2001-06-25 Thread Arun Sharma

On Mon, Jun 25, 2001 at 03:37:00PM +0100, Doug Rabson wrote:
> I can't configure it. It doesn't contain a configure script and autoconf
> doesn't seem to like the (possibly misnamed?) configure.in.in file. This
> is from 4.3-stable with autoconf-2.13_1.

Try 

$ gmake -f Makefile.dist
$ cat ~/bin/kdeconfig
MOC=moc2 LIBQT=-lqt2 ./configure --with-extra-libs=/usr/local/lib \
    --with-qt-includes=/usr/X11R6/include/qt2 \
    --with-extra-includes=/usr/local/include --prefix=/usr/local \
    --with-qt-libraries=/usr/X11R6/lib
$ kdeconfig
$ make
# make install

The above configure line corresponds to a normal 4.3-stable box with
the basic KDE ports installed.

Once installed, K -> configure panel -> Add -> Applet -> KWireless
should display an icon in your panel.

Let me know if you're having trouble with Makefile.dist. I'll put up
a tarball with a configure script.

-Arun


NGPT 1.0.0 port to freebsd

2001-06-29 Thread Arun Sharma


http://freshmeat.net/projects/ngpt
http://www.sharma-home.net/~adsharma/projects/freebsd/ngpt-1.0.0-freebsd.tar.gz

Notes:

- The project has gotten more Linux specific since the last port (0.9.4).
  There are a lot of ugly hacks that need cleanup.
- Please commit PR 27489 to help this port
- There were many deviations from the freebsd pthread.h (specifically
  the omission of "const", int vs. size_t, etc.)
- The main point of this port is to have a reasonable native freebsd
  pthread implementation till the scheduler activations stuff is ready.

- Java heads: does this help to pass the JCK ? Is that the main reason
  we can't have a binary FreeBSD JDK distribution ? I've read -java for
  several months now and I still can't find the answer.

To test the above port:

- make test_pthread; ./test_pthread
- You may want to turn off debugging in pth_p.h
- Tested only on a UP machine (my laptop) so far. Needs SMP testing.
  The earliest I can do it is this weekend. 

Disclaimer:

- I've mainly done the "monkey work" of fixing compile errors and making
  sure that the test program works. Haven't had a chance to look at the
  implementation specifics yet. I didn't like some design decisions in
  0.9.4.

- Someone here had a makecontext() patch. I think committing it would
  surely help. The way GNU pth does context creation is really
  inefficient, in order to be portable (read the pth paper).

-Arun


Java (Was Re: NGPT 1.0.0 port to freebsd)

2001-06-29 Thread Arun Sharma


On Fri, Jun 29, 2001 at 09:05:25AM -0600, Nate Williams wrote:
> With the current license, this won't be installed as part of the base
> kernel.  (GPL and/or LGPL)

I understand it'll continue to be a port. Am I hearing that it is
unacceptable even as a temporary solution because of the license ?

> It's been answered time and time again over the past months, so you must
> not be paying attention.  The binary distribution hasn't been created
> because we don't have a legal license to do so (yet).

Yes, I've been reading that for a long time now, but it (what Sun is
doing) doesn't make any sense to me. Are Sun's reasons

(a) Technical ? Passing of JCK etc ? 
(b) Political ? Yet another competitor to Solaris ?

From your posting it appears that it's technical (not passing JCK), as
well as political (not getting the license to run JCK). What is their
answer regarding blackdown.org doing the same ?

Maybe getting ZDNet to publish an article on this is the right way to
go ? The bug parades and votes didn't seem to help much.

> In summary, a Java binary distribution of JDK1.2.2 will come out *very
> soon* after a usable license with Sun has been signed.  Hopefully, we'll
> have a JDK1.3 binary distribution soon after, as Greg Lewis has made
> a lot of progress on it and has it limping along right now.

That's good to hear. Eagerly awaiting the news.

-Arun


Re: How many files can I put in one directory?

2000-06-22 Thread Arun Sharma

On Wed, 21 Jun 2000 23:42:37 -0700 (PDT), Nicole Harrington. <[EMAIL PROTECTED]> wrote:
> 
>  Hello
>  I have a user who needs to store a large amount of small html files. Like
> around 2 million...
> 
>  Assuming FreeBSD 4.0-Stable with Soft Updates, what is a sane number that can
> be handled per directory?

I investigated this for about 25k files and it seemed to be fine. Note that
if you keep the in-memory directory cache (which is hashed) large enough,
you might be able to get away with a one-time linear search cost in the
directory. So your worst case is scanning two million filenames in a directory.
The average case can be made O(1).

Picking names intelligently is also a good idea -

fbar123456789

is a bad idea, because the string comparison routine has to skip over
the first 50 characters before it finds a mismatch. I think netscape
commits this sin.
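
To make the cost concrete, here is a small sketch that counts how many
leading bytes two names share, which is the work a comparison must do before
it can report a mismatch. `common_prefix_len` and the file names in the
usage note are made up for illustration.

```c
#include <stddef.h>

/*
 * Number of leading bytes two names have in common. A byte-by-byte
 * string comparison has to examine at least this many bytes before
 * it can tell the names apart.
 */
size_t common_prefix_len(const char *a, const char *b)
{
    size_t n = 0;

    while (a[n] != '\0' && a[n] == b[n])
        n++;
    return n;
}
```

With hypothetical names, "cache-file-000001" and "cache-file-000002" share
16 leading bytes, while "1-cache" and "2-cache" differ at the first byte, so
putting the varying part first lets a comparison fail fast.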

-Arun


Re: VM coloring description in NOTES

2000-06-26 Thread Arun Sharma


[This message has also been posted.]
On Mon, 26 Jun 2000 10:42:35 +0100, Koster, K.J. <[EMAIL PROTECTED]> wrote:
> > 
> > > > currently      -> candidate
> > > > PQ_HUGECACHE      PQ_CACHE1024
> > > > PQ_LARGECACHE     PQ_CACHE512
> > > > PQ_MEDIUMCACHE    PQ_CACHE256
> > > > PQ_NORMALCACHE    PQ_CACHE64
> > 
> Hmm. At boot time, the BIOS displays this square box with a lot of grub in
> it that FreeBSD then proceeds to rediscover. Is there no way to whack the
> BIOS into submission and have it cough up the cache size?
> 
> It's probably going to be BIOS-vendor specific *sigh*. Then again, perhaps
> it would be nice to have an interface to some of the more widely used
> bioses. I imagine you could pry all sorts of tuning information about the
> machine from its clammy little hands. Cache size, cache scheme, memory type.

For Intel processors, the CPUID instruction reports both L1 and L2 cache
sizes. Perhaps these things should be made a runtime option rather than a
compile-time option ?

-Arun



Re: VM coloring description in NOTES

2000-06-26 Thread Arun Sharma

On Mon, Jun 26, 2000 at 12:50:41PM -0400, Kenneth Wayne Culver wrote:
> Just curious because I have no experience in this area... but what exactly
> does cache coloring get us... I've never actually gotten a really straight
> answer on this... Thanks

Read Curt Schimmel's book, "UNIX Systems for Modern Architectures", for an
answer.

Basically, it ensures that if P1 and P2 are two pages that are allocated
successively (temporal locality), then the first cache line in P1 and
the first cache line in P2 do not compete with each other for the L2 cache.
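
A rough sketch of the arithmetic behind this, assuming a physically indexed
cache. The function names and the 512 KB / 4-way / 4 KB figures in the usage
note are illustrative only; they are not the values behind FreeBSD's PQ_*
options.

```c
#include <stddef.h>

/*
 * Number of page colors: how many distinct page-sized slots one way
 * of the cache can hold.
 */
size_t page_colors(size_t cache_bytes, size_t ways, size_t page_bytes)
{
    return cache_bytes / (ways * page_bytes);
}

/*
 * Color of a physical page frame. Pages of the same color map to the
 * same cache sets and therefore compete with each other.
 */
size_t page_color(size_t pfn, size_t ncolors)
{
    return pfn % ncolors;
}
```

For a 512 KB, 4-way cache with 4 KB pages this gives 32 colors. An allocator
that hands out successive pages with different colors keeps P1 and P2 out of
each other's cache sets; frames 5 and 37 would share a color, frames 5 and 6
would not.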

-Arun


libc_r, signals and modifying sigcontext

2001-07-21 Thread Arun Sharma


Greetings. I'm trying to port an application to FreeBSD. I have
a signal handler registered using signal(2). It modifies the
data pointed to by the third argument - of type sigcontext (specifically
sc_eip) - so that execution would resume at a different point.

However, when execution resumes, it resumes at the same point where
it was interrupted. A quick search of the archives brought up this
thread:

http://groups.google.com/groups?hl=en&safe=off&th=6d5b8c3ead4a79ab,5&seekm=9fo8vq%241ma8%241%40FreeBSD.csie.NCTU.edu.tw#p

I tried:

_thread_sys_sigreturn(sc);

as suggested, but truss shows that sigreturn is failing. So my question
is: what is the correct way to modify the sigcontext in FreeBSD ? Are
there other multi threaded apps (using pthreads, linked to libc_r),
which do this ?

-Arun


Re: libc_r, signals and modifying sigcontext

2001-07-21 Thread Arun Sharma


On Sat, Jul 21, 2001 at 07:17:47PM -0700, Arun Sharma wrote:
> Greetings. I'm trying to port an application to FreeBSD. I have
> a signal handler registered using signal(2). It modifies the
> data pointed to by the third argument - of type sigcontext (specifically
> sc_eip) - so that execution would resume at a different point.
> 
> However, when execution resumes, it resumes at the same point where
> it was interrupted. A quick search of the archives brought up this
> thread:

Another data point: the problem doesn't happen with IBM's MxN
pthread library ported to FreeBSD (mainly because it uses libc
and not libc_r).

http://oss.software.ibm.com/developerworks/opensource/pthreads/

I wonder how the scheduler activations based stuff is going to
handle this case.

-Arun


Need a clean room implementation of this function

2001-07-26 Thread Arun Sharma


I'm porting a BSD licensed Java VM from Linux to FreeBSD and ran into
the following Linux function which is not implemented in BSDs.

To avoid GPL contamination issues, can someone complete[1] the following
method in inlined IA-32 assembly ? The Intel instruction reference documents
an instruction called BTS, which does just this.

Thanks!

-Arun

[1] I've already looked at the Linux implementation - does that
disqualify me ? Has anyone dealt with such issues in the past ?

/**
 * test_and_set_bit - Set a bit and return its old value
 * @nr: Bit to set
 * @addr: Address to count from
 *
 * This operation is atomic and cannot be reordered.  
 * It also implies a memory barrier.
 */
static __inline__ int test_and_set_bit(int nr, volatile void * addr);



Re: Need a clean room implementation of this function

2001-07-26 Thread Arun Sharma


On Thu, Jul 26, 2001 at 11:15:40PM +0200, Bernd Walter wrote:
> > static __inline__ int test_and_set_bit(int nr, volatile void * addr);
> 
> -current has a lot of atomic functions in src/sys/i386/include/atomic.h.

It has byte, word, int, long level operations - what I want is bit
level.

-Arun


Re: Need a clean room implementation of this function

2001-07-26 Thread Arun Sharma

On Thu, Jul 26, 2001 at 02:43:24PM -0700, John Baldwin wrote:
> {
> int val;
> 
> do {
> val = *(int *)addr;
> } while (atomic_cmpset_int(addr, val, val | (1 << nr)) == 0);
> return (val & (1 << nr));
> }

Thanks! I think that'd work. But code using BTS would be
more efficient (fewer cycles).
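
For comparison, the same compare-and-set loop can be sketched with C11
atomics instead of the kernel's atomic_cmpset_int(). The name
`test_and_set_bit_cas` is hypothetical, and this is only a portable
illustration of the quoted idea, not code from either kernel.

```c
#include <assert.h>
#include <stdatomic.h>

/* Atomically set bit nr (0..31) of *word; returns 1 if it was already set. */
int test_and_set_bit_cas(int nr, atomic_uint *word)
{
    unsigned int mask;
    unsigned int old;

    assert(nr >= 0 && nr < 32);
    mask = 1u << nr;
    old = atomic_load(word);

    /* On failure, old is refreshed with the current value; just retry. */
    while (!atomic_compare_exchange_weak(word, &old, old | mask))
        ;
    return (old & mask) != 0;
}
```

The loop retries only when another thread changed the word between the load
and the exchange, so in the uncontended case it runs once.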

Many people asked me this question: the code I'm porting is:

http://www.intel.com/research/mrl/orp/

Please see my messages to [EMAIL PROTECTED] about the port.

-Arun


Re: Need a clean room implementation of this function

2001-07-26 Thread Arun Sharma

On Thu, Jul 26, 2001 at 11:59:27PM +0200, Bernd Walter wrote:
> [...]
> ATOMIC_ASM(set,  char,  "orb %b2,%0",   v)
> ATOMIC_ASM(clear,char,  "andb %b2,%0", ~v)
> [...]

That does set, not test-and-set. What I want is exactly what the Intel
BTS instruction does: atomically test and set a bit.
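
A sketch of what a BTS-based version could look like with GCC-style inline
assembly. The name `bts_test_and_set_bit` is hypothetical, the bit number is
assumed to be in the range 0..31, and the fallback branch uses the
GCC/Clang `__atomic_fetch_or` builtin so the sketch still builds on other
architectures.

```c
#include <assert.h>

/*
 * Atomically set bit nr (0..31) of the 32-bit word at addr.
 * Returns 1 if the bit was already set, 0 otherwise.
 */
static inline int bts_test_and_set_bit(int nr, volatile unsigned int *addr)
{
    assert(nr >= 0 && nr < 32);
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    int old;

    __asm__ __volatile__(
        "lock; btsl %2, %1\n\t"   /* CF = old bit value, then set the bit */
        "sbbl %0, %0"             /* old = CF ? -1 : 0 */
        : "=r" (old), "+m" (*addr)
        : "Ir" (nr)
        : "cc", "memory");
    return old != 0;
#else
    /* Assumption: GCC/Clang atomic builtins are available. */
    unsigned int mask = 1u << nr;

    return (__atomic_fetch_or(addr, mask, __ATOMIC_SEQ_CST) & mask) != 0;
#endif
}
```

The "memory" clobber keeps the compiler from moving loads and stores across
the operation, and the lock prefix makes the read-modify-write atomic on SMP.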

-Arun


Re: Need a clean room implementation of this function

2001-07-26 Thread Arun Sharma

On Thu, Jul 26, 2001 at 03:49:58PM -0700, John Baldwin wrote:
> > That does set, not test-and-set. What I want is exactly what the Intel
> > BTS instruction does: atomically test and set a bit.
> 
> Unfortunately that is very ia32 specific.  The code would be more
> friendly on alpha and ia64 at least if the algo was changed to use
> cmpset on a word instead of test-and-set of a bit.

Another way to look at it is as an IA-32 specific optimization. For 
other architectures, we could just use the code you posted earlier.

> If you want, I can look at the code to see where it uses test_and_set()
> to determine how hard that would be.  (It might be very easy to do.)

The piece of code which uses it is attached.

-Arun

inline
void acquire_header_lock (volatile POINTER_SIZE_INT *p_header)
{
    while (true) {
        // Try to grab the lock.
        volatile PVOID free_header = (PVOID)(*p_header & BUSY_FORWARDING_BIT_MASK);
        volatile PVOID locked_header =
            (PVOID)((POINTER_SIZE_INT)free_header | BUSY_FORWARDING_BIT);
        assert (locked_header != free_header);

        // IA64 - What are the semantics of test_and_set_bit with regards to
        // acq and rel?
        // Hopefully, test_and_set_bit will have acquire semantics and
        // test_and_clear_bit will have release semantics.
        if (test_and_set_bit (BUSY_FORWARDING_BIT_OFFSET, (PVOID *)p_header) == 0) {
            assert ((*p_header & BUSY_FORWARDING_BIT) == BUSY_FORWARDING_BIT);

            return; // got it this is the only way out.
        }
        // Try until you get the lock.

        while ((*p_header & BUSY_FORWARDING_BIT) == BUSY_FORWARDING_BIT) {
            Sleep (0); // Sleep until it might be free.
        }
    }
}
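
The acquire/release question raised in the comment above can be made
explicit with C11 atomics. The following is only a sketch under assumptions:
`BUSY_BIT`, `header_lock` and `header_unlock` are hypothetical stand-ins for
the ORP names, and sched_yield() takes the place of Sleep(0).

```c
#include <sched.h>
#include <stdatomic.h>
#include <stdint.h>

#define BUSY_BIT ((uintptr_t)1)  /* hypothetical stand-in for BUSY_FORWARDING_BIT */

/* Take the lock bit in an object header word, with acquire semantics. */
void header_lock(atomic_uintptr_t *header)
{
    /* fetch_or returns the previous value: the lock is ours if the bit was clear. */
    while (atomic_fetch_or_explicit(header, BUSY_BIT,
                                    memory_order_acquire) & BUSY_BIT) {
        /* Wait on plain loads until the bit might be free again. */
        while (atomic_load_explicit(header, memory_order_relaxed) & BUSY_BIT)
            sched_yield();
    }
}

/* Drop the lock bit, with release semantics. */
void header_unlock(atomic_uintptr_t *header)
{
    atomic_fetch_and_explicit(header, ~BUSY_BIT, memory_order_release);
}
```

Spelling out the memory order leaves it to the compiler to pick suitable
instructions on each architecture, rather than relying on what one
platform's test_and_set_bit happens to guarantee.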


Re: libc_r, signals and modifying sigcontext

2001-07-29 Thread Arun Sharma


On Sun, Jul 22, 2001 at 10:50:01AM -0400, Daniel Eischen wrote:

Dan,

I tried this patch against 4.3-STABLE (had to substitute
_get_curthread() with _thread_run), without success. After
the sigreturn, EIP remains the same.

Should I be testing against -current ?

-Arun

> Try this patch:
> 
> -- 
> Dan Eischen
> 
> Index: uthread/pthread_private.h
> ===
> RCS file: /opt/b/CVS/src/lib/libc_r/uthread/pthread_private.h,v
> retrieving revision 1.59
> diff -u -r1.59 pthread_private.h 
> --- uthread/pthread_private.h 2001/07/20 04:23:10 1.59
> +++ uthread/pthread_private.h 2001/07/22 04:29:10
> @@ -654,6 +654,7 @@
>   int sig_has_args;   /* use signal args if true */
>   ucontext_t  uc;
>   siginfo_t   siginfo;
> + int restore_context;
>  };
>  
>  /*
> Index: uthread/uthread_sig.c
> ===
> RCS file: /opt/b/CVS/src/lib/libc_r/uthread/uthread_sig.c,v
> retrieving revision 1.38
> diff -u -r1.38 uthread_sig.c
> --- uthread/uthread_sig.c 2001/06/29 17:09:07 1.38
> +++ uthread/uthread_sig.c 2001/07/22 04:28:02
> @@ -1004,6 +1004,10 @@
>   else
>   (*(sigfunc))(psf->signo,
>   (siginfo_t *)psf->siginfo.si_code, &psf->uc);
> + if (psf->restore_context != 0) {
> + memcpy(&thread->ctx.uc, &psf->uc, sizeof(psf->uc));
> + thread->ctxtype = CTX_UC;
> + }
>   }
>   /*
>* Call the kernel scheduler to safely restore the frame and
> @@ -1046,6 +1050,7 @@
>   stackp -= sizeof(struct pthread_signal_frame);
>  
>   psf = (struct pthread_signal_frame *) stackp;
> + psf->restore_context = 0;
>  
>   /* Save the current context in the signal frame: */
>   thread_sigframe_save(thread, psf);
> @@ -1059,6 +1064,8 @@
>   sizeof(psf->uc));
>   memcpy(&psf->siginfo, &_thread_sigq[psf->signo - 1].siginfo,
>   sizeof(psf->siginfo));
> + psf->restore_context = ((thread == _get_curthread()) &&
> + (thread->ctxtype == CTX_UC));
>   }
>  
>   /* Setup the signal mask: */


Re: libc_r, signals and modifying sigcontext

2001-07-29 Thread Arun Sharma


On Sun, Jul 29, 2001 at 09:48:30AM -0400, Daniel Eischen wrote:
> Can you breakpoint or add a print statement to see if the thread
> chosen to handle the signal is the current thread (_thread_run
> == thread) in the patched section below?

Yes, the following condition was true according to my printfs:

_thread_run == thread

-Arun




truss that supports fork and rfork

2001-08-26 Thread Arun Sharma


I just ported over my old patches to truss to -current that 
I first posted here in May 2000:

http://groups.google.com/groups?hl=en&safe=off&threadm=fa.g3c7itv.5imipd%40ifi.uio.no&rnum=1&prev=/groups%3Fas_q%3Dtruss%26as_uauthors%3DArun%2520Sharma

The new patch is here:

http://www.sharma-home.net/~adsharma/projects/freebsd/truss.diff.gz

For people using rfork based POSIX threads implementations like
NGPT (see the ports collection), this may be a useful tool.

The new flag to use is -f (for follow children). If the feedback
is positive, I'll clean up the patch, update the man page etc.

-Arun


POSIX compatibility issue

2001-09-05 Thread Arun Sharma


Can someone take a look at this PR ?

http://www.freebsd.org/cgi/query-pr.cgi?pr=30317

It's necessary to fix compilation issues for a POSIX compliant Java VM,
that uses sockets.

There are similar open bug reports against NetBSD too, without any
comments on why this change can not be made.

-Arun


NGPT port upgraded to 1.0.1

2001-09-15 Thread Arun Sharma


Added spinlock support, so that libc functions are reentrant.
This is based on the Aug 3 release from the NGPT project.

http://www.freebsd.org/cgi/query-pr.cgi?pr=30599



Re: truss vs ktrace

2001-10-20 Thread Arun Sharma


On Wed, 17 Oct 2001 02:02:07 + (UTC), Dag-Erling Smorgrav <[EMAIL PROTECTED]> wrote:
> Jim Pirzyk <[EMAIL PROTECTED]> writes:
> > So which should I use? Why are there two around?  I see that truss has
> > less command line switches than ktrace, but it is a little bit more
> > standard.
> 
>  - truss slows down the slave process a *lot* as the slave process
>    stops at every syscall and waits for truss to notice, obtain and
>    process information (a task which in itself requires a bunch of
>    additional context switches and syscalls) and print its output.
>    This also means that if you pipe truss through less and don't page
>    down fast enough, the slave process will hang waiting for truss
>    which is waiting for less to absorb its output.

True, the choice depends on which one meets your debugging needs better.
I find the performance of truss usually adequate for my purposes.

> 
>  - truss currently can't follow forks and trace children of the
>    original process.  This should be fixable, though.
> 

This has been taken care of:

http://www.sharma-home.net/~adsharma/projects/freebsd/truss.diff.gz
http://www.sharma-home.net/~adsharma/projects/freebsd/truss.tar.gz

The above patch supports fork as well as rfork, so can be used with
libraries using rfork for pthread implementations.

Another advantage of truss is that the output is "online" and interactive. 
ktrace requires you to use kdump to view the trace.

-Arun


listing sysinit order ?

2003-01-18 Thread Arun Sharma

Hello,

I'm trying to figure out why recent -current snapshots hang at boot/install
time on my Thinkpad. The problem is, at the point where it hangs, I
don't know exactly which driver it's in (yes, I have boot_verbose turned
on).

So my question is, is there a simple tool to list the order in which
various initialization/probe routines get called in mi_startup ? If not,
what would it take to write one ?

-Arun

PS: more info in kern/46619


Re: listing sysinit order ?

2003-01-18 Thread Arun Sharma

Terry Lambert wrote:
> Arun Sharma wrote:
> > So my question is, is there a simple tool to list the order in which
> > various initialization/probe routines get called in mi_startup ? If not,
> > what would it take to write one ?
>
> more /sys/sys/kernel.h

Yes, I'm aware of this one, but it doesn't tell me very precisely which
drivers get initialized in what order.

> You can not cause messages to be printed until after SI_SUB_CONSOLE;
> if you want to put a printf in the init_main.c, verify that the
> sysinit_sub_id is > SI_SUB_CONSOLE before attempting to call the
> printf.

At that point only a function pointer is available. Is there a good way
of converting it into a printable string ?

	-Arun



Re: listing sysinit order ?

2003-01-19 Thread Arun Sharma

On Sun, Jan 19, 2003 at 04:57:13AM -0800, Terry Lambert wrote:
> You will get the information you seem to be asking for (unless I'm
> misunderstanding you, and you are trying to lead up to asking for a
> string identifier, and for some reason you don't want to come out
> and ask for a modification of the SYSINIT macro, for some reason...). 

That may be the right thing to do. I was worried about the unnecessary
bloat it would add to a non-debug kernel.
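
One way the bloat concern could be handled is to record the name only when a
debug option is defined. The sketch below is a userland illustration of that
idea; `INIT_DEBUG_NAMES`, `INIT_ENTRY`, `INIT_NAME` and the init routines are
all hypothetical and are not the real SYSINIT macro.

```c
#include <stdio.h>

#define INIT_DEBUG_NAMES 1      /* assumption: a debug-only build option */

typedef void (*init_fn)(void);

struct init_entry {
    init_fn func;
#ifdef INIT_DEBUG_NAMES
    const char *name;           /* one extra pointer per entry, debug builds only */
#endif
};

#ifdef INIT_DEBUG_NAMES
#define INIT_ENTRY(f)  { (f), #f }      /* #f stringifies the function name */
#define INIT_NAME(e)   ((e)->name)
#else
#define INIT_ENTRY(f)  { (f) }
#define INIT_NAME(e)   "(unknown)"
#endif

static void init_console(void) { }
static void init_eisa(void) { }

static const struct init_entry init_table[] = {
    INIT_ENTRY(init_console),
    INIT_ENTRY(init_eisa),
};

/* Announce each routine before calling it, so a hang names its culprit. */
void run_inits(FILE *log)
{
    for (size_t i = 0; i < sizeof(init_table) / sizeof(init_table[0]); i++) {
        fprintf(log, "sysinit: %s\n", INIT_NAME(&init_table[i]));
        fflush(log);
        init_table[i].func();
    }
}
```

Without the debug option the table shrinks back to bare function pointers,
so a non-debug build pays nothing for the names.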

However, I figured that I was barking up the wrong tree. To debug driver
initialization hangs, I need to put printfs in kern/subr_bus.c, not the
sysinit code.

-Arun


verbose device probing ?

2003-01-19 Thread Arun Sharma

Having just spent 5 hours debugging a silent hang in EISA bus probe
(even with boot -v) I'm tempted to ask, why doesn't
device_probe_and_attach explicitly announce the device it's going to
probe if bootverbose is set ?

Thought I'd ask here before I submit a PR.

-Arun

BTW: There seem to be 30+ critical + 130+ serious bugs against 5.0 at
the time of its release. Are developers looking at the gnats db at all ?


Re: listing sysinit order ?

2003-01-19 Thread Arun Sharma

On Sun, Jan 19, 2003 at 10:45:02PM -0800, Terry Lambert wrote:
> 
> SYSINIT would at least get you to where it's hanging, and you
> may not need information over and above that, FWIW.

Well, knowing that the kernel hangs in a function called "configure"
(SI_SUB_CONFIGURE, SI_ORDER_THIRD) isn't terribly useful. However,
knowing that it specifically hangs in eisa_probe() is useful.

Also, see the mail I just sent to -hackers about making
device_probe_and_attach verbose if bootverbose is set.

-Arun


Re: verbose device probing ?

2003-01-20 Thread Arun Sharma

On Mon, Jan 20, 2003 at 08:33:09AM -0800, Bruce A. Mah wrote:
> 
> PS.  I personally ignore the severity and priority fields of PRs.  The
> importance of many PRs I've dealt with is very much inflated.
> 

Perhaps you should change the severity field to a lower level then ? Or
is there a different problem (such as lack of good tools) that prevents you
from doing that ?

-Arun


device probing not verbose when using boot -v

2003-01-20 Thread Arun Sharma

>Submitter-Id:  current-users
>Originator:    Arun Sharma
>Organization:  
>Confidential:  no 
>Synopsis:  device probing not verbose when using boot -v
>Severity:  
>Priority:  
>Category:  kern
>Class: sw-bug 
>Release:   FreeBSD 5.0 i386
>Environment:

When FreeBSD has trouble booting on some hardware, the lockup is silent
and leaves no clues about where the hang is happening, even if the user
is using boot -v. This makes it very hard for a tester to report
meaningful bugs.

Specific case in question: kern/44619. The hang was in eisa_probe, but
the kernel messages don't provide a clue.

So the proposal is to print the name of the device being probed and
attached before attempting to do it in

kern/subr_bus.c:device_probe_and_attach()

if (bootverbose) is set.
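
The shape of the proposed change, sketched as a userland mock. Nothing here
is the real kern/subr_bus.c code: `mock_device`, `mock_bootverbose`,
`mock_probe_ok` and `mock_probe_and_attach` are hypothetical names used only
to show the announce-before-probe ordering.

```c
#include <stdio.h>

struct mock_device {
    const char *name;                       /* driver name, e.g. "eisa" */
    int unit;
    int (*probe)(struct mock_device *dev);  /* returns 0 on success */
};

int mock_bootverbose = 1;                   /* stands in for the kernel's bootverbose */

/* Trivial probe routine used for the example. */
int mock_probe_ok(struct mock_device *dev)
{
    (void)dev;
    return 0;
}

int mock_probe_and_attach(struct mock_device *dev, FILE *log)
{
    if (mock_bootverbose) {
        /* Print before probing, so a hang inside probe() is attributable. */
        fprintf(log, "probing %s%d\n", dev->name, dev->unit);
        fflush(log);
    }
    return dev->probe(dev);
}
```

The point of the ordering is that the last line on the console names the
device whose probe never returned.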


Re: verbose device probing ?

2003-01-21 Thread Arun Sharma

On Tue, Jan 21, 2003 at 08:26:08AM -0800, Bruce A. Mah wrote:
> 
> The severity and priority fields can be changed manually but that
> doesn't solve the problem that relying on the user-specified severity
> and priority fields for anything meaningful just doesn't work.
> 

If you override the user-specified severity manually, you're no longer
relying on the user-specified field. But yes, that means more work for
the developer responsible for looking at incoming bugs and assigning
them.

All I'm trying to do here is to find a good channel to raise my issues.

-Arun


0xdeadxxxx ?

2002-06-09 Thread Arun Sharma


I just got a kernel mode page fault. I'd like to find out more
about 

> fault virtual address   = 0xdeadc162

It looks like the address is meant to signal a particular class of
error. Which one ?

-Arun

Background fsck:

Fatal trap 12: page fault while in kernel mode
cpuid = 0; lapic.id = 
fault virtual address   = 0xdeadc162
fault code              = supervisor read, page not present
instruction pointer     = 0x8:0xc0277ebe
stack pointer           = 0x10:0xcaee688c
frame pointer           = 0x10:0xcaee68b0
code segment            = base 0x0, limit 0xf, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 343 (cron)
kernel: type 12 trap, code=0
Stopped at  ufs_strategy+0xbc:  calll   *0(%edx,%eax,4)
db> trace
ufs_strategy(caee68e4,caee6900,c01e7a1a,caee68e4,0) at ufs_strategy+0xbc
ufs_vnoperate(caee68e4) at ufs_vnoperate+0x13
breadn(cacdbb00,0,0,400,0) at breadn+0xc4
bread(cacdbb00,0,0,400,0) at bread+0x20
ffs_blkatoff(cacdbb00,0,0,0,caee69c8) at ffs_blkatoff+0x88
ufs_lookup(caee6af0,caee6b2c,c01ebe61,caee6af0,caaca274) at ufs_lookup+0x31f
ufs_vnoperate(caee6af0) at ufs_vnoperate+0x13
vfs_cache_lookup(caee6b94,caee6bc0,c01efc94,caee6b94,caeb841c) at
vfs_cache_loo9
ufs_vnoperate(caee6b94) at ufs_vnoperate+0x13
lookup(caee6c30,caeb841c,caee6bec,c01ce890,c0364178) at lookup+0x2b2
namei(caee6c30) at namei+0x1df
lstat(caeb841c,caee6d14,2,2,292) at lstat+0x4a
syscall(2f,2f,2f,0,0) at syscall+0x1db
syscall_with_err_pushed() at syscall_with_err_pushed+0x1b
--- syscall (190, FreeBSD ELF, lstat), eip = 0x280b2f33, esp = 0xbfbff1ec, ebp -


Kernel hacking questions

2002-06-09 Thread Arun Sharma


1. Can I use a SMP kernel and bring it up with just one CPU on a two CPU
   machine ?

2. How do I trace back funcname+offset to a particular line of C code ?
   I tried objdump -d and gcc -S, but it's not easy to read. I thought
   there was a way to get gcc to interleave the C code and the generated
   assembly. 

I have a suspicion that in kern_mutex.c:510, 

if (td1->td_priority < td->td_priority)

there may be circumstances in which td1 could be pointing to memory that
has been freed. I've got a bunch of panics which result in kernel mode
page faults at 0xdead.

Thanks!

-Arun


Re: 0xdeadxxxx ?

2002-06-10 Thread Arun Sharma


On Sun, Jun 09, 2002 at 11:40:09PM -0700, Terry Lambert wrote:
> 0xdeadc162 - 0xdeadc0de = 0x0084 = 132 decimal
> 
> Look for a short value that's getting set to 132.

As I said in another email, I think this is td1->td_priority in 
kern_mutex.c:510.
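
The subtraction in the quoted reply can be checked mechanically. In this
sketch `FREED_FILL` and `offset_from_fill` are hypothetical names; 0xdeadc0de
is the value used in the quoted arithmetic, apparently a fill pattern.

```c
#include <stdint.h>

#define FREED_FILL 0xdeadc0deu   /* value taken from the quoted reply */

/* Distance of a faulting address from the fill pattern. */
uint32_t offset_from_fill(uint32_t fault_addr)
{
    return fault_addr - FREED_FILL;
}
```

For the fault address in this thread, offset_from_fill(0xdeadc162) yields
0x84, i.e. 132 decimal, matching the figure above.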

-Arun


Re: Kernel hacking questions

2002-06-12 Thread Arun Sharma


On Tue, Jun 11, 2002 at 04:36:47AM -0400, John Baldwin wrote:
> > 2. How do I trace back funcname+offset to a particular line of C code ?
> >    I tried objdump -d and gcc -S, but it's not easy to read. I thought
> >    there was a way to get gcc to interleave the C code and the generated
> >    assembly.
> 
> gdb's 'l *foo+0x34' works wonders. :)  If you are stuck with a kernel.debug
> on current that gdb doesn't grok, you can use nm to extract the address of
> the function, add the offset, and use 'addr2line -e kernel.debug 0xc0yy'.

I was looking for

objdump -S foo.o

-Arun


syscall overhead in -current

2002-12-14 Thread Arun Sharma

It seems to me that userret() in 5.0-current is adding quite a bit
of overhead to the syscall latency in FreeBSD. Has anyone done any
measurements of syscall latency for 4.x vs 5.x on identical hardware ?
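
A crude way to get such a number is to time a loop around a cheap system
call. This is only a sketch: `syscall_ns` is a hypothetical name, getppid()
is chosen on the assumption that it is not cached in userland, and the
result includes loop overhead.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>
#include <unistd.h>

/* Average wall-clock cost, in nanoseconds, of one getppid() call. */
double syscall_ns(long iterations)
{
    struct timespec t0, t1;
    double ns;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < iterations; i++)
        (void)getppid();
    clock_gettime(CLOCK_MONOTONIC, &t1);

    ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9 +
         (double)(t1.tv_nsec - t0.tv_nsec);
    return ns / (double)iterations;
}
```

Running the same binary under 4.x and 5.x on the same machine, with a large
iteration count, would give a first-order comparison of the entry and exit
path.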

-Arun


Re: implementing poll() in a device driver (fwd)

1999-07-15 Thread Arun Sharma

Vasudha Ramnath <[EMAIL PROTECTED]> writes:

> 
> I'm running FreeBSD 3.1-RELEASE.
> 
> Could someone explain what the poll() function in a device driver should
> do ?
> 
> Can it return POLLERR or POLLHUP ?
> 
> I have a test driver that returns these values from the poll() function.
> However, the application
> that called the select() is not getting an error. Instead, the select
> is returning that the particular file descriptor is, in this case, 
> 'readable' !

Take a look at "selscan" algorithm in /usr/src/sys/kern/sys_generic.c
if you wish to learn more.

Basically, if your driver doesn't implement the poll() functionality,
it can always return 0. This will ensure that select never wakes up
because of a file descriptor associated with your driver.

-Arun


Re: Results of investigating optimizing calloc()...

1999-08-04 Thread Arun Sharma


On Wed, Aug 04, 1999 at 01:20:59PM +0200, Dag-Erling Smorgrav wrote:
> "Kelly Yancey" <[EMAIL PROTECTED]> writes:
> > [...]
> 
> Which reminds me - has anyone thought of using DMA for zeroing pages,
> to avoid cache invalidation? The idea is to keep a chunk of zeroes on
> disk and DMA it into memory instead of clearing pages "manually". This
> assumes your disk supports DMA, of course.

On a Pentium III, you can use the new SSE non-temporal store instructions
to zero pages without allocating cache lines.
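
A userland sketch of the idea using the SSE streaming-store intrinsic, which
compiles to MOVNTPS. `zero_page_nt` is a hypothetical name; the buffer is
assumed to be 16-byte aligned with a length that is a multiple of 16, and
builds without SSE fall back to memset().

```c
#include <stddef.h>
#include <string.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Zero a 16-byte-aligned region whose length is a multiple of 16 bytes. */
void zero_page_nt(void *page, size_t len)
{
#if defined(__SSE__)
    float *p = (float *)page;
    const __m128 zero = _mm_setzero_ps();
    size_t nfloats = len / sizeof(float);

    for (size_t i = 0; i + 4 <= nfloats; i += 4)
        _mm_stream_ps(p + i, zero);   /* non-temporal 16-byte store */
    _mm_sfence();                     /* order the streaming stores */
#else
    memset(page, 0, len);             /* fallback when SSE is unavailable */
#endif
}
```

Whether this beats an ordinary store loop depends on whether the zeroed page
is touched again soon, since the data is deliberately kept out of the cache.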

-Arun




Excessive assembly code ?

1999-08-05 Thread Arun Sharma


Taking a quick look at /usr/src/sys/i386:

find . -name '*.s' | xargs wc -l
  44 ./svr4/svr4_locore.s
 216 ./apm/apm_setup.s
  24 ./linux/linux_locore.s
 461 ./isa/apic_ipl.s
1057 ./isa/apic_vector.s
 168 ./isa/icu_ipl.s
 224 ./isa/icu_vector.s
 387 ./isa/ipl.s
 113 ./isa/vector.s
  59 ./i386/bioscall.s
 340 ./i386/exception.s
 192 ./i386/globals.s
1000 ./i386/locore.s
 319 ./i386/mpboot.s
 555 ./i386/mplock.s
 310 ./i386/simplelock.s
1636 ./i386/support.s
 833 ./i386/swtch.s
 190 ./i386/vm86bios.s
8128 total  

I wonder if so much assembly code is really necessary for FreeBSD. One
argument for minimal usage of assembly code is that it is easier to code
non-trivial algorithms in C.

One such example is the scheduler. Since the decision about which process
is going to run next is made in assembly code, it is restricted to a
relatively dumb algorithm of scanning the runqs and picking one. If the
mechanism (i.e. nuts and bolts of the context switch) is coded in assembly
and the policy (which process to pick next) is done in C, the code would
be much more maintainable, IMO.

How do people feel about it here ?

-Arun




Usenix 93 paper on hardware profiling of 386BSD

1999-08-06 Thread Arun Sharma


Does anyone have a copy of Andrew McRae's Usenix 93 paper ?

The URL: ftp://ftp.cisco.com/amcrae/hardprof.PS doesn't
seem to be valid any more.

Thanks!

-Arun





Re: mmap bug

1999-08-12 Thread Arun Sharma


On Thu, Aug 12, 1999 at 12:02:19PM +0100, Tony Finch wrote:
> Matthew Dillon <[EMAIL PROTECTED]> wrote:
> >
> >One solution would be to map clean R+W pages RO and force a write fault
> >to occur, allowing the system to recognize that there are too many dirty
> >pages in vm_fault before it is too late and flush some of them.  The
> >downside of this is that, of course, we take unnecessary faults.
> 
> Surely they aren't unnecessary faults if they are required for correctness?

They _are_ unnecessary faults, if other correct solutions exist. 
The second alternative - to mark system daemons as special
sounds much more attractive.

-Arun




Re: mmap bug

1999-08-12 Thread Arun Sharma


On Fri, Aug 13, 1999 at 03:04:43PM +0930, Mark Newton wrote:
> Arun Sharma wrote:
> 
>  > The second alternative - to mark system daemons as special
>  > sounds much more attractive.
> 
> Ok, now define the difference between "system daemons" and any other
> daemon (or, for that matter, any other process).

That's easy. 

$ ps aux | head
USER   PID %CPU %MEM   VSZ   RSS  TT  STAT STARTED      TIME COMMAND
root 23924  5.0 30.2 41312 38716  ??  S    Sat05PM 191:41.92 /usr/X11R6/bin/
root     0  0.0  0.0     0     0  ??  DLs  31Jul99   0:02.30  (swapper)
root     1  0.0  0.2   504   200  ??  ILs  31Jul99   0:00.05 /sbin/init --
root     2  0.0  0.0     0     0  ??  DL   31Jul99   0:03.18  (pagedaemon)
root     3  0.0  0.0     0     0  ??  DL   31Jul99   0:00.00  (vmdaemon)
root     4  0.0  0.0     0     0  ??  DL   31Jul99   0:03.55  (bufdaemon)
root     5  0.0  0.0     0     0  ??  DL   31Jul99  12:06.17  (syncer)

The daemons which are involved in freeing up pages during low memory
conditions qualify as system daemons. Making sure that these daemons
don't block avoids the deadlock.

-Arun




Re: cache-friendly scheduling for SMP

1999-09-16 Thread Arun Sharma

On Thu, Sep 16, 1999 at 12:25:52PM +, greg wrote:

> Can anybody point me to a paper, mailing list discussion, etc. that discusses 
> scheduling processes to not thrash the cpu caches?  Or if there's anything in 
> place, how I can take advantage of it, etc.  I got stumped on the idea
> a while ago, so I'm really curious...

In -current, there is already code to do trivial CPU affinity. Basically,
given multiple processes in the same priority queue to choose from, the
scheduler will pick the one that last ran on the same CPU.
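
A simplified sketch of that selection rule follows. It is not the -current scheduler code: struct fake_proc, its fields and pick_next are hypothetical names, and the sketch only looks at a single priority queue.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified process record. */
struct fake_proc {
    int pid;
    int last_cpu;             /* CPU this process last ran on */
    struct fake_proc *next;   /* next process in the same priority queue */
};

/*
 * Pick from one priority queue: prefer the first process that last ran
 * on 'cpu'; otherwise fall back to the head of the queue.
 */
static struct fake_proc *pick_next(struct fake_proc *queue, int cpu)
{
    assert(cpu >= 0);
    for (struct fake_proc *p = queue; p != NULL; p = p->next) {
        if (p->last_cpu == cpu)
            return p;
    }
    return queue;   /* no affinity match, or an empty queue */
}
```

Because the scan never leaves the queue it was given, priority ordering between queues is unaffected; affinity only breaks ties among processes of equal priority.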

-Arun


Re: Huge Binaries..

1999-10-01 Thread Arun Sharma


On Thu, Sep 30, 1999 at 10:57:42PM -0700, Julian Elischer wrote:

> I just installed it.
> the binary is 13234176 bytes long!!
> yes folks, that's 13 MB!

That's an improvement from 4.61!

$ ls -l /usr/local/lib/netscape/communicator-4.61.bin
-r-xr-xr-x  1 root  wheel  13271040 Jun 10 10:00 /usr/local/lib/netscape/communicator-4.61.bin

-Arun




vm_map.h and C++ warnings

1999-10-12 Thread Arun Sharma


The following patch fixes it.

-Arun

# diff -u vm_map.h- vm_map.h
--- vm_map.h-   Tue Oct 12 22:52:10 1999
+++ vm_map.h    Tue Oct 12 22:54:58 1999
@@ -229,7 +229,7 @@
 #if defined(MAP_LOCK_DIAGNOSTIC)
        printf("locking map LK_EXCLUPGRADE: 0x%x\n", map);
 #endif
-       error = lockmgr(&map->lock, LK_EXCLUPGRADE, (void *)0, p);
+       error = lockmgr(&map->lock, LK_EXCLUPGRADE, (struct simplelock *)0, p);
        if (error == 0)
                map->timestamp++;
        return error;




kstat - an API for gathering kernel stats

1999-11-03 Thread Arun Sharma


I wrote kstat as a way to improve on the current BSD method of getting
kernel statistics, which involves looking up a particular kernel symbol
name and then getting the value from the symbol offset. This makes any
performance monitoring tool or application that gets kernel stats
non-portable across kernel versions if, for some reason, the names
of these variables happen to change.

kstat derives some ideas from the Solaris kstat API, but is much simpler.
It adds a new system call to the kernel. Any kernel module that wants to
register a counter calls kstat_register, which makes an entry in a
hash table that maps the counter name to the address of the counter.

A user program makes a system call with a counter name such as "cpu.system"
to get the current value of user/system/nice time etc.
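
The name-to-address table described above can be sketched in user space as follows. This is an illustration under assumptions, not the code in kstat.tar.gz: kstat_register_sketch and kstat_read_sketch are hypothetical names with guessed signatures, the table size is arbitrary, and the hash function (djb2) is a stand-in because the original's is not stated.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define KSTAT_BUCKETS 64   /* arbitrary table size for the sketch */

struct kstat_entry {
    const char *name;           /* counter name, e.g. "cpu.system" */
    const long *addr;           /* address of the live counter */
    struct kstat_entry *next;   /* collision chain */
};

static struct kstat_entry *kstat_table[KSTAT_BUCKETS];

/* Simple string hash (djb2), reduced to a bucket index. */
static unsigned long kstat_hash(const char *s)
{
    unsigned long h = 5381;

    while (*s != '\0')
        h = h * 33 + (unsigned char)*s++;
    return h % KSTAT_BUCKETS;
}

/* Link a caller-owned entry into the table under its name. */
static void kstat_register_sketch(struct kstat_entry *e, const char *name,
                                  const long *addr)
{
    unsigned long b;

    assert(e != NULL && name != NULL && addr != NULL);
    b = kstat_hash(name);
    e->name = name;
    e->addr = addr;
    e->next = kstat_table[b];
    kstat_table[b] = e;
}

/* Return 0 and store the current value, or -1 if the name is unknown. */
static int kstat_read_sketch(const char *name, long *value)
{
    assert(name != NULL && value != NULL);
    for (struct kstat_entry *e = kstat_table[kstat_hash(name)];
         e != NULL; e = e->next) {
        if (strcmp(e->name, name) == 0) {
            *value = *e->addr;   /* read the counter through its address */
            return 0;
        }
    }
    return -1;
}
```

Storing the counter's address rather than its value means a read always sees the live counter, and the reader depends only on the registered name, not on a kernel symbol.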

A kernel module and a sample application can be downloaded from:

http://members.home.net/adsharma/kstat.tar.gz

Each system call currently costs a hash table lookup. A tool that
repeatedly reads the same counter may want to avoid that lookup every
time. I have some ideas on how to make that happen.

Comments and suggestions are welcome.

-Arun




Re: kstat - an API for gathering kernel stats

1999-01-02 Thread Arun Sharma

On Thu, Nov 04, 1999 at 02:53:51AM -0500, Matthew N. Dodd wrote:
> On Wed, 3 Nov 1999, Arun Sharma wrote:
> > A user program makes a system call with this string "cpu.system" to get
> > the current value of user/system/nice time etc.
> 
> How is this different from doing:
> 
> # sysctl -a | grep load
> vm.loadavg: { 0.15 0.09 0.04 }
> 
> Ideally we could have a syscall that could return the OID for a given name
> to solve the portability and speed issues associated with doing repeated
> lookups.
> 
> Seems like you've reinvented the wheel to me.

I just looked at the sysctl implementation and there are some differences.
Moreover, since it was not being used in tools like vmstat and xosview,
I thought there must be a reason.

sysctl also seems to assume that it doesn't get called frequently. So
mapping the name to the sysctl data is a slightly more heavy duty
operation than a hash table lookup.
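
For comparison, the resolve-once pattern suggested in the quoted text (turn the name into a numeric OID a single time, then reuse it) can be sketched as below. This assumes a current FreeBSD libc, where sysctlnametomib(3) and sysctl(3) are available; struct cached_oid, oid_resolve and oid_read are hypothetical helper names, and on other systems the sketch simply reports failure.

```c
#include <assert.h>
#include <stddef.h>

#ifdef __FreeBSD__
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

#define MIB_MAX 8   /* plenty for a name like "vm.loadavg" */

struct cached_oid {
    int mib[MIB_MAX];
    size_t len;       /* number of valid components in mib[] */
};

/* Resolve a dotted name once. Returns 0 on success, -1 on failure. */
static int oid_resolve(struct cached_oid *c, const char *name)
{
    assert(c != NULL && name != NULL);
#ifdef __FreeBSD__
    c->len = MIB_MAX;
    return sysctlnametomib(name, c->mib, &c->len);
#else
    c->len = 0;
    return -1;   /* this sketch only supports FreeBSD */
#endif
}

/* Fetch the value through the cached numeric OID; no name lookup here. */
static int oid_read(const struct cached_oid *c, void *buf, size_t *buflen)
{
    assert(c != NULL && buf != NULL && buflen != NULL);
#ifdef __FreeBSD__
    return sysctl(c->mib, (unsigned int)c->len, buf, buflen, NULL, 0);
#else
    return -1;
#endif
}
```

A monitoring tool would call oid_resolve once at startup and oid_read on every sample, so the cost of the name lookup is paid only once.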

-Arun


Re: kstat - an API for gathering kernel stats

1999-01-02 Thread Arun Sharma

On Thu, Nov 04, 1999 at 12:52:50PM -0500, Matthew N. Dodd wrote:
> On Thu, 4 Nov 1999, Arun Sharma wrote:
> > I just looked at the sysctl implementation and there are some differences.
> > Moreover, since it was not being used in tools like vmstat and xosview,
> > I thought there must be a reason.
> > 
> > sysctl also seems to assume that it doesn't get called frequently. So
> > mapping the name to the sysctl data is a slightly more heavy duty
> > operation than a hash table lookup.
> 
> Wouldn't hashing the sysctl OIDs be the way to go then?
> 
> Why invent another namespace?
> 

Please see the attached mail. Yes - I didn't look closely at sysctl, before
I started working on kstat. My argument is that we need different interfaces
for kernel tuning (which is what sysctl seems to be good at) and kernel
performance statistics collection.

The former activity is more heavy weight than the latter.

-Arun

On Thu, Nov 04, 1999 at 08:26:14AM -0700, Ronald G. Minnich wrote:
> quick question: is this better than sysctl, and if so why? I worry about
> adding new system calls.

To be honest, I didn't look closely at sysctl before I started writing
kstat, which was primarily meant to be an exercise in learning BSD. 
A few differences:

(a) sysctl seems to be more appropriate for kernel tuning than performance
monitoring. Even if it can do performance monitoring as efficiently
as kstat, I think it makes sense to keep the kernel tuning and performance
monitoring interfaces separate.

(b) kstat uses a hash table lookup to map the counter name to the counter
value, which should make it a little bit faster.

(c) kstat allows multiple instances of the same counter (e.g. cpu.nice for
CPU1 and CPU2).

-Arun

Re: kstat - an API for gathering kernel stats

1999-01-02 Thread Arun Sharma

On Thu, Nov 04, 1999 at 06:30:01PM -0800, Mike Smith wrote:
> Sysctl is faster than kstat once you have performed the name->oid 
> lookup.  There is basically nothing that kstat can do that sysctl can't 
> do better and faster, apart from lookup-by-name.

Can a loadable module, say a network driver register variables with
sysctl ? Can sysctl itself be made a loadable module ? As for the speed,
I don't think it is an issue - I can add another interface for getting
a kstatid and make it fast. 

I'm not really saying that kstat is better than sysctl. In fact, it
was an oversight on my part not to look closely at sysctl. My goal
was to get some tools - specifically ktop and xosview to work on
FreeBSD. So I don't particularly care how we get there - if it means
adding a few more variables to the sysctl MIB, so be it.

Now, if I make those changes and submit a patch, will it be considered 
for inclusion ?

-Arun


Re: kstat - an API for gathering kernel stats

1999-01-02 Thread Arun Sharma

On Thu, Nov 04, 1999 at 09:31:02PM -0600, Chris Costello wrote:
> On Thu, Nov 04, 1999, Arun Sharma wrote:
> > Can a loadable module, say a network driver register variables with
> > sysctl ? Can sysctl itself be made a loadable module ? As for the speed,
> 
> a.) Yes.

I don't see any examples in sys/modules. The SYSCTL_INT macro eventually
expands to DATA_SET, which puts certain data in a different ELF section.

In other words, sysctl seems to be relying on physical adjacency of 
certain structures after linkage is done.

-Arun


Fwd: Re: kstat - an API for gathering kernel stats

1999-11-28 Thread Arun Sharma

[ For some reason, this post through muc.lists.freebsd.hackers gateway didn't
  show up on the mailing list. Forwarding it to the mailing list.. ]

On Thu, 04 Nov 1999 20:38:50 -0800, Mike Smith <[EMAIL PROTECTED]> wrote:
> > I don't see any examples in sys/modules. The SYSCTL_INT macros eventually
> > expands to DATA_SET which puts certain data in a different ELF section.
> 
> You don't do anything magic at all; it's handled invisibly by the 
> kernel linker.

I was thinking about implementing SMP cpu stats using sysctl today and
I have a question - can I create sysctl nodes dynamically ?

i.e.

for (cpu = 0; cpu < get_num_cpus(); cpu++) {
        /* create sysctl node here ? */
}

Also, one simple solution to maintaining per cpu stats is to put the whole
thing in struct globaldata. All existing code remains unchanged and 
automagically updates the per cpu stats. I may need to add some additional
variables, which reflect system wide data. Now, if I put stuff in globaldata
and try to export it using sysctl, things get a little more complex.

One solution to the above problem is to use SMPpt relative addresses in
the sysctl declarations. But given that the number of CPUs is known only
at runtime, we come back to the first question in this mail.

-Arun


Per CPU timekeeping for SMP

1999-12-05 Thread Arun Sharma


Here's a reimplementation of my earlier per cpu time keeping patch
on SMP.  The attached patch is against a 11/20/99 -current that I
cvsup'ed.

1. On UP, 

sys_time is a global and contains the system wide stats
cpu_time is a global and is essentially the same as sys_time.

2. On SMP

sys_time contains the system wide stats
cpu_time has been changed to a pointer in the per-cpu space.
On BSP, this pointer points to a static array cpu0_cpu_time
On APs, this space is kmem_alloc'ed

Perhaps I should wrap cpu_time in a structure (cpu_info ?), which 
could be the right place to store all per CPU info.

3. I've taken the liberty of changing CP_* to CPU_*. I hope the new names
   better convey the meaning of the variables and are acceptable.

4. I've gotten sysctls working for sys_time -

$ sysctl -A | grep kern.stats
kern.stats.systime.user: 25150
kern.stats.systime.nice: 3878
kern.stats.systime.sys: 14071
kern.stats.systime.intr: 7395
kern.stats.systime.idle: 5326029

I'm working on generating the per cpu sysctls. 

5. The machine specific code for Alpha will need some changes - which I
   can implement, but have no way of compiling or testing.

6. All the existing utilities which depended on peeking at cp_time will
   break (which is a good thing, IMO - so that I can fix them. :-) They 
   will all be converted to use sysctl, as time permits. 

Now, about the release schedule for this work - am I too late for the
12/15 feature freeze ? I'd appreciate some comments on the implementation,
so that if there are any issues, I can fix them before 12/15.

-Arun


Index: i386/i386/genassym.c
===
RCS file: /home/adsharma/cvs_root/freebsd-sys/i386/i386/genassym.c,v
retrieving revision 1.1.1.4
diff -u -r1.1.1.4 genassym.c
--- genassym.c  1999/11/20 23:46:06 1.1.1.4
+++ genassym.c  1999/12/05 19:45:42
@@ -205,6 +205,7 @@
printf("#define\tGD_PRV_PADDR1 %#x\n", OS(globaldata, gd_prv_PADDR1));
printf("#define\tPS_IDLESTACK %#x\n", OS(privatespace, idlestack));
printf("#define\tPS_IDLESTACK_TOP %#x\n", sizeof(struct privatespace));
+   printf("#define\tGD_CPU_TIME %#x\n", OS(globaldata, gd_cpu_time));
 #endif
 
printf("#define\tKCSEL %#x\n", GSEL(GCODE_SEL, SEL_KPL));
Index: i386/i386/globals.s
===
RCS file: /home/adsharma/cvs_root/freebsd-sys/i386/i386/globals.s,v
retrieving revision 1.1.1.2
diff -u -r1.1.1.2 globals.s
--- globals.s   1999/08/31 05:12:09 1.1.1.2
+++ globals.s   1999/12/05 19:46:11
@@ -79,6 +79,7 @@
        .set    gd_currentldt,globaldata + GD_CURRENTLDT
 #endif
 
+
 #ifndef SMP
.globl  _curproc, _curpcb, _npxproc
.globl  _common_tss, _switchtime, _switchticks
@@ -122,6 +123,9 @@
        .set    gd_prv_CADDR2,globaldata + GD_PRV_CADDR2
        .set    gd_prv_CADDR3,globaldata + GD_PRV_CADDR3
        .set    gd_prv_PADDR1,globaldata + GD_PRV_PADDR1
+
+       .globl  gd_cpu_time
+       .set    gd_cpu_time,globaldata + GD_CPU_TIME
 #endif
 
 #if defined(SMP) || defined(APIC_IO)
Index: i386/i386/machdep.c
===
RCS file: /home/adsharma/cvs_root/freebsd-sys/i386/i386/machdep.c,v
retrieving revision 1.1.1.4
diff -u -r1.1.1.4 machdep.c
--- machdep.c   1999/11/20 23:46:07 1.1.1.4
+++ machdep.c   1999/12/05 21:59:13
@@ -114,6 +114,7 @@
 #ifdef SMP
 #include 
 #include 
+#include /* For cpu_time */
 #endif
 #ifdef PERFMON
 #include 
@@ -143,6 +144,10 @@
 
 static MALLOC_DEFINE(M_MBUF, "mbuf", "mbuf");
 
+#ifdef SMP
+static cpu0_cpu_time[NCPUSTATES];
+#endif
+
 int     _udatasel, _ucodesel;
 u_int  atdevbase;
 
@@ -1964,6 +1969,11 @@
proc0.p_addr->u_pcb.pcb_mpnest = 1;
 #endif
proc0.p_addr->u_pcb.pcb_ext = 0;
+
+#ifdef SMP
+   /* Setup cpu0's cpu_time */
+   cpu_time = &cpu0_cpu_time;
+#endif
 }
 
 #if defined(I586_CPU) && !defined(NO_F00F_HACK)
Index: i386/i386/mp_machdep.c
===
RCS file: /home/adsharma/cvs_root/freebsd-sys/i386/i386/mp_machdep.c,v
retrieving revision 1.1.1.4
diff -u -r1.1.1.4 mp_machdep.c
--- mp_machdep.c        1999/11/20 23:46:07 1.1.1.4
+++ mp_machdep.c        1999/12/05 19:48:29
@@ -243,6 +243,11 @@
 /** XXX FIXME: what system files declare these??? */
 extern struct region_descriptor r_gdt, r_idt;
 
+extern long sys_time[NCPUSTATES];
+#ifndef SMP
+extern long cpu_time[NCPUSTATES];
+#endif
+
 int     bsp_apic_ready = 0;     /* flags useability of BSP apic */
 int     mp_ncpus;               /* # of CPUs, including BSP */
 int     mp_naps;                /* # of Applications processors */
@@ -1798,6 +1803,9 @@
SMPpt[pg + 3] = 0;  /* *prv_CMAP3 */
SMPpt[pg + 4] = 0;  /* *prv_PMAP1 */
 
+   /* space for

Re: Fwd: Re: kstat - an API for gathering kernel stats

1999-12-08 Thread Arun Sharma

On Mon, Nov 29, 1999 at 10:09:35AM +0100, Andrzej Bialecki wrote:
> > I was thinking about implementing SMP cpu stats using sysctl today and
> > I have a question - can I create sysctl nodes dynamically ?
> > 
> > i.e.
> > 
> > for (cpu = 0; cpu < get_num_cpus(); cpu++) {
> > /* create sysctl node here ? */
> > }
> 
> Yes. See for example linux emulator or my SPY module:
> 
>   http://www.freebsd.org/~abial/spy
> 
> You can also create whole new branches, as the second example shows.

Thanks - that was useful. However, I noticed that only the leaves
(SYSCTL_INT/LONG/STRING etc.) can be dynamically created, while nodes
can't be. Am I correct ?

I'm interested in doing something like:

kern.stats.cpu0.idle
kern.stats.cpu0.nice
...
kern.stats.cpu1.idle
kern.stats.cpu1.nice
...

and I want the nodes cpu0, cpu1 etc dynamically created. 

But that's no big deal. I'll define 4 cpus for now and zero the values for
non-existent cpus.

-Arun


Re: Fwd: Re: kstat - an API for gathering kernel stats

1999-12-08 Thread Arun Sharma


On Wed, Dec 08, 1999 at 05:44:31PM +0100, Andrzej Bialecki wrote:
> On Wed, 8 Dec 1999, Arun Sharma wrote:
> Erhm.. No.
> 
> Look closer at the SPY module. I create the whole branch from the root
> level. In the standard system there is no such thing as "kld" node,
> neither there is a "spy" node. I created both of them. Only then I created
> a bunch of leaves (of course, nothing stops you from creating some more
> leaves on each intermediate level, if you need them).

Given a number N, whose value is determined at run time, could you have
created N kld nodes ?

-Arun




Re: Per CPU timekeeping for SMP

1999-12-17 Thread Arun Sharma

Arun Sharma wrote:
> 
> 
> Here's a reimplementation of my earlier per cpu time keeping patch
> on SMP.  The attached patch is against a 11/20/99 -current that I
> cvsup'ed.

Did anyone get a chance to review this ? Is everyone busy, or is sending
patches to -hackers frowned upon ? Or is this something that
people aren't so excited about ?

> 4. I've gotten sysctls working for sys_time -
> 
> $ sysctl -A | grep kern.stats
> kern.stats.systime.user: 25150
> kern.stats.systime.nice: 3878
> kern.stats.systime.sys: 14071
> kern.stats.systime.intr: 7395
> kern.stats.systime.idle: 5326029
> 
> I'm working on generating the per cpu sysctls. 

I've completed this work now. Here's the output of the new sysctl on
my dual cpu box:

$ sysctl -A | grep kern.stats
kern.stats.systime.user: 13710
kern.stats.systime.nice: 552
kern.stats.systime.sys: 4296
kern.stats.systime.intr: 2602
kern.stats.systime.idle: 1878764
kern.stats.cpu.user.0: 7082
kern.stats.cpu.user.1: 7169
kern.stats.cpu.nice.0: 9
kern.stats.cpu.nice.1: 2
kern.stats.cpu.sys.0: 2120
kern.stats.cpu.sys.1: 2177
kern.stats.cpu.intr.0: 1309
kern.stats.cpu.intr.1: 1293
kern.stats.cpu.idle.0: 939407
kern.stats.cpu.idle.1: 939358

I have also figured out how to dynamically register sysctl nodes.
The trick is basically to malloc a sysctl_oid, fill in the right
fields and call sysctl_register_oid. The code is in a kernel
module available from:

http://sharmas.dhs.org/~adsharma/projects/freebsd/sysctl.tar.gz

It really needs to go into the base kernel. Also, I think
sysctl_register_long and its yet to be written friends (register_int
etc.) need to go into kern_sysctl - so that others can reuse the code
to dynamically create sysctl nodes.

I really don't want to spend my time on getting xosview and ktop to
use these patches until I'm convinced that this code is going into
the kernel.

Again, the patches may not be perfect. But if people can review them,
I'll fix any issues. Also, Alpha guys need to make minor changes
to keep things working.

-Arun


Re: Accessing user data from kernel

2000-01-19 Thread Arun Sharma

In muc.lists.freebsd.hackers, you wrote:
> 
> When the kernel wants to access any user data, it either copies them into
> the kernel or maps them into kernel address space.  Can anyone tell me the
> reasons why this is done?  When a process enters the kernel mode, the
> page tables are not changed. 
> 
> I have taken this for granted for a long time without knowing the reasons.

1. The kernel may be entered asynchronously - from interrupts and traps.
   You can't always be sure of which is the currently running user process.

2. For cases where you've entered the kernel synchronously - through syscalls
   for example, you need to check for the validity of data. You could 
   potentially skip the step and validate the data where it is used, rather
   than doing it upfront - but that may mean too many checks. It's just
   cleaner to copyin/copyout once at entry/exit, rather than repeating the
   code all over the place.

-Arun


Re: Accessing user data from kernel

2000-01-20 Thread Arun Sharma


On Thu, Jan 20, 2000 at 10:04:16AM -0500, Zhihui Zhang wrote:
> Point 2 seems to be saying that we would rather sacrifice some performance
> to gain a cleaner interface (people are talking about eliminating kernel
> copying for a long time). Consider the physical I/O on a raw device, where
> we map the user data again in the KVA without copying the data. Why do we
> do this double mapping, when we can access the user data directly?
> 

Direct I/O to user space should be treated as an optimization. Such I/O
requires wiring down all the user pages before I/O can happen. Hence
it requires special privileges.

Why does it get mapped to KVA ? Because of point 1.

-Arun



Re: Finding percent idle

2000-02-26 Thread Arun Sharma

On Fri, 25 Feb 2000 14:25:46 -0500, James Housley <[EMAIL PROTECTED]> wrote:
> I am trying to find out the current % idle of the machine from within a
> program.  I have looked at the values provided by sysctl and found
> loadavg but not system idle.  I have also looked through the source for
> top and haven't been able to figure that out.  All pointers would be
> appreciated.

As another poster pointed out, all of the FreeBSD programs (top, vmstat,
xosview, ktop) get this stuff from kvm - which is a non-portable (across
different versions of FreeBSD) interface.  FreeBSD also doesn't keep
these numbers on a per CPU basis on an SMP box.

I wrote a patch for fixing the SMP case and a KLD to get them via
sysctl. With slight modifications to the KLD, you can get those values
exported via sysctl.

The KLD is available at:

http://sharmas.dhs.org/~adsharma/projects/freebsd/sysctl.tar.gz

-Arun


Re: 64bit OS?

2000-02-26 Thread Arun Sharma

Arun Sharma wrote:
> Matt Dillon wrote:
> > What I would truly love to do would be to get away with not using a GPT
> > at all and instead doing a vm_map_lookup_entry()/vm_page_lookup()
> > (essentially taking a vm_fault), then optimize the vm_map_entry 
> > structural hierarchy to look more like a GPT rather than the linear 
> > list it currently is.  When coupled with an STLB, especially one that 
> > can be optimized, I think performance would be extremely good.
> 
> For finding the vm_map_entry for a virtual address, a balanced binary tree 
> works better. Linux does well here - it uses AVL trees, which find the
> right vm_map_entry in O(log n) time.

I just did some investigation into seeing if this (balanced binary trees)
is a useful optimization. It doesn't look like one. 

I instrumented the kernel and collected some stats. On booting the kernel
into KDE and running xemacs and netscape, I got:

kern.vm_map_nsteps: 151916
kern.vm_map_nlookups: 65441

i.e. on average between 2 and 3 vm_map_entries (151916 / 65441, about 2.3)
were walked before getting to the right one.

Then I did a make clean all; in /usr/src/sys and at the end of the compilation,
I got:

kern.vm_map_nsteps: 666258
kern.vm_map_nlookups: 628911

This time the hints seem to have worked extremely well (about 1.06 steps
per lookup) and there is almost no overhead involved.

These numbers would be valid even for 64 bit architectures. However, with
apps which use a large number of shared libraries or loadable
modules (Mozilla with XPCOM, KDE with KOM/DCOP), things can be slightly
different.

For now, I think we're just fine with linear linked lists with a hint.
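
To make the measurement concrete, here is a user-space sketch of a sorted linked list searched from a hint, with the same two counters. It is a simplified model, not the vm_map.c code: struct entry, struct map and lookup are hypothetical names, and the rule for when to start from the hint is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for vm_map_entry and vm_map; not the kernel structs. */
struct entry {
    unsigned long start, end;   /* covers [start, end) */
    struct entry *next;
};

struct map {
    struct entry *head;   /* entries sorted by start address */
    struct entry *hint;   /* entry found by the previous lookup */
    unsigned long nlookups, nsteps;
};

/* Return the entry containing addr, or NULL; counts list steps taken. */
static struct entry *lookup(struct map *m, unsigned long addr)
{
    struct entry *cur;

    assert(m != NULL);
    m->nlookups++;

    /* Start from the hint when addr lies at or beyond it, else from the head. */
    cur = (m->hint != NULL && addr >= m->hint->start) ? m->hint : m->head;

    while (cur != NULL && addr >= cur->end) {
        cur = cur->next;
        m->nsteps++;
    }
    if (cur == NULL || addr < cur->start)
        return NULL;   /* addr falls in a hole */
    m->hint = cur;
    return cur;
}
```

Dividing nsteps by nlookups gives the average number of list steps per lookup; repeated lookups near the previous address hit the hint and add no steps at all.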

-Arun

*** vm_map.c-   Sat Feb 26 12:01:59 2000
--- vm_map.c    Sat Feb 26 12:13:46 2000
***
*** 75,80 
--- 75,83 
  #include 
  #include 
  #include 
+ #include 
+ #include 
+ #include 

  #include 
  #include 
***
*** 331,336 
--- 334,349 
  #define   SAVE_HINT(map,value) \
(map)->hint = (value);

+ /* Some counters for tracking the overhead of servicing page faults */
+ static unsigned long nsteps = 0;
+ static unsigned long nlookups = 0;
+ 
+ SYSCTL_LONG(_kern, OID_AUTO, vm_map_nsteps, CTLFLAG_RW, 
+   &nsteps, "");
+ 
+ SYSCTL_LONG(_kern, OID_AUTO, vm_map_nlookups, CTLFLAG_RW, 
+   &nlookups, "");
+ 
  /*
   *vm_map_lookup_entry:[ internal use only ]
   *
***
*** 350,355 
--- 363,370 
vm_map_entry_t cur;
vm_map_entry_t last;

+   nlookups++;
+ 
/*
 * Start looking either from the head of the list, or from the hint.
 */
***
*** 401,406 
--- 416,422 
break;
}
cur = cur->next;
+   nsteps++;
}
*entry = cur->prev;
SAVE_HINT(map, *entry);


Re: 64bit OS?

2000-02-18 Thread Arun Sharma

[ My apologies if this is a repeat - my earlier mail didn't seem to make it ]

On Fri, 18 Feb 2000 12:03:37 +1100, Patryk Zadarnowski <[EMAIL PROTECTED]> wrote:

> On the other hand, IA-64 is a very exotic architecture from the OS's
> point of view, and anyone planning to port *BSD to it should probably
> start planning ASAP.

I'm a former Intel employee and I have worked on the Linux IA-64 project.
I think there is plenty of planning to do to get an OS running on
IA-64, which is more complex than most other architectures I've known.

First of all - there is plenty of reading to do:

http://developer.intel.com/design/ia-64/manuals/index.htm
http://devresource.hp.com/devresource/Docs/Refs/IA64ISA/index.html

Some of the design decisions to make:

(a) Programming model - LP64 would probably be the most sensible
(b) Page table architecture

IA-64 supports both the long and short format VHPT (virtual hash page
table). Linux chose to use the short format - which really uses no
hashing. 

Linux has the concept of machine independent multi level page tables and
has generic algorithms which manipulate them in machine independent code.
Where possible, it tries to map them to hardware dependent page tables.
On architectures like IA-64 and Power PC, this becomes a little awkward
and Linux essentially treats hardware page tables as TLBs.

The problem with the above approach is duplication of information between
Linux page tables and hardware page tables and inefficient use of memory
for page tables.

I think OSes like FreeBSD which don't have a concept of machine independent
page table are essentially free to do anything in the hat layer and thus 
have more flexibility.

On Linux/IA-64, such duplication is avoided by having a 3 level page
table and overloading the L3 page table with the hardware page table
functionality. In a nutshell, all L3 page tables are mapped in a region
in the *virtual* address space, such that to get the vtop translation
for address P, you can just index into this "linear" virtual page
table. Because the page table is in *virtual* address space, recursive
faults are possible. A significant chunk of the virtual address space
has to be reserved for this sparse, linear page table.
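
The indexing described above can be sketched as follows. This is a simplified model, not Linux/IA-64 code: it assumes 8 KB pages and 8-byte entries, ignores the IA-64 region bits, and the names vpt_entry_addr and vpt_reserved_bytes are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 13u   /* assumed 8 KB pages */
#define PTE_SIZE   8u    /* assumed 8-byte page table entries */

/*
 * Virtual address of the PTE that maps 'va', for a linear table that
 * starts at 'vpt_base' and covers addresses below 2^va_bits.
 */
static uint64_t vpt_entry_addr(uint64_t vpt_base, uint64_t va, unsigned va_bits)
{
    assert(va_bits > PAGE_SHIFT && va_bits < 64);
    assert(va < (UINT64_C(1) << va_bits));
    return vpt_base + (va >> PAGE_SHIFT) * PTE_SIZE;
}

/* Virtual address space the table itself reserves, in bytes. */
static uint64_t vpt_reserved_bytes(unsigned va_bits)
{
    assert(va_bits > PAGE_SHIFT && va_bits < 64);
    return (UINT64_C(1) << (va_bits - PAGE_SHIFT)) * PTE_SIZE;
}
```

Under these assumptions, covering a 2^40-byte range reserves 2^30 bytes of virtual address space for the table, even though only the parts backing mapped pages need physical memory.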

The other alternative is to use a global hash page table and walk the
collision chains in software. The advantage of this scheme is savings
in terms of both physical memory and virtual address space, but a 
heavier dependence on the hardware implemented hashing algorithm's
characteristics.

It isn't really clear which one is superior, but FreeBSD's VM architecture
allows a choice.

(c) Handling the register stack on system call entry and exit

Sparc has shown how frequent register stack flushing can 
offset the good effects of register stacks. IA-64 has some nice
tricks which can be used to avoid the flush.

(d) Restarting of system calls and interactions with the register stack

Linux does this by using a gcc directive which was created for this
purpose. The normal calling conventions allow input registers to be
treated as scratch. But under this directive they will be preserved,
so that system calls can be restarted.

The disadvantage of this approach is compiler specific code (which Linux
has not been averse to in the past) and some register allocation 
inefficiency in the kernel.

The alternate approach is to return ERESTART from the system call,
catch the error in libc and restart the system call with saved args.
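
The libc-side alternative can be sketched like this. Everything here is hypothetical: ERESTART_SKETCH is a placeholder value rather than the kernel's ERESTART, raw_syscall_fn stands in for the real system call entry, and fake_raw exists only to exercise the loop.

```c
#include <assert.h>
#include <stddef.h>

#define ERESTART_SKETCH (-1000)   /* placeholder, not the kernel's ERESTART */

/* Hypothetical raw system call entry: returns a result or ERESTART_SKETCH. */
typedef long (*raw_syscall_fn)(long a0, long a1, long a2);

/*
 * Library-side wrapper: keep private copies of the arguments and
 * reissue the call for as long as the kernel asks for a restart.
 */
static long syscall_with_restart(raw_syscall_fn raw, long a0, long a1, long a2)
{
    long ret;

    assert(raw != NULL);
    do {
        ret = raw(a0, a1, a2);   /* arguments come from our saved copies */
    } while (ret == ERESTART_SKETCH);
    return ret;
}

/* Demo stand-in: asks for a restart twice, then returns the argument sum. */
static int fake_calls;

static long fake_raw(long a0, long a1, long a2)
{
    if (++fake_calls <= 2)
        return ERESTART_SKETCH;
    return a0 + a1 + a2;
}
```

Since the wrapper holds the arguments itself, the kernel is free to treat its input registers as scratch, which is the trade-off against the compiler directive described above.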

Other general notes:

- Writing assembly code is tricky and writing efficient assembly code
  is trickier
- Lots of architectural state to keep track of
- Implementing setjmp and longjmp is tricky, because of the interaction
  with the register stack
- Errata, lack of support can be worked around by looking at Linux sources

I'd love to have technical discussions about the IA-64 architecture
from an OS perspective, if anyone on this list is interested. 

Since last September, I've moved on to a new daytime job, which has
nothing to do with operating systems or kernels. I have a limited amount
of spare time and I'm willing to help out with an IA-64 port, if the 
FreeBSD project decides to pursue it.

-Arun


Re: 64bit OS?

2000-02-18 Thread Arun Sharma


On Fri, Feb 18, 2000 at 04:06:55PM -0800, Matthew Dillon wrote:
> If I understand the hardware hash table method correctly, then
> I think the absolute best choice for FreeBSD is to use that method
> as it will allow us to get rid of the scalability problems we have
> with the pv_entry_t scheme we use for IA32.  The number of pv_entry_t's
> in an IA64 architecture wind up being fixed.  How big can we make the 
> hardware-assisted hash table?

Smaller than 2**64. Minimum is 2**15.

> 
> Also, a hash table scheme is a much better fit for a 64 bit address
> space model, especially with sparse mappings.  The MIPS R4K and later
> all use a hash table scheme and it seems to work well for them.
> 

Madhu Talluri's paper on page tables for 64 bit address spaces claims that
having collision chains is expensive - for 8 bytes of mapping information,
the pointer and tag storage overhead is 16 bytes.
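
The arithmetic behind that figure can be checked with a small sketch. The layout is an assumption made for illustration (64-bit tag, 64-bit chain link, 64-bit mapping word), not a structure taken from the paper, and the names hpt_node and hpt_overhead_bytes are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* One node on a collision chain, assuming 64-bit fields throughout. */
struct hpt_node {
    uint64_t tag;    /* identifies which virtual page this node maps */
    uint64_t next;   /* address of the next node on the chain, 0 if none */
    uint64_t pte;    /* the 8 bytes of actual mapping information */
};

/* Bytes per node that are bookkeeping rather than mapping information. */
static unsigned hpt_overhead_bytes(void)
{
    assert(sizeof(struct hpt_node) >= sizeof(uint64_t));
    return (unsigned)(sizeof(struct hpt_node) - sizeof(uint64_t));
}
```

With this layout each 24-byte node carries 8 bytes of mapping information, so two thirds of the table is tag and pointer storage.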

Though page table space is important, in the age of big memory computers,
I think performance and manageability are more important.

-Arun



Re: 64bit OS?

2000-02-18 Thread Arun Sharma

On Sat, 19 Feb 2000 12:10:14 +1100, Patryk Zadarnowski
<[EMAIL PROTECTED]> wrote:

> 
> Kevin Elphinstone did a PhD thesis on TLB structures for 64 bit address spaces
> and it turns out that hash tables perform quite poorly. I'd suggest GPTs
> instead, or maybe LPCtrie that Chris Szmajda has been working on here at UNSW.
> Both have the advantage of supporting multiple page sizes that IA64 (and
> Alpha) offer, and hence dramatically increasing the TLB coverage over what
> Linux (or any other commercial OS that took a bite at IA64) can achieve.
> Kevin's paper's at:
> ftp://ftp.cse.unsw.edu.au/pub/users/disy/papers/Elphinstone:phd.ps.gz

Thanks for the great pointer. The IA-64 short format corresponds to the
linear virtual arrays (LVAs) described in this paper. The long format is
a conventional hashed page table.

Page 116 on LVAs in the paper talks about the disadvantages of using
the short format:

(a) Increased TLB misses
(b) Memory overhead similar to multilevel page tables

I don't know if clustered page tables can be implemented with the hardware
support present in IA-64. More investigation is needed.

> Maybe that way we can somehow make use of the Itanium's 4GB page size ;

The best thing is the ability to have large pinned TLB entries - they're
called TRs (translation registers) in the manuals. Linux for example
maps all of kernel memory with one huge TR. This also accomplishes the
traditional Linux way of mapping all of physical memory into kernel
virtual.
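
One consequence of mapping all of physical memory through a single pinned
entry is that kernel address translation reduces to adding or subtracting
a constant. A sketch, where DIRECT_MAP_BASE is an assumed illustrative
value and the function names are hypothetical:

```c
#include <stdint.h>

/* Assumed base of the kernel's direct-mapped window (illustrative only). */
#define DIRECT_MAP_BASE UINT64_C(0xe000000000000000)

/* Physical address -> kernel virtual address inside the direct map. */
uint64_t phys_to_virt(uint64_t pa)
{
    return DIRECT_MAP_BASE + pa;
}

/* Kernel virtual address inside the direct map -> physical address. */
uint64_t virt_to_phys(uint64_t va)
{
    return va - DIRECT_MAP_BASE;
}
```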

-Arun


Re: 64bit OS?

2000-02-19 Thread Arun Sharma


On Sun, Feb 20, 2000 at 12:42:14PM +1100, Patryk Zadarnowski wrote:
> One more thing about GPTs (I thought I'll leave that till last. ;)
> Jochen Liedtke holds a German patent on them, although he will
> probably be fairly easily convinced to give FreeBSD rights to use
> them. I'll be happy to ask (if we're interested.)

It looks like the hardware has to implement GPTs and know how to
walk them. How can FreeBSD use them without hardware support ?

-Arun



Re: 64bit OS?

2000-02-19 Thread Arun Sharma

On Sun, Feb 20, 2000 at 01:48:49PM +1100, Patryk Zadarnowski wrote:
> > It looks like the hardware has to implement GPTs and know how to
> > walk them. How can FreeBSD use them without hardware support ?
> 
> No it doesn't. We've got software GPT implementations for both MIPS64 and
> Alpha, and they both perform very well in our somewhat hostile SASOS
> conditions.  I'm not sure why you think that a hardware walk is necessary:

For performance reasons and memory efficiency reasons. My understanding of
your proposal is: use the VHPT as a large in-memory TLB and use a GPT as the
operating system's primary page table.

Doesn't that involve duplication of information in memory, especially if
the hash table is big ?

> the only reason why IA-64 walks VHPT in hardware *at all* is to minimize
> the impact on the pipeline and improve ILP:

I think that's an important reason. A software only TLB miss handler
would be inferior to a VHPT based solution on IA-64, IMO.

-Arun


Re: 64bit OS?

2000-02-20 Thread Arun Sharma


On Sun, Feb 20, 2000 at 04:28:51PM +1100, Patryk Zadarnowski wrote:
> > On Sun, Feb 20, 2000 at 01:48:49PM +1100, Patryk Zadarnowski wrote:
> > > > It looks like the hardware has to implement GPTs and know how to
> > > > walk them. How can FreeBSD use them without hardware support ?
> > > 
> > > No it doesn't. We've got software GPT implementations for both MIPS64 and
> > > Alpha, and they both perform very well in our somewhat hostile SASOS
> > > conditions.  I'm not sure why you think that a hardware walk is necessary:
> > 
> > For performance reasons and memory efficiency reasons. My understanding of 
> 
> We must be careful here. Although you're getting a small immediate performance
> gain by not flushing the pipelines, the performance is killed if the working
> set is larger than the TLB (as it usually is on a moderately-loaded system,
> especially in presence of heavy IPC (eg. UNIX pipes)), in which case a smarter
> data structure will usually increase the TLB coverage.

The TLB (VHPT in the case of IA-64) can be made large to reduce the
misses. Also, in the case of a VHPT miss, the software handler need
not be any more expensive than it would have been in the absence of
the VHPT.

> 
> And don't forget that with VHPT you'll be getting nested TLB faults quite
> frequently in a sparsely-populated page table (think shared libraries).
> 

That's true only for the short format. Not for the long format.

> which has an MMU vaguely resembling that of IA-64.). Besides, doesn't Linux
> duplicate the structure anyway even when it uses a hardware-walked page table?

No. L3 page tables are mapped into the linear page table. So the hardware
walker just walks Linux L3 page tables.

> before.)  Besides, the amount of space saved due to a smarter page table data
> structure more than compensates for the additional memory anyway.

Agree.

> > I think that's an important reason. A software only TLB miss handler
> > would be inferior to a VHPT based solution on IA-64, IMO.
> 
> It's the only justification Rumi Zahir (head of the IA-64 team) gave me when I
> was complaining about it.  (as in: ``why bother? 64 bit page tables are an
> open problem and no other 64 bit platform I know of provides a hardware page
> table walk''. BTW, does anyone know if HP-PA and IBM 64bit PPC implement a
> hardware PT walk?

I can't get the data on IBM's 64 bit Power3. But on 32 bit architectures,
they use a hardware page walker. Researching more, I found someone who
agrees with you about smart software page tables being better than
hardware table walkers.

http://hq.fsmlabs.com/~cort/papers/linuxppc-mm/html/

But I have a hard time believing that processor architects at major companies
are stupid in wasting transistors on hardware table walkers ;)

-Arun



Re: 64bit OS?

2000-02-20 Thread Arun Sharma

Matt Dillon wrote:
> 
> Linux also stores persistent information in their machine independent
> page tables.  They aren't throw-away like FreeBSD's are.  This will give
> us a huge advantage when we do the IA64 port.

I forgot to mention that Linux/IA-64 switches the processor to physical mode
to walk the 3-level page table in the VHPT miss handler, which adds
overhead.

> In general I like the idea of using a VHPT as an STLB (are we having
> fun with terminology yet?).

Yes, Software TLB is a misnomer. Second level TLB is probably better. VHPT
can behave as either STLB or the primary page table of the OS.

> What I would truly love to do would be to get away with not using a GPT
> at all and instead doing a vm_map_lookup_entry()/vm_page_lookup()
> (essentially taking a vm_fault), then optimize the vm_map_entry
> structural hierarchy to look more like a GPT rather than the linear
> list it currently is.  When coupled with an STLB, especially one that 
> can be optimized, I think performance would be extremely good.

For finding the vm_map_entry for a virtual address, a balanced binary tree 
works better. Linux does well here - it uses AVL trees, which find the
right vm_map_entry in O(log n) time.
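
A sketch of the lookup half of that idea: descending a binary search tree
keyed on address ranges. The struct below is a hypothetical simplification,
not FreeBSD's vm_map_entry or Linux's vm_area_struct, and the AVL
rebalancing that keeps the height at O(log n) is omitted.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified map entry covering the range [start, end). */
struct map_entry {
    uintptr_t start;
    uintptr_t end;
    struct map_entry *left;   /* subtree of entries at lower addresses */
    struct map_entry *right;  /* subtree of entries at higher addresses */
};

/*
 * Return the entry containing addr, or NULL if no entry covers it.
 * The walk visits one node per tree level, so the cost is proportional
 * to the tree height.
 */
const struct map_entry *map_lookup(const struct map_entry *root, uintptr_t addr)
{
    while (root != NULL) {
        if (addr < root->start)
            root = root->left;
        else if (addr >= root->end)
            root = root->right;
        else
            return root;
    }
    return NULL;
}
```

A linear list needs a walk over every preceding entry in the worst case,
which is what makes the tree attractive for address spaces with many
mappings.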

-Arun


Re: Getting CPU usage in FreeBSD

2000-03-12 Thread Arun Sharma


> On Linux this is what I do to get this value:  Measure the number of
> scheduled jiffies (hundredths of second), measure elapsed time since last
> measurement, divide.

I ran into the same problem as you - and took the time to implement it.
My patches fix the SMP case and expose the value via sysctl instead of
kvm_read.

See:

http://www.FreeBSD.org/cgi/getmsg.cgi?fetch=169412+180922+/usr/local/www/db/text/1999/freebsd-hackers/19991212.freebsd-hackers

http://www.freebsd.org/cgi/getmsg.cgi?fetch=293002+0+/usr/local/www/db/text/1999/freebsd-hackers/19991226.freebsd-hackers
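
The measurement described in the quoted text (ticks spent running divided
by ticks elapsed between two samples) can be sketched as below. The struct
and function names are hypothetical and this is not the code from the
patches linked above; how the counters are obtained (sysctl, kvm_read, or
/proc) is left out.

```c
#include <stdint.h>

/* One snapshot of cumulative tick counters (hypothetical layout). */
struct cpu_sample {
    uint64_t busy_ticks;   /* ticks spent running, cumulative */
    uint64_t total_ticks;  /* all ticks elapsed, cumulative */
};

/*
 * CPU usage, in percent, over the interval between two snapshots.
 * Returns 0.0 if no ticks elapsed, to avoid dividing by zero.
 */
double cpu_usage_percent(struct cpu_sample prev, struct cpu_sample cur)
{
    uint64_t busy = cur.busy_ticks - prev.busy_ticks;
    uint64_t total = cur.total_ticks - prev.total_ticks;

    if (total == 0)
        return 0.0;
    return 100.0 * (double)busy / (double)total;
}
```

On an SMP machine the same calculation would be applied per CPU or to
counters summed across CPUs.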

-Arun



RTLD thread safety

2000-03-25 Thread Arun Sharma


When I try to compile a simple multi threaded program using a wrapper 
around rfork (from linuxthreads port), I get the following core dump:

ld-elf.so.1: assert failed: /usr/src/libexec/rtld-elf/lockdflt.c:54

Investigation of the code reveals that lazy resolution of symbols
(using PLTs) was happening in multiple threads in the linker simultaneously.

Also, the code in lockdflt.c achieves mutual exclusion by blocking
signals. This doesn't work on an SMP machine using kernel threads.

What would be the right solution for this ? A new set of primitives 
registered using dllockinit or making the defaults SMP thread-safe ?
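
For illustration, one shape an SMP-safe primitive could take is a lock
built on an atomic test-and-set, which excludes threads running on other
CPUs where blocking signals does not. This sketch uses C11 <stdatomic.h>;
the names are hypothetical, it is not the rtld code, and it is not wired
into dllockinit.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical spin lock; initialize the flag with ATOMIC_FLAG_INIT. */
struct spin_lock {
    atomic_flag held;
};

/* Try once to take the lock; returns true if this caller now owns it. */
bool spin_try_lock(struct spin_lock *l)
{
    return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

/* Busy-wait until the lock is acquired. */
void spin_lock_acquire(struct spin_lock *l)
{
    while (!spin_try_lock(l)) {
        /* spin: another thread, possibly on another CPU, holds the lock */
    }
}

/* Release the lock so another thread can take it. */
void spin_unlock(struct spin_lock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

A production lock would also need to consider fairness and what happens
when a signal handler re-enters the linker while the lock is held.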

I suppose the linuxthreads port works because it has been tested only
with Linux executables and Linux executables don't use lazy resolution
of symbols ? I'm just speculating here.

-Arun




Re: RTLD thread safety

2000-03-26 Thread Arun Sharma

On Sun, Mar 26, 2000 at 11:04:08AM -0600, Richard Seaman, Jr. wrote:
> No.  See the file libc_thread.c in the linuxthreads port.
> 
> Note that if you call rfork (RF_MEM...) without any supporting
> infrastructure (eg. as provided by the linuxthreads port) you
> are in dangerous territory.  You do not get *any* of the
> thread safe behaviour in libc, libgcc, or in ld-elf.so.

So you went the dllockinit way. Why not put that code in ld-elf.so itself ?
Same goes for other work you've done as a part of the linuxthreads port. If
it is the GPL contamination issue, someone (perhaps me) can rewrite the
relevant parts.

When FreeBSD has its own native kernel-supported pthreads package, all
these things will be very much necessary, irrespective of which threads
model the package uses. So why not do this work now ?

Also, what happened to all the discussion on -arch ? Was there a consensus
reached ?

-Arun

