Re: problems with mmap() and disk caching

2012-04-06 Thread Konstantin Belousov
On Thu, Apr 05, 2012 at 11:54:53PM +0400, Andrey Zonov wrote:
> On 05.04.2012 23:41, Konstantin Belousov wrote:
> >On Thu, Apr 05, 2012 at 11:33:46PM +0400, Andrey Zonov wrote:
> >>On 05.04.2012 19:54, Alan Cox wrote:
> >>>On 04/04/2012 02:17, Konstantin Belousov wrote:
> On Tue, Apr 03, 2012 at 11:02:53PM +0400, Andrey Zonov wrote:
> >>[snip]
> >This is what I expect. But why doesn't this work without reading the file
> >manually?
> Issue seems to be in some change of the behaviour of the reserv or
> phys allocator. I Cc:ed Alan.
> >>>
> >>>I'm pretty sure that the behavior here hasn't significantly changed in
> >>>about twelve years. Otherwise, I agree with your analysis.
> >>>
> >>>On more than one occasion, I've been tempted to change:
> >>>
> >>>pmap_remove_all(mt);
> >>>if (mt->dirty != 0)
> >>>vm_page_deactivate(mt);
> >>>else
> >>>vm_page_cache(mt);
> >>>
> >>>to:
> >>>
> >>>vm_page_dontneed(mt);
> >>>
> >>
> >>Thanks Alan!  Now it works as I expect!
> >>
> >>But I have more questions to you and kib@.  They are in my test below.
> >>
> >>So, prepare file as earlier, and take information about memory usage
> >>from top(1).  After preparation, but before test:
> >>Mem: 80M Active, 55M Inact, 721M Wired, 215M Buf, 46G Free
> >>
> >>First run:
> >>$ ./mmap /mnt/random
> >>mmap:  1 pass took:   7.462865 (none:  0; res: 262144; super:
> >>0; other:  0)
> >>
> >>No super pages after first run, why?..
> >>
> >>Mem: 79M Active, 1079M Inact, 722M Wired, 216M Buf, 45G Free
> >>
> >>Now the file is in inactive memory, that's good.
> >>
> >>Second run:
> >>$ ./mmap /mnt/random
> >>mmap:  1 pass took:   0.004191 (none:  0; res: 262144; super:
> >>511; other:  0)
> >>
> >>All super pages are here, nice.
> >>
> >>Mem: 1103M Active, 55M Inact, 722M Wired, 216M Buf, 45G Free
> >>
> >>Wow, all inactive pages moved to active and sit there even after the process
> >>was terminated; that's not good. What do you think?
> >Why do you think this is 'not good'? You have plenty of free memory,
> >there is no memory pressure, and all pages were referenced recently.
> >There is no reason for them to be deactivated.
> >
> 
> I always thought that active memory is the sum of the resident memory of
> all processes, inactive shows the disk cache, and wired shows the kernel itself.
So you are wrong. Both active and inactive memory can be mapped or
not mapped, and both can belong to a vnode or to anonymous objects, etc.
The active/inactive distinction reflects only the number of references
noted by the page daemon, or some other page history, such as the way
the page was unwired.

Wired does not necessarily mean kernel-used pages; user processes can
wire their pages as well.
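
For example, a user process can wire its own memory with mlock(2), subject
to RLIMIT_MEMLOCK; a minimal illustrative sketch (not from the original mail):

  #include <sys/mman.h>
  #include <stdio.h>
  #include <unistd.h>

  int
  main(void)
  {
      size_t len = 16 * 1024 * 1024;
      void *p;

      /* Anonymous, page-aligned memory. */
      p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_ANON | MAP_PRIVATE,
          -1, 0);
      if (p == MAP_FAILED) {
          perror("mmap");
          return (1);
      }
      /* Fault the pages in and wire them: they are now counted as Wired
         by top(1), even though the kernel itself is not using them. */
      if (mlock(p, len) == -1) {
          perror("mlock");
          return (1);
      }
      printf("wired %zu bytes; check top(1)\n", len);
      sleep(30);
      munlock(p, len);
      munmap(p, len);
      return (0);
  }
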
> 
> >>
> >>Read the file:
> >>$ cat /mnt/random > /dev/null
> >>
> >>Mem: 79M Active, 55M Inact, 1746M Wired, 1240M Buf, 45G Free
> >>
> >>Now the file is in wired memory.  I do not understand why so.
> >You do use UFS, right ?
> 
> Yes.
> 
> >There are enough buffer headers and buffer KVA
> >to have buffers allocated for the whole file content. Since buffers wire
> >corresponding pages, you get pages migrated to wired.
> >
> >When there appears a buffer pressure (i.e., any other i/o started),
> >the buffers will be repurposed and pages moved to inactive.
> >
> 
> OK, how can I get the amount of disk cache?
You cannot. At least I am not aware of any counter that keeps track
of the resident pages belonging to the vnode pager.

Buffers should not be thought of as the disk cache; pages cache disk content.
Instead, VMIO buffers only provide a bread()/bwrite()-compatible interface
to the page cache (*) for filesystems.
(*) - The term cache is used here in its generic sense, not to be confused
with the cached pages counter from top etc.
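
For what it is worth, the per-state page counters that top(1) displays can
be read programmatically. A minimal sketch using sysctlbyname(); the
vm.stats.vm.* counter names are assumptions based on what 9.x exports:

  #include <sys/types.h>
  #include <sys/sysctl.h>
  #include <stdio.h>

  /* Each counter is a page count; multiply by hw.pagesize for bytes. */
  static u_int
  page_count(const char *name)
  {
      u_int v = 0;
      size_t len = sizeof(v);

      if (sysctlbyname(name, &v, &len, NULL, 0) == -1)
          perror(name);
      return (v);
  }

  int
  main(void)
  {
      printf("active %u inactive %u wired %u cache %u free %u (pages)\n",
          page_count("vm.stats.vm.v_active_count"),
          page_count("vm.stats.vm.v_inactive_count"),
          page_count("vm.stats.vm.v_wire_count"),
          page_count("vm.stats.vm.v_cache_count"),
          page_count("vm.stats.vm.v_free_count"));
      return (0);
  }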

> 
> >>
> >>Could you please give me explanation about active/inactive/wired memory?
> >>
> >>
> >>>because I suspect that the current code does more harm than good. In
> >>>theory, it saves activations of the page daemon. However, more often
> >>>than not, I suspect that we are spending more on page reactivations than
> >>>we are saving on page daemon activations. The sequential access
> >>>detection heuristic is just too easily triggered. For example, I've seen
> >>>it triggered by demand paging of the gcc text segment. Also, I think
> >>>that pmap_remove_all() and especially vm_page_cache() are too severe for
> >>>a detection heuristic that is so easily triggered.
> >>>
> >>[snip]
> >>
> >>--
> >>Andrey Zonov
> 
> -- 
> Andrey Zonov



Re: problems with mmap() and disk caching

2012-04-06 Thread Konstantin Belousov
On Thu, Apr 05, 2012 at 01:25:49PM -0500, Alan Cox wrote:
> On 04/05/2012 12:31, Konstantin Belousov wrote:
> >On Thu, Apr 05, 2012 at 10:54:31AM -0500, Alan Cox wrote:
> >>On 04/04/2012 02:17, Konstantin Belousov wrote:
> >>>On Tue, Apr 03, 2012 at 11:02:53PM +0400, Andrey Zonov wrote:
> Hi,
> 
> I open the file, then call mmap() on the whole file and get a pointer,
> then I work with this pointer.  I expect that a page should be touched only
> once to get it into memory (the disk cache?), but this doesn't work!
> 
> I wrote the test (attached) and ran it for the 1G file generated from
> /dev/random, the result is the following:
> 
> Prepare file:
> # swapoff -a
> # newfs /dev/ada0b
> # mount /dev/ada0b /mnt
> # dd if=/dev/random of=/mnt/random-1024 bs=1m count=1024
> 
> Purge cache:
> # umount /mnt
> # mount /dev/ada0b /mnt
> 
> Run test:
> $ ./mmap /mnt/random-1024 30
> mmap:  1 pass took:   7.431046 (none: 262112; res: 32; super:
> 0; other:  0)
> mmap:  2 pass took:   7.356670 (none: 261648; res:496; super:
> 0; other:  0)
> mmap:  3 pass took:   7.307094 (none: 260521; res:   1623; super:
> 0; other:  0)
> mmap:  4 pass took:   7.350239 (none: 258904; res:   3240; super:
> 0; other:  0)
> mmap:  5 pass took:   7.392480 (none: 257286; res:   4858; super:
> 0; other:  0)
> mmap:  6 pass took:   7.292069 (none: 255584; res:   6560; super:
> 0; other:  0)
> mmap:  7 pass took:   7.048980 (none: 251142; res:  11002; super:
> 0; other:  0)
> mmap:  8 pass took:   6.899387 (none: 247584; res:  14560; super:
> 0; other:  0)
> mmap:  9 pass took:   7.190579 (none: 242992; res:  19152; super:
> 0; other:  0)
> mmap: 10 pass took:   6.915482 (none: 239308; res:  22836; super:
> 0; other:  0)
> mmap: 11 pass took:   6.565909 (none: 232835; res:  29309; super:
> 0; other:  0)
> mmap: 12 pass took:   6.423945 (none: 226160; res:  35984; super:
> 0; other:  0)
> mmap: 13 pass took:   6.315385 (none: 208555; res:  53589; super:
> 0; other:  0)
> mmap: 14 pass took:   6.760780 (none: 192805; res:  69339; super:
> 0; other:  0)
> mmap: 15 pass took:   5.721513 (none: 174497; res:  87647; super:
> 0; other:  0)
> mmap: 16 pass took:   5.004424 (none: 155938; res: 106206; super:
> 0; other:  0)
> mmap: 17 pass took:   4.224926 (none: 135639; res: 126505; super:
> 0; other:  0)
> mmap: 18 pass took:   3.749608 (none: 117952; res: 144192; super:
> 0; other:  0)
> mmap: 19 pass took:   3.398084 (none:  99066; res: 163078; super:
> 0; other:  0)
> mmap: 20 pass took:   3.029557 (none:  74994; res: 187150; super:
> 0; other:  0)
> mmap: 21 pass took:   2.379430 (none:  55231; res: 206913; super:
> 0; other:  0)
> mmap: 22 pass took:   2.046521 (none:  40786; res: 221358; super:
> 0; other:  0)
> mmap: 23 pass took:   1.152797 (none:  30311; res: 231833; super:
> 0; other:  0)
> mmap: 24 pass took:   0.972617 (none:  16196; res: 245948; super:
> 0; other:  0)
> mmap: 25 pass took:   0.577515 (none:   8286; res: 253858; super:
> 0; other:  0)
> mmap: 26 pass took:   0.380738 (none:   3712; res: 258432; super:
> 0; other:  0)
> mmap: 27 pass took:   0.253583 (none:   1193; res: 260951; super:
> 0; other:  0)
> mmap: 28 pass took:   0.157508 (none:  0; res: 262144; super:
> 0; other:  0)
> mmap: 29 pass took:   0.156169 (none:  0; res: 262144; super:
> 0; other:  0)
> mmap: 30 pass took:   0.156550 (none:  0; res: 262144; super:
> 0; other:  0)
> 
> If I run this:
> $ cat /mnt/random-1024 > /dev/null
> before the test, then the result is the following:
> 
> $ ./mmap /mnt/random-1024 5
> mmap:  1 pass took:   0.337657 (none:  0; res: 262144; super:
> 0; other:  0)
> mmap:  2 pass took:   0.186137 (none:  0; res: 262144; super:
> 0; other:  0)
> mmap:  3 pass took:   0.186132 (none:  0; res: 262144; super:
> 0; other:  0)
> mmap:  4 pass took:   0.186535 (none:  0; res: 262144; super:
> 0; other:  0)
> mmap:  5 pass took:   0.190353 (none:  0; res: 262144; super:
> 0; other:  0)
> 
> This is what I expect.  But why doesn't this work without reading the file
> manually?
> >>>Issue seems to be in some change of the behaviour of the reserv or
> >>>phys allocator. I Cc:ed Alan.
> >>I'm pretty sure that the behavior here hasn't significantly changed in
> >>about twelve years.  Otherwise, I agree with your analysis.
> >>
> >>On more than one occasion, I've been tempted to change:
> >>
> >> pmap_remove_all(mt);
> >> if (mt->dirty != 0)
> >> vm_page_deactivate(mt);
> >> else
> >> vm_page_cache(mt);
> >>
> >>to:
> >>
> >> vm_page_dontneed(mt);

Re: problems with mmap() and disk caching

2012-04-06 Thread Alan Cox

On 04/04/2012 02:17, Konstantin Belousov wrote:

On Tue, Apr 03, 2012 at 11:02:53PM +0400, Andrey Zonov wrote:

Hi,

I open the file, then call mmap() on the whole file and get a pointer,
then I work with this pointer.  I expect that a page should be touched only
once to get it into memory (the disk cache?), but this doesn't work!

I wrote the test (attached) and ran it for the 1G file generated from
/dev/random, the result is the following:

Prepare file:
# swapoff -a
# newfs /dev/ada0b
# mount /dev/ada0b /mnt
# dd if=/dev/random of=/mnt/random-1024 bs=1m count=1024

Purge cache:
# umount /mnt
# mount /dev/ada0b /mnt

Run test:
$ ./mmap /mnt/random-1024 30
mmap:  1 pass took:   7.431046 (none: 262112; res: 32; super:
0; other:  0)
mmap:  2 pass took:   7.356670 (none: 261648; res:496; super:
0; other:  0)
mmap:  3 pass took:   7.307094 (none: 260521; res:   1623; super:
0; other:  0)
mmap:  4 pass took:   7.350239 (none: 258904; res:   3240; super:
0; other:  0)
mmap:  5 pass took:   7.392480 (none: 257286; res:   4858; super:
0; other:  0)
mmap:  6 pass took:   7.292069 (none: 255584; res:   6560; super:
0; other:  0)
mmap:  7 pass took:   7.048980 (none: 251142; res:  11002; super:
0; other:  0)
mmap:  8 pass took:   6.899387 (none: 247584; res:  14560; super:
0; other:  0)
mmap:  9 pass took:   7.190579 (none: 242992; res:  19152; super:
0; other:  0)
mmap: 10 pass took:   6.915482 (none: 239308; res:  22836; super:
0; other:  0)
mmap: 11 pass took:   6.565909 (none: 232835; res:  29309; super:
0; other:  0)
mmap: 12 pass took:   6.423945 (none: 226160; res:  35984; super:
0; other:  0)
mmap: 13 pass took:   6.315385 (none: 208555; res:  53589; super:
0; other:  0)
mmap: 14 pass took:   6.760780 (none: 192805; res:  69339; super:
0; other:  0)
mmap: 15 pass took:   5.721513 (none: 174497; res:  87647; super:
0; other:  0)
mmap: 16 pass took:   5.004424 (none: 155938; res: 106206; super:
0; other:  0)
mmap: 17 pass took:   4.224926 (none: 135639; res: 126505; super:
0; other:  0)
mmap: 18 pass took:   3.749608 (none: 117952; res: 144192; super:
0; other:  0)
mmap: 19 pass took:   3.398084 (none:  99066; res: 163078; super:
0; other:  0)
mmap: 20 pass took:   3.029557 (none:  74994; res: 187150; super:
0; other:  0)
mmap: 21 pass took:   2.379430 (none:  55231; res: 206913; super:
0; other:  0)
mmap: 22 pass took:   2.046521 (none:  40786; res: 221358; super:
0; other:  0)
mmap: 23 pass took:   1.152797 (none:  30311; res: 231833; super:
0; other:  0)
mmap: 24 pass took:   0.972617 (none:  16196; res: 245948; super:
0; other:  0)
mmap: 25 pass took:   0.577515 (none:   8286; res: 253858; super:
0; other:  0)
mmap: 26 pass took:   0.380738 (none:   3712; res: 258432; super:
0; other:  0)
mmap: 27 pass took:   0.253583 (none:   1193; res: 260951; super:
0; other:  0)
mmap: 28 pass took:   0.157508 (none:  0; res: 262144; super:
0; other:  0)
mmap: 29 pass took:   0.156169 (none:  0; res: 262144; super:
0; other:  0)
mmap: 30 pass took:   0.156550 (none:  0; res: 262144; super:
0; other:  0)

If I run this:
$ cat /mnt/random-1024 > /dev/null
before the test, then the result is the following:

$ ./mmap /mnt/random-1024 5
mmap:  1 pass took:   0.337657 (none:  0; res: 262144; super:
0; other:  0)
mmap:  2 pass took:   0.186137 (none:  0; res: 262144; super:
0; other:  0)
mmap:  3 pass took:   0.186132 (none:  0; res: 262144; super:
0; other:  0)
mmap:  4 pass took:   0.186535 (none:  0; res: 262144; super:
0; other:  0)
mmap:  5 pass took:   0.190353 (none:  0; res: 262144; super:
0; other:  0)

This is what I expect.  But why doesn't this work without reading the file
manually?

Issue seems to be in some change of the behaviour of the reserv or
phys allocator. I Cc:ed Alan.

What happens is that the fault handler deactivates or caches the pages
previous to the one which would satisfy the fault. See the if()
statement starting at line 463 of vm/vm_fault.c. Since all pages
of the object in your test are clean, the pages are cached.

The next fault then needs to allocate some more pages for a different index
of the same object. What I see is that vm_reserv_alloc_page() returns a
page that is from the cache for the same object, but with a different pindex.
As an obvious result, the page is invalidated and repurposed. When the next
loop starts, the page is not resident anymore, so it has to be re-read
from disk.


I'm pretty sure that the pages aren't being repurposed this quickly.
Instead, I believe that the explanation is to be found in mincore().
mincore() only reports pages that are in the object's memq as
resident.  It does not report cache pages as resident.
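
For reference, the kind of per-page accounting the (not attached here) test
presumably does can be sketched with mincore(2). This is a reconstruction,
not the real test; in particular, how its "super" column is tallied is a
guess:

  #include <sys/types.h>
  #include <sys/mman.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <unistd.h>

  /* Classify the pages of a mapping by their mincore(2) flags.  As noted
     above, pages sitting on the cache queue are not reported as resident,
     so they end up in the "none" bucket. */
  static void
  count_pages(void *addr, size_t len)
  {
      size_t pgsz = (size_t)getpagesize();
      size_t i, npages = (len + pgsz - 1) / pgsz;
      size_t none = 0, res = 0, super = 0, other = 0;
      char *vec = malloc(npages);

      if (vec == NULL || mincore(addr, len, vec) == -1) {
          perror("mincore");
          free(vec);
          return;
      }
      for (i = 0; i < npages; i++) {
          if (vec[i] == 0)
              none++;                 /* not resident (or only cached) */
          else if (vec[i] & MINCORE_INCORE)
              res++;                  /* resident on the object's memq */
          else
              other++;
          if (vec[i] & MINCORE_SUPER)
              super++;                /* mapped by a superpage */
      }
      printf("none: %zu; res: %zu; super: %zu; other: %zu\n",
          none, res, super, other);
      free(vec);
  }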



The behaviour of the allocator is not consistent, so some pages are not
reused, allowing the test to converge and to collect all pages of the
object eventually.

Calling madvise(MADV_RANDOM) fixes this.
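
A minimal sketch of that workaround applied to the test's mapping (addr and
len stand for the region returned by mmap(); illustrative only):

  #include <sys/types.h>
  #include <sys/mman.h>
  #include <err.h>

  /* Declare the access pattern random, which turns off the sequential-
     access heuristic that deactivates/caches the pages behind the
     faulting one. */
  static void
  advise_random(void *addr, size_t len)
  {
      if (madvise(addr, len, MADV_RANDOM) == -1)
          warn("madvise(MADV_RANDOM)");
  }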

possible signedness issue in aic7xxx

2012-04-06 Thread Alexander Best
Hi there,

I noticed the following warning from clang when building HEAD:

===> sys/modules/aic7xxx/aicasm (obj,build-tools)
/usr/github-freebsd-head/sys/modules/aic7xxx/aicasm/../../../dev/aic7xxx/aicasm/aicasm.c:604:5:
 warning: passing 'int *' to parameter of type 'unsigned int *' converts 
between pointers to integer types with different sign [-Wpointer-sign]
&skip_addr, func_values) == 0) {
^~
/usr/github-freebsd-head/sys/modules/aic7xxx/aicasm/../../../dev/aic7xxx/aicasm/aicasm.c:83:24:
 note: passing argument to parameter 'skip_addr' here
   unsigned int *skip_addr, int *func_vals);
 ^
1 warning generated.

Will the attached patch take care of the problem?

cheers.
alex
diff --git a/sys/dev/aic7xxx/aicasm/aicasm.c b/sys/dev/aic7xxx/aicasm/aicasm.c
index 1b88ba0..08a540f 100644
--- a/sys/dev/aic7xxx/aicasm/aicasm.c
+++ b/sys/dev/aic7xxx/aicasm/aicasm.c
@@ -353,7 +353,7 @@ output_code(void)
patch_t *cur_patch;
critical_section_t *cs;
symbol_node_t *cur_node;
-   int instrcount;
+   unsigned int instrcount;
 
instrcount = 0;
fprintf(ofile,
@@ -455,7 +455,7 @@ output_code(void)
 "static const int num_critical_sections = sizeof(critical_sections)\n"
 " / sizeof(*critical_sections);\n");
 
-   fprintf(stderr, "%s: %d instructions used\n", appname, instrcount);
+   fprintf(stderr, "%s: %u instructions used\n", appname, instrcount);
 }
 
 static void
@@ -526,11 +526,11 @@ output_listing(char *ifilename)
patch_t *cur_patch;
symbol_node_t *cur_func;
int *func_values;
-   int instrcount;
+   unsigned int instrcount;
int instrptr;
unsigned int line;
int func_count;
-   int skip_addr;
+   unsigned int skip_addr;
 
instrcount = 0;
instrptr = 0;

Re: problems with mmap() and disk caching

2012-04-06 Thread Alan Cox

On 04/06/2012 03:38, Konstantin Belousov wrote:

On Thu, Apr 05, 2012 at 01:25:49PM -0500, Alan Cox wrote:

On 04/05/2012 12:31, Konstantin Belousov wrote:

On Thu, Apr 05, 2012 at 10:54:31AM -0500, Alan Cox wrote:

On 04/04/2012 02:17, Konstantin Belousov wrote:

On Tue, Apr 03, 2012 at 11:02:53PM +0400, Andrey Zonov wrote:

Hi,

I open the file, then call mmap() on the whole file and get a pointer,
then I work with this pointer.  I expect that a page should be touched only
once to get it into memory (the disk cache?), but this doesn't work!

I wrote the test (attached) and ran it for the 1G file generated from
/dev/random, the result is the following:

Prepare file:
# swapoff -a
# newfs /dev/ada0b
# mount /dev/ada0b /mnt
# dd if=/dev/random of=/mnt/random-1024 bs=1m count=1024

Purge cache:
# umount /mnt
# mount /dev/ada0b /mnt

Run test:
$ ./mmap /mnt/random-1024 30
mmap:  1 pass took:   7.431046 (none: 262112; res: 32; super:
0; other:  0)
mmap:  2 pass took:   7.356670 (none: 261648; res:496; super:
0; other:  0)
mmap:  3 pass took:   7.307094 (none: 260521; res:   1623; super:
0; other:  0)
mmap:  4 pass took:   7.350239 (none: 258904; res:   3240; super:
0; other:  0)
mmap:  5 pass took:   7.392480 (none: 257286; res:   4858; super:
0; other:  0)
mmap:  6 pass took:   7.292069 (none: 255584; res:   6560; super:
0; other:  0)
mmap:  7 pass took:   7.048980 (none: 251142; res:  11002; super:
0; other:  0)
mmap:  8 pass took:   6.899387 (none: 247584; res:  14560; super:
0; other:  0)
mmap:  9 pass took:   7.190579 (none: 242992; res:  19152; super:
0; other:  0)
mmap: 10 pass took:   6.915482 (none: 239308; res:  22836; super:
0; other:  0)
mmap: 11 pass took:   6.565909 (none: 232835; res:  29309; super:
0; other:  0)
mmap: 12 pass took:   6.423945 (none: 226160; res:  35984; super:
0; other:  0)
mmap: 13 pass took:   6.315385 (none: 208555; res:  53589; super:
0; other:  0)
mmap: 14 pass took:   6.760780 (none: 192805; res:  69339; super:
0; other:  0)
mmap: 15 pass took:   5.721513 (none: 174497; res:  87647; super:
0; other:  0)
mmap: 16 pass took:   5.004424 (none: 155938; res: 106206; super:
0; other:  0)
mmap: 17 pass took:   4.224926 (none: 135639; res: 126505; super:
0; other:  0)
mmap: 18 pass took:   3.749608 (none: 117952; res: 144192; super:
0; other:  0)
mmap: 19 pass took:   3.398084 (none:  99066; res: 163078; super:
0; other:  0)
mmap: 20 pass took:   3.029557 (none:  74994; res: 187150; super:
0; other:  0)
mmap: 21 pass took:   2.379430 (none:  55231; res: 206913; super:
0; other:  0)
mmap: 22 pass took:   2.046521 (none:  40786; res: 221358; super:
0; other:  0)
mmap: 23 pass took:   1.152797 (none:  30311; res: 231833; super:
0; other:  0)
mmap: 24 pass took:   0.972617 (none:  16196; res: 245948; super:
0; other:  0)
mmap: 25 pass took:   0.577515 (none:   8286; res: 253858; super:
0; other:  0)
mmap: 26 pass took:   0.380738 (none:   3712; res: 258432; super:
0; other:  0)
mmap: 27 pass took:   0.253583 (none:   1193; res: 260951; super:
0; other:  0)
mmap: 28 pass took:   0.157508 (none:  0; res: 262144; super:
0; other:  0)
mmap: 29 pass took:   0.156169 (none:  0; res: 262144; super:
0; other:  0)
mmap: 30 pass took:   0.156550 (none:  0; res: 262144; super:
0; other:  0)

If I run this:
$ cat /mnt/random-1024 > /dev/null
before the test, then the result is the following:

$ ./mmap /mnt/random-1024 5
mmap:  1 pass took:   0.337657 (none:  0; res: 262144; super:
0; other:  0)
mmap:  2 pass took:   0.186137 (none:  0; res: 262144; super:
0; other:  0)
mmap:  3 pass took:   0.186132 (none:  0; res: 262144; super:
0; other:  0)
mmap:  4 pass took:   0.186535 (none:  0; res: 262144; super:
0; other:  0)
mmap:  5 pass took:   0.190353 (none:  0; res: 262144; super:
0; other:  0)

This is what I expect.  But why doesn't this work without reading the file
manually?

Issue seems to be in some change of the behaviour of the reserv or
phys allocator. I Cc:ed Alan.

I'm pretty sure that the behavior here hasn't significantly changed in
about twelve years.  Otherwise, I agree with your analysis.

On more than one occasion, I've been tempted to change:

 pmap_remove_all(mt);
 if (mt->dirty != 0)
 vm_page_deactivate(mt);
 else
 vm_page_cache(mt);

to:

 vm_page_dontneed(mt);

because I suspect that the current code does more harm than good.  In
theory, it saves activations of the page daemon.  However, more often
than not, I suspect that we are spending more on page reactivations than
we are saving on page daemon activations.  The sequential access
detection heuristic is just too easily triggered.  For example, I've seen
it triggered by demand paging of the gcc text segment.  Also, I think
that pmap_remove_all() and especially vm_page_cache() are too severe for
a detection heuristic that is so easily triggered.

Re: [RFT][patch] Scheduling for HTT and not only

2012-04-06 Thread Attilio Rao
On 5 April 2012 19:12, Arnaud Lacombe wrote:
> Hi,
>
> [Sorry for the delay, I got a bit sidetrack'ed...]
>
> 2012/2/17 Alexander Motin :
>> On 17.02.2012 18:53, Arnaud Lacombe wrote:
>>>
>>> On Fri, Feb 17, 2012 at 11:29 AM, Alexander Motin  wrote:

 On 02/15/12 21:54, Jeff Roberson wrote:
>
> On Wed, 15 Feb 2012, Alexander Motin wrote:
>>
>> I've decided to stop those cache black magic practices and focus on
>> things that really exist in this world -- SMT and CPU load. I've
>> dropped most of cache related things from the patch and made the rest
>> of things more strict and predictable:
>> http://people.freebsd.org/~mav/sched.htt34.patch
>
>
> This looks great. I think there is value in considering the other
> approach further but I would like to do this part first. It would be
> nice to also add priority as a greater influence in the load balancing
> as well.


 I haven't got good idea yet about balancing priorities, but I've
 rewritten
 balancer itself. As soon as sched_lowest() / sched_highest() are more
 intelligent now, they allowed to remove topology traversing from the
 balancer itself. That should fix double-swapping problem, allow to keep
 some
 affinity while moving threads and make balancing more fair. I did number
 of
 tests running 4, 8, 9 and 16 CPU-bound threads on 8 CPUs. With 4, 8 and
 16
 threads everything is stationary as it should. With 9 threads I see
 regular
 and random load move between all 8 CPUs. Measurements on 5 minutes run
 show
 deviation of only about 5 seconds. It is the same deviation as I see
 caused
 by only scheduling of 16 threads on 8 cores without any balancing needed
 at
 all. So I believe this code works as it should.

 Here is the patch: http://people.freebsd.org/~mav/sched.htt40.patch

 I plan this to be a final patch of this series (more to come :)) and if
 there will be no problems or objections, I am going to commit it (except
 some debugging KTRs) in about ten days. So now it's a good time for
 reviews
 and testing. :)

>>> is there a place where all the patches are available ?
>>
>>
>> All my scheduler patches are cumulative, so all you need is only the last
>> mentioned here sched.htt40.patch.
>>
> You may want to have a look to the result I collected in the
> `runs/freebsd-experiments' branch of:
>
> https://github.com/lacombar/hackbench/
>
> and compare them with vanilla FreeBSD 9.0 and -CURRENT results
> available in `runs/freebsd'. On the dual package platform, your patch
> is not a definite win.
>
>> But in some cases, especially for multi-socket systems, to let it show its
>> best, you may want to apply additional patch from avg@ to better detect CPU
>> topology:
>> https://gitorious.org/~avg/freebsd/avgbsd/commit/6bca4a2e4854ea3fc275946a023db65c483cb9dd
>>
> test I conducted specifically for this patch did not show much
> improvement...

Can you please clarify this point?
Did the test you ran include cases where the topology was detected badly
as well as cases where the topology was detected correctly by a patched
kernel (and you still didn't see a performance improvement), in terms
of cache-line sharing?

Attilio


-- 
Peace can only be achieved by understanding - A. Einstein


Re: [RFT][patch] Scheduling for HTT and not only

2012-04-06 Thread Alexander Motin

On 04/06/12 17:13, Attilio Rao wrote:

On 5 April 2012 19:12, Arnaud Lacombe wrote:

Hi,

[Sorry for the delay, I got a bit sidetrack'ed...]

2012/2/17 Alexander Motin:

On 17.02.2012 18:53, Arnaud Lacombe wrote:


On Fri, Feb 17, 2012 at 11:29 AM, Alexander Motin wrote:


On 02/15/12 21:54, Jeff Roberson wrote:


On Wed, 15 Feb 2012, Alexander Motin wrote:


I've decided to stop those cache black magic practices and focus on
things that really exist in this world -- SMT and CPU load. I've
dropped most of cache related things from the patch and made the rest
of things more strict and predictable:
http://people.freebsd.org/~mav/sched.htt34.patch



This looks great. I think there is value in considering the other
approach further but I would like to do this part first. It would be
nice to also add priority as a greater influence in the load balancing
as well.



I haven't got good idea yet about balancing priorities, but I've
rewritten
balancer itself. As soon as sched_lowest() / sched_highest() are more
intelligent now, they allowed to remove topology traversing from the
balancer itself. That should fix double-swapping problem, allow to keep
some
affinity while moving threads and make balancing more fair. I did number
of
tests running 4, 8, 9 and 16 CPU-bound threads on 8 CPUs. With 4, 8 and
16
threads everything is stationary as it should. With 9 threads I see
regular
and random load move between all 8 CPUs. Measurements on 5 minutes run
show
deviation of only about 5 seconds. It is the same deviation as I see
caused
by only scheduling of 16 threads on 8 cores without any balancing needed
at
all. So I believe this code works as it should.

Here is the patch: http://people.freebsd.org/~mav/sched.htt40.patch

I plan this to be a final patch of this series (more to come :)) and if
there will be no problems or objections, I am going to commit it (except
some debugging KTRs) in about ten days. So now it's a good time for
reviews
and testing. :)


is there a place where all the patches are available ?



All my scheduler patches are cumulative, so all you need is only the last
mentioned here sched.htt40.patch.


You may want to have a look to the result I collected in the
`runs/freebsd-experiments' branch of:

https://github.com/lacombar/hackbench/

and compare them with vanilla FreeBSD 9.0 and -CURRENT results
available in `runs/freebsd'. On the dual package platform, your patch
is not a definite win.


But in some cases, especially for multi-socket systems, to let it show its
best, you may want to apply additional patch from avg@ to better detect CPU
topology:
https://gitorious.org/~avg/freebsd/avgbsd/commit/6bca4a2e4854ea3fc275946a023db65c483cb9dd


test I conducted specifically for this patch did not show much improvement...


Can you please clarify this point?
Did the test you ran include cases where the topology was detected badly
as well as cases where the topology was detected correctly by a patched
kernel (and you still didn't see a performance improvement), in terms
of cache-line sharing?


At this moment SCHED_ULE does almost nothing in terms of cache-line
sharing affinity (though it is probably worth some further experiments).
What this patch may improve is the opposite case -- reducing cache sharing
pressure for cache-hungry applications. For example, proper cache
topology detection (such as the lack of a global L3 cache, but a shared L2
per pair of cores on Core2Quad class CPUs) increases pbzip2 performance
when the number of threads is less than the number of CPUs (i.e. when
there is room for optimization).
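
As a side note, the topology SCHED_ULE actually detected on a given machine
can be inspected at run time through the kern.sched.topology_spec sysctl
(present on SCHED_ULE kernels); a minimal sketch that dumps it, useful for
checking whether avg's patch changes the detected cache levels:

  #include <sys/types.h>
  #include <sys/sysctl.h>
  #include <stdio.h>
  #include <stdlib.h>

  /* Print the XML description of the CPU group tree built by the scheduler. */
  int
  main(void)
  {
      char *buf;
      size_t len = 0;

      if (sysctlbyname("kern.sched.topology_spec", NULL, &len, NULL, 0) == -1) {
          perror("kern.sched.topology_spec");
          return (1);
      }
      if ((buf = malloc(len)) == NULL ||
          sysctlbyname("kern.sched.topology_spec", buf, &len, NULL, 0) == -1)
          return (1);
      printf("%s\n", buf);
      free(buf);
      return (0);
  }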


--
Alexander Motin


Re: [RFT][patch] Scheduling for HTT and not only

2012-04-06 Thread Attilio Rao
On 6 April 2012 15:27, Alexander Motin wrote:
> On 04/06/12 17:13, Attilio Rao wrote:
>>
>> On 5 April 2012 19:12, Arnaud Lacombe wrote:
>>>
>>> Hi,
>>>
>>> [Sorry for the delay, I got a bit sidetrack'ed...]
>>>
>>> 2012/2/17 Alexander Motin:

 On 17.02.2012 18:53, Arnaud Lacombe wrote:
>
>
> On Fri, Feb 17, 2012 at 11:29 AM, Alexander Motin
>  wrote:
>>
>>
>> On 02/15/12 21:54, Jeff Roberson wrote:
>>>
>>>
>>> On Wed, 15 Feb 2012, Alexander Motin wrote:


 I've decided to stop those cache black magic practices and focus on
 things that really exist in this world -- SMT and CPU load. I've
 dropped most of cache related things from the patch and made the
 rest
 of things more strict and predictable:
 http://people.freebsd.org/~mav/sched.htt34.patch
>>>
>>>
>>>
>>> This looks great. I think there is value in considering the other
>>> approach further but I would like to do this part first. It would be
>>> nice to also add priority as a greater influence in the load
>>> balancing
>>> as well.
>>
>>
>>
>> I haven't got good idea yet about balancing priorities, but I've
>> rewritten
>> balancer itself. As soon as sched_lowest() / sched_highest() are more
>> intelligent now, they allowed to remove topology traversing from the
>> balancer itself. That should fix double-swapping problem, allow to
>> keep
>> some
>> affinity while moving threads and make balancing more fair. I did
>> number
>> of
>> tests running 4, 8, 9 and 16 CPU-bound threads on 8 CPUs. With 4, 8
>> and
>> 16
>> threads everything is stationary as it should. With 9 threads I see
>> regular
>> and random load move between all 8 CPUs. Measurements on 5 minutes run
>> show
>> deviation of only about 5 seconds. It is the same deviation as I see
>> caused
>> by only scheduling of 16 threads on 8 cores without any balancing
>> needed
>> at
>> all. So I believe this code works as it should.
>>
>> Here is the patch: http://people.freebsd.org/~mav/sched.htt40.patch
>>
>> I plan this to be a final patch of this series (more to come :)) and
>> if
>> there will be no problems or objections, I am going to commit it
>> (except
>> some debugging KTRs) in about ten days. So now it's a good time for
>> reviews
>> and testing. :)
>>
> is there a place where all the patches are available ?



 All my scheduler patches are cumulative, so all you need is only the
 last
 mentioned here sched.htt40.patch.

>>> You may want to have a look to the result I collected in the
>>> `runs/freebsd-experiments' branch of:
>>>
>>> https://github.com/lacombar/hackbench/
>>>
>>> and compare them with vanilla FreeBSD 9.0 and -CURRENT results
>>> available in `runs/freebsd'. On the dual package platform, your patch
>>> is not a definite win.
>>>
 But in some cases, especially for multi-socket systems, to let it show
 its
 best, you may want to apply additional patch from avg@ to better detect
 CPU
 topology:

 https://gitorious.org/~avg/freebsd/avgbsd/commit/6bca4a2e4854ea3fc275946a023db65c483cb9dd

>>> test I conducted specifically for this patch did not show much
>>> improvement...
>>
>>
>> Can you please clarify this point?
>> Did the test you ran include cases where the topology was detected badly
>> as well as cases where the topology was detected correctly by a patched
>> kernel (and you still didn't see a performance improvement), in terms
>> of cache-line sharing?
>
>
> At this moment SCHED_ULE does almost nothing in terms of cache-line sharing
> affinity (though it is probably worth some further experiments). What this
> patch may improve is the opposite case -- reducing cache sharing pressure for
> cache-hungry applications. For example, proper cache topology detection
> (such as the lack of a global L3 cache, but a shared L2 per pair of cores on
> Core2Quad class CPUs) increases pbzip2 performance when the number of threads
> is less than the number of CPUs (i.e. when there is room for optimization).

My question was not really about your patch.
I just wanted to know whether he correctly benchmarked a case where the
topology was screwed up and then correctly recognized by avg's patch,
in terms of cache-level aggregation (again, it was not about your patch).

Attilio


-- 
Peace can only be achieved by understanding - A. Einstein


Re: [RFT][patch] Scheduling for HTT and not only

2012-04-06 Thread Alexander Motin

On 04/06/12 17:30, Attilio Rao wrote:

On 6 April 2012 15:27, Alexander Motin wrote:

On 04/06/12 17:13, Attilio Rao wrote:


On 5 April 2012 19:12, Arnaud Lacombe wrote:


Hi,

[Sorry for the delay, I got a bit sidetrack'ed...]

2012/2/17 Alexander Motin:


On 17.02.2012 18:53, Arnaud Lacombe wrote:



On Fri, Feb 17, 2012 at 11:29 AM, Alexander Motin
  wrote:



On 02/15/12 21:54, Jeff Roberson wrote:



On Wed, 15 Feb 2012, Alexander Motin wrote:



I've decided to stop those cache black magic practices and focus on
things that really exist in this world -- SMT and CPU load. I've
dropped most of cache related things from the patch and made the
rest
of things more strict and predictable:
http://people.freebsd.org/~mav/sched.htt34.patch




This looks great. I think there is value in considering the other
approach further but I would like to do this part first. It would be
nice to also add priority as a greater influence in the load
balancing
as well.




I haven't got good idea yet about balancing priorities, but I've
rewritten
balancer itself. As soon as sched_lowest() / sched_highest() are more
intelligent now, they allowed to remove topology traversing from the
balancer itself. That should fix double-swapping problem, allow to
keep
some
affinity while moving threads and make balancing more fair. I did
number
of
tests running 4, 8, 9 and 16 CPU-bound threads on 8 CPUs. With 4, 8
and
16
threads everything is stationary as it should. With 9 threads I see
regular
and random load move between all 8 CPUs. Measurements on 5 minutes run
show
deviation of only about 5 seconds. It is the same deviation as I see
caused
by only scheduling of 16 threads on 8 cores without any balancing
needed
at
all. So I believe this code works as it should.

Here is the patch: http://people.freebsd.org/~mav/sched.htt40.patch

I plan this to be a final patch of this series (more to come :)) and
if
there will be no problems or objections, I am going to commit it
(except
some debugging KTRs) in about ten days. So now it's a good time for
reviews
and testing. :)


is there a place where all the patches are available ?




All my scheduler patches are cumulative, so all you need is only the
last
mentioned here sched.htt40.patch.


You may want to have a look to the result I collected in the
`runs/freebsd-experiments' branch of:

https://github.com/lacombar/hackbench/

and compare them with vanilla FreeBSD 9.0 and -CURRENT results
available in `runs/freebsd'. On the dual package platform, your patch
is not a definite win.


But in some cases, especially for multi-socket systems, to let it show
its
best, you may want to apply additional patch from avg@ to better detect
CPU
topology:

https://gitorious.org/~avg/freebsd/avgbsd/commit/6bca4a2e4854ea3fc275946a023db65c483cb9dd


test I conducted specifically for this patch did not show much
improvement...



Can you please clarify this point?
Did the test you ran include cases where the topology was detected badly
as well as cases where the topology was detected correctly by a patched
kernel (and you still didn't see a performance improvement), in terms
of cache-line sharing?



At this moment SCHED_ULE does almost nothing in terms of cache-line sharing
affinity (though it is probably worth some further experiments). What this
patch may improve is the opposite case -- reducing cache sharing pressure for
cache-hungry applications. For example, proper cache topology detection
(such as the lack of a global L3 cache, but a shared L2 per pair of cores on
Core2Quad class CPUs) increases pbzip2 performance when the number of threads
is less than the number of CPUs (i.e. when there is room for optimization).


My question was not really about your patch.
I just wanted to know whether he correctly benchmarked a case where the
topology was screwed up and then correctly recognized by avg's patch,
in terms of cache-level aggregation (again, it was not about your patch).


I understand. I've just described a test case where properly detected
topology could give a benefit. What the test really does is indeed a good
question.


--
Alexander Motin


Did something change with ioctl CAMIOCOMMAND from 8.0 to 9.0 ?

2012-04-06 Thread Thomas Schmitt
Hi,

googling brought me to this forum post
  http://forums.freebsd.org/showthread.php?p=172885
which reports that xfburn fails to recognize optical drives on FreeBSD 9.0.

There are error messages about an ioctl which might be emitted by libburn
when getting the list of drives:

  xfburn: error sending CAMIOCOMMAND ioctl: Inappropriate ioctl for device

On my FreeBSD 8.0 test system, everything seems ok with libburn.
xorriso lists both drives and is willing to blank and burn a CD.

Could somebody with a 9.0 system and a CD/DVD/BD drive please get xorriso
(e.g. from ports) and try whether it shows all drives?
This command:

  xorriso -devices

should report something like

  0  -dev '/dev/cd0' rwrwr- :  'TSSTcorp' 'CDDVDW SH-S223B' 
  1  -dev '/dev/cd1' rwrwr- :  'TSSTcorp' 'DVD-ROM SH-D162C' 

One needs rw-permissions for the involved devices in order to get them
listed. Up to now, these were: acd* cd* pass* xpt*


If the CAMIOCOMMAND usage of libburn/sg-freebsd.c is wrong for 9.0, then I
would need instructions on how to perform drive listing and how to recognize
9.0 (or rather, the need for the new code) at compile time.

The code can be inspected online at
  http://libburnia-project.org/browser/libburn/trunk/libburn/sg-freebsd.c

The (union ccb) idx->ccb for this ioctl at line 231

  if (ioctl(idx->fd, CAMIOCOMMAND, &(idx->ccb)) == -1) {

is set up in this function beginning at line 160

  static int sg_init_enumerator(burn_drive_enumerator_t *idx_)
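
For comparison, the generic userland way to walk the CAM device tree is an
XPT_DEV_MATCH ccb sent through CAMIOCOMMAND on /dev/xpt0, the same pattern
camcontrol(8) uses. The sketch below is only an illustration, not the
sg-freebsd.c code, and it assumes rw access to /dev/xpt0:

  #include <sys/ioctl.h>
  #include <fcntl.h>
  #include <stdio.h>
  #include <string.h>
  #include <unistd.h>
  #include <camlib.h>
  #include <cam/scsi/scsi_pass.h>   /* CAMIOCOMMAND */

  int
  main(void)
  {
      union ccb ccb;
      struct dev_match_result matches[100];
      unsigned int i;
      int fd;

      if ((fd = open("/dev/xpt0", O_RDWR)) == -1) {
          perror("/dev/xpt0");
          return (1);
      }
      memset(&ccb, 0, sizeof(ccb));
      ccb.ccb_h.path_id = CAM_XPT_PATH_ID;
      ccb.ccb_h.target_id = CAM_TARGET_WILDCARD;
      ccb.ccb_h.target_lun = CAM_LUN_WILDCARD;
      ccb.ccb_h.func_code = XPT_DEV_MATCH;      /* walk the whole device tree */
      ccb.cdm.match_buf_len = sizeof(matches);
      ccb.cdm.matches = matches;
      ccb.cdm.num_matches = 0;
      ccb.cdm.num_patterns = 0;                 /* no pattern: match everything */
      ccb.cdm.pattern_buf_len = 0;

      do {
          if (ioctl(fd, CAMIOCOMMAND, &ccb) == -1) {
              perror("CAMIOCOMMAND");
              return (1);
          }
          /* Print peripheral names (cd0, pass1, ...) as they are matched. */
          for (i = 0; i < ccb.cdm.num_matches; i++)
              if (ccb.cdm.matches[i].type == DEV_MATCH_PERIPH)
                  printf("%s%u\n",
                      ccb.cdm.matches[i].result.periph_result.periph_name,
                      ccb.cdm.matches[i].result.periph_result.unit_number);
      } while (ccb.ccb_h.status == CAM_REQ_CMP &&
          ccb.cdm.status == CAM_DEV_MATCH_MORE);

      close(fd);
      return (0);
  }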


Have a nice day :)

Thomas
