Re: problems with mmap() and disk caching
On Thu, Apr 05, 2012 at 11:54:53PM +0400, Andrey Zonov wrote:
> On 05.04.2012 23:41, Konstantin Belousov wrote:
> >On Thu, Apr 05, 2012 at 11:33:46PM +0400, Andrey Zonov wrote:
> >>On 05.04.2012 19:54, Alan Cox wrote:
> >>>On 04/04/2012 02:17, Konstantin Belousov wrote:
> On Tue, Apr 03, 2012 at 11:02:53PM +0400, Andrey Zonov wrote:
> >>[snip]
> >This is what I expect.  But why doesn't this work without reading the
> >file manually?
> Issue seems to be in some change of the behaviour of the reserv or
> phys allocator.  I Cc:ed Alan.
> >>>
> >>>I'm pretty sure that the behavior here hasn't significantly changed in
> >>>about twelve years.  Otherwise, I agree with your analysis.
> >>>
> >>>On more than one occasion, I've been tempted to change:
> >>>
> >>>	pmap_remove_all(mt);
> >>>	if (mt->dirty != 0)
> >>>		vm_page_deactivate(mt);
> >>>	else
> >>>		vm_page_cache(mt);
> >>>
> >>>to:
> >>>
> >>>	vm_page_dontneed(mt);
> >>>
> >>Thanks Alan!  Now it works as I expect!
> >>
> >>But I have more questions to you and kib@.  They are in my test below.
> >>
> >>So, prepare the file as earlier, and take information about memory usage
> >>from top(1).  After preparation, but before the test:
> >>Mem: 80M Active, 55M Inact, 721M Wired, 215M Buf, 46G Free
> >>
> >>First run:
> >>$ ./mmap /mnt/random
> >>mmap: 1 pass took: 7.462865 (none: 0; res: 262144; super: 0; other: 0)
> >>
> >>No super pages after the first run, why?..
> >>
> >>Mem: 79M Active, 1079M Inact, 722M Wired, 216M Buf, 45G Free
> >>
> >>Now the file is in inactive memory, that's good.
> >>
> >>Second run:
> >>$ ./mmap /mnt/random
> >>mmap: 1 pass took: 0.004191 (none: 0; res: 262144; super: 511; other: 0)
> >>
> >>All super pages are here, nice.
> >>
> >>Mem: 1103M Active, 55M Inact, 722M Wired, 216M Buf, 45G Free
> >>
> >>Wow, all inactive pages moved to active and sit there even after the
> >>process was terminated, that's not good, what do you think?
> >Why do you think this is 'not good'?  You have plenty of free memory,
> >there is no memory pressure, and all pages were referenced recently.
> >There is no reason for them to be deactivated.
> >
> I always thought that active memory is the sum of the resident memory of
> all processes, inactive shows disk cache, and wired shows the kernel itself.

So you are wrong.  Both active and inactive memory can be mapped and not
mapped, and both can belong to a vnode or to anonymous objects etc.  The
active/inactive distinction reflects only the amount of references noted
by the pagedaemon, or some other page history, like the way the page was
unwired.  Wired does not necessarily mean kernel-used pages; user
processes can wire their pages as well.

> >>Read the file:
> >>$ cat /mnt/random > /dev/null
> >>
> >>Mem: 79M Active, 55M Inact, 1746M Wired, 1240M Buf, 45G Free
> >>
> >>Now the file is in wired memory.  I do not understand why.
> >You do use UFS, right?
>
> Yes.
>
> >There is enough buffer headers and buffer KVA
> >to have buffers allocated for the whole file content.  Since buffers wire
> >the corresponding pages, you get pages migrated to wired.
> >
> >When buffer pressure appears (i.e., any other i/o is started),
> >the buffers will be repurposed and the pages moved to inactive.
> >
> OK, how can I get the amount of disk cache?

You cannot.  At least I am not aware of any counter that keeps track of
the resident pages belonging to the vnode pager.

Buffers should not be thought of as a disk cache; pages cache disk
content.  Instead, VMIO buffers only provide a bread()/bwrite()
compatible interface to the page cache (*) for filesystems.

(*) - The cache term is used in a generic sense here, not to be confused
with the cached pages counter from top etc.

> >>Could you please give me an explanation about active/inactive/wired memory?
> >>
> >>>because I suspect that the current code does more harm than good.  In
> >>>theory, it saves activations of the page daemon.  However, more often
> >>>than not, I suspect that we are spending more on page reactivations than
> >>>we are saving on page daemon activations.  The sequential access
> >>>detection heuristic is just too easily triggered.  For example, I've seen
> >>>it triggered by demand paging of the gcc text segment.  Also, I think
> >>>that pmap_remove_all() and especially vm_page_cache() are too severe for
> >>>a detection heuristic that is so easily triggered.
> >>
> >>[snip]
> >>
> >>--
> >>Andrey Zonov
>
> --
> Andrey Zonov
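[For readers who want to poke at the numbers top(1) is summarizing in the
Mem: lines above: the global page-queue counters are exported as
vm.stats.vm.* sysctls.  They are system-wide page counts, not per-vnode,
which is why there is no direct "disk cache" figure to be had.  A minimal
sketch, assuming those sysctl names are present on the system in question:]

/*
 * Sketch: read the page-queue counters that top(1) sums into its
 * Active/Inact/Wired/Cache/Free lines.  Counter names are assumptions
 * based on the stock vm.stats.vm sysctl tree.
 */
#include <sys/types.h>
#include <sys/sysctl.h>
#include <err.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

static u_int
get_count(const char *name)
{
	u_int val;
	size_t len = sizeof(val);

	if (sysctlbyname(name, &val, &len, NULL, 0) == -1)
		err(1, "sysctlbyname(%s)", name);
	return (val);
}

int
main(void)
{
	int pgsize = getpagesize();

	/* Convert page counts to megabytes, as top(1) does. */
	printf("active:   %ju MB\n",
	    (uintmax_t)get_count("vm.stats.vm.v_active_count") * pgsize / 1048576);
	printf("inactive: %ju MB\n",
	    (uintmax_t)get_count("vm.stats.vm.v_inactive_count") * pgsize / 1048576);
	printf("wired:    %ju MB\n",
	    (uintmax_t)get_count("vm.stats.vm.v_wire_count") * pgsize / 1048576);
	printf("cache:    %ju MB\n",
	    (uintmax_t)get_count("vm.stats.vm.v_cache_count") * pgsize / 1048576);
	printf("free:     %ju MB\n",
	    (uintmax_t)get_count("vm.stats.vm.v_free_count") * pgsize / 1048576);
	return (0);
}

[The command-line equivalent is simply `sysctl vm.stats.vm`.]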
Re: problems with mmap() and disk caching
On Thu, Apr 05, 2012 at 01:25:49PM -0500, Alan Cox wrote:
> On 04/05/2012 12:31, Konstantin Belousov wrote:
> >On Thu, Apr 05, 2012 at 10:54:31AM -0500, Alan Cox wrote:
> >>On 04/04/2012 02:17, Konstantin Belousov wrote:
> >>>On Tue, Apr 03, 2012 at 11:02:53PM +0400, Andrey Zonov wrote:
> Hi,
>
> I open the file, then call mmap() on the whole file and get a pointer,
> then I work with this pointer.  I expect that a page should be touched
> only once to get it into memory (disk cache?), but this doesn't work!
>
> I wrote the test (attached) and ran it for the 1G file generated from
> /dev/random, the result is the following:
>
> Prepare file:
> # swapoff -a
> # newfs /dev/ada0b
> # mount /dev/ada0b /mnt
> # dd if=/dev/random of=/mnt/random-1024 bs=1m count=1024
>
> Purge cache:
> # umount /mnt
> # mount /dev/ada0b /mnt
>
> Run test:
> $ ./mmap /mnt/random-1024 30
> [snip]
>
> If I run this:
> $ cat /mnt/random-1024 > /dev/null
> before the test, then the result is the following:
>
> $ ./mmap /mnt/random-1024 5
> [snip]
>
> This is what I expect.  But why doesn't this work without reading the
> file manually?
> >>>Issue seems to be in some change of the behaviour of the reserv or
> >>>phys allocator.  I Cc:ed Alan.
> >>I'm pretty sure that the behavior here hasn't significantly changed in
> >>about twelve years.  Otherwise, I agree with your analysis.
> >>
> >>On more than one occasion, I've been tempted to change:
> >>
> >>	pmap_remove_all(mt);
> >>
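[The test program itself was an attachment that is not carried in this
digest.  A rough sketch of what such a test might look like, assuming the
usual mmap(2)/mincore(2) interfaces and the FreeBSD-specific MINCORE_SUPER
flag; the counters and output format are guesses modelled on the log above:]

#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/time.h>
#include <err.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int
main(int argc, char **argv)
{
	struct timeval t0, t1, dt;
	struct stat st;
	char *vec, *p;
	volatile char c;
	size_t npages, i;
	int fd, pass, passes;

	if (argc < 3)
		errx(1, "usage: %s file passes", argv[0]);
	passes = atoi(argv[2]);
	if ((fd = open(argv[1], O_RDONLY)) == -1)
		err(1, "open");
	if (fstat(fd, &st) == -1)
		err(1, "fstat");
	npages = (st.st_size + getpagesize() - 1) / getpagesize();
	if ((vec = malloc(npages)) == NULL)
		err(1, "malloc");
	p = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED)
		err(1, "mmap");

	for (pass = 1; pass <= passes; pass++) {
		size_t none = 0, res = 0, super = 0, other = 0;

		gettimeofday(&t0, NULL);
		for (i = 0; i < npages; i++)
			c = p[i * getpagesize()];	/* touch one byte per page */
		gettimeofday(&t1, NULL);
		timersub(&t1, &t0, &dt);

		/* Classify every page of the mapping. */
		if (mincore(p, st.st_size, vec) == -1)
			err(1, "mincore");
		for (i = 0; i < npages; i++) {
			if (vec[i] & MINCORE_INCORE)
				res++;
			else if (vec[i] != 0)
				other++;
			else
				none++;
			if (vec[i] & MINCORE_SUPER)
				super++;
		}
		printf("mmap: %d pass took: %ld.%06ld "
		    "(none: %zu; res: %zu; super: %zu; other: %zu)\n",
		    pass, (long)dt.tv_sec, (long)dt.tv_usec,
		    none, res, super, other);
	}
	return (0);
}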
Re: problems with mmap() and disk caching
On 04/04/2012 02:17, Konstantin Belousov wrote:

On Tue, Apr 03, 2012 at 11:02:53PM +0400, Andrey Zonov wrote:

Hi,

I open the file, then call mmap() on the whole file and get a pointer,
then I work with this pointer.  I expect that a page should be touched
only once to get it into memory (disk cache?), but this doesn't work!

I wrote the test (attached) and ran it for the 1G file generated from
/dev/random, the result is the following:

Prepare file:
# swapoff -a
# newfs /dev/ada0b
# mount /dev/ada0b /mnt
# dd if=/dev/random of=/mnt/random-1024 bs=1m count=1024

Purge cache:
# umount /mnt
# mount /dev/ada0b /mnt

Run test:
$ ./mmap /mnt/random-1024 30
mmap: 1 pass took: 7.431046 (none: 262112; res: 32; super: 0; other: 0)
mmap: 2 pass took: 7.356670 (none: 261648; res: 496; super: 0; other: 0)
mmap: 3 pass took: 7.307094 (none: 260521; res: 1623; super: 0; other: 0)
mmap: 4 pass took: 7.350239 (none: 258904; res: 3240; super: 0; other: 0)
mmap: 5 pass took: 7.392480 (none: 257286; res: 4858; super: 0; other: 0)
mmap: 6 pass took: 7.292069 (none: 255584; res: 6560; super: 0; other: 0)
mmap: 7 pass took: 7.048980 (none: 251142; res: 11002; super: 0; other: 0)
mmap: 8 pass took: 6.899387 (none: 247584; res: 14560; super: 0; other: 0)
mmap: 9 pass took: 7.190579 (none: 242992; res: 19152; super: 0; other: 0)
mmap: 10 pass took: 6.915482 (none: 239308; res: 22836; super: 0; other: 0)
mmap: 11 pass took: 6.565909 (none: 232835; res: 29309; super: 0; other: 0)
mmap: 12 pass took: 6.423945 (none: 226160; res: 35984; super: 0; other: 0)
mmap: 13 pass took: 6.315385 (none: 208555; res: 53589; super: 0; other: 0)
mmap: 14 pass took: 6.760780 (none: 192805; res: 69339; super: 0; other: 0)
mmap: 15 pass took: 5.721513 (none: 174497; res: 87647; super: 0; other: 0)
mmap: 16 pass took: 5.004424 (none: 155938; res: 106206; super: 0; other: 0)
mmap: 17 pass took: 4.224926 (none: 135639; res: 126505; super: 0; other: 0)
mmap: 18 pass took: 3.749608 (none: 117952; res: 144192; super: 0; other: 0)
mmap: 19 pass took: 3.398084 (none: 99066; res: 163078; super: 0; other: 0)
mmap: 20 pass took: 3.029557 (none: 74994; res: 187150; super: 0; other: 0)
mmap: 21 pass took: 2.379430 (none: 55231; res: 206913; super: 0; other: 0)
mmap: 22 pass took: 2.046521 (none: 40786; res: 221358; super: 0; other: 0)
mmap: 23 pass took: 1.152797 (none: 30311; res: 231833; super: 0; other: 0)
mmap: 24 pass took: 0.972617 (none: 16196; res: 245948; super: 0; other: 0)
mmap: 25 pass took: 0.577515 (none: 8286; res: 253858; super: 0; other: 0)
mmap: 26 pass took: 0.380738 (none: 3712; res: 258432; super: 0; other: 0)
mmap: 27 pass took: 0.253583 (none: 1193; res: 260951; super: 0; other: 0)
mmap: 28 pass took: 0.157508 (none: 0; res: 262144; super: 0; other: 0)
mmap: 29 pass took: 0.156169 (none: 0; res: 262144; super: 0; other: 0)
mmap: 30 pass took: 0.156550 (none: 0; res: 262144; super: 0; other: 0)

If I run this:
$ cat /mnt/random-1024 > /dev/null
before the test, then the result is the following:

$ ./mmap /mnt/random-1024 5
mmap: 1 pass took: 0.337657 (none: 0; res: 262144; super: 0; other: 0)
mmap: 2 pass took: 0.186137 (none: 0; res: 262144; super: 0; other: 0)
mmap: 3 pass took: 0.186132 (none: 0; res: 262144; super: 0; other: 0)
mmap: 4 pass took: 0.186535 (none: 0; res: 262144; super: 0; other: 0)
mmap: 5 pass took: 0.190353 (none: 0; res: 262144; super: 0; other: 0)

This is what I expect.  But why doesn't this work without reading the
file manually?

Issue seems to be in some change of the behaviour of the reserv or
phys allocator.  I Cc:ed Alan.
What happens is that the fault handler deactivates or caches the pages
previous to the one which would satisfy the fault.  See the if()
statement starting at line 463 of vm/vm_fault.c.  Since all pages of the
object in your test are clean, the pages are cached.

The next fault would need to allocate some more pages for a different
index of the same object.  What I see is that vm_reserv_alloc_page()
returns a page that is from the cache for the same object, but a
different pindex.  As an obvious result, the page is invalidated and
repurposed.  When the next loop starts, the page is not resident
anymore, so it has to be re-read from disk.

I'm pretty sure that the pages aren't being repurposed this quickly.
Instead, I believe that the explanation is to be found in mincore().
mincore() is only reporting pages that are in the object's memq as
resident.  It is not reporting cache pages as resident.  The behaviour
of the allocator is not consistent, so some pages are not reused,
allowing the test to converge and eventually collect all pages of the
object.  Calling madvise(MADV_RANDOM) fixes this.
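[For completeness, a minimal sketch of how a program would apply that hint
to its own mapping; the file handling is illustrative only and not taken
from the test attachment:]

#include <sys/mman.h>
#include <sys/stat.h>
#include <err.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

int
main(int argc, char **argv)
{
	struct stat st;
	void *p;
	int fd;

	if (argc < 2)
		errx(1, "usage: %s file", argv[0]);
	if ((fd = open(argv[1], O_RDONLY)) == -1)
		err(1, "open");
	if (fstat(fd, &st) == -1)
		err(1, "fstat");
	if ((p = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0)) ==
	    MAP_FAILED)
		err(1, "mmap");
	/* Tell the VM system accesses are random, so the sequential-access
	 * heuristic in the fault handler is not applied to this mapping. */
	if (madvise(p, st.st_size, MADV_RANDOM) == -1)
		err(1, "madvise");
	/* ... touch pages here ... */
	munmap(p, st.st_size);
	close(fd);
	return (0);
}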
possible signedness issue in aic7xxx
hi there,

i noticed the following warning from clang when building HEAD:

===> sys/modules/aic7xxx/aicasm (obj,build-tools)
/usr/github-freebsd-head/sys/modules/aic7xxx/aicasm/../../../dev/aic7xxx/aicasm/aicasm.c:604:5: warning: passing 'int *' to parameter of type 'unsigned int *' converts between pointers to integer types with different sign [-Wpointer-sign]
                                &skip_addr, func_values) == 0) {
                                ^~
/usr/github-freebsd-head/sys/modules/aic7xxx/aicasm/../../../dev/aic7xxx/aicasm/aicasm.c:83:24: note: passing argument to parameter 'skip_addr' here
                      unsigned int *skip_addr, int *func_vals);
                                    ^
1 warning generated.

will the attached patch take care of the problem?

cheers.
alex

diff --git a/sys/dev/aic7xxx/aicasm/aicasm.c b/sys/dev/aic7xxx/aicasm/aicasm.c
index 1b88ba0..08a540f 100644
--- a/sys/dev/aic7xxx/aicasm/aicasm.c
+++ b/sys/dev/aic7xxx/aicasm/aicasm.c
@@ -353,7 +353,7 @@ output_code(void)
 	patch_t *cur_patch;
 	critical_section_t *cs;
 	symbol_node_t *cur_node;
-	int instrcount;
+	unsigned int instrcount;
 
 	instrcount = 0;
 	fprintf(ofile,
@@ -455,7 +455,7 @@ output_code(void)
 "static const int num_critical_sections = sizeof(critical_sections)\n"
 "    / sizeof(*critical_sections);\n");
 
-	fprintf(stderr, "%s: %d instructions used\n", appname, instrcount);
+	fprintf(stderr, "%s: %u instructions used\n", appname, instrcount);
 }
 
 static void
@@ -526,11 +526,11 @@ output_listing(char *ifilename)
 	patch_t *cur_patch;
 	symbol_node_t *cur_func;
 	int *func_values;
-	int instrcount;
+	unsigned int instrcount;
 	int instrptr;
 	unsigned int line;
 	int func_count;
-	int skip_addr;
+	unsigned int skip_addr;
 
 	instrcount = 0;
 	instrptr = 0;
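[For context, this class of warning is easy to reproduce outside aicasm.
A tiny standalone example -- not the driver code -- that triggers the same
-Wpointer-sign diagnostic, and that goes quiet once caller and callee agree
on signedness, which is what the patch above does for skip_addr and
instrcount:]

#include <stdio.h>

/* Callee expects a pointer to unsigned int. */
static void
count_instructions(unsigned int *counter)
{
	(*counter)++;
}

int
main(void)
{
	int instrcount = 0;	/* signed, like the pre-patch aicasm variable */

	/* clang: passing 'int *' to parameter of type 'unsigned int *'
	 * converts between pointers to integer types with different sign */
	count_instructions(&instrcount);
	printf("%d instructions used\n", instrcount);
	return (0);
}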
Re: problems with mmap() and disk caching
On 04/06/2012 03:38, Konstantin Belousov wrote:

On Thu, Apr 05, 2012 at 01:25:49PM -0500, Alan Cox wrote:

On 04/05/2012 12:31, Konstantin Belousov wrote:

On Thu, Apr 05, 2012 at 10:54:31AM -0500, Alan Cox wrote:

On 04/04/2012 02:17, Konstantin Belousov wrote:

On Tue, Apr 03, 2012 at 11:02:53PM +0400, Andrey Zonov wrote:

Hi,

I open the file, then call mmap() on the whole file and get a pointer,
then I work with this pointer.  I expect that a page should be touched
only once to get it into memory (disk cache?), but this doesn't work!

I wrote the test (attached) and ran it for the 1G file generated from
/dev/random, the result is the following:

Prepare file:
# swapoff -a
# newfs /dev/ada0b
# mount /dev/ada0b /mnt
# dd if=/dev/random of=/mnt/random-1024 bs=1m count=1024

Purge cache:
# umount /mnt
# mount /dev/ada0b /mnt

Run test:
$ ./mmap /mnt/random-1024 30
[snip]

If I run this:
$ cat /mnt/random-1024 > /dev/null
before the test, then the result is the following:

$ ./mmap /mnt/random-1024 5
[snip]

This is what I expect.  But why doesn't this work without reading the
file manually?

Issue seems to be in some change of the behaviour of the reserv or
phys allocator.  I Cc:ed Alan.

I'm pretty sure that the behavior here hasn't significantly changed in
about twelve years.  Otherwise, I agree with your analysis.

On more than one occasion, I've been tempted to change:

	pmap_remove_all(mt);
	if (mt->dirty != 0)
		vm_page_deactivate(mt);
	else
		vm_page_cache(mt);

to:

	vm_page_dontneed(mt);

because I suspect that the current code does more harm than good.  In
theory, it saves activations of the page daemon.  However, more often
than not, I suspect that we are spending more on page reactivations than
we are saving on page daemon activations.  The sequential access
detection heuristic is just too easily triggered.
Re: [RFT][patch] Scheduling for HTT and not only
On 5 April 2012 19:12, Arnaud Lacombe wrote:
> Hi,
>
> [Sorry for the delay, I got a bit sidetrack'ed...]
>
> 2012/2/17 Alexander Motin:
>> On 17.02.2012 18:53, Arnaud Lacombe wrote:
>>>
>>> On Fri, Feb 17, 2012 at 11:29 AM, Alexander Motin wrote:
>>>> On 02/15/12 21:54, Jeff Roberson wrote:
>>>>> On Wed, 15 Feb 2012, Alexander Motin wrote:
>>>>>>
>>>>>> I've decided to stop those cache black magic practices and focus on
>>>>>> things that really exist in this world -- SMT and CPU load.  I've
>>>>>> dropped most of the cache related things from the patch and made the
>>>>>> rest more strict and predictable:
>>>>>> http://people.freebsd.org/~mav/sched.htt34.patch
>>>>>
>>>>> This looks great.  I think there is value in considering the other
>>>>> approach further, but I would like to do this part first.  It would be
>>>>> nice to also add priority as a greater influence in the load balancing
>>>>> as well.
>>>>
>>>> I haven't got a good idea yet about balancing priorities, but I've
>>>> rewritten the balancer itself.  Since sched_lowest() / sched_highest()
>>>> are more intelligent now, they allowed removing the topology traversal
>>>> from the balancer itself.  That should fix the double-swapping problem,
>>>> allow keeping some affinity while moving threads, and make balancing
>>>> more fair.  I did a number of tests running 4, 8, 9 and 16 CPU-bound
>>>> threads on 8 CPUs.  With 4, 8 and 16 threads everything is stationary,
>>>> as it should be.  With 9 threads I see regular and random load movement
>>>> between all 8 CPUs.  Measurements on a 5 minute run show a deviation of
>>>> only about 5 seconds.  It is the same deviation as I see caused by the
>>>> scheduling of 16 threads on 8 cores alone, without any balancing needed
>>>> at all.  So I believe this code works as it should.
>>>>
>>>> Here is the patch: http://people.freebsd.org/~mav/sched.htt40.patch
>>>>
>>>> I plan this to be the final patch of this series (more to come :)) and
>>>> if there are no problems or objections, I am going to commit it (except
>>>> some debugging KTRs) in about ten days.  So now is a good time for
>>>> reviews and testing. :)
>>>>
>>> is there a place where all the patches are available ?
>>
>> All my scheduler patches are cumulative, so all you need is only the
>> last one mentioned here, sched.htt40.patch.
>>
> You may want to have a look at the results I collected in the
> `runs/freebsd-experiments' branch of:
>
> https://github.com/lacombar/hackbench/
>
> and compare them with the vanilla FreeBSD 9.0 and -CURRENT results
> available in `runs/freebsd'.  On the dual package platform, your patch
> is not a definite win.
>
>> But in some cases, especially for multi-socket systems, to let it show
>> its best, you may want to apply an additional patch from avg@ to better
>> detect the CPU topology:
>> https://gitorious.org/~avg/freebsd/avgbsd/commit/6bca4a2e4854ea3fc275946a023db65c483cb9dd
>>
> the test I conducted specifically for this patch did not show much
> improvement...

Can you please clarify this point?  Did the test you ran compare a case
where the topology was detected badly against a case where the topology
was detected correctly by a patched kernel (and you still didn't see a
performance improvement), in terms of cache line sharing?

Attilio

--
Peace can only be achieved by understanding - A. Einstein
Re: [RFT][patch] Scheduling for HTT and not only
On 04/06/12 17:13, Attilio Rao wrote:

[snip]

> Can you please clarify this point?  Did the test you ran compare a case
> where the topology was detected badly against a case where the topology
> was detected correctly by a patched kernel (and you still didn't see a
> performance improvement), in terms of cache line sharing?

At this moment SCHED_ULE does almost nothing in terms of cache line
sharing affinity (though it is probably worth some further experiments).
What this patch may improve is the opposite case -- reducing cache
sharing pressure for cache-hungry applications.  For example, proper
cache topology detection (such as the lack of a global L3 cache, but a
shared L2 per pair of cores on Core2Quad class CPUs) increases pbzip2
performance when the number of threads is less than the number of CPUs
(i.e. when there is room for optimization).
--
Alexander Motin
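[As an aside for anyone testing topology-related changes like the avg@
patch mentioned above: with SCHED_ULE the detected topology can be
inspected at runtime, e.g. with `sysctl kern.sched.topology_spec`, or
programmatically.  A small sketch, assuming that sysctl is present on the
kernel in question:]

/*
 * Sketch: dump the CPU topology that SCHED_ULE detected and uses for its
 * sched_lowest()/sched_highest() decisions, exported as XML through the
 * kern.sched.topology_spec sysctl.
 */
#include <sys/types.h>
#include <sys/sysctl.h>
#include <err.h>
#include <stdio.h>
#include <stdlib.h>

int
main(void)
{
	char *buf;
	size_t len = 0;

	/* First call only asks for the required buffer size. */
	if (sysctlbyname("kern.sched.topology_spec", NULL, &len, NULL, 0) == -1)
		err(1, "sysctlbyname(size)");
	if ((buf = malloc(len)) == NULL)
		err(1, "malloc");
	if (sysctlbyname("kern.sched.topology_spec", buf, &len, NULL, 0) == -1)
		err(1, "sysctlbyname");
	fputs(buf, stdout);	/* <groups>...</groups>, including CACHE levels */
	free(buf);
	return (0);
}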
Re: [RFT][patch] Scheduling for HTT and not only
On 6 April 2012 15:27, Alexander Motin wrote:
> On 04/06/12 17:13, Attilio Rao wrote:
>
> [snip]
>
> At this moment SCHED_ULE does almost nothing in terms of cache line
> sharing affinity (though it is probably worth some further experiments).
> What this patch may improve is the opposite case -- reducing cache
> sharing pressure for cache-hungry applications.  For example, proper
> cache topology detection (such as the lack of a global L3 cache, but a
> shared L2 per pair of cores on Core2Quad class CPUs) increases pbzip2
> performance when the number of threads is less than the number of CPUs
> (i.e. when there is room for optimization).

My question was not really about your patch.  I just wanted to know
whether he correctly benchmarked a case where the topology was screwed
up and then correctly recognized by avg's patch, in terms of cache level
aggregation (it wasn't referred to your patch, btw).

Attilio

--
Peace can only be achieved by understanding - A. Einstein
Re: [RFT][patch] Scheduling for HTT and not only
On 04/06/12 17:30, Attilio Rao wrote:
> On 6 April 2012 15:27, Alexander Motin wrote:
>
> [snip]
>
> My question was not really about your patch.  I just wanted to know
> whether he correctly benchmarked a case where the topology was screwed
> up and then correctly recognized by avg's patch, in terms of cache level
> aggregation (it wasn't referred to your patch, btw).

I understand.  I've just described a test case where properly detected
topology could give a benefit.  What the test really does is indeed a
good question.

--
Alexander Motin
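[One way to take the scheduler's placement decisions out of such a
comparison is to pin the benchmark explicitly, either with cpuset(1) --
e.g. `cpuset -l 0,2 pbzip2 ...` -- or from code.  A sketch; the chosen CPU
numbers are only an example and depend on which cores actually share a
cache on the machine being measured:]

/*
 * Sketch (not part of the patch under review): pin the current process to
 * CPUs 0 and 2 so a benchmark's placement no longer depends on the
 * scheduler's topology decisions.
 */
#include <sys/param.h>
#include <sys/cpuset.h>
#include <err.h>
#include <stdio.h>
#include <unistd.h>

int
main(void)
{
	cpuset_t mask;

	CPU_ZERO(&mask);
	CPU_SET(0, &mask);
	CPU_SET(2, &mask);	/* which cores pair up is machine-specific */
	if (cpuset_setaffinity(CPU_LEVEL_WHICH, CPU_WHICH_PID, -1,
	    sizeof(mask), &mask) == -1)
		err(1, "cpuset_setaffinity");
	printf("pid %d pinned to CPUs 0 and 2\n", (int)getpid());
	/* fork/exec the benchmark (e.g. pbzip2) here */
	return (0);
}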
Did something change with ioctl CAMIOCOMMAND from 8.0 to 9.0 ?
Hi,

googling brought me to this forum post
http://forums.freebsd.org/showthread.php?p=172885
which reports that xfburn fails to recognize optical drives on FreeBSD 9.0.
There are error messages about an ioctl which might be emitted by libburn
when getting the list of drives:

  xfburn: error sending CAMIOCOMMAND ioctl: Inappropriate ioctl for device

On my FreeBSD 8.0 test system, everything seems ok with libburn.
xorriso lists both drives and is willing to blank and burn a CD.

Could somebody with a 9.0 system and a CD/DVD/BD drive please get xorriso
(e.g. from ports) and try whether it shows all drives?  This command:

  xorriso -devices

should report something like:

  0  -dev '/dev/cd0' rwrwr- : 'TSSTcorp' 'CDDVDW SH-S223B'
  1  -dev '/dev/cd1' rwrwr- : 'TSSTcorp' 'DVD-ROM SH-D162C'

One needs rw-permissions for the involved devices in order to get them
listed.  Up to now, these were: acd* cd* pass* xpt*

If the CAMIOCOMMAND call in libburn/sg-freebsd.c is wrong for 9.0, then I
would need instructions on how to perform drive listing and how to
recognize 9.0, resp. the need for the new code, at compile time.
The code can be inspected online at
http://libburnia-project.org/browser/libburn/trunk/libburn/sg-freebsd.c

The (union ccb) idx->ccb for this ioctl at line 231

  if (ioctl(idx->fd, CAMIOCOMMAND, &(idx->ccb)) == -1) {

is set up in the function beginning at line 160

  static int sg_init_enumerator(burn_drive_enumerator_t *idx_)

Have a nice day :)

Thomas
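[As a cross-check on a 9.0 box, it may help to see whether the CAM userland
library can open the drives at all, independently of libburn's hand-rolled
CAMIOCOMMAND enumeration.  A sketch assuming libcam's cam_open_device() /
cam_close_device() interface and the inq_data member of struct cam_device
(link with -lcam); if this works while the raw ioctl path fails, the
problem is in how the ccb is set up rather than in the devices themselves:]

#include <stdio.h>
#include <fcntl.h>
#include <camlib.h>

int
main(int argc, char **argv)
{
	struct cam_device *dev;
	const char *path = argc > 1 ? argv[1] : "/dev/cd0";

	/* Let libcam do the CAM plumbing instead of issuing CAMIOCOMMAND
	 * by hand. */
	if ((dev = cam_open_device(path, O_RDWR)) == NULL) {
		fprintf(stderr, "%s: %s\n", path, cam_errbuf);
		return (1);
	}
	/* Print vendor/product/revision from the stored inquiry data. */
	printf("%s: '%.8s' '%.16s' '%.4s'\n", path,
	    (const char *)dev->inq_data.vendor,
	    (const char *)dev->inq_data.product,
	    (const char *)dev->inq_data.revision);
	cam_close_device(dev);
	return (0);
}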