Re: No bus_space_read_8 on x86 ?
On Oct 5, 2012, at 10:08 AM, John Baldwin wrote:

> On Thursday, October 04, 2012 1:20:52 pm Carl Delsey wrote:
>> I noticed that the bus_space_*_8 functions are unimplemented for x86.
>> Looking at the code, it seems this is intentional.
>>
>> Is this done because on 32-bit systems we don't know, in the general
>> case, whether to read the upper or lower 32-bits first?
>>
>> If that's the reason, I was thinking we could provide two
>> implementations for i386: bus_space_read_8_upper_first and
>> bus_space_read_8_lower_first. For amd64 we would just have
>> bus_space_read_8.
>>
>> Anybody who wants to use bus_space_read_8 in their file would do
>> something like:
>>   #define BUS_SPACE_8_BYTES LOWER_FIRST
>> or
>>   #define BUS_SPACE_8_BYTES UPPER_FIRST
>> whichever is appropriate for their hardware.
>>
>> This would go in their source file before including bus.h and we would
>> take care of mapping to the correct implementation.
>>
>> With the prevalence of 64-bit registers these days, if we don't provide
>> an implementation, I expect many drivers will end up rolling their own.
>>
>> If this seems like a good idea, I'll happily whip up a patch and submit it.
>
> I think cxgb* already have an implementation. For amd64 we should certainly
> have bus_space_*_8(), at least for SYS_RES_MEMORY. I think they should fail
> for SYS_RES_IOPORT. I don't think we can force a compile-time error though,
> would just have to return -1 on reads or some such?

I believe it was because bus reads weren't guaranteed to be atomic on i386.
I don't know if that's still the case or a concern, but it was an intentional
omission.

Warner
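To make the proposal concrete, here is a rough sketch of how the header-side
mapping could look. Every name in it (BUS_SPACE_8_BYTES, LOWER_FIRST,
UPPER_FIRST, and the *_lower_first/*_upper_first accessors) is hypothetical,
taken from the proposal quoted above rather than from any existing bus.h:

/*
 * Hypothetical sketch only -- none of these macros exist in bus.h today.
 * A driver would do:
 *
 *     #define BUS_SPACE_8_BYTES LOWER_FIRST
 *     #include <machine/bus.h>
 *
 * and the header would map bus_space_read_8 to the matching i386 helper.
 */
#define LOWER_FIRST 1
#define UPPER_FIRST 2

#ifdef __i386__
#if BUS_SPACE_8_BYTES == LOWER_FIRST
#define bus_space_read_8 bus_space_read_8_lower_first
#elif BUS_SPACE_8_BYTES == UPPER_FIRST
#define bus_space_read_8 bus_space_read_8_upper_first
#else
#error "define BUS_SPACE_8_BYTES as LOWER_FIRST or UPPER_FIRST before <machine/bus.h>"
#endif
#else
/* amd64 and other 64-bit platforms would provide bus_space_read_8 directly. */
#endif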
Kernel memory usage
I was trying to correlate the output from "top" with what I get from
"vmstat -z". I don't have any user programs that wire memory, so I'm assuming
the wired memory count shown by "top" is memory used by the kernel.

Now I would like to find out how the kernel is using this "wired" memory, so I
look at the dynamic memory allocated by the kernel using "vmstat -z". I think
memory allocated via malloc() is serviced by zones if the allocation size is
<4k, so I'm not sure how useful "vmstat -m" is. I also add up the memory used
by the buffer cache.

Is there any other significant chunk I'm missing? Does "vmstat -m" show memory
that is not accounted for in "vmstat -z"?

Thanks,
Sushanth
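As an aside, the wired-page count that top reports and the buffer-cache usage
can also be read programmatically via sysctl(3). A minimal sketch, not from
the thread -- the sysctl names are believed correct for FreeBSD 9.x, but
verify them with `sysctl -a`, and vfs.bufspace in particular may be a
different integer width on some versions:

#include <sys/types.h>
#include <sys/sysctl.h>
#include <stdint.h>
#include <stdio.h>

int
main(void)
{
    u_int wired_pages = 0;
    int pagesize = 0;
    long bufspace = 0;
    size_t len;

    /* Pages wired by the kernel (plus any wired user memory). */
    len = sizeof(wired_pages);
    if (sysctlbyname("vm.stats.vm.v_wire_count", &wired_pages, &len,
        NULL, 0) == -1)
        perror("vm.stats.vm.v_wire_count");

    len = sizeof(pagesize);
    if (sysctlbyname("hw.pagesize", &pagesize, &len, NULL, 0) == -1)
        perror("hw.pagesize");

    /* Space currently used by the buffer cache. */
    len = sizeof(bufspace);
    if (sysctlbyname("vfs.bufspace", &bufspace, &len, NULL, 0) == -1)
        perror("vfs.bufspace");

    printf("wired:    %ju MB\n",
        (uintmax_t)wired_pages * (uintmax_t)pagesize / (1024 * 1024));
    printf("bufspace: %ld MB\n", bufspace / (1024 * 1024));
    return (0);
}

Comparing those two numbers against the per-zone totals from "vmstat -z" gives
a rough idea of how much wired memory is left unaccounted for.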
Re: SMP Version of tar
> Not necessarily. If I understand correctly what Tim means, he's talking
> about an in-memory compression of several blocks by several separate
> threads, and then - after all the threads have compressed their respective
> blocks - writing out the result to the output file in order. Of course,
> this would incur a small penalty in that the dictionary would not be reused
> between blocks, but it might still be worth it.

All fine. I just wanted to point out that un-gzipping a normal, standard gzip
file cannot be multithreaded, and that multithreaded-compressed gzip output
would be different.
Re: problem cross-compiling 9.1
[snip]
> any fix?
> > You have found the fix. Remove the WITHOUT_ options from the build
> > that keep it from completing. You'll be able to add them at installworld
> > time w/o a hassle. nanobsd uses this to keep things down, while still
> > being able to build the system.
> > Warner
>
where can I find the with/without list?
btw, I did look at nanobsd in the past and have borrowed some ideas :-)

thanks,
	danny
Re: problem cross-compiling 9.1
On Oct 9, 2012, at 3:46 AM, Daniel Braniss wrote:

> [snip]
>> any fix?
>>> You have found the fix. Remove the WITHOUT_ options from the build
>>> that keep it from completing. You'll be able to add them at installworld
>>> time w/o a hassle. nanobsd uses this to keep things down, while still
>>> being able to build the system.
>>> Warner
> where can I find the with/without list?
> btw, I did look at nanobsd in the past and have borrowed some ideas :-)

man make.conf and man src.conf, then read through bsd.own.mk if interested in
knowing what exactly can be used.

HTH!
-Garrett
Re: NFS server bottlenecks
On Oct 4, 2012, at 12:36 AM, Rick Macklem wrote:

> Garrett Wollman wrote:
>> <> said:
>> Simple: just use a separate mutex for each list that a cache entry
>> is on, rather than a global lock for everything. This would reduce
>> the mutex contention, but I'm not sure how significantly since I
>> don't have the means to measure it yet.
>>
>>> Well, since the cache trimming is removing entries from the lists, I
>>> don't see how that can be done with a global lock for list updates?
>>
>> Well, the global lock is what we have now, but the cache trimming
>> process only looks at one list at a time, so not locking the list that
>> isn't being iterated over probably wouldn't hurt, unless there's some
>> mechanism (that I didn't see) for entries to move from one list to
>> another. Note that I'm considering each hash bucket a separate
>> "list". (One issue to worry about in that case would be cache-line
>> contention in the array of hash buckets; perhaps NFSRVCACHE_HASHSIZE
>> ought to be increased to reduce that.)
>>
> Yea, a separate mutex for each hash list might help. There is also the
> LRU list that all entries end up on, that gets used by the trimming code.
> (I think? I wrote this stuff about 8 years ago, so I haven't looked at
> it in a while.)
>
> Also, increasing the hash table size is probably a good idea, especially
> if you reduce how aggressively the cache is trimmed.
>
>>> Only doing it once/sec would result in a very large cache when
>>> bursts of traffic arrives.
>>
>> My servers have 96 GB of memory so that's not a big deal for me.
>>
> This code was originally "production tested" on a server with 1Gbyte,
> so times have changed a bit;-)
>
>>> I'm not sure I see why doing it as a separate thread will improve
>>> things. There are N nfsd threads already (N can be bumped up to 256
>>> if you wish) and having a bunch more "cache trimming threads" would
>>> just increase contention, wouldn't it?
>>
>> Only one cache-trimming thread. The cache trim holds the (global)
>> mutex for much longer than any individual nfsd service thread has any
>> need to, and having N threads doing that in parallel is why it's so
>> heavily contended. If there's only one thread doing the trim, then
>> the nfsd service threads aren't spending time either contending on the
>> mutex (it will be held less frequently and for shorter periods).
>>
> I think the little drc2.patch which will keep the nfsd threads from
> acquiring the mutex and doing the trimming most of the time, might be
> sufficient. I still don't see why a separate trimming thread will be
> an advantage. I'd also be worried that the one cache trimming thread
> won't get the job done soon enough.
>
> When I did production testing on a 1Gbyte server that saw a peak
> load of about 100RPCs/sec, it was necessary to trim aggressively.
> (Although I'd be tempted to say that a server with 1Gbyte is no
> longer relevant, I recently recall someone trying to run FreeBSD
> on an i486, although I doubt they wanted to run the nfsd on it.)
>
>>> The only negative effect I can think of w.r.t. having the nfsd
>>> threads doing it would be a (I believe negligible) increase in RPC
>>> response times (the time the nfsd thread spends trimming the cache).
>>> As noted, I think this time would be negligible compared to disk I/O
>>> and network transit times in the total RPC response time?
>>
>> With adaptive mutexes, many CPUs, lots of in-memory cache, and 10G
>> network connectivity, spinning on a contended mutex takes a
>> significant amount of CPU time. (For the current design of the NFS
>> server, it may actually be a win to turn off adaptive mutexes -- I
>> should give that a try once I'm able to do more testing.)
>>
> Have fun with it. Let me know when you have what you think is a good patch.
>
> rick
>
>> -GAWollman

My quest for IOPS over NFS continues :)
So far I'm not able to achieve more than about 3000 8K read requests over NFS,
while the server locally gives much more.
And this is all from a file that is completely in ARC cache, no disk IO
involved.

I've snatched some sample DTrace script from the net:
[ http://utcc.utoronto.ca/~cks/space/blog/solaris/DTraceQuantizationNotes ]

And modified it for our new NFS server:

#!/usr/sbin/dtrace -qs

fbt:kernel:nfsrvd_*:entry
{
	self->ts = timestamp;
	@counts[probefunc] = count();
}

fbt:kernel:nfsrvd_*:return
/ self->ts > 0 /
{
	this->delta = (timestamp-self->ts)/100;
}

fbt:kernel:nfsrvd_*:return
/ self
Re: NFS server bottlenecks
On Oct 9, 2012, at 5:12 PM, Nikolay Denev wrote:

> [snip]
time_t when used as timedelta
Hi list,

I'm looking at this possible divide-by-zero in dhclient:
http://scan.freebsd.your.org/freebsd-head/WORLD/2012-10-07-amd64/report-nBhqE2.html.gz#EndPath

In this specific case, it's obvious from the intention of the code that
ip->client->interval is always >0, but it's not obvious to me in the code. I
could add an assert before the possible divide-by-zero:

	assert(ip->client->interval > 0);

But looking at the code, I'm not sure it's very elegant.
ip->client->interval is defined as time_t (see src/sbin/dhclient/dhcpd.h),
which is a signed integer type, if I'm correct. However, some time_t members
of struct client_state and struct client_config (see said header file) are
assumed in the code to be positive and possibly non-zero. Instead of
plastering the code with asserts, is there something like a utime_t type? Or
are there better ways to enforce the invariant?

Thanks,
Erik
Re: time_t when used as timedelta
On Tue, 2012-10-09 at 17:35 +0200, Erik Cederstrand wrote:
> Hi list,
>
> I'm looking at this possible divide-by-zero in dhclient:
> http://scan.freebsd.your.org/freebsd-head/WORLD/2012-10-07-amd64/report-nBhqE2.html.gz#EndPath
>
> [snip]
>
> Instead of plastering the code with asserts, is there something like a
> utime_t type? Or are there better ways to enforce the invariant?

It looks to me like the place where enforcement is really needed is in
parse_lease_time(), which should ensure at the very least that negative
values never get through, and in some cases that zeroes don't sneak in from
config files.

If it were ensured that ip->client->config->backoff_cutoff could never be
less than 1 (and it appears any value less than 1 would be insane), then the
division-by-zero case could never happen. However, at least one of the config
statements handled by parse_lease_time() allows a value of zero. Since
nothing seems to ensure that backoff_cutoff is non-zero, it seems like a
potential source of div-by-zero errors too, in that same function.

-- 
Ian
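A minimal sketch of the kind of clamping Ian suggests. parse_lease_time() is
the function named in the thread, but its real signature in dhclient differs;
the helper below and its names are purely illustrative:

#include <time.h>

/*
 * Illustrative only: clamp a parsed time value to a sane minimum at parse
 * time, so later divisions by fields such as backoff_cutoff can never
 * divide by zero.  The real parse_lease_time() in dhclient looks different;
 * this just shows the shape of the check.
 */
static time_t
clamp_time(time_t parsed, time_t minimum)
{
	return (parsed < minimum ? minimum : parsed);
}

/*
 * Example use at the point a config value is stored (field name taken from
 * the thread): config->backoff_cutoff = clamp_time(parsed_value, 1);
 */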
Re: problem cross-compiling 9.1
On Oct 9, 2012, at 4:46 AM, Daniel Braniss wrote:

> [snip]
>> any fix?
>>> You have found the fix. Remove the WITHOUT_ options from the build
>>> that keep it from completing. You'll be able to add them at installworld
>>> time w/o a hassle. nanobsd uses this to keep things down, while still
>>> being able to build the system.
>>> Warner
>>
> where can I find the with/without list?
> btw, I did look at nanobsd in the past and have borrowed some ideas :-)

bsd.own.mk

Warner
Re: No bus_space_read_8 on x86 ?
On Monday, October 08, 2012 4:59:24 pm Warner Losh wrote:
> [snip]
>
> I believe it was because bus reads weren't guaranteed to be atomic on i386.
> I don't know if that's still the case or a concern, but it was an
> intentional omission.

True. If you are on a 32-bit system you can read the two 4 byte values and
then build a 64-bit value. For 64-bit platforms we should offer bus_read_8()
however.

-- 
John Baldwin
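A minimal sketch of the workaround John describes for i386 drivers today:
compose the 64-bit value from two 4-byte bus space reads. The helper name is
made up, "low word at the lower offset, read first" is an assumption about
the device, and the result is of course not an atomic snapshot of the
register:

#include <sys/param.h>
#include <sys/bus.h>
#include <machine/bus.h>

/*
 * Sketch only: build a 64-bit value from two 32-bit bus space reads.
 * Which half to read first, and where each half lives, is device-specific;
 * low word at offset 0, read first, is assumed here.
 */
static inline uint64_t
mydev_read_8(bus_space_tag_t bst, bus_space_handle_t bsh, bus_size_t off)
{
	uint32_t lo, hi;

	lo = bus_space_read_4(bst, bsh, off);
	hi = bus_space_read_4(bst, bsh, off + 4);
	return (((uint64_t)hi << 32) | lo);
}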
Re: NFS server bottlenecks
Nikolay Denev wrote:
> On Oct 4, 2012, at 12:36 AM, Rick Macklem wrote:
> [snip]
>
> My quest for IOPS over NFS continues :)
> So far I'm not able to achieve more than about 3000 8K read requests
> over NFS, while the server locally gives much more.
> And this is all from a file that is completely in ARC cache, no disk
> IO involved.
>
Just out of curiosity, why do you use 8K reads instead of 64K reads? Since
the RPC overhead (including the DRC functions) is per RPC, fewer and larger
reads should cut that overhead for the same amount of data.
Re: SMP Version of tar
On Oct 8, 2012, at 3:21 AM, Wojciech Puchar wrote:
>> Not necessarily. If I understand correctly what Tim means, he's talking
>> about an in-memory compression of several blocks by several separate
>> threads, and then - after all the threads have compressed their
>
> but gzip format is single stream. dictionary IMHO is not reset every X
> kilobytes.
>
> parallel gzip is possible but not with same data format.

Yes, it is. The following creates a compressed file that is completely
compatible with the standard gzip/gunzip tools:

 * Break file into blocks
 * Compress each block into a gzip file (with gzip header and trailer
   information)
 * Concatenate the result.

This can be correctly decoded by gunzip. In theory, you get slightly worse
compression. In practice, if your blocks are reasonably large (a megabyte or
so each), the difference is negligible.

Tim
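For illustration, here is roughly what producing one such per-block gzip
member looks like with zlib. This is a sketch of the scheme Tim describes,
not code from bsdtar/libarchive; the helper name and buffer size are
arbitrary, and error handling is minimal on purpose:

#include <stdio.h>
#include <string.h>
#include <zlib.h>

/* Compress one block as a self-contained gzip member appended to `out'. */
static int
write_gzip_member(unsigned char *in, size_t inlen, FILE *out)
{
	z_stream zs;
	unsigned char buf[64 * 1024];

	memset(&zs, 0, sizeof(zs));
	/* windowBits = 15 + 16 asks zlib for a gzip (not zlib) wrapper. */
	if (deflateInit2(&zs, Z_DEFAULT_COMPRESSION, Z_DEFLATED, 15 + 16,
	    8, Z_DEFAULT_STRATEGY) != Z_OK)
		return (-1);
	zs.next_in = in;
	zs.avail_in = (uInt)inlen;
	do {
		zs.next_out = buf;
		zs.avail_out = sizeof(buf);
		if (deflate(&zs, Z_FINISH) == Z_STREAM_ERROR) {
			deflateEnd(&zs);
			return (-1);
		}
		fwrite(buf, 1, sizeof(buf) - zs.avail_out, out);
	} while (zs.avail_out == 0);
	deflateEnd(&zs);
	return (0);
}

In a threaded variant each worker would deflate its block into a memory
buffer instead of a FILE, and the writer would emit the buffers in block
order; gunzip still sees the concatenation as one valid stream.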