are double cells safe to use?
I am considering using double cells for an efficient implementation of binary trees. Is this safe? (i.e., are there assumptions made somewhere about what they are used for?) For prototyping I intended to add double-cons, set-cbr!, set-ccr!, cbr and ccr at the Scheme level. Is this safe? They will only contain or refer to SCM data types. Of course I could add my own data type, but as double cells contain exactly what I need and are already used internally, they seem the obvious choice. I wrote to both user and devel, as I suspect this may be more of a devel question.
Re: are double cells safe to use?
Sorry, the use of double cells is even encouraged by the manual under "Allocating cells" (I had missed that), and no special care is needed as long as they contain only SCM values. On Tue, Oct 29, 2013 at 11:15 AM, Roland Orre wrote: > I am considering using double cells for an efficient implementation of binary trees. > > Is this safe? (i.e., are there assumptions made somewhere about what they are used for?) > > For prototyping I intended to add > double-cons > set-cbr! > set-ccr! > cbr > ccr > > at the Scheme level. Is this safe? > They will only contain or refer to SCM data types. > > Of course I could add my own data type, but as double cells contain > exactly what I need and are already used internally, they seem the > obvious choice. > > I wrote to both user and devel, as I suspect this may be more of a > devel question.
Re: Guile 2.2 TODO
Andy Wingo skribis: > On Sun 01 Sep 2013 13:03, Andy Wingo writes: [...] >> I think that's pretty much it. Obviously it would be nice to get a lot >> of other things into 2.2 but this is the necessary bit. I think we >> should shoot for a 2.1.0 release around 1 December; dunno. > > I think we'll be ready on that date, code-wise, but documentation makes > a 2.1.0 release much less likely -- documenting things takes time. That’s because the documentation of the VM in 2.0 set the bar very high. :-) FWIW, I think it’s fine for an odd release series to have incomplete documentation (especially about the internals) if that can be fixed before the stable series comes out. Thank you for the impressive work! Ludo’.
Re: vhash speed thread safeness
Hi, Stefan, Stefan Israelsson Tampe skribis: > I did some tests with a C-based vhash implementation; it's possible to > increase the speed by 30x compared to the current vlist implementation in > Guile. It would be possible to get this speed today if one implemented > vhash-assoc as a VM op. Anyway, using the source as a standard lib will > get you about a 10x speed increase. As we discussed, I don’t really like the idea of implementing that much in C. Also, I’d really like the optimization effort to be driven by actual profiling data, such as C-level profiles of repeated calls to the current ‘vhash-assoc’. Have you tried that? > Another pressing need is that vhashes are not thread safe, Section 2.8 of Bagwell’s paper proposes a simple solution. All that is missing AFAICS is an atomic test-and-set VM op to implement it (which may also be useful in other places). What do you think of this approach? Thanks, Ludo’.
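As a rough illustration of the kind of measurement being discussed, a minimal Scheme-level micro-benchmark of repeated ‘vhash-assoc’ calls might look like the sketch below. The table size and lookup count are arbitrary placeholders, and a serious investigation would use statprof or a C-level profiler rather than this crude wall-clock timing:

    ;; Minimal sketch: crude wall-clock timing of repeated vhash-assoc
    ;; lookups.  The sizes below are arbitrary; real profiling would use
    ;; statprof or a C-level profiler instead.
    (use-modules (ice-9 vlist))

    (define (build-vhash n)
      (let loop ((i 0) (vh vlist-null))
        (if (= i n)
            vh
            (loop (+ i 1) (vhash-cons i (* i i) vh)))))

    (define (bench-vhash-assoc n lookups)
      (let ((vh (build-vhash n))
            (start (get-internal-real-time)))
        (let loop ((i 0))
          (when (< i lookups)
            (vhash-assoc (modulo i n) vh)
            (loop (+ i 1))))
        (/ (- (get-internal-real-time) start)
           (exact->inexact internal-time-units-per-second))))

    (format #t "1000000 lookups took ~a seconds~%"
            (bench-vhash-assoc 10000 1000000))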
Re: [PATCH] Add procedures to convert alists into hash tables
Hi! David Thompson skribis: > When looking through the different hash table implementations > available (Guile, SRFI-69, and RNRS) I found a useful SRFI-69 > procedure that had no equivalent in Guile's native hash table API: > alist->hash-table. I think it would make sense to implement them in Scheme, say in ice-9/hash-table.scm, which could be either a separate module or a file included from boot-9.scm. WDYT? Ludo’.
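For illustration only, a minimal Scheme sketch of such an ‘alist->hash-table’ over Guile’s native hash tables might look as follows (this is not the implementation that was eventually added to ice-9/hash-table.scm; following SRFI-69, the first occurrence of a key takes precedence):

    ;; Illustrative sketch, not the actual ice-9 code: build a native
    ;; Guile hash table from an alist, letting the first occurrence of a
    ;; key win, as SRFI-69's alist->hash-table does.
    (define (alist->hash-table alist)
      (let ((table (make-hash-table)))
        (for-each (lambda (entry)
                    (unless (hash-get-handle table (car entry))
                      (hash-set! table (car entry) (cdr entry))))
                  alist)
        table))

    ;; Example:
    ;; (hash-ref (alist->hash-table '((a . 1) (b . 2) (a . 3))) 'a) => 1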
Re: vhash speed thread safeness
On Tue, Oct 29, 2013 at 1:34 PM, Ludovic Courtès wrote: > Hi, Stefan, > > Stefan Israelsson Tampe skribis: > > > I did some tests with a C-based vhash implementation; it's possible to > > increase the speed by 30x compared to the current vlist implementation in > > Guile. It would be possible to get this speed today if one implemented > > vhash-assoc as a VM op. Anyway, using the source as a standard lib will > > get you about a 10x speed increase. > > As we discussed, I don’t really like the idea of implementing that much > in C. > Me neither, but it is good to see that in cache-friendly cases we can improve the situation 30x. It gives us a goal to strive for. Also, the main intention is to use vhashes for guile-log. For this we can note: 1. With this I managed to make an indexer that can find any ordered set of matches for a one-cons pattern (x, y) in about 0.5 µs, e.g. matches for the forms (x y), (_ y), (x _), (_ _). The fun part is that the indexer indexes any list of lists of lists, including ones where some elements are variables. This is cool because if you want to solve first-order logic problems you will sometimes have many hundreds of matching patterns that represent the compiled rules, and hence you get a speedup when searching for rule matches. 2. A thread-safe version of vhashes could be used for Kanren or guile-log, to improve scalability. In order to use this sanely we need to be able to truncate the vhash, else the lookup complexity will not be that great for many applications. A truncation means that one needs to check whether the state has been stored or not, and in case it has been stored, do a no-op. > Also, I’d really like the optimization effort to be driven by actual > profiling data, such as C-level profiles of repeated calls to the > current ‘vhash-assoc’. Have you tried that? No. But a lot can be gained by unpacking the vectors and using pure C vector lookups; I think we will get that in the second or third version of the native compiler, at least it's too much to ask to get it in the first one, no? Then it's the overhead of VM jumps as usual, which probably is the main contributor. > > Another pressing need is that vhashes are not thread safe, > > Section 2.8 of Bagwell’s paper proposes a simple solution. All that is > missing AFAICS is an atomic test-and-set VM op to implement it (which > may also be useful in other places). > > What do you think of this approach? For vlists it's probably a good idea; I don't know if it's enough for vhashes though. Maybe you need a mutex. But lock overhead will be significant, and I suspect that it is often faster to start afresh in all threads. But until we get a well-optimized native compiler, lock overhead is not that pressing. BTW, I have a good-enough solution implemented now that I will use for parallelizing logic programs, after which I will concentrate on getting the ISO Prolog into shape. > > Thanks, > Ludo’. > > > /Stefan
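For what it's worth, the mutex-based variant mentioned above could look roughly like the sketch below, with all access to a shared vhash funnelled through one mutex. The names ‘shared-vhash-cons!’ and ‘shared-vhash-assoc’ are made up for this example, and, as noted in the thread, the locking overhead may well dominate:

    ;; Sketch of a mutex-protected shared vhash; the names are hypothetical.
    ;; Every update goes through one mutex, so concurrent vhash-cons calls
    ;; cannot race on the underlying vlist blocks, at the cost of
    ;; serializing all writers.
    (use-modules (ice-9 vlist)
                 (ice-9 threads))

    (define %vhash-lock (make-mutex))
    (define %shared-vhash vlist-null)

    (define (shared-vhash-cons! key value)
      (with-mutex %vhash-lock
        (set! %shared-vhash (vhash-cons key value %shared-vhash))))

    (define (shared-vhash-assoc key)
      ;; Lookups take the lock too, to stay on the safe side with respect
      ;; to memory visibility; whether that is strictly necessary depends
      ;; on the vlist internals.
      (with-mutex %vhash-lock
        (vhash-assoc key %shared-vhash)))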
Re: are double cells safe to use?
Roland Orre writes: > I am considering using double cells for an efficient implementation of binary trees. > > Is this safe? (i.e., are there assumptions made somewhere about what they are used for?) > > For prototyping I intended to add > double-cons > set-cbr! > set-ccr! > cbr > ccr > > at the Scheme level. Is this safe? No, this won't work. First of all, double cells cannot be used to store four SCM values, because the first word of a double cell has to contain the type tag. Pairs avoid the type tag via a clever hack: the low bit of every SCM value is 0, so if the first word of a heap object has 0 in the low bit, that heap object is assumed to be a pair. But this means that every other heap object must start with a word whose low bit is 1. So the best you could do with a double cell would be to store three SCM objects, which is no better in space efficiency than what you already get with a vector of size 3 (in the master branch, which will become Guile 2.2). Another problem with creating yet another new fundamental data type is that in order to use it efficiently, we'd need to create more VM instructions to access it. That means more opcodes from our limited 8-bit opcode space, and more code in the VM, which is bad because ideally a VM should be compact for good cache behavior. I think you should just use records, or maybe vectors, for this. Mark
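To make the record suggestion concrete, a binary tree node as an SRFI-9 record could look like the sketch below (the type and accessor names are made up for this example):

    ;; Illustrative sketch of a record-based binary tree node.
    (use-modules (srfi srfi-9))

    (define-record-type <tree-node>
      (make-tree-node value left right)
      tree-node?
      (value tree-node-value)
      (left  tree-node-left  set-tree-node-left!)
      (right tree-node-right set-tree-node-right!))

    ;; A leaf can simply use #f for its children:
    (define example-leaf (make-tree-node 42 #f #f))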
Re: vhash speed thread safeness
Stefan Israelsson Tampe skribis: > On Tue, Oct 29, 2013 at 1:34 PM, Ludovic Courtès wrote: > >> Hi, Stefan, >> >> Stefan Israelsson Tampe skribis: >> >> > I did some tests with a C-based vhash implementation; it's possible to >> > increase the speed by 30x compared to the current vlist implementation in >> > Guile. It would be possible to get this speed today if one implemented >> > vhash-assoc as a VM op. Anyway, using the source as a standard lib will >> > get you about a 10x speed increase. >> >> As we discussed, I don’t really like the idea of implementing that much >> in C. >> > Me neither, but it is good to see that in cache-friendly cases we can > improve the situation 30x. It gives us a goal to strive for. Also, the main > intention is to use vhashes for guile-log. For this we can note. OK. [...] >> > Another pressing need is that vhashes are not thread safe, >> >> Section 2.8 of Bagwell’s paper proposes a simple solution. All that is >> missing AFAICS is an atomic test-and-set VM op to implement it (which >> may also be useful in other places). >> >> What do you think of this approach? > > For vlists it's probably a good idea; I don't know if it's enough for > vhashes though. Oooh, right, sorry for overlooking that. > Maybe you need a mutex. But lock overhead will be significant Surely, especially if it’s a fat mutex. Hmm hmm. Of course that could be an argument for doing some C (primitives, not VM ops), but looking at ‘%vhash-assoc’, that would clearly mean reimplementing pretty much all of the vlist + vhash in C, which sucks. I wonder if there’s some other data structure with similar properties that doesn’t have the thread-safety issue. Maybe Ian has an idea? The weight-balanced trees in MIT/GNU Scheme look interesting: http://www.gnu.org/software/mit-scheme/documentation/mit-scheme-ref/Weight_002dBalanced-Trees.html Thoughts? Thanks, Ludo’.
Re: are double cells safe to use?
Thanks, > So the best you could do with a double cell would be to store three SCM > objects, which is no better in space efficiency than what you already get > with a vector of size 3 (in the master branch, which will become Guile 2.2). but with the vector I also have an extra cell (single?) pointing to the vector. For this kind of thing I've always used records (with new smobs) or vectors before, but now I want to make it as space-efficient as possible. It'll be a memory-based database which will be "static" (re-indexed daily, Day-Stout-Warren trees) and will contain, in the long run, billions of records. I guess the best alternative then is to use a smob with e.g. a 4-word (SCM) vector, which implies 6 scm_t_bits, but if a 4-word vector is equally space-efficient I may as well go for that. On Tue, Oct 29, 2013 at 6:17 PM, Mark H Weaver wrote: > Roland Orre writes: > >> I am considering using double cells for an efficient implementation of binary trees. >> >> Is this safe? (i.e., are there assumptions made somewhere about what they are used for?) >> >> For prototyping I intended to add >> double-cons >> set-cbr! >> set-ccr! >> cbr >> ccr >> >> at the Scheme level. Is this safe? > > No, this won't work. First of all, double cells cannot be used to store > four SCM values, because the first word of a double cell has to contain > the type tag. Pairs avoid the type tag via a clever hack: the low bit > of every SCM value is 0, so if the first word of a heap object has 0 in > the low bit, that heap object is assumed to be a pair. But this means > that every other heap object must start with a word whose low bit is 1. > > So the best you could do with a double cell would be to store three SCM > objects, which is no better in space efficiency than what you already get > with a vector of size 3 (in the master branch, which will become Guile 2.2). > > Another problem with creating yet another new fundamental data type is > that in order to use it efficiently, we'd need to create more VM > instructions to access it. That means more opcodes from our limited > 8-bit opcode space, and more code in the VM, which is bad because > ideally a VM should be compact for good cache behavior. > > I think you should just use records, or maybe vectors, for this. > > Mark
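If the plain-vector route is taken instead, a node can simply be a Scheme vector of size 3 holding value, left and right, with thin accessors on top (again, a sketch with made-up names):

    ;; Sketch: a binary tree node as a size-3 vector (value, left, right).
    ;; The accessor names are made up for illustration.
    (define (make-node value left right) (vector value left right))
    (define (node-value node) (vector-ref node 0))
    (define (node-left node)  (vector-ref node 1))
    (define (node-right node) (vector-ref node 2))
    (define (set-node-left! node l)  (vector-set! node 1 l))
    (define (set-node-right! node r) (vector-set! node 2 r))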
Re: vhash speed thread safeness
l...@gnu.org (Ludovic Courtès) writes: > I wonder if there’s some other data structure with similar properties > that doesn’t have the thread-safety issue. Maybe Ian has an idea? HAMTs (another Bagwell creation) might be a reasonable option, but I don't have a complete implementation. I ran into some issues with it a while back, but I have completely forgotten what they were. I do want them in pfds, so I guess I could take this as the kick-up-the-arse to finish it. > The weight-balanced trees in MIT/GNU Scheme look interesting: > > > http://www.gnu.org/software/mit-scheme/documentation/mit-scheme-ref/Weight_002dBalanced-Trees.html I have those in pfds under the name bbtrees (for balanced binary trees). They are pretty flexible, and you get reasonable times for a lot of operations, but like a lot of tree structures they are not particularly cache-friendly. -- Ian Price -- shift-reset.com "Programming is like pinball. The reward for doing it well is the opportunity to do it again" - from "The Wizardry Compiled"