are double cells safe to use?

2013-10-29 Thread Roland Orre
I am considering using double cells for an efficient implementation of binary trees.

Is this safe? (i.e. are there no assumptions made elsewhere about what they are used for?)

For prototyping I intended to add
double-cons
set-cbr!
set-ccr!
cbr
ccr

to the scheme level. Is this safe?
They will only contain or refer to SCM data types.
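
For concreteness, here is a sketch of the intended usage (hypothetical,
since none of these procedures exist yet; I assume double-cons fills the
four words car/cbr/ccr/cdr in order):

  ;; hypothetical procedures, shown only to illustrate the proposed
  ;; Scheme-level interface for a binary-tree node
  (define node (double-cons 'key 'value 'left-child 'right-child))
  (cbr node)                        ; => 'value (the second word)
  (set-ccr! node 'new-left-child)   ; replace the third word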

Of course I could add my own datatype, but as double cells contain
exactly what I need and they are already used internally, I consider them
the obvious choice.

I wrote to both user and devel as I think this may be more of a
devel question.



Re: are double cells safe to use?

2013-10-29 Thread Roland Orre
Sorry, using double cells is even encouraged by the manual under
"Allocating Cells" (I had missed that), and no special care is needed
as long as they contain only SCM values.

On Tue, Oct 29, 2013 at 11:15 AM, Roland Orre  wrote:
> I am considering using double cells for an efficient implementation of binary trees.
>
> Is this safe? (i.e. are there no assumptions made elsewhere about what they are used for?)
>
> For prototyping I intended to add
> double-cons
> set-cbr!
> set-ccr!
> cbr
> ccr
>
> to the scheme level. Is this safe?
> They will only contain or refer to SCM data types.
>
> Of course I could add my own datatype, but as double cells contain
> exactly what I need and they are already used internally, I consider them
> the obvious choice.
>
> I wrote to both user and devel as I think this may be more of a
> devel question.



Re: Guile 2.2 TODO

2013-10-29 Thread Ludovic Courtès
Andy Wingo  skribis:

> On Sun 01 Sep 2013 13:03, Andy Wingo  writes:

[...]

>> I think that's pretty much it.  Obviously it would be nice to get a lot
>> of other things into 2.2 but this is the necessary bit.  I think we
>> should shoot for a 2.1.0 release around 1 December; dunno.
>
> I think we'll be ready on that date, code-wise, but documentation makes
> a 2.1.0 release much less likely -- documenting things takes time.

That’s because the doc of the VM in 2.0 set the bar very high.  :-)

FWIW, I think it’s fine for an odd release series to have incomplete
documentation (especially about the internals) if that can be fixed
before the stable series comes out.

Thank you for the impressive work!

Ludo’.




Re: vhash speed thread safeness

2013-10-29 Thread Ludovic Courtès
Hi, Stefan,

Stefan Israelsson Tampe  skribis:

> I did some tests with a C-based vhash implementation; it's possible to
> increase the speed by 30x compared to the current vlist implementation in
> Guile. It would be possible to get this speed today if one implemented
> vhash-assoc as a VM op. Anyway, using the source as a standard lib will
> get you about a 10x speed increase.

As we discussed, I don’t really like the idea of implementing that much
in C.

Also, I’d really like the optimization effort to be driven by actual
profiling data, such as C-level profiles of repeated calls to the
current ‘vhash-assoc’.  Have you tried that?

> Another pressing issue is that vhashes are not thread safe,

Section 2.8 of Bagwell’s paper proposes a simple solution.  All that is
missing AFAICS is an atomic test-and-set VM op to implement it (which
may also be useful in other places.)
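
Roughly, the semantics needed would be something like this (a plain,
non-atomic Scheme sketch with made-up names, just to illustrate; the
actual VM op would have to perform the whole operation atomically):

  (define (test-and-set! vec i expected new)
    ;; illustrative only: a real VM op must do this atomically
    (if (eq? (vector-ref vec i) expected)
        (begin (vector-set! vec i new) #t)
        #f))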

What do you think of this approach?

Thanks,
Ludo’.




Re: [PATCH] Add procedures to convert alists into hash tables

2013-10-29 Thread Ludovic Courtès
Hi!

David Thompson  skribis:

> When looking through the different hash table implementations
> available (Guile, SRFI-69, and RNRS) I found a useful SRFI-69
> procedure that had no equivalent in Guile's native hash table API:
> alist->hash-table.

I think it would make sense to implement them in Scheme, say in
ice-9/hash-table.scm, which could be either a separate module or a file
included from boot-9.scm.
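
Something along these lines, perhaps (an untested sketch using only
existing primitives, with SRFI-69-style semantics where the first
occurrence of a key takes precedence; the real thing may well differ):

  (define (alist->hash-table alist)
    ;; untested sketch: walk the alist in reverse so that earlier
    ;; associations overwrite later ones, as in SRFI-69
    (let ((table (make-hash-table)))
      (for-each (lambda (pair)
                  (hash-set! table (car pair) (cdr pair)))
                (reverse alist))
      table))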

WDYT?

Ludo’.




Re: vhash speed thread safeness

2013-10-29 Thread Stefan Israelsson Tampe
On Tue, Oct 29, 2013 at 1:34 PM, Ludovic Courtès  wrote:

> Hi, Stefan,
>
> Stefan Israelsson Tampe  skribis:
>
> > I did some tests with a C-based vhash implementation; it's possible to
> > increase the speed by 30x compared to the current vlist implementation in
> > Guile. It would be possible to get this speed today if one implemented
> > vhash-assoc as a VM op. Anyway, using the source as a standard lib will
> > get you about a 10x speed increase.
>
> As we discussed, I don’t really like the idea of implementing that much
> in C.
>
Me neither, but it is good to see that in cache-friendly cases we can
improve the situation 30x. That gives us a goal to strive for. Also, the main
intention is to use vhashes for guile-log. For this we can note:

1. With this I managed to make an indexer that can find any ordered set of
matches for a one-cons pattern (x, y) in about 0.5 µs,
e.g. matches for the forms (x y), (_ y), (x _), (_ _). The fun part is that
the indexer indexes any list of lists of lists, including ones where some
elements are variables. This is cool because if you want to solve first-order
logic problems you will sometimes have many hundreds of matching patterns
representing the compiled rules, and hence you get a speedup when searching
for rule matches.

2. A thread-safe version of vhashes could be used for Kanren or guile-log
to improve scalability. In order to use this sanely we need to be able to
truncate the vhash, else the lookup complexity will not be that great for
many applications. A truncation means that one needs to check whether the
state has been stored or not, and in case it has been stored, do a no-op.

> Also, I’d really like the optimization effort to be driven by actual
> profiling data, such as C-level profiles of repeated calls to the
> current ‘vhash-assoc’.  Have you tried that?


No. But a lot can be gained by unpacking the vectors and using pure C
vector lookup. I think that we will get that in the second or third version
of the native compiler; it's probably too much to ask to get it in the
first one, no? Then there is the overhead of VM jumps as usual, which is
probably the main contributor.




> > Another pressing issue is that vhashes are not thread safe,
>
> Section 2.8 of Bagwell’s paper proposes a simple solution.  All that is
> missing AFAICS is an atomic test-and-set VM op to implement it (which
> may also be useful in other places.)
>
> What do you think of this approach?


For vlists it's probably a good idea; I don't know if it's enough for
vhashes, though. Maybe you need a mutex. But lock overhead will be
significant, and I suspect that it is often faster to start afresh in
all threads. But until we get a well-optimized native compiler, lock
overhead is not that pressing.

BTW, I have a good-enough solution implemented now that I will use for
parallelizing logic programs, after which I will concentrate on getting
the iso-prolog into shape.


> Thanks,
> Ludo’.

/Stefan


Re: are double cells safe to use?

2013-10-29 Thread Mark H Weaver
Roland Orre  writes:

> I am considering using double cells for an efficient implementation of binary trees.
>
> Is this safe? (i.e. are there no assumptions made elsewhere about what they are used for?)
>
> For prototyping I intended to add
> double-cons
> set-cbr!
> set-ccr!
> cbr
> ccr
>
> to the scheme level. Is this safe?

No, this won't work.  First of all, double cells cannot be used to store
four SCM values, because the first word of a double cell has to contain
the type tag.  Pairs avoid the type tag by the clever hack: the low bit
of every SCM value is 0, so if the first word of a heap object has 0 in
the low bit, that heap object is assumed to be a pair.  But this means
that every other heap object must start with a word whose low bit is 1.

So the best you could do with a double cell would be to store three SCM
objects, which is no better space efficiency than you already get with a
vector of size 3 (in the master branch, which will become Guile 2.2).

Another problem with creating yet another new fundamental data type is
that in order to use it efficiently, we'd need to create more VM
instructions to access it.  That means more opcodes from our limited
8-bit opcode space, and more code in the VM, which is bad because
ideally a VM should be compact for good cache behavior.

I think you should just use records, or maybe vectors, for this.
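
For instance, a tree node as a SRFI-9 record could look like this (just
a sketch; adjust the fields and setters to your needs):

  (use-modules (srfi srfi-9))
  (define-record-type <tree-node>
    (make-tree-node key value left right)
    tree-node?
    (key   tree-node-key)
    (value tree-node-value)
    (left  tree-node-left  set-tree-node-left!)
    (right tree-node-right set-tree-node-right!))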

 Mark



Re: vhash speed thread safeness

2013-10-29 Thread Ludovic Courtès
Stefan Israelsson Tampe  skribis:

> On Tue, Oct 29, 2013 at 1:34 PM, Ludovic Courtès  wrote:
>
>> Hi, Stefan,
>>
>> Stefan Israelsson Tampe  skribis:
>>
>> > I did some tests with a C-based vhash implementation; it's possible to
>> > increase the speed by 30x compared to the current vlist implementation in
>> > Guile. It would be possible to get this speed today if one implemented
>> > vhash-assoc as a VM op. Anyway, using the source as a standard lib will
>> > get you about a 10x speed increase.
>>
>> As we discussed, I don’t really like the idea of implementing that much
>> in C.
>>
> Me neither, but it is good to see that in cache-friendly cases we can
> improve the situation 30x. That gives us a goal to strive for. Also, the main
> intention is to use vhashes for guile-log. For this we can note:

OK.

[...]

>> > Another pressing issue is that vhashes are not thread safe,
>>
>> Section 2.8 of Bagwell’s paper proposes a simple solution.  All that is
>> missing AFAICS is an atomic test-and-set VM op to implement it (which
>> may also be useful in other places.)
>>
>> What do you think of this approach?
>
>
> For vlists it's probably a good idea; I don't know if it's enough for
> vhashes, though.

Oooh, right, sorry for overlooking that.

> Maybe you need a mutex. But lock overhead will be significant

Surely, especially if it’s a fat mutex.

Hmm hmm.  Of course that could be an argument for doing some C
(primitives, not VM ops), but looking at ‘%vhash-assoc’, that would
clearly mean reimplementing pretty much all of the vlist + vhash in C,
which sucks.

I wonder if there’s some other data structure with similar properties
that doesn’t have the thread-safety issue.  Maybe Ian has an idea?

The weight-balanced trees in MIT/GNU Scheme look interesting:

  
http://www.gnu.org/software/mit-scheme/documentation/mit-scheme-ref/Weight_002dBalanced-Trees.html
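
Roughly, each node carries the size ("weight") of its subtree, which is
what the balancing criterion is based on.  A bare-bones sketch of the
node shape and of lookup (not the MIT Scheme API; insertion and
rebalancing omitted):

  (use-modules (srfi srfi-9))
  (define-record-type <wt-node>
    (make-wt-node key value size left right)
    wt-node?
    (key   wt-node-key)
    (value wt-node-value)
    (size  wt-node-size)
    (left  wt-node-left)
    (right wt-node-right))

  (define (wt-lookup node key key<?)   ; #f represents the empty tree
    (cond ((not node) #f)
          ((key<? key (wt-node-key node))
           (wt-lookup (wt-node-left node) key key<?))
          ((key<? (wt-node-key node) key)
           (wt-lookup (wt-node-right node) key key<?))
          (else (wt-node-value node))))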

Thoughts?

Thanks,
Ludo’.



Re: are double cells safe to use?

2013-10-29 Thread Roland Orre
Thanks,

> So the best you could do with a double cell would be to store three SCM
> objects, which is no better space efficiency than you already get with a
> vector of size 3 (in the master branch, which will become Guile 2.2).

but with a vector I also have an extra cell (a single cell?) pointing to the vector.
For this kind of thing I've earlier always used records (with new smobs)
or vectors, but now I wanted to make it as space-efficient as possible.
It'll be a memory-based database which will be "static" (re-indexed daily,
Day-Stout-Warren trees) and contain (in the long run...) billions of records.

I guess the best alternative then is to use a smob with e.g. a 4-word
(SCM) vector, which implies 6 scm_t_bits, but if a 4-word vector is
equally space-efficient I may as well go for that.


On Tue, Oct 29, 2013 at 6:17 PM, Mark H Weaver  wrote:
> Roland Orre  writes:
>
>> I am considering using double cells for an efficient implementation of binary trees.
>>
>> Is this safe? (i.e. are there no assumptions made elsewhere about what they are used for?)
>>
>> For prototyping I intended to add
>> double-cons
>> set-cbr!
>> set-ccr!
>> cbr
>> ccr
>>
>> to the scheme level. Is this safe?
>
> No, this won't work.  First of all, double cells cannot be used to store
> four SCM values, because the first word of a double cell has to contain
> the type tag.  Pairs avoid the type tag by the clever hack: the low bit
> of every SCM value is 0, so if the first word of a heap object has 0 in
> the low bit, that heap object is assumed to be a pair.  But this means
> that every other heap object must start with a word whose low bit is 1.
>
> So the best you could do with a double cell would be to store three SCM
> objects, which is no better space efficiency than you already get with a
> vector of size 3 (in the master branch, which will become Guile 2.2).
>
> Another problem with creating yet another new fundamental data type is
> that in order to use it efficiently, we'd need to create more VM
> instructions to access it.  That means more opcodes from our limited
> 8-bit opcode space, and more code in the VM, which is bad because
> ideally a VM should be compact for good cache behavior.
>
> I think you should just use records, or maybe vectors, for this.
>
>  Mark



Re: vhash speed thread safeness

2013-10-29 Thread Ian Price
l...@gnu.org (Ludovic Courtès) writes:

> I wonder if there’s some other data structure with similar properties
> that doesn’t have the thread-safety issue.  Maybe Ian has an idea?

HAMTs (another Bagwell creation) might be a reasonable option, but I
don't have a complete implementation. I ran into some issues with it a
while back, but I have completely forgotten what they were. I do want them
in pfds, so I guess I could take this as the kick-up-the-arse to finish
it.

> The weight-balanced trees in MIT/GNU Scheme look interesting:
>
>   
> http://www.gnu.org/software/mit-scheme/documentation/mit-scheme-ref/Weight_002dBalanced-Trees.html

I have those in pfds under the name bbtrees (for balanced binary
trees). They are pretty flexible, and you get reasonable times for a lot
of operations, but like a lot of tree structures they are not
particularly cache-friendly.

-- 
Ian Price -- shift-reset.com

"Programming is like pinball. The reward for doing it well is
the opportunity to do it again" - from "The Wizardry Compiled"