Daniel Llorens <daniel.llor...@bluewin.ch> writes:

> On Apr 16, 2013, at 04:00, Mark H Weaver wrote:
>
>>> [1] have vector- ops only accept vector- types.  This removes
>>> container type dispatch from the vector- ops but not from the
>>> array- ops.  The last patch set I've sent to the list goes in this
>>> direction.  It matches the behavior of other vector-like types such
>>> as bytevectors and strings.
>>
>> I very much favor this approach, but unfortunately I don't think we
>> can do it in 2.0.  It'll have to wait until 2.2.
>
> What should we do then about the consistency bugs?  Like 'vector?'
> being #f on something that is accepted by 'vector-ref', and so on.
I don't know.

> Did you have a look at the table I posted?

Yes, I did, and generally I strongly prefer column [1], but we have to
be very careful what changes we make in a stable release series that
might break compatibility with existing code.  I find this frustrating
as well, which is one reason why I agree with Andy that shifting focus
to 2.2 soon would be a good idea.

>> I consider this option completely unacceptable.  Arrays are much
>> more complex than vectors, and will inevitably be *much* slower in
>> the common case where the compiler doesn't know much.
>
> ...
>
>> Most egregiously, 'make-shared-array' requires that array references
>> apply an arbitrary affine transformation that's unknown at compile
>> time, which means multiplications and additions for each index.
>
> make-shared-array itself is a very slow function and I'd like to have
> some alternatives, but it's only called when setting up the shared
> array.  In any case, it's not slow because of any multiplications,
> but because of how much it conses.

I think you misunderstood me.  I'm not talking about the cost of the
'make-shared-array' call itself.  I'm talking about the fact that in
order to support arrays that might have been created using
'make-shared-array', things like 'array-ref' must apply an arbitrary
affine transformation to the indices in order to compute the offset
into the linear storage.

> You overestimate (by a lot) the cost of the index computation.  One
> works on arrays by looping over them.  In these loops the strides are
> always constant.  No matter the rank of the array, in the inner loop
> only one index moves.  The cost of this stepping is negligible on any
> scenario that isn't hugely contrived.

Here you're assuming either a single core procedure that loops over
the entire array (e.g. something like 'array-map!'), or a Scheme loop
that the compiler is able to analyze in detail.
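[Editorial note: the affine indexing discussed above can be seen
directly in Guile.  A minimal sketch using the stock 'make-array',
'array-index-map!', and 'make-shared-array' procedures; the variable
names are illustrative only:]

```scheme
;; Build a rank-1 array v = #(0 1 2 ... 9).
(define v (make-array 0 10))
(array-index-map! v (lambda (i) i))

;; A shared view selecting every other element: index i of 'evens'
;; maps to index (* 2 i) of v's storage.  Every 'array-ref' on such a
;; view must apply this affine map (offset 0, stride 2) at run time,
;; which is exactly the per-access cost under discussion.
(define evens (make-shared-array v (lambda (i) (list (* 2 i))) 5))

(array-ref evens 3)   ; => 6, i.e. v[2*3]
```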
Of course, when we have a fancy native compiler, and if you write your
Scheme carefully, you could certainly arrange to make this
optimization possible.  However, there's a lot of Scheme code that
uses vectors and is not coded with such concerns in mind.  In such
code, you might have a top-level procedure that receives an argument
and applies 'vector-ref' to it.

In such cases, if all vectors are actually arrays, the compiler cannot
assume anything about what kind of array is passed in.  Such an array
might have been created by 'make-shared-array', and thus might use an
arbitrary affine transformation.  If 'vector-ref' can be used on such
an array, then this procedure would have to fetch the stride and
offset and do the multiplication each time.  If there's a loop that
calls this top-level procedure, then the compiler can't do anything
with this.

In the context of Scheme, with no static type system, with most
procedures accessed via mutable top-level variables, and where
programs are often written in terms of higher-order procedures,
analyzing programs can be very difficult or impossible.  Therefore,
it's a bad idea to have complicated primitives that are slow in the
general case, in the hope that a fancy compiler will be able to
analyze entire loops and make them fast through non-trivial analyses.

>> Don't get me wrong: I'm all in favor of providing flexible, generic,
>> extensible procedures.  They have their place.  I think that's why
>> there's still a justification for keeping 'generalized-vectors'
>> around.
>
> They didn't do anything that the array functions didn't do just as
> well.

Well, generalized vector accessors don't have to cope with multiple
dimensions, nor do they require performing multiplications for most of
the supported vector types.

> Plus they were buggy -- they are buggy in stable-2.0.

Bugs are not a good reason to remove an API.

> And the names took half the screen.

This is a weak argument.  You can always make shorter names for them.
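[Editorial note: the top-level-procedure case described earlier can be
made concrete with a hypothetical sketch (the procedure name
'sum-elements' is invented for illustration).  If 'vector-ref' also
accepted shared arrays, each access below would have to load the
argument's offset and stride and compute the storage index, since
nothing about the argument is known at compile time:]

```scheme
;; 'sum-elements' is a top-level procedure; the compiler cannot know
;; what kind of object 'vec' will be at any given call site.  If
;; shared arrays were acceptable here, every 'vector-ref' would need
;; the general offset + stride * i computation instead of a direct
;; index into linear storage.
(define (sum-elements vec n)
  (let loop ((i 0) (acc 0))
    (if (= i n)
        acc
        (loop (+ i 1) (+ acc (vector-ref vec i))))))

(sum-elements (vector 1 2 3) 3)   ; => 6
```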
Having said all this, I haven't yet thought much about this issue, and
don't (yet) have a strong opinion on whether the generic vector
procedures should be kept.

>> However, efficiency demands that we also have simple, non-generic,
>> non-extensible procedures as well.  IMO, we should keep the vector,
>> bitvector, and bytevector procedures as simple as possible.
>
> Other than the bizarre idea that arrays have to be slow, I think this
> is a sensible position.  After all, arrays have both an array
> descriptor and a reference to linear storage, so there must be a type
> for this linear storage.  (E.g. in stable-2.0, vector- operations
> take either arrays, in a buggy way, or the underlying storage type,
> which is called 'simple vector' and isn't exposed to Scheme.)
>
> What's your take on Daniel Hartwig's position?  For example a and b
> on the table I posted -- b is an array, but it prints as #(2 4).
> How do you justify forbidding vector-ref for something that prints
> #(2 4)?

I don't think that arrays should ever be printed as #(2 4).

Thanks for the discussion.

    Regards,
      Mark