Daniel Llorens <daniel.llor...@bluewin.ch> writes:

> On Apr 16, 2013, at 04:00, Mark H Weaver wrote:
>
>>> [1] have vector- ops only accept vector- types.  This removes
>>> container type dispatch from the vector- ops but not from the
>>> array- ops.  The last patch set I've sent to the list goes in this
>>> direction.  It matches the behavior of other vector-like types such
>>> as bytevectors and strings.
>>
>> I very much favor this approach, but unfortunately I don't think we
>> can do it in 2.0.  It'll have to wait until 2.2.
>
> What should we do then about the consistency bugs?  Like 'vector?'
> being #f on something that is accepted by 'vector-ref', and so on.
I don't know.

> Did you have a look at the table I posted?

Yes, I did, and generally I strongly prefer column [1], but we have to
be very careful what changes we make in a stable release series that
might break compatibility with existing code.  I find this frustrating
as well, which is one reason why I agree with Andy that shifting focus
to 2.2 soon would be a good idea.

>> I consider this option completely unacceptable.  Arrays are much
>> more complex than vectors, and will inevitably be *much* slower in
>> the common case where the compiler doesn't know much.
>
> ...
>
>> Most egregiously, 'make-shared-array' requires that array references
>> apply an arbitrary affine transformation that's unknown at compile
>> time, which means multiplications and additions for each index.
>
> make-shared-array itself is a very slow function and I'd like to have
> some alternatives, but it's only called when setting up the shared
> array.  In any case, it's not slow because of any multiplications,
> but because of how much it conses.

I think you misunderstood me.  I'm not talking about the cost of the
'make-shared-array' call itself.  I'm talking about the fact that in
order to support arrays that might have been created using
'make-shared-array', things like 'array-ref' must apply an arbitrary
affine transformation to the indices in order to compute the offset
into the linear storage.

> You overestimate (by a lot) the cost of the index computation.  One
> works on arrays by looping over them.  In these loops the strides are
> always constant.  No matter the rank of the array, in the inner loop
> only one index moves.  The cost of this stepping is negligible on any
> scenario that isn't hugely contrived.

Here you're assuming either a single core procedure that loops over
the entire array (e.g. something like 'array-map!'), or a Scheme loop
that the compiler is able to analyze in detail.
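[Editorial note: the affine indexing discussed above can be seen
directly in Guile.  A minimal sketch using the stock 'make-array',
'array-index-map!', and 'make-shared-array' procedures; the variable
names are illustrative only:]

```scheme
;; Build a rank-1 array v = #(0 1 2 ... 9).
(define v (make-array 0 10))
(array-index-map! v (lambda (i) i))

;; A shared view selecting every other element: index i of 'evens'
;; maps to index (* 2 i) of v's storage.  Every 'array-ref' on such a
;; view must apply this affine map (offset 0, stride 2) at run time,
;; which is exactly the per-access cost under discussion.
(define evens (make-shared-array v (lambda (i) (list (* 2 i))) 5))

(array-ref evens 3)   ; => 6, i.e. v[2*3]
```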
Of course, when we have a fancy native compiler, and if you write your
Scheme carefully, you could certainly arrange to make this
optimization possible.  However, there's a lot of Scheme code that
uses vectors and is not coded with such concerns in mind.  In such
code, you might have a top-level procedure that receives an argument
and applies 'vector-ref' to it.

In such cases, if all vectors are actually arrays, the compiler cannot
assume anything about what kind of array is passed in.  Such an array
might have been created by 'make-shared-array', and thus might use an
arbitrary affine transformation.  If 'vector-ref' can be used on such
an array, then this procedure would have to fetch the stride and
offset and do the multiplication each time.  If there's a loop that
calls this top-level procedure, then the compiler can't do anything
with this.

In the context of Scheme, with no static type system, with most
procedures accessed via mutable top-level variables, and where
programs are often written in terms of higher-order procedures,
analyzing programs can be very difficult or impossible.  Therefore,
it's a bad idea to have complicated primitives that are slow in the
general case, in the hope that a fancy compiler will be able to
analyze entire loops and make them fast through non-trivial analyses.

>> Don't get me wrong: I'm all in favor of providing flexible, generic,
>> extensible procedures.  They have their place.  I think that's why
>> there's still a justification for keeping 'generalized-vectors'
>> around.
>
> They didn't do anything that the array functions didn't do just as
> well.

Well, generalized vector accessors don't have to cope with multiple
dimensions, nor do they require performing multiplications for most of
the supported vector types.

> Plus they were buggy -- they are buggy in stable-2.0.

Bugs are not a good reason to remove an API.

> And the names took half the screen.

This is a weak argument.  You can always make shorter names for them.
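[Editorial note: the top-level-procedure case described earlier can be
made concrete with a hypothetical sketch (the procedure name
'sum-elements' is invented for illustration).  If 'vector-ref' also
accepted shared arrays, each access below would have to load the
argument's offset and stride and compute the storage index, since
nothing about the argument is known at compile time:]

```scheme
;; 'sum-elements' is a top-level procedure; the compiler cannot know
;; what kind of object 'vec' will be at any given call site.  If
;; shared arrays were acceptable here, every 'vector-ref' would need
;; the general offset + stride * i computation instead of a direct
;; index into linear storage.
(define (sum-elements vec n)
  (let loop ((i 0) (acc 0))
    (if (= i n)
        acc
        (loop (+ i 1) (+ acc (vector-ref vec i))))))

(sum-elements (vector 1 2 3) 3)   ; => 6
```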
Having said all this, I haven't yet thought much about this issue, and
don't (yet) have a strong opinion on whether the generic vector
procedures should be kept.

>> However, efficiency demands that we also have simple, non-generic,
>> non-extensible procedures as well.  IMO, we should keep the vector,
>> bitvector, and bytevector procedures as simple as possible.
>
> Other than the bizarre idea that arrays have to be slow, I think this
> is a sensible position.  After all, arrays have both an array
> descriptor and a reference to linear storage, so there must be a type
> for this linear storage.  (E.g. in stable-2.0, vector- operations
> take either arrays, in a buggy way, or the underlying storage type,
> which is called 'simple vector' and isn't exposed to Scheme.)
>
> What's your take on Daniel Hartwig's position?  For example a and b
> on the table I posted -- b is an array, but it prints as #(2 4).
> How do you justify forbidding vector-ref for something that prints
> #(2 4)?

I don't think that arrays should ever be printed as #(2 4).

Thanks for the discussion.

    Regards,
      Mark