Implementing RFC 272

Buddha Buck Sat, 23 Sep 2000 22:00:20 -0700
This may be verging outside of the range of "language-data" and veering 
towards "internals", but I'd sort of like to hash out an idea to make 
sure that I'm thinking straight.

When I heard about transpose() (as well as reshape(), etc), I was 
concerned about the time it would take to execute these complex 
operations.  To wit, naively, I believed that a lot of data shuffling 
would be necessary.

Specifically, given the 2x3 array [[a,b,c],[d,e,f]], I had assumed that 
it would be stored in a structure roughly analogous to:

%a = { shape => [2, 3],
       data  => [a, b, c, d, e, f]
}

with position [i,j] being accessd as $a{data}[3*i+j].

(Note:  I wouldn't implement this at the Perl level, but at a lower 
level. I presume that in C the relevant structure would look something 
like:

struct array { int dims; int *shape; int *data };

but even this is a rough prototype.  I'm just using Perl for clarity).

More generally, an AxBxCxDxE array would have position [a,b,c,d,e] 
accessed with $a{data}[a*B*C*D*E + b*C*D*E + c*D*E + d*E + e]

Transposing that 2x3 array into the 3x2 array [[a,d],[b,e],[c,f]] would 
require building:

%a = { shape => [3,2],
       data  => [a,d,b,e,c,f]
}

which would have [i,j] being accessed as $a{data}[2*i+j];

Obviously, this is slow, which is why the suggestion "do it lazily" 
comes up.

However, am I wrong in thinking that this representation is faster:

%a = { shape    => [2,3],
       accessor => [3,1],
       offset   => 0,
       data     => [a,b,c,d,e,f]
}

The idea is that element [i,j] would be accessed as 

  $a{data}[dotproduct([i,j],$a{accessor}) + $a{offset}]

In this case, this would translate to $a{data}[i*3+j+0], which is what 
we had before.

But transpose then wouldn't heve to shuffle data, just shuffle the 
shape and accessor vectors:

%a = { shape    => [3,2],
       accessor => [1,3],
       offset   => 0,
       data     => [a,b,c,d,e,f]
}

In the AxBxCxDxE array, it would (pre-transpose) look something like:

$data = [ ... ];
{ shape    => [A,       B,     C,   D, E],
  accessor => [B*C*D*E, C*D*E, D*E, E, 1],
  offset   => 0,
  data     => $data
}

After transposing the 2nd (B) and 3rd (C) dimensions, we'd get:

{ shape    => [A,       C,   B,     D, E],
  accessor => [B*C*D*E, D*E, C*D*E, E, 1],
  offset   => 0,
  data     => $data # unchanged from above
}

I can see other benefits to this idea as well.  The main theoretical 
benefit I can see is that it eliminates that need to worry about 
"row-major" v. "column major", etc.  All dimensions are treated equally.

It also means that if we wanted to take the 4 dimensional sub-array you 
get from the AxBxCxDxE array above when C is fixed at 4, it can be done 
semi-lazily to get:

{ shape    => [A,B,D,E],
  accessor => [B*C*D*E, C*D*E, E, 1],
  offset   => 4*D*E,
  data     => $data # again, unchanged from above
}

(This is where the "offset", which was always always 0 above, comes 
into play.)

Fixing the E dimension to 3 would get you:

{ shape    => [A,B,D]
  accessor => [B*C*D*E, C*D*E, E],
  offset   => 4*D*E + 3*1,
  data     => $data # no change yet
}

I'm expecting one of four basic reactions to this idea:

1)  That's neat and novel,
2)  That's what was already planned, it's the "obvious" way to do it,
3)  We tried that, here's why it doesn't work,
4)  It's stupid, and here's why,

I honestly don't know which is the "proper" response...

-- 
     Buddha Buck                             [EMAIL PROTECTED]
"Just as the strength of the Internet is chaos, so the strength of our
liberty depends upon the chaos and cacophony of the unfettered speech
the First Amendment protects."  -- A.L.A. v. U.S. Dept. of Justice
Implementing RFC 272

Reply via email to