[Numpy-discussion] RFC: comments to BLAS committee from numpy/scipy devs

2018-01-09 Thread Nathaniel Smith
Hi all,

As mentioned earlier [1][2], there's work underway to revise and
update the BLAS standard -- e.g. we might get support for strided
arrays and lose xerbla! There's a draft at [3]. They're interested in
feedback from users, so I've written up a first draft of comments
about what we would like as NumPy/SciPy developers. This is very much
a first attempt -- I know we have lots of people who are more expert
on BLAS than me on these lists :-). Please let me know what you think.

-n

[1] https://mail.python.org/pipermail/numpy-discussion/2017-November/077420.html
[2] https://mail.python.org/pipermail/scipy-dev/2017-November/022267.html
[3] https://docs.google.com/document/d/1DY4ImZT1coqri2382GusXgBTTTVdBDvtD5I14QHp9OE/edit

-

# Comments from NumPy / SciPy developers on "A Proposal for a
Next-Generation BLAS"

These are comments on [A Proposal for a Next-Generation
BLAS](https://docs.google.com/document/d/1DY4ImZT1coqri2382GusXgBTTTVdBDvtD5I14QHp9OE/edit#)
(version as of 2017-12-13), from the perspective of the developers of
the NumPy and SciPy libraries. We hope this feedback is useful, and
welcome further discussion.

## Who are we?

NumPy and SciPy are the two foundational libraries of the Python
numerical ecosystem, and one of their duties is to wrap BLAS and
expose it for the use of other Python libraries. (NumPy primarily
provides a GEMM wrapper, while SciPy exposes more specialized
operations.) It's unclear how many users we have exactly, but we
certainly ship multiple million copies of BLAS every month, and
provide one of the most popular numerical toolkits for both novice and
expert users.
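For concreteness, here is a sketch of the wrapping described above. In typical BLAS-backed builds, NumPy's `@` operator on 2-D float arrays is dispatched to the *GEMM routines, and SciPy exposes the same routines directly via `scipy.linalg.blas`:

```python
import numpy as np
from scipy.linalg.blas import dgemm

a = np.ones((2, 3))
b = np.ones((3, 4))

# NumPy's high-level route: @ on 2-D float arrays typically ends up
# in an optimized GEMM.
c = a @ b

# SciPy's low-level route: call the BLAS routine directly,
# computing alpha * (a @ b) with alpha = 1.0.
c2 = dgemm(1.0, a, b)

assert np.allclose(c, 3.0) and np.allclose(c2, c)
```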

Looking at the original BLAS and LAPACK interfaces, it often seems
that their imagined user is something like a classic supercomputer
consumer, who writes code directly in Fortran or C against the BLAS
API, and where the person writing the code and running the code are
the same. NumPy/SciPy are coming from a very different perspective:
our users generally know nothing about the details of the underlying
BLAS; they just want to describe their problem in some high-level way,
and the library is responsible for making it happen as efficiently as
possible, and is often integrated into some larger system (e.g. a
real-time analytics platform embedded in a web server).

When it comes to our BLAS usage, we mostly use only a small subset of
the routines. However, as "consumer software" used by a wide variety
of users with differing degrees of technical expertise, we're expected
to Just Work on a wide variety of systems, and with as many different
vendor BLAS libraries as possible. On the other hand, the fact that
we're working with Python means we don't tend to worry about small
inefficiencies that will be lost in the noise in any case, and are
willing to sacrifice some performance to get more reliable operation
across our diverse userbase.

## Comments on specific aspects of the proposal

### Data Layout

We are **strongly in favor** of the proposal to support arbitrary
strided data layouts. Ideally, this would support strides *specified
in bytes* (allowing for unaligned data layouts), and allow for truly
arbitrary strides, including *zero or negative* values. However, we
think it's fine if some of the weirder cases suffer a performance
penalty.

Rationale: NumPy – and thus, most of the scientific Python ecosystem –
only has one way of representing an array: the `numpy.ndarray` type,
which is an arbitrary dimensional tensor with arbitrary strides. It is
common to encounter matrices with non-trivial strides. For example::

    # Make a 3-dimensional tensor, 10 x 9 x 8
    t = np.zeros((10, 9, 8))
    # Considering this as a stack of eight 10x9 matrices, extract the first:
    mat = t[:, :, 0]
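The strides in question can be inspected directly (a sketch, assuming the default float64 dtype, so each element occupies 8 bytes):

```python
import numpy as np

t = np.zeros((10, 9, 8))  # C-contiguous float64
mat = t[:, :, 0]

# The full tensor's byte strides: (9*8*8, 8*8, 8)
print(t.strides)    # (576, 64, 8)
# The extracted matrix keeps the outer strides, neither of which
# equals the 8-byte element size:
print(mat.strides)  # (576, 64)
```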

Now `mat` has non-trivial strides on both axes. (If running this in a
Python interpreter, you can see this by looking at the value of
`mat.strides`.) Another case where interesting strides arise is when
performing 
["broadcasting"](https://docs.scipy.org/doc/numpy-1.13.0/user/basics.broadcasting.html),
which is the name for NumPy's rules for stretching arrays to make
their shapes match. For example, in an expression like::

    np.array([1, 2, 3]) + 1

the scalar `1` is "broadcast" to create a vector `[1, 1, 1]`. This is
accomplished without allocating memory, by creating a vector with
setting length = 3 and strides = 0 – so all the elements share a single
location in memory. Similarly, by using negative strides we can
reverse an array without allocating memory::

    a = np.array([1, 2, 3])
    a_flipped = a[::-1]

Now `a_flipped` has the value `[3, 2, 1]`, while sharing storage with
the array `a = [1, 2, 3]`. Misaligned data is also possible (e.g. an
array of 8-byte doubles with a 9-byte stride), though it arises more
rarely. (An example of when it might occur is an on-disk data
format that alternates between storing a double value and then a
single byte value, which is then memory-mapped.)
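All three kinds of "interesting" strides can be demonstrated in a few lines (a sketch; the 9-byte case uses a packed structured dtype as a stand-in for the memory-mapped format described above):

```python
import numpy as np

# Zero strides: broadcasting a scalar to a length-3 vector shares one
# memory location among all three elements.
v = np.broadcast_to(np.array(1), (3,))
assert v.strides == (0,)

# Negative strides: a reversed view walks memory backwards.
a = np.array([1, 2, 3], dtype=np.int64)
assert a[::-1].strides == (-8,)

# Misaligned doubles: packed 9-byte records (8-byte double plus a
# 1-byte flag) give the "value" field a 9-byte stride.
rec = np.zeros(4, dtype=np.dtype([("value", "f8"), ("flag", "u1")]))
assert rec["value"].strides == (9,)
```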

While this array representati

Re: [Numpy-discussion] [SciPy-Dev] RFC: comments to BLAS committee from numpy/scipy devs

2018-01-09 Thread Ilhan Polat
I couldn't find an item to place this under, but I think ilaenv, and
also the idiom of calling a function twice (once with lwork=-1 to read
back the optimal workspace size, and then calling the function again
properly with that lwork) in LAPACK, needs to be gotten rid of.

That's a major annoyance during the wrapping of LAPACK routines for SciPy.

I don't know if this is realistic, but the values ilaenv needs could be
computed once at install time (or again if the hardware changes) and
read off by the routines.
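For reference, the two-call idiom being described looks roughly like this through SciPy's low-level wrappers (a sketch, using `geqrf` (QR factorization) as an arbitrary example; SciPy performs this dance internally for many routines):

```python
import numpy as np
from scipy.linalg import get_lapack_funcs

a = np.random.rand(5, 3)
geqrf, = get_lapack_funcs(('geqrf',), (a,))

# First call: lwork=-1 is LAPACK's "workspace query" -- no real work
# is done, but work[0] comes back holding the optimal workspace size.
qr, tau, work, info = geqrf(a, lwork=-1)
lwork = int(work[0])

# Second call: the actual factorization, with the queried workspace.
qr, tau, work, info = geqrf(a, lwork=lwork)
assert info == 0
```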



On Jan 9, 2018 09:25, "Nathaniel Smith"  wrote:

> Hi all,
>
> As mentioned earlier [1][2], there's work underway to revise and
> update the BLAS standard -- e.g. we might get support for strided
> arrays and lose xerbla! There's a draft at [3]. [...]

Re: [Numpy-discussion] array - dimension size of 1-D and 2-D examples

2018-01-09 Thread Martin.Gfeller
Hi Derek

I have a related question:

Given:

a = numpy.array([[0,1,2],[3,4]])
assert a.ndim == 1
b = numpy.array([[0,1,2],[3,4,5]])
assert b.ndim == 2

Is there an elegant way to force b to remain a 1-dim object array?

I have a use case where normally the sublists are of different lengths, but I 
get a completely different structure when they are (coincidentally in my case) 
of the same length.

Thanks and best regards, Martin


Martin Gfeller, Swisscom / Enterprise / Banking / Products / Quantax

Message: 1
Date: Sun, 31 Dec 2017 00:11:48 +0100
From: Derek Homeier 
To: Discussion of Numerical Python 
Subject: Re: [Numpy-discussion] array - dimension size of 1-D and 2-D
examples

On 30 Dec 2017, at 5:38 pm, Vinodhini Balusamy  wrote:
> 
> Just one more question from the details you have provided, which from
> my understanding strongly seems to be by design.
> [DEREK] You cannot create a regular 2-dimensional integer array from
> one row of length 3 and a second one of length 0. Thus np.array
> chooses the next most basic type of array it can fit your input data in.
> 
Indeed, the general philosophy is to preserve the structure and type of your
input data as far as possible, i.e. a list is turned into a 1d-array, a list of
lists (or tuples etc.) into a 2d-array, _if_ the sequences are of equal length
(even if length 1), as long as there is an unambiguous way to convert the data
into an array (see below).

>Which is the case only if a second one of length 0 is given.
>What about case 1:
> >>> x12 = np.array([[1,2,3]])
> >>> x12
> array([[1, 2, 3]])
> >>> print(x12)
> [[1 2 3]]
> >>> x12.ndim
> 2
> >>>
> >>>
> This seems to take 2 dimensions.

Yes, structurally this is equivalent to your second example.

> also,
>>> x12 = np.array([[1,2,3],[0,0,0]])
>>> print(x12)
[[1 2 3]
 [0 0 0]]
>>> x12.ndim
2

> I presumed the above case and the case where length 0 is provided to be
> treated the same (I mean the same behaviour).
> Correct me if I am wrong.
> 
In this case there is no unambiguous way to construct the array - you would
need a shape (2, 3) array to store the two lists, with 3 elements in the first
list. Obviously x12[0] would be np.array([1,2,3]), but what should be the value
of x12[1], if the second list is empty - it could be zeros, or repeating
x12[0], or simply undefined. np.array([[1, 2, 3], [4]]) would be even less
clearly defined.
These cases where there is no obvious "right" way to create the array have
usually been discussed at some length, but I don't know if this is fully
documented in some place. For the essentials, see
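The contrast described here can be written down directly (a sketch; note that recent NumPy releases require an explicit `dtype=object` for ragged input rather than guessing):

```python
import numpy as np

# Equal-length rows: an unambiguous (2, 3) array.
x = np.array([[1, 2, 3], [0, 0, 0]])
assert x.ndim == 2 and x.shape == (2, 3)

# Ragged rows: no unambiguous rectangular shape, so the result is a
# 1-D array whose elements are the Python lists themselves.
y = np.array([[1, 2, 3], [4]], dtype=object)
assert y.ndim == 1 and y.shape == (2,)
assert y[1] == [4]
```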

https://docs.scipy.org/doc/numpy/reference/routines.array-creation.html

note also the upcasting rules if you have e.g. a mix of integers and reals or 
complex numbers, and also how to control shape or data type explicitly with the 
respective keywords.

Derek


___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] array - dimension size of 1-D and 2-D examples

2018-01-09 Thread Sebastian Berg
On Tue, 2018-01-09 at 12:27 +, martin.gfel...@swisscom.com wrote:
> Hi Derek
> 
> I have a related question:
> 
> Given:
> 
>   a = numpy.array([[0,1,2],[3,4]])
>   assert a.ndim == 1
>   b = numpy.array([[0,1,2],[3,4,5]])
>   assert b.ndim == 2
> 
> Is there an elegant  way to force b to remain a 1-dim object array? 
> 

You will have to create an empty object array and assign the lists to
it.

```
b = np.empty(len(l), dtype=object)
b[...] = l
```
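Spelled out with a concrete list (a sketch; the element-by-element loop is a robust variant that also works when NumPy's broadcast assignment balks at equal-length sublists by first parsing them as a 2-D array):

```python
import numpy as np

l = [[0, 1, 2], [3, 4, 5]]  # equal-length sublists, as in Martin's case

b = np.empty(len(l), dtype=object)
for i, sub in enumerate(l):
    b[i] = sub

assert b.ndim == 1
assert b.shape == (2,)
assert b[0] == [0, 1, 2]
```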

> I have a use case where normally the sublists are of different
> lengths, but I get a completely different structure when they are
> (coincidentally in my case) of the same length.
> 
> Thanks and best regards, Martin
> 
> 
> [...]



Re: [Numpy-discussion] [SciPy-Dev] RFC: comments to BLAS committee from numpy/scipy devs

2018-01-09 Thread Nathaniel Smith
On Tue, Jan 9, 2018 at 3:40 AM, Ilhan Polat  wrote:
> I couldn't find an item to place this but I think ilaenv and also calling
> the function twice (one with lwork=-1 and reading the optimal block size and
> the call the function again properly with lwork=) in LAPACK needs to
> be gotten rid of.
>
> That's a major annoyance during the wrapping of LAPACK routines for SciPy.
>
> I don't know if this is realistic but the values ilaenv needed can be
> computed once (or again if hardware is changed) at the install and can be
> read off by the routines.

Unfortunately I think this effort is just to revise BLAS, not LAPACK.
Maybe you should try starting a conversation with the LAPACK
developers though – I don't know much about how they work but maybe
they'd be interested in feedback.

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Numpy-discussion] [SciPy-Dev] RFC: comments to BLAS committee from numpy/scipy devs

2018-01-09 Thread Nathaniel Smith
On Tue, Jan 9, 2018 at 12:53 PM, Tyler Reddy  wrote:
> One common issue in computational geometry is the need to operate rapidly on
> arrays with "heterogeneous shapes."
>
> So, an array that has rows with different numbers of columns -- shape (1,3)
> for the first polygon and shape (1, 12) for the second polygon and so on.
>
> This seems like a particularly nasty scenario when the loss of "homogeneity"
> in shape precludes traditional vectorization -- I think numpy effectively
> converts these to dtype=object, etc. I don't
> think this is necessarily a BLAS issue since wrapping comp. geo. libraries does
> happen in a subset of cases to handle this, but if there's overlap in
> utility you could pass it along I suppose.

You might be interested in this discussion of "Batch BLAS":
https://docs.google.com/document/d/1DY4ImZT1coqri2382GusXgBTTTVdBDvtD5I14QHp9OE/edit#heading=h.pvsif1mxvaqq

I didn't get into it in the draft response, because it didn't seem
like something where NumPy/SciPy have any useful experience to offer,
but it sounds like there are people worrying about this case.

-n

-- 
Nathaniel J. Smith -- https://vorpus.org