Re: [Numpy-discussion] improving arange()? introducing fma()?

2018-02-10 Thread Matthias Geier
I just want to add a few links related to the topic:

https://mail.python.org/pipermail/numpy-discussion/2007-September/029129.html
https://quantumwise.com/forum/index.php?topic=110.0#.VIVgyIctjRZ
https://mail.python.org/pipermail/numpy-discussion/2012-February/060238.html
http://nbviewer.jupyter.org/github/mgeier/python-audio/blob/master/misc/arange.ipynb

cheers,
Matthias

On Fri, Feb 9, 2018 at 11:17 PM, Benjamin Root  wrote:

> Interesting...
>
> ```
> static void
> @NAME@_fill(@type@ *buffer, npy_intp length, void *NPY_UNUSED(ignored))
> {
> npy_intp i;
> @type@ start = buffer[0];
> @type@ delta = buffer[1];
> delta -= start;
> for (i = 2; i < length; ++i) {
> buffer[i] = start + i*delta;
> }
> }
> ```
>
> So, the second value is computed using the delta arange was given, but
> then tries to get the delta back, which incurs errors:
> ```
> >>> a = np.float32(-115)
> >>> delta = np.float32(0.01)
> >>> b = a + delta
> >>> new_delta = b - a
> >>> "%.16f" % delta
> '0.009997764826'
> >>> "%.16f" % new_delta
> '0.0100021362304688'
> ```
>
> Also, right there is a good example of where the use of fma() could be of
> value.
>
> Cheers!
> Ben Root
>
>
> On Fri, Feb 9, 2018 at 4:56 PM, Eric Wieser 
> wrote:
>
>> Can’t arange and linspace operations with floats be done internally
>>
>> Yes, and they probably should be - they’re done this way as a hack
>> because the api exposed for custom dtypes is here
>> ,
>> (example implementation here
>> )
>> - essentially, you give it the first two elements of the array, and ask it
>> to fill in the rest.
>> ​
>>
>> On Fri, 9 Feb 2018 at 13:17 Matthew Harrigan 
>> wrote:
>>
>>> I apologize if I'm missing something basic, but why are floats being
>>> accumulated in the first place?  Can't arange and linspace operations with
>>> floats be done internally similar to `start + np.arange(num_steps) *
>>> step_size`?  I.e. always accumulate (really increment) integers to limit
>>> errors.
>>>
>>> On Fri, Feb 9, 2018 at 3:43 PM, Benjamin Root 
>>> wrote:
>>>


 On Fri, Feb 9, 2018 at 12:19 PM, Chris Barker 
 wrote:

> On Wed, Feb 7, 2018 at 12:09 AM, Ralf Gommers 
> wrote:
>>
>>  It is partly a plea for some development of numerically accurate
>>> functions for computing lat/lon grids from a combination of inputs: 
>>> bounds,
>>> counts, and resolutions.
>>>
>>
> Can you be more specific about what problems you've run into -- I work
> with lat-lon grids all the time, and have never had a problem.
>
> float32 degrees gives you about 1 meter accuracy or better, so I can
> see how losing a few digits might be an issue, though I would argue that
> you maybe shouldn't use float32 if you are worried about anything close to
> 1m accuracy... -- or shift to a relative coordinate system of some sort.
>

 The issue isn't so much the accuracy of the coordinates themselves. I
 am only worried about 1km resolution (which is approximately 0.01 degrees
 at mid-latitudes). My concern is with consistent *construction* of a
 coordinate grid with even spacing. As it stands right now. If I provide a
 corner coordinate, a resolution, and the number of pixels, the result is
 not terrible (indeed, this is the approach used by gdal/rasterio). If I
 have start/end coordinates and the number of pixels, the result is not bad,
 either (use linspace). But, if I have start/end coordinates and a
 resolution, then determining the number of pixels from that is actually
 tricky to get right in the general case, especially with float32 and large
 grids, and especially if the bounding box specified isn't exactly divisible
 by the resolution.


>
> I have been playing around with the decimal package a bit lately,
>>>
>>
> sigh. decimal is so often looked at a solution to a problem it isn't
> designed for. lat-lon is natively Sexagesimal -- maybe we need that dtype
> :-)
>
> what you get from decimal is variable precision -- maybe a binary
> variable precision lib is a better answer -- that would be a good thing to
> have easy access to in numpy, but in this case, if you want better 
> accuracy
> in a computation that will end up in float32, just use float64.
>

 I am not concerned about computing distances or anything like that, I
 am trying to properly construct my grid. I need consistent results
 regardless of which way the grid is specified (start/end/count,
 start/res/count, start/end/res). I have found that loading up the grid
 specs (using in a config file or command-line) using the Decimal class
 allows me to exactly and 

Re: [Numpy-discussion] Add guaranteed no-copy to array creation and reshape?

2018-12-29 Thread Matthias Geier
Hi Sebastian.

I don't have an opinion (yet) about this matter, but I have a question:

On Thu, Dec 27, 2018 at 12:30 AM Sebastian Berg wrote:

[...]

> new_arr = arr.reshape(new_shape)
> assert np.may_share_memory(arr, new_arr)
>
> # Which is sometimes -- but should not be -- written as:
> arr.shape = new_shape  # unnecessary container modification

[...]

Why is this discouraged?

Why do you call this "unnecessary container modification"?

I've used this idiom in the past for exactly those cases where I
wanted to make sure no copy is made.

And if we are not supposed to assign to arr.shape, why is it allowed
in the first place?

cheers,
Matthias
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Add guaranteed no-copy to array creation and reshape?

2018-12-30 Thread Matthias Geier
On Sat, Dec 29, 2018 at 6:00 PM Sebastian Berg wrote:
>
> On Sat, 2018-12-29 at 17:16 +0100, Matthias Geier wrote:
> > Hi Sebastian.
> >
> > I don't have an opinion (yet) about this matter, but I have a
> > question:
> >
> > On Thu, Dec 27, 2018 at 12:30 AM Sebastian Berg wrote:
> >
> > [...]
> >
> > > new_arr = arr.reshape(new_shape)
> > > assert np.may_share_memory(arr, new_arr)
> > >
> > > # Which is sometimes -- but should not be -- written as:
> > > arr.shape = new_shape  # unnecessary container modification
> >
> > [...]
> >
> > Why is this discouraged?
> >
> > Why do you call this "unnecessary container modification"?
> >
> > I've used this idiom in the past for exactly those cases where I
> > wanted to make sure no copy is made.
> >
> > And if we are not supposed to assign to arr.shape, why is it allowed
> > in the first place?
>
> Well, this may be a matter of taste, but say you have an object that
> stores an array:
>
> class MyObject:
> def __init__(self):
> self.myarr = some_array
>
>
> Now, lets say I do:
>
> def some_func(arr):
> # Do something with the array:
> arr.shape = -1
>
> myobject = MyObject()
> some_func(myobject)
>
> then myobject will suddenly have the wrong shape stored. In most cases
> this is harmless, but I truly believe this is exactly why we have views
> and why they are so awesome.
> The content of arrays is mutable, but the array object itself should
> not be muted normally.

Thanks for the example! I don't understand its point, though.
Also, it's not working since MyObject doesn't have a .shape attribute.

> There may be some corner cases, but a lot of the
> "than why is it allowed" questions are answered with: for history
> reasons.

OK, that's a good point.

> By the way, on error the `arr.shape = ...` code currently creates the
> copy temporarily.

That's interesting and it should probably be fixed.

But it is not reason enough for me not to use it.
I find it important that is doesn't make a copy in the success case, I
don't care very much for the error case.

Would you mind elaborating on the real reasons why I shouldn't use it?

cheers,
Matthias

>
> - Sebastian
>
>
> >
> > cheers,
> > Matthias
> > ___
> > NumPy-Discussion mailing list
> > NumPy-Discussion@python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
> >
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Add guaranteed no-copy to array creation and reshape?

2019-01-02 Thread Matthias Geier
Hi Sebastian.

Thanks for the clarification.

On Sun, Dec 30, 2018 at 5:25 PM Sebastian Berg wrote:
> On Sun, 2018-12-30 at 16:03 +0100, Matthias Geier wrote:
> > On Sat, Dec 29, 2018 at 6:00 PM Sebastian Berg wrote:
> > > On Sat, 2018-12-29 at 17:16 +0100, Matthias Geier wrote:
> > > > Hi Sebastian.
> > > >
> > > > I don't have an opinion (yet) about this matter, but I have a
> > > > question:
> > > >
> > > > On Thu, Dec 27, 2018 at 12:30 AM Sebastian Berg wrote:
> > > >
> > > > [...]
> > > >
> > > > > new_arr = arr.reshape(new_shape)
> > > > > assert np.may_share_memory(arr, new_arr)
> > > > >
> > > > > # Which is sometimes -- but should not be -- written as:
> > > > > arr.shape = new_shape  # unnecessary container modification
> > > >
> > > > [...]
> > > >
> > > > Why is this discouraged?
> > > >
> > > > Why do you call this "unnecessary container modification"?
> > > >
> > > > I've used this idiom in the past for exactly those cases where I
> > > > wanted to make sure no copy is made.
> > > >
> > > > And if we are not supposed to assign to arr.shape, why is it
> > > > allowed
> > > > in the first place?
> > >
> > > Well, this may be a matter of taste, but say you have an object
> > > that
> > > stores an array:
> > >
> > > class MyObject:
> > > def __init__(self):
> > > self.myarr = some_array
> > >
> > >
> > > Now, lets say I do:
> > >
> > > def some_func(arr):
> > > # Do something with the array:
> > > arr.shape = -1
> > >
> > > myobject = MyObject()
> > > some_func(myobject)
> > >
> > > then myobject will suddenly have the wrong shape stored. In most
> > > cases
> > > this is harmless, but I truly believe this is exactly why we have
> > > views
> > > and why they are so awesome.
> > > The content of arrays is mutable, but the array object itself
> > > should
> > > not be muted normally.
> >
> > Thanks for the example! I don't understand its point, though.
> > Also, it's not working since MyObject doesn't have a .shape
> > attribute.
> >
>
> The example should have called `some_func(myobject.arr)`. The thing is
> that if you have more references to the same array around, you change
> all their shapes. And if those other references are there for a reason,
> that is not what you want.
>
> That does not matter much in most cases, but it could change the shape
> of an array in a completely different place then intended. Creating a
> new view is cheap, so I think such things should be avoided.
>
> I admit, most code will effectively do:
> arr = input_arr[...]  # create a new view
> arr.shape = ...
>
> so that there is no danger. But conceptually, I do not think there
> should be a danger of magically changing the shape of a stored array in
> a different part of the code.
>
> Does that make some sense? Maybe shorter example:
>
> arr = np.arange(10)
> arr2 = arr
> arr2.shape = (5, 2)
>
> print(arr.shape)  # also (5, 2)
>
> so the arr container (shape, dtype) is changed/muted. I think we expect
> that for content here, but not for the shape.

Thanks for the clarification, I think I now understand your example.

However, the behavior you are describing is just like the normal
reference semantics of Python itself.

If you have multiple identifiers bound to the same (mutable) object,
you'll always have this "problem".

I think every Python user should be aware of this behavior, but I
don't think it is reason to discourage assigning to arr.shape.

Coming back to the original suggestion of this thread:
Since assigning to arr.shape makes sure no copy of the array data is
made, I don't think it's necessary to add a new no-copy argument to
reshape().

But the bug you mentioned ("on error the `arr.shape = ...` code
currently creates the copy temporarily") should probably be fixed at
some point ...

cheers,
Matthias

>
> - Sebastian
>
>
> > > There may be some corner cases, but a lot of the
> > > "than why is it allowed" questions are answered with: for history
> > > reasons.
> >
> > OK, that's a good point.
> >
> > > By the way, on error the `arr.shape = ...` code currently creates
> > > the
> > > copy temporarily.
> >

Re: [Numpy-discussion] Add guaranteed no-copy to array creation and reshape?

2019-01-07 Thread Matthias Geier
On Wed, Jan 2, 2019 at 2:24 PM Sebastian Berg wrote:
>
> On Wed, 2019-01-02 at 11:27 +0100, Matthias Geier wrote:
> > Hi Sebastian.
> >
> > Thanks for the clarification.
> >
> 
> > > print(arr.shape)  # also (5, 2)
> > >
> > > so the arr container (shape, dtype) is changed/muted. I think we
> > > expect
> > > that for content here, but not for the shape.
> >
> > Thanks for the clarification, I think I now understand your example.
> >
> > However, the behavior you are describing is just like the normal
> > reference semantics of Python itself.
> >
> > If you have multiple identifiers bound to the same (mutable) object,
> > you'll always have this "problem".
> >
> > I think every Python user should be aware of this behavior, but I
> > don't think it is reason to discourage assigning to arr.shape.
>
> Well, I doubt I will convince you.

I think we actually have quite little disagreement.

I agree with you on what should be done *most of the time*, but I
wouldn't totally discourage mutating NumPy array shapes, because I
think in the right circumstances it can be very useful.

> But want to point out that a numpy
> array is:
>
>   * underlying data
>   * shape/strides (pointing to the exact data)
>   * data type (interpret the data)
>
> Arrays are mutable, but this is only half true from my perspective.
> Everyone using numpy should be aware of "views", i.e. that the content
> of the underlying data can change.

I agree, everyone should be aware of that.

> However, if I have a read-only array, and pass it around, I would not
> expect it to change. That is because while the underlying data is
> muted, how this data is accessed and interpreted is not.
>
> In other words, I see array objects as having two sides to them [0]:
>
>   * Underlying data   -> normally mutable and often muted
>   * container:-> not muted by almost all code
>   * shape/strides
>   * data type

Exactly: "almost all code".

Most of the time I would not assign to arr.shape, but in some rare
occasions I find it very useful.

And one of those rare occasions is when you want guaranteed no-copy behavior.

There are also some (most likely significantly rarer) cases where I
would modify arr.strides.

> I realize that in some cases muting the container metadata happens. But
> I do believe it should be as minimal as possible. And frankly, probably
> one could do away with it completely.

I guess that's the only point where we disagree.

I wouldn't completely discourage it and I would definitely not remove
the functionality.

> Another example for where it is bad would be a threaded environment. If
> a python function temporarily changes the shape of an array to read
> from it without creating a view first, this will break multi-threaded
> access to that array.

Sure, let's not use it while multi-threading then.

I still think that's not at all a reason to remove the feature.

There are some things that are problematic when multi-threading, but
that's typically not reason enough to completely disallow them.

cheers,
Matthias

>
> - Sebastian
>
>
> [0] I tried to find other examples for such a split. Maybe a
> categorical/state object which is allowed change value/state. But the
> list of possible states cannot change.
>
>
> > Coming back to the original suggestion of this thread:
> > Since assigning to arr.shape makes sure no copy of the array data is
> > made, I don't think it's necessary to add a new no-copy argument to
> > reshape().
> >
> > But the bug you mentioned ("on error the `arr.shape = ...` code
> > currently creates the copy temporarily") should probably be fixed at
> > some point ...
> >
> > cheers,
> > Matthias
> >
> > > - Sebastian
> > >
> > >
> > > > > There may be some corner cases, but a lot of the
> > > > > "than why is it allowed" questions are answered with: for
> > > > > history
> > > > > reasons.
> > > >
> > > > OK, that's a good point.
> > > >
> > > > > By the way, on error the `arr.shape = ...` code currently
> > > > > creates
> > > > > the
> > > > > copy temporarily.
> > > >
> > > > That's interesting and it should probably be fixed.
> > > >
> > > > But it is not reason enough for me not to use it.
> > > > I find it important that is doesn't make a copy in the success
> > > > case,
> > > > I
> > > > don't care very much for the error case.
> > > >
> > > > Would you mind elaborating on the real reasons why I shouldn't
> > > > use
> > > > it?
> > > >
> > > > cheers,
> > > > Matthias
> > > >
> > > > > - Sebastian
> > > > >
> > > > >
> > > > > > cheers,
> > > > > > Matthias
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion