[python-uk] Job post

2017-06-13 Thread Safe Hammad
Hi All,

In case you might be interested, my company Arctic Shores is looking for a
Senior Python Developer to join me and my team in sunny Manchester. Full
details here:

http://pythonjobs.github.io/jobs/arctic-shores-senior-python-developer.html

If you've ever yearned to work in the Beautiful North and you're interested
in finding out more, please get in touch with me directly.

Best,

Safe

--
Safe Hammad
s...@arcticshores.com
___
python-uk mailing list
python-uk@python.org
https://mail.python.org/mailman/listinfo/python-uk


Re: [python-uk] A stack with better performance than using a list

2017-06-13 Thread Jonathan Hartley

You are right, when popping an empty stack I should probably raise.



On 2017-06-08 13:06, Samuel F wrote:

It may have failed for a different reason, (hard to say without the
original question and answer).

In the case where the stack is empty, you are returning None, was that
the requirement? (Likely to have been -1)

Sam

On Thu, 8 Jun 2017 at 17:27, Jonathan Hartley 
wrote:


Yep, that's a great elimination of the suspicious small overheads.

line_profiler is beautiful, I'll definitely be adding it to my
toolbox, thanks for that!

I tried a variant of accumulating the output and printing it all as
a single string, but of course this didn't help, printing is already
buffered.

Jonathan

On 6/8/2017 03:54, Stestagg wrote:

I honestly can't see a way to improve this in python.  My best
solution is:

def main(lines):
stack = []
sa = stack.append
sp = stack.pop
si = stack.__getitem__
for line in lines:
meth = line[:3]
if meth == b'pus':
sa(int(line[5:]))
elif meth == b'pop':
sp()
else:
parts = line[15:].split()
end = len(stack)-1
amount = int(parts[1])
for x in range(int(parts[0])):
index = end - x
stack[index] += amount
print(stack[-1] if stack else None)

which comes out about 25% faster than your solution.

One tool that's interesting to use here is: line_profiler:
https://github.com/rkern/line_profiler

putting a @profile decorator on the above main() call, and running
with kernprof produces the following output:

Line #  Hits Time  Per Hit   % Time  Line Contents

==

12   @profile

13   def main(lines):

14 14  4.0  0.0  stack = []

15   201   949599  0.5 11.5  for line in
lines:

16   200  1126944  0.6 13.7  meth =
line[:3]

17   200   974635  0.5 11.8  if meth ==
b'pus':

18   100  1002733  1.0 12.2
stack.append(int(line[5:]))

19   100   478756  0.5  5.8  elif meth
== b'pop':

2099   597114  0.6  7.2
stack.pop()

21   else:

22 16  6.0  0.0  parts =
line[15:].split()

23 12  2.0  0.0  end =
len(stack)-1

24 11  1.0  0.0  amount
= int(parts[1])

2551   241227  0.5  2.9  for x
in range(int(parts[0])):

2650   273477  0.5  3.3
index = end - x

2750   309033  0.6  3.7
stack[index] += amount

28   200  2295803  1.1 27.8
print(stack[-1])

which shows that there's no obvious bottleneck (line by line) here
(for my sample data).

Note the print() overhead dominates the runtime, and that's with me
piping the output to /dev/null directly.

I had a go at using arrays, deques, and numpy arrays in various ways
without luck, but we're getting fairly close to the native python
statement execution overhead here (hence folding it all into one
function).

My only thoughts would be to see if there were some magic that could
be done by offloading the work onto a non-python library somehow.

Another thing that might help some situations (hence my previous
questions) would be to implement the add_to_first_n as a lazy
operator (i.e. have a stack of the add_to_first_n values and
dynamically add to the results of pop() but that would proabably be
much slow in the average case.

Steve

On Wed, Jun 7, 2017 at 7:34 PM Jonathan Hartley
 wrote:

Hey.

Thanks for engaging, but I can't help with the most important of
those questions - the large data sets on which my solution failed
due to timeout are hidden from candidates. Not unreasonable to
assume that they do exercise deep stacks, and large args to
add_to_first_n, etc.

Yes, the input looks exactly like your example. All args are
integers. The question asked for output corresponding to the top of
the stack after every operation. I omitted this print from inside
the 'for' loop in 'main', thinking it irrelevant.

I converted to integers inside 'dispatch'. 'args' must have actually
been created with:

args = [int(i) for i in tokens[1:]]

Where len(tokens) is never going to be bigger than 3.

Return values (from 'pop') were unused.

On 6/7/2017 13:25, Stestagg wrote:

Do you have any more context?
For example, is the add_to_first_n likely to be called with very
large numbers, or very often? Does the stack get very deep, or stay
shallow?

I'm assuming that lines look like this:

push 1
push 2
add_to_first_n 2 10
pop
pop

with all arguments as integers, and the final value being returned
from main()?
How did you convert from string inputs to numeric values?
How did you manage return values?

:D

On Wed, Jun 7, 2017 at 6:51 PM Jonathan Hartley
 wrote:

I recently submitted a solution to a coding challenge, in an
employment c

Re: [python-uk] A stack with better performance than using a list

2017-06-13 Thread Jonathan Hartley


Very interesting! Thanks for digging deeper and sharing.

I was thinking about horrible complicated structures like storing the 
'add_to_first_n' params in parallel to the stack, to apply them at 'pop' 
time, which doesn't work at all. As is so often the case with these 
things, your solution of pushing those markers onto the stack makes me 
feel silly for not realising sooner.


Thanks to everyone for the interesting discussion.

Jonathan



On 2017-06-08 13:17, Stestagg wrote:

I tracked down the challenge on the site, and have a working solution
(I won't share for obvious reasons). Basically the timeouts were being
caused by 'add_to_first_n' being called in horrible ways in the test
cases.

Because add_to_first_n alters the bottom of the stack, you can just
push a marker onto the stack rather than iterating and mutating each
entry, doing this made those test cases pass

Personally, I think it's not a well-described problem, because it's
expecting you to tune the algo to specific shapes of data without
allowing any visibility on the data, or a description of what to code
for.  An algo junkie may jump straight to the optimized version, but a
pragmatic developer would, in my opinion, hesitate to do that without
any actual evidence that the problem required it.

Steve

On Thu, Jun 8, 2017 at 5:27 PM Jonathan Hartley 
wrote:


Yep, that's a great elimination of the suspicious small overheads.

line_profiler is beautiful, I'll definitely be adding it to my
toolbox, thanks for that!

I tried a variant of accumulating the output and printing it all as
a single string, but of course this didn't help, printing is already
buffered.

Jonathan

On 6/8/2017 03:54, Stestagg wrote:

I honestly can't see a way to improve this in python.  My best
solution is:

def main(lines):
stack = []
sa = stack.append
sp = stack.pop
si = stack.__getitem__
for line in lines:
meth = line[:3]
if meth == b'pus':
sa(int(line[5:]))
elif meth == b'pop':
sp()
else:
parts = line[15:].split()
end = len(stack)-1
amount = int(parts[1])
for x in range(int(parts[0])):
index = end - x
stack[index] += amount
print(stack[-1] if stack else None)

which comes out about 25% faster than your solution.

One tool that's interesting to use here is: line_profiler:
https://github.com/rkern/line_profiler

putting a @profile decorator on the above main() call, and running
with kernprof produces the following output:

Line #  Hits Time  Per Hit   % Time  Line Contents

==

12   @profile

13   def main(lines):

14 14  4.0  0.0  stack = []

15   201   949599  0.5 11.5  for line in
lines:

16   200  1126944  0.6 13.7  meth =
line[:3]

17   200   974635  0.5 11.8  if meth ==
b'pus':

18   100  1002733  1.0 12.2
stack.append(int(line[5:]))

19   100   478756  0.5  5.8  elif meth
== b'pop':

2099   597114  0.6  7.2
stack.pop()

21   else:

22 16  6.0  0.0  parts =
line[15:].split()

23 12  2.0  0.0  end =
len(stack)-1

24 11  1.0  0.0  amount
= int(parts[1])

2551   241227  0.5  2.9  for x
in range(int(parts[0])):

2650   273477  0.5  3.3
index = end - x

2750   309033  0.6  3.7
stack[index] += amount

28   200  2295803  1.1 27.8
print(stack[-1])

which shows that there's no obvious bottleneck (line by line) here
(for my sample data).

Note the print() overhead dominates the runtime, and that's with me
piping the output to /dev/null directly.

I had a go at using arrays, deques, and numpy arrays in various ways
without luck, but we're getting fairly close to the native python
statement execution overhead here (hence folding it all into one
function).

My only thoughts would be to see if there were some magic that could
be done by offloading the work onto a non-python library somehow.

Another thing that might help some situations (hence my previous
questions) would be to implement the add_to_first_n as a lazy
operator (i.e. have a stack of the add_to_first_n values and
dynamically add to the results of pop() but that would proabably be
much slow in the average case.

Steve

On Wed, Jun 7, 2017 at 7:34 PM Jonathan Hartley
 wrote:

Hey.

Thanks for engaging, but I can't help with the most important of
those questions - the large data sets on which my solution failed
due to timeout are hidden from candidates. Not unreasonable to
assume that they do exercise deep stacks, and large args to
add_to_first_n, etc.

Yes, the input looks exactly like your example. All args are
integers. The question asked for output

Re: [python-uk] A stack with better performance than using a list

2017-06-13 Thread Mark Lawrence via python-uk

On 07/06/2017 18:50, Jonathan Hartley wrote:
I recently submitted a solution to a coding challenge, in an employment 
context. One of the questions was to model a simple stack. I wrote a 
solution which appended and popped from the end of a list. This worked, 
but failed with timeouts on their last few automated tests with large 
(hidden) data sets.


 From memory, I think I had something pretty standard:

class Stack:

def __init__(self):
 self.storage = []

def push(arg):
 self.storage.append(arg)

def pop():
return self.storage.pop() if self.storage else None

def add_to_first_n(n, amount):
for n in range(n):
 self.storage[n] += amount

def dispatch(self, line)
 tokens = line.split()
 method = getattr(self, tokens[0])
 args = tokens[1:]
 method(*args)

def main(lines):
 stack = Stack()
for line in lines:
 stack.dispatch(line)


(will that formatting survive? Apologies if not)

Subsequent experiments have confirmed that appending to and popping from 
the end of lists are O(1), amortized.


So why is my solution too slow?

This question was against the clock, 4th question of 4 in an hour. So I 
wasn't expecting to produce Cython or C optimised code in that timeframe 
(Besides, my submitted .py file runs on their servers, so the 
environment is limited.)


So what am I missing, from a performance perspective? Are there other 
data structures in stdlib which are also O(1) but with a better constant?


Ah. In writing this out, I have begun to suspect that my slicing of 
'tokens' to produce 'args' in the dispatch is needlessly wasting time. 
Not much, but some.


Thoughts welcome,

 Jonathan

--
Jonathan hartleytart...@tartley.com http://tartley.com
Made out of meat.   +1 507-513-1101twitter/skype: tartley



Any objections to me putting this thread up on the main Python mailing 
list and reddit as it seems rather interesting?


--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence

___
python-uk mailing list
python-uk@python.org
https://mail.python.org/mailman/listinfo/python-uk


Re: [python-uk] A stack with better performance than using a list

2017-06-13 Thread Jonathan Hartley


On 06/13/2017 09:04 AM, Mark Lawrence via python-uk wrote:

On 07/06/2017 18:50, Jonathan Hartley wrote:
I recently submitted a solution to a coding challenge, in an 
employment context. One of the questions was to model a simple stack. 
I wrote a solution which appended and popped from the end of a list. 
This worked, but failed with timeouts on their last few automated 
tests with large (hidden) data sets.


 From memory, I think I had something pretty standard:

class Stack:

def __init__(self):
 self.storage = []

def push(arg):
 self.storage.append(arg)

def pop():
return self.storage.pop() if self.storage else None

def add_to_first_n(n, amount):
for n in range(n):
 self.storage[n] += amount

def dispatch(self, line)
 tokens = line.split()
 method = getattr(self, tokens[0])
 args = tokens[1:]
 method(*args)

def main(lines):
 stack = Stack()
for line in lines:
 stack.dispatch(line)


(will that formatting survive? Apologies if not)

Subsequent experiments have confirmed that appending to and popping 
from the end of lists are O(1), amortized.


So why is my solution too slow?

This question was against the clock, 4th question of 4 in an hour. So 
I wasn't expecting to produce Cython or C optimised code in that 
timeframe (Besides, my submitted .py file runs on their servers, so 
the environment is limited.)


So what am I missing, from a performance perspective? Are there other 
data structures in stdlib which are also O(1) but with a better 
constant?


Ah. In writing this out, I have begun to suspect that my slicing of 
'tokens' to produce 'args' in the dispatch is needlessly wasting 
time. Not much, but some.


Thoughts welcome,

 Jonathan

--
Jonathan hartleytart...@tartley.com http://tartley.com
Made out of meat.   +1 507-513-1101twitter/skype: tartley



Any objections to me putting this thread up on the main Python mailing 
list and reddit as it seems rather interesting?




I'd rather not if I get any say in that, because I agreed in the T&C of 
the coding challenge that I wouldn't discuss the problem or solutions 
with others, and I don't want to annoy them right now. How about in a 
month? :-)



--
Jonathan Hartleytart...@tartley.comhttp://tartley.com
Made out of meat.   +1 507-513-1101twitter/skype: tartley

___
python-uk mailing list
python-uk@python.org
https://mail.python.org/mailman/listinfo/python-uk


Re: [python-uk] A stack with better performance than using a list

2017-06-13 Thread Mark Lawrence via python-uk

On 13/06/2017 16:29, Jonathan Hartley wrote:


On 06/13/2017 09:04 AM, Mark Lawrence via python-uk wrote:

On 07/06/2017 18:50, Jonathan Hartley wrote:
I recently submitted a solution to a coding challenge, in an 
employment context. One of the questions was to model a simple stack. 
I wrote a solution which appended and popped from the end of a list. 
This worked, but failed with timeouts on their last few automated 
tests with large (hidden) data sets.


 From memory, I think I had something pretty standard:

class Stack:

def __init__(self):
 self.storage = []

def push(arg):
 self.storage.append(arg)

def pop():
return self.storage.pop() if self.storage else None

def add_to_first_n(n, amount):
for n in range(n):
 self.storage[n] += amount

def dispatch(self, line)
 tokens = line.split()
 method = getattr(self, tokens[0])
 args = tokens[1:]
 method(*args)

def main(lines):
 stack = Stack()
for line in lines:
 stack.dispatch(line)


(will that formatting survive? Apologies if not)

Subsequent experiments have confirmed that appending to and popping 
from the end of lists are O(1), amortized.


So why is my solution too slow?

This question was against the clock, 4th question of 4 in an hour. So 
I wasn't expecting to produce Cython or C optimised code in that 
timeframe (Besides, my submitted .py file runs on their servers, so 
the environment is limited.)


So what am I missing, from a performance perspective? Are there other 
data structures in stdlib which are also O(1) but with a better 
constant?


Ah. In writing this out, I have begun to suspect that my slicing of 
'tokens' to produce 'args' in the dispatch is needlessly wasting 
time. Not much, but some.


Thoughts welcome,

 Jonathan

--
Jonathan hartleytart...@tartley.com http://tartley.com
Made out of meat.   +1 507-513-1101twitter/skype: tartley



Any objections to me putting this thread up on the main Python mailing 
list and reddit as it seems rather interesting?




I'd rather not if I get any say in that, because I agreed in the T&C of 
the coding challenge that I wouldn't discuss the problem or solutions 
with others, and I don't want to annoy them right now. How about in a 
month? :-)



Fine by me, on my calendar for 13th July :-)

--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence

___
python-uk mailing list
python-uk@python.org
https://mail.python.org/mailman/listinfo/python-uk