On Sat, Apr 2, 2016 at 6:32 AM Peter Otten <__pete...@web.de> wrote: > Vito De Tullio wrote: > > > Michael Selik wrote: > > > >>> > I need to scan a list of strings. If one of the elements matches the > >>> > beginning of a search keyword, that element needs to snap to the > front > >>> > of the list. > >>> > >>> I know this post regards the function passing, but, on you specific > >>> problem, > >>> can't you just ... sort the list with a custom key? > >> > >> If the number of matches is small relative to the size of the list, I'd > >> expect the sort would be slower than most of the other suggestions. > > > > umm... I take the liberty to set up a little test > > > > $ cat test_sort_prefix.py > > #!/usr/bin/env python3 > > > > from sys import argv > > from timeit import timeit > > > > > > def sort_in_place(l): > > def key(e): > > return not e.startswith('yes') > > l.sort(key=key) > > > > > > def original_solution(mylist): > > for i in range(len(mylist)): > > if(mylist[i].startswith('yes')): > > mylist[:] = [mylist[i]] + mylist[:i] + mylist[i+1:] > > original_solution() reverses the order of the matches while sort_in_place() > doesn't. If the order is not relevant I'd try something like > > def exchange(items): > pos = 0 > for index, item in enumerate(items): > if item.startswith("yes"): > items[pos], items[index] = item, items[pos] > pos += 1 > > > def main(): > > if argv[1] == 'sort_in_place': > > f = sort_in_place > > elif argv[1] == 'original_solution': > > f = original_solution > > else: > > raise Exception() > > > > nomatches = int(argv[2]) > > matches = int(argv[3]) > > > > l = ["no_" for _ in range(nomatches)] + ["yes_" for _ in > > range(matches)] > > Python's timsort knows how to deal with (partially) sorted lists. I suggest > that you add > > random.seed(42) # same list for all script invocations > random.shuffle(l) > > here to spread the matches somewhat over the list. > > > print(timeit("f(l)", number=100, globals={"f": f, "l": l})) > > Remember that f(l) modifies l, so all but the first invocation will see a > list where all matches are at the beginning. In other words: in 99 out of > 100 runs the list is already sorted. > > While adding the overhead of copying the list I would expect the results of > > timeit("f(l[:])", ...) > > to be more realistic. > > > if __name__ == '__main__': > > main() > > $ ./test_sort_prefix.py sort_in_place 1000000 3 > > 33.260575089996564 > > $ ./test_sort_prefix.py original_solution 1000000 3 > > 35.93009542999789 > > $ > > > > > > in my tests there is no sensible difference between the two solutions... > > and the number of matches is risible :) > > Indeed, as the number of matches rises your sort() approach will likely > fare > better. > > Your conclusions may be correct, but the benchmark to support them is > flawed. > > Option 1: scan and match, pop, insert Scan and match is O(n) pop is O(n) ... gotta shift everything over one after deleting insert is O(n) on a list, better use deque.appendleft for O(1)
Option 2: map keyfunc, sort Mapping the keyfunction is O(n) (tim)Sorting is O(n log n) worst case, O(n) best case. Option 1 using a deque will be O(n). Option 2 will be O(n log n) worst case and O(n) best case. On small datasets the constant factors matter, but then it's small, so who cares about speed? (partially joking) -- https://mail.python.org/mailman/listinfo/python-list