Re: sorting 1172026 entries

2012-05-07 Thread Ian Kelly
On Mon, May 7, 2012 at 3:52 PM, Cameron Simpson wrote:
> | (or add 50% or something) each
> | time, meaning that as n increases, the frequency of reallocations
> | decreases - hence the O(1) amortized time.
>
> Hmm, yes. But it is only O(1) for doubling. If one went with a smaller
> increment (to
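The growth-policy question in this exchange can be checked with a small simulation. The sketch below is illustrative only: `copies_for_appends` and the three growth callbacks are made-up names, the worst case is assumed (every reallocation copies every element), and CPython's real over-allocation policy differs from these factors. It suggests that any constant growth factor greater than 1 keeps total copying proportional to n, while a fixed additive step does not.

```python
def copies_for_appends(n, grow):
    """Count element copies while appending n items to a simulated
    dynamic array whose capacity is enlarged by the `grow` callback."""
    capacity = 0
    size = 0
    copied = 0
    for _ in range(n):
        if size == capacity:
            # Always grow by at least one slot so tiny capacities make progress.
            capacity = max(grow(capacity), capacity + 1)
            # Worst case assumed: the reallocation copies every existing element.
            copied += size
        size += 1
    return copied


n = 100_000
doubling = copies_for_appends(n, lambda c: c * 2)         # factor 2
half_again = copies_for_appends(n, lambda c: c + c // 2)  # factor 1.5
fixed_step = copies_for_appends(n, lambda c: c + 100)     # additive step

print("doubling  :", doubling)
print("factor 1.5:", half_again)
print("fixed +100:", fixed_step)
```

With geometric growth the copy count stays within a small constant multiple of n; with the fixed step it grows roughly as n squared divided by the step size.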

Re: sorting 1172026 entries

2012-05-07 Thread Cameron Simpson
On 07May2012 11:02, Chris Angelico wrote:
| On Mon, May 7, 2012 at 10:31 AM, Cameron Simpson wrote:
| > I didn't mean per .append() call (which I'd expect to be O(n) for large
| > n), I meant overall for the completed list.
| >
| > Don't the realloc()s make it O(n^2) overall for large n? The list

Re: sorting 1172026 entries

2012-05-06 Thread Chris Angelico
On Mon, May 7, 2012 at 10:31 AM, Cameron Simpson wrote:
> I didn't mean per .append() call (which I'd expect to be O(n) for large
> n), I meant overall for the completed list.
>
> Don't the realloc()s make it O(n^2) overall for large n? The list
> must get copied when the underlying space fills.

I

Re: sorting 1172026 entries

2012-05-06 Thread Cameron Simpson
On 06May2012 17:10, Chris Rebert wrote:
| On Sun, May 6, 2012 at 4:54 PM, Cameron Simpson wrote:
| > On 06May2012 18:36, J. Mwebaze wrote:
| > | > for filename in txtfiles:
| > | >    temp=[]
| > | >    f=open(filename)
| > | >    for line in f.readlines():
| > | >      line = line.strip()
| > |

Re: sorting 1172026 entries

2012-05-06 Thread Chris Rebert
On Sun, May 6, 2012 at 4:54 PM, Cameron Simpson wrote:
> On 06May2012 18:36, J. Mwebaze wrote:
> | > for filename in txtfiles:
> | >    temp=[]
> | >    f=open(filename)
> | >    for line in f.readlines():
> | >      line = line.strip()
> | >      line=line.split()
> | >      temp.append((parser.

Re: sorting 1172026 entries

2012-05-06 Thread Cameron Simpson
On 06May2012 18:36, J. Mwebaze wrote:
| > for filename in txtfiles:
| >    temp=[]
| >    f=open(filename)
| >    for line in f.readlines():
| >      line = line.strip()
| >      line=line.split()
| >      temp.append((parser.parse(line[0]), float(line[1])))

Have you timed the different parts of
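One way to time the parsing and sorting stages separately is sketched below. It is a rough illustration, not the poster's program: the data is synthetic, `make_lines`, `parse_lines` and `timed` are made-up helpers, and the standard library's `datetime.strptime` stands in for `dateutil.parser.parse` so the example has no third-party dependency.

```python
import time
from datetime import datetime, timedelta


def make_lines(n):
    """Build n synthetic 'timestamp value' lines in descending time order."""
    start = datetime(2012, 1, 1)
    return [
        f"{(start + timedelta(seconds=n - i)).isoformat()} {i * 0.5}"
        for i in range(n)
    ]


def parse_lines(lines):
    """Turn each line into a (datetime, float) tuple."""
    records = []
    for line in lines:
        stamp, value = line.split()
        records.append((datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S"), float(value)))
    return records


def timed(label, func, *args):
    """Run func(*args), print the elapsed wall-clock time, return both."""
    t0 = time.perf_counter()
    result = func(*args)
    elapsed = time.perf_counter() - t0
    print(f"{label}: {elapsed:.3f}s")
    return result, elapsed


lines = make_lines(20_000)
records, parse_time = timed("parse", parse_lines, lines)
ordered, sort_time = timed("sort", sorted, records)
```

Comparing the two printed figures shows which stage dominates for a given input size.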

Re: sorting 1172026 entries

2012-05-06 Thread Dan Stromberg
How much physical RAM (not the virtual memory, but the physical memory) does your machine have available? We know the number of elements in your dataset, but how big are the individual elements? If a sort is never completing, you're probably swapping. list.sort() is preferable to sorted(list),
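A rough way to approach the "how big are the individual elements?" question is sketched below. It is only an estimate under stated assumptions: each record is taken to be a `(datetime, float)` tuple as in the code posted elsewhere in the thread, `approx_record_bytes` is a made-up helper, `sys.getsizeof` reports shallow sizes that vary by Python build, and the 8 bytes per list slot assumes a 64-bit interpreter.

```python
import sys
from datetime import datetime


def approx_record_bytes(record):
    """Shallow size of the tuple plus the shallow size of each item in it."""
    return sys.getsizeof(record) + sum(sys.getsizeof(item) for item in record)


sample = (datetime(2012, 5, 6, 18, 36), 1.5)
per_record = approx_record_bytes(sample)

n = 1172026
pointer_bytes = 8  # assumption: one 8-byte list slot per record (64-bit build)
estimate_mib = n * (per_record + pointer_bytes) / 2**20

print(f"about {per_record} bytes per record, roughly {estimate_mib:.0f} MiB in total")
```

If the estimate is far below the machine's physical RAM, swapping is less likely to be the explanation and the time is probably going somewhere else.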

Re: sorting 1172026 entries

2012-05-06 Thread Mark Lawrence
On 06/05/2012 20:11, Alec Taylor wrote:
> Also, is there a reason you are sorting the data-set after insert rather
> than using a self-sorting data-structure? A well chosen self-sorting
> data-structure is always more efficient when full data flow is
> controlled. I.e.: first insert can be modified to

Re: sorting 1172026 entries

2012-05-06 Thread Alec Taylor
Also, is there a reason you are sorting the data-set after insert rather than using a self-sorting data-structure? A well chosen self-sorting data-structure is always more efficient when full data flow is controlled. I.e.: first insert can be modified to use the self-sorting data-structure I can
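For comparison, here is a minimal sketch of keeping a plain list ordered on every insert with the standard `bisect` module, next to sorting once at the end. This is only one possible reading of "self-sorting data-structure"; the post does not name a specific one. Note that `bisect.insort` finds the position in O(log n) but still shifts list elements, so each insert costs O(n).

```python
import bisect
import random

values = [random.random() for _ in range(5000)]

# Option 1: keep the list ordered after every single insert.
kept_sorted = []
for v in values:
    bisect.insort(kept_sorted, v)

# Option 2: collect everything first, then sort once.
sorted_once = sorted(values)

print(kept_sorted == sorted_once)
```

Both routes produce the same ordering; which one is cheaper depends on the data structure chosen and on whether the data is needed in order before loading has finished.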

Re: sorting 1172026 entries

2012-05-06 Thread Stefan Behnel
J. Mwebaze, 06.05.2012 18:29:
> sorry see, corrected code
>
> for filename in txtfiles:
>    temp=[]
>    f=open(filename)
>    for line in f.readlines():
>      line = line.strip()
>      line=line.split()
>      temp.append((parser.parse(line[0]), float(line[1])))
>    temp=sorted(temp)
>    wit

Re: sorting 1172026 entries

2012-05-06 Thread Chris Rebert
On Sun, May 6, 2012 at 9:29 AM, J. Mwebaze wrote:
> sorry see, corrected code
>
>
> for filename in txtfiles:
>    temp=[]
>    f=open(filename)

Why not use `with` here too?

>    for line in f.readlines():

readlines() reads *the entire file contents* into memory all at once!
Use `for line in f:
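The two suggestions in this reply (open the input with `with`, and iterate over the file object instead of calling readlines()) might look like the sketch below. `load_records` is a made-up helper name, the timestamp column is kept as a plain string to avoid a third-party dependency, and the demo writes its own small temporary file.

```python
import os
import tempfile


def load_records(path):
    """Read 'timestamp value' lines one at a time; the file closes on exit."""
    records = []
    with open(path) as f:
        for line in f:  # reads lazily, one line per iteration
            fields = line.split()
            if len(fields) < 2:
                continue  # skip blank or malformed lines
            records.append((fields[0], float(fields[1])))
    return records


# Small demonstration file (made-up contents).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("2012-05-06T18:36:00 2.5\n\n2012-05-06T09:29:00 1.0\n")
    demo_path = tmp.name

demo_records = load_records(demo_path)
os.remove(demo_path)
print(demo_records)
```

Iterating over the file object keeps only the current line plus the growing result list in memory, rather than holding a second full copy of the file as a list of strings.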

Re: sorting 1172026 entries

2012-05-06 Thread xDog Walker
On Sunday 2012 May 06 09:29, J. Mwebaze wrote:
>  temp=sorted(temp)

Change to:

temp.sort()

RTFM on sorted() and .sort().
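The difference between the two calls can be seen in a few lines (the variable names here are illustrative): sorted() builds and returns a new list, while list.sort() reorders the existing list in place and returns None.

```python
data = [3, 1, 2]

new_list = sorted(data)     # new list; `data` is left untouched
print(data, new_list)       # [3, 1, 2] [1, 2, 3]

return_value = data.sort()  # sorts in place; no second list is created
print(data, return_value)   # [1, 2, 3] None
```

For a list of about a million records, the in-place form avoids holding two full lists of references at the same time.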

Re: sorting 1172026 entries

2012-05-06 Thread Gary Herron
On 05/06/2012 09:29 AM, J. Mwebaze wrote:
> sorry see, corrected code
>
> for filename in txtfiles:
>    temp=[]
>    f=open(filename)
>    for line in f.readlines():
>      line = line.strip()
>      line=line.split()
>      temp.append((parser.parse(line[0]), float(line[1])))
>    temp=sorted(temp)
>    with open(

Re: sorting 1172026 entries

2012-05-06 Thread J. Mwebaze
I noticed the error in the code; please ignore this post.

On Sun, May 6, 2012 at 6:29 PM, J. Mwebaze wrote:
> sorry see, corrected code
>
>
> for filename in txtfiles:
>    temp=[]
>    f=open(filename)
>    for line in f.readlines():
>      line = line.strip()
>      line=line.split()
>      temp.a

Re: sorting 1172026 entries

2012-05-06 Thread J. Mwebaze
sorry see, corrected code

for filename in txtfiles:
   temp=[]
   f=open(filename)
   for line in f.readlines():
     line = line.strip()
     line=line.split()
     temp.append((parser.parse(line[0]), float(line[1])))
   temp=sorted(temp)
   with open(filename.strip('.txt')+ '.sorted', 'wb') as
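A side note on the output filename in the snippet above: str.strip('.txt') does not remove a ".txt" suffix. It removes any leading or trailing characters drawn from the set '.', 't', 'x', which can eat part of the base name. A small sketch with a made-up filename, using os.path.splitext as one possible alternative:

```python
import os.path

filename = "test.txt"  # hypothetical filename, for illustration only

stripped = filename.strip(".txt")       # strips characters, not a suffix
base = os.path.splitext(filename)[0]    # drops only the extension

print(stripped)           # es
print(base + ".sorted")   # test.sorted
```

Filenames that neither start nor end their base name with those characters happen to survive strip(), which can hide the problem.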

Re: sorting 1172026 entries

2012-05-06 Thread J. Mwebaze
I have attached one of the files, try to sort and let me know the results.
Kindly sort by date. Oops, I am told the file exceeds 25M.

Below is the code:

import glob
txtfiles =glob.glob('*.txt')
import dateutil.parser as parser

for filename in txtfiles:
   temp=[]
   f=open(filename)
   for line

Re: sorting 1172026 entries

2012-05-06 Thread Devin Jeanpierre
On Sun, May 6, 2012 at 12:11 PM, J. Mwebaze wrote:
> [ (datetime, int) ] * 1172026

I can't duplicate slowness. It finishes fairly quickly here. Maybe you could try posting specific code? It might be something else that is making your program take forever.

>>> x = [(datetime.datetime.now() + date

Re: sorting 1172026 entries

2012-05-06 Thread J. Mwebaze
On Sun, May 6, 2012 at 6:09 PM, Devin Jeanpierre wrote:
> On Sun, May 6, 2012 at 11:57 AM, J. Mwebaze wrote:
> > I have several lists with approx 1172026 entries. I have been trying to sort
> > the records, but have failed. I tried lists.sort(); I also tried sorted,
> > Python's inbuilt method.

Re: sorting 1172026 entries

2012-05-06 Thread J. Mwebaze
On Sun, May 6, 2012 at 6:07 PM, Benjamin Schollnick wrote:
>
> On May 6, 2012, at 11:57 AM, J. Mwebaze wrote:
>
> I have several lists with approx 1172026 entries. I have been trying to
> sort the records, but have failed. I tried lists.sort(); I also tried
> sorted, Python's inbuilt method. This

Re: sorting 1172026 entries

2012-05-06 Thread Devin Jeanpierre
On Sun, May 6, 2012 at 11:57 AM, J. Mwebaze wrote:
> I have several lists with approx 1172026 entries. I have been trying to sort
> the records, but have failed. I tried lists.sort(); I also tried sorted,
> Python's inbuilt method. This has been running for weeks.

Sorting 1172026 random floats ta
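The experiment mentioned in this reply can be reproduced with a sketch along these lines. The variable names are illustrative, and no timing figure is quoted here because the result depends entirely on the machine running it.

```python
import random
import time

n = 1172026
floats = [random.random() for _ in range(n)]

t0 = time.perf_counter()
floats.sort()  # in-place sort of the whole list
elapsed = time.perf_counter() - t0

print(f"sorted {n} random floats in {elapsed:.2f} seconds")
```

If this finishes quickly on the same machine that runs the slow program, the sort call itself is unlikely to be the bottleneck.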

Re: sorting 1172026 entries

2012-05-06 Thread Benjamin Schollnick
On May 6, 2012, at 11:57 AM, J. Mwebaze wrote:
> I have several lists with approx 1172026 entries. I have been trying to sort
> the records, but have failed. I tried lists.sort(); I also tried sorted,
> Python's inbuilt method. This has been running for weeks.
>
> Any one knows of method that

sorting 1172026 entries

2012-05-06 Thread J. Mwebaze
I have several lists with approx 1172026 entries. I have been trying to sort the records, but have failed. I tried lists.sort(); I also tried sorted, Python's inbuilt method. This has been running for weeks.

Does anyone know of a method that can handle such lists?

cheers