Re: Implementing file reading in C/Python

2009-01-23 Thread mk
John Machin wrote: The factor of 30 indeed does not seem right -- I have done somewhat similar stuff (calculating Levenshtein distance [edit distance] on words read from very large files), coded the same algorithm in pure Python and C++ (using linked lists in C++) and Python version was 2.5 times

Re: Implementing file reading in C/Python

2009-01-14 Thread David Bolen
Johannes Bauer writes: > Yup, I changed the Python code to behave the same way the C code did - > however overall it's not much of an improvement: Takes about 15 minutes > to execute (still factor 23). Not sure this is completely fair if you're only looking for a pure Python solution, but to be

Re: Implementing file reading in C/Python

2009-01-13 Thread Marc 'BlackJack' Rintsch
On Mon, 12 Jan 2009 21:26:27 -0500, Steve Holden wrote: > The very idea of mapping part of a process's virtual address space onto > an area in which "low-level system code resides, so writing to this > region may corrupt the system, with potentially catastrophic > consequences" seems to be asking

Re: Implementing file reading in C/Python

2009-01-12 Thread Grant Edwards
On 2009-01-13, Steve Holden wrote: > sturlamolden wrote: >> On Jan 12, 1:52 pm, Sion Arrowsmith >> wrote: >> >>> And today's moral is: try it before posting. Yeah, I can map a 2GB >>> file no problem, complete with associated 2GB+ allocated VM. The >>> addressing is clearly not working how I was

Re: Implementing file reading in C/Python

2009-01-12 Thread Steve Holden
sturlamolden wrote: > On Jan 12, 1:52 pm, Sion Arrowsmith > wrote: > >> And today's moral is: try it before posting. Yeah, I can map a 2GB >> file no problem, complete with associated 2GB+ allocated VM. The >> addressing is clearly not working how I was expecting it to. > > The virtual memory s

Re: Implementing file reading in C/Python

2009-01-12 Thread Grant Edwards
On 2009-01-12, Sion Arrowsmith wrote: > In case the cancel didn't get through: > > Sion Arrowsmith wrote: >>Grant Edwards wrote: >>>2GB should easily fit within the process's virtual memory >>>space. >>Assuming you're in a 64bit world. Me, I've only got 2GB of address >>space available to play

Re: Implementing file reading in C/Python

2009-01-12 Thread Grant Edwards
On 2009-01-12, Sion Arrowsmith wrote: > Grant Edwards wrote: >>On 2009-01-09, Sion Arrowsmith wrote: >>> Grant Edwards wrote: If I were you, I'd try mmap()ing the file instead of reading it into string objects one chunk at a time. >>> You've snipped the bit further on in that sentence

Re: Implementing file reading in C/Python

2009-01-12 Thread Hrvoje Niksic
sturlamolden writes: > On Jan 9, 6:41 pm, Sion Arrowsmith > wrote: > >> You've snipped the bit further on in that sentence where the OP >> says that the file of interest is 2GB. Do you still want to try >> mmap'ing it? > > Python's mmap object does not take an offset parameter. If it did, one >

Re: Implementing file reading in C/Python

2009-01-12 Thread sturlamolden
On Jan 12, 1:52 pm, Sion Arrowsmith wrote: > And today's moral is: try it before posting. Yeah, I can map a 2GB > file no problem, complete with associated 2GB+ allocated VM. The > addressing is clearly not working how I was expecting it to. The virtual memory space of a 32 bit process is 4 GB.

Re: Implementing file reading in C/Python

2009-01-12 Thread Sion Arrowsmith
In case the cancel didn't get through: Sion Arrowsmith wrote: >Grant Edwards wrote: >>2GB should easily fit within the process's virtual memory >>space. >Assuming you're in a 64bit world. Me, I've only got 2GB of address >space available to play in -- mmap'ing all of it out of the question. A

Re: Implementing file reading in C/Python

2009-01-12 Thread sturlamolden
On Jan 9, 6:41 pm, Sion Arrowsmith wrote: > You've snipped the bit further on in that sentence where the OP > says that the file of interest is 2GB. Do you still want to try > mmap'ing it? Python's mmap object does not take an offset parameter. If it did, one could mmap smaller portions of the f
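The idea of mapping smaller portions of a large file can be sketched as follows. This assumes a Python version whose `mmap.mmap()` accepts an `offset` argument (it was added after the version discussed in this message); the helper name `iter_windows` and the default window size are illustrative choices, not something from the thread.

```python
import mmap
import os


def iter_windows(path, window_size=16 * 1024 * 1024):
    """Yield (offset, window) pairs, mapping the file one window at a time.

    Each window is only valid until the next iteration, because it is
    unmapped before the following one is created.
    """
    # Offsets passed to mmap must be multiples of the allocation granularity,
    # so round the window size down to such a multiple (at least one granule).
    granularity = mmap.ALLOCATIONGRANULARITY
    window_size = max(granularity, window_size - window_size % granularity)

    file_size = os.path.getsize(path)
    with open(path, "rb") as f:
        offset = 0
        while offset < file_size:
            length = min(window_size, file_size - offset)
            with mmap.mmap(f.fileno(), length,
                           access=mmap.ACCESS_READ, offset=offset) as window:
                yield offset, window
            offset += length
```

Only one window is mapped at any time, so the address space needed is bounded by the window size rather than by the size of the file.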

Re: Implementing file reading in C/Python

2009-01-12 Thread Sion Arrowsmith
Grant Edwards wrote: >On 2009-01-09, Sion Arrowsmith wrote: >> Grant Edwards wrote: >>>If I were you, I'd try mmap()ing the file instead of reading it >>>into string objects one chunk at a time. >> You've snipped the bit further on in that sentence where the >> OP says that the file of interes

Re: Implementing file reading in C/Python

2009-01-10 Thread Francesco Bochicchio
On Fri, 09 Jan 2009 15:34:17 +, MRAB wrote: > Marc 'BlackJack' Rintsch wrote: >> On Fri, 09 Jan 2009 04:04:41 +0100, Johannes Bauer wrote: >> >>> As this was horribly slow (20 Minutes for a 2GB file) I coded the whole >>> thing in C also: >> >> Yours took ~37 minutes for 2 GiB here. This "j

Re: Implementing file reading in C/Python

2009-01-09 Thread Rhamphoryncus
On Jan 9, 2:14 pm, Marc 'BlackJack' Rintsch wrote: > On Fri, 09 Jan 2009 15:34:17 +, MRAB wrote: > > Marc 'BlackJack' Rintsch wrote: > > >> def iter_max_values(blocks, block_count): > >>     for i, block in enumerate(blocks): > >>         histogram = defaultdict(int) > >>         for byte in b

Re: Implementing file reading in C/Python

2009-01-09 Thread John Machin
On Jan 9, 9:56 pm, mk wrote: > The factor of 30 indeed does not seem right -- I have done somewhat > similar stuff (calculating Levenshtein distance [edit distance] on words > read from very large files), coded the same algorithm in pure Python and > C++ (using linked lists in C++) and Python ver

Re: Implementing file reading in C/Python

2009-01-09 Thread Grant Edwards
On 2009-01-09, Marc 'BlackJack' Rintsch wrote: > On Fri, 09 Jan 2009 15:34:17 +, MRAB wrote: > >> Marc 'BlackJack' Rintsch wrote: >> >>> def iter_max_values(blocks, block_count): >>> for i, block in enumerate(blocks): >>> histogram = defaultdict(int) >>> for byte in block:

Re: Implementing file reading in C/Python

2009-01-09 Thread Marc 'BlackJack' Rintsch
On Fri, 09 Jan 2009 15:34:17 +, MRAB wrote: > Marc 'BlackJack' Rintsch wrote: > >> def iter_max_values(blocks, block_count): >> for i, block in enumerate(blocks): >> histogram = defaultdict(int) >> for byte in block: >> histogram[byte] += 1 >> >>

Re: Implementing file reading in C/Python

2009-01-09 Thread Grant Edwards
On 2009-01-09, Sion Arrowsmith wrote: > Grant Edwards wrote: >>On 2009-01-09, Johannes Bauer wrote: >>> I've come from C/C++ and am now trying to code some Python because I >>> absolutely love the language. However I still have trouble getting >>> Python code to run efficiently. Right now I hav

Re: Implementing file reading in C/Python

2009-01-09 Thread Sion Arrowsmith
Grant Edwards wrote: >On 2009-01-09, Johannes Bauer wrote: >> I've come from C/C++ and am now trying to code some Python because I >> absolutely love the language. However I still have trouble getting >> Python code to run efficiently. Right now I have an easy task: Get a >> file, >If I were you,

Re: Implementing file reading in C/Python

2009-01-09 Thread bearophileHUGS
Johannes Bauer, I was about to start writing a faster version. I think with some care and Psyco you can get to about 5 times slower than C or something like that. To do that you need to use almost the same code as for the C version, with a list of 256 ints for the frequencies, not using max() but a ma
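A minimal sketch of the counting approach described here: a flat list of 256 integers as the frequency table, scanned with an explicit loop rather than `max()`. The message is cut off after "not using max() but", so the explicit scan is an assumption about what was meant. The function name `most_common_byte` is hypothetical, and the code is written for Python 3 `bytes`, whereas Psyco only targeted Python 2.

```python
def most_common_byte(block):
    """Return the most frequent byte value in a bytes object (0 if empty)."""
    counts = [0] * 256           # one slot per possible byte value
    for byte in block:           # iterating over bytes yields ints in Python 3
        counts[byte] += 1

    best_value = 0
    best_count = 0
    for value in range(256):     # explicit scan instead of max()
        if counts[value] > best_count:
            best_value = value
            best_count = counts[value]
    return best_value
```

On a tie the lowest byte value wins, because the scan only replaces the current best on a strictly larger count.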

Re: Implementing file reading in C/Python

2009-01-09 Thread Grant Edwards
On 2009-01-09, Johannes Bauer wrote: > I've come from C/C++ and am now trying to code some Python because I > absolutely love the language. However I still have trouble getting > Python code to run efficiently. Right now I have an easy task: Get a > file, If I were you, I'd try mmap()ing the file

Re: Implementing file reading in C/Python

2009-01-09 Thread rurpy
On Jan 9, 6:48 am, Johannes Bauer wrote: > mk wrote: > > The factor of 30 indeed does not seem right -- I have done somewhat > > similar stuff (calculating Levenshtein distance [edit distance] on words > > read from very large files), coded the same algorithm in pure Python and > > C++ (using li

Re: Implementing file reading in C/Python

2009-01-09 Thread MRAB
Marc 'BlackJack' Rintsch wrote: On Fri, 09 Jan 2009 04:04:41 +0100, Johannes Bauer wrote: As this was horribly slow (20 Minutes for a 2GB file) I coded the whole thing in C also: Yours took ~37 minutes for 2 GiB here. This "just" ~15 minutes: #!/usr/bin/env python from __future__ import div

Re: Implementing file reading in C/Python

2009-01-09 Thread pruebauno
On Jan 9, 8:48 am, Johannes Bauer wrote: > No - and I've not known there was a profiler yet have found anything > meaningful (there seems to be a profiling C interface, but that won't > get me anywhere). Is that a separate tool or something? Could you > provide a link? > Thanks, > Kind regards, >

Re: Implementing file reading in C/Python

2009-01-09 Thread Johannes Bauer
mk wrote: > Johannes Bauer wrote: > >> Which takes about 40 seconds. I want the niceness of Python but a little >> more speed than I'm getting (I'd settle for factor 2 or 3 slower, but >> factor 30 is just too much). > > This probably doesn't contribute much, but have you tried using Python > p

Re: Implementing file reading in C/Python

2009-01-09 Thread Johannes Bauer
Marc 'BlackJack' Rintsch wrote: > On Fri, 09 Jan 2009 04:04:41 +0100, Johannes Bauer wrote: > >> As this was horribly slow (20 Minutes for a 2GB file) I coded the whole >> thing in C also: > > Yours took ~37 minutes for 2 GiB here. This "just" ~15 minutes: Ah, ok... when implementing your sug

Re: Implementing file reading in C/Python

2009-01-09 Thread Johannes Bauer
James Mills wrote: > What does this little tool do anyway ? > It's very interesting the images it creates > out of files. What is this called ? It has no particular name. I was toying around with the Princeton Cold Boot Attack (http://citp.princeton.edu/memory/). In particular I was interested

Re: Implementing file reading in C/Python

2009-01-09 Thread Johannes Bauer
Marc 'BlackJack' Rintsch schrieb: >> f = open(sys.argv[1], "r") > > Mode should be 'rb'. Check. >> filesize = os.stat(sys.argv[1])[6] > > `os.path.getsize()` is a little bit more readable. Check. >> print("Filesize : %d" % (filesize)) print("Image size : %dx%d" >> % (width, height)

Re: Implementing file reading in C/Python

2009-01-09 Thread mk
Johannes Bauer wrote: Which takes about 40 seconds. I want the niceness of Python but a little more speed than I'm getting (I'd settle for factor 2 or 3 slower, but factor 30 is just too much). This probably doesn't contribute much, but have you tried using Python profiler? You might have *so
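For reference, the profiler suggested here ships with Python as the standard-library modules `cProfile` and `pstats`. The sketch below wraps a single call and returns a text report; the helper name `profile_call` and the limit of ten report rows are illustrative choices, not part of the thread.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, **kwargs):
    """Run func under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)

    # Collect the report in memory instead of printing it to stdout.
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(10)  # ten most expensive entries
    return result, stream.getvalue()
```

A whole script can also be profiled without code changes by running `python -m cProfile -s cumulative script.py`.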

Re: Implementing file reading in C/Python

2009-01-09 Thread Steve Holden
Steven D'Aprano wrote: > On Fri, 09 Jan 2009 19:33:53 +1000, James Mills wrote: > >> On Fri, Jan 9, 2009 at 7:15 PM, Marc 'BlackJack' Rintsch >> wrote: print("Filesize : %d" % (filesize)) print("Image size : %dx%d" % (width, height)) print("Bytes per Pixel: %d" % (blocksize))

Re: Implementing file reading in C/Python

2009-01-09 Thread Steve Holden
Marc 'BlackJack' Rintsch wrote: > On Fri, 09 Jan 2009 04:04:41 +0100, Johannes Bauer wrote: [...] >> print("Filesize : %d" % (filesize)) print("Image size : %dx%d" >> % (width, height)) print("Bytes per Pixel: %d" % (blocksize)) > > Why parentheses around ``print``\s "argument"? In Pyth

Re: Implementing file reading in C/Python

2009-01-09 Thread Marc 'BlackJack' Rintsch
On Fri, 09 Jan 2009 04:04:41 +0100, Johannes Bauer wrote: > As this was horribly slow (20 Minutes for a 2GB file) I coded the whole > thing in C also: Yours took ~37 minutes for 2 GiB here. This "just" ~15 minutes: #!/usr/bin/env python from __future__ import division, with_statement import os

Re: Implementing file reading in C/Python

2009-01-09 Thread Steven D'Aprano
On Fri, 09 Jan 2009 09:15:20 +, Marc 'BlackJack' Rintsch wrote: >> picture = { } >> havepixels = 0 >> while True: >> data = f.read(blocksize) >> if len(data) <= 0: break > > if data: > break > > is enough. You've reversed the sense of the test. The OP exits the loop w
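The corrected sense of the test can be sketched as a small generator: the loop has to stop when `read()` returns empty data, so the shortened check is `if not data`, not `if data`. The name `iter_blocks` is hypothetical, and the file object is assumed to be opened in binary mode.

```python
def iter_blocks(fileobj, blocksize):
    """Yield successive blocks of at most blocksize bytes until end of file."""
    while True:
        data = fileobj.read(blocksize)
        if not data:        # empty result means end of file: leave the loop
            break
        yield data
```

The last block may be shorter than `blocksize` when the file size is not an exact multiple of it.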

Re: Implementing file reading in C/Python

2009-01-09 Thread Steven D'Aprano
On Fri, 09 Jan 2009 19:33:53 +1000, James Mills wrote: > On Fri, Jan 9, 2009 at 7:15 PM, Marc 'BlackJack' Rintsch > wrote: >>> print("Filesize : %d" % (filesize)) print("Image size : >>> %dx%d" % (width, height)) print("Bytes per Pixel: %d" % (blocksize)) >> >> Why parentheses around ``

Re: Implementing file reading in C/Python

2009-01-09 Thread Marc 'BlackJack' Rintsch
On Fri, 09 Jan 2009 04:04:41 +0100, Johannes Bauer wrote: > datamap = { } > for i in range(len(data)): > datamap[ord(data[i])] = datamap.get(data[i], 0) + 1 Here is an error by the way: You call `ord()` just on the left side of the ``=``, so all keys in the dictionary
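A sketch of the fix, assuming Python 3 and a `bytes` input: the same key has to be used for both the lookup and the store. In Python 3, iterating over `bytes` already yields integers, so `ord()` is not needed at all; in the Python 2 code quoted above, the equivalent fix would be to apply `ord()` on both sides. The function name `count_bytes` is hypothetical.

```python
def count_bytes(data):
    """Return a dict mapping each byte value in data to its number of occurrences."""
    datamap = {}
    for byte in data:                             # ints in Python 3
        datamap[byte] = datamap.get(byte, 0) + 1  # same key for get and store
    return datamap
```

With mismatched keys, as in the quoted line, `get()` never finds an existing entry, so every stored count stays at 1.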

Re: Implementing file reading in C/Python

2009-01-09 Thread James Mills
On Fri, Jan 9, 2009 at 7:41 PM, Marc 'BlackJack' Rintsch wrote: > Please read again what I wrote. Lol I thought "<3" was a smiley! :) Sorry! cheers James

Re: Implementing file reading in C/Python

2009-01-09 Thread Marc 'BlackJack' Rintsch
On Fri, 09 Jan 2009 19:33:53 +1000, James Mills wrote: > On Fri, Jan 9, 2009 at 7:15 PM, Marc 'BlackJack' Rintsch > wrote: >> Why parentheses around ``print``\s "argument"? In Python <3 ``print`` >> is a statement and not a function. > > Not true as of 2.6+ and 3.0+ > > print is now a functio

Re: Implementing file reading in C/Python

2009-01-09 Thread James Mills
On Fri, Jan 9, 2009 at 7:15 PM, Marc 'BlackJack' Rintsch wrote: >> print("Filesize : %d" % (filesize)) print("Image size : %dx%d" >> % (width, height)) print("Bytes per Pixel: %d" % (blocksize)) > > Why parentheses around ``print``\s "argument"? In Python <3 ``print`` is > a statement a

Re: Implementing file reading in C/Python

2009-01-09 Thread Marc 'BlackJack' Rintsch
On Fri, 09 Jan 2009 04:04:41 +0100, Johannes Bauer wrote: > I've first tried Python. Please don't beat me, it's slow as hell and > probably a horrible solution: > > #!/usr/bin/python > import sys > import os > > f = open(sys.argv[1], "r") Mode should be 'rb'. > filesize = os.stat(sys.argv[1])[

Re: Implementing file reading in C/Python

2009-01-08 Thread Steve Holden
MRAB wrote: > Johannes Bauer wrote: >> Hello group, [and about 200 other lines there was no need to quote] [...] > Have a look at psyco: http://psyco.sourceforge.net/ Have a little consideration for others when making a short reply to a long post, please. Trim what isn't necessary. Thanks. regard

Re: Implementing file reading in C/Python

2009-01-08 Thread James Mills
On Fri, Jan 9, 2009 at 2:29 PM, James Mills wrote: > I shall attempt to optimize this :) > I have a funny feeling you might be caught up with > some features of Python - one notable one being that > some things in Python are immutable. > > psyco might help here though ... What does this little t

Re: Implementing file reading in C/Python

2009-01-08 Thread James Mills
On Fri, Jan 9, 2009 at 3:13 PM, Johannes Bauer wrote: > Uhh, yes, you're right there... I must admit that I was too lazy to > include all the stat headers and do a proper st_size check in the C > version (just a quick hack), so it's practically hardcoded. > > With files of exactly 2GB in size the

Re: Implementing file reading in C/Python

2009-01-08 Thread Johannes Bauer
James Mills wrote: > I have tested this against a randomly generated > file from /dev/urandom (10M). Yes the Python > one is much slower, but I believe it's because > the Python implementation is _correct_ where > the C one is _wrong_ :) > > The resulting test.bin.pgm from python is exactly >

Re: Implementing file reading in C/Python

2009-01-08 Thread James Mills
On Fri, Jan 9, 2009 at 1:04 PM, Johannes Bauer wrote: > Hello group, Hello. (...) > Which takes about 40 seconds. I want the niceness of Python but a little > more speed than I'm getting (I'd settle for factor 2 or 3 slower, but > factor 30 is just too much). > > Can anyone point out how to sol

Re: Implementing file reading in C/Python

2009-01-08 Thread MRAB
Johannes Bauer wrote: Hello group, I've come from C/C++ and am now trying to code some Python because I absolutely love the language. However I still have trouble getting Python code to run efficiently. Right now I have an easy task: Get a file, split it up into a million chunks, count the most p

Implementing file reading in C/Python

2009-01-08 Thread Johannes Bauer
Hello group, I've come from C/C++ and am now trying to code some Python because I absolutely love the language. However I still have trouble getting Python code to run efficiently. Right now I have an easy task: Get a file, split it up into a million chunks, count the most prominent character in ea