John Machin wrote:
The factor of 30 indeed does not seem right -- I have done somewhat
similar stuff (calculating Levenshtein distance [edit distance] on words
read from very large files), coded the same algorithm in pure Python and
C++ (using linked lists in C++) and Python version was 2.5 times
Johannes Bauer writes:
> Yup, I changed the Python code to behave the same way the C code did -
> however overall it's not much of an improvement: Takes about 15 minutes
> to execute (still factor 23).
Not sure this is completely fair if you're only looking for a pure
Python solution, but to be
On Mon, 12 Jan 2009 21:26:27 -0500, Steve Holden wrote:
> The very idea of mapping part of a process's virtual address space onto
> an area in which "low-level system code resides, so writing to this
> region may corrupt the system, with potentially catastrophic
> consequences" seems to be asking
On 2009-01-13, Steve Holden wrote:
> sturlamolden wrote:
>> On Jan 12, 1:52 pm, Sion Arrowsmith
>> wrote:
>>
>>> And today's moral is: try it before posting. Yeah, I can map a 2GB
>>> file no problem, complete with associated 2GB+ allocated VM. The
>>> addressing is clearly not working how I was
sturlamolden wrote:
> On Jan 12, 1:52 pm, Sion Arrowsmith
> wrote:
>
>> And today's moral is: try it before posting. Yeah, I can map a 2GB
>> file no problem, complete with associated 2GB+ allocated VM. The
>> addressing is clearly not working how I was expecting it to.
>
> The virtual memory s
On 2009-01-12, Sion Arrowsmith wrote:
> In case the cancel didn't get through:
>
> Sion Arrowsmith wrote:
>>Grant Edwards wrote:
>>>2GB should easily fit within the process's virtual memory
>>>space.
>>Assuming you're in a 64bit world. Me, I've only got 2GB of address
>>space available to play
On 2009-01-12, Sion Arrowsmith wrote:
> Grant Edwards wrote:
>>On 2009-01-09, Sion Arrowsmith wrote:
>>> Grant Edwards wrote:
>>>> If I were you, I'd try mmap()ing the file instead of reading it
>>>> into string objects one chunk at a time.
>>> You've snipped the bit further on in that sentence
sturlamolden writes:
> On Jan 9, 6:41 pm, Sion Arrowsmith
> wrote:
>
>> You've snipped the bit further on in that sentence where the OP
>> says that the file of interest is 2GB. Do you still want to try
>> mmap'ing it?
>
> Python's mmap object does not take an offset parameter. If it did, one
>
On Jan 12, 1:52 pm, Sion Arrowsmith
wrote:
> And today's moral is: try it before posting. Yeah, I can map a 2GB
> file no problem, complete with associated 2GB+ allocated VM. The
> addressing is clearly not working how I was expecting it to.
The virtual memory space of a 32 bit process is 4 GB.
In case the cancel didn't get through:
Sion Arrowsmith wrote:
>Grant Edwards wrote:
>>2GB should easily fit within the process's virtual memory
>>space.
>Assuming you're in a 64bit world. Me, I've only got 2GB of address
>space available to play in -- mmap'ing all of it out of the question.
A
On Jan 9, 6:41 pm, Sion Arrowsmith
wrote:
> You've snipped the bit further on in that sentence where the OP
> says that the file of interest is 2GB. Do you still want to try
> mmap'ing it?
Python's mmap object does not take an offset parameter. If it did, one
could mmap smaller portions of the f
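
For readers on later Python versions: as far as I know, `mmap.mmap` accepts an `offset` argument from Python 2.6 onward, which makes the windowed approach hinted at above possible. A minimal sketch, assuming Python 3; `iter_windows` and `window_size` are names made up for illustration, and the offset has to be a multiple of `mmap.ALLOCATIONGRANULARITY`:

```python
import mmap
import os


def iter_windows(path, window_size):
    """Yield read-only mmap windows that together cover the whole file.

    Hypothetical helper: each window is only valid until the next one
    is requested, because it is closed when the generator advances.
    """
    gran = mmap.ALLOCATIONGRANULARITY
    # Round the window down to a multiple of the granularity (at least one unit).
    step = max(gran, (window_size // gran) * gran)
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        for offset in range(0, size, step):
            length = min(step, size - offset)
            with mmap.mmap(f.fileno(), length,
                           access=mmap.ACCESS_READ, offset=offset) as m:
                yield m
```

Only one window is mapped at a time, so the address space used stays near `window_size` regardless of how large the file is.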
Grant Edwards wrote:
>On 2009-01-09, Sion Arrowsmith wrote:
>> Grant Edwards wrote:
>>>If I were you, I'd try mmap()ing the file instead of reading it
>>>into string objects one chunk at a time.
>> You've snipped the bit further on in that sentence where the
>> OP says that the file of interes
On Fri, 09 Jan 2009 15:34:17 +, MRAB wrote:
> Marc 'BlackJack' Rintsch wrote:
>> On Fri, 09 Jan 2009 04:04:41 +0100, Johannes Bauer wrote:
>>
>>> As this was horribly slow (20 Minutes for a 2GB file) I coded the whole
>>> thing in C also:
>>
>> Yours took ~37 minutes for 2 GiB here. This "j
On Jan 9, 2:14 pm, Marc 'BlackJack' Rintsch wrote:
> On Fri, 09 Jan 2009 15:34:17 +, MRAB wrote:
> > Marc 'BlackJack' Rintsch wrote:
>
> >> def iter_max_values(blocks, block_count):
> >>     for i, block in enumerate(blocks):
> >>         histogram = defaultdict(int)
> >>         for byte in b
On Jan 9, 9:56 pm, mk wrote:
> The factor of 30 indeed does not seem right -- I have done somewhat
> similar stuff (calculating Levenshtein distance [edit distance] on words
> read from very large files), coded the same algorithm in pure Python and
> C++ (using linked lists in C++) and Python ver
On 2009-01-09, Marc 'BlackJack' Rintsch wrote:
> On Fri, 09 Jan 2009 15:34:17 +, MRAB wrote:
>
>> Marc 'BlackJack' Rintsch wrote:
>>
>>> def iter_max_values(blocks, block_count):
>>>     for i, block in enumerate(blocks):
>>>         histogram = defaultdict(int)
>>>         for byte in block:
On Fri, 09 Jan 2009 15:34:17 +, MRAB wrote:
> Marc 'BlackJack' Rintsch wrote:
>
>> def iter_max_values(blocks, block_count):
>>     for i, block in enumerate(blocks):
>>         histogram = defaultdict(int)
>>         for byte in block:
>>             histogram[byte] += 1
>>
>>
On 2009-01-09, Sion Arrowsmith wrote:
> Grant Edwards wrote:
>>On 2009-01-09, Johannes Bauer wrote:
>>> I've come from C/C++ and am now trying to code some Python because I
>>> absolutely love the language. However I still have trouble getting
>>> Python code to run efficiently. Right now I hav
Grant Edwards wrote:
>On 2009-01-09, Johannes Bauer wrote:
>> I've come from C/C++ and am now trying to code some Python because I
>> absolutely love the language. However I still have trouble getting
>> Python code to run efficiently. Right now I have an easy task: Get a
>> file,
>If I were you,
Johannes Bauer, I was about to start writing a faster version. I think
that with some care and Psyco you can get to within about 5 times the
speed of C, or something like that.
To do that you need to use almost the same code for the C version,
with a list of 256 ints for the frequencies, not using max() but a
ma
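
The suggestion above, a flat list of 256 counters with a hand-written maximum search in place of `max()`, might look like the following. This is only a sketch in plain Python 3: the sentence above is cut off, so reading it as "a manual loop" is an assumption, and `most_common_byte` is a hypothetical name.

```python
def most_common_byte(block):
    """Return the byte value (0-255) that occurs most often in block.

    Ties go to the smallest byte value; an empty block returns 0.
    """
    freqs = [0] * 256            # one counter per possible byte value
    for byte in block:           # iterating over bytes yields ints in Python 3
        freqs[byte] += 1

    best_value = 0
    best_count = -1
    for value in range(256):     # manual search instead of max()
        if freqs[value] > best_count:
            best_count = freqs[value]
            best_value = value
    return best_value
```

A list indexed by byte value avoids the hashing that a dictionary does on every increment, which is where a C version with an `int[256]` array gets part of its advantage.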
On 2009-01-09, Johannes Bauer wrote:
> I've come from C/C++ and am now trying to code some Python because I
> absolutely love the language. However I still have trouble getting
> Python code to run efficiently. Right now I have an easy task: Get a
> file,
If I were you, I'd try mmap()ing the file
On Jan 9, 6:48 am, Johannes Bauer wrote:
> mk schrieb:
> > The factor of 30 indeed does not seem right -- I have done somewhat
> > similar stuff (calculating Levenshtein distance [edit distance] on words
> > read from very large files), coded the same algorithm in pure Python and
> > C++ (using li
Marc 'BlackJack' Rintsch wrote:
> On Fri, 09 Jan 2009 04:04:41 +0100, Johannes Bauer wrote:
>
>> As this was horribly slow (20 Minutes for a 2GB file) I coded the whole
>> thing in C also:
>
> Yours took ~37 minutes for 2 GiB here. This "just" ~15 minutes:
>
> #!/usr/bin/env python
> from __future__ import div
On Jan 9, 8:48 am, Johannes Bauer wrote:
> No - and I've not known there was a profiler yet have found anything
> meaningful (there seems to be a profiling C interface, but that won't
> get me anywhere). Is that a separate tool or something? Could you
> provide a link?
> Thanks,
> Kind regards,
>
mk schrieb:
> Johannes Bauer wrote:
>
>> Which takes about 40 seconds. I want the niceness of Python but a little
>> more speed than I'm getting (I'd settle for factor 2 or 3 slower, but
>> factor 30 is just too much).
>
> This probably doesn't contribute much, but have you tried using Python
> p
Marc 'BlackJack' Rintsch schrieb:
> On Fri, 09 Jan 2009 04:04:41 +0100, Johannes Bauer wrote:
>
>> As this was horribly slow (20 Minutes for a 2GB file) I coded the whole
>> thing in C also:
>
> Yours took ~37 minutes for 2 GiB here. This "just" ~15 minutes:
Ah, ok... when implementing your sug
James Mills schrieb:
> What does this little tool do anyway ?
> It's very interesting the images it creates
> out of files. What is this called ?
It has no particular name. I was toying around with the Princeton Cold
Boot Attack (http://citp.princeton.edu/memory/). In particular I was
interested
Marc 'BlackJack' Rintsch schrieb:
>> f = open(sys.argv[1], "r")
>
> Mode should be 'rb'.
Check.
>> filesize = os.stat(sys.argv[1])[6]
>
> `os.path.getsize()` is a little bit more readable.
Check.
>> print("Filesize : %d" % (filesize))
>> print("Image size : %dx%d" % (width, height)
Johannes Bauer wrote:
> Which takes about 40 seconds. I want the niceness of Python but a little
> more speed than I'm getting (I'd settle for factor 2 or 3 slower, but
> factor 30 is just too much).
This probably doesn't contribute much, but have you tried using Python
profiler? You might have *so
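
For reference, the profiler in the standard library is `cProfile`, with `pstats` for reading the results. A minimal sketch, assuming Python 3; `profile_call` is a hypothetical wrapper, not something from the thread:

```python
import cProfile
import io
import pstats


def profile_call(func, *args, **kwargs):
    """Run func under cProfile; return (result, text report).

    The report lists the five entries with the highest cumulative time.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(5)
    return result, stream.getvalue()
```

A whole script can also be profiled without changing it by running `python -m cProfile -s cumulative script.py`.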
Steven D'Aprano wrote:
> On Fri, 09 Jan 2009 19:33:53 +1000, James Mills wrote:
>
>> On Fri, Jan 9, 2009 at 7:15 PM, Marc 'BlackJack' Rintsch
>> wrote:
>>>> print("Filesize : %d" % (filesize))
>>>> print("Image size : %dx%d" % (width, height))
>>>> print("Bytes per Pixel: %d" % (blocksize))
Marc 'BlackJack' Rintsch wrote:
> On Fri, 09 Jan 2009 04:04:41 +0100, Johannes Bauer wrote:
[...]
>> print("Filesize : %d" % (filesize))
>> print("Image size : %dx%d" % (width, height))
>> print("Bytes per Pixel: %d" % (blocksize))
>
> Why parentheses around ``print``\s "argument"? In Pyth
On Fri, 09 Jan 2009 04:04:41 +0100, Johannes Bauer wrote:
> As this was horribly slow (20 Minutes for a 2GB file) I coded the whole
> thing in C also:
Yours took ~37 minutes for 2 GiB here. This "just" ~15 minutes:
#!/usr/bin/env python
from __future__ import division, with_statement
import os
On Fri, 09 Jan 2009 09:15:20 +, Marc 'BlackJack' Rintsch wrote:
>> picture = { }
>> havepixels = 0
>> while True:
>>     data = f.read(blocksize)
>>     if len(data) <= 0: break
>
>     if data:
>         break
>
> is enough.
You've reversed the sense of the test. The OP exits the loop w
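
The intended test, spelled out as a sketch: `read()` returns an empty result at end of file, so the loop has to stop when `data` is falsy, not when it is truthy. `read_blocks` is a hypothetical helper name and the example assumes Python 3.

```python
def read_blocks(f, blocksize):
    """Yield successive blocks of at most blocksize bytes from file object f."""
    while True:
        data = f.read(blocksize)
        if not data:        # empty bytes object means end of file
            break
        yield data
```

With `if data: break` the loop would instead exit on the first non-empty block and never process anything.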
On Fri, 09 Jan 2009 19:33:53 +1000, James Mills wrote:
> On Fri, Jan 9, 2009 at 7:15 PM, Marc 'BlackJack' Rintsch
> wrote:
>>> print("Filesize : %d" % (filesize))
>>> print("Image size : %dx%d" % (width, height))
>>> print("Bytes per Pixel: %d" % (blocksize))
>>
>> Why parentheses around ``
On Fri, 09 Jan 2009 04:04:41 +0100, Johannes Bauer wrote:
> datamap = { }
> for i in range(len(data)):
>     datamap[ord(data[i])] = datamap.get(data[i], 0) + 1
Here is an error by the way: You call `ord()` just on the left side of
the ``=``, so all keys in the dictionary
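
The key mismatch can be reproduced in a few lines. This is a sketch in Python 3, where indexing a `bytes` object already yields an int, so the one-byte slice `data[i:i + 1]` stands in for the character key of the original Python 2 code; both function names are hypothetical.

```python
def count_bytes_buggy(data):
    """Store under the int key but look up under a different key."""
    datamap = {}
    for i in range(len(data)):
        # The lookup key (a one-byte slice) is never a key of datamap,
        # so get() always returns the default 0 and every count is 1.
        datamap[data[i]] = datamap.get(data[i:i + 1], 0) + 1
    return datamap


def count_bytes(data):
    """Use the same int key for both the lookup and the store."""
    datamap = {}
    for byte in data:
        datamap[byte] = datamap.get(byte, 0) + 1
    return datamap
```

For `b"aab"` the buggy version reports a count of 1 for every byte seen, while the consistent version counts `a` twice.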
On Fri, Jan 9, 2009 at 7:41 PM, Marc 'BlackJack' Rintsch wrote:
> Please read again what I wrote.
Lol I thought "<3" was a smiley! :)
Sorry!
cheers
James
On Fri, 09 Jan 2009 19:33:53 +1000, James Mills wrote:
> On Fri, Jan 9, 2009 at 7:15 PM, Marc 'BlackJack' Rintsch
> wrote:
>> Why parentheses around ``print``\s "argument"? In Python <3 ``print``
>> is a statement and not a function.
>
> Not true as of 2.6+ and 3.0+
>
> print is now a functio
On Fri, Jan 9, 2009 at 7:15 PM, Marc 'BlackJack' Rintsch wrote:
>> print("Filesize : %d" % (filesize))
>> print("Image size : %dx%d" % (width, height))
>> print("Bytes per Pixel: %d" % (blocksize))
>
> Why parentheses around ``print``\s "argument"? In Python <3 ``print`` is
> a statement a
On Fri, 09 Jan 2009 04:04:41 +0100, Johannes Bauer wrote:
> I've first tried Python. Please don't beat me, it's slow as hell and
> probably a horrible solution:
>
> #!/usr/bin/python
> import sys
> import os
>
> f = open(sys.argv[1], "r")
Mode should be 'rb'.
> filesize = os.stat(sys.argv[1])[
MRAB wrote:
> Johannes Bauer wrote:
>> Hello group,
[and about 200 other lines there was no need to quote]
[...]
> Have a look at psyco: http://psyco.sourceforge.net/
Have a little consideration for others when making a short reply to a
long post, please. Trim what isn't necessary. Thanks.
regard
On Fri, Jan 9, 2009 at 2:29 PM, James Mills
wrote:
> I shall attempt to optimize this :)
> I have a funny feeling you might be caught up with
> some features of Python - one notable one being that
> some things in Python are immutable.
>
> psyco might help here though ...
What does this little t
On Fri, Jan 9, 2009 at 3:13 PM, Johannes Bauer wrote:
> Uhh, yes, you're right there... I must admit that I was too lazy to
> include all the stat headers and do a proper st_size check in the C
> version (just a quick hack), so it's practically hardcoded.
>
> With files of exactly 2GB in size the
James Mills schrieb:
> I have tested this against a randomly generated
> file from /dev/urandom (10M). Yes the Python
> one is much slower, but I believe it's because
> the Python implementation is _correct_ where
> the C one is _wrong_ :)
>
> The resulting test.bin.pgm from python is exactly
>
On Fri, Jan 9, 2009 at 1:04 PM, Johannes Bauer wrote:
> Hello group,
Hello.
(...)
> Which takes about 40 seconds. I want the niceness of Python but a little
> more speed than I'm getting (I'd settle for factor 2 or 3 slower, but
> factor 30 is just too much).
>
> Can anyone point out how to sol
Johannes Bauer wrote:
Hello group,
I've come from C/C++ and am now trying to code some Python because I
absolutely love the language. However I still have trouble getting
Python code to run efficiently. Right now I have an easy task: Get a
file, split it up into a million chunks, count the most p
Hello group,
I've come from C/C++ and am now trying to code some Python because I
absolutely love the language. However I still have trouble getting
Python code to run efficiently. Right now I have an easy task: Get a
file, split it up into a million chunks, count the most prominent
character in ea