Peter Otten wrote:
Robin Becker wrote:
#sscan1.py thanks to Skip
import sys, time, mmap, os, re
fn = sys.argv[1]
fh=os.open(fn,os.O_BINARY|os.O_RDONLY)
s=mmap.mmap(fh,0,access=mmap.ACCESS_READ)
l=n=0
t0 = time.time()
for mat in re.split("X", s):
re.split() returns a list, not a generator, and
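
A lazy alternative (a sketch, not code from the thread; the helper name iter_split is made up): re.finditer() walks the separator matches one at a time, so the pieces in between can be yielded as a generator instead of materialising the whole list that re.split() returns.

import re

def iter_split(buf, pattern):
    # Yield the pieces of buf between matches of pattern, one at a time,
    # instead of building the whole list the way re.split() does.
    pos = 0
    for m in re.finditer(pattern, buf):
        yield buf[pos:m.start()]
        pos = m.end()
    yield buf[pos:]          # whatever follows the last separator

# used against the mmap'd buffer s from the quoted script:
# for frag in iter_split(s, "X"):
#     n += 1; l += len(frag)
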
Peter Otten wrote:
Robin Becker wrote:
#sscan1.py thanks to Skip
import sys, time, mmap, os, re
fn = sys.argv[1]
fh=os.open(fn,os.O_BINARY|os.O_RDONLY)
s=mmap.mmap(fh,0,access=mmap.ACCESS_READ)
l=n=0
t0 = time.time()
for mat in re.split("X", s):
re.split() returns a list, not a generator, and
Robin Becker wrote:
> #sscan1.py thanks to Skip
> import sys, time, mmap, os, re
> fn = sys.argv[1]
> fh=os.open(fn,os.O_BINARY|os.O_RDONLY)
> s=mmap.mmap(fh,0,access=mmap.ACCESS_READ)
> l=n=0
> t0 = time.time()
> for mat in re.split("X", s):
re.split() returns a list, not a generator, and th
On Thu, 28 Apr 2005 20:35:43 +, Robin Becker <[EMAIL PROTECTED]> wrote:
>Jeremy Bowers wrote:
>
> >
> > As you try to understand mmap, make sure your mental model can take into
> > account the fact that it is easy and quite common to mmap a file several
> > times larger than your physical
Skip Montanaro wrote:
.
Let me return to your original problem though, doing regex operations on
files. I modified your two scripts slightly:
.
Skip
I'm sure my results are dependent on something other than the coding style.
I suspect file/disk cache and paging operate here. Note that we
Jeremy Bowers wrote:
.
As you try to understand mmap, make sure your mental model can take into
account the fact that it is easy and quite common to mmap a file several
times larger than your physical memory, and it does not even *try* to read
the whole thing in at any given time. You may benef
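
A minimal illustration of that lazy behaviour (a sketch; dingo.dat is the roughly 140MB test file from the timings quoted later in the thread): creating the mapping costs almost nothing, and only the pages whose bytes are actually touched get read in.

import mmap, os

fh = os.open("dingo.dat", os.O_RDONLY)
m = mmap.mmap(fh, 0, access=mmap.ACCESS_READ)   # nothing is read from disk yet

print(len(m))                # the size comes from the mapping, not from reading
print(repr(m[:20]))          # slicing the start faults in only the first page
print(repr(m[len(m)-20:]))   # ...and this touches only the last page

m.close()
os.close(fh)
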
Robin Becker wrote:
Skip Montanaro wrote:
..
I'm not sure why the mmap() solution is so much slower for you. Perhaps on
some systems files opened for reading are mmap'd under the covers. I'm sure
it's highly platform-dependent. (My results on MacOSX - see below - are
somewhat better.)
...
Jeremy Bowers wrote:
>
> As you try to understand mmap, make sure your mental model can take into
> account the fact that it is easy and quite common to mmap a file several
> times larger than your physical memory, and it does not even *try* to read
> the whole thing in at any given time. You
Skip Montanaro wrote:
...
I'm not sure why the mmap() solution is so much slower for you. Perhaps on
some systems files opened for reading are mmap'd under the covers. I'm sure
it's highly platform-dependent. (My results on MacOSX - see below - are
somewhat better.)
Let me return to your origina
Bengt> To be fairer, I think you'd want to hoist the re compilation out
Bengt> of the loop.
The re module compiles and caches regular expressions, so I doubt it would
affect the runtime of either version.
Bengt> But also to be fairer, maybe include the overhead of splitting
Bengt
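
A rough way to see that effect (a sketch, not Skip's benchmark; the data string is synthetic): the module-level re.split() refetches the compiled pattern from re's internal cache on every call, so hoisting re.compile() out of the loop mostly just saves a dictionary lookup.

import re, time

data = "spam X eggs X " * 200000      # made-up test data
pat = "X"

t0 = time.time()
for _ in range(100):
    re.split(pat, data)               # pattern fetched from re's cache each call
print("module-level re.split: %.3fs" % (time.time() - t0))

rx = re.compile(pat)                  # compilation hoisted, as Bengt suggests
t0 = time.time()
for _ in range(100):
    rx.split(data)
print("pre-compiled split:    %.3fs" % (time.time() - t0))
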
Skip Montanaro wrote:
..
I'm not sure why the mmap() solution is so much slower for you. Perhaps on
some systems files opened for reading are mmap'd under the covers. I'm sure
it's highly platform-dependent. (My results on MacOSX - see below - are
somewhat better.)
I'll have a go at doing th
On Wed, 27 Apr 2005 21:39:45 -0500, Skip Montanaro <[EMAIL PROTECTED]> wrote:
>
>Robin> I implemented a simple scanning algorithm in two ways. First buffered
>Robin> scan tscan0.py; second mmapped scan tscan1.py.
>
>...
>
>Robin> C:\code\reportlab\demos\gadflypaper>\tmp\tscan0.
Robin> I implemented a simple scanning algorithm in two ways. First buffered
Robin> scan tscan0.py; second mmapped scan tscan1.py.
...
Robin> C:\code\reportlab\demos\gadflypaper>\tmp\tscan0.py dingo.dat
Robin> len=139583265 w=103 time=110.91
Robin> C:\code\reportlab\de
Jeremy Bowers wrote:
On Tue, 26 Apr 2005 20:54:53 +, Robin Becker wrote:
Skip Montanaro wrote:
...
If I mmap() a file, it's not slurped into main memory immediately, though as
you pointed out, it's charged to my process's virtual memory. As I access
bits of the file's contents, it will page i
On Mon, 25 Apr 2005 16:01:45 +0100, Robin Becker <[EMAIL PROTECTED]> wrote:
>Is there any way to get regexes to work on non-string/unicode objects? I would
>like to split large files by regex and it seems relatively hard to do so
>without
>having the whole file in memory. Even with buffers it s
On Tue, 26 Apr 2005 20:54:53 +, Robin Becker wrote:
> Skip Montanaro wrote:
> ...
>> If I mmap() a file, it's not slurped into main memory immediately, though as
>> you pointed out, it's charged to my process's virtual memory. As I access
>> bits of the file's contents, it will page in only w
Skip Montanaro wrote:
...
If I mmap() a file, it's not slurped into main memory immediately, though as
you pointed out, it's charged to my process's virtual memory. As I access
bits of the file's contents, it will page in only what's necessary. If I
mmap() a huge file, then print out a few bytes
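
One way to watch that happen (a sketch; dingo.dat is the test file from the thread's timings): mmap objects have a find() method of their own, so locating the first separator faults in only the pages scanned up to the hit and leaves the rest of the file untouched on disk.

import mmap, os

fh = os.open("dingo.dat", os.O_RDONLY)
m = mmap.mmap(fh, 0, access=mmap.ACCESS_READ)

print(m.find("X"))   # pages are read only as far as the first "X"

m.close()
os.close(fh)
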
>> It's hard to imagine how sliding a small window onto a file within Python
>> would be more efficient than the operating system's paging system. ;-)
Robin> well it might be if I only want to scan forward through the file
Robin> (think lexical analysis). Most lexical analyzers use
On Tue, 26 Apr 2005 19:32:29 +0100, Robin Becker wrote:
> Skip Montanaro wrote:
>> Robin> So we avoid dirty page writes etc etc. However, I still think I
>> Robin> could get away with a small window into the file which would be
>> Robin> more efficient.
>>
>> It's hard to imagine how
Skip Montanaro wrote:
Robin> So we avoid dirty page writes etc etc. However, I still think I
Robin> could get away with a small window into the file which would be
Robin> more efficient.
It's hard to imagine how sliding a small window onto a file within Python
would be more efficient th
Robin> So we avoid dirty page writes etc etc. However, I still think I
Robin> could get away with a small window into the file which would be
Robin> more efficient.
It's hard to imagine how sliding a small window onto a file within Python
would be more efficient than the operating sys
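
For the forward-only case Robin describes, a bounded read buffer with a small carry-over can do the scan without mapping or slurping anything; whether it actually beats the paging system is exactly what the timings elsewhere in the thread disagree about. A sketch (count_matches is a made-up helper; it assumes matches are at most maxlen bytes and do not overlap, which holds for the single-character "X" separator of the test scripts):

import re

def count_matches(path, pattern, maxlen=1, bufsize=1 << 20):
    # Forward-only scan with a bounded window.  The last maxlen-1 bytes of
    # each window are carried over so a match straddling a read boundary is
    # still seen in one piece on the next pass.
    rx = re.compile(pattern)
    keep = maxlen - 1
    n = 0
    tail = ""
    f = open(path, "rb")
    try:
        while True:
            chunk = f.read(bufsize)
            data = tail + chunk
            if not chunk:                 # EOF: search whatever is left over
                n += len(rx.findall(data))
                break
            cut = max(len(data) - keep, 0)
            # Count only matches starting before the carried-over region;
            # anything later would be counted again on the next pass.
            n += sum(1 for m in rx.finditer(data) if m.start() < cut)
            tail = data[cut:]
    finally:
        f.close()
    return n

# count_matches("dingo.dat", "X") counts the separators from the test scripts
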
Steve Holden wrote:
.
thanks I'll give it a whirl
Whoops, I don't think it's a regex search :-(
You should be able to adapt the logic fairly easily, I hope.
The buffering logic is half the problem; doing it quickly is the other half.
The third half of the problem is getting re to co-o
Robin Becker wrote:
Steve Holden wrote:
..
I seem to remember that the Medusa code contains a fairly good
overlapped search for a terminator string, if you want to chunk the file.
Take a look at the handle_read() method of class async_chat in the
standard library's asynchat.py.
.
thanks
Steve Holden wrote:
..
I seem to remember that the Medusa code contains a fairly good
overlapped search for a terminator string, if you want to chunk the file.
Take a look at the handle_read() method of class async_chat in the
standard library's asynchat.py.
.
thanks I'll give it a whirl
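
The idea Steve is pointing at, reduced to a plain-file version (a sketch, not the Medusa or asynchat code; split_on_terminator is a made-up name, and the terminator is assumed to be a fixed string rather than a regex):

def split_on_terminator(path, terminator, bufsize=1 << 16):
    # Yield the blocks of a file delimited by a fixed terminator string,
    # reading one buffer at a time.  async_chat's find_prefix_at_end() exists
    # to hand data on early when a buffer ends with part of the terminator;
    # here we simply hold back everything that is not yet terminated.
    pending = ""
    f = open(path, "rb")
    while True:
        chunk = f.read(bufsize)
        if not chunk:
            break
        pending += chunk
        while True:
            i = pending.find(terminator)
            if i < 0:
                break
            yield pending[:i]
            pending = pending[i + len(terminator):]
    f.close()
    if pending:
        yield pending                     # trailing data with no terminator

# for block in split_on_terminator("dingo.dat", "X"):
#     ...                                 # handle each block
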
Robin Becker wrote:
Richard Brodie wrote:
"Robin Becker" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]
Gerald Klix wrote:
Map the file into RAM by using the mmap module.
The file's contents are then available as a searchable string.
that's a good idea, but I wonder if it actually saves
Richard Brodie wrote:
"Robin Becker" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]
Gerald Klix wrote:
Map the file into RAM by using the mmap module.
The file's contents are then available as a searchable string.
that's a good idea, but I wonder if it actually saves on memory? I just
t
"Robin Becker" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]
> Gerald Klix wrote:
> > Map the file into RAM by using the mmap module.
> > The file's contents are then available as a searchable string.
> >
>
> that's a good idea, but I wonder if it actually saves on memory? I just tried
Gerald Klix wrote:
Map the file into RAM by using the mmap module.
The file's contents are then available as a searchable string.
that's a good idea, but I wonder if it actually saves on memory? I just tried
regexing through a 25Mb file and end up with 40Mb as working set (it rose
linearly as the l
Map the file into RAM by using the mmap module.
The file's contents are then available as a searchable string.
HTH,
Gerald
Robin Becker wrote:
Is there any way to get regexes to work on non-string/unicode objects? I
would like to split large files by regex and it seems relatively hard to
do so wi
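
Putting Gerald's suggestion together with the memory concern in the question (a sketch; split_offsets is a made-up helper, and it relies on the Python 2 re module accepting buffer objects such as mmap): working with match offsets alone builds no fragment list and no string copy of the file, although the pages the scan walks over are still charged to the working set, as Robin observed above.

import mmap, os, re

def split_offsets(path, pattern):
    # Yield (start, end) offsets of the pieces between matches of pattern,
    # letting re search the mmap'd buffer directly.
    fh = os.open(path, os.O_RDONLY)
    m = mmap.mmap(fh, 0, access=mmap.ACCESS_READ)
    pos = 0
    for match in re.finditer(pattern, m):
        yield pos, match.start()
        pos = match.end()
    yield pos, len(m)
    m.close()
    os.close(fh)

# total = n = 0
# for start, end in split_offsets("dingo.dat", "X"):
#     total += end - start; n += 1
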
29 matches