Peter Otten wrote:
Robin Becker wrote:
#sscan1.py thanks to Skip
import sys, time, mmap, os, re
fn = sys.argv[1]
fh=os.open(fn,os.O_BINARY|os.O_RDONLY)
s=mmap.mmap(fh,0,access=mmap.ACCESS_READ)
l=n=0
t0 = time.time()
for mat in re.split("X", s):
re.split() returns a list, not a generator, and
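The archive preview cuts sscan1.py off right after the for line, so here is a hedged, runnable reconstruction. The loop body is an assumption (totalling field lengths and counting them, as the len=/w= output elsewhere in the thread suggests), and the separator b"X" is taken from the preview:

```python
import mmap
import os
import re
import sys
import time

def scan(fn, sep=b"X"):
    # os.O_BINARY exists only on Windows; default to 0 elsewhere.
    fh = os.open(fn, os.O_RDONLY | getattr(os, "O_BINARY", 0))
    try:
        with mmap.mmap(fh, 0, access=mmap.ACCESS_READ) as s:
            l = n = 0
            t0 = time.time()
            # Assumed loop body: total the field lengths and count the fields.
            # s[:] copies the mapped file into bytes for the demo's sake;
            # re can also search the mmap object directly.
            for mat in re.split(sep, s[:]):
                l += len(mat)
                n += 1
            return l, n, time.time() - t0
    finally:
        os.close(fh)

if __name__ == "__main__" and len(sys.argv) > 1:
    l, n, t = scan(sys.argv[1])
    print("len=%d n=%d time=%.2f" % (l, n, t))
```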
Robin Becker wrote:
> #sscan1.py thanks to Skip
> import sys, time, mmap, os, re
> fn = sys.argv[1]
> fh=os.open(fn,os.O_BINARY|os.O_RDONLY)
> s=mmap.mmap(fh,0,access=mmap.ACCESS_READ)
> l=n=0
> t0 = time.time()
> for mat in re.split("X", s):
re.split() returns a list, not a generator, and th
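The list-vs-generator point suggests a lazy alternative: re.finditer yields matches one at a time, so the pieces between separators can be produced on demand instead of materializing the whole result list. A sketch (not from the thread):

```python
import re

def iter_split(data, pattern):
    """Yield the pieces of `data` between matches of `pattern`, lazily.

    Unlike re.split(), which builds the entire result list at once,
    this generator produces one piece at a time.
    """
    pos = 0
    for m in re.finditer(pattern, data):
        yield data[pos:m.start()]
        pos = m.end()
    yield data[pos:]       # trailing piece after the last separator

pieces = iter_split(b"aaXbbbXcc", b"X")
assert next(pieces) == b"aa"            # produced on demand
assert list(pieces) == [b"bbb", b"cc"]
```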
On Thu, 28 Apr 2005 20:35:43 +, Robin Becker <[EMAIL PROTECTED]> wrote:
>Jeremy Bowers wrote:
>
> >
> > As you try to understand mmap, make sure your mental model can take into
> > account the fact that it is easy and quite common to mmap a file several
> > times larger than your physical
Skip Montanaro wrote:
.
Let me return to your original problem though, doing regex operations on
files. I modified your two scripts slightly:
.
Skip
I'm sure my results are dependent on something other than the coding style.
I suspect file/disk caching and paging operate here. Note that we
Jeremy Bowers wrote:
.
As you try to understand mmap, make sure your mental model can take into
account the fact that it is easy and quite common to mmap a file several
times larger than your physical memory, and it does not even *try* to read
the whole thing in at any given time. You may benef
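Jeremy's point is easy to demonstrate: map a file much larger than anything actually written to disk and touch single bytes; the whole file is addressable, but only the touched pages are faulted in. A small sketch (the 50 MB size is arbitrary):

```python
import mmap
import os
import tempfile

# ftruncate just sets the file size without writing data, so on most
# filesystems this 50 MB file is sparse.
fd, path = tempfile.mkstemp()
os.ftruncate(fd, 50 * 1024 * 1024)

with mmap.mmap(fd, 0, access=mmap.ACCESS_READ) as m:
    size = len(m)        # the whole file is addressable...
    first = m[0]         # ...but touching a byte faults in only that page
    last = m[size - 1]

os.close(fd)
os.remove(path)
```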
Robin Becker wrote:
Skip Montanaro wrote:
..
I'm not sure why the mmap() solution is so much slower for you. Perhaps on
some systems files opened for reading are mmap'd under the covers. I'm sure
it's highly platform-dependent. (My results on MacOSX - see below - are
somewhat better.)
...
Jeremy Bowers wrote:
>
> As you try to understand mmap, make sure your mental model can take into
> account the fact that it is easy and quite common to mmap a file several
> times larger than your physical memory, and it does not even *try* to read
> the whole thing in at any given time. You
Skip Montanaro wrote:
...
I'm not sure why the mmap() solution is so much slower for you. Perhaps on
some systems files opened for reading are mmap'd under the covers. I'm sure
it's highly platform-dependent. (My results on MacOSX - see below - are
somewhat better.)
Let me return to your original problem though, doing regex operations on files.
Bengt> To be fairer, I think you'd want to hoist the re compilation out
Bengt> of the loop.
The re module compiles and caches regular expressions, so I doubt it would
affect the runtime of either version.
Bengt> But also to be fairer, maybe include the overhead of splitting
Bengt
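Skip's claim about the re module's cache can be checked directly. In CPython the cache is observable because compiling the same pattern twice returns the very same object (an implementation detail, not a guaranteed API):

```python
import re

re.purge()                 # clear re's internal pattern cache for a clean demo
a = re.compile("X")
b = re.compile("X")
same = a is b              # True under CPython: the second call hits the cache,
                           # so module-level re.split("X", ...) in a loop does
                           # not recompile the pattern each iteration
```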
Skip Montanaro wrote:
..
I'm not sure why the mmap() solution is so much slower for you. Perhaps on
some systems files opened for reading are mmap'd under the covers. I'm sure
it's highly platform-dependent. (My results on MacOSX - see below - are
somewhat better.)
I'll have a go at doing th
On Wed, 27 Apr 2005 21:39:45 -0500, Skip Montanaro <[EMAIL PROTECTED]> wrote:
>
>Robin> I implemented a simple scanning algorithm in two ways. First buffered scan
>Robin> tscan0.py; second mmapped scan tscan1.py.
>
>...
>
>Robin> C:\code\reportlab\demos\gadflypaper>\tmp\tscan0.
Robin> I implemented a simple scanning algorithm in two ways. First buffered scan
Robin> tscan0.py; second mmapped scan tscan1.py.
...
Robin> C:\code\reportlab\demos\gadflypaper>\tmp\tscan0.py dingo.dat
Robin> len=139583265 w=103 time=110.91
Robin> C:\code\reportlab\de
Jeremy Bowers wrote:
On Tue, 26 Apr 2005 20:54:53 +, Robin Becker wrote:
Skip Montanaro wrote:
...
If I mmap() a file, it's not slurped into main memory immediately, though as
you pointed out, it's charged to my process's virtual memory. As I access
bits of the file's contents, it will page in only what's necessary.
On Mon, 25 Apr 2005 16:01:45 +0100, Robin Becker <[EMAIL PROTECTED]> wrote:
>Is there any way to get regexes to work on non-string/unicode objects. I would
>like to split large files by regex and it seems relatively hard to do so without
>having the whole file in memory. Even with buffers it s
On Tue, 26 Apr 2005 20:54:53 +, Robin Becker wrote:
> Skip Montanaro wrote:
> ...
>> If I mmap() a file, it's not slurped into main memory immediately, though as
>> you pointed out, it's charged to my process's virtual memory. As I access
>> bits of the file's contents, it will page in only what's necessary.
Skip Montanaro wrote:
...
If I mmap() a file, it's not slurped into main memory immediately, though as
you pointed out, it's charged to my process's virtual memory. As I access
bits of the file's contents, it will page in only what's necessary. If I
mmap() a huge file, then print out a few bytes
>> It's hard to imagine how sliding a small window onto a file within Python
>> would be more efficient than the operating system's paging system. ;-)
Robin> well it might be if I only want to scan forward through the file
Robin> (think lexical analysis). Most lexical analyzers use
On Tue, 26 Apr 2005 19:32:29 +0100, Robin Becker wrote:
> Skip Montanaro wrote:
>> Robin> So we avoid dirty page writes etc etc. However, I still think I
>> Robin> could get away with a small window into the file which would be
>> Robin> more efficient.
>>
>> It's hard to imagine how sliding a small window onto a file within Python
>> would be more efficient than the operating system's paging system. ;-)
Skip Montanaro wrote:
Robin> So we avoid dirty page writes etc etc. However, I still think I
Robin> could get away with a small window into the file which would be
Robin> more efficient.
It's hard to imagine how sliding a small window onto a file within Python
would be more efficient than the operating system's paging system. ;-)
Robin> So we avoid dirty page writes etc etc. However, I still think I
Robin> could get away with a small window into the file which would be
Robin> more efficient.
It's hard to imagine how sliding a small window onto a file within Python
would be more efficient than the operating system's paging system. ;-)
Steve Holden wrote:
.
thanks I'll give it a whirl
Whoops, I don't think it's a regex search :-(
You should be able to adapt the logic fairly easily, I hope.
The buffering logic is half the problem; doing it quickly is the other half.
The third half of the problem is getting re to co-operate.
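Steve's "buffering logic is half the problem" remark can be sketched concretely: scan the file chunk by chunk, carrying an unterminated tail across chunk boundaries so a separator that straddles two reads is not missed. This version assumes a fixed-string separator; a general regex would additionally need a bound on the longest possible match to know how much tail to retain:

```python
import io

def split_stream(fobj, sep=b"X", chunksize=1 << 16):
    """Yield fields from a binary file object, splitting on `sep`.

    Holds only one chunk plus any unterminated tail in memory at a time.
    """
    tail = b""
    while True:
        chunk = fobj.read(chunksize)
        if not chunk:
            if tail:
                yield tail          # final, unterminated field
            return
        buf = tail + chunk
        pieces = buf.split(sep)
        tail = pieces.pop()         # last piece may continue in the next chunk
        for p in pieces:
            yield p

# A tiny chunksize forces the separator to straddle a boundary.
fields = list(split_stream(io.BytesIO(b"aaXbbbXcc"), chunksize=4))
```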
Robin Becker wrote:
Steve Holden wrote:
..
I seem to remember that the Medusa code contains a fairly good
overlapped search for a terminator string, if you want to chunk the file.
Take a look at the handle_read() method of class async_chat in the
standard library's asynchat.py.
.
thanks
Steve Holden wrote:
..
I seem to remember that the Medusa code contains a fairly good
overlapped search for a terminator string, if you want to chunk the file.
Take a look at the handle_read() method of class async_chat in the
standard library's asynchat.py.
.
thanks I'll give it a whirl
Robin Becker wrote:
Richard Brodie wrote:
"Robin Becker" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]
Gerald Klix wrote:
Map the file into RAM by using the mmap module.
The file's contents then are available as a searchable string.
that's a good idea, but I wonder if it actually saves
Richard Brodie wrote:
"Robin Becker" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]
Gerald Klix wrote:
Map the file into RAM by using the mmap module.
The file's contents then are available as a searchable string.
that's a good idea, but I wonder if it actually saves on memory? I just
tried
"Robin Becker" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]
> Gerald Klix wrote:
> > Map the file into RAM by using the mmap module.
> > The file's contents then are available as a searchable string.
> >
>
> that's a good idea, but I wonder if it actually saves on memory? I just tried
Gerald Klix wrote:
Map the file into RAM by using the mmap module.
The file's contents then are available as a searchable string.
that's a good idea, but I wonder if it actually saves on memory? I just tried
regexing through a 25Mb file and end up with 40Mb as working set (it rose
linearly as the l
Map the file into RAM by using the mmap module.
The file's contents then are available as a searchable string.
HTH,
Gerald
Robin Becker schrieb:
Is there any way to get regexes to work on non-string/unicode objects. I
would like to split large files by regex and it seems relatively hard to
do so without having the whole file in memory.
Is there any way to get regexes to work on non-string/unicode objects. I would
like to split large files by regex and it seems relatively hard to do so without
having the whole file in memory. Even with buffers it seems hard to get regexes
to indicate that they failed because of buffer terminati
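The direction the thread settles on can be sketched as follows: mmap the file and let the regex engine scan it in place, since re's bytes patterns accept bytes-like objects such as mmap. The sample file and the pattern below are illustrative, not from the thread:

```python
import mmap
import os
import re
import tempfile

# Create a small sample file for the demonstration.
fd, path = tempfile.mkstemp()
os.write(fd, b"one,two,three")
os.close(fd)

with open(path, "rb") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        # finditer walks the mapped file; slicing the mmap by match
        # offsets pulls out only the matched regions.
        tokens = [m[mt.start():mt.end()] for mt in re.finditer(rb"[a-z]+", m)]

os.remove(path)
```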