Re: Scanning a file character by character

2009-02-22 Thread rzed
Spacebar265 wrote in news:c86cd530-cee5-4de6-8e19-304c664c9...@c12g2000yqj.googlegroups.com: > On Feb 11, 1:06 am, Duncan Booth > wrote: [...] >> >>> re.split("(\w+)", "The quick brown fox jumps, and falls >> >>> over.")[1::2] >> >> ['The', 'quick', 'brown', 'fox', 'jumps', 'and', 'falls', >

Re: Scanning a file character by character

2009-02-17 Thread Tim Chase
Josh Dukes wrote: In [401]: import shlex In [402]: shlex.split("""Joe went to 'the store' where he bought a "box of chocolates" and stuff.""") how's that work for ya? It works great if that's the desired behavior. However, the OP wrote about splitting the lines into separate words, not "

Re: Scanning a file character by character

2009-02-17 Thread Josh Dukes
In [401]: import shlex In [402]: shlex.split("""Joe went to 'the store' where he bought a "box of chocolates" and stuff.""") Out[402]: ['Joe', 'went', 'to', 'the store', 'where', 'he', 'bought', 'a', 'box of chocolates', 'and', 'stuff.'] how's that work for ya? http://docs.python.or

Re: Scanning a file character by character

2009-02-12 Thread Rhodri James
On Fri, 13 Feb 2009 03:24:21 -, Spacebar265 wrote: On Feb 11, 1:06 am, Duncan Booth wrote: Steven D'Aprano wrote: > On Mon, 09 Feb 2009 19:10:28 -0800, Spacebar265 wrote: >> How would I do separate lines into words without scanning one character >> at a time? > Scan a line at a ti

Re: Scanning a file character by character

2009-02-12 Thread Spacebar265
On Feb 11, 1:06 am, Duncan Booth wrote: > Steven D'Aprano wrote: > > On Mon, 09 Feb 2009 19:10:28 -0800, Spacebar265 wrote: > > >> How would I do separate lines into words without scanning one character > >> at a time? > > > Scan a line at a time, then split each line into words. > > > for line i

Re: Scanning a file character by character

2009-02-10 Thread MRAB
Steven D'Aprano wrote: On Tue, 10 Feb 2009 16:46:30 -0600, Tim Chase wrote: Or for a slightly less simple minded splitting you could try re.split: re.split("(\w+)", "The quick brown fox jumps, and falls over.")[1::2] ['The', 'quick', 'brown', 'fox', 'jumps', 'and', 'falls', 'over'] Perhaps

Re: Scanning a file character by character

2009-02-10 Thread Steven D'Aprano
On Tue, 10 Feb 2009 16:46:30 -0600, Tim Chase wrote: >>> Or for a slightly less simple minded splitting you could try re.split: >>> >> re.split("(\w+)", "The quick brown fox jumps, and falls >> over.")[1::2] >>> ['The', 'quick', 'brown', 'fox', 'jumps', 'and', 'falls', 'over'] >> >> >> P

Re: Scanning a file character by character

2009-02-10 Thread Rhodri James
On Tue, 10 Feb 2009 22:02:57 -, Steven D'Aprano wrote: On Tue, 10 Feb 2009 12:06:06 +, Duncan Booth wrote: Steven D'Aprano wrote: On Mon, 09 Feb 2009 19:10:28 -0800, Spacebar265 wrote: How would I do separate lines into words without scanning one character at a time? Scan a l

Re: Scanning a file character by character

2009-02-10 Thread Tim Chase
Or for a slightly less simple minded splitting you could try re.split: re.split("(\w+)", "The quick brown fox jumps, and falls over.")[1::2] ['The', 'quick', 'brown', 'fox', 'jumps', 'and', 'falls', 'over'] Perhaps I'm missing something, but the above regex does the exact same thing as line
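As a rough illustration of the re.split call quoted above (a sketch in Python 3, not code from the thread), the same word list can be obtained with re.findall, while plain str.split keeps the punctuation attached to the words. The variable names are made up for this example.

```python
import re

text = "The quick brown fox jumps, and falls over."

# The call quoted in the thread: with a capturing group, re.split keeps
# the matched words at the odd indices of the result.
via_split = re.split(r"(\w+)", text)[1::2]

# re.findall returns the same runs of word characters directly.
via_findall = re.findall(r"\w+", text)

# Plain whitespace splitting leaves punctuation attached ('jumps,', 'over.').
plain = text.split()

print(via_split == via_findall)
print(plain)
```

Which behaviour is wanted depends on whether punctuation should count as part of a word.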

Re: Scanning a file character by character

2009-02-10 Thread Steven D'Aprano
On Tue, 10 Feb 2009 12:06:06 +, Duncan Booth wrote: > Steven D'Aprano wrote: > >> On Mon, 09 Feb 2009 19:10:28 -0800, Spacebar265 wrote: >> >>> How would I do separate lines into words without scanning one >>> character at a time? >> >> Scan a line at a time, then split each line into word

Re: Scanning a file character by character

2009-02-10 Thread Duncan Booth
Steven D'Aprano wrote: > On Mon, 09 Feb 2009 19:10:28 -0800, Spacebar265 wrote: > >> How would I do separate lines into words without scanning one character >> at a time? > > Scan a line at a time, then split each line into words. > > > for line in open('myfile.txt'): > words = line.split

Re: Scanning a file character by character

2009-02-10 Thread Hendrik van Rooyen
"Spacebar265" wrote: >Thanks. How would I do separate lines into words without scanning one >character at a time? Type the following at the interactive prompt and see what happens: s = "This is a string composed of a few words and a newline\n" help(s.split) help(s.rstrip) help(s.strip) dir(s)

Re: Scanning a file character by character

2009-02-09 Thread Steven D'Aprano
On Mon, 09 Feb 2009 19:10:28 -0800, Spacebar265 wrote: > How would I do separate lines into words without scanning one character > at a time? Scan a line at a time, then split each line into words. for line in open('myfile.txt'): words = line.split() should work for a particularly simple-
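A minimal sketch of the line-at-a-time approach described above, written for Python 3. The helper name words_per_line and the in-memory sample text are assumptions for illustration; a real file object opened with open() could be passed in the same way.

```python
import io


def words_per_line(lines):
    """Yield the list of whitespace-separated words for each line."""
    for line in lines:
        # str.split() with no arguments splits on any run of whitespace
        # and drops the trailing newline.
        yield line.split()


# io.StringIO stands in for an open text file here.
sample = io.StringIO("Hi there\nhow are  you?\n")
for words in words_per_line(sample):
    print(words)
```

This only separates on whitespace, so punctuation stays attached to the neighbouring word.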

Re: Scanning a file character by character

2009-02-09 Thread Spacebar265
On Feb 9, 5:13 pm, Steve Holden wrote: > Spacebar265 wrote: > > On Feb 7, 2:17 am, Jorgen Grahn wrote: > >> On Wed, 4 Feb 2009 22:48:13 -0800 (PST), Spacebar265 > >> wrote: > >>> Hi. Does anyone know how to scan a file character by character and > >>> have each character so I can put it into a variab

Re: Scanning a file character by character

2009-02-08 Thread Steve Holden
Spacebar265 wrote: > On Feb 7, 2:17 am, Jorgen Grahn wrote: >> On Wed, 4 Feb 2009 22:48:13 -0800 (PST), Spacebar265 >> wrote: >>> Hi. Does anyone know how to scan a file character by character and >>> have each character so I can put it into a variable. I am attempting >>> to make a chatbot and need t

Re: Scanning a file character by character

2009-02-08 Thread Spacebar265
On Feb 7, 2:17 am, Jorgen Grahn wrote: > On Wed, 4 Feb 2009 22:48:13 -0800 (PST), Spacebar265 > wrote: > > Hi. Does anyone know how to scan a file character by character and > > have each character so I can put it into a variable. I am attempting > > to make a chatbot and need this to read the saved i

Re: Scanning a file character by character

2009-02-06 Thread Jorgen Grahn
On Wed, 4 Feb 2009 22:48:13 -0800 (PST), Spacebar265 wrote: > Hi. Does anyone know how to scan a file character by character and > have each character so I can put it into a variable. I am attempting > to make a chatbot and need this to read the saved input to look for > spelling mistakes and fur

Re: Scanning a file character by character

2009-02-05 Thread Gabriel Genellina
En Thu, 05 Feb 2009 04:48:13 -0200, Spacebar265 escribió: Hi. Does anyone know how to scan a file character by character and have each character so I can put it into a variable. I am attempting to make a chatbot and need this to read the saved input to look for spelling mistakes and further ana

Re: Scanning a file character by character

2009-02-04 Thread Bard Aase
On 5 Feb, 07:48, Spacebar265 wrote: > Hi. Does anyone know how to scan a file character by character and > have each character so I can put it into a variable. I am attempting > to make a chatbot and need this to read the saved input to look for > spelling mistakes and further analysis of user inp

Scanning a file character by character

2009-02-04 Thread Spacebar265
Hi. Does anyone know how to scan a file character by character and have each character so I can put it into a variable. I am attempting to make a chatbot and need this to read the saved input to look for spelling mistakes and further analysis of user input. Thanks Spacebar265 -- http://mail.python.

Re: Scanning a file

2005-11-02 Thread David Rasmussen
Steven D'Aprano wrote: > > 0x00000100 is one of a number of unique start codes in the MPEG2 > standard. It is guaranteed to be unique in the video stream, however > when searching for codes within the video stream, make sure you're in > the video stream! > I know I am in the cases I am intere

With & marcos via import hooking? (Was Re: Scanning a file)

2005-11-02 Thread Bengt Richter
On Tue, 01 Nov 2005 07:14:57 -0600, Paul Watson <[EMAIL PROTECTED]> wrote: >Paul Rubin wrote: >> [EMAIL PROTECTED] (John J. Lee) writes: >> >>>Closing off this particular one would make it harder to get benefit of >>>non-C implementations of Python, so it has been judged "not worth it". >>>I thin

Re: Scanning a file

2005-11-02 Thread Steven D'Aprano
David Rasmussen wrote: > Lasse Vågsæther Karlsen wrote: > >> David Rasmussen wrote: >> >> >>> If you must know, the above one-liner actually counts the number of >>> frames in an MPEG2 file. I want to know this number for a number of >>> files for various reasons. I don't want it to take forev

Re: Scanning a file

2005-11-01 Thread David Rasmussen
Steven D'Aprano wrote: > > However, there may be a simpler solution *fingers crossed* -- you are > searching for a sub-string "\x00\x00\x01\x00", which is hex 0x100. > Surely you don't want any old substring of "\x00\x00\x01\x00", but only > the ones which align on word boundaries? > Nope, so

Re: Scanning a file

2005-11-01 Thread David Rasmussen
Bengt Richter wrote: > > Good point, but perhaps the bit pattern the OP is looking for is guaranteed > (e.g. by some kind of HDLC-like bit or byte stuffing or escaping) not to occur > except as frame marker (which might make sense re the problem of re-synching > to frames in a glitched video strea

Re: Scanning a file

2005-11-01 Thread David Rasmussen
Lasse Vågsæther Karlsen wrote: > David Rasmussen wrote: > > >> If you must know, the above one-liner actually counts the number of >> frames in an MPEG2 file. I want to know this number for a number of >> files for various reasons. I don't want it to take forever. > > Don't you risk getting mo

Re: Scanning a file

2005-11-01 Thread Fredrik Lundh
Alex Martelli wrote: >> As far as I know, Python simply relies on the operating system to close >> files left open at the end of the program. > > Nope, see > leobject.c?rev=2.164.2.3&view=markup> that's slightly misleading

Re: Scanning a file

2005-11-01 Thread Paul Watson
Alex Martelli wrote: > Steve Holden <[EMAIL PROTECTED]> wrote: >... > >>>The runtime knows it is doing it. Please allow the runtime to tell me >>>what it knows it is doing. Thanks. >> >>In point of fact I don't believe the runtime does any such thing >>(though I must admit I haven't checke

Re: Scanning a file

2005-11-01 Thread Paul Watson
Paul Rubin wrote: > [EMAIL PROTECTED] (John J. Lee) writes: > >>Closing off this particular one would make it harder to get benefit of >>non-C implementations of Python, so it has been judged "not worth it". >>I think I agree with that judgement. > > > The right fix is PEP 343. I am sure you ar

Re: Scanning a file

2005-10-31 Thread Alex Martelli
Steve Holden <[EMAIL PROTECTED]> wrote: ... > > The runtime knows it is doing it. Please allow the runtime to tell me > > what it knows it is doing. Thanks. > > In point of fact I don't believe the runtime does any such thing > (though I must admit I haven't checked the source, so you may p

Re: Scanning a file

2005-10-31 Thread Paul Rubin
[EMAIL PROTECTED] (John J. Lee) writes: > Closing off this particular one would make it harder to get benefit of > non-C implementations of Python, so it has been judged "not worth it". > I think I agree with that judgement. The right fix is PEP 343. -- http://mail.python.org/mailman/listinfo/pyt
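PEP 343 introduced the with statement. As a hedged sketch of what that looks like for the one-liner debated in this thread (Python 3 syntax; the function name count_in_file is made up for illustration), the file is closed explicitly when the block exits, so nothing depends on the runtime's garbage collection:

```python
def count_in_file(path, pattern):
    """Count non-overlapping occurrences of a bytes pattern in a file."""
    # The with block closes the file on exit, even if read() raises,
    # so the behaviour does not depend on reference counting.
    with open(path, "rb") as f:
        return f.read().count(pattern)
```

Like the original one-liner, this reads the whole file into memory and bytes.count only counts non-overlapping matches.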

Re: Scanning a file

2005-10-31 Thread John J. Lee
Paul Watson <[EMAIL PROTECTED]> writes: [...] > How "ill" will things be when large bodies of code cannot run > successfully on a future version of Python or a non-CPython > implementation which does not close files. Might as well put file > closing on exit into the specification. [...] There

Re: Scanning a file

2005-10-31 Thread Steve Holden
Paul Watson wrote: > Steve Holden wrote: > >>>Since everyone needs this, how about building it in such that files >>>which are closed by the runtime, and not user code, are reported or >>>queryable? Perhaps a command line switch to either invoke or suppress >>>reporting them on exit. >>> >> >>

Re: Scanning a file

2005-10-31 Thread Paul Watson
Steve Holden wrote: >> Since everyone needs this, how about building it in such that files >> which are closed by the runtime, and not user code, are reported or >> queryable? Perhaps a command line switch to either invoke or suppress >> reporting them on exit. >> > This is a rather poor substi

Re: Scanning a file

2005-10-31 Thread Bengt Richter
On Mon, 31 Oct 2005 09:19:10 +0100, Peter Otten <[EMAIL PROTECTED]> wrote: >Bengt Richter wrote: > >> I still smelled a bug in the counting of substring in the overlap region, >> and you motivated me to find it (obvious in hindsight, but aren't most ;-) >> >> A substring can get over-counted if t

Re: Scanning a file

2005-10-31 Thread Bengt Richter
On Mon, 31 Oct 2005 09:41:02 +0100, =?ISO-8859-1?Q?Lasse_V=E5gs=E6ther_Karlsen?= <[EMAIL PROTECTED]> wrote: >David Rasmussen wrote: > >> If you must know, the above one-liner actually counts the number of >> frames in an MPEG2 file. I want to know this number for a number of >> files for variou

Re: Scanning a file

2005-10-31 Thread Steve Holden
Paul Watson wrote: > Alex Martelli wrote: > ... > >gc.garbage >> >>[<__main__.a object at 0x64cf0>, <__main__.b object at 0x58510>] >> >>So, no big deal -- run a gc.collect() and parse through gc.garbage for >>any instances of your "wrapper of file" class, and you'll find ones that >>were forg

Re: Scanning a file

2005-10-31 Thread Paul Watson
Alex Martelli wrote: ... gc.garbage > > [<__main__.a object at 0x64cf0>, <__main__.b object at 0x58510>] > > So, no big deal -- run a gc.collect() and parse through gc.garbage for > any instances of your "wrapper of file" class, and you'll find ones that > were forgotten as part of a cyclic g

Re: Scanning a file

2005-10-31 Thread Steven D'Aprano
David Rasmussen wrote: > Steven D'Aprano wrote: > >> On Fri, 28 Oct 2005 06:22:11 -0700, [EMAIL PROTECTED] wrote: >> >>> Which is quite fast. The only problem is that the file might be huge. >> >> >> What *you* call huge and what *Python* calls huge may be very different >> indeed. What are you

Re: Scanning a file

2005-10-31 Thread Lasse Vågsæther Karlsen
David Rasmussen wrote: > If you must know, the above one-liner actually counts the number of > frames in an MPEG2 file. I want to know this number for a number of > files for various reasons. I don't want it to take forever. Don't you risk getting more "frames" than the file actually has? Wha

Re: Scanning a file

2005-10-30 Thread David Rasmussen
No comments to this post? /David -- http://mail.python.org/mailman/listinfo/python-list

Re: Scanning a file

2005-10-30 Thread Peter Otten
Bengt Richter wrote: > I still smelled a bug in the counting of substring in the overlap region, > and you motivated me to find it (obvious in hindsight, but aren't most ;-) > > A substring can get over-counted if the "overlap" region joins > infelicitously with the next input. E.g., try counting

Re: Scanning a file

2005-10-30 Thread Steven D'Aprano
Alex Martelli wrote: > Steven D'Aprano <[EMAIL PROTECTED]> wrote: >... > >>>No. But if you get a totally unexpected exception, >> >>I'm more concerned about getting an expected exception -- or more >>accurately, *missing* an expected exception. Matching on Exception is too >>high. EOFError

Re: Scanning a file

2005-10-30 Thread Paul Watson
Fredrik Lundh wrote: > Paul Watson wrote: > >>This is Cygwin on Windows XP. > > using cygwin to analyze performance characteristics of portable API:s > is a really lousy idea. Ok. So, I agree. That is just what I had at hand. Here are some other numbers to which due diligence has also not b

Re: Scanning a file

2005-10-30 Thread Tony Nelson
In article <[EMAIL PROTECTED]>, [EMAIL PROTECTED] wrote: > Steve Holden wrote: > > Indeed, but reading one byte at a time is about the slowest way to > > process a file, in Python or any other language, because it fails to > > amortize the overhead cost of function calls over many characters. > >

Re: Scanning a file

2005-10-30 Thread Bengt Richter
On Sat, 29 Oct 2005 21:10:11 +0100, Steve Holden <[EMAIL PROTECTED]> wrote: >Peter Otten wrote: >> Bengt Richter wrote: >> >> >>>What struck me was >>> >>> >> gen = byblocks(StringIO.StringIO('no'),1024,len('end?')-1) >> [gen.next() for i in xrange(10)] >>> >>>['no', 'no', 'no', 'no', 'n

Re: Scanning a file

2005-10-30 Thread Alex Martelli
Steven D'Aprano <[EMAIL PROTECTED]> wrote: ... > > No. But if you get a totally unexpected exception, > > I'm more concerned about getting an expected exception -- or more > accurately, *missing* an expected exception. Matching on Exception is too > high. EOFError will probably need to be han

Re: Scanning a file

2005-10-30 Thread Alex Martelli
John J. Lee <[EMAIL PROTECTED]> wrote: > [EMAIL PROTECTED] (Alex Martelli) writes: > [...] > > If you're trying to test your code to ensure it explicitly closes all > > files, you could (from within your tests) rebind built-ins 'file' and > > 'open' to be a class wrapping the real thing, and addin

Re: Scanning a file

2005-10-30 Thread Steven D'Aprano
On Sun, 30 Oct 2005 08:35:12 -0700, Alex Martelli wrote: > Steven D'Aprano <[EMAIL PROTECTED]> wrote: >... >> > Don't ever catch and ``handle'' exceptions in such ways. In particular, >> > each time you're thinking of writing a bare 'except:' clause, think >> > again, and you'll most likely f

Re: Scanning a file

2005-10-30 Thread John J. Lee
[EMAIL PROTECTED] (Alex Martelli) writes: [...] > If you're trying to test your code to ensure it explicitly closes all > files, you could (from within your tests) rebind built-ins 'file' and > 'open' to be a class wrapping the real thing, and adding a flag to > remember if the file is open; at __d

Re: Scanning a file

2005-10-30 Thread Alex Martelli
Steven D'Aprano <[EMAIL PROTECTED]> wrote: ... > > Don't ever catch and ``handle'' exceptions in such ways. In particular, > > each time you're thinking of writing a bare 'except:' clause, think > > again, and you'll most likely find a much better approach. > > What would you -- or anyone else

Re: Scanning a file

2005-10-30 Thread Steven D'Aprano
On Sat, 29 Oct 2005 16:41:42 -0700, Alex Martelli wrote: > Steven D'Aprano <[EMAIL PROTECTED]> wrote: >... >> I should also point out that for really serious work, the idiom: >> >> f = file("parrot") >> handle(f) >> f.close() >> >> is insufficiently robust for production level code. That was

Re: Scanning a file

2005-10-30 Thread Peter Otten
David Rasmussen wrote: > None of the solutions presented in this thread is nearly as fast as the > > print file("filename", "rb").read().count("\x00\x00\x01\x00") Have you already timed Scott David Daniels' approach with a /large/ blocksize? It looks promising. Peter -- http://mail.python.org

Re: Scanning a file

2005-10-30 Thread Fredrik Lundh
Paul Watson wrote: > This is Cygwin on Windows XP. using cygwin to analyze performance characteristics of portable API:s is a really lousy idea. here are corresponding figures from a real operating system: using a 16 MB file: $ time python2.4 scanmap.py real 0m0.080s user 0m

Re: Scanning a file

2005-10-30 Thread Peter Otten
David Rasmussen wrote: > None of the solutions presented in this thread is nearly as fast as the > > print file("filename", "rb").read().count("\x00\x00\x01\x00") Have you already timed Scott David Daniel's approach with a /large/ blocksize? It looks promising. Peter -- http://mail.python.org

Re: Scanning a file

2005-10-29 Thread netvaibhav
Steve Holden wrote: > Indeed, but reading one byte at a time is about the slowest way to > process a file, in Python or any other language, because it fails to > amortize the overhead cost of function calls over many characters. > > Buffering wasn't invented because early programmers had nothing b

Re: Scanning a file

2005-10-29 Thread Alex Martelli
Steven D'Aprano <[EMAIL PROTECTED]> wrote: ... > I should also point out that for really serious work, the idiom: > > f = file("parrot") > handle(f) > f.close() > > is insufficiently robust for production level code. That was a detail I > didn't think I needed to drop on the original newbie po

Re: Scanning a file

2005-10-29 Thread Alex Martelli
Paul Watson <[EMAIL PROTECTED]> wrote: > "Alex Martelli" <[EMAIL PROTECTED]> wrote in message > news:[EMAIL PROTECTED] > > > In today's implementations of Classic Python, yes. In other equally > > valid implementations of the language, such as Jython, IronPython, or, > > for all we know, some f

Re: Scanning a file

2005-10-29 Thread David Rasmussen
Steven D'Aprano wrote: > On Fri, 28 Oct 2005 06:22:11 -0700, [EMAIL PROTECTED] wrote: > >>Which is quite fast. The only problem is that the file might be huge. > > What *you* call huge and what *Python* calls huge may be very different > indeed. What are you calling huge? > I'm not saying that

Re: Scanning a file

2005-10-29 Thread Mike Meyer
"Paul Watson" <[EMAIL PROTECTED]> writes: > "Mike Meyer" <[EMAIL PROTECTED]> wrote in message > news:[EMAIL PROTECTED] >> "Paul Watson" <[EMAIL PROTECTED]> writes: > ... >> Did you do timings on it vs. mmap? Having to copy the data multiple >> times to deal with the overlap - thanks to strings be

Re: Scanning a file

2005-10-29 Thread Steven D'Aprano
On Sat, 29 Oct 2005 21:08:09 +, Tim Roberts wrote: >>In any case, you are assuming that Python will automagically close the >>file when you are done. > > Nonsense. This behavior is deterministic. At the end of that line, the > anonymous file object out of scope, the object is deleted, and t

Re: Scanning a file

2005-10-29 Thread David Rasmussen
[EMAIL PROTECTED] wrote: > I think implementing a finite state automaton would be a good (best?) > solution. I have drawn a FSM for you (try viewing the following in > fixed width font). Just increment the count when you reach state 5. > > <---| >

Re: Scanning a file

2005-10-29 Thread Paul Rubin
"Paul Watson" <[EMAIL PROTECTED]> writes: > How could I identify when Python code does not close files and depends on > the runtime to take care of this? I want to know that the code will work > well under other Python implementations and future implementations which may > not have this provide

Re: Scanning a file

2005-10-29 Thread Paul Watson
"Alex Martelli" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] > In today's implementations of Classic Python, yes. In other equally > valid implementations of the language, such as Jython, IronPython, or, > for all we know, some future implementation of Classic, that may well > not

Re: Scanning a file

2005-10-29 Thread Steve Holden
[EMAIL PROTECTED] wrote: > I think implementing a finite state automaton would be a good (best?) > solution. I have drawn a FSM for you (try viewing the following in > fixed width font). Just increment the count when you reach state 5. > > <---| >

Re: Scanning a file

2005-10-29 Thread Alex Martelli
Tim Roberts <[EMAIL PROTECTED]> wrote: ... > >> print file("filename", "rb").read().count("\x00\x00\x01\x00") > > > >Funny you should say that, because I can't stand unnecessary one-liners. > > > >In any case, you are assuming that Python will automagically close the > >file when you are done. >

Re: Scanning a file

2005-10-29 Thread Scott David Daniels
Paul Watson wrote: > Here is a better one that counts, and not just detects, the substring. This > is -much- faster than using mmap; especially for a large file that may cause > paging to start. Using mmap can be -very- slow. > > > ... > b = fp.read(blocksize) > count = 0 > while len(b) > b

Re: Scanning a file

2005-10-29 Thread Tim Roberts
Steven D'Aprano <[EMAIL PROTECTED]> wrote: > >On Fri, 28 Oct 2005 15:29:46 +0200, Björn Lindström wrote: > >> "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> writes: >> >>> f = open("filename", "rb") >>> s = f.read() >>> sub = "\x00\x00\x01\x00" >>> count = s.count(sub) >>> print count >> >> That's a lot

Re: Scanning a file

2005-10-29 Thread Steve Holden
Peter Otten wrote: > Bengt Richter wrote: > > >>What struck me was >> >> > gen = byblocks(StringIO.StringIO('no'),1024,len('end?')-1) > [gen.next() for i in xrange(10)] >> >>['no', 'no', 'no', 'no', 'no', 'no', 'no', 'no', 'no', 'no'] > > > Ouch. Seems like I spotted the subtle cornerca

Re: Scanning a file

2005-10-29 Thread netvaibhav
I think implementing a finite state automaton would be a good (best?) solution. I have drawn a FSM for you (try viewing the following in fixed width font). Just increment the count when you reach state 5. <---| || 0 0

Re: Scanning a file

2005-10-29 Thread Paul Watson
"Mike Meyer" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] > "Paul Watson" <[EMAIL PROTECTED]> writes: ... > Did you do timings on it vs. mmap? Having to copy the data multiple > times to deal with the overlap - thanks to strings being immutable - > would seem to be a lose, and makes

Re: Scanning a file

2005-10-29 Thread Peter Otten
Bengt Richter wrote: > What struck me was > > >>> gen = byblocks(StringIO.StringIO('no'),1024,len('end?')-1) > >>> [gen.next() for i in xrange(10)] > ['no', 'no', 'no', 'no', 'no', 'no', 'no', 'no', 'no', 'no'] Ouch. Seems like I spotted the subtle cornercase error and missed the big one. Peter

Re: Scanning a file

2005-10-29 Thread Alex Martelli
Bengt Richter <[EMAIL PROTECTED]> wrote: ... > >>>while block: > >>>block = block[-overlap:] + f.read(blocksize-overlap) > >>>if block: yield block ... > I was thinking this was an example a la Alex's previous discussion > of interviewee code challenges ;-) > > What struc

Re: Scanning a file

2005-10-29 Thread Bengt Richter
On Sat, 29 Oct 2005 10:34:24 +0200, Peter Otten <[EMAIL PROTECTED]> wrote: >Bengt Richter wrote: > >> On Fri, 28 Oct 2005 20:03:17 -0700, [EMAIL PROTECTED] (Alex Martelli) >> wrote: >> >>>Mike Meyer <[EMAIL PROTECTED]> wrote: >>> ... Except if you can't read the file into memory because it

Re: Scanning a file

2005-10-29 Thread Peter Otten
Bengt Richter wrote: > On Fri, 28 Oct 2005 20:03:17 -0700, [EMAIL PROTECTED] (Alex Martelli) > wrote: > >>Mike Meyer <[EMAIL PROTECTED]> wrote: >> ... >>> Except if you can't read the file into memory because it's too large, >>> there's a pretty good chance you won't be able to mmap it either.

Re: Scanning a file

2005-10-29 Thread Bengt Richter
On Fri, 28 Oct 2005 20:03:17 -0700, [EMAIL PROTECTED] (Alex Martelli) wrote: >Mike Meyer <[EMAIL PROTECTED]> wrote: > ... >> Except if you can't read the file into memory because it's too large, >> there's a pretty good chance you won't be able to mmap it either. To >> deal with huge files, the

Re: Scanning a file

2005-10-29 Thread Bengt Richter
On 28 Oct 2005 06:51:36 -0700, "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> wrote: >First of all, this isn't a text file, it is a binary file. Secondly, >substrings can overlap. In the sequence 0010010 the substring 0010 >occurs twice. > ISTM you better let others know exactly what you mean by this, be

Re: Scanning a file

2005-10-28 Thread Fredrik Lundh
Mike Meyer wrote: > Did you do timings on it vs. mmap? Having to copy the data multiple > times to deal with the overlap - thanks to strings being immutable - > would seem to be a lose, and makes me wonder how it could be faster > than mmap in general. if you use "mmap" to read large files sequen

Re: Scanning a file

2005-10-28 Thread Mike Meyer
"Paul Watson" <[EMAIL PROTECTED]> writes: > Here is a better one that counts, and not just detects, the substring. This > is -much- faster than using mmap; especially for a large file that may cause > paging to start. Using mmap can be -very- slow. > > #!/usr/bin/env python > import sys > > fn

Re: Scanning a file

2005-10-28 Thread Paul Watson
"Paul Watson" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] > <[EMAIL PROTECTED]> wrote in message > news:[EMAIL PROTECTED] >>I want to scan a file byte for byte for occurences of the the four byte >> pattern 0x0100. I've tried with this: >> >> # start >> import sys >> >> numCha

Re: Scanning a file

2005-10-28 Thread Alex Martelli
Mike Meyer <[EMAIL PROTECTED]> wrote: ... > Except if you can't read the file into memory because it's too large, > there's a pretty good chance you won't be able to mmap it either. To > deal with huge files, the only option is to read the file in > chunks, count the occurrences in each chunk,
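The chunked approach discussed above can be sketched roughly as follows. This is a Python 3 illustration, not code from any post in the thread: the name count_pattern and the blocksize default are assumptions. It counts overlapping occurrences, since the original poster noted that matches may overlap. Each new buffer carries over the last len(pattern) - 1 bytes of the previous one, so a match that straddles a chunk boundary is seen once, and no match can fit entirely inside the carried-over tail.

```python
import io


def count_pattern(stream, pattern, blocksize=1 << 20):
    """Count (possibly overlapping) occurrences of pattern in a binary stream."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    keep = len(pattern) - 1      # bytes carried over between chunks
    tail = b""
    total = 0
    while True:
        chunk = stream.read(blocksize)
        if not chunk:
            break
        buf = tail + chunk
        # Step forward by one byte after each hit so overlapping
        # matches are counted too.
        pos = buf.find(pattern)
        while pos != -1:
            total += 1
            pos = buf.find(pattern, pos + 1)
        # The tail is shorter than the pattern, so it can never hold a
        # complete match that would be counted again next round.
        tail = buf[-keep:] if keep else b""
    return total


# The overlap example from the thread: "0010" occurs twice in "0010010".
print(count_pattern(io.BytesIO(b"0010010"), b"0010", blocksize=3))  # prints 2
```

Memory use stays bounded by roughly blocksize plus the pattern length, at the cost of one small copy per chunk.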

Re: Scanning a file

2005-10-28 Thread Steven D'Aprano
On Fri, 28 Oct 2005 06:22:11 -0700, [EMAIL PROTECTED] wrote: > Which is quite fast. The only problem is that the file might be huge. What *you* call huge and what *Python* calls huge may be very different indeed. What are you calling huge? > I really have no need for reading the entire file int

Re: Scanning a file

2005-10-28 Thread Steven D'Aprano
On Fri, 28 Oct 2005 15:29:46 +0200, Björn Lindström wrote: > "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> writes: > >> f = open("filename", "rb") >> s = f.read() >> sub = "\x00\x00\x01\x00" >> count = s.count(sub) >> print count > > That's a lot of lines. This is a bit off topic, but I just can't sta

Re: Scanning a file

2005-10-28 Thread Paul Watson
<[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] >I want to scan a file byte for byte for occurences of the the four byte > pattern 0x0100. I've tried with this: > > # start > import sys > > numChars = 0 > startCode = 0 > count = 0 > > inputFile = sys.stdin > > while True: >ch =

Re: Scanning a file

2005-10-28 Thread Kent Johnson
[EMAIL PROTECTED] wrote: > I want to scan a file byte for byte for occurrences of the four byte > pattern 0x00000100. data = sys.stdin.read() print data.count('\x00\x00\x01\x00') Kent -- http://mail.python.org/mailman/listinfo/python-list

Re: Scanning a file

2005-10-28 Thread Mike Meyer
Andrew McCarthy <[EMAIL PROTECTED]> writes: > On 2005-10-28, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote: >> I'm now down to: >> >> f = open("filename", "rb") >> s = f.read() >> sub = "\x00\x00\x01\x00" >> count = s.count(sub) >> print count >> >> Which is quite fast. The only problem is that the

Re: Scanning a file

2005-10-28 Thread James Stroud
On Friday 28 October 2005 06:29, Björn Lindström wrote: > "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> writes: > > f = open("filename", "rb") > > s = f.read() > > sub = "\x00\x00\x01\x00" > > count = s.count(sub) > > print count > > That's a lot of lines. This is a bit off topic, but I just can't stand

Re: Scanning a file

2005-10-28 Thread Paul Watson
<[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] >I want to scan a file byte for byte for occurences of the the four byte > pattern 0x0100. I've tried with this: > > # start > import sys > > numChars = 0 > startCode = 0 > count = 0 > > inputFile = sys.stdin > > while True: >ch =

Re: Scanning a file

2005-10-28 Thread Jeremy Sanders
Gerhard Häring wrote: > [EMAIL PROTECTED] wrote: >> I want to scan a file byte for byte [...] >> while True: >> ch = inputFile.read(1) >> [...] But it is very slow. What is the fastest way to do this? Using some >> native call? Using a buffer? Using whatever? > > Read in blocks, not byte for

Re: Scanning a file

2005-10-28 Thread Andrew McCarthy
On 2005-10-28, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote: > I'm now down to: > > f = open("filename", "rb") > s = f.read() > sub = "\x00\x00\x01\x00" > count = s.count(sub) > print count > > Which is quite fast. The only problem is that the file might be huge. > I really have no need for reading

Re: Scanning a file

2005-10-28 Thread [EMAIL PROTECTED]
First of all, this isn't a text file, it is a binary file. Secondly, substrings can overlap. In the sequence 0010010 the substring 0010 occurs twice. /David -- http://mail.python.org/mailman/listinfo/python-list

Re: Scanning a file

2005-10-28 Thread Bernhard Herzog
Jorge Godoy <[EMAIL PROTECTED]> writes: > How about iterating through the file? You can read it line by line, two lines > at a time. Pseudocode follows: > > line1 = read_line > while line2 = read_line: > line_to_check = ''.join([line1, line2]) > check_for_desired_string > line1

Re: Scanning a file

2005-10-28 Thread Jorge Godoy
"[EMAIL PROTECTED]" <[EMAIL PROTECTED]> writes: > Which is quite fast. The only problems is that the file might be huge. > I really have no need for reading the entire file into a string as I am > doing here. All I want is to count occurences this substring. Can I > somehow count occurences in a f

Re: Scanning a file

2005-10-28 Thread Björn Lindström
"[EMAIL PROTECTED]" <[EMAIL PROTECTED]> writes: > f = open("filename", "rb") > s = f.read() > sub = "\x00\x00\x01\x00" > count = s.count(sub) > print count That's a lot of lines. This is a bit off topic, but I just can't stand unnecessary local variables. print file("filename", "rb").read().coun

Re: Scanning a file

2005-10-28 Thread [EMAIL PROTECTED]
I'm now down to: f = open("filename", "rb") s = f.read() sub = "\x00\x00\x01\x00" count = s.count(sub) print count Which is quite fast. The only problem is that the file might be huge. I really have no need for reading the entire file into a string as I am doing here. All I want is to count occu

Re: Scanning a file

2005-10-28 Thread Paul Rubin
[EMAIL PROTECTED] writes: > I want to scan a file byte for byte for occurrences of the four byte > pattern 0x00000100. I've tried with this: use re.search or string.find. The simplest way is just read the whole file into memory first. If the file is too big, you have to read it in chunks and

Re: Scanning a file

2005-10-28 Thread pinkfloydhomer
Okay, how do I do this? Also, if you look at the code, I build a 32-bit unsigned integer from the bytes I read. And the 32-bit pattern I am looking for can start on _any_ byte boundary in the file. It would be nice if I could somehow just scan for that pattern explicitly, without having to build a

Re: Scanning a file

2005-10-28 Thread Gerhard Häring
[EMAIL PROTECTED] wrote: > I want to scan a file byte for byte [...] > while True: > ch = inputFile.read(1) > [...] But it is very slow. What is the fastest way to do this? Using some > native call? Using a buffer? Using whatever? Read in blocks, not byte for byte. I had good experiences with

Scanning a file

2005-10-28 Thread pinkfloydhomer
I want to scan a file byte for byte for occurrences of the four byte pattern 0x00000100. I've tried with this: # start import sys numChars = 0 startCode = 0 count = 0 inputFile = sys.stdin while True: ch = inputFile.read(1) numChars += 1 if len(ch) < 1: break startCode = ((