Re: Text Processing

2011-12-22 Thread Yigit Turgut
On Dec 21, 2:01 am, Alexander Kapps wrote: > On 20.12.2011 22:04, Nick Dokos wrote: > >>> I have a text file containing such data ; > >>> A B C > >>> --- > >>> -2.0100e-01 8.000e-02

Re: Text Processing

2011-12-20 Thread Alexander Kapps
On 20.12.2011 22:04, Nick Dokos wrote: I have a text file containing such data ; A B C --- -2.0100e-01 8.000e-02 8.000e-05 -2.e-01 0.000e+00 4.800e-04 -1.9900e-01 4.000e-02 1.600e-04

Re: Text Processing

2011-12-20 Thread Nick Dokos
Jérôme wrote: > Tue, 20 Dec 2011 11:17:15 -0800 (PST) > Yigit Turgut wrote: > > > Hi all, > > > > I have a text file containing such data ; > > > > A B C > > --- > > -2.0100e-01 8.000e-02 8.000e-0

Re: Text Processing

2011-12-20 Thread Jérôme
Tue, 20 Dec 2011 11:17:15 -0800 (PST) Yigit Turgut wrote: > Hi all, > > I have a text file containing such data ; > > A B C > --- > -2.0100e-01 8.000e-02 8.000e-05 > -2.e-01 0.000e+00 4.800

Re: Text Processing

2011-12-20 Thread Dave Angel
On 12/20/2011 02:17 PM, Yigit Turgut wrote: Hi all, I have a text file containing such data ; A B C --- -2.0100e-01 8.000e-02 8.000e-05 -2.e-01 0.000e+00 4.800e-04 -1.9900e-01 4.000e-02

Text Processing

2011-12-20 Thread Yigit Turgut
Hi all, I have a text file containing such data ; A B C --- -2.0100e-01 8.000e-02 8.000e-05 -2.e-01 0.000e+00 4.800e-04 -1.9900e-01 4.000e-02 1.600e-04 But I only need Section B, and I
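
A minimal sketch of one way to read such a file, assuming the goal is to pull out the second column (B) of a whitespace-separated table; the rest of the question is cut off above, so the helper name `column_values` and the choice of column index are illustrative assumptions rather than the poster's actual requirement.

```python
def column_values(lines, index=1):
    """Yield the float found in one whitespace-separated column of each data row."""
    for line in lines:
        fields = line.split()
        if len(fields) <= index:
            continue  # separator such as '---' or a short/blank line
        try:
            yield float(fields[index])
        except ValueError:
            continue  # header row such as 'A B C'


# Sample rows taken from the post; the layout is assumed to be three columns.
sample = """\
A B C
---
-2.0100e-01 8.000e-02 8.000e-05
-2.e-01 0.000e+00 4.800e-04
-1.9900e-01 4.000e-02 1.600e-04
"""

values = list(column_values(sample.splitlines()))
print(values)
```

Skipping rows that fail to parse keeps the header and the separator out of the result without hard-coding how many lines to drop.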

Re: emacs lisp text processing example (html5 figure/figcaption)

2011-07-05 Thread S.Mandl
> haven't used XSLT, and don't know if there's one in emacs... > > it'd be nice if someone actually give a example... > Hi Xah, actually I have to correct myself. HTML is not XML. If it were, you could use a stylesheet like this: http://www.w3.org/1999/XSL/Transform

Re: emacs lisp text processing example (html5 figure/figcaption)

2011-07-05 Thread Ian Kelly
On Tue, Jul 5, 2011 at 2:37 PM, Xah Lee wrote: > but in anycase, i can't see how this part would work > ((?:[^<]|<(?!/p>))+) It's not that different from the pattern 「alt="[^"]+"」 earlier in the regex. The capture group accepts one or more characters that either aren't '<', or that are '<' but a

Re: emacs lisp text processing example (html5 figure/figcaption)

2011-07-05 Thread Xah Lee
On Jul 5, 12:17 pm, Ian Kelly wrote: > On Mon, Jul 4, 2011 at 12:36 AM, Xah Lee wrote: > > So, a solution by regex is out. > > Actually, none of the complications you listed appear to exclude > regexes.  Here's a possible (untested) solution: > > > ((?:\s* height="[0-9]+">)+) > \s*((?:[^<]|<(?!/

Re: emacs lisp text processing example (html5 figure/figcaption)

2011-07-05 Thread Xah Lee
On Jul 4, 12:13 pm, "S.Mandl" wrote: > Nice. I guess that XSLT would be another (the official) approach for > such a task. > Is there an XSLT-engine for Emacs? > > -- Stefan haven't used XSLT, and don't know if there's one in emacs... it'd be nice if someone actually give a example... Xah --

Re: emacs lisp text processing example (html5 figure/figcaption)

2011-07-05 Thread Ian Kelly
On Mon, Jul 4, 2011 at 12:36 AM, Xah Lee wrote: > So, a solution by regex is out. Actually, none of the complications you listed appear to exclude regexes. Here's a possible (untested) solution: ((?:\s*)+) \s*((?:[^<]|<(?!/p>))+) \s* and corresponding replacement string: \1 \2 I don't kno

Re: emacs lisp text processing example (html5 figure/figcaption)

2011-07-04 Thread S.Mandl
Nice. I guess that XSLT would be another (the official) approach for such a task. Is there an XSLT-engine for Emacs? -- Stefan -- http://mail.python.org/mailman/listinfo/python-list

emacs lisp text processing example (html5 figure/figcaption)

2011-07-03 Thread Xah Lee
llows. -- Emacs Lisp: Processing HTML: Transform Tags to HTML5 “figure” and “figcaption” Tags Xah Lee, 2011-07-03 Another triumph of using elisp for text processing over perl/python. The Problem -- Summary I want batch tran

Re: Is text processing with dicts a good use case for Python cross-compilers like Cython/Pyrex or ShedSkin?

2010-12-16 Thread Stefan Behnel
pyt...@bdurham.com, 16.12.2010 21:03: Is text processing with dicts a good use case for Python cross-compilers like Cython/Pyrex or ShedSkin? (I've read the cross compiler claims about massive increases in pure numeric performance). Cython is generally a good choice for string proce

Is text processing with dicts a good use case for Python cross-compilers like Cython/Pyrex or ShedSkin?

2010-12-16 Thread python
Is text processing with dicts a good use case for Python cross-compilers like Cython/Pyrex or ShedSkin? (I've read the cross compiler claims about massive increases in pure numeric performance). I have 3 use cases I'm considering for Python-to-C++ cross-compilers for generating 32-

Re: Simple Text Processing

2009-09-13 Thread Tim Roberts
Steven D'Aprano wrote: >On Fri, 11 Sep 2009 21:52:36 -0700, Tim Roberts wrote: > >> Basically, when you're good with Perl, you start to think of every task >> in terms of regular expression matches. When you're good with Python, >> you start to think of every task in terms of lists and tuples. >

Re: Simple Text Processing

2009-09-11 Thread Steven D'Aprano
On Fri, 11 Sep 2009 21:52:36 -0700, Tim Roberts wrote: > Basically, when you're good with Perl, you start to think of every task > in terms of regular expression matches. When you're good with Python, > you start to think of every task in terms of lists and tuples. Not me -- I think of most such

Re: Simple Text Processing

2009-09-11 Thread Tim Roberts
AJAskey wrote: > >Never mind. I guess I had been trying to make it more difficult than >it is. As a note, I can work on something for 10 hours and not figure >it out. But the second I post to a group, then I immediately figure >it out myself. Strange snake this Python... Come sit on the couch

Re: Simple Text Processing

2009-09-10 Thread AJAskey
Never mind. I guess I had been trying to make it more difficult than it is. As a note, I can work on something for 10 hours and not figure it out. But the second I post to a group, then I immediately figure it out myself. Strange snake this Python... Example for anyone else interested: line =

Re: Simple Text Processing

2009-09-10 Thread Benjamin Kaplan
On Thu, Sep 10, 2009 at 11:36 AM, AJAskey wrote: > New to Python. I can solve the problem in perl by using "split()" to > an array. Can't figure it out in Python. > > I'm reading variable lines of text. I want to use the first number I > find. The problem is the lines are variable. > > Input

Simple Text Processing

2009-09-10 Thread AJAskey
New to Python. I can solve the problem in perl by using "split()" to an array. Can't figure it out in Python. I'm reading variable lines of text. I want to use the first number I find. The problem is the lines are variable. Input example: this is a number: 1 here are some numbers 1 2 3 4
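
A minimal sketch of picking the first number out of a variable line, assuming "number" means an optionally signed integer or decimal; the function name `first_number` and the pattern are illustrative choices, not the solution posted later in the thread.

```python
import re

# Optional sign, digits, optional fractional part.
NUMBER = re.compile(r"[-+]?\d+(?:\.\d+)?")


def first_number(line):
    """Return the first number in the line as int or float, or None if absent."""
    match = NUMBER.search(line)
    if match is None:
        return None
    text = match.group()
    return float(text) if "." in text else int(text)


print(first_number("this is a number: 1"))
print(first_number("here are some numbers 1 2 3 4"))
```

A plain `line.split()` followed by testing each token with `str.isdigit()` would also work when the numbers are always separate whitespace-delimited words.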

Re: text processing SOLVED

2008-09-27 Thread [EMAIL PROTECTED]
Thanks Black Jack Working

Re: text processing

2008-09-25 Thread Paul McGuire
On Sep 25, 9:51 am, "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> wrote: > I have string like follow > 12560/ABC,12567/BC,123,567,890/JK > > I want above string to group like as follow > (12560,ABC) > (12567,BC) > (123,567,890,JK) > > i try regular expression i am able to get first two not the third one.

Re: text processing

2008-09-25 Thread MRAB
On Sep 25, 6:34 pm, Marc 'BlackJack' Rintsch <[EMAIL PROTECTED]> wrote: > On Thu, 25 Sep 2008 15:51:28 +0100, [EMAIL PROTECTED] wrote: > > I have string like follow > > 12560/ABC,12567/BC,123,567,890/JK > > > I want above string to group like as follow (12560,ABC) > > (12567,BC) > > (123,567,890,JK

Re: text processing

2008-09-25 Thread kib2
You can do it with regexps too : >-- import re to_watch = re.compile(r"(?P<number>\d+)[/](?P<word>[A-Z]+)") final_list = to_watch.findall("12560/ABC,12567/BC,123,567,890/JK") for number,word in final_list : print "number:%s -- word: %s"%(num

Re: text processing

2008-09-25 Thread Marc 'BlackJack' Rintsch
On Thu, 25 Sep 2008 15:51:28 +0100, [EMAIL PROTECTED] wrote: > I have string like follow > 12560/ABC,12567/BC,123,567,890/JK > > I want above string to group like as follow (12560,ABC) > (12567,BC) > (123,567,890,JK) > > i try regular expression i am able to get first two not the third one. > ca

text processing

2008-09-25 Thread [EMAIL PROTECTED]
I have string like follow 12560/ABC,12567/BC,123,567,890/JK I want above string to group like as follow (12560,ABC) (12567,BC) (123,567,890,JK) i try regular expression i am able to get first two not the third one. can regular expression given data in different groups
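
A minimal sketch of the grouping asked for here, assuming each group is a comma-separated run of numbers followed by a slash and a word; `group_items` is a hypothetical helper and not one of the answers given in the thread.

```python
import re

# One group: numbers separated by commas, then '/', then a word.
GROUP = re.compile(r"(\d+(?:,\d+)*)/([A-Za-z]+)")


def group_items(text):
    """Return one tuple per group: all the numbers, then the trailing word."""
    groups = []
    for numbers, word in GROUP.findall(text):
        groups.append(tuple(numbers.split(",")) + (word,))
    return groups


result = group_items("12560/ABC,12567/BC,123,567,890/JK")
for group in result:
    print(group)
```

The regular expression only finds where each group starts and ends; splitting the captured run on commas afterwards is what recovers the third group's three numbers.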

Re: emacs lisp as text processing language...

2007-10-29 Thread Xah Lee
... continued from previous post. PS I'm cross-posting this post to perl and python groups because i find it a little-known fact that emacs lisp's power in the area of text processing is far beyond Perl (or Python). ... i have worked as a professional perl programmer since 1998.

emacs lisp as text processing language...

2007-10-29 Thread Xah Lee
Text Processing with Emacs Lisp Xah Lee, 2007-10-29 This page gives an outline of how to use emacs lisp to do text processing, using a specific real-world problem as example. If you don't know elisp, first take a gander at Emacs Lisp Basics. HTML version with links and colors is at:

Re: Simple Text Processing Help

2007-10-17 Thread Tim Roberts
[EMAIL PROTECTED] wrote: > >And now for something completely different... > >I've been reading up a bit about Python and Excel and I quickly told >the program to output to Excel quite easily. However, what if the >input file were a Word document? I can't seem to find much >information about parsi

Re: Simple Text Processing Help

2007-10-16 Thread patrick . waldo
And now for something completely different... I've been reading up a bit about Python and Excel and I quickly told the program to output to Excel quite easily. However, what if the input file were a Word document? I can't seem to find much information about parsing Word files. What could I add

Re: Simple Text Processing Help

2007-10-16 Thread patrick . waldo
And now for something completely different... I see a lot of COM stuff with Python for excel...and I quickly made the same program output to excel. What if the input file were a Word document? Where is there information about manipulating word documents, or what could I add to make the same prog

Re: Simple Text Processing Help

2007-10-16 Thread Peter Otten
patrick.waldo wrote: > manipulation? Also, I conceptually get it, but would you mind walking > me through >> for key, group in groupby(instream, unicode.isspace): >> if not key: >> yield "".join(group) itertools.groupby() splits a sequence into groups with the same key; e. g
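
A minimal Python 3 sketch of the `groupby` idiom quoted above, assuming the input is an iterable of lines in which blank lines separate records; `str.isspace` stands in for the Python 2 `unicode.isspace` of the original, and the sample lines are adapted from the data shown elsewhere in this thread.

```python
from itertools import groupby


def records(lines):
    """Join each run of consecutive non-blank lines into one record string."""
    # groupby yields (key, run) pairs; the key flips whenever a line changes
    # between whitespace-only and non-blank.
    for is_blank, run in groupby(lines, str.isspace):
        if not is_blank:
            yield "".join(run)


lines = [
    "200-001-8 50-00-0\n",
    "formaldehyd CH2O\n",
    "\n",
    "200-002-3\n",
    "50-01-1\n",
]
result = list(records(lines))
print(result)
```

Because `groupby` only groups consecutive items, several blank lines in a row collapse into a single separator run.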

Re: Simple Text Processing Help

2007-10-15 Thread Paul McGuire
On Oct 14, 8:48 am, [EMAIL PROTECTED] wrote: > Hi all, > > I started Python just a little while ago and I am stuck on something > that is really simple, but I just can't figure out. > > Essentially I need to take a text document with some chemical > information in Czech and organize it into another

Re: Simple Text Processing Help

2007-10-15 Thread Paul Hankin
On Oct 15, 10:08 pm, [EMAIL PROTECTED] wrote: > Because of my limited Python knowledge, I will need to try to figure > out exactly how they work for future text manipulation and for my own > knowledge. Could you recommend some resources for this kind of text > manipulation? Also, I conceptually g

Re: Simple Text Processing Help

2007-10-15 Thread patrick . waldo
Wow, thank you all. All three work. To output correctly I needed to add: output.write("\r\n") This is really a great help!! Because of my limited Python knowledge, I will need to try to figure out exactly how they work for future text manipulation and for my own knowledge. Could you recommend

Re: Simple Text Processing Help

2007-10-15 Thread Peter Otten
patrick.waldo wrote: > my sample input file looks like this( not organized,as you see it): > 200-720-769-93-2 > kyselina mocová C5H4N4O3 > > 200-001-8 50-00-0 > formaldehyd CH2O > > 200-002-3 > 50-01-1 > guanidínium-chlorid CH5N3.ClH Assuming that the records are al

Re: Simple Text Processing Help

2007-10-15 Thread Paul Hankin
On Oct 15, 12:20 pm, Marc 'BlackJack' Rintsch <[EMAIL PROTECTED]> wrote: > On Mon, 15 Oct 2007 10:47:16 +, patrick.waldo wrote: > > my sample input file looks like this( not organized,as you see it): > > 200-720-769-93-2 > > kyselina mocová C5H4N4O3 > > > 200-001-8 50-00-0 >

Re: Simple Text Processing Help

2007-10-15 Thread Marc 'BlackJack' Rintsch
On Mon, 15 Oct 2007 10:47:16 +, patrick.waldo wrote: > my sample input file looks like this( not organized,as you see it): > 200-720-769-93-2 > kyselina mocová C5H4N4O3 > > 200-001-8 50-00-0 > formaldehyd CH2O > > 200-002-3 > 50-01-1 > guanidínium-chlorid CH5N3.C

Re: Simple Text Processing Help

2007-10-15 Thread patrick . waldo
> lines = open('your_file.txt').readlines()[:4] > print lines > print map(len, lines) gave me: ['\xef\xbb\xbf200-720-769-93-2\n', 'kyselina mo\xc4\x8dov \xc3\xa1 C5H4N4O3\n', '\n', '200-001-8\t50-00-0\n'] [28, 32, 1, 18] I think it means that I'm still at option 3. I got

Re: Simple Text Processing Help

2007-10-14 Thread John Machin
On Oct 14, 11:48 pm, [EMAIL PROTECTED] wrote: > Hi all, > > I started Python just a little while ago and I am stuck on something > that is really simple, but I just can't figure out. > > Essentially I need to take a text document with some chemical > information in Czech and organize it into anothe

Re: Simple Text Processing Help

2007-10-14 Thread Marc 'BlackJack' Rintsch
On Sun, 14 Oct 2007 16:57:06 +, patrick.waldo wrote: > Thank you both for helping me out. I am still rather new to Python > and so I'm probably trying to reinvent the wheel here. > > When I try to do Paul's response, I get tokens = line.strip().split() > [] What is in `line`? Paul wrot

Re: Simple Text Processing Help

2007-10-14 Thread patrick . waldo
Thank you both for helping me out. I am still rather new to Python and so I'm probably trying to reinvent the wheel here. When I try to do Paul's response, I get >>>tokens = line.strip().split() [] So I am not quite sure how to read line by line. tokens = input.read().split() gets me all the in

Re: Simple Text Processing Help

2007-10-14 Thread Paul Hankin
On Oct 14, 2:48 pm, [EMAIL PROTECTED] wrote: > Hi all, > > I started Python just a little while ago and I am stuck on something > that is really simple, but I just can't figure out. > > Essentially I need to take a text document with some chemical > information in Czech and organize it into another

Re: Simple Text Processing Help

2007-10-14 Thread Marc 'BlackJack' Rintsch
On Sun, 14 Oct 2007 13:48:51 +, patrick.waldo wrote: > Essentially I need to take a text document with some chemical > information in Czech and organize it into another text file. The > information is always EINECS number, CAS, chemical name, and formula > in tables. I need to organize them

Simple Text Processing Help

2007-10-14 Thread patrick . waldo
Hi all, I started Python just a little while ago and I am stuck on something that is really simple, but I just can't figure out. Essentially I need to take a text document with some chemical information in Czech and organize it into another text file. The information is always EINECS number, CAS

Re: Text processing and file creation

2007-09-07 Thread Paddy
On Sep 7, 3:50 am, George Sakkis <[EMAIL PROTECTED]> wrote: > On Sep 5, 5:17 pm, "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> > wrote: > If this was a code golf challenge, I'd choose the Unix split solution and be both maintainable as well as concise :-) - Paddy.

Re: Text processing and file creation

2007-09-06 Thread George Sakkis
On Sep 5, 5:17 pm, "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> wrote: > On Sep 5, 1:28 pm, Paddy <[EMAIL PROTECTED]> wrote: > > > On Sep 5, 5:13 pm, "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> > > wrote: > > > > I have a text source file of about 20.000 lines. From this file, I like > > > to write the fir

Re: Text processing and file creation

2007-09-06 Thread Ricardo Aráoz
Shawn Milochik wrote: > On 9/5/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote: >> I have a text source file of about 20.000 lines. >> From this file, I like to write the first 5 lines to a new file. Close >> that file, grab the next 5 lines write these to a new file... grabbing >> 5 lines and cre

Re: Text processing and file creation

2007-09-06 Thread Shawn Milochik
Here's my solution, for what it's worth: #!/usr/bin/env python import os input = open("test.txt", "r") counter = 0 fileNum = 0 fileName = "" def newFileName(): global fileNum, fileName while os.path.exists(fileName) or fileName == "": fileNum += 1 x = "%0.5d" % fileN

Re: Text processing and file creation

2007-09-06 Thread Arnau Sanchez
[EMAIL PROTECTED] wrote: > I am still wondering how to do this efficiently in Python (being kind > of new to it... and it's not for homework). You should post some code anyway, it would be easier to give useful advice (it would also demonstrate that you put some effort on it). Anyway, here i

Re: Text processing and file creation

2007-09-06 Thread Alberto Griggio
> Thanks for making me aware of the (UNIX) split command (split -l 5 > inFile.txt), it's short, it's fast, it's beautiful. > > I am still wondering how to do this efficiently in Python (being kind > of new to it... and it's not for homework). Something like this should do the job: def nlines(num

Re: Text processing and file creation

2007-09-05 Thread Arnaud Delobelle
On Sep 6, 12:46 am, Steve Holden <[EMAIL PROTECTED]> wrote: > Arnaud Delobelle wrote: [...] > > print "all done!" # All done > > print "Now there are 4000 files in this directory..." > > > Python 3.0 - ready (I've used open() instead of file()) > > bzzt! > > Python 3.0a1 (py3k:57844, Aug 31

Re: Text processing and file creation

2007-09-05 Thread Ginger
and u can parse lines from read buffer freely. have fun! - Original Message - From: "Shawn Milochik" <[EMAIL PROTECTED]> To: Sent: Thursday, September 06, 2007 1:03 AM Subject: Re: Text processing and file creation > On 9/5/07, [EMAIL PROTECTED] <[EMAIL PROTECTE

Re: Text processing and file creation

2007-09-05 Thread Steve Holden
Arnaud Delobelle wrote: [...] > from my_useful_functions import new_file, write_first_5_lines, > done_processing_file, grab_next_5_lines, another_new_file, write_these > > in_f = open('myfile') > out_f = new_file() > write_first_5_lines(in_f, out_f) # write first 5 lines > close(out_f) > while not

Re: Text processing and file creation

2007-09-05 Thread Arnaud Delobelle
On Sep 5, 5:13 pm, "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> wrote: > I have a text source file of about 20.000 lines. From this file, I like to > write the first 5 lines to a new file. Close > > that file, grab the next 5 lines write these to a new file... grabbing > 5 lines and creating new files

Re: Text processing and file creation

2007-09-05 Thread [EMAIL PROTECTED]
On Sep 5, 1:28 pm, Paddy <[EMAIL PROTECTED]> wrote: > On Sep 5, 5:13 pm, "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> > wrote: > > > I have a text source file of about 20.000 lines. From this file, I like to > > write the first 5 lines to a new file. Close > > > that file, grab the next 5 lines write t

Re: Text processing and file creation

2007-09-05 Thread Paddy
On Sep 5, 5:13 pm, "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> wrote: > I have a text source file of about 20.000 lines. From this file, I like to > write the first 5 lines to a new file. Close > > that file, grab the next 5 lines write these to a new file... grabbing > 5 lines and creating new files

Re: Text processing and file creation

2007-09-05 Thread James Stroud
[EMAIL PROTECTED] wrote: > I have a text source file of about 20.000 lines. > From this file, I like to write the first 5 lines to a new file. Close > that file, grab the next 5 lines write these to a new file... grabbing > 5 lines and creating new files until processing of all 20.000 lines is > do

Re: Text processing and file creation

2007-09-05 Thread kyosohma
On Sep 5, 11:57 am, Bjoern Schliessmann wrote: > [EMAIL PROTECTED] wrote: > > I would use a counter in a for loop using the readline method to > > iterate over the 20,000 line file. > > file objects are iterables themselves, so there's no need to do that > by using a method. Very true! Darn it!

Re: Text processing and file creation

2007-09-05 Thread Shawn Milochik
On 9/5/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote: > I have a text source file of about 20.000 lines. > From this file, I like to write the first 5 lines to a new file. Close > that file, grab the next 5 lines write these to a new file... grabbing > 5 lines and creating new files until proces

Re: Text processing and file creation

2007-09-05 Thread Bjoern Schliessmann
[EMAIL PROTECTED] wrote: > I would use a counter in a for loop using the readline method to > iterate over the 20,000 line file. file objects are iterables themselves, so there's no need to do that by using a method. > Reset the counter every 5 lines/ iterations and close the file. I'd use a

Re: Text processing and file creation

2007-09-05 Thread Arnau Sanchez
[EMAIL PROTECTED] wrote: > I have a text source file of about 20.000 lines. > From this file, I like to write the first 5 lines to a new file. Close > that file, grab the next 5 lines write these to a new file... grabbing > 5 lines and creating new files until processing of all 20.000 lines is

Re: Text processing and file creation

2007-09-05 Thread kyosohma
On Sep 5, 11:13 am, "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> wrote: > I have a text source file of about 20.000 lines. From this file, I like to > write the first 5 lines to a new file. Close > > that file, grab the next 5 lines write these to a new file... grabbing > 5 lines and creating new files

Text processing and file creation

2007-09-05 Thread [EMAIL PROTECTED]
I have a text source file of about 20.000 lines. From this file, I like to write the first 5 lines to a new file. Close that file, grab the next 5 lines write these to a new file... grabbing 5 lines and creating new files until processing of all 20.000 lines is done. Is there an efficient way to d
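
A minimal sketch of the chunking described in this thread, assuming each output file should hold up to five consecutive lines; the function name `split_file`, the `part_NNNNN.txt` naming scheme and the temporary demo files are illustrative assumptions, not any poster's solution. It relies on the point made in the replies that a file object is itself an iterator over lines.

```python
import tempfile
from itertools import islice
from pathlib import Path


def split_file(source, out_dir, chunk_size=5):
    """Write consecutive chunk_size-line pieces of source into numbered files."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with open(source, encoding="utf-8") as infile:
        while True:
            # islice pulls at most chunk_size lines from the file iterator.
            chunk = list(islice(infile, chunk_size))
            if not chunk:
                break
            target = out_dir / f"part_{len(written) + 1:05d}.txt"
            target.write_text("".join(chunk), encoding="utf-8")
            written.append(target)
    return written


# Demo on a throwaway 12-line file: two full chunks plus a 2-line remainder.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "input.txt"
    source.write_text("".join(f"line {i}\n" for i in range(12)), encoding="utf-8")
    parts = split_file(source, Path(tmp) / "out")
    part_names = [part.name for part in parts]
    last_part = parts[-1].read_text(encoding="utf-8")

print(part_names)
```

Only one chunk is held in memory at a time, so the size of the source file does not matter much; on Unix, `split -l 5`, mentioned in the replies, does the same job without any Python.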

Re: On text processing

2007-03-23 Thread Daniel Nogradi
> > I'm in a process of rewriting a bash/awk/sed script -- that grew too > > big -- in python. I can rewrite it in a simple line-by-line way but > > that results in ugly python code and I'm sure there is a simple > > pythonic way. > > > > The bash script processed text files of the form: > > > > ###

Re: On text processing

2007-03-23 Thread Paul McGuire
On Mar 23, 5:30 pm, "Daniel Nogradi" <[EMAIL PROTECTED]> wrote: > Hi list, > > I'm in a process of rewriting a bash/awk/sed script -- that grew too > big -- in python. I can rewrite it in a simple line-by-line way but > that results in ugly python code and I'm sure there is a simple > pythonic way.

Re: On text processing

2007-03-23 Thread Paddy
On Mar 23, 10:30 pm, "Daniel Nogradi" <[EMAIL PROTECTED]> wrote: > Hi list, > > I'm in a process of rewriting a bash/awk/sed script -- that grew too > big -- in python. I can rewrite it in a simple line-by-line way but > that results in ugly python code and I'm sure there is a simple > pythonic way.

Re: On text processing

2007-03-23 Thread Daniel Nogradi
> This is my first try: > > ddata = {} > > inside_matrix = False > for row in file("data.txt"): > if row.strip(): > fields = row.split() > if len(fields) == 2: > inside_matrix = False > ddata[fields[0]] = [fields[1]] > lastkey = fields[0] >

Re: On text processing

2007-03-23 Thread bearophileHUGS
Daniel Nogradi: > Any elegant solution for this? This is my first try: ddata = {} inside_matrix = False for row in file("data.txt"): if row.strip(): fields = row.split() if len(fields) == 2: inside_matrix = False ddata[fields[0]] = [fields[1]]

On text processing

2007-03-23 Thread Daniel Nogradi
Hi list, I'm in a process of rewriting a bash/awk/sed script -- that grew too big -- in python. I can rewrite it in a simple line-by-line way but that results in ugly python code and I'm sure there is a simple pythonic way. The bash script processed text files of the form: ###

Re: Suitability for long-running text processing?

2007-01-08 Thread tsuraan
I remember something about it coming up in some of the discussions of free lists and better behavior in this regard in 2.5, but I don't remember the details. Under Python 2.5, my original code posting no longer exhibits the bug - upon calling del(a), python's size shrinks back to ~4 MB, which i

Re: Suitability for long-running text processing?

2007-01-08 Thread Chris Mellon
On 1/8/07, tsuraan <[EMAIL PROTECTED]> wrote: > > > > My first thought was that interned strings were causing the growth, > > but that doesn't seem to be the case. > > Interned strings, as of 2.3, are no longer immortal, right? The intern doc > says you have to keep a reference around to the strin

Re: Suitability for long-running text processing?

2007-01-08 Thread tsuraan
My first thought was that interned strings were causing the growth, but that doesn't seem to be the case. Interned strings, as of 2.3, are no longer immortal, right? The intern doc says you have to keep a reference around to the string now, anyhow. I really wish I could find that thing I read

Re: Suitability for long-running text processing?

2007-01-08 Thread tsuraan
$ python Python 2.4.4c1 (#2, Oct 11 2006, 21:51:02) [GCC 4.1.2 20060928 (prerelease) (Ubuntu 4.1.1-13ubuntu5)] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> # Python is using 2.7 MiB ... a = ['1234' for i in xrange(10 << 20)] >>> # Python is using 42.9 MiB ..

Re: Suitability for long-running text processing?

2007-01-08 Thread Chris Mellon
On 1/8/07, Felipe Almeida Lessa <[EMAIL PROTECTED]> wrote: > On 1/8/07, tsuraan <[EMAIL PROTECTED]> wrote: > > > > > > > I just tried on my system > > > > > > (Python is using 2.9 MiB) > > > >>> a = ['a' * (1 << 20) for i in xrange(300)] > > > (Python is using 304.1 MiB) > > > >>> del a > > > (Pyth

Re: Suitability for long-running text processing?

2007-01-08 Thread Felipe Almeida Lessa
On 1/8/07, tsuraan <[EMAIL PROTECTED]> wrote: > > > > I just tried on my system > > > > (Python is using 2.9 MiB) > > >>> a = ['a' * (1 << 20) for i in xrange(300)] > > (Python is using 304.1 MiB) > > >>> del a > > (Python is using 2.9 MiB -- as before) > > > > And I didn't even need to tell the ga

Re: Suitability for long-running text processing?

2007-01-08 Thread tsuraan
I just tried on my system (Python is using 2.9 MiB) >>> a = ['a' * (1 << 20) for i in xrange(300)] (Python is using 304.1 MiB) >>> del a (Python is using 2.9 MiB -- as before) And I didn't even need to tell the garbage collector to do its job. Some info: It looks like the big difference betwe

Re: Suitability for long-running text processing?

2007-01-08 Thread Felipe Almeida Lessa
On 1/8/07, tsuraan <[EMAIL PROTECTED]> wrote: [snip] > The loop is deep enough that I always interrupt it once python's size is > around 250 MB. Once the gc.collect() call is finished, python's size has > not changed a bit. [snip] > This has been tried under python 2.4.3 in gentoo linux and python

Re: Suitability for long-running text processing?

2007-01-08 Thread tsuraan
After reading http://www.python.org/doc/faq/general/#how-does-python-manage-memory, I tried modifying this program as below: a=[] for i in xrange(33,127): for j in xrange(33,127): for k in xrange(33,127): for l in xrange(33, 127): a.append(chr(i)+chr(j)+chr(k)+chr(l)) import sys sys

Suitability for long-running text processing?

2007-01-08 Thread tsuraan
I have a pair of python programs that parse and index files on my computer to make them searchable. The problem that I have is that they continually grow until my system is out of memory, and then things get ugly. I remember, when I was first learning python, reading that the python interpreter

Re: Beginner question on text processing

2006-12-29 Thread skip
Harold> To illustrate, assume I have a text file, call it test.txt, with Harold> the following information: Harold> X11 .32 Harold> X22 .45 Harold> My goal in the python program is to manipulate this file such Harold> that a new file would be created that looks like:

Beginner question on text processing

2006-12-29 Thread Doran, Harold
I am beginning to use python primarily to organize data into formats needed for input into some statistical packages. I do not have much programming experience outside of LaTeX and R, so some of this is a bit new. I am attempting to write a program that reads in a text file that contains some value

Re: fast text processing

2006-02-21 Thread Larry Bates
Alexis Gallagher wrote: > Steve, > > First, many thanks! > > Steve Holden wrote: >> Alexis Gallagher wrote: >>> >>> filehandle = open("data",'r',buffering=1000) >> >> This buffer size seems, shall we say, unadventurous? It's likely to >> slow things down considerably, since the filesystem is pr

Re: fast text processing

2006-02-21 Thread Alexis Gallagher
Steve, First, many thanks! Steve Holden wrote: > Alexis Gallagher wrote: >> >> filehandle = open("data",'r',buffering=1000) > > This buffer size seems, shall we say, unadventurous? It's likely to slow > things down considerably, since the filesystem is probably going to > naturally want to use

Re: fast text processing

2006-02-21 Thread Ben Sizer
Maybe this code will be faster? (If it even does the same thing: largely untested) filehandle = open("data",'r',buffering=1000) fileIter = iter(filehandle) lastLine = fileIter.next() lastTokens = lastLine.strip().split(delimiter) lastGeno = extract(lastTokens[0]) for currentLine in fileIter:

Re: fast text processing

2006-02-21 Thread Steve Holden
Alexis Gallagher wrote: > (I tried to post this yesterday but I think my ISP ate it. Apologies if > this is a double-post.) > > Is it possible to do very fast string processing in python? My > bioinformatics application needs to scan very large ASCII files (80GB+), > compare adjacent lines, and

fast text processing

2006-02-21 Thread Alexis Gallagher
(I tried to post this yesterday but I think my ISP ate it. Apologies if this is a double-post.) Is it possible to do very fast string processing in python? My bioinformatics application needs to scan very large ASCII files (80GB+), compare adjacent lines, and conditionally do some further proce
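
A minimal sketch of streaming over adjacent lines, assuming the comparison needs only the previous and the current line at any moment; `adjacent_pairs` is a hypothetical helper and says nothing about how fast the poster's real workload would run.

```python
def adjacent_pairs(lines):
    """Yield (previous, current) for every pair of neighbouring items."""
    iterator = iter(lines)
    try:
        previous = next(iterator)
    except StopIteration:
        return  # empty input: nothing to compare
    for current in iterator:
        yield previous, current
        previous = current


# Works the same on an open file object, which yields one line at a time.
pairs = list(adjacent_pairs(["a 1", "a 2", "b 3"]))
print(pairs)
```

Because only two lines are kept at once, memory use stays flat regardless of file size; the time is then dominated by reading the file and by whatever is done with each pair.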

Re: Newbie Text Processing Question

2005-10-04 Thread Fredrik Lundh
Gregory Piñero wrote: >That's how Python works. You read in the whole file, edit it, and write it > back out. that's how file systems work. if file systems generally supported insert operations, Python would of course support that feature.

Re: Newbie Text Processing Question

2005-10-04 Thread Mike Meyer
[EMAIL PROTECTED] writes: > I'm a total newbie to Python so any and all advice is greatly > appreciated. Well, I've got some for you. > I'm trying to use regular expressions to process text in an SGML file > but only in one section. This is generally a bad idea. SGML family languages aren't easy

Re: Newbie Text Processing Question

2005-10-04 Thread James Stroud
You can edit a file in place, but it is not applicable to what you are doing. As soon as you insert the first "", you've shifted everything downstream by those 8 bytes. Since they map to physically located blocks on a physical drive, you will have to rewrite those blocks. If it is a big file
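
A minimal sketch of the usual alternative to inserting bytes in place: stream the file through a transformation into a temporary file in the same directory, then swap it over the original. The helper name `rewrite_file`, the sample text and the bracket-wrapping transformation are illustrative assumptions; the actual markup from the question is not reproduced here.

```python
import os
import tempfile


def rewrite_file(path, transform):
    """Replace the file at path with a copy whose lines went through transform."""
    directory = os.path.dirname(os.path.abspath(path))
    with open(path, encoding="utf-8") as src, tempfile.NamedTemporaryFile(
        "w", encoding="utf-8", dir=directory, delete=False
    ) as dst:
        for line in src:
            dst.write(transform(line))
    # Both files are closed here; os.replace swaps the new copy into place.
    os.replace(dst.name, path)


# Demo on a throwaway file.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "guide.txt")
    with open(target, "w", encoding="utf-8") as handle:
        handle.write("FORMS\ncontent\n")
    rewrite_file(target, lambda line: line.replace("content", "[content]"))
    with open(target, encoding="utf-8") as handle:
        rewritten = handle.read()

print(rewritten)
```

Writing the temporary file next to the original keeps the final `os.replace` on one filesystem, and only one line is held in memory at a time, which matters if the file is big.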

Re: Newbie Text Processing Question

2005-10-04 Thread Gregory Piñero
That's how Python works.  You read in the whole file, edit it, and write it back out.  As far as I know there's no way to edit a file "in place" which I'm assuming is what you're asking? And now, cue the responses telling you to use a fancy parser (XML?) for your project ;-) -Greg On 4 Oct 2005 2

Newbie Text Processing Question

2005-10-04 Thread gshepherd281
Hi, I'm a total newbie to Python so any and all advice is greatly appreciated. I'm trying to use regular expressions to process text in an SGML file but only in one section. So the input would look like this: RESEARCH GUIDE content content content content FORMS content content content cont

Re: Improving my text processing script

2005-09-01 Thread Paul McGuire
Yes indeed, the real data often has surprising differences from the simulations! :) It turns out that pyparsing LineStart()'s are pretty fussy. Usually, pyparsing is very forgiving about whitespace between expressions, but it turns out that LineStart *must* be followed by the next expression, wit

Re: Improving my text processing script

2005-09-01 Thread pruebauno
[EMAIL PROTECTED] wrote: > Paul McGuire wrote: > > match...), this program has quite a few holes. > tried run it though and it is not working for me. The following code > runs but prints nothing at all: > > import pyparsing as prs > And this is the point where I have to post the real stuff because

Re: Improving my text processing script

2005-09-01 Thread pruebauno
Paul McGuire wrote: > match...), this program has quite a few holes. > > What if the word "Identifier" is inside one of the quoted strings? > What if the actual value is "tablename10"? This will match your > "tablename1" string search, but it is certainly not what you want. > Did you know there ar

Re: Improving my text processing script

2005-09-01 Thread pruebauno
Miki Tebeka wrote: > Look at re.findall, I think it'll be easier. Minor changes aside the interesting thing, as you pointed out, would be using re.findall. I could not figure out how to.

Re: Improving my text processing script

2005-09-01 Thread Miki Tebeka
Hello pruebauno, > import re > f=file('tlst') > tlst=f.read().split('\n') > f.close() tlst = open("tlst").readlines() > f=file('plst') > sep=re.compile('Identifier "(.*?)"') > plst=[] > for elem in f.read().split('Identifier'): > content='Identifier'+elem > match=sep.search(content) >
