On Wed, 21 Jul 2010 10:06:14 -0500, Brandon Harris wrote:
> what do you mean by slurp the entire file? I'm trying to use regular
> expressions because line by line parsing will be too slow. An example
> file would have somewhere in the realm of 6 million lines of code.
And you think trying to r
Brandon Harris wrote:
> I'm trying to read in and parse an ascii type file that contains
> information that can span several lines.
> Example:
What about something like this (you need re.MULTILINE):
In [16]: re.findall('^([^ ].*\n([ ].*\n)+)', a, re.MULTILINE)
Out[16]:
[('createNode animCurveTU
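A minimal sketch of the re.MULTILINE approach suggested above, not the poster's exact code. It assumes each record starts with an unindented line and that its continuation lines begin with a space or tab; the sample text and the names BLOCK and split_blocks are made up for illustration. The inner group is non-capturing, so findall returns plain strings instead of the tuples shown in the session above.

```python
import re

# Hypothetical sample; assumes continuation lines are indented with spaces or tabs.
sample = (
    'createNode animCurveTU -n "test:master_globalSmooth";\n'
    '    setAttr ".tan" 9;\n'
    '    setAttr -s 4 ".kit[3]" 10;\n'
    'createNode transform -n "other";\n'
    '    setAttr ".v" no;\n'
)

# One unindented header line followed by one or more indented lines.
# With re.MULTILINE, ^ matches at the start of every line, not just the string.
BLOCK = re.compile(r'^(\S.*\n(?:[ \t].*\n)+)', re.MULTILINE)


def split_blocks(text):
    """Return each header line together with its indented continuation lines."""
    return BLOCK.findall(text)


blocks = split_blocks(sample)
for block in blocks:
    print(repr(block.splitlines()[0]))
```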
>>> I could make it that simple, but that is also incredibly slow and on
>>> a file with several million lines, it takes somewhere in the league of
>>> half an hour to grab all the data. I need this to grab data from
>>> many many files and return the data quickly.
>>>
>>> Brandon L. Harris
>>>
>> T
Could it be that there isn't just that type of data in the file? there
are many different types, that is just one that I'm trying to grab.
Brandon L. Harris
Andreas Tawn wrote:
I could make it that simple, but that is also incredibly slow and on a
file with several million lines, it takes som
> I could make it that simple, but that is also incredibly slow and on a
> file with several million lines, it takes somewhere in the league of
> half an hour to grab all the data. I need this to grab data from many
> many files and return the data quickly.
>
> Brandon L. Harris
That's surprising.
I could make it that simple, but that is also incredibly slow and on a
file with several million lines, it takes somewhere in the league of
half an hour to grab all the data. I need this to grab data from many
many files and return the data quickly.
Brandon L. Harris
Andreas Tawn wrote:
I'm
Brandon Harris wrote:
> I'm trying to read in and parse an ascii type file that contains
> information that can span several lines.
> Example:
>
> createNode animCurveTU -n "test:master_globalSmooth";
> setAttr ".tan" 9;
> setAttr -s 4 ".ktv[0:3]" 101 0 163 0 169 0 201 0;
> setAttr -
> I'm trying to read in and parse an ascii type file that contains
> information that can span several lines.
> Example:
>
> createNode animCurveTU -n "test:master_globalSmooth";
> setAttr ".tan" 9;
> setAttr -s 4 ".ktv[0:3]" 101 0 163 0 169 0 201 0;
> setAttr -s 4 ".kit[3]" 10;
>
At the moment I'm trying to stick with built-in Python modules to create
tools for a much larger pipeline on multiple OSes.
Brandon L. Harris
Eknath Venkataramani wrote:
On Wed, Jul 21, 2010 at 8:12 PM, Brandon Harris
<brandon.har...@reelfx.com> wrote:
I'm trying to read in an
On Wed, Jul 21, 2010 at 8:12 PM, Brandon Harris
wrote:
> I'm trying to read in and parse an ascii type file that contains
> information that can span several lines.
>
Do you have to use only regex? If not, I'd certainly suggest 'pyparsing'.
It's a pleasure to use and very easy on the eye too, if
what do you mean by slurp the entire file?
I'm trying to use regular expressions because line by line parsing will
be too slow. An example file would have somewhere in the realm of 6
million lines of code.
Brandon L. Harris
Rodrick Brown wrote:
Slurp the entire file into a string and pick o
Slurp the entire file into a string and pick out the fields you need.
Sent from my iPhone 4.
On Jul 21, 2010, at 10:42 AM, Brandon Harris wrote:
> I'm trying to read in and parse an ascii type file that contains information
> that can span several lines.
> Example:
>
> createNode animCurveTU
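The "slurp the entire file" suggestion above could look roughly like the sketch below: read the file once into a string, then let one compiled pattern pick out only the animCurveTU records and skip every other node type. The helper names slurp and find_anim_curves, the pattern, and the assumption that setAttr lines are indented under their createNode line are all illustrative guesses, not something the thread confirms.

```python
import re

# Assumed layout: a "createNode animCurveTU" header followed by indented
# setAttr lines. Other node types are simply never matched.
CURVE = re.compile(
    r'^createNode animCurveTU -n "([^"]+)";\n((?:[ \t]+setAttr .*\n)*)',
    re.MULTILINE,
)


def slurp(path):
    """Read the whole file into one string in a single call."""
    with open(path, encoding="ascii", errors="replace") as handle:
        return handle.read()


def find_anim_curves(text):
    """Map each animCurveTU node name to the list of its setAttr lines."""
    return {m.group(1): m.group(2).splitlines() for m in CURVE.finditer(text)}
```

Typical use would be find_anim_curves(slurp(path)). Whether this beats a line-by-line loop on a multi-million-line file is something to measure rather than assume.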
I'm trying to read in and parse an ascii type file that contains
information that can span several lines.
Example:
createNode animCurveTU -n "test:master_globalSmooth";
setAttr ".tan" 9;
setAttr -s 4 ".ktv[0:3]" 101 0 163 0 169 0 201 0;
setAttr -s 4 ".kit[3]" 10;
setAttr -s 4 ".kot
Steven Bethard wrote:
Kent Johnson wrote:
for line in raw_data:
    if line.startswith('RelevantInfo1'):
        info1 = raw_data.next().strip()
    elif line.startswith('RelevantInfo2'):
        info2 = raw_data.next().strip()
    elif line.startswith('RelevantInfo3'):
        info3 = raw_data.nex
Kent Johnson wrote:
for line in raw_data:
    if line.startswith('RelevantInfo1'):
        info1 = raw_data.next().strip()
    elif line.startswith('RelevantInfo2'):
        info2 = raw_data.next().strip()
    elif line.startswith('RelevantInfo3'):
        info3 = raw_data.next().strip()
sc
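The loop quoted above uses the Python 2 spelling raw_data.next(). A rough Python 3 equivalent, assuming the value always sits on the line right after its RelevantInfo label, is sketched below. The io.StringIO stand-in for the file and the read_fields name are illustrative, and the sample text is the one from the original post.

```python
import io

SAMPLE = """Gibberish
53
MoreGarbage
12
RelevantInfo1
10/10/04
NothingImportant
ThisDoesNotMatter
44
RelevantInfo2
22
BlahBlah
343
RelevantInfo3
23
Hubris
Crap
34
"""


def read_fields(lines):
    """Collect the line that follows each RelevantInfo label."""
    fields = {}
    for line in lines:
        label = line.strip()
        if label.startswith('RelevantInfo'):
            # next() advances the same iterator the for loop is consuming,
            # so the value line is not seen again as a label.
            fields[label] = next(lines, '').strip()
    return fields


fields = read_fields(io.StringIO(SAMPLE))
print(fields)
```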
On Thu, 3 Mar 2005 12:26:37 -0800, James Stroud <[EMAIL PROTECTED]> wrote:
> Have a look at "martel", part of biopython. The world of bioinformatics is
> filled with files with structure like this.
>
> http://www.biopython.org/docs/api/public/Martel-module.html
>
> James
Thanks for the link. Stev
On Thu, 03 Mar 2005 16:25:39 -0500, Kent Johnson <[EMAIL PROTECTED]> wrote:
> Here is another attempt. I'm still not sure I understand what form you want
> the data in. I made a
> dict -> dict -> list structure so if you lookup e.g. scores['10/11/04']['60']
> you get a list of all
> the Relevan
On Thu, 03 Mar 2005 13:45:31 -0700, Steven Bethard <[EMAIL PROTECTED]> wrote:
>
> I think if you use the non-greedy .*? instead of the greedy .*, you'll
> get this behavior. For example:
>
> py> s = """\
> ... Gibberish
> ... 53
> ... MoreGarbage
> [snip a whole bunch of stuff]
> ... RelevantInfo
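To illustrate the greedy versus non-greedy point quoted above, here is a small sketch on made-up data with two records. It is not the snipped example from the message, only an illustration of the same idea; the text and the variable names are assumptions.

```python
import re

# Two hypothetical records in the same label-then-value layout.
text = (
    "RelevantInfo1\n10/10/04\nJunk\nRelevantInfo2\n22\n"
    "RelevantInfo1\n10/11/04\nJunk\nRelevantInfo2\n60\n"
)

# Greedy .* runs to the LAST "RelevantInfo2", swallowing both records.
greedy = re.findall(r'RelevantInfo1\n(.*)\nRelevantInfo2', text, re.DOTALL)

# Non-greedy .*? stops at the FIRST possible match, one record at a time.
lazy = re.findall(
    r'RelevantInfo1\n(.*?)\n.*?RelevantInfo2\n(.*?)\n', text, re.DOTALL
)

print(len(greedy))  # a single oversized match
print(lazy)         # one (date, value) tuple per record
```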
Here is another attempt. I'm still not sure I understand what form you want the data in. I made a
dict -> dict -> list structure so if you lookup e.g. scores['10/11/04']['60'] you get a list of all
the RelevantInfo2 values for Relevant1='10/11/04' and Relevant2='60'.
The parser is a simple-minde
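The dict -> dict -> list shape described above can be sketched with nested defaultdicts. This is not the parser from the message, which is cut off here. It assumes the records have already been parsed into (info1, info2, info3) tuples and that the innermost list collects the third value; build_scores and the sample records are hypothetical.

```python
from collections import defaultdict


def build_scores(records):
    """Group values as scores[info1][info2] -> list of info3 values."""
    scores = defaultdict(lambda: defaultdict(list))
    for info1, info2, info3 in records:
        scores[info1][info2].append(info3)
    return scores


# Hypothetical, already-parsed records.
records = [
    ('10/11/04', '60', '23'),
    ('10/11/04', '60', '31'),
    ('10/10/04', '22', '23'),
]
scores = build_scores(records)
print(scores['10/11/04']['60'])
```

One thing to keep in mind with defaultdict is that looking up a missing key creates an empty entry, so use scores.get(...) or an "in" test when a lookup should not modify the structure.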
Yatima wrote:
On Thu, 03 Mar 2005 09:54:02 -0700, Steven Bethard <[EMAIL PROTECTED]> wrote:
A possible solution, using the re module:
py> s = """\
... Gibberish
... 53
... MoreGarbage
... 12
... RelevantInfo1
... 10/10/04
... NothingImportant
... ThisDoesNotMatter
... 44
... RelevantInfo2
... 22
..
I found the original paper for Martel:
http://www.dalkescientific.com/Martel/ipc9/
On Thursday 03 March 2005 12:26 pm, James Stroud wrote:
> Have a look at "martel", part of biopython. The world of bioinformatics is
> filled with files with structure like this.
>
> http://www.biopython.org/docs/a
On Thu, 03 Mar 2005 07:14:50 -0500, Kent Johnson <[EMAIL PROTECTED]> wrote:
>
> Here is a way to create a list of [RelevantInfo, value] pairs:
> import cStringIO
>
> raw_data = '''Gibberish
> 53
> MoreGarbage
> 12
> RelevantInfo1
> 10/10/04
> NothingImportant
> ThisDoesNotMatter
> 44
> RelevantInfo
Have a look at "martel", part of biopython. The world of bioinformatics is
filled with files with structure like this.
http://www.biopython.org/docs/api/public/Martel-module.html
James
On Thursday 03 March 2005 12:03 pm, Yatima wrote:
> On Thu, 03 Mar 2005 09:54:02 -0700, Steven Bethard
<[EMAI
On Thu, 03 Mar 2005 09:54:02 -0700, Steven Bethard <[EMAIL PROTECTED]> wrote:
>
> A possible solution, using the re module:
>
> py> s = """\
> ... Gibberish
> ... 53
> ... MoreGarbage
> ... 12
> ... RelevantInfo1
> ... 10/10/04
> ... NothingImportant
> ... ThisDoesNotMatter
> ... 44
> ... RelevantI
Yatima wrote:
Hey Folks,
I've got some info in a bunch of files that kind of looks like so:
Gibberish
53
MoreGarbage
12
RelevantInfo1
10/10/04
NothingImportant
ThisDoesNotMatter
44
RelevantInfo2
22
BlahBlah
343
RelevantInfo3
23
Hubris
Crap
34
and so on...
Anyhow, these "fields" repeat several times
Hey Folks,
I've got some info in a bunch of files that kind of looks like so:
Gibberish
53
MoreGarbage
12
RelevantInfo1
10/10/04
NothingImportant
ThisDoesNotMatter
44
RelevantInfo2
22
BlahBlah
343
RelevantInfo3
23
Hubris
Crap
34
and so on...
Anyhow, these "fields" repeat several times in a give