Re: Grammars and biological data formats

2014-08-16 Thread Fields, Christopher J
Yes, that looks like an even better option. I see that this is implemented in p5 as File::Map, which is a nice portable option. Chris > On Aug 16, 2014, at 7:51 AM, "Martin D Kealey" > wrote: > > > Hmmm, what about just implementing mmap-as-string? > > Then, assuming the parsing process is

Re: Grammars and biological data formats

2014-08-16 Thread Martin D Kealey
Hmmm, what about just implementing mmap-as-string? Then, assuming the parsing process is somewhat stream-like, the OS will take care of swapping in chunks as you need them. You don't even need anything special to support backtracking -- it's just a memory address, after all. -Martin On Thu, 14
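
The mmap-as-string idea can be sketched outside Perl 6 as well. The following is a minimal illustration in Python, not code from the thread: it assumes a simple FASTA layout (a ">" header line followed by sequence lines), and the names FASTA_RECORD and iter_fasta_mmap are hypothetical. The regex engine is handed the memory-mapped file as if it were one large byte string, and the OS pages data in as the match position advances.

```python
import mmap
import re

# Hypothetical record pattern (an assumption for this sketch): a ">" header
# line, then everything up to the next ">" as the sequence body.
FASTA_RECORD = re.compile(rb">(?P<header>[^\n]*)\n(?P<seq>[^>]*)")


def iter_fasta_mmap(path):
    """Yield (header, sequence) pairs by matching against a memory-mapped file."""
    with open(path, "rb") as fh:
        # Length 0 maps the whole file; note that mapping an empty file
        # raises ValueError, which this sketch does not handle.
        with mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # The mmap object supports the buffer protocol, so a bytes
            # pattern can scan it directly without reading it into memory.
            for match in FASTA_RECORD.finditer(mm):
                header = match.group("header").decode("ascii").strip()
                raw_seq = match.group("seq")
                seq = raw_seq.replace(b"\n", b"").replace(b"\r", b"")
                yield header, seq.decode("ascii")
```

Because the pattern only ever sees an address range, backtracking needs no extra bookkeeping, which is the point made in the message above. Each matched record is still copied into ordinary memory when its groups are extracted.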

Re: Grammars and biological data formats

2014-08-14 Thread Fields, Christopher J
Yeah, I'm thinking of a Cat-like class that would chunkify the data and check for matches. The main reason I would like to stick with a consistent grammar-based approach is I have seen many instances in BioPerl where a parser is essentially rewritten based on its purpose (full parsing, lazy par

Re: Grammars and biological data formats

2014-08-14 Thread Carl Mäsak
I was going to pipe in and say that I wouldn't wait around for Cat, I'd write something that reads chunks and then parses that. It'll be a bit more code, but it'll work today. But I see you reached that conclusion already. :) Lately I've found myself writing more and more grammars that parse just
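
The "read chunks and then parse that" approach can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not code from the thread: it assumes FASTA-style records that each begin with ">" at the start of a line, and parse_record and iter_fasta_chunks are hypothetical names. In Perl 6, parse_record would be the place where a small record-level grammar is applied to each complete chunk.

```python
def parse_record(text):
    """Parse one complete record: first line is the header, the rest is sequence."""
    header, _, body = text.partition("\n")
    # Joining the whitespace-split body removes line breaks inside the sequence.
    return header.strip(), "".join(body.split())


def iter_fasta_chunks(stream, chunk_size=65536):
    """Yield (header, sequence) pairs from a text stream read in fixed-size chunks."""
    pending = ""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        pending += chunk
        # A record is known to be complete once the start of the next record
        # ("\n>") has been seen; the last piece may still be partial.
        *complete, pending = pending.split("\n>")
        for record in complete:
            yield parse_record(record.lstrip(">"))
    # Whatever remains at end of input is the final record.
    if pending.strip():
        yield parse_record(pending.lstrip(">"))
```

Only one record plus one chunk needs to be held in memory at a time, so the record-level parser never sees the whole file. The trade-off is the extra buffering code around it.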

Re: Grammars and biological data formats

2014-08-13 Thread Fields, Christopher J
On Aug 13, 2014, at 8:11 AM, Christopher Fields wrote: > On Aug 13, 2014, at 4:50 AM, Solomon Foster wrote: > >> On Sat, Aug 9, 2014 at 7:26 PM, Fields, Christopher J >> wrote: >>> I have a fairly simple question regarding the feasibility of using grammars >>> with commonly used biological da

Re: Grammars and biological data formats

2014-08-13 Thread Fields, Christopher J
On Aug 13, 2014, at 4:50 AM, Solomon Foster wrote: > On Sat, Aug 9, 2014 at 7:26 PM, Fields, Christopher J > wrote: >> I have a fairly simple question regarding the feasibility of using grammars >> with commonly used biological data formats. >> >> My main question: if I wanted to parse() or su

Re: Grammars and biological data formats

2014-08-13 Thread Solomon Foster
On Sat, Aug 9, 2014 at 7:26 PM, Fields, Christopher J wrote: > I have a fairly simple question regarding the feasibility of using grammars > with commonly used biological data formats. > > My main question: if I wanted to parse() or subparse() very large files (not > unheard of to have FASTA/FAS

Grammars and biological data formats

2014-08-13 Thread Fields, Christopher J
I have a fairly simple question regarding the feasibility of using grammars with commonly used biological data formats. My main question: if I wanted to parse() or subparse() very large files (not unheard of to have FASTA/FASTQ or other similar data files exceed 100s of GB), would a grammar b

Re: Grammars and biological data formats

2014-08-09 Thread Fields, Christopher J
On Aug 9, 2014, at 8:51 PM, "Fields, Christopher J" wrote: > > >> On Aug 9, 2014, at 5:25 PM, "t...@wakelift.de" wrote: >> >> >>> On 08/10/2014 12:21 AM, t...@wakelift.de wrote: >>> Something that does surprise me is that your tests seem to imply that :p >>> for subparse doesn't work. I'll l

Re: Grammars and biological data formats

2014-08-09 Thread Fields, Christopher J
> On Aug 9, 2014, at 5:25 PM, "t...@wakelift.de" wrote: > > >> On 08/10/2014 12:21 AM, t...@wakelift.de wrote: >> Something that does surprise me is that your tests seem to imply that :p >> for subparse doesn't work. I'll look into that, because I believe it >> ought to be implemented already.

Re: Grammars and biological data formats

2014-08-09 Thread Darren Duncan
I've already been thinking for a while now that parsers need to be able to operate in a streaming fashion (when the grammars lend themselves to it, by not needing to look ahead, much if at all, to understand what they've already seen) so that strings that don't fit in memory all at once can be par

Re: Grammars and biological data formats

2014-08-09 Thread timo
On 08/10/2014 12:21 AM, t...@wakelift.de wrote: > Something that does surprise me is that your tests seem to imply that :p > for subparse doesn't work. I'll look into that, because I believe it > ought to be implemented already. Perhaps not properly hooked up, though. On #perl6 I got corrected qu

Re: Grammars and biological data formats

2014-08-09 Thread timo
(accidentally sent this privately only, now re-sending to the list) Hello Christopher, In the Perl 6 specification, there are plans for lazy and memory-releasing ways to parse strings that are either too large to fit into memory at once or that are generated lazily (like being streamed in through

Grammars and biological data formats

2014-08-09 Thread Fields, Christopher J
(accidentally sent to perl6-lang, apologies for cross-posting but this seems more appropriate) I have a fairly simple question regarding the feasibility of using grammars with commonly used biological data formats. My main question: if I wanted to parse() or subparse() very large files (not