Re: Grammars and biological data formats

Fields, Christopher J Wed, 13 Aug 2014 11:02:28 -0700

On Aug 13, 2014, at 8:11 AM, Christopher Fields <[email protected]> wrote:


> On Aug 13, 2014, at 4:50 AM, Solomon Foster <[email protected]> wrote:
> 
>> On Sat, Aug 9, 2014 at 7:26 PM, Fields, Christopher J
>> <[email protected]> wrote:
>>> I have a fairly simple question regarding the feasibility of using grammars 
>>> with commonly used biological data formats.
>>> 
>>> My main question: if I wanted to parse() or subparse() very large files 
>>> (it’s not unheard of for FASTA/FASTQ or similar data files to exceed 
>>> hundreds of GB), would a grammar be the best solution?  Based on what I am 
>>> reading, the semantics appear to be greedy; for instance:
>>> 
>>>   Grammar.parsefile($file)
>>> 
>>> appears to be a convenient shorthand for:
>>> 
>>>   Grammar.parse($file.slurp)
>>> 
>>> since Grammar.parse() works on a Str, not an IO::Handle or Buf.  Or am I 
>>> misunderstanding how this could be accomplished?
>> 
>> My understanding is it is intended that parsing can work on Cats
>> (hypothetical lazy strings) but this hasn't been implemented yet
>> anywhere.
>> 
>> -- 
>> Solomon Foster: [email protected]
>> HarmonyWare, Inc: http://www.harmonyware.com
> 
> Yeah, that’s what I recall as well.  I see very little in the specs re: Cat 
> unfortunately.
> 
> chris

Ah, never mind.  I did a search of the IRC channel and found that Cat is 
considered to be a ‘6.1’ feature:

    http://irclog.perlgeek.de/perl6/2014-07-06#i_8978974

Cat is mentioned a few times in the specs, I’m guessing wherever it’s thought 
to fit in best.  For the moment the proposal is to run grammar parsing on 
sized chunks of the input data, which might be how Cat would be implemented 
anyway.
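
To make the chunking idea concrete, here is a rough sketch of one way it could look for FASTA.  It chunks on record boundaries rather than on a fixed byte size, which is my own variation and not what was proposed on IRC.  The grammar FASTA::Record, the sub parse-fasta-records and the file name reads.fa are all hypothetical names made up for illustration, and the sequence token only accepts plain letters:

```raku
# Hypothetical grammar for ONE FASTA record (illustrative, not a full spec).
grammar FASTA::Record {
    token TOP      { <header> \n <sequence> }
    token header   { '>' \N+ }
    # One or more lines of letters, separated by newlines.
    token sequence { <[A..Z a..z]>+ % \n }
}

# Hypothetical helper: read the handle lazily, line by line, and run the
# grammar on one record at a time, so the whole file is never slurped.
sub parse-fasta-records(IO::Handle $fh) {
    my @buffer;
    gather {
        for $fh.lines -> $line {
            next unless $line.chars;              # skip blank lines
            # A new header line closes the record collected so far.
            if $line.starts-with('>') && @buffer {
                take FASTA::Record.parse(@buffer.join("\n"));
                @buffer = ();
            }
            @buffer.push($line);
        }
        # Parse the final record, if any.
        take FASTA::Record.parse(@buffer.join("\n")) if @buffer;
    }
}

# Usage, assuming a file named reads.fa exists:
my $fh = open 'reads.fa', :r;
for parse-fasta-records($fh) -> $record {
    say ~$record<header> if $record;              # failed parses are skipped
}
$fh.close;
```

Memory use is then bounded by the largest single record rather than by the file size, at the cost of doing the record splitting outside the grammar.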

chris
