Re: Grammars and biological data formats

Carl Mäsak Thu, 14 Aug 2014 04:41:17 -0700

I was going to chime in and say that I wouldn't wait around for Cat;
I'd write something that reads chunks and then parses those. It'll be a
bit more code, but it'll work today. But I see you reached that
conclusion already. :)
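
As a rough illustration of the chunk-then-parse approach, here is a
minimal sketch in Python rather than Perl 6. It assumes FASTA input
where each record starts with a ">" header line. The name fasta_records
is hypothetical, and in Perl 6 each yielded record would be handed to
Grammar.parse() instead of being collected into a list.

```python
import io
from typing import Iterator, TextIO, Tuple


def fasta_records(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs one record at a time.

    Only the current record is held in memory, so the input file
    can be far larger than RAM.
    """
    header = None
    parts = []
    for line in handle:
        line = line.rstrip()
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(parts)
            header, parts = line[1:], []
        elif line and header is not None:
            parts.append(line)
    # Emit the final record once the input is exhausted.
    if header is not None:
        yield header, "".join(parts)


# Small in-memory sample standing in for a real file handle.
sample = io.StringIO(">seq1 first\nACGT\nTTGA\n>seq2\nGGCC\n")
records = list(fasta_records(sample))
```

Each record is small enough to parse as an ordinary string, which is
all the parser needs today.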


Lately I've found myself writing more and more grammars that parse
just one line of some input. Provided that the same action object gets
attached to the parse each time, that's an excellent place to store
information that you want to persist between lines. Actually, action
objects started to make a whole lot more sense to me after I found
that use case, because they take on the role of a session/lifetime
object for the parse process itself.
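
The one-line-grammar pattern can be sketched roughly as follows, again
in Python as a stand-in for a Perl 6 grammar plus action object. The
regex LINE, the class FastaActions, and the function parse_line are all
hypothetical names made up for this illustration. The point is only
that one object is passed to every per-line parse and carries state
from one line to the next.

```python
import re

# Stand-in for a one-line grammar: either a header line or a sequence line.
LINE = re.compile(
    r"^(?:>(?P<id>\S+)(?:\s+(?P<desc>.*))?|(?P<seq>[A-Za-z*-]+))$"
)


class FastaActions:
    """Plays the role of the action object: it outlives each line parse."""

    def __init__(self):
        self.lengths = {}    # sequence id -> residues counted so far
        self.current = None  # id of the record currently being read

    def header(self, match):
        self.current = match.group("id")
        self.lengths[self.current] = 0

    def sequence(self, match):
        if self.current is None:
            raise ValueError("sequence line before any header")
        self.lengths[self.current] += len(match.group("seq"))


def parse_line(line, actions):
    """Parse one line and dispatch to the shared actions object."""
    match = LINE.match(line)
    if match is None:
        return False
    if match.group("id") is not None:
        actions.header(match)
    else:
        actions.sequence(match)
    return True


# The same actions object is attached to every parse.
actions = FastaActions()
for line in [">seq1 first record", "ACGT", "TTGA", ">seq2", "GGCC"]:
    parse_line(line, actions)
print(actions.lengths)
```

Keeping only running lengths rather than whole sequences means the
session object stays small no matter how large the input is.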

// Carl

On Wed, Aug 13, 2014 at 3:19 PM, Fields, Christopher J
<cjfie...@illinois.edu> wrote:
> On Aug 13, 2014, at 8:11 AM, Christopher Fields <cjfie...@illinois.edu> wrote:
>
>> On Aug 13, 2014, at 4:50 AM, Solomon Foster <colo...@gmail.com> wrote:
>>
>>> On Sat, Aug 9, 2014 at 7:26 PM, Fields, Christopher J
>>> <cjfie...@illinois.edu> wrote:
>>>> I have a fairly simple question regarding the feasibility of using 
>>>> grammars with commonly used biological data formats.
>>>>
>>>> My main question: if I wanted to parse() or subparse() very large files
>>>> (it's not unheard of for FASTA/FASTQ or similar data files to exceed
>>>> hundreds of GB), would a grammar be the best solution?  Based on
>>>> what I am reading, the semantics appear to be greedy; for instance:
>>>>
>>>>   Grammar.parsefile($file)
>>>>
>>>> appears to be a convenient shorthand for:
>>>>
>>>>   Grammar.parse($file.slurp)
>>>>
>>>> since Grammar.parse() works on a Str, not an IO::Handle or Buf.  Or am I
>>>> misunderstanding how this could be accomplished?
>>>
>>> My understanding is it is intended that parsing can work on Cats
>>> (hypothetical lazy strings) but this hasn't been implemented yet
>>> anywhere.
>>>
>>> --
>>> Solomon Foster: colo...@gmail.com
>>> HarmonyWare, Inc: http://www.harmonyware.com
>>
>> Yeah, that’s what I recall as well.  I see very little in the specs re: Cat 
>> unfortunately.
>>
>> chris
>
> Ah, never mind.  I did a search of the IRC channel and found it’s considered
> to be a ‘6.1’ feature:
>
>     http://irclog.perlgeek.de/perl6/2014-07-06#i_8978974
>
> It is mentioned a few times in the specs, I’m guessing based on where it’s 
> thought to fit in best.  For the moment the proposal is to run grammar 
> parsing on sized chunks of the input data, which might be how Cat would be 
> implemented anyway.
>
> chris
>
