The following module was proposed for inclusion in the Module List:

  modid:       Lingua::EN::Tokenizer::Offsets
  DSLIP:       bdpfp
  description: Finds word (token) boundaries, and returns their offsets
  userid:      ANDREFS (André Fernandes dos Santos)
  chapterid:   11 (String_Lang_Text_Proc)
  communities:
    http://github.com/andrefs/Lingua-EN-Sentence-Offsets/issues

  similar:
    Lingua::FreeLing3::Tokenizer

  rationale:

    Tokenizer (word splitter) for English with a twist (does for tokens
    what Lingua::EN::Sentence::Offsets does for sentences).

    Most tokenizers return either:
      - the original text with forced spacing between tokens
      - some kind of array with the tokens

    This module was developed instead to return a list of start-end
    offset pairs, one per token. This makes it possible to know where
    each token starts and ends without actually splitting the text.
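
    For illustration, a minimal sketch of how such offsets might be
    consumed. The token_offsets function name and its return value (an
    array reference of [start, end] character-offset pairs) are
    assumptions based on the rationale above, not details confirmed by
    this announcement:

        use strict;
        use warnings;

        # Assumed API: token_offsets() returns a reference to a list of
        # [start, end] character offset pairs, one pair per token.
        use Lingua::EN::Tokenizer::Offsets qw(token_offsets);

        my $text    = "Hello, world! This is a test.";
        my $offsets = token_offsets($text);

        # The original text is never split; each token is recovered with
        # substr() from its start offset and length (end - start).
        for my $pair (@$offsets) {
            my ($start, $end) = @$pair;
            my $token = substr $text, $start, $end - $start;
            printf "%3d-%3d  %s\n", $start, $end, $token;
        }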

  enteredby:   ANDREFS (André Fernandes dos Santos)
  enteredon:   Sun Jun  3 00:51:05 2012 GMT

The resulting entry would be:

Lingua::EN::Tokenizer::
::Offsets         bdpfp Finds word (token) boundaries, and returns t ANDREFS


Thanks for registering,
-- 
The PAUSE

PS: The following links are only valid for module list maintainers:

Registration form with editing capabilities:
  https://pause.perl.org/pause/authenquery?ACTION=add_mod&USERID=d0b00000_09d9564b03957820&SUBMIT_pause99_add_mod_preview=1
Immediate (one click) registration:
  https://pause.perl.org/pause/authenquery?ACTION=add_mod&USERID=d0b00000_09d9564b03957820&SUBMIT_pause99_add_mod_insertit=1
Peek at the current permissions:
  https://pause.perl.org/pause/authenquery?pause99_peek_perms_by=me&pause99_peek_perms_query=Lingua%3A%3AEN%3A%3ATokenizer%3A%3AOffsets
