Re: New chunks

Paul Dupuis Wed, 12 Mar 2014 18:33:07 -0700

Actually, no, it is not based on simple period delimiters. It uses a
code library with built-in rules for sentence structure in most
languages, so it can recognize real ends of sentences. It handles
numbers, abbreviations, etc. correctly.

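To make the difference concrete, here is a rough Python sketch. It is
not LiveCode and not the library LiveCode uses; it only contrasts a plain
split on periods with a small heuristic that skips decimal points and a
few abbreviations. The function names and the ABBREVIATIONS list are
made up for illustration, and the heuristic is far cruder than real
Unicode sentence-boundary rules.

```python
import re

# Hypothetical, deliberately tiny abbreviation list (an assumption for
# this sketch; a real library relies on much richer, per-language rules).
ABBREVIATIONS = {"dr", "mr", "mrs", "etc", "e.g", "i.e"}


def naive_sentences(text):
    """Split on every period, as a simple period delimiter would."""
    return [part.strip() for part in text.split(".") if part.strip()]


def heuristic_sentences(text):
    """Split only at ., ! or ? followed by whitespace or end of text,
    and skip terminators that directly follow a known abbreviation."""
    sentences = []
    start = 0
    for match in re.finditer(r"[.!?]+(?=\s|$)", text):
        words = text[start:match.start()].split()
        last_word = words[-1].lower() if words else ""
        if last_word in ABBREVIATIONS:
            continue  # "Dr." and friends do not end a sentence here
        sentences.append(text[start:match.end()].strip())
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)  # keep trailing text with no terminator
    return sentences


def sentences_containing(sentences, words):
    """Return the sentences that contain every given word (case-insensitive)."""
    wanted = {w.lower() for w in words}
    return [s for s in sentences
            if wanted <= set(re.findall(r"\w+", s.lower()))]


sample = "Dr. Smith paid 3.50 dollars. He left at noon."
print(naive_sentences(sample))
print(heuristic_sentences(sample))
```

The decimal point in "3.50" is never a split candidate because it is not
followed by whitespace, and "Dr." is skipped by the abbreviation check.
The last helper covers the "find all sentences containing both words"
kind of search once the sentences have been split properly.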

Could someone construct some sequence of characters that could be
called a sentence and that gets mis-parsed? Possibly. I am familiar
with the library RunRev is using only by reputation, so I can't say for
sure. However, for most text you will work with, where you want to return
"sentence 2 of paragraph 5 of fld X", you will get exactly what you expect.

On 3/12/2014 7:58 PM, Bob Sneidar wrote:
> Pretty sure LiveCode is going to do a simple delimiter on period. You would 
> have to prep the data first by replacing periods in any word that is a number 
> with a placeholder, processing your sentences, then restoring the 
> placeholders (if you need to). 
>
> You could get fancy by setting the lineDelimiter to space, then finding every 
> line that ends in a period and processing everything in-between. It’s 
> doubtful a number would end in a period without it being the end of a 
> sentence. 
>
> Bob
>
>
> On Mar 11, 2014, at 15:34 , Jim Hurley <jhurley0...@sbcglobal.net> wrote:
>
>> Can someone explain how the “sentence” chunk would work?
>> How are decimal points and points in an abbreviation distinguished from the 
>> “period” that delineates the end of a “sentence”?
>> Does it presume that the existing text has special embedded “periods”?
>>
>> I’ve written my own, but it is very cumbersome and not flawless. I use it to 
>> do manuscript analysis.
>> Like: Find all sentences in which “time” and “party” occur anywhere in the 
>> same sentence.
>>
>> My ignorance on unicode is profound.
>> Jim
>>
>>> Message: 15
>>> Date: Tue, 11 Mar 2014 18:15:18 +0000
>>> From: Benjamin Beaumont <b...@runrev.com>
>>> To: LiveCode Developer List <livecode-...@lists.runrev.com>,        How to
>>>     use LiveCode <use-livecode@lists.runrev.com>
>>> Subject: New chunks
>>> Message-ID:
>>>     <CADd0_Txbhdem4PbKXifXUsujqPLs9HROME6vKhF=sk1znp2...@mail.gmail.com>
>>> Content-Type: text/plain; charset=ISO-8859-1
>>>
>>> Hi All,
>>>
>>> We're in the process of adding some new chunk types in LiveCode 7 and we
>>> would appreciate suggestions for a particular chunk name.
>>>
>>> The new chunk types are:
>>>
>>> naturalword (breaks on unicode word boundaries)
>>> sentence (breaks on unicode sentence boundaries)
>>> paragraph (Same behaviour as current 'line' chunk)
>>>
>>> The first chunk is called 'naturalword' because 'word' is already in use.
>>> Renaming the current 'word' chunk to 'token' to free up 'word' is not an
>>> option for backward compatibility. We are also limited by the current
>>> parser which doesn't allow us to use the form:
>>>
>>> put natural word 1 of "this is a string of words"
>>>
>>> 'naturalword' is the clearest internal suggestion at the moment and we'd
>>> love to get the input from community members if there is an even clearer
>>> option.
>>>
>>> Warm regards and thank you for your input.
>>>
>>> Ben
>>>
>>> _____
>> _______________________________________________
>> use-livecode mailing list
>> use-livecode@lists.runrev.com
>> Please visit this url to subscribe, unsubscribe and manage your subscription 
>> preferences:
>> http://lists.runrev.com/mailman/listinfo/use-livecode


