Actually, no, it is not based on simple period delimiters. It uses a code library with built-in rules for sentence structure in most languages, so it can recognize real ends of sentences. It handles numbers, abbreviations, etc. correctly.
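To make the difference concrete, here is a rough sketch in Python (not LiveCode, and not the library RunRev is using). It contrasts a plain split on periods with a toy rule-based splitter. The function names, the abbreviation list, and the "uppercase letter follows" rule are all assumptions made up for this illustration; a real boundary library carries far richer rules.

```python
import re

# Hypothetical abbreviation list for this sketch only.
ABBREVIATIONS = {"dr.", "mr.", "mrs.", "etc.", "e.g.", "i.e."}


def naive_split(text):
    """Split on every period, as a simple delimiter would."""
    return [part.strip() for part in text.split(".") if part.strip()]


def rule_based_split(text):
    """Break after . ! or ? only when whitespace and an uppercase letter
    follow, and the token before the break is not a known abbreviation."""
    sentences = []
    start = 0
    for match in re.finditer(r"[.!?]+(?=\s+[A-Z])", text):
        end = match.end()
        last_token = text[start:end].split()[-1].lower()
        if last_token in ABBREVIATIONS:
            continue  # "Dr." and friends do not end a sentence
        sentences.append(text[start:end].strip())
        start = end
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


sample = "Dr. Smith paid 3.50 dollars. He left at noon."
print(naive_split(sample))
print(rule_based_split(sample))
```

The naive version cuts the sample at the abbreviation and at the decimal point, giving four fragments; the rule-based version returns the two sentences a reader would expect. It will still fail on harder input, for example a sentence that starts with a lowercase word.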
Could someone construct some sequence of characters that could be called a sentence and that still gets mis-parsed? Possibly. I am familiar with the library RunRev is using only by reputation, so I can't say for sure. However, for most text you will work with, where you want to return "sentence 2 of paragraph 5 of fld X", you will get exactly what you expect.

On 3/12/2014 7:58 PM, Bob Sneidar wrote:
> Pretty sure LiveCode is going to do a simple delimiter on period. You would
> have to prep the data first by replacing periods in any word that is a number
> with a placeholder, processing your sentences, then restoring the
> placeholders (if you need to).
>
> You could get fancy by setting the lineDelimiter to space, then finding every
> line that ends in a period and processing everything in-between. It’s
> doubtful a number would end in a period without it being the end of a
> sentence.
>
> Bob
>
>
> On Mar 11, 2014, at 15:34 , Jim Hurley <jhurley0...@sbcglobal.net> wrote:
>
>> Can someone explain how the “sentence” chunk would work?
>> How are decimal points, and points in an abbreviation, distinguished from the
>> “period” that delineates the end of a “sentence?”
>> Does it presume that the existing text has special embedded “periods?”
>>
>> I’ve written my own, but it is very cumbersome and not flawless. I use it to
>> do manuscript analysis.
>> Like: Find all sentences in which “time” and “party” occur anywhere in the
>> same sentence.
>>
>> My ignorance of Unicode is profound.
>>
>> Jim
>>
>>> Message: 15
>>> Date: Tue, 11 Mar 2014 18:15:18 +0000
>>> From: Benjamin Beaumont <b...@runrev.com>
>>> To: LiveCode Developer List <livecode-...@lists.runrev.com>,
>>>     How to use LiveCode <use-livecode@lists.runrev.com>
>>> Subject: New chunks
>>> Message-ID:
>>> <CADd0_Txbhdem4PbKXifXUsujqPLs9HROME6vKhF=sk1znp2...@mail.gmail.com>
>>> Content-Type: text/plain; charset=ISO-8859-1
>>>
>>> Hi All,
>>>
>>> We're in the process of adding some new chunk types in LiveCode 7 and we
>>> would appreciate suggestions for a particular chunk name.
>>>
>>> The new chunk types are:
>>>
>>> naturalword (breaks on unicode word boundaries)
>>> sentence (breaks on unicode sentence boundaries)
>>> paragraph (Same behaviour as current 'line' chunk)
>>>
>>> The first chunk is called 'naturalword' because 'word' is already in use.
>>> Renaming the current 'word' chunk to 'token' to free up 'word' is not an
>>> option for backward compatibility. We are also limited by the current
>>> parser which doesn't allow us to use the form:
>>>
>>>     put natural word 1 of "this is a string of words"
>>>
>>> 'naturalword' is the clearest internal suggestion at the moment and we'd
>>> love to get the input from community members if there is an even clearer
>>> option.
>>>
>>> Warm regards and thank you for your input.
>>>
>>> Ben

_______________________________________________
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode