Re: Jane Austen's peculiarity

Peter M. Brigham Sat, 08 Aug 2015 19:34:26 -0700

On Aug 8, 2015, at 6:41 PM, Richard Gaskin wrote:

> Richmond wrote:
> 
>> function findWere pText
>>   -- returns a comma-delim list of all the line offsets matching "were *ed"
>>   --    or "were" && <a word in your preterite list>.
>>   put fld "WERBS" into pretList
>>   put wordOffsets("were", pText, true) into offList
> 
> Unless the build you're using is a custom build, wouldn't that be "wordOffset" 
> (singular)?


I included the utility functions wordOffsets() and offsets() in one of my 
previous posts. I probably should have repeated them. I use them a lot -- there 
are many contexts in which they are useful.

function wordOffsets str, pContainer, matchWhole
   -- returns a comma-delimited list of all the wordOffsets of str in pContainer
   -- if matchWhole = true then only whole words are located
   --    else will find word matches everywhere str is part of a word in pContainer
   --    note that in LC words will include adjacent punctuation,
   --       so using matchWhole = true may exclude too many "words"
   -- duplicates are stripped out
   --    eg wordOffsets("co","the common coconut") = 2,3   not   2,3,3
   -- note: to get the last wordOffset of a string in a container (often useful)
   --    use "item -1 of wordOffsets(...)"
   -- by Peter M. Brigham, [email protected] — freeware
   -- requires offsets()
   
   if matchWhole = empty then put false into matchWhole
   put offsets(str,pContainer) into offList
   if offList = 0 then return 0
   repeat for each item i in offList
      put the number of words of (char 1 to i of pContainer) into wdNbr
      if matchWhole then
         if word wdNbr of pContainer <> str then next repeat
      end if
      put 1 into A[wdNbr]
      -- using an array avoids duplicates
   end repeat
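   put the keys of A into wordList
   -- if matchWhole filtered out every match, return 0 as when str is absent
   if wordList is empty then return 0
   sort lines of wordList ascending numeric
   replace cr with comma in wordList
   return wordList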
end wordOffsets

function offsets str, pContainer
   -- returns a comma-delimited list of all the offsets of str in pContainer
   -- returns 0 if not found
   -- note: offsets("xx","xxxxxx") returns "1,3,5" not "1,2,3,4,5"
   --     ie, overlapping offsets are not counted
   -- note: to get the last occurrence of a string in a container (often useful)
   --     use "item -1 of offsets(...)"
   -- by Peter M. Brigham, [email protected] — freeware
   
   if str is not in pContainer then return 0
   put 0 into startPoint
   repeat
      put offset(str,pContainer,startPoint) into thisOffset
      if thisOffset = 0 then exit repeat
      add thisOffset to startPoint
      put startPoint & comma after offsetList
      add length(str)-1 to startPoint
   end repeat
   return item 1 to -1 of offsetList -- LC ignores a trailing delimiter, so this drops the final comma
end offsets
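A short usage sketch of the two utilities above, for anyone picking them up from this thread. demoWordOffsets is a hypothetical demo function, not part of Peter's library, and it assumes offsets() and wordOffsets() sit in the same script or in a library in the message path. The sample text is chosen to show the punctuation caveat noted in the wordOffsets() comments.

```livecode
-- hypothetical demo; assumes offsets() and wordOffsets() above are in the
--    same script or in a library stack in the message path
function demoWordOffsets
   local tText, tChars, tAny, tWhole, tLast
   put "they were. We were" into tText
   -- character positions of each non-overlapping match: 6,15
   put offsets("were", tText) into tChars
   -- word numbers containing "were", partial matches allowed: 2,4
   put wordOffsets("were", tText) into tAny
   -- whole words only: word 2 is "were." (LC words keep punctuation),
   --    so only word 4 survives: 4
   put wordOffsets("were", tText, true) into tWhole
   -- last word containing "were": 4
   put item -1 of wordOffsets("were", tText) into tLast
   return tChars & cr & tAny & cr & tWhole & cr & tLast
end demoWordOffsets
```

This is the case where matchWhole = true "may exclude too many words": "were." at the end of the first sentence is dropped even though a reader would count it.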

> Also, if you're using v7 you might consider "trueWordOffset", which accounts 
> for quote characters and omits punctuation that characterize the historic 
> definition of "word" in xTalks.
> 
> The Unicode libraries in v7 make many natural-language parsing tasks much 
> simpler - there's even a new "sentence" chunk type.

Yes, with newer versions the engine now does stuff that required scripted 
functions in earlier LC versions. I'm still not using later versions because my 
work stacks don't run in them properly, so I have all these utility functions 
in my library.
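For anyone on LC 7 or later, the trueWord chunk Richard mentions can stand in for the matchWhole option. The function below is a hypothetical sketch (trueWordOffsetsV7 is an illustrative name, not an engine or library function); it assumes LC 7+ and will not compile in earlier versions. Its results are trueWord numbers, which can differ from the word numbers wordOffsets() returns, e.g. when a quoted phrase or a standalone "-" counts as one classic word.

```livecode
-- hypothetical sketch, LC 7 or later only (uses the trueWord chunk)
-- returns a comma-delimited list of the trueWord numbers in pContainer
--    that equal pWord, or 0 if there are none
-- trueWords exclude adjacent punctuation, so "were." still matches "were"
function trueWordOffsetsV7 pWord, pContainer
   local tCount, tList, i
   put the number of trueWords of pContainer into tCount
   repeat with i = 1 to tCount
      -- "=" follows the caseSensitive property (false by default)
      if trueWord i of pContainer = pWord then put i & comma after tList
   end repeat
   if tList is empty then return 0
   return item 1 to -1 of tList -- drop the trailing comma
end trueWordOffsetsV7
```

Fetching "trueWord i of" by index may rescan the text on each pass, so for very long texts the offsets()-based approach above may well be faster.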

-- Peter

Peter M. Brigham
[email protected]
http://home.comcast.net/~pmbrig



_______________________________________________
use-livecode mailing list
[email protected]
Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode
