On Aug 8, 2015, at 6:41 PM, Richard Gaskin wrote:
> Richmond wrote:
>
>> function findWere pText
>> -- returns a comma-delim list of all the line offsets matching "were *ed"
>> -- or "were" && <a word in your preterite list>.
>> put fld "WERBS" into pretList
>> put wordOffsets("were", pText, true) into offList
>
> Unless you're using a custom build, wouldn't that be "wordOffset"
> (singular)?

I included the utility functions wordOffsets() and offsets() in one of my
previous posts. I probably should have repeated them. I use them a lot -- there
are many contexts in which they are useful.

function wordOffsets str, pContainer, matchWhole
   -- returns a comma-delimited list of all the wordOffsets of str in pContainer
   -- if matchWhole = true then only whole words are located
   -- else will find word matches everywhere str is part of a word in pContainer
   -- note that in LC words will include adjacent punctuation,
   --    so using matchWhole = true may exclude too many "words"
   -- duplicates are stripped out
   --    eg wordOffsets("co","the common coconut") = 2,3 not 2,3,3
   -- note: to get the last wordOffset of a string in a container (often useful)
   --    use "item -1 of wordOffsets(...)"
   -- by Peter M. Brigham, [email protected] -- freeware
   -- requires offsets()
   if matchWhole = empty then put false into matchWhole
   put offsets(str,pContainer) into offList
   if offList = 0 then return 0
   repeat for each item i in offList
      put the number of words of (char 1 to i of pContainer) into wdNbr
      if matchWhole then
         if word wdNbr of pContainer <> str then next repeat
      end if
      put 1 into A[wdNbr]
      -- using an array avoids duplicates
   end repeat
   put the keys of A into wordList
   sort lines of wordList ascending numeric
   replace cr with comma in wordList
   return wordList
end wordOffsets

function offsets str, pContainer
   -- returns a comma-delimited list of all the offsets of str in pContainer
   -- returns 0 if not found
   -- note: offsets("xx","xxxxxx") returns "1,3,5" not "1,2,3,4,5"
   --    ie, overlapping offsets are not counted
   -- note: to get the last occurrence of a string in a container (often useful)
   --    use "item -1 of offsets(...)"
   -- by Peter M. Brigham, [email protected] -- freeware
   if str is not in pContainer then return 0
   put 0 into startPoint
   repeat
      put offset(str,pContainer,startPoint) into thisOffset
      if thisOffset = 0 then exit repeat
      add thisOffset to startPoint
      put startPoint & comma after offsetList
      add length(str)-1 to startPoint
   end repeat
   return item 1 to -1 of offsetList -- delete trailing comma
end offsets
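
A quick usage sketch of the two handlers, assuming both are somewhere in the message path. The handler name testOffsets and the sample strings in the third call are made up for illustration; the first two calls are the examples from the header comments.

```livecode
on testOffsets
   -- testOffsets is a hypothetical name, used only for this illustration
   put offsets("xx","xxxxxx") into tChars
   -- non-overlapping matches only, so tChars is 1,3,5
   put wordOffsets("co","the common coconut") into tWords
   -- "coconut" matches twice but is listed once, so tWords is 2,3
   put wordOffsets("were","they were, we were",true) into tWhole
   -- word 2 is "were," (comma attached), so a whole-word match skips it
   -- and tWhole is 4
   put tChars & cr & tWords & cr & tWhole
end testOffsets
```

The third call shows the caveat in the header comment: with matchWhole = true, a word with adjacent punctuation is not counted as a match.
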
> Also, if you're using v7 you might consider "trueWordOffset", which accounts
> for quote characters and omits punctuation that characterize the historic
> definition of "word" in xTalks.
>
> The Unicode libraries in v7 make many natural-language parsing tasks much
> simpler - there's even a new "sentence" chunk type.

Yes, with newer versions the engine now does stuff that required scripted
functions in earlier LC versions. I'm still not using later versions because my
work stacks don't run in them properly, so I have all these utility functions
in my library.
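
For anyone who is on v7 or later, a rough and untested sketch of the same loop built on the engine's trueWordOffset might look like the following. The handler name trueWordOffsets is hypothetical, and the sketch assumes that trueWordOffset takes an optional words-to-skip parameter and honors the wholeMatches property the same way wordOffset does.

```livecode
function trueWordOffsets pWord, pText
   -- hypothetical helper, LC 7+ only; not part of the original library
   -- returns a comma-delimited list of trueWord numbers matching pWord
   -- returns 0 if not found, to mirror offsets()
   -- assumption: wholeMatches applies to trueWordOffset as it does to wordOffset
   set the wholeMatches to true
   put 0 into tSkip
   repeat
      -- the result is relative to the words skipped so far
      put trueWordOffset(pWord, pText, tSkip) into tOff
      if tOff = 0 then exit repeat
      add tOff to tSkip
      put tSkip & comma after tList
   end repeat
   if tList is empty then return 0
   return item 1 to -1 of tList -- delete trailing comma
end trueWordOffsets
```

Because trueWord chunks leave out adjacent punctuation, this should not need the matchWhole workaround described in the wordOffsets() comments.
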
-- Peter
Peter M. Brigham
[email protected]
http://home.comcast.net/~pmbrig