On Aug 8, 2015, at 1:56 PM, Richmond wrote:

> On 08/08/15 20:48, Peter M. Brigham wrote:
>> On Aug 8, 2015, at 12:42 PM, Richmond wrote:
>>
>>> Jane Austen [amongst others] uses an interesting type of grammatical
>>> construction of this sort:
>>>
>>> After breakfast, the girls walked to Meryton to inquire if Mr. Wickham
>>> _were returned_, and to lament over his absence from the Netherfield ball.
>>>
>>> Pride and Prejudice.
>>>
>>> I would like to analyse a million word corpus that I have been granted
>>> access to for this type of construction.
>>>
>>> However, I don't want to find examples of only 'were returned', but all
>>> examples of
>>>
>>> were + infinitive / preterite / past participle
>>>
>>> and, presumably for that I shall have to use wildcards . . .
>>>
>>> OR ???
>> I'll leave it to those who speak Regex to suggest a wildcard solution.
>> Here's another one (not tested) that will catch past participles ending in
>> "ed".
>
> Looks good; however, I am really looking for ALL preterites; such as
> 'become', so your 'ed' trap won't catch that.
>
> I am wondering about using a listField of all the preterites that I am
> looking for.

If you do that, then just make the repeat loop as follows:

   repeat for each item w in offList
      put word w+1 of pText into testWord
      if testWord ends with "ed" then put w & comma after outList
      else if testWord is among the words of fld "preteritesList" then put w & comma after outList
   end repeat

This will be faster if you put the preteritesList field into a variable before the repeat loop, since it's significantly faster for the engine to access the contents of a variable than the contents of a field.

-- Peter

Peter M. Brigham
pmb...@gmail.com
http://home.comcast.net/~pmbrig

>> Not sure how this will scale with large texts:
>>
>> function findWere pText
>>    -- returns a comma-delim list of all the word offsets matching "were *ed"
>>    put wordOffsets("were", pText, true) into offList
>>    repeat for each item w in offList
>>       put word w+1 of pText into testWord
>>       if testWord ends with "ed" then put w & comma after outList
>>    end repeat
>>    return item 1 to -1 of outList
>> end findWere
>>
>> function wordOffsets str, pContainer, matchWhole
>>    -- returns a comma-delimited list of all the wordOffsets of str in pContainer
>>    -- if matchWhole = true then only whole words are located
>>    -- else will find word matches everywhere str is part of a word in pContainer
>>    -- note that in LC words will include adjacent punctuation,
>>    --    so using matchWhole = true may exclude too many "words"
>>    -- duplicates are stripped out
>>    --    eg wordOffsets("co","the common coconut") = 2,3 not 2,3,3
>>    -- note: to get the last wordOffset of a string in a container (often useful)
>>    --    use "item -1 of wordOffsets(...)"
>>    -- by Peter M. Brigham, pmb...@gmail.com — freeware
>>    -- requires offsets()
>>
>>    if matchWhole = empty then put false into matchWhole
>>    put offsets(str,pContainer) into offList
>>    if offList = 0 then return 0
>>    repeat for each item i in offList
>>       put the number of words of (char 1 to i of pContainer) into wdNbr
>>       if matchWhole then
>>          if word wdNbr of pContainer <> str then next repeat
>>       end if
>>       put 1 into A[wdNbr]
>>       -- using an array avoids duplicates
>>    end repeat
>>    put the keys of A into wordList
>>    sort lines of wordList ascending numeric
>>    replace cr with comma in wordList
>>    return wordList
>> end wordOffsets
>>
>> function offsets str, pContainer
>>    -- returns a comma-delimited list of all the offsets of str in pContainer
>>    -- returns 0 if not found
>>    -- note: offsets("xx","xxxxxx") returns "1,3,5" not "1,2,3,4,5"
>>    --    ie, overlapping offsets are not counted
>>    -- note: to get the last occurrence of a string in a container (often useful)
>>    --    use "item -1 of offsets(...)"
>>    -- by Peter M. Brigham, pmb...@gmail.com — freeware
>>
>>    if str is not in pContainer then return 0
>>    put 0 into startPoint
>>    repeat
>>       put offset(str,pContainer,startPoint) into thisOffset
>>       if thisOffset = 0 then exit repeat
>>       add thisOffset to startPoint
>>       put startPoint & comma after offsetList
>>       add length(str)-1 to startPoint
>>    end repeat
>>    return item 1 to -1 of offsetList -- delete trailing comma
>> end offsets
>>
>> P.S. I love Jane Austen. One of my favorite books of all time is "Pride and
>> Prejudice." It's so beautifully constructed.
>
> Glad to hear that another programmer doesn't spend all their time in front of
> a computer screen!

_______________________________________________
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode
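
For what it's worth, here is roughly how the suggestion to copy the preteritesList field into a variable before the repeat loop might look. This is only a sketch, not tested against a real corpus. The handler name findWerePreterites, the parameter pPreterites, and the field name "corpus" in the usage lines are made up for illustration; wordOffsets() is the function quoted in this thread.

```livecode
function findWerePreterites pText, pPreterites
   -- pPreterites (hypothetical parameter): the preterites list, already
   -- copied out of the field into a variable by the caller
   -- returns a comma-delim list of the word offsets of "were" where the
   -- next word ends in "ed" or appears in pPreterites
   put empty into outList
   put wordOffsets("were", pText, true) into offList
   if offList = 0 then return empty -- wordOffsets() returns 0 when nothing is found
   repeat for each item w in offList
      put word w+1 of pText into testWord
      if testWord ends with "ed" or testWord is among the words of pPreterites then
         put w & comma after outList
      end if
   end repeat
   return item 1 to -1 of outList -- drop the trailing comma
end findWerePreterites
```

A call could then read the field once, outside the loop:

   put fld "preteritesList" into tPreterites
   put findWerePreterites(fld "corpus", tPreterites) into tHits

As the comments in wordOffsets() point out, LC words include adjacent punctuation, so a following word such as "returned," will match neither test unless the punctuation is stripped first.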