On 08/08/15 20:48, Peter M. Brigham wrote:
On Aug 8, 2015, at 12:42 PM, Richmond wrote:

Jane Austen [amongst others] uses an interesting type of grammatical 
construction of this sort:

After breakfast, the girls walked to Meryton to inquire if Mr. Wickham
_were returned_, and to lament over his absence from the Netherfield ball.

Pride and Prejudice.

I would like to analyse a million-word corpus, to which I have been granted
access, for this type of construction.

However, I don't want to find examples of only 'were returned', but all 
examples of

were + infinitive / preterite / past participle

and, presumably, for that I shall have to use wildcards . . .

OR ???
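A minimal sketch of the regex route, assuming LiveCode 7 or later (for the `trueWord` chunk, which leaves adjacent punctuation out of each word). The function name `wereRegexHits` and its `pPattern` parameter are hypothetical; the built-in `matchText` tests a PCRE pattern against the word that follows each "were", so the "wildcard" can be as loose or as strict as needed. It walks the text once, which should help with a million-word corpus.

```livecode
-- sketch: assumes LiveCode 7+ (trueWord chunk); wereRegexHits is a hypothetical name
function wereRegexHits pText, pPattern
   -- returns one "were <word>" phrase per line for each "were" whose
   -- following word matches the PCRE pattern pPattern,
   -- e.g. "(?i)^[a-z]+ed$" for forms ending in "ed"
   local tPrev, tHits
   repeat for each trueWord tWord in pText
      -- "is" compares case-insensitively by default, so "Were" counts too
      if tPrev is "were" and matchText(tWord, pPattern) then
         put tPrev && tWord & cr after tHits
      end if
      put tWord into tPrev
   end repeat
   if tHits is not empty then delete the last char of tHits -- trailing return
   return tHits
end wereRegexHits
```

A pattern such as `"(?i)^[a-z]+(ed|en)$"` would also pick up forms like "taken", and `"(?i)^[a-z]+$"` would return every word after "were" for sorting by hand; irregular forms such as "become" still need an explicit list.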
I'll leave it to those who speak regex to suggest a wildcard solution. Here's another approach
(not tested) that will catch past participles ending in "ed".

Looks good; however, I am really looking for ALL preterites and past participles, including irregular ones such as 'become', which your 'ed' trap won't catch.

I am wondering about using a list field containing all the preterites that I am looking for.
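The list-field idea could look roughly like this, again assuming LiveCode 7 or later for `trueWord`. `wereListedForms` and `pFormList` are hypothetical names; `pFormList` stands for the text of such a list field, one verb form per line. Loading the forms into an array turns each check into a single key lookup instead of a search through the whole list.

```livecode
-- sketch: assumes LiveCode 7+ (trueWord chunk); names are hypothetical
function wereListedForms pText, pFormList
   -- pFormList: return-delimited list of verb forms, e.g. the text of a list field
   -- returns one "were <form>" phrase per line, in text order
   local tForms, tPrev, tHits
   -- load the forms into an array so each check is a single key lookup
   repeat for each line tForm in pFormList
      if tForm is not empty then put true into tForms[tForm]
   end repeat
   repeat for each trueWord tWord in pText
      -- a missing key yields empty, which "is true" treats as false
      if tPrev is "were" and tForms[tWord] is true then
         put tPrev && tWord & cr after tHits
      end if
      put tWord into tPrev
   end repeat
   if tHits is not empty then delete the last char of tHits -- trailing return
   return tHits
end wereListedForms
```

Because it walks the text once, it avoids re-counting words from the start of the text for every hit, which is what `the number of words of (char 1 to i of pContainer)` does inside a loop.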

Not sure how this will scale with large texts:

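function findWere pText
    -- returns a comma-delimited list of all the word offsets matching "were *ed"
    put wordOffsets("were", pText, true) into offList
    if offList = 0 then return empty -- no "were" in pText
    repeat for each item w in offList
       put word w+1 of pText into testWord
       -- LC words keep adjacent punctuation, so strip it: "returned," -> "returned"
       repeat while testWord is not empty and char -1 of testWord is in ".,;:!?"
          delete char -1 of testWord
       end repeat
       if testWord ends with "ed" then put w & comma after outList
    end repeat
    return item 1 to -1 of outList -- delete trailing comma
end findWere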

function wordOffsets str, pContainer, matchWhole
    -- returns a comma-delimited list of all the wordOffsets of str in pContainer
    -- if matchWhole = true then only whole words are located
    --    else will find word matches everywhere str is part of a word in pContainer
    --    note that in LC words will include adjacent punctuation,
    --       so using matchWhole = true may exclude too many "words"
    -- duplicates are stripped out
    --    eg wordOffsets("co","the common coconut") = 2,3   not   2,3,3
    -- note: to get the last wordOffset of a string in a container (often useful)
    --    use "item -1 of wordOffsets(...)"
    -- by Peter M. Brigham, pmb...@gmail.com, freeware
    -- requires offsets()
    if matchWhole = empty then put false into matchWhole
    put offsets(str,pContainer) into offList
    if offList = 0 then return 0
    repeat for each item i in offList
       put the number of words of (char 1 to i of pContainer) into wdNbr
       if matchWhole then
          if word wdNbr of pContainer <> str then next repeat
       end if
       put 1 into A[wdNbr]
       -- using an array avoids duplicates
    end repeat
    put the keys of A into wordList
    sort lines of wordList ascending numeric
    replace cr with comma in wordList
    return wordList
end wordOffsets

function offsets str, pContainer
    -- returns a comma-delimited list of all the offsets of str in pContainer
    -- returns 0 if not found
    -- note: offsets("xx","xxxxxx") returns "1,3,5" not "1,2,3,4,5"
    --     ie, overlapping offsets are not counted
    -- note: to get the last occurrence of a string in a container (often useful)
    --     use "item -1 of offsets(...)"
    -- by Peter M. Brigham, pmb...@gmail.com, freeware
    if str is not in pContainer then return 0
    put 0 into startPoint
    repeat
       put offset(str,pContainer,startPoint) into thisOffset
       if thisOffset = 0 then exit repeat
       add thisOffset to startPoint
       put startPoint & comma after offsetList
       add length(str)-1 to startPoint
    end repeat
    return item 1 to -1 of offsetList -- delete trailing comma
end offsets

P.S. I love Jane Austen. One of my favorite books of all time is "Pride and 
Prejudice." It's so beautifully constructed.


Glad to hear that another programmer doesn't spend all their time in front of a computer screen!


-- Peter



Richmond.

_______________________________________________
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode
