Re: Jane Austen's peculiarity

Richmond Sat, 08 Aug 2015 11:57:00 -0700

On 08/08/15 21:18, Peter M. Brigham wrote:

On Aug 8, 2015, at 1:56 PM, Richmond wrote:

On 08/08/15 20:48, Peter M. Brigham wrote:

On Aug 8, 2015, at 12:42 PM, Richmond wrote:

Jane Austen [amongst others] uses an interesting type of grammatical 
construction of this sort:

After breakfast, the girls walked to Meryton to inquire if Mr. Wickham
_were returned_, and to lament over his absence from the Netherfield ball.

Pride and Prejudice.

I would like to analyse a million word corpus that I have been granted access 
to for this type of construction.

However, I don't want to find examples of only 'were returned', but all 
examples of

were + infinitive / preterite / past participle

and, presumably for that I shall have to use wildcards . . .

OR ???

I'll leave it to those who speak Regex to suggest a wildcard solution. Here's another one 
(not tested) that will catch past participles ending in "ed".

Looks good; however, I am really looking for ALL preterites; such as 'become', 
so your 'ed' trap won't catch that.

I am wondering about using a listField of all the preterites that I am looking 
for.

if you do that then just make the repeat loop as follows:
    repeat for each item w in offList
       put word w+1 of pText into testWord
       if testWord ends with "ed" then put w & comma after outList
       else if testWord is among the words of fld "preteritesList" then put w & comma after outList
    end repeat

This will be faster if you put the preteritesList field into a variable before 
the repeat loop, since it's significantly faster for the engine to access the 
contents of a variable compared with the contents of a field.
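
A minimal sketch of that suggestion, assuming the field "preteritesList" holds the verb forms separated by spaces or returns. The names findWerePlus and tPreterites are made up for illustration, and the function relies on the wordOffsets() handler further down this message:

```livecode
function findWerePlus pText
    -- hypothetical variant of findWere: "were" + "...ed" or a listed verb form
    -- read the field once, outside the loop (assumed field name: "preteritesList")
    put fld "preteritesList" into tPreterites
    put wordOffsets("were", pText, true) into offList
    if offList = 0 then return empty
    repeat for each item w in offList
       put word w+1 of pText into testWord
       if testWord ends with "ed" or testWord is among the words of tPreterites then
          put w & comma after outList
       end if
    end repeat
    return item 1 to -1 of outList -- drop the trailing comma
end findWerePlus
```

The only change that matters for speed is that the field is read once into tPreterites; the loop then tests against the variable rather than going back to the field on every pass.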

Thanks for that one. I've just made a fool of myself using a listField of the verb forms, and the "thing" is glacially slow.


As soon as the stack has run its course I will implement your suggestion.

Richmond.


-- Peter

Peter M. Brigham
[email protected]
http://home.comcast.net/~pmbrig

Not sure how this will scale with large texts:

function findWere pText
    -- returns a comma-delim list of all the word offsets matching "were *ed"
    put wordOffsets("were", pText, true) into offList
    repeat for each item w in offList
       put word w+1 of pText into testWord
       if testWord ends with "ed" then put w & comma after outList
    end repeat
    return item 1 to -1 of outList
end findWere

function wordOffsets str, pContainer, matchWhole
    -- returns a comma-delimited list of all the wordOffsets of str in pContainer
    -- if matchWhole = true then only whole words are located
    --    else will find word matches everywhere str is part of a word in pContainer
    --    note that in LC words will include adjacent punctuation,
    --       so using matchWhole = true may exclude too many "words"
    -- duplicates are stripped out
    --    eg wordOffsets("co","the common coconut") = 2,3   not   2,3,3
    -- note: to get the last wordOffset of a string in a container (often useful)
    --    use "item -1 of wordOffsets(...)"
    -- by Peter M. Brigham, [email protected] — freeware
    -- requires offsets()

    if matchWhole = empty then put false into matchWhole

    put offsets(str,pContainer) into offList
    if offList = 0 then return 0
    repeat for each item i in offList
       put the number of words of (char 1 to i of pContainer) into wdNbr
       if matchWhole then
          if word wdNbr of pContainer <> str then next repeat
       end if
       put 1 into A[wdNbr]
       -- using an array avoids duplicates
    end repeat
    put the keys of A into wordList
    sort lines of wordList ascending numeric
    replace cr with comma in wordList
    return wordList
end wordOffsets

function offsets str, pContainer
    -- returns a comma-delimited list of all the offsets of str in pContainer
    -- returns 0 if not found
    -- note: offsets("xx","xxxxxx") returns "1,3,5" not "1,2,3,4,5"
    --     ie, overlapping offsets are not counted
    -- note: to get the last occurrence of a string in a container (often useful)
    --     use "item -1 of offsets(...)"
    -- by Peter M. Brigham, [email protected] — freeware

    if str is not in pContainer then return 0

    put 0 into startPoint
    repeat
       put offset(str,pContainer,startPoint) into thisOffset
       if thisOffset = 0 then exit repeat
       add thisOffset to startPoint
       put startPoint & comma after offsetList
       add length(str)-1 to startPoint
    end repeat
    return item 1 to -1 of offsetList -- delete trailing comma
end offsets
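
As the comment in wordOffsets() points out, LC words carry their adjacent punctuation, so in the quoted sentence the word after "were" is "returned_," and the `ends with "ed"` test would miss it. A rough sketch of one way around that, assuming it is acceptable to trim trailing punctuation before testing. The names findWereLoose and stripPunct, and the particular set of characters trimmed, are illustrative assumptions only:

```livecode
function stripPunct pWord
    -- hypothetical helper: trims trailing punctuation from a single word
    put ".,;:!?_)" & quote into tPunct
    repeat while pWord is not empty and (char -1 of pWord is in tPunct)
       delete char -1 of pWord
    end repeat
    return pWord
end stripPunct

function findWereLoose pText
    -- hypothetical variant of findWere that ignores trailing punctuation
    put wordOffsets("were", pText, true) into offList
    if offList = 0 then return empty
    repeat for each item w in offList
       put stripPunct(word w+1 of pText) into testWord
       if testWord ends with "ed" then put w & comma after outList
    end repeat
    return item 1 to -1 of outList -- drop the trailing comma
end findWereLoose
```

This only handles the word that follows "were". A "were" that itself carries punctuation is still subject to the matchWhole caveat noted in wordOffsets().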

P.S. I love Jane Austen. One of my favorite books of all time is "Pride and 
Prejudice." It's so beautifully constructed.


Glad to hear that another programmer doesn't spend all their time in front of a 
computer screen!


_______________________________________________
use-livecode mailing list
[email protected]
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode


