On 08/08/15 21:18, Peter M. Brigham wrote:
On Aug 8, 2015, at 1:56 PM, Richmond wrote:
On 08/08/15 20:48, Peter M. Brigham wrote:
On Aug 8, 2015, at 12:42 PM, Richmond wrote:
Jane Austen [amongst others] uses an interesting type of grammatical
construction of this sort:
After breakfast, the girls walked to Meryton to inquire if Mr. Wickham
_were returned_, and to lament over his absence from the Netherfield ball.
Pride and Prejudice.
I would like to analyse a million word corpus that I have been granted access
to for this type of construction.
However, I don't want to find examples of only 'were returned', but all
examples of
were + infinitive / preterite / past participle
and, presumably for that I shall have to use wildcards . . .
OR ???
I'll leave it to those who speak Regex to suggest a wildcard solution. Here's another one
(not tested) that will catch past participles ending in "ed".
Looks good; however, I am really looking for ALL preterites; such as 'become',
so your 'ed' trap won't catch that.
I am wondering about using a listField of all the preterites that I am looking
for.
if you do that then just make the repeat loop as follows:
repeat for each item w in offList
put word w+1 of pText into testWord
if testWord ends with "ed" then put w & comma after outList
else if testWord is among the words of fld "preteritesList" then put w & comma after outList
end repeat
This will be faster if you put the preteritesList field into a variable before
the repeat loop, since it's significantly faster for the engine to access the
contents of a variable compared with the contents of a field.
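A minimal sketch of that change, assuming the field is named "preteritesList" as in the loop above. The variable name tPreterites is made up for illustration, and the loop is untested:

```livecode
-- read the field once, outside the loop
put fld "preteritesList" into tPreterites
repeat for each item w in offList
   put word w+1 of pText into testWord
   if testWord ends with "ed" then
      put w & comma after outList
   else if testWord is among the words of tPreterites then
      -- the lookup now reads a variable, not the field
      put w & comma after outList
   end if
end repeat
```

The "is among the words of" test still scans the whole list on every pass, so with a long list of verb forms an array keyed by verb form may be worth trying as well.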
Thanks for that one. I've just made a fool of myself using a listField of
the verb forms, and the "thing" is glacially slow.
As soon as the stack has run its course I will implement your suggestion.
Richmond.
-- Peter
Peter M. Brigham
pmb...@gmail.com
http://home.comcast.net/~pmbrig
Not sure how this will scale with large texts:
function findWere pText
-- returns a comma-delim list of all the word offsets matching "were *ed"
put wordOffsets("were", pText, true) into offList
if offList = 0 then return empty -- wordOffsets returns 0 when "were" is not found
repeat for each item w in offList
put word w+1 of pText into testWord
if testWord ends with "ed" then put w & comma after outList
end repeat
return item 1 to -1 of outList
end findWere
function wordOffsets str, pContainer, matchWhole
-- returns a comma-delimited list of all the wordOffsets of str in pContainer
-- if matchWhole = true then only whole words are located
-- else will find word matches everywhere str is part of a word in pContainer
-- note that in LC words will include adjacent punctuation,
-- so using matchWhole = true may exclude too many "words"
-- duplicates are stripped out
-- eg wordOffsets("co","the common coconut") = 2,3 not 2,3,3
-- note: to get the last wordOffset of a string in a container (often useful)
-- use "item -1 of wordOffsets(...)"
-- by Peter M. Brigham, pmb...@gmail.com — freeware
-- requires offsets()
if matchWhole = empty then put false into matchWhole
put offsets(str,pContainer) into offList
if offList = 0 then return 0
repeat for each item i in offList
put the number of words of (char 1 to i of pContainer) into wdNbr
if matchWhole then
if word wdNbr of pContainer <> str then next repeat
end if
put 1 into A[wdNbr]
-- using an array avoids duplicates
end repeat
put the keys of A into wordList
sort lines of wordList ascending numeric
replace cr with comma in wordList
return wordList
end wordOffsets
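The punctuation caveat in the comments above also affects the `ends with "ed"` test in findWere: in the Austen sentence the word after "were" is "returned," with the comma attached, so it would be missed. One possible workaround, sketched with the built-in matchText function. The helper name isEdForm is hypothetical, the pattern only handles unaccented ASCII letters, and it is untested:

```livecode
function isEdForm pWord
   -- true if pWord is a run of letters ending in "ed", optionally
   -- wrapped in non-letter characters such as quotes or a trailing comma
   return matchText(pWord, "(?i)^[^a-z]*[a-z]+ed[^a-z]*$")
end isEdForm
```

It could stand in for the plain test, e.g. `if isEdForm(testWord) then put w & comma after outList`. Like the original test it will also accept words such as "red" or "hundred", so the results still need checking by eye.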
function offsets str, pContainer
-- returns a comma-delimited list of all the offsets of str in pContainer
-- returns 0 if not found
-- note: offsets("xx","xxxxxx") returns "1,3,5" not "1,2,3,4,5"
-- ie, overlapping offsets are not counted
-- note: to get the last occurrence of a string in a container (often useful)
-- use "item -1 of offsets(...)"
-- by Peter M. Brigham, pmb...@gmail.com — freeware
if str is not in pContainer then return 0
put 0 into startPoint
repeat
put offset(str,pContainer,startPoint) into thisOffset
if thisOffset = 0 then exit repeat
add thisOffset to startPoint
put startPoint & comma after offsetList
add length(str)-1 to startPoint
end repeat
return item 1 to -1 of offsetList -- delete trailing comma
end offsets
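On the scaling worry raised before the findWere listing: wordOffsets recounts the words from the start of the text for every hit, and `word w+1 of pText` does a similar walk, so the work probably grows faster than the text does. A rough alternative, not the approach posted above, is a single pass that remembers the previous word. The name findWereOnePass is made up for illustration, and this has not been timed against a real corpus:

```livecode
function findWereOnePass pText
   -- returns a comma-delimited list of the word numbers of each "were"
   -- that is immediately followed by a word ending in "ed"
   put empty into outList
   put empty into prevWord
   put 0 into wordNbr
   repeat for each word thisWord in pText
      add 1 to wordNbr
      if prevWord is "were" and thisWord ends with "ed" then
         -- report the position of "were", i.e. the previous word
         put (wordNbr - 1) & comma after outList
      end if
      put thisWord into prevWord
   end repeat
   return item 1 to -1 of outList -- delete trailing comma
end findWereOnePass
```

For the same text this should give the same word numbers as findWere called with matchWhole = true, since both only accept a word that is exactly "were".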
P.S. I love Jane Austen. One of my favorite books of all time is "Pride and
Prejudice." It's so beautifully constructed.
Glad to hear that another programmer doesn't spend all their time in front of a
computer screen!
_______________________________________________
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription
preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode