Of course I couldn't resist a tinker. I too am into text manipulation/searching
and wondered how I would go about this.
I looked at the repeat loops and realised they would run much faster if they
were inverted, as I am sure the list of verbs is shorter than the text being
searched.
I also wanted to use a "repeat for each" construct, as this is usually orders of
magnitude faster than fetching each line by number.
But that meant I still needed the line number of each match, and adding a
counter seemed counterproductive.
So I settled on using lineoffset.
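For anyone who has not used the third parameter of lineoffset: it is the number of lines to skip, and the value returned is counted from the end of the skipped lines, not from the top of the text. A minimal sketch of walking through every match that way follows; the handler name allMatchingLines and its parameter names are made up for illustration and are not part of anyone's stack.

```livecode
-- Hypothetical helper: returns the absolute line number of every line
-- of pText that contains pPhrase, one number per line.
function allMatchingLines pPhrase, pText
   local tSkip, tFound, tResult
   put 0 into tSkip
   repeat forever
      -- lineOffset counts from the first line AFTER the skipped ones
      put lineOffset(pPhrase, pText, tSkip) into tFound
      if tFound = 0 then exit repeat
      -- turn the relative offset into an absolute line number, which is
      -- also the number of lines to skip on the next pass
      add tFound to tSkip
      put tSkip & cr after tResult
   end repeat
   delete the last char of tResult -- drop the trailing return
   return tResult
end allMatchingLines
```

Something like "put allMatchingLines("was returned", fld "TEKST")" in the message box should then list the matching line numbers, assuming a field of that name exists.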
Here was my go...
on mouseUp
   put empty into fld "COOKED"
   put empty into fld "STARTT"
   put empty into fld "STOPT"
   put empty into lCooked1
   put "started : " & the long time into fld "STARTT"
   put the milliseconds into st
   put fld "TEKST" into TEKST
   put fld "WERBS" into WERBS
   put 0 into acounter
   put the number of lines of TEKST into numlines
   repeat for each line KWERBS in WERBS
      put "was " & KWERBS into FRAZE
      put "were " & KWERBS into FRAZE2
      put 0 into loffesta
      put 0 into loffestb
      put 1 into lcounta
      put 1 into lcountb
      repeat while lcounta <> 0
         -- lineoffset returns a count relative to the lines skipped
         put lineoffset(FRAZE,TEKST,loffesta) into lcounta
         if lcounta = 0 then
            exit repeat
         end if
         put lcounta + loffesta into thelinea
         put thelinea & " : " & line thelinea of TEKST & cr after lCooked1
         -- next time, skip everything up to and including the matched line
         put thelinea into loffesta
      end repeat
      repeat while lcountb <> 0
         put lineoffset(FRAZE2,TEKST,loffestb) into lcountb
         if lcountb = 0 then
            exit repeat
         end if
         put lcountb + loffestb into thelineb
         put thelineb & " : " & line thelineb of TEKST & cr after lCooked1
         -- same again for the "were" phrase
         put thelineb into loffestb
      end repeat
   end repeat
   put the number of lines of lCooked1 & " found"
   put lCooked1 into fld "COOKED"
   put "finished : " & the long time into fld "STOPT"
   put the milliseconds into nd
   put nd - st into fld "TIMET"
end mouseUp
I haven't tried returning to the original repeat order to see whether this
really is faster, but running the above on Richmond's sample stack for the
"WAS/WERE" case delivered a result of three lines:
2663 : officers, who in comparison with the stranger, were become "stupid,
731 : was returned in due form. Miss Bennet's pleasing manners grew on the
4116 : were returned, and to lament over his absence from the Netherfield ball.
in 89 msec on my Mac running LC7.1Dp1
I was then going to examine colourising the found chunks when I realised that
the supplied text had line breaks within each paragraph.
This means none of the proposed solutions (including Richmond's own) will find
the desired phrase if it falls across one of these line breaks.
For my solution using lineoffset this is a dead end WHILE these line breaks
within a paragraph remain.
For the other solutions a simple expedient is to increase the number of FRAZEs
to four...
put "was " & KWERBS into FRAZE
put "was" & cr & KWERBS into FRAZE2
put "were " & KWERBS into FRAZE3
put "were" & cr & KWERBS into FRAZE4
This addition makes the extra FRAZEs two "lines" each and thus invalid arguments
for the lineoffset function.
Or so I thought.
However, given the unpredictability of the formatting of the text, this was
much too simplistic a solution.
This solution breaks down where paragraphs are indented using spaces!
So keeping the formatting as read in is problematic without knowing the
formatting used.
But if the focus is the actual text, then perhaps the "fancy" formatting is not
important.
Processing the text BEFORE searching, so as to remove embedded line breaks and
space padding, allows my original code to work fine.
Inserting the following before the REPEATS does the trick (at least with the
example text):
replace return with "^&*" in TEKST
put "\s+" into lmultispace
put replacetext (TEKST,lmultispace," ") into TEKST
replace "^&*^&*" with return in TEKST
replace "^&*" with " " in TEKST
replace return with return & return in TEKST
The only downside is that the time to execute went from 89 msec to 616 msec.
Your mileage may vary.
NOTE: My method does not identify multiple instances of the FRAZE within a
single line, however once it is found in a line it would be simple to see if it
occurred again.
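As a rough sketch of that follow-up check, the offset function takes a "chars to skip" parameter in the same way lineoffset takes lines to skip, so the same walk works inside a single line. The handler name countInLine is hypothetical, and it counts non-overlapping matches only.

```livecode
-- Hypothetical helper: counts how many times pPhrase occurs in pLine
-- (non-overlapping matches only).
function countInLine pPhrase, pLine
   local tSkip, tFound, tCount
   put 0 into tSkip
   put 0 into tCount
   if pPhrase is empty then return 0
   repeat forever
      -- offset reports the position relative to the skipped chars
      put offset(pPhrase, pLine, tSkip) into tFound
      if tFound = 0 then exit repeat
      add 1 to tCount
      -- move past this match so the next search starts after it
      add tFound + length(pPhrase) - 1 to tSkip
   end repeat
   return tCount
end countInLine
```

It could be called on each reported line, for example "put countInLine(FRAZE, line thelinea of TEKST)", to flag lines where the phrase turns up more than once.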
Thanks for the diversion Richmond.
James