On 2018-09-01 13:15, Richmond Mathewson via use-livecode wrote:
I've already shovelled Ruyton of the Eleven Towns quite effectively:

https://www.dropbox.com/s/n7r7u0c2m9ny3eb/Text%20analyzer%20X.livecode.zip?dl=0

No tokenising, in fact very basic stuff indeed.

Not wishing to bang on about over-complicating things . . . . .

There is actually a 'correct' more shovelistic approach (at least I *think* this is correct):

-- Ensure all punctuation is surrounded by space
repeat for each char tPuncChar in ",.';:()[]{}<>!@£$%^&*-_+=~`?/\|#€" & quote
  replace tPuncChar with space & tPuncChar & space in tText
end repeat

-- Ensure all whitespace is space
replace return with space in tText
replace tab with space in tText

-- Ensure there are never two spaces next to each other in tText
repeat while tText contains "  "
  replace "  " with " " in tText
end repeat

-- Ensure there is only ever one space between words in phrases
repeat while tPhrases contains "  "
  replace "  " with " " in tPhrases
end repeat

-- We can now use an itemDelimiter of space
set the itemDelimiter to space

-- Sort the phrases by descending word count (longest phrases first).
sort lines of tPhrases descending numeric by the number of items in each

-- Now check for, and remove each phrase from the source text in turn
set the wholeMatches to true
repeat for each line tPhrase in tPhrases
  -- If the phrase is not present then skip to the next
  if itemOffset(tPhrase, tText) is 0 then
    next repeat
  end if

  -- Accumulate the phrase on the output list
  put tPhrase & return after tFoundPhrases

  -- Remove the phrase from the input text (we assume here that * does not appear in any phrase)
  replace tPhrase with "*" in tText
end repeat
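
To try the clean-up steps on their own, they could be wrapped in a small function. This is only a sketch: the handler name normalizeForMatching and the mouseUp test are illustrative assumptions, not part of the script above.

```livecode
-- Hypothetical helper: the name and parameter are illustrative only
function normalizeForMatching pText
   local tPuncChar
   -- Surround each punctuation character with spaces
   repeat for each char tPuncChar in ",.';:()[]{}<>!@£$%^&*-_+=~`?/\|#€" & quote
      replace tPuncChar with space & tPuncChar & space in pText
   end repeat
   -- Turn line breaks and tabs into spaces
   replace return with space in pText
   replace tab with space in pText
   -- Collapse runs of spaces down to a single space
   repeat while pText contains "  "
      replace "  " with " " in pText
   end repeat
   return pText
end normalizeForMatching

on mouseUp
   local tText
   put normalizeForMatching("Hello, world!") into tText
   -- tText is now "Hello , world ! " (note the trailing space)
   put tText
end mouseUp
```

The trailing space is left in place here, as in the steps above; with the itemDelimiter set to space a single trailing delimiter does not create an extra item.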

Warmest Regards,

Mark.

P.S. The above will be reasonably quick for small sets of phrases / small source texts - but I think as the size of either increases it will get very slow, very quickly!
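
One rough way to see how the running time grows is to time the phrase loop against synthetic data. This is a sketch under assumptions: the handler name timePhraseSearch, the generated words, and the sizes 10000 and 500 are all made up for illustration; vary the sizes to see how the timing changes.

```livecode
-- Hypothetical timing harness: handler name, data and sizes are illustrative
on timePhraseSearch
   local tText, tPhrases, tPhrase, tFoundPhrases, tStart, i

   -- Build a synthetic source text of 10000 space-separated words
   repeat with i = 1 to 10000
      put "word" & i & space after tText
   end repeat

   -- Build 500 two-word phrases, one per line
   repeat with i = 1 to 500
      put "word" & i & space & "word" & (i + 1) & return after tPhrases
   end repeat
   delete the last char of tPhrases

   set the itemDelimiter to space
   set the wholeMatches to true

   -- Time only the search-and-remove loop
   put the milliseconds into tStart
   repeat for each line tPhrase in tPhrases
      if itemOffset(tPhrase, tText) is 0 then next repeat
      put tPhrase & return after tFoundPhrases
      replace tPhrase with "*" in tText
   end repeat

   put "Elapsed ms:" && (the milliseconds - tStart)
end timePhraseSearch
```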

--
Mark Waddingham ~ m...@livecode.com ~ http://www.livecode.com/
LiveCode: Everyone can create apps

