On 2015-12-09 22:27, dunb...@aol.com wrote:
The "word" chunk has always been loosely implemented, in that it can
hardly be considered a truly delimited chunk at all, like an item or a
line is. Even though it seems to have been included in the "family" of
chunks.

The "word" chunk is not loosely implemented - it does precisely what it is meant to do.

The 'item' and 'line' chunks can be classified as 'delimited' chunks in the sense that items in the string are divided up into an alternating sequence of part and delimiter:

a,b,c => [a] [,] [b] [,] [c]

In the case of 'word', though, what the 'parts' are and the 'delimiters' are is slightly more complex:

the quick    "brown fox" => [the] [ ] [quick] [    ] ["brown fox"]

Indeed, 'word' is perhaps what you might call a tokenized chunk, in that it is essentially matching parts and delimiters to distinct patterns:

delimiter pattern is matched to the regular expression [ \n\t]+

part pattern is matched to the regular expression (\"[^\"]*\"|[^ \n\t\"]+)

So, the 'inobviousness' I referred to is just the fact that there is not a single choice for what the delimiter should be when 'recombining' the string.

The 'sort' command can actually be broken down into more fundamental operations:

sort chunks of tVariable by func(each)

Is actually:

-- Create a sequence of parts to be sorted
split tVariable by chunk

-- Create a ordered list of indicies from 1 up to the number of parts
repeat with i = 1 to the number of elements in tVariable
  put i into tIndexes[i]
end repeat

-- Sort the indicies by comparing the relevant parts from the original variable -- (note that this is a notional command - you can't currently sort an indexed array) sort tIndexes by func(tVariable[each]) -- this notionally sorts the elements of tIndexes

-- Create a sorted sequence of parts based on the constructed indicies
repeat with i = 1 to the number of elements in tVariable
  put tVariable[tIndexes[i]] into tSortedVariable[i]
end repeat

-- Reconstruct a sorted string using the original 'delimiter'
combine tSortedVariable by chunk

Here you see that there are two operations which are obviously defined for item and line, but not so obviously defined for word - as there is not a unique choice for the delimiter when doing the final combine.

Of course, as Geoff correctly pointed out space works as a suitable choice for the delimiter in the final step in the case of word although you could choose tab, return, or any combination of them (indeed anything matching the regular expression [\n\t ]+.

In the more general case, the key thing is that you have to choose how to recombine the output string for the sort such that you have the following invariants:

1) the number of chunks in tSortedVariable is the number of chunks in tVariable

2) for each tChunk in tVariable, the number of occurrences of tChunk in tVariable is the number of occurrences of tChunk in tSorted Variable

Here (1) expresses the fact that you must have same number of chunks before and after. (2) expresses the fact that every part before must be present after, in exactly the same quantity.

With this in mind, you can then ask the question for any chunk (however it is defined) whether it is 'sortable' - a chunk is sortable if there exists a choice for delimiter which means (1) and (2) hold.

This is definitely true for item and line (as there is only one choice of delimiter). It is true of word if you choose space (or, indeed, any pattern matching [\n\t ]+). It is true of character, codepoint and codeunit. It is true of trueWord if you choose space. It is not in any way obvious for sentence chunks - the rules that define those are not simple - and I'm not sure there exists a choice of delimiter which would not (at least in some cases) change the set of parts you get in a recombined output string (i.e. you can probably construct examples where invariant (2) is broken).

Warmest Regards,

Mark.

--
Mark Waddingham ~ m...@livecode.com ~ http://www.livecode.com/
LiveCode: Everyone can create apps

_______________________________________________
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode

Reply via email to