On Thu, Dec 10, 2015 at 4:38 PM, Mark Waddingham <m...@livecode.com> wrote:
> The "word" chunk is not loosely implemented - it does precisely what it is
> meant to do.

Which, of course, is why the sort container command has no problem if you sort by word on the right side of the equation: 'by word x of each'.

> Here you see that there are two operations which are obviously defined for
> item and line, but not so obviously defined for word - as there is not a
> unique choice for the delimiter when doing the final combine.

Well, LC's definition of what a word is isn't exactly universally accepted, but once you understand it, it's extremely powerful and saves a huge amount of effort. As LC has its own definition of what a word is, surely it could also define exactly how it's going to output the final combine. More on this below.

> In the more general case, the key thing is that you have to choose how to
> recombine the output string for the sort such that you have the following
> invariants:
>
> 1) the number of chunks in tSortedVariable is the number of chunks in
> tVariable

This isn't quite true, is it?

   put "the,quick,brown," into tVar
   put the number of items in tVar into msg -- 3
   sort items of tVar
   put the number of items in tVar into msg -- 4

Now, I don't wish to discuss why this is; I understand why it happens and I'm OK with it. As with LC's definition of what a word is, once you understand what is happening under the hood you can work around it, or make it work to your advantage. Note that in the above, the number of chars has remained the same.

> With this in mind, you can then ask the question for any chunk (however it
> is defined) whether it is 'sortable' - a chunk is sortable if there exists
> a choice for delimiter which means (1) and (2) hold.
>
> This is definitely true for item and line (as there is only one choice of
> delimiter). It is true of word if you choose space (or, indeed, any pattern
> matching [\n\t ]+). It is true of character...
> and I'm not sure there exists a choice of [word] delimiter which would not
> (at least in some cases) change the set of parts you get in a recombined
> output string (i.e. you can probably construct examples where invariant (2)
> is broken).

As with the item example above, it's possible to 'confuse' the invariant rule, so the fact that LC may sort words and output them in a manner some people don't agree with is irrelevant.

IMO, sort by word would work like this:

the quick "brown fox" jumped over --contains spaces and tabs
the[space]quick[tab]"brown[tab]fox"[space][tab]jumped[tab]over[tab]

when sorted would come out like this:

jumped over the quick "brown fox"
[tab][tab]jumped[tab]over[tab]the[space]quick[space]"brown[tab]fox"

In this case LC would follow the standard sort order in which tabs (ASCII 9) precede spaces (ASCII 32). It keeps the number of chars exactly the same, just as LC already does. It sorts the surplus delimiters, the 'empties' (although there really isn't such a thing as an empty word), to the beginning, then hands out the remaining delimiters one at a time between the words, so there are tabs between some words, just as there were in the original, and spaces between others. There are no trailing delimiters, as is currently the case for lines and items.

Now, if you don't agree and think it should come out some other way, that's OK; all that matters is that, whatever the output, it is consistent and published. LC could convert all tabs to spaces, or it could replace every run of whitespace with a single space; I don't mind, as long as whatever it does is consistent and published. Some people don't think "New York" is one word, but LC does: it's published that quoted phrases count as one word, and that's a very powerful thing.

So again, as LC can already sort by words on the right side of the equation (sort xxxxx of tVar by word y of each), it seems only a minor step to make it possible on the left side (sort words of tVar).
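For anyone who wants to experiment with the proposal above, here is a rough model of it in Python (used only because it is easy to run; this is not LiveCode's implementation). It assumes a simplified word rule: words are separated by spaces and tabs, and a double quote toggles "inside a quoted word", so a quoted phrase containing tabs or spaces stays one word. The helper names split_words and sort_words are made up for illustration.

```python
# Rough model of the proposed "sort words of tVar" behaviour.
# Assumptions (not LiveCode's real word parser): only space and tab are
# delimiters, and a double quote toggles "inside a quoted word".
DELIMITERS = " \t"


def split_words(text):
    """Return (words, delimiter_chars) found in text."""
    words, delims = [], []
    i, n = 0, len(text)
    while i < n:
        if text[i] in DELIMITERS:
            delims.append(text[i])
            i += 1
            continue
        j, in_quote = i, False
        # A word runs until the next delimiter that is not inside quotes.
        while j < n and (in_quote or text[j] not in DELIMITERS):
            if text[j] == '"':
                in_quote = not in_quote
            j += 1
        words.append(text[i:j])
        i = j
    return words, delims


def sort_words(text):
    """Sort the words of text, keeping every delimiter character.

    Delimiters are sorted too (tab, code 9, before space, code 32).
    The surplus beyond one per gap goes to the front; the rest are
    handed out one per gap between words. No trailing delimiter.
    """
    words, delims = split_words(text)
    if not words:
        return text
    words.sort()
    delims.sort()
    # Never negative: consecutive words are always separated by a delimiter.
    surplus = len(delims) - (len(words) - 1)
    parts = delims[:surplus] + [words[0]]
    for gap, word in zip(delims[surplus:], words[1:]):
        parts += [gap, word]
    return "".join(parts)


# the[space]quick[tab]"brown[tab]fox"[space][tab]jumped[tab]over[tab]
original = 'the quick\t"brown\tfox" \tjumped\tover\t'
result = sort_words(original)
print(repr(result))
print(len(result) == len(original))
```

One design choice this exposes: with a plain character-code sort the quoted phrase comes first, because the double quote (code 34) sorts before the letters, so a published rule would also need to say whether quote characters count in the sort key.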
Obviously the sorting mechanism is in place; it's just the presentation that needs a little thought, and surely that's not that hard.

Here's a script demonstrating how LC already sorts by word (watch for line wraps):

on mouseUp
   -- Part 1: sort items, some of which are empty, a space or a tab
   put "the,quick,brown,fox," & space & ",jumped,over,the," & tab & ",lazy,dog," into tVar
   put "Chars = " & the number of chars of tVar & ", Items = " & the number of items of tVar & cr into msg
   sort items of tVar
   put "Chars = " & the number of chars of tVar & ", Items = " & the number of items of tVar & cr after msg
   put tVar & cr after msg
   answer "Ready for Part 2?"

   -- Part 2: nine lines mixing spaces, tabs and quoted phrases
   -- (line 1 overwrites the single line left over from Part 1)
   put "9 the quick brown fox jumped over the lazy dog" into line 1 of tVar --spaces
   put "8" & tab & "the" & tab & "quick" & tab & "brown" & tab & "fox" & tab & "jumped" & tab & "over" & tab & "the" & tab & "lazy" & tab & "dog" into line 2 of tVar --tabs
   put "7 the" & tab & "quick brown" & tab & "fox jumped" & tab & "over the" & tab & "lazy dog" into line 3 of tVar --spaces and tabs
   put "6 the" & tab & quote & "slick brown" & tab & "fox" & quote & " jumped" & tab & "over the" & tab & "lazy dog" into line 4 of tVar --quotes, tabs, spaces
   put "5 the" & tab & quote & "quick brown" & tab & "fox" & quote & " jumped" & tab & "over the" & tab & "lazy dog" into line 5 of tVar --quotes, tabs, spaces
   put "4 the" & tab & quote & " slick brown" & tab & "fox" & quote & " jumped" & tab & "over the" & tab & "lazy dog" into line 6 of tVar --quotes, tabs, spaces
   put "3 the" & tab & quote & " quick brown" & tab & "fox" & quote & " jumped" & tab & "over the" & tab & "lazy dog" into line 7 of tVar --quotes, tabs, spaces
   put "2 the" & tab & quote & tab & "slick brown" & tab & "fox" & quote & " jumped" & tab & "over the" & tab & "lazy dog" into line 8 of tVar --quotes, tabs, spaces
   put "1 the" & tab & quote & tab & "quick brown" & tab & "fox" & quote & " jumped" & tab & "over the" & tab & "lazy dog" into line 9 of tVar --quotes, tabs, spaces
   put tVar & cr & cr into msg
   put tVar into tVar2

   put "This is sorted by word 3" & cr after msg
   sort lines of tVar by word 3 of each
   put tVar & cr & cr after msg

   put "This is sorted by char 3 of word 3" & cr after msg
   sort lines of tVar2 by char 3 of word 3 of each
   put tVar2 & cr & cr after msg
end mouseUp
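Part 1 of the script, like the "the,quick,brown," example earlier, can be modelled in a few lines of Python. This is a sketch under two assumptions inferred from the behaviour described in this thread rather than from LiveCode's source: when counting, a single trailing delimiter does not start another item; when sorting, every delimiter splits, so the empty trailing item takes part in the sort and ends up at the front. The function names are hypothetical, and the model sorts by character code, which may not match LiveCode's ordering for mixed-case or non-letter items.

```python
# Model of the item-count change after "sort items of tVar".
# Assumptions (inferred from observed behaviour, not LiveCode source):
#  - counting ignores one trailing delimiter;
#  - sorting splits on every delimiter, so a trailing empty item is sorted too.


def count_items(text, delim=","):
    """Count items, ignoring a single trailing delimiter."""
    if text == "":
        return 0
    parts = text.split(delim)
    if parts[-1] == "":
        parts.pop()
    return len(parts)


def sort_items(text, delim=","):
    """Sort items by character code and rejoin them with the delimiter."""
    return delim.join(sorted(text.split(delim)))


for tvar in ("the,quick,brown,",
             "the,quick,brown,fox, ,jumped,over,the,\t,lazy,dog,"):
    tsorted = sort_items(tvar)
    print(f"Chars = {len(tvar)}, Items = {count_items(tvar)}")
    print(f"Chars = {len(tsorted)}, Items = {count_items(tsorted)}")
    print(repr(tsorted))
```

In this model the char count never changes, because sorting only reorders the same pieces around the same number of commas; the item count goes up by one because the empty item moves from the end, where it is not counted, to the front, where it is.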