Re: randomly order a list

Alex Tweedly Wed, 05 Jun 2013 15:14:38 -0700

Your code has a minor bug :-)

You "get MD5Digest(S[1]) "
instead of using   S[i]


Here's the code I used (now extended to check different lengths).

constant K=100000    -- the number of iterations
constant KLength=20   -- this * 36 --> the number of chars per line
on mouseup
   put empty into msg

   put the millisecs into t1
   repeat K times
      put random(999999999) into j3
   end repeat
   put "random" && the millisecs - t1 &CR after msg

   put the millisecs into t1
   repeat K times
      put random(999999999) & random(999999999) into j3
   end repeat
   put "random&random" && the millisecs - t1 &CR after msg

   put empty into L1
   put "abcdefghijklmnopqrstuvwxyz0123456789" after L1
   put the millisecs into t1
   repeat K times
      put md5digest(L1) into j1
   end repeat

put "md5" && the number of chars in L1 && the millisecs - t1 &CRafter msg


   put empty into L1
   repeat kLength times
      put "abcdefghijklmnopqrstuvwxyz0123456789" after L1
   end repeat
   put the millisecs into t1
   repeat K times
      put md5digest(L1) into j1
   end repeat

put "md5" && the number of chars in L1 && the millisecs - t1 &CRafter msg


end mouseup


and it gives something like

(700 chars)
random 20
random&random 157
md5 36 42
md5 720 235

(1440 chars)
random 17
random&random 161
md5 36 41
md5 1440 431

-- Alex.


On 05/06/2013 18:38, Geoff Canyon wrote:

What code were you using Alex? I thought the first step(s) of the MD5
process reduce (or grow) whatever input string is given to 128 bits, and
then everything from there operates on the 128 bit data. Likewise for SHA1,
in 160 bits.

In other words, the size of the individual strings should have a limited
impact on the MD5 algorithm. For example, the two times returned by this
are nearly identical:

on mouseUp
    repeat 5
       put random(999999999) after S[1]
    end repeat
    repeat 5000
       put random(999999999) after S[2]
    end repeat
    repeat with i = 1 to 2
       put ticks() into T
       repeat 1000000
          get MD5digest(S[1])
       end repeat
       put i && ticks() - T & cr after R
    end repeat
    put R
end mouseUp



On Tue, Jun 4, 2013 at 6:26 PM, Alex Tweedly <a...@tweedly.net> wrote:

No comments on the "collision-or-not-ness", but some concerns about
performance.

The performance of "random() & random()" is conveniently data-independent,
but that for md5digest() is not. With nice short lines, it is indeed faster
than the random&random version, but as the line size increases, so does the
time taken by all of the digest methods. I didn't test it thoroughly, but
the swap-over point is fairly low - somewhere around 500 chars per line.

-- Alex.


On 04/06/2013 18:51, Geoff Canyon wrote:

At the risk of beating the decaying equus -- the previously suggested
random() solutions should be fine for all purposes --I found an
alternative
that:

1. Is faster than sorting by random(999999999) & random(999999999)
2. Is about as fast as sorting by random(999999999)
3. Is (I think) less likely to have duplicate sort keys

The drawback is that it is determinative (albeit random) for any given set
of data, unless you are willing to accept performance equivalent to
sorting
by random(999999999) & random(999999999), while providing near-certainty
of
a true sort (I think).

The one-time, as fast as any solution so far, sort is:

        sort lines of myVar by md5digest(each)

Collisions are highly unlikely in 128 bits. Even random(999999999) &
random(999999999) only provides about 60 bits, which, to be clear, is
*more* than enough, but md5 is (I think) even more certain, and faster.
However, it will always produce the same results.

        sort lines of myVar by sha1digest(each)

Works roughly the same: 160 bits of guaranteed-no-collision-ness, but it's
a little slower, although still much faster than random(999999999) &
random(999999999). Like MD5, it will always sort the same data the same
(random) way.

The same-ness for either solution can (I think) be fixed by this:

        put ticks() into T
        sort lines of myVar by md5digest(T & each)

or

        put ticks() into T
        sort lines of myVar by sha1digest(T & each)

That should result in random results each time, and is a little faster
(MD5) or about 1/3 slower (SHA1) than random(999999999) &
random(999999999)

If anyone has thoughts on the collision-or-not-ness of MD5 or SHA1, feel
free to comment. Otherwise, I hope I'm done now ;-)
______________________________**_________________
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your
subscription preferences:
http://lists.runrev.com/**mailman/listinfo/use-livecode<http://lists.runrev.com/mailman/listinfo/use-livecode>


______________________________**_________________
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your
subscription preferences:
http://lists.runrev.com/**mailman/listinfo/use-livecode<http://lists.runrev.com/mailman/listinfo/use-livecode>

_______________________________________________
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode



_______________________________________________
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode

Re: randomly order a list

Reply via email to