On 07/11/2016 15:02, Richard Gaskin wrote:
> Ben Rubinstein wrote:
>
>> (Re Mike and Mark's comments, if it's a small thing I'll use an
>> array; but for large quantities of data - I'm often dealing with
>> very large files, and after calling this function will loop over
>> tens or hundreds of thousands of rows using the variables - I feel
>> the need for speed outweighs the simplicity.)
>
> Indeed, contrary to popular belief I've seen cases where certain aggregate
> operations on an array take more time than achieving the same outcomes with
> delimited lists.
>
> But so far only a few.
>
> How did you benchmark that, and what was the measured difference?

I'll confess: no benchmarking has taken place, just intuition.

I love arrays - coming from a HyperCard world in which I was similarly doing large amounts of processing over files, the ability to use hashed arrays when I discovered MetaCard made an enormous difference. I was and am blown away by the speed of access they allow.

The context in which I'm typically using this 'makeAccessVars' functionality is where the code loads a massive TSV file and then iterates through the rows doing various processing on the data. I don't want to hard-code the columns in which the data will be, because very occasionally that may change, and it's too easy to have a subtle bug here.

So the typical routine is something like

        do makeAccessVars("vi", line 1 of tTSVdata)
        delete line 1 of tTSVdata
        repeat for each line tRec in tTSVdata
                doSomething item viUserID of tRec, item viUserName of tRec
                ...
        end repeat
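
For illustration, here is a minimal sketch of what a function like 'makeAccessVars' might look like. The body is an assumption, not the original implementation: it guesses that variable names are built from the prefix plus the column name with non-alphanumeric characters stripped (so "User ID" becomes viUserID), that the header row is tab-delimited, and that variables assigned through 'do' are visible to the rest of the calling handler (which would not hold under strict compilation).

```livecode
-- Hypothetical sketch: returns a script such as
--   put 1 into viUserID
--   put 2 into viUserName
-- for the caller to execute with 'do'.
function makeAccessVars pPrefix, pHeaderRow
   local tScript, tIndex, tVarName
   set the itemDelimiter to tab
   put 0 into tIndex
   repeat for each item tName in pHeaderRow
      add 1 to tIndex
      -- keep only characters that are safe in a variable name
      put pPrefix & replaceText(tName, "[^A-Za-z0-9_]", empty) into tVarName
      put "put" && tIndex && "into" && tVarName & return after tScript
   end repeat
   return tScript
end makeAccessVars
```

Because the itemDelimiter is local to each handler, the calling handler would also need 'set the itemDelimiter to tab' before the loop that reads 'item viUserID of tRec'.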


If I'm doing something that _isn't_ going to repeat a vast number of times, I often use a variation more along these lines:

        put line 1 of tTSVdata into tColumnNames
        delete line 1 of tTSVdata
        repeat for each line tRec in tTSVdata
                put explodeRow(tRec, tColumnNames) into aData
                doSomething aData["User ID"], aData["User Name"]
                ...
        end repeat

where 'explodeRow' does the obvious thing to construct an array containing the data from the row, each value indexed by the name of the column in which it appeared. I prefer that style, as it makes the "doSomething" part of the code - which is generally the most interesting bit, and therefore the one that most needs to be easy to understand - clearer. Obviously it must be slower, though I admit I've never done the experiment to find out how significant the difference is.
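
As a rough sketch of the "obvious thing" described above, 'explodeRow' could be written along these lines. This is an assumed implementation, not the original: it takes both the row and the column-name list to be tab-delimited, and the local names (tData, tIndex) are hypothetical.

```livecode
-- Hypothetical sketch: builds an array keyed by column name
-- from one tab-delimited row of data.
function explodeRow pRow, pColumnNames
   local tData, tIndex
   set the itemDelimiter to tab
   put 0 into tIndex
   repeat for each item tName in pColumnNames
      add 1 to tIndex
      -- the Nth value in the row goes under the Nth column name
      put item tIndex of pRow into tData[tName]
   end repeat
   return tData
end explodeRow
```

To find out how significant the difference is, one option would be to read 'the milliseconds' before and after each style of loop over the same data and compare the two elapsed times.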

(Actually I'd probably get better performance in the latter case, and further enhance readability, if I combined the two approaches, i.e. modify 'makeAccessVars' so that, instead of returning a string which when passed to 'do' declares variables named for each column and assigns indices to them, it's called for each row to assign the actual values to the variables. I'm not sure why I don't do this.)

Ben

_______________________________________________
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode
