Once again I hope the algorithm experts can speed up the following... currently, on just 50,000 points, it takes 10 seconds. As I indicate below, I believe I know why, which is why I am looking for better solutions.
Key points:
- datasets are a minimum of 10,000 lines
- datasets are always a multiple of 1,000 lines
- datasets have two columns of data separated by a tab

Goal: generate a new dataset that ALWAYS reduces down to 2,000 lines, based on the max and min of groups of the original dataset.

So in the case of the minimum dataset you would divide 10,000 by 1,000, which gives a group size of 10. You then take the original dataset from start to finish in groups of 10:
- find the min of the group of 10
- find the max of the group of 10
- put them in the original order in which they appear in the original dataset (the min value might be on line 8 and the max value on line 3, so in building the new dataset you would have the max value first and then the min value... this can change for each group of 10)
- column 1 is linearly ascending, so it can be used to sort the new dataset

Caveats:
- column 1 is a needed value, but it is NOT the column the max and min are taken from... those are in column 2
- once the order of the two values within a group is established, their lines form the new dataset

Example:

1	23
2	12
3	9
4	77
5	2
6	13
7	44
8	83
9	2
10	37

In this example the result would be:

5	2
8	83

(Note: if more than one value ties for the min or max, the first one found can be used — here the 2 on line 9 is ignored because line 5 already supplied the min.)

What I have tried: the original dataset is in a fld "mydata", and column 2 of fld "mydata" is in a fld "mydataNew". lineOffset matches values sequentially, always from the start of the data, so to avoid matching already-processed earlier lines I append an "x" to each value in the second column once its group is done, which keeps wholeMatches working. This is where I am losing my time (I believe):

====================
on mouseUp
   put the seconds into startTime
   put the number of lines of fld "mydataNew" into tcount
   put fld "mydataNew" into gbmem -- column 2 only, one value per line
   put tcount div 1000 into tgroup -- group size
   put 1 into startline
   set the wholeMatches to true
   repeat for 1000 times
      -- each group is tgroup consecutive lines
      put startline + tgroup - 1 into endline
      put line startline to endline of gbmem into grptest
      replace cr with comma in grptest -- min()/max() take a comma-separated list
      put min(grptest) into mymin
      put lineOffset(mymin, gbmem) into minLine
      put max(grptest) into mymax
      put lineOffset(mymax, gbmem) into maxLine
      put minLine & tab & mymin & cr after finalMM
      put maxLine & tab & mymax & cr after finalMM
      -- mark the processed lines so wholeMatches cannot find them again
      repeat with w = startline to endline
         put "x" after line w in gbmem
      end repeat
      put endline + 1 into startline
   end repeat
   sort finalMM numeric -- each line starts with its line offset, so this restores original order
   filter finalMM without empty
   put cr & line (fld "gbcurrent") of fld "mydata" after finalMM
   put finalMM into fld "finalPointsNew"
   put the seconds into endTime
   put endTime - startTime
end mouseUp
====================

I believe this should be doable in less than 1 second, and consistently so, since the repeat loop always runs exactly 1,000 times; only the group size changes.

thanks,
Glen
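P.S. In case it helps the discussion, here is an untested single-pass sketch of the same reduction that avoids lineOffset entirely (I believe the 2,000 lineOffset scans over the ever-growing gbmem are where the quadratic cost comes from). It assumes the two-column data is in fld "mydata", that the output should be the full original lines (as in the example above), and that fld "finalPointsNew" is the destination; the variable names are placeholders.

====================
on mouseUp
   put the milliseconds into tStart
   set the itemDelimiter to tab
   put fld "mydata" into tData -- two tab-delimited columns
   put the number of lines of tData into tCount
   put tCount div 1000 into tGroupSize -- tCount is always a multiple of 1,000
   put empty into tResult
   put 0 into tIndex
   repeat for each line tLine in tData
      add 1 to tIndex
      put item 2 of tLine into tValue -- min/max are taken from column 2
      if tIndex mod tGroupSize = 1 then
         -- first line of a group: reset the running min and max
         -- (group size is at least 10 given the stated minimums)
         put tValue into tMin
         put tValue into tMax
         put tIndex into tMinLine
         put tIndex into tMaxLine
         put tLine into tMinData
         put tLine into tMaxData
      else
         -- strict comparisons keep the FIRST occurrence on ties
         if tValue < tMin then
            put tValue into tMin
            put tIndex into tMinLine
            put tLine into tMinData
         end if
         if tValue > tMax then
            put tValue into tMax
            put tIndex into tMaxLine
            put tLine into tMaxData
         end if
      end if
      if tIndex mod tGroupSize = 0 then
         -- last line of a group: emit min and max in original line order
         if tMinLine < tMaxLine then
            put tMinData & cr & tMaxData & cr after tResult
         else
            put tMaxData & cr & tMinData & cr after tResult
         end if
      end if
   end repeat
   delete the last char of tResult -- trailing return
   -- no sort needed: groups are emitted in order and column 1 ascends
   put tResult into fld "finalPointsNew"
   put the milliseconds - tStart
end mouseUp
====================

Because the data is walked once with repeat for each and each group's min/max is kept as running values, the work stays proportional to the number of lines instead of growing with the square of it, so 50,000 points should come in well under a second.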