Re: Divide Large Data Blob?

Bob Sneidar via use-livecode Mon, 16 May 2022 15:46:36 -0700

So this has got me thinking. Apparently what I am calling Divide and Conquer is 
really called a binary search. I searched the interwebs for how to calculate the 
maximum number of iterations for a given number of values, but all the formulas 
I found are written in terms of C functions. I am trying to figure out what the 
basic math formula for this is, given n values.
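
For what it's worth, the figures I quoted earlier (7 for 100 values, 10 for 
1000, 14 for 10000) match the base-2 logarithm rounded up: 
maximum iterations = ceil(log2(n)), where log2(n) = log(n) / log(2) in any base. 
Below is a rough sketch in Python (not LiveCode) that computes that bound and 
counts the probes an actual halving search makes. The names max_halvings, 
first_valid_index and is_valid are made up for illustration, and the search 
assumes that, within the range searched, every junk line comes before every 
valid line.

```python
def max_halvings(n: int) -> int:
    """Worst-case number of halvings needed to narrow n candidates to one."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1 and avoids
    # floating-point rounding near exact powers of two.
    return (n - 1).bit_length()


def first_valid_index(lines, is_valid):
    """Return (index, probes) for the first line that is_valid accepts.

    Assumption (hypothetical, for illustration): all rejected lines come
    before all accepted lines. Returns len(lines) as the index when no
    line is accepted.
    """
    low, high = 0, len(lines)
    probes = 0
    while low < high:
        mid = (low + high) // 2
        probes += 1
        if is_valid(lines[mid]):
            high = mid       # boundary is at mid or earlier
        else:
            low = mid + 1    # boundary is after mid
    return low, probes


if __name__ == "__main__":
    for n in (100, 1000, 10000):
        print(n, max_halvings(n))

    # Made-up sample data: 300 junk lines followed by 700 valid lines.
    sample = ["junk"] * 300 + ["data"] * 700
    index, probes = first_valid_index(sample, lambda s: s == "data")
    print(index, probes)
```

Note that the boundary can fall in any of n + 1 positions (including "no valid 
line at all"), so the probe count for n lines is bounded by max_halvings(n + 1) 
rather than max_halvings(n).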


Bob S


> On May 16, 2022, at 15:23 , Bob Sneidar via use-livecode 
> <use-livecode@lists.runrev.com> wrote:
> 
> A maximum of 7 recursions are necessary to isolate a single instance of 100 
> possible values. 1000 requires a maximum of 10. 10000 values requires 14. The 
> idea is that for every factor of 10, you need roughly 3 more recursions. This 
> of course assumes the data is sorted, which in your case is sorted into 3 
> containers. If you know the limits of how many lines can be garbage, and how 
> many can be valid data, you narrow your scope significantly. 
> 
> Livecode is pretty damn quick at parsing this kind of data. If there are 
> consistent delimiters (in this case a line break) then even 20 or 30 
> recursions is child's play. 
> 
> Bob S
> 
> 
>> On May 16, 2022, at 15:00 , Bob Sneidar via use-livecode 
>> <use-livecode@lists.runrev.com> wrote:
>> 
>> Do you know exactly which lines you need to toss, or do you need to search 
>> the data to find out where the beginning and end of the useful data is? 
>> If the former, then just put line x to y of your data into a new variable. 
>> If the latter, then a divide and conquer approach might be the answer. Get 
>> the line 30% in, test for valid, get the line 40% in, test, then 35% then 
>> 32.5% or 37.5% depending on your test. 
>> 
>> You may only have to do this a dozen or so times to find the exact line 
>> where your valid data begins. 
>> 
>> The other way of course is to get it all into a SQL database (how did you 
>> all know I was going to say that??) The downside is that you have to iterate 
>> through all your data once. The upside is a good one liner query statement 
>> may be all you need to process your data. And if you need to make multiple 
>> passes at your data, all the better. 
>> 
>> Bob S
>> 
>>> On May 16, 2022, at 10:46 , Rick Harrison via use-livecode 
>>> <use-livecode@lists.runrev.com> wrote:
>>> 
>>> I have a large chunk of data that I want to
>>> search as quickly as possible.  
>>> 
>>> Unfortunately the part I want to search is the 
>>> middle third of the data.  The other thirds at 
>>> the beginning and at the end are just junk and 
>>> slow down my search so I want to get rid of them.
>>> 
>>> I don’t want to search line by line as that
>>> takes way too long.
>>> 
>>> There’s no unique character dividing any
>>> of these data regions.
>>> 
>>> What’s the best way to do this?
>>> 
>>> Thanks in advance!
>>> 
>>> Rick
>>> 
>>> 
>>> 
>>> _______________________________________________
>>> use-livecode mailing list
>>> use-livecode@lists.runrev.com
>>> Please visit this url to subscribe, unsubscribe and manage your 
>>> subscription preferences:
>>> http://lists.runrev.com/mailman/listinfo/use-livecode
>> 
