In 2009 I helped someone on the forum who wanted to index Wikipedia by "title". It took him 3 days to complete an indexing run on a 23 GB XML file: http://forums.livecode.com/phpBB2/viewtopic.php?f=9&t=3690
After some optimizations it was down to 30 minutes: http://forums.livecode.com/phpBB2/viewtopic.php?f=9&t=3728

When Roland started this thread I "unearthed" the old code, adapted it (mostly replacing "char" with "byte") and tweaked it a bit. I now use a subset of the German Wikipedia. The file is 1.74 GB and contains 395,621 occurrences of <Title>text of title</Title>. I extract those titles plus the byte offset where each "record" starts in the original file, and write that information to an index file of 23 MB. I get a throughput of roughly 20,000 "records" ("hits") per second: gathering all 395,621 records, including writing out the index file, takes 20 seconds. I am using an SSD.

As Richard says, this needs a little tweaking. I found that in LC 8 RC1, reading roughly 80,000 bytes per file access gives the best performance on my system, a mid-2010 MacBook Pro. In LC 6 it is about 1 MB per file access. (LC 6.7.10 is twice as fast, whereas LC 7.1.3 is about 30% slower.) Every 1,000 records, when writing data out, I throw in a "wait 0 milliseconds with messages"; I can even type in a field without problems while the indexing is running.

All of this is done using "binary read"; a plain "read" more than doubles the time needed. Of course, whether binary read is OK for you depends on your data. So one can definitely process huge data files in LC without problems if one adapts the code to the problem.

While doing this I discovered that LC 8 does not return "EOF" in the result when attempting to read past the end of the file. I reported the bug: http://quality.livecode.com/show_bug.cgi?id=17413 (reported 2016-04-15 09:35 BST, merged 2016-04-15 11:38 BST). This must be one of the fastest bug fixes on record: 2 hours from reporting to "awaiting merge". Hats off to Mark Waddingham and the team.
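The LiveCode script itself is not part of this post. As a rough illustration of the approach described (reading the file in fixed-size binary chunks, extracting each title together with its byte offset, writing a small index file, and treating an empty read as end of file), here is a minimal sketch in Python rather than LiveCode. Assumptions: `iter_titles`, `write_index` and `on_progress` are hypothetical names; the tags default to lowercase `<title>`/`</title>` as used in MediaWiki XML dumps; the stored offset is that of the opening title tag (the original "record" may start elsewhere, e.g. at `<page>`); the 80,000-byte default chunk is only the LC 8 RC1 figure reported here and would need retuning for Python; and the `on_progress` hook, called every 1,000 records, only loosely stands in for "wait 0 milliseconds with messages".

```python
import io
from typing import BinaryIO, Callable, Iterator, Optional, Tuple


def iter_titles(stream: BinaryIO, chunk_size: int = 80_000,
                open_tag: bytes = b"<title>",
                close_tag: bytes = b"</title>") -> Iterator[Tuple[int, bytes]]:
    """Yield (byte offset of open_tag, raw title bytes) for each tag pair."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    buf = b""        # bytes not yet fully scanned, carried between reads
    buf_start = 0    # absolute file offset of buf[0]
    while True:
        chunk = stream.read(chunk_size)   # one "file access"
        at_eof = not chunk                # an empty read means end of file
        buf += chunk
        pos = 0
        while True:
            start = buf.find(open_tag, pos)
            if start == -1:
                break
            end = buf.find(close_tag, start + len(open_tag))
            if end == -1:
                break                     # closing tag not read yet
            yield buf_start + start, buf[start + len(open_tag):end]
            pos = end + len(close_tag)
        if at_eof:
            return                        # any unfinished tag is dropped
        if start != -1:
            keep_from = start             # keep the unfinished <title>...
        else:
            # keep a short tail that might be the start of a split open tag
            keep_from = max(pos, len(buf) - (len(open_tag) - 1))
        buf_start += keep_from
        buf = buf[keep_from:]


def write_index(src_path: str, index_path: str, chunk_size: int = 80_000,
                every: int = 1000,
                on_progress: Optional[Callable[[int], None]] = None) -> int:
    """Write "offset<TAB>title" lines to index_path; return the record count."""
    count = 0
    with open(src_path, "rb") as src, open(index_path, "wb") as out:
        for offset, title in iter_titles(src, chunk_size):
            out.write(b"%d\t%s\n" % (offset, title))
            count += 1
            if on_progress is not None and count % every == 0:
                on_progress(count)        # hook for UI or progress updates
    return count


if __name__ == "__main__":
    sample = b"<page><title>Alpha</title></page><page><title>Beta</title></page>"
    for offset, title in iter_titles(io.BytesIO(sample), chunk_size=4):
        print(offset, title.decode("utf-8"))
```

Bytes that could be the beginning of a tag split across two reads are carried over to the next chunk, so the result should not depend on the chunk size; only the speed does. To fetch a record later, open the dump with `open(path, "rb")`, `seek()` to the stored offset and read from there.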
The EOF bug will be fixed in LC 8 RC2.

Kind regards
Bernd

--
View this message in context: http://runtime-revolution.278305.n4.nabble.com/LC7-and-8-Non-responsive-processing-large-text-files-tp4703419p4703566.html
Sent from the Revolution - User mailing list archive at Nabble.com.