David Epstein wrote:

> Richard Gaskin asks “Why?”
>
> I have developed a set of routines to analyze tabular data.  For KB
> or MB-sized files, it is convenient to display them in a field.  It
> would be simplest if I could also load GB-sized files and use my
> routines unchanged, but I accept that this is impractical.  But in
> order to design workarounds I’d like to get as much clarity as
> possible on what limits I am working around.

Do you read the text when it's measured in megabytes?

R and other data-processing tools encourage habits of displaying results, but rarely the data set as a whole. Of course I haven't seen what you're working on, and I've had my own moments now and then when just scanning a large data set has yielded an "aha!" insight, so I can appreciate the desire given your work with Cornell.

One option to consider, if practical for your needs: a one-time change to work with the data in a variable, regardless of size, would at least obviate the need for special-casing data sets of a specific size.
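A minimal sketch of what I mean (the field name "Preview" and the assumption of tab-delimited data with a numeric second column are mine, just for illustration):

   on mouseUp
      -- pick a file and pull the whole thing into a variable
      answer file "Choose a data file"
      if it is empty then exit mouseUp
      put URL ("file:" & it) into tData

      -- analyze in the variable: the same code path for any size
      set the itemDelimiter to tab
      put 0 into tTotal
      repeat for each line tLine in tData
         if item 2 of tLine is a number then add item 2 of tLine to tTotal
      end repeat

      -- display only a small sample; the full data stays in the variable
      put line 1 to 100 of tData into field "Preview"
   end mouseUp

The field then becomes just a window onto the data rather than its container, so none of the field limits below come into play for the analysis itself.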


As for field limits, I believe Jacque summarized them well:

- Per line: 64k chars when rendered without text wrap
  (a rendering limit only; the field's text remains addressable, and
   everything works swimmingly in a variable)

- Total - Logical: 4 GB (32-bit integers are used for allocation)

- Total - Practical: a mix of the available addressable space on the current system in its current state, which at times can require much more than the size of the data on disk given the iterative allocation calls that move the I/O buffer into variable space, further constrained by any limits the host OS's allocation routines impose on contiguous blocks (Mark Waddingham has noted here many times how the Win32 APIs cap contiguous allocation far below the logical 4 GB threshold). One workaround is to read in chunks; see the sketch after this list.

- Total - Anecdotal: I use the Gutenberg KJV Bible file frequently for stress testing text routines, but while we think of the Bible as a large text it weighs in at just 4.5 MB. In rarer cases where I've needed to probe for outliers I've created test sets above 100 MB without issue, but in a field I begin to see major slowdowns long before that once line-wrapping calculations are needed, and above roughly 100 MB significant slowdowns for display, scrolling, and save operations.
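For GB-range files where a single contiguous allocation may fail, reading in fixed-size chunks avoids ever needing one huge block. A rough sketch (the 1 MB chunk size and the handler name are illustrative, not a recommendation):

   function CountLinesInFile pPath
      local tCount
      put 0 into tCount
      open file pPath for read
      repeat forever
         read from file pPath for 1048576  -- 1 MB per read
         if it is empty then exit repeat
         -- each chunk contributes (number of lines - 1) line delimiters
         add (the number of lines of it) - 1 to tCount
      end repeat
      close file pPath
      return tCount + 1  -- delimiters + 1 = total lines
   end CountLinesInFile

The same pattern works for any per-line aggregation, though anything that needs whole lines will have to carry the partial line at the end of each chunk over to the next one.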

--
 Richard Gaskin
 Fourth World Systems
 Software Design and Development for the Desktop, Mobile, and the Web
 ____________________________________________________________________
 ambassa...@fourthworld.com                http://www.FourthWorld.com
