George,

you asked on Wed, 26 May 2004 11:36:29 -0400:

I am still working on our new file I/O library and I have a performance/reliability question. The specification of the file format requires the data to be sequential in the file, so if I want to add data to a table at the beginning of the file, I have to shift all of the following tables to make room for the new data. Therefore, if a program were writing to two tables in the file in a loop, there would be a lot of data shuffling, and it would get worse the longer the loop ran. Does anyone have any experience with this sort of thing? Am I worried about something that I wouldn't even be able to notice in the end? Our files routinely reach several hundred MB, and gigabyte files are not unheard of. What would be the performance hit of trying to insert data at the beginning of a several-hundred-MB file?
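To put a rough number on the worry first: inserting at offset k in an n-byte file means every byte from k to the end must be rewritten, so each insert costs O(n - k) of I/O. A minimal sketch of such a tail-shifting insert in Python (`insert_bytes` is a hypothetical helper for illustration, not anything from the LabVIEW file API):

```python
import os
import shutil
import tempfile

def insert_bytes(path, offset, data, bufsize=1 << 20):
    """Insert `data` at `offset` by rewriting the tail of the file.

    Every byte after `offset` has to be moved, so one insert costs
    roughly (file size - offset) bytes of read + write traffic.
    Hypothetical helper, for illustration only.
    """
    with open(path, "r+b") as f:
        # Stash the tail in a temporary file...
        with tempfile.TemporaryFile() as tail:
            f.seek(offset)
            shutil.copyfileobj(f, tail, bufsize)
            # ...then write the new data followed by the old tail.
            f.seek(offset)
            f.write(data)
            tail.seek(0)
            shutil.copyfileobj(tail, f, bufsize)
```

Doing this once per loop iteration near the start of a multi-hundred-MB file is exactly the quadratic-cost pattern the question describes.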
Some remarks:
1. At the moment, AFAIK, LabVIEW can only access files up to a limit of ~2 GB. The file pointer is an I32, and only its positive range can be used as a real offset into the file. As most current OSes do not support more than 2 GB of contiguous memory address range or of file size either, this is not really a severe limitation of LV. Maybe LV8 on Linux64 will help, but... ;-))
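The arithmetic behind that limit, as a quick sketch:

```python
# Only the positive range of a signed 32-bit integer is usable
# as a byte offset, so the largest addressable position is:
I32_MAX = 2**31 - 1          # 2147483647 bytes
gib = I32_MAX / 2**30        # just a hair under 2.0 GiB
print(I32_MAX, round(gib, 3))
```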
2. I finished a large project last year where I had to handle different types of data with sizes of up to some tens of MB each (96-channel ECG data at 12 bit and 2 kS/s). At the moment I have no access to the source files, but let me describe what I remember:
My (surely incomplete and somewhat limited) solution was to use two arrays of strings.
The first array was defined as 0x20 strings of, say, 0x20 characters each, containing an implemented data-type description in the form of a label, say '$$Patient name', '$$rawECG', '$$DAQ context', '$$processed ECG' or '$$logbook' and so on. Each of these names implied a fixed data type.
The second array also contained 0x20 strings, each of which held flattened data of the type implied by the name string at the same index in the first array.
I used 0x20 as the array size and a fixed name size ('empty' name strings contained something like '$$empty ') in order to get a fixed file offset for the 2nd array. But this can also be handled dynamically.
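The fixed-size label block can be sketched like this in Python (the constants and the `pack_names` helper are my illustration of the scheme, not code from that project):

```python
N_SLOTS = 0x20    # 32 entries, as in the scheme described above
NAME_LEN = 0x20   # fixed 32-byte labels -> fixed header size

def pack_names(names):
    """Pack the label array into one fixed-size header block.

    Unused slots get the '$$empty' placeholder; every label is
    padded to NAME_LEN, so the flattened-data array always starts
    at the same file offset.
    """
    names = list(names) + ["$$empty"] * (N_SLOTS - len(names))
    return b"".join(n.encode("ascii").ljust(NAME_LEN) for n in names)

# The second (payload) array always begins here:
HEADER_SIZE = N_SLOTS * NAME_LEN  # 1024 bytes
```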
This way I could open the file, load all required data as strings, cast them to the appropriate data types, and let LabVIEW handle the insertions etc. in memory. Some of these data types were transparently compressed/decompressed when writing/reading them to/from file, which in my case reduced the file size by about 50%.
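The flatten-then-compress step can be sketched like so, with `pickle` standing in for LabVIEW's flatten-to-string and `zlib` for the transparent compression (both choices are my assumption for the sketch):

```python
import pickle
import zlib

def flatten(value, compress=True):
    """Flatten a value to bytes, optionally deflate-compressed
    (a stand-in for LabVIEW's 'Flatten To String' plus compression)."""
    blob = pickle.dumps(value)
    return zlib.compress(blob) if compress else blob

def unflatten(blob, compressed=True):
    """Inverse of flatten(): decompress if needed, then rebuild."""
    data = zlib.decompress(blob) if compressed else blob
    return pickle.loads(data)
```

How much this saves depends entirely on the data; repetitive raw-sample streams compress well, already-dense data hardly at all.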
I do not remember whether I made a complete copy of the file on every write, or only when the size of one of the strings in the 2nd array grew during operation and that entry was not the last in the list.
3. Your solution of temp files looks quite similar, but I'd prefer keeping the data in memory (let the OS swap if required) and let the customer decide if he'd like to add some extra MB or GB of RAM ;-)


I am not really sure, but I vaguely remember an optimization LV attempts in string handling: AFAIK, it does not necessarily keep large strings as flat strings, but can also keep them as linked lists (or similar) in order to avoid copying large amounts of data. It 'flattens' those lists only when needed, for example when writing them to files. So LV would do all the required data shuffling for you, and would probably do it better than what you could achieve yourself. You just have to copy everything to your file when your app finishes.
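The idea behind such a deferred-flatten representation can be sketched in a few lines: collect pieces in a list of chunks and flatten once at the end, instead of growing one flat string by repeated concatenation (which copies everything written so far on every append). This is a generic sketch of the technique, not LabVIEW internals:

```python
# Build up data as a list of chunks (cheap appends, no big copies),
# then 'flatten' to one contiguous byte string only once at the end.
chunks = []
for _ in range(1000):
    chunks.append(b"record;")   # O(1) amortized per append
flat = b"".join(chunks)         # single flatten, one big copy total
```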

but are there issues in labview with having many files open at the same time? Are there other disadvantages with this approach I have not thought of?
I am not aware of any limits other than those already described. Those ancient 'FILES=xx' parameters from DOS and the DOS-based Windows OSes are long obsolete.

Greetings from Germany!
--
Uwe Frenz


~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Dr. Uwe Frenz
Entwicklung
getemed Medizin- und Informationstechnik AG
Oderstr. 59
D-14513 Teltow

Tel.  +49 3328 39 42 0
Fax   +49 3328 39 42 99
[EMAIL PROTECTED]
WWW.Getemed.de



