Hi Lagi,

Sorry about the delayed reply. I have been on a long business trip. Your early 
designs are far more sophisticated than what we put together here. Super 
impressive history you have.

LiveCode really is the champion here: to store the arrays, all we use is 
arrayEncode() and a simple "put arrayEncode(myArrayA) into url (...)" call. 
Selecting which array cluster to store might be easier to understand with a 
video:

http://canelasoftware.com/pub/canela/liveCode_array_clustering.mp4

Once you understand how the array is structured, I think the method will be 
clear.
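Since arrayEncode() is LiveCode-specific, here is a minimal Python sketch of the same idea, with pickle standing in for arrayEncode() (the directory, file naming, and function names are my own hypothetical choices, not our actual code): each cluster of records is serialized to its own file, so saving a record only rewrites one small file.

```python
import pickle
from pathlib import Path

# Hypothetical stand-in for arrayEncode() + "put ... into url (binfile:...)":
# each cluster of records is serialized and written to its own file.
DATA_DIR = Path("table_clusters")

def cluster_key(record_id: str, depth: int = 1) -> str:
    # The first `depth` characters of the record id choose the cluster file
    return record_id[:depth]

def save_cluster(cluster: dict, key: str) -> None:
    DATA_DIR.mkdir(exist_ok=True)
    # pickle plays the role of arrayEncode() here
    (DATA_DIR / f"{key}.cluster").write_bytes(pickle.dumps(cluster))

def load_cluster(key: str) -> dict:
    path = DATA_DIR / f"{key}.cluster"
    return pickle.loads(path.read_bytes()) if path.exists() else {}
```

The point is that a record save never touches the whole table on disk, only the one cluster file the record maps to.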

We do not preallocate space, and there is no appending. We overwrite a cluster 
whenever one or more of its records are saved to disk. The write happens at the 
end of the CRUD operation taking place. Thus, if you ‘create’ a single record, 
the record is first created in memory and then the cluster it belongs to is 
written to disk. I have toyed with the idea of making the write-to-disk step 
controllable by the dev, so you could define when the write takes place. For 
example, you might choose to write to disk after every 5 transactions or so. 
But I have not found the writes to affect performance noticeably enough to 
justify adding that feature.
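The write policy above can be sketched in Python (all names are hypothetical, and an in-memory dict stands in for the cluster files on disk). The flush_every knob illustrates the batched-write idea I considered; the current behavior corresponds to flush_every = 1:

```python
# Sketch of the write policy: each CRUD operation updates the in-memory
# cluster first, then overwrites that one cluster on "disk" in full.
class ClusterStore:
    def __init__(self, flush_every: int = 1):
        self.clusters = {}        # cluster key -> {record id: record}
        self.dirty = set()        # cluster keys changed since last flush
        self.flush_every = flush_every
        self.pending = 0          # transactions since last flush
        self.disk = {}            # stand-in for the cluster files on disk

    def create(self, record_id: str, record: dict) -> None:
        key = record_id[:1]                     # cluster by first character
        self.clusters.setdefault(key, {})[record_id] = record
        self.dirty.add(key)
        self.pending += 1
        if self.pending >= self.flush_every:
            self.flush()

    def flush(self) -> None:
        # Overwrite each dirty cluster in full; no appending
        for key in self.dirty:
            self.disk[key] = dict(self.clusters[key])
        self.dirty.clear()
        self.pending = 0
```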

-Multi User-
Yes, everything is processed sequentially in the cloud. There are no open 
sockets, so you can have massive numbers of concurrent connections. All cloud 
calls are made via ‘post’. They are handled by PHP scripts that write each 
request to a cache area. One or more LiveCode standalones on the other end 
process the requests in the order they are received. Thus, should a process go 
down, no data is lost; when the process comes back up, everything continues as 
normal. Scale is handled by making more than one process available. Further 
scaling is handled by storing data across multiple droplets/VMs (sharding). 
This can repeat as needed.
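As a rough Python sketch of that queue (the file layout and names are my own assumptions, not the actual PHP/LiveCode code): requests are persisted as files named by arrival order, and the worker deletes each file only after handling it, so a crash loses nothing and a restart simply resumes where it left off.

```python
import json
from pathlib import Path

# Hypothetical file-backed request queue: POST handlers drop each
# request into a cache directory; a worker processes them in order.
CACHE = Path("request_cache")

def enqueue(seq: int, request: dict) -> None:
    CACHE.mkdir(exist_ok=True)
    # Zero-padded sequence number keeps lexicographic order = arrival order
    (CACHE / f"{seq:012d}.req").write_text(json.dumps(request))

def process_pending(handler) -> int:
    handled = 0
    for path in sorted(CACHE.glob("*.req")):
        handler(json.loads(path.read_text()))
        path.unlink()             # remove only after successful handling
        handled += 1
    return handled
```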

-File Size Limitations-
The OS inode limitation is avoided by never approaching its maximum. We found 
that around 40,000 files would really bring performance down. Clustering the 
arrays lowers the file count to acceptable and controllable levels. 
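The arithmetic is simple enough to sketch: if the clustering key is the first N hex characters of a record id (an assumption on my part; the real scheme may differ), the file count is capped at 16^N no matter how many records the table holds.

```python
# Hypothetical model of the file-count cap: clustering records on the
# first `depth` characters of a hex id bounds the number of files at
# 16 ** depth, independent of record count.
def max_files(depth: int, alphabet_size: int = 16) -> int:
    return alphabet_size ** depth
```

At depth 3 that is 4,096 files, comfortably below the ~40,000-file level where performance fell off.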

-Test Data-
100,000 records in table
Record size average: 45 chars
Keys in each record: last_name, first_name, date_of_birth, middle_name, 
student_number, gender, grade_level, active

Cluster size   Clusters/table   Load all (disk to RAM)   Write all (RAM to disk)   Write one (RAM to disk)
1              16               1.46 secs                1.5 secs                  91.4 ms
2              256              1.52 secs                1.5 secs                  6.7 ms
3              4096             2.38 secs                1.6 secs                  0.8 ms
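One way to read the table: with 100,000 records, deeper clustering trades slightly slower full loads for far cheaper single-cluster writes, because each file holds fewer records. A quick back-of-the-envelope in Python:

```python
# Average records per cluster for each depth in the table above
RECORDS = 100_000

for clusters in (16, 256, 4096):
    print(f"{clusters:>5} clusters -> ~{RECORDS / clusters:,.0f} records per file")
```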

I hope this information is helpful. Please let me know if you have any other 
questions.

Best regards,

Mark Talluto
livecloud.io <http://livecloud.io/>
nursenotes.net <http://nursenotes.net/>
canelasoftware.com <http://www.canelasoftware.com/>




> On Mar 12, 2018, at 10:31 AM, Lagi Pittas <iphonel...@gmail.com> wrote:
> 
> Hi Mark,
> 
> Thanks for the detailed explanation but I have a few (ish) questions ...
> 
> Hope you don't mind me asking these questions because I did have to
> write my own access routines in those bad old days before I started on
> Clipper/Foxpro/Delphi/Btrieve  and I do enjoy learning from others on
> the list and the forums - those AHA! moments when you finally get how
> the Heapsort works the night before the exam.
> 
> Many moons ago I wrote a multi-way B-TREE based  on the explanation in
> Wirth's Book "Algorithms + Data Structures = Programs" -  in UCSD
> Pascal for the Apple 2,  I  had a 5MB hard Drives for the bigger
> companies when I was lucky, for the smaller companies I made do with 2
> 143k floppy disks and Hashing for a "large" data set- oh the memories.
> I used   the B-Trees  if the codes were alphanumeric. I also had my
> own method where I kept the index in the first X Blocks of the file
> and loaded the parts in memory as they were needed - a brain dead
> version of yours I suppose.  I think we had about 40k of free ram to
> Play with so couldn't always keep everything in RAM. I even made the
> system multi-user and ran 20 Apple ][s on a network using a
> proprietary Nestar/Zynar network using Ribbon Cables -  it worked but
> am I glad we have Ethernet!
> 
> Anyway - I digress. I can understand the general idea of what you are
> explaining but it's the low level code for writing to the
> clusters/file on disk I'm not quite sure of.
> Which way do you build your initial file? Is it "Sparse" or prebuilt,
> or does each cluster  have a "pointer" to previous or next clusters?
> Do you have records "spanning" clusters or do you leave any spare
> space in a cluster empty. Do you mark a "record" as deleted but don't
> remove the record until it's overwritten or do what Foxpro/Dbase does
> and "PACK" them with a utility routine.
> I also presume you use the "AT" option in the write command to write
> the clusters randomly since you don't write the whole in-memory table
> 
> Which brings me onto my final questions - I presume your system is
> multi-user because you have a server program that receives calls and
> executes them sequentially? And lastly what are the file size
> limitations doing it this way - do You also virtualize the data in
> memory?
> 
> Sorry for all the questions but this is the interesting stuff
> 
> Regards Lagi

_______________________________________________
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode
