Hello, Peter,

On 02/07/2012, at 11:02 AM, Peter Haworth wrote:

> Problem is I need to maintain uniqueness across two versions of the same
> stack file.  For example if both versions have a stack named "myStack" but
> then its name gets changed to "YourStack" in one of the versions, it's no
> longer identifiable as the same stack as "myStack".
> 
> I have some ideas as to how to deal with this but wanted to check if anyone
> had come up with a generic solution to this problem.

I believe this is an issue that all version control software (VCS) has to 
deal with, and the possible solutions and approaches are quite well documented 
in various open-source groups. The main question is: if I have file "A" and 
change its name to "B", should the software consider it an entirely new file, 
or should it somehow be able to identify it as the old one with a name 
change? This problem arises not only with file names but with any file 
metadata. For instance: what happens if I change the file access 
permissions? What happens if my system changes the 'modified date' for the file?

It seems to me that, because of this, the new trend is for the VCS to store the 
file *data* (= contents) separately from the *metadata* (= name, dates and 
permissions). So the VCS may internally have a table named 'file_info', where 
each record is the metadata for a certain file. Then there would be a second 
table, 'file_content', where each record would be the actual file dump/data. 
Each file_info relates to a single file_content. The advantage of this 
design is that a file_content may be connected to several file_infos.
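
To make the two-table idea concrete, here is a minimal sketch in Python using 
an in-memory SQLite database. The table names come from the description above; 
the column names ('data', 'modified', 'mode', 'content_id') and the sample 
values are my own assumptions, not taken from any particular VCS.

```python
import sqlite3

# Hypothetical schema: table names follow the description above,
# column names are assumptions made for this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE file_content (
        id   INTEGER PRIMARY KEY,
        data BLOB NOT NULL
    );
    CREATE TABLE file_info (
        id         INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        modified   TEXT,
        mode       INTEGER,
        content_id INTEGER NOT NULL REFERENCES file_content(id)
    );
""")

# Store the contents once ...
cur = conn.execute("INSERT INTO file_content (data) VALUES (?)", (b"stack data",))
content_id = cur.lastrowid

# ... and point two metadata records at the same contents.
conn.executemany(
    "INSERT INTO file_info (name, modified, mode, content_id) VALUES (?, ?, ?, ?)",
    [("A", "2012-07-01", 0o644, content_id),
     ("B", "2012-07-02", 0o644, content_id)],
)

# List every name that refers to this one content record.
names = [row[0] for row in conn.execute(
    "SELECT name FROM file_info WHERE content_id = ? ORDER BY id", (content_id,))]
print(names)
```

Each file_info row holds exactly one content_id, while nothing stops several 
rows from sharing it, which is the one-to-many relationship described above.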

For instance: in our example, if you create file "A" and store it in the 
system, it stores the metadata as a 'file_info' record and the contents as a 
'file_content' record. If tomorrow you rename the file to "B", the system will 
recognise that the contents are the same, so it will create another 'file_info' 
record, but it will point it to the same 'file_content' rather than storing a 
duplicate of it.
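
The rename example can be sketched in a few lines of Python. This is only an 
illustration: the 'store' function and the two in-memory lists are hypothetical 
stand-ins for the tables, and contents are compared byte for byte here, which 
is the slow way of doing it.

```python
file_contents = []  # stand-in for the 'file_content' table; list index = record ID
file_infos = []     # stand-in for the 'file_info' table

def store(name, data):
    """Record a file, reusing an existing content record if the bytes match."""
    for content_id, existing in enumerate(file_contents):
        if existing == data:
            break  # identical contents already stored: reuse that record
    else:
        # No match found (or nothing stored yet): add a new content record.
        file_contents.append(data)
        content_id = len(file_contents) - 1
    file_infos.append({"name": name, "content_id": content_id})
    return content_id

first = store("A", b"same bytes")   # today: file "A"
second = store("B", b"same bytes")  # tomorrow: renamed to "B"
print(first == second, len(file_contents), len(file_infos))
```

After both calls there are two file_info records but only one file_content 
record, so the rename costs one small metadata record and no duplicate data.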

I believe that VCSs like Git use fast hashing functions, such as SHA-1 in 
Git's case, to store and compare contents (i.e., the 'ID' of each file_content 
record is actually the hash), which makes comparing existing contents with new 
ones very fast: you don't have to actually compare the contents of a file, you 
simply compute the hash for the file and see whether you already have anything 
with the same hash in your 'file_content' table.
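
As a rough sketch of the hash-as-ID idea, assuming Python and its standard 
hashlib module: the 'content_id' and 'store' functions and the two containers 
are hypothetical names for illustration. Note that this hashes the raw bytes 
only, whereas Git itself hashes a small header together with the contents, so 
the IDs below will not match Git's.

```python
import hashlib

file_contents = {}  # stand-in for 'file_content': hash -> bytes
file_infos = []     # stand-in for 'file_info': (name, hash) pairs

def content_id(data: bytes) -> str:
    """Use the SHA-1 hex digest of the contents as the record ID."""
    return hashlib.sha1(data).hexdigest()

def store(name: str, data: bytes):
    """Store a file; return its content ID and whether the contents were new."""
    key = content_id(data)
    is_new = key not in file_contents  # one dictionary lookup, no byte comparison
    if is_new:
        file_contents[key] = data
    file_infos.append((name, key))
    return key, is_new

key_a, new_a = store("myStack", b"stack contents")
key_b, new_b = store("YourStack", b"stack contents")  # same contents, new name
print(key_a == key_b, new_a, new_b)
```

Because the ID is derived from the contents alone, the renamed stack maps to 
the same record regardless of what it is called.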

I hope this explanation helps a little.

Kindest regards,

--
Igor Couto
Sydney, Australia


_______________________________________________
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode
