It does, Jeff, thanks! Lots of detail there to translate into good ol' code :)
On 12 January 2011 17:55, Jeff Massung <mass...@gmail.com> wrote:
> On Wed, Jan 12, 2011 at 4:37 AM, David Bovill <da...@vaudevillecourt.tv>
> wrote:
>
> > If it quacks like a duck, it is a duck.
> >
> > So I have some data in a variable that I want to display. I can use "is an
> > array/number/date", but for other types of data I'm wondering... XML
> > should be easy, but harder would be to distinguish long text files from
> > binary. Any ideas for hacks to distinguish:
> >
> > 1. images
> > 2. sounds
> > 3. video
> > 4. binary blob
> > 5. text
> > 6. rtftext
> > 7. utf8
> >
>
> This is a pretty solved problem (except for the "array" part, which is an
> LC-specific data type/format). I wish I had some references for you at the
> moment, but here are some things to keep in mind:
>
> - First, use your OS when possible. Images, sounds, video, and often text
> are already identified for you via the registry on Windows or the 4-byte
> type code on Mac (e.g. 'TEXT').
>
> - Next, determine text vs. binary. This is usually done by grabbing the
> first N bytes (where N is ~1000) and looking for control bytes other than
> tab, LF and CR (values below 32) or bytes above 127. If you find any, it's
> binary, or Unicode.
>
> - If it's binary, start looking at image vs. video vs. Unicode. Image and
> video are pretty simple. You don't need to understand every image or video
> format, just a handful that will cover 99% of the images/videos out there.
> And they all, very politely, have a nice header you can examine. For
> example, looking at PNG:
>
> http://en.wikipedia.org/wiki/Portable_Network_Graphics#File_header
>
> From there, you can see that the first 4 bytes of a PNG file are 0x89,
> 0x50, 0x4E and 0x47 (where 0x50, 0x4E and 0x47 are the ASCII letters
> 'PNG'). Almost every image and video format you'll care about has
> something very similar you can use.
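A minimal Python sketch of the two steps above: the byte-range text/binary test on the first ~1000 bytes, and a check for a few well-known leading signatures. The function names `looks_binary` and `sniff_signature`, the sample size constant, and the signature table are illustrative assumptions, not from the thread; the table covers only a handful of common formats. This version treats tab, LF and CR as text, since tabs (byte 9) are common in plain text files.

```python
# Sketch of two sniffing steps: a text-vs-binary byte test and a check
# for well-known leading signatures ("magic numbers"). The signature
# list is a small illustrative sample, not an exhaustive registry.

SAMPLE_SIZE = 1000  # "grab the first N bytes, where N is ~1000"

# Control bytes that routinely appear in plain text: tab, LF, CR.
TEXT_CONTROL_BYTES = {0x09, 0x0A, 0x0D}

# (prefix, label) pairs checked against the start of the data.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "png"),  # full 8-byte PNG signature
    (b"\xff\xd8\xff", "jpeg"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"ID3", "mp3"),   # usually an MP3 carrying an ID3v2 tag
    (b"OggS", "ogg"),  # Ogg container (audio or video)
]


def looks_binary(data: bytes, sample_size: int = SAMPLE_SIZE) -> bool:
    """Return True if the sample holds non-text control bytes or bytes > 127.

    A True result can still mean Unicode text (e.g. UTF-8 or UTF-16)."""
    sample = data[:sample_size]
    return any(
        (b < 0x20 and b not in TEXT_CONTROL_BYTES) or b > 0x7F
        for b in sample
    )


def sniff_signature(data: bytes):
    """Return a format label if data starts with a known header, else None."""
    for prefix, label in SIGNATURES:
        if data.startswith(prefix):
            return label
    # RIFF containers: b"RIFF", a 4-byte size, then a 4-byte form type.
    if data[:4] == b"RIFF":
        form = data[8:12]
        if form == b"WAVE":
            return "wav"
        if form == b"AVI ":
            return "avi"
    # ISO base media files (MP4, many MOVs, ...) typically start with a
    # 4-byte box size followed by b"ftyp".
    if data[4:8] == b"ftyp":
        return "mp4"
    return None


if __name__ == "__main__":
    blob = b"\x89PNG\r\n\x1a\n" + bytes(8)
    print(looks_binary(blob), sniff_signature(blob))  # True png
```

A `None` from `sniff_signature` on data that `looks_binary` flagged is the "no header you understand" case: either a plain binary blob or Unicode text.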
> This is a great site you can reference:
>
> http://www.wotsit.org/
>
> If you don't find a header that you understand, then you are looking at
> either a straight binary lump/blob or a multi-byte text file (Unicode).
> Remember that while UTF-8 is not ASCII, it's designed as a superset of it:
> plain ASCII text is also valid UTF-8, so the two are indistinguishable
> most of the time. I don't have any advice to give you here on how to
> determine whether the file is Unicode text or not... as I understand it,
> this is a really difficult problem to solve. I'm sure Google can help,
> though. ;-)
>
> - At this point you've determined that the file is "text" in nature, and
> you are trying to figure out specifically whether it's RTF, XML, INI,
> whatever. This gets a little trickier, as people often skip the optional
> headers that could be there (e.g. <?xml ...?>, <!DOCTYPE ...>, ...) and
> you are left with either taking your best guess or going off the file
> extension.
>
> - RTF: I don't believe it has a "header" in the way image formats do,
> although a well-formed RTF file normally begins with "{\rtf1". Beyond
> that, just scan it and look for "{\" in the file followed by some known
> RTF control words.
>
> - XML/HTML/*ML is a matter of scanning for some known tags (like <BODY>,
> <HTML>) that you know should be there near the top or, in the case of XML,
> checking for namespaces in the tag names.
>
> Hope this helps!
>
> Jeff M.
> _______________________________________________
> use-livecode mailing list
> use-livecode@lists.runrev.com
> Please visit this url to subscribe, unsubscribe and manage your
> subscription preferences:
> http://lists.runrev.com/mailman/listinfo/use-livecode
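For the Unicode and RTF/XML steps in Jeff's reply, here is a rough Python sketch. The UTF-8 check is a swapped-in heuristic, not something from the thread: a byte-order mark, or a strict UTF-8 decode with no stray control characters, suggests text but cannot prove it, and BOM-less UTF-16 is not detected at all. The RTF test looks for "{\" followed by a few known control words (which also covers the usual "{\rtf1" opening). The names `guess_encoding` and `classify_text`, the control-word list, and the ordering of the checks are illustrative assumptions.

```python
import re

# Byte-order marks. UTF-32 BOMs are omitted for brevity (note that the
# UTF-32-LE BOM also begins with b"\xff\xfe").
BOMS = [
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xff\xfe", "utf-16-le"),
    (b"\xfe\xff", "utf-16-be"),
]

# "{\" followed by a few well-known RTF control words ("{\rtf1", ...).
RTF_PATTERN = re.compile(r"\{\\(rtf|fonttbl|colortbl|stylesheet|info)")
# A namespace-prefixed tag such as <soap:Envelope, or an xmlns attribute.
NAMESPACE_PATTERN = re.compile(r"<[A-Za-z_][\w.-]*:[A-Za-z_]|xmlns")


def guess_encoding(data: bytes):
    """Return a codec name if data plausibly is text, else None."""
    for bom, codec in BOMS:
        if data.startswith(bom):
            return codec
    try:
        # Decode the whole buffer: a truncated sample could split a
        # multi-byte character and fail spuriously.
        text = data.decode("utf-8")  # pure ASCII also passes
    except UnicodeDecodeError:
        return None
    # Valid UTF-8 can still be binary (e.g. runs of zero bytes).
    if any(ord(c) < 32 and c not in "\t\n\r" for c in text):
        return None
    return "utf-8"


def classify_text(text: str) -> str:
    """Best-guess label for decoded text: rtf, xml, html or text."""
    # Look only near the top, ignoring a leading BOM and whitespace.
    head = text[:1000].lstrip("\ufeff \t\r\n")
    lower = head.lower()
    if RTF_PATTERN.search(head):
        return "rtf"
    if lower.startswith("<?xml"):
        return "xml"
    if lower.startswith("<!doctype html") or "<html" in lower or "<body" in lower:
        return "html"
    if NAMESPACE_PATTERN.search(head):
        return "xml"
    return "text"
```

In practice you would call `guess_encoding` first and pass `data.decode(codec)` to `classify_text` only when it returns a codec; anything else falls back to the file extension, as Jeff suggests.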