Wow, that was a very insightful email. Thank you for writing it.

Getting back to something actionable, what do you think about the idea of
just translating the CML standard to JSON? Apart from some nuances, XML and
JSON generally accomplish the same thing, so I would expect the Chemical
Markup Language standard to translate fairly directly into a chemical JSON
format.
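As a rough sketch of what I mean, here is a minimal Python translation of a
small CML molecule (formaldehyde) into JSON using only the standard library.
It assumes the CML namespace `http://www.xml-cml.org/schema` and handles only
the core `atomArray`/`bondArray` elements; the JSON key names (`atoms`,
`element`, `bonds`, `order`) are my own hypothetical choices, not part of any
existing standard.

```python
import json
import xml.etree.ElementTree as ET

# ElementTree reports namespaced tags as "{uri}localname".
CML_NS = "{http://www.xml-cml.org/schema}"

CML_TEXT = """<molecule id="formaldehyde" xmlns="http://www.xml-cml.org/schema">
  <atomArray>
    <atom id="a1" elementType="C"/>
    <atom id="a2" elementType="O"/>
    <atom id="a3" elementType="H"/>
    <atom id="a4" elementType="H"/>
  </atomArray>
  <bondArray>
    <bond atomRefs2="a1 a2" order="2"/>
    <bond atomRefs2="a1 a3" order="1"/>
    <bond atomRefs2="a1 a4" order="1"/>
  </bondArray>
</molecule>"""


def cml_to_dict(cml_text):
    """Map the atom and bond lists of one CML <molecule> to plain dicts.

    The output layout is a hypothetical "chemical JSON" shape, not a standard.
    """
    root = ET.fromstring(cml_text)
    atoms = [
        {"id": atom.get("id"), "element": atom.get("elementType")}
        for atom in root.iter(CML_NS + "atom")
    ]
    bonds = []
    for bond in root.iter(CML_NS + "bond"):
        # atomRefs2 holds the two atom ids separated by whitespace.
        start, end = bond.get("atomRefs2").split()
        # Keep the order as a string: CML bond orders are not always integers.
        bonds.append({"atoms": [start, end], "order": bond.get("order")})
    return {"id": root.get("id"), "atoms": atoms, "bonds": bonds}


print(json.dumps(cml_to_dict(CML_TEXT), indent=2))
```

Bond orders stay as strings because CML also permits non-numeric values (for
example "A" for aromatic, if I'm reading the schema right). Agreeing on what
those map to in JSON is exactly the kind of convention that would have to be
written down.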


On Fri, Jun 7, 2013 at 1:45 PM, Craig James <cja...@emolecules.com> wrote:

> On Fri, Jun 7, 2013 at 10:25 AM, Patrick Fuller
> <patrickful...@gmail.com> wrote:
>
>>  A SMILES contains exactly the same information as the atom/bond lists
>> in a much more compact form. If you want to avoid the aromaticity problem,
>> just use Kekulé form, which makes it virtually identical to any other
>> connection table format, but in about 10x to 20x fewer bytes. SMILES are
>> very easy to parse, and there are dozens of parsers around.
>>
>> What I truly like about SMILES is that it's human-readable and hashable,
>> which I see as the real goal. The shorter length is just a corollary of
>> that. Prove me wrong, but I think people make too big a deal about the size
>> of molecule formats. I just bought a 2 TB hard disk drive for $70. With
>> MongoDB and its JSON serialization, I estimated that I can put 200 million
>> verbose JSON MOF structures on that drive. I only have a few thousand, so I
>> have some room to spare.
>>
> I have a database of 10 million compounds. The SDF version, even
> compressed, is difficult to transfer over the internet.  It's not about disks, it's
> about file transfers and database performance.  It's not a matter of a few
> bytes here or there (I agree that people worry about file size too much).
> It's about a factor of ten or twenty.  Connection-table lists of atoms and
> bonds are just a dumb way to represent atoms and bonds.
>
>>  This discussion has focussed on the syntax of JSON, but completely
>> overlooks the real problem with ALL chemical file formats: how do you
>> handle all of the cases where a simple connection-table ("ball and stick")
>> doesn't capture reality? Things like aromaticity, tautomers,
>> organo-metallic bonds, boron-hydrogen cages, distributed bonds (ferrocenes
>> and the like) ... these are the problems.
>>
>> The point of JSON (and XML) is that they are *extensible*; that's why
>> JSON has exploded in the developer community.
>>
> This isn't necessarily a good thing.  One of the biggest problems in
> cheminformatics and molecular modeling is that people have altered existing
> formats to suit their own needs ... and that has led to disaster.  There is
> no such thing as the "PDB format" -- rather, you mostly have to know the
> origin of a particular PDB file in order to interpret it.  Each project
> effectively has its own "PDB format."
>
> JSON may be extensible, but that is useless unless there is a widely
> recognized authority on the meaning of each extension, along with
> open-source software that illustrates a practical application of the
> standard.
>
> Never forget the old joke, "The great thing about standards is that there
> are so many to choose from!"  JSON essentially gives you a stronger rope
> when you're in the process of hanging yourself.
>
>>  If you need handles for aromaticity and metallic bonding, just add new
>> properties to the JSON/XML. Because of the extensibility, adding new
>> properties will not break any existing code.
>>
> Then why have a standard at all? What is the use of new properties if
> nobody knows what they mean?  What happens when five projects all introduce
> their own syntax and semantics for representing aromaticity and metallic
> bonding?  Chaos.
>
>>  That's the advantage over all of the older table formats, which weren't
>> built to be extensible. And you see the repercussions in scientific code
>> all the time.
>>
> The real problem had nothing to do with being "built to be extensible,"
> but rather that the table format definitions were controlled by commercial
> companies that had no interest in data exchange or in participation by the
> chemistry community.
>
> When I created the OpenSMILES.org web page, I more-or-less did it by
> stealing the leadership from Daylight, the company that invented SMILES.  I
> invited their participation but, while they didn't object to our project,
> they also elected to stay out of it.  SMILES now has a future that's in the
> hands of the community.  If the community decides to add features, we can
> ... and we'll all be able to agree on those features.
>
> It might seem as if I'm trying to discourage JSON, but nothing could be
> farther from the truth.  A modern, object-oriented, extensible and well
> documented format is long overdue.  The CML project is one such (you might
> want to look at it for ideas), but it never got traction.  Maybe JSON, with
> its widespread use and readily-available software, is just the thing.
>
> If you really want to make JSON a standard, the JSON syntax itself is a
> trivial part of the problem. The real problem is establishing standards for
> how each datatype is to be interpreted, followed by clear, published
> standards for each datatype.  If you let people just add their own
> datatypes on an as-you-please basis, you'll just have another Tower of
> Babel ... and that's where the name OpenBabel came from in the first place.
>
> Craig
>
>
_______________________________________________
OpenBabel-discuss mailing list
OpenBabel-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openbabel-discuss