On 06/07/2018 12:54 AM, Eric S. Eberhard wrote:
> I know I am the oddball here but -- why use DTDs at all?
I gave reasons above. I am working on a tool. How people use the tool is not under my control. Maybe we can focus on the opportunity to improve libxml2 a bit here.

> I supply software to a lot of companies (thousands through dealers). Many exchange millions of XML docs per day. I've used this since it was libxml. Even have some patches in there. My application is proprietary (meaning XML to get an order or tell a customer our availability is simply XML I designed and documented and give to my customer's customers (via download from a Web page)). Once they get it working it pretty much always works. They write software to create orders and send them to us -- it is consistent (I know, not everyone has this luxury so this may not apply to everyone). So why check them?
>
> I also found that I was getting a gagillion support tickets because DTDs ... simple things like a date ... seem to escape people -- take June 7, 2018.
>
> In our date fields we will take:
> Jun 7 2018
> June 7 2018
> the above with commas and any case (upper/lower/mixed)
> 6/7/18
> 6/7/2018
> 2018/6/7
> 20180607
> 180606
> 06-07-18
>
> And actually many, many more. Anything that is a date goes through this one routine, and if there is any way in the world to extract a date, we do.
>
> Ditto money -- say $1,245.56.
>
> We accept:
> $1,245.56
> 1245.56
> 124556 (decimal is implied at 2 places if no decimal is found)
> 1,235.56
>
> And many more -- same thing, one routine reads it and if we can possibly get a reasonable number, we do.
>
> This, in turn, reduced our CONSTANT support tickets for silly things like the format of something to ZERO. Which I like.
>
> Even sicker -- we ignore case on tags. All of our XML is designed to not use duplicate names with different cases (a stupid thing to do anyway -- to expect orderNumber and OrderNumber to both be used, as different things).
>
> As long as the customer is consistent and the XML is well formed, we scan the tree and compare tags without regard to case. A WHOLE LOT more support tickets gone.
>
> A lot of the people we deal with are not sophisticated. As the receiver of XML, we decided it was much better to be as flexible as possible and take what we can. After all -- a DTD can indeed tell you if an address comes in without a city name, and reject it, and usually generate a support ticket. But we use an on-line AVS system (more XML), and if we have the zip and the address otherwise matches ... we don't need the city and state ... the AVS system provides it. And if it fails they will get an error back from us (from the application) anyway. So why use a DTD to see if the city or state were sent? A LOT MORE support calls removed.
>
> And, of course, performance without the DTDs is much better.
>
> As a result we are able to give documentation to new customers, and they are able to get it up and running with little to no help. Any serious errors we cannot fix are clearly explained in the responses BY THE APPLICATION and not by a DTD.
>
> Being flexible on our end reduces support tickets, which is all I care about. I would rather code for all the mistakes I can think of that an end user would make (and we add new ones when they crop up) than be strict and do a lot of support. We don't think DTDs are flexible enough. And I hate making them :-)
>
> We do offer a page with DTDs they can use manually to check their document if they like -- or they can send it to our test system. Once they are running they seem to do just fine.
>
> As programmers it is hard to believe, but sometimes it is better for us to write slightly less efficient code in order to make the human aspect much more efficient. I once had someone send me a link to a "contest" which was a convoluted C statement, asking people to work out what the result would be.
> My response -- "fire the programmer!"
>
> If it takes hundreds of competent C programmers to read a line of code and get the right answer (and only a small percentage did), it is bad code. And for people's information, modern computers read ahead and pre-execute code based on all kinds of weird logic. Simple C code is easy for them to handle ... but convoluted code ends up stopping the pre-execution and is actually slower -- it may have fewer lines of code, but it will be slower. I see nothing wrong with short, clear, clean code with as little craziness as possible. The same goes for XML -- one can go overboard easily. K.I.S.S. :-)
>
> Not being so strict and having no DTDs has had other benefits -- say EDI (from old IBMs) -- we have a cheap program that maps EDI to XML and back. So we can handle EDI -- and we don't need new software (after the conversion). We accept the EDI, convert it to XML, run our standard application, and create an XML response, which is converted to EDI. The package we use is low cost and no, it won't work too well with DTDs, as EDI has its own problems.
>
> I could go on but most of you have probably skipped this post by now :-)
>
> E
>
> On 6/6/2018 3:00 PM, Stefan Sauer wrote:
>> On 05/17/2018 06:01 PM, Stefan Sauer wrote:
>>
>>> On 05/17/2018 04:18 PM, Nick Wellnhofer wrote:
>>>
>>>> On 16/05/2018 21:51, Stefan Sauer wrote:
>>>>
>>>>> So one solution could be another flag to enable this?
>>>>>
>>>> Yes, but it would be rather ugly.
>>>>
>>> In which sense? I guess because it is something that no one should need to know about or have to care about?
>>>
>>>>> Thanks, reading the code. Need to figure out where we could cache external subsets and what a suitable key is (ExternalID ?).
>>>>>
>>>> Note that I'm currently not planning to review and integrate larger patches from other developers. I only took over some libxml2 maintenance duties because no one else did.
>>>> So even if you write a high-quality patch, it might never get merged.
>>>>
>>> Thanks for making this clear upfront. This is how I ended up becoming the gtkdoc maintainer :)
>>>
>>>
>>>> Caching external subsets for XIncludes certainly sounds like a nice feature but I would prefer to find a simpler solution. For example, can't you just omit the external DTD from included documents?
>>>>
>>> Yeah, right now, the benefit of having the DTD is that one can validate fragments. I'll do some research (aka grepping over existing projects) to see what the doc-type headers being used today look like. If all that people do is use an entity to inject the version, I'll write a migration tool.
>>>
>>> We have a test that validates the doc, but I think I can change this to just resolve all xincludes and check through the top-level doctype.
>>>
>> Just to add to this, I am assuming a lot of people follow this book
>> http://www.sagehill.net/docbookxsl/ModularDoc.html#UsingXinclude
>>
>> and using a DOCTYPE is part of the examples.
>>
>>>> You wrote:
>>>>
>>>>
>>>>> and gtk-doc will replicate this for the fragments (replacing 'book' with e.g. 'refentry'). This way one can e.g. inject things like a version.
>>>>>
>>>> What do you mean by "inject things like a version"? Why exactly do your included documents have to reference an external DTD?
>>>>
>>> The documentation consists of a handwritten master doc (type book) that includes more handwritten parts (e.g. tutorials, guides) and includes generated reference docs. When gtkdoc generates the reference docs, it takes the doctype header of the master doc as a template and uses that for the generated reference docs. If the master doc has entities declared, those can be expanded in the reference fragments. I will check how widely that part is actually used.
>>>
>>> Stefan
>>>
>>>
>>>> Another idea is to stop loading external DTDs for XIncludes without an XPointer expression. This would still change the behavior for some users but it's much less likely to cause problems.
>>>>
>> Change the behaviour, as in we would not catch validation errors? Too bad that xmlXIncludeParseFile() does not get the parent parserCtx; in that case we could apply the same flags.
>>
>>>> Nick
>>>>
>>> I definitely don't know enough about the implications here. I was mostly thinking to see if we can stick a dictionary of <dtd-identifier, xmlDtdPtr> into the Parser Context and, before actually loading a DTD, check if we did already and reuse it. Somehow the dict needs to be stored in the top-level doc when parsing is done (do we need the DTDs once the doc has been parsed?). We only free the DTDs with the top-level doc. But I agree, it is not going to be a two-liner.
>>>
>> It seems that xmldict only handles strings as keys and values, right? So we'll even need our own cache data structure. I'd say it would need to be on the _xmlXIncludeCtxt level. Global is easier, but then we can't ever free it :/
>>
>> Stefan
>>
>>> Stefan
>>>
>>>
>>> _______________________________________________
>>> xml mailing list, project page http://xmlsoft.org/
>>> xml@gnome.org
>>> https://mail.gnome.org/mailman/listinfo/xml
>>>
>>
>
> --
> Eric S. Eberhard
> VICS
> 2933 W Middle Verde Road
> Camp Verde, AZ 86322
>
> 928-567-3727 work 928-301-7537 cell
>
> http://www.vicsmba.com/index.html (our work)
> http://www.vicsmba.com/ourpics/index.html (fun pictures)