Dreaded grammar error: s/you're document/your document/ :)

On Sep 21, 2012, at 1:27 PM, DM Smith <dmsm...@crosswire.org> wrote:
> So far the discussion is around whether the xml is well-formed.
> Once you get that working, then you need to make sure it is valid wrt the
> OSIS schema.
>
> There's an old tool that will convert sgml to well-formed xml. I think it was
> James Clark's "sx". I've used it successfully on initial conversions and
> getting something that will work within xml tools.
>
> Finally, OSIS has the notion of milestones for start and end elements. There
> are semantic rules regarding this that cannot be checked by standard xml
> tools. Osis2mod tries to handle this. When you get to that point, I can help
> unravel the logging options.
>
> The purpose of milestoned elements is to allow for two competing document
> models to be in the same xml document: BSP and BCV (names we've given it here
> and in the wiki).
>
> We recommend using BSP (book, chapter, section, paragraph, poetry, lists to
> all be containers, not milestoned) and verse elements be milestoned.
>
> Note, the OSIS manual says that if you have one element milestoned, then all
> other elements with the same tag name have to be milestoned. Practically
> speaking, this does not matter. SWORD and JSword don't care. Having verses
> milestoned only if necessary is probably a better way to create a good XML
> document. Start out with all of them as containers and each place where that
> causes a problem, either fix the xml or if otherwise correct, convert to
> milestoned verses.
>
> Generally speaking these BSP elements should not start just inside or at the
> end of a verse. Rather they should be between verse elements or within the
> text. When they are placed just after the verse start, they often will cause
> the verse number to be orphaned. When they are placed just before the verse
> end, then it is generally not noticeable (just bad form).
>
> Quotes will create the biggest grief in the above. They often cross
> boundaries.
> Certainly, the beatitudes does, starting in one chapter and
> ending a couple of chapters later. For this reason, using the milestoned
> version is necessary.
>
> If you're document follows some simple rules (some required by xml, others
> simplifications), then checking nesting is a simple matter of having a
> push/pop stack of elements. The simple rules:
> 1) All attributes when present have quoted values.
> 2) All entities are properly formed and used when needed. Also, < and > are
> not in attribute values.
> 3) Tags are marked with < ... >, </ ... >, or < ... />, with no new lines
> between < and >.
>
> If this is true then a simple perl script can be written to find the problems
> in the file:
> Look for < ... /> and skip them. They cause no problems.
> Look for < xxx ... > and push the tag name along with its location in the
> file on to the stack.
> Look for </ xxx >, compare xxx to the top element on the stack. If it
> matches, pop it. If it doesn't match, then it causes an error.
> When you get to the end of the document and the stack is not empty, then the
> elements on the stack are not closed properly.
>
> Printing out the stack (elements and locations) would help find what the
> problem is.
>
> For example:
> if xxx is deeper in the stack, then there is a problem with nesting.
> Look at all the elements above the xxx on the stack for problems.
> if it is not in the stack, then the element was not started prior to
> that point or it may have been ended twice.
>
> Here is a simple perl script (that I wrote), which doesn't do that, but could
> be adapted to do it. This creates a histogram/dictionary of tag and attribute
> names.
>
> #!/usr/bin/perl
>
> use strict;
>
> my %tags = ();
> my %attrs = ();
> while (<>)
> {
>     #print;
>     # While there is a tag on the line
>     while (/<[^\/\s>]+[\/\s>]/o)
>     {
>         # While there is an attribute in the tag
>         while (/<[^\/\s>]+\s+[^\=\/\>]+=\"[^\"]+\"/o)
>         {
>             # remove the attribute
>             s/<([^\/\s>]+)\s+([^\=\/\>]+)(\="[^\"]+\")(.*)/<$1 $4/o;
>             my ($t, $a, $v, $r) = ($1, $2, $3, $4);
>             $attrs{"$t.$a"}++;
>         }
>         # remove the tag
>         s/<([^\/\s>]+)[\/\s>]//o;
>         $tags{$1}++;
>         #print("do next tag on line\n");
>     }
>     #print("do next line\n");
> }
>
> foreach my $tag (sort keys %tags)
> {
>     print("$tag\n");
> }
>
> foreach my $attr (sort keys %attrs)
> {
>     print("$attr\n");
> }
>
> Hope this helps,
> DM
>
> On Sep 21, 2012, at 10:52 AM, Andrew Thule <thules...@gmail.com> wrote:
>
>> Thanks everyone for suggestions. I'll give them all a try.
>>
>> That said, the emacs recommendation is nearly a religious conversion
>> recommendation. (I'm on the vi side of the vi verses emacs debate. I
>> suppose as long as it doesn't kill me I should give it a try, though I'm not
>> certain what impact it will have on the health of my soul ... :D )
>>
>> ~A
>>
>>
>> On Thursday, September 20, 2012, Daniel Owens wrote:
>> I use jEdit with the XML plugin installed. I find it helps me find problems
>> fairly easily.
>>
>> Daniel
>>
>> On 09/20/2012 05:26 PM, Greg Hellings wrote:
>> There are a number of pieces of software out there that will
>> pretty-print the XML for you, with indenting and whatnot. Overly
>> indented for what you would want in production but decent for
>> debugging mismatching nesting and the like.
>>
>> For example, 'xmllint --format' will properly indent the file, etc. I
>> don't know how it will handle poorly formed XML.
>>
>> GUI editors can do wonders as well. On Windows I use Notepad++ and
>> manually set it to display XML. gEdit and Geany - I believe - both
>> support similar display worlds.
>> And there are some plugins for Eclipse
>> that might handle what you need as well.
>>
>> --Greg
>>
>> On Thu, Sep 20, 2012 at 4:19 PM, Karl Kleinpaste <k...@kleinpaste.org> wrote:
>> Andrew Thule <thules...@gmail.com> writes:
>> One of my least favour things is finding mismatched tags in OSIS.xml files
>> Has anyone successfully climbed this summit?
>> XEmacs and xml-mode (and font-lock-mode). M-C-f and M-C-b execute
>> sgml-forward-element and -backward-. That is, sitting at the beginning
>> of <tag>, M-C-f (meta-control-f) moves forward to the matching </tag>,
>> properly handling nested tags.
>>
>> _______________________________________________
>> sword-devel mailing list: sword-devel@crosswire.org
>> http://www.crosswire.org/mailman/listinfo/sword-devel
>> Instructions to unsubscribe/change your settings at above page
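
For what it's worth, the push/pop stack check described in the quoted message could be sketched roughly as below. This is not DM's script and it has not been run against a real OSIS file. The name check_nesting, the wording of the messages, and the choice to pop back down to the matching start tag after a nesting error are all assumptions made for illustration. It relies on the same simple rules (quoted attribute values, no < or > inside them, no new lines between < and >) and does not treat comments or CDATA specially.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Check element nesting with a push/pop stack. Each argument is one line of
# the file. Assumes the simple rules from the message: quoted attribute
# values, no < or > inside them, and no newline between < and >.
# Tags starting with <! or <? are ignored, but a tag that sits inside a
# comment is still counted (hypothetical simplification).
sub check_nesting {
    my @lines = @_;
    my @stack;      # open elements, each [ tag name, line number ]
    my @problems;   # human-readable descriptions of what went wrong
    my $lineno = 0;

    foreach my $line (@lines) {
        $lineno++;
        # $1 is "/" for an end tag, $2 the tag name, $3 "/" for <xxx ... />
        while ($line =~ m{<(/?)([^\s/>!?][^\s/>]*)[^>]*?(/?)>}g) {
            my ($end, $name, $empty) = ($1, $2, $3);

            if (!$end) {
                # <xxx ... /> needs no bookkeeping; <xxx ...> is pushed
                push @stack, [ $name, $lineno ] unless $empty;
                next;
            }

            # End tag: the common case is a match with the top of the stack
            if (@stack && $stack[-1][0] eq $name) {
                pop @stack;
                next;
            }

            # Otherwise look deeper in the stack for the matching start tag
            my ($found) = grep { $stack[$_][0] eq $name } reverse 0 .. $#stack;
            if (defined $found) {
                my @above = map { "<$_->[0]> (line $_->[1])" }
                            @stack[ $found + 1 .. $#stack ];
                push @problems, "line $lineno: </$name> closes <$name> "
                    . "(line $stack[$found][1]) while still open: @above";
                splice @stack, $found;   # resynchronise below the match
            }
            else {
                push @problems, "line $lineno: </$name> has no open <$name> "
                    . "(never started, or ended twice)";
            }
        }
    }

    # Whatever is left on the stack was never closed
    push @problems, "<$_->[0]> opened at line $_->[1] is never closed"
        for @stack;
    return @problems;
}

# Run on the files named on the command line, if any
if (@ARGV) {
    print "$_\n" for check_nesting(<>);
}
```

Saved under some name such as checknest.pl (made up here), it would be run as "perl checknest.pl yourfile.xml" and prints one line per problem, or nothing if the nesting looks fine. Popping down to the matching start tag after an error keeps one bad tag from producing a cascade of follow-on reports, at the cost of possibly hiding a second problem in the elements it discards.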