Chris,

If you're trying to avoid writing code to make this work, you may want to consider using a more schema-centric command-line program such as xjparse [1] or jaxp.SourceValidator [2] instead of dom.Counter. With either of those you can specify a list of schema documents to use for validation. Additionally, xjparse provides an option for specifying an XML Catalog [3] for resolving the schema locations.
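
If a small amount of code turns out to be acceptable after all, the "list of schema documents" approach can also be approximated with the standard javax.xml.validation API (JAXP 1.3 / JDK 1.5+). The following is only a minimal sketch under my own assumptions, not the source of either tool: the class name SchemaListValidator and its validate method are hypothetical, and the file names on the command line are whatever your layout uses. Because each schema is loaded from its own file, relative xsd:import and xsd:include locations should resolve against the schema that contains them, and as far as I know the xsi:schemaLocation hints in the instance then no longer need to be correct.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper: validates one instance against an explicit list of schemas.
public class SchemaListValidator {

    /** Returns the warnings and errors reported while validating the instance. */
    public static List<String> validate(File instance, File... schemaFiles)
            throws SAXException, IOException {
        // A StreamSource built from a File carries that file's URI as its system ID,
        // which is the base used for relative locations inside the schema.
        Source[] sources = new Source[schemaFiles.length];
        for (int i = 0; i < schemaFiles.length; i++) {
            sources[i] = new StreamSource(schemaFiles[i]);
        }

        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(sources);

        final List<String> problems = new ArrayList<String>();
        Validator validator = schema.newValidator();
        validator.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) {
                problems.add("warning: " + e.getMessage());
            }
            public void error(SAXParseException e) {
                problems.add("error: " + e.getMessage());
            }
            public void fatalError(SAXParseException e) throws SAXParseException {
                throw e; // not well-formed or unreadable: give up
            }
        });
        validator.validate(new StreamSource(instance));
        return problems;
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println(
                "usage: java SchemaListValidator instance.xml schema1.xsd [schema2.xsd ...]");
            return;
        }
        File[] schemas = new File[args.length - 1];
        for (int i = 1; i < args.length; i++) {
            schemas[i - 1] = new File(args[i]);
        }
        List<String> problems = validate(new File(args[0]), schemas);
        for (String p : problems) {
            System.out.println(p);
        }
        System.out.println(problems.isEmpty() ? "valid" : problems.size() + " problem(s) found");
    }
}
```

Run it with the instance document first and the schema files after it. Only the top-level schemas should need listing, since anything they import or include is found through their own relative locations.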

Thanks.

[1] http://nwalsh.com/java/xjparse/
[2] http://xerces.apache.org/xerces2-j/samples-jaxp.html#SourceValidator
[3] http://www.oasis-open.org/committees/download.php/14809/xml-catalogs.html

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: [EMAIL PROTECTED]

[EMAIL PROTECTED] wrote on 10/08/2007 08:20:18 PM:

> I think there's a better way which I'll sketch (because my project
> uses a version of Xerces that is from before the DOM Level 3
> interfaces were included, so does something similar using older
> stuff).
>
> A standard XML parser may be associated with an EntityResolver, which
> supports a method taking a URI and returning an InputSource from which
> the content may be read. Similarly, when a reference to a schema
> namespace is found in a document (instance or schema) being read by a
> validating parser, some kind of resolver will be called, if one has
> been attached to the parser, to find the definition of the schema for
> that namespace. The namespace URI is an argument to the relevant
> method. This resolver (LSResourceResolver in DOM Level 3 Load and
> Save) is an interface, and your implementation may do whatever
> it wants. Thus, you could create the resolver with some root location
> in the file system as argument, or you could use
> ClassLoader.getSystemResourceAsStream(), or you could put the schemas
> in a database and retrieve their text from there. Your resolver could
> consult any schema locations it accumulated during its lifetime if you
> had a way to capture these, and wouldn't have to use them literally,
> but could interpret them as it wished.
>
> I suggest you consult the Xerces docs about how to install a resolver
> for schemas.
>
> Jeff
>
> On 10/8/07, Chris Bray <[EMAIL PROTECTED]> wrote:
> > Michael, I'm using Xerces-J 2.9.1, I even upgraded from 2.9.0 today to
> > test any changes!
> >
> > Jeff, can you bear with me here? I think I understand you...
> >
> > Jeff Greif wrote:
> > > Maybe an example will be clearer.
> > >
> > > The instance document is, relative to some subtree of the file system, in
> > >
> > > instances/articles/doc1.xml
> > >
> > > There is a set of schemas that apply in
> > >
> > > schemas/{a,b,c,d}.xsd
> > >
> > > Suppose a.xsd imports b.xsd, and in addition, doc1.xml refers to
> > > components from nsa, the namespace of a, and nsb, the namespace of b.
> > >
> > > So there are schema locations of the form {nsa, ../../schemas/a.xsd
> > > nsb ../../schemas/b.xsd, ... }
> > >
> > > Now when the reference from doc1 -> nsb is found, the schema locations
> > > can be used to find b.xsd.
> >
> > I'm with you up to here: because the schema locations were defined in
> > doc1.xml they are relative to doc1.xml and therefore point to the
> > correct xsd files.
> >
> > > If the reference from a.xsd -> nsb is
> > > found, the schema locations will not work, because the location is
> > > incorrect relative to the location of a.xsd.
> >
> > My reference from a.xsd -> nsb is in the form
> > <xsd:import namespace="nsb" schemaLocation="./b.xsd" />
> > This path to b.xsd is correct with respect to the a.xsd it is defined in
> > (although incorrect with respect to doc1.xml).
> >
> > However, this schema location hint is second in the queue behind the one
> > specified in doc1.xml. When Xerces tries to use the one specified in
> > doc1.xml here it fails with File Not Found (because when relative to
> > a.xsd the doc1.xml's schema location is not valid), reports the error
> > and stops parsing, so the schema location specified here is never used.
> >
> > Other parsers continue looking at the hints in schema location and find
> > the correct one specified on the <xsd:import> line. Is there any way of
> > telling Xerces to try all hints matching that namespace (in the same way
> > XMLSpy, Microsoft .NET's System.Xml and Saxonica seem to do) rather than
> > stop on the first "not found"?
> >
> > > You couldn't solve the
> > > problem by changing the schema locations to look like {nsa,
> > > ../../schemas/a.xsd nsb ./b.xsd, ... } because the doc1 -> nsb
> > > reference would fail. However, in the first case, if the parser is
> > > caching grammars, and the reference from doc1 -> nsb has already been
> > > processed, the a.xsd -> nsb reference might not be a validation error
> > > -- the schema locations are only a hint to the parser, and if it has
> > > located and parsed the right grammar already, it can use it.
> >
> > So changing the schemaLocation works in my case because in processing
> > a.xsd the parser finds b.xsd (via the schemaLocation relative to a.xsd)
> > and caches it, therefore meaning it can use the cached copy in doc1.xml.
> >
> > > These are the problems with using relative URLs for the schema
> > > locations, except in certain special cases. For example, if the
> > > instance doc is
> > >
> > > instances/doc1.xml
> > >
> > > and the schemas are in
> > >
> > > schemas/{a,b,c,...}.xsd
> > >
> > > Then these schema locations: {nsa ../schemas/a.xsd nsb
> > > ../schemas/b.xsd ...} will work successfully, but only because the
> > > paths work whether the reference is from the instance doc or a schema
> > > doc.
> >
> > Ideally I'd like to specify a "try all schema locations before error" or
> > "do not stop on file not found error" property since there will *always*
> > be one that works when used relative to the current location, is there a
> > way of doing this?
> >
> > I'm guessing there is no "schema locations per file" property to turn
> > off the global cache of schema location and switch to a per-file cache?
> > Thus forcing Xerces to use the hint found at the current location.
> >
> > Maybe the easiest way to solve my problem is to re-jig my document
> > locations so that the same relative path can be used to locate each of
> > the schemas?
> > Not ideal, mind, since I've spent a long time developing the
> > inter-schema links to ensure they can always be linked together, and I'd
> > like to use that investment in some way. I can't help but think that
> > moving the files so the relative paths fit for both scenarios is more of
> > a by-product than something implemented by design.
> >
> > I'm under some commercial pressure here to switch to the method that
> > works with the system that the customers use (XMLSpy et al) but I'd
> > really like the same examples to work in Xerces-J. We've been extolling
> > the virtues of XML and XML Schema as the "common language" to unify our
> > industry's data exchange and it'd look bad to have to change the
> > examples we are producing to make them work in different parsers!
> >
> > Once again, that ended up a lot longer than I expected and I hope it
> > makes sense. Thanks for your time and patience.
> > Chris
> >
> > > Jeff
> > >
> > > On 10/8/07, Chris Bray <[EMAIL PROTECTED]> wrote:
> > >> Jeff.
> > >>
> > >> My comments inline.
> > >>
> > >> Chris
> > >>
> > >> Jeff Greif wrote:
> > >>> When a relative URL is used for the location of an imported schema, it
> > >>> is supposed to be relative to the URL of the importing document. So
> > >>> if your instance document directly references the namespaces of one or
> > >>> more schemas for validation, those URLs are interpreted relative to
> > >>> the location of the instance document. Probably some of the schemas
> > >>>
> > >> So my instance document _should_ have relative paths to the individual
> > >> schemas in its schemaLocation?
> > >> Does the fact that Xerces is "changing" the base path to that of the
> > >> first specified schema for each subsequent schema constitute a bug?
> > >> Should I log this somewhere more formal?
> > >>> contain <xsd:import> elements; those would require URLs relative to
> > >>> the schema importing them.
> > >>>
> > >> Each of those schemas then further includes others using <xsd:import>
> > >> and <xsd:include> (for example core.xsd actually includes about 30 or 40
> > >> smaller schemas from ./Core/schemaname.xsd) and this works as I'd
> > >> expected it to.
> > >>> Some of the schemas might be referenced both in the instance document
> > >>> and in imports from other schemas referenced in the instance document.
> > >>> I'm not sure there's a specification of where they must be found if
> > >>> relative URLs are used. This may depend on the ordering of processing
> > >>> of those references by the parser/validator.
> > >>>
> > >> When that is the case I am 100% sure that both the instance document and
> > >> the "sub schemas" refer to the exact same document, so it shouldn't
> > >> matter which of the references Xerces is using, it will resolve to the
> > >> same schema anyway.
> > >>> There is a section in the XML Schema 1.0 spec addressing this issue.
> > >>>
> > >>> Jeff
> > >>>
> > >>> On 10/8/07, Chris Bray <[EMAIL PROTECTED]> wrote:
> > >>>
> > >>>> Prashant,
> > >>>>
> > >>>> Changing the working dir of the JVM doesn't seem to make any difference.
> > >>>> Using dom.Counter from the Xerces-J samples, the parser still seems to
> > >>>> change the working dir first to wherever the xml file is located, then
> > >>>> to wherever the first xsd file specified is located, and needs all
> > >>>> subsequent locations to be relative to that.
> > >>>>
> > >>>> Absolute paths work fine but I'm trying to include these files bundled
> > >>>> in with a set of schemas as examples of how to use the format, hence I
> > >>>> don't know where my users will unzip the archives to (C:\Users\username,
> > >>>> c:\projects\projectname\, /usr/local/projects, /home etc) so I can't set
> > >>>> absolute paths in my distributed files.
> > >>>>
> > >>>> I was hoping to not need to actually write my own parsing program, just
> > >>>> use the output from dom.Counter and a schemaLocation hint (which fits my
> > >>>> needs perfectly) since I'm not really a Java developer.
> > >>>>
> > >>>> I saw that jEdit page but I'd rather make my schemas validate against a
> > >>>> standard Xerces installation than modify my jEdit installation to make
> > >>>> them work, I feel this would be more useful for my users.
> > >>>>
> > >>>> Chris
> > >>>>
> > >>>> Prashant Reddy wrote:
> > >>>>
> > >>>>> I think the relative paths you have specified in the schemaLocation will
> > >>>>> be resolved against the "working dir". The working dir is usually the
> > >>>>> directory at the cmd prompt when you launched the JVM.
> > >>>>>
> > >>>>> Have you tried giving absolute path to the XSD files?
> > >>>>>
> > >>>>> A more portable solution to finding schema files locally is to use
> > >>>>> EntityResolver[1].
> > >>>>>
> > >>>>> If you are using JAXP 1.3/ JDK 1.5+ see :
> > >>>>> https://jaxp.dev.java.net/article/jaxp-1_3-article.html
> > >>>>>
> > >>>>> [1]:http://java.sun.com/j2se/1.5.0/docs/api/org/xml/sax/EntityResolver.html
> > >>>>>
> > >>>>> Hope this helps.
> > >>>>> -Prashant
> > >>>>>
> > >>>>> On Mon, 2007-10-08 at 13:17 +0100, Chris Bray wrote:
> > >>>>>
> > >>>>>> All.
> > >>>>>>
> > >>>>>> Please go easy on me as I'm a newbie here, if this is a really obvious
> > >>>>>> problem I'm really sorry!
> > >>>>>> I've been using Xerces to validate XML for a while now, and I've found a
> > >>>>>> troublesome scenario.
> > >>>>>>
> > >>>>>> In the top of my xml files I have a line specifying the location of the
> > >>>>>> external schemas required for this xml file like so:
> > >>>>>>
> > >>>>>> xsi:schemaLocation="http://www.diggsml.org/0.9.2
> > >>>>>> ../Schemas/diggs/core.xsd http://www.diggsml.org/0.9.2/geotechnical
> > >>>>>> ../Schemas/diggs/geotechnical.xsd "
> > >>>>>>
> > >>>>>> In this case specifying two namespaces and their associated schema files
> > >>>>>> (files exist and paths are correct).
> > >>>>>>
> > >>>>>> However this doesn't work using Xerces. I am required to change my
> > >>>>>> schemaLocation attribute so that the first path points to its xsd, then
> > >>>>>> subsequent entries are relative to that first xsd, not to the current
> > >>>>>> file, like so:
> > >>>>>>
> > >>>>>> xsi:schemaLocation=" http://www.diggsml.org/0.9.2
> > >>>>>> ../Schemas/diggs/core.xsd http://www.diggsml.org/0.9.2/geotechnical
> > >>>>>> ../geotechnical.xsd "
> > >>>>>>
> > >>>>>> Is there any way I can change this to work like the first example? Other
> > >>>>>> parsers (XMLSpy and Stylus Studio in particular) require the first
> > >>>>>> syntax, all paths relative to the current doc, which I believe to be the
> > >>>>>> correct behaviour. I don't know how to build Xerces-J from source to
> > >>>>>> fix(?) this myself but I'd be willing to try if anyone can help me get
> > >>>>>> it building.
> > >>>>>>
> > >>>>>> Since my customers are all using XMLSpy etc I'm having to produce my
> > >>>>>> example files in the earlier syntax, stopping me from using Xerces to
> > >>>>>> validate them.
> > >>>>>>
> > >>>>>> As the biggest advocate of Free/OpenSource software in our group (jEdit
> > >>>>>> with Xerces plugin in particular) I really don't want to have to change
> > >>>>>> to use XMLSpy or Stylus Studio but this is quite awkward for me!
> > >>>>>>
> > >>>>>> That ended up being a longer mail than I'd expected!
> > >>>>>> I hope you can help. If there's any more information you need (or a
> > >>>>>> small set of sample files) let me know.
> > >>>>>>
> > >>>>>> Chris Bray
> > >>>>>> Software Engineer (DIGGS Project)
> > >>>>>> Keynetix Ltd.

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]