Hello,
First of all, sorry for my English. I took time to do my best,
but I think it may still read badly to native speakers.
Usually we parse XML files, process them in memory, and use XSLT to
format these files when they are written back to disk.
We tried to find a parser that supports element formatting and
indentation handling, but we failed to find one we can use
in combination with other XML libraries (XPATH ...).
The context:
We try to keep our specific code separated from the Apache-OFBiz code to
facilitate maintenance, in other words we don't want to fork.
To do that, our specific code is isolated into a kind of package we
call an addon.
An addon is a group of patches that should be installed together in an
atomic operation (if one patch fails, none of the addon patches is applied).
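
To make the "all or nothing" rule concrete, here is a minimal sketch in Java.
It is not our actual addon manager: AddonInstaller and FilePatch are
hypothetical names, and a patch is modelled simply as a function from the old
file content to the new one. Every patch is first applied in memory, and the
files are only written once all patches have succeeded.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: applies all patches of an addon, or none of them. */
final class AddonInstaller {

    /** Hypothetical patch abstraction: computes the new content of one file. */
    interface FilePatch {
        Path target();
        /** Returns the patched content, or throws if the patch does not apply. */
        String apply(String original) throws Exception;
    }

    static void install(List<FilePatch> patches) throws Exception {
        // Phase 1: apply every patch in memory. Any exception aborts here,
        // before a single file has been modified.
        Map<Path, String> results = new LinkedHashMap<>();
        for (FilePatch patch : patches) {
            Path target = patch.target();
            // Several patches may touch the same file: chain them.
            String current = results.containsKey(target)
                    ? results.get(target)
                    : Files.readString(target);
            results.put(target, patch.apply(current));
        }
        // Phase 2: all patches succeeded, so write the results back.
        for (Map.Entry<Path, String> entry : results.entrySet()) {
            Files.writeString(entry.getKey(), entry.getValue());
        }
    }
}
```

Note that this only protects against patches that fail to apply. An I/O error
in the middle of phase 2 would still leave a partial install, so a real
implementation would also need backups or temporary files.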
Given that OFBiz has a huge number of XML files, and because patches
(based on line-by-line comparison) are very sensitive to changes,
we implemented a kind of semantic Diff and Patch for XML files.
============================================
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<patch>
<add-element path="/entity-engine-xml[1]"
previous="DataResource[@dataResourceId='ACCOUNTING_main' and
@localeString='en']">
<DataResource dataResourceId="HELP_ACCOUNTING1"
dataResourceName="Accounting Overview" dataResourceTypeId="OFBIZ_FILE"
dataTemplateTypeId="NONE"
isPublic="Y" localeString="fr" mimeTypeId="text/xml"
objectInfo="applications/accounting/data/helpdata/HELP_ACCOUNTING.xml"
statusId="CTNT_IN_PROGRESS"/>
</add-element>
</patch>
============================================
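
For readers who want to see how such a patch could be applied with the
standard JDK DOM and XPath APIs, here is a rough sketch. It rests on
assumptions that the example above does not spell out: that "path" selects
the parent element, that "previous" is an XPath evaluated relative to that
parent, and that the new elements go right after the "previous" node.
AddElementPatch is a hypothetical name, not our real class.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Hypothetical sketch of applying one <add-element> operation. */
final class AddElementPatch {

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    static void apply(Document target, Element addElement) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();

        // Assumption: "path" is an absolute XPath selecting the parent element.
        String path = addElement.getAttribute("path");
        Node parent = (Node) xpath.evaluate(path, target, XPathConstants.NODE);
        if (parent == null) {
            throw new IllegalStateException("path not found: " + path);
        }

        // Assumption: "previous" is evaluated relative to the parent and names
        // the sibling after which the new elements are inserted.
        Node anchor = null; // null means "append at the end of the parent"
        String previousExpr = addElement.getAttribute("previous");
        if (!previousExpr.isEmpty()) {
            Node previous = (Node) xpath.evaluate(previousExpr, parent, XPathConstants.NODE);
            if (previous == null) {
                throw new IllegalStateException("previous not found: " + previousExpr);
            }
            anchor = previous.getNextSibling();
        }

        // Copy every element child of <add-element> into the target document.
        for (Node child = addElement.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                parent.insertBefore(target.importNode(child, true), anchor);
            }
        }
    }
}
```

Because the operation addresses nodes instead of line numbers, it still
applies when unrelated lines of the target file have moved. The DOM, however,
does not remember the whitespace between attributes, so writing the result
back with a standard serializer reformats the file.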
The problem
Some of Apache OFBiz's XML files are formatted with specific indentation
rules (or with no rules at all).
============================================
<service name="superconductivity" engine="Java"
location="org.ofbiz.marketing.marketing.MarketingServices"
invoke="signUpForContactList" auth="false">
<description>Signs an input email up for a ContactList with
_NA_ party using the system userLogin.
The intent is for anonymous sign ups to email lists. Also
validates email format.</description>
<attribute name="contactListId" type="String" mode="IN"
optional="false"/>
<attribute name="email" type="String" mode="IN" optional="false"/>
<attribute name="partyId" type="String" mode="IN" optional="true"/>
<attribute name="baseLocation" type="String" mode="IN"
optional="true"/>
</service>
============================================
The main requirement we had was to keep all formatting text intact
(runs of spaces, tabs and newline characters) when the
DOM is written back to the file.
My solution
I looked for an existing solution without success, so I implemented my own
parser and solved the problem by creating a new type of
node called SpaceNode. A SpaceNode represents the run of spaces
and newline characters between two attribute declarations.
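
To illustrate the idea (this is a simplified sketch, not my real parser), the
Java code below splits the attribute part of a start tag into attribute nodes
and SpaceNode objects, then writes them back unchanged. StartTagParser, Part
and AttributeNode are names invented for this example, and the sketch assumes
there is no whitespace around "=".

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified sketch: keeps the whitespace between attributes as nodes. */
final class StartTagParser {

    /** One piece of a start tag: an attribute or a whitespace run. */
    interface Part {
        String text();
    }

    /** The run of spaces, tabs and newlines between two attributes. */
    static final class SpaceNode implements Part {
        final String whitespace;
        SpaceNode(String whitespace) { this.whitespace = whitespace; }
        public String text() { return whitespace; }
    }

    static final class AttributeNode implements Part {
        final String name;
        final String value;
        final char quote; // remembered so that ' and " survive a round trip
        AttributeNode(String name, String value, char quote) {
            this.name = name;
            this.value = value;
            this.quote = quote;
        }
        public String text() { return name + "=" + quote + value + quote; }
    }

    /** Parses what follows the element name, e.g. {@code  a="1"\n  b="2"}. */
    static List<Part> parseAttributes(String s) {
        List<Part> parts = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            int start = i;
            if (Character.isWhitespace(s.charAt(i))) {
                while (i < s.length() && Character.isWhitespace(s.charAt(i))) {
                    i++;
                }
                parts.add(new SpaceNode(s.substring(start, i)));
            } else {
                int eq = s.indexOf('=', i);
                if (eq < 0 || eq + 1 >= s.length()) {
                    throw new IllegalArgumentException("missing '=' at offset " + i);
                }
                char quote = s.charAt(eq + 1);
                if (quote != '"' && quote != '\'') {
                    throw new IllegalArgumentException("missing quote at offset " + (eq + 1));
                }
                int end = s.indexOf(quote, eq + 2);
                if (end < 0) {
                    throw new IllegalArgumentException("unterminated value at offset " + eq);
                }
                parts.add(new AttributeNode(s.substring(i, eq), s.substring(eq + 2, end), quote));
                i = end + 1;
            }
        }
        return parts;
    }

    /** Writes the parts back exactly as they were read. */
    static String serialize(List<Part> parts) {
        StringBuilder out = new StringBuilder();
        for (Part part : parts) {
            out.append(part.text());
        }
        return out.toString();
    }
}
```

With this model, a patch can add or change an AttributeNode while every
SpaceNode it does not touch is written back character for character.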
Questions
Is there a solution using any of the Apache XML handling libraries?
If not, does this solution seem suitable from your point of view?
Any comments will be appreciated.
Thank you.