Ok, thanks for the feedback. I'll look into the suggestions that you offered.
Thanks again,
Tom

On Wed, May 4, 2011 at 1:17 AM, Aki Yoshida <[email protected]> wrote:
> Hi Tom,
> I think the problem with this method is that it adds an extra
> space at the beginning. If the file content is XML and it starts
> with the XML declaration, there will be an extra space in front of the
> declaration, which violates well-formedness.
>
> You can create a JIRA issue for this particular bug. But this will not
> really help you in the long run. I will explain the reason below.
>
> As I understand your use case, you want to use this method for reading
> an XML file and creating its Java string representation in your
> application. As I see this method, it doesn't look like it was really
> meant to be used for such purposes. Furthermore, it seems that this
> class is only used in some unit test classes for performing a simple
> content comparison.
>
> For your particular use case, you need to take care of the character
> encoding and possibly the newline handling. This FileUtils method
> ignores the encoding of the file. If the file uses the UTF-8
> encoding, you need to read the stream and convert it into a Java String
> using the UTF-8 encoding. If it is in some other encoding like UTF-16,
> ISO-8859-1, etc., you need to use that encoding for the conversion.
> Otherwise, you will have a corrupted String for some characters.
> Regarding the newline handling, this method currently removes all the
> CR/LFs. This is probably okay for the existing test use cases, but for
> your use case, you may want to either preserve the newline characters
> or normalize them using the standard XML rule. So, there will be
> some other issues you will encounter if you use this simple method.
>
> Therefore, I would recommend that you not use this FileUtils method and
> instead use an alternative approach that uses the XML parser to convert a
> file for further processing (e.g., using InputSource to work on the
> Source or XMLUtils.parse() to work on the Document).
>
> Regards, Aki
>
> 2011/5/3 Tom Eastmond <[email protected]>:
>> That would be great to get this fixed - should I create a defect? I'd
>> also love to not have it replace a single space with 2 spaces since
>> that has caught me by surprise in my testing as well. Let me know what
>> you'd like me to do.
>>
>> Thanks again,
>> Tom Eastmond
>>
>> On Tue, May 3, 2011 at 6:19 AM, Aki Yoshida <[email protected]> wrote:
>>> Sorry,
>>> I realized this method actually has nothing to do with XML.
>>> Please ignore my comments on XML normalization.
>>> regards, aki
>>>
>>> 2011/5/3 Aki Yoshida <[email protected]>:
>>>> Hi,
>>>> you are right. The normalizeCRLF() method should not add an extra
>>>> space at the beginning. We can fix this particular issue.
>>>>
>>>> But there is one open question, as the exact purpose (use case) of
>>>> this method is not clear to me. Why do we need this normalization
>>>> method, which just removes all the CRs and LFs and replaces each
>>>> space/tab character with a single space, and why is it
>>>> automatically called in FileUtils.getStringFromFile()?
>>>>
>>>> Does someone else want to have other normalization options, such as
>>>> doing the standard XML white space "ignore" handling or the
>>>> end-of-line handling (i.e., replacing each CRLF pair with a single LF)?
>>>>
>>>> Regards, aki
>>>>
>>>> 2011/5/2 Tom Eastmond <[email protected]>:
>>>>> I was using the FileUtils.getStringFromFile() method for some Camel
>>>>> testing and was receiving a SAXParseException: The processing
>>>>> instruction target matching "[xX][mM][lL]" is not allowed.].
>>>>>
>>>>> It turns out that this was due to the
>>>>> FileUtils.normalizeCRLF() method, which replaces whitespace characters
>>>>> (\s) with two spaces. This method prepends leading spaces to the
>>>>> contents (before the <?xml version="1.0" encoding="UTF-8"?> in this
>>>>> case), which chokes the XML parser.
>>>>> Would it be feasible to forgo the
>>>>> leading spaces at the start of a file in order to avoid this issue?
>>>>> I'd be happy to submit a test case/patch if this seems like a valid
>>>>> bug/fix. Please let me know if I should use another forum for this
>>>>> request.
>>>>>
>>>>> Thanks for the excellent work,
>>>>>
>>>>> Tom Eastmond
>>>>>
>>>>
>>>
>>
>
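
For reference, here is a minimal Java sketch of the two points raised in this thread: a single leading space in front of the XML declaration makes the document ill-formed, and handing the raw bytes to the parser avoids guessing the encoding. It assumes only the JDK's JAXP parser and does not use CXF's FileUtils or XMLUtils. The names XmlReadSketch, readFile, parse and isWellFormed are hypothetical and chosen for illustration only.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper class; not part of CXF.
class XmlReadSketch {

    // Decode a file with an explicitly chosen charset. Nothing is added,
    // removed or collapsed: line endings and spaces are kept as they are.
    static String readFile(Path file, Charset charset) throws IOException {
        return new String(Files.readAllBytes(file), charset);
    }

    // Parse raw bytes. The parser determines the encoding itself from the
    // byte order mark or the XML declaration, so no charset is passed in.
    static Document parse(byte[] xml)
            throws ParserConfigurationException, SAXException, IOException {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml));
    }

    // Return true if the string parses as a well-formed XML document.
    // The default JAXP error handler may also print the error to stderr.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // parse error, e.g. text before the XML declaration
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><greeting>hi</greeting>";
        Path file = Files.createTempFile("sketch", ".xml");
        try {
            Files.write(file, xml.getBytes(StandardCharsets.UTF_8));

            // Let the parser work on the bytes directly.
            Document doc = parse(Files.readAllBytes(file));
            System.out.println(doc.getDocumentElement().getNodeName());

            // The unmodified file content is well-formed ...
            System.out.println(isWellFormed(readFile(file, StandardCharsets.UTF_8)));
            // ... but one space before the declaration is rejected.
            System.out.println(isWellFormed(" " + xml));
        } finally {
            Files.delete(file);
        }
    }
}
```

If a String is really needed (for example, for a content comparison in a test), readFile with the file's actual charset keeps the content intact; for anything that will be parsed as XML, passing the bytes or an InputSource to the parser is usually the safer route.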
