[ https://issues.apache.org/jira/browse/TIKA-2580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16372860#comment-16372860 ]

ASF GitHub Bot commented on TIKA-2580:
--------------------------------------

tballison closed pull request #220: Fix for TIKA-2580 contributed by ewanmellor.
URL: https://github.com/apache/tika/pull/220

This pull request was merged from a forked repository.
Because GitHub hides the original diff of a foreign pull request once it
is merged, the diff is reproduced below for the sake of provenance:

diff --git a/tika-core/src/main/java/org/apache/tika/sax/SafeContentHandler.java b/tika-core/src/main/java/org/apache/tika/sax/SafeContentHandler.java
index d3152c680..f82098493 100644
--- a/tika-core/src/main/java/org/apache/tika/sax/SafeContentHandler.java
+++ b/tika-core/src/main/java/org/apache/tika/sax/SafeContentHandler.java
@@ -31,7 +31,8 @@
  * ({@link #characters(char[], int, int)} or
  * {@link #ignorableWhitespace(char[], int, int)}) passed to the decorated
  * content handler contain only valid XML characters. All invalid characters
- * are replaced with spaces.
+ * are replaced with the Unicode replacement character U+FFFD (though a
+ * subclass may change this by overriding the writeReplacement method).
  * <p>
  * The XML standard defines the following Unicode character ranges as
  * valid XML characters:

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> SafeContentHandler documentation is incorrect about replacement character
> -------------------------------------------------------------------------
>
>                 Key: TIKA-2580
>                 URL: https://issues.apache.org/jira/browse/TIKA-2580
>             Project: Tika
>          Issue Type: Bug
>          Components: documentation
>    Affects Versions: 1.17
>            Reporter: Ewan Mellor
>            Priority: Minor
>
> SafeContentHandler's doc comment states "All invalid characters are replaced 
> with spaces."  This has been untrue since TIKA-698 (Sep 2011).
>  
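
For illustration only, the behaviour described in the corrected doc comment
can be sketched as a standalone Java class. This is not Tika's code: the
class name XmlCharSanitizer and its methods are hypothetical, and the range
check follows the XML 1.0 definition of a valid character. It assumes that
every invalid code point, including an unpaired surrogate, is replaced by a
single U+FFFD rather than by a space.

```java
// Standalone sketch (hypothetical class, not part of Apache Tika).
class XmlCharSanitizer {

    // Unicode replacement character, as named in the corrected doc comment.
    static final char REPLACEMENT = '\uFFFD';

    // Returns true if the code point is outside the XML 1.0 Char production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isInvalidXmlChar(int cp) {
        if (cp < 0x20) {
            return cp != 0x09 && cp != 0x0A && cp != 0x0D;
        } else if (cp < 0xE000) {
            return cp > 0xD7FF;      // surrogate range is invalid
        } else if (cp < 0x10000) {
            return cp > 0xFFFD;      // U+FFFE and U+FFFF are invalid
        } else {
            return cp > 0x10FFFF;
        }
    }

    // Copies the input, substituting U+FFFD for each invalid code point.
    static String sanitize(String input) {
        StringBuilder out = new StringBuilder(input.length());
        int i = 0;
        while (i < input.length()) {
            // An unpaired surrogate is returned as-is by codePointAt,
            // so it falls into the invalid surrogate range above.
            int cp = input.codePointAt(i);
            if (isInvalidXmlChar(cp)) {
                out.append(REPLACEMENT);
            } else {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }
}
```

A caller that preferred the older behaviour (spaces) would only need to
change the REPLACEMENT constant in this sketch; in Tika itself the doc
comment says the equivalent hook is overriding writeReplacement in a
subclass.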



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
