report errors for wrongly-encoded files in ResourceLoader.getLines()
--------------------------------------------------------------------
Key: SOLR-2003
URL: https://issues.apache.org/jira/browse/SOLR-2003
Project: Solr
Issue Type: Improvement
Reporter: Robert Muir
Assignee: Robert Muir
Priority: Minor
Fix For: 3.1, 4.0
ResourceLoader is used to load files such as stopword and synonym lists, but it
decodes them by passing only a 'Charset' argument to the reader, so it inherits
the JDK's default error handling. When you open an InputStreamReader with just a
Charset, you get:
{code}
decoder = charset.newDecoder()
    .onMalformedInput(CodingErrorAction.REPLACE)
    .onUnmappableCharacter(CodingErrorAction.REPLACE);
{code}
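To illustrate the REPLACE behavior above, here is a small Java sketch (the class `ReplaceDemo` and method `decodeLenient` are hypothetical names, not Solr code). It decodes the ISO-8859-1 bytes for "café" as UTF-8 through an InputStreamReader built from a plain Charset; the malformed byte 0xE9 comes back as U+FFFD with no error raised.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class ReplaceDemo {
    // Hypothetical helper: decodes bytes the way a reader built from a plain
    // Charset does, i.e. malformed input is silently replaced with U+FFFD.
    static String decodeLenient(byte[] bytes) throws IOException {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8))) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        // "caf" followed by 0xE9: 'é' in ISO-8859-1, but malformed as UTF-8
        byte[] latin1Word = {'c', 'a', 'f', (byte) 0xE9};
        String line = decodeLenient(latin1Word);
        System.out.println(line.equals("caf\uFFFD")); // prints "true"
    }
}
```

Nothing signals the problem: the word simply contains a replacement character, so it will quietly fail to match the intended term.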
For wrongly encoded stopword and synonym files, I think it's more helpful to use
CodingErrorAction.REPORT than to silently substitute the replacement character
(U+FFFD). That way the user gets an exception instead of silently garbled entries.
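A minimal sketch of the proposed behavior, assuming lines are read through a BufferedReader: build the CharsetDecoder explicitly with REPORT and hand it to InputStreamReader. The names `StrictLines` and `getLinesStrict` are hypothetical and this is not the actual ResourceLoader patch; it only shows that malformed input then surfaces as a CharacterCodingException (here a MalformedInputException).

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class StrictLines {
    // Hypothetical helper: reads all lines, throwing CharacterCodingException
    // on malformed or unmappable input instead of substituting U+FFFD.
    static List<String> getLinesStrict(InputStream in, Charset charset) throws IOException {
        CharsetDecoder decoder = charset.newDecoder()
            .onMalformedInput(CodingErrorAction.REPORT)
            .onUnmappableCharacter(CodingErrorAction.REPORT);
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, decoder))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        byte[] good = "a\nthe\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(getLinesStrict(new ByteArrayInputStream(good), StandardCharsets.UTF_8)); // prints [a, the]

        // ISO-8859-1 "café" read as UTF-8: rejected instead of silently replaced
        byte[] bad = {'c', 'a', 'f', (byte) 0xE9, '\n'};
        try {
            getLinesStrict(new ByteArrayInputStream(bad), StandardCharsets.UTF_8);
        } catch (CharacterCodingException e) {
            System.out.println("rejected: " + e);
        }
    }
}
```

A real implementation would likely also wrap the exception with the resource name so the user knows which file to re-encode.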
See:
http://www.lucidimagination.com/search/document/1e50cb0992727fa1/foreign_characters_question