[
https://issues.apache.org/jira/browse/LUCENENET-414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035817#comment-13035817
]
Digy commented on LUCENENET-414:
--------------------------------
Fixed in 2.9.4g
DIGY
> The definition of CharArraySet is dangerously confusing and leads to bugs
> when used.
> ------------------------------------------------------------------------------------
>
> Key: LUCENENET-414
> URL: https://issues.apache.org/jira/browse/LUCENENET-414
> Project: Lucene.Net
> Issue Type: Bug
> Components: Lucene.Net Core
> Affects Versions: Lucene.Net 2.9.2
> Environment: Irrelevant
> Reporter: Vincent Van Den Berghe
> Priority: Minor
> Fix For: Lucene.Net 2.9.2
>
>
> Right now, CharArraySet derives from System.Collections.Hashtable, but
> doesn't actually use this base type for storing elements.
> Moreover, StandardAnalyzer.STOP_WORDS_SET is exposed as a
> System.Collections.Hashtable. The obvious code to build your own stop word
> set from StandardAnalyzer.STOP_WORDS_SET plus your own stop words, like
> this:
>     CharArraySet stopWords = new CharArraySet(StandardAnalyzer.STOP_WORDS_SET,
>         ignoreCase: false);
>     foreach (string domainSpecificStopWord in DomainSpecificStopWords)
>         stopWords.Add(domainSpecificStopWord);
> ... will fail because the CharArraySet constructor accepts an ICollection
> and therefore sees STOP_WORDS_SET through its Hashtable base type, which
> holds no elements: the resulting stopWords will only contain the
> DomainSpecificStopWords, and not those from STOP_WORDS_SET.
> One workaround would be to replace the first line with this:
>     CharArraySet stopWords = new CharArraySet(
>         StandardAnalyzer.STOP_WORDS_SET.Count + DomainSpecificStopWords.Length,
>         ignoreCase: false);
>     foreach (string standardStopWord in
>         (CharArraySet)StandardAnalyzer.STOP_WORDS_SET)
>         stopWords.Add(standardStopWord);
> ... but this relies on an implementation detail (STOP_WORDS_SET is
> really an UnmodifiableCharArraySet, which is itself a CharArraySet). It works
> because the cast forces the foreach to use the correct
> CharArraySet.GetEnumerator(), which is defined as a "new" method (this has a
> bad code smell to it).
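The enumeration problem described above can be reproduced without Lucene.Net. The following is a minimal sketch, not the actual CharArraySet source: HidingSet, AddItem, and HidingDemo are hypothetical names, and the class only imitates the pattern the report describes (deriving from Hashtable, storing elements elsewhere, and declaring Count and GetEnumerator() as "new").

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical stand-in: derives from Hashtable but keeps its elements
// in a private list, so the inherited Hashtable storage stays empty.
public class HidingSet : Hashtable
{
    private readonly List<string> items = new List<string>();

    public void AddItem(string item) { items.Add(item); }

    // "new" hides Hashtable.Count instead of overriding it.
    public new int Count { get { return items.Count; } }

    // "new" hides Hashtable.GetEnumerator() instead of overriding it.
    public new IEnumerator GetEnumerator() { return items.GetEnumerator(); }
}

public static class HidingDemo
{
    // Enumerates through the ICollection interface, as a constructor
    // that accepts an ICollection would.
    public static int CountViaInterface(ICollection collection)
    {
        int n = 0;
        foreach (object entry in collection) n++;
        return n;
    }

    // Enumerates through the derived static type, which is what the
    // cast in the workaround achieves.
    public static int CountViaDerived(HidingSet set)
    {
        int n = 0;
        foreach (object entry in set) n++;
        return n;
    }

    public static void Main()
    {
        var set = new HidingSet();
        set.AddItem("the");
        set.AddItem("and");
        Console.WriteLine(CountViaInterface(set)); // prints 0
        Console.WriteLine(CountViaDerived(set));   // prints 2
    }
}
```

The same object reports a different number of elements depending on the static type it is viewed through, which is why the cast in the workaround changes the result.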
> At least 2 possibilities exist to solve this problem:
> - Make CharArraySet use the Hashtable instance and a custom comparer,
>   instead of its own implementation.
> - Make CharArraySet use HashSet<char[]>, available since .NET 3.5.
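A rough sketch of the second possibility, under stated assumptions: arrays compare by reference in .NET, so a HashSet<char[]> only behaves like a set of words if it is given a content-based equality comparer. CharArrayComparer and StopWordSetDemo.Build are hypothetical names used for illustration; this is not how Lucene.Net 2.9.4g implemented the fix.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical comparer: two char[] are equal when their contents match.
public sealed class CharArrayComparer : IEqualityComparer<char[]>
{
    public bool Equals(char[] x, char[] y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null || x.Length != y.Length) return false;
        for (int i = 0; i < x.Length; i++)
            if (x[i] != y[i]) return false;
        return true;
    }

    public int GetHashCode(char[] chars)
    {
        if (chars == null) return 0;
        int hash = 17;
        unchecked
        {
            // Content-based hash so equal arrays land in the same bucket.
            foreach (char c in chars) hash = hash * 31 + c;
        }
        return hash;
    }
}

public static class StopWordSetDemo
{
    // Merges two word lists into one set; duplicates collapse because
    // the comparer looks at array contents rather than references.
    public static HashSet<char[]> Build(IEnumerable<string> baseWords,
                                        IEnumerable<string> extraWords)
    {
        var set = new HashSet<char[]>(new CharArrayComparer());
        foreach (string w in baseWords) set.Add(w.ToCharArray());
        foreach (string w in extraWords) set.Add(w.ToCharArray());
        return set;
    }
}
```

For example, Build(new[] { "the", "and" }, new[] { "lucene", "the" }) yields a set of three entries, and Contains("and".ToCharArray()) returns true even though the array passed in is a fresh instance.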
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira