Re: [java-user] How did you guys store category info

2012-09-19 Thread 秋水
Forwarding to myself... I hadn't explored the apidocs enough; I found some interesting analyzers after sending the last email. At 2012-09-20 10:23:09, "秋水" wrote: > Hello. > My project may require tree-style category info. How should I store it so that all > leaf docs under a given category node can be retrieved? >

[java-user] How did you guys store category info

2012-09-19 Thread 秋水
Hello. My project may require tree-style category info. How should I store it so that all leaf docs under a given category node can be retrieved? My current thought is to store the vertical category info in fields "level 1", "level 2", ... with a final "level last" field appended. I have no idea about the ease of
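Besides per-level fields, a common way to make "all leaf docs under a node" retrievable with a single term query is to index every ancestor path as its own token in one untokenized field. A minimal sketch of that idea in plain Java (the field layout, delimiter, and class name here are illustrative, not a Lucene API; Lucene's facet/taxonomy contrib module, where available, implements a more complete version of this approach):

```java
import java.util.ArrayList;
import java.util.List;

public class CategoryPaths {
    // For a leaf path like [electronics, audio, headphones], emit one token
    // per ancestor level: "electronics", "electronics/audio",
    // "electronics/audio/headphones". Indexing all of these in a single
    // untokenized "category" field lets a plain TermQuery on any ancestor
    // path match every doc at or below that node.
    public static List<String> ancestorTokens(List<String> levels) {
        List<String> tokens = new ArrayList<>();
        StringBuilder path = new StringBuilder();
        for (String level : levels) {
            if (path.length() > 0) path.append('/');
            path.append(level);
            tokens.add(path.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(
            ancestorTokens(List.of("electronics", "audio", "headphones")));
        // -> [electronics, electronics/audio, electronics/audio/headphones]
    }
}
```

The trade-off versus per-level fields is that a node's depth no longer matters at query time: one term query on the path token retrieves the whole subtree.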

RE: Using stop words with snowball analyzer and shingle filter

2012-09-19 Thread Steven A Rowe
Hi Martin, SnowballAnalyzer was deprecated in Lucene 3.0.3 and will be removed in Lucene 5.0. Looks like you're using Lucene 3.X; here's an (untested) Analyzer based on the Lucene 3.6 EnglishAnalyzer (except substituting SnowballFilter for PorterStemmer; disabling stopword holes' position increm

Re: Using stop words with snowball analyzer and shingle filter

2012-09-19 Thread Jack Krupansky
The underscores are due to the fact that the StopFilter defaults to "enable position increments", so there are no terms at the positions where the stop words appeared in the source text. Unfortunately, SnowballAnalyzer does not pass that in as a parameter and is "final" so you can't subclass i
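The mechanism behind those underscores can be shown with a toy model. This is not Lucene's actual filter code, just a simulation of the behavior described above: stop-word removal with position increments enabled leaves a "hole" at each removed position, and shingle construction fills each hole with the default filler token "_", which then appears inside the shingles:

```java
import java.util.ArrayList;
import java.util.List;

public class ShingleHoleDemo {
    // Simulate stop filtering that preserves position increments (each
    // removed word leaves a hole), then build bigram shingles, filling
    // holes with "_" the way ShingleFilter's default filler token does.
    public static List<String> bigrams(List<String> words, List<String> stops) {
        List<String> positions = new ArrayList<>();
        for (String w : words) {
            positions.add(stops.contains(w) ? "_" : w); // hole -> filler
        }
        List<String> shingles = new ArrayList<>();
        for (int i = 0; i + 1 < positions.size(); i++) {
            shingles.add(positions.get(i) + " " + positions.get(i + 1));
        }
        return shingles;
    }

    public static void main(String[] args) {
        // "this" as a stop word leaves a hole between "divide" and "sentence"
        System.out.println(bigrams(
            List.of("please", "divide", "this", "sentence"),
            List.of("this")));
        // -> [please divide, divide _, _ sentence]
    }
}
```

Disabling position increments in the stop filter collapses the holes, so the shingles would instead join the words around the removed stop word directly.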

Re: SpanNearQuery distance issue

2012-09-19 Thread vempap
Shoot me. Thanks, I did not notice that the doc has ".. e a .." in the content. Thanks again for the reply :) -- View this message in context: http://lucene.472066.n3.nabble.com/SpanNearQuery-distance-issue-tp4008975p4009033.html Sent from the Lucene - Java Users mailing list archive at Nabble

Re: SpanNearQuery distance issue

2012-09-19 Thread Trejkaz
On Thu, Sep 20, 2012 at 4:28 AM, vempap wrote: > Hello All, > > I have an issue with respect to the distance measure of SpanNearQuery in > Lucene. Let's say I have the following two documents: > > DocID: 6, content:"1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 1001 > 1002 1003 1004 1005 1006 1007 1008
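For two single-term spans, the slop SpanNearQuery has to cover is the number of positions strictly between the terms, and with inOrder=false the closest pairing anywhere in the document counts. A toy sketch (plain Java, not Lucene's matching code) of why the overlooked ".. e a .." in the content makes the query match even at slop 0:

```java
import java.util.List;

public class SpanGapDemo {
    // Smallest number of positions strictly between any occurrence of t1
    // and any occurrence of t2, in either order (an unordered near match).
    public static int minUnorderedGap(List<String> tokens, String t1, String t2) {
        int best = Integer.MAX_VALUE;
        for (int i = 0; i < tokens.size(); i++) {
            if (!tokens.get(i).equals(t1)) continue;
            for (int j = 0; j < tokens.size(); j++) {
                if (!tokens.get(j).equals(t2) || i == j) continue;
                int lo = Math.min(i, j), hi = Math.max(i, j);
                best = Math.min(best, hi - lo - 1); // positions between them
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // The first "a" is 4 positions away from "e", but the later
        // ".. e a .." puts them adjacent, so slop 0 already matches.
        List<String> doc =
            List.of("a", "b", "c", "d", "e", "a", "b", "c", "f", "g", "h");
        System.out.println(minUnorderedGap(doc, "a", "e")); // -> 0
    }
}
```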

Re: Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded

2012-09-19 Thread Steve McKay
You're running out of memory. There are two ways to deal with that: give the JVM more heap, or use less heap. Are you sure your code is being affected by the NetBeans settings? It looks like they're used for NetBeans' own JVM, so it's not going to change anything unless your code is running in-pr
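To raise the heap for the indexing program itself (rather than editing netbeans.conf, which only sizes the IDE's JVM), pass the heap flags on the command line that launches it. A sketch under stated assumptions: the jar names, classpath, and 2g heap size below are placeholders to adapt, while the IndexFiles class and the two paths come from the original question:

```shell
# -Xmx sets the maximum heap; -Xms the initial heap. Size them to the
# machine -- "2g" here is only a placeholder. These flags apply to this
# program's JVM, not to NetBeans' own.
java -Xms512m -Xmx2g \
     -cp lucene-core-3.6.0.jar:lucene-demo-3.6.0.jar:. \
     IndexFiles -index /media/MAFALDA/LuceneIndex \
                -docs /media/MAFALDA/yohasebewp2txt/Archivos
```

In NetBeans, the equivalent is adding `-Xmx2g` to the project's Run > VM Options, which is passed to the JVM the project runs in.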

SpanNearQuery distance issue

2012-09-19 Thread vempap
Hello All, I have an issue with respect to the distance measure of SpanNearQuery in Lucene. Let's say I have the following two documents: DocID: 6, content:"1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 1001 1002 1003 1004 1005 1006 1007 1008 1009 1100", DocID: 7, content:"a b c d e a b c f g h i j k

Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded

2012-09-19 Thread Reyna Melara
Hi, I'm trying to index a big set of plain text files, some 8,104,467 of them, all under the same directory /media/MAFALDA/yohasebewp2txt/Archivos, and I want to build my index under /media/MAFALDA/LuceneIndex using the IndexFiles.java program from the documentation. I'm using the NetBeans IDE, and I