StopWords -- File Suffix

2005-10-17 Thread tcorbet
I thought that using a StandardAnalyzer with a StopWord list that merges ENGLISH_STOP_WORDS with a handful of additions I have provided -- additions which include the most common file suffixes [.txt, .xml, .doc, etc.] -- would eliminate any occurrence of those terms in the resu
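The merge step described above can be sketched in plain Java. This is a minimal sketch, assuming the stop words are held as String[] arrays. The base array is a hypothetical subset standing in for ENGLISH_STOP_WORDS, and the suffixes are stored without the leading dot. One thing worth checking: a stop filter removes only tokens that exactly equal a stop-list entry, so ".txt" in the list has no effect unless the analyzer actually emits ".txt" as a separate token.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

public class StopListMerge {
    // Hypothetical subset standing in for Lucene's ENGLISH_STOP_WORDS.
    static final String[] BASE_STOP_WORDS = {"a", "an", "and", "the", "of"};

    // Hypothetical custom additions: file suffixes, lowercased and without the dot
    // (an assumption about how such terms look after tokenization).
    static final String[] SUFFIX_STOP_WORDS = {"txt", "xml", "doc"};

    /** Merges any number of stop lists into one de-duplicated array, keeping first-seen order. */
    static String[] merge(String[]... lists) {
        Set<String> merged = new LinkedHashSet<>();
        for (String[] list : lists) {
            merged.addAll(Arrays.asList(list));
        }
        return merged.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] stopWords = merge(BASE_STOP_WORDS, SUFFIX_STOP_WORDS);
        System.out.println(Arrays.toString(stopWords));
    }
}
```

The merged array could then be handed to an analyzer constructor that accepts a stop list, such as the String[] constructor StandardAnalyzer offered in Lucene 1.x.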

Re: need help for generating Query String

2005-10-17 Thread Koji Sekiguchi
Hi, In a program I have indexed 10 files. When I search using the query "contents:java", it returns 2 documents. But when I give "-contents:java", it returns an empty result set. Does anyone know what the right query string is for this? I.e., to retrieve all documents that do no

need help for generating Query String

2005-10-17 Thread jibu mathew
Hi all, I urgently need help with the following issues. What is the query string to retrieve all indexed documents (something similar to *.*)? In a program I have indexed 10 files. When I search using the query "contents:java", it returns 2 documents. But when I give "-contents:java
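Why "-contents:java" on its own comes back empty can be illustrated with a toy model. This is plain Java, not Lucene code: a prohibited clause only removes documents from whatever the positive clauses matched, so a query made only of prohibited clauses has nothing to subtract from. The usual remedy is to pair the prohibited clause with a clause that matches every document (for example MatchAllDocsQuery, in Lucene versions that provide it). The doc ids 3 and 7 for the two "java" hits are hypothetical.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

public class NegationModel {

    /** Positive matches minus prohibited matches; a null positive set means "no positive clause". */
    static Set<Integer> evaluate(Set<Integer> positive, Set<Integer> prohibited) {
        Set<Integer> result = new TreeSet<>();
        if (positive != null) {
            result.addAll(positive);
        }
        result.removeAll(prohibited);
        return result;
    }

    /** Stand-in for a "match all documents" clause over doc ids 0..numDocs-1. */
    static Set<Integer> allDocs(int numDocs) {
        Set<Integer> all = new TreeSet<>();
        for (int id = 0; id < numDocs; id++) {
            all.add(id);
        }
        return all;
    }

    public static void main(String[] args) {
        // Hypothetical hits for contents:java in a 10-document index.
        Set<Integer> javaHits = new TreeSet<>(Arrays.asList(3, 7));
        // -contents:java alone: no positive clause, so nothing matches.
        System.out.println(evaluate(null, javaHits));        // []
        // All documents minus contents:java.
        System.out.println(evaluate(allDocs(10), javaHits)); // [0, 1, 2, 4, 5, 6, 8, 9]
    }
}
```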

Re: Too many clauses

2005-10-17 Thread Chris Hostetter
: : To circumvent it, here are a few options that I have thought of: : 1. Chunk it up: : a. Create a filter based on a query that has a maximum of 1024. : b. Get its bits. : c. Get the next 1024 blocked skus and create a filter out of it and get : its bits. : d. AND the two BitSets. :
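The chunking idea quoted above can be sketched with java.util.BitSet. This is a minimal sketch, not Lucene code: docSkus is a hypothetical stand-in for the sku stored in each indexed document, and each chunk's BitSet plays the role of the bits a filter would return (in the Lucene 1.x API of the time, a Filter's bits(IndexReader) method). Note on combining: if a document qualifies when it matches any of the customer's skus, the per-chunk sets need to be unioned with or(); and() keeps only documents matched by every chunk.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ChunkedSkuFilter {

    /** Marks every document whose sku is in this chunk. */
    static BitSet bitsForChunk(String[] docSkus, List<String> chunk) {
        Set<String> wanted = new HashSet<>(chunk);
        BitSet bits = new BitSet(docSkus.length);
        for (int doc = 0; doc < docSkus.length; doc++) {
            if (wanted.contains(docSkus[doc])) {
                bits.set(doc);
            }
        }
        return bits;
    }

    /** Processes the allowed skus in chunks of at most maxClauses and unions the results. */
    static BitSet allowedDocs(String[] docSkus, List<String> allowedSkus, int maxClauses) {
        BitSet combined = new BitSet(docSkus.length);
        for (int start = 0; start < allowedSkus.size(); start += maxClauses) {
            int end = Math.min(start + maxClauses, allowedSkus.size());
            // "Any of these skus" is a disjunction, so chunks are merged with or().
            combined.or(bitsForChunk(docSkus, allowedSkus.subList(start, end)));
        }
        return combined;
    }

    public static void main(String[] args) {
        String[] docSkus = {"A1", "B2", "C3", "D4", "E5"};      // hypothetical index contents
        List<String> allowed = Arrays.asList("A1", "C3", "E5"); // hypothetical customer skus
        System.out.println(allowedDocs(docSkus, allowed, 2));   // {0, 2, 4}
    }
}
```

With a real index, maxClauses would be 1024 rather than the 2 used here to force more than one chunk.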

RE: Too many clauses

2005-10-17 Thread Sharma, Siddharth
I thought of that, but I had it listed as a last-resort option because, as a Lucene newbie, I was not sure what it would mean for performance. So if I bump up my heap (I assume that's what you are referring to when you say java pool), it'll be OK? Are there metrics around this? At x

RE: Too many clauses

2005-10-17 Thread Aigner, Thomas
Another way around it is to increase the max clause count: // Setting the clause count BooleanQuery.setMaxClauseCount(int); You can use Integer.MAX_VALUE or some smaller number. When I set this high, I have had to set the java pool higher for memory as well. Tom -Original Message- From: Sharma, Siddh
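Since raising the clause limit lets much larger queries through, it can help to see how much heap the JVM is actually allowed to use. This minimal sketch assumes that "java pool" above means the JVM heap, as the follow-up in this thread also assumes. It uses only the standard Runtime methods. The ceiling is normally raised with the -Xmx launcher option (for example java -Xmx512m ..., where 512m is just an illustrative value).

```java
public class HeapCheck {

    static long toMegabytes(long bytes) {
        return bytes / (1024 * 1024);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory(): the most heap the JVM will attempt to use (Long.MAX_VALUE if unlimited).
        System.out.println("max heap   (MB): " + toMegabytes(rt.maxMemory()));
        // totalMemory(): heap currently reserved; freeMemory(): unused part of that reservation.
        System.out.println("total heap (MB): " + toMegabytes(rt.totalMemory()));
        System.out.println("free heap  (MB): " + toMegabytes(rt.freeMemory()));
    }
}
```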

Re: Clustering with Lucene

2005-10-17 Thread Stanislaw Osinski
Hi Joe, I'm one of the Carrot2 developers and I have good news for you :) The example of using Carrot2 with Lucene is in the Carrot2 repository on SourceForge.net (http://sourceforge.net/projects/carrot2). Please check out the "carrot2" module (http://cvs.sourceforge.net/viewcvs.py/carrot2/carrot2/)

Too many clauses

2005-10-17 Thread Sharma, Siddharth
Query: caught a class org.apache.lucene.queryParser.ParseException with message: Too many boolean clauses I realize why this is happening (the 1024 clauses limit for BooleanQuery). My question is more design related. During customer registration, the customer defines a set of skus/products that

Re: Lucene in Action : example code -> document-parsing framework ...

2005-10-17 Thread Patricio Galeas
Hello, first, thank you for your help! I have replaced the JAR file in Eclipse's "Java Build Path" with the latest version (PDFBox-0.7.2.jar), but I still receive the same error message: Indexing E:\Galeas\lucene\data\pdfs\Beginning Java Server Pages.pdf Exception in thread "mai

Error with incremental load

2005-10-17 Thread Aigner, Thomas
Hi all, Was just wondering if anyone has come across this or if I'm doing something wrong here. On initial load of my index, I can close the writer and delete an entry and then update an entry, then open the writer again and go on to the next entry etc. Then while searching, everything t

RE: Lucene in Action : example code -> document-parsing framework ...

2005-10-17 Thread Ben Litchfield
In addition, the latest version (0.7.2) of PDFBox does not require log4j, so you could also upgrade to that version. Ben On Mon, 17 Oct 2005 [EMAIL PROTECTED] wrote: > Exception in thread "main" java.lang.NoClassDefFoundError: > org/apache/log4j/Logger > at org.pdfbox.pdfparser.BaseParser.(Ba

RE: Lucene in Action : example code -> document-parsing framework ...

2005-10-17 Thread n.bulthuis
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/log4j/Logger at org.pdfbox.pdfparser.BaseParser.(BaseParser.java:70) PDFBox cannot find Log4J. You can add Log4J to your classpath to fix this. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] S
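A NoClassDefFoundError like this one means a class that PDFBox was compiled against cannot be found at run time. A quick way to confirm what the running JVM can see is to try loading the class by name. This is a minimal sketch using only the standard Class.forName method. It loads without initializing, so the class's static initializers do not run. The name com.example.DoesNotExist in the test is a hypothetical placeholder.

```java
public class ClasspathCheck {

    /** Returns true if the named class can be loaded by this class's loader. */
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: locate and load the class, but do not run its static initializers.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Not found, or found but one of its own dependencies is missing.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("log4j Logger found: " + isOnClasspath("org.apache.log4j.Logger"));
    }
}
```

Running this from the same launch configuration as the indexer (for example, the same Eclipse run configuration or java -cp setting) shows whether the log4j jar is actually on the run-time classpath, which can differ from the build path used for compiling.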

Clustering with Lucene

2005-10-17 Thread msftblows
Hi All- I have seen an example using carrot2 for clustering, but have not really played with it that much. Does anyone have a good example of using clustering with Lucene...has anyone attempted to do it with carrot2 or something else? I was initially going to do a faceted search...which would

Re: Lucene in Action : example code -> document-parsing framework ...

2005-10-17 Thread msftblows
Do you have the log4j.properties file in the classpath? -Original Message- From: Patricio Galeas <[EMAIL PROTECTED]> To: java-user@lucene.apache.org Sent: Mon, 17 Oct 2005 15:50:46 +0200 Subject: Lucene in Action : example code -> document-parsing framework ... Hi ALL, I try to run the

Lucene in Action : example code -> document-parsing framework ...

2005-10-17 Thread Patricio Galeas
Hi ALL, I am trying to run an example from the "Lucene in Action" book: Chapter 7: Parsing Common Document Formats: lia.handlingtypes.framework.FileIndexer. I have downloaded all the source code from www.manning.com/hatcher2 and created a Java project in Lucene 3.1. I get the following error mess