prefix query search problem if a hyphen exists in the search word

2007-11-26 Thread reeja
Hi, I am facing a problem with prefix query searches when the prefix text contains a hyphen. I'm using lucene-2.1. A search query like ttl:co-operative returns more than 50 results, but if I convert the query to ttl:co-operat* it returns no results. Again I entered a query ttl:11-ami
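[Editorial note: a minimal sketch of how to inspect what is going on, assuming the ttl field was indexed with StandardAnalyzer (the post does not say which analyzer is used). Analyzed terms are split on the hyphen at index time, while a prefix/wildcard term is not analyzed, so no single indexed token starts with "co-operat".]

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.Query;

    public class HyphenPrefixDebug {
        public static void main(String[] args) throws Exception {
            // Assumes the "ttl" field was indexed with StandardAnalyzer.
            QueryParser parser = new QueryParser("ttl", new StandardAnalyzer());

            // "co-operative" is analyzed: StandardAnalyzer splits it on the hyphen,
            // so the parsed query ends up searching for the tokens "co" and "operative".
            Query analyzed = parser.parse("ttl:co-operative");
            System.out.println(analyzed);

            // "co-operat*" becomes a prefix query whose term is NOT analyzed,
            // so it looks for indexed tokens starting with "co-operat" -- none exist.
            Query prefix = parser.parse("ttl:co-operat*");
            System.out.println(prefix);
        }
    }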

RE: RAMDirectory vs FSDirectory

2007-11-26 Thread Chhabra, Kapil
> one can improve search performance by using a RAMDirectory created from an underlying FSDirectory using one of the parameterised constructors. Is this correct? Absolutely. > Will a FSDirectory not automatically load the index into memory provided enough RAM is available? Not all index files are
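[Editorial note: for reference, the parameterised constructor mentioned above is used roughly like this; a minimal sketch, the index path is made up.]

    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.store.RAMDirectory;

    public class RamFromFsExample {
        public static void main(String[] args) throws Exception {
            // Open the on-disk index (hypothetical path), then copy it into RAM.
            Directory fsDir = FSDirectory.getDirectory("/path/to/index", false);
            Directory ramDir = new RAMDirectory(fsDir);   // loads the whole index into memory

            // Search against the in-memory copy; the on-disk index stays untouched.
            IndexSearcher searcher = new IndexSearcher(ramDir);
            // ... run queries ...
            searcher.close();
            fsDir.close();
        }
    }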

RAMDirectory vs FSDirectory

2007-11-26 Thread Hardy Ferentschik
Hi there, I am currently using an FSDirectory to build my index. The reason for using a file-system-based index is that a full index rebuild takes around 30 minutes and I want to keep a persistent index. In 'Lucene in Action' I've read that one can improve search performance by using a RAMDi

Re: Score: Randomize form

2007-11-26 Thread Chris Hostetter
: I think you have a couple of problems here. First, you'll have to : normalize the scores to get *any* of them to be the same. Since : the scores are a float, very few of them will be exactly the same. it's not as rare as it seems, with lengthNorm byte encoding and low tf values it can happen q

Re: Score: Randomize form

2007-11-26 Thread Erick Erickson
I think you have a couple of problems here. First, you'll have to normalize the scores to get *any* of them to be the same. Since the scores are floats, very few of them will be exactly the same. I really suspect that you need to use a HitCollector (or TopDocs?) and collect the hits into buckets,
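[Editorial note: one way to do the bucketing described above; a minimal sketch, the rounding precision and the shuffle step are assumptions, not anything stated in the thread.]

    import java.util.*;
    import org.apache.lucene.search.HitCollector;

    // Collects doc ids into buckets keyed by the score rounded to a few decimals,
    // so documents with "the same" score can be shuffled before display.
    public class BucketingCollector extends HitCollector {
        private final Map buckets = new TreeMap(Collections.reverseOrder());

        public void collect(int doc, float score) {
            // Round to 3 decimals; raw float scores are rarely bit-for-bit identical.
            Float key = new Float(Math.round(score * 1000f) / 1000f);
            List docs = (List) buckets.get(key);
            if (docs == null) {
                docs = new ArrayList();
                buckets.put(key, docs);
            }
            docs.add(new Integer(doc));
        }

        public Map getBuckets() {
            return buckets;
        }
    }

Passed to searcher.search(query, collector), each bucket can afterwards be shuffled with Collections.shuffle(...) to get a randomized order among equal scores.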

Re: Searching user-private annotations associated with indexed documents

2007-11-26 Thread lucene user
Here are the three options that seem practical to us right now. (1) Do the annotation search in Postgres using LIKE or the Postgres native full text search. Take the resulting list of file ids and use it to build a filter for the Lucene query, the way we currently do for folders. (2) Add
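[Editorial note: option (1) presumably builds something like the following filter; a minimal sketch, the field name "fileId" and the way the ids arrive from Postgres are assumptions.]

    import java.io.IOException;
    import java.util.BitSet;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.index.TermDocs;
    import org.apache.lucene.search.Filter;

    // Restricts a search to documents whose "fileId" field matches one of the
    // ids returned by the annotation search in Postgres.
    public class FileIdFilter extends Filter {
        private final String[] fileIds;

        public FileIdFilter(String[] fileIds) {
            this.fileIds = fileIds;
        }

        public BitSet bits(IndexReader reader) throws IOException {
            BitSet bits = new BitSet(reader.maxDoc());
            for (int i = 0; i < fileIds.length; i++) {
                TermDocs td = reader.termDocs(new Term("fileId", fileIds[i]));
                while (td.next()) {
                    bits.set(td.doc());
                }
                td.close();
            }
            return bits;
        }
    }

Used as searcher.search(query, new FileIdFilter(ids)), only the allowed documents get scored.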

Score: Randomize form

2007-11-26 Thread Haroldo Nascimento
Hi, I show search results sorted by two criteria ("priority" first, then "score") for each document. I need to present results that have the same score in randomized order. For example: *Result of search 1: * keyword: hotel POS PRI SCORE DOC 15 100 A 24

Searching user-private annotations associated with indexed documents

2007-11-26 Thread lucene user
Folks I have some additional textual data that is user specific, basically annotations about documents. I would like to be able to do **combined** searches, looking for some words in the document and some in my users' private annotations about that document. Any suggestions about how I should hand

Re: Custom SynonymMap

2007-11-26 Thread java_user_
Were you able to find the post about a custom SynonymMap? Antonius Ng-2 wrote: > > Hi all, > > I'd like to add more words into SynonymMap for my application, but the > HashMap that holds all the words is not visible (private). > > Is there any other Class that I can use to implement SynonymAn
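[Editorial note: since the map inside that SynonymMap class is private, one workaround is to keep an application-owned map and inject synonyms in a TokenFilter; a minimal sketch, the class name and map contents are made up and it only handles single-word synonyms.]

    import java.io.IOException;
    import java.util.LinkedList;
    import java.util.Map;
    import org.apache.lucene.analysis.Token;
    import org.apache.lucene.analysis.TokenFilter;
    import org.apache.lucene.analysis.TokenStream;

    // Emits each original token, then any synonyms from an application-owned map
    // at the same position (position increment 0), so phrase queries still work.
    public class SimpleSynonymFilter extends TokenFilter {
        private final Map synonyms;                  // word -> String[] of synonyms
        private final LinkedList pending = new LinkedList();

        public SimpleSynonymFilter(TokenStream input, Map synonyms) {
            super(input);
            this.synonyms = synonyms;
        }

        public Token next() throws IOException {
            if (!pending.isEmpty()) {
                return (Token) pending.removeFirst();
            }
            Token token = input.next();
            if (token == null) {
                return null;
            }
            String[] syns = (String[]) synonyms.get(token.termText());
            if (syns != null) {
                for (int i = 0; i < syns.length; i++) {
                    Token syn = new Token(syns[i], token.startOffset(), token.endOffset());
                    syn.setPositionIncrement(0);     // stack on top of the original token
                    pending.addLast(syn);
                }
            }
            return token;
        }
    }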

Re: Index: mixing the structure of persistence

2007-11-26 Thread Orion Letizi
Because Terracotta treats the Lucene indexes as cluster-wide shared objects, updates to those indexes are made automatically available across the entire cluster. On one machine, a plain Lucene index will give you the in-memory caching and spill-to-disk behavior you are looking for. Terracotta wi

Re: Problem indexing Word Documents

2007-11-26 Thread Grant Ingersoll
I would ask on the POI mailing list. This doesn't look to be a problem with Lucene. -Grant On Nov 26, 2007, at 1:17 PM, chris.b wrote: okay, so i'm very new to lucene, so it may be my bad, but i can get it to index .txt files, and when trying to index word documents (using poi), the pr

Re: Why exactly are fuzzy queries so slow?

2007-11-26 Thread Timo Nentwig
On Sunday 25 November 2007 11:54:15 markharw00d wrote: > For "fuzzy" you're going to pay one way or another. But which one is the cheapest? :) > You can use ngram analyzers on indexed content and queries which will > add IO costs ("files" becomes "fi","fil", "file","il","ile","iles" in > both you

Re: Sorting and TopDocCollector

2007-11-26 Thread Chris Hostetter
: I am using TopDocCollector in IndexSearcher.search(...) to get the : BitSet of results, but I need to sort the results by two variables: by : any term of the document and by score. Is it possible to do this using a Collector? : : Is there any way to use the method search(..., sort) and afterwards get the : BitS
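[Editorial note: the search(query, sort) method the poster mentions covers the "field first, score second" ordering; a minimal sketch, the field name "priority" is only an example.]

    import org.apache.lucene.search.*;

    public class SortByFieldThenScore {
        // Returns hits ordered first by the "priority" field, then by relevance score.
        // Any indexed, untokenized field can serve as the primary sort key.
        public static Hits search(Searcher searcher, Query query) throws Exception {
            Sort sort = new Sort(new SortField[] {
                new SortField("priority"),        // primary key: a document field
                SortField.FIELD_SCORE             // tie-break on relevance score
            });
            return searcher.search(query, sort);
        }
    }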

Problem indexing Word Documents

2007-11-26 Thread chris.b
okay, so i'm very new to lucene, so it may be my bad, but i can get it to index .txt files, and when trying to index word documents (using poi), the program starts running and when it reaches a .doc file, i get the following errors: Exception in thread "main" org.apache.poi.hpsf.IllegalPropertySe

Re: ApacheCon 2008 Europe - Lucene stuff

2007-11-26 Thread Grant Ingersoll
They have not announced the schedule yet. I know there were a number of Lucene related submissions and I would suspect Lucene will be well represented, as it was this year in Atlanta. -Grant On Nov 26, 2007, at 11:40 AM, Lukas Vlcek wrote: Hi, Is anybody going to present anything about L

Re: I have found a kind of strange behavior in StandardAnalyzer

2007-11-26 Thread Shai Erera
Hi I tried this code: TokenStream ts = analyzer.tokenStream("content", new StringReader(" www.abc.com")); Token t; while ((t = ts.next()) != null) { System.out.println(t); } If I pass "www.abc.com" (without an extra '.'), it prints (www.abc.com,0,11,typ
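[Editorial note: the snippet above wrapped into a runnable form for completeness; a minimal sketch, the field name "content" is simply the one used in the post.]

    import java.io.StringReader;
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.Token;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;

    public class PrintTokens {
        public static void main(String[] args) throws Exception {
            Analyzer analyzer = new StandardAnalyzer();
            TokenStream ts = analyzer.tokenStream("content",
                    new StringReader(" www.abc.com"));
            // Prints each token with its text, offsets, and type; per the post,
            // "www.abc.com" without an extra '.' comes back as a single token.
            Token t;
            while ((t = ts.next()) != null) {
                System.out.println(t);
            }
        }
    }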

ApacheCon 2008 Europe - Lucene stuff

2007-11-26 Thread Lukas Vlcek
Hi, Is anybody going to present anything about Lucene (and related technologies - Solr, Hadoop, ...) at ApacheCon 2008 Europe? Any training sessions, invited talks and/or a specific track? The conference pages (http://www.eu.apachecon.com/) do not contain any details yet. Regards, Lukas -- http:

Re: Lucene jdbc

2007-11-26 Thread Chris Lu
If you mean you want a JDBC driver to which you can send "select * from lucene_index1 where name like '%abc%'", this would require the JDBC driver to translate SQL into a Lucene query. An interesting idea, but I have never seen it before. The reason is that although they look similar, SQL and Lucene queries are qu
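[Editorial note: for a single LIKE clause the rough Lucene equivalent would be a wildcard query; a minimal sketch, the field name comes from the hypothetical SQL above, and a leading wildcard like this is legal when building the query programmatically but can be very slow.]

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.WildcardQuery;

    public class LikeToWildcard {
        // SQL:    ... where name like '%abc%'
        // Lucene: a wildcard query on the "name" field.
        public static Query nameLikeAbc() {
            return new WildcardQuery(new Term("name", "*abc*"));
        }
    }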

Re: how to increase the performance!!!

2007-11-26 Thread Erick Erickson
Do all your docs match your query? In that case, iterating over the Hits object is very inefficient (see the docs Grant points you to). Erick On Nov 26, 2007 10:04 AM, Shakti_Sareen <[EMAIL PROTECTED]> wrote: > Hi all, > > The size of the folder where I am keeping the index files is 160 MB > > c
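[Editorial note: if the goal is simply to walk every matching document, a HitCollector avoids the re-searching and document loading that Hits does behind the scenes; a minimal sketch, not code from the thread.]

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.lucene.search.HitCollector;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;

    public class CollectAllDocIds {
        // Collects the internal doc id of every match; stored fields can be loaded
        // later, and only for the documents that are actually needed.
        public static List collect(IndexSearcher searcher, Query query) throws Exception {
            final List docIds = new ArrayList();
            searcher.search(query, new HitCollector() {
                public void collect(int doc, float score) {
                    docIds.add(new Integer(doc));
                }
            });
            return docIds;
        }
    }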

Re: how to increase the performance!!!

2007-11-26 Thread Grant Ingersoll
Are you opening your Searcher every time you query? Are your documents really large? Have you worked through http://wiki.apache.org/lucene-java/BasicsOfPerformance ? -Grant On Nov 26, 2007, at 10:04 AM, Shakti_Sareen wrote: Hi all, The size of the folder where I am keeping the index fil
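[Editorial note: one of the points on that wiki page is to open the IndexSearcher once and reuse it across queries; a minimal sketch, assuming the index does not change between searches, with a made-up path.]

    import org.apache.lucene.search.IndexSearcher;

    public class SearcherHolder {
        // Opening an IndexSearcher is expensive; share one instance for all queries
        // and only reopen it when the index has actually been updated.
        private static IndexSearcher searcher;

        public static synchronized IndexSearcher get() throws Exception {
            if (searcher == null) {
                searcher = new IndexSearcher("/path/to/index");   // hypothetical path
            }
            return searcher;
        }
    }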

I have found a kind of strange behavior in StandardAnalyzer

2007-11-26 Thread Eugenio Martinez
I am indexing with Lucene a huge set of logfiles, about 130GB of plain text on disk (up to now), planning to build a system capable of performing searches over terabytes of such info in a kind of metaindex built from a mesh of little ones, all of them created and maintained with Lucene. I have ra

Re: Lucene jdbc

2007-11-26 Thread Marcelo Ochoa
Mike: If you work with Oracle databases you can take a look at the Oracle Lucene integration. http://www.infoq.com/news/2007/10/lucene-oracle http://issues.apache.org/jira/browse/LUCENE-724 By using OJVMDirectory you have Lucene integrated into the Oracle engine as a new Domain Index, so you can use

RE: how to increase the performance!!!

2007-11-26 Thread Shakti_Sareen
Hi all, The size of the folder where I am keeping the index files is 160 MB containing 3277 documents. That's not too much. If you are doing things right, search should not take much time. Below is the code : String sNumber = null; hits = searcher.search(query); for (int i = 0;i < hits

Re: Index: mixing the structure of persistence

2007-11-26 Thread Erick Erickson
Unfortunately, there's not much anyone can say. If I can paraphrase what you're asking, it's "Will a Lucene search be fast enough?" The answer is "it depends". Asking the question "is part of the index stored in RAM" isn't really relevant. Yes, some parts of the index are cached in RAM. Yes, that

Re: Index: mixing the structure of persistence

2007-11-26 Thread Haroldo Nascimento
Hi, I have a very large volume of data (6,000,000 documents) and I need very fast search. I am thinking about using Terracotta (with Lucene) to cluster the solution. One of the advantages of Terracotta is that part of the index is stored in memory and part is persisted on

Re: Lucene jdbc

2007-11-26 Thread Lukas Vlcek
AFAIK no. Lucene is a relevance-based query engine, not a relation-based engine like an SQL database. However, if you really want to use SQL on top of a Lucene index then there can be a way. You need to store the index in a database (see here

RE: how to increase the performance!!!

2007-11-26 Thread Chhabra, Kapil
Hi Shakti, > I am using Searching is taking a lot of time. What do you mean by a lot of time? How much time is it taking? There are a lot of factors that affect the search speed. > The size of the folder where I am keeping the index files is 160 MB containing 3277 documents. That's not too much.

how to increase the performance!!!

2007-11-26 Thread Shakti_Sareen
Hi all, I am using Searching is taking a lot of time. The size of the folder where I am keeping the index files is 160 MB containing 3277 documents. I am using... query = parser.parse("ANY"); hits = searcher.search(query); How can I improve the performance of searching? Regards Sh

Re: deleteDocuments by Term[] for ALL terms

2007-11-26 Thread Michael McCandless
You can just create a query with your and'd terms, and then do this: Weight weight = query.weight(indexSearcher); IndexReader reader = indexSearcher.getIndexReader(); Scorer scorer = weight.scorer(reader); int delCount = 0; while(scorer.next()) { reader.deleteDocument(scorer.doc());
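[Editorial note: the snippet above filled out into a runnable form; a minimal sketch, the field names, term values, and index path are made up.]

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.BooleanClause;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Scorer;
    import org.apache.lucene.search.TermQuery;
    import org.apache.lucene.search.Weight;

    public class DeleteByAndedTerms {
        public static void main(String[] args) throws Exception {
            IndexSearcher indexSearcher = new IndexSearcher("/path/to/index");
            IndexReader reader = indexSearcher.getIndexReader();

            // AND the terms together: a document is deleted only if it contains all of them.
            BooleanQuery query = new BooleanQuery();
            query.add(new TermQuery(new Term("type", "invoice")), BooleanClause.Occur.MUST);
            query.add(new TermQuery(new Term("status", "obsolete")), BooleanClause.Occur.MUST);

            Weight weight = query.weight(indexSearcher);
            Scorer scorer = weight.scorer(reader);
            int delCount = 0;
            while (scorer.next()) {
                reader.deleteDocument(scorer.doc());
                delCount++;
            }
            System.out.println("deleted " + delCount + " documents");
            indexSearcher.close();   // also closes the reader it opened, committing the deletes
        }
    }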

Re: How to delete old index

2007-11-26 Thread Michael McCandless
"Cool Coder" <[EMAIL PROTECTED]> wrote: > I tried with your suggestion but still it did not delete old index files. That's very odd. Are you sure you added that line after your first reader was closed & second one was opened? It's that first reader that prevents deleting of the old index files.