FieldCache memory estimation - term values are interned?

2010-04-30 Thread Koji Sekiguchi
Hello, Are the Strings obtained via FieldCache.DEFAULT.getStrings(reader, field) interned? Since I need FieldCaches for some fields in a 250M-document index, I'd like to estimate the memory consumed by FieldCache. Looking at the FieldCacheImpl source code, it seems that field name

RE: Modify TermQueries or Tokens

2010-04-30 Thread Zhang, Lisheng
Hi, It looks good to me, but I have not tested it. When testing, we can print both initialQuery.toString() (the query produced by QueryParser) and finalQuery.toString() (the query after your new function) for comparison, in addition to checking the query results. Best regards, Lisheng -Original Message- F

RE: Modify TermQueries or Tokens

2010-04-30 Thread Christopher Condit
> 2) if I have to accept whole input string with all logic (AND, OR, ..) inside, I think it is easier to change TermQuery afterwards than parsing the string, since final result from query parser should be BooleanQuery (in your example), then we iterate through each BooleanClause

RE: Modify TermQueries or Tokens

2010-04-30 Thread Zhang, Lisheng
Hi, In that case, I would do the following: 1) If I can somehow know the input words (like foo, nacho, ...), I will create the Lucene BooleanQuery myself; that's the case in my application. 2) If I have to accept the whole input string with all logic (AND, OR, ..) inside, I think it is easier to change TermQuer

Re: Relevancy Practices

2010-04-30 Thread MitchK
I found your thread on the solr-user list. However, it seems like your topic belongs more to Lucene in general? I am copying my posting from there, so that everything is accessible in one thread. -- I think the problems one has to

RE: Modify TermQueries or Tokens

2010-04-30 Thread Christopher Condit
Hi Lisheng- >> On a small index that I have I'd like to query certain fields by adding wildcards on either side of the term: foo -> *foo*. I realize the performance implications but there are some cases where these terms are crammed together in the indexed content (i.e. foonacho) and I

RE: Modify TermQueries or Tokens

2010-04-30 Thread Zhang, Lisheng
Hi, Just to make sure, below is the code I used to create the wildcard query: String field = "title"; String value = "mytitle"; Term term = new Term(field, "*" + value + "*"); WildcardQuery wildcardQuery = new WildcardQuery(term); I tested it in 2.4.1 and it worked well for me. Best regards, Lisheng

RE: Modify TermQueries or Tokens

2010-04-30 Thread Zhang, Lisheng
Hi, Lucene already has a WildcardQuery class; I think you can add "*" on either side (or both) when creating the Term: http://lucene.apache.org/java/3_0_1/api/core/index.html But note that by default QueryParser cannot parse *queryString. Best regards, Lisheng -Original Message- From: Christo

Modify TermQueries or Tokens

2010-04-30 Thread Christopher Condit
On a small index that I have I'd like to query certain fields by adding wildcards on either side of the term: foo -> *foo*. I realize the performance implications but there are some cases where these terms are crammed together in the indexed content (i.e. foonacho) and I need to be able to return

[OT] Lucene Boot Camp Training in Europe

2010-04-30 Thread Grant Ingersoll
I will be once again providing Lucene training in Europe this year as part of Lucene EuroCon (in place of the usual ApacheCon venue). This time it is in the beautiful city of Prague starting on May 18th. Registration is open. For more info, check out http://lucene-eurocon.org/training.html C

Re: Relevancy Practices

2010-04-30 Thread Grant Ingersoll
On Apr 30, 2010, at 8:00 AM, Avi Rosenschein wrote: > Also, tuning the algorithms to the users can be very important. For > instance, we have found that in a basic search functionality, the default > query parser operator OR works very well. But on a page for advanced users, > who want to very pre

Re: Using lucene in NFS

2010-04-30 Thread Ian Lea
The suggestion was that your single indexing job should update a local copy of the index and copy that to NFS for searching by other nodes. That should work. As for updating, you could index new reports into a new Lucene index and then merge that into the existing index (IndexWriter.addIndexes()).

RE: Lucene QueryParser and Analyzer

2010-04-30 Thread Sudarsan, Sithu D.
Quick fix: Create a filter to replace commas with white space and then run your code. Sincerely, Sithu D Sudarsan -Original Message- From: Wei Ho [mailto:we...@princeton.edu] Sent: Thursday, April 29, 2010 7:01 PM To: java-user@lucene.apache.org Subject: Re: Lucene QueryParser and

Re: Using lucene in NFS

2010-04-30 Thread Vijay Veeraraghavan
Hi Ian, Thanks for your reply. I am using Lucene core version 3.0. The index created will be accessed by the web application, which runs as a cluster of 4 nodes. What if all the nodes access the index? I don't think any problem would arise. If I have a local index then what about it in the

Re: Relevancy Practices

2010-04-30 Thread Avi Rosenschein
On Thu, Apr 29, 2010 at 5:59 PM, Mark Bennett wrote: > Hi Grant, > > You're welcome to use any of my slides (Dave's got them), with attribution > of course. > > BUT > > Have you considered a section something like "why the hell do you think > Relevancy tweaking is gonna save you!?!?" > Basi

Re: Using lucene in NFS

2010-04-30 Thread Ian Lea
You don't say what version of Lucene you are using, but in recent versions you may need to use SimpleFSLockFactory rather than the default, NativeFSLockFactory. See the javadocs. Lucene in general does work on NFS but there can be problems, particularly with concurrent access from multiple server

RE: IOExceptions when optimising the index

2010-04-30 Thread Uwe Schindler
Since Lucene 2.9 switched to per-segment search, every query runs separately on each segment of an index and the results are combined. There is no difference between an optimized and an unoptimized index for this process. Furthermore, if you sort by fields, you should not optimize at all, as the FieldCa

Re: IOExceptions when optimising the index

2010-04-30 Thread Anna Hunecke
Hi Ian, thanks for the answer. I also assumed something like this. Telling the customer to switch to unix is not an option, so I'll try to solve the problem by scheduling the optimization to occur at some other time. Can you explain a bit more why you think optimization is not necessary? As far