Re: If you could have one feature in Lucene...

2010-02-24 Thread Ganesh
1. A payload per document that could be updated without having to update the entire document. Use case: the state of our indexed content changes based on user actions (Created/Viewed/Deleted, etc.), and we are using Lucene as our database and we cannot use relational database only for this

Fuzzy membership of a term to the document

2010-02-24 Thread PlusPlus
Hi, I want to change Lucene's similarity so that I can add fuzzy memberships to the terms of a document. Thus, the TF value of a term in a document is not always 1; it can add 0.7 to the value of the TF (in my application, each term is contained in a document at most once). This memb

Re: NAS vs SAN vs Server Disk RAID

2010-02-24 Thread Kay Kay
It might be useful to check out Katta, from an infrastructure perspective. On 2/24/10 3:54 PM, Andrew Bruno wrote: Hello, I am working with an application that offers its customers their own index, primarily two indexes for different needs per customer. As our business is growing and growing,

NAS vs SAN vs Server Disk RAID

2010-02-24 Thread Andrew Bruno
Hello, I am working with an application that offers its customers their own index, primarily two indexes for different needs per customer. As our business is growing and growing, I now have a situation where the web application has its customer's index on one volume, and it's getting close to 1 TB

Not getting any highlighter results when using wildcards

2010-02-24 Thread Woolf, Ross
When I use a WildcardQuery with the highlighter, I don't get any fragments back; null is returned to strBetText. If I just use a term query, it works. TokenStream tokenStream = TokenSources.getTokenStream(indexReader, docId, strFieldName); QueryScorer scorer = new QueryScorer(query, s

Seattle Hadoop/Scalability/NoSQL Meetup Tonight!

2010-02-24 Thread Bradford Stephens
The Seattle Hadoop/Scalability/NoSQL (yeah, we vary the title) meetup is tonight! We're going to have a guest speaker from MongoDB :) As always, it's at the University of Washington, Allen Computer Science building, Room 303 at 6:45pm. You can find a map here: http://www.washington.edu/home/maps/s

RE: Phrase Search and NOT_ANALYZED

2010-02-24 Thread Murdoch, Paul
PhraseQuery appears to be working. Thanks to all. Paul

RE: Phrase Search and NOT_ANALYZED

2010-02-24 Thread Murdoch, Paul
Thanks, I've been looking at that one too. I'm trying to make it happen with the StandardAnalyzer. Unfortunately, I think I see some redesign for more robustness in the future. Cheers, Paul

Re: Phrase Search and NOT_ANALYZED

2010-02-24 Thread Robert Muir
check out KeywordAnalyzer! On Wed, Feb 24, 2010 at 4:51 PM, Murdoch, Paul wrote: > It still happens if there are no stop words in the fieldValue. For > instance if fieldValue was "paul murdoch", Luke would show the query as > name:"paul murdoch" but no hits are returned. If I change to > Field.I

RE: Phrase Search and NOT_ANALYZED

2010-02-24 Thread Murdoch, Paul
It still fails even when there are no stop words. I'm going to try a PhraseQuery instead of relying on the QueryParser. Regards, Paul

RE: Phrase Search and NOT_ANALYZED

2010-02-24 Thread Murdoch, Paul
It still happens if there are no stop words in the fieldValue. For instance if fieldValue was "paul murdoch", Luke would show the query as name:"paul murdoch" but no hits are returned. If I change to Field.Index.ANALYZED it works. The problem with ANALYZED is that there is a possibility of pickin

RE: Phrase Search and NOT_ANALYZED

2010-02-24 Thread Digy
Since it is not analyzed, your text is stored as a single term in the index: [something in the index]. But the query name:"something in the index" is translated as: find 4 consecutive terms which have the values "something", "in", "the" and "index" respectively. or if stop words are removed
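
The mismatch described above can be imitated without Lucene. The following is only a plain-Java sketch under stated assumptions: the class and method names are hypothetical, the "analysis" is a crude lowercase-and-split stand-in rather than StandardAnalyzer, and the phrase check ignores term positions. It shows why a phrase made of several analyzed tokens cannot find a value that was indexed as one whole term, while an exact lookup of the whole value can.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; none of these names are Lucene APIs.
final class NotAnalyzedSketch {

    /** Terms for a NOT_ANALYZED-style field: the whole value becomes one term. */
    static Set<String> indexAsSingleTerm(String value) {
        Set<String> terms = new HashSet<>();
        terms.add(value);
        return terms;
    }

    /** Crude stand-in for query-time analysis: lowercase, then split on whitespace. */
    static List<String> analyzeQuery(String phrase) {
        return Arrays.asList(phrase.toLowerCase().trim().split("\\s+"));
    }

    /** Necessary condition for a phrase hit: every query token exists as an indexed term. */
    static boolean phraseCanMatch(Set<String> indexedTerms, String phrase) {
        return indexedTerms.containsAll(analyzeQuery(phrase));
    }

    /** Exact lookup of the raw value, analogous in spirit to a single-term query. */
    static boolean termMatches(Set<String> indexedTerms, String term) {
        return indexedTerms.contains(term);
    }

    public static void main(String[] args) {
        Set<String> terms = indexAsSingleTerm("something in the index");
        System.out.println(phraseCanMatch(terms, "something in the index"));
        System.out.println(termMatches(terms, "something in the index"));
    }
}
```

In Lucene itself, the corresponding options raised in this thread are to query the whole value as one term (for example with KeywordAnalyzer at query time) or to index the field as ANALYZED.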

Re: boosts for unstemmed matches (was Re: If you could have one feature in Lucene...)

2010-02-24 Thread Avi Rosenschein
On Wed, Feb 24, 2010 at 11:20 PM, Aaron Lav wrote: > On Wed, Feb 24, 2010 at 10:18:27PM +0200, Avi Rosenschein wrote: > > On Wed, Feb 24, 2010 at 3:42 PM, Grant Ingersoll >wrote: > > > > > What would it be? > > > > > > > For scoring to take into account the non-analyzed token stream. > > > > Tha

Re: Phrase Search and NOT_ANALYZED

2010-02-24 Thread Erick Erickson
What does Luke's explain show you? That'll show you a lot about how the query gets transformed. My first guess is that stop words are messing you up. Erick On Wed, Feb 24, 2010 at 3:51 PM, Murdoch, Paul wrote: > Hi, > > I'm indexing a field using the StandardAnalyzer 2.9. > > fi

Re: If you could have one feature in Lucene...

2010-02-24 Thread Paul Libbrecht
I would wish for a highlighting feature that's fully integrated. paul On 24 Feb 2010, at 14:42, Grant Ingersoll wrote: What would it be? - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands,

boosts for unstemmed matches (was Re: If you could have one feature in Lucene...)

2010-02-24 Thread Aaron Lav
On Wed, Feb 24, 2010 at 10:18:27PM +0200, Avi Rosenschein wrote: > On Wed, Feb 24, 2010 at 3:42 PM, Grant Ingersoll wrote: > > > What would it be? > > > > For scoring to take into account the non-analyzed token stream. > > That is, if a field is analyzed (stemmed, lowercased, maybe even stop wor

Phrase Search and NOT_ANALYZED

2010-02-24 Thread Murdoch, Paul
Hi, I'm indexing a field using the StandardAnalyzer 2.9. field = new Field(fieldName, fieldValue, Field.Store.YES, Field.Index.NOT_ANALYZED); Let's say fieldName is "name" and fieldValue is "something in the index". When I perform the query... name:"something in the index" ...

Re: If you could have one feature in Lucene...

2010-02-24 Thread Michael van Rooyen
On 2010/02/24 03:42 PM, Grant Ingersoll wrote: What would it be? Stop words counting when i

Re: If you could have one feature in Lucene...

2010-02-24 Thread Marcelo Ochoa
> What would it be? An extended query parser syntax (http://lucene.apache.org/java/2_9_1/queryparsersyntax.html) including geo-location search. For example: hsin (great circle): name:Minneapolis AND _val_:"recip(hsin(0.78, -1.6, lat_rad, lon_rad, 3963.205), 1, 1
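
For readers unfamiliar with the great-circle (haversine) distance that hsin refers to, here is a minimal plain-Java sketch of the formula. The class name and method are hypothetical, and the parameter order (two points in radians, then the sphere radius) merely mirrors the order in which the arguments appear in the snippet above; it is not a claim about any real query-parser function signature. With a radius of 3963.205 the result is in miles.

```java
// Hypothetical illustration of the haversine formula; not a Lucene or Solr API.
final class HaversineSketch {

    /**
     * Great-circle distance between two points given in radians,
     * on a sphere of the given radius (result is in the radius' unit).
     */
    static double haversine(double lat1, double lon1,
                            double lat2, double lon2, double radius) {
        double dLat = lat2 - lat1;
        double dLon = lon2 - lon1;
        double a = Math.pow(Math.sin(dLat / 2), 2)
                 + Math.cos(lat1) * Math.cos(lat2) * Math.pow(Math.sin(dLon / 2), 2);
        // Clamp to 1.0 to guard against rounding pushing sqrt(a) slightly above 1.
        return 2 * radius * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    public static void main(String[] args) {
        // Distance from the snippet's reference point (0.78, -1.6) to an arbitrary nearby point.
        System.out.println(haversine(0.78, -1.6, 0.79, -1.61, 3963.205));
    }
}
```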

Re: If you could have one feature in Lucene...

2010-02-24 Thread Avi Rosenschein
On Wed, Feb 24, 2010 at 3:42 PM, Grant Ingersoll wrote: > What would it be? > For scoring to take into account the non-analyzed token stream. That is, if a field is analyzed (stemmed, lowercased, maybe even stop words removed), that is fine for indexing. But tokens in the query matching the orig

Re: StandardAnalyzer and comma

2010-02-24 Thread Erick Erickson
It sounds to me like you'll have to pre-process your text, then use something like KeywordAnalyzer. The idea here is to do something like lowercase the strings (both index and query), and remove all non-letter (or whatever) characters, normalize whitespace (e.g. remove leading and trailing, turn al
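
The pre-processing idea above could be sketched roughly as follows. This is a minimal plain-Java illustration with a hypothetical class name; it assumes "non-letter" means anything that is not a Unicode letter, and it replaces such characters with a space (rather than deleting them) so that adjacent words do not fuse. The same function would have to be applied to both the indexed values and the query strings.

```java
import java.util.Locale;

// Hypothetical helper; apply it to both indexed text and query text.
final class KeywordNormalizer {

    /** Lowercase, turn non-letter characters into spaces, collapse whitespace, trim. */
    static String normalize(String raw) {
        String lowered = raw.toLowerCase(Locale.ROOT);
        // Anything that is neither a letter nor whitespace becomes a single space.
        String lettersOnly = lowered.replaceAll("[^\\p{L}\\s]", " ");
        // Collapse runs of whitespace and remove leading/trailing whitespace.
        return lettersOnly.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        System.out.println(normalize("  groupC,   night "));
    }
}
```

Because the normalized value is a single string, it pairs naturally with an analyzer that keeps the whole value as one token, such as the KeywordAnalyzer mentioned above.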

RE: If you could have one feature in Lucene...

2010-02-24 Thread Yuval Feinstein
A pluggable scoring model that can incorporate BM25, TF/IDF and other variants of scoring.

Re: If you could have one feature in Lucene...

2010-02-24 Thread Simon Wistow
On Wed, Feb 24, 2010 at 08:42:02AM -0500, Grant Ingersoll said: > What would it be? Adding, deleting and updating of individual fields in a document.

Re: StandardAnalyzer and comma

2010-02-24 Thread Max Lynch
> > I tried the WhitespaceAnalyzer and liked the way the comma (among other > punctuation) was preserved. I'm running tests with that right now. > Unfortunately, if I want to look for "groupC" I have to append the comma > which won't make sense to a user. Also the query choice:"groupC, night" > d

RE: StandardAnalyzer and comma

2010-02-24 Thread Murdoch, Paul
I manually change all indexed and searched content to lowercase. The whole groupC thing was just for the example...sorry. My main problem is with the comma and whitespace. I would like to query for "night" and only get the one hit. The only reason changing StandardAnalyzer "may" :-) not be an o

Re: StandardAnalyzer and comma

2010-02-24 Thread Erick Erickson
OK, I'm confused. In your original message, you said that changing analyzers is NOT an option. Then you said you'll give WhitespaceAnalyzer a shot. Assuming your original constraint is accurate, why isn't changing analyzers an option? Are you aware of PerFieldAnalyzerWrapper, which allows you to

RE: StandardAnalyzer and comma

2010-02-24 Thread Murdoch, Paul
Thanks for the input. I'll give the WhitespaceAnalyzer a shot. Also, AFAIK, Field.Index.NOT_ANALYZED means that the content you index is not split into separate tokens so it is searchable, but only for exact matches. I may be able to get what I want with the WhitespaceAnalyzer and Field.Index.NO

Re: If you could have one feature in Lucene...

2010-02-24 Thread Chris Lu
2 features: Search and serializable Query class in Java serializable object format, or XML, or JSON format. -- Chris Lu - Instant Scalable Full-Text Search On Any Database/Application site: http://www.dbsight.net demo: http://search.dbsight.com Lucene Database Search

Re: StandardAnalyzer and comma

2010-02-24 Thread Max Lynch
Personally punctuation matters in my queries so I use WhitespaceAnalyzer. I also only want exact hits, so that analyzer works well for me. Also, AFAIK you don't set NOT_ANALYZED if you want to search through it. On Wed, Feb 24, 2010 at 10:33 AM, Murdoch, Paul wrote: > I'm using Lucene 2.9. How

StandardAnalyzer and comma

2010-02-24 Thread Murdoch, Paul
I'm using Lucene 2.9. How do I make a comma behave like a regular character using the StandardAnalyzer? Example: I have a field called "choice" and some field values: "groupA, morning"; "groupB, noon"; "groupC, night"; "morning"; "noon"; "night". So a query choice:night returns "groupC, night" an
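
To make the comma behaviour concrete, here is a small plain-Java sketch comparing two ways of splitting a field value. It is an imitation under stated assumptions, not Lucene's tokenizers: the class and method names are hypothetical, whitespace splitting stands in for a whitespace-style analyzer, and splitting on non-alphanumeric characters is only a rough approximation of an analyzer that discards punctuation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; these are not Lucene tokenizers.
final class CommaTokenSketch {

    /** Whitespace-only splitting keeps punctuation attached to the token. */
    static List<String> splitOnWhitespace(String value) {
        return Arrays.asList(value.trim().split("\\s+"));
    }

    /** Splitting on anything that is not a letter or digit drops the comma. */
    static List<String> splitOnNonAlphanumerics(String value) {
        List<String> tokens = new ArrayList<>();
        for (String token : value.split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(splitOnWhitespace("groupC, night"));
        System.out.println(splitOnNonAlphanumerics("groupC, night"));
    }
}
```

With punctuation dropped, the value "groupC, night" contributes a bare "night" token, which is why a query for night also finds it; with whitespace splitting, the first token keeps its trailing comma.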

If you could have one feature in Lucene...

2010-02-24 Thread Grant Ingersoll
What would it be?

Highlighting large documents (Lucene 3.0.0)

2010-02-24 Thread -Arne-
Hi, I'm using Lucene 3.0.0 and have large documents to search (log files, 0.5-20 MB). For better search results, the query tokens get wildcards on the left and right: a search for "user" becomes "*user*". The performance of searching, even for complex queries with more than one search term, is quite good. But h

Re: FastVectorHighlighter truncated queries

2010-02-24 Thread Koji Sekiguchi
halbtuerderschwarze wrote: query.rewrite() didn't help, for queries like ipod* or *ipod I still didn't get fragments. Arne You're right. This is still an open issue: https://issues.apache.org/jira/browse/LUCENE-1889 Koji -- http://www.rondhuit.com/en/ --

Re: FastVectorHighlighter truncated queries

2010-02-24 Thread halbtuerderschwarze
query.rewrite() didn't help, for queries like ipod* or *ipod I still didn't get fragments. Arne chrislusf wrote: > > This should be a common wildcard query highlighting problem. > You will need to query.rewrite() first, and pass the result to the > highlighter. > > -- > Chris Lu > -