Re: Seattle / PNW Hadoop + Lucene User Group?

2009-04-17 Thread Amin Mohammed-Coleman
I would love to come but I'm afraid I'm stuck in rainy old England :( Amin On 18 Apr 2009, at 01:08, Bradford Stephens wrote: OK, we've got 3 people... that's enough for a party? :) Surely there must be dozens more of you guys out there... c'mon, accelerate your knowledge! Join us in Seat

RE: IndexWriter update method

2009-04-17 Thread Newman, Billy
Perfect explanation, I think I have the idea now. Thanks so much! I would also like to test out the update with a term that does not have any matches to see if it will do an insert, as that would make the code much simpler and more efficient. From the documentation an update is a delete followed by

Re: IndexWriter update method

2009-04-17 Thread Erick Erickson
What you're missing is that the example has no unique ID, it wasn't created with update in mind. There's no hidden magic for Lucene knowing *what* document you want to have updated, you have to provide it yourself, and it should be unique. Imagine a parts catalog, or an index of a directory tree.
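
To make the unique-ID point concrete, here is a minimal sketch in plain Java. It is a toy in-memory model, not the Lucene API: the class `UpdateSketch`, its methods, and the `id` field are all made up for illustration. It only mimics the "delete by term, then add" behaviour described in this thread, including the case where nothing matches and the update acts as a plain insert.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy in-memory stand-in for an index. Hypothetical; NOT the Lucene API. */
public class UpdateSketch {
    private final List<Map<String, String>> docs = new ArrayList<>();

    /** Helper to build a document from alternating field/value pairs. */
    public static Map<String, String> doc(String... fieldValuePairs) {
        Map<String, String> d = new HashMap<>();
        for (int i = 0; i + 1 < fieldValuePairs.length; i += 2) {
            d.put(fieldValuePairs[i], fieldValuePairs[i + 1]);
        }
        return d;
    }

    public void addDocument(Map<String, String> document) {
        docs.add(new HashMap<>(document));
    }

    /** Delete every document whose field equals value, then add the replacement. */
    public void updateDocument(String field, String value, Map<String, String> replacement) {
        docs.removeIf(d -> value.equals(d.get(field)));
        docs.add(new HashMap<>(replacement));
    }

    public int size() {
        return docs.size();
    }

    /** Look up one field of the first document whose id field matches, or null. */
    public String lookup(String id, String field) {
        for (Map<String, String> d : docs) {
            if (id.equals(d.get("id"))) {
                return d.get(field);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        UpdateSketch index = new UpdateSketch();
        index.addDocument(doc("id", "part-42", "comments", "first draft"));

        // The "term" is the pair (field, value); it must identify exactly one document.
        index.updateDocument("id", "part-42", doc("id", "part-42", "comments", "revised"));

        // No document has id part-99, so nothing is deleted and this is just an add.
        index.updateDocument("id", "part-99", doc("id", "part-99", "comments", "new part"));

        System.out.println(index.size() + " docs, part-42 -> " + index.lookup("part-42", "comments"));
    }
}
```

If the term matched several documents (say, a non-unique field), all of them would be removed and only one replacement added, which is why the identifying field should be unique.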

RE: IndexWriter update method

2009-04-17 Thread Newman, Billy
Ok I am still confused. Looking at the examples to index a document I would do something like the following: Document document = new Document(); document.add(Field.UnStored("article", article)); document.add(Field.Text("comments", comments)); Analyzer analyzer = n

Re: Seattle / PNW Hadoop + Lucene User Group?

2009-04-17 Thread Bradford Stephens
OK, we've got 3 people... that's enough for a party? :) Surely there must be dozens more of you guys out there... c'mon, accelerate your knowledge! Join us in Seattle! On Thu, Apr 16, 2009 at 3:27 PM, Bradford Stephens wrote: > Greetings, > > Would anybody be willing to join a PNW Hadoop and/o

Re: IndexWriter update method

2009-04-17 Thread Tim Williams
On Fri, Apr 17, 2009 at 7:27 PM, Newman, Billy wrote: > I am looking for info on how to use the IndexWriter.update method.  A short > example of how to add a document and then later update would > be very helpful.  I get lost because I can add a document with just the > document, but I need a do

Why is CustomScoreQuery limited to ValueSourceQuery type?

2009-04-17 Thread Steven Bethard
CustomScoreQuery only allows the secondary queries to be of type ValueSourceQuery instead of allowing them to be any type of Query. Why is that? Is there something that makes it hard to implement for arbitrary queries? Steve P.S. I played around with this briefly, and simply replacing all ValueSo

Re: Google's search Appliance relevance ranking

2009-04-17 Thread John Wang
From the little I know about GSA, there isn't a distributed solution (old information, not sure if it is still the case), so it is not very easy to scale your search system, something you can achieve rather easily with a Lucene/Solr implementation. There are other benefits of using an open source solution s

IndexWriter update method

2009-04-17 Thread Newman, Billy
I am looking for info on how to use the IndexWriter.update method. A short example of how to add a document and then later update would be very helpful. I get lost because I can add a document with just the document, but I need a document and a Term. I am not really sure what a Term is since

sub-scores for all clauses in a BooleanQuery

2009-04-17 Thread Steven Bethard
I have a BooleanQuery with several clauses. After running a search, in addition to seeing the overall score of each document, I need to see the sub-score produced by each clause. When all clauses match, this is relatively easy to get back by ".explain(...)", which gives me something like this: 0.3

RE: Need help : SpanNearQuery

2009-04-17 Thread Steven A Rowe
Hi Radha, On 4/17/2009 at 6:19 AM, Radhalakshmi Sreedharan wrote: > What I need is the following : > If my document field is ( ab,bc,cd,ef) and Search tokens are > (ab,bc,cd). > > Given the following : > I should get a hit even if all of the search tokens aren't present > If the tokens are f

RE: Need help : SpanNearQuery

2009-04-17 Thread Steven A Rowe
On 4/17/2009 at 10:33 AM, Radhalakshmi Sreedharan wrote: > > > I have a question related to SpanNearQuery. > > > > > > As of now, the SpanNearQuery has the constraint that all the > > > terms need to present in the document. [...] > > > But [...] I need a hit even if there are 2/3 terms found with

Re: A Challenge!: Combining 2 searches into a single resultset?

2009-04-17 Thread Matthew Hall
Erm, I likely should have mentioned that this technique requires the use of a MultiFieldQueryParser. Matt Matthew Hall wrote: If you can build an analyzer that tokenizes the second field so that it filters out the words you don't want, you can then take advantage of more intelligent queries a

Re: A Challenge!: Combining 2 searches into a single resultset?

2009-04-17 Thread Matthew Hall
If you can build an analyzer that tokenizes the second field so that it filters out the words you don't want, you can then take advantage of more intelligent queries as well. So for the example that pjaol wrote, the query would become something like this: Query= body:(game OR redskins) keyw

Re: Need help : SpanNearQuery

2009-04-17 Thread Paul Elschot
On Friday 17 April 2009 16:33:27 Radhalakshmi Sreedharan wrote: > Thanks Paul. Is there any alternative way of implementing this requirement? Start from scratch perhaps? Anyway, spans can be really tricky, so in case you're writing code for this, I have only four pieces of advice: test, test, test and test

Re: Google's search Appliance relevance ranking

2009-04-17 Thread Grant Ingersoll
On Apr 16, 2009, at 10:22 AM, Vasudevan Comandur wrote: Hi, The question that I am posting in this group may be inappropriate and I want to apologize for that. I wouldn't say it's inappropriate, but I don't know if anyone here could say with certainty b/c the last time I checked GSA w

Re: A Challenge!: Combining 2 searches into a single resultset?

2009-04-17 Thread theDude_2
Ah, interesting... I didn't think of that! I will try it and report back. pjaol wrote: > > Why not put the keywords into the same document as another field? and > search > both fields > at once, you can then use lucene syntax to give a boosting to the keyword > fields. > e.g. > body:A good game

RE: Is it possible to add new document into existing lucene index?

2009-04-17 Thread daniel susanto
Thx, it works. :) Daniel Susanto http://susantodaniel.wordpress.com --- On Fri, 4/17/09, Uwe Schindler wrote: From: Uwe Schindler Subject: RE: Is it possible to add new document into existing lucene index? To: java-user@lucene.apache.org Date: Friday, April 17, 2009, 9:18 PM Hi Daniel, Just

Re: A Challenge!: Combining 2 searches into a single resultset?

2009-04-17 Thread patrick o'leary
Why not put the keywords into the same document as another field? and search both fields at once, you can then use lucene syntax to give a boosting to the keyword fields. e.g. body:A good game last night by the redskins keyword: redskins Query= body:(game OR redskins) keyword:(game OR redskins)^10
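
A small sketch of how the boosted two-field query string above could be assembled. The class and method names (`BoostedQuerySketch`, `orGroup`, `build`) are hypothetical, and it only builds the string; handing the result to a query parser is left out. It also assumes the terms contain no characters that would need escaping in query syntax.

```java
import java.util.Arrays;
import java.util.List;

/** Builds the two-field query string from this message. Hypothetical helper, not a Lucene class. */
public class BoostedQuerySketch {

    /** Renders field:(t1 OR t2 ...) for one field. */
    public static String orGroup(String field, List<String> terms) {
        return field + ":(" + String.join(" OR ", terms) + ")";
    }

    /** Same terms against body and keyword, with the keyword clause boosted. */
    public static String build(List<String> terms, int keywordBoost) {
        return orGroup("body", terms) + " " + orGroup("keyword", terms) + "^" + keywordBoost;
    }

    public static void main(String[] args) {
        // prints: body:(game OR redskins) keyword:(game OR redskins)^10
        System.out.println(build(Arrays.asList("game", "redskins"), 10));
    }
}
```

The boost factor of 10 is just the value from the example; how much weight the keyword field deserves is something to tune against real results.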

Re: A Challenge!: Combining 2 searches into a single resultset?

2009-04-17 Thread theDude_2
*Edit: each indexed text document contains a related field for identification purposes, so I would be able to identify the scores for both indexes through this field* theDude_2 wrote: > > I appreciate your response, and read the wiki article concerning the > Federated search > and > > I'm not

Faceting, Sort and DocIDSet

2009-04-17 Thread David Seltzer
I'm sorry if this question touches on too many things at once, but I'm having problems putting some ideas together - hopefully someone can help! I have a set of indexes, each index contains a month's worth of Articles. I need to be able to search the index (sorting by date) and then apply access-

Re: A Challenge!: Combining 2 searches into a single resultset?

2009-04-17 Thread theDude_2
I appreciate your response, and read the wiki article concerning the Federated search and I'm not sure that my project falls into the "Federated Search" bucket... What I've done is create 2 indexes from the same documents. One index contains the full documents - great for pure relevanc

Re: A Challenge!: Combining 2 searches into a single resultset?

2009-04-17 Thread patrick o'leary
I'd start by doing some research on the question rather than asking for a solution. What you're asking for can be considered 'Federated Search' http://en.wikipedia.org/wiki/Federated_search And it can be conceived in as many ways as you have document types. Any answer will probably end up customize

Re: A Challenge!: Combining 2 searches into a single resultset?

2009-04-17 Thread theDude_2
(bump) - any thoughts? theDude_2 wrote: > > hi! > > I am trying to do something a little unique... > > I have a 90k text documents that I am trying to search > Search A: indexes and searches the documents using regular relevancy > search > Search B: indexes and searches the documents us

RE: Need help : SpanNearQuery

2009-04-17 Thread Radhalakshmi Sreedharan
Thanks Paul. Is there any alternative way of implementing this requirement? As a side note, Will the Shingle Filter help me getting all possible combination of the input tokens? -Original Message- From: Paul Elschot [mailto:paul.elsc...@xs4all.nl] Sent: Friday, April 17, 2009 8:00 PM To

Re: Need help : SpanNearQuery

2009-04-17 Thread Paul Elschot
To avoid passing all combinations to a NearSpansQuery some non trivial changes would be needed in the spans package. NearSpansUnOrdered (and maybe also NearSpansOrdered) would have to be extended to provide matching Spans when (the Spans of) not all terms/subqueries match. Also, quite likely, it

RE: Is it possible to add new document into existing lucene index?

2009-04-17 Thread Uwe Schindler
Hi Daniel, Just open the IndexWriter on the same Lucene Directory and specify the boolean ctor parameter "create" to false. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: daniel susanto [mailto:daniel_s

Is it possible to add new document into existing lucene index?

2009-04-17 Thread daniel susanto
Hi all, I'm a newbie in Lucene. Is it possible to add a new document to an existing index? I think it is important, so that we don't need to re-index everything in the file folder when just one or two files need to be added to the index. Thx. Daniel Susanto http://susantodaniel.wordpress.com

Re: Query scoring

2009-04-17 Thread Erick Erickson
Well, let's see the results of toString and/or Explain *from your code*. Otherwise, you haven't given us much to go on. Best Erick On Fri, Apr 17, 2009 at 1:07 AM, liat oren wrote: > Thanks for the answer. > > In Luke, I used the WhiteSpaceAnalyzer as well. The scores AND the explain > method w

RE: Need help : SpanNearQuery

2009-04-17 Thread Radhalakshmi Sreedharan
To make the question simple, What I need is the following : If my document field is ( ab,bc,cd,ef) and Search tokens are (ab,bc,cd). Given the following : I should get a hit even if all of the search tokens aren't present If the tokens are found they should be found within a distance x of ea

Re: London meet-up - 27th April

2009-04-17 Thread Richard Marr
Just a reminder that this London meet-up is on Monday the 27th. Please sign up or otherwise let me know so I can make sure there's enough space booked. Rich 2009/4/6 Richard Marr : > Hi all, > > Just to let everyone know... I'm organising (if you can call it that) > an informal London meet-up in

Re: readModifiedUTF8String stuck

2009-04-17 Thread Michael McCandless
On Fri, Apr 17, 2009 at 5:05 AM, MakMak wrote: > I am not retrieving many docs, the problem is that the whole file is stored > in the doc. I need the file content for highlighter to work. But the files > are normal-sized text files which in any case should not exceed 10-15mb. > Retrieving 25 of t

Re: Best way for paging with TopDocs class?

2009-04-17 Thread Michael McCandless
Actually, HitCollector itself isn't a performance killer (eg, at the end of the day, all searches inside Lucene are using some HitCollector to gather results). What is a performance killer is if you do something overly substantial (eg, calling IndexReader.document(...)) with every hit passed to th
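
To illustrate the point about keeping per-hit work cheap, here is a rough sketch in plain Java. Nothing in it is Lucene code: `PagingSketch`, `Hit`, and `TopHits` are invented names. The idea, under that assumption, is that each hit costs only a small heap operation, and the expensive step (loading stored documents) would happen afterwards, only for the doc ids on the requested page.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Toy top-N collector with paging. Hypothetical names; NOT the Lucene API. */
public class PagingSketch {

    /** A (docId, score) pair. */
    static final class Hit {
        final int docId;
        final float score;

        Hit(int docId, float score) {
            this.docId = docId;
            this.score = score;
        }
    }

    /** Keeps only the best `capacity` hits seen so far. */
    static final class TopHits {
        private final int capacity;
        // Min-heap on score: the weakest retained hit sits at the head.
        private final PriorityQueue<Hit> heap =
                new PriorityQueue<Hit>((a, b) -> Float.compare(a.score, b.score));

        TopHits(int capacity) {
            if (capacity <= 0) {
                throw new IllegalArgumentException("capacity must be positive");
            }
            this.capacity = capacity;
        }

        /** Called once per matching document; does no document loading. */
        void collect(int docId, float score) {
            if (heap.size() < capacity) {
                heap.add(new Hit(docId, score));
            } else if (score > heap.peek().score) {
                heap.poll();
                heap.add(new Hit(docId, score));
            }
        }

        /** Doc ids for one zero-based page, best score first. */
        List<Integer> page(int pageIndex, int pageSize) {
            List<Hit> sorted = new ArrayList<>(heap);
            sorted.sort((a, b) -> Float.compare(b.score, a.score));
            List<Integer> ids = new ArrayList<>();
            int from = pageIndex * pageSize;
            int to = Math.min(from + pageSize, sorted.size());
            for (int i = from; i < to; i++) {
                ids.add(sorted.get(i).docId);
            }
            return ids;
        }
    }

    public static void main(String[] args) {
        // To serve page p of size s, retain (p + 1) * s hits.
        TopHits top = new TopHits(4);
        float[] scores = {0.1f, 0.9f, 0.5f, 0.7f, 0.3f, 0.8f};
        for (int docId = 0; docId < scores.length; docId++) {
            top.collect(docId, scores[docId]);
        }
        // Only these ids would then be loaded from the index.
        System.out.println("page 0: " + top.page(0, 2)); // page 0: [1, 5]
        System.out.println("page 1: " + top.page(1, 2)); // page 1: [3, 2]
    }
}
```

Deep pages need a larger heap, so this trades a little memory for avoiding any per-hit document access.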

Re: readModifiedUTF8String stuck

2009-04-17 Thread MakMak
I am not retrieving many docs, the problem is that the whole file is stored in the doc. I need the file content for highlighter to work. But the files are normal-sized text files which in any case should not exceed 10-15mb. Retrieving 25 of them(page size), worst case scenario will take 250mb of

Re: Best way for paging with TopDocs class?

2009-04-17 Thread Ivan Vasilev
Hi Alex, As far as I know, HitCollector is useful when you need to deal with some data of ALL the docs in the index, but when you need just the top of them HitCollector is said to be a performance killer. Then it is better to use Hits with the old API and TopDocs with the current one. Ivan AlexElba wrote: Why

Re: readModifiedUTF8String stuck

2009-04-17 Thread Michael McCandless
Can you describe your app a bit? How many documents are you retrieving for each search? It seems like Weblogic noticed a single HTTP request took more than 600 seconds and then dumped out all stack traces? In which case, maybe the threads were not actually "stuck", but were doing something that

RE: Need help : SpanNearQuery

2009-04-17 Thread Radhalakshmi Sreedharan
Hi Steven, Thanks for your reply. I tried out your approach and the problem got solved to an extent, but it still remains. The problem is the score reduces quite a bit even now as bc is not found in the combinations ( bc,cd) ( bc,ef) and ( ab,bc,cd,ef) etc. The boosting in fact has a negative

Re: readModifiedUTF8String stuck

2009-04-17 Thread MakMak
Please do not mind these more traces: -- ExecuteThread: '30' for queue: 'weblogic.kernel.Default (self-tuning)' has been busy for "647" seconds working on the request "Http Request: /search_results.jsp