Re: on-the-fly "filters" from docID lists

2010-07-22 Thread Mark Harwood
Re scalability of filter construction - the database is likely to hold stable primary keys, not lucene doc ids, which are unstable in the face of updates. You therefore need a quick way of converting stable database keys read from the db into current lucene doc ids to create the filter. That could
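A minimal sketch of the key-to-docID conversion Mark describes, assuming each document was indexed with a "pk" field holding its database primary key (the field name and the Lucene 2.9-era API usage are assumptions, not taken from the thread):

    import java.io.IOException;
    import java.util.Collection;

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.index.TermDocs;
    import org.apache.lucene.util.OpenBitSet;

    // Builds a bit set of current Lucene doc ids from stable database keys.
    public class KeyToDocIdMapper {

        public static OpenBitSet docIdsForKeys(IndexReader reader, Collection<String> dbKeys)
                throws IOException {
            OpenBitSet bits = new OpenBitSet(reader.maxDoc());
            TermDocs termDocs = reader.termDocs();
            try {
                for (String key : dbKeys) {
                    termDocs.seek(new Term("pk", key)); // "pk" = primary-key field (assumed name)
                    while (termDocs.next()) {           // normally 0 or 1 match per key
                        bits.set(termDocs.doc());
                    }
                }
            } finally {
                termDocs.close();
            }
            return bits;
        }
    }

OpenBitSet is itself a DocIdSet, so the result can be returned directly from a Filter's getDocIdSet.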

Reverse Lucene queries

2010-07-22 Thread skant
Hi all, I have an interesting problem...instead of going from a query to a document collection, is it possible to come up with the best fit query for a given document collection (results)? "Best fit" being a query which maximizes the hit scores of the resulting document collection. How should I ap
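One possible starting point, not necessarily what the poster ultimately needs, is the contrib MoreLikeThis class, which derives a weighted query from the most distinctive terms of an example document; the field name below is made up:

    import java.io.IOException;

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.similar.MoreLikeThis; // contrib/queries

    public class BestFitQuerySketch {

        // Derives a query from the highest-weighted terms of an example document.
        public static Query queryForDocument(IndexReader reader, int docId) throws IOException {
            MoreLikeThis mlt = new MoreLikeThis(reader);
            mlt.setFieldNames(new String[] {"contents"}); // hypothetical field name
            mlt.setMinTermFreq(1);
            mlt.setMinDocFreq(1);
            return mlt.like(docId); // a weighted BooleanQuery over the doc's top terms
        }
    }

Running that over each document in the target collection and merging the highest-weighted terms would be one crude way to approximate a "best fit" query for the whole set.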

Re: Databases

2010-07-22 Thread Glen Newton
LuSql is a tool specifically oriented to extracting data from JDBC-accessible databases and indexing the contents. You can find it here: http://lab.cisti-icist.nrc-cnrc.gc.ca/cistilabswiki/index.php/LuSql User manual: http://cuvier.cisti.nrc.ca/~gnewton/lusql/v0.9/lusqlManual.pdf.html A new version i

Databases

2010-07-22 Thread manjula wijewickrema
Hi, Normally, when I am building my index for documents, I simply keep my indexed files in a directory called 'filesToIndex'. So in this case, I do not use any standard database management system such as mySql or any other. 1) Will it be possible to use mySql or any other
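For the first question, a bare-bones sketch of pulling rows out of MySQL over JDBC and indexing them with Lucene; the connection string, table and column names, and index path are all hypothetical:

    import java.io.File;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    public class JdbcIndexer {
        public static void main(String[] args) throws Exception {
            Class.forName("com.mysql.jdbc.Driver"); // older JDBC drivers need explicit loading
            Connection conn = DriverManager.getConnection(
                    "jdbc:mysql://localhost/mydb", "user", "password"); // hypothetical DSN
            IndexWriter writer = new IndexWriter(
                    FSDirectory.open(new File("indexDir")),
                    new StandardAnalyzer(Version.LUCENE_29),
                    true, IndexWriter.MaxFieldLength.UNLIMITED);
            Statement stmt = conn.createStatement();
            ResultSet rs = stmt.executeQuery("SELECT id, title, body FROM articles"); // hypothetical table
            while (rs.next()) {
                Document doc = new Document();
                doc.add(new Field("id", rs.getString("id"), Field.Store.YES, Field.Index.NOT_ANALYZED));
                doc.add(new Field("title", rs.getString("title"), Field.Store.YES, Field.Index.ANALYZED));
                doc.add(new Field("body", rs.getString("body"), Field.Store.NO, Field.Index.ANALYZED));
                writer.addDocument(doc);
            }
            rs.close();
            stmt.close();
            conn.close();
            writer.close();
        }
    }

Tools such as LuSql (mentioned in the reply above) wrap this pattern up for you.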

How to get word importance in lucene index

2010-07-22 Thread Xaida
Hi all! I need to work out how important a word is in the entire document collection that is indexed in the lucene index. I need to extract some "representative words", let's say concepts that are common and can represent the whole collection, or collection "keywords". I did the fulltext index
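A rough sketch of one simple interpretation of "importance", ranking the terms of one field by document frequency across the whole index; a tf-idf style weighting would need more work, and the field name is assumed:

    import java.io.IOException;
    import java.util.Comparator;
    import java.util.PriorityQueue;

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.TermEnum;

    // Lists the terms with the highest document frequency in one field.
    public class CollectionKeywords {

        private static class TermFreq {
            final String text;
            final int docFreq;
            TermFreq(String text, int docFreq) { this.text = text; this.docFreq = docFreq; }
        }

        public static void printTopTerms(IndexReader reader, String field, int topN) throws IOException {
            // min-heap on docFreq so the least frequent entry is evicted once the heap is full
            PriorityQueue<TermFreq> heap = new PriorityQueue<TermFreq>(topN, new Comparator<TermFreq>() {
                public int compare(TermFreq a, TermFreq b) { return a.docFreq - b.docFreq; }
            });
            TermEnum terms = reader.terms();
            try {
                while (terms.next()) {
                    if (!field.equals(terms.term().field())) {
                        continue; // only consider the chosen field
                    }
                    heap.add(new TermFreq(terms.term().text(), terms.docFreq()));
                    if (heap.size() > topN) {
                        heap.poll();
                    }
                }
            } finally {
                terms.close();
            }
            // heap iteration order is not sorted; sort the survivors if ranked output is needed
            for (TermFreq tf : heap) {
                System.out.println(tf.text + " appears in " + tf.docFreq + " documents");
            }
        }
    }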

Re: on-the-fly "filters" from docID lists

2010-07-22 Thread Michael McCandless
Well, Lucene can apply such a filter rather quickly; but, your custom code first has to build it... so it's really a question of whether your custom code can build up / iterate the filter scalably. Mike On Thu, Jul 22, 2010 at 4:37 PM, Burton-West, Tom wrote: > Hi Mike and Martin, > > We have a

Using lucene for substring matching

2010-07-22 Thread Geir Gullestad Pettersen
Hi, I'm about to write an application that does very simple text analysis, namely dictionary-based entity extraction. The alternative is to do in-memory matching with substring: String text; // could be any size, but normally "newspaper length" List matches; for( String wordOrPhrase : dictionary
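The pseudocode in the message, completed as a runnable (if naive) sketch; variable names follow the snippet above, and the case-insensitive handling is an assumption:

    import java.util.ArrayList;
    import java.util.List;

    public class DictionaryMatcher {

        // Naive in-memory alternative to an index: scan the text once per dictionary
        // entry. Fine for small dictionaries and short texts, but O(dictionary size
        // x text length) overall.
        public static List<String> findMatches(String text, List<String> dictionary) {
            String lowerText = text.toLowerCase();
            List<String> matches = new ArrayList<String>();
            for (String wordOrPhrase : dictionary) {
                if (lowerText.contains(wordOrPhrase.toLowerCase())) {
                    matches.add(wordOrPhrase);
                }
            }
            return matches;
        }
    }

Lucene's contrib MemoryIndex, which indexes a single document in memory and lets you run many queries against it, may also be worth a look for this kind of matching.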

Re: Scoring exact matches higher in a stemmed field

2010-07-22 Thread Itamar Syn-Hershko
On 22/7/2010 9:20 PM, Shai Erera wrote: > How is that different than extending QP? Mainly because the problem I'm having isn't there, and doing it from there doesn't feel right, and definitely doesn't feel like it solves the issue. I want to explore what other options there are before doing anything, an

RE: on-the-fly "filters" from docID lists

2010-07-22 Thread Burton-West, Tom
Hi Mike and Martin, We have a similar use-case. Is there a scalability/performance issue with the getDocIdSet having to iterate through hundreds of thousands of docIDs? Tom Burton-West http://www.hathitrust.org/blogs/large-scale-search -Original Message- From: Michael McCandless [mai

Re: Scoring exact matches higher in a stemmed field

2010-07-22 Thread Shai Erera
> Ideally, that would be through a class or a function I can override or extend. How is that different than extending QP? About the "song of songs" example -- the result you describe is already what will happen. A document which contains just the word 'song' will score lower than a document
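For context, a common way to get this behaviour (not necessarily what either poster ends up doing) is to index the text into both a stemmed and an unstemmed field and combine them at query time, boosting the exact field; the field names and boost value below are invented:

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.BooleanClause.Occur;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.TermQuery;

    public class ExactOverStemmed {

        // Combines a query on the stemmed field with a boosted query on an
        // unstemmed copy, so exact matches rank higher while stemmed-only
        // matches still match.
        public static BooleanQuery build(String userTerm, String stemmedTerm) {
            TermQuery exact = new TermQuery(new Term("title_exact", userTerm));        // unstemmed field (assumed)
            exact.setBoost(2.0f);                                                      // favour exact hits
            TermQuery stemmed = new TermQuery(new Term("title_stemmed", stemmedTerm)); // stemmed field (assumed)

            BooleanQuery combined = new BooleanQuery();
            combined.add(exact, Occur.SHOULD);
            combined.add(stemmed, Occur.SHOULD);
            return combined;
        }
    }

A DisjunctionMaxQuery over the two clauses is another option when only the best of the two scores should count.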

Re: Inserting data from multiple databases in same index

2010-07-22 Thread Chris Lu
You can either: 1) create one index for each database and merge the results during search; 2) create the two indexes individually and merge them; or 3) merge records during the SQL select. Approach 1) should be easy to scale linearly as your database grows. You can even distribute the indexes onto seve
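A small sketch of approach 1), keeping one index per database and presenting them as a single logical index at search time; the index paths are placeholders, and adding a stored "source" field to each document is an easy way to tell which database a hit came from:

    import java.io.File;

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.MultiReader;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.store.FSDirectory;

    public class TwoDatabaseSearch {
        public static IndexSearcher openCombinedSearcher() throws Exception {
            IndexReader db1 = IndexReader.open(FSDirectory.open(new File("index-db1")), true); // read-only
            IndexReader db2 = IndexReader.open(FSDirectory.open(new File("index-db2")), true);
            // MultiReader presents both indexes as one logical index to the searcher
            return new IndexSearcher(new MultiReader(db1, db2));
        }
    }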

Inserting data from multiple databases in same index

2010-07-22 Thread L Duperval
Hi, We are creating an index containing data from two databases. What we are trying to achieve is to make our search locate and return information no matter where the data came from. (BTW, we are using Compass, if it matters any) My problem is that I am not sure how to create such an index. Do I

Question to the writer of MultiPassIndexSplitter

2010-07-22 Thread Yatir Ben Shlomo
Hi, I heard work is being done on re-writing MultiPassIndexSplitter so that it will be single-pass and run faster. I was wondering whether this is already done, or when it is due? Thanks

Re: Holding and changing index wide information

2010-07-22 Thread findbestopensource
Hi Jan, I think you require a version number for each commit or update. Say you added 10 docs, then it is update 1; then you modified or added some more, then it is update 2... If so, then my advice would be to have fields named field-type, version-number and version-date-time as part of the field in

Re: Holding and changing index wide information

2010-07-22 Thread Ian Lea
Just add/update a dedicated document in the index. k=updatenumber v=whatever. Retrieve it with a search for k:updatenumber, update with iw.updateDocument(whatever). -- Ian. On Thu, Jul 22, 2010 at 12:55 PM, wrote: > Hi, > > When using incremental updating via Solr, we want to know, which up
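A minimal sketch of Ian's suggestion, storing the update number in a single well-known document keyed by a "k" field; the exact field layout is an assumption based on his k=updatenumber / v=whatever shorthand:

    import java.io.IOException;

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.Term;

    public class UpdateNumberDoc {

        // Stores (or replaces) a single well-known document holding the update number.
        public static void storeUpdateNumber(IndexWriter writer, String updateNumber) throws IOException {
            Document doc = new Document();
            doc.add(new Field("k", "updatenumber", Field.Store.YES, Field.Index.NOT_ANALYZED));
            doc.add(new Field("v", updateNumber, Field.Store.YES, Field.Index.NO));
            // updateDocument deletes any existing doc matching the term, then adds the new one
            writer.updateDocument(new Term("k", "updatenumber"), doc);
            writer.commit();
        }
    }

Retrieval is then a TermQuery for k:updatenumber and reading the stored "v" field from the single hit.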

Re: Different ranking results

2010-07-22 Thread Philippe
Well, that's difficult at the moment, as I can only reproduce this error in a few cases. But I will try to generate such an example... Cheers, Philippe On 22.07.2010 12:34, Ian Lea wrote: No, I don't have an explanation. Perhaps a minimal self-contained program or test case wou

Holding and changing index wide information

2010-07-22 Thread jan.kurella
Hi, When using incremental updating via Solr, we want to know which update is in the current index. Each update has a number. How can we store/change/retrieve this number with the index? We want to store it in the index to replicate it to any slaves as well. So basically, can I store/change/ret

Re: Different ranking results

2010-07-22 Thread Ian Lea
No, I don't have an explanation. Perhaps a minimal self-contained program or test case would help. -- Ian. On Thu, Jul 22, 2010 at 10:23 AM, Philippe wrote: > Hi Ian, > > I'm using Version 2.93 of lucene. > > q.getClass() and q.toString() are exactly equal: > org.apache.lucene.search.BooleanQ

Re: Different ranking results

2010-07-22 Thread Philippe
Hi Ian, I'm using version 2.9.3 of lucene. q.getClass() and q.toString() are exactly equal: org.apache.lucene.search.BooleanQuery TITLE:672 BOOK:672 However, the results for searcher.explain(q,n) differ significantly. It seems to me that "Query q = parser.parse("672");" searches only on the
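A small harness in the spirit of this thread for comparing the two query constructions side by side; the poster's actual parser setup isn't shown, so the MultiFieldQueryParser over TITLE and BOOK is an assumption:

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.queryParser.MultiFieldQueryParser;
    import org.apache.lucene.search.BooleanClause.Occur;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.TermQuery;
    import org.apache.lucene.util.Version;

    public class ExplainComparison {

        // Prints the explanation for the same doc under both query constructions.
        public static void compare(IndexSearcher searcher, int docId) throws Exception {
            Query parsed = new MultiFieldQueryParser(Version.LUCENE_29,
                    new String[] {"TITLE", "BOOK"},
                    new StandardAnalyzer(Version.LUCENE_29)).parse("672");

            BooleanQuery manual = new BooleanQuery();
            manual.add(new TermQuery(new Term("TITLE", "672")), Occur.SHOULD);
            manual.add(new TermQuery(new Term("BOOK", "672")), Occur.SHOULD);

            System.out.println(searcher.explain(parsed, docId));
            System.out.println(searcher.explain(manual, docId));
        }
    }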

Re: on-the-fly "filters" from docID lists

2010-07-22 Thread Michael McCandless
It sounds like you should implement a custom Filter? Its getDocIdSet would consult your foreign key-value store and iterate through the allowed docIDs, per segment. Mike On Wed, Jul 21, 2010 at 8:37 AM, Martin J wrote: > Hello, we are trying to implement a query type for Lucene (with eventual >
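A skeleton of the kind of custom Filter Mike suggests; how the allowed ids are fetched from the foreign key-value store, and how they are mapped to per-segment doc ids, is left as a placeholder:

    import java.io.IOException;
    import java.util.Set;

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.search.DocIdSet;
    import org.apache.lucene.search.Filter;
    import org.apache.lucene.util.OpenBitSet;

    // Skeleton of a Filter whose allowed doc ids come from an external key-value store.
    public class ExternalStoreFilter extends Filter {

        private final Set<Integer> allowedDocIds; // placeholder: however your store exposes the ids

        public ExternalStoreFilter(Set<Integer> allowedDocIds) {
            this.allowedDocIds = allowedDocIds;
        }

        @Override
        public DocIdSet getDocIdSet(IndexReader reader) throws IOException {
            // In 2.9+ this is called once per segment reader, so doc ids must be
            // segment-relative; a real implementation would translate them here.
            OpenBitSet bits = new OpenBitSet(reader.maxDoc());
            for (int docId : allowedDocIds) {
                if (docId < reader.maxDoc()) {
                    bits.set(docId);
                }
            }
            return bits;
        }
    }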

Re: Different ranking results

2010-07-22 Thread Ian Lea
They look the same to me too. What does q.getClass().getName() say in each case? q.toString()? searcher.explain(q, n)? What version of lucene? -- Ian. On Wed, Jul 21, 2010 at 10:25 PM, Philippe wrote: > Hi, > > I just performed two queries which, in my opinion, should lead to the same > do