Re scalability of filter construction: the database is likely to hold stable
primary keys, not Lucene doc IDs, which are unstable in the face of updates. You
therefore need a quick way of converting the stable database keys read from the db
into current Lucene doc IDs to create the filter. That could
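A minimal sketch of that key-to-docID conversion, using only plain collections. The `pkToDocId` map is an assumption for illustration (some structure kept in sync with the index, e.g. rebuilt per segment from a stored key field); the point is just that the filter is a bitset over current doc IDs:

```java
import java.util.*;

public class PkFilterSketch {
    // Build a docID bitset from stable database primary keys, given a
    // (hypothetical) map from primary key to current Lucene docID.
    static BitSet buildFilter(Map<String, Integer> pkToDocId, Collection<String> allowedKeys) {
        BitSet filter = new BitSet();
        for (String pk : allowedKeys) {
            Integer docId = pkToDocId.get(pk); // null if the key is absent/deleted
            if (docId != null) {
                filter.set(docId);
            }
        }
        return filter;
    }

    public static void main(String[] args) {
        Map<String, Integer> pkToDocId = Map.of("row-1", 0, "row-2", 5);
        System.out.println(buildFilter(pkToDocId, List.of("row-1", "row-9"))); // prints {0}
    }
}
```

The scalability question in the thread is exactly the cost of this loop and of keeping the key-to-docID mapping current across updates.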
Hi all, I have an interesting problem...instead of going from a query
to a document collection, is it possible to come up with the best fit
query for a given document collection (results)? "Best fit" being a
query which maximizes the hit scores of the resulting document
collection.
How should I ap
LuSql is a tool specifically oriented to extracting from JDBC
accessible databases and indexing the contents.
You can find it here:
http://lab.cisti-icist.nrc-cnrc.gc.ca/cistilabswiki/index.php/LuSql
User manual:
http://cuvier.cisti.nrc.ca/~gnewton/lusql/v0.9/lusqlManual.pdf.html
A new version i
Hi,
Normally, when I build my index directory for indexed documents, I
keep the files to index simply in a directory called 'filesToIndex'.
So in this case, I do not use any standard database management system such
as MySQL or any other.
1) Will it be possible to use MySQL or any other
Hi all!
Hmmm, I need to work out how important a word is in the entire document collection
that is indexed in the Lucene index. I need to extract some "representative
words", let's say concepts that are common and can represent the whole
collection. Or collection "keywords". I did the fulltext index
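One simple way to rank collection "keywords" is by document frequency: terms occurring in many documents are candidates for collection-wide concepts. Lucene exposes this directly via term/docFreq enumeration, but the idea fits in a few lines of plain Java (a sketch over tokenized documents, not the Lucene API itself):

```java
import java.util.*;
import java.util.stream.*;

public class CollectionKeywords {
    // Rank terms by document frequency: count how many documents each
    // term occurs in, then return the n most frequent terms.
    static List<String> topTerms(List<Set<String>> docs, int n) {
        Map<String, Long> df = docs.stream()
                .flatMap(Set::stream)
                .collect(Collectors.groupingBy(t -> t, Collectors.counting()));
        return df.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .limit(n)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Set<String>> docs = List.of(
                Set.of("lucene", "index", "search"),
                Set.of("lucene", "query"),
                Set.of("lucene", "index"));
        System.out.println(topTerms(docs, 2)); // [lucene, index]
    }
}
```

On real text, raw document frequency ranks stopwords highest, so in practice you would weight by something like tf-idf or filter against a stopword list first.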
Well, Lucene can apply such a filter rather quickly; but, your custom
code first has to build it... so it's really a question of whether
your custom code can build up / iterate the filter scalably.
Mike
On Thu, Jul 22, 2010 at 4:37 PM, Burton-West, Tom wrote:
> Hi Mike and Martin,
>
> We have a
Hi,
I'm about to write an application that does very simple text analysis,
namely dictionary-based entity extraction. The alternative is to do
in-memory matching with substrings:
String text; // could be any size, but normally "news paper length"
List<String> matches = new ArrayList<>();
for (String wordOrPhrase : dictionary) {
    if (text.contains(wordOrPhrase)) {
        matches.add(wordOrPhrase);
    }
}
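The substring loop scans the whole text once per dictionary entry. A sketch of a more scalable variant for single-word entries: tokenize the text once and test each dictionary word with a set lookup (the class and method names here are illustrative, not from the original post):

```java
import java.util.*;

public class DictionaryExtractor {
    // Tokenize the text once, then each single-word dictionary entry
    // costs one hash lookup instead of a full substring scan.
    static List<String> findMatches(String text, List<String> dictionary) {
        Set<String> tokens = new HashSet<>(Arrays.asList(text.toLowerCase().split("\\W+")));
        List<String> matches = new ArrayList<>();
        for (String word : dictionary) {
            if (tokens.contains(word.toLowerCase())) {
                matches.add(word);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> dict = List.of("Lucene", "Solr", "Elasticsearch");
        System.out.println(findMatches("Indexing news text with Lucene and Solr", dict));
        // prints [Lucene, Solr]
    }
}
```

Multi-word phrases need more than this (a sliding window over tokens, or an automaton such as Aho-Corasick), which is where an index-based approach starts to pay off.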
On 22/7/2010 9:20 PM, Shai Erera wrote:
How is that different than extending QP?
Mainly because the problem I'm having isn't there, and doing it from
there doesn't feel right, and definitely not like solving the issue. I
want to explore what other options there are before doing anything, an
Hi Mike and Martin,
We have a similar use-case. Is there a scalability/performance issue with the
getDocIdSet having to iterate through hundreds of thousands of docIDs?
Tom Burton-West
http://www.hathitrust.org/blogs/large-scale-search
-----Original Message-----
From: Michael McCandless [mai
>
> Ideally, that would be through a class or a function I can override or
> extend
>
How is that different than extending QP?
About the "song of songs" example -- the result you describe is already what
will happen. A document which contains just the word 'song' will score lower
than a document
You can either
1) create one index for each database, and merge the results during search.
2) create the 2 indexes individually and merge them
3) merge records during SQL select.
Approach 1) should be easy to scale linearly as your database grows.
You can even distribute the indexes onto seve
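Approach 1) can be sketched as a plain merge of per-index hit lists: each index is searched separately, and the already-scored results are combined into one ranking. The `Hit` class here is a stand-in for Lucene's ScoreDoc, used for illustration:

```java
import java.util.*;

public class ResultMerger {
    static final class Hit {
        final String id;
        final float score;
        Hit(String id, float score) { this.id = id; this.score = score; }
    }

    // Merge per-index hit lists (each already sorted by score) into a
    // single list of the top n hits across all indexes.
    static List<Hit> merge(List<List<Hit>> perIndexHits, int n) {
        List<Hit> all = new ArrayList<>();
        for (List<Hit> hits : perIndexHits) {
            all.addAll(hits);
        }
        all.sort((a, b) -> Float.compare(b.score, a.score));
        return all.subList(0, Math.min(n, all.size()));
    }

    public static void main(String[] args) {
        List<Hit> db1 = List.of(new Hit("db1:7", 0.9f), new Hit("db1:3", 0.4f));
        List<Hit> db2 = List.of(new Hit("db2:5", 0.7f));
        for (Hit h : merge(List.of(db1, db2), 3)) {
            System.out.println(h.id + " " + h.score);
        }
    }
}
```

One caveat: scores from separately built indexes are not directly comparable, since term statistics differ per index; Lucene's MultiSearcher-style merging normalizes for this.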
Hi,
We are creating an index containing data from two databases. What we are trying
to achieve is to make our search locate and return information no matter where
the data came from. (BTW, we are using Compass, if it matters any)
My problem is that I am not sure how to create such an index.
Do I
Hi,
I heard work is being done on re-writing MultiPassIndexSplitter so it will be a
single pass and work quicker.
I was wondering if this is already done or when is it due ?
Thanks
Hi Jan,
I think you require a version number for each commit or update. Say
you added 10 docs, then it is update 1; then you modified or added some more,
then it is update 2. If it is so, then my advice would be to have
fields named field-type, version-number and version-date-time as part
of the field in
Just add/update a dedicated document in the index.
k=updatenumber
v=whatever.
Retrieve it with a search for k:updatenumber, update with
iw.updateDocument(whatever).
--
Ian.
On Thu, Jul 22, 2010 at 12:55 PM, wrote:
> Hi,
>
> When using incremental updating via Solr, we want to know, which up
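Ian's suggestion (one dedicated key/value document in the index) can be sketched as follows. A plain Map stands in for the index here, since a full Lucene setup is out of scope; real code would use IndexWriter.updateDocument with a Term on the key field, which replaces the old document atomically:

```java
import java.util.*;

public class UpdateNumberSketch {
    // A plain Map stands in for the index: one dedicated "document"
    // keyed by "updatenumber" holds the latest update number, and each
    // write replaces it (as updateDocument by Term would in Lucene).
    private final Map<String, String> index = new HashMap<>();

    void storeUpdateNumber(int number) {
        index.put("updatenumber", Integer.toString(number));
    }

    int retrieveUpdateNumber() {
        return Integer.parseInt(index.getOrDefault("updatenumber", "0"));
    }

    public static void main(String[] args) {
        UpdateNumberSketch s = new UpdateNumberSketch();
        s.storeUpdateNumber(41);
        s.storeUpdateNumber(42); // overwrite, like updateDocument
        System.out.println(s.retrieveUpdateNumber()); // prints 42
    }
}
```

Because the number lives in the index itself, it replicates to slaves along with everything else, which is exactly what the question asked for.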
Well,
that's difficult at the moment, as I can only reproduce this error
in a few cases so far. But I will try to generate such an example.
Cheers,
Philippe
On 22.07.2010 12:34, Ian Lea wrote:
No, I don't have an explanation. Perhaps a minimal self-contained
program or test case wou
Hi,
When using incremental updating via Solr, we want to know, which update is in
the current index. Each update has a number.
How can we store/change/retrieve this number with the index. We want to store
it in the index to replicate it to any slaves as well.
So basically can I store/change/ret
No, I don't have an explanation. Perhaps a minimal self-contained
program or test case would help.
--
Ian.
On Thu, Jul 22, 2010 at 10:23 AM, Philippe wrote:
> Hi Ian,
>
> I'm using version 2.9.3 of Lucene.
>
> q.getClass() and q.toString() are exactly equal:
> org.apache.lucene.search.BooleanQ
Hi Ian,
I'm using version 2.9.3 of Lucene.
q.getClass() and q.toString() are exactly equal:
org.apache.lucene.search.BooleanQuery
TITLE:672 BOOK:672
However, the results for searcher.explain(q, n) differ significantly. It
seems to me that "Query q = parser.parse("672");" searches only on the
It sounds like you should implement a custom Filter?
Its getDocIdSet would consult your foreign key-value store and iterate
through the allowed docIDs, per segment.
Mike
On Wed, Jul 21, 2010 at 8:37 AM, Martin J wrote:
> Hello, we are trying to implement a query type for Lucene (with eventual
>
They look the same to me too.
What does q.getClass().getName() say in each case? q.toString()?
searcher.explain(q, n)?
What version of lucene?
--
Ian.
On Wed, Jul 21, 2010 at 10:25 PM, Philippe wrote:
> Hi,
>
> I just performed two queries which, in my opinion, should lead to the same
> do