Hi Simon,
You can index the past query log for your search application and search the
index the way you want...
- Bhavin pandya
- Original Message -
From: "Simon Wistow" <[EMAIL PROTECTED]>
To: "Lucene"
Sent: Friday, December 15, 2006 3:52 AM
Subject: Search Suggestions
Yahoo!
Hi qaz,
You can remove duplicates at search time by writing your own HitCollector...
- Bhavin pandya
- Original Message -
From: "qaz zaq" <[EMAIL PROTECTED]>
To:
Sent: Friday, December 15, 2006 1:01 AM
Subject: Duplicates removal in search results
How can i remove the duplicates re
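A minimal sketch of collapsing hits by title at search time, as suggested in the reply above. It is plain Java and does not use the Lucene HitCollector API; TitleHit and TitleDeduper are hypothetical names, and keeping the highest-scoring hit per title is an assumption about which duplicate should survive.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a search hit; this is not a Lucene class.
final class TitleHit {
    final int docId;
    final String title;
    final float score;

    TitleHit(int docId, String title, float score) {
        this.docId = docId;
        this.title = title;
        this.score = score;
    }
}

final class TitleDeduper {
    // Keeps the highest-scoring hit for each title.
    // Titles come back in the order they were first seen.
    static List<TitleHit> dedupeByTitle(List<TitleHit> hits) {
        Map<String, TitleHit> best = new LinkedHashMap<>();
        for (TitleHit hit : hits) {
            TitleHit current = best.get(hit.title);
            if (current == null || hit.score > current.score) {
                best.put(hit.title, hit);
            }
        }
        return new ArrayList<>(best.values());
    }
}
```

In a real collector the same map could be filled as hits arrive, so the memory cost would grow with the number of distinct titles rather than the number of hits.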
Thanks Erick,
Using TermDocs/TermEnum should work. One of my concerns is performance: the
search results could reach 100K, so performance may suffer. One alternative
I am considering is to collapse the data at indexing time, but I haven't
decided to go that way.
- Ori
: Just wondering: if my repository has 1TB of index files, does searching
: take up or allocate a lot of memory to read and retrieve the results?
try a mailing list search for "memory usage" ... i think you'll find some
previous discussions that may help.
-Hoss
-
Karl: it sounds like you are just referring to using the lucene docid as an
array index for the FieldCache of your "MyID" field ... that's a perfectly
valid use of the docid, the key being that you aren't expecting the id to
contain any meaningful data itself -- it's just a reference number.
: > if
Hi,
Just wondering: if my repository has 1TB of index files, does searching
take up or allocate a lot of memory to read and retrieve the results?
Thanks
regards,
Wooi Meng
U of Tennessee professor Michael Berry maintains a good site regarding
software for computing SVD on large, sparse matrices:
http://www.cs.utk.edu/~lsi/
The site also points to the LSI patent.
FWIW it's very easy to extract term-doc counts from a lucene index and
format them for softw
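As a rough illustration of the term-doc counts mentioned above, here is a sketch that builds a term-document count matrix from documents that are already tokenized. It does not read a Lucene index; TermDocMatrix and the "row column count" triple format are assumptions for illustration, and real SVD packages expect their own input formats.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class TermDocMatrix {
    // Builds term -> counts, where counts[d] is the frequency of the term in document d.
    // Terms are kept in sorted order so row numbers are stable.
    static Map<String, int[]> build(List<List<String>> tokenizedDocs) {
        Map<String, int[]> matrix = new TreeMap<>();
        for (int d = 0; d < tokenizedDocs.size(); d++) {
            for (String term : tokenizedDocs.get(d)) {
                matrix.computeIfAbsent(term, t -> new int[tokenizedDocs.size()])[d]++;
            }
        }
        return matrix;
    }

    // Writes the non-zero entries as "termRow docColumn count" lines (hypothetical format).
    static String toTriples(Map<String, int[]> matrix) {
        StringBuilder sb = new StringBuilder();
        int row = 0;
        for (int[] counts : matrix.values()) {
            for (int doc = 0; doc < counts.length; doc++) {
                if (counts[doc] > 0) {
                    sb.append(row).append(' ').append(doc).append(' ')
                      .append(counts[doc]).append('\n');
                }
            }
            row++;
        }
        return sb.toString();
    }
}
```

The dense int[] per term is only reasonable for small collections; a sparse map per term would be the obvious change for larger ones.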
How can I remove duplicate records from the search results? I.e., I have
multiple results with the same title in the 'title' field, and I want only 1
record per title. How can I achieve that? Thanks!!
Need
Hi,
Sent you a private email with some code attached ;-)
Malcolm
yeohwm <[EMAIL PROTECTED]> wrote:
Hi,
Thanks for the help. Please let me know which jar files I need
and where I can find them.
Regards,
Wooi Meng
Hi,
Thanks for the help. Please let me know which jar files I need
and where I can find them.
Regards,
Wooi Meng
You need to search for all documents with the title you care about, decide
which one to keep and remove all the others.
You'll probably need a TermDocs/TermEnum to go through all the items in your
index to create the list of documents to remove.
Erick
On 12/14/06, qaz zaq <[EMAIL PROTECTED]> wr
Yahoo! has a search suggestion feature so that if you search for, say,
'shoes', then it also recommends
payless shoes, jordan shoes, aldo shoes, nike shoes, bakers shoes
and a bunch of others.
Has anyone built something like that in Lucene?
Simon
---
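One way to picture the query-log idea from this thread is the sketch below. It keeps logged queries in memory with their counts and suggests the most frequent ones that contain the searched term as a whole word. It is plain Java rather than a Lucene index over the log, and QueryLogSuggester and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class QueryLogSuggester {
    // Normalized query -> number of times it appears in the log.
    private final Map<String, Integer> counts = new HashMap<>();

    void addLoggedQuery(String query) {
        String normalized = query.trim().toLowerCase(Locale.ROOT);
        if (!normalized.isEmpty()) {
            counts.merge(normalized, 1, Integer::sum);
        }
    }

    // Returns up to `limit` logged queries that contain `term` as a whole word,
    // excluding the term itself, most frequent first (ties broken alphabetically).
    List<String> suggest(String term, int limit) {
        String needle = term.trim().toLowerCase(Locale.ROOT);
        List<String> matches = new ArrayList<>();
        for (String query : counts.keySet()) {
            if (!query.equals(needle) && containsWord(query, needle)) {
                matches.add(query);
            }
        }
        matches.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return matches.size() > limit ? new ArrayList<>(matches.subList(0, limit)) : matches;
    }

    private static boolean containsWord(String query, String word) {
        for (String token : query.split("\\s+")) {
            if (token.equals(word)) {
                return true;
            }
        }
        return false;
    }
}
```

With a large log, the linear scan would be replaced by an index lookup on the term, which is where indexing the log with Lucene comes in.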
How can I remove duplicate records from the search results? I.e., I have
multiple results with the same title in the 'title' field, and I want only 1
record per title. How can I achieve that? Thanks!!
There is an example in TestDateFilter
http://svn.apache.org/viewvc/lucene/java/trunk/src/test/org/apache/lucene/search/TestDateFilter.java?view=log
"Cam Bazz" <[EMAIL PROTECTED]> wrote:
> Hello,
>
> how can I make a query to bring documents between timestamp begin and
> timestamp end, given that I
On Dec 14, 2006, at 11:16 AM, Soeren Pekrul wrote:
Is it possible to extract the matrix from the index file?
I don’t know any API to extract the matrix from the index file
directly.
How could we make it work to write an open source decomposed vector
model search engine a la LSA witho
Hi,
I used the SAX API last year to parse and index the INEX 1.4 collection using
Lucene (eventually succeeded after many naive attempts).
Can you give me a sample of the XML you are trying to parse?
Email me and I should be able to send you some code which may help.
regards,
Malcol
mariolone wrote:
I succeeded in extracting the matrix.
But with large document collections, isn't that too expensive a
solution?
I have a quite small collection with 14,960 documents and 29,828 unique
terms. If I remember right it took a few minutes on a normal laptop
computer to
I'd search this mail archive for DateTools, this has been discussed
repeatedly and you'd get lots and lots of info.
Erick
On 12/14/06, Cam Bazz <[EMAIL PROTECTED]> wrote:
Hello,
how can I make a query to bring documents between timestamp begin and
timestamp end, given that I have stored my da
Two things I would check:
1) converting pubDate to String during indexing for later
date-range-filtering search results might not work well, because, e.g.,
string wise, "9" > "100". You could use Lucene's DateTools - there's an
example in TestDateFilter -
http://svn.apache.org/viewvc/lucene/ja
Hello,
how can I make a query to bring documents between timestamp begin and
timestamp end, given that I have stored my dates using
DateTools.timeToString(long)?
Best regards,
-C.B.
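The point made elsewhere in this thread, that string-wise "9" > "100", is why dates are stored as fixed-width strings. The sketch below shows the idea in plain Java: once timestamps are encoded with a fixed-width pattern, a range check is just a string comparison. SortableTimestamps is a hypothetical helper, and the yyyyMMddHHmm pattern in UTC is an assumption meant to resemble minute-resolution DateTools output, not a call into Lucene.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

final class SortableTimestamps {
    // Fixed-width pattern, so lexicographic order matches chronological order.
    // Assumed to resemble minute-resolution DateTools strings.
    private static final DateTimeFormatter MINUTE =
            DateTimeFormatter.ofPattern("yyyyMMddHHmm").withZone(ZoneOffset.UTC);

    static String encode(long epochMillis) {
        return MINUTE.format(Instant.ofEpochMilli(epochMillis));
    }

    // Inclusive range check on encoded values, using plain string comparison.
    static boolean inRange(String value, String begin, String end) {
        return value.compareTo(begin) >= 0 && value.compareTo(end) <= 0;
    }
}
```

A range query over the index works on the same principle: encode the begin and end timestamps the same way the field was encoded, then match every term that sorts between them.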
This made it very clear. Thank you.
On 12/14/06, Erick Erickson <[EMAIL PROTECTED]> wrote:
UN_TOKENIZED is probably the safest way to store your dates. You could get
by with using, say, WhitespaceAnalyzer for indexing and parsing the query,
but that would invite hard-to-track bugs to no advanta
UN_TOKENIZED is probably the safest way to store your dates. You could get
by with using, say, WhitespaceAnalyzer for indexing and parsing the query,
but that would invite hard-to-track bugs to no advantage I can see.
I'll let someone more knowledgeable than me talk about NORMS
field.store.NO p
Hi Adrian,
I don't see anything obviously wrong with your code.
Can you give more details about which field values are different from
what you expect? I'm guessing it's the id field you're worried about,
but it's not clear from what you have written whether it's the title or
the id field which i
On 12/14/06, Erik Hatcher <[EMAIL PROTECTED]> wrote:
On Dec 13, 2006, at 1:51 PM, Patrick Turcotte wrote:
> I would suggest you take a look at exist-db (http://exist-db.org/).
I really doubt eXist can handle 10M XML files. Last time I tried it,
it choked on 20k of them.
It is true I don't
On Dec 13, 2006, at 1:51 PM, Patrick Turcotte wrote:
I would suggest you take a look at exist-db (http://exist-db.org/).
I really doubt eXist can handle 10M XML files. Last time I tried it,
it choked on 20k of them.
Erik
A database for XML documents that supports XQuery.
We a
Hello Everyone,
I have two fields that contain the original and modification dates of
certain documents.
I decided to store them like:
Document entry = new Document();
entry.add(new Field("edate", DateTools.timeToString(edate.getTime(),
DateTools.Resolution.MINUTE), Field.Store.YES, Field.Index.
FYI: The Wiki has a fair number of resources on IR: http://
wiki.apache.org/jakarta-lucene/InformationRetrieval (I have added a
link to this conversation, which contains a lot of useful information)
Karl, if you are so inclined, please feel free to add any of the
references you have found t
Hello,
I have a problem with my search code: I try to index some data while
searching simultaneously. Everything goes fine until a certain amount of
data has been indexed; then my fields get corrupted.
E.g., I have a field with the title indexed as "Nowitzki führt "Mavs" zum
ersten Heimsieg" and inner id "15" (not doc id,
Hi!
I'm working on Lucene's vector model and its way of scoring, and I have
some doubts.
As I understand it, Lucene adds terms (DocumentWriter.addPosition, using
Postings) to the index with some information,
such as offset, document number and term frequency.
I would like to apply to each term anoth
Hi Wooi,
> Just wondering, has anyone used Digester to extract XML content and
> index the XML file? Is there any source that I can refer to on how to
> extract the XML contents? Or is there another XML parser that is much
> easier to use?
Perhaps this article may help:
http://www-128.ibm.com
On 11 Dec 2006, at 20:04, Chris Hostetter wrote:
if you are trying to think of Lucene's docid as a meaningful
number, you
are doing something wrong.
There is this one place where I use it. The index is add only, and
the only data that interests me is the stored field MyID, also kept
track i
Thanks for the help, Soren!!!
I succeeded in extracting the matrix.
But with large document collections, isn't that too expensive a
solution?
Is it possible to extract the matrix from the index file?
Mario
Sören Pekrul wrote:
>
> Hello Mario,
>
> I had a similar problem a few
I use XmlBeans to "unmarshall" an XML file into Java objects, from which
you can easily retrieve the textual values of any element to be used for
indexing.
See http://xmlbeans.apache.org/ for more information on this library.
There are various similar libraries but I find XmlBeans superior in s
Hi,
Just wondering, has anyone used Digester to extract XML content and
index the XML file? Is there any source that I can refer to on how to
extract the XML contents? Or is there another XML parser that is much
easier to use?
Thanks
regards,
Wooi Meng
Soeren Pekrul wrote:
The score for a document is the sum of the term weights w(tf, idf) for
each containing term. So you have already the combination of
coordination level matching with IDF. Now it is possible that your query
requests three terms A, B and C. Two of them (A and B) are quite ofte
I think I understand now. I also have evidence from literature. So I would say
that my question is solved. :)
Thank you, Otis, and everybody else for contributing!
Karl
Original Message
Date: Thu, 14 Dec 2006 09:40:31 +0100
From: Soeren Pekrul <[EMAIL PROTECTED]>
To: java-us
Hello Mario,
I had a similar problem a few weeks ago (thread "How to get Term Weights
(document term matrix)?", 2006-11-02,
http://www.gossamer-threads.com/lists/lucene/java-user/41726).
I think there is no simple function creating a document term matrix or
accessing it. I extracted the matr
Karl Koch wrote:
If I do not misunderstand that extract, I would say it suggests the combination of coordination level matching with IDF. I am interested in your view and that of others reading this.
I understand that sentence:
"The natural solution is to correlate a term's matching value with its
co
Hi,
Maybe you could consider using Compass (http://www.opensymphony.com/compass/),
which could help in your situation. They claim that some actions (like
updating the index very often) are handled very efficiently (thanks to
caching, which is not a native part of the Lucene library).
Regards,
L