I have an app that searches a single document against many queries.
Let's say the document is
The quick brown fox jumped over the lazy dog.
and my queries are
SpanNearQuery("quick","brown",50)
SpanNearQuery("quick","fox",50)
I would like to retrieve the slop or some sort of score that was matched.
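One way to get at the actual match width (and hence the slop a match consumed) is to iterate the query's Spans directly instead of going through a Searcher. A minimal sketch against the Lucene 2.x span API; the index path and the "contents" field name are assumptions:

```java
// Sketch only: "/path/to/index" and the "contents" field are assumptions.
IndexReader reader = IndexReader.open("/path/to/index");
SpanNearQuery query = new SpanNearQuery(
    new SpanQuery[] {
        new SpanTermQuery(new Term("contents", "quick")),
        new SpanTermQuery(new Term("contents", "fox"))
    },
    50,     // slop
    true);  // inOrder
Spans spans = query.getSpans(reader);
while (spans.next()) {
    // end() - start() is the match width in token positions; the wider
    // the span, the more slop this particular match consumed.
    System.out.println("doc=" + spans.doc()
        + " start=" + spans.start() + " end=" + spans.end());
}
reader.close();
```

The span scorer folds this distance into the score via Similarity.sloppyFreq(), but Spans gives you the raw positions to compute whatever per-match measure you want.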
Someone pointed me there already. Looks interesting. Is there a
mailing list for the incubator? Does anyone know the status of the
proposal?
On 3/20/07, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
If you are thinking about putting textmining library elsewhere, allow me to
point out Tika:
http://wiki.apache.org/incubator/TikaProposal
Boy, I'm looking forward to this! I read some of the background discussion. I
think this might fit as a Lucene contrib, but we'll be able to tell when the
code makes it into JIRA.
Otis
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Simpy -- http://www.simpy.com/ - Tag - Search
If you are thinking about putting textmining library elsewhere, allow me to
point out Tika:
http://wiki.apache.org/incubator/TikaProposal
Better home for your lib, perhaps?
Otis
I'm betting you can make SpanNearQuery work for you. In the simple case it's
a bunch of SpanQuerys (the simplest of which is just a Span version of
TermQuery). The two other parameters are slop (see Lucene in Action for an
explanation of this) and whether the terms must appear in the order they
were added.
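For reference, the construction described above might look like this. A sketch against the Lucene 2.x API; the "contents" field name is an assumption:

```java
// Two SpanTermQuerys (the Span version of TermQuery) combined with a
// slop of 50; the final argument says whether the terms must be in order.
SpanQuery[] clauses = new SpanQuery[] {
    new SpanTermQuery(new Term("contents", "quick")),
    new SpanTermQuery(new Term("contents", "brown"))
};
SpanNearQuery near = new SpanNearQuery(clauses, 50, false);
```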
Hello all,
I have a how-to question. I have a field with these tokens in it (a b c f b g
a) and I am searching on it with these tokens (a f e g a). So far this is easy:
I just set up a BooleanQuery with a bunch of optional TermQueries and get hits
on (a f g a) but not (e), which is close to what I want.
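The setup described might be sketched like this (Lucene 2.x API; the field name "f" and an already-open searcher are assumptions):

```java
BooleanQuery bq = new BooleanQuery();
String[] queryTokens = { "a", "f", "e", "g", "a" };
for (int i = 0; i < queryTokens.length; i++) {
    // SHOULD makes each clause optional, so documents match on any subset
    bq.add(new TermQuery(new Term("f", queryTokens[i])),
           BooleanClause.Occur.SHOULD);
}
Hits hits = searcher.search(bq);
```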
Donna L Gresh wrote:
Also, the terms.close()
statement is outside the scope of terms. I changed to the following, is
this correct and should the
FAQ be changed?
TermEnum terms = indexReader.terms(new Term("FIELD-NAME-HERE", ""));
try
{
    // iterate over terms here
}
finally
{
    terms.close();
}
Sounds interesting Martin!
Is the dictionary static, or is it generated from the corpus or from
user queries?
-Yonik
On 3/20/07, Martin Haye <[EMAIL PROTECTED]> wrote:
As part of XTF, an open source publishing engine that uses Lucene, I
developed a new spelling correction engine specifically to
As part of XTF, an open source publishing engine that uses Lucene, I
developed a new spelling correction engine specifically to provide "Did you
mean..." links for misspelled queries. I and a small group are preparing
this for submission as a contrib module to Lucene. And we're inviting
interested parties to participate.
I've been out of the loop for a while. I just saw this recent thread
and re-subscribed to the list.
In the next month or two I will be able to put some time into the
textmining library. Fast saved files are on the list of improvements
as well as other features that have been requested. I would al
Hello,
The response time for sorts depends on the number of results.
If you don't need all documents returned you could use a filter.
One idea would be to use DateTools to save your dates as Strings
and build your query with FilteredQuery passing in a custom filter
to search this field.
The filter
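A sketch of the suggestion against the Lucene 2.x API. The "date" field name is an assumption, and RangeFilter stands in here for the fully custom filter mentioned above:

```java
// At index time: encode the date as a lexicographically sortable string.
doc.add(new Field("date",
    DateTools.dateToString(docDate, DateTools.Resolution.DAY),
    Field.Store.YES, Field.Index.UN_TOKENIZED));

// At search time: restrict the text query to a date range.
Filter dateFilter = new RangeFilter("date",
    DateTools.dateToString(fromDate, DateTools.Resolution.DAY),
    DateTools.dateToString(toDate, DateTools.Resolution.DAY),
    true, true);  // include both endpoints
Hits hits = searcher.search(new FilteredQuery(textQuery, dateFilter));
```

Because DateTools strings sort lexicographically in date order, the same field also works for sorting.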
Heh - it used to be in my sig ... my bad.
Thanks, all. :)
http://www.stubhub.com
On 3/20/07, bruce <[EMAIL PROTECTED]> wrote:
hey cass...
Any way you could let us know the site/app that we're powering?!
always good to see what's going on in the world!
thanks
Well, depending upon your storage requirements, it's actually
much easier than that. Assuming you're adding
this field (or a duplicate) as UN_TOKENIZED (in this case, no
need to store), you can just spin
through all the terms for that field with TermDocs/TermEnum.
The trick is to have your term st
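The spin described above might look like this (Lucene 2.x API; the "my_id" field name is an assumption):

```java
TermEnum termEnum = reader.terms(new Term("my_id", ""));
try {
    // terms(t) positions the enum at the first term >= t, so check the
    // field on each step and stop once we leave "my_id".
    while (termEnum.term() != null
            && "my_id".equals(termEnum.term().field())) {
        TermDocs termDocs = reader.termDocs(termEnum.term());
        while (termDocs.next()) {
            int luceneDocId = termDocs.doc();  // doc(s) holding this value
        }
        termDocs.close();
        if (!termEnum.next()) break;
    }
} finally {
    termEnum.close();
}
```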
In a web application, I have generally cached IndexSearcher in
application scope and reused it for all requests.
You will have to balance the demand for timeliness of updates with
the time it takes to build up the sort caches. You can't really have
instantaneous viewing of newly added documents.
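A minimal sketch of that balance, assuming Lucene 2.x: keep one shared IndexSearcher and replace it only when the on-disk index version changes. (Closing a searcher that other threads may still be using needs more care than shown here.)

```java
private IndexSearcher cached;
private long cachedVersion = -1;

public synchronized IndexSearcher getSearcher(String indexDir)
        throws IOException {
    long current = IndexReader.getCurrentVersion(indexDir);
    if (cached == null || current != cachedVersion) {
        // Index changed on disk: open a fresh searcher (sort/field
        // caches are rebuilt against the new reader).
        cached = new IndexSearcher(indexDir);
        cachedVersion = current;
    }
    return cached;
}
```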
Erik,
I'm not using a cached IndexSearcher. Is this an option in an
environment where the underlying index changes on a second-by-second
basis? At what layer would a cached IndexSearcher be cached? At the
tomcat layer?
Caching at the object layer seems like it might help, but it doesn't
address m
hey cass...
Any way you could let us know the site/app that we're powering?!
always good to see what's going on in the world!
thanks
-Original Message-
From: Cass Costello [mailto:[EMAIL PROTECTED]
Sent: Tuesday, March 20, 2007 12:58 PM
To: solr-user@lucene.apache.org; java-user@lucene.apache.org
Are you using a cached IndexSearcher such that successive sorts on
the same field will be more efficient?
Erik
On Mar 20, 2007, at 3:39 PM, David Seltzer wrote:
Hi All,
I have a sort performance question:
I have a fairly large index consisting of chunks of full-text
transcript
...to everyone who helps make Lucene and Solr such fantastic tools.
I'm the Platform Architect for a leading online event ticket
after-marketplace (think eBay for tickets), and we've just completed a 12
month project to rewrite the Browse and Search components of our
customer-facing site. Both r
Hi All,
I have a sort performance question:
I have a fairly large index consisting of chunks of full-text
transcriptions of television, radio and other media, and I'm trying to
make it searchable and sortable by date. The search front-end uses a
ParallelMultiSearcher to search up to three
Thanks, I see what you are saying.
Seems that if I create the field at index time with term vectors stored,
then I can iterate through the documents and get both the unique
identifier and the terms, right? My original question was imprecise in
that I'm going to want to get all the terms for *all* documents.
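With term vectors stored, that iteration might look like this (Lucene 2.x API; the field names "my_id" and "text", and a field indexed with Field.TermVector.YES, are assumptions):

```java
for (int docId = 0; docId < reader.maxDoc(); docId++) {
    if (reader.isDeleted(docId)) continue;
    String myId = reader.document(docId).get("my_id");
    TermFreqVector tfv = reader.getTermFreqVector(docId, "text");
    if (tfv != null) {
        // getTerms() returns the analyzed (stemmed, stop-filtered)
        // tokens that were indexed for this document's field.
        String[] terms = tfv.getTerms();
    }
}
```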
Sorry, but you have to have the Lucene document ID, which you
can get either as part of a Hits or HitCollector, or by using
TermDocs/TermEnum on your unique id (my_id in
your example).
Erick
On 3/20/07, Erick Erickson <[EMAIL PROTECTED]> wrote:
You can do a document.get(field), *assuming*
You can do a document.get(field), *assuming* you have stored the data
(Field.Store.YES) at index time, although you may not get
stop words.
On 3/20/07, Donna L Gresh <[EMAIL PROTECTED]> wrote:
My apologies if this is a simple question--
How can I get all the (stemmed and stop words removed, et
My apologies if this is a simple question--
How can I get all the (stemmed and stop words removed, etc.) terms in a
particular field of a particular document?
Suppose my documents each consist of two fields, one with the name "my_id"
and a unique identifier, and the other being some text string
Another option for this might be IndexWriter.updateDocument().
"Erick Erickson" <[EMAIL PROTECTED]> wrote on 18/03/2007 15:28:09:
> BTW, instead of searching with a query, it might be faster
> to use TermEnum on your unique field. If TermEnum finds
> a term like the one you're about to add, you a
Lokeya <[EMAIL PROTECTED]> wrote on 18/03/2007 13:19:45:
>
> Yep I did that, and now my code looks as follows.
> The time taken for indexing one file is now
> => Elapsed Time in Minutes :: 0.3531
> which is really great
I am jumping in late so apologies if I am missing something.
However I don't
Thanks a lot.
On 3/20/07, karl wettin <[EMAIL PROTECTED]> wrote:
On 20 Mar 2007, at 12:14, SK R wrote:
> Hi Mark,
> Thanks for your reply.
> Could I get this match length (docFreq) without using
> searcher.search(..)?
>
> One more doubt is "Performance for getting search
>> Could I get this match length (docFreq) without using searcher.search(..)?
Yes, but it's likely to involve more code on your part. TermPositions is the
class you want to look at. See the PhraseQuery implementation for examples of
how to use it.
>> One more doubt is "Performance for getting sea
On 20 Mar 2007, at 12:14, SK R wrote:
Hi Mark,
Thanks for your reply.
Could I get this match length (docFreq) without using
searcher.search(..)?
One more doubt: is the performance of getting the search length by
using searcher.search(...) the same as using reader.docFreq(..)?
Hi Mark,
Thanks for your reply.
Could I get this match length (docFreq) without using
searcher.search(..)?
One more doubt: is the performance of getting the search length by using
searcher.search(...) the same as using reader.docFreq(..)?
On 3/20/07, mark harwood <[EMAIL PROT
IndexSearcher s = new IndexSearcher("/indexes/myindex");
PhraseQuery pq = new PhraseQuery();
pq.add(new Term("contents", "test"));
pq.add(new Term("contents", "under"));
int df = s.search(pq).length();
Cheers
Mark
- Original Message
From: SK R <[EMAIL PRO
Hi,
I can get the docFreq of a single term like (f1:test) by using
indexReader.docFreq(new Term("f1","test")), but I can't get the docFreq of a
phrase like (f2:"test under") by the same method.
Is anything wrong in this code?
Please help me to resolve this problem.
Thanks & Regards
RSK
On 20 Mar 2007, at 07:40, thomas arni wrote:
You can adapt the source code of StopAnalyzer.java in the analysis
package, or I suppose you can use the default constructor with an
empty stop word list (but please check this).
I often do this:
analyzer = new ...Analyzer(Collections.EMPTY_SET);