Re: Retrieving query-time join fromQuery hits

2020-06-08 Thread Mikhail Khludnev
t gives us full relation extracted. Not sure if it helps. On Mon, Jun 8, 2020 at 11:37 AM Stefan Onofrei wrote: > Thanks for the replies. > > @Mike: Yes, I think the idea is to run separate queries for each of the > resulting hits, as you described. I am concerned about the performance

Re: Retrieving query-time join fromQuery hits

2020-06-08 Thread Stefan Onofrei
Thanks for the replies. @Mike: Yes, I think the idea is to run separate queries for each of the resulting hits, as you described. I am concerned about the performance implications of going down this route, especially when dealing with large result sets. @Mikhail: Thanks for the suggestion! I

Re: Retrieving query-time join fromQuery hits

2020-06-03 Thread Mikhail Khludnev
Hi, Stefan. Have you considered faceting/aggregation over `from` field? On Tue, May 12, 2020 at 7:23 PM Stefan Onofrei wrote: > Hi, > > When using Lucene’s query-time join feature [1], how can the hits from the > first phase which determine / contribute to the returned results be
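Mikhail's faceting suggestion amounts to one pass over the first-phase ("from") hits that groups them by join-key value. The contributors of every joined hit can then be looked up afterwards without issuing a query per hit. Below is a minimal stdlib-only sketch. It assumes single-valued join keys that have already been read into a map, which is a hypothetical stand-in for the `from` field's values. It is not JoinUtil's API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Sketch: group first-phase ("from") hits by join-key value in one pass.
 * fromKeys is a hypothetical stand-in for per-document values of the `from` field.
 */
final class JoinKeyAggregation {
    private JoinKeyAggregation() {}

    /** Returns joinKey -> docIDs of the from-side hits carrying that key. */
    static Map<String, List<Integer>> groupByJoinKey(int[] fromHits, Map<Integer, String> fromKeys) {
        Map<String, List<Integer>> byKey = new TreeMap<>();
        for (int doc : fromHits) {
            String key = fromKeys.get(doc);
            if (key == null) continue; // document has no value in the `from` field
            byKey.computeIfAbsent(key, k -> new ArrayList<>()).add(doc);
        }
        return byKey;
    }

    /** The from-side hits that produced a final (joined) hit whose join key is toKey. */
    static List<Integer> contributors(Map<String, List<Integer>> byKey, String toKey) {
        return byKey.getOrDefault(toKey, List.of());
    }
}
```

Calling `contributors(byKey, key)` for each final hit's join key recovers the whole relation from a single aggregation. The per-hit "join backwards" approach discussed in the thread would instead need one extra query per hit.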

Re: Retrieving query-time join fromQuery hits

2020-06-03 Thread Michael McCandless
Actually, I do not see how this can work efficiently with per-hit queries after the join. For each of the final joined hits, you must 1) retrieve the join key value(s) by pulling doc values iterators and advancing to the right docid, 2) run another query to "join backwards" to the hit

Re: Retrieving query-time join fromQuery hits

2020-05-20 Thread Michael McCandless
I am trying first to understand the proposed solution from the previous thread. You run query #1, it returns top N hits. From those hits you ask JoinUtil to create the "joined" query #2. You run the query #2 to get the top final (joined) hits. Then, to reconstruct which docids fro

Retrieving query-time join fromQuery hits

2020-05-12 Thread Stefan Onofrei
Hi, When using Lucene’s query-time join feature [1], how can the hits from the first phase which determine / contribute to the returned results be retrieved? This topic has been brought up before [2], and at the time the recommendation was to re-run the query with added constraints based on the

Re: Lucene 6.1: number of hits per document

2016-09-02 Thread szzoli
This link helped me, it contained the solution. Thank you! -- View this message in context: http://lucene.472066.n3.nabble.com/Lucene-6-1-number-of-hits-per-document-tp4293245p4294403.html Sent from the Lucene - Java Users mailing list archive at Nabble.com

Re: Lucene 6.1: number of hits per document

2016-09-02 Thread Adrien Grand
m/FfFy2Amp. On Thu, Sep 1, 2016 at 16:09, szzoli wrote: > I call > IndexSearcher.search(Query, Collector) > but it is void. Where can I obtain the Scorer object? > collector.getTotalHits() > seems to return the number of the documents. > How can I tell it that it should coun

Re: Lucene 6.1: number of hits per document

2016-09-01 Thread szzoli
I call IndexSearcher.search(Query, Collector) but it is void. Where can I obtain the Scorer object? collector.getTotalHits() seems to return the number of the documents. How can I tell it that it should count the hits in the appropriate document?

Re: Lucene 6.1: number of hits per document

2016-09-01 Thread Adrien Grand
te these objects, too?

Re: Lucene 6.1: number of hits per document

2016-09-01 Thread szzoli

Re: Lucene 6.1: number of hits per document

2016-09-01 Thread Adrien Grand
Maybe you should clarify your use-case. For instance Uwe was assuming that you needed this information for debugging purposes while I was assuming that you needed it for your application logic. On Thu, Sep 1, 2016 at 14:20, szzoli wrote: > "If the Query is a TermQuery, you can get this number

Re: Lucene 6.1: number of hits per document

2016-09-01 Thread szzoli
ght be abstract, or have constructors with several class parameters, and so on... How can I get to Scorer.freq() so that the Scorer would know which Document's hits I am searching for? Can you show me an example? Thank you.

RE: Lucene 6.1: number of hits per document

2016-08-31 Thread Uwe Schindler
Original Message- > From: Mikhail Khludnev [mailto:m...@apache.org] > Sent: Monday, August 29, 2016 2:17 PM > To: java-user@lucene.apache.org > Subject: Re: Lucene 6.1: number of hits per document > > try fl=*,tf(text,'run') > or check explanation on debugQuery=true

Re: Lucene 6.1: number of hits per document

2016-08-31 Thread szzoli
? To Mikhail Khludnev-2: What is " fl=*,tf(text,'run') " ?

Re: Lucene 6.1: number of hits per document

2016-08-29 Thread Mikhail Khludnev
run". A document contained two times, > another three times this word. > I would like to see in the result for the first document "2", for the other > "3". > > Thank you > > > Adrien Grand wrote > > What do you mean by "number of hits per d

Re: Lucene 6.1: number of hits per document

2016-08-29 Thread Adrien Grand
On Mon, Aug 29, 2016 at 12:00, wrote: > I was searching for a word in an index (multiple files were indexed in a > library). I was searching e.g. for "run". A document contained two times, > another three times this word. > I would like to see in the result for the first document "2", for the >
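The counts asked for here (2 for one document, 3 for the other) are per-document term frequencies, which Lucene stores in its postings. The thread points to Scorer.freq() as the way to read them for a TermQuery. Below is a toy, stdlib-only inverted index that reproduces the "run" example. The tokenizer (lowercase, split on anything that is not a letter or digit) is an assumption and will not match a real Analyzer exactly.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Toy inverted index: term -> (docID -> term frequency). Not Lucene's API. */
final class TermFreqIndex {
    private final Map<String, Map<Integer, Integer>> postings = new TreeMap<>();

    /** Indexes text under docId using a naive lowercase tokenizer (assumption). */
    void add(int docId, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (token.isEmpty()) continue; // split() yields "" when text starts with a delimiter
            postings.computeIfAbsent(token, t -> new TreeMap<>()).merge(docId, 1, Integer::sum);
        }
    }

    /** docID -> number of occurrences of term, for every document that contains it. */
    Map<Integer, Integer> hitsPerDocument(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), Map.of());
    }
}
```

A document with "run" twice maps to 2 and one with it three times maps to 3. Documents without the term are simply absent from the result.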

Re: Lucene 6.1: number of hits per document

2016-08-29 Thread szzoli
ou Adrien Grand wrote > What do you mean by "number of hits per documents"? Can you give an > example > maybe? > > On Thu, Aug 25, 2016 at 16:29, szzoli <reg9sz...@freemail.hu> wrote: > >> Hi, >> >> I would like to

Re: Lucene 6.1: number of hits per document

2016-08-28 Thread Adrien Grand
What do you mean by "number of hits per documents"? Can you give an example maybe? On Thu, Aug 25, 2016 at 16:29, szzoli wrote: > Hi, > > I would like to get the number of hits per document. > I googled around a lot, there were code snippets for older versions. Non

Lucene 6.1: number of hits per document

2016-08-25 Thread szzoli
Hi, I would like to get the number of hits per document. I googled around a lot, there were code snippets for older versions. None of them works with Lucene 6.1. Any help would be appreciated.

Search returning the correct number of hits but wrong stored data

2016-05-30 Thread Conny Gyllendahl
which consumes messages from a message queue and adds them to the cache (which in turn adds them to Lucene through the listener) (we get a burst of 2000-3000 messages every 5 minutes). And this is where I run into problems, a search will return the correct number of hits (verified against database)

Re: Simple Similarity Implementation to Count the Number of Hits

2016-05-12 Thread Ahmet Arslan
Hi Luis, That's an interesting question. Can you share your similarity? I suspect you return 1 except for the Similarity#coord method. Not sure but, for phrase query, one may require to modify ExactPhraseScorer/ExactPhraseScorer etc. ahmet On Thursday, May 12, 2016 5:41 AM, Luís Filipe Nassif wrote:

Simple Similarity Implementation to Count the Number of Hits

2016-05-11 Thread Luís Filipe Nassif
Hi, In the past (lucene 4) I have tried to implement a simple Similarity to only count the number of occurrences (term frequencies) into the documents, ignoring norms, doc frequencies, boosts... It worked for some queries like term and wildcard queries, but not for others, like phrase and range qu

Re: Collector is collecting more than the specified hits

2014-02-18 Thread Michael McCandless

Re: Collector is collecting more than the specified hits

2014-02-18 Thread saisantoshi

Re: Collector is collecting more than the specified hits

2014-02-18 Thread Michael McCandless
You look at the hits you got back, and save the docID of the very last hit, and use that on the follow-on search to get the "next page". This is how searchAfter works ... but you need to ensure you use the same searcher for follow-on requests; otherwise the docIDs are not comparable.
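The "save the last hit and continue after it" rule can be sketched without Lucene. Hits are ordered by descending score, with ties broken by ascending docID. A hit belongs on a later page exactly when it sorts strictly after the last hit already shown. The sketch is stdlib-only, and the `Hit` class is a hypothetical stand-in for Lucene's ScoreDoc.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Sketch of "search after" paging over hits sorted by score desc, then docID asc. */
final class SearchAfterPaging {
    private SearchAfterPaging() {}

    /** Hypothetical hit record: a docID plus its score. */
    static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    /** Relevance order: higher score first; equal scores broken by smaller docID first. */
    static final Comparator<Hit> ORDER =
        Comparator.comparingDouble((Hit h) -> -h.score).thenComparingInt(h -> h.doc);

    /** Returns up to pageSize hits that sort strictly after `after` (null = first page). */
    static List<Hit> page(List<Hit> allMatches, Hit after, int pageSize) {
        List<Hit> sorted = new ArrayList<>(allMatches);
        sorted.sort(ORDER);
        List<Hit> out = new ArrayList<>();
        for (Hit h : sorted) {
            if (out.size() >= pageSize) break;
            if (after != null && ORDER.compare(h, after) <= 0) continue; // shown on an earlier page
            out.add(h);
        }
        return out;
    }
}
```

The docID tie-break is what makes "after" well defined when scores are equal. This is also why, as the reply says, the follow-on request must use the same searcher, so that the docIDs stay comparable.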

Re: Collector is collecting more than the specified hits

2014-02-18 Thread saisantoshi

Re: Collector is collecting more than the specified hits

2014-02-18 Thread Michael McCandless
Sorry, searchAfter only works if you are sorting by score or by fields. It seems like you are sorting by docID? I.e., at first you want the top 100 hits sorted by docID, then the next 100, etc.? If so, you could just modify your collector so that you tell it up front the "afterDocID"
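For the docID-ordered case, the suggested change can be sketched as a collector that is told the last docID already returned. It skips everything up to that docID and reports when the page is full. The sketch is stdlib-only and hypothetical (`AfterDocIdCollector` is not a Lucene class), and it assumes matches arrive in increasing docID order.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of a docID-ordered "next page" collector; assumes increasing docIDs. */
final class AfterDocIdCollector {
    private final int afterDocId;
    private final int numHits;
    private final List<Integer> hits = new ArrayList<>();

    AfterDocIdCollector(int afterDocId, int numHits) {
        this.afterDocId = afterDocId;
        this.numHits = numHits;
    }

    /** Called once per matching docID; returns false once the page is full. */
    boolean collect(int doc) {
        if (hits.size() >= numHits) return false;
        if (doc > afterDocId) hits.add(doc); // skip docs already returned on earlier pages
        return hits.size() < numHits;
    }

    List<Integer> hits() { return hits; }

    /** Driver: feeds matches in order and stops as soon as the page is full. */
    static List<Integer> nextPage(int[] matchingDocsInOrder, int afterDocId, int numHits) {
        AfterDocIdCollector c = new AfterDocIdCollector(afterDocId, numHits);
        for (int doc : matchingDocsInOrder) {
            if (!c.collect(doc)) break;
        }
        return c.hits();
    }
}
```

The caller passes the last docID of the previous page (or -1 for the first page), so each request keeps only the next `numHits` docIDs instead of re-collecting from 0.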

Re: Collector is collecting more than the specified hits

2014-02-17 Thread saisantoshi
= TopScoreDocCollector.create(101+100, true); it should call from 101 - 200 and not from 0-200. Thanks, Sai.

Re: Collector is collecting more than the specified hits

2014-02-17 Thread saisantoshi
Could you please elaborate on the above? I am not sure if the collector is already doing it or do I need to call any other API? Thanks, Sai.

Re: Collector is collecting more than the specified hits

2014-02-17 Thread Michael McCandless
nd giving me the 100. Can we have the collector do it > intelligently by remembering the old search results and run the collector > for the next 100 only. > > Thanks, > Sai.

Re: Collector is collecting more than the specified hits

2014-02-17 Thread saisantoshi

Re: Collector is collecting more than the specified hits

2014-02-14 Thread Tri Cao
If I understand correctly, you'd like to shortcut the execution when you reach the desired number of hits. Unfortunately, I don't think there's a graceful way to do that right now in Collector. To stop further collecting, you need to throw an IOException (or a subtype of it) and catc
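The exception-based shortcut can be modeled in plain Java. The collector throws a dedicated IOException subtype once it has enough hits, and the caller catches exactly that type and keeps what was gathered. The exception and both classes are hypothetical stand-ins, not Lucene code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Sketch: abort collection early by throwing a dedicated IOException subtype. */
final class EarlyTermination {
    private EarlyTermination() {}

    /** Hypothetical marker exception; extends IOException as suggested in the thread. */
    static final class EnoughHitsException extends IOException {
        EnoughHitsException() { super("collected enough hits"); }
    }

    /** Stand-in for a collector callback that gives up after `limit` hits. */
    static final class FirstNCollector {
        private final int limit;
        final List<Integer> docs = new ArrayList<>();

        FirstNCollector(int limit) { this.limit = limit; }

        void collect(int doc) throws IOException {
            docs.add(doc);
            if (docs.size() >= limit) throw new EnoughHitsException();
        }
    }

    /** Stand-in for a search loop that calls collect() for every match. */
    static List<Integer> search(int[] matches, int limit) {
        FirstNCollector c = new FirstNCollector(limit);
        if (limit <= 0) return c.docs;
        try {
            for (int doc : matches) c.collect(doc);
        } catch (EnoughHitsException expected) {
            // Normal early exit: the hits gathered so far are the answer.
        } catch (IOException e) {
            throw new UncheckedIOException(e); // a genuine I/O failure, not our signal
        }
        return c.docs;
    }
}
```

Catching only the marker type matters. Any other IOException is a real failure and must not be mistaken for "enough hits".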

Re: Collector is collecting more than the specified hits

2014-02-14 Thread saisantoshi
I am not interested in the scores at all. My requirement is simple, I only need the first 100 hits or the numHits I specify (irrespective of their scores). The collector should stop after collecting the numHits specified. Is there a way to tell in the collector to stop after collecting the

Re: Collector is collecting more than the specified hits

2014-02-14 Thread Michael McCandless
This is how Collector works: it is called for every document matching the query, and then its job is to choose which of those hits to keep. This is because in general the hits to keep can come at any time, not just the first N hits you see; e.g. the best scoring hit may be the very last one. But
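A collector that must return the best N hits, whatever order they arrive in, can keep a bounded min-heap whose root is the weakest hit kept so far. Each new match either replaces the root or is dropped. Below is a stdlib-only sketch of that idea. The class and `ScoredDoc` are hypothetical and not Lucene's TopScoreDocCollector.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Sketch: keep the top N of a stream of (doc, score) hits with a bounded min-heap. */
final class TopNCollector {
    /** Hypothetical hit record. */
    static final class ScoredDoc {
        final int doc;
        final float score;
        ScoredDoc(int doc, float score) { this.doc = doc; this.score = score; }
    }

    // "Worse first": a lower score is worse; on equal scores the larger docID is worse.
    private static final Comparator<ScoredDoc> WORSE_FIRST =
        Comparator.comparingDouble((ScoredDoc d) -> d.score)
                  .thenComparing(Comparator.comparingInt((ScoredDoc d) -> d.doc).reversed());

    private final int n;
    private final PriorityQueue<ScoredDoc> heap = new PriorityQueue<>(WORSE_FIRST);
    private int totalHits;

    TopNCollector(int n) { this.n = n; }

    /** Called for every matching document, in any order. */
    void collect(int doc, float score) {
        totalHits++;
        if (n <= 0) return;
        ScoredDoc candidate = new ScoredDoc(doc, score);
        if (heap.size() < n) {
            heap.add(candidate);
        } else if (WORSE_FIRST.compare(candidate, heap.peek()) > 0) {
            heap.poll(); // evict the weakest hit kept so far
            heap.add(candidate);
        }
    }

    int totalHits() { return totalHits; }

    /** Kept hits, best first. */
    List<ScoredDoc> topDocs() {
        List<ScoredDoc> out = new ArrayList<>(heap);
        out.sort(WORSE_FIRST.reversed());
        return out;
    }
}
```

The heap never holds more than N entries, while `totalHits` still counts every match. If only the count is needed, the heap can be dropped entirely. That is the role TotalHitCountCollector plays in the "Counting all the hits with parallel searching" thread.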

Collector is collecting more than the specified hits

2014-02-13 Thread saisantoshi

Re: How to get hits coordinates in Lucene 4.4.0

2013-09-06 Thread Darren Hoffman
TopDocs topDocs = isearcher.search(query, null, >1000); >ScoreDoc[] docs = topDocs.scoreDocs; > >StringBuilder result = new StringBuilder(); >StringBuilder debugInfo = new StringBuilder(); >

Re: How to get hits coordinates in Lucene 4.4.0

2013-08-19 Thread Jon Stewart
; > >> Mike McCandless > >> > >> http://blog.mikemccandless.com > >> > >> > >> On Mon, Aug 12, 2013 at 1:20 PM, Lingviston > >> wrote: > >> > I think that's OK for me. I just need to know the right way to get

Re: How to get hits coordinates in Lucene 4.4.0

2013-08-19 Thread Michael McCandless
p://blog.mikemccandless.com >> >> >> On Mon, Aug 12, 2013 at 1:20 PM, Lingviston >> wrote: >> > I think that's OK for me. I just need to know the right way to get them. >> > Notice that queries must support boolean operators, *, ? and quotes.

Re: How to get hits coordinates in Lucene 4.4.0

2013-08-19 Thread Jon Stewart
Aug 12, 2013 at 1:20 PM, Lingviston > wrote: > > I think that's OK for me. I just need to know the right way to get them. > > Notice that queries must support boolean operators, *, ? and quotes.

Re: How to get hits coordinates in Lucene 4.4.0

2013-08-18 Thread Karl Wettin
On Aug 13, 2013, at 12:55 PM, Michael McCandless wrote: > I'm less familiar with the older highlighters but likely it's possible > to get the absolute offsets from them as well. Using vector highlighter I've achieved that by extending and cloning the code of ScoreOrderFragmentsBuilder#makeFrag

Re: How to get hits coordinates in Lucene 4.4.0

2013-08-13 Thread Lingviston
ghter is similar to this? I mean here I have custom Formatter and I need a custom one for PostingHighlighter too?

Re: How to get hits coordinates in Lucene 4.4.0

2013-08-13 Thread Michael McCandless
ow the right way to get them. > Notice that queries must support boolean operators, *, ? and quotes.

Re: How to get hits coordinates in Lucene 4.4.0

2013-08-12 Thread Lingviston
I think that's OK for me. I just need to know the right way to get them. Notice that queries must support boolean operators, *, ? and quotes.

Re: How to get hits coordinates in Lucene 4.4.0

2013-08-12 Thread Michael McCandless
raw highlights by myself > over the rendered pdf file (as far as I know lucene can't work with pdf by > default). > > Yes, offsets is what I'm looking for.

Re: How to get hits coordinates in Lucene 4.4.0

2013-08-12 Thread Lingviston
Like I said I will work with pdf files. So I will draw highlights by myself over the rendered pdf file (as far as I know lucene can't work with pdf by default). Yes, offsets is what I'm looking for.

Re: How to get hits coordinates in Lucene 4.4.0

2013-08-12 Thread Michael McCandless
Query query = queryParser.parse(searchString); > TopDocs topDocs = isearcher.search(query, null, > 1000); > ScoreDoc[] docs = topDocs.scoreDocs; > > StringBuilde

How to get hits coordinates in Lucene 4.4.0

2013-08-12 Thread Lingviston
ScoreDoc[] docs = topDocs.scoreDocs; StringBuilder result = new StringBuilder(); StringBuilder debugInfo = new StringBuilder(); debugInfo.append("Number of hits: "); debugInfo.append(docs.length);

Re: Getting the number of all hits for the SpanQuery

2013-02-01 Thread Igor Shalyminov
the input parameters do not make sense to me. What are these objects for, and how does one build them? -- Best Regards, Igor Shalyminov 31.01.2013, 20:15, "Igor Shalyminov" : > Hello! > > I want to perform a SpanQuery and get the precise overall number of all hits > thro

Getting the number of all hits for the SpanQuery

2013-01-31 Thread Igor Shalyminov
Hello! I want to perform a SpanQuery and get the precise overall number of all hits throughout the entire index (i.e. if the query words combination appears multiple times in a document, I need that number counted). I've found a method called SpanQuery.getSpans, but the way of using it i
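Counting every occurrence, rather than every matching document, means walking term positions. For an exact two-word phrase, each position p of the first word counts once if the second word sits at p + 1. Iterating Spans exposes this information. The sketch below is stdlib-only and assumes a hypothetical positional index given as docID -> term -> sorted positions.

```java
import java.util.Arrays;
import java.util.Map;

/** Sketch: count exact two-term phrase occurrences from per-document term positions. */
final class PhraseOccurrences {
    private PhraseOccurrences() {}

    /** positions: docID -> term -> sorted positions of that term (hypothetical input). */
    static int countAll(Map<Integer, Map<String, int[]>> positions, String first, String second) {
        int total = 0;
        for (Map<String, int[]> termsInDoc : positions.values()) {
            total += countInDoc(termsInDoc, first, second);
        }
        return total;
    }

    /** Occurrences of "first second" inside one document. */
    static int countInDoc(Map<String, int[]> termsInDoc, String first, String second) {
        int[] a = termsInDoc.get(first);
        int[] b = termsInDoc.get(second);
        if (a == null || b == null) return 0;
        int count = 0;
        for (int p : a) {
            // b is sorted, so a binary search tells whether `second` occurs right after p.
            if (Arrays.binarySearch(b, p + 1) >= 0) count++;
        }
        return count;
    }
}
```

On its own, `countInDoc` gives per-document phrase counts. This is the "Hello World" case asked about in the "Lucene, hits per document" thread.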

Re: Filtering top hits based on stored field? And Lucene 1.x -> 3.x for Dummies

2013-01-25 Thread Ian Lea
at I do not know why the search performance degrades when > doc() is called within the Collector. Is it simply that Lucene will present, > for example, thousands of candidate hits (from millions of indexed > documents) to the Collector even though the collector might only return the > top

Re: Filtering top hits based on stored field? And Lucene 1.x -> 3.x for Dummies

2013-01-25 Thread Andrew Gilmartin
ample, thousands of candidate hits (from millions of indexed documents) to the Collector even though the collector might only return the top handful? And so the Collector will need to load thousands of documents and it is this document loading that causes the performance degradation? Or is it

Re: Filtering top hits based on stored field? And Lucene 1.x -> 3.x for Dummies

2013-01-25 Thread Ian Lea
ions -- one general and one specific. > > The specific question is how, in Lucene 3.x, can I filter the > IndexSearcher.search() results based on stored fields within candidate hits? > It is not acceptable to perform the filter post search as now my hits list > is too short. In the past

Filtering top hits based on stored field? And Lucene 1.x -> 3.x for Dummies

2013-01-25 Thread Andrew Gilmartin
. The specific question is how, in Lucene 3.x, can I filter the IndexSearcher.search() results based on stored fields within candidate hits? It is not acceptable to perform the filter post search as now my hits list is too short. In the past calling doc() during a search (with my own collector

Re: getting the offset of hits in a search

2013-01-09 Thread Itai Peleg
Great! I'll look into that. Thanks! 2013/1/9 김한규 > Try SpanTermQuery's getSpans() function. It returns a Spans object which you > can iterate through to find the position of every hit in every document. > > http://lucene.apache.org/core/4_0_0/core/org/apache/lucene/search/spans/S

Re: getting the offset of hits in a search

2013-01-09 Thread 김한규
Try SpanTermQuery's getSpans() function. It returns a Spans object which you can iterate through to find the position of every hit in every document. http://lucene.apache.org/core/4_0_0/core/org/apache/lucene/search/spans/SpanTermQuery.html 2013/1/9 Itai Peleg > Hi, > > I'm new to L

RE: need to find locations of query hits in doc: works fine for regular text but not for phone numbers

2012-06-14 Thread Ilya Zavorin
worked like a charm! thx! From: Jack Krupansky [j...@basetechnology.com] Sent: Thursday, June 14, 2012 3:30 PM To: java-user@lucene.apache.org Subject: Re: need to find locations of query hits in doc: works fine for regular text but not for phone

Re: need to find locations of query hits in doc: works fine for regular text but not for phone numbers

2012-06-14 Thread Jack Krupansky
-user@lucene.apache.org Subject: RE: need to find locations of query hits in doc: works fine for regular text but not for phone numbers Uwe, sorry but I am having trouble understanding this. Can you point me to a place in documentation that explains this in more detail (I've read

RE: need to find locations of query hits in doc: works fine for regular text but not for phone numbers

2012-06-14 Thread Ilya Zavorin
d) or some example code? Thanks much, Ilya -Original Message- From: Uwe Schindler [mailto:u...@thetaphi.de] Sent: Thursday, June 14, 2012 12:57 PM To: java-user@lucene.apache.org Subject: RE: need to find locations of query hits in doc: works fine for regular text but not for phone nu

Re: need to find locations of query hits in doc: works fine for regular text but not for phone numbers

2012-06-14 Thread Chris Hostetter
: Subject: need to find locations of query hits in doc: works fine for regular : text but not for phone numbers : Message-ID: : References: <1339635547170-3989548.p...@n3.nabble.com> : In-Reply-To: <1339635547170-3989548.p...@n3.nabble.com> https://people.apache.org/~hossman/#

RE: need to find locations of query hits in doc: works fine for regular text but not for phone numbers

2012-06-14 Thread Uwe Schindler
.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Ilya Zavorin [mailto:izavo...@caci.com] > Sent: Thursday, June 14, 2012 6:49 PM > To: java-user@lucene.apache.org > Subject: RE: need to find locations of query hit

RE: need to find locations of query hits in doc: works fine for regular text but not for phone numbers

2012-06-14 Thread Ilya Zavorin
ngs of the query reliably? Thanks! -Original Message- From: Jack Krupansky [mailto:j...@basetechnology.com] Sent: Wednesday, June 13, 2012 11:42 PM To: java-user@lucene.apache.org Subject: Re: need to find locations of query hits in doc: works fine for regular text but not for phone

Re: need to find locations of query hits in doc: works fine for regular text but not for phone numbers

2012-06-13 Thread Jack Krupansky
the default field. -- Jack Krupansky -Original Message- From: Ilya Zavorin Sent: Wednesday, June 13, 2012 10:52 PM To: java-user@lucene.apache.org Subject: need to find locations of query hits in doc: works fine for regular text but not for phone numbers Hello All, I am using 3.4. I need t

need to find locations of query hits in doc: works fine for regular text but not for phone numbers

2012-06-13 Thread Ilya Zavorin
Hello All, I am using 3.4. I need to find locations of query hits in a document. What I've implemented works fine for textual queries but does not work for phone numbers. Here's how I index my docs: String oc = "Joe dialed 800-555-1212 but got a busy signal"; doc.add

Re: Counting all the hits with parallel searching

2012-02-19 Thread Robert Muir
On Sun, Feb 19, 2012 at 10:23 AM, Benson Margulies wrote: > thanks, that's what I needed. > Thanks for bringing this up, I think it's a common issue, I created https://issues.apache.org/jira/browse/LUCENE-3799 to hopefully improve the docs situation. -- lucidimagination.com

Re: Counting all the hits with parallel searching

2012-02-19 Thread Benson Margulies
ge numbers here: if you are not actually returning pages > of results to the user, but just counting hits, then pass > TotalHitCountCollector. > > -- > lucidimagination.com > > - > To unsubscribe,

RE: Counting all the hits with parallel searching

2012-02-19 Thread Uwe Schindler
the 2 million's result page, so pass a small number of top hits. To simply count all hits like you seem to do, there is a separate collector available: http://goo.gl/XsPVR - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -

Re: Counting all the hits with parallel searching

2012-02-19 Thread Robert Muir
something like 20. This is because it builds a priority queue of size _n_ to return results in sorted order. Don't pass huge numbers here: if you are not actually returning pages of results to the user, but just counting hits, then pass TotalHitCountCollect

Counting all the hits with parallel searching

2012-02-19 Thread Benson Margulies
If I have a lot of segments, and an executor service in my searcher, the following runs out of memory instantly, building giant heaps. Is there another way to express this? Should I file a JIRA that the parallel code should have some graceful behavior? int longestMentionFreq = searcher.search(long

Re: lucene hits vs topdocs

2011-11-21 Thread Ian Lea
> I last used dotLucene 143 and now I'm wanting to upgrade to 294. > > What I've discovered is that there are quite a few changes.. > > One of them is in respect of Search. Previously one supplied a query and > received a number of hits. I didn't have an issue with

lucene hits vs topdocs

2011-11-20 Thread Gwyn Carwardine
Hi I last used dotLucene 143 and now I'm wanting to upgrade to 294. What I've discovered is that there are quite a few changes.. One of them is in respect of Search. Previously one supplied a query and received a number of hits. I didn't have an issue with preservation of sta

Re: Query Hits

2011-08-05 Thread Ian Lea
> I am currently using lucene 2.4, Time to upgrade? > is there a way to count how many words from my query hits the post? > > Lets say my query is: > APPLE OR BANANA OR ORANGE > > > The post is: > I have a banana, i love to eat banana and apple > > This case

Query Hits

2011-08-04 Thread Tan Weijian
Hi , I am currently using lucene 2.4, is there a way to count how many words from my query hits the post? Lets say my query is: APPLE OR BANANA OR ORANGE The post is: I have a banana, i love to eat banana and apple This case, banana keyword is hit twice and apple is hit once, is there a way
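Once the post is tokenized, the counting itself is simple. Below is a stdlib-only sketch for the APPLE OR BANANA OR ORANGE example. It assumes a naive lowercase tokenizer; a real Analyzer, especially one that stems, may count differently.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Sketch: how often each term of an OR query occurs in one text (naive tokenizer). */
final class QueryTermHits {
    private QueryTermHits() {}

    static Map<String, Integer> count(String[] queryTerms, String text) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String q : queryTerms) counts.put(q.toLowerCase(Locale.ROOT), 0);
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            counts.computeIfPresent(token, (t, c) -> c + 1); // tokens outside the query are ignored
        }
        return counts;
    }
}
```

For the example post this yields banana = 2, apple = 1 and orange = 0.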

Re: DocIdSet to represent small number of hits in large Document set

2011-04-05 Thread Michael McCandless
This (HashDocSet, and any other impls that handle the sparse case well) could be useful to have in Lucene's core. For example, for certain MultiTermQuerys we have this CONSTANT_SCORE_AUTO_REWRITE, which has iffy smelling heuristics to try to determine the best cutover point from ConstantScoreQuer

Re: DocIdSet to represent small number of hits in large Document set

2011-04-05 Thread Yonik Seeley
On Tue, Apr 5, 2011 at 2:24 AM, Antony Bowesman wrote: > Seems like SortedVIntList can be used to store the info, but it has no > methods to build the list in the first place, requiring an array or bitset > in the constructor. It has a constructor that takes DocIdSetIterator - so you can pass an
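The space argument for a sparse set can be sketched directly. The sorted docIDs are stored as gaps, and each gap is encoded as a variable-length int: 7 bits per byte, with the high bit meaning "more bytes follow". This is the general idea behind a sorted VInt list. The sketch is stdlib-only and hypothetical; it is not Lucene's SortedVIntList.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

/** Sketch: store a sparse, sorted set of docIDs as VInt-encoded gaps. */
final class DeltaVIntDocSet {
    private final byte[] bytes;
    private final int size;

    DeltaVIntDocSet(int[] sortedDocIds) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int prev = 0;
        for (int doc : sortedDocIds) {
            if (doc < prev) throw new IllegalArgumentException("docIDs must be sorted and >= 0");
            writeVInt(out, doc - prev); // small gaps take a single byte
            prev = doc;
        }
        this.bytes = out.toByteArray();
        this.size = sortedDocIds.length;
    }

    /** 7 payload bits per byte, low-order bits first; high bit set means "more bytes follow". */
    private static void writeVInt(ByteArrayOutputStream out, int v) {
        while ((v & ~0x7F) != 0) {
            out.write((v & 0x7F) | 0x80);
            v >>>= 7;
        }
        out.write(v);
    }

    int byteSize() { return bytes.length; }

    int size() { return size; }

    /** Decodes all docIDs in increasing order. */
    List<Integer> toList() {
        List<Integer> docs = new ArrayList<>(size);
        int pos = 0, prev = 0;
        while (pos < bytes.length) {
            int v = 0, shift = 0, b;
            do {
                b = bytes[pos++] & 0xFF;
                v |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            prev += v;
            docs.add(prev);
        }
        return docs;
    }
}
```

For the docIDs {3, 130, 5,000,000} the encoded form takes 6 bytes. A bitset over a 5M-document index needs about 5,000,000 / 8 = 625,000 bytes however few bits are set. The trade-off is that advancing to a target docID means decoding sequentially from the start.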

Re: DocIdSet to represent small number of hits in large Document set

2011-04-05 Thread Jason Rutherglen
I think Solr has a HashDocSet implementation? On Tue, Apr 5, 2011 at 3:19 AM, Michael McCandless wrote: > Can we simply factor out (poach!) those useful-sounding classes from > Nutch into Lucene? > > Mike > > http://blog.mikemccandless.com > > On Tue, Apr 5, 2011 at 2:24 AM, Antony Bowesman > w

Re: DocIdSet to represent small number of hits in large Document set

2011-04-05 Thread Michael McCandless
Can we simply factor out (poach!) those useful-sounding classes from Nutch into Lucene? Mike http://blog.mikemccandless.com On Tue, Apr 5, 2011 at 2:24 AM, Antony Bowesman wrote: > I'm converting a Lucene 2.3.2 to 2.4.1 (with a view to going to 2.9.4). > > Many of our indexes are 5M+ Documents,

DocIdSet to represent small number of hits in large Document set

2011-04-04 Thread Antony Bowesman
I'm converting a Lucene 2.3.2 to 2.4.1 (with a view to going to 2.9.4). Many of our indexes are 5M+ Documents, however, only a small subset of these are relevant to any user. As a DocIdSet, backed by a BitSet or OpenBitSet, is rather inefficient in terms of memory use, what is the recommended

Re: no. of documents with hits vs. no. of hits

2011-03-15 Thread Ian Lea
ccess to the values passed to the coord() method of the Similarity in use for a search. Sounds hairy, maybe impossible. -- Ian. On Mon, Mar 14, 2011 at 2:52 PM, Michael Wiegand wrote: > Hi, > > Does Lucene always count the number of documents with hits matching a query > or is it

no. of documents with hits vs. no. of hits

2011-03-14 Thread Michael Wiegand
Hi, Does Lucene always count the number of documents with hits matching a query or is it also possible to count the overall number of hits? There would be a difference between the two if within a document there is actually more than one hit. Thank you in advance! Best, Michael

Re: overall number of hits

2011-03-11 Thread Ian Lea
There are search methods that don't require a filter, but you are right that there is nothing quite as simple as search(q). From http://www.gossamer-threads.com/lists/lucene/java-user/95032 you can use TopDocs tp = ms.search(lucquery, 1); And then the total count is in tp.totalHits -- Ian. O

overall number of hits

2011-03-11 Thread Michael Wiegand
Hi, I am currently mainly interested in the overall number of matches in a document collection (several GBs) given a particular query. At the moment I am not interested in the matching documents themselves; just the number would be sufficient. In previous versions of lucene the Searcher class h

Re: Lucene, hits per document

2011-01-30 Thread sharma
Grant Ingersoll apache.org> writes: With a little logic on your side to count, you can use SpanQueries to do that. -Grant On Jan 21, 2011, at 4:03 PM, Sharma Kollaparthi wrote: Hi , I have started to use Lucene for searching in HTML files. Is it possible to get Hits

Re: Lucene, hits per document

2011-01-25 Thread Grant Ingersoll
With a little logic on your side to count, you can use SpanQueries to do that. -Grant On Jan 21, 2011, at 4:03 PM, Sharma Kollaparthi wrote: > Hi , > > I have started to use Lucene for searching in HTML files. Is it > possible to get Hits per document, when we search for

Lucene, hits per document

2011-01-21 Thread Sharma Kollaparthi
Hi , I have started to use Lucene for searching in HTML files. Is it possible to get Hits per document, when we search for phrases like "Hello World" and wild card searches like "te?t"? I managed to return the number of hits per document if there is only one term

Re: Get number of hits for various combinations

2010-10-20 Thread Pradeep Singh
Use Solr and look at faceting. On Wed, Oct 20, 2010 at 9:12 AM, Bob Miller wrote: > Dear all, > > > I would like to use a tag cloud of keywords to narrow down the currently > displayed search results. In order to implement this, it must be possible > to > determine the n

Get number of hits for various combinations

2010-10-20 Thread Bob Miller
Dear all, I would like to use a tag cloud of keywords to narrow down the currently displayed search results. In order to implement this, it must be possible to determine the number of hits for the current search query combined with each keyword in the tag cloud. What would be the most
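The tag cloud needs one count per keyword for "current query AND keyword", which is what faceting computes. Below is a stdlib-only sketch with java.util.BitSet. It assumes each keyword's matching docIDs are available as a precomputed bitset, which is a hypothetical input. It does not reflect how Solr implements faceting.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch: per-keyword hit counts for the current result set (tag-cloud style). */
final class KeywordFacetCounts {
    private KeywordFacetCounts() {}

    /**
     * results: docIDs matching the current query.
     * keywordDocs: keyword -> docIDs containing that keyword (hypothetical, precomputed).
     */
    static Map<String, Integer> counts(BitSet results, Map<String, BitSet> keywordDocs) {
        Map<String, Integer> out = new LinkedHashMap<>();
        for (Map.Entry<String, BitSet> e : keywordDocs.entrySet()) {
            BitSet both = (BitSet) results.clone(); // leave the caller's result set untouched
            both.and(e.getValue());                 // current query AND keyword
            out.put(e.getKey(), both.cardinality());
        }
        return out;
    }
}
```

One intersection per keyword avoids re-running the full query once for every tag in the cloud.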

Re: No hits when querying multiple fields

2010-07-27 Thread Erick Erickson
H, what analyzers are you using at index and query time? Are they identical? But I think your basic problem is phrases. Parsing text:"hello world" expects the words "hello" and "world" to appear sequentially in the text field. Try something like title:(+hello +world). But depending upon how yo

Re: No hits when querying multiple fields

2010-07-27 Thread Geir Gullestad Pettersen
Just to clarify some things that could be misunderstood. First, I meant that I added two fields to a document which was then indexed, not two separate documents. Second, I noticed in the lucene mail archive that some additional characters, especially "*", had sneaked into my query examples. This w

No hits when querying multiple fields

2010-07-27 Thread Geir Gullestad Pettersen
Consider the following two documents which I have added to my index: doc.add( new Field("text", "hello world", Field.Store.YES, > Field.Index.ANALYZED)); > doc.add( new Field("id", "1", Field.Store.YES, Field.Index.ANALYZED)); > Using the StandardQueryParser I can retrieve my document with eithe

RE: search hits not returned until I stop and restart application

2010-06-21 Thread Steven A Rowe
nuss [mailto:andrew_n...@yahoo.com] > Sent: Monday, June 21, 2010 2:44 PM > To: java-user@lucene.apache.org > Subject: RE: search hits not returned until I stop and restart application > > > "Maybe you aren't using the IndexReader instance returned by reopen(), but >
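The point about reopen() is that it does not update the reader it is called on. In Lucene 3.x it returns the same instance when nothing changed and a new instance otherwise. The searcher then has to be rebuilt on the returned reader. Below is a stdlib-only toy model of that contract; all classes are hypothetical, not Lucene's.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of point-in-time readers and reopen(); not Lucene's classes. */
final class SnapshotDemo {
    private SnapshotDemo() {}

    /** Stand-in for the writer side: commit() publishes everything added so far. */
    static final class ToyWriter {
        private final List<String> pending = new ArrayList<>();
        private List<String> committed = List.of();
        private int generation;

        void addDocument(String doc) { pending.add(doc); }

        void commit() {
            committed = List.copyOf(pending); // pending holds every doc added so far
            generation++;
        }
    }

    /** Stand-in for a reader: an immutable view of one commit. */
    static final class ToyReader {
        private final ToyWriter source;
        final int generation;
        final List<String> docs;

        ToyReader(ToyWriter source) {
            this.source = source;
            this.generation = source.generation;
            this.docs = source.committed;
        }

        /** Returns this if nothing changed, otherwise a NEW reader; `this` is never updated. */
        ToyReader reopen() {
            return source.generation == generation ? this : new ToyReader(source);
        }
    }
}
```

With the real classes, the usual pattern is to compare the returned reader with the old one. If they differ, close the old reader and build a new IndexSearcher on the new one. Calling reopen() and discarding its result leaves the old searcher blind to the new book.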

RE: search hits not returned until I stop and restart application

2010-06-21 Thread andynuss
to be thrashing with a commit after each one, and then a reopen of the reader and reconstruction of my searcher. Do others manage this type of thing with a thread that fires at intervals to commit if dirty?

RE: search hits not returned until I stop and restart application

2010-06-21 Thread Steven A Rowe
> -Original Message- > From: andynuss [mailto:andrew_n...@yahoo.com] > Sent: Monday, June 21, 2010 1:29 PM > To: java-user@lucene.apache.org > Subject: RE: search hits not returned until I stop and restart application > > > "So you gotta call commit() or

RE: search hits not returned until I stop and restart application

2010-06-21 Thread andynuss
(1) called the IndexWriter singleton commit() function, (2) then called the IndexReader singleton reopen() function (no arguments). (My IndexReader is read only.) Still didn't find hits in that book. Then I tried (3) creating a new IndexSearcher on top of this IndexReader and that also didn't

RE: search hits not returned until I stop and restart application

2010-06-21 Thread Steven A Rowe
--- > From: andynuss [mailto:andrew_n...@yahoo.com] > Sent: Monday, June 21, 2010 11:02 AM > To: java-user@lucene.apache.org > Subject: search hits not returned until I stop and restart application > > > Hi, > > I have an IndexWriter singleton in my program, and an IndexSearc

search hits not returned until I stop and restart application

2010-06-21 Thread andynuss
Hi, I have an IndexWriter singleton in my program, and an IndexSearcher singleton based on a readonly IndexReader singleton. When I use the IndexWriter to index a large document to lucene, and then, while the program is still running, use my previously created IndexSearcher to find hits in that

Re: Finding the position of search hits from Lucene

2010-06-10 Thread Ian Lea
200 > > ...

Re: Finding the position of search hits from Lucene

2010-06-10 Thread tituspullo
position 4 . .. doc = 2345 position 200 ...

Re: Finding the position of search hits from Lucene

2010-06-10 Thread Simon Willnauer
imply ..:) > I try to rerank and want to know later exactly the original position in > the hitlist.. > > titus

Re: Finding the position of search hits from Lucene

2010-06-10 Thread tituspullo
