RE: Changing Term Vectors for Query

2021-06-07 Thread Marcel D.
-- > > Uwe Schindler > Achterdiek 19, D-28357 Bremen > https://www.thetaphi.de > eMail: u...@thetaphi.de > > > -Original Message- > > From: Marcel D. truebau...@protonmail.com.INVALID > > Sent: Monday, June 7, 2021 9:53 AM > > To: java-user@lucene.

RE: Changing Term Vectors for Query

2021-06-07 Thread Uwe Schindler
D-28357 Bremen https://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Marcel D. > Sent: Monday, June 7, 2021 9:53 AM > To: java-user@lucene.apache.org > Subject: Re: Changing Term Vectors for Query > > Hi Adrien, > I forgot to mention that I also need the

Re: Changing Term Vectors for Query

2021-06-07 Thread Marcel D.
Hi Adrien, I forgot to mention that I also need the original frequencies. I have some queries I need to run with the original frequencies and some with custom frequencies. Since I only have a small index and a few queries, that would work, but a solution where I don't have to change t

Re: Changing Term Vectors for Query

2021-06-07 Thread Adrien Grand
Hi Marcel, You can make Lucene index custom frequencies using something like DelimitedTermFrequencyTokenFilter, which would be easier than writing a custom Query/

Changing Term Vectors for Query

2021-06-06 Thread Hannes Lohr
Hello, for some queries I need to calculate the score mostly like the normal score, but for some documents certain terms are assigned a frequency given by me, and the score should be calculated with these new term frequencies. After some research, it seems I have to implement a custom Query, custo

Re: Term vectors

2014-09-30 Thread Jack Krupansky
Jack Krupansky -Original Message- From: John Cecere Sent: Tuesday, September 30, 2014 10:23 AM To: java-user@lucene.apache.org Subject: Term vectors I'm looking for documentation on how to use term vectors in Lucene. Specifically what I'd like to be able to do is to return t

Term vectors

2014-09-30 Thread John Cecere
I'm looking for documentation on how to use term vectors in Lucene. Specifically, what I'd like to be able to do is to return the positions of found search terms/phrases/etc. within a document. I've been able to find bits and pieces of information here and there, but no actua

Re: Document term vectors in Lucene 4

2013-01-18 Thread Jon Stewart
Thanks! I still can't see what was wrong with my original code--must have been a dumb typo somewhere--but starting over from that example now works on indices generated from my real indexing code. I will try to blog about it next week so there is some sample code up on the web for anyone else searc

Re: Document term vectors in Lucene 4

2013-01-18 Thread Ian Lea
To get stats from the whole index I think you need to come at this from a different direction. See the 4.0 migration guide for some details. With a variation on your code and 2 docs doc1: foobar qux quote doc2: foobar qux qux quorum this code snippet Fields fields = MultiFields.getFiel

Re: Document term vectors in Lucene 4

2013-01-17 Thread Jon Stewart
D'oh Thanks! Does TermsEnum.totalTermFreq() return the per-doc frequencies? It looks like it empirically, but the documentation refers to corpus usage, not document.field usage. Jon On Thu, Jan 17, 2013 at 10:00 AM, Ian Lea wrote: > typo time. You need doc2.add(...) not 2 doc.add(...) stat

Re: Document term vectors in Lucene 4

2013-01-17 Thread Ian Lea
typo time. You need doc2.add(...) not 2 doc.add(...) statements. -- Ian. On Thu, Jan 17, 2013 at 2:49 PM, Jon Stewart wrote: > On Thu, Jan 17, 2013 at 9:08 AM, Robert Muir wrote: >> Which statistics in particular (which methods)? > > I'd like to know the frequency of each term in each docume

Re: Document term vectors in Lucene 4

2013-01-17 Thread Jon Stewart
On Thu, Jan 17, 2013 at 9:08 AM, Robert Muir wrote: > Which statistics in particular (which methods)? I'd like to know the frequency of each term in each document. Those term counts for the most frequent terms in the corpus will make it into the document vectors for clustering. Looking at Terms

Re: Document term vectors in Lucene 4

2013-01-17 Thread Robert Muir
>> default list of stop words. >> Not relevant, but why are you using SlowCompositeReaderWrapper rather than just IndexReader rdr = DirectoryReader.open(dir)? I get the same results either way, >> -- >> Ian.

Re: Document term vectors in Lucene 4

2013-01-17 Thread Jon Stewart
> this" and "is" are presumably in the default list of stop words. > Not relevant, but why are you using SlowCompositeReaderWrapper rather than just IndexReader rdr = DirectoryReader.open(dir)? I get the same results either way, > -- > I

Re: Document term vectors in Lucene 4

2013-01-17 Thread Ian Lea
are presumably in the default list of stop words. Not relevant, but why are you using SlowCompositeReaderWrapper rather than just IndexReader rdr = DirectoryReader.open(dir)? I get the same results either way, -- Ian. On Thu, Jan 17, 2013 at 5:52 AM, Jon Stewart wrote: > Hello, > >

Document term vectors in Lucene 4

2013-01-16 Thread Jon Stewart
Hello, I cannot extract document term vectors from an index, and have not turned up much in some determined googling. In short, when I call IndexReader.getTermVector(docID, field) or IndexReader.getTermVectors(docID) and then navigate down to the Terms for the specified field, I get a null result

RE: Norms and Term Vectors in Lucene 4.0

2012-10-30 Thread Scott Smith
-user@lucene.apache.org Subject: Re: Norms and Term Vectors in Lucene 4.0 hey scott, On Mon, Oct 29, 2012 at 11:56 PM, Scott Smith wrote: > Converting some code to lucene 4.0, it appears that we can no longer set whether we want to store norms or termvectors using the "sug

Re: Norms and Term Vectors in Lucene 4.0

2012-10-30 Thread Simon Willnauer
hey scott, On Mon, Oct 29, 2012 at 11:56 PM, Scott Smith wrote: > Converting some code to lucene 4.0, it appears that we can no longer set whether we want to store norms or termvectors using the "sugared" Field classes (e.g., StringField() and TextField). I gather the defaults are to st

Norms and Term Vectors in Lucene 4.0

2012-10-29 Thread Scott Smith
Converting some code to lucene 4.0, it appears that we can no longer set whether we want to store norms or termvectors using the "sugared" Field classes (e.g., StringField() and TextField). I gather the defaults are to store norms and to not store termvectors? If I don't want norms on a field,

Searching by similarity using term vectors

2012-02-14 Thread Mike O'Leary
If I have indexed a set of documents using term vectors, is there support in Lucene to treat a list of query terms as a small document, create a term vector for it, and find documents by computing similarity between the query's term vector and the term vectors in the index? If so, wha

Re: Retrieving the term vectors of a document in Nutch

2012-01-10 Thread atcach
dd(td.freq()); } It never enters the while. Regards!

Re: Retrieving the term vectors of a document in Nutch

2009-06-08 Thread House Less
Hello Grant, > I'd ask on the nutch-u...@lucene.apache.org mailing list. While Lucene can do all of these things, it is not clear how Nutch exposes, if at all, any of this information. You should be able to get results there. Thanks, I'll be sure to ask them. > Note, however, t

Re: Retrieving the term vectors of a document in Nutch

2009-06-08 Thread Grant Ingersoll
I'd ask on the nutch-u...@lucene.apache.org mailing list. While Lucene can do all of these things, it is not clear how Nutch exposes, if at all, any of this information. You should be able to get results there. Note, however, that Term Vecs must be created during indexing by creating th

Re: Retrieving the term vectors of a document in Nutch

2009-06-07 Thread House Less
In retrospect, pardon my stupidity: surely it cannot be right that the term frequency vector for a page is not present within Nutch, for it needs this to compute the score for a page given a query. I would appreciate it if you would tell me where I may find it given a document number. Thank you

Retrieving the term vectors of a document in Nutch

2009-06-07 Thread House Less
Hello everyone, I am quite new to development with Nutch, so you must forgive my question if it is amateurish. After some reading of Luke's source code, I found to my dismay that obtaining the TermFreqVector of a document via the IndexReader resulted in no vectors at all. A mailing list entry

Re: large term vectors

2008-02-11 Thread Karl Wettin
http://lucene.apache.org/java/2_3_0/api/org/apache/lucene/document/Field.Index.html#NO_NORMS ? On 11 Feb 2008, at 15:55, <[EMAIL PROTECTED]> wrote: Hi Grant, Lucene 2.2.0 I'm not actually explicitly storing term vectors. It seems the huge amount of byte arrays is actually

RE: large term vectors

2008-02-11 Thread marc.dumontier
Hi Grant, Lucene 2.2.0 I'm not actually explicitly storing term vectors. It seems the huge amount of byte arrays is actually coming from SegmentReader.norms. Maybe that cache constantly grows, as I read somewhere that it's on-demand. I'm not using any field or document boosting

RE: large term vectors

2008-02-11 Thread marc.dumontier
ms to have something to do with the norms (SegmentReader.norms) Marc -Original Message- From: Cedric Ho [mailto:[EMAIL PROTECTED] Sent: Sunday, February 10, 2008 9:19 PM To: java-user@lucene.apache.org Subject: Re: large term vectors Is it a single index ? My index is also in the 200G

Re: large term vectors

2008-02-11 Thread Grant Ingersoll
Hi Marc, Can you give more info about what your field properties are? Your subject line implies you are storing term vectors, is that the case? Also, what version of Lucene are you using? Cheers, Grant On Feb 8, 2008, at 10:51 AM, <[EMAIL PROTECTED]> <[EMAIL PROTECTED] >

Re: large term vectors

2008-02-10 Thread Cedric Ho
I guess it would be quite different for different apps. For me, I do index updates on a single machine: index each incoming document into one chunk according to some rule to ensure even distribution. Then copy all the updated indexes to some other machines for searching. Each machine will then reo

Re: large term vectors

2008-02-10 Thread Briggs
So, I have a question about 'splitting indexes'. I see people say this all over, but how have people been handling this? I'm going to start a new thread, and there probably was one back in the day, but I am going to fire it up again. But, how did you do it? On Feb 10, 2008 9:18 PM, Cedric Ho <

Re: large term vectors

2008-02-10 Thread Cedric Ho
Is it a single index? My index is also in the 200G range, but I never managed to get a single index of size > 20G and still get acceptable performance (in both searching and updating). So I split my indexes into chunks of < 10G. I am curious as to how you manage such a single large index. Cedric

large term vectors

2008-02-08 Thread marc.dumontier
Hi, I have a large index which is around 275GB. As I search different parts of the index, the memory footprint grows with large byte arrays being stored. They never seem to get unloaded or GC'ed. Is there any way to control this behavior so that I can periodically unload cached information?

Re: term vectors

2006-11-15 Thread Grant Ingersoll
getTermFreqVector() for each desired field on each document, then sum that; but this seems slow to me. -Original Message- From: Michael D. Curtin [mailto:[EMAIL PROTECTED] Sent: Wednesday, November 15, 2006 11:35 AM To: java-user@lucene.apache.org Subject: Re: term vectors Phil Rosen wrote: I am

Re: term vectors

2006-11-15 Thread Michael D. Curtin
Phil Rosen wrote: I would like to get the sum of frequency counts for each term in the fields I specify across the search results. I can just iterate through the documents and use getTermFreqVector() for each desired field on each document, then sum that; but this seems slow to me. It seems

RE: term vectors

2006-11-15 Thread Phil Rosen
. -Original Message- From: Michael D. Curtin [mailto:[EMAIL PROTECTED] Sent: Wednesday, November 15, 2006 11:35 AM To: java-user@lucene.apache.org Subject: Re: term vectors Phil Rosen wrote: > I am building an application that requires I index a set of documents on the scale of hundr

Re: term vectors

2006-11-15 Thread Michael D. Curtin
Phil Rosen wrote: I am building an application that requires I index a set of documents on the scale of hundreds of thousands. A document can have a varying number of attribute fields with an unknown set of potential values. I realize that just indexing a blob of fields would be much faster, ho

Re: term vectors

2006-11-15 Thread Erick Erickson
Why do you think you need term frequencies in the first place? What is it that you're trying to do that just searching wouldn't accomplish? I've often jumped into the middle of something and made it way too complex, so I'm asking to see if you're doing something similar. Lucene has no requi

term vectors

2006-11-15 Thread Phil Rosen
Hello, Thanks in advance for your help, I am really stumped I feel. I am building an application that requires I index a set of documents on the scale of hundreds of thousands. A document can have a varying number of attribute fields with an unknown set of potential values. I realize th

Re: Impact of Term Vectors

2005-12-14 Thread Grant Ingersoll
I have successfully used term vectors for both the TREC English and Arabic collections. Take a look at the code I posted at http://www.cnlp.org/apachecon2005 or Erik and Otis' book, "Lucene In Action", which both have good examples of term vectors. Perhaps there is something w

Re: Impact of Term Vectors

2005-12-13 Thread Ira Goldstein
We've run into an issue with the term vectors. When indexing a small corpus (~3k docs, 1.3G) everything works fine, as it does with a small number of documents from TREC-6 (so we believe that our indexing code is ok). However, when we tried to index the full TREC-6 corpus (~300,000 docs, 2G

Impact of Term Vectors (was ApacheCon next week)

2005-12-13 Thread Dan Climan
Good question. I was wondering about the impact of adding term vectors with the various options. For example, is adding term vectors with both positions and offsets a significant impact? Which current parts of lucene (including contributions) take advantage of term vectors being present? I know

Re: Fwd: Re: Term Vectors

2005-11-11 Thread Grant Ingersoll
-- Received: Fri, 28 Oct 2005 08:22:04 PM EDT From: Chris Hostetter <[EMAIL PROTECTED]> To: java-user@lucene.apache.org Subject: Re: Term Vectors : "Now, you can get these term vectors per documents with the Lucene API if the : index was built with the term vectors option." : : H

Re: Fwd: Re: Term Vectors

2005-11-11 Thread marigoldcc
ROTECTED]> wrote: > > > -- Original Message -- > Received: Fri, 28 Oct 2005 08:22:04 PM EDT > From: Chris Hostetter <[EMAIL PROTECTED]> > To: java-user@lucene.apache.org > Subject: Re: Term Vectors > > : "Now, you can get these term vectors per &