OK, final note. I wish I knew what kind of drugs I was on when I first
thought that the sizes were so much smaller. Because they weren't. I got to
thinking that "gee, it's kind of weird that if you don't specify anything
for TermVector when creating a field, you get all this advanced stuff. If it
It's always embarrassing when the correct unit test takes, say, 3 minutes to
write and I've engaged in all this angst that I could have dispelled all by
myself (although it is nice to have confirmation from folks in the know).
The answer is that omitting term vectors has no influence on the behav
My apologies to Erik...and Erick...I am horrible with names.
If I am reading Grant's email correctly, he also said you don't need to
store the Term Vectors...just that if you did store them, you can use
them with the highlighter so that you do not need to reanalyze the
text...why exactly this
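To make the highlighter point concrete: if the stored term vectors include character offsets, a highlighter can mark up the original text straight from those offsets instead of re-running the analyzer over it. This is a toy sketch in plain Java, not Lucene's Highlighter API. The `Offset` class and `highlight()` method are hypothetical names made up for illustration, and the offsets are assumed to be sorted and non-overlapping.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy illustration (NOT Lucene's Highlighter API): given stored [start, end)
 * character offsets for each hit, mark up the original text directly,
 * with no re-tokenization of the text.
 */
class OffsetHighlightSketch {

    /** Hypothetical stored offset pair: [start, end) into the original text. */
    static final class Offset {
        final int start;
        final int end;

        Offset(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    /** Wraps each offset range in <b>...</b>; assumes sorted, non-overlapping offsets. */
    static String highlight(String text, List<Offset> offsets) {
        StringBuilder out = new StringBuilder();
        int cursor = 0;
        for (Offset o : offsets) {
            out.append(text, cursor, o.start);            // plain text before this hit
            out.append("<b>").append(text, o.start, o.end).append("</b>");
            cursor = o.end;
        }
        out.append(text.substring(cursor));              // plain text after the last hit
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "Span queries need positions, not offsets.";
        List<Offset> hits = new ArrayList<>();
        hits.add(new Offset(0, 4));   // "Span"
        hits.add(new Offset(18, 27)); // "positions"
        // prints "<b>Span</b> queries need <b>positions</b>, not offsets."
        System.out.println(highlight(text, hits));
    }
}
```

The only work at display time is slicing the stored text, which is the saving you get from not having to reanalyze.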
Thanks for that addition; it may well be important to me (as well as
pointing up a weakness in my unit tests. Honest, I've been thinking about
explicitly testing this. Really. I'll get around to it real soon now.
Truly). We store multiple entries in the same field; think of it as
storing a lis
As Erick said, term positions are kept regardless of whether you store
term vectors. The positional information is needed for phrase queries,
span queries, etc. You certainly don't lose the ability to use phrase
queries if you do not store term vectors. If you check out the Posting
class in Doc
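A toy positional index shows why phrase matching doesn't depend on term vectors: positions live in each term's postings, and a phrase matches when its terms show up at consecutive positions in the same document. This is a plain-Java illustration, not Lucene's Posting or TermPositions code. The class and method names and the naive lowercase whitespace tokenizer are all assumptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Toy positional inverted index (an illustration, NOT Lucene internals):
 * positions are kept in the postings, so phrases can be matched without
 * any per-document term vector.
 */
class PositionalPhraseSketch {

    // term -> (docId -> positions at which the term occurs in that doc)
    private final Map<String, Map<Integer, List<Integer>>> postings = new HashMap<>();

    /** Indexes a document using a naive lowercase whitespace tokenizer (an assumption). */
    void add(int docId, String text) {
        String[] tokens = text.toLowerCase().trim().split("\\s+");
        for (int pos = 0; pos < tokens.length; pos++) {
            postings.computeIfAbsent(tokens[pos], t -> new HashMap<>())
                    .computeIfAbsent(docId, d -> new ArrayList<>())
                    .add(pos);
        }
    }

    /** True if the (at least one) terms occur in docId at consecutive positions. */
    boolean phraseMatches(int docId, String... terms) {
        for (int start : positions(terms[0], docId)) {
            boolean all = true;
            for (int i = 1; i < terms.length && all; i++) {
                all = positions(terms[i], docId).contains(start + i);
            }
            if (all) {
                return true;
            }
        }
        return false;
    }

    /** Positions of a term in one document, or an empty list if absent. */
    private List<Integer> positions(String term, int docId) {
        Map<Integer, List<Integer>> byDoc = postings.get(term.toLowerCase());
        if (byDoc == null) {
            return new ArrayList<>();
        }
        List<Integer> p = byDoc.get(docId);
        return p == null ? new ArrayList<>() : p;
    }

    public static void main(String[] args) {
        PositionalPhraseSketch index = new PositionalPhraseSketch();
        index.add(1, "term positions are kept");
        index.add(2, "positions of each term are kept");
        System.out.println(index.phraseMatches(1, "term", "positions")); // true
        System.out.println(index.phraseMatches(2, "term", "positions")); // false
    }
}
```

Nothing in the sketch stores a per-document vector of terms; dropping term vectors wouldn't touch the postings that phrase (and span) matching reads.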
As Erik stated, you don't need term vectors to do spans, but I
thought I would add a bit on the difference between positions and
offsets.
Positions are what is stored in Lucene internally (see
Token.getPositionIncrement() and TermPositions) and are usually just
consecutive integers (altho
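Here's a toy tokenizer that records both for each token, assuming whitespace tokenization and a fixed position increment of 1. It doesn't use Lucene's analyzers or Token class; `Tok` and `tokenize()` are hypothetical names for illustration. Extra spaces between words shift the character offsets but leave the positions consecutive.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy whitespace tokenizer (NOT Lucene's Tokenizer API) recording, per token:
 *  - position: a token counter advanced by an assumed increment of 1
 *  - offsets:  [start, end) character indexes into the original text
 */
class PositionsVsOffsets {

    static final class Tok {
        final String text;
        final int position;
        final int startOffset;
        final int endOffset;

        Tok(String text, int position, int startOffset, int endOffset) {
            this.text = text;
            this.position = position;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }

        @Override
        public String toString() {
            return text + " pos=" + position + " offsets=[" + startOffset + "," + endOffset + ")";
        }
    }

    static List<Tok> tokenize(String s) {
        List<Tok> out = new ArrayList<>();
        int position = -1; // so the first token gets position 0
        int i = 0;
        while (i < s.length()) {
            // skip any run of whitespace
            while (i < s.length() && Character.isWhitespace(s.charAt(i))) i++;
            if (i >= s.length()) break;
            int start = i;
            // consume one word
            while (i < s.length() && !Character.isWhitespace(s.charAt(i))) i++;
            position += 1; // fixed position increment of 1 (assumption)
            out.add(new Tok(s.substring(start, i), position, start, i));
        }
        return out;
    }

    public static void main(String[] args) {
        // prints:
        // quick pos=0 offsets=[0,5)
        // brown pos=1 offsets=[8,13)
        // fox pos=2 offsets=[14,17)
        for (Tok t : tokenize("quick   brown fox")) {
            System.out.println(t);
        }
    }
}
```

Span and phrase matching only need the `pos` column; the offset columns are what a highlighter would want, which is what WITH_OFFSETS adds to stored term vectors.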
Erik Hatcher sez no.
Erick
On 2/14/07, karl wettin wrote:
14 feb 2007 kl. 15.03 skrev Erick Erickson:
My reasoning was that I do need position information since I need to do Span
queries, but character information (WITH_OFFSETS) isn't necessary here/now.
So I thought I'd make a small test to see if this was worth pursuing. If
omitting offsets ha
You've made me a happy man.
Thanks again.
On 2/14/07, Erik Hatcher wrote:
On Feb 14, 2007, at 9:03 AM, Erick Erickson wrote:
My reasoning was that I do need position information since I need to do Span
queries, but character information (WITH_OFFSETS) isn't necessary here/now.
1> Am I going off a cliff here? I suppose this is really answered by
2> what is the d
I'm indexing books, with a significant amount of overhead in each document
and a LOT of OCR data. I'm indexing over 20,000 books and the index size is
8 GB. So I decided to play around with not storing some of the term vector
information and I'm shocked at how much smaller the index is. By storing al