Thank you Erick.
I'm talking about more than 10,000 documents, and 95% of them have fewer than 10 fields.
The maximum number of fields per document is unlimited.
But in practice it's no more than 20.
I'm interested: does Lucene have any internal optimization
that depends on the field count or field size, as databases do?
I mean, to determine the position of row X in the index:
positionX = (fieldsize[1] + ... + fieldsize[i]) * (X - 1)
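
To make the formula concrete, here is a minimal sketch of the database-style lookup it describes: every row has the same fixed-width layout, so the start of row X is the row width times (X - 1). The function name and the field sizes are made up for illustration, and nothing here is meant to say how Lucene lays out its index.

```python
def row_offset(field_sizes, x):
    """Offset of row x (1-based) when every row uses the same fixed-width layout."""
    if x < 1:
        raise ValueError("row numbers start at 1")
    row_width = sum(field_sizes)   # fieldsize[1] + ... + fieldsize[i]
    return row_width * (x - 1)     # rows 1..x-1 come before row x


# Hypothetical layout: three fields of 4, 8 and 20 bytes per row.
sizes = [4, 8, 20]
print(row_offset(sizes, 1))  # first row starts at offset 0
print(row_offset(sizes, 3))  # two full rows of 32 bytes precede row 3
```

The calculation only works while every row has identical field sizes, which is why the question ties it to field count and field size.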
Sergey Kabashnyuk
eXo Platform SAS
I'd go with option 1 unless and until you could demonstrate performance
problems. Speaking of which, you'd get a more informed answer if you
provided a bit more data, like how many fields are we talking, how many
documents, etc. If you're indexing 10,000 documents, go with the
simplest.
If you're indexing 1,000,000,000 documents, more thought is required
<G>..
Do you expect 3 fields/doc or 30,000 fields/doc?
But the reason I'd go with <1> is that your second option has some
issues.
1> How to tokenize? You'll probably have to write a custom tokenizer or risk
getting the tokens "name" and "value" rather than "[EMAIL PROTECTED]".
2> Forming queries is, I believe, equally complex in both cases, so
choose the conceptually simplest one. Let's say you have
to search on foo1:val1 and foo2:val2. In the first case this is
simple +foo1:val1 +foo2:val2. For your second case, you get
+bigfield:[EMAIL PROTECTED] +bigfield:[EMAIL PROTECTED]. There's not much
difference between the two.
3> Back to my initial comment about resource usage: we don't
have enough data to answer whether it makes any difference.
But even if we did, you'd find the response a variation of
"you'll have to try it and see" since there are so many
variables.
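
Points 1> and 2> can be sketched in a few lines. This is plain string handling, not Lucene API code. The "=" separator is a placeholder chosen for illustration, since the combined token is obscured in the archived message, and the helper names are hypothetical. The naive splitter only stands in for a generic word tokenizer; how a particular analyzer treats the separator would have to be checked.

```python
import re

SEP = "="  # hypothetical separator; the thread's actual combined token is obscured


def query_separate_fields(criteria):
    """Option 1: one required clause per real field."""
    return " ".join(f"+{name}:{value}" for name, value in criteria)


def query_combined_field(criteria, field="bigfield"):
    """Option 2: one required clause per combined name/value token in one field."""
    return " ".join(f"+{field}:{name}{SEP}{value}" for name, value in criteria)


def naive_word_tokens(text):
    """Split on every character that is not a letter or digit."""
    return [t for t in re.split(r"[^0-9A-Za-z]+", text) if t]


criteria = [("foo1", "val1"), ("foo2", "val2")]
print(query_separate_fields(criteria))           # +foo1:val1 +foo2:val2
print(query_combined_field(criteria))            # +bigfield:foo1=val1 +bigfield:foo2=val2
print(naive_word_tokens("foo1" + SEP + "val1"))  # ['foo1', 'val1']
```

The two query strings have the same number of clauses, which is the "not much difference" in point 2>. The last line shows the risk from point 1>: a splitter that breaks on the separator yields two tokens, so the combined value never exists as a single term.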
But I'll repeat that I always go with the simplest approach unless and
until I'm certain there's a problem...
Best
Erick
On Thu, Jul 31, 2008 at 10:36 AM, Sergey Kabashnyuk
<[EMAIL PROTECTED]> wrote:
The best strategy.
Hello.
I'd like to ask your opinion on how to
store multiple fields of the same document.
I see two possibilities:
1. Multiple fields in the document.
2. One field, for example named PROPERTIES, with multiple instances,
and values combined with the name, for example "[EMAIL PROTECTED]".
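
As a rough illustration of the two layouts, the sketch below models a document as a list of (field name, value) pairs. It is not Lucene code. The property names, the helper names and the "=" separator are all assumptions made for the example.

```python
def as_separate_fields(properties):
    """Option 1: each property becomes its own field."""
    return [(name, value) for name, value in properties.items()]


def as_combined_field(properties, field="PROPERTIES", sep="="):
    """Option 2: each property becomes another instance of a single field."""
    return [(field, f"{name}{sep}{value}") for name, value in properties.items()]


# Hypothetical property set for one document.
props = {"title": "report", "owner": "sergey"}
print(as_separate_fields(props))  # [('title', 'report'), ('owner', 'sergey')]
print(as_combined_field(props))   # [('PROPERTIES', 'title=report'), ('PROPERTIES', 'owner=sergey')]
```

Both layouts carry the same information; they differ in whether the property name lives in the field name or inside the value.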
Which choice is best for search speed and resource usage?
Thanks.
Sergey Kabashnyuk
eXo Platform SAS
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]