Haven't a clue <G>.

Erick

On Thu, Jul 31, 2008 at 11:29 AM, Sergey Kabashnyuk <[EMAIL PROTECTED]> wrote:

> Thank you Erick.
>
> I'm talking about more than 10,000 documents, and 95% of them have fewer
> than 10 fields. The maximum number of fields per document is unlimited,
> but in practice it's no more than 20.
>
> I'm interested: does Lucene have any internal optimization
> that depends on the field count or field sizes, as databases do?
> I mean, to determine the position of row X in the index:
>
> positionX = sum(fieldsize[1]+...fieldsize[i])*(X-1)
>
> Sergey Kabashnyuk
> eXo Platform SAS
>
>> I'd go with option 1 unless and until you could demonstrate performance
>> problems. Speaking of which, you'd get a more informed answer if you
>> provided a bit more data, like how many fields we're talking about, how
>> many documents, etc. If you're indexing 10,000 documents, go with the
>> simplest. If you're indexing 1,000,000,000 documents, more thought is
>> required <G>. Do you expect 3 fields/doc or 30,000 fields/doc?
>>
>> But the reason I'd go with <1> is that your second option has some issues.
>> 1> How to tokenize? You'll probably have to write a custom tokenizer or
>> risk getting tokens "name" "value" rather than "[EMAIL PROTECTED]".
>> 2> Forming queries is, I believe, equally complex in both cases, so
>> choose the conceptually simplest one. Let's say you have
>> to search on foo1:val1 and foo2:val2. In the first case this is
>> simply +foo1:val1 +foo2:val2. For your second case, you get
>> +bigfield:[EMAIL PROTECTED] +bigfield:[EMAIL PROTECTED]. There's not much
>> difference between the two.
>> 3> Back to my initial comment about resource usage: we don't
>> have enough data to answer whether it makes any difference.
>> But even if we did, you'd find the response a variation of
>> "you'll have to try it and see" since there are so many
>> variables.
>>
>> But I'll repeat that I always go with the simplest approach unless and
>> until I'm certain there's a problem...
>>
>> Best
>> Erick
>>
>> On Thu, Jul 31, 2008 at 10:36 AM, Sergey Kabashnyuk <[EMAIL PROTECTED]> wrote:
>>
>>> The best strategy.
>>>
>>> Hello.
>>> I want to ask your opinion about how to
>>> store multiple fields of the same document.
>>>
>>> I see two possibilities now:
>>> 1. Multiple fields in the document.
>>> 2. One field, for example named PROPERTIES, with multiple instances,
>>>    and each value combined with its name, for example "[EMAIL PROTECTED]".
>>>
>>> Which choice is best for search speed and resource usage?
>>>
>>> Thanks.
>>>
>>> Sergey Kabashnyuk
>>> eXo Platform SAS
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: [EMAIL PROTECTED]
>>> For additional commands, e-mail: [EMAIL PROTECTED]
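
For readers following the thread, the database-style lookup Sergey describes (positionX = sum of the field sizes times X-1) can be sketched as below. This is only an illustration of the fixed-width arithmetic in his formula; the function name `row_offset` and the sample field sizes are hypothetical, and nothing here is a claim about how Lucene lays out its index.

```python
def row_offset(field_sizes, x):
    """Offset of row x (1-based) when every row has the same fixed-width fields.

    Mirrors the formula from the thread:
        positionX = sum(fieldsize[1] + ... + fieldsize[i]) * (X - 1)
    """
    if x < 1:
        raise ValueError("row numbers start at 1")
    row_width = sum(field_sizes)  # total width of one row
    return row_width * (x - 1)    # skip the x-1 rows that come before row x


# Hypothetical layout: three fields of 4, 16 and 8 bytes per row.
sizes = [4, 16, 8]
print(row_offset(sizes, 1))  # 0
print(row_offset(sizes, 3))  # 56
```

The shortcut only works when every row has the same set of fields with the same sizes, which is not the situation described in the thread, where the number of fields varies per document.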