Hi, thanks for the reply.
For the document, in my last mail.. multifieldQuery: name: Alexander AND domain: history AND first_sentence: Macedon Single field query: bagOfWords: Alexander history Macedon I can for sure say that multiple copies are not index. But the number of fields in which text is divided are many. Can that be a reason? How is field data stored in Index and searched? I read the document on file formats on lucene's web site but it was not very clear. Cheers, Anshul On Wed, Jan 21, 2009 at 4:42 PM, Ian Lea <ian....@gmail.com> wrote: > Hi > > > Space: 700Mb vs 4.5Gb sounds way too big a difference. Are you sure > you aren't loading multiple copies of the data or something like that? > > Queries: a 20 times slowdown for a multi field query also sounds way > too big. What do the simple and multi field queries look like? > > > > -- > Ian. > > > On Wed, Jan 21, 2009 at 1:39 PM, Anshul jain <anshul.j...@epfl.ch> wrote: > > Hi, > > > > I've indexed around half a million XML documents. Here is the document > > sample: > > > > <a:attribute> > > > > <a:name>cogito:Name</a:name> > > > > <a:value>Alexander the Great</a:value> > > > > </a:attribute> > > > > > > > > <a:attribute> > > > > <a:name>cogito:domain</a:name> > > > > <a:value>ancient history</a:value> > > > > </a:attribute> > > > > > > > > <a:attribute> > > > > <a:name>cogito:first_sentence</a:name> > > > > <a:value> > > > > Alexander the Great (Greek: or Megas Alexandros; July 20 356 BC June 10 > 323 > > BC), also known as Alexander III, was an ancient Greek king (basileus) of > > Macedon (336-323 BC). > > > > </a:value> > > > > </a:attribute> > > > > > > Average size of documents is around 4KB. > > > > There are a few performance issues I need help with. When I index > documents, > > in a structured manner, using field information like: > > name: alexander the great > > domain: ancient history > > first_sentence: Alexander the Great (Greek: or Megas Alexandros; July 20 > 356 > > BC June 10 323 BC), also known as Alexander III, was an ancient Greek > king > > (basileus) of Macedon (336-323 BC). > > bagOfWords: alexander the great ancient history Alexander the Great > (Greek: > > or Megas Alexandros; July 20 356 BC June 10 323 BC), also known as > Alexander > > III, was an ancient Greek king (basileus) of Macedon (336-323 BC). > > > > bagOfWords is the field with all the text appended to it. > > > > I get the index size of 4.5 GB, but if I just append the text and store > in > > one field > > like: > > value: alexander the great ancient history Alexander the Great (Greek: or > > Megas Alexandros; July 20 356 BC June 10 323 BC), also known as Alexander > > III, was an ancient Greek king (basileus) of Macedon (336-323 BC). > > > > the index size is only 700 MB.. why is this happening? > > > > > > > > Also the query execution time of MultiFieldQueries is very slow, it is 20 > > times slower than single field query. Is it normal, what could be the > > reason for that? > > > > Thanks, > > Cheers, > > Anshul > > > > -- > > Anshul Jain > > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > > -- Anshul Jain