One bottleneck appears when you have both a large number of fields and a large number of terms. Each field has its own list of terms, so in the worst case the dictionary can hold n*m entries (with n the number of fields and m the number of distinct terms). This can add overhead to term lookup when n and m are both large, and a term lookup occurs for each keyword in a query.
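
To make the n*m point concrete, here is a toy model in plain Java. It is only a sketch and not Lucene's actual dictionary implementation: the class FieldTermDictionary and its methods are made up for illustration, and the only assumption is that a dictionary entry is identified by a (field, term) pair.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

// Toy model (hypothetical, not Lucene code): one dictionary entry per (field, term) pair.
final class FieldTermDictionary {
    // Sorted entries keyed by "field\u0000term", so equal terms in different fields stay distinct.
    private final TreeSet<String> entries = new TreeSet<>();
    private final Set<String> fields = new HashSet<>();
    private final Set<String> terms = new HashSet<>();

    // Splits the text on whitespace and records each lower-cased term under the given field.
    void add(String field, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (term.isEmpty()) {
                continue; // skip the empty token produced by leading whitespace or empty text
            }
            entries.add(field + '\u0000' + term);
            fields.add(field);
            terms.add(term);
        }
    }

    // Number of (field, term) entries actually stored.
    int size() {
        return entries.size();
    }

    // Upper bound n*m: every distinct term occurring in every field.
    long worstCase() {
        return (long) fields.size() * terms.size();
    }

    public static void main(String[] args) {
        FieldTermDictionary dict = new FieldTermDictionary();
        dict.add("title", "red car");
        dict.add("author", "red car");
        dict.add("tag", "red car");
        System.out.println("entries=" + dict.size() + " worstCase=" + dict.worstCase());
    }
}
```

When every field contains the same terms, size() reaches worstCase(); when the fields use disjoint vocabularies, it stays well below it.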

Another problem (for the end user) with an arbitrary number of fields is that the user has to know exactly which field names to query. By default, Lucene cannot search efficiently across an arbitrary number of fields unless you create a "content" field into which you index the values from all the other fields. This duplicates the data inside the index (indexing the same data twice is cheap, but it can become a problem for a very large index).
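
The "content" field idea can be sketched without any Lucene dependency. The code below is a simplified illustration under stated assumptions: documents are modelled as plain maps from field name to value, and the class CatchAllField with its two helper methods is hypothetical. In a real application you would add the extra field to the Lucene document at indexing time instead.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Toy model (hypothetical, not Lucene code) of a catch-all "content" field.
final class CatchAllField {
    static final String CATCH_ALL = "content";

    // Returns a copy of the document with an extra "content" field holding every value.
    // Each value is now stored twice, which is the duplication cost mentioned above.
    // Note: a user field already named "content" would be overwritten in this sketch.
    static Map<String, String> withCatchAll(Map<String, String> doc) {
        Map<String, String> indexed = new LinkedHashMap<>(doc);
        indexed.put(CATCH_ALL, String.join(" ", doc.values()));
        return indexed;
    }

    // Case-insensitive keyword match against the catch-all field only,
    // so the caller does not need to know any user-defined field name.
    static boolean matches(Map<String, String> indexed, String keyword) {
        String content = indexed.getOrDefault(CATCH_ALL, "");
        return Arrays.asList(content.toLowerCase(Locale.ROOT).split("\\s+"))
                .contains(keyword.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("director", "Kurosawa");
        doc.put("year", "1954");
        Map<String, String> indexed = withCatchAll(doc);
        System.out.println(indexed.get(CATCH_ALL));
        System.out.println(matches(indexed, "kurosawa"));
    }
}
```

The trade-off is visible in the sketch: keyword search becomes independent of field names, but you lose the ability to tell which field matched, and the indexed text roughly doubles.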

We recently released a plugin for Lucene (SIREn [1]) that tackles this particular problem. It was initially developed to build a search engine for RDF data (a standard model for data interchange on the web). It lets you index an arbitrary number of fields without running into the two problems above, while keeping web-scale performance. In addition, it allows keyword search on the field names and offers better support for multi-valued fields.

I think the best approach is to give it a try: run a benchmark using Lucene and SIREn, and see which one better meets your needs (in terms of response time and also of search capabilities). If your index stays relatively small (a few thousand or maybe millions of documents), then Lucene is probably a good choice, but if you expect to have a large index (millions of documents) with an arbitrary number of fields (thousands, or even tens of thousands), then SIREn may be more suitable.

[1] http://siren.sindice.com/
--
Renaud Delbru

On 12/03/10 13:43, Erick Erickson wrote:
There's no requirement that all documents have the same
fields, Lucene is fine with different docs having different
fields.

There's no limit on the number of different fields allowed
that I know of, but I'm sure someone will chime in if there
is....

HTH
Erick

On Fri, Mar 12, 2010 at 7:51 AM, Vinicius Carvalho <viniciusccarva...@gmail.com> wrote:

Hello there! We are indexing metadata for our media. One idea is that each
user adds their own metadata, so each document may have a different
number/name/type of fields. Is this OK in Lucene? I mean, is Lucene OK with
this relaxed approach?

Also, considering that each user may define its own metadata, we may have
several different types of fields. Is there a limit for this?

Regards

--
The intuitive mind is a sacred gift and the
rational mind is a faithful servant. We have
created a society that honors the servant and
has forgotten the gift.



