Throwing this out for discussion. It started when a client raised
the question referred to here:
https://issues.apache.org/jira/browse/LUCENE-1761

That got me thinking. We often recommend "just reindex" as a
solution for any number of issues. As Lucene is used on ever-larger
collections of documents, reindexing gets more and more painful,
less likely to happen, and perhaps impossible. I'd like to discuss
whether Lucene has evolved new capabilities that we might want to use
to support some (limited) "index maintenance" tasks.

So let's postulate an "index maintenance" utility. CheckIndex is
already one diagnostic/maintenance task. IndexUpgrader could be thought
of as a maintenance task too.

Would it be possible (and worth the effort) to add other kinds of
tasks, especially with DocValues type fields? Some kinds of functions
that come to mind:
> Remove field X from the corpus
> Add field Y to every doc in the corpus with default value Z (maybe
> docValues only?)
> ???
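
To make the two tasks above concrete, here's a toy sketch in plain Java.
To be clear, this is NOT Lucene API and not how Lucene stores anything:
the class name, the method names and the "one long[] per field, indexed
by doc id" layout are all made up for illustration. The only point is
that when a field is a per-document column, adding or dropping it
doesn't touch any other field, which is my guess at why docValues-style
fields are the likeliest candidates.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Toy model only: one long[] column per field, indexed by doc id. Not Lucene API. */
public class IndexMaintenanceSketch {
    private final int maxDoc;
    private final Map<String, long[]> columns = new HashMap<>();

    public IndexMaintenanceSketch(int maxDoc) {
        this.maxDoc = maxDoc;
    }

    /** Hypothetical task: add field Y to every doc with default value Z. */
    public void addFieldWithDefault(String field, long defaultValue) {
        if (columns.containsKey(field)) {
            throw new IllegalArgumentException("field already exists: " + field);
        }
        long[] values = new long[maxDoc];
        Arrays.fill(values, defaultValue);
        columns.put(field, values);
    }

    /** Hypothetical task: remove field X from the corpus. Returns true if it existed. */
    public boolean removeField(String field) {
        return columns.remove(field) != null;
    }

    public boolean hasField(String field) {
        return columns.containsKey(field);
    }

    public long value(String field, int docId) {
        long[] values = columns.get(field);
        if (values == null) {
            throw new IllegalArgumentException("no such field: " + field);
        }
        return values[docId];
    }

    public static void main(String[] args) {
        IndexMaintenanceSketch index = new IndexMaintenanceSketch(3);
        index.addFieldWithDefault("Y", 42L);
        System.out.println(index.value("Y", 2));    // prints 42
        System.out.println(index.removeField("Y")); // prints true
        System.out.println(index.hasField("Y"));    // prints false
    }
}
```

A real utility would have to deal with segments, deletions and
concurrent writers, none of which this models.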

Possibly the kinds of maintenance tasks that people dream up
could be restricted to, say, docValues fields or primitive types. I
don't see how it would work to delete a text-based field and re-add it
with a different analysis chain; that's just crazy.

But Solr/Lucene/ES are being used more and more for analytics
rather than straight text search. I continually have to check my
assumptions at the client's door about what the purpose of "search"
is. I've seen many situations where the search is simple keyword
matching, and the value the client sees is being able to perform
analytics on the results of those simple searches.

Now, I'll be the first to admit that my knowledge of Lucene internals
(and that's where the bulk of the work would be, no doubt) is scanty,
and it may be that this is all totally impossible. It may also be that
there are few enough use-cases for this kind of thing that the effort
would have a poor ROI.

That said, we've been handing out advice of "just reindex" for a long
while. Reindexing may just not be possible, so I wanted to kick off a
discussion here.

Thoughts?
Erick