Throwing this out for discussion. It grew out of a question a client raised, which is the one referred to here: https://issues.apache.org/jira/browse/LUCENE-1761
That got me thinking. We often recommend "just reindex" as a solution for a number of issues. As Lucene is used on ever-larger collections of documents, reindexing gets more and more painful, and in some cases perhaps impossible. I'd like to discuss whether Lucene has evolved new capabilities that we might want to use to support some (limited) "index maintenance" tasks.

So let's postulate an "index maintenance" utility. CheckIndex is already one diagnostic/maintenance task. IndexUpgrader could be thought of as a maintenance task too. Would it be possible (and worth the effort) to add other kinds of tasks, especially with docValues-type fields? Some kinds of functions that come to mind:

> remove field X from the corpus
> add field Y to every doc in the corpus with default value Z (maybe docValues only?)
> ???

Possibly some of the maintenance tasks that people dream up could be restricted to, say, docValues fields or primitive types. I don't see how it would work to delete a text-based field and re-add it with a different analysis chain; that's just crazy.

But Solr/Lucene/ES are being used more and more for analytics rather than straight text search. I continually have to check my assumptions at the client's door about what the purpose of "search" is. I've seen many situations where the search itself is simple keyword matching, and the value the client sees is in being able to perform analytics on the results of those simple searches.

Now, I'll be the first to admit that my knowledge of Lucene internals (and that's no doubt where the bulk of the work would be) is scanty, and it may be that this is all totally impossible. It may also be that there are few enough use cases for this kind of thing that the effort is a poor ROI.

That said, we've been handing out the advice "just reindex" for a long while. Reindexing may simply not be possible in some cases, so I wanted to kick off a discussion here. Thoughts?
Erick
