[
https://issues.apache.org/jira/browse/LUCENE-3312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13286909#comment-13286909
]
Andrzej Bialecki commented on LUCENE-3312:
-------------------------------------------
Comments to patch 04:
* index.Document is an interface. For better extensibility in the future I
think it could be an abstract class: who knows what we will want to put there
in addition to the iterators...
* as noted on IRC, this strong decoupling of stored and indexed content poses
some interesting questions:
** since you can add multiple fields with the same name, you can now add an
arbitrary sequence of Stored and Indexed fields (all with the same name). This
means that you can now store parts of a field that are not indexed, and parts
of a field that are indexed but not stored.
** previously, if a field was flagged as indexed but didn't have a tokenStream,
its String or Reader value would be used to create a token stream. Now if you
want a value to be stored and indexed you have to add two fields with the same
name - one StoredField and the other an IndexedField for which you create a
token stream from the value. Am I right to assume that StoredField-s will no
longer be used as potential sources of token streams?
* maybe this is a good moment to change all getters that return arrays of
fields or values to return List-s, since all the code is doing underneath is
collecting them into lists and then converting to arrays?
* previously we allowed removing fields from a document by name. Are we
going to allow this now separately for indexed and stored fields?
* minor nit: there's a grammar mistake in Field.setTokenStream(..):
"TokenStream fields tokenized".
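
To make the same-name question and the List suggestion concrete, here is a minimal sketch. It does not use the classes from the patch: SketchField, Kind and SketchDoc are hypothetical stand-ins I made up for illustration. It only shows that an arbitrary sequence of stored and indexed fields under one name lets the stored view and the indexed view of "body" diverge, and that a getter can hand back the collected List without converting it to an array.

```java
import java.util.ArrayList;
import java.util.List;

class SameNameFields {
    // Hypothetical stand-ins for illustration only, not the Lucene API.
    enum Kind { STORED, INDEXED }

    static final class SketchField {
        final String name;
        final Kind kind;
        final String value;

        SketchField(String name, Kind kind, String value) {
            this.name = name;
            this.kind = kind;
            this.value = value;
        }
    }

    static final class SketchDoc {
        private final List<SketchField> fields = new ArrayList<>();

        void add(String name, Kind kind, String value) {
            fields.add(new SketchField(name, kind, value));
        }

        // Returns the collected List directly, with no array conversion.
        List<String> values(String name, Kind kind) {
            List<String> out = new ArrayList<>();
            for (SketchField f : fields) {
                if (f.name.equals(name) && f.kind == kind) {
                    out.add(f.value);
                }
            }
            return out;
        }
    }

    public static void main(String[] args) {
        SketchDoc doc = new SketchDoc();
        doc.add("body", Kind.STORED, "part one");   // stored and indexed
        doc.add("body", Kind.INDEXED, "part one");
        doc.add("body", Kind.INDEXED, "part two");  // indexed, never stored
        doc.add("body", Kind.STORED, "part three"); // stored, never indexed

        System.out.println(doc.values("body", Kind.STORED));  // [part one, part three]
        System.out.println(doc.values("body", Kind.INDEXED)); // [part one, part two]
    }
}
```

Whether the real API should permit this kind of divergence under a single field name is exactly the open question above; the sketch only illustrates it.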
> Break out StorableField from IndexableField
> -------------------------------------------
>
> Key: LUCENE-3312
> URL: https://issues.apache.org/jira/browse/LUCENE-3312
> Project: Lucene - Java
> Issue Type: Improvement
> Components: core/index
> Reporter: Michael McCandless
> Assignee: Nikola Tankovic
> Labels: gsoc2012, lucene-gsoc-12
> Fix For: Field Type branch
>
> Attachments: lucene-3312-patch-01.patch, lucene-3312-patch-02.patch,
> lucene-3312-patch-03.patch, lucene-3312-patch-04.patch
>
>
> In the field type branch we have strongly decoupled
> Document/Field/FieldType impl from the indexer, by having only a
> narrow API (IndexableField) passed to IndexWriter. This frees apps up
> to use their own "documents" instead of the "user-space" impls we provide
> in oal.document.
> Similarly, with LUCENE-3309, we've done the same thing on the
> doc/field retrieval side (from IndexReader), with the
> StoredFieldsVisitor.
> But, maybe we should break out StorableField from IndexableField,
> such that when you index a doc you provide two Iterables -- one for the
> IndexableFields and one for the StorableFields. Either can be null.
> One downside is a possible perf hit for fields that are both indexed &
> stored (i.e., we visit them twice, look up their name in a hash twice,
> etc.). But the upside is a cleaner separation of concerns in API....