[
https://issues.apache.org/jira/browse/SOLR-6913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14270981#comment-14270981
]
Grant Ingersoll commented on SOLR-6913:
---------------------------------------
I think the regular workflow for exploring new datasets is to just start
throwing them at Solr and then to tweak the data, not the schema. Data
first, schema second. So, for instance, I'm working on this citibike data. My
first step is to index it w/ no schema whatsoever. I then iterate by writing a
little python to index some of the columns as spatial. What I don't do is go
muck w/ the schema, hence the name data-driven.
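
As a rough illustration of the "little python" step described above, here is a
minimal sketch that reshapes CSV rows so a pair of latitude/longitude columns
also appears as a single "lat,lon" value before the documents are sent to Solr.
The column names (start_lat, start_lon), the destination field name
(start_location_p), and the sample rows are all hypothetical; whether a "_p"
suffix actually maps to a spatial field type depends on the dynamicFields in
the configset being used. The sketch only builds the JSON payload and does not
contact a Solr server.

```python
import csv
import io
import json


def rows_to_docs(csv_text, lat_col, lon_col, dest_field):
    """Turn CSV rows into dicts, adding a combined "lat,lon" string field.

    The data is reshaped on the client side; the schema is left alone.
    """
    docs = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        doc = dict(row)
        lat = (doc.get(lat_col) or "").strip()
        lon = (doc.get(lon_col) or "").strip()
        if lat and lon:
            # Fail early if either coordinate is not numeric.
            float(lat)
            float(lon)
            doc[dest_field] = f"{lat},{lon}"
        docs.append(doc)
    return docs


# Hypothetical sample data; real column names will differ.
sample = "id,start_lat,start_lon\n1,40.7671,-73.9939\n2,,\n"
docs = rows_to_docs(sample, "start_lat", "start_lon", "start_location_p")

# JSON array of documents, ready to be posted to an update handler.
payload = json.dumps(docs)
```

Rows with missing coordinates are passed through unchanged, so a sparse
dataset can still be indexed in full while only the complete rows get the
combined field.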
> audit & cleanup "schema" in data_driven_schema_configs
> ------------------------------------------------------
>
> Key: SOLR-6913
> URL: https://issues.apache.org/jira/browse/SOLR-6913
> Project: Solr
> Issue Type: Task
> Reporter: Hoss Man
> Assignee: Steve Rowe
> Priority: Blocker
> Fix For: 5.0, Trunk
>
> Attachments: SOLR-6913-trim-schema.patch,
> SOLR-6913-trim-schema.patch, SOLR-6913.patch
>
>
> the data_driven_schema_configs configset has some issues that should be
> reviewed carefully & cleaned up...
> * currently includes a schema.xml file:
> ** this was previously part of the old example to show the automatic
> "bootstrapping" of schema.xml -> managed-schema, but at this point it's just
> kind of confusing
> ** we should just rename this to "managed-schema" in svn - the ref guide
> explains the bootstrapping
> * the effective schema as it currently stands includes a bunch of copyFields
> & dynamicFields that are taken wholesale from the techproducts example
> ** some of these might make sense to keep in a general example (ie: "\*_txt")
> but in general they should all be reviewed.
> ** a bunch of this cruft is actually commented out already, but anything we
> don't want to keep should be removed to eliminate confusion
> * SOLR-6471 added an explicit "_text" field as the default and made it a
> copyField catchall (ie: "\*")
> ** the ref guide schema API example responses need to reflect the existence
> of this field:
> https://cwiki.apache.org/confluence/display/solr/Schemaless+Mode
> ** we should draw heavy attention to this field+copyField -- both with a "/!\
> NOTE" in the refguide and call it out in solrconfig.xml & "managed-schema"
> file comments since people who start with these configs may be surprised and
> wind up with a very bloated index
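
One way to spot the catch-all copyField mentioned in the description is to
scan the list of copyField rules for any rule whose source is "*". The sketch
below assumes the rules are available as a parsed JSON object with a
"copyFields" list of source/dest pairs, which is the shape assumed here for a
Schema API copy field listing; the sample response is hand-written for
illustration and is not real server output.

```python
def catch_all_copy_fields(schema_response):
    """Return the copyField rules that copy every field (source "*")."""
    rules = schema_response.get("copyFields", [])
    return [rule for rule in rules if rule.get("source") == "*"]


# Hand-written sample; a real response would come from the running collection.
sample_response = {
    "copyFields": [
        {"source": "*", "dest": "_text"},
        {"source": "name", "dest": "name_txt"},
    ]
}

catch_all = catch_all_copy_fields(sample_response)
for rule in catch_all:
    print(f"every field is copied into {rule['dest']!r}; expect a larger index")
```

Any rule reported here duplicates all indexed content into the destination
field, which is the index bloat the description warns about.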
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)