[ 
https://issues.apache.org/jira/browse/SOLR-15277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Potter updated SOLR-15277:
----------------------------------
    Attachment: schema-designer-7.png

> Schema Designer in Admin UI
> ---------------------------
>
>                 Key: SOLR-15277
>                 URL: https://issues.apache.org/jira/browse/SOLR-15277
>             Project: Solr
>          Issue Type: New Feature
>          Components: Admin UI
>            Reporter: Timothy Potter
>            Assignee: Timothy Potter
>            Priority: Major
>         Attachments: schema-designer-1.png, schema-designer-2.png, 
> schema-designer-3.png, schema-designer-4.png, schema-designer-5.png, 
> schema-designer-6.png, schema-designer-7.png
>
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> Augment Solr’s schema-guessing (aka “schemaless”) mode with a new interactive 
> Schema Designer feature in the Admin UI to improve the getting started 
> experience.
> The goal of Solr’s current schema guessing mode was to reduce friction when 
> first getting started. However, the current solution suffers from two main 
> problems:
>  # Most decisions are based on the first doc seen by the schema guessing 
> logic, which leads to poor decisions around text fields, single- vs. 
> multi-valued fields, and numeric types. Modern data is complicated, and any 
> opaque schema guessing tool that only looks at a single doc is too limited. 
> For an in-depth analysis of the issues surrounding this feature, see: 
> https://issues.apache.org/jira/browse/SOLR-14701.
>  # It is difficult to iterate on and refine the schema. If the schema guesser 
> makes an incorrect decision, Solr puts the onus on the user to troubleshoot, 
> which typically means looking at logs, fixing the guessed schema (via 
> cumbersome API calls that require knowledge of fields / field types / 
> dynamic fields, etc.), and deleting and re-indexing the documents. Instead 
> of a friendly getting started experience, the user now has to climb a steep 
> learning curve of looking at logs, deleting documents, using the Schema 
> and/or ConfigSets API correctly, and re-indexing. Operations like changing a 
> single-valued field to multi-valued (or vice-versa) with docValues enabled 
> require deleting the entire Lucene index and rebuilding it.
> Put frankly, the current “getting started” experience misses the mark on ease 
> of use. The community largely agrees on this and seeks a better solution. 
> Problem #1 can be addressed using a sampling approach where the schema 
> guessing logic looks at multiple docs instead of a single one before making 
> decisions.
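> As a minimal sketch of what sampling buys us (not the actual guessing logic), 
> the following merges a guess per value across all sample docs. The type 
> labels, the whitespace heuristic for text, and the names guess_type, widen, 
> and infer_schema are illustrative assumptions, not Solr APIs or field types.

```python
from typing import Any, Dict, Iterable


def guess_type(value: Any) -> str:
    """Guess an illustrative type label for a single scalar value."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    # Assumption: a value containing whitespace is treated as free text.
    return "text" if any(ch.isspace() for ch in str(value)) else "string"


def widen(a: str, b: str) -> str:
    """Merge two guesses into one label that can hold both (hypothetical rules)."""
    if a == b:
        return a
    pair = {a, b}
    if pair == {"long", "double"}:
        return "double"
    if pair == {"string", "text"}:
        return "text"
    return "string"  # fall back to an opaque string for any other mix


def infer_schema(docs: Iterable[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Infer a type and multiValued flag per field from every sample doc."""
    schema: Dict[str, Dict[str, Any]] = {}
    for doc in docs:
        for name, raw in doc.items():
            field = schema.setdefault(name, {"type": None, "multiValued": False})
            if isinstance(raw, list):
                field["multiValued"] = True  # any doc with a list marks the field
            values = raw if isinstance(raw, list) else [raw]
            for value in values:
                if value is None:
                    continue  # nulls carry no type information
                guess = guess_type(value)
                current = field["type"]
                field["type"] = guess if current is None else widen(current, guess)
    return schema


if __name__ == "__main__":
    sample = [
        {"id": "1", "title": "Dune", "price": 10, "tags": "sci-fi"},
        {"id": "2", "title": "The Left Hand of Darkness", "price": 12.5,
         "tags": ["sci-fi", "classic"]},
    ]
    for field_name, props in infer_schema(sample).items():
        print(field_name, props)
```

> Looking only at the first doc in this sample would guess an integer price 
> and single-valued tags; the second doc overturns both guesses.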
> Problem #2 requires a solution that allows users to quickly iterate on the 
> schema design and immediately see the results of a change. No API-only 
> solution is sufficient for solving this issue. Users need a GUI to assist 
> them in tuning the schema interactively without having to mess with XML or 
> the Schema or ConfigSet APIs directly.
> We can assume that users will be able to start Solr locally and launch the 
> Admin UI. I don’t think we can throw them directly into defining a collection 
> (config set, shards, replicas, etc.). But we can safely assume they have some 
> data they want to search. Thus, a GUI-driven approach based around the user’s 
> sample data is a natural first step for improving the getting started 
> experience (see attached schema-designer-1.png).
> Moreover, Solr schema design involves a number of non-trivial concepts that 
> may be unfamiliar to new users, e.g. dynamic fields, doc values, copy fields, 
> indexed vs. stored, term vectors, and so on. A GUI-based approach can guide 
> the user through the nuances of Solr schemas. Context-sensitive help can link 
> to the Reference Guide.
> The best way to do that is to show users (visually) how their data will get 
> indexed and let them tweak the results interactively. For instance, if the 
> user unchecks *indexed* for a field, they will see that they cannot sort by 
> that field in the Query Tester. The *Query Tester* will be schema driven with 
> type-ahead 
> drop-down fields populated from the current schema. If users change the stop 
> words file, they can see the result take effect immediately in the UI.
> h3. Workflow
> Screenshots from a prototype schema designer UI are attached to this Jira 
> (schema-designer-1.png). The prototype repurposes several existing views into 
> a more seamless, interactive workflow, rather than a number of separate 
> screens that leave the user to stitch together a cohesive experience.
> The basic workflow for the Schema Designer is:
>  # Launch Solr, open the Admin UI, and click on *Schema Designer*. At this point, 
> there are no cores or collections but the *_default* config set is loaded 
> into ZK. The end user does not care about collections or cores or config sets 
> at this point. Rather, their main goal is to get some data indexed correctly 
> so they can start playing around with Solr queries, i.e. the fun stuff.
>  # User either selects an existing schema (via type-ahead drop-down) or 
> enters the name of a new schema, e.g. “books”. If new, then the *_default* 
> configset is used as the starting point. (see attached schema-designer-2.png)
>  # Next, the user either uploads sample docs or pastes text into the sample 
> docs text area. 
>  # User pushes the *Analyze Documents* button, which populates the *Schema 
> Editor* tree in the middle with the results of the “guessing”. This is where 
> we can apply as much intelligence as possible to aid the user in getting 
> started.
>  # The *Schema Editor* is a tree with nodes for *Fields*, *Field Types*, and 
> *Files*, with the *Fields* tab being their main focus. User tweaks the schema 
> settings for each field as needed. They can also add new fields & field types.
>  # When saving changes, the updates are stored in a temp configset in ZK. 
> This way, the user won’t lose any changes if their connection drops or they 
> leave and come back a few days later. Live config sets will not be affected 
> until the user *Publishes* their changes.
>  # Users can switch types (string -> text), single/multi-valued, enable doc 
> values, vectors, etc. directly in the *Schema Editor*.
>  # As the user refines their schema, they can use the *Query Tester* form in 
> the lower left to see how their schema changes impact document matching 
> results.
>  # As the user changes their schema, the query is re-executed against the 
> updates. Behind the scenes, the Schema Designer may need to delete and 
> re-index all sample documents, but this is transparent to the user.
>  # Once satisfied with the schema, the user can apply the changes to Solr 
> directly via *Publish* (save as a ConfigSet in ZK) or download the ConfigSet 
> to a zip file. (see schema-designer-3.png) The user can choose to index the 
> sample docs after applying the updates by specifying a target collection. If 
> the collection doesn’t exist, the Schema Designer creates it on-the-fly using 
> the saved Config Set. Our goal is ease of use, so we don’t want to make the 
> user go elsewhere to create a collection; the designer just does it inline if 
> that’s what they want.
> h3. Design Notes
> During the analysis step, the designer backend creates a temporary config set 
> in ZK named *_designer_<schema>*, where *<schema>* is provided by the user, 
> such as “books” in the example wireframe. This allows the designer backend to 
> persist changes to the schema automatically as the user refines the schema. 
> We use a temporary configset in ZK so that live configsets and collections 
> are not affected during the refinement process. The ZK version of the schema 
> is used to enforce MVCC to ensure that two users cannot step on each other’s 
> changes concurrently, although the typical use case is envisioned to be one 
> user refining the schema at a time.
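> One way to picture the version check is an optimistic compare-and-set: a 
> write only succeeds if the version the user last read is still current. The 
> sketch below uses an in-memory stand-in rather than a real ZK client, so 
> VersionedStore, VersionConflict, and the node path are hypothetical; it 
> mirrors ZooKeeper's behavior of rejecting an update whose expected version 
> is stale, and is not the designer backend's actual code.

```python
from typing import Dict, Tuple


class VersionConflict(Exception):
    """Raised when a write is based on a stale version of the node."""


class VersionedStore:
    """In-memory stand-in for a store that versions each node (as ZK does)."""

    def __init__(self) -> None:
        self._nodes: Dict[str, Tuple[str, int]] = {}  # path -> (data, version)

    def create(self, path: str, data: str) -> None:
        self._nodes[path] = (data, 0)  # new nodes start at version 0

    def read(self, path: str) -> Tuple[str, int]:
        return self._nodes[path]

    def write(self, path: str, data: str, expected_version: int) -> int:
        """Store data only if nobody else has written since expected_version."""
        _, current = self._nodes[path]
        if current != expected_version:
            raise VersionConflict(
                f"{path}: expected version {expected_version}, found {current}"
            )
        self._nodes[path] = (data, current + 1)
        return current + 1


if __name__ == "__main__":
    store = VersionedStore()
    path = "_designer_books/schema"  # hypothetical node for the temp configset
    store.create(path, "<schema v0>")

    # Two users load the same version of the schema.
    _, alice_version = store.read(path)
    _, bob_version = store.read(path)

    store.write(path, "<schema with Alice's edit>", alice_version)
    try:
        store.write(path, "<schema with Bob's edit>", bob_version)
    except VersionConflict as err:
        print("rejected:", err)  # Bob must reload before saving again
```

> With this approach the second writer gets an error instead of silently 
> overwriting the first writer's changes, and can reload and reapply the edit.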
> Additionally, during the refinement process, the schema designer creates a 
> temporary collection named *_designer_<schema>*. The temp collection allows 
> the designer backend to quickly index the sample docs to support the Query 
> Tester feature. It also serves as a real-time tester of the schema changes 
> before the changes are applied to live collections.
> The sample docs provided by the user are stored in the Solr blob store so 
> they don’t have to be re-parsed on every change to the schema / query request.


