[ https://issues.apache.org/jira/browse/SOLR-15277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Timothy Potter updated SOLR-15277:
----------------------------------
Attachment: schema-designer-7.png

Schema Designer in Admin UI
---------------------------

Key: SOLR-15277
URL: https://issues.apache.org/jira/browse/SOLR-15277
Project: Solr
Issue Type: New Feature
Components: Admin UI
Reporter: Timothy Potter
Assignee: Timothy Potter
Priority: Major
Attachments: schema-designer-1.png, schema-designer-2.png, schema-designer-3.png, schema-designer-4.png, schema-designer-5.png, schema-designer-6.png, schema-designer-7.png

Time Spent: 1h
Remaining Estimate: 0h

Augment Solr’s schema-guessing (aka “schemaless”) mode with a new interactive Schema Designer feature in the Admin UI to improve the getting-started experience.

The goal of Solr’s current schema-guessing mode was to reduce friction when first getting started. However, the current solution suffers from two main problems:

# Most decisions are based on the first doc seen by the schema-guessing logic, which leads to poor decisions around text fields, single- vs. multi-valued fields, and numeric types. Modern data is complicated, and any opaque schema-guessing tool that only looks at a single doc is too limited. For an in-depth analysis of the issues surrounding this feature, see: https://issues.apache.org/jira/browse/SOLR-14701.
# It is difficult to iterate on and refine the schema. If the schema guesser makes an incorrect decision, Solr puts the onus on the user to troubleshoot, which typically requires looking at logs, addressing issues with the guessed schema (via cumbersome API calls and knowledge of fields, field types, dynamic fields, etc.), and then deleting and re-indexing the documents. Instead of a friendly getting-started experience, the user now has to climb a steep learning curve of looking at logs, deleting documents, using the Schema and/or ConfigSets API correctly, and re-indexing.
Operations like changing a single-valued field to multi-valued (or vice versa) with docValues enabled require deleting the entire Lucene index and rebuilding it.

Put frankly, the current “getting started” experience misses the mark on ease of use. The community largely agrees on this point and seeks a better solution. Problem #1 can be addressed with a sampling approach in which the schema-guessing logic looks at multiple docs instead of a single doc before making decisions.

Problem #2 requires a solution that allows users to quickly iterate on the schema design and immediately see the results of a change. No API-only solution is sufficient for solving this issue. Users need a GUI that helps them tune the schema interactively without having to mess with XML or the Schema or ConfigSets APIs directly.

We can assume that users will be able to start Solr locally and launch the Admin UI. I don’t think we can throw them directly into defining a collection (configset, shards, replicas, etc.). But we can safely assume they have some data they want to search. Thus, a GUI-driven approach based around the user’s sample data is a natural first step for improving the getting-started experience (see attached schema-designer-1.png).

Moreover, Solr schema design involves a number of non-trivial concepts that may be unfamiliar to new users, e.g. dynamic fields, doc values, copy fields, indexed vs. stored, term vectors, and so on. A GUI-based approach can guide the user through the nuances of Solr schemas. Context-sensitive help can link to the Reference Guide.

The best way to do that is to show users how their data will get indexed (visually) and let them tweak the results interactively. For instance, if the user unchecks *indexed* for a field (and docValues is also disabled), they will see that they cannot sort by that field in the Query Tester. The *Query Tester* will be schema-driven, with type-ahead drop-down fields populated from the current schema.
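To make the sampling approach to Problem #1 concrete, here is a minimal Python sketch (not Solr's actual schema-guessing code) that infers a field type and a multiValued flag from every sampled doc rather than only the first one. The type names are taken from the _default configset, but the widening rules and the "more than three words means full text" heuristic are hypothetical choices made for illustration.

```python
from collections.abc import Iterable, Mapping
from typing import Any


def value_type(value: Any) -> str:
    """Classify one JSON scalar into a candidate field type.

    Type names mirror the _default configset; the word-count rule for
    text_general is a made-up heuristic for this sketch.
    """
    if isinstance(value, bool):  # check bool before int: bool subclasses int
        return "boolean"
    if isinstance(value, int):
        return "plong"
    if isinstance(value, float):
        return "pdouble"
    if isinstance(value, str) and len(value.split()) > 3:
        return "text_general"
    return "string"


def widen(a: str, b: str) -> str:
    """Pick a type that can hold values of both a and b (hypothetical rules)."""
    if a == b:
        return a
    if {a, b} == {"plong", "pdouble"}:
        return "pdouble"
    if "text_general" in (a, b):
        return "text_general"
    return "string"  # e.g. boolean vs. number: fall back to string


def guess_fields(docs: Iterable[Mapping[str, Any]]) -> dict[str, dict[str, Any]]:
    """Infer a type and multiValued flag per field from *all* sampled docs."""
    fields: dict[str, dict[str, Any]] = {}
    for doc in docs:
        for name, value in doc.items():
            info = fields.setdefault(name, {"type": None, "multiValued": False})
            is_list = isinstance(value, list)
            if is_list:
                info["multiValued"] = True  # one list anywhere in the sample suffices
            values = value if is_list else [value]
            for v in values:
                if v is None:
                    continue  # JSON null says nothing about the type
                t = value_type(v)
                info["type"] = t if info["type"] is None else widen(info["type"], t)
    for info in fields.values():
        if info["type"] is None:
            info["type"] = "string"  # no non-null values were seen
    return fields


docs = [
    {"id": "1", "price": 10, "author": "Ann"},
    {"id": "2", "price": 12.5, "author": ["Bob", "Cy"],
     "blurb": "A long story about many things"},
]
print(guess_fields(docs[:1])["price"])  # {'type': 'plong', 'multiValued': False}
print(guess_fields(docs)["price"])      # {'type': 'pdouble', 'multiValued': False}
```

Looking only at the first doc guesses `plong` for `price`; sampling both docs widens it to `pdouble` and marks `author` as multi-valued, which are exactly the numeric and single- vs. multi-valued decisions that Problem #1 describes.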
If users change the stop words file, they can see the result take effect immediately in the UI.

h3. Workflow

Screenshots from a prototype Schema Designer UI are attached to this Jira (schema-designer-1.png). The prototype repurposes several existing views into a more seamless, interactive workflow, instead of a number of separate screens that require the user to stitch together a cohesive experience.

The basic workflow for the Schema Designer is:

# Launch Solr, open the Admin UI, and click on *Schema Designer*. At this point, there are no cores or collections, but the *_default* configset is loaded into ZK. The end user does not care about collections, cores, or configsets at this point. Rather, their main goal is to get some data indexed correctly so they can start playing around with Solr queries, i.e. the fun stuff.
# The user either selects an existing schema (via a type-ahead drop-down) or enters the name of a new schema, e.g. “books”. If the schema is new, the *_default* configset is used as the starting point (see attached schema-designer-2.png).
# Next, the user either uploads sample docs or pastes text into the sample docs text area.
# The user clicks the *Analyze Documents* button, which populates the *Schema Editor* tree in the middle with the results of the “guessing”. This is where we can apply as much intelligence as possible to help the user get started.
# The *Schema Editor* is a tree with nodes for *Fields*, *Field Types*, and *Files*, with *Fields* being the user's main focus. The user tweaks the schema settings for each field as needed. They can also add new fields and field types.
# When changes are saved, the updates are stored in a temporary configset in ZK. This way, the user won’t lose any changes if their connection drops or they leave and come back a few days later. Live configsets will not be affected until the user *Publishes* their changes.
# Users can switch field types (string -> text), toggle single/multi-valued, and enable doc values, term vectors, etc. directly in the *Schema Editor*.
# As the user refines their schema, they can use the *Query Tester* form in the lower left to see how their schema changes impact document matching results.
# As the user changes their schema, the query is re-executed against the updated schema. Behind the scenes, the Schema Designer may need to delete and re-index all sample documents, but this is transparent to the user.
# Once satisfied with the schema, the user can apply the changes to Solr directly via *Publish* (save as a configset in ZK) or download the configset as a zip file (see schema-designer-3.png). The user can choose to index the sample docs after applying the updates by specifying a target collection. If the collection doesn’t exist, the Schema Designer creates it on the fly using the saved configset. Our goal is ease of use, so we don’t want to make the user go elsewhere to create a collection; we just do it inline if that’s what they want.

h3. Design Notes

During the analysis step, the designer backend creates a temporary configset in ZK named *_designer_<schema>*, where *<schema>* is provided by the user, such as “books” in the example wireframe. This allows the designer backend to persist changes to the schema automatically as the user refines it. We use a temporary configset in ZK so that live configsets and collections are not affected during the refinement process. The ZK version of the schema is used to enforce MVCC, ensuring that two users cannot concurrently overwrite each other’s changes, although the typical use case is expected to be one user refining the schema at a time.

Additionally, during the refinement process, the Schema Designer creates a temporary collection named *_designer_<schema>*.
The temp collection allows the designer backend to quickly index the sample docs to support the Query Tester feature. It also serves as a real-time tester of the schema changes before the changes are applied to live collections.

The sample docs provided by the user are stored in the Solr blob store so they don’t have to be re-parsed on every change to the schema / query request.
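The MVCC note in the Design Notes relies on optimistic concurrency: a save succeeds only if the stored schema still has the version the client originally read, which is the same contract as ZooKeeper's versioned `setData(path, data, version)` call (it fails with a bad-version error on mismatch). The minimal Python sketch below simulates that contract with a hypothetical in-memory `VersionedStore`; it is not a real ZooKeeper client and not the Schema Designer's code, and the `/configs/_designer_<schema>/managed-schema` path layout is an assumption for illustration.

```python
import threading
from dataclasses import dataclass


class BadVersionError(Exception):
    """Raised when a write is based on a stale version (cf. ZooKeeper's BadVersion)."""


@dataclass
class Node:
    data: str
    version: int


class VersionedStore:
    """Hypothetical in-memory stand-in for ZooKeeper: every path carries a
    version that starts at 0 and is incremented by each successful write."""

    def __init__(self) -> None:
        self._nodes: dict[str, Node] = {}
        self._lock = threading.Lock()

    def create(self, path: str, data: str) -> None:
        with self._lock:
            if path in self._nodes:
                raise KeyError(f"{path} already exists")
            self._nodes[path] = Node(data, 0)

    def get(self, path: str) -> Node:
        with self._lock:
            node = self._nodes[path]
            return Node(node.data, node.version)  # return a snapshot copy

    def set(self, path: str, data: str, expected_version: int) -> int:
        """Compare-and-set: write only if nobody has saved since our read."""
        with self._lock:
            node = self._nodes[path]
            if node.version != expected_version:
                raise BadVersionError(
                    f"{path}: expected version {expected_version}, found {node.version}")
            node.data = data
            node.version += 1
            return node.version


def designer_schema_path(schema: str) -> str:
    # `_designer_<schema>` follows the design notes; the
    # /configs/<name>/managed-schema layout is an assumption.
    return f"/configs/_designer_{schema}/managed-schema"


store = VersionedStore()
path = designer_schema_path("books")
store.create(path, "<schema/>")

alice = store.get(path)  # both users load version 0
bob = store.get(path)
store.set(path, "<schema><!-- alice --></schema>", alice.version)  # now version 1
try:
    store.set(path, "<schema><!-- bob --></schema>", bob.version)  # stale: Bob read version 0
except BadVersionError as err:
    print("conflict:", err)
    latest = store.get(path)  # Bob reloads the latest schema and re-applies his edit
    store.set(path, latest.data.replace("</schema>", "<!-- bob --></schema>"), latest.version)

print(store.get(path))
```

Bob's stale save is rejected instead of silently overwriting Alice's edit; he reloads and re-applies his change on top of hers. Because conflicts are expected to be rare (typically one user refines a schema at a time), this check-on-write approach avoids holding locks while a user edits.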