[ 
https://issues.apache.org/jira/browse/SOLR-15277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17339787#comment-17339787
 ] 

Timothy Potter commented on SOLR-15277:
---------------------------------------

I actually think the tree UI element works pretty well for the schema editor as 
it collects the various schema object types in separate folders, with an 
emphasis on fields. It also allows the user to select a node in the tree and 
then see details for the selected node on the right of the tree. What more 
could you want? An accordion would still need to show a list of objects in each 
pane and then details of the selected field on the right, so all you'd be doing 
is replacing collapsible folders in the tree with an accordion? That doesn't 
seem like a significant improvement to me and would require introducing yet 
another UI component with the right licensing. The {{jstree}} component is 
already in the project and works well enough for this scenario.

Will port the SchemaDesignerSettings DAO logic to use ConfigSet metadata 
instead of overlay.

As for removing schema guessing, I'm not sure. This is a new tool that users 
can optionally use to build / refine their schema, so I see it more as a 
complementary tool with field guessing, one that is more iterative and 
interactive. But we still need code that looks at incoming data and "guesses" 
at the correct schema to suggest. Maybe you mean we remove the 
{{add-schema-fields}} step from the field guessing URP Chain bundled in the 
{{_default}} schema? The stages to normalize field names and do some 
locale-specific parsing seem useful to me. Right now, we give the user a warning when 
using the {{_default}} schema in production but don't really give them any 
additional advice / help on what to do otherwise. Now with the schema designer, 
we can go one step further and suggest users refine their schema using the UI. 
The ultimate problem here is if you get the schema wrong and then try to make 
changes after some docs are indexed, it's cumbersome with our APIs, esp. for 
new users.
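
To make the "guessing from incoming data" point concrete, here is a minimal Python sketch of guessing from several sample docs rather than only the first one. It is an illustration under assumptions, not the designer's actual code: the function names, the type labels ({{long}}, {{double}}, {{string}}, {{boolean}}), and the widening rules are all hypothetical, and no string parsing (dates, numbers in text) is attempted.

```python
from typing import Any, Dict, Iterable, Optional


def guess_type(value: Any) -> str:
    """Map one raw sample value to a coarse, hypothetical type label."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    return "string"


def widen(a: str, b: str) -> str:
    """Pick a type that can hold values of both a and b (assumed rules)."""
    if a == b:
        return a
    if {a, b} == {"long", "double"}:
        return "double"
    return "string"  # anything else that disagrees falls back to string


def guess_schema(docs: Iterable[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Accumulate a guess per field across ALL sample docs."""
    schema: Dict[str, Dict[str, Any]] = {}
    for doc in docs:
        for name, raw in doc.items():
            field = schema.setdefault(name, {"type": None, "multiValued": False})
            if isinstance(raw, list):
                # Any doc carrying a list marks the field as multi-valued.
                field["multiValued"] = True
                values = raw
            else:
                values = [raw]
            for value in values:
                if value is None:
                    continue  # a missing value says nothing about the type
                guessed = guess_type(value)
                current: Optional[str] = field["type"]
                field["type"] = guessed if current is None else widen(current, guessed)
    return schema
```

With two sample docs where {{price}} is {{10}} in the first and {{9.99}} in the second, and {{tags}} is a single string in the first and a list in the second, the first doc alone would suggest a single-valued {{long}} and a single-valued {{string}}; looking at both widens {{price}} to {{double}} and marks {{tags}} as multi-valued.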

> Schema Designer in Admin UI
> ---------------------------
>
>                 Key: SOLR-15277
>                 URL: https://issues.apache.org/jira/browse/SOLR-15277
>             Project: Solr
>          Issue Type: New Feature
>          Components: Admin UI
>            Reporter: Timothy Potter
>            Assignee: Timothy Potter
>            Priority: Major
>         Attachments: schema-designer-1.png, schema-designer-2.png, 
> schema-designer-3.png, schema-designer-4.png, schema-designer-5.png, 
> schema-designer-6.png
>
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> Augment Solr’s schema-guessing (aka “schemaless”) mode with a new interactive 
> Schema Designer feature in the Admin UI to improve the getting started 
> experience.
> The goal of Solr’s current schema guessing mode was to reduce friction when 
> first getting started. However, the current solution suffers from two main 
> problems:
>  # Most decisions are made based on the first doc seen by the schema guessing 
> logic, which leads to poor decisions around text fields, single vs. 
> multi-valued, and numeric types. Modern data is complicated and any opaque 
> schema guessing tool that only looks at a single doc is too limited. For an 
> in-depth analysis of the issues surrounding this feature, see: 
> https://issues.apache.org/jira/browse/SOLR-14701.
>  # Difficult to iterate and refine the schema. If an incorrect decision is 
> made by the schema guesser, Solr puts the onus on the user to troubleshoot, 
> which typically requires looking at logs, addressing issues with the guessed 
> schema (via cumbersome API calls and knowledge of fields / field types / 
> dynamic fields, etc.), and deleting and re-indexing the documents. Instead 
> of a friendly getting started experience, the user now has to climb a steep 
> learning curve of looking at logs, deleting documents, using the Schema 
> and/or ConfigSets API correctly, and re-indexing. Operations like changing a 
> single-valued to multi-valued field (or vice-versa) with docValues enabled 
> require deleting the entire Lucene index and rebuilding it.
> Put frankly, the current “getting started” experience misses the mark on ease 
> of use. The community largely agrees on this and seeks a better 
> solution. Problem #1 can be addressed using a sampling approach where the 
> schema guessing logic looks at multiple docs instead of a single one before 
> making decisions.
> Problem #2 requires a solution that allows users to quickly iterate on the 
> schema design and immediately see the results of a change. No API-only 
> solution is sufficient for solving this issue. Users need a GUI to assist 
> them in tuning the schema interactively without having to mess with XML or 
> the Schema or ConfigSet APIs directly.
> We can assume that users will be able to start Solr locally and launch the 
> Admin UI. I don’t think we can throw them directly into defining a collection 
> (config set, shards, replicas, etc). But we can safely assume they have some 
> data they want to search. Thus, a GUI driven approach based around the user’s 
> sample data is a natural first step for improving the getting started 
> experience (see attached schema-designer-1.png).
> Moreover, Solr schema design involves a number of non-trivial concepts that 
> may be unfamiliar to new users, e.g. dynamic fields, doc values, copy fields, 
> indexed vs. stored, term vectors, and so on. A GUI based 
> approach can guide the user in the nuances of Solr schemas. Context sensitive 
> help can link to the Reference Guide. 
> The best way to do that is to show how their data will get indexed (visually) 
> and let them tweak the results interactively. For instance, if the user unchecks 
> *indexed* for a field, they will see that they cannot sort by that field 
> in the Query Tester. The *Query Tester* will be schema driven with type-ahead 
> drop-down fields populated from the current schema. If users change the stop 
> words file, they can see the result take effect immediately in the UI.
> h3. Workflow
> Screenshots from a prototype schema designer UI are attached to this Jira 
> (schema-designer-1.png). The prototype repurposes several existing views into 
> a more seamless, interactive workflow vs. a number of different screens which 
> require the user to stitch together a cohesive experience.
> The basic workflow for the Schema Designer is:
>  # Launch Solr and open Admin UI, click on *Schema Designer*. At this point, 
> there are no cores or collections but the *_default* config set is loaded 
> into ZK. The end user does not care about collections or cores or config sets 
> at this point. Rather, their main goal is to get some data indexed correctly 
> so they can start playing around with Solr queries, i.e. the fun stuff.
>  # User either selects an existing schema (via type-ahead drop-down) or 
> enters the name of a new schema, e.g. “books”. If new, then the *_default* 
> configset is used as the starting point. (see attached schema-designer-2.png)
>  # Next, the user either uploads sample docs or pastes text into the sample 
> docs text area. 
>  # User pushes the *Analyze Documents* button, which populates the *Schema 
> Editor* tree in the middle with the results of the “guessing”. This is where 
> we can apply as much intelligence as possible to aid the user in getting 
> started.
>  # The *Schema Editor* is a tree with nodes for *Fields*, *Field Types*, and 
> *Files*, with the *Fields* tab being their main focus. User tweaks the schema 
> settings for each field as needed. They can also add new fields & field types.
>  # When saving changes, the updates are stored in a temp configset in ZK. 
> This way, the user won’t lose any changes if their connection drops or they 
> leave and come back a few days later. Live config sets will not be affected 
> until the user *Publishes* their changes.
>  # Users can switch types (string -> text), single/multi-valued, enable doc 
> values, vectors, etc directly in the Schema editor.
>  # As the user refines their schema, they can use the *Query Tester* form in 
> the lower left to see how their schema changes impact document matching 
> results.
>  # As the user changes their schema, the query is re-executed against the 
> updates. Behind the scenes, the Schema Designer may need to delete and 
> re-index all sample documents, but this is transparent to the user.
>  # Once satisfied with the schema, the user can apply the changes to Solr 
> directly via *Publish* (save as a ConfigSet in ZK) or download the ConfigSet 
> to a zip file. (see schema-designer-3.png) The user can choose to index the 
> sample docs after applying the updates by specifying a target collection. If 
> the collection doesn’t exist, the Schema Designer creates it on-the-fly using 
> the saved Config Set. Our goal is ease of use, so we don’t want to make the 
> user go elsewhere to create a collection, just do it inline if that’s what 
> they want.
> h3. Design Notes
> During the analysis step, the designer backend creates a temporary config set 
> in ZK named *_designer_<schema>*, where *<schema>* is provided by the user, 
> such as “books” in the example wireframe. This allows the designer backend to 
> persist changes to the schema automatically as the user refines the schema. 
> We use a temporary configset in ZK so that live configsets and collections 
> are not affected during the refinement process. The ZK version of the schema 
> is used to enforce MVCC to ensure that two users cannot step on each other’s 
> changes concurrently. Although, it’s envisioned that the typical use case is 
> for one user to refine the schema at a time.
> Additionally, during the refinement process, the schema designer creates a 
> temporary collection named *_designer_<schema>*. The temp collection allows 
> the designer backend to quickly index the sample docs to support the Query 
> Tester feature. It also serves as a real-time tester of the schema changes 
> before the changes are applied to live collections.
> The sample docs provided by the user are stored in the Solr blob store so 
> they don’t have to be re-parsed on every change to the schema / query request.
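
The version check described in the Design Notes above (using the stored version of the schema so two users cannot overwrite each other's changes) can be sketched roughly as follows. This is an in-memory stand-in written in Python for illustration only; it is not the ZooKeeper client API or the designer backend, and {{VersionedStore}}, {{VersionConflict}}, and {{designer_configset}} are hypothetical names. The only detail taken from the description is the {{_designer_<schema>}} naming.

```python
from typing import Dict, Optional, Tuple


class VersionConflict(Exception):
    """Raised when a writer's expected version is stale."""


def designer_configset(schema_name: str) -> str:
    """Temp configset name, following the _designer_<schema> convention."""
    return f"_designer_{schema_name}"


class VersionedStore:
    """Hypothetical in-memory stand-in for a versioned config store."""

    def __init__(self) -> None:
        # name -> (version, content)
        self._data: Dict[str, Tuple[int, str]] = {}

    def read(self, name: str) -> Tuple[Optional[int], Optional[str]]:
        """Return (version, content), or (None, None) if nothing is stored."""
        return self._data.get(name, (None, None))

    def write(self, name: str, content: str, expected_version: Optional[int]) -> int:
        """Store content only if the caller saw the latest version.

        expected_version is None when the caller believes the entry does
        not exist yet. Returns the new version on success.
        """
        current_version, _ = self.read(name)
        if expected_version != current_version:
            raise VersionConflict(
                f"{name}: expected version {expected_version}, "
                f"found {current_version}"
            )
        new_version = 0 if current_version is None else current_version + 1
        self._data[name] = (new_version, content)
        return new_version
```

In this sketch, if two users both read version 0 of {{_designer_books}} and the first one saves, the second user's save with expected version 0 raises {{VersionConflict}} instead of silently overwriting; that user would need to reload the schema and reapply the change.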



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
