[ https://issues.apache.org/jira/browse/SOLR-14701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17433074#comment-17433074 ]
Timothy Potter (thelabdude) edited comment on SOLR-14701 at 10/22/21, 5:27 PM:
------------------------------------------------------------------

Schema Designer in the UI gives us a nice interactive getting-started experience and supports an API endpoint where users can push a bunch of arbitrary docs into a temporary staging area and then tweak / refine the schema from the UI. Here's an example in the ref guide that illustrates this using the techproducts docs: https://solr.apache.org/guide/8_10/schema-designer.html#iteratively-post-sample-documents (it's basic for now and can be enhanced to support richer document types such as PDF).

So I'm all for deprecating the {{add-unknown-fields-to-the-schema}} URP chain from the {{_default}} schema in 8.11 and removing it in 9. Any "guessing" features should be moved into the Schema Designer backend ... that should make it clear that Solr has tools to help users build an initial schema, but they're part of a design process and aren't supported for indexing into live production collections.

The main open issue is the URPs in the {{add-unknown-fields-to-the-schema}} chain that do transformation / smart parsing on the input data, e.g. {{parse-date}}; Alexandre brought this issue up previously (see above). These transformational URPs can be useful b/c they allow for some flexibility in the format of incoming fields, e.g. you send text that looks like a timestamp into Solr and end up with a {{pdate}} field. I'm actually fine with removing these in 9 too and requiring well-formed input data, as most modern indexing solutions require a lot more transformation / parsing / enrichment on data destined for Solr, so removing basic transformations from the URP chain is probably not a huge loss for most users. In other words, most indexing clients are probably already doing some other, possibly more complicated, transformation on the data before Solr sees it, so these apps don't really need Solr to try to parse dates for them, etc.
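To make the "well-formed input data" point concrete, here is a minimal sketch of the kind of date normalization an indexing client could do before Solr sees a document, so that a {{pdate}} field receives an ISO-8601 UTC value such as 2021-10-22T17:27:00Z without relying on {{parse-date}}. Java is assumed because Solr and SolrJ are Java. The class name {{SolrDateNormalizer}}, the accepted input patterns, and the choice to treat zone-less timestamps as UTC are illustrative assumptions; they are not defined by Solr or by the {{_default}} configset.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.List;
import java.util.Optional;

/**
 * Hypothetical client-side helper: normalizes timestamp-like strings into the
 * ISO-8601 UTC form (e.g. 2021-10-22T17:27:00Z) before documents are sent to Solr,
 * instead of relying on a parse-date URP on the server.
 */
final class SolrDateNormalizer {

    // Illustrative zone-less input patterns; NOT the format list configured for parse-date in _default.
    private static final List<DateTimeFormatter> LOCAL_PATTERNS = List.of(
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss"),
            DateTimeFormatter.ofPattern("yyyy/MM/dd HH:mm:ss"));

    private SolrDateNormalizer() {}

    /** Returns the normalized UTC timestamp, or empty if no known format matches. */
    static Optional<String> toSolrDate(String raw) {
        String s = raw.trim();

        // 1. Full ISO-8601 timestamp with an offset or 'Z', e.g. 2021-10-22T10:27:00-07:00
        try {
            return Optional.of(OffsetDateTime.parse(s).toInstant().toString());
        } catch (DateTimeParseException ignored) {
            // fall through to the next format
        }

        // 2. Zone-less timestamps: assumed to be UTC (an assumption about the source data)
        for (DateTimeFormatter pattern : LOCAL_PATTERNS) {
            try {
                return Optional.of(LocalDateTime.parse(s, pattern).toInstant(ZoneOffset.UTC).toString());
            } catch (DateTimeParseException ignored) {
                // try the next pattern
            }
        }

        // 3. Bare ISO date, mapped to midnight UTC
        try {
            return Optional.of(LocalDate.parse(s).atStartOfDay(ZoneOffset.UTC).toInstant().toString());
        } catch (DateTimeParseException ignored) {
            // Not a date we recognize: the caller decides whether to reject it or keep it as text
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        for (String raw : List.of("2021-10-22T10:27:00-07:00", "2021-10-22 17:27:00", "2021-10-22", "next Tuesday")) {
            System.out.println(raw + " -> " + toSolrDate(raw).orElse("(unparseable)"));
        }
    }
}
```

Running main prints each sample input next to its normalized value; "next Tuesday" comes back empty, which is where a client would reject the document or keep the value as plain text rather than have Solr guess.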
Moreover, keeping the transformational URPs in the chain can certainly be an option that the Schema Designer offers via a toggle: flexible date parsing? check ... and so on. Heck, the SD could even compare the results with and without these URPs and automatically suggest the ones that "fired" on the sample input documents.

So the tl;dr here for Solr 9 is:
* Deprecate the {{add-unknown-fields-to-the-schema}} URP chain from the {{_default}} schema in 8.11, as well as the {{solr.AddSchemaFieldsUpdateProcessorFactory}}. Remove them in 9.0
* Keep the "transformational" URP stages like {{parse-date}} and wire them into the Schema Designer UI to let users toggle these features on/off for their URP chain
* Continually improve the Schema Designer experience throughout Solr 9.x, such as adding support for PDFs and other common rich document types
* Update the ref guide to point users to the Schema Designer as a getting-started tool; also remove all the field-guessing content

> Deprecate Schemaless Mode (Discussion)
> --------------------------------------
>
> Key: SOLR-14701
> URL: https://issues.apache.org/jira/browse/SOLR-14701
> Project: Solr
> Issue Type: Improvement
> Components: Schema and Analysis
> Reporter: Marcus Eagan
> Priority: Blocker
> Fix For: main (9.0)
>
> Attachments: image-2020-08-04-01-35-03-075.png
>
> Time Spent: 4h 10m
> Remaining Estimate: 0h
>
> I know this won't be the most popular ticket out there, but I am growing more
> and more sympathetic to the idea that we should rip many of the freedoms out
> that cause users more harm than not. One of the freedoms I saw time and time
> again to cause issues was schemaless mode. It doesn't work as named or
> documented, so I think it should be deprecated.
> If you use it in production reliably and in a way that cannot be accomplished
> another way, I am happy to hear from more knowledgeable folks as to why
> deprecation is a bad idea.