dsmiley commented on code in PR #2395: URL: https://github.com/apache/solr/pull/2395#discussion_r1586827784
########## solr/core/src/java/org/apache/solr/update/processor/NumFieldLimitingUpdateRequestProcessor.java: ########## @@ -0,0 +1,83 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.solr.update.processor; + +import java.io.IOException; +import java.lang.invoke.MethodHandles; +import java.util.Locale; +import org.apache.solr.common.SolrException; +import org.apache.solr.request.SolrQueryRequest; +import org.apache.solr.update.AddUpdateCommand; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +public class NumFieldLimitingUpdateRequestProcessor extends UpdateRequestProcessor { Review Comment: javadoc needed, even if one line linking to a factory for further info. Or don't make public :-) ########## solr/core/src/java/org/apache/solr/update/processor/NumFieldsMonitor.java: ########## @@ -0,0 +1,56 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.solr.update.processor; + +import org.apache.solr.core.AbstractSolrEventListener; +import org.apache.solr.core.SolrCore; +import org.apache.solr.search.SolrIndexSearcher; +import org.apache.solr.util.RefCounted; + +public class NumFieldsMonitor extends AbstractSolrEventListener { + + private int numFields; + + public NumFieldsMonitor(SolrCore core) { + super(core); + this.numFields = -1; + } + + @Override + public void postCommit() { + RefCounted<SolrIndexSearcher> indexSearcher = null; + try { + indexSearcher = getCore().getSearcher(); Review Comment: see SolrCore.withSearcher to make this code less involved. I added that long ago and getSearcher's javadocs recommend using it. ########## solr/core/src/java/org/apache/solr/update/processor/NumFieldLimitingUpdateRequestProcessorFactory.java: ########## @@ -0,0 +1,113 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.solr.update.processor; + +import org.apache.solr.common.util.NamedList; +import org.apache.solr.core.SolrCore; +import org.apache.solr.request.SolrQueryRequest; +import org.apache.solr.response.SolrQueryResponse; +import org.apache.solr.search.SolrIndexSearcher; +import org.apache.solr.util.plugin.SolrCoreAware; + +/** + * This factory generates an UpdateRequestProcessor which fails update requests once a core has + * exceeded a configurable maximum number of fields. Meant as a safeguard to help users notice + * potentially-dangerous schema design before performance and stability problems start to occur. + * + * <p>The URP uses the core's {@link SolrIndexSearcher} to judge the current number of fields. + * Accordingly, it undercounts the number of fields in the core - missing all fields added since the + * previous searcher was opened. As such, the URP's request-blocking is "best effort" - it cannot be + * relied on as a precise limit on the number of fields. + * + * <p>Additionally, the field-counting includes all documents present in the index, including any + * deleted docs that haven't yet been purged via segment merging. Note that this can differ + * significantly from the number of fields defined in managed-schema.xml - especially when dynamic + * fields are enabled. The only way to reduce this field count is to delete documents and wait until + * the deleted documents have been removed by segment merges. Users may of course speed up this + * process by tweaking Solr's segment-merging, triggering an "optimize" operation, etc. + * + * <p>{@link NumFieldLimitingUpdateRequestProcessorFactory} accepts two configuration parameters: + * + * <ul> + * <li><code>maxFields</code> - (required) The maximum number of fields before update requests + * should be aborted. Once this limit has been exceeded, additional update requests will fail + * until fields have been removed or the "maxFields" is increased. + * <li><code>warnOnly</code> - (optional) If <code>true</code> then the URP logs verbose warnings + * about the limit being exceeded but doesn't abort update requests. Defaults to <code>false + * </code> if not specified + * </ul> + * + * @since 9.6.0 Review Comment: You know what'd be cool? If we had a magic string like VERSION_ON_RELEASE and the release wizard automated a find-replace on the relevant branches. Hey @janhoy -- just sharing my wild ideas :-) ########## solr/core/src/java/org/apache/solr/update/processor/NumFieldsMonitor.java: ########## @@ -0,0 +1,56 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.solr.update.processor; + +import org.apache.solr.core.AbstractSolrEventListener; +import org.apache.solr.core.SolrCore; +import org.apache.solr.search.SolrIndexSearcher; +import org.apache.solr.util.RefCounted; + +public class NumFieldsMonitor extends AbstractSolrEventListener { Review Comment: public? make inter class of the URP / factory? ########## solr/core/src/test/org/apache/solr/update/processor/NumFieldLimitingUpdateRequestProcessorIntegrationTest.java: ########## @@ -0,0 +1,114 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.solr.update.processor; + +import java.util.ArrayList; +import java.util.Collection; +import java.util.UUID; +import java.util.concurrent.TimeUnit; +import org.apache.solr.client.solrj.request.CollectionAdminRequest; +import org.apache.solr.cloud.SolrCloudTestCase; +import org.apache.solr.common.SolrInputDocument; +import org.hamcrest.Matchers; +import org.junit.BeforeClass; +import org.junit.Test; + +public class NumFieldLimitingUpdateRequestProcessorIntegrationTest extends SolrCloudTestCase { + + private static String SINGLE_SHARD_COLL_NAME = "singleShardColl"; Review Comment: this is the only collection you are using in your test, yet it has this curious name. Why not just use "collection1" and call it COLL_NAME? ########## solr/solr-ref-guide/modules/configuration-guide/pages/update-request-processors.adoc: ########## @@ -337,6 +337,12 @@ Documents processed prior to the offender are indexed by Solr; documents followi + Alternatively, the processor offers a "permissive" mode (`permissiveMode=true`) which skips the offending document and logs a warning, but doesn't abort the remainder of the batch or return an error to users. +{solr-javadocs}/core/org/apache/solr/update/processor/NumFieldLimitingUpdateRequestProcessorFactory.html[NumFieldLimitingUpdateRequestProcessorFactory]:: Fails update requests once a core has exceeded a configurable "maximum" number of fields. ++ +Solr performance can degrade and even become unstable if cores accumulate too many (e.g. more than 500) fields. The "NumFieldLimiting" URP is offered as a safeguard that helps users notice potentially-dangerous schema design and/or mis-use of dynamic fields, before these performance and stability problems would manifest. +Note that the field count an index reports can be influenced by deleted (but not yet purged) documents, and may vary from replica to replica. +In order to avoid these sort of discrepancies between replicas, use of this URP should almost always precede DistributedUpdateProcessor in when running in SolrCloud mode. Review Comment: This advice surprised me. I guess why not, as only the first node receiving the batch will evaluate this logic. It may/may-not be the leader but whatever. ########## solr/core/src/test-files/solr/configsets/cloud-minimal-field-limiting/conf/solrconfig.xml: ########## @@ -0,0 +1,73 @@ +<?xml version="1.0" ?> + +<!-- + Licensed to the Apache Software Foundation (ASF) under one or more + contributor license agreements. See the NOTICE file distributed with + this work for additional information regarding copyright ownership. + The ASF licenses this file to You under the Apache License, Version 2.0 + (the "License"); you may not use this file except in compliance with + the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. +--> + +<!-- Minimal solrconfig.xml with /select, /admin and /update only --> + +<config> + + <dataDir>${solr.data.dir:}</dataDir> + + <directoryFactory name="DirectoryFactory" + class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}"/> + <schemaFactory class="ClassicIndexSchemaFactory"/> + + <luceneMatchVersion>${tests.luceneMatchVersion:LATEST}</luceneMatchVersion> + + <updateHandler class="solr.DirectUpdateHandler2"> + <commitWithin> + <softCommit>${solr.commitwithin.softcommit:true}</softCommit> + </commitWithin> + <updateLog class="${solr.ulog:solr.UpdateLog}"></updateLog> + </updateHandler> + + <requestHandler name="/select" class="solr.SearchHandler"> + <lst name="defaults"> + <str name="echoParams">explicit</str> + <str name="indent">true</str> + <str name="df">text</str> + </lst> + </requestHandler> + + <updateProcessor class="solr.NumFieldLimitingUpdateRequestProcessorFactory" name="max-fields"> + <int name="maxFields">${solr.test.maxFields:1234}</int> + </updateProcessor> Review Comment: why declare this separately instead of in the one chain that uses it? It's not obvious, when looking at the chain, that there is this additional processor. ########## solr/core/src/java/org/apache/solr/update/processor/NumFieldsMonitor.java: ########## @@ -0,0 +1,56 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.solr.update.processor; + +import org.apache.solr.core.AbstractSolrEventListener; +import org.apache.solr.core.SolrCore; +import org.apache.solr.search.SolrIndexSearcher; +import org.apache.solr.util.RefCounted; + +public class NumFieldsMonitor extends AbstractSolrEventListener { + + private int numFields; + + public NumFieldsMonitor(SolrCore core) { + super(core); + this.numFields = -1; + } + + @Override + public void postCommit() { Review Comment: do we need this at all? Won't the callback for newSearcher address this? ########## solr/core/src/java/org/apache/solr/update/processor/NumFieldLimitingUpdateRequestProcessorFactory.java: ########## @@ -0,0 +1,113 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.solr.update.processor; + +import org.apache.solr.common.util.NamedList; +import org.apache.solr.core.SolrCore; +import org.apache.solr.request.SolrQueryRequest; +import org.apache.solr.response.SolrQueryResponse; +import org.apache.solr.search.SolrIndexSearcher; +import org.apache.solr.util.plugin.SolrCoreAware; + +/** + * This factory generates an UpdateRequestProcessor which fails update requests once a core has + * exceeded a configurable maximum number of fields. Meant as a safeguard to help users notice + * potentially-dangerous schema design before performance and stability problems start to occur. + * + * <p>The URP uses the core's {@link SolrIndexSearcher} to judge the current number of fields. + * Accordingly, it undercounts the number of fields in the core - missing all fields added since the + * previous searcher was opened. As such, the URP's request-blocking is "best effort" - it cannot be + * relied on as a precise limit on the number of fields. + * + * <p>Additionally, the field-counting includes all documents present in the index, including any + * deleted docs that haven't yet been purged via segment merging. Note that this can differ + * significantly from the number of fields defined in managed-schema.xml - especially when dynamic + * fields are enabled. The only way to reduce this field count is to delete documents and wait until + * the deleted documents have been removed by segment merges. Users may of course speed up this + * process by tweaking Solr's segment-merging, triggering an "optimize" operation, etc. + * + * <p>{@link NumFieldLimitingUpdateRequestProcessorFactory} accepts two configuration parameters: + * + * <ul> + * <li><code>maxFields</code> - (required) The maximum number of fields before update requests + * should be aborted. Once this limit has been exceeded, additional update requests will fail + * until fields have been removed or the "maxFields" is increased. + * <li><code>warnOnly</code> - (optional) If <code>true</code> then the URP logs verbose warnings + * about the limit being exceeded but doesn't abort update requests. Defaults to <code>false + * </code> if not specified + * </ul> + * + * @since 9.6.0 Review Comment: 9.7.0 ########## solr/core/src/java/org/apache/solr/update/processor/NumFieldLimitingUpdateRequestProcessorFactory.java: ########## @@ -0,0 +1,113 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.solr.update.processor; + +import org.apache.solr.common.util.NamedList; +import org.apache.solr.core.SolrCore; +import org.apache.solr.request.SolrQueryRequest; +import org.apache.solr.response.SolrQueryResponse; +import org.apache.solr.search.SolrIndexSearcher; +import org.apache.solr.util.plugin.SolrCoreAware; + +/** + * This factory generates an UpdateRequestProcessor which fails update requests once a core has + * exceeded a configurable maximum number of fields. Meant as a safeguard to help users notice + * potentially-dangerous schema design before performance and stability problems start to occur. + * + * <p>The URP uses the core's {@link SolrIndexSearcher} to judge the current number of fields. + * Accordingly, it undercounts the number of fields in the core - missing all fields added since the + * previous searcher was opened. As such, the URP's request-blocking is "best effort" - it cannot be + * relied on as a precise limit on the number of fields. + * + * <p>Additionally, the field-counting includes all documents present in the index, including any + * deleted docs that haven't yet been purged via segment merging. Note that this can differ + * significantly from the number of fields defined in managed-schema.xml - especially when dynamic + * fields are enabled. The only way to reduce this field count is to delete documents and wait until + * the deleted documents have been removed by segment merges. Users may of course speed up this + * process by tweaking Solr's segment-merging, triggering an "optimize" operation, etc. + * + * <p>{@link NumFieldLimitingUpdateRequestProcessorFactory} accepts two configuration parameters: + * + * <ul> + * <li><code>maxFields</code> - (required) The maximum number of fields before update requests + * should be aborted. Once this limit has been exceeded, additional update requests will fail + * until fields have been removed or the "maxFields" is increased. + * <li><code>warnOnly</code> - (optional) If <code>true</code> then the URP logs verbose warnings + * about the limit being exceeded but doesn't abort update requests. Defaults to <code>false + * </code> if not specified + * </ul> + * + * @since 9.6.0 + */ +public class NumFieldLimitingUpdateRequestProcessorFactory extends UpdateRequestProcessorFactory + implements SolrCoreAware { + + private static final String MAXIMUM_FIELDS_PARAM = "maxFields"; + private static final String WARN_ONLY_PARAM = "warnOnly"; + + private NumFieldsMonitor numFieldsMonitor; + private int maximumFields; + private boolean warnOnly; + + @Override + public void inform(final SolrCore core) { + // register a commit callback for monitoring the number of fields in the schema + numFieldsMonitor = new NumFieldsMonitor(core); + core.getUpdateHandler().registerCommitCallback(numFieldsMonitor); + core.registerNewSearcherListener(numFieldsMonitor); + } + + @Override + public void init(NamedList<?> args) { + warnOnly = args.indexOf(WARN_ONLY_PARAM, 0) > 0 ? args.getBooleanArg(WARN_ONLY_PARAM) : false; + + if (args.indexOf(MAXIMUM_FIELDS_PARAM, 0) < 0) { + throw new IllegalArgumentException( + "The " + + MAXIMUM_FIELDS_PARAM + + " parameter is required for " + + getClass().getName() + + ", but no value was provided."); + } + final Object rawMaxFields = args.get(MAXIMUM_FIELDS_PARAM); + if (rawMaxFields == null || !(rawMaxFields instanceof Integer)) { + throw new IllegalArgumentException( + MAXIMUM_FIELDS_PARAM + " must be configured as a non-null <int>"); + } + maximumFields = (Integer) rawMaxFields; + if (maximumFields <= 0) { + throw new IllegalArgumentException(MAXIMUM_FIELDS_PARAM + " must be a positive integer"); + } + } + + @Override + public UpdateRequestProcessor getInstance( + SolrQueryRequest req, SolrQueryResponse rsp, UpdateRequestProcessor next) { + + // TODO Should we skip to the next URP if running in SolrCloud and *not* a leader? + return new NumFieldLimitingUpdateRequestProcessor( + req, next, maximumFields, numFieldsMonitor.getCurrentNumFields(), warnOnly); Review Comment: Woah, okay, I wasn't expecting this. The `numFieldsMonitor.getCurrentNumFields` is only read on load of the URP; it's not per-doc. If that's adequate (I suppose; this is just a soft limit) then this can be much simpler -- just go read the SolrIndexSearcher right now to get the field count! No NumFieldsMonitor needed. Furthermore the URP could be renamed to simply "BlockNewDocsUrp" and is small enough can be an inner class really. ########## solr/core/src/java/org/apache/solr/update/processor/NumFieldsMonitor.java: ########## @@ -0,0 +1,56 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.solr.update.processor; + +import org.apache.solr.core.AbstractSolrEventListener; +import org.apache.solr.core.SolrCore; +import org.apache.solr.search.SolrIndexSearcher; +import org.apache.solr.util.RefCounted; + +public class NumFieldsMonitor extends AbstractSolrEventListener { Review Comment: I suppose you didn't like my proposal that would eliminate the need for this event listener; it'd keep the logic within NumFieldLimitingURP to grab the latest searcher, if unchanged, and basically just lookup the fieldInfos size at that point. Maybe the need for a WeakReference made it be sort of complex, albeit this new component for an event listener perhaps could be said is also complex. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For additional commands, e-mail: issues-h...@solr.apache.org