Re: [PR] Compute facets while collecting [lucene]

via GitHub Fri, 26 Jul 2024 07:45:46 -0700


gsmiller commented on code in PR #13568:
URL: https://github.com/apache/lucene/pull/13568#discussion_r1693205018



##########
lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/recorders/FacetRecorder.java:
##########
@@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.sandbox.facet.recorders;
+
+import java.io.IOException;
+import org.apache.lucene.index.LeafReaderContext;
+import org.apache.lucene.sandbox.facet.cutters.LeafFacetCutter;
+import org.apache.lucene.sandbox.facet.misc.FacetRollup;
+import org.apache.lucene.sandbox.facet.ordinals.OrdinalIterator;
+
+/**
+ * Record data for each facet of each doc.
+ *
+ * <p>TODO: In the next iteration we can add an extra layer between FacetRecorder and
+ * LeafFacetRecorder, e.g. SliceFacetRecorder. The new layer will be created per {@link
+ * org.apache.lucene.search.Collector}, which means that collecting of multiple leafs (segments)
+ * within a slice is sequential and can be done to a single non-sync map to improve performance and
+ * reduce memory consumption. We already tried that, but didn't see any performance improvement.
+ * Given that it also makes lazy leaf recorder init in {@link
+ * org.apache.lucene.sandbox.facet.FacetFieldCollector} trickier, it was decided to rollback the
+ * initial attempt and try again later, in the next iteration.
+ */
+public interface FacetRecorder {
+  /** Get leaf recorder. */
+  LeafFacetRecorder getLeafRecorder(LeafReaderContext context) throws IOException;
+
+  /** Return next collected ordinal, or {@link LeafFacetCutter#NO_MORE_ORDS} */
+  OrdinalIterator recordedOrds();
+
+  /** True if there are no records */
+  boolean isEmpty();

Review Comment:
   Got it. I'm still not sure how useful `isEmpty` is, though. It would only
tell you that a recorder didn't record anything, right? I wonder how common
that is. As a counterpoint, a common use case with taxonomy faceting is to
pack all dimensions into one index field. If a user is faceting on a specific
set of dimensions in a global field like this, the recorder can "see" plenty
of facets but none in the dimensions the user cares about. The `isEmpty`
method doesn't cover that case at all (you still have to create an iterator
over the dims you care about and then find out it's empty). So I'm just not
sure how useful it is, and I'd advocate against adding speculative API methods
unless there's a solid use case, as they can make future extension
cumbersome/awkward.
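
To make the packed-field case concrete, here is a minimal sketch. It does not use the real sandbox API: `ToyRecorder`, `record`, and `recordedIn` are hypothetical stand-ins, and facets are modeled as plain `"dim/value"` strings rather than ordinals. It only illustrates that a recorder-level `isEmpty` can return `false` while the dimensions the caller asked about have nothing recorded.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy stand-in for a recorder over one packed field; NOT the Lucene FacetRecorder API. */
class ToyRecorder {
  // Hypothetical representation: "dim/value" path -> number of docs seen.
  private final Map<String, Integer> counts = new LinkedHashMap<>();

  /** Record one occurrence of a facet value under the given dimension. */
  void record(String dim, String value) {
    counts.merge(dim + "/" + value, 1, Integer::sum);
  }

  /** True only if nothing was recorded at all, in any dimension. */
  boolean isEmpty() {
    return counts.isEmpty();
  }

  /** Recorded paths restricted to the dimensions the caller cares about. */
  List<String> recordedIn(Set<String> dims) {
    List<String> out = new ArrayList<>();
    for (String path : counts.keySet()) {
      String dim = path.substring(0, path.indexOf('/'));
      if (dims.contains(dim)) {
        out.add(path);
      }
    }
    return out;
  }
}

class IsEmptyDemo {
  public static void main(String[] args) {
    ToyRecorder recorder = new ToyRecorder();
    // All dimensions share one field, so the recorder sees every one of them.
    recorder.record("Author", "Lisa");
    recorder.record("Publish Year", "2010");

    // The recorder as a whole is not empty...
    System.out.println(recorder.isEmpty()); // prints "false"
    // ...but the dimension the user is faceting on has no entries.
    System.out.println(recorder.recordedIn(Set.of("Genre")).isEmpty()); // prints "true"
  }
}
```

In this sketch the caller still has to iterate over the dimensions of interest to learn that there is nothing to show, so the recorder-level check saves no work in that scenario.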



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


