jmalkin commented on code in PR #644:
URL: https://github.com/apache/datasketches-java/pull/644#discussion_r1931180669


##########
src/main/java/org/apache/datasketches/theta/UnionImpl.java:
##########
@@ -328,38 +327,13 @@ public void union(final Sketch sketchIn) {
 
     unionThetaLong_ = min(min(unionThetaLong_, sketchIn.getThetaLong()), 
gadget_.getThetaLong()); //Theta rule
     unionEmpty_ = false;
-    final int curCountIn = sketchIn.getRetainedEntries(true);
-    if (curCountIn > 0) {
-      if (sketchIn.isOrdered() && (sketchIn instanceof CompactSketch)) { //Use 
early stop
-        //Ordered, thus compact
-        if (sketchIn.hasMemory()) {
-          final Memory skMem = sketchIn.getMemory();
-          final int preambleLongs = skMem.getByte(PREAMBLE_LONGS_BYTE) & 0X3F;
-          for (int i = 0; i < curCountIn; i++ ) {
-            final int offsetBytes = preambleLongs + i << 3;
-            final long hashIn = skMem.getLong(offsetBytes);
-            if (hashIn >= unionThetaLong_) { break; } // "early stop"
-            gadget_.hashUpdate(hashIn); //backdoor update, hash function is 
bypassed
-          }
-        }
-        else { //sketchIn is on the Java Heap or has array
-          final long[] cacheIn = sketchIn.getCache(); //not a copy!
-          for (int i = 0; i < curCountIn; i++ ) {
-            final long hashIn = cacheIn[i];
-            if (hashIn >= unionThetaLong_) { break; } // "early stop"
-            gadget_.hashUpdate(hashIn); //backdoor update, hash function is 
bypassed
-          }
-        }
-      } //End ordered, compact
-      else { //either not-ordered compact or Hash Table form. A HT may have 
dirty values.
-        final long[] cacheIn = sketchIn.getCache(); //if off-heap this will be 
a copy
-        final int arrLongs = cacheIn.length;
-        for (int i = 0, c = 0; i < arrLongs && c < curCountIn; i++ ) {
-          final long hashIn = cacheIn[i];
-          if (hashIn <= 0L || hashIn >= unionThetaLong_) { continue; } 
//rejects dirty values
-          gadget_.hashUpdate(hashIn); //backdoor update, hash function is 
bypassed
-          c++; //ensures against invalid state inside the incoming sketch
-        }
+    HashIterator it = sketchIn.iterator();
+    while (it.next()) {
+      final long hash = it.get();
+      if (hash < unionThetaLong_ && hash < gadget_.getThetaLong()) {
+        gadget_.hashUpdate(hash); // backdoor update, hash function is bypassed
+      } else {
+        if (sketchIn.isOrdered()) { break; }

Review Comment:
   Consider a flag



##########
src/main/java/org/apache/datasketches/theta/DirectCompactCompressedSketch.java:
##########
@@ -0,0 +1,134 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.datasketches.theta;
+
+import static org.apache.datasketches.theta.PreambleUtil.extractEntryBitsV4;
+import static 
org.apache.datasketches.theta.PreambleUtil.extractNumEntriesBytesV4;
+import static org.apache.datasketches.theta.PreambleUtil.extractPreLongs;
+import static org.apache.datasketches.theta.PreambleUtil.extractSeedHash;
+import static org.apache.datasketches.theta.PreambleUtil.extractThetaLongV4;
+import static org.apache.datasketches.theta.PreambleUtil.wholeBytesToHoldBits;
+
+import org.apache.datasketches.memory.Memory;
+import org.apache.datasketches.memory.WritableMemory;
+import org.apache.datasketches.thetacommon.ThetaUtil;
+
+/**
+ * An off-heap (Direct), compact, compressed, read-only sketch. It is not 
empty, not a single item and ordered.
+ *
+ * <p>This sketch can only be associated with a Serialization Version 4 format 
binary image.</p>
+ *
+ * <p>This implementation uses data in a given Memory that is owned and 
managed by the caller.
+ * This Memory can be off-heap, which if managed properly will greatly reduce 
the need for
+ * the JVM to perform garbage collection.</p>
+ */
+class DirectCompactCompressedSketch extends DirectCompactSketch {
+  /**
+   * Construct this sketch with the given memory.
+   * @param mem Read-only Memory object with the order bit properly set.
+   */
+  DirectCompactCompressedSketch(final Memory mem) {
+    super(mem);
+  }
+
+  /**
+   * Wraps the given Memory, which must be a SerVer 4 compressed CompactSketch 
image.
+   * Must check the validity of the Memory before calling. The order bit must 
be set properly.

Review Comment:
   again



##########
src/main/java/org/apache/datasketches/theta/DirectCompactCompressedSketch.java:
##########
@@ -0,0 +1,134 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.datasketches.theta;
+
+import static org.apache.datasketches.theta.PreambleUtil.extractEntryBitsV4;
+import static 
org.apache.datasketches.theta.PreambleUtil.extractNumEntriesBytesV4;
+import static org.apache.datasketches.theta.PreambleUtil.extractPreLongs;
+import static org.apache.datasketches.theta.PreambleUtil.extractSeedHash;
+import static org.apache.datasketches.theta.PreambleUtil.extractThetaLongV4;
+import static org.apache.datasketches.theta.PreambleUtil.wholeBytesToHoldBits;
+
+import org.apache.datasketches.memory.Memory;
+import org.apache.datasketches.memory.WritableMemory;
+import org.apache.datasketches.thetacommon.ThetaUtil;
+
+/**
+ * An off-heap (Direct), compact, compressed, read-only sketch. It is not 
empty, not a single item and ordered.
+ *
+ * <p>This sketch can only be associated with a Serialization Version 4 format 
binary image.</p>
+ *
+ * <p>This implementation uses data in a given Memory that is owned and 
managed by the caller.
+ * This Memory can be off-heap, which if managed properly will greatly reduce 
the need for
+ * the JVM to perform garbage collection.</p>
+ */
+class DirectCompactCompressedSketch extends DirectCompactSketch {
+  /**
+   * Construct this sketch with the given memory.
+   * @param mem Read-only Memory object with the order bit properly set.

Review Comment:
   Rather than "properly set" maybe just say it must be ordered and with the 
bit set to true?



##########
src/main/java/org/apache/datasketches/theta/IntersectionImpl.java:
##########
@@ -448,27 +446,16 @@ private void performIntersect(final Sketch sketchIn) {
     final long[] matchSet = new long[ min(curCount_, 
sketchIn.getRetainedEntries(true)) ];
 
     int matchSetCount = 0;
-    if (sketchIn.isOrdered()) {
-      //ordered compact, which enables early stop
-      for (int i = 0; i < arrLongsIn; i++ ) {
-        final long hashIn = cacheIn[i];
-        //if (hashIn <= 0L) continue;  //<= 0 should not happen
-        if (hashIn >= thetaLong_) {
-          break; //early stop assumes that hashes in input sketch are ordered!
-        }
+    HashIterator it = sketchIn.iterator();
+    while (it.next()) {
+      final long hashIn = it.get();
+      if (hashIn < thetaLong_) {
         final int foundIdx = hashSearch(hashTable, lgArrLongs_, hashIn);
-        if (foundIdx == -1) { continue; }
-        matchSet[matchSetCount++] = hashIn;
-      }
-    }
-    else {
-      //either unordered compact or hash table
-      for (int i = 0; i < arrLongsIn; i++ ) {
-        final long hashIn = cacheIn[i];
-        if (hashIn <= 0L || hashIn >= thetaLong_) { continue; }
-        final int foundIdx = hashSearch(hashTable, lgArrLongs_, hashIn);
-        if (foundIdx == -1) { continue; }
-        matchSet[matchSetCount++] = hashIn;
+        if (foundIdx != -1) {
+          matchSet[matchSetCount++] = hashIn;
+        }
+      } else {
+        if (sketchIn.isOrdered()) { break; } // early stop

Review Comment:
   This won't change over the course of the intersection. Should it be a flag 
pre-computed in advance to avoid a function call every time? Not sure if the 
JVM can easily determine there's no chance of a side-effect from other code.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to