dlmarion commented on code in PR #83:
URL: https://github.com/apache/accumulo-access/pull/83#discussion_r1939324935


##########
src/main/java/org/apache/accumulo/access/AccessExpression.java:
##########
@@ -87,109 +97,131 @@
  * @see <a href="https://github.com/apache/accumulo-access";>Accumulo Access 
Documentation</a>
  * @since 1.0.0
  */
-public interface AccessExpression extends Serializable {
+public abstract class AccessExpression implements Serializable {
+  /*
+   * This is package private so that it can not be extended by classes outside 
of this package and
+   * create a mutable implementation. In this package all implementations that 
extends are
+   * immutable.
+   */
+  AccessExpression() {}
 
   /**
    * @return the expression that was used to create this object.
    */
-  String getExpression();
+  public abstract String getExpression();
 
-  /**
-   * @return the unique set of authorizations that occur in the expression. 
For example, for the
-   *         expression {@code (A&B)|(A&C)|(A&D)}, this method would return 
{@code [A,B,C,D]}.
-   */
-  Authorizations getAuthorizations();
+  @Override
+  public boolean equals(Object o) {
+    if (o instanceof AccessExpression) {
+      return ((AccessExpression) o).getExpression().equals(getExpression());
+    }
 
-  /**
-   * This is equivalent to calling {@code AccessExpression.of(expression, 
false);}
-   */
-  static AccessExpression of(String expression) throws 
InvalidAccessExpressionException {
-    return new AccessExpressionImpl(expression, false);
+    return false;
   }
 
-  /**
-   * <p>
-   * Validates an access expression and creates an immutable AccessExpression 
object.
-   *
-   * <p>
-   * When the {@code normalize} parameter is true, then will deduplicate, 
sort, flatten, and remove
-   * unneeded parentheses or quotes in the expressions. Normalization is done 
in addition to
-   * validation. The following list gives examples of what each normalization 
step does.
-   *
-   * <ul>
-   * <li>As an example of flattening, the expression {@code A&(B&C)} flattens 
to {@code A&B&C}.</li>
-   * <li>As an example of sorting, the expression {@code (Z&Y)|(C&B)} sorts to
-   * {@code (B&C)|(Y&Z)}</li>
-   * <li>As an example of deduplication, the expression {@code X&Y&X} 
normalizes to {@code X&Y}</li>
-   * <li>As an example of unneeded quotes, the expression {@code "ABC"&"XYZ"} 
normalizes to
-   * {@code ABC&XYZ}</li>
-   * <li>As an example of unneeded parentheses, the expression {@code 
(((ABC)|(XYZ)))} normalizes to
-   * {@code ABC|XYZ}</li>
-   * </ul>
-   *
-   * @param expression an access expression
-   * @param normalize If true then the expression will be normalized, if false 
the expression will
-   *        only be validated. Normalization is expensive so only use when 
needed. If repeatedly
-   *        normalizing expressions, consider using a cache that maps 
un-normalized expressions to
-   *        normalized ones. Since the normalization process is deterministic, 
the computation can
-   *        be cached.
-   * @throws InvalidAccessExpressionException when the expression is not valid.
-   */
-  static AccessExpression of(String expression, boolean normalize)
-      throws InvalidAccessExpressionException {
-    return new AccessExpressionImpl(expression, normalize);
+  @Override
+  public int hashCode() {
+    return getExpression().hashCode();
+  }
+
+  @Override
+  public String toString() {
+    return getExpression();
   }
 
   /**
-   * <p>
    * This is equivalent to calling {@code AccessExpression.of(expression, 
false);}
    */
-  static AccessExpression of(byte[] expression) throws 
InvalidAccessExpressionException {
-    return new AccessExpressionImpl(expression, false);
+  public static AccessExpression of(String expression) throws 
InvalidAccessExpressionException {
+    return new AccessExpressionImpl(expression);
   }
 
   /**
    * <p>
-   * Validates an access expression and creates an immutable AccessExpression 
object.
-   *
-   * <p>
-   * If only validation is needed, then call {@link #validate(byte[])} because 
it will avoid copying
-   * the expression like this method does. This method must copy the byte 
array into a String in
-   * order to create an immutable AccessExpression.
-   *
-   * @see #of(String, boolean) for information about normlization.
-   * @param expression an access expression that is expected to be encoded 
using UTF-8
-   * @param normalize If true then the expression will be normalized, if false 
the expression will
-   *        only be validated. Normalization is expensive so only use when 
needed.
-   * @throws InvalidAccessExpressionException when the expression is not valid.
+   * This is equivalent to calling {@code AccessExpression.of(expression, 
false);}

Review Comment:
   Is this comment still valid? Does the two-arg `of()` still exist?



##########
src/main/java/org/apache/accumulo/access/AccessExpression.java:
##########
@@ -87,109 +97,131 @@
  * @see <a href="https://github.com/apache/accumulo-access";>Accumulo Access 
Documentation</a>
  * @since 1.0.0
  */
-public interface AccessExpression extends Serializable {
+public abstract class AccessExpression implements Serializable {
+  /*
+   * This is package private so that it can not be extended by classes outside 
of this package and
+   * create a mutable implementation. In this package all implementations that 
extends are
+   * immutable.
+   */
+  AccessExpression() {}
 
   /**
    * @return the expression that was used to create this object.
    */
-  String getExpression();
+  public abstract String getExpression();
 
-  /**
-   * @return the unique set of authorizations that occur in the expression. 
For example, for the
-   *         expression {@code (A&B)|(A&C)|(A&D)}, this method would return 
{@code [A,B,C,D]}.
-   */
-  Authorizations getAuthorizations();
+  @Override
+  public boolean equals(Object o) {
+    if (o instanceof AccessExpression) {
+      return ((AccessExpression) o).getExpression().equals(getExpression());
+    }
 
-  /**
-   * This is equivalent to calling {@code AccessExpression.of(expression, 
false);}
-   */
-  static AccessExpression of(String expression) throws 
InvalidAccessExpressionException {
-    return new AccessExpressionImpl(expression, false);
+    return false;
   }
 
-  /**
-   * <p>
-   * Validates an access expression and creates an immutable AccessExpression 
object.
-   *
-   * <p>
-   * When the {@code normalize} parameter is true, then will deduplicate, 
sort, flatten, and remove
-   * unneeded parentheses or quotes in the expressions. Normalization is done 
in addition to
-   * validation. The following list gives examples of what each normalization 
step does.
-   *
-   * <ul>
-   * <li>As an example of flattening, the expression {@code A&(B&C)} flattens 
to {@code A&B&C}.</li>
-   * <li>As an example of sorting, the expression {@code (Z&Y)|(C&B)} sorts to
-   * {@code (B&C)|(Y&Z)}</li>
-   * <li>As an example of deduplication, the expression {@code X&Y&X} 
normalizes to {@code X&Y}</li>
-   * <li>As an example of unneeded quotes, the expression {@code "ABC"&"XYZ"} 
normalizes to
-   * {@code ABC&XYZ}</li>
-   * <li>As an example of unneeded parentheses, the expression {@code 
(((ABC)|(XYZ)))} normalizes to
-   * {@code ABC|XYZ}</li>
-   * </ul>
-   *
-   * @param expression an access expression
-   * @param normalize If true then the expression will be normalized, if false 
the expression will
-   *        only be validated. Normalization is expensive so only use when 
needed. If repeatedly
-   *        normalizing expressions, consider using a cache that maps 
un-normalized expressions to
-   *        normalized ones. Since the normalization process is deterministic, 
the computation can
-   *        be cached.
-   * @throws InvalidAccessExpressionException when the expression is not valid.
-   */
-  static AccessExpression of(String expression, boolean normalize)
-      throws InvalidAccessExpressionException {
-    return new AccessExpressionImpl(expression, normalize);
+  @Override
+  public int hashCode() {
+    return getExpression().hashCode();
+  }
+
+  @Override
+  public String toString() {
+    return getExpression();
   }
 
   /**
-   * <p>
    * This is equivalent to calling {@code AccessExpression.of(expression, 
false);}
    */
-  static AccessExpression of(byte[] expression) throws 
InvalidAccessExpressionException {
-    return new AccessExpressionImpl(expression, false);
+  public static AccessExpression of(String expression) throws 
InvalidAccessExpressionException {
+    return new AccessExpressionImpl(expression);
   }
 
   /**
    * <p>
-   * Validates an access expression and creates an immutable AccessExpression 
object.
-   *
-   * <p>
-   * If only validation is needed, then call {@link #validate(byte[])} because 
it will avoid copying
-   * the expression like this method does. This method must copy the byte 
array into a String in
-   * order to create an immutable AccessExpression.
-   *
-   * @see #of(String, boolean) for information about normlization.
-   * @param expression an access expression that is expected to be encoded 
using UTF-8
-   * @param normalize If true then the expression will be normalized, if false 
the expression will
-   *        only be validated. Normalization is expensive so only use when 
needed.
-   * @throws InvalidAccessExpressionException when the expression is not valid.
+   * This is equivalent to calling {@code AccessExpression.of(expression, 
false);}
    */
-  static AccessExpression of(byte[] expression, boolean normalize)
-      throws InvalidAccessExpressionException {
-    return new AccessExpressionImpl(expression, normalize);
+  public static AccessExpression of(byte[] expression) throws 
InvalidAccessExpressionException {
+    return new AccessExpressionImpl(expression);
   }
 
   /**
    * @return an empty AccessExpression that is immutable.
    */
-  static AccessExpression of() {
+  public static AccessExpression of() {
     return AccessExpressionImpl.EMPTY;
   }
 
+  public static ParsedAccessExpression parse(byte[] expression) {
+    if (expression.length == 0) {
+      return ParsedAccessExpressionImpl.EMPTY;
+    }
+
+    return 
ParsedAccessExpressionImpl.parseExpression(Arrays.copyOf(expression, 
expression.length));
+  }
+
+  public static ParsedAccessExpression parse(String expression) {
+    if (expression.isEmpty()) {

Review Comment:
   If expression is null, is that valid and empty?



##########
src/main/java/org/apache/accumulo/access/AccessExpression.java:
##########
@@ -87,109 +97,131 @@
  * @see <a href="https://github.com/apache/accumulo-access";>Accumulo Access 
Documentation</a>
  * @since 1.0.0
  */
-public interface AccessExpression extends Serializable {
+public abstract class AccessExpression implements Serializable {
+  /*
+   * This is package private so that it can not be extended by classes outside 
of this package and
+   * create a mutable implementation. In this package all implementations that 
extends are
+   * immutable.
+   */
+  AccessExpression() {}
 
   /**
    * @return the expression that was used to create this object.
    */
-  String getExpression();
+  public abstract String getExpression();
 
-  /**
-   * @return the unique set of authorizations that occur in the expression. 
For example, for the
-   *         expression {@code (A&B)|(A&C)|(A&D)}, this method would return 
{@code [A,B,C,D]}.
-   */
-  Authorizations getAuthorizations();
+  @Override
+  public boolean equals(Object o) {
+    if (o instanceof AccessExpression) {
+      return ((AccessExpression) o).getExpression().equals(getExpression());
+    }
 
-  /**
-   * This is equivalent to calling {@code AccessExpression.of(expression, 
false);}
-   */
-  static AccessExpression of(String expression) throws 
InvalidAccessExpressionException {
-    return new AccessExpressionImpl(expression, false);
+    return false;
   }
 
-  /**
-   * <p>
-   * Validates an access expression and creates an immutable AccessExpression 
object.
-   *
-   * <p>
-   * When the {@code normalize} parameter is true, then will deduplicate, 
sort, flatten, and remove
-   * unneeded parentheses or quotes in the expressions. Normalization is done 
in addition to
-   * validation. The following list gives examples of what each normalization 
step does.
-   *
-   * <ul>
-   * <li>As an example of flattening, the expression {@code A&(B&C)} flattens 
to {@code A&B&C}.</li>
-   * <li>As an example of sorting, the expression {@code (Z&Y)|(C&B)} sorts to
-   * {@code (B&C)|(Y&Z)}</li>
-   * <li>As an example of deduplication, the expression {@code X&Y&X} 
normalizes to {@code X&Y}</li>
-   * <li>As an example of unneeded quotes, the expression {@code "ABC"&"XYZ"} 
normalizes to
-   * {@code ABC&XYZ}</li>
-   * <li>As an example of unneeded parentheses, the expression {@code 
(((ABC)|(XYZ)))} normalizes to
-   * {@code ABC|XYZ}</li>
-   * </ul>
-   *
-   * @param expression an access expression
-   * @param normalize If true then the expression will be normalized, if false 
the expression will
-   *        only be validated. Normalization is expensive so only use when 
needed. If repeatedly
-   *        normalizing expressions, consider using a cache that maps 
un-normalized expressions to
-   *        normalized ones. Since the normalization process is deterministic, 
the computation can
-   *        be cached.
-   * @throws InvalidAccessExpressionException when the expression is not valid.
-   */
-  static AccessExpression of(String expression, boolean normalize)
-      throws InvalidAccessExpressionException {
-    return new AccessExpressionImpl(expression, normalize);
+  @Override
+  public int hashCode() {
+    return getExpression().hashCode();
+  }
+
+  @Override
+  public String toString() {
+    return getExpression();
   }
 
   /**
-   * <p>
    * This is equivalent to calling {@code AccessExpression.of(expression, 
false);}
    */
-  static AccessExpression of(byte[] expression) throws 
InvalidAccessExpressionException {
-    return new AccessExpressionImpl(expression, false);
+  public static AccessExpression of(String expression) throws 
InvalidAccessExpressionException {
+    return new AccessExpressionImpl(expression);
   }
 
   /**
    * <p>
-   * Validates an access expression and creates an immutable AccessExpression 
object.
-   *
-   * <p>
-   * If only validation is needed, then call {@link #validate(byte[])} because 
it will avoid copying
-   * the expression like this method does. This method must copy the byte 
array into a String in
-   * order to create an immutable AccessExpression.
-   *
-   * @see #of(String, boolean) for information about normlization.
-   * @param expression an access expression that is expected to be encoded 
using UTF-8
-   * @param normalize If true then the expression will be normalized, if false 
the expression will
-   *        only be validated. Normalization is expensive so only use when 
needed.
-   * @throws InvalidAccessExpressionException when the expression is not valid.
+   * This is equivalent to calling {@code AccessExpression.of(expression, 
false);}
    */
-  static AccessExpression of(byte[] expression, boolean normalize)
-      throws InvalidAccessExpressionException {
-    return new AccessExpressionImpl(expression, normalize);
+  public static AccessExpression of(byte[] expression) throws 
InvalidAccessExpressionException {
+    return new AccessExpressionImpl(expression);
   }
 
   /**
    * @return an empty AccessExpression that is immutable.
    */
-  static AccessExpression of() {
+  public static AccessExpression of() {
     return AccessExpressionImpl.EMPTY;
   }
 
+  public static ParsedAccessExpression parse(byte[] expression) {
+    if (expression.length == 0) {
+      return ParsedAccessExpressionImpl.EMPTY;
+    }
+
+    return 
ParsedAccessExpressionImpl.parseExpression(Arrays.copyOf(expression, 
expression.length));
+  }
+
+  public static ParsedAccessExpression parse(String expression) {
+    if (expression.isEmpty()) {
+      return ParsedAccessExpressionImpl.EMPTY;
+    }
+    return 
ParsedAccessExpressionImpl.parseExpression(expression.getBytes(UTF_8));

Review Comment:
   Is there a reason not to call `parse(expression.getBytes(UTF_8))` ?



##########
src/main/java/org/apache/accumulo/access/ParsedAccessExpressionImpl.java:
##########
@@ -0,0 +1,171 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   https://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.accumulo.access;
+
+import static java.nio.charset.StandardCharsets.UTF_8;
+import static org.apache.accumulo.access.ByteUtils.AND_OPERATOR;
+import static org.apache.accumulo.access.ByteUtils.OR_OPERATOR;
+import static org.apache.accumulo.access.ByteUtils.isAndOrOperator;
+import static 
org.apache.accumulo.access.ParsedAccessExpression.ExpressionType.AND;
+import static 
org.apache.accumulo.access.ParsedAccessExpression.ExpressionType.AUTHORIZATION;
+import static 
org.apache.accumulo.access.ParsedAccessExpression.ExpressionType.OR;
+
+import java.util.ArrayList;
+import java.util.List;
+import java.util.concurrent.atomic.AtomicReference;
+
+// This class is intentionally package private
+final class ParsedAccessExpressionImpl extends ParsedAccessExpression {
+
+  private static final long serialVersionUID = 1L;
+
+  private final byte[] expression;
+  private final int offset;
+  private final int length;
+
+  private final ExpressionType type;
+  private final List<ParsedAccessExpression> children;
+
+  private final AtomicReference<String> stringExpression = new 
AtomicReference<>(null);
+
+  static final ParsedAccessExpression EMPTY = new ParsedAccessExpressionImpl();
+
+  ParsedAccessExpressionImpl(byte operator, byte[] expression, int offset, int 
length,
+      List<ParsedAccessExpression> children) {
+    if (children.isEmpty()) {
+      throw new IllegalArgumentException("Must have children with an 
operator");
+    }
+
+    if (operator != AND_OPERATOR && operator != OR_OPERATOR) {
+      throw new IllegalArgumentException("Unknown operator " + operator);
+    } else if (operator == AND_OPERATOR) {
+      this.type = AND;
+    } else {
+      this.type = OR;
+    }
+
+    this.expression = expression;
+    this.offset = offset;
+    this.length = length;
+    this.children = List.copyOf(children);
+  }
+
+  ParsedAccessExpressionImpl(byte[] expression, int offset, int length) {
+    this.type = AUTHORIZATION;
+    this.expression = expression;
+    this.offset = offset;
+    this.length = length;
+    this.children = List.of();
+  }
+
+  ParsedAccessExpressionImpl() {
+    this.type = ExpressionType.EMPTY;
+    this.offset = 0;
+    this.length = 0;
+    this.expression = new byte[0];
+    this.children = List.of();
+  }
+
+  @Override
+  public String getExpression() {
+    String strExp = stringExpression.get();
+    if (strExp != null) {
+      return strExp;
+    }
+    strExp = new String(expression, offset, length, UTF_8);
+    stringExpression.compareAndSet(null, strExp);
+    return stringExpression.get();
+  }
+
+  @Override
+  public ExpressionType getType() {
+    return type;
+  }
+
+  @Override
+  public List<ParsedAccessExpression> getChildren() {
+    return children;
+  }
+
+  static ParsedAccessExpression parseExpression(byte[] expression) {
+    if (expression.length == 0) {
+      return ParsedAccessExpressionImpl.EMPTY;
+    }
+
+    Tokenizer tokenizer = new Tokenizer(expression);
+    var parsed = ParsedAccessExpressionImpl.parseExpression(tokenizer, false);
+
+    if (tokenizer.hasNext()) {
+      // not all input was read, so not a valid expression
+      tokenizer.error("Unexpected character '" + (char) tokenizer.peek() + 
"'");
+    }
+
+    return parsed;
+  }

Review Comment:
   Should this be moved into ParsedAccessExpression?



##########
src/main/java/org/apache/accumulo/access/ParsedAccessExpression.java:
##########
@@ -0,0 +1,73 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   https://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.accumulo.access;
+
+import java.util.List;
+
+/**
+ * Instances of this class are immutable and wrap a verified access expression 
and a parse tree for
+ * the access expression. To create an instance of this class call
+ * {@link AccessExpression#parse(String)}. The Accumulo Access project has 
examples that show how to
+ * use the parse tree.
+ *
+ * @since 1.0.0
+ */
+public abstract class ParsedAccessExpression extends AccessExpression {
+
+  /*
+   * This is package private so that it can not be extended by classes outside 
of this package and
+   * create a mutable implementation. In this package all implementations that 
extends are
+   * immutable.
+   */
+  ParsedAccessExpression() {}
+
+  /**
+   * @since 1.0.0
+   */
+  public enum ExpressionType {
+    /**
+     * Indicates an access expression uses and operators at the top level.
+     */
+    AND,
+    /**
+     * Indicates an access expression uses or operators at the top level.
+     */
+    OR,
+    /**
+     * Indicates an access expression is a single authorization. For this type
+     * {@link #getExpression()} will return the authorization in quoted and 
escaped form. Depending
+     * on the use case {@link #unquote(String)} may need to be called.
+     */
+    AUTHORIZATION,
+    /**
+     * Indicates this is the empty access expression.
+     */
+    EMPTY
+  }
+
+  public abstract ExpressionType getType();
+
+  /**
+   * When {@link #getType()} returns OR or AND, then this method will return 
the sub expressions.
+   * When {@link #getType()} method returns AUTHORIZATION or EMPTY this method 
will return an empty
+   * list.

Review Comment:
   Suggest that we add that this returns an unmodifiable list. I know it's 
implied, but I think we should state it again.



##########
src/main/java/org/apache/accumulo/access/AccessExpression.java:
##########
@@ -87,109 +97,131 @@
  * @see <a href="https://github.com/apache/accumulo-access";>Accumulo Access 
Documentation</a>
  * @since 1.0.0
  */
-public interface AccessExpression extends Serializable {
+public abstract class AccessExpression implements Serializable {
+  /*
+   * This is package private so that it can not be extended by classes outside 
of this package and
+   * create a mutable implementation. In this package all implementations that 
extends are
+   * immutable.
+   */
+  AccessExpression() {}
 
   /**
    * @return the expression that was used to create this object.
    */
-  String getExpression();
+  public abstract String getExpression();
 
-  /**
-   * @return the unique set of authorizations that occur in the expression. 
For example, for the
-   *         expression {@code (A&B)|(A&C)|(A&D)}, this method would return 
{@code [A,B,C,D]}.
-   */
-  Authorizations getAuthorizations();
+  @Override
+  public boolean equals(Object o) {
+    if (o instanceof AccessExpression) {
+      return ((AccessExpression) o).getExpression().equals(getExpression());
+    }
 
-  /**
-   * This is equivalent to calling {@code AccessExpression.of(expression, 
false);}
-   */
-  static AccessExpression of(String expression) throws 
InvalidAccessExpressionException {
-    return new AccessExpressionImpl(expression, false);
+    return false;
   }
 
-  /**
-   * <p>
-   * Validates an access expression and creates an immutable AccessExpression 
object.
-   *
-   * <p>
-   * When the {@code normalize} parameter is true, then will deduplicate, 
sort, flatten, and remove
-   * unneeded parentheses or quotes in the expressions. Normalization is done 
in addition to
-   * validation. The following list gives examples of what each normalization 
step does.
-   *
-   * <ul>
-   * <li>As an example of flattening, the expression {@code A&(B&C)} flattens 
to {@code A&B&C}.</li>
-   * <li>As an example of sorting, the expression {@code (Z&Y)|(C&B)} sorts to
-   * {@code (B&C)|(Y&Z)}</li>
-   * <li>As an example of deduplication, the expression {@code X&Y&X} 
normalizes to {@code X&Y}</li>
-   * <li>As an example of unneeded quotes, the expression {@code "ABC"&"XYZ"} 
normalizes to
-   * {@code ABC&XYZ}</li>
-   * <li>As an example of unneeded parentheses, the expression {@code 
(((ABC)|(XYZ)))} normalizes to
-   * {@code ABC|XYZ}</li>
-   * </ul>
-   *
-   * @param expression an access expression
-   * @param normalize If true then the expression will be normalized, if false 
the expression will
-   *        only be validated. Normalization is expensive so only use when 
needed. If repeatedly
-   *        normalizing expressions, consider using a cache that maps 
un-normalized expressions to
-   *        normalized ones. Since the normalization process is deterministic, 
the computation can
-   *        be cached.
-   * @throws InvalidAccessExpressionException when the expression is not valid.
-   */
-  static AccessExpression of(String expression, boolean normalize)
-      throws InvalidAccessExpressionException {
-    return new AccessExpressionImpl(expression, normalize);
+  @Override
+  public int hashCode() {
+    return getExpression().hashCode();
+  }
+
+  @Override
+  public String toString() {
+    return getExpression();
   }
 
   /**
-   * <p>
    * This is equivalent to calling {@code AccessExpression.of(expression, 
false);}
    */
-  static AccessExpression of(byte[] expression) throws 
InvalidAccessExpressionException {
-    return new AccessExpressionImpl(expression, false);
+  public static AccessExpression of(String expression) throws 
InvalidAccessExpressionException {
+    return new AccessExpressionImpl(expression);
   }
 
   /**
    * <p>
-   * Validates an access expression and creates an immutable AccessExpression 
object.
-   *
-   * <p>
-   * If only validation is needed, then call {@link #validate(byte[])} because 
it will avoid copying
-   * the expression like this method does. This method must copy the byte 
array into a String in
-   * order to create an immutable AccessExpression.
-   *
-   * @see #of(String, boolean) for information about normlization.
-   * @param expression an access expression that is expected to be encoded 
using UTF-8
-   * @param normalize If true then the expression will be normalized, if false 
the expression will
-   *        only be validated. Normalization is expensive so only use when 
needed.
-   * @throws InvalidAccessExpressionException when the expression is not valid.
+   * This is equivalent to calling {@code AccessExpression.of(expression, 
false);}
    */
-  static AccessExpression of(byte[] expression, boolean normalize)
-      throws InvalidAccessExpressionException {
-    return new AccessExpressionImpl(expression, normalize);
+  public static AccessExpression of(byte[] expression) throws 
InvalidAccessExpressionException {
+    return new AccessExpressionImpl(expression);
   }
 
   /**
    * @return an empty AccessExpression that is immutable.
    */
-  static AccessExpression of() {
+  public static AccessExpression of() {
     return AccessExpressionImpl.EMPTY;
   }
 
+  public static ParsedAccessExpression parse(byte[] expression) {
+    if (expression.length == 0) {
+      return ParsedAccessExpressionImpl.EMPTY;
+    }
+
+    return 
ParsedAccessExpressionImpl.parseExpression(Arrays.copyOf(expression, 
expression.length));
+  }
+
+  public static ParsedAccessExpression parse(String expression) {
+    if (expression.isEmpty()) {
+      return ParsedAccessExpressionImpl.EMPTY;
+    }
+    return 
ParsedAccessExpressionImpl.parseExpression(expression.getBytes(UTF_8));
+  }
+
   /**
    * Quickly validates that an access expression is properly formed.
    *
    * @param expression a potential access expression that is expected to be 
encoded using UTF-8
    * @throws InvalidAccessExpressionException if the given expression is not 
valid
    */
-  static void validate(byte[] expression) throws 
InvalidAccessExpressionException {
-    AccessExpressionImpl.validate(expression);
+  public static void validate(byte[] expression) throws 
InvalidAccessExpressionException {
+    if (expression.length > 0) {
+      Tokenizer tokenizer = new Tokenizer(expression);
+      Predicate<Tokenizer.AuthorizationToken> atp = authToken -> true;
+      ParserEvaluator.parseAccessExpression(tokenizer, atp, atp);
+    } // else empty expression is valid, avoid object allocation
   }
 
   /**
    * @see #validate(byte[])
    */
-  static void validate(String expression) throws 
InvalidAccessExpressionException {
-    AccessExpressionImpl.validate(expression);
+  public static void validate(String expression) throws 
InvalidAccessExpressionException {
+    if (!expression.isEmpty()) {
+      validate(expression.getBytes(UTF_8));
+    } // else empty expression is valid, avoid object allocation
+  }
+
+  /**
+   * Validates and access expression and finds all authorizations in it 
passing them to the
+   * authorizationConsumer. For example, for the expression {@code 
(A&B)|(A&C)|(A&D)}, this method
+   * would pass {@code A,B,A,C,A,D} to the consumer one at a time. The 
function will conceptually
+   * call {@link #unquote(String)} prior to passing an authorization to 
authorizationConsumer.
+   *
+   * <p>
+   * What this method does could also be accomplished by creating a parse tree 
using
+   * {@link AccessExpression#parse(String)} and then recursively walking the 
parse tree. The
+   * implementation of this method does not create a parse tree and is much 
faster. If a parse tree
+   * is already available, then it would likely be faster to use it rather 
than call this method.
+   * </p>
+   *
+   * @throws InvalidAccessExpressionException when the expression is not valid.
+   */
+  public static void findAuthorizations(String expression, Consumer<String> 
authorizationConsumer)
+      throws InvalidAccessExpressionException {
+    findAuthorizations(expression.getBytes(UTF_8), authorizationConsumer);
+  }
+
+  /**
+   * @see #findAuthorizations(String, Consumer)
+   */
+  public static void findAuthorizations(byte[] expression, Consumer<String> 
authorizationConsumer)
+      throws InvalidAccessExpressionException {
+    Tokenizer tokenizer = new Tokenizer(expression);
+    Predicate<Tokenizer.AuthorizationToken> atp = authToken -> {
+      // TODO avoid creating BytesWrapper obj
+      authorizationConsumer.accept(AccessEvaluatorImpl
+          .unescape(new BytesWrapper(authToken.data, authToken.start, 
authToken.len)));
+      return true;
+    };
+    ParserEvaluator.parseAccessExpression(tokenizer, atp, atp);

Review Comment:
   This and the `validate` method above differ only in the Predicate being used 
from what I can tell. I wonder if the similar logic should be moved to a 
private method for reuse.



##########
src/main/java/org/apache/accumulo/access/AccessExpression.java:
##########
@@ -87,109 +97,131 @@
  * @see <a href="https://github.com/apache/accumulo-access";>Accumulo Access 
Documentation</a>
  * @since 1.0.0
  */
-public interface AccessExpression extends Serializable {
+public abstract class AccessExpression implements Serializable {
+  /*
+   * This is package private so that it can not be extended by classes outside 
of this package and
+   * create a mutable implementation. In this package all implementations that 
extends are
+   * immutable.
+   */
+  AccessExpression() {}
 
   /**
    * @return the expression that was used to create this object.
    */
-  String getExpression();
+  public abstract String getExpression();
 
-  /**
-   * @return the unique set of authorizations that occur in the expression. 
For example, for the
-   *         expression {@code (A&B)|(A&C)|(A&D)}, this method would return 
{@code [A,B,C,D]}.
-   */
-  Authorizations getAuthorizations();
+  @Override
+  public boolean equals(Object o) {
+    if (o instanceof AccessExpression) {
+      return ((AccessExpression) o).getExpression().equals(getExpression());
+    }
 
-  /**
-   * This is equivalent to calling {@code AccessExpression.of(expression, 
false);}
-   */
-  static AccessExpression of(String expression) throws 
InvalidAccessExpressionException {
-    return new AccessExpressionImpl(expression, false);
+    return false;
   }
 
-  /**
-   * <p>
-   * Validates an access expression and creates an immutable AccessExpression 
object.
-   *
-   * <p>
-   * When the {@code normalize} parameter is true, then will deduplicate, 
sort, flatten, and remove
-   * unneeded parentheses or quotes in the expressions. Normalization is done 
in addition to
-   * validation. The following list gives examples of what each normalization 
step does.
-   *
-   * <ul>
-   * <li>As an example of flattening, the expression {@code A&(B&C)} flattens 
to {@code A&B&C}.</li>
-   * <li>As an example of sorting, the expression {@code (Z&Y)|(C&B)} sorts to
-   * {@code (B&C)|(Y&Z)}</li>
-   * <li>As an example of deduplication, the expression {@code X&Y&X} 
normalizes to {@code X&Y}</li>
-   * <li>As an example of unneeded quotes, the expression {@code "ABC"&"XYZ"} 
normalizes to
-   * {@code ABC&XYZ}</li>
-   * <li>As an example of unneeded parentheses, the expression {@code 
(((ABC)|(XYZ)))} normalizes to
-   * {@code ABC|XYZ}</li>
-   * </ul>
-   *
-   * @param expression an access expression
-   * @param normalize If true then the expression will be normalized, if false 
the expression will
-   *        only be validated. Normalization is expensive so only use when 
needed. If repeatedly
-   *        normalizing expressions, consider using a cache that maps 
un-normalized expressions to
-   *        normalized ones. Since the normalization process is deterministic, 
the computation can
-   *        be cached.
-   * @throws InvalidAccessExpressionException when the expression is not valid.
-   */
-  static AccessExpression of(String expression, boolean normalize)
-      throws InvalidAccessExpressionException {
-    return new AccessExpressionImpl(expression, normalize);
+  @Override
+  public int hashCode() {
+    return getExpression().hashCode();
+  }
+
+  @Override
+  public String toString() {
+    return getExpression();
   }
 
   /**
-   * <p>
    * This is equivalent to calling {@code AccessExpression.of(expression, 
false);}
    */
-  static AccessExpression of(byte[] expression) throws 
InvalidAccessExpressionException {
-    return new AccessExpressionImpl(expression, false);
+  public static AccessExpression of(String expression) throws 
InvalidAccessExpressionException {
+    return new AccessExpressionImpl(expression);
   }
 
   /**
    * <p>
-   * Validates an access expression and creates an immutable AccessExpression 
object.
-   *
-   * <p>
-   * If only validation is needed, then call {@link #validate(byte[])} because 
it will avoid copying
-   * the expression like this method does. This method must copy the byte 
array into a String in
-   * order to create an immutable AccessExpression.
-   *
-   * @see #of(String, boolean) for information about normlization.
-   * @param expression an access expression that is expected to be encoded 
using UTF-8
-   * @param normalize If true then the expression will be normalized, if false 
the expression will
-   *        only be validated. Normalization is expensive so only use when 
needed.
-   * @throws InvalidAccessExpressionException when the expression is not valid.
+   * This is equivalent to calling {@code AccessExpression.of(expression, 
false);}
    */
-  static AccessExpression of(byte[] expression, boolean normalize)
-      throws InvalidAccessExpressionException {
-    return new AccessExpressionImpl(expression, normalize);
+  public static AccessExpression of(byte[] expression) throws 
InvalidAccessExpressionException {
+    return new AccessExpressionImpl(expression);
   }
 
   /**
    * @return an empty AccessExpression that is immutable.
    */
-  static AccessExpression of() {
+  public static AccessExpression of() {
     return AccessExpressionImpl.EMPTY;
   }
 
+  public static ParsedAccessExpression parse(byte[] expression) {
+    if (expression.length == 0) {
+      return ParsedAccessExpressionImpl.EMPTY;
+    }
+
+    return 
ParsedAccessExpressionImpl.parseExpression(Arrays.copyOf(expression, 
expression.length));
+  }
+
+  public static ParsedAccessExpression parse(String expression) {
+    if (expression.isEmpty()) {
+      return ParsedAccessExpressionImpl.EMPTY;
+    }
+    return 
ParsedAccessExpressionImpl.parseExpression(expression.getBytes(UTF_8));
+  }
+
   /**
    * Quickly validates that an access expression is properly formed.
    *
    * @param expression a potential access expression that is expected to be 
encoded using UTF-8
    * @throws InvalidAccessExpressionException if the given expression is not 
valid
    */
-  static void validate(byte[] expression) throws 
InvalidAccessExpressionException {
-    AccessExpressionImpl.validate(expression);
+  public static void validate(byte[] expression) throws 
InvalidAccessExpressionException {
+    if (expression.length > 0) {

Review Comment:
   If expression is null, is that valid?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to