dimas-b commented on code in PR #4075:
URL: https://github.com/apache/polaris/pull/4075#discussion_r3024516538
##########
client/python/apache_polaris/cli/command/utils.py:
##########
@@ -64,3 +69,140 @@ def format_timestamp(ms_since_epoch: int) -> str:
ms_since_epoch / 1000, tz=datetime.timezone.utc
)
return dt.strftime("%Y-%m-%d %H:%M:%S UTC")
+
+
+def is_fuzzy_match(query: str, target: str, threshold: float = 0.85) -> bool:
+ """
+ Determine if a query matches a target using multi-stage fuzzy strategies and case-insensitive.
+ """
+ if not query:
+ return False
+ q = query.lower()
+ t = target.lower()
+ query_len = len(q)
+ # Exact match
+ if q == t:
+ return True
+ # Prefix match
+ if t.startswith(q):
+ return True
+ # Substring match: enabled for length > 1
+ if query_len > 1 and q in t:
+ return True
+ # Subsequence match: enabled for length > 2
+ if query_len > 2:
+ iterator = iter(t)
+ if all(char in iterator for char in q):
Review Comment:
TBH, I'm not sure if there are any cases that will get a match by this rule,
but not get a match by the `SequenceMatcher` (below) 🤔 Do you have any examples
like that?
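
One way to probe this question is to run both checks side by side. The sketch below is only an illustration under assumptions: the `SequenceMatcher` stage is not shown in this hunk, so `ratio_match` is a hypothetical stand-in that compares `SequenceMatcher(None, q, t).ratio()` against the `threshold` default of 0.85, and `is_subsequence` is a hypothetical helper wrapping the iterator idiom from the diff. The sample query and target are made up.

```python
from difflib import SequenceMatcher


def is_subsequence(query: str, target: str) -> bool:
    """Return True if the characters of query appear in target in the same order."""
    # `char in iterator` consumes the iterator up to the first hit,
    # so later characters can only match further along in the target.
    iterator = iter(target)
    return all(char in iterator for char in query)


def ratio_match(query: str, target: str, threshold: float = 0.85) -> bool:
    """Hypothetical stand-in for the SequenceMatcher stage, which this hunk does not show."""
    return SequenceMatcher(None, query, target).ratio() >= threshold


# Made-up sample input: a short abbreviation against a longer name.
query, target = "tbl", "my_table_name"
print(is_subsequence(query, target))  # True
print(ratio_match(query, target))     # False (ratio is 2 * 3 / 16 = 0.375)
```

If the later stage really is a plain ratio check, then `ratio()` is `2 * M / (len(q) + len(t))` with `M` at most `len(q)`, so it cannot reach 0.85 once the target is more than roughly 1.35 times as long as the query. Under that assumption, short abbreviations of long names would match only through the subsequence rule.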
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]