MonkeyCanCode commented on code in PR #4075:
URL: https://github.com/apache/polaris/pull/4075#discussion_r3024273939
##########
client/python/apache_polaris/cli/command/utils.py:
##########
@@ -64,3 +69,140 @@ def format_timestamp(ms_since_epoch: int) -> str:
ms_since_epoch / 1000, tz=datetime.timezone.utc
)
return dt.strftime("%Y-%m-%d %H:%M:%S UTC")
+
+
+def is_fuzzy_match(query: str, target: str, threshold: float = 0.85) -> bool:
+ """
+ Determine if a query matches a target using multi-stage fuzzy strategies and case-insensitive.
+ """
+ if not query:
+ return False
+ q = query.lower()
+ t = target.lower()
+ query_len = len(q)
+ # Exact match
+ if q == t:
+ return True
+ # Prefix match
+ if t.startswith(q):
+ return True
+ # Substring match: enabled for length > 1
+ if query_len > 1 and q in t:
+ return True
+ # Subsequence match: enabled for length > 2
+ if query_len > 2:
+ iterator = iter(t)
+ if all(char in iterator for char in q):
Review Comment:
Yes. As with fuzzy search in general, we don't know the total length up
front, so users can narrow the results by typing more characters. I require a
minimum of 3 characters before the fuzzy (subsequence) match applies, so that
typing 'a' doesn't return everything that contains the letter 'a'.
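
To illustrate the point, here is a minimal standalone sketch of the subsequence check from the diff, using the same iterator idiom. The helper name `is_subsequence` and the `names` list are hypothetical and only for illustration; they are not part of the PR.

```python
def is_subsequence(query: str, target: str) -> bool:
    """Return True if the characters of query appear in target in order."""
    remaining = iter(target.lower())
    # `char in remaining` consumes the iterator up to the first match,
    # so each later character must occur after the previous one.
    return all(char in remaining for char in query.lower())


# Hypothetical candidate names, for illustration only.
names = ["catalog", "namespace", "principal", "table"]

# A single character matches every name that contains it anywhere.
print([n for n in names if is_subsequence("a", n)])
# → ['catalog', 'namespace', 'principal', 'table']

# Three characters already narrow the result considerably.
print([n for n in names if is_subsequence("nsp", n)])
# → ['namespace']
```

With a very short query, the subsequence stage matches almost anything, which is why it only starts at 3 characters.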