"In the current implementation (‘%’ could be a wildcard only at the start/end of a term) I guess it should be ’ENDS with ‘%escape’ ‘."
--> Yes in the current impl, it means ENDS WITH '%escape' but we want SASI to understand the %% as an escape for % so the goal is that SASI understands LIKE '%%escape' as EQUALS TO '%escape'. Am I correct ? "Moreover all terms that contains single ‘%’ somewhere in the middle should cause an exception." --> Not necessarily, sometime people may want to search text pattern containing the literal %. Imagine the text "this year the average income has increase by 10%". People may want to search for "10%". "BUT may be it’s better to make escaping more universal to support a future possible case where a wildcard could be placed in the middle of a term too?" --> I guess universal escaping for % is the cleaner and better solution. However it may involve some complex regular expression. I'm not sure that input.replaceAll("%%", "%") trick would work for any cases. And we also need to define when we want to detect operation type (LIKE_PREFIX, LIKE_SUFFIX, LIKE_CONTAINS, EQUAL) ? Should we detect operation type BEFORE escaping or AFTER escaping ? On Mon, Sep 26, 2016 at 3:54 PM, Mikhail Krupitskiy < mikhail.krupits...@jetbrains.com> wrote: > LIKE '%%%escape' --> EQUALS TO '%%escape' ??? > > In the current implementation (‘%’ could be a wildcard only at the > start/end of a term) I guess it should be ’ENDS with ‘%escape’ ‘. > Moreover all terms that contains single ‘%’ somewhere in the middle should > cause an exception. > BUT may be it’s better to make escaping more universal to support a future > possible case where a wildcard could be placed in the middle of a term too? > > Thanks, > Mikhail > > On 24 Sep 2016, at 21:09, DuyHai Doan <doanduy...@gmail.com> wrote: > > Reminder, right now, the % character is only interpreted as wildcard IF > AND ONLY IF it is the first/last character of the searched term > > > LIKE '%escape' --> ENDS WITH 'escape' > > If we use % to escape %, > LIKE '%%escape' --> EQUALS TO '%escape' > > LIKE '%%%escape' --> EQUALS TO '%%escape' ??? > > > > > On Fri, Sep 23, 2016 at 5:02 PM, Mikhail Krupitskiy < > mikhail.krupits...@jetbrains.com> wrote: > >> Hi, Jim, >> >> What pattern should be used to search “ends with ‘%escape’ “ with your >> conception? >> >> Thanks, >> Mikhail >> >> On 22 Sep 2016, at 18:51, Jim Ancona <j...@anconafamily.com> wrote: >> >> To answer DuyHai's question without introducing new syntax, I'd suggest: >> >> LIKE '%%%escape' means STARTS WITH '%' AND ENDS WITH 'escape' >> >> So the first two %'s are translated to a literal, non-wildcard % and the >> third % is a wildcard because it's not doubled. >> >> Jim >> >> On Thu, Sep 22, 2016 at 11:40 AM, Mikhail Krupitskiy < >> mikhail.krupits...@jetbrains.com> wrote: >> >>> I guess that it should be similar to how it is done in SQL for LIKE >>> patterns. >>> >>> You can introduce an escape character, e.g. ‘\’. >>> Examples: >>> ‘%’ - any string >>> ‘\%’ - equal to ‘%’ character >>> ‘\%foo%’ - starts from ‘%foo’ >>> ‘%%%escape’ - ends with ’escape’ >>> ‘\%%’ - starts from ‘%’ >>> ‘\\\%%’ - starts from ‘\%’ . >>> >>> What do you think? >>> >>> Thanks, >>> Mikhail >>> >>> On 22 Sep 2016, at 16:47, DuyHai Doan <doanduy...@gmail.com> wrote: >>> >>> Hello Mikhail >>> >>> It's more complicated that it seems >>> >>> LIKE '%%escape' means EQUAL TO '%escape' >>> >>> LIKE '%escape' means ENDS WITH 'escape' >>> >>> What's about LIKE '%%%escape' ???? >>> >>> How should we treat this case ? Replace %% by % at the beginning of the >>> searched term ?? >>> >>> >>> >>> On Thu, Sep 22, 2016 at 3:31 PM, Mikhail Krupitskiy < >>> mikhail.krupits...@jetbrains.com> wrote: >>> >>>> Hi! >>>> >>>> We’ve talked about two items: >>>> 1) ‘%’ as a wildcard in the middle of LIKE pattern. >>>> 2) How to escape ‘%’ to be able to find strings with the ‘%’ char with >>>> help of LIKE. >>>> >>>> Item #1was resolved as CASSANDRA-12573. >>>> >>>> Regarding to item #2: you said the following: >>>> >>>> A possible fix would be: >>>> >>>> 1) convert the bytebuffer into plain String (UTF8 or ASCII, depending >>>> on the column data type) >>>> 2) remove the escape character e.g. before parsing OR use some advanced >>>> regex to exclude the %% from parsing e.g >>>> >>>> Step 2) is dead easy but step 1) is harder because I don't know if >>>> converting the bytebuffer into String at this stage of the CQL parser is >>>> expensive or not (in term of computation) >>>> >>>> Let me try a patch >>>> >>>> So is there any update on this? >>>> >>>> Thanks, >>>> Mikhail >>>> >>>> >>>> On 20 Sep 2016, at 18:38, Mikhail Krupitskiy < >>>> mikhail.krupits...@jetbrains.com> wrote: >>>> >>>> Hi! >>>> >>>> Have you had a chance to try your patch or solve the issue in an other >>>> way? >>>> >>>> Thanks, >>>> Mikhail >>>> >>>> On 15 Sep 2016, at 16:02, DuyHai Doan <doanduy...@gmail.com> wrote: >>>> >>>> Ok so I've found the source of the issue, it's pretty well hidden >>>> because it is NOT in the SASI source code directly. >>>> >>>> Here is the method where C* determines what kind of LIKE expression >>>> you're using (LIKE_PREFIX , LIKE CONTAINS or LIKE_MATCHES) >>>> >>>> https://github.com/apache/cassandra/blob/trunk/src/java/org/ >>>> apache/cassandra/cql3/restrictions/SingleColumnRestriction.j >>>> ava#L733-L778 >>>> >>>> As you can see, it's pretty simple, maybe too simple. Indeed, they >>>> forget to remove escape character BEFORE doing the matching so if your >>>> search is LIKE '%%esc%', the detected expression is LIKE_CONTAINS. >>>> >>>> A possible fix would be: >>>> >>>> 1) convert the bytebuffer into plain String (UTF8 or ASCII, depending >>>> on the column data type) >>>> 2) remove the escape character e.g. before parsing OR use some advanced >>>> regex to exclude the %% from parsing e.g >>>> >>>> Step 2) is dead easy but step 1) is harder because I don't know if >>>> converting the bytebuffer into String at this stage of the CQL parser is >>>> expensive or not (in term of computation) >>>> >>>> Let me try a patch >>>> >>>> >>>> >>>> On Wed, Sep 14, 2016 at 9:42 AM, DuyHai Doan <doanduy...@gmail.com> >>>> wrote: >>>> >>>>> Ok you're right, I get your point >>>>> >>>>> LIKE '%%esc%' --> startWith('%esc') >>>>> >>>>> LIKE 'escape%%' --> = 'escape%' >>>>> >>>>> What I strongly suspect is that in the source code of SASI, we parse >>>>> the % xxx % expression BEFORE applying escape. That will explain the >>>>> observed behavior. E.g: >>>>> >>>>> LIKE '%%esc%' parsed as %xxx% where xxx = %esc >>>>> >>>>> LIKE 'escape%%' parsed as xxx% where xxx =escape% >>>>> >>>>> Let me check in the source code and try to reproduce the issue >>>>> >>>>> >>>>> >>>>> On Tue, Sep 13, 2016 at 7:24 PM, Mikhail Krupitskiy < >>>>> mikhail.krupits...@jetbrains.com> wrote: >>>>> >>>>>> Looks like we have different understanding of what results are >>>>>> expected. >>>>>> I based my understanding on http://docs.datastax.com/en >>>>>> /cql/3.3/cql/cql_using/useSASIIndex.html >>>>>> According to the doc ‘esc’ is a pattern for exact match and I guess >>>>>> that there is no semantical difference between two LIKE patterns (both of >>>>>> patterns should be treated as ‘exact match'): ‘%%esc’ and ‘esc’. >>>>>> >>>>>> SELECT * FROM escape WHERE val LIKE '%%esc%'; --> Give all results >>>>>> *containing* '%esc' so *%esc*apeme is a possible match and also >>>>>> escape*%esc* >>>>>> >>>>>> Why ‘containing’? I expect that it should be ’starting’.. >>>>>> >>>>>> >>>>>> SELECT * FROM escape WHERE val LIKE 'escape%%' --> Give all results >>>>>> *starting* with 'escape%' so *escape%*me is a valid result and also >>>>>> *escape%*esc >>>>>> >>>>>> Why ’starting’? I expect that it should be ‘exact matching’. >>>>>> >>>>>> Also I expect that “ LIKE ‘%s%sc%’ ” will return ‘escape%esc’ but it >>>>>> returns nothing (CASSANDRA-12573). >>>>>> >>>>>> What I’m missing? >>>>>> >>>>>> Thanks, >>>>>> Mikhail >>>>>> >>>>>> On 13 Sep 2016, at 19:31, DuyHai Doan <doanduy...@gmail.com> wrote: >>>>>> >>>>>> CREATE CUSTOM INDEX ON test.escape(val) USING ' >>>>>> org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {'mode': >>>>>> 'CONTAINS', 'analyzer_class': 'org.apache.cassandra.index.sa >>>>>> si.analyzer.NonTokenizingAnalyzer', 'case_sensitive': 'false'}; >>>>>> >>>>>> I don't see any problem in the results you got >>>>>> >>>>>> SELECT * FROM escape WHERE val LIKE '%%esc%'; --> Give all results >>>>>> *containing* '%esc' so *%esc*apeme is a possible match and also >>>>>> escape*%esc* >>>>>> >>>>>> Why ‘containing’? I expect that it should be ’starting’.. >>>>>> >>>>>> >>>>>> SELECT * FROM escape WHERE val LIKE 'escape%%' --> Give all results >>>>>> *starting* with 'escape%' so *escape%*me is a valid result and also >>>>>> *escape%*esc >>>>>> >>>>>> Why ’starting’? I expect that it should be ‘exact matching’. >>>>>> >>>>>> >>>>>> On Tue, Sep 13, 2016 at 5:58 PM, Mikhail Krupitskiy < >>>>>> mikhail.krupits...@jetbrains.com> wrote: >>>>>> >>>>>>> Thanks for the reply. >>>>>>> Could you please provide what index definition did you use? >>>>>>> With the index from my script I get the following results: >>>>>>> >>>>>>> cqlsh:test> select * from escape; >>>>>>> >>>>>>> id | val >>>>>>> ----+----------- >>>>>>> 1 | %escapeme >>>>>>> 2 | escape%me >>>>>>> * 3 | escape%esc* >>>>>>> >>>>>>> Contains search >>>>>>> >>>>>>> cqlsh:test> SELECT * FROM escape WHERE val LIKE '%%esc%'; >>>>>>> >>>>>>> id | val >>>>>>> ----+----------- >>>>>>> 1 | %escapeme >>>>>>> 3 >>>>>>> * | escape%esc*(2 rows) >>>>>>> >>>>>>> >>>>>>> Prefix search >>>>>>> >>>>>>> cqlsh:test> SELECT * FROM escape WHERE val LIKE 'escape%%'; >>>>>>> >>>>>>> id | val >>>>>>> ----+----------- >>>>>>> 2 | escape%me >>>>>>> 3 >>>>>>> * | escape%esc* >>>>>>> >>>>>>> Thanks, >>>>>>> Mikhail >>>>>>> >>>>>>> On 13 Sep 2016, at 18:16, DuyHai Doan <doanduy...@gmail.com> wrote: >>>>>>> >>>>>>> Use % to escape % >>>>>>> >>>>>>> cqlsh:test> select * from escape; >>>>>>> >>>>>>> id | val >>>>>>> ----+----------- >>>>>>> 1 | %escapeme >>>>>>> 2 | escape%me >>>>>>> >>>>>>> >>>>>>> Contains search >>>>>>> >>>>>>> cqlsh:test> SELECT * FROM escape WHERE val LIKE '%%esc%'; >>>>>>> >>>>>>> id | val >>>>>>> ----+----------- >>>>>>> 1 | %escapeme >>>>>>> >>>>>>> (1 rows) >>>>>>> >>>>>>> >>>>>>> Prefix search >>>>>>> >>>>>>> cqlsh:test> SELECT * FROM escape WHERE val LIKE 'escape%%'; >>>>>>> >>>>>>> id | val >>>>>>> ----+----------- >>>>>>> 2 | escape%me >>>>>>> >>>>>>> On Tue, Sep 13, 2016 at 5:06 PM, Mikhail Krupitskiy < >>>>>>> mikhail.krupits...@jetbrains.com> wrote: >>>>>>> >>>>>>>> Hi Cassandra guys, >>>>>>>> >>>>>>>> I use Cassandra 3.7 and wondering how to use ‘%’ as a simple char >>>>>>>> in a search pattern. >>>>>>>> Here is my test script: >>>>>>>> >>>>>>>> DROP keyspace if exists kmv; >>>>>>>> CREATE keyspace if not exists kmv WITH REPLICATION = { 'class' : >>>>>>>> 'SimpleStrategy', 'replication_factor':'1'} ; >>>>>>>> USE kmv; >>>>>>>> CREATE TABLE if not exists kmv (id int, c1 text, c2 text, PRIMARY >>>>>>>> KEY(id, c1)); >>>>>>>> CREATE CUSTOM INDEX ON kmv.kmv ( c2 ) USING ' >>>>>>>> org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = { >>>>>>>> 'analyzed' : 'true', >>>>>>>> 'analyzer_class' : 'org.apache.cassandra.index.sa >>>>>>>> si.analyzer.NonTokenizingAnalyzer', >>>>>>>> 'case_sensitive' : 'false', >>>>>>>> 'mode' : 'CONTAINS' >>>>>>>> }; >>>>>>>> >>>>>>>> INSERT into kmv (id, c1, c2) values (1, 'f22', 'qwe%asd'); >>>>>>>> INSERT into kmv (id, c1, c2) values (2, 'f22', '%asd'); >>>>>>>> INSERT into kmv (id, c1, c2) values (3, 'f22', 'asd%'); >>>>>>>> INSERT into kmv (id, c1, c2) values (4, 'f22', 'asd%1'); >>>>>>>> INSERT into kmv (id, c1, c2) values (5, 'f22', 'qweasd'); >>>>>>>> >>>>>>>> SELECT c2 from kmv.kmv where c2 like ‘_pattern_'; >>>>>>>> >>>>>>>> _pattern_ '%%%' finds all columns that contain %. >>>>>>>> How to find columns that start form ‘%’ or ‘%a’? >>>>>>>> How to find columns that end with ‘%’? >>>>>>>> What about more complex patterns: '%qwe%a%sd%’? How to >>>>>>>> differentiate ‘%’ char form % as a command symbol? (Also there is a >>>>>>>> related >>>>>>>> issue CASSANDRA-12573). >>>>>>>> >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Mikhail >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>>> >>>>> >>>> >>>> >>>> >>> >>> >> >> > >