Val,

> I believe this is something we can look into in the scope of Ignite 3.
> Andrey, does Calcite have any support for this? What's your view on this?

As Atri already mentioned, the SQL-92 standard defines the LIKE operator for pattern matching, and Calcite supports it. As far as I can see, LIKE in Calcite is a RexNode (an expression), and I doubt it can make use of indices. Perhaps LIKE could use a sorted index for prefix matching or equality conditions, but that is very far from what we are talking about.

Full-text search is a much wider notion than pattern matching. Lucene provides far more capabilities and has a rich query syntax compared to the LIKE operator.

So LIKE is a standard operator with a defined contract, and I'm not sure it is worth integrating Lucene just for it. I think we should have native support for full-text search queries and/or a custom SQL function. For example, the Postgres syntax for FTS queries [1] is completely different from the LIKE operator.

[1] https://www.postgresql.org/docs/9.5/textsearch-intro.html#TEXTSEARCH-MATCHING

On Sat, Jul 24, 2021 at 4:49 PM Courtney Robinson <courtney.robin...@hypi.io> wrote:

> Hey Ari,
> Yes, I wasn't suggesting that Solr should be used. That's just what we're doing now out of necessity.
> It was more the fact that Calcite's SqlOperator can be used to provide the interface to Lucene.
> For all the reasons you mentioned and more, using Lucene is the right choice.
>
> Calcite doesn't have support for Solr but it has an ES adapter which is what we modified to support Solr.
>
> Regards,
> Courtney Robinson
> Founder and CEO, Hypi
> Tel: ++44 208 123 2413 (GMT+0)
> https://hypi.io
>
> On Sat, Jul 24, 2021 at 1:59 PM Atri Sharma <a...@apache.org> wrote:
>
> > What that entails is that the end user has to keep a Solr cluster running, which comes with its own challenges (now you have to manage two systems instead of one).
> >
> > I believe Calcite has native support for Solr?
> >
> > OTOH, having native Lucene indices allow us to control per partition indices with no distributed overhead, since Lucene is a per node instance with no global coordination.
> >
> > On Sat, 24 Jul 2021, 16:57 Courtney Robinson, <courtney.robin...@hypi.io> wrote:
> >
> > > I'll add in here.
> > > I agree with you Valentin, the decoupled state of text queries makes it useless for most use cases we have.
> > >
> > > As it relates to Calcite and Ignite 3, one approach (the one we're taking because we use calcite independent of Ignite) is to provide a bunch of SQL functions that we implement as SqlOperator <https://calcite.apache.org/javadocAggregate/org/apache/calcite/sql/SqlOperator.html>.
> > > I forget how we've done aggregation functions but we have those too and they map to Solr aggregations (which ultimately end up in lucene).
> > >
> > > This allows Solr filters to take part in the rest of the query. It's probably more complex than this for Ignite but that's one possible route but we generate queries like select x from T0 where term(args to solr term query) AND ...
> > >
> > > Regards,
> > > Courtney Robinson
> > > Founder and CEO, Hypi
> > > Tel: ++44 208 123 2413 (GMT+0)
> > > https://hypi.io
> > >
> > > On Fri, Jul 23, 2021 at 7:14 PM Valentin Kulichenko <valentin.kuliche...@gmail.com> wrote:
> > >
> > > > Atri,
> > > >
> > > > Sure, go ahead. Let's put the ideas on paper and have a discussion.
> > > >
> > > > -Val
> > > >
> > > > On Fri, Jul 23, 2021 at 10:59 AM Atri Sharma <a...@apache.org> wrote:
> > > >
> > > > > Thanks Andrey.
> > > > >
> > > > > I have collected answers or proposals to many of these questions and would like to start a wiki page covering what we can do for Ignite 3.
> > > > >
> > > > > Does that sound good, please?
> > > > >
> > > > > On Fri, Jul 23, 2021 at 4:26 PM Andrey Mashenkov <andrey.mashen...@gmail.com> wrote:
> > > > >
> > > > > > Atri,
> > > > > >
> > > > > > First of all, I'd recommend going through the Ignite tickets to gather information about the current implementation issues and users' wants. Then look at the code to get a complete understanding of how things work now, which may help in future decisions.
> > > > > >
> > > > > > As we use an outdated Lucene version, some things may be irrelevant for the latest Lucene version. So, you will need expertise in the internals of a modern Lucene version to understand what capabilities, guarantees, and limitations Lucene has and could bring to Ignite. That expertise can be gained from the Lucene project code or the Lucene dev list.
> > > > > >
> > > > > > As for now, the potential capabilities are not clear to me. At first glance, I see the following topics that must be covered first:
> > > > > >
> > > > > > General questions
> > > > > > * How can a Lucene index be split among the nodes?
> > > > > > * If we have a single index for all partitions on a particular node, then how will index records be aware of partitioning? This is important to filter out backup records from the results to avoid duplicates.
> > > > > > * How can results from several nodes be merged on the Reduce stage?
> > > > > > * Does Lucene support something like a JOIN operation, or others that may require data from another partition or index? If so, then it looks like a multistep query with merging results on intermediate stages, and requires detailed investigation and design.
> > > > > >   It is ok if Ignite has some limitations here, but we would like to know about them at an early stage.
> > > > > > * How to effectively map Lucene files to the page memory? Is it even possible? Otherwise, how to deal with potential OOM on large queries and memory capacity planning?
> > > > > >
> > > > > > Persistence
> > > > > > * How and what consistency guarantees could we have/expect? It seems we may not be able to write physical records for the Lucene index to our WAL. What can we do about this?
> > > > > >
> > > > > > Transactions
> > > > > > * Will we support transactions?
> > > > > > * Should Lucene be aware of transactions and track MVCC (or whatever) versions for the records?
> > > > > > * What will the consistency guarantees be?
> > > > > >
> > > > > > UX
> > > > > > * How to add full-text search query syntax into Calcite?
> > > > > > * AFAIK, the Lucene index has many properties for tuning. How will the user configure the index?
> > > > > > * How and where to store the settings? Which are cluster-wide and which are local to a particular node?
> > > > > > * Will all the settings be immutable? Can they be changed on the fly? After a node/grid restart?
> > > > > > * Any limitations on query syntax?
> > > > > >
> > > > > > SQL
> > > > > > * Will we support full-text search in SQL?
> > > > > > * How to integrate the Lucene index into Calcite? What is the cost model? Splitting rules? Traits?
> > > > > > * What about consistency with DDL operations, e.g. column rename? Ignite indices will operate on column IDs, so a rename operation will not affect the index.
> > > > > >
> > > > > > With all of this, you can go with the IEP (or even some short summary) and further POC and implementation.
> > > > > > That's a big deal, so let's discuss what could be done here.
> > > > > >
> > > > > > On Fri, Jul 23, 2021 at 12:58 PM Atri Sharma <a...@apache.org> wrote:
> > > > > >
> > > > > > > I am actually happy to drive the feature for Ignite 3. FTS is very important for me and I think Ignite users will benefit from it greatly.
> > > > > > >
> > > > > > > If it makes sense to be focusing on Ignite 3 for this capability, I am eager to contribute there and lead the development.
> > > > > > >
> > > > > > > Please share your thoughts.
> > > > > > >
> > > > > > > On Fri, Jul 23, 2021 at 3:21 PM Andrey Mashenkov <andrey.mashen...@gmail.com> wrote:
> > > > > > >
> > > > > > > > Hi Atri,
> > > > > > > >
> > > > > > > > All the Jira tickets we have on full-text search (FTS) are targeted to Ignite 2.
> > > > > > > >
> > > > > > > > AFAIK, we want it, but we have NOT committed to FTS support in Ignite 3 yet. By the way, we are getting requests for this from the user side, and FTS would definitely be a valuable feature for Ignite.
> > > > > > > >
> > > > > > > > It would be great if someone wants to drive it; any help will be appreciated.
> > > > > > > >
> > > > > > > > On Fri, Jul 23, 2021 at 12:12 PM Atri Sharma <a...@apache.org> wrote:
> > > > > > > >
> > > > > > > > > Hello,
> > > > > > > > >
> > > > > > > > > An update, please. I am working through persistence of Lucene index using Ignite Dictionary, and will be asking some questions soon.
> > > > > > > > >
> > > > > > > > > I had one doubt: where does this change go? Ignite 3?
> > > > > > > > >
> > > > > > > > > Also, I know we want to build native support for text searches in Ignite 3. Is the work I am proposing here part of that, or will that be a separate effort?
> > > > > > > > >
> > > > > > > > > On Mon, 28 Jun 2021, 19:20 Ilya Kasnacheev, <ilya.kasnach...@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > Hello!
> > > > > > > > > >
> > > > > > > > > > I think that number one is the most important one, then maybe it will see more use and other deficiencies become more apparent, leading to more tickets and visibility.
> > > > > > > > > >
> > > > > > > > > > Maybe 2. and 3. will even use a different approach when persistence is implemented.
> > > > > > > > > >
> > > > > > > > > > Regards,
> > > > > > > > > > --
> > > > > > > > > > Ilya Kasnacheev
> > > > > > > > > >
> > > > > > > > > > On Mon, Jun 28, 2021 at 14:34, Atri Sharma <a...@apache.org> wrote:
> > > > > > > > > >
> > > > > > > > > > > Hello Again!
> > > > > > > > > > >
> > > > > > > > > > > I have been looking into the aforementioned and here are my follow-up thoughts:
> > > > > > > > > > >
> > > > > > > > > > > 1. Support persistence of Lucene indexes.
> > > > > > > > > > > 2. https://issues.apache.org/jira/browse/IGNITE-12401 (needs fixing of moving partitions first)
> > > > > > > > > > > 3. Figure out how to return scores from nodes and use them as sort parameters on the coordinator node (https://issues.apache.org/jira/browse/IGNITE-12291)
> > > > > > > > > > >
> > > > > > > > > > > Please let me know if this looks ok to make text queries functional?
> > > > > > > > > > >
> > > > > > > > > > > Atri
> > > > > > > > > > >
> > > > > > > > > > > On Mon, Jun 21, 2021 at 2:49 PM Alexei Scherbakov <alexey.scherbak...@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hi.
> > > > > > > > > > > >
> > > > > > > > > > > > One of the biggest issues with text queries is a lack of support for Lucene index persistence, which makes this functionality useless if persistence is enabled.
> > > > > > > > > > > >
> > > > > > > > > > > > I would first take care of it.
> > > > > > > > > > > >
> > > > > > > > > > > > On Mon, Jun 21, 2021 at 12:16, Maksim Timonin <timonin.ma...@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Hi, Atri!
> > > > > > > > > > > > >
> > > > > > > > > > > > > You're right, there actually is a lack of support for TextQueries. In the last ticket I'm working on I see some obvious issues with them (no page size support, for example). I'm glad that somebody wants to maintain this functionality. Thanks a lot!
> > > > > > > > > > > > >
> > > > > > > > > > > > > For the MergeSort algorithm there is already a patch [1]. It's currently on review. This patch introduces an abstract reducer for CacheQueries with 2 implementations (unordered, merge-sort). TextQuery then relies on MergeSort to order results from multiple nodes by score. This patch also fixes the pageSize issue I mentioned before. Could you please check if it fully matches your idea? Any issues or comments are welcome.
> > > > > > > > > > > > >
> > > > > > > > > > > > > I've prepared this ticket because I need the MergeSort algorithm for the new type of queries I'm implementing (IndexQuery, which should also provide ordered results over multiple nodes). Currently I'm not planning to go further with TextQuery, so if you're going to support this it'll be a great contribution, I think.
> > > > > > > > > > > > >
> > > > > > > > > > > > > [1] https://issues.apache.org/jira/browse/IGNITE-14703
> > > > > > > > > > > > > [2] https://github.com/apache/ignite/pull/9081
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Mon, Jun 21, 2021 at 11:11 AM Atri Sharma <a...@apache.org> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Hi All,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I have been looking into our text queries support and see that it has limited community support.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Therefore, I volunteer to be the maintainer of the module and work on enhancing it further.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > First goal would be to move to Lucene 8.x, then work on sorted reduce - merge across nodes. Fundamentally, this is doable since Lucene ranks documents according to their score, and documents are returned in the order of their score. Since the scoring function is homogeneous, this means that across nodes, we can compare scores and merge sort.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Please let me know if I can take this up.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Atri
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > --
> > > > > > > > > > > > > > Regards,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Atri
> > > > > > > > > > > > > > Apache Concerted
> > > > > > > > > > > >
> > > > > > > > > > > > --
> > > > > > > > > > > > Best regards,
> > > > > > > > > > > > Alexei Scherbakov
> > > > > > > > > > >
> > > > > > > > > > > --
> > > > > > > > > > > Regards,
> > > > > > > > > > >
> > > > > > > > > > > Atri
> > > > > > > > > > > Apache Concerted
> > > > > > > >
> > > > > > > > --
> > > > > > > > Best regards,
> > > > > > > > Andrey V. Mashenkov
> > > > > > >
> > > > > > > --
> > > > > > > Regards,
> > > > > > >
> > > > > > > Atri
> > > > > > > Apache Concerted
> > > > > >
> > > > > > --
> > > > > > Best regards,
> > > > > > Andrey V. Mashenkov
> > > > >
> > > > > --
> > > > > Regards,
> > > > >
> > > > > Atri
> > > > > Apache Concerted

--
Best regards,
Andrey V. Mashenkov
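
For reference, the score-ordered reduce mentioned in the quoted thread (each node returns hits in descending score order, and the coordinator merges them) can be sketched roughly as below. This is only an illustration in Python, not Ignite or Lucene code: `ScoredDoc`, `merge_by_score`, and the node and key names are hypothetical, and it assumes, as the thread does, that scores from different nodes are directly comparable.

```python
import heapq
from itertools import islice
from typing import Iterable, List, NamedTuple


class ScoredDoc(NamedTuple):
    score: float   # relevance score reported by the node-local index
    node_id: str   # hypothetical id of the node that produced the hit
    key: str       # hypothetical cache key of the matching entry


def merge_by_score(per_node: List[Iterable[ScoredDoc]], limit: int) -> List[ScoredDoc]:
    """K-way merge of per-node result streams.

    Each input stream must already be sorted by descending score.
    Only the first `limit` merged hits are materialized.
    """
    merged = heapq.merge(*per_node, key=lambda d: d.score, reverse=True)
    return list(islice(merged, limit))


# Two nodes, each returning its local hits in descending score order.
node_a = [ScoredDoc(0.9, "A", "k1"), ScoredDoc(0.4, "A", "k2")]
node_b = [ScoredDoc(0.7, "B", "k3"), ScoredDoc(0.1, "B", "k4")]

for doc in merge_by_score([node_a, node_b], limit=3):
    print(doc.key, doc.score)
```

Because the merge is lazy, the coordinator only pulls as many hits from each node stream as it needs to fill the requested limit, which is what makes paging over ordered results practical.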