In my experience, one of the biggest usability issues with the current support of text queries is that they are completely decoupled from SQL. I.e. you can either execute a SQL query OR a text query. Modern databases, on the other hand, typically allow creating text-based indexes within regular tables and then using those indexes within regular SQL queries. Here is an example from Oracle: https://docs.oracle.com/cd/B10501_01/text.920/a96517/cdefault.htm

I believe this is something we can look into in the scope of Ignite 3. Andrey, does Calcite have any support for this? What's your view on this?

-Val

On Fri, Jul 23, 2021 at 3:56 AM Andrey Mashenkov <andrey.mashen...@gmail.com> wrote:

> Atri,
>
> First of all, I'd recommend going through the Ignite ticket to gather
> information about the current implementation issues and users' wants.
> Then look at the code to get a complete understanding of how things work now,
> which may help in future decisions.
>
> As we use an outdated Lucene version, some things may be irrelevant for
> the latest Lucene version.
> So, you will need expertise in the internals of a modern Lucene version to
> understand what capabilities, guarantees, and limitations Lucene has and
> could bring to Ignite.
> The expertise could be obtained from the Lucene project code or the Lucene project
> dev-list.
>
>
> As for now, the potential capabilities are not clear to me.
> At first glance, I see the following topics that must be covered first:
>
> General questions
> * How can a Lucene index be split among the nodes?
> * If we have a single index for all partitions on a particular node,
> then how will index records be aware of partitioning?
> This is important to filter out backup records from the results to avoid
> duplicates.
> * How can results from several nodes be merged at the Reduce stage?
> * Does Lucene support something like a JOIN operation or others that may require
> data from another partition or index?
> If so, then it looks like a multistep query with merging of results at
> intermediate stages and requires detailed investigation and design.
> It is ok if Ignite has some limitations here, but we would like to
> know about them at an early stage.
> * How to effectively map Lucene files to the page memory? Is it even possible?
> Otherwise, how to deal with potential OOM on large queries and memory
> capacity planning?
>
> Persistence.
> * How and what consistency guarantees could we have/expect?
> Seems, we may not be able to write physical records for the Lucene index to our
> WAL. What can we do about this?
>
> Transactions.
> * Will we support transactions?
> * Should Lucene be aware of transactions and track MVCC (or whatever)
> versions for the records?
> * What will the consistency guarantees be?
>
> UX
> * How to add FullText search query syntax into Calcite?
> * AFAIK, the Lucene index has many properties for tuning. How will the user
> configure the index?
> * How and where to store the settings? Which are cluster-wide and which are
> local to a particular node?
> * Will all the settings be immutable? Can they be changed on the fly? After
> a node/grid restart?
> * Any limitations on query syntax?
>
> SQL
> * Will we support FullText search in SQL?
> * How to integrate the Lucene index into Calcite? What is the cost model?
> Splitting rules? Traits?
> * What about consistency with DDL operations, e.g. column rename?
> Ignite indices will operate on column IDs, so a rename operation will not affect
> the index.
>
>
> With all of this, you can go with the IEP (or even some short summary) and
> further POC and implementation.
> That's a big deal, so let's discuss what could be done here.
>
> On Fri, Jul 23, 2021 at 12:58 PM Atri Sharma <a...@apache.org> wrote:
>
> > I am actually happy to drive the feature for Ignite 3. FTS is very
> > important for me and I think Ignite users will benefit from it
> > greatly.
> >
> > If it makes sense to be focusing on Ignite 3 for this capability, I am
> > eager to contribute there and lead the development.
> >
> > Please share your thoughts.
> >
> > On Fri, Jul 23, 2021 at 3:21 PM Andrey Mashenkov
> > <andrey.mashen...@gmail.com> wrote:
> > >
> > > Hi Atri,
> > >
> > > All the Jira tickets we have on the Full-text search (FTS) thing are
> > > targeted to Ignite 2.
> > >
> > > AFAIK, we want to, but we have NOT committed to FTS support in Ignite 3 yet.
> > > By the way, we are getting requests for this thing from the user side, and
> > > definitely,
> > > FTS would be a valuable feature for Ignite.
> > >
> > > It would be great if someone wants to drive it; any help will be appreciated.
> > >
> > >
> > > On Fri, Jul 23, 2021 at 12:12 PM Atri Sharma <a...@apache.org> wrote:
> > >
> > > > Hello,
> > > >
> > > > An update, please. I am working through persistence of Lucene index using
> > > > Ignite Dictionary, and will be asking some questions soon.
> > > >
> > > > I had one doubt: where does this change go? Ignite 3?
> > > >
> > > > Also, I know we want to build native support for text searches in Ignite 3.
> > > > Is the work I am proposing here part of that, or will that be a separate
> > > > effort?
> > > >
> > > > On Mon, 28 Jun 2021, 19:20 Ilya Kasnacheev, <ilya.kasnach...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hello!
> > > > >
> > > > > I think that number one is the most important one, then maybe it will see
> > > > > more use and other deficiencies become more apparent, leading to more
> > > > > tickets and visibility.
> > > > >
> > > > > Maybe 2. and 3. will even use a different approach when persistence is
> > > > > implemented.
> > > > >
> > > > > Regards,
> > > > > --
> > > > > Ilya Kasnacheev
> > > > >
> > > > >
> > > > > On Mon, Jun 28, 2021 at 14:34, Atri Sharma <a...@apache.org> wrote:
> > > > >
> > > > > > Hello Again!
> > > > > >
> > > > > > I have been looking into the aforementioned and here are my follow-up
> > > > > > thoughts:
> > > > > >
> > > > > > 1. Support persistence of Lucene indexes.
> > > > > > 2. https://issues.apache.org/jira/browse/IGNITE-12401 (Needs fixing of
> > > > > > moving partitions first)
> > > > > > 3. Figure out how to return scores from nodes and use them as sort
> > > > > > parameters on the coordinator node
> > > > > > (https://issues.apache.org/jira/browse/IGNITE-12291)
> > > > > >
> > > > > > Please let me know if this looks ok to make text queries functional?
> > > > > >
> > > > > > Atri
> > > > > >
> > > > > > On Mon, Jun 21, 2021 at 2:49 PM Alexei Scherbakov
> > > > > > <alexey.scherbak...@gmail.com> wrote:
> > > > > > >
> > > > > > > Hi.
> > > > > > >
> > > > > > > One of the biggest issues with text queries is a lack of support for
> > > > > > > Lucene index persistence, which makes this functionality useless if
> > > > > > > persistence is enabled.
> > > > > > >
> > > > > > > I would first take care of it.
> > > > > > >
> > > > > > > On Mon, Jun 21, 2021 at 12:16, Maksim Timonin <timonin.ma...@gmail.com> wrote:
> > > > > > >
> > > > > > > > Hi, Atri!
> > > > > > > >
> > > > > > > > You're right, there is actually a lack of support for TextQueries. For the
> > > > > > > > last ticket I'm doing I see some obvious issues with them (no page size
> > > > > > > > support, for example). I'm glad that somebody wants to maintain this
> > > > > > > > functionality. Thanks a lot!
> > > > > > > >
> > > > > > > > For the MergeSort algorithm there is already a patch [1].
> > > > > > > > It's currently on review. This patch introduces an abstract reducer for
> > > > > > > > CacheQueries with 2 implementations (unordered, merge-sort). Then TextQuery
> > > > > > > > leverages MergeSort to order results from multiple nodes by score. This
> > > > > > > > patch also fixes the pageSize issue I've mentioned before. Could you
> > > > > > > > please check if it fully matches your idea? Any issues or comments are
> > > > > > > > welcome.
> > > > > > > >
> > > > > > > > I've prepared this ticket because I need the MergeSort algorithm for the
> > > > > > > > new type of queries I'm implementing (IndexQuery, it should also provide
> > > > > > > > ordered results over multiple nodes). Currently I'm not planning to go
> > > > > > > > further with TextQuery, so if you're going to support this it'll be a great
> > > > > > > > contribution, I think.
> > > > > > > >
> > > > > > > > [1] https://issues.apache.org/jira/browse/IGNITE-14703
> > > > > > > > [2] https://github.com/apache/ignite/pull/9081
> > > > > > > >
> > > > > > > >
> > > > > > > > On Mon, Jun 21, 2021 at 11:11 AM Atri Sharma <a...@apache.org> wrote:
> > > > > > > >
> > > > > > > > > Hi All,
> > > > > > > > >
> > > > > > > > > I have been looking into our text queries support and see that it has
> > > > > > > > > limited community support.
> > > > > > > > >
> > > > > > > > > Therefore, I volunteer to be the maintainer of the module and work on
> > > > > > > > > enhancing it further.
> > > > > > > > >
> > > > > > > > > First goal would be to move to Lucene 8.x, then work on sorted reduce
> > > > > > > > > - merge across nodes. Fundamentally, this is doable since Lucene ranks
> > > > > > > > > documents according to their score, and documents are returned in the
> > > > > > > > > order of their score. Since the scoring function is homogeneous, this
> > > > > > > > > means that across nodes, we can compare scores and merge sort.
> > > > > > > > >
> > > > > > > > > Please let me know if I can take this up.
> > > > > > > > >
> > > > > > > > > Atri
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > Regards,
> > > > > > > > >
> > > > > > > > > Atri
> > > > > > > > > Apache Concerted
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > >
> > > > > > > Best regards,
> > > > > > > Alexei Scherbakov
> > > > > >
> > > > > > --
> > > > > > Regards,
> > > > > >
> > > > > > Atri
> > > > > > Apache Concerted
> > > > > >
> > > > >
> > > >
> > >
> > >
> > > --
> > > Best regards,
> > > Andrey V. Mashenkov
> >
> > --
> > Regards,
> >
> > Atri
> > Apache Concerted
> >
>
>
> --
> Best regards,
> Andrey V. Mashenkov
>
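
For reference, the "sorted reduce" idea from the first message in this thread (each node returns its hits ordered by score, and the reducer merge-sorts them) can be illustrated with a minimal Python sketch. This is not Ignite or Lucene code: `Hit`, `merge_by_score`, and the sample data are hypothetical names made up for illustration, and the sketch assumes, as the thread does, that scores coming from different nodes are directly comparable.

```python
import heapq
from itertools import islice
from typing import Iterable, List, NamedTuple


class Hit(NamedTuple):
    """One text-query result from a node (hypothetical shape, not an Ignite type)."""
    score: float
    key: str


def merge_by_score(per_node_hits: Iterable[Iterable[Hit]], limit: int) -> List[Hit]:
    """K-way merge of per-node result streams, highest score first.

    Each input stream must already be sorted by descending score,
    which is the order the thread says Lucene returns documents in.
    """
    merged = heapq.merge(*per_node_hits, key=lambda h: h.score, reverse=True)
    # Take only the first `limit` hits; the merge is lazy, so the rest is never pulled.
    return list(islice(merged, limit))


# Sample data standing in for pages received from three nodes.
node_a = [Hit(0.92, "doc-7"), Hit(0.40, "doc-2")]
node_b = [Hit(0.85, "doc-5"), Hit(0.10, "doc-9")]
node_c = [Hit(0.60, "doc-1")]

top3 = merge_by_score([node_a, node_b, node_c], limit=3)
print([h.key for h in top3])  # ['doc-7', 'doc-5', 'doc-1']
```

Because the merge is lazy, the reducer only consumes as many hits from each node as it needs to fill the requested page, which is one reason a merge-sort reducer pairs naturally with page-size support.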