Hello! I have a hunch that we are trying to build Apache Solr (or SolrCloud) into Apache Ignite. I think that's a lot of effort, and it is not well justified.
I don't think we should try to implement sorting in Apache Ignite: it is a lot of work, and a lot of code in our code base that we don't really want.

Regards,
--
Ilya Kasnacheev

On Fri, 22 Nov 2019 at 20:59, Yuriy Shuliga <shul...@gmail.com> wrote:

Dear Igniters,

The first part of the TextQuery improvement, a result limit, has been developed and merged.
Now we have to develop the most important functionality here: proper sorting of the Lucene index response and correct reduction of those responses for distributed queries.

*There are two Lucene-based aspects*

1. When no sorting fields are used, the documents in the response are still ordered by relevance, i.e. by the ScoreDoc.score value.
In order to reduce the distributed results correctly, the score should be passed along with the response.

2. When sorting by conventional fields, Lucene must have these fields properly indexed, and the corresponding Sort object should be applied to Lucene's search call.
To mark those fields, a new annotation like '@SortField' may be introduced.

*Reducing on Ignite*

The obvious place for distributed response reduction is the class GridCacheDistributedQueryFuture.
However, @Ivan Pavlukhin mentioned a class with similar functionality: ReduceIndexSorted.
What I see there is that it is tangled with H2-related classes (org.h2.result.Row) and might not be unifiable with TextQuery reduction.

I still need support here.

Overall, the goal of this letter is to initiate a discussion on the TextQuery sorting implementation and come closer to creating a ticket.

BR,
Yuriy Shuliha

On Tue, 22 Oct 2019 at 13:31, Andrey Mashenkov <andrey.mashen...@gmail.com> wrote:

Hi Dmitry, Yuriy.

I've found that GridCacheQueryFutureAdapter has a newly added AtomicInteger 'total' field and a 'limit' field as a primitive int.

Both fields are used inside a synchronized block only.
So, we can make both private and downgrade the AtomicInteger to a primitive int.
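To illustrate Andrey's point, here is a minimal sketch. The class is a hypothetical, simplified stand-in, not Ignite's GridCacheQueryFutureAdapter: since every read and write of limit and total happens while holding the same monitor, plain private int fields already get mutual exclusion and visibility from the synchronized block, so an AtomicInteger adds nothing.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical stand-in for the limit bookkeeping of a query future.
 * Every access to 'total' happens under 'mux', so a plain int is enough.
 */
public class LimitedCollector<T> {
    private final Object mux = new Object();

    /** Accepted results; guarded by mux. */
    private final List<T> results = new ArrayList<>();

    /** Maximum number of results to accept; 0 or less means "no limit". */
    private final int limit;

    /** Number of results accepted so far; guarded by mux (no AtomicInteger needed). */
    private int total;

    public LimitedCollector(int limit) {
        this.limit = limit;
    }

    /**
     * Accepts items from one page until the limit is reached.
     *
     * @return true once the limit has been reached, so the caller can stop requesting pages
     */
    public boolean onPage(List<T> page) {
        synchronized (mux) {
            for (T item : page) {
                if (limitReached())
                    break;
                results.add(item);
                total++;
            }
            return limitReached();
        }
    }

    /** Returns a snapshot of the accepted results. */
    public List<T> results() {
        synchronized (mux) {
            return new ArrayList<>(results);
        }
    }

    /** Must be called while holding mux. */
    private boolean limitReached() {
        return limit > 0 && total >= limit;
    }

    public static void main(String[] args) {
        LimitedCollector<String> c = new LimitedCollector<>(3);
        System.out.println(c.onPage(List.of("a", "b")));      // false
        System.out.println(c.onPage(List.of("c", "d", "e"))); // true
        System.out.println(c.results());                      // [a, b, c]
    }
}
```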
Most likely, these fields can be replaced with a single field.

On Mon, Oct 21, 2019 at 10:01 PM Dmitriy Pavlov <dpav...@apache.org> wrote:

Hi Andrey,

I've checked this ticket's comments, and there is a TC Bot visa (with no blockers).

Do you have any concerns related to this patch?

Sincerely,
Dmitriy Pavlov

On Thu, 17 Oct 2019 at 16:43, Yuriy Shuliga <shul...@gmail.com> wrote:

Andrey,

Per your request, I created the ticket https://issues.apache.org/jira/browse/IGNITE-12291, linked to https://issues.apache.org/jira/projects/IGNITE/issues/IGNITE-12189

Could you please proceed with the PR merge?

BR,
Yuriy Shuliha

On Wed, 9 Oct 2019 at 12:52, Andrey Mashenkov <andrey.mashen...@gmail.com> wrote:

Hi Yuri,

To get access to the TC Bot, you should register as a TeamCity user [1], if you haven't done so already.
Then you will be able to log in to the Ignite TC Bot page with the same credentials.

[1] https://ci.ignite.apache.org/registerUser.html

On Fri, Oct 4, 2019 at 3:10 PM Yuriy Shuliga <shul...@gmail.com> wrote:

Andrew,

I have corrected the PR according to your notes. Please review.
What are the next steps to get it merged?

Y.

On Thu, 3 Oct 2019 at 17:47, Andrey Mashenkov <andrey.mashen...@gmail.com> wrote:

Yuri,

I'm done with the review.
No crime found, apart from a trivial compatibility bug.

On Thu, Oct 3, 2019 at 3:54 PM Yuriy Shuliga <shul...@gmail.com> wrote:

Denis,

Thank you for your attention to this.
As of now, the https://issues.apache.org/jira/browse/IGNITE-12189 ticket is still pending review.
Is there a chance to move it forward somehow?

BR,
Yuriy Shuliha

On Mon, 30 Sep 2019 at 23:35, Denis Magda <dma...@apache.org> wrote:

Yuriy,

I've seen you open a pull request with the first changes:
https://issues.apache.org/jira/browse/IGNITE-12189

Alex Scherbakov and Ivan, are you the right people to do the review?

-
Denis

On Fri, Sep 27, 2019 at 8:48 AM Павлухин Иван <vololo...@gmail.com> wrote:

Yuriy,

Thank you for providing the details! Quite interesting.

Yes, we already support a distributed limit and the merging of sorted sub-results for SQL queries. E.g., ReduceIndexSorted and MergeStreamIterator are used for merging sorted streams.

Could you please also clarify the score/relevance? Is it provided by the Lucene engine for each query result? I am thinking about how to do a sorted merge properly in this case.

On Wed, 25 Sep 2019 at 18:56, Yuriy Shuliga <shul...@gmail.com> wrote:

Ivan,

Thank you for the interesting question!

Text searches (or full-text searches) are mostly human-oriented, and the user's interest is in the topmost part of the response.
The user can then read it, evaluate it, and use the given records for further purposes.

In our case in particular, we use Ignite for operations on financial data, and there is a lot of text such as asset names, financial instruments, companies, etc.
To work with this quickly and reliably, users are accustomed to text search, type-ahead completions, and suggestions.

For these purposes, we index particular string data in separate caches.

Sorting capabilities and response size limits are very important there, as our API has to provide the most relevant information within a limited size.

Now let me comment from the Ignite/Lucene perspective.
When Ignite queries Lucene, Lucene returns *TopDocs.scoreDocs* already sorted by *score* (relevance), so the most relevant documents are on top.
However, responses of distributed queries from different nodes are currently merged into the final query cursor queue in arbitrary order.
So in fact the score order is already lost at this point. Also, Ignite requests all possible documents from Lucene, which is redundant and bad for performance.
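The merge problem described above can be sketched as a k-way merge on the reducing node. This is a hypothetical illustration, not Ignite code; it assumes each node sends its hits already sorted by descending Lucene score together with the score itself (the ScoreDoc.score value), and that each node has already cut its response to the limit N. Cutting locally loses nothing, because a valid global top N is always contained in the union of the per-node top N lists.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class ScoreMerge {
    /** One search hit as a node could return it: the cache key plus its Lucene score. */
    public record Hit(String key, float score) {}

    /** Read position inside one node's score-sorted response. */
    private static final class Cursor {
        final int node;
        final List<Hit> hits;
        int pos;

        Cursor(int node, List<Hit> hits) {
            this.node = node;
            this.hits = hits;
        }

        Hit head() {
            return hits.get(pos);
        }
    }

    /**
     * Merges per-node responses, each sorted by descending score, into one list
     * of at most 'limit' hits. Equal scores are ordered by node index so the
     * result does not depend on the order in which responses arrived.
     */
    public static List<Hit> merge(List<List<Hit>> perNode, int limit) {
        PriorityQueue<Cursor> heap = new PriorityQueue<>(
            Comparator.comparingDouble((Cursor c) -> c.head().score()).reversed()
                .thenComparingInt(c -> c.node));

        for (int i = 0; i < perNode.size(); i++) {
            if (!perNode.get(i).isEmpty())
                heap.add(new Cursor(i, perNode.get(i)));
        }

        List<Hit> out = new ArrayList<>();
        while (!heap.isEmpty() && out.size() < limit) {
            Cursor c = heap.poll();   // node whose next hit has the highest score
            out.add(c.head());
            c.pos++;
            if (c.pos < c.hits.size())
                heap.add(c);          // re-insert so the heap sees the new head
        }
        return out;
    }

    public static void main(String[] args) {
        List<Hit> node0 = List.of(new Hit("a1", 9.0f), new Hit("a2", 4.0f));
        List<Hit> node1 = List.of(new Hit("b1", 7.5f), new Hit("b2", 6.0f), new Hit("b3", 1.0f));

        for (Hit h : merge(List.of(node0, node1), 3))
            System.out.println(h.key() + " " + h.score()); // a1 9.0, b1 7.5, b2 6.0
    }
}
```

Equal scores are ordered by node index, so the output does not depend on which node's page arrived first. Bear in mind that Lucene computes scores using the term statistics of the local index, so scores coming from different nodes are only approximately comparable.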
I'm implementing a *limit* parameter as part of *TextQuery*, and I have to note that we still need to add sorting to text query processing in order to get usable results.

The *limit* parameter by itself should address some of the issues above, but sorting, at least by document score, should definitely be implemented along with the limit.

This is a pretty short commentary; if you still have any questions, please do not hesitate to ask :)

BR,
Yuriy Shuliha

On Thu, 19 Sep 2019 at 11:38, Павлухин Иван <vololo...@gmail.com> wrote:

Yuriy,

I greatly appreciate your interest.

Could you please elaborate a little on sorting? What tasks does it help to solve, and how? It would be great if you could provide an example.

On Wed, 18 Sep 2019 at 09:39, Alexei Scherbakov <alexey.scherbak...@gmail.com> wrote:

Denis,

I like the idea of throwing an exception when text queries are enabled on persistent caches.

I'm also fine with the proposed limit for unsorted searches.

Yury, please proceed with creating the ticket.
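The exception Alexei agrees with (proposed by Denis in his message of 17 September) boils down to a fail-fast configuration check. A minimal sketch, in which the TextIndexConfig record and the allowTextIndexesWithPersistence flag are hypothetical names that do not exist in Ignite:

```java
/** Hypothetical fail-fast validation for Lucene text indexes on persistent caches. */
public class TextIndexGuard {
    /**
     * Hypothetical configuration snapshot; none of these names exist in Ignite.
     *
     * @param persistenceEnabled whether native persistence is on for the cache's data region
     * @param hasTextIndexes whether any field is marked for full-text (Lucene) indexing
     * @param allowTextIndexesWithPersistence explicit opt-in to accept non-persistent text indexes
     */
    public record TextIndexConfig(boolean persistenceEnabled,
                                  boolean hasTextIndexes,
                                  boolean allowTextIndexesWithPersistence) {}

    /** Throws by default when text indexes are combined with native persistence. */
    public static void validate(TextIndexConfig cfg) {
        if (cfg.hasTextIndexes() && cfg.persistenceEnabled() && !cfg.allowTextIndexesWithPersistence()) {
            throw new IllegalStateException(
                "Text (Lucene) indexes are not persistent; "
                    + "set allowTextIndexesWithPersistence=true to accept this limitation.");
        }
    }

    public static void main(String[] args) {
        validate(new TextIndexConfig(false, true, false)); // in-memory cache: accepted
        validate(new TextIndexConfig(true, true, true));   // explicit opt-in: accepted
        try {
            validate(new TextIndexConfig(true, true, false));
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Making the opt-in an explicit flag keeps the default safe while still allowing the in-memory-only use case Yuriy describes.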
On Tue, 17 Sep 2019 at 22:06, Denis Magda <dma...@apache.org> wrote:

Igniters,

I see nothing wrong with Yury's proposal regarding the evolution of the full-text search API, as long as Yury is ready to push it forward.

As for supporting the in-memory mode only, it makes total sense for in-memory data grid deployments where Ignite caches data of an underlying DB like Postgres.
As part of the changes, I would simply throw an exception (by default) if someone attempts to use text indices with native persistence enabled.
If a person is ready to live with that limitation, an explicit configuration change should be needed to get around the exception.

Thoughts?

-
Denis

On Tue, Sep 17, 2019 at 7:44 AM Yuriy Shuliga <shul...@gmail.com> wrote:

Hello to all again,

Thank you for the important comments and notes given below!

Let me answer and continue the discussion.
(I) Overall needs in Lucene indexing

Alexei referred to https://issues.apache.org/jira/browse/IGNITE-5371, where the absence of index persistence was declared an obstacle to further development.

a) This ticket is already closed as not valid.
b) There are definite needs (in our project as well) for purely in-memory indexing of selected data.
We intend to use search capabilities to fetch a limited number of records for type-ahead search / suggestions.
Not all of the data will be indexed, and there is no need for the Lucene index to be persistent. I hope this is a common pattern of text-search usage.

(II) Necessary fixes in the current implementation

a) Implementation of a correct *limit* (*offset* does not seem to be required for text-search tasks for now)
I have investigated the data flow for distributed text queries.
It was a simple test prefix query, like 'name'='ene*'.
Currently, each server node returns all response records to the client node, and that may be thousands or hundreds of thousands of records,
even if we need only the first 10-100. Again, all the results are added to the queue in GridCacheQueryFutureAdapter page by page, in arbitrary order.
I did not find any means here to deliver a deterministic result.
So implementing the limit as part of the query (and of GridCacheQueryRequest) will not change the nature of the response, but it will limit the load on nodes and the network.

Can we consider opening a ticket for this?

(III) Further exposure of the Lucene API to Ignite

a) Sorting
The solution for this could be:
- Make entities comparable
- Add a custom comparator to the entity
- Add annotations to mark sorted fields for Lucene indexing
- Use comparators when merging responses or reducing to the desired limit on the client node.
This will require the full result set to be loaded into memory.
Though it can be used for relatively small limits.

BR,
Yuriy Shuliha

On Fri, 30 Aug 2019 at 10:37, Alexei Scherbakov <alexey.scherbak...@gmail.com> wrote:

Yuriy,

Note that one of the major blockers for text queries is [1], which makes Lucene indexes unusable with persistence and is the main reason for the discontinuation.
It should probably be addressed first to make text queries a valid product feature.

Distributed sorting and advanced querying is indeed not a trivial task.
Some kind of merging must be implemented on the query-originating node.

[1] https://issues.apache.org/jira/browse/IGNITE-5371

On Thu, 29 Aug 2019 at 23:38, Denis Magda <dma...@apache.org> wrote:

Yuriy,

If you are ready to take over the full-text search indexes, then please go ahead.
The primary reasons why the community wants to discontinue them first (and, probably, resurrect them later) are the limitations listed by Andrey and minimal support on the community's end.

-
Denis

On Thu, Aug 29, 2019 at 1:29 PM Andrey Mashenkov <andrey.mashen...@gmail.com> wrote:

Hi Yuriy,

Unfortunately, there is a plan to discontinue TextQueries in Ignite [1].
The motivation is that text indexes are not persistent, not transactional, and can't be used together with SQL or inside SQL,
and there is a lack of interest on the community side.
You are welcome to take on these issues and make TextQueries great.

1. PageSize can't be used to limit the result set.
Query results are returned from a data node to the client-side cursor page by page, and this parameter is designed to control the page size.
The query is supposed to execute lazily on the server side, and the full result set is not expected to be loaded into memory on the server side at once, but page by page.
Do you mean you found that Lucene loads the entire result set into memory before the first page is sent to the client?

I'd think a new parameter should be added to limit the result. The best solution is to use query-language commands for this, e.g. "LIMIT/OFFSET" in SQL.

This task doesn't look trivial. A query is a distributed operation: the same user query will be executed on the data nodes, and then the results from all nodes should be correctly merged before being returned via the client cursor.
So, LIMIT should be applied on every node and then again in the merge phase.

Also, though this may be non-obvious, limiting results makes no sense without sorting, as there is no guarantee that each subsequent query run will return the same data, because pages can be reordered.
Basically, the merge phase receives results from data nodes asynchronously, and messages from different nodes can't be ordered.

2.
a. The "tokenize" parameter name (for @QueryTextField) looks more self-explanatory, doesn't it?
b, c. What about distributed queries? How will partial results from nodes be merged?
Does Lucene allow configuring a comparator for data sorting?
Which comparator should Ignite choose to sort results in the merge phase?

3. For now, the Lucene engine is not configurable at all. E.g., it is impossible to configure the Tokenizer.
I'd think about possible ways to configure the engine first, and only then go further to discuss/implement complex features that may depend on the engine config.

On Thu, Aug 29, 2019 at 8:17 PM Yuriy Shuliga <shul...@gmail.com> wrote:

Dear community,

By starting this thread, I'd like to open a discussion that would lead to contributions in the subject area.

Ignite has indexing capabilities backed by different mechanisms, including Lucene.

Currently, Lucene 7.5.0 is used (released last year).
This is a widespread and mature technology that covers the text search area and beyond (e.g. spatial data indexing).
My goal is to *expose more Lucene functionality to Ignite indexing and query mechanisms for text data*.

It's quite a simple request at the current stage. It comes from our project's needs, but I believe it will be useful for many more people.
Let's walk through the items and vote on or discuss Jira tickets for them.

1. [trivial] Use dataQuery.getPageSize() to limit the search response items inside GridLuceneIndex.query(). Currently it calls IndexSearcher.search(query, *Integer.MAX_VALUE*), so basically all scored matches will be returned, which we do not need in most cases.

2. [simple] Add sorting.
Then a more capable search call can be executed: *IndexSearcher.search(query, count, sort)*
Implementation steps:
a) Introduce a boolean *sortField* parameter in the *@QueryTextField* annotation. If *true*, the field will be indexed but not tokenized. Numeric types are preferred here.
b) Add a *sort* collection to the *TextQuery* constructor. It should define the desired sort fields used for querying.
c) Implement Lucene sort usage in GridLuceneIndex.query().

3. [moderate] Build complex queries with *TextQuery*, including term/query boosting.
*This section is for voting only, as it requires more detailed work. It should be extended if the community is interested in it.*

Looking forward to your comments!
BR,
Yuriy Shuliha

--
Best regards,
Andrey V. Mashenkov

--
Best regards,
Alexei Scherbakov

--
Best regards,
Ivan Pavlukhin

--
Best regards,
Ivan Pavlukhin

--
Best regards,
Andrey V. Mashenkov

--
Best regards,
Andrey V. Mashenkov

--
Best regards,
Andrey V. Mashenkov
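Regarding option (III a) above, reducing with comparators on the client node: if a limit is known, the client does not have to keep the full result set in memory. A heap bounded to the limit is enough, whatever order the pages arrive in. A minimal sketch, assuming a hypothetical TopKReducer class and a hypothetical Item row with a single price sort field; the comparator stands in for whatever sort fields a TextQuery would declare.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Keeps the best 'limit' items seen so far, in the order defined by a comparator. */
public class TopKReducer<T> {
    private final int limit;
    private final Comparator<T> order;
    /** Heap on the reversed order: the root is the worst item currently kept. */
    private final PriorityQueue<T> heap;

    public TopKReducer(int limit, Comparator<T> order) {
        if (limit <= 0)
            throw new IllegalArgumentException("limit must be positive");
        this.limit = limit;
        this.order = order;
        this.heap = new PriorityQueue<>(limit, order.reversed());
    }

    /** Offers one item, e.g. from a page that just arrived from some node. */
    public void offer(T item) {
        if (heap.size() < limit) {
            heap.add(item);
        } else if (order.compare(item, heap.peek()) < 0) {
            heap.poll();   // drop the current worst item
            heap.add(item);
        }
    }

    public void offerPage(List<? extends T> page) {
        for (T item : page)
            offer(item);
    }

    /** Returns the kept items sorted from best to worst. */
    public List<T> result() {
        List<T> out = new ArrayList<>(heap);
        out.sort(order);
        return out;
    }

    /** Hypothetical result row with a single sort field. */
    public record Item(String key, int price) {}

    public static void main(String[] args) {
        // Sort by price ascending, then by key, so ties are resolved deterministically.
        Comparator<Item> byPrice = Comparator.comparingInt(Item::price).thenComparing(Item::key);
        TopKReducer<Item> reducer = new TopKReducer<>(2, byPrice);
        reducer.offerPage(List.of(new Item("c", 30), new Item("a", 10))); // page from node 1
        reducer.offerPage(List.of(new Item("d", 5), new Item("b", 20)));  // page from node 2
        System.out.println(reducer.result()); // [Item[key=d, price=5], Item[key=a, price=10]]
    }
}
```

Memory stays proportional to the limit, but each node still sends all of its matches unless it also sorts and limits locally, as Andrey suggested for LIMIT.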