Re[4]: Text queries/indexes (GridLuceneIndex, @QueryTextFiled)

Zhenya Stanilovsky Tue, 26 Nov 2019 03:12:29 -0800
OK, let's forget Solr and go the ASF way: if Yuriy proves this functionality
is helpful and submits a PR, why not?
  
>Tuesday, 26 November 2019, 14:06 +03:00 from Ilya Kasnacheev
><ilya.kasnach...@gmail.com>:
> 
>Hello!
>
>The problem here is that Solr is a multi-year effort by a lot of people. We
>can't match that.
>
>Maybe we could integrate with Solr/Solr Cloud instead, by feeding our cache
>information into their storage for indexing and relying on their own
>mechanisms for distributed IR sorting?
>
>Regards,
>--
>Ilya Kasnacheev
>
>
>Tue, 26 Nov 2019 at 13:59, Zhenya Stanilovsky < arzamas...@mail.ru.invalid
>>:
>
>>
>> Ilya Kasnacheev, what is the problem with having Solr functionality in Ignite?
>>
>> thanks !
>>
>> >Tuesday, 26 November 2019, 13:50 +03:00 from Ilya Kasnacheev <
>>  ilya.kasnach...@gmail.com >:
>> >
>> >Hello!
>> >
>> >I have a hunch that we are trying to build Apache Solr (or Solr Cloud) into
>> >Apache Ignite. I think that's a lot of effort that is not very justified.
>> >
>> >I don't think we should try to implement sorting in Apache Ignite, because
>> >it is a lot of work, and a lot of code in our code base which we don't
>> >really want.
>> >
>> >Regards,
>> >--
>> >Ilya Kasnacheev
>> >
>> >
>> >Fri, 22 Nov 2019 at 20:59, Yuriy Shuliga <  shul...@gmail.com >:
>> >
>> >> Dear Igniters,
>> >>
>> >> The first part of the TextQuery improvement, a result limit, was developed
>> >> and merged.
>> >> Now we have to develop the most important functionality here: proper sorting
>> >> of the Lucene index response and correct reduction of the results for
>> >> distributed queries.
>> >>
>> >> *There are two Lucene-based aspects*
>> >>
>> >> 1. When no sort fields are used, the documents in the response are still
>> >> ordered by relevance.
>> >> This is actually the ScoreDoc.score value.
>> >> To reduce the distributed results correctly, the score should be
>> >> passed with the response.
>> >>
>> >> 2. When sorting by conventional fields, Lucene should have these
>> >> fields properly indexed, and the
>> >> corresponding Sort object should be applied to Lucene's search call.
>> >> To mark those fields, a new annotation like '@SortField' may be
>> >> introduced.
>> >>
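A minimal sketch of the score-based reduction described in point 1 above, assuming that each node's response arrives already sorted by descending score and that the score travels with every hit. The names ScoredHit and ScoreMergeSketch are hypothetical and exist only for this illustration; they are not Ignite or Lucene classes.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical holder for one hit: a cache key plus the score sent by the node. */
final class ScoredHit {
    final String key;
    final float score;

    ScoredHit(String key, float score) {
        this.key = key;
        this.score = score;
    }
}

/** Illustrative k-way merge of per-node responses; an assumption, not Ignite code. */
final class ScoreMergeSketch {
    /** Cursor over one node's response, assumed sorted by descending score. */
    private static final class Cursor {
        final List<ScoredHit> hits;
        int pos;

        Cursor(List<ScoredHit> hits) {
            this.hits = hits;
        }

        ScoredHit head() {
            return hits.get(pos);
        }
    }

    /** Returns at most {@code limit} hits from all nodes, best score first. */
    static List<ScoredHit> merge(List<List<ScoredHit>> perNode, int limit) {
        // Max-heap keyed by the score of each cursor's current head.
        PriorityQueue<Cursor> heap = new PriorityQueue<>(
            Comparator.comparingDouble((Cursor c) -> c.head().score).reversed());

        for (List<ScoredHit> hits : perNode) {
            if (!hits.isEmpty())
                heap.add(new Cursor(hits));
        }

        List<ScoredHit> res = new ArrayList<>();

        while (res.size() < limit && !heap.isEmpty()) {
            Cursor c = heap.poll();

            res.add(c.head());
            c.pos++;

            // Re-insert the cursor so that its next hit competes with the other nodes.
            if (c.pos < c.hits.size())
                heap.add(c);
        }

        return res;
    }
}
```

With node A returning scores 0.9 and 0.4, node B returning 0.7 and 0.6, and a limit of 3, the merge yields the hits scored 0.9, 0.7 and 0.6. The order of hits with equal scores coming from different nodes is left unspecified in this sketch.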
>> >> *Reducing on Ignite*
>> >>
>> >> The obvious place for distributed response reduction is the class
>> >> GridCacheDistributedQueryFuture.
>> >> However, @Ivan Pavlukhin mentioned a class with similar functionality:
>> >> ReduceIndexSorted.
>> >> As far as I can see, it is tangled with H2-related classes
>> >> (org.h2.result.Row) and might not be unifiable with TextQuery reduction.
>> >>
>> >> I still need support here.
>> >>
>> >> Overall, the goal of this letter is to initiate a discussion on the TextQuery
>> >> sorting implementation and come closer to ticket creation.
>> >>
>> >> BR,
>> >> Yuriy Shuliha
>> >>
>> >> Tue, 22 Oct 2019 at 13:31, Andrey Mashenkov <  andrey.mashen...@gmail.com >
>> >> wrote:
>> >>
>> >> > Hi Dmitry, Yuriy.
>> >> >
>> >> > I've found that GridCacheQueryFutureAdapter has a newly added AtomicInteger
>> >> > 'total' field and a 'limit' field as a primitive int.
>> >> >
>> >> > Both fields are used only inside a synchronized block.
>> >> > So we can make both private and downgrade the AtomicInteger to a primitive
>> >> > int.
>> >> >
>> >> > Most likely, these fields can be replaced with a single field.
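A small sketch of what the single-field suggestion above could look like, assuming the counter is only ever touched while holding a lock. LimitCounterSketch and its accept method are hypothetical names for illustration; this is not the actual GridCacheQueryFutureAdapter code.

```java
/** Hypothetical stand-in for the limit bookkeeping; an assumption, not Ignite code. */
final class LimitCounterSketch {
    /** Guards the counter, so a plain int is enough and no AtomicInteger is needed. */
    private final Object lock = new Object();

    /** Rows that may still be accepted; a negative value means "no limit". */
    private int remaining;

    /** @param limit maximum number of rows, or a value <= 0 for no limit. */
    LimitCounterSketch(int limit) {
        remaining = limit > 0 ? limit : -1;
    }

    /** Returns how many rows of an incoming page of {@code pageSize} rows may be kept. */
    int accept(int pageSize) {
        synchronized (lock) {
            if (remaining < 0)
                return pageSize;

            int n = Math.min(pageSize, remaining);

            remaining -= n;

            return n;
        }
    }
}
```

With a limit of 5 and three pages of 3 rows each, the calls keep 3, 2 and 0 rows respectively. A single "remaining" counter plays the role of both 'total' and 'limit' here.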
>> >> >
>> >> >
>> >> >
>> >> > On Mon, Oct 21, 2019 at 10:01 PM Dmitriy Pavlov <  dpav...@apache.org
>> >
>> >> > wrote:
>> >> >
>> >> > > Hi Andrey,
>> >> > >
>> >> > > I've checked this ticket's comments, and there is a TC Bot visa (with no
>> >> > > blockers).
>> >> > >
>> >> > > Do you have any concerns related to this patch?
>> >> > >
>> >> > > Sincerely,
>> >> > > Dmitriy Pavlov
>> >> > >
>> >> > > Thu, 17 Oct 2019 at 16:43, Yuriy Shuliga <  shul...@gmail.com >:
>> >> > >
>> >> > >> Andrey,
>> >> > >>
>> >> > >> Per your request, I created ticket
>> >> > >>  https://issues.apache.org/jira/browse/IGNITE-12291 linked to
>> >> > >>  https://issues.apache.org/jira/projects/IGNITE/issues/IGNITE-12189
>> >> > >>
>> >> > >> Could you please proceed with PR merge ?
>> >> > >>
>> >> > >> BR,
>> >> > >> Yuriy Shuliha
>> >> > >>
>> >> > >> Wed, 9 Oct 2019 at 12:52, Andrey Mashenkov <  andrey.mashen...@gmail.com >
>> >> > >> wrote:
>> >> > >>
>> >> > >> > Hi Yuri,
>> >> > >> >
>> >> > >> > To get access to TC Bot, you should register as a TeamCity user [1], if
>> >> > >> > you haven't done so already.
>> >> > >> > Then you will be able to authorize on the Ignite TC Bot page with the
>> >> > >> > same credentials.
>> >> > >> >
>> >> > >> > [1]  https://ci.ignite.apache.org/registerUser.html
>> >> > >> >
>> >> > >> > On Fri, Oct 4, 2019 at 3:10 PM Yuriy Shuliga <  shul...@gmail.com
>> >
>> >> > wrote:
>> >> > >> >
>> >> > >> >> Andrew,
>> >> > >> >>
>> >> > >> >> I have corrected the PR according to your notes. Please review.
>> >> > >> >> What are the next steps to get it merged?
>> >> > >> >>
>> >> > >> >> Y.
>> >> > >> >>
>> >> > >> >> Thu, 3 Oct 2019 at 17:47, Andrey Mashenkov <  andrey.mashen...@gmail.com >
>> >> > >> >> wrote:
>> >> > >> >>
>> >> > >> >> > Yuri,
>> >> > >> >> >
>> >> > >> >> > I'm done with the review.
>> >> > >> >> > Nothing serious found, only a trivial compatibility bug.
>> >> > >> >> >
>> >> > >> >> > On Thu, Oct 3, 2019 at 3:54 PM Yuriy Shuliga <
>>  shul...@gmail.com >
>> >> > >> wrote:
>> >> > >> >> >
>> >> > >> >> > > Denis,
>> >> > >> >> > >
>> >> > >> >> > > Thank you for your attention to this.
>> >> > >> >> > > As of now, the  https://issues.apache.org/jira/browse/IGNITE-12189
>> >> > >> >> > > ticket is still pending review.
>> >> > >> >> > > Do we have a chance to move it forward somehow?
>> >> > >> >> > >
>> >> > >> >> > > BR,
>> >> > >> >> > > Yuriy Shuliha
>> >> > >> >> > >
>> >> > >> >> > > Mon, 30 Sep 2019 at 23:35, Denis Magda <  dma...@apache.org > wrote:
>> >> > >> >> > >
>> >> > >> >> > > > Yuriy,
>> >> > >> >> > > >
>> >> > >> >> > > > I've seen you open a pull request with the first changes:
>> >> > >> >> > > >  https://issues.apache.org/jira/browse/IGNITE-12189
>> >> > >> >> > > >
>> >> > >> >> > > > Alex Scherbakov and Ivan, are you the right guys to do the review?
>> >> > >> >> > > >
>> >> > >> >> > > > -
>> >> > >> >> > > > Denis
>> >> > >> >> > > >
>> >> > >> >> > > >
>> >> > >> >> > > > On Fri, Sep 27, 2019 at 8:48 AM Павлухин Иван <
>> >> > >>  vololo...@gmail.com >
>> >> > >> >> > > wrote:
>> >> > >> >> > > >
>> >> > >> >> > > > > Yuriy,
>> >> > >> >> > > > >
>> >> > >> >> > > > > Thank you for providing details! Quite interesting.
>> >> > >> >> > > > >
>> >> > >> >> > > > > Yes, we already have support for distributed limit and merging of
>> >> > >> >> > > > > sorted subresults for SQL queries. E.g. ReduceIndexSorted and
>> >> > >> >> > > > > MergeStreamIterator are used for merging sorted streams.
>> >> > >> >> > > > >
>> >> > >> >> > > > > Could you please also clarify about score/relevance? Is it provided
>> >> > >> >> > > > > by the Lucene engine for each query result? I am thinking about how
>> >> > >> >> > > > > to do a sorted merge properly in this case.
>> >> > >> >> > > > >
>> >> > >> >> > > > > Wed, 25 Sep 2019 at 18:56, Yuriy Shuliga <  shul...@gmail.com >:
>> >> > >> >> > > > > >
>> >> > >> >> > > > > > Ivan,
>> >> > >> >> > > > > >
>> >> > >> >> > > > > > Thank you for interesting question!
>> >> > >> >> > > > > >
>> >> > >> >> > > > > > Text searches (or full-text searches) are mostly human-oriented, and
>> >> > >> >> > > > > > the user's point of interest is the topmost part of the response.
>> >> > >> >> > > > > > The user can then read it, evaluate it, and use the given records for
>> >> > >> >> > > > > > further purposes.
>> >> > >> >> > > > > >
>> >> > >> >> > > > > > In our case in particular, we use Ignite for operations with financial
>> >> > >> >> > > > > > data, and there is a lot of text there, such as asset names, financial
>> >> > >> >> > > > > > instruments, companies, etc.
>> >> > >> >> > > > > > To operate with this quickly and reliably, users are used to working
>> >> > >> >> > > > > > with text search, type-ahead completions, and suggestions.
>> >> > >> >> > > > > >
>> >> > >> >> > > > > > For this purpose we index particular string data in separate caches.
>> >> > >> >> > > > > >
>> >> > >> >> > > > > > Sorting capabilities and response size limits are very important
>> >> > >> >> > > > > > there, as our API has to provide the most relevant information in a
>> >> > >> >> > > > > > view of limited size.
>> >> > >> >> > > > > >
>> >> > >> >> > > > > > Now let me comment on the Ignite/Lucene perspective.
>> >> > >> >> > > > > > Ignite queries Lucene, and Lucene returns *TopDocs.scoreDocs* already
>> >> > >> >> > > > > > sorted by *score* (relevance), so the most relevant documents are at
>> >> > >> >> > > > > > the top.
>> >> > >> >> > > > > > Currently, distributed query responses from different nodes are merged
>> >> > >> >> > > > > > into the final query cursor queue in an arbitrary order,
>> >> > >> >> > > > > > so in fact the score order is already ruined here. Also, Ignite
>> >> > >> >> > > > > > requests all possible documents from Lucene, which is redundant and
>> >> > >> >> > > > > > bad for performance.
>> >> > >> >> > > > > >
>> >> > >> >> > > > > > I'm implementing a *limit* parameter as part of *TextQuery* and have
>> >> > >> >> > > > > > to note that we still need to add sorting to text query processing in
>> >> > >> >> > > > > > order to get usable results.
>> >> > >> >> > > > > >
>> >> > >> >> > > > > > The *limit* parameter by itself should alleviate some of the issues
>> >> > >> >> > > > > > above, but sorting by document score, at least, should definitely be
>> >> > >> >> > > > > > implemented along with the limit.
>> >> > >> >> > > > > >
>> >> > >> >> > > > > > This is a pretty short commentary; if you still have any questions,
>> >> > >> >> > > > > > please do not hesitate to ask.
>> >> > >> >> > > > > >
>> >> > >> >> > > > > > BR,
>> >> > >> >> > > > > > Yuriy Shuliha
>> >> > >> >> > > > > >
>> >> > >> >> > > > > > Thu, 19 Sep 2019 at 11:38, Павлухин Иван <  vololo...@gmail.com > wrote:
>> >> > >> >> > > > > >
>> >> > >> >> > > > > > > Yuriy,
>> >> > >> >> > > > > > >
>> >> > >> >> > > > > > > Greatly appreciate your interest.
>> >> > >> >> > > > > > >
>> >> > >> >> > > > > > > Could you please elaborate a little on sorting? What tasks does
>> >> > >> >> > > > > > > it help to solve, and how? It would be great if you could provide
>> >> > >> >> > > > > > > an example.
>> >> > >> >> > > > > > >
>> >> > >> >> > > > > > > Wed, 18 Sep 2019 at 09:39, Alexei Scherbakov <
>> >> > >> >> > > > > > >  alexey.scherbak...@gmail.com >:
>> >> > >> >> > > > > > > >
>> >> > >> >> > > > > > > > Denis,
>> >> > >> >> > > > > > > >
>> >> > >> >> > > > > > > > I like the idea of throwing an exception when text queries are
>> >> > >> >> > > > > > > > enabled on persistent caches.
>> >> > >> >> > > > > > > >
>> >> > >> >> > > > > > > > I'm also fine with the proposed limit for unsorted searches.
>> >> > >> >> > > > > > > >
>> >> > >> >> > > > > > > > Yury, please proceed with ticket creation.
>> >> > >> >> > > > > > > >
>> >> > >> >> > > > > > > > Tue, 17 Sep 2019, 22:06 Denis Magda <  dma...@apache.org >:
>> >> > >> >> > > > > > > >
>> >> > >> >> > > > > > > > > Igniters,
>> >> > >> >> > > > > > > > >
>> >> > >> >> > > > > > > > > I see nothing wrong with Yury's proposal regarding the full-text
>> >> > >> >> > > > > > > > > search API evolution, as long as Yury is ready to push it forward.
>> >> > >> >> > > > > > > > >
>> >> > >> >> > > > > > > > > As for the in-memory-only mode, it makes total sense for in-memory
>> >> > >> >> > > > > > > > > data grid deployments where Ignite caches data of an underlying DB
>> >> > >> >> > > > > > > > > like Postgres.
>> >> > >> >> > > > > > > > > As part of the changes, I would simply throw an exception (by
>> >> > >> >> > > > > > > > > default) if one attempts to use text indices with native
>> >> > >> >> > > > > > > > > persistence enabled.
>> >> > >> >> > > > > > > > > If the person is ready to live with that limitation, then an
>> >> > >> >> > > > > > > > > explicit configuration change is needed to get around the
>> >> > >> >> > > > > > > > > exception.
>> >> > >> >> > > > > > > > >
>> >> > >> >> > > > > > > > > Thoughts?
>> >> > >> >> > > > > > > > >
>> >> > >> >> > > > > > > > >
>> >> > >> >> > > > > > > > > -
>> >> > >> >> > > > > > > > > Denis
>> >> > >> >> > > > > > > > >
>> >> > >> >> > > > > > > > >
>> >> > >> >> > > > > > > > > On Tue, Sep 17, 2019 at 7:44 AM Yuriy Shuliga <
>> >> > >> >> > >  shul...@gmail.com
>> >> > >> >> > > > >
>> >> > >> >> > > > > > > wrote:
>> >> > >> >> > > > > > > > >
>> >> > >> >> > > > > > > > > > Hello to all again,
>> >> > >> >> > > > > > > > > >
>> >> > >> >> > > > > > > > > > Thank you for the important comments and notes given below!
>> >> > >> >> > > > > > > > > >
>> >> > >> >> > > > > > > > > > Let me answer and continue the discussion.
>> >> > >> >> > > > > > > > > >
>> >> > >> >> > > > > > > > > > (I) Overall needs in Lucene indexing
>> >> > >> >> > > > > > > > > >
>> >> > >> >> > > > > > > > > > Alexei has referred to
>> >> > >> >> > > > > > > > > > https://issues.apache.org/jira/browse/IGNITE-5371, where the
>> >> > >> >> > > > > > > > > > absence of index persistence was declared an obstacle to further
>> >> > >> >> > > > > > > > > > development.
>> >> > >> >> > > > > > > > > >
>> >> > >> >> > > > > > > > > > a) This ticket is already closed as not valid.
>> >> > >> >> > > > > > > > > > b) There are definite needs (in our project as well) for purely
>> >> > >> >> > > > > > > > > > in-memory indexing of selected data.
>> >> > >> >> > > > > > > > > > We intend to use search capabilities to fetch a limited number of
>> >> > >> >> > > > > > > > > > records to be used in type-ahead search / suggestions.
>> >> > >> >> > > > > > > > > > Not all of the data will be indexed, and there is no need for the
>> >> > >> >> > > > > > > > > > Lucene index to be persistent. I hope this is a widespread pattern
>> >> > >> >> > > > > > > > > > of text-search usage.
>> >> > >> >> > > > > > > > > >
>> >> > >> >> > > > > > > > > > (II) Necessary fixes in the current implementation
>> >> > >> >> > > > > > > > > >
>> >> > >> >> > > > > > > > > > a) Implementation of a correct *limit* (*offset* seems not to be
>> >> > >> >> > > > > > > > > > required in text-search tasks for now)
>> >> > >> >> > > > > > > > > > I have investigated the data flow for distributed text queries. It
>> >> > >> >> > > > > > > > > > was a simple test prefix query, like 'name'='ene*'.
>> >> > >> >> > > > > > > > > > For now, each server node returns all response records to the
>> >> > >> >> > > > > > > > > > client node, and the response may contain thousands or hundreds of
>> >> > >> >> > > > > > > > > > thousands of records, even if we need only the first 10-100.
>> >> > >> >> > > > > > > > > > Again, all the results are added to the queue in
>> >> > >> >> > > > > > > > > > GridCacheQueryFutureAdapter in arbitrary order, by pages.
>> >> > >> >> > > > > > > > > > I did not find any means here to deliver a deterministic result.
>> >> > >> >> > > > > > > > > > So implementing the limit as part of the query (and of
>> >> > >> >> > > > > > > > > > GridCacheQueryRequest) will not change the nature of the response,
>> >> > >> >> > > > > > > > > > but will limit the load on nodes and networking.
>> >> > >> >> > > > > > > > > >
>> >> > >> >> > > > > > > > > > Can we consider opening a ticket for this?
>> >> > >> >> > > > > > > > > >
>> >> > >> >> > > > > > > > > > (III) Further exposure of the Lucene API to Ignite
>> >> > >> >> > > > > > > > > >
>> >> > >> >> > > > > > > > > > a) Sorting
>> >> > >> >> > > > > > > > > > The solution for this could be:
>> >> > >> >> > > > > > > > > > - Make entities comparable
>> >> > >> >> > > > > > > > > > - Add a custom comparator to the entity
>> >> > >> >> > > > > > > > > > - Add annotations to mark sorted fields for Lucene indexing
>> >> > >> >> > > > > > > > > > - Use comparators when merging responses or reducing to the
>> >> > >> >> > > > > > > > > > desired limit on the client node.
>> >> > >> >> > > > > > > > > > This will require the full result set to be loaded into memory,
>> >> > >> >> > > > > > > > > > though it can be used for relatively small limits.
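A rough sketch of the comparator-based reduction listed above, under the assumption that the client node simply collects every node's rows before sorting. ComparatorReduceSketch is a hypothetical name used only here; it shows why this variant needs the full result set in memory.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Illustrative client-side reduce with a user-supplied comparator; not Ignite code. */
final class ComparatorReduceSketch {
    /** Collects all rows, sorts them with {@code cmp}, and keeps the first {@code limit}. */
    static <T> List<T> reduce(List<List<T>> perNode, Comparator<? super T> cmp, int limit) {
        // Every row from every node is held in memory at this point.
        List<T> all = new ArrayList<>();

        for (List<T> rows : perNode)
            all.addAll(rows);

        all.sort(cmp);

        return new ArrayList<>(all.subList(0, Math.min(limit, all.size())));
    }
}
```

For example, reducing the node responses ["pear", "apple"] and ["fig"] with natural string ordering and a limit of 2 gives ["apple", "fig"].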
>> >> > >> >> > > > > > > > > > BR,
>> >> > >> >> > > > > > > > > > Yuriy Shuliha
>> >> > >> >> > > > > > > > > >
>> >> > >> >> > > > > > > > > > Fri, 30 Aug 2019 at 10:37, Alexei Scherbakov <  alexey.scherbak...@gmail.com >
>> >> > >> >> > > > > > > > > > wrote:
>> >> > >> >> > > > > > > > > >
>> >> > >> >> > > > > > > > > > > Yuriy,
>> >> > >> >> > > > > > > > > > >
>> >> > >> >> > > > > > > > > > > Note that one of the major blockers for text queries is [1],
>> >> > >> >> > > > > > > > > > > which makes Lucene indexes unusable with persistence and is the
>> >> > >> >> > > > > > > > > > > main reason for the discontinuation.
>> >> > >> >> > > > > > > > > > > It should probably be addressed first to make text queries a
>> >> > >> >> > > > > > > > > > > valid product feature.
>> >> > >> >> > > > > > > > > > >
>> >> > >> >> > > > > > > > > > > Distributed sorting and advanced querying are indeed not a
>> >> > >> >> > > > > > > > > > > trivial task.
>> >> > >> >> > > > > > > > > > > Some kind of merging must be implemented on the
>> >> > >> >> > > > > > > > > > > query-originating node.
>> >> > >> >> > > > > > > > > > >
>> >> > >> >> > > > > > > > > > > [1]  https://issues.apache.org/jira/browse/IGNITE-5371
>> >> > >> >> > > > > > > > > > >
>> >> > >> >> > > > > > > > > > > Thu, 29 Aug 2019 at 23:38, Denis Magda <  dma...@apache.org >:
>> >> > >> >> > > > > > > > > > >
>> >> > >> >> > > > > > > > > > > > Yuriy,
>> >> > >> >> > > > > > > > > > > >
>> >> > >> >> > > > > > > > > > > > If you are ready to take over the full-text search indexes,
>> >> > >> >> > > > > > > > > > > > then please go ahead. The primary reasons why the community
>> >> > >> >> > > > > > > > > > > > wants to discontinue them first (and, probably, resurrect
>> >> > >> >> > > > > > > > > > > > them later) are the limitations listed by Andrey and minimal
>> >> > >> >> > > > > > > > > > > > support from the community's end.
>> >> > >> >> > > > > > > > > > > >
>> >> > >> >> > > > > > > > > > > > -
>> >> > >> >> > > > > > > > > > > > Denis
>> >> > >> >> > > > > > > > > > > >
>> >> > >> >> > > > > > > > > > > >
>> >> > >> >> > > > > > > > > > > > On Thu, Aug 29, 2019 at 1:29 PM Andrey
>> >> > Mashenkov
>> >> > >> <
>> >> > >> >> > > > > > > > > > > >  andrey.mashen...@gmail.com >
>> >> > >> >> > > > > > > > > > > > wrote:
>> >> > >> >> > > > > > > > > > > >
>> >> > >> >> > > > > > > > > > > > > Hi Yuriy,
>> >> > >> >> > > > > > > > > > > > >
>> >> > >> >> > > > > > > > > > > > > Unfortunately, there is a plan to discontinue TextQueries in
>> >> > >> >> > > > > > > > > > > > > Ignite [1].
>> >> > >> >> > > > > > > > > > > > > The motivation is that text indexes are not persistent, not
>> >> > >> >> > > > > > > > > > > > > transactional, and can't be used together with SQL or inside
>> >> > >> >> > > > > > > > > > > > > SQL, and there is a lack of interest from the community side.
>> >> > >> >> > > > > > > > > > > > > You are welcome to take on these issues and make TextQueries
>> >> > >> >> > > > > > > > > > > > > great.
>> >> > >> >> > > > > > > > > > > > >
>> >> > >> >> > > > > > > > > > > > > 1. PageSize can't be used to limit the result set.
>> >> > >> >> > > > > > > > > > > > > Query results are returned from the data node to the
>> >> > >> >> > > > > > > > > > > > > client-side cursor page by page, and this parameter is
>> >> > >> >> > > > > > > > > > > > > designed to control the page size. The query is supposed to
>> >> > >> >> > > > > > > > > > > > > execute lazily on the server side, and the full result set is
>> >> > >> >> > > > > > > > > > > > > not expected to be loaded into memory on the server side at
>> >> > >> >> > > > > > > > > > > > > once, but page by page.
>> >> > >> >> > > > > > > > > > > > > Do you mean you found that Lucene loads the entire result set
>> >> > >> >> > > > > > > > > > > > > into memory before the first page is sent to the client?
>> >> > >> >> > > > > > > > > > > > >
>> >> > >> >> > > > > > > > > > > > > I'd think a new parameter should be added to limit the
>> >> > >> >> > > > > > > > > > > > > result. The best solution is to use query language commands
>> >> > >> >> > > > > > > > > > > > > for this, e.g. "LIMIT/OFFSET" in SQL.
>> >> > >> >> > > > > > > > > > > > >
>> >> > >> >> > > > > > > > > > > > > This task doesn't look trivial. A query is a distributed
>> >> > >> >> > > > > > > > > > > > > operation, and the same user query will be executed on the
>> >> > >> >> > > > > > > > > > > > > data nodes, and then the results from all nodes should be
>> >> > >> >> > > > > > > > > > > > > correctly merged before being returned via the client cursor.
>> >> > >> >> > > > > > > > > > > > > So, LIMIT should be applied on every node and then again in
>> >> > >> >> > > > > > > > > > > > > the merge phase.
>> >> > >> >> > > > > > > > > > > > >
>> >> > >> >> > > > > > > > > > > > > Also, and this may be non-obvious, limiting results makes no
>> >> > >> >> > > > > > > > > > > > > sense without sorting, as there is no guarantee that every
>> >> > >> >> > > > > > > > > > > > > subsequent query run will return the same data, because of
>> >> > >> >> > > > > > > > > > > > > page reordering.
>> >> > >> >> > > > > > > > > > > > > Basically, the merge phase receives results from the data
>> >> > >> >> > > > > > > > > > > > > nodes asynchronously, and messages from different nodes can't
>> >> > >> >> > > > > > > > > > > > > be ordered.
>> >> > >> >> > > > > > > > > > > > >
>> >> > >> >> > > > > > > > > > > > > 2.
>> >> > >> >> > > > > > > > > > > > > a. The "tokenize" param name (for @QueryTextField) looks more
>> >> > >> >> > > > > > > > > > > > > explicit, doesn't it?
>> >> > >> >> > > > > > > > > > > > > b, c. What about distributed queries? How will partial
>> >> > >> >> > > > > > > > > > > > > results from nodes be merged?
>> >> > >> >> > > > > > > > > > > > > Does Lucene allow configuring a comparator for data sorting?
>> >> > >> >> > > > > > > > > > > > > Which comparator should Ignite choose to sort results in the
>> >> > >> >> > > > > > > > > > > > > merge phase?
>> >> > >> >> > > > > > > > > > > > >
>> >> > >> >> > > > > > > > > > > > > 3. For now the Lucene engine is not configurable at all.
>> >> > >> >> > > > > > > > > > > > > E.g., it is impossible to configure the Tokenizer.
>> >> > >> >> > > > > > > > > > > > > I'd think about possible ways to configure the engine first,
>> >> > >> >> > > > > > > > > > > > > and only then go further to discuss/implement complex
>> >> > >> >> > > > > > > > > > > > > features that may depend on the engine config.
>> >> > >> >> > > > > > > > > > > > >
>> >> > >> >> > > > > > > > > > > > >
>> >> > >> >> > > > > > > > > > > > >
>> >> > >> >> > > > > > > > > > > > > On Thu, Aug 29, 2019 at 8:17 PM Yuriy
>> >> > Shuliga <
>> >> > >> >> > > > > > >  shul...@gmail.com >
>> >> > >> >> > > > > > > > > > > wrote:
>> >> > >> >> > > > > > > > > > > > >
>> >> > >> >> > > > > > > > > > > > > > Dear community,
>> >> > >> >> > > > > > > > > > > > > >
>> >> > >> >> > > > > > > > > > > > > > By starting this thread I'd like to open a discussion that
>> >> > >> >> > > > > > > > > > > > > > would lead to contributions in the subject area.
>> >> > >> >> > > > > > > > > > > > > >
>> >> > >> >> > > > > > > > > > > > > > Ignite has indexing capabilities, backed by different
>> >> > >> >> > > > > > > > > > > > > > mechanisms, including Lucene.
>> >> > >> >> > > > > > > > > > > > > >
>> >> > >> >> > > > > > > > > > > > > > Currently, Lucene 7.5.0 is used (last year's release).
>> >> > >> >> > > > > > > > > > > > > > This is a widespread and mature technology that covers the
>> >> > >> >> > > > > > > > > > > > > > text search area and beyond (e.g. spatial data indexing).
>> >> > >> >> > > > > > > > > > > > > >
>> >> > >> >> > > > > > > > > > > > > > My goal is to *expose more Lucene functionality to Ignite
>> >> > >> >> > > > > > > > > > > > > > indexing and query mechanisms for text data*.
>> >> > >> >> > > > > > > > > > > > > >
>> >> > >> >> > > > > > > > > > > > > > It's quite a simple request at the current stage. It comes
>> >> > >> >> > > > > > > > > > > > > > from our project's needs but, I believe, will be useful for
>> >> > >> >> > > > > > > > > > > > > > a lot more people.
>> >> > >> >> > > > > > > > > > > > > > Let's walk through the items and vote on or discuss Jira
>> >> > >> >> > > > > > > > > > > > > > tickets for them.
>> >> > >> >> > > > > > > > > > > > > >
>> >> > >> >> > > > > > > > > > > > > > 1. [trivial] Use dataQuery.getPageSize() to limit search
>> >> > >> >> > > > > > > > > > > > > > response items inside GridLuceneIndex.query(). Currently it
>> >> > >> >> > > > > > > > > > > > > > calls IndexSearcher.search(query, *Integer.MAX_VALUE*), so
>> >> > >> >> > > > > > > > > > > > > > basically all scored matches will be returned, which we do
>> >> > >> >> > > > > > > > > > > > > > not need in most cases.
>> >> > >> >> > > > > > > > > > > > > >
>> >> > >> >> > > > > > > > > > > > > > 2. [simple] Add sorting. Then a more capable search call
>> >> > >> >> > > > > > > > > > > > > > can be executed: *IndexSearcher.search(query, count, sort)*
>> >> > >> >> > > > > > > > > > > > > > Implementation steps:
>> >> > >> >> > > > > > > > > > > > > > a) Introduce a boolean *sortField* parameter in the
>> >> > >> >> > > > > > > > > > > > > > *@QueryTextField* annotation. If *true*, the field will be
>> >> > >> >> > > > > > > > > > > > > > indexed but not tokenized. Numeric types are preferred
>> >> > >> >> > > > > > > > > > > > > > here.
>> >> > >> >> > > > > > > > > > > > > > b) Add a *sort* collection to the *TextQuery* constructor.
>> >> > >> >> > > > > > > > > > > > > > It should define the desired sort fields used for querying.
>> >> > >> >> > > > > > > > > > > > > > c) Implement Lucene sort usage in GridLuceneIndex.query().
>> >> > >> >> > > > > > > > > > > > > >
>> >> > >> >> > > > > > > > > > > > > > 3. [moderate] Build complex queries with *TextQuery*,
>> >> > >> >> > > > > > > > > > > > > > including term/query boosting.
>> >> > >> >> > > > > > > > > > > > > > *This section is for voting only, as it requires more
>> >> > >> >> > > > > > > > > > > > > > detailed work. It should be extended if the community is
>> >> > >> >> > > > > > > > > > > > > > interested in it.*
>> >> > >> >> > > > > > > > > > > > > >
>> >> > >> >> > > > > > > > > > > > > > Looking forward to your comments!
>> >> > >> >> > > > > > > > > > > > > >
>> >> > >> >> > > > > > > > > > > > > > BR,
>> >> > >> >> > > > > > > > > > > > > > Yuriy Shuliha
>> >> > >> >> > > > > > > > > > > > > >
>> >> > >> >> > > > > > > > > > > > >
>> >> > >> >> > > > > > > > > > > > >
>> >> > >> >> > > > > > > > > > > > > --
>> >> > >> >> > > > > > > > > > > > > Best regards,
>> >> > >> >> > > > > > > > > > > > > Andrey V. Mashenkov
>> >> > >> >> > > > > > > > > > > > >
>> >> > >> >> > > > > > > > > > > >
>> >> > >> >> > > > > > > > > > >
>> >> > >> >> > > > > > > > > > >
>> >> > >> >> > > > > > > > > > > --
>> >> > >> >> > > > > > > > > > >
>> >> > >> >> > > > > > > > > > > Best regards,
>> >> > >> >> > > > > > > > > > > Alexei Scherbakov
>> >> > >> >> > > > > > > > > > >
>> >> > >> >> > > > > > > > > >
>> >> > >> >> > > > > > > > >
>> >> > >> >> > > > > > >
>> >> > >> >> > > > > > >
>> >> > >> >> > > > > > >
>> >> > >> >> > > > > > > --
>> >> > >> >> > > > > > > Best regards,
>> >> > >> >> > > > > > > Ivan Pavlukhin
>> >> > >> >> > > > > > >
>> >> > >> >> > > > >
>> >> > >> >> > > > >
>> >> > >> >> > > > >
>> >> > >> >> > > > > --
>> >> > >> >> > > > > Best regards,
>> >> > >> >> > > > > Ivan Pavlukhin
>> >> > >> >> > > > >
>> >> > >> >> > > >
>> >> > >> >> > >
>> >> > >> >> >
>> >> > >> >> >
>> >> > >> >> > --
>> >> > >> >> > Best regards,
>> >> > >> >> > Andrey V. Mashenkov
>> >> > >> >> >
>> >> > >> >>
>> >> > >> >
>> >> > >> >
>> >> > >> > --
>> >> > >> > Best regards,
>> >> > >> > Andrey V. Mashenkov
>> >> > >> >
>> >> > >>
>> >> > >
>> >> >
>> >> > --
>> >> > Best regards,
>> >> > Andrey V. Mashenkov
>> >> >
>> >>
>>
>>
>>
>>
>