Re: Text Queries Support

Kseniya Romanova Thu, 28 Oct 2021 00:27:05 -0700

I think we can invite them to our virtual meetup and share details. Your
thoughts?


Thu, Oct 28, 2021 at 10:15, Ivan Pavlukhin <vololo...@gmail.com> wrote:

> Hi Maximiliano,
>
> Thank you for pointing this out, rather interesting. Have you tried to
> communicate with the hawkore team? I doubt that anyone in the community
> knows the implementation details of the hawkore additions.
>
> 2021-10-22 19:58 GMT+03:00, Maximiliano Gazquez <maximiliano....@gmail.com>:
> > Hello everyone!
> >
> > I wanted to add this to the discussion.
> > I've found this project https://github.com/hawkore/ignite-hk which promises
> > to solve most of the issues that are being discussed here, like pagination,
> > sorting and, most importantly, persisting the Lucene index.
> >
> > It does stuff like this to create indexes:
> >
> > CREATE INDEX PERSON_LUCENE_IDX ON "PUBLIC".PERSON(LUCENE)
> > FULLTEXT '{
> > ''refresh_seconds'':''60'',
> > ''directory_path'':'''',
> > ''ram_buffer_mb'':''10'',
> > ''max_cached_mb'':''-1'',
> > ''partitioner'':''{"type":"token","partitions":10}'',
> > ''optimizer_enabled'':''true'',
> > ''optimizer_schedule'':''0 1 * * *'',
> > ''version'':''0'',
> > ''schema'':''{
> >     "default_analyzer":"english",
> >     "analyzers":{"my_custom_analyzer":{"type":"snowball","language":"Spanish","stopwords":"el,la,lo,loas,las,a,ante,bajo,cabe,con,contra"}},
> >     "fields":{
> >       "duration":{"type":"date_range","from":"start_date","to":"stop_date","validated":false,"pattern":"yyyy/MM/dd"},
> >       "place":{"type":"geo_point","latitude":"latitude","longitude":"longitude"},
> >       "date":{"type":"date","validated":true,"pattern":"yyyy/MM/dd"},
> >       "number":{"type":"integer","validated":false,"boost":1.0},
> >       "gender":{"type":"string","validated":true,"case_sensitive":true},
> >       "bool":{"type":"boolean","validated":false},
> >       "phrase":{"type":"text","validated":false,"analyzer":"my_custom_analyzer"},
> >       "name":{"type":"string","validated":false,"case_sensitive":true},
> >       "animal":{"type":"string","validated":false,"case_sensitive":true},
> >       "age":{"type":"integer","validated":false,"boost":1.0},
> >       "food":{"type":"string","validated":false,"case_sensitive":true}
> >     }
> >   }''
> > }';
> >
> > And this to use that lucene index from inside SQL:
> >
> > SELECT * FROM "test".user
> > WHERE lucene = '{ query : {
> >                               type : "boolean",
> >                               must : [{type : "wildcard", field : "name", value : "J*"},
> >                                       {type : "wildcard", field : "food", value : "tu*"}]}}';
> >
> > More examples here:
> > https://github.com/hawkore/examples-apache-ignite-extensions/tree/master/examples-advanced-ignite-indexing
> >
> > I don't have anything to do with that company but it would be great to know
> > how they implemented this stuff.
> >
> >
> > On Mon, Aug 9, 2021 at 3:00 AM Ivan Pavlukhin <vololo...@gmail.com> wrote:
> >
> >> Hi Atri,
> >>
> >> Sorry for a late answer.
> >>
> >> > I didn't quite understand. Are you proposing that Ignite should not
> >> > have FTS capabilities?
> >>
> >> It seems like an option to me. IMHO it is better to have no FTS than
> >> something like the current Ignite TextQueries.
> >>
> >> 2021-08-03 12:45 GMT+03:00, Atri Sharma <a...@apache.org>:
> >> > Hi Ivan,
> >> >
> >> > I didn't quite understand. Are you proposing that Ignite should not
> >> > have FTS capabilities?
> >> >
> >> > Atri
> >> >
> >> > On Tue, Aug 3, 2021 at 2:57 PM Ivan Pavlukhin <vololo...@gmail.com> wrote:
> >> >>
> >> >> Hi Atri,
> >> >>
> >> >> My main concern is non-maleficence. Every task has several solutions,
> >> >> e.g. straightforward ones:
> >> >> 1. Do not implement FTS.
> >> >> 2. Create own implementation.
> >> >>
> >> >> Some of the strongest ones live without FTS [1].
> >> >>
> >> >> [1] https://github.com/cockroachdb/cockroach/issues/7821
> >> >>
> >> >> 2021-08-02 11:33 GMT+03:00, Atri Sharma <a...@apache.org>:
> >> >> > Hi Ivan,
> >> >> >
> >> >> > Would you like to propose an alternative to Lucene?
> >> >> >
> >> >> > Atri
> >> >> >
> >> >> > On Mon, 2 Aug 2021, 13:48 Ivan Pavlukhin, <vololo...@gmail.com> wrote:
> >> >> >
> >> >> >> Folks,
> >> >> >>
> >> >> >> Sorry if I have not read the thread thoroughly enough, but do we consider
> >> >> >> Lucene the obviously right choice? In my understanding, Ignite history
> >> >> >> has clearly shown that the "fastest feature implementation" is usually
> >> >> >> not the best, and text queries are one example of this. Aren't we
> >> >> >> trying to make the same mistake again? FTS is a huge feature; I do not
> >> >> >> believe there is an easy win for it.
> >> >> >>
> >> >> >> 2021-07-27 19:18 GMT+03:00, Atri Sharma <a...@apache.org>:
> >> >> >> > Andrey,
> >> >> >> >
> >> >> >> >> Per-partition Lucene index looks simple to implement, but it may
> >> >> >> >> require per-partition SQL to make full-text search expressions work
> >> >> >> >> correctly within the SQL query.
> >> >> >> > I think that as long as we follow the map-reduce process that we
> >> >> >> > already do for other queries, we should be fine.
> >> >> >> >
> >> >> >> >> Per-partition SQL index may kill the performance. We already tried to
> >> >> >> >> do that in Ignite 2. The QueryParallelism feature helps to speed up some
> >> >> >> >> data-intensive queries, but it hurts performance in simple cases, and at
> >> >> >> >> some point (e.g. when the number of segments exceeds the number of CPUs)
> >> >> >> >> the performance rapidly degrades with the increasing number of segments.
> >> >> >> >
> >> >> >> > Yeah, that is always the case, but a global index will be a nightmare
> >> >> >> > in terms of concurrency, and pessimistic concurrency control will
> >> >> >> > anyway kill the benefits, coupled with the metadata requirements.
> >> >> >> > What were the specific issues with per partition index?
> >> >> >> >>
> >> >> >> >> AFAIK, Lucene widely uses bitmap indices that are easy to merge.
> >> >> >> >> Maybe, the map-reduce technique underneath FTS expressions and some
> >> >> >> >> hacks will add a minimal overhead.
> >> >> >> >
> >> >> >> > Lucene uses many types of indices but the aspect here is that per
> >> >> >> > partition Lucene indices can return docIDs and we can merge them in
> >> >> >> > reduce phase. So we are abstracted out from specifics of the internal
> >> >> >> > index being used to serve the query.
> >> >> >> >
> >> >> >> >>
> >> >> >> >> > As illustrated by Ilya, we can use Ignite's WAL records to rebuild
> >> >> >> >> > Lucene indices. The important thing here is to not treat Lucene
> >> >> >> >> > indices as source of truth.
> >> >> >> >> To use WAL we should either relay Lucene files to our Page memory or
> >> >> >> >> be aware of the Lucene file structure.
> >> >> >> >> The first looks tricky, as we should guarantee a contiguous address
> >> >> >> >> space in Page memory for reflecting a Lucene file. Maybe a separate
> >> >> >> >> managed memory segment with its own rules?
> >> >> >> >
> >> >> >> > Why not use Lucene's MMapDirectory and map it to our storage
> >> >> >> > classes?
> >> >> >> >
> >> >> >> >>
> >> >> >> >> >> Transactions.
> >> >> >> >> >> * Will we support transactions?
> >> >> >> >> > Lucene has no concept of transactions.
> >> >> >> >> Yes, but we have.
> >> >> >> >> Lucene index may be non-transactional, but users never expect to see
> >> >> >> >> uncommitted data.
> >> >> >> >> How does this connect with transactional SQL?
> >> >> >> > We could have the Lucene writes done as a part of transactions and ack
> >> >> >> > back only when it succeeds/fails. WDYT?
> >> >> >> >>
> >> >> >> >> On Tue, Jul 27, 2021 at 1:36 PM Atri Sharma <a...@apache.org> wrote:
> >> >> >> >>
> >> >> >> >> > Sorry, I planned on creating a Wiki page for this, but it makes more
> >> >> >> >> > sense to reply here.
> >> >> >> >> >
> >> >> >> >> > > * How Lucene index can be split among the nodes?
> >> >> >> >> >
> >> >> >> >> > We can have partition level indices on each node.
> >> >> >> >> >
> >> >> >> >> > > * If we'll have a single index for all partitions on the particular
> >> >> >> >> > > node, then how will index records be aware of partitioning?
> >> >> >> >> >
> >> >> >> >> > Index records don't need to be aware of partitioning -- each Lucene
> >> >> >> >> > index is independent.
> >> >> >> >> >
> >> >> >> >> > > This is important to filter out backup records from the results to
> >> >> >> >> > > avoid duplicates.
> >> >> >> >> >
> >> >> >> >> > We can merge documents from different nodes and remove duplicates as
> >> >> >> >> > long as docIDs are globally unique.
> >> >> >> >> >
> >> >> >> >> > > * How can results from several nodes be merged on the Reduce stage?
> >> >> >> >> >
> >> >> >> >> > As long as documents have a globally unique docID, Lucene has merge
> >> >> >> >> > functions that can merge results from multiple partial results.
> >> >> >> >> >
> >> >> >> >> > > * Does Lucene support something like a JOIN operation or others that
> >> >> >> >> > > may require data from another partition or index?
> >> >> >> >> >
> >> >> >> >> > As illustrated by Ilya, Block-Join works for us.
> >> >> >> >> >
> >> >> >> >> > > If so, then it looks like a multistep query with merging results on
> >> >> >> >> > > intermediate stages and requires detailed investigation and design.
> >> >> >> >> > > It is ok if Ignite will have some limitations here, but we would like
> >> >> >> >> > > to know about them at the early stage.
> >> >> >> >> >
> >> >> >> >> > > * How to effectively map Lucene files to the page memory? Is it even
> >> >> >> >> > > possible?
> >> >> >> >> >
> >> >> >> >> > Lucene has Directory implementations which allow storing Lucene
> >> >> >> >> > indices on different kinds of file structures. It has an
> >> >> >> >> > MMapDirectory that we could use?
> >> >> >> >> >
> >> >> >> >> > > Otherwise, how to deal with potential OOM on large queries and memory
> >> >> >> >> > > capacity planning?
> >> >> >> >> >
> >> >> >> >> > We can use Lucene's MMapDirectory.
> >> >> >> >> >
> >> >> >> >> > >
> >> >> >> >> > > Persistence.
> >> >> >> >> > > * How and what consistency guarantees could we have/expect?
> >> >> >> >> >
> >> >> >> >> > Lucene does not have WAL logs but is append only
> >> >> >> >> >
> >> >> >> >> > > Seems, we may not be able to write physical records for Lucene index
> >> >> >> >> > > to our WAL. What can we do with this?
> >> >> >> >> >
> >> >> >> >> > As illustrated by Ilya, we can use Ignite's WAL records to rebuild
> >> >> >> >> > Lucene indices. The important thing here is to not treat Lucene
> >> >> >> >> > indices as source of truth.
> >> >> >> >> > >
> >> >> >> >> > > Transactions.
> >> >> >> >> > > * Will we support transactions?
> >> >> >> >> > Lucene has no concept of transactions.
> >> >> >> >> >
> >> >> >> >> > > * Should Lucene be aware of Transaction and track mvcc (or whatever)
> >> >> >> >> > > versions for the records?
> >> >> >> >> > No
> >> >> >> >> > > * What will be consistency guarantees?
> >> >> >> >> > We can acknowledge writes back only after Lucene index is updated.
> >> >> >> >> > >
> >> >> >> >> > > UX
> >> >> >> >> > > * How to add FullText search queries syntax into Calcite?
> >> >> >> >> > Postgres's FTS functions are a good reference.
> >> >> >> >> > > * AFAIK, the Lucene index has many properties for tuning. How will
> >> >> >> >> > > the user configure the index?
> >> >> >> >> > Most of those properties can be cluster level and exposed as a new
> >> >> >> >> > sub config for ignite.
> >> >> >> >> > > * How and where to store the settings? What are cluster-wide and what
> >> >> >> >> > > are local to the particular node?
> >> >> >> >> > All can be cluster level.
> >> >> >> >> > > * Will all the settings be immutable? Can they be changed on the fly?
> >> >> >> >> > > After node/grid restart?
> >> >> >> >> > They should be applied post restart.
> >> >> >> >> >
> >> >> >> >> > > * Any limitations on query syntax?
> >> >> >> >> > It depends on how we model our queries for text search.
> >> >> >> >> >
> >> >> >> >> > >
> >> >> >> >> > > SQL
> >> >> >> >> > > * Will we support FullText search in SQL?
> >> >> >> >> > We need custom functions for it. See Postgres's FTS functions.
> >> >> >> >> > > * How to integrate Lucene index into Calcite? What is the cost model?
> >> >> >> >> > There cannot be any cost model since there are no paths for a text
> >> >> >> >> > query. If we see a text query, we have to use Lucene index or return
> >> >> >> >> > an error. In this way, we need to model text search as a set of UDFs
> >> >> >> >> >
> >> >> >> >> > > Splitting rules? Traits?
> >> >> >> >> > Please see my reply above.
> >> >> >> >> > >
> >> >> >> >> > >
> >> >> >> >> > > With all of this, you can go with the IEP (or even some short
> >> >> >> >> > > summary) and further POC and implementation.
> >> >> >> >> > > That's a big deal, so let's discuss what could be done here.
> >> >> >> >> > >
> >> >> >> >> > > On Fri, Jul 23, 2021 at 12:58 PM Atri Sharma <a...@apache.org> wrote:
> >> >> >> >> > >
> >> >> >> >> > > > I am actually happy to drive the feature for Ignite 3. FTS is very
> >> >> >> >> > > > important for me and I think Ignite users will benefit from it
> >> >> >> >> > > > greatly.
> >> >> >> >> > > >
> >> >> >> >> > > > If it makes sense to be focusing on Ignite 3 for this capability, I
> >> >> >> >> > > > am eager to contribute there and lead the development.
> >> >> >> >> > > >
> >> >> >> >> > > > Please share your thoughts.
> >> >> >> >> > > >
> >> >> >> >> > > > On Fri, Jul 23, 2021 at 3:21 PM Andrey Mashenkov
> >> >> >> >> > > > <andrey.mashen...@gmail.com> wrote:
> >> >> >> >> > > > >
> >> >> >> >> > > > > Hi Atri,
> >> >> >> >> > > > >
> >> >> >> >> > > > > All the Jira tickets we have on the Full-text search (FTS) thing are
> >> >> >> >> > > > > targeted to Ignite 2.
> >> >> >> >> > > > >
> >> >> >> >> > > > > AFAIK, we want, but we have NOT committed to FTS support in Ignite 3,
> >> >> >> >> > > > > yet. By the way, we are getting requests for this thing from the user
> >> >> >> >> > > > > side, and definitely, FTS would be a valuable feature for Ignite.
> >> >> >> >> > > > >
> >> >> >> >> > > > > It will be great if someone wants to drive it, any help will be
> >> >> >> >> > > > > appreciated.
> >> >> >> >> > > > >
> >> >> >> >> > > > >
> >> >> >> >> > > > > On Fri, Jul 23, 2021 at 12:12 PM Atri Sharma <a...@apache.org> wrote:
> >> >> >> >> > > > >
> >> >> >> >> > > > > > Hello,
> >> >> >> >> > > > > >
> >> >> >> >> > > > > > An update, please. I am working through persistence of Lucene index
> >> >> >> >> > > > > > using Ignite Dictionary, and will be asking some questions soon.
> >> >> >> >> > > > > >
> >> >> >> >> > > > > > I had one doubt - - where does this change go? Ignite 3?
> >> >> >> >> > > > > >
> >> >> >> >> > > > > > Also, I know we want to build native support for text searches in
> >> >> >> >> > > > > > Ignite 3. Is the work I am proposing here part of that, or will that
> >> >> >> >> > > > > > be a separate effort?
> >> >> >> >> > > > > >
> >> >> >> >> > > > > > On Mon, 28 Jun 2021, 19:20 Ilya Kasnacheev, <ilya.kasnach...@gmail.com> wrote:
> >> >> >> >> > > > > >
> >> >> >> >> > > > > > > Hello!
> >> >> >> >> > > > > > >
> >> >> >> >> > > > > > > I think that number one is the most important one, then maybe it will
> >> >> >> >> > > > > > > see more use and other deficiencies become more apparent, leading to
> >> >> >> >> > > > > > > more tickets and visibility.
> >> >> >> >> > > > > > >
> >> >> >> >> > > > > > > Maybe 2. and 3. will even use a different approach when persistence
> >> >> >> >> > > > > > > is implemented.
> >> >> >> >> > > > > > >
> >> >> >> >> > > > > > > Regards,
> >> >> >> >> > > > > > > --
> >> >> >> >> > > > > > > Ilya Kasnacheev
> >> >> >> >> > > > > > >
> >> >> >> >> > > > > > >
> >> >> >> >> > > > > > > Mon, Jun 28, 2021 at 14:34, Atri Sharma <a...@apache.org> wrote:
> >> >> >> >> > > > > > >
> >> >> >> >> > > > > > > > Hello Again!
> >> >> >> >> > > > > > > >
> >> >> >> >> > > > > > > > I have been looking into the aforementioned and here are my follow
> >> >> >> >> > > > > > > > up thoughts:
> >> >> >> >> > > > > > > >
> >> >> >> >> > > > > > > > 1. Support persistence of Lucene indexes.
> >> >> >> >> > > > > > > > 2. https://issues.apache.org/jira/browse/IGNITE-12401 (Needs fixing
> >> >> >> >> > > > > > > > of moving partitions first)
> >> >> >> >> > > > > > > > 3. Figure out how to return scores from nodes and use them as sort
> >> >> >> >> > > > > > > > parameters on the coordinator node
> >> >> >> >> > > > > > > > (https://issues.apache.org/jira/browse/IGNITE-12291)
> >> >> >> >> > > > > > > >
> >> >> >> >> > > > > > > > Please let me know if this looks ok to make text queries functional?
> >> >> >> >> > > > > > > >
> >> >> >> >> > > > > > > > Atri
> >> >> >> >> > > > > > > >
> >> >> >> >> > > > > > > > On Mon, Jun 21, 2021 at 2:49 PM Alexei Scherbakov
> >> >> >> >> > > > > > > > <alexey.scherbak...@gmail.com> wrote:
> >> >> >> >> > > > > > > > >
> >> >> >> >> > > > > > > > > Hi.
> >> >> >> >> > > > > > > > >
> >> >> >> >> > > > > > > > > One of the biggest issues with text queries is a lack of support
> >> >> >> >> > > > > > > > > for Lucene index persistence, which makes this functionality
> >> >> >> >> > > > > > > > > useless if persistence is enabled.
> >> >> >> >> > > > > > > > >
> >> >> >> >> > > > > > > > > I would first take care of it.
> >> >> >> >> > > > > > > > >
> >> >> >> >> > > > > > > > > Mon, Jun 21, 2021 at 12:16, Maksim Timonin <timonin.ma...@gmail.com> wrote:
> >> >> >> >> > > > > > > > >
> >> >> >> >> > > > > > > > > > Hi, Atri!
> >> >> >> >> > > > > > > > > >
> >> >> >> >> > > > > > > > > > You're right, actually there is a lack of support for TextQueries.
> >> >> >> >> > > > > > > > > > For the last ticket I'm doing I see some obvious issues with them
> >> >> >> >> > > > > > > > > > (no page size support, for example). I'm glad that somebody wants
> >> >> >> >> > > > > > > > > > to maintain this functionality. Thanks a lot!
> >> >> >> >> > > > > > > > > >
> >> >> >> >> > > > > > > > > > For the MergeSort algorithm there is already a patch for that [1].
> >> >> >> >> > > > > > > > > > It's currently on review. This patch introduces an abstract reducer
> >> >> >> >> > > > > > > > > > for CacheQueries with 2 implementations (unordered, merge-sort).
> >> >> >> >> > > > > > > > > > Then TextQuery leverages MergeSort to order results from multiple
> >> >> >> >> > > > > > > > > > nodes by score. This patch also fixes the pageSize issue I've
> >> >> >> >> > > > > > > > > > mentioned before. Could you please check if it fully matches your
> >> >> >> >> > > > > > > > > > idea? Any issues or comments are welcome.
> >> >> >> >> > > > > > > > > >
> >> >> >> >> > > > > > > > > > I've prepared this ticket because I need the MergeSort algorithm
> >> >> >> >> > > > > > > > > > for the new type of queries I'm implementing (IndexQuery, it should
> >> >> >> >> > > > > > > > > > also provide ordered results over multiple nodes). Currently I'm
> >> >> >> >> > > > > > > > > > not planning to go further with TextQuery, so if you're going to
> >> >> >> >> > > > > > > > > > support this it'll be a great contribution, I think.
> >> >> >> >> > > > > > > > > >
> >> >> >> >> > > > > > > > > > [1] https://issues.apache.org/jira/browse/IGNITE-14703
> >> >> >> >> > > > > > > > > > [2] https://github.com/apache/ignite/pull/9081
> >> >> >> >> > > > > > > > > >
> >> >> >> >> > > > > > > > > >
> >> >> >> >> > > > > > > > > > On Mon, Jun 21, 2021 at 11:11 AM Atri Sharma <a...@apache.org> wrote:
> >> >> >> >> > > > > > > > > >
> >> >> >> >> > > > > > > > > > > Hi All,
> >> >> >> >> > > > > > > > > > >
> >> >> >> >> > > > > > > > > > > I have been looking into our text queries support and see that it
> >> >> >> >> > > > > > > > > > > has limited community support.
> >> >> >> >> > > > > > > > > > >
> >> >> >> >> > > > > > > > > > > Therefore, I volunteer to be the maintainer of the module and work
> >> >> >> >> > > > > > > > > > > on enhancing it further.
> >> >> >> >> > > > > > > > > > >
> >> >> >> >> > > > > > > > > > > First goal would be to move to Lucene 8.x, then work on sorted
> >> >> >> >> > > > > > > > > > > reduce-merge across nodes. Fundamentally, this is doable since
> >> >> >> >> > > > > > > > > > > Lucene ranks documents according to their score, and documents are
> >> >> >> >> > > > > > > > > > > returned in the order of their score. Since the scoring function
> >> >> >> >> > > > > > > > > > > is homogeneous, this means that across nodes, we can compare
> >> >> >> >> > > > > > > > > > > scores and merge sort.
> >> >> >> >> > > > > > > > > > >
> >> >> >> >> > > > > > > > > > > Please let me know if I can take this up.
> >> >> >> >> > > > > > > > > > >
> >> >> >> >> > > > > > > > > > > Atri
> >> >> >> >> > > > > > > > > > >
> >> >> >> >> > > > > > > > > > > --
> >> >> >> >> > > > > > > > > > > Regards,
> >> >> >> >> > > > > > > > > > >
> >> >> >> >> > > > > > > > > > > Atri
> >> >> >> >> > > > > > > > > > > Apache Concerted
> >> >> >> >> > > > > > > > > > >
> >> >> >> >> > > > > > > > > >
> >> >> >> >> > > > > > > > >
> >> >> >> >> > > > > > > > >
> >> >> >> >> > > > > > > > > --
> >> >> >> >> > > > > > > > >
> >> >> >> >> > > > > > > > > Best regards,
> >> >> >> >> > > > > > > > > Alexei Scherbakov
> >> >> >> >> > > > > > > >
> >> >> >> >> > > > > > > > --
> >> >> >> >> > > > > > > > Regards,
> >> >> >> >> > > > > > > >
> >> >> >> >> > > > > > > > Atri
> >> >> >> >> > > > > > > > Apache Concerted
> >> >> >> >> > > > > > > >
> >> >> >> >> > > > > > >
> >> >> >> >> > > > > >
> >> >> >> >> > > > >
> >> >> >> >> > > > >
> >> >> >> >> > > > > --
> >> >> >> >> > > > > Best regards,
> >> >> >> >> > > > > Andrey V. Mashenkov
> >> >> >> >> > > >
> >> >> >> >> > > > --
> >> >> >> >> > > > Regards,
> >> >> >> >> > > >
> >> >> >> >> > > > Atri
> >> >> >> >> > > > Apache Concerted
> >> >> >> >> > > >
> >> >> >> >> > >
> >> >> >> >> > >
> >> >> >> >> > > --
> >> >> >> >> > > Best regards,
> >> >> >> >> > > Andrey V. Mashenkov
> >> >> >> >> >
> >> >> >> >> > --
> >> >> >> >> > Regards,
> >> >> >> >> >
> >> >> >> >> > Atri
> >> >> >> >> > Apache Concerted
> >> >> >> >> >
> >> >> >> >>
> >> >> >> >>
> >> >> >> >> --
> >> >> >> >> Best regards,
> >> >> >> >> Andrey V. Mashenkov
> >> >> >> >
> >> >> >> > --
> >> >> >> > Regards,
> >> >> >> >
> >> >> >> > Atri
> >> >> >> > Apache Concerted
> >> >> >> >
> >> >> >>
> >> >> >>
> >> >> >> --
> >> >> >>
> >> >> >> Best regards,
> >> >> >> Ivan Pavlukhin
> >> >> >>
> >> >> >
> >> >>
> >> >>
> >> >> --
> >> >>
> >> >> Best regards,
> >> >> Ivan Pavlukhin
> >> >
> >> > --
> >> > Regards,
> >> >
> >> > Atri
> >> > Apache Concerted
> >> >
> >>
> >>
> >> --
> >>
> >> Best regards,
> >> Ivan Pavlukhin
> >>
> >
>
>
> --
>
> Best regards,
> Ivan Pavlukhin
>
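
The reduce step discussed in the thread above (each node returns hits already
ordered by score, and the coordinator merge-sorts them and drops duplicates
coming from backup copies) can be sketched roughly as follows. This is a
minimal Python sketch, not Ignite or Lucene code: the names Hit and merge_hits
are hypothetical, and it assumes that docIDs are globally unique and that
scores are comparable across nodes, as stated in the thread.

```python
import heapq
from typing import Iterable, List, NamedTuple


class Hit(NamedTuple):
    doc_id: str   # assumption: globally unique across nodes
    score: float  # assumption: same scoring function on every node


def merge_hits(per_node: Iterable[List[Hit]], limit: int) -> List[Hit]:
    """Merge per-node hit lists, each sorted by descending score.

    Returns at most `limit` hits in descending score order, keeping only
    the first occurrence of each doc_id.
    """
    if limit <= 0:
        return []
    # heapq.merge performs a lazy k-way merge of already sorted inputs;
    # negating the score turns descending order into ascending keys.
    merged = heapq.merge(*per_node, key=lambda h: -h.score)
    seen = set()
    out: List[Hit] = []
    for hit in merged:
        if hit.doc_id in seen:
            continue  # duplicate, e.g. the same document from a backup copy
        seen.add(hit.doc_id)
        out.append(hit)
        if len(out) == limit:
            break
    return out
```

For example, merge_hits([node_a_hits, node_b_hits], 10) would return the ten
best distinct documents across both nodes. Because the merge is lazy, the
coordinator only consumes as many hits from each node as it needs, which is
also what makes page-by-page fetching possible.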
