Re: Apache Ignite 3 Alpha 2 webinar follow up questions

Andrey Mashenkov Fri, 23 Jul 2021 02:04:52 -0700

Courtney,

I have a short update.
>   3. The Lucene API enabled more flexibility and removed a network
>      round trip from our queries.
>      4. Given Calcite's ability to support custom SQL functions, I'd love
>      to have the ability to define custom functions that Lucene was
answering
As we won't support IndexingSPI extension point anymore.


Seems ContinuousQuery might be a good choice to keep the local Lucene index
in sync with the data.

SQL functions will be supported, if it will be possible to do smth like
this "Select * From T1 Where T1.key in
(TextQueryFunction("text_query_string"))",
where "TextQueryFunction" is a custom SQL function that might be able to
access some static method or local Ignite service or whatever, then
will this work for you?


On Fri, Jul 23, 2021 at 1:10 AM Andrey Mashenkov <andrey.mashen...@gmail.com>
wrote:

> Hi Courtney,
>
> Thanks for your feedback.
>
> I've gone through the questions and have no the whole picture of your use
> case.
> Would you please clarify how you exactly use the Ignite? what are the
> integration points?
> and maybe share some experience with using Ignite SPIs?
>
> We'll keep the information in mind while developing the Ignite,
> because this may help us to make a better product.
>
> By the way, I'll try to answer the questions.
>
> >   1. Schema change - does that include the ability to change the types of
> >   fields/columns?
> Yes, we plan to support transparent conversion to a wider type on-fly
> (e.g. 'int' to 'long').
> This is a major point of our Live-schema concept.
> In fact, there is no need to convert data on all the nodes in a
> synchronous way as old SQL databases do (if one supports though),
> we are going to support multiple schema versions and convert data
> on-demand on a per-row basis to the latest version,
> then write-back the row.
>
> More complex things like 'String' -> 'int' are out of scope for now
> because it requires the execution of a user code on the critical path.
> The limitation here is column MUST NOT be indexed, because an index over
> the data of different kinds is impossible.
>
>
>  >  2. Will the new guaranteed consistency between APIs also mean SQL will
>  >  gain transaction support?
> Yes, we plan to have Transactional SQL.
> DDL will be non-transactional though, and I wonder if the one supports
> this.
>
> Ignite 3 will operate with Rows underneath, but classic Table API and
> Key-value will be available to a user
> at the same time and with all consistency guarantees.
>
>
> >  3. Has there been any decision about how much of Calcite will be exposed
> >   to the client? When using thick clients, it'll be hugely beneficial to
> be
> >   able to work with Calcite APIs directly to provide custom rules and
> >  optimizations to better suit organization needs
> As of now, we have no plans to expose any Calcite API to a user.
> AFAIK, we have our custom Calcite convention, custom rules that are aware
> of distributed environment,
> and additional AST nodes. The rules MUST correctly propagate internal
> information about data distribution,
> so I'm not sure want to give low-level access to them.
>
> > We Index into Solr and use the Solr indices
> Ignite 1-2 has poor support for TEXT queries, which is totally
> unconfigurable.
> Also, Lucene indices underneath are NOT persistent that requires too much
> effort to fix it.
> GeoSpatial index has the same issues, we decided to drop them along with
> Indexing SPI at all.
>
> However, you can find the activity on dev-list on the Index Query topic.
> Guys are going to add IndexQuery (a scan query over the sorted index which
> can use simple conditions) in Ignite 2.
> We also plan to have the same functionality, maybe it is possible to add
> full-text search support here.
> Will it work for you, what do you think?
>
>
> >    4. Will the unified storage model enable different versions of Ignite
> to
> >   be in the cluster when persistence is enabled so that rolling restarts
> can
> >   be done?
> I'm not sure a rolling upgrade (RU) will be available because too much
> compatibility issues should be resolved
> to make RU possible under the load without downtime.
>
> Maybe it makes sense to provide some grid mode (maintenance mode) for RU
> purposes that will block all the user load
> but allow upgrade the grid. E.g. for the pure in-memory case.
>
> Persistence compatibility should be preserved as it works for Ignite 2.
>
>
> >    5. Will it be possible to provide a custom cache store still and will
> >   these changes enable custom cache stores to be queryable from SQL?
> I'm not sure I fully understand this.
> 1. Usually, SQL is about indices. Ignite can't perform a query over the
> unindexed data.
>
> 2. Fullscan over the cache that contains only part of data + scan the
> CacheStore, then merging the results is a pain.
> Most likely, running a query over CacheStore directly will be a simpler
> way, and even more performant.
> Shared CacheStore (same for all nodes) will definitely kill the
> performance in that case.
> So, the preliminary loadCache() call looks like a good compromise.
>
> 3. Splitting query into 2 parts to run on Ignite and to run on CacheStore
> looks possible with Calcite,
> but I think it impractical because in general, neither CacheStore nor
> database structure are aware of the data partitioning.
>
> 4. Transactions can't be supported in case of direct CacheStore access,
> because even if the underlying database supports 2-phase commit, which is
> a rare case, the recovery protocol looks hard.
> Just looks like this feature doesn't worth it.
>
>
> >   6. This question wasn't mine but I was going to ask it as well: What
> >   will happen to the Indexing API since H2 is being removed?
> As I wrote above, Indexing SPI will be dropped, but IndexQuery will be
> added.
>
> >  1. As I mentioned above, we Index into Solr, in earlier versions of
> >      our product we used the indexing SPI to index into Lucene on the
> Ignite
> >      nodes but this presented so many challenges we ultimately abandoned
> it and
> >      replaced it with the current Solr solution.
> AFAIK, some guys developed and sell a plugin for Ignite-2 with persistent
> Lucene and Geo indices.
> I don't know about the capabilities and limitations of their solution,
> because of closed code.
> You can easily google it.
>
> I saw few encouraged guys who want to improve TEXT queries,
> but unfortunately, things weren't moved far enough. For now, they are in
> the middle of fixing the merging TEXT query results.
> So far so good.
>
> I think it is a good chance to master the skill developing of a
> distributed system for the one
> who will take a lead over the full-text search feature and add native
> FullText index support into Ignite-3.
>
>
> >   7. What impact does RAFT now have on conflict resolution?
> RAFT is a state machine replication protocol. It guarantees all the nodes
> will see the updates in the same order.
> So, seems no conflicts are possible. Recovery from split-brain is
> impossible in common-case.
>
> However, I think we have a conflict resolver analog in Ignite-3 as it is
> very useful in some cases
> e.g datacenter replication, incremental data load from 3-rd party source,
> recovery from 3-rd party source.
>
>
> > 8. CacheGroups.
> AFAIK, CacheGroup will be eliminated, actually, we'll keep this mechanic,
> but it will be configured in a different way,
> which makes Ignite configuring a bit simpler.
> Sorry, for now, I have no answer on your performance concerns, this part
> of Ignite-3 slipped from my radar.
>
>
> Let's wait if someone will clarify what we could expect in Ignite-3.
> Guys, can someone chime in and give more light on 3,4,7,8 questions?
>
>
> On Thu, Jul 22, 2021 at 4:15 AM Courtney Robinson <
> courtney.robin...@hypi.io> wrote:
>
>> Hey everyone,
>> I attended the Alpha 2 update yesterday and was quite pleased to see the
>> progress on things so far. So first, congratulations to everyone on the
>> work being put in and thank you to Val and Kseniya for running yesterday's
>> event.
>>
>> I asked a few questions after the webinar which Val had some answers to
>> but
>> suggested posting here as some of them are not things that have been
>> thought about yet or no plans exist around it at this point.
>>
>> I'll put all of them here and if necessary we can break into different
>> threads after.
>>
>>    1. Schema change - does that include the ability to change the types of
>>    fields/columns?
>>       1. Val's answer was yes with some limitations but those are not well
>>       defined yet. He did mention that something like some kind of
>> transformer
>>       could be provided for doing the conversion and I would second this,
>> even
>>       for common types like int to long being able to do a custom
>> conversion will
>>       be immensely valuable.
>>    2. Will the new guaranteed consistency between APIs also mean SQL will
>>    gain transaction support?
>>       1. I believe the answer here was yes but perhaps someone else may
>>       want to weigh in to confirm
>>    3. Has there been any decision about how much of Calcite will be
>> exposed
>>    to the client? When using thick clients, it'll be hugely beneficial to
>> be
>>    able to work with Calcite APIs directly to provide custom rules and
>>    optimisations to better suit organisation needs
>>    1. We currently use Calcite ourselves and have a lot of custom rules
>> and
>>       optimisations and have slowly pushed more of our queries to
>> Calcite that we
>>       then push down to Ignite.
>>       2. We Index into Solr and use the Solr indices and others to
>>       fulfill over all queries with Ignite just being one of the
>> possible storage
>>       targets Calcite pushes down to. If we could get to the calcite
>> API from an
>>       Ignite thick client, it would enable us to remove a layer of
>> abstraction
>>       and complexity and make Ignite our primary that we then link
>> with Solr and
>>       others to fulfill queries.
>>    4. Will the unified storage model enable different versions of Ignite
>> to
>>    be in the cluster when persistence is enabled so that rolling restarts
>> can
>>    be done?
>>    1. We have to do a strange dance to perform Ignite upgrades without
>>       downtime because pods/nodes will fail to start on version mismatch
>> and if
>>       we get that dance wrong, we will corrupt a node's data. It will make
>>       admin/upgrades far less brittle and error prone if this was
>> possible.
>>    5. Will it be possible to provide a custom cache store still and will
>>    these changes enable custom cache stores to be queryable from SQL?
>>    1. Our Ignite usage is wide and complex because we use KV, SQL and
>> other
>>       APIs. The inconsistency of what can and can't be used from one API
>> to
>>       another is a real challenge and has forced us over time to stick
>> to one API
>>       and write alternative solutions outside of Ignite. It will
>> drastically
>>       simplify things if any CacheStore (or some new equivalent) could
>> be plugged
>>       in and be made accessible to SQL (and in fact all other APIs)
>> without
>>       having to load all the data from the underlying CacheStore first
>> into memory
>>    6. This question wasn't mine but I was going to ask it as well: What
>>    will happen to the Indexing API since H2 is being removed?
>>       1. As I mentioned above, we Index into Solr, in earlier versions of
>>       our product we used the indexing SPI to index into Lucene on the
>> Ignite
>>       nodes but this presented so many challenges we ultimately
>> abandoned it and
>>       replaced it with the current Solr solution.
>>       2. Lucene indexing was ideal because it meant we didn't have to
>>       re-invent Solr or Elasticsearch's sharding capabilities, that was
>> almost
>>       automatic with Ignite only giving you the data that was meant for
>> the
>>       current node.
>>       3. The Lucene API enabled more flexibility and removed a network
>>       round trip from our queries.
>>       4. Given Calcite's ability to support custom SQL functions, I'd love
>>       to have the ability to define custom functions that Lucene was
>> answering
>>    7. What impact does RAFT now have on conflict resolution, off the top
>> of
>>    my head there are two cases
>>       1. On startup after a split brain Ignite currently takes an
>> "exercise
>>       for the reader" approach and dumps a log along the lines of
>>
>> >    1. BaselineTopology of joining node is not compatible with
>> >       BaselineTopology in the cluster.
>> >    1. Branching history of cluster BlT doesn't contain branching point
>> >       hash of joining node BlT. Consider cleaning persistent storage of
>> the node
>> >       and adding it to the cluster again.
>> >
>>    1. This leaves you with no choice except to take one half and manually
>>       copy, write data back over to the other half then destroy the bad
>> one.
>>       2. The second case is conflicts on keys, I
>>       beleive CacheVersionConflictResolver and manager are used
>>       by GridCacheMapEntry which just says if use old value do this
>> otherwise use
>>       newVal. Ideally this will be exposed in the new API so that one can
>>       override this behaviour. The last writer wins approach isn't always
>> ideal
>>       and the semantics of the domain can mean that what is consider
>> "correct" in
>>       a conflict is not so for a different domain.
>>    8. This is last on the list but is actually the most important for us
>>    right now as it is an impending and growing risk. We allow customers to
>>    create their own tables on demand. We're already using the same cache
>> group
>>    etc for data structures to be re-used but now that we're getting to
>>    thousands of tables/caches our startup times are sometimes
>> unpredictably
>>    long - at present it seems to depend on the state of the cache/table
>> before
>>    the restart but we're into the order of 5 - 7 mins and steadily
>> increasing
>>    with the growth of tables. Are there any provisions in Ignite 3 for
>>    ensuring startup time isn't proportional to the number of tables/caches
>>    available?
>>
>>
>> Those are the key things I can think of at the moment. Val and others I'd
>> love to open a conversation around these.
>>
>> Regards,
>> Courtney Robinson
>> Founder and CEO, Hypi
>> Tel: ++44 208 123 2413 (GMT+0) <https://hypi.io>
>>
>> <https://hypi.io>
>> https://hypi.io
>>
>
>
> --
> Best regards,
> Andrey V. Mashenkov
>


-- 
Best regards,
Andrey V. Mashenkov

Re: Apache Ignite 3 Alpha 2 webinar follow up questions

Reply via email to