Something else came to mind: are there plans to support prepared queries?

I recall someone saying before that Ignite does cache queries internally,
but it's not at all clear whether or how it does that. I assume a simple
hash of the query text isn't enough.

We generate SQL queries based on user runtime settings, and they can run
to hundreds of lines. I imagine this means most of our queries are not
being cached, but there are recurring patterns, so we could generate and
manage prepared queries ourselves.
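Since our generated queries differ mostly in literal values, one client-side
workaround we could apply is normalising the SQL text so it stays stable
across requests. Below is a minimal, Ignite-free sketch, under the assumption
(which I'd like confirmed) that the internal statement cache is keyed by the
exact query text; the extracted literals would then be passed as positional
arguments via SqlFieldsQuery.setArgs(...):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: rewrite inline literals in generated SQL to '?' placeholders so
// that queries differing only in literal values share one stable text,
// letting any text-keyed statement/plan cache hit.
public final class QueryTemplater {
    // Matches single-quoted string literals (with '' escapes) or bare numbers.
    private static final Pattern LITERAL =
        Pattern.compile("'(?:[^']|'')*'|\\b\\d+(?:\\.\\d+)?\\b");

    public static final class Parameterised {
        public final String sql;         // stable text, usable as a cache key
        public final List<String> args;  // extracted literal values, in order
        Parameterised(String sql, List<String> args) {
            this.sql = sql;
            this.args = args;
        }
    }

    // Replace inline literals with '?' and collect them as positional args.
    public static Parameterised parameterise(String sql) {
        List<String> args = new ArrayList<>();
        Matcher m = LITERAL.matcher(sql);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            args.add(m.group());
            m.appendReplacement(sb, "?");
        }
        m.appendTail(sb);
        return new Parameterised(sb.toString(), args);
    }
}
```

The regex is deliberately naive (it would also rewrite literals inside
comments), so this is for illustration only, not something we'd ship as-is.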

Also, will there be a dedicated API for running SQL queries, rather than
having to pass a SqlFieldsQuery to a cache that has nothing to do with the
table being queried? When I first started with Ignite years ago, this was
beyond confusing for me: I'm trying to run select x from B, but I pass it
to a cache called DUMMY or whatever other arbitrary name...
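To make the confusion concrete, this is roughly the pattern today
(SqlFieldsQuery and IgniteCache are the real Ignite 2 APIs; the dedicated
entry point in the second half is purely hypothetical, just to illustrate
the ergonomics I'm asking about):

```java
// Fragment, not a complete program: assumes a started Ignite instance.
// Today: the SQL has nothing to do with the cache it is sent through.
IgniteCache<?, ?> dummy = ignite.cache("DUMMY"); // any cache works here
List<List<?>> rows =
    dummy.query(new SqlFieldsQuery("select x from B")).getAll();

// Hypothetical dedicated SQL API (not a real Ignite interface today):
// List<List<?>> rows2 = ignite.sql().execute("select x from B").getAll();
```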

On Fri, Jul 23, 2021 at 4:05 PM Courtney Robinson <courtney.robin...@hypi.io>
wrote:

> Andrey,
> Thanks for the response - see my comments inline.
>
>
>> I've gone through the questions but don't yet have the whole picture of
>> your use case.
>
>> Would you please clarify how exactly you use Ignite? What are the
>> integration points?
>>
>
> I'll try to clarify: we have a low/no-code platform. A user designs a
> model for their application and we map this model to Ignite tables and
> other data sources. The model I'll describe is what we're building now,
> expected to be in alpha some time in Q4 2021. Our current production
> architecture is different and isn't as generic; it is heavily tied to
> Ignite, and we've redesigned to gain some flexibility where Ignite
> doesn't provide what we want, such as window functions and other SQL-99
> limitations.
>
> In the next-gen version we're working on, you can create a model for a
> Tweet(content, to) and we will create an Ignite table with content and to
> columns using the types the user selects. This is the simplest case.
> We are adding generic support for sources and sinks, using Calcite as a
> data virtualisation layer. Ignite is one of the available sources/sinks.
>
> When a user creates a model for Tweet, we also allow them to specify how
> they want to index the data. We have a copy of the Calcite Elasticsearch
> adapter, modified for Solr.
>
> When a source is queried (Ignite or any other that we support), we
> generate SQL that Calcite executes. Calcite will push down the generated
> queries to Solr, Solr produces a list of IDs (in the case of Ignite), and
> we do a multi-get from Ignite to produce the actual results.
>
> Obviously there's a lot more to this but that should give you a general
> idea.
>
> and maybe share some experience with using Ignite SPIs?
>>
> Our evolution with Ignite started from the key value + compute APIs. We
> used the SPIs then but have since moved to using only the Ignite SQL API
> (we gave up transactions for this).
>
> We originally used the indexing SPI to keep our own lucene index of data
> in a cache. We did not use the Ignite FTS as it is very limited compared to
> what we allow customers to do. If I remember correctly, we were using an
> affinity compute job to send queries to the right Ignite node and
> then doing a multi-get to pull the data from caches.
> I think we used one or two other SPIs and we found them very useful to be
> able to extend and customise Ignite without having to fork/change upstream
> classes. We only stopped using them because we eventually concluded that
> using the SQL-only API was better for numerous reasons.
>
>
>> We'll keep the information in mind while developing the Ignite,
>> because this may help us to make a better product.
>>
>> By the way, I'll try to answer the questions.
>>
>> >   1. Schema change - does that include the ability to change the
>> >   types of fields/columns?
>> Yes, we plan to support transparent on-the-fly conversion to a wider
>> type (e.g. 'int' to 'long'). This is a major point of our live-schema
>> concept. In fact, there is no need to convert data on all the nodes
>> synchronously, as old SQL databases do (those that support it at all);
>> we are going to support multiple schema versions, convert data on demand
>> on a per-row basis to the latest version, and then write the row back.
>>
>
> Understood; the automatic conversion to a wider type makes sense.
>
>>
>> More complex things like 'String' -> 'int' are out of scope for now,
>> because they require executing user code on the critical path.
>>
>
> I would argue, though, that executing user code on the critical path
> shouldn't be a blocker for custom conversions. I feel that if a user is
> building an advanced enough integration to provide custom conversions,
> they would be aware that it impacts the system as a whole.
>
>> The limitation here is that the column MUST NOT be indexed, because an
>> index over data of different types is impossible.
>>
> Understood. I'd make the case that indexing should be pluggable; I would
> love for us to be able to take indexing away from Ignite in our
> implementation. I think in Calcite the Postgres adapter does this by
> having a table whose type is "Index". The implementor would be left with
> the freedom to choose how that table answers index lookups. From Ignite's
> perspective it wouldn't care, so long as the interface's contract is met:
> I could use an index that does a Lucene, ES, Solr or Redis lookup and the
> end result would be the same, but as the implementor I'm choosing the
> tradeoff I want to meet the organisation's goals.
>
>
>>
>>  >  2. Will the new guaranteed consistency between APIs also mean SQL will
>>  >  gain transaction support?
>> Yes, we plan to have Transactional SQL.
>> DDL will be non-transactional though, and I wonder whether anyone
>> supports this.
>>
> I'm not sure I know of anything that supports transactional DDL, so I
> don't think this is an issue, but I would say that a DDL statement in a
> transaction shouldn't fail the transaction. I believe in Ignite 2 there
> is a flag to turn this on or off; we should definitely keep this. In our
> case, it's an issue with the nature of the platform we provide: at
> development time only about 10% of the schema or other DB info is known;
> we generate the other 90% on the fly based on whatever customers decide
> to design in our UI.
>
>>
>> Ignite 3 will operate with Rows underneath, but classic Table API and
>> Key-value will be available to a user
>> at the same time and with all consistency guarantees.
>
> Excellent!
>
>>
>>
>
>> >  3. Has there been any decision about how much of Calcite will be
>> >  exposed to the client? When using thick clients, it'll be hugely
>> >  beneficial to be able to work with Calcite APIs directly to provide
>> >  custom rules and optimizations to better suit organization needs.
>> As of now, we have no plans to expose any Calcite API to a user.
>> AFAIK, we have our custom Calcite convention, custom rules that are
>> aware of the distributed environment, and additional AST nodes. The
>> rules MUST correctly propagate internal information about data
>> distribution, so I'm not sure we want to give low-level access to them.
>>
>
> Maybe we're an edge case, but for us access to the Calcite APIs would
> shift our current development somewhat. We're treating Ignite as a
> library that provides a good foundation, and we extend and customise it.
> Again, we may be an edge case, and maybe most people just want a database
> to put data into and get it back out of, without controlling some of how
> it does those things.
>
>
>> > We Index into Solr and use the Solr indices
>> Ignite 1-2 has poor support for TEXT queries, which are totally
>> unconfigurable. Also, the Lucene indices underneath are NOT persistent,
>> which would require too much effort to fix. The geospatial index has the
>> same issues, so we decided to drop them, along with the Indexing SPI
>> altogether.
>>
>> However, you can find activity on the dev list on the Index Query
>> topic. Guys are going to add IndexQuery (a scan query over a sorted
>> index which can use simple conditions) in Ignite 2. We also plan to have
>> the same functionality; maybe it is possible to add full-text search
>> support here. Would it work for you, what do you think?
>>
> Yes, we originally looked at text queries and almost immediately said
> no. Nothing about them was useful for us other than the Lucene dependency
> in Java. In the end that also became an issue, because we wanted a newer
> Lucene version.
> IndexQuery will be useful, and we'll certainly use it, but it's not
> enough. I think we customise and depend on Solr too much for IndexQuery
> to compare, but it will help in some cases for simpler queries.
>
>>
>>
>> >   4. Will the unified storage model enable different versions of
>> >   Ignite to be in the cluster when persistence is enabled, so that
>> >   rolling restarts can be done?
>> I'm not sure a rolling upgrade (RU) will be available, because too many
>> compatibility issues would have to be resolved to make RU possible under
>> load without downtime.
>>
>> Maybe it makes sense to provide some grid mode (maintenance mode) for RU
>> purposes that would block all user load but allow upgrading the grid,
>> e.g. for the pure in-memory case.
>>
>> Persistence compatibility should be preserved, as it is for Ignite 2.
>>
> My ideal situation would be that we start a newer Ignite version; it
> comes online, joins the cluster, and is treated as being in some kind of
> maintenance mode, as you suggested. In maintenance mode, the other nodes
> rebalance (or run some other process) to send all the data this new node
> will handle over to it. The existing nodes continue serving this data
> until the new node is no longer in maintenance mode, and then it becomes
> the primary for the data that was rebalanced to it.
>
> The second case is if an existing node is restarted with a newer Ignite
> version. No rebalance is needed; it joins in maintenance mode, runs any
> upgrade/conversion or other task it needs to, and then starts accepting
> reads and writes. Communication with lower-version nodes can be limited;
> they are aware of it and send it the data and queries for which it is the
> primary, assuming they will also be upgraded.
>
> I guess I'm not aware of the compatibility issues this presents, so my
> view is narrow and perhaps naive here.
>
>>
>>
>> >    5. Will it be possible to provide a custom cache store still and will
>> >   these changes enable custom cache stores to be queryable from SQL?
>> I'm not sure I fully understand this.
>> 1. Usually, SQL is about indices. Ignite can't perform a query over
>> unindexed data.
>>
> Yes understood
>
>>
>> 2. A full scan over the cache that contains only part of the data, plus
>> a scan over the CacheStore, then merging the results, is a pain.
>> Most likely, running a query over the CacheStore directly would be
>> simpler, and even more performant. A shared CacheStore (the same for all
>> nodes) would definitely kill performance in that case.
>> So a preliminary loadCache() call looks like a good compromise.
>>
> I think the problem is largely that the CacheStore interface is not
> sufficient for being able to do this. If it had a richer interface which
> allowed the cache store to answer index queries (basically hooking into
> whatever Ignite is doing with its B+tree), then this would be viable. A
> CacheStore that only implements the KV API doesn't take part in SQL
> queries.
>
>>
>> 3. Splitting a query into two parts, one to run on Ignite and one to
>> run on the CacheStore, looks possible with Calcite, but I think it's
>> impractical because, in general, neither the CacheStore nor the database
>> structure is aware of the data partitioning.
>>
> Hmm, maybe I missed the point, but as the implementor of the CacheStore
> you should have knowledge of the structure and partition info, or have
> some way of retrieving it. Again, I think the current CacheStore
> interface is the problem; if it were extended to provide this
> information, it would be up to the implementation to do so, while Ignite
> knows that any implementation of these interfaces will meet the
> necessary contract.
>
>
>>
>> 4. Transactions can't be supported in the case of direct CacheStore
>> access, because even if the underlying database supports two-phase
>> commit, which is a rare case, the recovery protocol looks hard.
>> It just looks like this feature isn't worth it.
>>
> I'd completely agree with this. It will be incredibly hard to get this
> done reliably.
>
>>
>>
>> >   6. This question wasn't mine but I was going to ask it as well: What
>> >   will happen to the Indexing API since H2 is being removed?
>> As I wrote above, Indexing SPI will be dropped, but IndexQuery will be
>> added.
>>
>> >  1. As I mentioned above, we index into Solr. In earlier versions of
>> >     our product we used the indexing SPI to index into Lucene on the
>> >     Ignite nodes, but this presented so many challenges that we
>> >     ultimately abandoned it and replaced it with the current Solr
>> >     solution.
>> AFAIK, some guys developed and sell a plugin for Ignite 2 with
>> persistent Lucene and geo indices. I don't know about the capabilities
>> and limitations of their solution, because the code is closed.
>> You can easily Google it.
>>
>> I saw a few enthusiastic folks who want to improve TEXT queries, but
>> unfortunately things haven't moved far enough. For now, they are in the
>> middle of fixing the merging of TEXT query results.
>> So far so good.
>>
>> I think it is a good chance to master the skill of developing a
>> distributed system for whoever takes the lead on the full-text search
>> feature and adds native full-text index support to Ignite 3.
>>
> I've seen the other thread from Atri I believe about this.
>
>>
>>
>> >   7. What impact does RAFT now have on conflict resolution?
>> RAFT is a state-machine replication protocol. It guarantees that all
>> the nodes will see the updates in the same order, so it seems no
>> conflicts are possible. Recovery from split-brain is impossible in the
>> general case.
>>
>> However, I think we'll have a conflict-resolver analogue in Ignite 3, as
>> it is very useful in some cases, e.g. datacenter replication,
>> incremental data load from a third-party source, and recovery from a
>> third-party source.
>>
>>
>> > 8. CacheGroups.
>> AFAIK, CacheGroup will be eliminated; actually, we'll keep the
>> mechanic, but it will be configured in a different way, which makes
>> configuring Ignite a bit simpler.
>> Sorry, for now I have no answer to your performance concerns; this part
>> of Ignite 3 slipped off my radar.
>>
> No worries. I'll wait and see if anyone else suggests something. It's
> getting a lot worse: a node took an hour to start yesterday after a
> deployment, and it's in prod with very little visibility into what it is
> doing. It was just stopped, no logging or anything, and then resumed.
>
> 2021-07-22 13:40:15.997  INFO [ArcOS,,,] 9 --- [orker-#40%hypi%]
> o.a.i.i.p.cache.GridCacheProcessor      [285] :  Finished recovery for
> cache [cache=hypi_01F8ZC3DGT66RNYCDZH3XNVY2E_Hue, grp=hypi,
> startVer=AffinityTopologyVersion [topVer=79, minorTopVer=0]]
>
> One hour later it printed the next cache-recovery message, went through
> the other tables, and started 30 seconds after that.
>
>
>
>>
>> Let's wait and see if someone clarifies what we can expect in Ignite 3.
>> Guys, can someone chime in and shed more light on questions 3, 4, 7
>> and 8?
>>
>>
>> On Thu, Jul 22, 2021 at 4:15 AM Courtney Robinson <
>> courtney.robin...@hypi.io>
>> wrote:
>>
>> > Hey everyone,
>> > I attended the Alpha 2 update yesterday and was quite pleased to see the
>> > progress on things so far. So first, congratulations to everyone on the
>> > work being put in and thank you to Val and Kseniya for running
>> yesterday's
>> > event.
>> >
>> > I asked a few questions after the webinar which Val had some answers to
>> but
>> > suggested posting here as some of them are not things that have been
>> > thought about yet or no plans exist around it at this point.
>> >
>> > I'll put all of them here and if necessary we can break into different
>> > threads after.
>> >
>> >    1. Schema change - does that include the ability to change the
>> >    types of fields/columns?
>> >       1. Val's answer was yes, with some limitations, but those are
>> >       not well defined yet. He did mention that something like a
>> >       transformer could be provided for doing the conversion, and I
>> >       would second this; even for common types like int to long, being
>> >       able to do a custom conversion will be immensely valuable.
>> >    2. Will the new guaranteed consistency between APIs also mean SQL
>> >    will gain transaction support?
>> >       1. I believe the answer here was yes, but perhaps someone else
>> >       may want to weigh in to confirm.
>> >    3. Has there been any decision about how much of Calcite will be
>> >    exposed to the client? When using thick clients, it'll be hugely
>> >    beneficial to be able to work with Calcite APIs directly to provide
>> >    custom rules and optimisations to better suit organisation needs.
>> >       1. We currently use Calcite ourselves, have a lot of custom
>> >       rules and optimisations, and have slowly pushed more of our
>> >       queries to Calcite that we then push down to Ignite.
>> >       2. We index into Solr and use the Solr indices and others to
>> >       fulfill overall queries, with Ignite just being one of the
>> >       possible storage targets Calcite pushes down to. If we could get
>> >       to the Calcite API from an Ignite thick client, it would enable
>> >       us to remove a layer of abstraction and complexity and make
>> >       Ignite our primary, which we then link with Solr and others to
>> >       fulfill queries.
>> >    4. Will the unified storage model enable different versions of
>> >    Ignite to be in the cluster when persistence is enabled, so that
>> >    rolling restarts can be done?
>> >       1. We have to do a strange dance to perform Ignite upgrades
>> >       without downtime, because pods/nodes will fail to start on a
>> >       version mismatch, and if we get that dance wrong we will corrupt
>> >       a node's data. It would make admin/upgrades far less brittle and
>> >       error-prone if this were possible.
>> >    5. Will it still be possible to provide a custom cache store, and
>> >    will these changes enable custom cache stores to be queryable from
>> >    SQL?
>> >       1. Our Ignite usage is wide and complex because we use KV, SQL
>> >       and other APIs. The inconsistency of what can and can't be used
>> >       from one API to another is a real challenge and has forced us
>> >       over time to stick to one API and write alternative solutions
>> >       outside of Ignite. It would drastically simplify things if any
>> >       CacheStore (or some new equivalent) could be plugged in and made
>> >       accessible to SQL (and in fact all other APIs) without having to
>> >       first load all the data from the underlying CacheStore into
>> >       memory.
>> >    6. This question wasn't mine but I was going to ask it as well:
>> >    what will happen to the Indexing API, since H2 is being removed?
>> >       1. As I mentioned above, we index into Solr. In earlier versions
>> >       of our product we used the indexing SPI to index into Lucene on
>> >       the Ignite nodes, but this presented so many challenges that we
>> >       ultimately abandoned it and replaced it with the current Solr
>> >       solution.
>> >       2. Lucene indexing was ideal because it meant we didn't have to
>> >       re-invent Solr or Elasticsearch's sharding capabilities; that
>> >       was almost automatic, with Ignite only giving you the data that
>> >       was meant for the current node.
>> >       3. The Lucene API enabled more flexibility and removed a network
>> >       round trip from our queries.
>> >       4. Given Calcite's ability to support custom SQL functions, I'd
>> >       love to have the ability to define custom functions that Lucene
>> >       would answer.
>> >    7. What impact does RAFT now have on conflict resolution? Off the
>> >    top of my head there are two cases.
>> >       1. On startup after a split brain, Ignite currently takes an
>> >       "exercise for the reader" approach and dumps a log along the
>> >       lines of:
>> >
>> > >    1. BaselineTopology of joining node is not compatible with
>> > >       BaselineTopology in the cluster.
>> > >    1. Branching history of cluster BlT doesn't contain branching
>> > >       point hash of joining node BlT. Consider cleaning persistent
>> > >       storage of the node and adding it to the cluster again.
>> > >
>> >       This leaves you with no choice except to take one half, manually
>> >       copy/write its data back over to the other half, then destroy
>> >       the bad one.
>> >       2. The second case is conflicts on keys. I believe
>> >       CacheVersionConflictResolver and its manager are used by
>> >       GridCacheMapEntry, which just says: if using the old value, do
>> >       this; otherwise use newVal. Ideally this will be exposed in the
>> >       new API so that one can override this behaviour. The
>> >       last-writer-wins approach isn't always ideal, and the semantics
>> >       of the domain can mean that what is considered "correct" in a
>> >       conflict is not so for a different domain.
>> >    8. This is last on the list but is actually the most important for
>> >    us right now, as it is an impending and growing risk. We allow
>> >    customers to create their own tables on demand. We're already using
>> >    the same cache group etc. so data structures are re-used, but now
>> >    that we're getting to thousands of tables/caches our startup times
>> >    are sometimes unpredictably long. At present it seems to depend on
>> >    the state of the cache/table before the restart, but we're into the
>> >    order of 5-7 minutes and steadily increasing with the growth of
>> >    tables. Are there any provisions in Ignite 3 for ensuring startup
>> >    time isn't proportional to the number of tables/caches available?
>> >
>> >
>> > Those are the key things I can think of at the moment. Val and
>> > others, I'd love to open a conversation around these.
>> >
>> > Regards,
>> > Courtney Robinson
>> > Founder and CEO, Hypi
>> > Tel: +44 208 123 2413 (GMT+0) <https://hypi.io>
>> >
>> > <https://hypi.io>
>> > https://hypi.io
>> >
>>
>>
>> --
>> Best regards,
>> Andrey V. Mashenkov
>>
>
