Hi Courtney, Generally speaking, query caching certainly makes sense. As far as I know, Ignite 2.x actually does that, but there is likely room for improvement as well. We will look into this.
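The kind of query caching discussed here can be sketched in a few lines. This is a hypothetical illustration (all class and method names are invented, not Ignite's implementation) of why the cache key should be the parameterized SQL template with `?` placeholders rather than a hash of the final query text with literals filled in:

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy plan cache keyed by the parameterized SQL text. A hash of the final
// query defeats caching whenever literal values differ, so the key must be
// the template; bind values are supplied at execution time.
public class PlanCacheSketch {
    // Stand-in for a compiled plan; a real engine would store an optimized
    // execution plan here, not the SQL text.
    record Plan(String template) {}

    private final Map<String, Plan> cache = new ConcurrentHashMap<>();
    int misses = 0;

    Plan planFor(String parameterizedSql) {
        return cache.computeIfAbsent(parameterizedSql, sql -> {
            misses++; // expensive parse/optimize happens only on a miss
            return new Plan(sql);
        });
    }

    public static int demoMisses() {
        PlanCacheSketch c = new PlanCacheSketch();
        String template = "SELECT content FROM Tweet WHERE author = ?";
        // Same template, different bind values: one compile, many executions.
        for (String author : List.of("alice", "bob", "alice")) {
            c.planFor(template); // 'author' would be bound at execution time
        }
        return c.misses;
    }

    public static void main(String[] args) {
        System.out.println("plan-cache misses: " + demoMisses());
    }
}
```

Generated queries hundreds of lines long would still cache well under this scheme, as long as the generator emits placeholders instead of inlining user values.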
As for the SQL API - the answer is yes. The requirement for a dummy cache is an artifact of the current architecture. This is 100% wrong and will be changed in 3.0. -Val On Sun, Jul 25, 2021 at 2:51 PM Courtney Robinson <courtney.robin...@hypi.io> wrote: > Something else came to mind, are there plans to support prepared queries? > > I recall someone saying before that Ignite does internally cache queries > but it's not at all clear if or how it does so. I assume a simple hash > of the query isn't enough. > > We generate SQL queries based on user runtime settings and they can get to > hundreds of lines long; I imagine this means most of our queries are not > being cached, but there are patterns, so we could generate and manage > prepared queries ourselves. > > Also, will there be a dedicated API for doing SQL queries rather than > having to pass a SqlFieldsQuery to a cache that has nothing to do with the > cache being queried? When I first started with Ignite years ago, this was > beyond confusing for me. I'm trying to run select x from B but I pass this > to a cache called DUMMY or whatever arbitrary name... > > On Fri, Jul 23, 2021 at 4:05 PM Courtney Robinson < > courtney.robin...@hypi.io> > wrote: > > > Andrey, > > Thanks for the response - see my comments inline. > > > > > >> I've gone through the questions but don't have the whole picture of your > use > >> case. > > > > Would you please clarify how exactly you use Ignite? What are the > >> integration points? > >> > > > > I'll try to clarify - we have a low/no code platform. A user designs a > > model for their application and we map this model to Ignite tables and > > other data sources. The model I'll describe is what we're building now > and > > is expected to be in alpha some time in Q4 21. Our current production > > architecture is different and isn't as generic; it is heavily tied to > > Ignite and we've redesigned to get some flexibility where Ignite doesn't > > provide what we want. 
Things like window functions and other SQL-99 > limits. > > > > In the next-gen version we're working on, you can create a model for a > > Tweet(content, to) and we will create an Ignite table with content and to > > columns using the type the user selects. This is the simplest case. > > We are adding generic support for sources and sinks and using Calcite as > a > > data virtualisation layer. Ignite is one of the available source/sinks. > > > > When a user creates a model for Tweet, we also allow them to specify how > > they want to index the data. We have a copy of the Calcite Elasticsearch > > adapter modified for Solr. > > > > When a source is queried (Ignite or any other that we support), we > > generate SQL that Calcite executes. Calcite will push down the generated > > queries to Solr and Solr produces a list of IDs (in the case of Ignite) and > we > > do a multi-get from Ignite to produce the actual results. > > > > Obviously there's a lot more to this but that should give you a general > > idea. > > > > and maybe share some experience with using Ignite SPIs? > >> > > Our evolution with Ignite started with the key-value + compute APIs. We > > used the SPIs then but have since moved to using only the Ignite SQL API > > (we gave up transactions for this). > > > > We originally used the indexing SPI to keep our own Lucene index of data > > in a cache. We did not use the Ignite FTS as it is very limited compared > to > > what we allow customers to do. If I remember correctly, we were using an > > affinity compute job to send queries to the right Ignite node and > > then doing a multi-get to pull the data from caches. > > I think we used one or two other SPIs and we found them very useful to be > > able to extend and customise Ignite without having to fork/change > upstream > > classes. We only stopped using them because we eventually concluded that > > using the SQL-only API was better for numerous reasons. 
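The flow described above (Calcite pushes the predicate down to Solr, Solr returns a list of IDs, and Ignite then serves the full rows via one batched multi-get) can be sketched with plain maps standing in for both systems. Everything below is a hypothetical stand-in; only the `IgniteCache#getAll`-style batched lookup it mimics is a real Ignite concept:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of "search index answers the predicate, store answers the fetch".
// A map of id -> text stands in for Solr; another map stands in for Ignite.
public class IndexThenMultiGetSketch {
    static List<String> searchIndex(Map<String, String> index, String term) {
        // Solr stand-in: predicate evaluation happens here, not in the store.
        return index.entrySet().stream()
                .filter(e -> e.getValue().contains(term))
                .map(Map.Entry::getKey)
                .toList();
    }

    static Map<String, String> multiGet(Map<String, String> store, List<String> ids) {
        // Ignite stand-in: one batched lookup by key, like IgniteCache#getAll.
        Map<String, String> out = new LinkedHashMap<>();
        for (String id : ids) out.put(id, store.get(id));
        return out;
    }

    public static Map<String, String> demo() {
        Map<String, String> store =
                Map.of("t1", "hello ignite", "t2", "hello solr", "t3", "calcite");
        // In reality the index holds analyzed text, not the rows themselves.
        Map<String, String> index = store;
        return multiGet(store, searchIndex(index, "hello"));
    }
}
```

The design point is that only IDs cross the boundary between the two systems, so the search engine never has to store or ship full rows.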
> > > > > >> We'll keep the information in mind while developing Ignite, > >> because this may help us to make a better product. > >> > >> By the way, I'll try to answer the questions. > >> > >> > 1. Schema change - does that include the ability to change the types > >> of > >> > fields/columns? > >> Yes, we plan to support transparent conversion to a wider type on the fly > >> (e.g. > >> 'int' to 'long'). > >> This is a major point of our Live-schema concept. > >> In fact, there is no need to convert data on all the nodes in a > >> synchronous > >> way as old SQL databases do (if they support it at all), > >> we are going to support multiple schema versions and convert data > >> on-demand > >> on a per-row basis to the latest version, > >> then write-back the row. > >> > > > > I understand. The auto conversion to a wider type makes sense. > > > >> > >> More complex things like 'String' -> 'int' are out of scope for now > >> because > >> it requires the execution of user code on the critical path. > >> > > > > I would argue though that executing user code on the critical path > > shouldn't be a blocker for custom conversions. I feel if a user is making > > an advanced enough integration to provide custom conversions they would be > > aware that it impacts the system as a whole. > > > > The limitation here is that the column MUST NOT be indexed, because an index over > >> data of different kinds is impossible. > >> > > Understood - I'd make the case that indexing should be pluggable. I > would > > love for us to be able to take indexing away from Ignite in our impl. - I > > think in Calcite, the Postgres adapter does this by having a table whose > > type is "Index". The implementor would be left with the freedom to choose > > how that table answers index lookups. 
From Ignite's perspective it > wouldn't > > care so long as the interface's contract is met; I could use an index > that > > does a Lucene, ES, Solr or Redis lookup and the end result would be the > > same, but as the implementor I'm choosing the tradeoff I want to meet the > > organisation's goals. > > > > > >> > >> > 2. Will the new guaranteed consistency between APIs also mean SQL > will > >> > gain transaction support? > >> Yes, we plan to have Transactional SQL. > >> DDL will be non-transactional though, and I wonder if anyone supports > >> this. > >> > > I'm not sure I know of anything that supports transactional DDL so I don't > > think this is an issue, but I would say that a DDL statement in a > > transaction shouldn't fail the transaction. I believe in Ignite 2 there > is > > a flag to turn this on or off; we should definitely keep this. In our > > case, it's an issue with the nature of the platform we provide: at > > development time only about 10% of schema or other DB info is known - we > > generate the other 90% on the fly based on whatever customers decide to > > design from our UI. > > > >> > >> Ignite 3 will operate with Rows underneath, but the classic Table API and > >> Key-value will be available to a user > >> at the same time and with all consistency guarantees. > > > > Excellent! > > > >> > >> > > > >> > 3. Has there been any decision about how much of Calcite will be > >> exposed > >> > to the client? When using thick clients, it'll be hugely beneficial > to > >> be > >> > able to work with Calcite APIs directly to provide custom rules and > >> > optimizations to better suit organization needs > >> As of now, we have no plans to expose any Calcite API to a user. > >> AFAIK, we have our custom Calcite convention, custom rules that are > aware > >> of the distributed environment, > >> and additional AST nodes. The rules MUST correctly propagate internal > >> information about data distribution, > >> so I'm not sure we want to give low-level access to them. 
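Going back to the Live-schema answer above (rows tagged with a schema version, converted lazily to the latest version on read, then written back), the mechanism can be sketched as a toy model. All names here are invented; real Ignite rows are binary tuples, not Java objects:

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of per-row, on-demand schema upgrade: no synchronous
// cluster-wide ALTER; each stale row is widened (int -> long, the supported
// case) the first time it is read, then written back at the new version.
public class LiveSchemaSketch {
    record Row(int schemaVersion, Object value) {}

    static final int LATEST = 2; // v1 stored Integer, v2 widened the column to Long

    private final Map<String, Row> storage = new HashMap<>();

    void putRaw(String key, Row row) { storage.put(key, row); }

    long read(String key) {
        Row row = storage.get(key);
        if (row.schemaVersion() < LATEST) {
            // On-demand widening conversion, then write-back at the new version.
            long widened = ((Integer) row.value()).longValue();
            row = new Row(LATEST, widened);
            storage.put(key, row);
        }
        return (Long) row.value();
    }

    int storedVersion(String key) { return storage.get(key).schemaVersion(); }

    public static int demoUpgradedVersion() {
        LiveSchemaSketch s = new LiveSchemaSketch();
        s.putRaw("k", new Row(1, 42)); // old row, written under schema v1 (int)
        long v = s.read("k");          // triggers the lazy upgrade to v2 (long)
        return v == 42L ? s.storedVersion("k") : -1;
    }
}
```

This also makes the indexing restriction mentioned above concrete: until every row has been touched, the column physically holds values of two different types, which a single sorted index cannot span.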
> >> > > > > Maybe we're an edge case, but for us access to the Calcite APIs would > > shift our current development somewhat. For us, we're treating Ignite as > a > > library that provides a good foundation and we extend and customise it. > > Again, we may be an edge case and maybe most people just want a database > to > > put data into and get it back out without controlling some of how it does > > those things. > > > > > >> > We Index into Solr and use the Solr indices > >> Ignite 1-2 has poor support for TEXT queries, which is totally > >> unconfigurable. > >> Also, the Lucene indices underneath are NOT persistent, which requires too > much > >> effort to fix. > >> The GeoSpatial index has the same issues, so we decided to drop them along with the > >> Indexing SPI entirely. > >> > >> However, you can find the activity on the dev-list on the Index Query topic. > >> Guys are going to add IndexQuery (a scan query over the sorted index > which > >> can use simple conditions) in Ignite 2. > >> We also plan to have the same functionality, and maybe it is possible to add > >> full-text search support here. > >> Will it work for you, what do you think? > >> > > Yes, we originally looked at text queries and almost immediately said no. > > Nothing about it was useful for us other than the Lucene dependency in > > Java. In the end that also became an issue because we wanted a newer Lucene > > version. > > IndexQuery will be useful - we'll certainly use it but it's not enough. I > > think we customise and depend on Solr too much for IndexQuery to compare, > > but it will help in some cases for simpler queries. > > > >> > >> > >> > 4. Will the unified storage model enable different versions of > Ignite > >> to > >> > be in the cluster when persistence is enabled so that rolling > restarts > >> can > >> > be done? > >> I'm not sure a rolling upgrade (RU) will be available because too many > >> compatibility issues would need to be resolved > >> to make RU possible under load without downtime. 
> >> > >> Maybe it makes sense to provide some grid mode (maintenance mode) for RU > >> purposes that will block all the user load > >> but allow upgrading the grid. E.g. for the pure in-memory case. > >> > >> Persistence compatibility should be preserved as it works for Ignite 2. > >> > > My ideal situation would be that we start a newer Ignite version, it > comes > > online, joins the cluster and is treated as being in some kind of maintenance mode > > as you suggested. In maintenance mode, the other nodes re-balance or run some > > other process to send all the data this new node will handle over to it. > > The existing nodes continue serving this data until the new node is no > > longer in maintenance mode and then it becomes the primary for the data > > that was rebalanced to it. > > > > The second case is if an existing node is restarted with a newer Ignite > > version. No re-balance is needed; it joins in maintenance mode, runs any > > upgrade/conversion or other task it needs to and then starts accepting > > reads and writes. Communication with lower-version nodes can be limited; > > they are aware of it and send it data and queries for which it is the > > primary, assuming they will also be upgraded. > > > > I guess I'm not aware of the compatibility issues this presents and so my > > view is narrow and perhaps naive here. > > > >> > >> > >> > 5. Will it be possible to provide a custom cache store still and > will > >> > these changes enable custom cache stores to be queryable from SQL? > >> I'm not sure I fully understand this. > >> 1. Usually, SQL is about indices. Ignite can't perform a query over > >> unindexed data. > >> > > Yes, understood. > > > >> > >> 2. A full scan over the cache that contains only part of the data, plus a scan of the > >> CacheStore, then merging the results, is a pain. > >> Most likely, running a query over the CacheStore directly will be a simpler > >> way, and even more performant. 
> >> Shared CacheStore (same for all nodes) will definitely kill the > >> performance > >> in that case. > >> So, the preliminary loadCache() call looks like a good compromise. > >> > > I think the problem is largely that the CacheStore interface is not > > sufficient for being able to do this. If it had a richer interface which > > allowed the cache store to answer index queries, basically hooking into > > whatever Ignite's doing for its B+tree, then this would be viable. A > > CacheStore that only implements the KV API doesn't take part in SQL queries. > > > >> > >> 3. Splitting a query into 2 parts to run on Ignite and to run on the > CacheStore > >> looks possible with Calcite, > >> but I think it's impractical because in general, neither the CacheStore nor the > >> database structure is aware of the data partitioning. > >> > > hmmm, maybe I missed the point, but as the implementor of the CacheStore > > you should have knowledge of the structure and partition info, or have > some > > way of retrieving it. Again, I think the current CacheStore interface is > > the problem and if it was extended to provide this information then it's > up > > to the implementation to do this, whilst Ignite knows that any > > implementation of these interfaces will meet the contract necessary. > > > > > >> > >> 4. Transactions can't be supported in the case of direct CacheStore access, > >> because even if the underlying database supports 2-phase commit, which > is > >> a > >> rare case, the recovery protocol looks hard. > >> It just looks like this feature isn't worth it. > >> > > I'd completely agree with this. It will be incredibly hard to get this > > done reliably. > > > >> > >> > >> > 6. This question wasn't mine but I was going to ask it as well: What > >> > will happen to the Indexing API since H2 is being removed? > >> As I wrote above, the Indexing SPI will be dropped, but IndexQuery will be > >> added. > >> > >> > 1. 
As I mentioned above, we Index into Solr; in earlier versions of > >> > our product we used the indexing SPI to index into Lucene on the > >> Ignite > >> > nodes, but this presented so many challenges we ultimately > abandoned > >> it and > >> > replaced it with the current Solr solution. > >> AFAIK, some guys developed and sell a plugin for Ignite-2 with > persistent > >> Lucene and Geo indices. > >> I don't know about the capabilities and limitations of their solution, > >> because the code is closed. > >> You can easily google it. > >> > >> I saw a few enthusiastic guys who want to improve TEXT queries, > >> but unfortunately, things haven't moved far enough. For now, they are in > >> the middle of fixing the merging of TEXT query results. > >> So far so good. > >> > >> I think it is a good chance to master the skill of developing a > >> distributed > >> system for whoever > >> takes the lead on the full-text search feature and adds native > >> FullText index support into Ignite-3. > >> > > I've seen the other thread, from Atri I believe, about this. > > > >> > >> > >> > 7. What impact does RAFT now have on conflict resolution? > >> RAFT is a state machine replication protocol. It guarantees all the > nodes > >> will see the updates in the same order. > >> So, it seems no conflicts are possible. Recovery from split-brain is > >> impossible in the common case. > >> > >> However, I think we have a conflict resolver analog in Ignite-3 as it is > >> very useful in some cases, > >> e.g. datacenter replication, incremental data load from a third-party > source, > >> recovery from a third-party source. > >> > >> > >> > 8. CacheGroups. > >> AFAIK, CacheGroup will be eliminated; actually, we'll keep this > mechanic, > >> but it will be configured in a different way, > >> which makes configuring Ignite a bit simpler. > >> Sorry, for now, I have no answer on your performance concerns, this part > >> of > >> Ignite-3 slipped from my radar. > >> > > No worries. 
I'll wait and see if anyone else suggests something. It's > > getting a lot worse; a node took 1hr to start yesterday after a > deployment > > and it's in prod with very little visibility into what it is doing; it was > > just stopped, no logging or anything, and then resumed. > > > > 2021-07-22 13:40:15.997 INFO [ArcOS,,,] 9 --- [orker-#40%hypi%] > > o.a.i.i.p.cache.GridCacheProcessor [285] : Finished recovery for > > cache [cache=hypi_01F8ZC3DGT66RNYCDZH3XNVY2E_Hue, grp=hypi, > > startVer=AffinityTopologyVersion [topVer=79, minorTopVer=0]] > > > > One hour later it printed the next cache recovery message and started 30 > > seconds after going through the other tables. > > > > > > > >> > >> Let's wait and see if someone clarifies what we can expect in Ignite-3. > >> Guys, can someone chime in and shed more light on questions 3, 4, 7 and 8? > >> > >> > >> On Thu, Jul 22, 2021 at 4:15 AM Courtney Robinson < > >> courtney.robin...@hypi.io> > >> wrote: > >> > >> > Hey everyone, > >> > I attended the Alpha 2 update yesterday and was quite pleased to see > the > >> > progress on things so far. So first, congratulations to everyone on > the > >> > work being put in and thank you to Val and Kseniya for running > >> yesterday's > >> > event. > >> > > >> > I asked a few questions after the webinar which Val had some answers > to > >> but > >> > suggested posting here as some of them are not things that have been > >> > thought about yet or no plans exist around them at this point. > >> > > >> > I'll put all of them here and if necessary we can break into different > >> > threads after. > >> > > >> > 1. Schema change - does that include the ability to change the > types > >> of > >> > fields/columns? > >> > 1. Val's answer was yes with some limitations but those are not > >> well > >> > defined yet. 
He did mention that some kind of > >> > transformer > >> > could be provided for doing the conversion and I would second > >> this; > >> > even > >> > for common types like int to long, being able to do a custom > >> > conversion will > >> > be immensely valuable. > >> > 2. Will the new guaranteed consistency between APIs also mean SQL > >> will > >> > gain transaction support? > >> > 1. I believe the answer here was yes but perhaps someone else > may > >> > want to weigh in to confirm > >> > 3. Has there been any decision about how much of Calcite will be > >> exposed > >> > to the client? When using thick clients, it'll be hugely beneficial > >> to > >> > be > >> > able to work with Calcite APIs directly to provide custom rules and > >> > optimisations to better suit organisation needs > >> > 1. We currently use Calcite ourselves and have a lot of custom > rules > >> and > >> > optimisations and have slowly pushed more of our queries to > >> > Calcite that we > >> > then push down to Ignite. > >> > 2. We Index into Solr and use the Solr indices and others to > >> > fulfill overall queries with Ignite just being one of the > >> > possible storage > >> > targets Calcite pushes down to. If we could get to the Calcite > >> > API from an > >> > Ignite thick client, it would enable us to remove a layer of > >> > abstraction > >> > and complexity and make Ignite our primary store that we then link > >> > with Solr and > >> > others to fulfill queries. > >> > 4. Will the unified storage model enable different versions of > >> Ignite to > >> > be in the cluster when persistence is enabled so that rolling > >> restarts > >> > can > >> > be done? > >> > 1. We have to do a strange dance to perform Ignite upgrades without > >> > downtime because pods/nodes will fail to start on version > mismatch, > >> > and if > >> > we get that dance wrong, we will corrupt a node's data. 
It will > >> make > >> > admin/upgrades far less brittle and error prone if this was > >> possible. > >> > 5. Will it be possible to provide a custom cache store still and > will > >> > these changes enable custom cache stores to be queryable from SQL? > >> > 1. Our Ignite usage is wide and complex because we use KV, SQL and > >> other > >> > APIs. The inconsistency of what can and can't be used from one > >> API to > >> > another is a real challenge and has forced us over time to stick > >> > to one API > >> > and write alternative solutions outside of Ignite. It will > >> > drastically > >> > simplify things if any CacheStore (or some new equivalent) could > >> > be plugged > >> > in and be made accessible to SQL (and in fact all other APIs) > >> without > >> > having to load all the data from the underlying CacheStore first > >> > into memory > >> > 6. This question wasn't mine but I was going to ask it as well: > What > >> > will happen to the Indexing API since H2 is being removed? > >> > 1. As I mentioned above, we Index into Solr, in earlier versions > >> of > >> > our product we used the indexing SPI to index into Lucene on the > >> > Ignite > >> > nodes but this presented so many challenges we ultimately > >> > abandoned it and > >> > replaced it with the current Solr solution. > >> > 2. Lucene indexing was ideal because it meant we didn't have to > >> > re-invent Solr or Elasticsearch's sharding capabilities, that > was > >> > almost > >> > automatic with Ignite only giving you the data that was meant > for > >> the > >> > current node. > >> > 3. The Lucene API enabled more flexibility and removed a network > >> > round trip from our queries. > >> > 4. Given Calcite's ability to support custom SQL functions, I'd > >> love > >> > to have the ability to define custom functions that Lucene was > >> > answering > >> > 7. What impact does RAFT now have on conflict resolution, off the > >> top of > >> > my head there are two cases > >> > 1. 
On startup after a split-brain Ignite currently takes an > >> "exercise > >> > for the reader" approach and dumps a log along the lines of > >> > > >> > > 1. BaselineTopology of joining node is not compatible with > >> > > BaselineTopology in the cluster. > >> > > 1. Branching history of cluster BlT doesn't contain branching > point > >> > > hash of joining node BlT. Consider cleaning persistent storage > >> of > >> > the node > >> > > and adding it to the cluster again. > >> > > > >> > 1. This leaves you with no choice except to take one half, > >> manually > >> > copy data back over to the other half, then destroy the > bad > >> > one. > >> > 2. The second case is conflicts on keys; I > >> > believe CacheVersionConflictResolver and manager are used > >> > by GridCacheMapEntry, which just says: in one case use the old value, > >> > otherwise use > >> > newVal. Ideally this will be exposed in the new API so that one > >> can > >> > override this behaviour. The last-writer-wins approach isn't > >> always > >> > ideal, > >> > and the semantics of the domain can mean that what is considered > >> > "correct" in > >> > a conflict is not so for a different domain. > >> > 8. This is last on the list but is actually the most important for > us > >> > right now as it is an impending and growing risk. We allow > customers > >> to > >> > create their own tables on demand. We're already using the same > cache > >> > group > >> > etc for data structures to be re-used, but now that we're getting to > >> > thousands of tables/caches our startup times are sometimes > >> unpredictably > >> > long - at present it seems to depend on the state of the > cache/table > >> > before > >> > the restart, but we're into the order of 5-7 mins and steadily > >> > increasing > >> > with the growth of tables. Are there any provisions in Ignite 3 for > >> > ensuring startup time isn't proportional to the number of > >> tables/caches > >> > available? 
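The overridable conflict resolution asked for in question 7 might look something like this. The interface and names below are invented for illustration; this is not Ignite's actual CacheVersionConflictResolver API, only a demonstration of why last-writer-wins is one policy among several:

```java
import java.util.function.BinaryOperator;

// Toy model of a pluggable conflict resolver: the same pair of conflicting
// writes, resolved under two different domain policies.
public class ConflictResolverSketch {
    record Versioned(long timestamp, int value) {}

    // Default policy: last writer wins, as described in the thread.
    public static final BinaryOperator<Versioned> LWW =
            (a, b) -> a.timestamp() >= b.timestamp() ? a : b;

    // Domain-specific policy: e.g. a high-water-mark counter where the larger
    // value is "correct" regardless of which datacenter wrote last.
    public static final BinaryOperator<Versioned> MAX_VALUE =
            (a, b) -> a.value() >= b.value() ? a : b;

    public static int resolve(BinaryOperator<Versioned> policy) {
        Versioned fromDcA = new Versioned(100, 7); // older write, larger value
        Versioned fromDcB = new Versioned(200, 3); // newer write, smaller value
        return policy.apply(fromDcA, fromDcB).value();
    }
}
```

The two policies pick opposite winners for the same conflict, which is exactly the point being made: "correct" depends on the domain, so the resolver needs to be a user-supplied plug-in rather than a fixed rule.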
> >> > > >> > > >> > Those are the key things I can think of at the moment. Val and others, > >> I'd > >> > love to open a conversation around these. > >> > > >> > Regards, > >> > Courtney Robinson > >> > Founder and CEO, Hypi > >> > Tel: +44 208 123 2413 (GMT+0) <https://hypi.io> > >> > > >> > <https://hypi.io> > >> > https://hypi.io > >> > > >> > >> -- > >> Best regards, > >> Andrey V. Mashenkov > >> > > >