Something else came to mind, are there plans to support prepared queries? I recall someone saying before that Ignite does internally cache queries but it's not at all clear if or how it does do that. I assume a simple hash of the query isn't enough.
We generate SQL queries based on user runtime settings and they can get to hundreds of lines long, I imagine this means most of our queries are not being cached but there are patterns so we could generate and manage prepared queries ourselves. Also, will there be a dedicated API for doing SQL queries rather than having to pass a SqlFieldsQuery to a cache that has nothing to do with the cache being queried? When I first started with Ignite years ago, this was beyond confusing for me. I'm trying to run select x from B but I pass this to a cache called DUMMY or whatever arbitrary name... On Fri, Jul 23, 2021 at 4:05 PM Courtney Robinson <courtney.robin...@hypi.io> wrote: > Andrey, > Thanks for the response - see my comments inline. > > >> I've gone through the questions and have no the whole picture of your use >> case. > > Would you please clarify how you exactly use the Ignite? what are the >> integration points? >> > > I'll try to clarify - we have a low/no code platform. A user designs a > model for their application and we map this model to Ignite tables and > other data sources. The model I'll describe is what we're building now and > expected to be in alpha some time in Q4 21. Our current production > architecture is different and isn't as generic, it is heavily tied to > Ignite and we've redesigned to get some flexibility where Ignite doesn't > provide what we want. Things like window functions and other SQL-99 limits. > > In the next gen version we're working on you can create a model for a > Tweet(content, to) and we will create an Ignite table with content and to > columns using the type the user selects. This is the simplest case. > We are adding generic support for sources and sinks and using Calcite as a > data virtualisation layer. Ignite is one of the available source/sinks. > > When a user creates a model for Tweet, we also allow them to specify how > they want to index the data. We have a copy of the calcite Elasticsearch > adapter modified for Solr. > > When a source is queried (Ignite or any other that we support), we > generate SQL that Calcite executes. Calcite will push down the generated > queries to Solr and Solr produces a list of IDs (in case of Ignite) and we > do a multi-get from Ignite to produce the actual results. > > Obviously there's a lot more to this but that should give you a general > idea. > > and maybe share some experience with using Ignite SPIs? >> > Our evolution with Ignite started from the key value + compute APIs. We > used the SPIs then but have since moved to using only the Ignite SQL API > (we gave up transactions for this). > > We originally used the indexing SPI to keep our own lucene index of data > in a cache. We did not use the Ignite FTS as it is very limited compared to > what we allow customers to do. If I remember correctly, we were using an > affinity compute job to send queries to the right Ignite node and > then doing a multi-get to pull the data from caches. > I think we used one or two other SPIs and we found them very useful to be > able to extend and customise Ignite without having to fork/change upstream > classes. We only stopped using them because we eventually concluded that > using the SQL only API was better for numerous reasons. > > >> We'll keep the information in mind while developing the Ignite, >> because this may help us to make a better product. >> >> By the way, I'll try to answer the questions. >> >> > 1. Schema change - does that include the ability to change the types >> of >> > fields/columns? >> Yes, we plan to support transparent conversion to a wider type on-fly >> (e.g. >> 'int' to 'long'). >> This is a major point of our Live-schema concept. >> In fact, there is no need to convert data on all the nodes in a >> synchronous >> way as old SQL databases do (if one supports though), >> we are going to support multiple schema versions and convert data >> on-demand >> on a per-row basis to the latest version, >> then write-back the row. >> > > I can understand. The auto conversion to wider type makes sense. > >> >> More complex things like 'String' -> 'int' are out of scope for now >> because >> it requires the execution of a user code on the critical path. >> > > I would argue though that executing user code on the critical path > shouldn't be a blocker for custom conversions. I feel if a user is making > an advance enough integration to provide custom conversions they would be > aware that it impacts the system as a whole. > > The limitation here is column MUST NOT be indexed, because an index over >> the data of different kinds is impossible. >> > Understood - I'd make the case that indexing should be pluggable. I would > love for us to be able to take indexing away from Ignite in our impl. - I > think in Calcite, the Postgres adapter does this by having a table whose > type is "Index". The implementor would be left with the freedom to choose > how that table answers index lookups. From Ignite's perspective it wouldn't > care so long as the interface's contract is met, I could use an index that > does a lucene, ES, Solr or Redis lookup and the end result would be the > same but as the implementor I'm choosing the tradeoff I want to meet the > organisation's goals. > > >> >> > 2. Will the new guaranteed consistency between APIs also mean SQL will >> > gain transaction support? >> Yes, we plan to have Transactional SQL. >> DDL will be non-transactional though, and I wonder if the one supports >> this. >> > I'm not sure I know of any thing that supports transactional DDL so don't > think this is an issue but I would say that a DDL statement in a > transaction shouldn't fail the transaction. I believe in Ignite 2 there is > a flag to turn this on or off, we should definitely keep this. In our > case, it's an issue with the nature of the platform we provide, at > development time only about 10% of schema or other DB info is known - we > generate the other 90% on the fly based on whatever customers decide to > design from our UI. > >> >> Ignite 3 will operate with Rows underneath, but classic Table API and >> Key-value will be available to a user >> at the same time and with all consistency guarantees. > > Excellent! > >> >> > >> > 3. Has there been any decision about how much of Calcite will be >> exposed >> > to the client? When using thick clients, it'll be hugely beneficial to >> be >> > able to work with Calcite APIs directly to provide custom rules and >> > optimizations to better suit organization needs >> As of now, we have no plans to expose any Calcite API to a user. >> AFAIK, we have our custom Calcite convention, custom rules that are aware >> of distributed environment, >> and additional AST nodes. The rules MUST correctly propagate internal >> information about data distribution, >> so I'm not sure want to give low-level access to them. >> > > Maybe we're an edge case but for us access to the Calcite APIs would be > shift our current development somewhat. For us, we're treating Ignite as a > library that provides a good foundation and we extend and customise it. > Again, we may be an edge case and maybe most people just want a database to > put data into and get it back out without controlling some of how it does > those things. > > >> > We Index into Solr and use the Solr indices >> Ignite 1-2 has poor support for TEXT queries, which is totally >> unconfigurable. >> Also, Lucene indices underneath are NOT persistent that requires too much >> effort to fix it. >> GeoSpatial index has the same issues, we decided to drop them along with >> Indexing SPI at all. >> >> However, you can find the activity on dev-list on the Index Query topic. >> Guys are going to add IndexQuery (a scan query over the sorted index which >> can use simple conditions) in Ignite 2. >> We also plan to have the same functionality, maybe it is possible to add >> full-text search support here. >> Will it work for you, what do you think? >> > Yes, we originally looked at text queries and almost immediately said no. > Nothing about it was useful for us other than the lucene dependency in > Java. In the end that also became an issue because we wanted a newer lucene > version. > IndexQuery will be useful - we'll certainly use it but it's not enough. I > think we customise and depend on Solr too much for IndexQuery to compare > but it will help in some cases for simpler queries. > >> >> >> > 4. Will the unified storage model enable different versions of Ignite >> to >> > be in the cluster when persistence is enabled so that rolling restarts >> can >> > be done? >> I'm not sure a rolling upgrade (RU) will be available because too much >> compatibility issues should be resolved >> to make RU possible under the load without downtime. >> >> Maybe it makes sense to provide some grid mode (maintenance mode) for RU >> purposes that will block all the user load >> but allow upgrade the grid. E.g. for the pure in-memory case. >> >> Persistence compatibility should be preserved as it works for Ignite 2. >> > My ideal situation would be that we start a newer Ignite version, it comes > online, joins the cluster and is treated as some kind of maintenance mode > as you suggested. In maintenance mode, the other nodes re-balance or some > other process to send all the data this new node will handle over to it. > The existing nodes continue serving this data until the new node is no > longer in maintenance mode and then it becomes the primary for the data > that was rebalanced to it. > > The second case is if an existing node is restarted with a newer Ignite > version. No re-balance is needed, it joins in maintenance mode, runs any > upgrade/conversion or other task it needs to and then starts accepting > reads and writes. Communication with lower version nodes can be limited, > they are aware of it and sends it data and queries for which it is the > primary assuming they will also be upgraded. > > I guess I'm not aware of the compatibility issues this presents and so my > view is narrow and perhaps naive here. > >> >> >> > 5. Will it be possible to provide a custom cache store still and will >> > these changes enable custom cache stores to be queryable from SQL? >> I'm not sure I fully understand this. >> 1. Usually, SQL is about indices. Ignite can't perform a query over the >> unindexed data. >> > Yes understood > >> >> 2. Fullscan over the cache that contains only part of data + scan the >> CacheStore, then merging the results is a pain. >> Most likely, running a query over CacheStore directly will be a simpler >> way, and even more performant. >> Shared CacheStore (same for all nodes) will definitely kill the >> performance >> in that case. >> So, the preliminary loadCache() call looks like a good compromise. >> > I think the problem is largely that the CacheStore interface is not > sufficient for being able to do this. If it had a richer interface which > allowed the cache store to answer index queries basically hooking into > whatever Ignite's doing for its B+tree then this would be viable. A > CacheStore that only implements KV API doesn't take part in SQL queries. > >> >> 3. Splitting query into 2 parts to run on Ignite and to run on CacheStore >> looks possible with Calcite, >> but I think it impractical because in general, neither CacheStore nor >> database structure are aware of the data partitioning. >> > hmmm, maybe I missed the point but as the implementor of the CacheStore > you should have knowledge of the structure and partition info. or have some > way of retrieving it. Again, I think the current CacheStore interface is > the problem and if it was extended to provide this information then its up > to the implementation to do this whilst Ignite knows that any > implementation of these interfaces will meet the contract necessary. > > >> >> 4. Transactions can't be supported in case of direct CacheStore access, >> because even if the underlying database supports 2-phase commit, which is >> a >> rare case, the recovery protocol looks hard. >> Just looks like this feature doesn't worth it. >> > I'd completely agree with this. It will be incredibly hard to get this > done reliably > >> >> >> > 6. This question wasn't mine but I was going to ask it as well: What >> > will happen to the Indexing API since H2 is being removed? >> As I wrote above, Indexing SPI will be dropped, but IndexQuery will be >> added. >> >> > 1. As I mentioned above, we Index into Solr, in earlier versions of >> > our product we used the indexing SPI to index into Lucene on the >> Ignite >> > nodes but this presented so many challenges we ultimately abandoned >> it and >> > replaced it with the current Solr solution. >> AFAIK, some guys developed and sell a plugin for Ignite-2 with persistent >> Lucene and Geo indices. >> I don't know about the capabilities and limitations of their solution, >> because of closed code. >> You can easily google it. >> >> I saw few encouraged guys who want to improve TEXT queries, >> but unfortunately, things weren't moved far enough. For now, they are in >> the middle of fixing the merging TEXT query results. >> So far so good. >> >> I think it is a good chance to master the skill developing of a >> distributed >> system for the one >> who will take a lead over the full-text search feature and add native >> FullText index support into Ignite-3. >> > I've seen the other thread from Atri I believe about this. > >> >> >> > 7. What impact does RAFT now have on conflict resolution? >> RAFT is a state machine replication protocol. It guarantees all the nodes >> will see the updates in the same order. >> So, seems no conflicts are possible. Recovery from split-brain is >> impossible in common-case. >> >> However, I think we have a conflict resolver analog in Ignite-3 as it is >> very useful in some cases >> e.g datacenter replication, incremental data load from 3-rd party source, >> recovery from 3-rd party source. >> >> >> > 8. CacheGroups. >> AFAIK, CacheGroup will be eliminated, actually, we'll keep this mechanic, >> but it will be configured in a different way, >> which makes Ignite configuring a bit simpler. >> Sorry, for now, I have no answer on your performance concerns, this part >> of >> Ignite-3 slipped from my radar. >> > No worries. I'll wait and see if anyone else suggests something. Its > getting a lot worse, a node took 1hr to start yesterday after a deployment > and its in prod with very little visibility into what it is doing, it was > just stopped, no logging or anything and then resumed. > > 2021-07-22 13:40:15.997 INFO [ArcOS,,,] 9 --- [orker-#40%hypi%] > o.a.i.i.p.cache.GridCacheProcessor [285] : Finished recovery for > cache [cache=hypi_01F8ZC3DGT66RNYCDZH3XNVY2E_Hue, grp=hypi, > startVer=AffinityTopologyVersion [topVer=79, minorTopVer=0]] > > One hour later it printed the next cache recovery message and started 30 > seconds after going through other tables. > > > >> >> Let's wait if someone will clarify what we could expect in Ignite-3. >> Guys, can someone chime in and give more light on 3,4,7,8 questions? >> >> >> On Thu, Jul 22, 2021 at 4:15 AM Courtney Robinson < >> courtney.robin...@hypi.io> >> wrote: >> >> > Hey everyone, >> > I attended the Alpha 2 update yesterday and was quite pleased to see the >> > progress on things so far. So first, congratulations to everyone on the >> > work being put in and thank you to Val and Kseniya for running >> yesterday's >> > event. >> > >> > I asked a few questions after the webinar which Val had some answers to >> but >> > suggested posting here as some of them are not things that have been >> > thought about yet or no plans exist around it at this point. >> > >> > I'll put all of them here and if necessary we can break into different >> > threads after. >> > >> > 1. Schema change - does that include the ability to change the types >> of >> > fields/columns? >> > 1. Val's answer was yes with some limitations but those are not >> well >> > defined yet. He did mention that something like some kind of >> > transformer >> > could be provided for doing the conversion and I would second >> this, >> > even >> > for common types like int to long being able to do a custom >> > conversion will >> > be immensely valuable. >> > 2. Will the new guaranteed consistency between APIs also mean SQL >> will >> > gain transaction support? >> > 1. I believe the answer here was yes but perhaps someone else may >> > want to weigh in to confirm >> > 3. Has there been any decision about how much of Calcite will be >> exposed >> > to the client? When using thick clients, it'll be hugely beneficial >> to >> > be >> > able to work with Calcite APIs directly to provide custom rules and >> > optimisations to better suit organisation needs >> > 1. We currently use Calcite ourselves and have a lot of custom rules >> and >> > optimisations and have slowly pushed more of our queries to >> > Calcite that we >> > then push down to Ignite. >> > 2. We Index into Solr and use the Solr indices and others to >> > fulfill over all queries with Ignite just being one of the >> > possible storage >> > targets Calcite pushes down to. If we could get to the calcite >> > API from an >> > Ignite thick client, it would enable us to remove a layer of >> > abstraction >> > and complexity and make Ignite our primary that we then link >> > with Solr and >> > others to fulfill queries. >> > 4. Will the unified storage model enable different versions of >> Ignite to >> > be in the cluster when persistence is enabled so that rolling >> restarts >> > can >> > be done? >> > 1. We have to do a strange dance to perform Ignite upgrades without >> > downtime because pods/nodes will fail to start on version mismatch >> > and if >> > we get that dance wrong, we will corrupt a node's data. It will >> make >> > admin/upgrades far less brittle and error prone if this was >> possible. >> > 5. Will it be possible to provide a custom cache store still and will >> > these changes enable custom cache stores to be queryable from SQL? >> > 1. Our Ignite usage is wide and complex because we use KV, SQL and >> other >> > APIs. The inconsistency of what can and can't be used from one >> API to >> > another is a real challenge and has forced us over time to stick >> > to one API >> > and write alternative solutions outside of Ignite. It will >> > drastically >> > simplify things if any CacheStore (or some new equivalent) could >> > be plugged >> > in and be made accessible to SQL (and in fact all other APIs) >> without >> > having to load all the data from the underlying CacheStore first >> > into memory >> > 6. This question wasn't mine but I was going to ask it as well: What >> > will happen to the Indexing API since H2 is being removed? >> > 1. As I mentioned above, we Index into Solr, in earlier versions >> of >> > our product we used the indexing SPI to index into Lucene on the >> > Ignite >> > nodes but this presented so many challenges we ultimately >> > abandoned it and >> > replaced it with the current Solr solution. >> > 2. Lucene indexing was ideal because it meant we didn't have to >> > re-invent Solr or Elasticsearch's sharding capabilities, that was >> > almost >> > automatic with Ignite only giving you the data that was meant for >> the >> > current node. >> > 3. The Lucene API enabled more flexibility and removed a network >> > round trip from our queries. >> > 4. Given Calcite's ability to support custom SQL functions, I'd >> love >> > to have the ability to define custom functions that Lucene was >> > answering >> > 7. What impact does RAFT now have on conflict resolution, off the >> top of >> > my head there are two cases >> > 1. On startup after a split brain Ignite currently takes an >> "exercise >> > for the reader" approach and dumps a log along the lines of >> > >> > > 1. BaselineTopology of joining node is not compatible with >> > > BaselineTopology in the cluster. >> > > 1. Branching history of cluster BlT doesn't contain branching point >> > > hash of joining node BlT. Consider cleaning persistent storage >> of >> > the node >> > > and adding it to the cluster again. >> > > >> > 1. This leaves you with no choice except to take one half and >> manually >> > copy, write data back over to the other half then destroy the bad >> > one. >> > 2. The second case is conflicts on keys, I >> > beleive CacheVersionConflictResolver and manager are used >> > by GridCacheMapEntry which just says if use old value do this >> > otherwise use >> > newVal. Ideally this will be exposed in the new API so that one >> can >> > override this behaviour. The last writer wins approach isn't >> always >> > ideal >> > and the semantics of the domain can mean that what is consider >> > "correct" in >> > a conflict is not so for a different domain. >> > 8. This is last on the list but is actually the most important for us >> > right now as it is an impending and growing risk. We allow customers >> to >> > create their own tables on demand. We're already using the same cache >> > group >> > etc for data structures to be re-used but now that we're getting to >> > thousands of tables/caches our startup times are sometimes >> unpredictably >> > long - at present it seems to depend on the state of the cache/table >> > before >> > the restart but we're into the order of 5 - 7 mins and steadily >> > increasing >> > with the growth of tables. Are there any provisions in Ignite 3 for >> > ensuring startup time isn't proportional to the number of >> tables/caches >> > available? >> > >> > >> > Those are the key things I can think of at the moment. Val and others >> I'd >> > love to open a conversation around these. >> > >> > Regards, >> > Courtney Robinson >> > Founder and CEO, Hypi >> > Tel: ++44 208 123 2413 (GMT+0) <https://hypi.io> >> > >> > <https://hypi.io> >> > https://hypi.io >> > >> >> >> -- >> Best regards, >> Andrey V. Mashenkov >> >