Courtney,

I have a short update.

> 3. The Lucene API enabled more flexibility and removed a network
> round trip from our queries.
> 4. Given Calcite's ability to support custom SQL functions, I'd love
> to have the ability to define custom functions that Lucene was answering

As we won't support the IndexingSPI extension point anymore, ContinuousQuery
seems like it might be a good choice to keep the local Lucene index in sync
with the data.

SQL functions will be supported. If it becomes possible to do something like
"Select * From T1 Where T1.key in (TextQueryFunction("text_query_string"))",
where "TextQueryFunction" is a custom SQL function that can access some
static method, a local Ignite service, or whatever, will this work for you?

On Fri, Jul 23, 2021 at 1:10 AM Andrey Mashenkov <andrey.mashen...@gmail.com> wrote:

> Hi Courtney,
>
> Thanks for your feedback.
>
> I've gone through the questions but don't have the whole picture of your
> use case.
> Would you please clarify how exactly you use Ignite? What are the
> integration points?
> And maybe share some experience with using Ignite SPIs?
>
> We'll keep the information in mind while developing Ignite,
> because this may help us to make a better product.
>
> By the way, I'll try to answer the questions.
>
> > 1. Schema change - does that include the ability to change the types of
> > fields/columns?
> Yes, we plan to support transparent conversion to a wider type on the fly
> (e.g. 'int' to 'long').
> This is a major point of our Live-schema concept.
> In fact, there is no need to convert data on all the nodes in a
> synchronous way as old SQL databases do (if they support it at all);
> we are going to support multiple schema versions and convert data
> on demand on a per-row basis to the latest version,
> then write back the row.
>
> More complex things like 'String' -> 'int' are out of scope for now
> because they require the execution of user code on the critical path.
> The limitation here is that the column MUST NOT be indexed, because an
> index over data of different kinds is impossible.
>
> > 2. Will the new guaranteed consistency between APIs also mean SQL will
> > gain transaction support?
> Yes, we plan to have Transactional SQL.
> DDL will be non-transactional though, and I wonder if anyone supports
> this.
>
> Ignite 3 will operate with Rows underneath, but the classic Table API and
> Key-value API will be available to a user
> at the same time and with all consistency guarantees.
>
> > 3. Has there been any decision about how much of Calcite will be exposed
> > to the client? When using thick clients, it'll be hugely beneficial to be
> > able to work with Calcite APIs directly to provide custom rules and
> > optimizations to better suit organization needs
> As of now, we have no plans to expose any Calcite API to a user.
> AFAIK, we have our custom Calcite convention, custom rules that are aware
> of the distributed environment, and additional AST nodes.
> The rules MUST correctly propagate internal information about data
> distribution, so I'm not sure we want to give low-level access to them.
>
> > We Index into Solr and use the Solr indices
> Ignite 1-2 has poor support for TEXT queries, which is totally
> unconfigurable.
> Also, the Lucene indices underneath are NOT persistent, and fixing that
> requires too much effort.
> The GeoSpatial index has the same issues, so we decided to drop them
> along with the Indexing SPI altogether.
>
> However, you can find the activity on the dev list on the Index Query topic.
> Guys are going to add IndexQuery (a scan query over the sorted index which
> can use simple conditions) in Ignite 2.
> We also plan to have the same functionality; maybe it is possible to add
> full-text search support here.
> Will it work for you, what do you think?
>
> > 4. Will the unified storage model enable different versions of Ignite to
> > be in the cluster when persistence is enabled so that rolling restarts can
> > be done?
> I'm not sure a rolling upgrade (RU) will be available, because too many
> compatibility issues would have to be resolved
> to make RU possible under load without downtime.
>
> Maybe it makes sense to provide some grid mode (maintenance mode) for RU
> purposes that will block all the user load
> but allow upgrading the grid, e.g. for the pure in-memory case.
>
> Persistence compatibility should be preserved as it works for Ignite 2.
>
> > 5. Will it be possible to provide a custom cache store still and will
> > these changes enable custom cache stores to be queryable from SQL?
> I'm not sure I fully understand this.
> 1. Usually, SQL is about indices. Ignite can't perform a query over
> unindexed data.
>
> 2. A full scan over a cache that contains only part of the data, plus a
> scan of the CacheStore, then merging the results, is a pain.
> Most likely, running a query over the CacheStore directly will be a simpler
> way, and even more performant.
> A shared CacheStore (same for all nodes) will definitely kill the
> performance in that case.
> So, a preliminary loadCache() call looks like a good compromise.
>
> 3. Splitting a query into 2 parts, one to run on Ignite and one to run on
> the CacheStore, looks possible with Calcite,
> but I think it is impractical because, in general, neither the CacheStore
> nor the database structure is aware of the data partitioning.
>
> 4. Transactions can't be supported in the case of direct CacheStore access,
> because even if the underlying database supports 2-phase commit, which is
> a rare case, the recovery protocol looks hard.
> It just looks like this feature isn't worth it.
>
> > 6. This question wasn't mine but I was going to ask it as well: What
> > will happen to the Indexing API since H2 is being removed?
> As I wrote above, the Indexing SPI will be dropped, but IndexQuery will be
> added.
>
> > 1. As I mentioned above, we Index into Solr, in earlier versions of
> > our product we used the indexing SPI to index into Lucene on the Ignite
> > nodes but this presented so many challenges we ultimately abandoned it and
> > replaced it with the current Solr solution.
> AFAIK, some guys developed and sell a plugin for Ignite-2 with persistent
> Lucene and Geo indices.
> I don't know about the capabilities and limitations of their solution,
> because the code is closed.
> You can easily google it.
>
> I saw a few enthusiastic guys who want to improve TEXT queries,
> but unfortunately, things haven't moved far enough. For now, they are in
> the middle of fixing the merging of TEXT query results.
> So far so good.
>
> I think it is a good chance to master the skill of developing a
> distributed system for whoever takes the lead on the full-text search
> feature and adds native FullText index support to Ignite-3.
>
> > 7. What impact does RAFT now have on conflict resolution?
> RAFT is a state machine replication protocol. It guarantees all the nodes
> will see the updates in the same order.
> So, it seems no conflicts are possible. Recovery from split-brain is
> impossible in the common case.
>
> However, I think we have a conflict resolver analog in Ignite-3, as it is
> very useful in some cases,
> e.g. datacenter replication, incremental data load from a 3rd-party source,
> recovery from a 3rd-party source.
>
> > 8. CacheGroups.
> AFAIK, CacheGroup will be eliminated; actually, we'll keep this mechanic,
> but it will be configured in a different way,
> which makes configuring Ignite a bit simpler.
> Sorry, for now I have no answer to your performance concerns; this part
> of Ignite-3 slipped off my radar.
>
> Let's wait and see if someone can clarify what we could expect in Ignite-3.
> Guys, can someone chime in and shed more light on questions 3, 4, 7, and 8?
>
> On Thu, Jul 22, 2021 at 4:15 AM Courtney Robinson <courtney.robin...@hypi.io> wrote:
>
>> Hey everyone,
>> I attended the Alpha 2 update yesterday and was quite pleased to see the
>> progress on things so far. So first, congratulations to everyone on the
>> work being put in and thank you to Val and Kseniya for running yesterday's
>> event.
>>
>> I asked a few questions after the webinar which Val had some answers to but
>> suggested posting here as some of them are not things that have been
>> thought about yet or no plans exist around it at this point.
>>
>> I'll put all of them here and if necessary we can break into different
>> threads after.
>>
>> 1. Schema change - does that include the ability to change the types of
>> fields/columns?
>>    1. Val's answer was yes with some limitations but those are not well
>>    defined yet. He did mention that something like some kind of transformer
>>    could be provided for doing the conversion and I would second this, even
>>    for common types like int to long being able to do a custom conversion
>>    will be immensely valuable.
>> 2. Will the new guaranteed consistency between APIs also mean SQL will
>> gain transaction support?
>>    1. I believe the answer here was yes but perhaps someone else may
>>    want to weigh in to confirm
>> 3. Has there been any decision about how much of Calcite will be exposed
>> to the client? When using thick clients, it'll be hugely beneficial to be
>> able to work with Calcite APIs directly to provide custom rules and
>> optimisations to better suit organisation needs
>>    1. We currently use Calcite ourselves and have a lot of custom rules and
>>    optimisations and have slowly pushed more of our queries to Calcite that
>>    we then push down to Ignite.
>>    2. We Index into Solr and use the Solr indices and others to
>>    fulfill over all queries with Ignite just being one of the possible
>>    storage targets Calcite pushes down to. If we could get to the calcite
>>    API from an Ignite thick client, it would enable us to remove a layer of
>>    abstraction and complexity and make Ignite our primary that we then link
>>    with Solr and others to fulfill queries.
>> 4. Will the unified storage model enable different versions of Ignite to
>> be in the cluster when persistence is enabled so that rolling restarts can
>> be done?
>>    1. We have to do a strange dance to perform Ignite upgrades without
>>    downtime because pods/nodes will fail to start on version mismatch and
>>    if we get that dance wrong, we will corrupt a node's data. It will make
>>    admin/upgrades far less brittle and error prone if this was possible.
>> 5. Will it be possible to provide a custom cache store still and will
>> these changes enable custom cache stores to be queryable from SQL?
>>    1. Our Ignite usage is wide and complex because we use KV, SQL and other
>>    APIs. The inconsistency of what can and can't be used from one API to
>>    another is a real challenge and has forced us over time to stick to one
>>    API and write alternative solutions outside of Ignite. It will
>>    drastically simplify things if any CacheStore (or some new equivalent)
>>    could be plugged in and be made accessible to SQL (and in fact all other
>>    APIs) without having to load all the data from the underlying CacheStore
>>    first into memory
>> 6. This question wasn't mine but I was going to ask it as well: What
>> will happen to the Indexing API since H2 is being removed?
>>    1. As I mentioned above, we Index into Solr, in earlier versions of
>>    our product we used the indexing SPI to index into Lucene on the Ignite
>>    nodes but this presented so many challenges we ultimately abandoned it
>>    and replaced it with the current Solr solution.
>>    2. Lucene indexing was ideal because it meant we didn't have to
>>    re-invent Solr or Elasticsearch's sharding capabilities, that was almost
>>    automatic with Ignite only giving you the data that was meant for the
>>    current node.
>>    3. The Lucene API enabled more flexibility and removed a network
>>    round trip from our queries.
>>    4. Given Calcite's ability to support custom SQL functions, I'd love
>>    to have the ability to define custom functions that Lucene was answering
>> 7. What impact does RAFT now have on conflict resolution, off the top of
>> my head there are two cases
>>    1. On startup after a split brain Ignite currently takes an "exercise
>>    for the reader" approach and dumps a log along the lines of
>>
>> > 1. BaselineTopology of joining node is not compatible with
>> > BaselineTopology in the cluster.
>> > 1. Branching history of cluster BlT doesn't contain branching point
>> > hash of joining node BlT. Consider cleaning persistent storage of the
>> > node and adding it to the cluster again.
>> >
>>    1. This leaves you with no choice except to take one half and manually
>>    copy, write data back over to the other half then destroy the bad one.
>>    2. The second case is conflicts on keys, I believe
>>    CacheVersionConflictResolver and manager are used by GridCacheMapEntry
>>    which just says if use old value do this otherwise use newVal. Ideally
>>    this will be exposed in the new API so that one can override this
>>    behaviour. The last writer wins approach isn't always ideal and the
>>    semantics of the domain can mean that what is considered "correct" in
>>    a conflict is not so for a different domain.
>> 8. This is last on the list but is actually the most important for us
>> right now as it is an impending and growing risk. We allow customers to
>> create their own tables on demand. We're already using the same cache group
>> etc for data structures to be re-used but now that we're getting to
>> thousands of tables/caches our startup times are sometimes unpredictably
>> long - at present it seems to depend on the state of the cache/table before
>> the restart but we're into the order of 5 - 7 mins and steadily increasing
>> with the growth of tables. Are there any provisions in Ignite 3 for
>> ensuring startup time isn't proportional to the number of tables/caches
>> available?
>>
>>
>> Those are the key things I can think of at the moment. Val and others I'd
>> love to open a conversation around these.
>>
>> Regards,
>> Courtney Robinson
>> Founder and CEO, Hypi
>> Tel: ++44 208 123 2413 (GMT+0)
>> https://hypi.io
>>
>
>
> --
> Best regards,
> Andrey V. Mashenkov
>

--
Best regards,
Andrey V. Mashenkov
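
The suggestion at the top of this thread (keep a node-local text index in sync through update notifications, then expose it to SQL through something like "TextQueryFunction") can be sketched roughly as below. This is plain Python with no Ignite or Lucene calls. The names `LocalTextIndex`, `apply_update`, `text_query_function`, and `select_where_key_in` are hypothetical and exist only for illustration; the inverted index is a toy stand-in for Lucene, and `apply_update` only imitates what a ContinuousQuery listener might do. Treat it as a sketch of the idea under those assumptions, not as how Ignite does or will implement it.

```python
from collections import defaultdict


class LocalTextIndex:
    """Toy inverted index standing in for a node-local Lucene index (assumption)."""

    def __init__(self):
        self._postings = defaultdict(set)  # token -> keys whose text contains it
        self._tokens_by_key = {}           # key -> tokens currently indexed for it

    def on_update(self, key, text):
        """Re-index one entry; text=None models a removed entry."""
        # Drop whatever was indexed for this key before.
        for token in self._tokens_by_key.pop(key, set()):
            self._postings[token].discard(key)
        if text is None:
            return
        tokens = set(text.lower().split())
        self._tokens_by_key[key] = tokens
        for token in tokens:
            self._postings[token].add(key)

    def search(self, query):
        """Return the keys whose text contains every token of the query."""
        tokens = query.lower().split()
        if not tokens:
            return set()
        keys = set(self._postings.get(tokens[0], set()))
        for token in tokens[1:]:
            keys &= self._postings.get(token, set())
        return keys


# Module-level state models "some static method or local service" on a node.
TABLE = {}                      # key -> text, stands in for table T1
LOCAL_INDEX = LocalTextIndex()


def apply_update(key, text):
    """Imitates an update listener: apply the change, then sync the index."""
    if text is None:
        TABLE.pop(key, None)
    else:
        TABLE[key] = text
    LOCAL_INDEX.on_update(key, text)


def text_query_function(query):
    """Hypothetical counterpart of TextQueryFunction("text_query_string")."""
    return LOCAL_INDEX.search(query)


def select_where_key_in(keys):
    """Models: SELECT * FROM T1 WHERE T1.key IN (<keys>)."""
    return {key: TABLE[key] for key in sorted(keys) if key in TABLE}
```

Calling `select_where_key_in(text_query_function("some text"))` mirrors the shape of the query in the reply: the function resolves the text search to a set of keys on the local node, and the ordinary key lookup does the rest, so no extra network round trip is needed for the text part.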