Hi Jan,

The key thing to remember about a TRA (or any Routed Alias) is that it actively does only two things:

1) Routes document updates to the correct collection by inspecting the routed field in the document.
2) Detects when a new collection is required and creates it.

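To make those two behaviors concrete, here is a rough sketch in Python. It is not Solr's actual code. The `<alias>__TRA__<yyyy-MM-dd>` naming pattern, the `route` and `start_of` helpers, and the day-sized interval are all assumptions chosen for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumed naming convention, for illustration only: "<alias>__TRA__<yyyy-MM-dd>".
SEPARATOR = "__TRA__"


def start_of(collection: str) -> datetime:
    """Parse the start time of a time slice out of its collection name."""
    stamp = collection.split(SEPARATOR, 1)[1]
    return datetime.strptime(stamp, "%Y-%m-%d").replace(tzinfo=timezone.utc)


def route(collections_desc, doc_time, interval):
    """Return (target, needs_new_collection) for one document.

    collections_desc must be ordered newest first.
    """
    newest_start = start_of(collections_desc[0])
    if doc_time >= newest_start + interval:
        # The document falls past the newest slice: a new collection is needed.
        return None, True
    for name in collections_desc:  # simple linear scan, newest to oldest
        if start_of(name) <= doc_time:
            return name, False
    # Older than the oldest slice; what Solr does here is outside this sketch.
    return None, False


cols = [
    "events__TRA__2021-08-06",
    "events__TRA__2021-08-05",
    "events__TRA__2021-08-04",
]
day = timedelta(days=1)
print(route(cols, datetime(2021, 8, 5, 13, 0, tzinfo=timezone.utc), day))
```

With the hypothetical collections above, the document timestamped 2021-08-05 13:00 UTC lands in `events__TRA__2021-08-05`, while a timestamp on 2021-08-07 would report that a new collection is needed.
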
If you don't send it data, *nothing* happens. The collections are not created until data requires them (an async create is possible when it sees an update with a timestamp "near" the next interval; see the docs for router.preemptiveCreateMath).

A) Dave's half of our talk at Activate 2018 covers it: https://youtu.be/RB1-7Y5NQeI?t=839

B) Time Routed Aliases are a means by which to automate creation of collections and route documents to the created collections. Sizing and performance of the individual collections are not otherwise special, and you can interact with the collections individually after they are created, with the obvious caveat that you probably don't want to do things that get them out of sync schema-wise unless your client programs know how to handle documents of both types. A less obvious consequence of the routing is that your data must never republish the same document with a different route key (the date, for a TRA), since that can lead to duplicate ids across collections. The "normal" use case is event data: things that happened and are done, and are correctly recorded (or at least their time is correctly recorded) the first time.

C) Configure the higher number of replicas and remove old ones manually if they are not needed. At query time it's "just an alias". Managing collections based on recency could be automated here. Before autoscaling was deprecated, I was thinking that adding a couple of hooks into autoscaling, such that it could react to collection creation by a TRA specifically, would get us to a place much like Elastic's Hot/Warm architecture. I haven't kept track of what's being done to replace autoscaling, however. I think Atri was interested in that at one point as well.

D) TRAs create collections under the hood with a CREATE command, just as you would manually (based on the config in the TRA). Anything in Solr that would influence that placement should apply.

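As an illustration of C and D, the collection settings a TRA uses for each new collection are supplied when the alias is created. The sketch below only builds a CREATEALIAS request URL. The parameter names follow the Reference Guide as best I recall, so check them against your Solr version. The host, alias name, `timestamp_dt` field, configset name, and all values are made up for the example.

```python
from urllib.parse import urlencode


def create_tra_url(base_url: str, alias: str, config_name: str) -> str:
    """Build a Collections API URL that would create a time routed alias."""
    params = {
        "action": "CREATEALIAS",
        "name": alias,
        "router.name": "time",
        "router.field": "timestamp_dt",            # hypothetical date field
        "router.start": "NOW/DAY",
        "router.interval": "+1DAY",
        "router.preemptiveCreateMath": "90MINUTES",  # create the next slice early
        "router.autoDeleteAge": "/DAY-90DAYS",       # old slices get a regular DELETE
        # Settings passed to the CREATE issued for every new collection:
        "create-collection.collection.configName": config_name,
        "create-collection.numShards": 2,
        "create-collection.replicationFactor": 3,    # the "higher" replica count
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


print(create_tra_url("http://localhost:8983/solr", "events", "events_conf"))
```

Since every new collection is created with the same `create-collection.*` settings, trimming replicas on older collections remains a separate, manual (or scripted) step.
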
E) See D above. As for fill rate, utilizing new nodes over time should be as simple as adding new nodes and waiting for new collections to be created. One could also manually move replicas as with any other collection (aside: be sure to refer to a current version of the MOVEREPLICA docs; prior to something like 8.6 they were incomplete and even wrong in a few places).

F) If you are talking about router.autoDeleteAge here, old collection removal is a regular DELETE (just automatically issued). I'm not sure what you mean by rotation interval.

G) They are just collections with special names that can be parsed during update to select a destination for the incoming document.

H) They are just collections, and there's nothing to prevent you from upgrading the schema; new collections will begin using it. Individual collections would need to be reloaded, and non-safe schema changes (in the usual sense) require a re-index as usual. In a cloud environment where you can temporarily add machines or disk this is not so bad, aside from the time to re-index of course. If you are on-prem, plan to have a significant level of spare disk to handle this case without running yourself into the danger zone for segment merging.

H.2) A TRA is just an alias with fancy collection creation (and naming). Once the collections exist, it's just an alias. All the action (at this point) happens at update. So long as the collection is listed in the TRA in ZooKeeper in aliases.json ***in the correct (chronological, descending) order*** and the name of the collection can be parsed by the TRA code, you should be fine. Incoming updates iterate down the list of collections and stop at the first one where the collection name matches the date in the document's routing field. For a normal TRA, the vast majority of updates hit one of the most recent two or three collections.
Frequent updates to old data in a TRA with very many time slices (sub-collections) might suffer somewhat, since this is a simple linear iteration; optimizing that was deferred until it seemed important to someone's less normal use case :). Otherwise it's just an alias of collections with funky-looking names (unless someone added something when I wasn't looking ;) ).

-Gus

On Fri, Aug 6, 2021 at 4:13 AM Jan Høydahl <jan....@cominvent.com> wrote:

> Hi,
>
> I have never used TRA, but a client of mine is considering it. A few
> questions.
>
> A) Do you have links to talks (slides/video) on the feature? Or blog posts
> going into more detail than the RefGuide?
> B) For ingestion performance, sharding may make sense. But only for the
> current collection. Has anyone tried merging "static" shards?
> C) Is there a trick to have more replicas on recent collections than old
> ones?
> D) Is there a way to manage which nodes get selected for new
> collections, or do you need to rely on replica placement policies?
> E) How do you guys ensure you get a good fill-rate on the nodes, and what
> procedure do you use when adding more nodes in the cluster?
>   * I.e. do you simply add a few new nodes and let Solr automatically
>     place new collections onto those?
> F) How many sub-collections/cores do you plan for on a single node?
>   * You could try to configure the "rotation interval" such that a node
>     gets filled by a single core, but that seems hard to predict
>   * Having a too rapid "rotation interval" will leave behind too many
>     cores per node, causing inefficiencies?
>   * Have you found a strategy to balance this? I'd likely try to plan
>     for 10 cores per node, and monitor fill-rate such that I (manually) add
>     more HW once a threshold is reached.
> G) Has anyone tried backup of a TRA? Does it even work, or do you need to
> run the command for each single collection?
> H) A typical requirement is to migrate all data from one cluster to a new
> cluster on a newer version or with a new schema. Have you tried doing that
> with a TRA?
>   * Would you need to migrate one sub-collection at a time?
>   * Will TRA on the new cluster accept that someone "external" adds
>     collections, and how is it initialized/bootstrapped to fill the internal
>     collection registry?
>
> That's what I could think of before trying the feature. I'm sure there
> would be other questions after some trial and error :)
>
> Jan

--
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)