Ted:

The cascading approach is a bit strange IMO. You're right that any failure means the rest of the chain doesn't get updated. Plus, the time for the last node in the chain to get updated is the sum of _all_ the polling intervals. If network saturation is a concern, I'd suggest a separate program that uses the replication handler API calls to monitor when each replication has finished and then fires off the command for the next slave to replicate. All from the master.
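
Something along these lines might do it. This is a rough sketch, not tested against 4.5. The fetchindex and details commands are the ones on the replication handler wiki page, but the function names are mine, the hosts and core name are made up, and I'm assuming the details response carries an isReplicating flag under details/slave, so check what your slaves actually return.

```python
import json
import time
import urllib.request
from urllib.parse import urlencode


def replication_url(core_url, command):
    """Build a URL such as <core_url>/replication?command=details&wt=json."""
    query = urlencode({"command": command, "wt": "json"})
    return core_url.rstrip("/") + "/replication?" + query


def http_get_json(url):
    """GET a URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


def is_replicating(details):
    """True while a slave reports an index fetch in progress.

    Assumption: the flag sits at details -> slave -> isReplicating as the
    string "true"/"false"; verify against a real response from your slaves.
    """
    slave = details.get("details", {}).get("slave", {})
    return str(slave.get("isReplicating", "false")).lower() == "true"


def replicate_in_sequence(slave_urls, get_json=http_get_json,
                          poll_seconds=30, max_polls=240, sleep=time.sleep):
    """Run fetchindex on one slave at a time; return (finished, failed)."""
    finished, failed = [], []
    for core_url in slave_urls:
        try:
            # fetchindex returns right away; the fetch runs in the background.
            get_json(replication_url(core_url, "fetchindex"))
            for _ in range(max_polls):
                sleep(poll_seconds)
                details = get_json(replication_url(core_url, "details"))
                if not is_replicating(details):
                    finished.append(core_url)
                    break
            else:
                failed.append(core_url)  # still fetching after max_polls
        except OSError:
            # Unreachable slave (URLError is an OSError): note it, move on.
            failed.append(core_url)
    return finished, failed


if __name__ == "__main__":
    # Hypothetical hosts and core name; each slave's masterUrl is the master.
    slaves = ["http://solr%d.example.com:8983/solr/core1" % n for n in range(1, 4)]
    print(replicate_in_sequence(slaves))
```

Only one slave pulls at a time, so the master's network load stays bounded, and a slave that is down or stuck gets skipped instead of stalling everything behind it. Note that isReplicating going false only tells you no fetch is in progress, not that the fetch succeeded, so you may also want to compare index versions afterwards.
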
About the soft commit stuff. It could certainly be used that way. The downside, though, is that if you restart Solr in the middle of the day, I'm pretty sure the new searcher will see everything that's been hard-committed up to that point. And, frankly, I'm not entirely sure what the interplay is between old-style replication and hard commits with openSearcher=false, so you'd have to test.

Best,
Erick

On Thu, Apr 9, 2015 at 10:11 AM, Ted Cao <[email protected]> wrote:
> Cool, thanks Erick.
> I would also like your opinion (or someone else's opinion) on related
> questions
>
> How things are set up to be replicated here have worked in a cascading way
> (and OK for the most part).
> Write-Only-Master <- solr 1 pull from master <- solr 2 pull from solr 1 <-
> solr 3 pull from solr 2, etc etc some 20 instances (all for different
> purposes and different types of traffic) down
>
> I find this cascading way of replication weird and think it might be better
> that all slaves pull from the write-only-master
> - possible disadvantage of 20 servers pulling from the same
> write-only-master is the combined load on the master (network probably get
> saturated)
> - disadvantage of cascading replication as shown above is if anyone on the
> chain fails then rest of the chain fails, and it blurs the distinction of
> master/slave
>
> There is another weirder suggestion --> read-only master auto commit without
> opensearcher=false, and then ONLY soft commit at midnight so all changes
> only become visible and thus available for all the cascading replication
> slaves at midnight (with this all slaves can continue to just pull every 10
> mins). I am guessing this might work but soft commit was really designed for
> NRT and designed to be very frequent, not for daily visibility purpose.
> What's your insight on something like that?
>
>
> On Thu, Apr 9, 2015 at 11:35 AM, Erick Erickson <[email protected]>
> wrote:
>>
>> You could use the replications API from a cron job.
>> See:
>>
>> https://cwiki.apache.org/confluence/display/solr/Index+Replication#IndexReplication-HTTPAPICommandsfortheReplicationHandler
>>
>> Best,
>> Erick
>>
>> On Thu, Apr 9, 2015 at 8:32 AM, Ted Cao <[email protected]> wrote:
>> > Hi, this is regarding to replication for version 4.5
>> >
>> > I need time based replication, need to control replication to be during
>> > off
>> > hours (midnight - 6am) so commits do not impact performance, we only
>> > need
>> > data refresh-ness to be daily.
>> >
>> > Currently it seems replication in 4.5 can only be done with pull
>> > interval?
>> > So if the write-only master is updated constantly then time based
>> > replication doesn't seem possible? (We currently "solve" the problem by
>> > committing write only master once a day at midnight but I would like
>> > more
>> > frequent commits to minimize data loss)
>> >
>> > Is there standard way of handling time based replication in 4.5???
>> >
>> > (Back in 1.4 days when I worked for another company, we did file syncing
>> > manually but that's not possible/feasible here, we have over 60 solr
>> > boxes
>> > that needs to be replicated to here)
>> >
>> > Thanks a lot for any info/insights
>> >
>> >
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [email protected]
>> For additional commands, e-mail: [email protected]
>>
