Hi Eugene,

As far as I know, there's no alternative to CDCR within Solr. And yes, it
would be dangerous for the cluster to span multiple DCs. The problem is not
so much Solr itself (you could use shard preference to make queries hit only
one DC) as ZooKeeper. All SolrCloud nodes have to keep an open connection to
ZooKeeper, and that connection is more likely to fail (i.e. cause
instability) when it crosses datacenters. Plus, you'd have a split-brain
problem if the connection between two datacenters goes down. By default,
the ZooKeeper quorum prevents that, but at a cost: if the DC with more ZK
nodes goes down, the other DC won't work unless you change the quorum.
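
To make the quorum point concrete, here is a small arithmetic sketch. The
3 + 2 node layout and the DC names are made up for illustration; the only
rule it relies on is that ZooKeeper needs a strict majority of the ensemble
to be reachable.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of a ZooKeeper ensemble."""
    return ensemble_size // 2 + 1


def has_quorum(reachable: int, ensemble_size: int) -> bool:
    """True if the reachable nodes still form a majority."""
    return reachable >= quorum_size(ensemble_size)


# Hypothetical layout: 3 ZK nodes in dc1, 2 in dc2 (5 in total).
zk_nodes = {"dc1": 3, "dc2": 2}
total = sum(zk_nodes.values())

for lost_dc, lost_count in zk_nodes.items():
    surviving = total - lost_count
    state = "quorum kept" if has_quorum(surviving, total) else "quorum lost"
    print(f"{lost_dc} down: {surviving}/{total} nodes reachable, {state}")
```

With an even split (say 2 + 2), losing either DC loses the quorum, so with
only two DCs there is always at least one DC whose failure takes ZooKeeper
down.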

With Kafka, the common design is:
- you write documents to a topic
- you have N consumers on this topic, one for each DC (each in its own
consumer group, so every DC gets every document). They index to their
respective Solr clusters independently
- if a DC becomes unavailable, its consumer will retry, and the data will
stay in Kafka (assuming you have enough disk)
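
The pattern above can be sketched without any Kafka or Solr client at all.
What follows is only an in-memory simulation, not real client code: Topic,
DcConsumer and FakeSolr are hypothetical stand-ins I made up to show the
one idea that matters, namely that each DC tracks its own offset and does
not advance it until indexing succeeds.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Doc = Dict[str, str]


@dataclass
class Topic:
    """Stand-in for a Kafka topic: an append-only list of documents."""
    log: List[Doc] = field(default_factory=list)

    def produce(self, doc: Doc) -> None:
        self.log.append(doc)


class FakeSolr:
    """Stand-in for one DC's Solr cluster; `up` simulates availability."""

    def __init__(self) -> None:
        self.docs: List[Doc] = []
        self.up = True

    def add(self, doc: Doc) -> None:
        if not self.up:
            raise ConnectionError("DC unreachable")
        self.docs.append(doc)


@dataclass
class DcConsumer:
    """One consumer per DC with its own offset (like its own consumer group)."""
    name: str
    index: Callable[[Doc], None]  # sends one document to this DC's Solr
    offset: int = 0

    def poll(self, topic: Topic) -> int:
        """Index pending documents; on failure, stop and keep the offset."""
        done = 0
        while self.offset < len(topic.log):
            try:
                self.index(topic.log[self.offset])
            except ConnectionError:
                break  # retry from the same offset on the next poll
            self.offset += 1
            done += 1
        return done


topic = Topic()
solr_dc1, solr_dc2 = FakeSolr(), FakeSolr()
consumers = [DcConsumer("dc1", solr_dc1.add), DcConsumer("dc2", solr_dc2.add)]

topic.produce({"id": "1"})
solr_dc2.up = False            # dc2 goes away
topic.produce({"id": "2"})
for c in consumers:
    c.poll(topic)              # dc1 indexes both docs, dc2 indexes nothing

solr_dc2.up = True             # dc2 comes back
consumers[1].poll(topic)       # dc2 catches up from its stored offset
print(len(solr_dc1.docs), len(solr_dc2.docs))
```

In a real setup the offset would be the consumer group's committed offset
in Kafka, and you would commit it only after Solr has accepted the batch.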

But then you may have an issue if the DC hosting Kafka goes down. I don't
know much about this. Way back, people would use MirrorMaker, but I think
that has been replaced with something else recently.

Best regards,
Radu
--
Elasticsearch/OpenSearch & Solr Consulting, Production Support & Training
Sematext Cloud - Full Stack Observability
http://sematext.com/


On Fri, Jul 1, 2022 at 1:43 PM Eug ene <neceug...@gmail.com> wrote:

> Howdy!
>
> Is there an alternative to CDCR that doesn't require changes to
> application code?
>
> I'd like to set up replication between regions (currently running
> 8.x). It will be near real time since the data change is megabytes per
> day, and apparently CDCR was deprecated in 8.6 (and dropped in v9) and
> is generally untrustworthy.
>
> Would it be dangerous to just create a cluster that spans DCs with
> multiple replicas in each DC?
>
> I've seen people mentioning using Kafka to assist with this process,
> but I can't find any information or examples in the wild for this.
>
> Thanks in advance for any advice!
>
> -Eugene
>
