Team,

We are seeking assistance from the community to identify the most suitable 
database backend (AWS Aurora or AWS Redshift) for SOLR. Currently, we are 
utilizing AWS Aurora (We will be performing regular inserts/updates on our 
source table) and have observed a decrease in performance when indexing data 
into SOLR.

Our team has determined that Redshift is the optimal backend database for SOLR. 
I am concerned that Redshift is a columnar database system specifically 
designed for analytical queries on large datasets.

Total records in Aurora table: 1.7 Billion
Total valid records in Aurora: 900 Million

I believe there is a query scanning performance issue on our side when we build 
the data into an index in SOLR. By eliminating outdated records from the 
backend table, we may be able to improve performance.

However, transitioning to AWS Redshift (a datawarehouse service) may not be the 
optimal solution for improved performance.

Can anyone assist me in locating the most appropriate backend for both 
SOLR/Transaction queries?

I appreciate your assistance.
Thank you in advance.


My findings:

Redshift is a columnar database system tailored to analytical queries on large 
datasets. While it is possible to use Redshift for transactional queries such 
as insert, update, and delete, it may not be the most suitable option for this 
type of workload.

Redshift is specifically designed for analytical workloads, taking advantage of 
features such as columnar storage, data compression, and massively parallel 
processing. This is in contrast to transactional queries, which usually involve 
many small, frequent updates to a limited number of rows.

For transactional workloads, a traditional row-based relational database such 
as PostgreSQL or MySQL may provide a beneficial solution. These databases are 
specifically designed to effectively manage small, frequent updates to a 
limited number of rows.

Reply via email to