Team, We are seeking assistance from the community to identify the most suitable database backend (AWS Aurora or AWS Redshift) for SOLR. Currently, we are utilizing AWS Aurora (We will be performing regular inserts/updates on our source table) and have observed a decrease in performance when indexing data into SOLR.
Our team has determined that Redshift is the optimal backend database for SOLR. I am concerned that Redshift is a columnar database system specifically designed for analytical queries on large datasets. Total records in Aurora table: 1.7 Billion Total valid records in Aurora: 900 Million I believe there is a query scanning performance issue on our side when we build the data into an index in SOLR. By eliminating outdated records from the backend table, we may be able to improve performance. However, transitioning to AWS Redshift (a datawarehouse service) may not be the optimal solution for improved performance. Can anyone assist me in locating the most appropriate backend for both SOLR/Transaction queries? I appreciate your assistance. Thank you in advance. My findings: Redshift is a columnar database system tailored to analytical queries on large datasets. While it is possible to use Redshift for transactional queries such as insert, update, and delete, it may not be the most suitable option for this type of workload. Redshift is specifically designed for analytical workloads, taking advantage of features such as columnar storage, data compression, and massively parallel processing. This is in contrast to transactional queries, which usually involve many small, frequent updates to a limited number of rows. For transactional workloads, a traditional row-based relational database such as PostgreSQL or MySQL may provide a beneficial solution. These databases are specifically designed to effectively manage small, frequent updates to a limited number of rows.