janhoy opened a new pull request, #1:
URL: https://github.com/apache/solr-orbit/pull/1
This PR contains the initial port of [OpenSearch Benchmark (OSB)](https://github.com/opensearch-project/opensearch-benchmark) to Apache Solr. The fork point from OSB is tagged `osb_fork_point` (OSB commit `92982c56`). For now, the codebase retains the OSB Python package name (`osbenchmark`) and directory structure. Known remaining work is tracked in `TODO.md` and will likely be converted into JIRA tasks.

## Summary of major changes

### 1. Solr-native client (`osbenchmark/client.py`)

The OpenSearch Python client (`opensearch-py`) has been replaced with purpose-built `SolrAdminClient` and `SolrClient` classes that communicate with Solr over HTTP using `requests`/`pysolr`. All collection management, document indexing, and query execution now go through Solr's REST APIs (Collections API, `/select`, `/update`, etc.).

### 2. Solr provisioner (`osbenchmark/builder/solr_provisioner.py`)

A new `SolrProvisioner` replaces the OpenSearch node provisioning machinery. It supports three deployment modes:

- **`from-distribution`**: downloads a released Solr binary from `downloads.apache.org` or the ASF archive (including pre-9.0 paths).
- **`from-sources`**: builds Solr from a local checkout with Gradle.
- **`docker`**: pulls and starts the official Solr Docker image, including nightly builds. `SolrDockerLauncher` handles the container lifecycle.

Version-aware logic handles the API differences between Solr 9.x and 10.x (e.g. collection creation flags).

### 3. Solr-specific telemetry devices (`osbenchmark/telemetry.py`)

Six new `SolrTelemetryDevice` subclasses collect Solr-specific metrics during a run: `SolrJvmStats`, `SolrNodeStats`, `SolrCollectionStats`, `SolrQueryStats`, `SolrIndexingStats`, and `SolrCacheStats`. They poll the Solr Metrics API and write results through the existing `ResultWriter` pipeline.

### 4. Solr runner operations (`osbenchmark/worker_coordinator/runner.py`)

56 OpenSearch-specific runner classes have been removed (KNN, ML connectors, vector datasets, data streams, index templates, pipelines, etc.). In their place, Solr-specific runners have been added under `SolrRunner`: `SolrBulkIndex`, `SolrSearch`, `SolrPaginatedSearch`, `SolrCommit`, `SolrOptimize`, `SolrWaitForMerges`, `SolrCreateCollection`, and `SolrDeleteCollection`.

### 5. Workload model: index → collection (`osbenchmark/workload/`)

The workload domain model has been updated throughout:

- `Index` / `DataStream` / `IndexTemplate` → `Collection`
- `IndexTemplate`, `ComponentTemplate`, `DataStream` and serverless/vector-related types removed
- New `CreateCollectionParamSource` / `DeleteCollectionParamSource` / `SolrSearchParamSource`
- OpenSearch Query DSL validation removed; Solr query parameters are used instead

### 6. OSB-to-Solr workload converter (`osbenchmark/conversion/`)

A new converter pipeline (`workload_converter.py`, `detector.py`, `query.py`, `schema.py`, `field.py`) translates an OpenSearch Benchmark workload into Solr format:

- Detects OSB-specific operations and query DSL automatically
- Translates `bulk` → `bulk-index`, `force-merge` → `optimize`, and index mappings → Solr configsets
- Generates a minimal `solrconfig.xml` / `managed-schema.xml` configset skeleton
- Invoked via `solr-benchmark convert-workload`; see `docs/converter/` for details

### 7. Metrics store simplified (`osbenchmark/metrics.py`)

`OsMetricsStore`, `OsTestRunStore`, `OsResultsStore`, and `IndexTemplateProvider` (all backed by OpenSearch) have been removed. The only supported store is now `FilesystemMetricsStore` (JSON + CSV + SQLite on local disk), accessed via `LocalFilesystemResultWriter`.

### 8. Documentation site (`docs/`)

A full user-facing documentation site is included, built with Jekyll and the just-the-docs theme.
Key sections: `user-guide/` (install, configure, workload authoring), `reference/` (telemetry, metrics, workload schema, commands), `converter/` (OSB migration guide), and `cluster-config/`. The site is deployed to GitHub Pages via `.github/workflows/docs.yml`; see `docs/README.md` for local build instructions.

### 9. ASF licence headers and housekeeping

- All modified files carry a two-line ASF modification notice above the original OpenSearch header.
- OSB-specific GitHub workflows (release, backport, integ-test, PyPI publish) removed; a docs deploy workflow added.
- Bundled `pbzip2` binaries removed; `pbzip2` is now an optional system prerequisite.
- `CONTRIBUTING.md`, `DEVELOPER_GUIDE.md`, and `README.md` rewritten for the Solr/ASF context.
- `TODO.md` tracks remaining incubation steps (package rename, CI, release process, etc.).

PS: The commits are not neatly organized by feature, but they show how the port evolved.
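Section 1 describes the REST calls a Solr client issues. The sketch below shows them concretely using only the standard library. The helper names (`collections_create_url`, `update_request`, `select_url`) are hypothetical and are not the `SolrClient`/`SolrAdminClient` API from this PR. The endpoints and parameters are standard Solr ones:

- `action=CREATE`, `numShards`, `replicationFactor` and `collection.configName` for the Collections API
- `commit=true` for `/update`
- `q`, `rows` and `wt` for `/select`

The base URL `http://localhost:8983/solr` is an assumption.

```python
import json
from urllib.parse import urlencode

# Hypothetical helpers, not the SolrClient/SolrAdminClient API from this PR.
# They only build requests; any HTTP library could send them.


def _root(base_url: str) -> str:
    # Accept both "http://host:8983/solr" and "http://host:8983/solr/".
    return base_url.rstrip("/")


def collections_create_url(base_url, name, num_shards=1, replication_factor=1, config_name=None):
    """GET URL for the Collections API CREATE action."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
    }
    if config_name:
        params["collection.configName"] = config_name
    return f"{_root(base_url)}/admin/collections?{urlencode(params)}"


def update_request(base_url, collection, docs, commit=False):
    """(url, body, headers) for POSTing a JSON array of documents to /update."""
    url = f"{_root(base_url)}/{collection}/update"
    if commit:
        url += "?" + urlencode({"commit": "true"})
    body = json.dumps(docs).encode("utf-8")
    return url, body, {"Content-Type": "application/json"}


def select_url(base_url, collection, q, rows=10, **extra):
    """GET URL for the /select handler; extra params (fq, sort, ...) are passed through."""
    params = {"q": q, "rows": rows, "wt": "json", **extra}
    return f"{_root(base_url)}/{collection}/select?{urlencode(params)}"
```

Sending is then a single call with any HTTP library, e.g. `requests.post(url, data=body, headers=headers)`.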
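For the `from-distribution` mode in section 2, candidate download URLs could be derived from a version string roughly as follows. This assumes the ASF directory layout: 9.0+ releases live under `solr/solr/`, and older releases sit in the archive under `lucene/solr/`. `solr_download_urls` and `parse_version` are hypothetical names, and the provisioner's actual logic may differ.

```python
from typing import List, Tuple

# Hypothetical helper; the actual SolrProvisioner may build its URLs differently.
DOWNLOADS = "https://downloads.apache.org"
ARCHIVE = "https://archive.apache.org/dist"


def parse_version(version: str) -> Tuple[int, ...]:
    """'9.7.0' -> (9, 7, 0). Released versions only (no '-SNAPSHOT' suffixes)."""
    return tuple(int(part) for part in version.split("."))


def solr_download_urls(version: str) -> List[str]:
    """Candidate tarball URLs for a released Solr version, in the order to try them."""
    filename = f"solr-{version}.tgz"
    if parse_version(version)[0] >= 9:
        # Since 9.0, Solr is released as its own project under solr/solr/.
        return [
            f"{DOWNLOADS}/solr/solr/{version}/{filename}",
            f"{ARCHIVE}/solr/solr/{version}/{filename}",
        ]
    # Pre-9.0 releases were published alongside Lucene, under lucene/solr/.
    return [f"{ARCHIVE}/lucene/solr/{version}/{filename}"]
```

Trying the primary mirror first and falling back to the archive covers versions that have rotated off `downloads.apache.org`.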
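Section 3's telemetry devices poll the Solr Metrics API. The sketch below flattens numeric values from a response into samples. It assumes a Solr 9.x-style JSON response from `/solr/admin/metrics?group=jvm`, with a top-level `metrics` object keyed by registry name such as `solr.jvm`. The helper names and sample layout are hypothetical, and the response shape may differ in other Solr versions.

```python
import json
from typing import Any, Dict, Iterator, List, Tuple

# Hypothetical helpers; they assume the Solr 9.x JSON shape
# {"responseHeader": {...}, "metrics": {"<registry>": {"<metric>": value, ...}}}.


def iter_numeric_metrics(node: Any, prefix: str = "") -> Iterator[Tuple[str, float]]:
    """Yield (dotted.name, value) for every numeric leaf, recursing into nested objects."""
    if isinstance(node, dict):
        for key, value in node.items():
            name = f"{prefix}.{key}" if prefix else key
            yield from iter_numeric_metrics(value, name)
    elif isinstance(node, (int, float)) and not isinstance(node, bool):
        yield prefix, float(node)
    # Strings, booleans and lists are skipped.


def metrics_to_samples(response_text: str, meta: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Turn one /admin/metrics response into flat samples tagged with meta (node, run, ...)."""
    payload = json.loads(response_text)
    return [
        {"name": name, "value": value, **meta}
        for name, value in iter_numeric_metrics(payload.get("metrics", {}))
    ]
```

Nested timer objects (for example a request-times entry with `count` and `meanRate`) flatten to dotted names such as `<metric>.count`.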
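The runners in section 4 replace OSB runners. OSB runners are async callables that receive a client and a params dict and return metadata (commonly `weight`, `unit` and `success`). Below is a hypothetical commit runner in that style. The `client.commit(...)` coroutine and the `soft-commit` parameter are invented for illustration, and a recording stub stands in for a real Solr client. The real `SolrCommit` signature may differ.

```python
import asyncio

# Hypothetical sketch in the style of OSB runners; not the SolrCommit code from this PR.


class CommitRunnerSketch:
    async def __call__(self, client, params):
        collection = params["collection"]
        # "soft-commit" is an invented parameter name for illustration.
        await client.commit(collection, soft=params.get("soft-commit", False))
        # Metadata the worker coordinator records alongside its own timing.
        return {"weight": 1, "unit": "ops", "success": True}

    def __repr__(self):
        return "commit"


class RecordingClient:
    """Stand-in async client that records calls instead of talking to Solr."""

    def __init__(self):
        self.calls = []

    async def commit(self, collection, soft=False):
        self.calls.append((collection, soft))


# Usage: run the runner once against the stub client.
client = RecordingClient()
result = asyncio.run(CommitRunnerSketch()(client, {"collection": "books"}))
```

Keeping timing out of the runner and returning only metadata leaves latency measurement to the coordinator, which is the split OSB uses.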
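The converter in section 6 lists several translations. This minimal sketch covers two of them: renaming operation types, and turning OpenSearch mapping types into `<field>` entries for `managed-schema.xml`. The mapping tables are assumptions, not the rules implemented in `osbenchmark/conversion/`. The Solr field type names used are the ones defined in Solr's `_default` configset.

```python
from xml.sax.saxutils import quoteattr

# Hypothetical translation tables, for illustration only.
OPERATION_RENAMES = {"bulk": "bulk-index", "force-merge": "optimize"}

# OpenSearch mapping type -> field type name from Solr's _default configset.
FIELD_TYPES = {
    "keyword": "string",
    "text": "text_general",
    "integer": "pint",
    "long": "plong",
    "float": "pfloat",
    "double": "pdouble",
    "date": "pdate",
    "boolean": "boolean",
}


def convert_operation(op: dict) -> dict:
    """Copy an OSB operation, renaming its operation-type where a Solr equivalent exists."""
    converted = dict(op)
    op_type = op.get("operation-type")
    converted["operation-type"] = OPERATION_RENAMES.get(op_type, op_type)
    return converted


def mapping_to_schema_fields(mapping: dict) -> list:
    """Render top-level mapping properties as managed-schema <field> elements."""
    fields = []
    for name, spec in mapping.get("properties", {}).items():
        solr_type = FIELD_TYPES.get(spec.get("type"))
        if solr_type is None:
            continue  # unsupported type (e.g. geo_point); a real converter would warn
        fields.append(
            f'<field name={quoteattr(name)} type={quoteattr(solr_type)} '
            f'indexed="true" stored="true"/>'
        )
    return fields
```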
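Query DSL translation (also section 6) can be sketched as a recursive mapping onto Solr's standard query parser syntax. This hypothetical `to_solr_q` covers only `match_all`, `term`, `match`, numeric `range` and `bool`. It treats `filter` clauses like `must`, whereas a real converter might move them to `fq` to avoid affecting scores.

```python
# Hypothetical translator covering a small subset of the OpenSearch Query DSL.
SPECIAL_CHARS = set('+-&|!(){}[]^"~*?:\\/ ')


def escape(value) -> str:
    """Backslash-escape characters that are special in the standard query parser."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in str(value))


def to_solr_q(query: dict) -> str:
    (kind, body), = query.items()  # DSL queries are single-key objects

    if kind == "match_all":
        return "*:*"

    if kind == "term":
        (field, value), = body.items()
        if isinstance(value, dict):  # long form: {"field": {"value": ...}}
            value = value["value"]
        return f"{field}:{escape(value)}"

    if kind == "match":
        (field, value), = body.items()
        if isinstance(value, dict):  # long form: {"field": {"query": ...}}
            value = value["query"]
        terms = " ".join(escape(t) for t in str(value).split())
        return f"{field}:({terms})"  # default OR between terms, like match

    if kind == "range":
        (field, bounds), = body.items()
        # Numeric bounds only; "[" / "]" are inclusive, "{" / "}" exclusive.
        if "gte" in bounds:
            lower = f"[{bounds['gte']}"
        elif "gt" in bounds:
            lower = f"{{{bounds['gt']}"
        else:
            lower = "[*"
        if "lte" in bounds:
            upper = f"{bounds['lte']}]"
        elif "lt" in bounds:
            upper = f"{bounds['lt']}}}"
        else:
            upper = "*]"
        return f"{field}:{lower} TO {upper}"

    if kind == "bool":
        parts = []
        # filter is treated like must here; a real converter might emit fq instead.
        for occur, prefix in (("must", "+"), ("filter", "+"), ("should", ""), ("must_not", "-")):
            clauses = body.get(occur, [])
            if isinstance(clauses, dict):
                clauses = [clauses]
            parts.extend(f"{prefix}({to_solr_q(c)})" for c in clauses)
        return " ".join(parts)

    raise ValueError(f"unsupported query type: {kind}")
```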
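Section 7's `FilesystemMetricsStore` keeps JSON, CSV and SQLite on local disk. Here is a rough, hypothetical sketch of writing the same samples to all three formats with the standard library. The file names (`results.json`, `results.csv`, `metrics.db`) and the table schema are assumptions, not the store's actual layout.

```python
import csv
import json
import sqlite3
import tempfile
from contextlib import closing
from pathlib import Path

# Hypothetical layout: file names and table schema are illustrative only.
FIELDS = ["run_id", "name", "value", "unit"]


def write_results(out_dir: Path, run_id: str, samples: list) -> None:
    out_dir.mkdir(parents=True, exist_ok=True)
    rows = [{"run_id": run_id, **s} for s in samples]

    # JSON: the full list of rows, convenient for humans and tooling.
    (out_dir / "results.json").write_text(json.dumps(rows, indent=2))

    # CSV: easy to load into spreadsheets.
    with open(out_dir / "results.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)

    # SQLite: queryable across runs; closing() releases the file handle.
    with closing(sqlite3.connect(out_dir / "metrics.db")) as conn:
        with conn:  # commits the transaction on success
            conn.execute(
                "CREATE TABLE IF NOT EXISTS metrics "
                "(run_id TEXT, name TEXT, value REAL, unit TEXT)"
            )
            conn.executemany(
                "INSERT INTO metrics VALUES (:run_id, :name, :value, :unit)", rows
            )


def read_metrics(db_path: Path, run_id: str) -> list:
    with closing(sqlite3.connect(db_path)) as conn:
        return conn.execute(
            "SELECT name, value, unit FROM metrics WHERE run_id = ? ORDER BY rowid",
            (run_id,),
        ).fetchall()


# Example: write two samples to a temporary directory and read them back.
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp)
    write_results(out, "run-1", [
        {"name": "throughput", "value": 1250.0, "unit": "docs/s"},
        {"name": "latency_p90", "value": 42.5, "unit": "ms"},
    ])
    csv_header = (out / "results.csv").read_text().splitlines()[0]
    stored = read_metrics(out / "metrics.db", "run-1")
```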
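Section 9 makes `pbzip2` optional, so callers need a fallback when it is not installed. One approach is sketched with a hypothetical `decompress_bz2` helper: use `pbzip2 -d -c` if it is on `PATH`, and otherwise stream through Python's `bz2` module. This is not necessarily how the benchmark's corpus download code handles it.

```python
import bz2
import shutil
import subprocess
import tempfile
from pathlib import Path


def decompress_bz2(src: Path, dst: Path) -> str:
    """Decompress src into dst and return the name of the tool that was used."""
    pbzip2 = shutil.which("pbzip2")
    with open(dst, "wb") as out:
        if pbzip2:
            # Parallel decompression to stdout (-d decompress, -c write to stdout).
            subprocess.run([pbzip2, "-d", "-c", str(src)], stdout=out, check=True)
            return "pbzip2"
        # Fallback: single-threaded streaming decompression from the standard library.
        with bz2.open(src, "rb") as inp:
            shutil.copyfileobj(inp, out)
        return "bz2"


# Example: round-trip a small payload through a temporary .bz2 file.
payload = b'{"id": "1", "title": "Solr"}\n' * 100
with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp) / "docs.json.bz2"
    archive.write_bytes(bz2.compress(payload))
    tool = decompress_bz2(archive, Path(tmp) / "docs.json")
    restored = (Path(tmp) / "docs.json").read_bytes()
```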
