janhoy opened a new pull request, #3: URL: https://github.com/apache/solr-orbit/pull/3
https://issues.apache.org/jira/browse/SOLR-18255

This PR contains the initial port of [OpenSearch Benchmark (OSB)](https://github.com/opensearch-project/opensearch-benchmark) to work with Apache Solr. The fork point from OSB is tagged `osb_fork_point` (OSB commit `92982c56`). The codebase retains the OSB Python package name (`osbenchmark`) and directory structure for now; known remaining work is tracked in `TODO.md` and will likely be converted into JIRA tasks.

## How to review

The PR is structured as **6 commits in logical progression order**. Each commit is independently coherent and reviewable in isolation. The recommended approach is to review one commit at a time using GitHub's commit view or `git log -p`. The final commit is the largest, but by that point the project shape is established and the changes read more clearly in context.

| # | Commit | Files | What to focus on |
|---|--------|-------|-----------------|
| 1 | Establish ASF legal and governance files | 12 | NOTICE attribution, license header format, CONTRIBUTING accuracy |
| 2 | Update GitHub/CI infrastructure | 20 | Workflow correctness, removed vs. kept actions |
| 3 | Rewrite documentation | 84 | Install steps, CLI examples, converter docs accuracy |
| 4 | Remove OSB-specific dead code and binaries | 41 | Verify nothing Solr-relevant was swept up |
| 5 | Add new Solr-specific modules | 25 | Conversion logic (schema.py, query.py), provisioner correctness |
| 6 | Port core benchmark framework | 195 | client.py, telemetry.py, runner.py (see functional notes below) |

## Summary of major changes

### 1. Solr-native client (`osbenchmark/client.py`)

The OpenSearch Python client (`opensearch-py`) has been replaced with a purpose-built `SolrAdminClient` and `SolrClient` that communicate with Solr over HTTP using `requests`/`pysolr`. All collection management, document indexing, and query execution now go through Solr's REST API (Collections API, `/select`, `/update`, etc.).

### 2. Solr provisioner (`osbenchmark/builder/solr_provisioner.py`)

A new `SolrProvisioner` replaces the OpenSearch node provisioning machinery. It supports three deployment modes:

- **`from-distribution`**: downloads a released Solr binary from `downloads.apache.org` or the ASF archive (including pre-9.0 paths).
- **`from-sources`**: builds Solr from a local checkout with Gradle.
- **`docker`**: pulls and starts the official Solr Docker image, including nightly builds. `SolrDockerLauncher` handles container lifecycle.

Version-aware logic handles the API differences between Solr 9.x and 10.x (e.g. collection creation flags).

### 3. Solr-specific telemetry devices (`osbenchmark/telemetry.py`)

Six new `SolrTelemetryDevice` subclasses collect Solr-specific metrics during a run: `SolrJvmStats`, `SolrNodeStats`, `SolrCollectionStats`, `SolrQueryStats`, `SolrIndexingStats`, `SolrCacheStats`. These poll the Solr Metrics API and write results via the existing `ResultWriter` pipeline.

### 4. Solr runner operations (`osbenchmark/worker_coordinator/runner.py`)

56 OpenSearch-specific runner classes have been removed (KNN, ML connectors, vector datasets, data streams, index templates, pipelines, etc.). In their place, Solr-specific runners have been added under `SolrRunner`: `SolrBulkIndex`, `SolrSearch`, `SolrPaginatedSearch`, `SolrCommit`, `SolrOptimize`, `SolrWaitForMerges`, `SolrCreateCollection`, `SolrDeleteCollection`.

### 5. Workload model: index → collection (`osbenchmark/workload/`)

The workload domain model has been updated throughout:

- `Index` / `DataStream` / `IndexTemplate` → `Collection`
- `IndexTemplate`, `ComponentTemplate`, `DataStream` and serverless/vector-related types removed
- New `CreateCollectionParamSource` / `DeleteCollectionParamSource` / `SolrSearchParamSource`
- OpenSearch Query DSL validation removed; Solr query params used instead

### 6. OSB-to-Solr workload converter (`osbenchmark/conversion/`)

A new converter pipeline (`workload_converter.py`, `detector.py`, `query.py`, `schema.py`, `field.py`) translates an OpenSearch Benchmark workload into Solr format:

- Detects OSB-specific operations and query DSL automatically
- Translates `bulk` → `bulk-index`, `force-merge` → `optimize`, index mappings → Solr configsets
- Generates a minimal `solrconfig.xml` / `managed-schema.xml` configset skeleton
- Invoked via `solr-benchmark convert-workload`; see `docs/converter/` for details

### 7. Metrics store simplified (`osbenchmark/metrics.py`)

`OsMetricsStore`, `OsTestRunStore`, `OsResultsStore`, and `IndexTemplateProvider` (all backed by OpenSearch) have been removed. The single supported store is now `FilesystemMetricsStore` (JSON + CSV + SQLite on local disk), accessed via `LocalFilesystemResultWriter`.

### 8. Documentation site (`docs/`)

A full user-facing documentation site is included, built with Jekyll + just-the-docs. Key sections: `user-guide/` (install, configure, workload authoring), `reference/` (telemetry, metrics, workload schema, commands), `converter/` (OSB migration guide), `cluster-config/`. Deployed to GitHub Pages via `.github/workflows/docs.yml`. See `docs/README.md` for local build instructions.

### 9. ASF license headers and housekeeping

- All modified files carry a two-line ASF modification notice above the original OpenSearch header.
- OSB-specific GitHub workflows (release, backport, integ-test, PyPI publish) removed; a docs deploy workflow added.
- Bundled `pbzip2` binaries removed; `pbzip2` is now an optional system prerequisite.
- `CONTRIBUTING.md`, `DEVELOPER_GUIDE.md`, `README.md` rewritten for the Solr/ASF context.
- `TODO.md` tracks remaining incubation steps (package rename, CI, release process, etc.).

---

The changes are described by the **9 functional areas** above regardless of which commit they land in. The 6-commit structure exists purely to aid review; it does not reflect the order in which the work was done.
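
For reviewers of the converter (functional area 6 above), the operation renaming it describes could look roughly like the sketch below. This is an illustration only, not the code in `osbenchmark/conversion/`: the function name `translate_operations` and the constant `OPERATION_TYPE_MAP` are hypothetical, the `operation-type` key is assumed from the OSB workload format, and only the two renames named in this description are included.

```python
import copy

# Hypothetical mapping; only these two renames are stated in the PR description.
OPERATION_TYPE_MAP = {
    "bulk": "bulk-index",
    "force-merge": "optimize",
}


def translate_operations(operations):
    """Return a copy of the operations with OSB operation types renamed.

    Assumes each operation is a dict that may carry an "operation-type" key.
    """
    translated = []
    for op in operations:
        new_op = copy.deepcopy(op)  # leave the caller's workload untouched
        op_type = new_op.get("operation-type")
        if op_type in OPERATION_TYPE_MAP:
            new_op["operation-type"] = OPERATION_TYPE_MAP[op_type]
        # Types without a known mapping pass through unchanged.
        translated.append(new_op)
    return translated
```

Copying each operation rather than editing it in place keeps the original OSB workload available for comparison, which is convenient when checking converter output by hand.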
