MonkeyCanCode opened a new pull request, #4075: URL: https://github.com/apache/polaris/pull/4075
This is phase two of [CLI: Add summarize subcommand](https://github.com/apache/polaris/pull/4003). Based on feedback from @flyrain and the community on the [mailing list](https://lists.apache.org/thread/35zzzh2jgorhx7q2xksp7rwxnt6gl2zx), this PR adds support for:

1. A `find` command to locate identifiers via fuzzy search
2. A `tables` command to handle basic Iceberg table operations (get/list/summarize/non-purge delete)

It also adds a newline after each section in the output of the `summarize` subcommands introduced in phase one, for easier readability.

Here are a couple of sample outputs:

## Find command

```
# fuzzy search for all entities across all catalogs
➜ polaris git:(cli_summary_subcommand_v2) ./polaris --profile dev find user
Searching for 'user'...
[Global]
Principal: quickstart_user
Principal: readonly_user
Principal: dev_user
Principal Role: quickstart_user_role
Principal Role: readonly_user_role
Principal Role: dev_user_role
[Catalog: quickstart_catalog]
Table: dev_namespace.sub_namespace.user
View: dev_namespace.sub_namespace.user_view
Found 8 matches (3 Principals, 3 Principal Roles, 1 Table, 1 View).

# fuzzy search for all entities within a single catalog
➜ polaris git:(cli_summary_subcommand_v2) ./polaris --profile dev find dev --catalog quickstart_catalog
Searching for 'dev'...
[Catalog: quickstart_catalog]
Catalog Role: dev_catalog_role
Namespace: dev_namespace
Found 2 matches (1 Catalog Role, 1 Namespace).

# fuzzy search for entity catalog role within a single catalog
➜ polaris git:(cli_summary_subcommand_v2) ./polaris --profile dev find dev --catalog quickstart_catalog --type catalog-role
Searching for 'dev'...
[Catalog: quickstart_catalog]
Catalog Role: dev_catalog_role
Found 1 matches (1 Catalog Role).
```

## Tables command

```
# list tables
➜ polaris git:(cli_summary_subcommand_v2) ✗ ./polaris --profile dev tables list --catalog quickstart_catalog --namespace dev_namespace.sub_namespace
{"namespace": ["dev_namespace", "sub_namespace"], "name": "user"}

# get full table metadata
➜ polaris git:(cli_summary_subcommand_v2) ✗ ./polaris --profile dev tables get user --catalog quickstart_catalog --namespace dev_namespace.sub_namespace
{"metadata-location": "file:/var/tmp/quickstart_catalog/dev_namespace/sub_namespace/user/metadata/00002-fa1347d8-c14a-4af7-974d-2e80bc0a5866.metadata.json", "metadata": {"format-version": 3, "table-uuid": "35836a86-bf3a-43df-a6a4-ace9e5c8fb22", "location": "file:///var/tmp/quickstart_catalog/dev_namespace/sub_namespace/user", "last-updated-ms": 1774722865518, "next-row-id": 1, "properties": {"owner": "yong", "created-at": "2026-03-28T18:34:23.090216Z", "write.distribution-mode": "range", "write.parquet.compression-codec": "zstd"}, "schemas": [{"type": "struct", "fields": [{"id": 1, "name": "id", "type": "long", "required": true, "doc": "Row ID"}, {"id": 2, "name": "user", "type": {"type": "struct", "fields": [{"id": 9, "name": "user_id", "type": "string", "required": false}, {"id": 10, "name": "name", "type": "string", "required": false}, {"id": 11, "name": "address", "type": {"type": "struct", "fields": [{"id": 12, "name": "street", "type": "string", "required": false}, {"id": 13, "name": "city", "type": "string", "required": false}, {"id": 14, "name": "country", "type": "string", "required": false}]}, "required": false}]}, "required": true, "doc": "User info"}, {"id": 3, "name": "tags", "type": {"type": "list", "element-id": 15, "element": "string", "element-required": false}, "required": false, "doc": "tags"}, {"id": 4, "name": "attributes", "type": {"type": "map", "key-id": 16, "key": "string", "value-id": 17, "value": "string", "value-required": false}, "required": false, "doc": "User attributes"}, {"id": 5, "name": "events", "type": {"type": "list", "element-id": 18, "element": {"type": "struct", "fields": [{"id": 19, "name": "event_type", "type": "string", "required": false}, {"id": 20, "name": "event_time", "type": "timestamptz", "required": false}, {"id": 21, "name": "metadata", "type": {"type": "map", "key-id": 22, "key": "string", "value-id": 23, "value": "string", "value-required": false}, "required": false}]}, "element-required": false}, "required": false, "doc": "User event history"}, {"id": 6, "name": "event_data", "type": "variant", "required": false, "doc": "User event data"}, {"id": 7, "name": "category", "type": "string", "required": true, "doc": "Event category"}, {"id": 8, "name": "created_at", "type": "timestamptz", "required": true, "doc": "Event creation time"}]}], "current-schema-id": 0, "last-column-id": 23, "partition-specs": [{"fields": [{"field-id": 1000, "source-id": 8, "name": "created_at_day", "transform": "day"}, {"field-id": 1001, "source-id": 7, "name": "category", "transform": "identity"}]}], "default-spec-id": 0, "last-partition-id": 1001, "sort-orders": [{"fields": []}, {"fields": [{"source-id": 8, "transform": "identity", "direction": "desc", "null-order": "nulls-last"}, {"source-id": 1, "transform": "identity", "direction": "asc", "null-order": "nulls-first"}]}], "default-sort-order-id": 1, "snapshots": [{"snapshot-id": 201003753560339990, "sequence-number": 1, "timestamp-ms": 1774722865518, "manifest-list": "file:/var/tmp/quickstart_catalog/dev_namespace/sub_namespace/user/metadata/snap-201003753560339990-1-e0dcc235-e5a1-454a-a303-6a1c8fa22525.avro", "first-row-id": 0, "summary": {"operation": "append", "spark.app.id": "local-1774722859049", "added-data-files": "1", "added-records": "1", "added-files-size": "5600", "changed-partition-count": "1", "total-records": "1", "total-files-size": "5600", "total-data-files": "1", "total-delete-files": "0", "total-position-deletes": "0", "total-equality-deletes": "0", "engine-version": "4.0.2", "app-id": "local-1774722859049", "engine-name": "spark", "iceberg-version": "Apache Iceberg 1.10.1 (commit ccb8bc435062171e64bc8b7e5f56e6aed9c5b934)"}, "schema-id": 0}], "refs": {"main": {"type": "branch", "snapshot-id": 201003753560339990}}, "current-snapshot-id": 201003753560339990, "last-sequence-number": 1, "snapshot-log": [{"snapshot-id": 201003753560339990, "timestamp-ms": 1774722865518}], "metadata-log": [{"metadata-file": "file:/var/tmp/quickstart_catalog/dev_namespace/sub_namespace/user/metadata/00000-9cac3cd7-7dbd-4355-be3d-2d3da33d3158.metadata.json", "timestamp-ms": 1774722863092}, {"metadata-file": "file:/var/tmp/quickstart_catalog/dev_namespace/sub_namespace/user/metadata/00001-ef4623e9-286d-4859-9aa6-e90e968b8b12.metadata.json", "timestamp-ms": 1774722863221}], "statistics": [], "partition-statistics": []}}

# table summarize
➜ polaris git:(cli_summary_subcommand_v2) ✗ ./polaris --profile dev tables summarize user --catalog quickstart_catalog --namespace dev_namespace.sub_namespace
Table: dev_namespace.sub_namespace.user
--------------------------------------------------------------------------------
Metadata Location: file:///var/tmp/quickstart_catalog/dev_namespace/sub_namespace/user
Format Version: 3
Snapshots: 1
Current Snapshot ID: 201003753560339990
Last Updated: 2026-03-28 18:34:25 UTC

Statistics
Total Records: 1
Total Data Files: 1
Total Files Size: 5600

Schema
+----+------------+-------------------------------------------------------------------------------------------------+----------+---------------------+
| ID | Field Name | Type                                                                                            | Required | Comment             |
+----+------------+-------------------------------------------------------------------------------------------------+----------+---------------------+
| 1  | id         | long                                                                                            | *        | Row ID              |
| 2  | user       | struct<user_id:string, name:string, address:struct<street:string, city:string, country:string>> | *        | User info           |
| 3  | tags       | list<string>                                                                                    |          | tags                |
| 4  | attributes | map<string, string>                                                                             |          | User attributes     |
| 5  | events     | list<struct<event_type:string, event_time:timestamptz, metadata:map<string, string>>>           |          | User event history  |
| 6  | event_data | variant                                                                                         |          | User event data     |
| 7  | category   | string                                                                                          | *        | Event category      |
| 8  | created_at | timestamptz                                                                                     | *        | Event creation time |
+----+------------+-------------------------------------------------------------------------------------------------+----------+---------------------+

Partitioning
+-----------+----------------+-----------+
| Source ID | Field Name     | Transform |
+-----------+----------------+-----------+
| 8         | created_at_day | day       |
| 7         | category       | identity  |
+-----------+----------------+-----------+

Sort order
+-----------+-----------+-------------+-----------+
| Source ID | Transform | Null Order  | Direction |
+-----------+-----------+-------------+-----------+
| 8         | identity  | nulls-last  | desc      |
| 1         | identity  | nulls-first | asc       |
+-----------+-----------+-------------+-----------+

Effective policies
- orphan-file-policy (Inherited from dev_namespace)
- snapshot-expiry-policy (Inherited from dev_namespace)
--------------------------------------------------------------------------------
```

## Setup instructions used for the examples above

```
# setup
## bootstrap
./polaris --profile dev setup apply site/content/guides/assets/polaris/reference-setup-config.yaml

## create sample table with complex types and sort order etc.
CREATE TABLE IF NOT EXISTS dev_namespace.sub_namespace.user (
  id BIGINT NOT NULL COMMENT 'Row ID',
  user STRUCT<user_id: STRING, name: STRING, address: STRUCT<street: STRING, city: STRING, country: STRING>> NOT NULL COMMENT 'User info',
  tags ARRAY<STRING> COMMENT 'tags',
  attributes MAP<STRING, STRING> COMMENT 'User attributes',
  events ARRAY<STRUCT<event_type: STRING, event_time: TIMESTAMP, metadata: MAP<STRING, STRING>>> COMMENT 'User event history',
  event_data VARIANT COMMENT 'User event data',
  category STRING NOT NULL COMMENT 'Event category',
  created_at TIMESTAMP NOT NULL COMMENT 'Event creation time'
)
USING iceberg
PARTITIONED BY (days(created_at), category)
TBLPROPERTIES ('format-version' = '3');

ALTER TABLE dev_namespace.sub_namespace.user WRITE ORDERED BY (created_at DESC, id);

INSERT INTO dev_namespace.sub_namespace.user VALUES (
  1,
  named_struct(
    'user_id', 'u1',
    'name', 'xxx',
    'address', named_struct('street', 'xxx', 'city', 'xxx', 'country', 'xx')
  ),
  array('tag1', 'tag2'),
  map('key1', 'value1'),
  array(
    named_struct(
      'event_type', 'x',
      'event_time', timestamp '2026-03-24 12:00:00',
      'metadata', map('k', 'v')
    )
  ),
  parse_json('{"dynamic_field": 123, "nested": {"a": true}}'),
  'xxx',
  timestamp '2026-03-24 12:00:00'
);

CREATE VIEW IF NOT EXISTS dev_namespace.sub_namespace.user_view AS
SELECT * FROM dev_namespace.sub_namespace.user;
```

## Checklist

- [ ] Don't disclose security issues! (contact [email protected])
- [ ] Clearly explained why the changes are needed, or linked related issues: Fixes #
- [ ] Added/updated tests with good coverage, or manually tested (and explained how)
- [ ] Added comments for complex logic
- [ ] Updated `CHANGELOG.md` (if needed)
- [ ] Updated documentation in `site/content/in-dev/unreleased` (if needed)
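
For reviewers comparing the `tables get` JSON with the `Type` column of the `tables summarize` schema table above, the following is a minimal Python sketch of how a nested Iceberg type could be flattened into that compact string. It is not the code in this PR: the function name `render_type` is hypothetical, and the sketch only assumes the schema JSON shape shown in the sample output (primitive types as plain strings; `struct`, `list`, and `map` as objects).

```python
def render_type(iceberg_type):
    """Render an Iceberg type from table metadata JSON as a compact string.

    Hypothetical helper for illustration only; not taken from the Polaris CLI.
    """
    # Primitive types ("long", "string", "timestamptz", ...) are plain strings.
    if isinstance(iceberg_type, str):
        return iceberg_type

    kind = iceberg_type["type"]
    if kind == "struct":
        # Render each nested field as name:type, separated by ", ".
        fields = ", ".join(
            f"{field['name']}:{render_type(field['type'])}"
            for field in iceberg_type["fields"]
        )
        return f"struct<{fields}>"
    if kind == "list":
        return f"list<{render_type(iceberg_type['element'])}>"
    if kind == "map":
        key = render_type(iceberg_type["key"])
        value = render_type(iceberg_type["value"])
        return f"map<{key}, {value}>"
    # Fall back to the raw type name for anything not handled above.
    return str(kind)


# The "events" column type, copied from the sample `tables get` output.
events_type = {
    "type": "list",
    "element-id": 18,
    "element": {
        "type": "struct",
        "fields": [
            {"id": 19, "name": "event_type", "type": "string", "required": False},
            {"id": 20, "name": "event_time", "type": "timestamptz", "required": False},
            {
                "id": 21,
                "name": "metadata",
                "type": {
                    "type": "map",
                    "key-id": 22,
                    "key": "string",
                    "value-id": 23,
                    "value": "string",
                    "value-required": False,
                },
                "required": False,
            },
        ],
    },
    "element-required": False,
}

print(render_type(events_type))
```

Run against the `events` column, the sketch prints the same string that appears in row 5 of the schema table. Recursing on the JSON keeps arbitrarily deep nesting readable on one line, at the cost of wide table columns for large structs.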
