MonkeyCanCode opened a new pull request, #4075:
URL: https://github.com/apache/polaris/pull/4075

   
   This is phase two of [CLI: Add summarize subcommand](https://github.com/apache/polaris/pull/4003). Based on feedback from @flyrain and the community on the [mailing list](https://lists.apache.org/thread/35zzzh2jgorhx7q2xksp7rwxnt6gl2zx), this PR adds the following:
   1. `find` command to locate identifiers via fuzzy search
   2. `tables` command to handle basic Iceberg table operations (get/list/summarize/non-purge delete)
   
   Also, a newline is now added per section in the output of the `summarize` subcommands introduced in phase one, to improve readability.
   
   Here are a couple of sample outputs:
   
   ## Find command
   ```
   # fuzzy search for all entities across all catalogs
   ➜  polaris git:(cli_summary_subcommand_v2) ./polaris --profile dev find user
   Searching for 'user'...
   [Global]
     Principal:           quickstart_user
     Principal:           readonly_user
     Principal:           dev_user
     Principal Role:      quickstart_user_role
     Principal Role:      readonly_user_role
     Principal Role:      dev_user_role
   
   [Catalog: quickstart_catalog]
     Table:               dev_namespace.sub_namespace.user
     View:                dev_namespace.sub_namespace.user_view
   
   Found 8 matches (3 Principals, 3 Principal Roles, 1 Table, 1 View).
   
   # fuzzy search for all entities within a single catalog
   ➜  polaris git:(cli_summary_subcommand_v2) ./polaris --profile dev find dev --catalog quickstart_catalog
   Searching for 'dev'...
   [Catalog: quickstart_catalog]
     Catalog Role:        dev_catalog_role
     Namespace:           dev_namespace
   
   Found 2 matches (1 Catalog Role, 1 Namespace).
   
   # fuzzy search for entity catalog role within a single catalog
   ➜  polaris git:(cli_summary_subcommand_v2) ./polaris --profile dev find dev --catalog quickstart_catalog --type catalog-role
   Searching for 'dev'...
   [Catalog: quickstart_catalog]
     Catalog Role:        dev_catalog_role
   
   Found 1 matches (1 Catalog Role).
   ```
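
To make the grouping and the summary line in the sample output concrete, here is a minimal Python sketch. It is not the PR's implementation: the `ENTITIES` list, the `find` and `summary` helpers, and the use of display names such as `Catalog Role` for the type filter are all hypothetical. A case-insensitive substring match stands in for whatever fuzzy matching the CLI actually uses, and matching on the last dotted component of an identifier is an assumption chosen only because it reproduces the sample output above.

```python
from collections import Counter

# Hypothetical inventory of (scope, entity type, identifier) triples, mirroring
# the names in the sample output. A real CLI would fetch these from the
# Polaris APIs instead of a hard-coded list.
ENTITIES = [
    ("Global", "Principal", "quickstart_user"),
    ("Global", "Principal", "readonly_user"),
    ("Global", "Principal", "dev_user"),
    ("Global", "Principal Role", "quickstart_user_role"),
    ("Global", "Principal Role", "readonly_user_role"),
    ("Global", "Principal Role", "dev_user_role"),
    ("quickstart_catalog", "Catalog Role", "dev_catalog_role"),
    ("quickstart_catalog", "Namespace", "dev_namespace"),
    ("quickstart_catalog", "Namespace", "dev_namespace.sub_namespace"),
    ("quickstart_catalog", "Table", "dev_namespace.sub_namespace.user"),
    ("quickstart_catalog", "View", "dev_namespace.sub_namespace.user_view"),
]


def find(query, entities, catalog=None, entity_type=None):
    """Return entities whose leaf name contains `query` (case-insensitive).

    Substring matching is a stand-in for fuzzy search (assumption).
    """
    q = query.lower()
    matches = []
    for scope, kind, identifier in entities:
        if catalog is not None and scope != catalog:
            continue  # restrict the search to a single catalog
        if entity_type is not None and kind != entity_type:
            continue  # restrict the search to a single entity type
        leaf = identifier.rsplit(".", 1)[-1]  # last dotted component
        if q in leaf.lower():
            matches.append((scope, kind, identifier))
    return matches


def summary(matches):
    """Build the closing line, counting matches per entity type."""
    counts = Counter(kind for _, kind, _ in matches)
    parts = ", ".join(
        f"{n} {kind}{'s' if n != 1 else ''}" for kind, n in counts.items()
    )
    return f"Found {len(matches)} matches ({parts})."


print(summary(find("user", ENTITIES)))
# Found 8 matches (3 Principals, 3 Principal Roles, 1 Table, 1 View).
```

Passing `catalog="quickstart_catalog"` narrows the hypothetical search to that catalog, which corresponds to the `--catalog` option shown above.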
   
   ## Tables command
   ```
   # list tables
   ➜  polaris git:(cli_summary_subcommand_v2) ✗ ./polaris --profile dev tables list --catalog quickstart_catalog --namespace dev_namespace.sub_namespace
   {"namespace": ["dev_namespace", "sub_namespace"], "name": "user"}
   
   # get full table metadata
   ➜  polaris git:(cli_summary_subcommand_v2) ✗ ./polaris --profile dev tables get user --catalog quickstart_catalog --namespace dev_namespace.sub_namespace
   {"metadata-location": 
"file:/var/tmp/quickstart_catalog/dev_namespace/sub_namespace/user/metadata/00002-fa1347d8-c14a-4af7-974d-2e80bc0a5866.metadata.json",
 "metadata": {"format-version": 3, "table-uuid": 
"35836a86-bf3a-43df-a6a4-ace9e5c8fb22", "location": 
"file:///var/tmp/quickstart_catalog/dev_namespace/sub_namespace/user", 
"last-updated-ms": 1774722865518, "next-row-id": 1, "properties": {"owner": 
"yong", "created-at": "2026-03-28T18:34:23.090216Z", "write.distribution-mode": 
"range", "write.parquet.compression-codec": "zstd"}, "schemas": [{"type": 
"struct", "fields": [{"id": 1, "name": "id", "type": "long", "required": true, 
"doc": "Row ID"}, {"id": 2, "name": "user", "type": {"type": "struct", 
"fields": [{"id": 9, "name": "user_id", "type": "string", "required": false}, 
{"id": 10, "name": "name", "type": "string", "required": false}, {"id": 11, 
"name": "address", "type": {"type": "struct", "fields": [{"id": 12, "name": 
"street", "type": "string", "required": false}, {"id": 13, "name": "city", 
"type": "string", "required": false}, {"id": 14, "name": 
"country", "type": "string", "required": false}]}, "required": false}]}, 
"required": true, "doc": "User info"}, {"id": 3, "name": "tags", "type": 
{"type": "list", "element-id": 15, "element": "string", "element-required": 
false}, "required": false, "doc": "tags"}, {"id": 4, "name": "attributes", 
"type": {"type": "map", "key-id": 16, "key": "string", "value-id": 17, "value": 
"string", "value-required": false}, "required": false, "doc": "User 
attributes"}, {"id": 5, "name": "events", "type": {"type": "list", 
"element-id": 18, "element": {"type": "struct", "fields": [{"id": 19, "name": 
"event_type", "type": "string", "required": false}, {"id": 20, "name": 
"event_time", "type": "timestamptz", "required": false}, {"id": 21, "name": 
"metadata", "type": {"type": "map", "key-id": 22, "key": "string", "value-id": 
23, "value": "string", "value-required": false}, "required": false}]}, 
"element-required": false}, "required": false, "doc": "User event history"}, 
{"id": 6, "name": "event_data", 
"type": "variant", "required": false, "doc": "User event data"}, {"id": 7, 
"name": "category", "type": "string", "required": true, "doc": "Event 
category"}, {"id": 8, "name": "created_at", "type": "timestamptz", "required": 
true, "doc": "Event creation time"}]}], "current-schema-id": 0, 
"last-column-id": 23, "partition-specs": [{"fields": [{"field-id": 1000, 
"source-id": 8, "name": "created_at_day", "transform": "day"}, {"field-id": 
1001, "source-id": 7, "name": "category", "transform": "identity"}]}], 
"default-spec-id": 0, "last-partition-id": 1001, "sort-orders": [{"fields": 
[]}, {"fields": [{"source-id": 8, "transform": "identity", "direction": "desc", 
"null-order": "nulls-last"}, {"source-id": 1, "transform": "identity", 
"direction": "asc", "null-order": "nulls-first"}]}], "default-sort-order-id": 
1, "snapshots": [{"snapshot-id": 201003753560339990, "sequence-number": 1, 
"timestamp-ms": 1774722865518, "manifest-list": 
"file:/var/tmp/quickstart_catalog/dev_namespace/sub_namespace/user/metadata/snap-201003753560339990-1-e0dcc235-e5a1-454a-a303-6a1c8fa22525.avro",
 "first-row-id": 0, "summary": {"operation": "append", "spark.app.id": 
"local-1774722859049", "added-data-files": "1", "added-records": "1", 
"added-files-size": "5600", "changed-partition-count": "1", "total-records": 
"1", "total-files-size": "5600", "total-data-files": "1", "total-delete-files": 
"0", "total-position-deletes": "0", "total-equality-deletes": "0", 
"engine-version": "4.0.2", "app-id": "local-1774722859049", "engine-name": 
"spark", "iceberg-version": "Apache Iceberg 1.10.1 (commit 
ccb8bc435062171e64bc8b7e5f56e6aed9c5b934)"}, "schema-id": 0}], "refs": {"main": 
{"type": "branch", "snapshot-id": 201003753560339990}}, "current-snapshot-id": 
201003753560339990, "last-sequence-number": 1, "snapshot-log": [{"snapshot-id": 
201003753560339990, "timestamp-ms": 1774722865518}], "metadata-log": 
[{"metadata-file": 
"file:/var/tmp/quickstart_catalog/dev_namespace/sub_namespace/user/metadata/00000-9cac3cd7-7dbd-4355-be3d-2d3da33d3158.metadata.json",
 "timestamp-ms": 1774722863092}, {"metadata-file": 
"file:/var/tmp/quickstart_catalog/dev_namespace/sub_namespace/user/metadata/00001-ef4623e9-286d-4859-9aa6-e90e968b8b12.metadata.json",
 "timestamp-ms": 1774722863221}], "statistics": [], "partition-statistics": []}}
   
   # table summarize
   ➜  polaris git:(cli_summary_subcommand_v2) ✗ ./polaris --profile dev tables summarize user --catalog quickstart_catalog --namespace dev_namespace.sub_namespace
   Table: dev_namespace.sub_namespace.user
   --------------------------------------------------------------------------------
   Metadata
     Location:                      file:///var/tmp/quickstart_catalog/dev_namespace/sub_namespace/user
     Format Version:                3
     Snapshots:                     1
     Current Snapshot ID:           201003753560339990
     Last Updated:                  2026-03-28 18:34:25 UTC
   
   Statistics
     Total Records:                 1
     Total Data Files:              1
     Total Files Size:              5600
   
   Schema
     +----+------------+-------------------------------------------------------------------------------------------------+----------+---------------------+
     | ID | Field Name | Type                                                                                            | Required | Comment             |
     +----+------------+-------------------------------------------------------------------------------------------------+----------+---------------------+
     | 1  | id         | long                                                                                            | *        | Row ID              |
     | 2  | user       | struct<user_id:string, name:string, address:struct<street:string, city:string, country:string>> | *        | User info           |
     | 3  | tags       | list<string>                                                                                    |          | tags                |
     | 4  | attributes | map<string, string>                                                                             |          | User attributes     |
     | 5  | events     | list<struct<event_type:string, event_time:timestamptz, metadata:map<string, string>>>           |          | User event history  |
     | 6  | event_data | variant                                                                                         |          | User event data     |
     | 7  | category   | string                                                                                          | *        | Event category      |
     | 8  | created_at | timestamptz                                                                                     | *        | Event creation time |
     +----+------------+-------------------------------------------------------------------------------------------------+----------+---------------------+
   
   Partitioning
     +-----------+----------------+-----------+
     | Source ID | Field Name     | Transform |
     +-----------+----------------+-----------+
     | 8         | created_at_day | day       |
     | 7         | category       | identity  |
     +-----------+----------------+-----------+
   
   Sort order
     +-----------+-----------+-------------+-----------+
     | Source ID | Transform | Null Order  | Direction |
     +-----------+-----------+-------------+-----------+
     | 8         | identity  | nulls-last  | desc      |
     | 1         | identity  | nulls-first | asc       |
     +-----------+-----------+-------------+-----------+
   
   Effective policies
     - orphan-file-policy (Inherited from dev_namespace)
     - snapshot-expiry-policy (Inherited from dev_namespace)
   --------------------------------------------------------------------------------
   ```
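
The `Type` column and the `Last Updated` line in the summary can both be derived from the JSON returned by `tables get`. The following Python sketch shows one way to do that. It is not taken from the PR: the helper names `format_type` and `format_timestamp_ms` are hypothetical, and only the struct, list, and map shapes that appear in the sample metadata are handled.

```python
from datetime import datetime, timezone


def format_type(t):
    """Render an Iceberg schema type (parsed JSON) as a compact string."""
    if isinstance(t, str):
        return t  # primitive type such as "long" or "timestamptz"
    kind = t["type"]
    if kind == "struct":
        inner = ", ".join(
            f"{f['name']}:{format_type(f['type'])}" for f in t["fields"]
        )
        return f"struct<{inner}>"
    if kind == "list":
        return f"list<{format_type(t['element'])}>"
    if kind == "map":
        return f"map<{format_type(t['key'])}, {format_type(t['value'])}>"
    raise ValueError(f"unsupported type: {kind}")


def format_timestamp_ms(ms):
    """Convert epoch milliseconds (e.g. last-updated-ms) to a UTC string."""
    dt = datetime.fromtimestamp(ms // 1000, tz=timezone.utc)
    return dt.strftime("%Y-%m-%d %H:%M:%S UTC")


# The "events" field type, copied from the sample metadata above.
events_type = {
    "type": "list",
    "element-id": 18,
    "element": {
        "type": "struct",
        "fields": [
            {"id": 19, "name": "event_type", "type": "string", "required": False},
            {"id": 20, "name": "event_time", "type": "timestamptz", "required": False},
            {
                "id": 21,
                "name": "metadata",
                "type": {
                    "type": "map",
                    "key-id": 22,
                    "key": "string",
                    "value-id": 23,
                    "value": "string",
                    "value-required": False,
                },
                "required": False,
            },
        ],
    },
    "element-required": False,
}

print(format_type(events_type))
# list<struct<event_type:string, event_time:timestamptz, metadata:map<string, string>>>
print(format_timestamp_ms(1774722865518))
# 2026-03-28 18:34:25 UTC
```

Recursing on the nested `type` objects keeps the rendering independent of nesting depth, which is why the nested `address` struct inside `user` prints on a single row.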
   
   
   ## Setup instructions used for the above
   ```
   # setup
   ## bootstrap
   ./polaris --profile dev setup apply site/content/guides/assets/polaris/reference-setup-config.yaml
   
   ## create sample table with complex types and sort order etc.
   CREATE TABLE IF NOT EXISTS dev_namespace.sub_namespace.user (
       id BIGINT NOT NULL COMMENT 'Row ID',
       user STRUCT<user_id: STRING, name: STRING, address: STRUCT<street: STRING, city: STRING, country: STRING>> NOT NULL COMMENT 'User info',
       tags ARRAY<STRING> COMMENT 'tags',
       attributes MAP<STRING, STRING> COMMENT 'User attributes',
       events ARRAY<STRUCT<event_type: STRING, event_time: TIMESTAMP, metadata: MAP<STRING, STRING>>> COMMENT 'User event history',
       event_data VARIANT COMMENT 'User event data',
       category STRING NOT NULL COMMENT 'Event category',
       created_at TIMESTAMP NOT NULL COMMENT 'Event creation time'
   )
   USING iceberg
   PARTITIONED BY (days(created_at), category)
   TBLPROPERTIES ('format-version' = '3');
   
   ALTER TABLE dev_namespace.sub_namespace.user WRITE ORDERED BY (created_at DESC, id);
   
   INSERT INTO dev_namespace.sub_namespace.user VALUES (
     1,
     named_struct(
       'user_id', 'u1',
       'name', 'xxx',
       'address', named_struct('street', 'xxx', 'city', 'xxx', 'country', 'xx')
     ),
     array('tag1', 'tag2'),
     map('key1', 'value1'),
     array(
       named_struct(
         'event_type', 'x',
         'event_time', timestamp '2026-03-24 12:00:00',
         'metadata', map('k', 'v')
       )
     ),
     parse_json('{"dynamic_field": 123, "nested": {"a": true}}'),
     'xxx',
     timestamp '2026-03-24 12:00:00'
   );
   
   CREATE VIEW IF NOT EXISTS dev_namespace.sub_namespace.user_view AS SELECT * FROM dev_namespace.sub_namespace.user;
   ``` 
   
   ## Checklist
   - [ ] 🛡️ Don't disclose security issues! (contact [email protected])
   - [ ] 🔗 Clearly explained why the changes are needed, or linked related issues: Fixes #
   - [ ] 🧪 Added/updated tests with good coverage, or manually tested (and explained how)
   - [ ] 💡 Added comments for complex logic
   - [ ] 🧾 Updated `CHANGELOG.md` (if needed)
   - [ ] 📚 Updated documentation in `site/content/in-dev/unreleased` (if needed)
   

