This is an automated email from the ASF dual-hosted git repository.

alsay pushed a commit to branch split_tables
in repository https://gitbox.apache.org/repos/asf/datasketches-bigquery.git
commit af1d8f2ce013ae92631c7b750cc0ae682b15f3ef
Author: AlexanderSaydakov <[email protected]>
AuthorDate: Mon Feb 10 16:19:15 2025 -0800

    separate tables for aggregate and scalar functions
---
 cpc/README.md       | 37 +++++++++++++++-------------
 fi/README.md        | 17 ++++++++-----
 hll/README.md       | 31 ++++++++++++++----------
 kll/README.md       | 39 +++++++++++++++++-------------
 readme_generator.py | 14 +++++++----
 req/README.md       | 39 +++++++++++++++++-------------
 tdigest/README.md   | 29 ++++++++++++----------
 theta/README.md     | 57 +++++++++++++++++++++++--------------------
 tuple/README.md     | 69 ++++++++++++++++++++++++++++-------------------------
 9 files changed, 189 insertions(+), 143 deletions(-)

diff --git a/cpc/README.md b/cpc/README.md
index b55e9a3..0507f0b 100644
--- a/cpc/README.md
+++ b/cpc/README.md
@@ -37,22 +37,27 @@
 If you are interested in making contributions to this project please see our
 [Community](https://datasketches.apache.org/docs/Community/) page for how to contact us.
-| Function Name | Function Type | Signature | Description |
-|---|---|---|---|
-| [cpc_sketch_agg_union](../cpc/sqlx/cpc_sketch_agg_union.sqlx) | AGGREGATE | (sketch BYTES) -> BYTES | Creates a sketch that represents the union of the given column of sketches.<br><br>Param sketch: the column of sketches. Each as BYTES.<br>Defaults: lg\_k = 12, seed = 9001.<br>Returns: a Compact, Compressed CPC Sketch, as BYTES. 
| -| [cpc_sketch_agg_string](../cpc/sqlx/cpc_sketch_agg_string.sqlx) | AGGREGATE | (str STRING) -> BYTES | Creates a sketch that represents the cardinality of the given STRING column.<br><br>Param str: the STRING column of identifiers.<br>Defaults: lg\_k = 12, seed = 9001.<br>Returns: a Compact, Compressed CPC Sketch, as BYTES | -| [cpc_sketch_agg_int64](../cpc/sqlx/cpc_sketch_agg_int64.sqlx) | AGGREGATE | (value INT64) -> BYTES | Creates a sketch that represents the cardinality of the given INT64 column.<br><br>Param value: the INT64 column of identifiers.<br>Defaults: lg\_k = 12, seed = 9001.<br>Returns: a Compact, Compressed CPC Sketch, as BYTES | -| [cpc_sketch_agg_string_lgk_seed](../cpc/sqlx/cpc_sketch_agg_string_lgk_seed.sqlx) | AGGREGATE | (str STRING, params STRUCT<lg_k BYTEINT, seed INT64> NOT AGGREGATE) -> BYTES | Creates a sketch that represents the cardinality of the given STRING column.<br><br>Param str: the STRING column of identifiers.<br>Param lg\_k: the sketch accuracy/size parameter as an integer in the range \[4, 26\].<br>Param seed: the seed to be used by the underlying hash function.<br>Returns: a Compact, Compre [...] -| [cpc_sketch_agg_union_lgk_seed](../cpc/sqlx/cpc_sketch_agg_union_lgk_seed.sqlx) | AGGREGATE | (sketch BYTES, params STRUCT<lg_k BYTEINT, seed INT64> NOT AGGREGATE) -> BYTES | Creates a sketch that represents the union of the given column of sketches.<br><br>Param sketch: the column of sketches. Each as BYTES.<br>Param lg\_k: the sketch accuracy/size parameter as an integer in the range \[4, 26\].<br>Param seed: This is used to confirm that the given sketches were configured with the co [...] 
-| [cpc_sketch_agg_int64_lgk_seed](../cpc/sqlx/cpc_sketch_agg_int64_lgk_seed.sqlx) | AGGREGATE | (value INT64, params STRUCT<lg_k BYTEINT, seed INT64> NOT AGGREGATE) -> BYTES | Creates a sketch that represents the cardinality of the given INT64 column.<br><br>Param value: the INT64 column of identifiers.<br>Param lg\_k: the sketch accuracy/size parameter as an integer in the range \[4, 26\].<br>Param seed: the seed to be used by the underlying hash function.<br>Returns: a Compact, Compres [...] -| [cpc_sketch_get_estimate](../cpc/sqlx/cpc_sketch_get_estimate.sqlx) | SCALAR | (sketch BYTES) -> FLOAT64 | Gets cardinality estimate and bounds from given sketch.<br><br>Param sketch: The given sketch to query as BYTES.<br>Defaults: seed = 9001.<br>Returns: a FLOAT64 value as the cardinality estimate. | -| [cpc_sketch_to_string](../cpc/sqlx/cpc_sketch_to_string.sqlx) | SCALAR | (sketch BYTES) -> STRING | Returns a summary string that represents the state of the given sketch.<br><br>Param sketch the given sketch as BYTES.<br>Defaults: seed = 9001.<br>Returns: a STRING that represents the state of the given sketch. | -| [cpc_sketch_get_estimate_seed](../cpc/sqlx/cpc_sketch_get_estimate_seed.sqlx) | SCALAR | (sketch BYTES, seed INT64) -> FLOAT64 | Gets cardinality estimate and bounds from given sketch.<br><br>Param sketch: The given sketch to query as BYTES.<br>Param seed: This is used to confirm that the given sketch was configured with the correct seed.<br>Returns: a FLOAT64 value as the cardinality estimate. | -| [cpc_sketch_to_string_seed](../cpc/sqlx/cpc_sketch_to_string_seed.sqlx) | SCALAR | (sketch BYTES, seed INT64) -> STRING | Returns a summary string that represents the state of the given sketch.<br><br>Param sketch the given sketch as BYTES.<br>Param seed: This is used to confirm that the given sketch was configured with the correct seed.<br>Returns: a STRING that represents the state of the given sketch. 
| -| [cpc_sketch_union](../cpc/sqlx/cpc_sketch_union.sqlx) | SCALAR | (sketchA BYTES, sketchB BYTES) -> BYTES | Computes a sketch that represents the scalar union of the two given sketches.<br><br>Param sketchA: the first sketch as BYTES.<br>Param sketchB: the second sketch as BYTES.<br>Defaults: lg\_k = 12, seed = 9001.<br>Returns: a CPC Sketch, as BYTES. | -| [cpc_sketch_get_estimate_and_bounds](../cpc/sqlx/cpc_sketch_get_estimate_and_bounds.sqlx) | SCALAR | (sketch BYTES, num_std_devs BYTEINT) -> STRUCT<estimate FLOAT64, lower_bound FLOAT64, upper_bound FLOAT64> | Gets cardinality estimate and bounds from given sketch.<br> <br>Param sketch: The given sketch to query as bytes.<br>Param num\_std\_devs: The returned bounds will be based on the statistical confidence interval determined by the given number of standard deviations<br> from the [...] -| [cpc_sketch_union_lgk_seed](../cpc/sqlx/cpc_sketch_union_lgk_seed.sqlx) | SCALAR | (sketchA BYTES, sketchB BYTES, lg_k BYTEINT, seed INT64) -> BYTES | Computes a sketch that represents the scalar union of the two given sketches.<br><br>Param sketchA: the first sketch as BYTES.<br>Param sketchB: the second sketch as BYTES.<br>Param lg\_k: the sketch accuracy/size parameter as an integer in the range \[4, 26\].<br>Param seed: This is used to confirm that the given sketches were configure [...] -| [cpc_sketch_get_estimate_and_bounds_seed](../cpc/sqlx/cpc_sketch_get_estimate_and_bounds_seed.sqlx) | SCALAR | (sketch BYTES, num_std_devs BYTEINT, seed INT64) -> STRUCT<estimate FLOAT64, lower_bound FLOAT64, upper_bound FLOAT64> | Gets cardinality estimate and bounds from given sketch.<br> <br>Param sketch: The given sketch to query as bytes.<br>Param num\_std\_devs: The returned bounds will be based on the statistical confidence interval determined by the given number of standard de [...] 
+## Aggregate Functions +| Function Name | Signature | Description | +|---|---|---| +| [cpc_sketch_agg_union](../cpc/sqlx/cpc_sketch_agg_union.sqlx) | (sketch BYTES) -> BYTES | Creates a sketch that represents the union of the given column of sketches.<br><br>Param sketch: the column of sketches. Each as BYTES.<br>Defaults: lg\_k = 12, seed = 9001.<br>Returns: a Compact, Compressed CPC Sketch, as BYTES. | +| [cpc_sketch_agg_string](../cpc/sqlx/cpc_sketch_agg_string.sqlx) | (str STRING) -> BYTES | Creates a sketch that represents the cardinality of the given STRING column.<br><br>Param str: the STRING column of identifiers.<br>Defaults: lg\_k = 12, seed = 9001.<br>Returns: a Compact, Compressed CPC Sketch, as BYTES | +| [cpc_sketch_agg_int64](../cpc/sqlx/cpc_sketch_agg_int64.sqlx) | (value INT64) -> BYTES | Creates a sketch that represents the cardinality of the given INT64 column.<br><br>Param value: the INT64 column of identifiers.<br>Defaults: lg\_k = 12, seed = 9001.<br>Returns: a Compact, Compressed CPC Sketch, as BYTES | +| [cpc_sketch_agg_string_lgk_seed](../cpc/sqlx/cpc_sketch_agg_string_lgk_seed.sqlx) | (str STRING, params STRUCT<lg_k BYTEINT, seed INT64> NOT AGGREGATE) -> BYTES | Creates a sketch that represents the cardinality of the given STRING column.<br><br>Param str: the STRING column of identifiers.<br>Param lg\_k: the sketch accuracy/size parameter as an integer in the range \[4, 26\].<br>Param seed: the seed to be used by the underlying hash function.<br>Returns: a Compact, Compressed CPC Ske [...] +| [cpc_sketch_agg_union_lgk_seed](../cpc/sqlx/cpc_sketch_agg_union_lgk_seed.sqlx) | (sketch BYTES, params STRUCT<lg_k BYTEINT, seed INT64> NOT AGGREGATE) -> BYTES | Creates a sketch that represents the union of the given column of sketches.<br><br>Param sketch: the column of sketches. 
Each as BYTES.<br>Param lg\_k: the sketch accuracy/size parameter as an integer in the range \[4, 26\].<br>Param seed: This is used to confirm that the given sketches were configured with the correct seed.< [...] +| [cpc_sketch_agg_int64_lgk_seed](../cpc/sqlx/cpc_sketch_agg_int64_lgk_seed.sqlx) | (value INT64, params STRUCT<lg_k BYTEINT, seed INT64> NOT AGGREGATE) -> BYTES | Creates a sketch that represents the cardinality of the given INT64 column.<br><br>Param value: the INT64 column of identifiers.<br>Param lg\_k: the sketch accuracy/size parameter as an integer in the range \[4, 26\].<br>Param seed: the seed to be used by the underlying hash function.<br>Returns: a Compact, Compressed CPC Sket [...] + +## Scalar Functions +| Function Name | Signature | Description | +|---|---|---| +| [cpc_sketch_get_estimate](../cpc/sqlx/cpc_sketch_get_estimate.sqlx) | (sketch BYTES) -> FLOAT64 | Gets cardinality estimate and bounds from given sketch.<br><br>Param sketch: The given sketch to query as BYTES.<br>Defaults: seed = 9001.<br>Returns: a FLOAT64 value as the cardinality estimate. | +| [cpc_sketch_to_string](../cpc/sqlx/cpc_sketch_to_string.sqlx) | (sketch BYTES) -> STRING | Returns a summary string that represents the state of the given sketch.<br><br>Param sketch the given sketch as BYTES.<br>Defaults: seed = 9001.<br>Returns: a STRING that represents the state of the given sketch. | +| [cpc_sketch_get_estimate_seed](../cpc/sqlx/cpc_sketch_get_estimate_seed.sqlx) | (sketch BYTES, seed INT64) -> FLOAT64 | Gets cardinality estimate and bounds from given sketch.<br><br>Param sketch: The given sketch to query as BYTES.<br>Param seed: This is used to confirm that the given sketch was configured with the correct seed.<br>Returns: a FLOAT64 value as the cardinality estimate. 
| +| [cpc_sketch_to_string_seed](../cpc/sqlx/cpc_sketch_to_string_seed.sqlx) | (sketch BYTES, seed INT64) -> STRING | Returns a summary string that represents the state of the given sketch.<br><br>Param sketch the given sketch as BYTES.<br>Param seed: This is used to confirm that the given sketch was configured with the correct seed.<br>Returns: a STRING that represents the state of the given sketch. | +| [cpc_sketch_union](../cpc/sqlx/cpc_sketch_union.sqlx) | (sketchA BYTES, sketchB BYTES) -> BYTES | Computes a sketch that represents the scalar union of the two given sketches.<br><br>Param sketchA: the first sketch as BYTES.<br>Param sketchB: the second sketch as BYTES.<br>Defaults: lg\_k = 12, seed = 9001.<br>Returns: a CPC Sketch, as BYTES. | +| [cpc_sketch_get_estimate_and_bounds](../cpc/sqlx/cpc_sketch_get_estimate_and_bounds.sqlx) | (sketch BYTES, num_std_devs BYTEINT) -> STRUCT<estimate FLOAT64, lower_bound FLOAT64, upper_bound FLOAT64> | Gets cardinality estimate and bounds from given sketch.<br> <br>Param sketch: The given sketch to query as bytes.<br>Param num\_std\_devs: The returned bounds will be based on the statistical confidence interval determined by the given number of standard deviations<br> from the returned [...] +| [cpc_sketch_union_lgk_seed](../cpc/sqlx/cpc_sketch_union_lgk_seed.sqlx) | (sketchA BYTES, sketchB BYTES, lg_k BYTEINT, seed INT64) -> BYTES | Computes a sketch that represents the scalar union of the two given sketches.<br><br>Param sketchA: the first sketch as BYTES.<br>Param sketchB: the second sketch as BYTES.<br>Param lg\_k: the sketch accuracy/size parameter as an integer in the range \[4, 26\].<br>Param seed: This is used to confirm that the given sketches were configured with th [...] 
+| [cpc_sketch_get_estimate_and_bounds_seed](../cpc/sqlx/cpc_sketch_get_estimate_and_bounds_seed.sqlx) | (sketch BYTES, num_std_devs BYTEINT, seed INT64) -> STRUCT<estimate FLOAT64, lower_bound FLOAT64, upper_bound FLOAT64> | Gets cardinality estimate and bounds from given sketch.<br> <br>Param sketch: The given sketch to query as bytes.<br>Param num\_std\_devs: The returned bounds will be based on the statistical confidence interval determined by the given number of standard deviations< [...] **Examples:** diff --git a/fi/README.md b/fi/README.md index 3a51be0..cbf9527 100644 --- a/fi/README.md +++ b/fi/README.md @@ -36,12 +36,17 @@ If you are interested in making contributions to this project please see our [Community](https://datasketches.apache.org/docs/Community/) page for how to contact us. -| Function Name | Function Type | Signature | Description | -|---|---|---|---| -| [frequent_strings_sketch_merge](../fi/sqlx/frequent_strings_sketch_merge.sqlx) | AGGREGATE | (sketch BYTES, lg_max_map_size BYTEINT NOT AGGREGATE) -> BYTES | Merges sketches from the given column.<br><br>Param sketch: the column of values.<br>Param lg\_max\_map\_size: the sketch accuracy/size parameter as an integer not less than 3.<br>Returns: a serialized Frequent Strings sketch as BYTES. | -| [frequent_strings_sketch_build](../fi/sqlx/frequent_strings_sketch_build.sqlx) | AGGREGATE | (item STRING, weight INT64, lg_max_map_size BYTEINT NOT AGGREGATE) -> BYTES | Creates a sketch that represents frequencies of the given column.<br><br>Param item: the column of STRING values.<br>Param weight: the amount by which the weight of the item should be increased.<br>Param lg\_max\_map\_size: the sketch accuracy/size parameter as a BYTEINT not less than 3.<br>Returns: a Frequent Strings [...] 
-| [frequent_strings_sketch_to_string](../fi/sqlx/frequent_strings_sketch_to_string.sqlx) | SCALAR | (sketch BYTES) -> STRING | Returns a summary string that represents the state of the given sketch.<br><br>Param sketch: the given sketch as sketch encoded bytes.<br>Returns: a string that represents the state of the given sketch. | -| [frequent_strings_sketch_get_result](../fi/sqlx/frequent_strings_sketch_get_result.sqlx) | SCALAR | (sketch BYTES, error_type STRING, threshold INT64) -> ARRAY<STRUCT<item STRING, estimate INT64, lower_bound INT64, upper_bound INT64>> | Returns an array of rows that include frequent items, estimates, lower and upper bounds<br>given an error\_type and a threshold.<br><br>Param sketch: the given sketch as sketch encoded bytes.<br>Param error\_type: determines whether no false positives o [...] +## Aggregate Functions +| Function Name | Signature | Description | +|---|---|---| +| [frequent_strings_sketch_merge](../fi/sqlx/frequent_strings_sketch_merge.sqlx) | (sketch BYTES, lg_max_map_size BYTEINT NOT AGGREGATE) -> BYTES | Merges sketches from the given column.<br><br>Param sketch: the column of values.<br>Param lg\_max\_map\_size: the sketch accuracy/size parameter as an integer not less than 3.<br>Returns: a serialized Frequent Strings sketch as BYTES. | +| [frequent_strings_sketch_build](../fi/sqlx/frequent_strings_sketch_build.sqlx) | (item STRING, weight INT64, lg_max_map_size BYTEINT NOT AGGREGATE) -> BYTES | Creates a sketch that represents frequencies of the given column.<br><br>Param item: the column of STRING values.<br>Param weight: the amount by which the weight of the item should be increased.<br>Param lg\_max\_map\_size: the sketch accuracy/size parameter as a BYTEINT not less than 3.<br>Returns: a Frequent Strings Sketch, as [...] 
+ +## Scalar Functions +| Function Name | Signature | Description | +|---|---|---| +| [frequent_strings_sketch_to_string](../fi/sqlx/frequent_strings_sketch_to_string.sqlx) | (sketch BYTES) -> STRING | Returns a summary string that represents the state of the given sketch.<br><br>Param sketch: the given sketch as sketch encoded bytes.<br>Returns: a string that represents the state of the given sketch. | +| [frequent_strings_sketch_get_result](../fi/sqlx/frequent_strings_sketch_get_result.sqlx) | (sketch BYTES, error_type STRING, threshold INT64) -> ARRAY<STRUCT<item STRING, estimate INT64, lower_bound INT64, upper_bound INT64>> | Returns an array of rows that include frequent items, estimates, lower and upper bounds<br>given an error\_type and a threshold.<br><br>Param sketch: the given sketch as sketch encoded bytes.<br>Param error\_type: determines whether no false positives or no fals [...] **Examples:** diff --git a/hll/README.md b/hll/README.md index 127a88e..294308a 100644 --- a/hll/README.md +++ b/hll/README.md @@ -35,19 +35,24 @@ If you are interested in making contributions to this project please see our [Community](https://datasketches.apache.org/docs/Community/) page for how to contact us. -| Function Name | Function Type | Signature | Description | -|---|---|---|---| -| [hll_sketch_agg_string](../hll/sqlx/hll_sketch_agg_string.sqlx) | AGGREGATE | (str STRING) -> BYTES | Creates a sketch that represents the cardinality of the given STRING column.<br><br>Param str: the STRING column of identifiers.<br>Defaults: lg\_k = 12, tgt\_type = HLL\_4.<br>Returns: an HLL Sketch, as BYTES. | -| [hll_sketch_agg_union](../hll/sqlx/hll_sketch_agg_union.sqlx) | AGGREGATE | (sketch BYTES) -> BYTES | Creates a sketch that represents the union of the given column of sketches.<br><br>Param sketch: the column of sketches. Each as BYTES.<br>Defaults: lg\_k = 12, tgt\_type = HLL\_4.<br>Returns: an HLL Sketch, as BYTES. 
| -| [hll_sketch_agg_int64](../hll/sqlx/hll_sketch_agg_int64.sqlx) | AGGREGATE | (value INT64) -> BYTES | Creates a sketch that represents the cardinality of the given INT64 column.<br><br>Param value: the INT64 column of identifiers.<br>Defaults: lg\_k = 12, tgt\_type = HLL\_4.<br>Returns: an HLL Sketch, as BYTES. | -| [hll_sketch_agg_string_lgk_type](../hll/sqlx/hll_sketch_agg_string_lgk_type.sqlx) | AGGREGATE | (str STRING, params STRUCT<lg_k BYTEINT, tgt_type STRING> NOT AGGREGATE) -> BYTES | Creates a sketch that represents the cardinality of the given STRING column.<br><br>Param str: the STRING column of identifiers.<br>Param lg\_k: the sketch accuracy/size parameter as an integer in the range \[4, 21\].<br>Param tgt\_type: The HLL type to use: one of {"HLL\_4", "HLL\_6", "HLL\_8"}.<br>Returns: [...] -| [hll_sketch_agg_union_lgk_type](../hll/sqlx/hll_sketch_agg_union_lgk_type.sqlx) | AGGREGATE | (sketch BYTES, params STRUCT<lg_k BYTEINT, tgt_type STRING> NOT AGGREGATE) -> BYTES | Creates a sketch that represents the union of the given column of sketches.<br><br>Param sketch: the column of sketches. Each as BYTES.<br>Param lg\_k: the sketch accuracy/size parameter as an integer in the range \[4, 21\].<br>Param tgt\_type: The HLL type to use: one of {"HLL\_4", "HLL\_6", "HLL\_8"}.<br>Re [...] -| [hll_sketch_agg_int64_lgk_type](../hll/sqlx/hll_sketch_agg_int64_lgk_type.sqlx) | AGGREGATE | (value INT64, params STRUCT<lg_k BYTEINT, tgt_type STRING> NOT AGGREGATE) -> BYTES | Creates a sketch that represents the cardinality of the given INT64 column.<br><br>Param value: the INT64 column of identifiers.<br>Param lg\_k: the sketch accuracy/size parameter as an integer in the range \[4, 21\].<br>Param tgt\_type: The HLL type to use: one of {"HLL\_4", "HLL\_6", "HLL\_8"}.<br>Returns: a [...] 
-| [hll_sketch_get_estimate](../hll/sqlx/hll_sketch_get_estimate.sqlx) | SCALAR | (sketch BYTES) -> FLOAT64 | Returns a summary string that represents the state of the given sketch.<br><br>Param sketch: the given sketch as BYTES.<br>Returns: the cardinality estimate as FLOAT64 value. | -| [hll_sketch_to_string](../hll/sqlx/hll_sketch_to_string.sqlx) | SCALAR | (sketch BYTES) -> STRING | Returns a summary string that represents the state of the given sketch.<br><br>Param sketch: the given sketch as BYTES.<br>Returns: a STRING that represents the state of the given sketch. | -| [hll_sketch_union](../hll/sqlx/hll_sketch_union.sqlx) | SCALAR | (sketchA BYTES, sketchB BYTES) -> BYTES | Computes a sketch that represents the union of the two given sketches.<br><br>Param sketchA: the first sketch as bytes.<br>Param sketchB: the second sketch as bytes.<br>Defaults: lg\_k = 12, tgt\_type = HLL\_4.<br>Returns: an HLL Sketch, as BYTES. | -| [hll_sketch_union_lgk_type](../hll/sqlx/hll_sketch_union_lgk_type.sqlx) | SCALAR | (sketchA BYTES, sketchB BYTES, lg_k BYTEINT, tgt_type STRING) -> BYTES | Computes a sketch that represents the union of the two given sketches.<br><br>Param sketchA: the first sketch as bytes.<br>Param sketchB: the second sketch as bytes.<br>Param lg\_k: the sketch accuracy/size parameter as an integer in the range \[4, 21\].<br>Param tgt\_type: The HLL type to use: one of {"HLL\_4", "HLL\_6", "HLL\_8"}. [...] -| [hll_sketch_get_estimate_and_bounds](../hll/sqlx/hll_sketch_get_estimate_and_bounds.sqlx) | SCALAR | (sketch BYTES, num_std_devs BYTEINT) -> STRUCT<estimate FLOAT64, lower_bound FLOAT64, upper_bound FLOAT64> | Gets cardinality estimate and bounds from given sketch.<br><br>Param sketch: The given sketch to query as BYTES.<br>Param num\_std\_devs: The returned bounds will be based on the statistical confidence interval determined by the given number of standard deviations<br> from the r [...] 
+## Aggregate Functions +| Function Name | Signature | Description | +|---|---|---| +| [hll_sketch_agg_string](../hll/sqlx/hll_sketch_agg_string.sqlx) | (str STRING) -> BYTES | Creates a sketch that represents the cardinality of the given STRING column.<br><br>Param str: the STRING column of identifiers.<br>Defaults: lg\_k = 12, tgt\_type = HLL\_4.<br>Returns: an HLL Sketch, as BYTES. | +| [hll_sketch_agg_union](../hll/sqlx/hll_sketch_agg_union.sqlx) | (sketch BYTES) -> BYTES | Creates a sketch that represents the union of the given column of sketches.<br><br>Param sketch: the column of sketches. Each as BYTES.<br>Defaults: lg\_k = 12, tgt\_type = HLL\_4.<br>Returns: an HLL Sketch, as BYTES. | +| [hll_sketch_agg_int64](../hll/sqlx/hll_sketch_agg_int64.sqlx) | (value INT64) -> BYTES | Creates a sketch that represents the cardinality of the given INT64 column.<br><br>Param value: the INT64 column of identifiers.<br>Defaults: lg\_k = 12, tgt\_type = HLL\_4.<br>Returns: an HLL Sketch, as BYTES. | +| [hll_sketch_agg_string_lgk_type](../hll/sqlx/hll_sketch_agg_string_lgk_type.sqlx) | (str STRING, params STRUCT<lg_k BYTEINT, tgt_type STRING> NOT AGGREGATE) -> BYTES | Creates a sketch that represents the cardinality of the given STRING column.<br><br>Param str: the STRING column of identifiers.<br>Param lg\_k: the sketch accuracy/size parameter as an integer in the range \[4, 21\].<br>Param tgt\_type: The HLL type to use: one of {"HLL\_4", "HLL\_6", "HLL\_8"}.<br>Returns: an HLL Sketc [...] +| [hll_sketch_agg_union_lgk_type](../hll/sqlx/hll_sketch_agg_union_lgk_type.sqlx) | (sketch BYTES, params STRUCT<lg_k BYTEINT, tgt_type STRING> NOT AGGREGATE) -> BYTES | Creates a sketch that represents the union of the given column of sketches.<br><br>Param sketch: the column of sketches. 
Each as BYTES.<br>Param lg\_k: the sketch accuracy/size parameter as an integer in the range \[4, 21\].<br>Param tgt\_type: The HLL type to use: one of {"HLL\_4", "HLL\_6", "HLL\_8"}.<br>Returns: an HL [...] +| [hll_sketch_agg_int64_lgk_type](../hll/sqlx/hll_sketch_agg_int64_lgk_type.sqlx) | (value INT64, params STRUCT<lg_k BYTEINT, tgt_type STRING> NOT AGGREGATE) -> BYTES | Creates a sketch that represents the cardinality of the given INT64 column.<br><br>Param value: the INT64 column of identifiers.<br>Param lg\_k: the sketch accuracy/size parameter as an integer in the range \[4, 21\].<br>Param tgt\_type: The HLL type to use: one of {"HLL\_4", "HLL\_6", "HLL\_8"}.<br>Returns: an HLL Sketch [...] + +## Scalar Functions +| Function Name | Signature | Description | +|---|---|---| +| [hll_sketch_get_estimate](../hll/sqlx/hll_sketch_get_estimate.sqlx) | (sketch BYTES) -> FLOAT64 | Returns a summary string that represents the state of the given sketch.<br><br>Param sketch: the given sketch as BYTES.<br>Returns: the cardinality estimate as FLOAT64 value. | +| [hll_sketch_to_string](../hll/sqlx/hll_sketch_to_string.sqlx) | (sketch BYTES) -> STRING | Returns a summary string that represents the state of the given sketch.<br><br>Param sketch: the given sketch as BYTES.<br>Returns: a STRING that represents the state of the given sketch. | +| [hll_sketch_union](../hll/sqlx/hll_sketch_union.sqlx) | (sketchA BYTES, sketchB BYTES) -> BYTES | Computes a sketch that represents the union of the two given sketches.<br><br>Param sketchA: the first sketch as bytes.<br>Param sketchB: the second sketch as bytes.<br>Defaults: lg\_k = 12, tgt\_type = HLL\_4.<br>Returns: an HLL Sketch, as BYTES. 
| +| [hll_sketch_union_lgk_type](../hll/sqlx/hll_sketch_union_lgk_type.sqlx) | (sketchA BYTES, sketchB BYTES, lg_k BYTEINT, tgt_type STRING) -> BYTES | Computes a sketch that represents the union of the two given sketches.<br><br>Param sketchA: the first sketch as bytes.<br>Param sketchB: the second sketch as bytes.<br>Param lg\_k: the sketch accuracy/size parameter as an integer in the range \[4, 21\].<br>Param tgt\_type: The HLL type to use: one of {"HLL\_4", "HLL\_6", "HLL\_8"}.<br>Retur [...] +| [hll_sketch_get_estimate_and_bounds](../hll/sqlx/hll_sketch_get_estimate_and_bounds.sqlx) | (sketch BYTES, num_std_devs BYTEINT) -> STRUCT<estimate FLOAT64, lower_bound FLOAT64, upper_bound FLOAT64> | Gets cardinality estimate and bounds from given sketch.<br><br>Param sketch: The given sketch to query as BYTES.<br>Param num\_std\_devs: The returned bounds will be based on the statistical confidence interval determined by the given number of standard deviations<br> from the returned e [...] **Examples:** diff --git a/kll/README.md b/kll/README.md index 30a707d..d2a5718 100644 --- a/kll/README.md +++ b/kll/README.md @@ -35,23 +35,28 @@ If you are interested in making contributions to this project please see our [Community](https://datasketches.apache.org/docs/Community/) page for how to contact us. -| Function Name | Function Type | Signature | Description | -|---|---|---|---| -| [kll_sketch_float_build](../kll/sqlx/kll_sketch_float_build.sqlx) | AGGREGATE | (value FLOAT64) -> BYTES | Creates a sketch that represents the distribution of the given column.<br><br>Param value: the column of FLOAT64 values.<br>Defaults: k = 200.<br>Returns: a KLL Sketch, as bytes. | -| [kll_sketch_float_merge](../kll/sqlx/kll_sketch_float_merge.sqlx) | AGGREGATE | (sketch BYTES) -> BYTES | Merges sketches from the given column.<br><br>Param sketch: the column of values.<br>Defaluts: k = 200.<br>Returns: a serialized KLL sketch as BYTES. 
| -| [kll_sketch_float_merge_k](../kll/sqlx/kll_sketch_float_merge_k.sqlx) | AGGREGATE | (sketch BYTES, k INT NOT AGGREGATE) -> BYTES | Merges sketches from the given column.<br><br>Param sketch: the column of values.<br>Param k: the sketch accuracy/size parameter as an integer in the range \[8, 65535\].<br>Returns: a serialized KLL sketch as BYTES. | -| [kll_sketch_float_build_k](../kll/sqlx/kll_sketch_float_build_k.sqlx) | AGGREGATE | (value FLOAT64, k INT NOT AGGREGATE) -> BYTES | Creates a sketch that represents the distribution of the given column.<br><br>Param value: the column of FLOAT64 values.<br>Param k: the sketch accuracy/size parameter as an INT in the range \[8, 65535\].<br>Returns: a KLL Sketch, as bytes. | -| [kll_sketch_float_get_n](../kll/sqlx/kll_sketch_float_get_n.sqlx) | SCALAR | (sketch BYTES) -> INT64 | Returns the length of the input stream.<br><br>Param sketch: the given sketch as BYTES.<br>Returns: stream length as INT64 | -| [kll_sketch_float_get_min_value](../kll/sqlx/kll_sketch_float_get_min_value.sqlx) | SCALAR | (sketch BYTES) -> FLOAT64 | Returns the minimum value of the input stream.<br><br>Param sketch: the given sketch as BYTES.<br>Returns: min value as FLOAT64 | -| [kll_sketch_float_to_string](../kll/sqlx/kll_sketch_float_to_string.sqlx) | SCALAR | (sketch BYTES) -> STRING | Returns a summary string that represents the state of the given sketch.<br><br>Param sketch: the given sketch as sketch encoded bytes.<br>Returns: a string that represents the state of the given sketch. 
| -| [kll_sketch_float_get_num_retained](../kll/sqlx/kll_sketch_float_get_num_retained.sqlx) | SCALAR | (sketch BYTES) -> INT64 | Returns the number of retained items \(samples\) in the sketch.<br><br>Param sketch: the given sketch as BYTES.<br>Returns: number of retained items as INT64 | -| [kll_sketch_float_get_max_value](../kll/sqlx/kll_sketch_float_get_max_value.sqlx) | SCALAR | (sketch BYTES) -> FLOAT64 | Returns the maximum value of the input stream.<br><br>Param sketch: the given sketch as BYTES.<br>Returns: max value as FLOAT64 | -| [kll_sketch_float_get_normalized_rank_error](../kll/sqlx/kll_sketch_float_get_normalized_rank_error.sqlx) | SCALAR | (sketch BYTES, pmf BOOL) -> FLOAT64 | Returns the approximate rank error of the given sketch normalized as a fraction between zero and one.<br>Param sketch: the given sketch as BYTES.<br>Param pmf: if true, returns the "double\-sided" normalized rank error for the get\_PMF\(\) function.<br>Otherwise, it is the "single\-sided" normalized rank error for all the other queri [...] -| [kll_sketch_float_get_rank](../kll/sqlx/kll_sketch_float_get_rank.sqlx) | SCALAR | (sketch BYTES, value FLOAT64, inclusive BOOL) -> FLOAT64 | Returns an approximation to the normalized rank, on the interval \[0.0, 1.0\], of the given value.<br><br>Param sketch: the given sketch in serialized form.<br>Param value: value to be ranked.<br>Param inclusive: if true the weight of the given value is included into the rank.<br>Returns: an approximate rank of the given value. 
| -| [kll_sketch_float_get_pmf](../kll/sqlx/kll_sketch_float_get_pmf.sqlx) | SCALAR | (sketch BYTES, split_points ARRAY<FLOAT64>, inclusive BOOL) -> ARRAY<FLOAT64> | Returns an approximation to the Probability Mass Function \(PMF\)<br>of the input stream as an array of probability masses defined by the given split\_points.<br><br>Param sketch: the given sketch as BYTES.<br><br>Param split\_points: an array of M unique, monotonically increasing values <br> \(of the same type as the input va [...] -| [kll_sketch_float_kolmogorov_smirnov](../kll/sqlx/kll_sketch_float_kolmogorov_smirnov.sqlx) | SCALAR | (sketchA BYTES, sketchB BYTES, pvalue FLOAT64) -> BOOL | Performs the Kolmogorov\-Smirnov Test between two KLL sketches of type FLOAT64.<br>If the given sketches have insufficient data or if the sketch sizes are too small, this will return false.<br><br>Param sketchA: sketch A in serialized form.<br>Param sketchB: sketch B in serialized form.<br>Param pvalue: Target p\-value. Typicall [...] -| [kll_sketch_float_get_cdf](../kll/sqlx/kll_sketch_float_get_cdf.sqlx) | SCALAR | (sketch BYTES, split_points ARRAY<FLOAT64>, inclusive BOOL) -> ARRAY<FLOAT64> | Returns an approximation to the Cumulative Distribution Function \(CDF\) <br>of the input stream as an array of cumulative probabilities defined by the given split\_points.<br><br>Param sketch: the given sketch as BYTES.<br><br>Param split\_points: an array of M unique, monotonically increasing values<br> \(of the same type as [...] -| [kll_sketch_float_get_quantile](../kll/sqlx/kll_sketch_float_get_quantile.sqlx) | SCALAR | (sketch BYTES, rank FLOAT64, inclusive BOOL) -> FLOAT64 | Returns a value from the sketch that is the best approximation to a value from the original stream with the given rank.<br><br>Param sketch: the given sketch in serialized form.<br>Param rank: rank of a value in the hypothetical sorted stream.<br>Param inclusive: if true, the given rank is considered inclusive \(includes weight of a value\ [...] 
+## Aggregate Functions
+| Function Name | Signature | Description |
+|---|---|---|
+| [kll_sketch_float_build](../kll/sqlx/kll_sketch_float_build.sqlx) | (value FLOAT64) -> BYTES | Creates a sketch that represents the distribution of the given column.<br><br>Param value: the column of FLOAT64 values.<br>Defaults: k = 200.<br>Returns: a KLL Sketch, as bytes. |
+| [kll_sketch_float_merge](../kll/sqlx/kll_sketch_float_merge.sqlx) | (sketch BYTES) -> BYTES | Merges sketches from the given column.<br><br>Param sketch: the column of values.<br>Defaults: k = 200.<br>Returns: a serialized KLL sketch as BYTES. |
+| [kll_sketch_float_merge_k](../kll/sqlx/kll_sketch_float_merge_k.sqlx) | (sketch BYTES, k INT NOT AGGREGATE) -> BYTES | Merges sketches from the given column.<br><br>Param sketch: the column of values.<br>Param k: the sketch accuracy/size parameter as an integer in the range \[8, 65535\].<br>Returns: a serialized KLL sketch as BYTES. |
+| [kll_sketch_float_build_k](../kll/sqlx/kll_sketch_float_build_k.sqlx) | (value FLOAT64, k INT NOT AGGREGATE) -> BYTES | Creates a sketch that represents the distribution of the given column.<br><br>Param value: the column of FLOAT64 values.<br>Param k: the sketch accuracy/size parameter as an INT in the range \[8, 65535\].<br>Returns: a KLL Sketch, as bytes. |
+
+## Scalar Functions
+| Function Name | Signature | Description |
+|---|---|---|
+| [kll_sketch_float_get_n](../kll/sqlx/kll_sketch_float_get_n.sqlx) | (sketch BYTES) -> INT64 | Returns the length of the input stream.<br><br>Param sketch: the given sketch as BYTES.<br>Returns: stream length as INT64 |
+| [kll_sketch_float_get_min_value](../kll/sqlx/kll_sketch_float_get_min_value.sqlx) | (sketch BYTES) -> FLOAT64 | Returns the minimum value of the input stream.<br><br>Param sketch: the given sketch as BYTES.<br>Returns: min value as FLOAT64 |
+| [kll_sketch_float_to_string](../kll/sqlx/kll_sketch_float_to_string.sqlx) | (sketch BYTES) -> STRING | Returns a summary string that represents the state of the given sketch.<br><br>Param sketch: the given sketch as sketch encoded bytes.<br>Returns: a string that represents the state of the given sketch. |
+| [kll_sketch_float_get_num_retained](../kll/sqlx/kll_sketch_float_get_num_retained.sqlx) | (sketch BYTES) -> INT64 | Returns the number of retained items \(samples\) in the sketch.<br><br>Param sketch: the given sketch as BYTES.<br>Returns: number of retained items as INT64 |
+| [kll_sketch_float_get_max_value](../kll/sqlx/kll_sketch_float_get_max_value.sqlx) | (sketch BYTES) -> FLOAT64 | Returns the maximum value of the input stream.<br><br>Param sketch: the given sketch as BYTES.<br>Returns: max value as FLOAT64 |
+| [kll_sketch_float_get_normalized_rank_error](../kll/sqlx/kll_sketch_float_get_normalized_rank_error.sqlx) | (sketch BYTES, pmf BOOL) -> FLOAT64 | Returns the approximate rank error of the given sketch normalized as a fraction between zero and one.<br>Param sketch: the given sketch as BYTES.<br>Param pmf: if true, returns the "double\-sided" normalized rank error for the get\_PMF\(\) function.<br>Otherwise, it is the "single\-sided" normalized rank error for all the other queries.<br>Re [...]
+| [kll_sketch_float_get_rank](../kll/sqlx/kll_sketch_float_get_rank.sqlx) | (sketch BYTES, value FLOAT64, inclusive BOOL) -> FLOAT64 | Returns an approximation to the normalized rank, on the interval \[0.0, 1.0\], of the given value.<br><br>Param sketch: the given sketch in serialized form.<br>Param value: value to be ranked.<br>Param inclusive: if true the weight of the given value is included into the rank.<br>Returns: an approximate rank of the given value. |
+| [kll_sketch_float_get_pmf](../kll/sqlx/kll_sketch_float_get_pmf.sqlx) | (sketch BYTES, split_points ARRAY<FLOAT64>, inclusive BOOL) -> ARRAY<FLOAT64> | Returns an approximation to the Probability Mass Function \(PMF\)<br>of the input stream as an array of probability masses defined by the given split\_points.<br><br>Param sketch: the given sketch as BYTES.<br><br>Param split\_points: an array of M unique, monotonically increasing values <br> \(of the same type as the input values\)<br [...]
+| [kll_sketch_float_kolmogorov_smirnov](../kll/sqlx/kll_sketch_float_kolmogorov_smirnov.sqlx) | (sketchA BYTES, sketchB BYTES, pvalue FLOAT64) -> BOOL | Performs the Kolmogorov\-Smirnov Test between two KLL sketches of type FLOAT64.<br>If the given sketches have insufficient data or if the sketch sizes are too small, this will return false.<br><br>Param sketchA: sketch A in serialized form.<br>Param sketchB: sketch B in serialized form.<br>Param pvalue: Target p\-value. Typically 0.001 t [...]
+| [kll_sketch_float_get_cdf](../kll/sqlx/kll_sketch_float_get_cdf.sqlx) | (sketch BYTES, split_points ARRAY<FLOAT64>, inclusive BOOL) -> ARRAY<FLOAT64> | Returns an approximation to the Cumulative Distribution Function \(CDF\) <br>of the input stream as an array of cumulative probabilities defined by the given split\_points.<br><br>Param sketch: the given sketch as BYTES.<br><br>Param split\_points: an array of M unique, monotonically increasing values<br> \(of the same type as the inpu [...]
+| [kll_sketch_float_get_quantile](../kll/sqlx/kll_sketch_float_get_quantile.sqlx) | (sketch BYTES, rank FLOAT64, inclusive BOOL) -> FLOAT64 | Returns a value from the sketch that is the best approximation to a value from the original stream with the given rank.<br><br>Param sketch: the given sketch in serialized form.<br>Param rank: rank of a value in the hypothetical sorted stream.<br>Param inclusive: if true, the given rank is considered inclusive \(includes weight of a value\)<br>Retu [...]
 
 **Examples:**
 
diff --git a/readme_generator.py b/readme_generator.py
index 4029422..dad5227 100644
--- a/readme_generator.py
+++ b/readme_generator.py
@@ -126,16 +126,22 @@ def generate_readme(template_path: str, function_index: dict, examples_path: str
     with open(template_path, 'r') as template_file:
         output_lines = template_file.readlines()
 
+    table_header = "| Function Name | Signature | Description |\n|---|---|---|\n"
+
     # Generate the table content
-    output_lines += "\n"
-    output_lines += "| Function Name | Function Type | Signature | Description |\n"
-    output_lines += "|---|---|---|---|\n" # table header
+    output_lines += "\n## Aggregate Functions\n"
+    output_lines += table_header
 
     # Sort functions by function type (AGGREGATE first, then SCALAR) and then by number of arguments
     sorted_functions = sorted(function_index, key=lambda x: (x['function_type'], len(x['signature'].split(','))), reverse=False)
+    is_aggregate = True
     for function in sorted_functions:
+        if is_aggregate and function['function_type'] == 'SCALAR':
+            output_lines += "\n## Scalar Functions\n"
+            output_lines += table_header
+            is_aggregate = False
         function_link = f"[{function['function_name']}](../{function['path']})"
-        output_lines += f"| {function_link} | {function['function_type']} | {function['signature']} | {function['description']} |\n"
+        output_lines += f"| {function_link} | {function['signature']} | {function['description']} |\n"
 
     # Add examples section
     example_files = [f for f in os.listdir(examples_path) if f.endswith("_test.sql")]
diff --git a/req/README.md b/req/README.md
index 66dfafe..867530e 100644
--- a/req/README.md
+++ b/req/README.md
@@ -37,23 +37,28 @@
 If you are interested in making contributions to this project please see our
 [Community](https://datasketches.apache.org/docs/Community/) page for how to contact us.
 
-| Function Name | Function Type | Signature | Description |
-|---|---|---|---|
-| [req_sketch_float_build](../req/sqlx/req_sketch_float_build.sqlx) | AGGREGATE | (value FLOAT64) -> BYTES | Creates a sketch that represents the distribution of the given column.<br><br>Param value: the column of FLOAT64 values.<br>Defaults: k = 12, hra = true.<br>Returns: a serialized REQ Sketch as BYTES. |
-| [req_sketch_float_merge](../req/sqlx/req_sketch_float_merge.sqlx) | AGGREGATE | (sketch BYTES) -> BYTES | Merges sketches from the given column.<br><br>Param sketch: the column of sketches.<br>Defaults: k = 12, hra = true.<br>Returns: a serialized REQ sketch as BYTES. |
-| [req_sketch_float_build_k_hra](../req/sqlx/req_sketch_float_build_k_hra.sqlx) | AGGREGATE | (value FLOAT64, params STRUCT<k INT, hra BOOL> NOT AGGREGATE) -> BYTES | Creates a sketch that represents the distribution of the given column.<br><br>Param value: the column of FLOAT64 values.<br>Param k: the sketch accuracy/size parameter as an even INT in the range \[4, 65534\].<br>Param hra: if true, the high ranks are prioritized for better accuracy. Otherwise the low ranks are prioritized [...]
-| [req_sketch_float_merge_k_hra](../req/sqlx/req_sketch_float_merge_k_hra.sqlx) | AGGREGATE | (sketch BYTES, params STRUCT<k INT, hra BOOL> NOT AGGREGATE) -> BYTES | Merges sketches from the given column.<br><br>Param sketch: the column of values.<br>Param k: the sketch accuracy/size parameter as an even INT in the range \[4, 65534\].<br>Param hra: if true, the high ranks are prioritized for better accuracy. Otherwise the low ranks are prioritized for better accuracy.<br>Returns: a seria [...]
-| [req_sketch_float_get_n](../req/sqlx/req_sketch_float_get_n.sqlx) | SCALAR | (sketch BYTES) -> INT64 | Returns the length of the input stream.<br><br>Param sketch: the given sketch as BYTES.<br>Returns: stream length as INT64 |
-| [req_sketch_float_get_num_retained](../req/sqlx/req_sketch_float_get_num_retained.sqlx) | SCALAR | (sketch BYTES) -> INT64 | Returns the number of retained items \(samples\) in the sketch.<br><br>Param sketch: the given sketch as BYTES.<br>Returns: number of retained items as INT64 |
-| [req_sketch_float_get_min_value](../req/sqlx/req_sketch_float_get_min_value.sqlx) | SCALAR | (sketch BYTES) -> FLOAT64 | Returns the minimum value of the input stream.<br><br>Param sketch: the given sketch as BYTES.<br>Returns: min value as FLOAT64 |
-| [req_sketch_float_to_string](../req/sqlx/req_sketch_float_to_string.sqlx) | SCALAR | (sketch BYTES) -> STRING | Returns a summary string that represents the state of the given sketch.<br><br>Param sketch: the given sketch as BYTES.<br>Returns: a string that represents the state of the given sketch. |
-| [req_sketch_float_get_max_value](../req/sqlx/req_sketch_float_get_max_value.sqlx) | SCALAR | (sketch BYTES) -> FLOAT64 | Returns the maximum value of the input stream.<br><br>Param sketch: the given sketch as BYTES.<br>Returns: max value as FLOAT64 |
-| [req_sketch_float_get_cdf](../req/sqlx/req_sketch_float_get_cdf.sqlx) | SCALAR | (sketch BYTES, split_points ARRAY<FLOAT64>, inclusive BOOL) -> ARRAY<FLOAT64> | Returns an approximation to the Cumulative Distribution Function \(CDF\) <br>of the input stream as an array of cumulative probabilities defined by the given split\_points.<br><br>Param sketch: the given sketch as BYTES.<br><br>Param split\_points: an array of M unique, monotonically increasing values<br> \(of the same type as [...]
-| [req_sketch_float_get_rank_lower_bound](../req/sqlx/req_sketch_float_get_rank_lower_bound.sqlx) | SCALAR | (sketch BYTES, rank FLOAT64, num_std_dev BYTEINT) -> FLOAT64 | Returns an approximate lower bound of the given normalized rank.<br>Param sketch: the given sketch as BYTES.<br>Param rank: the given rank, a value between 0 and 1.0.<br>Param num\_std\_dev: The returned bounds will be based on the statistical confidence interval determined by the given number of standard deviations<br [...]
-| [req_sketch_float_get_pmf](../req/sqlx/req_sketch_float_get_pmf.sqlx) | SCALAR | (sketch BYTES, split_points ARRAY<FLOAT64>, inclusive BOOL) -> ARRAY<FLOAT64> | Returns an approximation to the Probability Mass Function \(PMF\)<br>of the input stream as an array of probability masses defined by the given split\_points.<br><br>Param sketch: the given sketch as BYTES.<br><br>Param split\_points: an array of M unique, monotonically increasing values <br> \(of the same type as the input va [...]
-| [req_sketch_float_get_quantile](../req/sqlx/req_sketch_float_get_quantile.sqlx) | SCALAR | (sketch BYTES, rank FLOAT64, inclusive BOOL) -> FLOAT64 | Returns a value from the sketch that is the best approximation to a value from the original stream with the given rank.<br><br>Param sketch: the given sketch in serialized form.<br>Param rank: rank of a value in the hypothetical sorted stream.<br>Param inclusive: if true, the given rank is considered inclusive \(includes weight of a value\ [...]
-| [req_sketch_float_get_rank_upper_bound](../req/sqlx/req_sketch_float_get_rank_upper_bound.sqlx) | SCALAR | (sketch BYTES, rank FLOAT64, num_std_dev BYTEINT) -> FLOAT64 | Returns an approximate upper bound of the given normalized rank.<br>Param sketch: the given sketch as BYTES.<br>Param rank: the given rank, a value between 0 and 1.0.<br>Param num\_std\_dev: The returned bounds will be based on the statistical confidence interval determined by the given number of standard deviations<br [...]
-| [req_sketch_float_get_rank](../req/sqlx/req_sketch_float_get_rank.sqlx) | SCALAR | (sketch BYTES, value FLOAT64, inclusive BOOL) -> FLOAT64 | Returns an approximation to the normalized rank, on the interval \[0.0, 1.0\], of the given value.<br><br>Param sketch: the given sketch in serialized form.<br>Param value: value to be ranked.<br>Param inclusive: if true the weight of the given value is included into the rank.<br>Returns: an approximate rank of the given value. |
+## Aggregate Functions
+| Function Name | Signature | Description |
+|---|---|---|
+| [req_sketch_float_build](../req/sqlx/req_sketch_float_build.sqlx) | (value FLOAT64) -> BYTES | Creates a sketch that represents the distribution of the given column.<br><br>Param value: the column of FLOAT64 values.<br>Defaults: k = 12, hra = true.<br>Returns: a serialized REQ Sketch as BYTES. |
+| [req_sketch_float_merge](../req/sqlx/req_sketch_float_merge.sqlx) | (sketch BYTES) -> BYTES | Merges sketches from the given column.<br><br>Param sketch: the column of sketches.<br>Defaults: k = 12, hra = true.<br>Returns: a serialized REQ sketch as BYTES. |
+| [req_sketch_float_build_k_hra](../req/sqlx/req_sketch_float_build_k_hra.sqlx) | (value FLOAT64, params STRUCT<k INT, hra BOOL> NOT AGGREGATE) -> BYTES | Creates a sketch that represents the distribution of the given column.<br><br>Param value: the column of FLOAT64 values.<br>Param k: the sketch accuracy/size parameter as an even INT in the range \[4, 65534\].<br>Param hra: if true, the high ranks are prioritized for better accuracy. Otherwise the low ranks are prioritized for better a [...]
+| [req_sketch_float_merge_k_hra](../req/sqlx/req_sketch_float_merge_k_hra.sqlx) | (sketch BYTES, params STRUCT<k INT, hra BOOL> NOT AGGREGATE) -> BYTES | Merges sketches from the given column.<br><br>Param sketch: the column of values.<br>Param k: the sketch accuracy/size parameter as an even INT in the range \[4, 65534\].<br>Param hra: if true, the high ranks are prioritized for better accuracy. Otherwise the low ranks are prioritized for better accuracy.<br>Returns: a serialized REQ sk [...]
+
+## Scalar Functions
+| Function Name | Signature | Description |
+|---|---|---|
+| [req_sketch_float_get_n](../req/sqlx/req_sketch_float_get_n.sqlx) | (sketch BYTES) -> INT64 | Returns the length of the input stream.<br><br>Param sketch: the given sketch as BYTES.<br>Returns: stream length as INT64 |
+| [req_sketch_float_get_num_retained](../req/sqlx/req_sketch_float_get_num_retained.sqlx) | (sketch BYTES) -> INT64 | Returns the number of retained items \(samples\) in the sketch.<br><br>Param sketch: the given sketch as BYTES.<br>Returns: number of retained items as INT64 |
+| [req_sketch_float_get_min_value](../req/sqlx/req_sketch_float_get_min_value.sqlx) | (sketch BYTES) -> FLOAT64 | Returns the minimum value of the input stream.<br><br>Param sketch: the given sketch as BYTES.<br>Returns: min value as FLOAT64 |
+| [req_sketch_float_to_string](../req/sqlx/req_sketch_float_to_string.sqlx) | (sketch BYTES) -> STRING | Returns a summary string that represents the state of the given sketch.<br><br>Param sketch: the given sketch as BYTES.<br>Returns: a string that represents the state of the given sketch. |
+| [req_sketch_float_get_max_value](../req/sqlx/req_sketch_float_get_max_value.sqlx) | (sketch BYTES) -> FLOAT64 | Returns the maximum value of the input stream.<br><br>Param sketch: the given sketch as BYTES.<br>Returns: max value as FLOAT64 |
+| [req_sketch_float_get_cdf](../req/sqlx/req_sketch_float_get_cdf.sqlx) | (sketch BYTES, split_points ARRAY<FLOAT64>, inclusive BOOL) -> ARRAY<FLOAT64> | Returns an approximation to the Cumulative Distribution Function \(CDF\) <br>of the input stream as an array of cumulative probabilities defined by the given split\_points.<br><br>Param sketch: the given sketch as BYTES.<br><br>Param split\_points: an array of M unique, monotonically increasing values<br> \(of the same type as the inpu [...]
+| [req_sketch_float_get_rank_lower_bound](../req/sqlx/req_sketch_float_get_rank_lower_bound.sqlx) | (sketch BYTES, rank FLOAT64, num_std_dev BYTEINT) -> FLOAT64 | Returns an approximate lower bound of the given normalized rank.<br>Param sketch: the given sketch as BYTES.<br>Param rank: the given rank, a value between 0 and 1.0.<br>Param num\_std\_dev: The returned bounds will be based on the statistical confidence interval determined by the given number of standard deviations<br> from t [...]
+| [req_sketch_float_get_pmf](../req/sqlx/req_sketch_float_get_pmf.sqlx) | (sketch BYTES, split_points ARRAY<FLOAT64>, inclusive BOOL) -> ARRAY<FLOAT64> | Returns an approximation to the Probability Mass Function \(PMF\)<br>of the input stream as an array of probability masses defined by the given split\_points.<br><br>Param sketch: the given sketch as BYTES.<br><br>Param split\_points: an array of M unique, monotonically increasing values <br> \(of the same type as the input values\)<br [...]
+| [req_sketch_float_get_quantile](../req/sqlx/req_sketch_float_get_quantile.sqlx) | (sketch BYTES, rank FLOAT64, inclusive BOOL) -> FLOAT64 | Returns a value from the sketch that is the best approximation to a value from the original stream with the given rank.<br><br>Param sketch: the given sketch in serialized form.<br>Param rank: rank of a value in the hypothetical sorted stream.<br>Param inclusive: if true, the given rank is considered inclusive \(includes weight of a value\)<br>Retu [...]
+| [req_sketch_float_get_rank_upper_bound](../req/sqlx/req_sketch_float_get_rank_upper_bound.sqlx) | (sketch BYTES, rank FLOAT64, num_std_dev BYTEINT) -> FLOAT64 | Returns an approximate upper bound of the given normalized rank.<br>Param sketch: the given sketch as BYTES.<br>Param rank: the given rank, a value between 0 and 1.0.<br>Param num\_std\_dev: The returned bounds will be based on the statistical confidence interval determined by the given number of standard deviations<br> from t [...]
+| [req_sketch_float_get_rank](../req/sqlx/req_sketch_float_get_rank.sqlx) | (sketch BYTES, value FLOAT64, inclusive BOOL) -> FLOAT64 | Returns an approximation to the normalized rank, on the interval \[0.0, 1.0\], of the given value.<br><br>Param sketch: the given sketch in serialized form.<br>Param value: value to be ranked.<br>Param inclusive: if true the weight of the given value is included into the rank.<br>Returns: an approximate rank of the given value. |
 
 **Examples:**
 
diff --git a/tdigest/README.md b/tdigest/README.md
index efb6bee..adca516 100644
--- a/tdigest/README.md
+++ b/tdigest/README.md
@@ -34,18 +34,23 @@
 If you are interested in making contributions to this project please see our
 [Community](https://datasketches.apache.org/docs/Community/) page for how to contact us.
 
-| Function Name | Function Type | Signature | Description |
-|---|---|---|---|
-| [tdigest_double_build](../tdigest/sqlx/tdigest_double_build.sqlx) | AGGREGATE | (value FLOAT64) -> BYTES | Creates a sketch that represents the distribution of the given column.<br><br>Param value: the column of FLOAT64 values.<br>Defaults: k = 200.<br>Returns: a t\-Digest, as bytes. |
-| [tdigest_double_merge](../tdigest/sqlx/tdigest_double_merge.sqlx) | AGGREGATE | (sketch BYTES) -> BYTES | Merges sketches from the given column.<br><br>Param sketch: the column of values.<br>Defaults: k = 200.<br>Returns: a serialized t\-Digest as BYTES. |
-| [tdigest_double_merge_k](../tdigest/sqlx/tdigest_double_merge_k.sqlx) | AGGREGATE | (sketch BYTES, k INT NOT AGGREGATE) -> BYTES | Merges sketches from the given column.<br><br>Param sketch: the column of values.<br>Param k: the sketch accuracy/size parameter as an integer in the range \[10, 65535\].<br>Returns: a serialized t\-Digest as BYTES. |
-| [tdigest_double_build_k](../tdigest/sqlx/tdigest_double_build_k.sqlx) | AGGREGATE | (value FLOAT64, k INT NOT AGGREGATE) -> BYTES | Creates a sketch that represents the distribution of the given column.<br><br>Param value: the column of FLOAT64 values.<br>Param k: the sketch accuracy/size parameter as an INT in the range \[10, 65535\].<br>Returns: a t\-Digest, as bytes. |
-| [tdigest_double_get_max_value](../tdigest/sqlx/tdigest_double_get_max_value.sqlx) | SCALAR | (sketch BYTES) -> FLOAT64 | Returns the maximum value of the input stream.<br><br>Param sketch: the given sketch as BYTES.<br>Returns: max value as FLOAT64 |
-| [tdigest_double_to_string](../tdigest/sqlx/tdigest_double_to_string.sqlx) | SCALAR | (sketch BYTES) -> STRING | Returns a summary string that represents the state of the given sketch.<br><br>Param sketch: the given sketch as sketch encoded bytes.<br>Returns: a string that represents the state of the given sketch. |
-| [tdigest_double_get_total_weight](../tdigest/sqlx/tdigest_double_get_total_weight.sqlx) | SCALAR | (sketch BYTES) -> INT64 | Returns the total weight of the input stream.<br><br>Param sketch: the given sketch as BYTES.<br>Returns: total weight as INT64 |
-| [tdigest_double_get_min_value](../tdigest/sqlx/tdigest_double_get_min_value.sqlx) | SCALAR | (sketch BYTES) -> FLOAT64 | Returns the minimum value of the input stream.<br><br>Param sketch: the given sketch as BYTES.<br>Returns: min value as FLOAT64 |
-| [tdigest_double_get_rank](../tdigest/sqlx/tdigest_double_get_rank.sqlx) | SCALAR | (sketch BYTES, value FLOAT64) -> FLOAT64 | Returns an approximation to the normalized rank, on the interval \[0.0, 1.0\], of the given value.<br><br>Param sketch: the given sketch in serialized form.<br>Param value: value to be ranked.<br>Returns: an approximate rank of the given value. |
-| [tdigest_double_get_quantile](../tdigest/sqlx/tdigest_double_get_quantile.sqlx) | SCALAR | (sketch BYTES, rank FLOAT64) -> FLOAT64 | Returns a value from the sketch that is the best approximation to a value from the original stream with the given rank.<br><br>Param sketch: the given sketch in serialized form.<br>Param rank: rank of a value in the hypothetical sorted stream.<br>Returns: an approximate quantile associated with the given rank. |
+## Aggregate Functions
+| Function Name | Signature | Description |
+|---|---|---|
+| [tdigest_double_build](../tdigest/sqlx/tdigest_double_build.sqlx) | (value FLOAT64) -> BYTES | Creates a sketch that represents the distribution of the given column.<br><br>Param value: the column of FLOAT64 values.<br>Defaults: k = 200.<br>Returns: a t\-Digest, as bytes. |
+| [tdigest_double_merge](../tdigest/sqlx/tdigest_double_merge.sqlx) | (sketch BYTES) -> BYTES | Merges sketches from the given column.<br><br>Param sketch: the column of values.<br>Defaults: k = 200.<br>Returns: a serialized t\-Digest as BYTES. |
+| [tdigest_double_merge_k](../tdigest/sqlx/tdigest_double_merge_k.sqlx) | (sketch BYTES, k INT NOT AGGREGATE) -> BYTES | Merges sketches from the given column.<br><br>Param sketch: the column of values.<br>Param k: the sketch accuracy/size parameter as an integer in the range \[10, 65535\].<br>Returns: a serialized t\-Digest as BYTES. |
+| [tdigest_double_build_k](../tdigest/sqlx/tdigest_double_build_k.sqlx) | (value FLOAT64, k INT NOT AGGREGATE) -> BYTES | Creates a sketch that represents the distribution of the given column.<br><br>Param value: the column of FLOAT64 values.<br>Param k: the sketch accuracy/size parameter as an INT in the range \[10, 65535\].<br>Returns: a t\-Digest, as bytes. |
+
+## Scalar Functions
+| Function Name | Signature | Description |
+|---|---|---|
+| [tdigest_double_get_max_value](../tdigest/sqlx/tdigest_double_get_max_value.sqlx) | (sketch BYTES) -> FLOAT64 | Returns the maximum value of the input stream.<br><br>Param sketch: the given sketch as BYTES.<br>Returns: max value as FLOAT64 |
+| [tdigest_double_to_string](../tdigest/sqlx/tdigest_double_to_string.sqlx) | (sketch BYTES) -> STRING | Returns a summary string that represents the state of the given sketch.<br><br>Param sketch: the given sketch as sketch encoded bytes.<br>Returns: a string that represents the state of the given sketch. |
+| [tdigest_double_get_total_weight](../tdigest/sqlx/tdigest_double_get_total_weight.sqlx) | (sketch BYTES) -> INT64 | Returns the total weight of the input stream.<br><br>Param sketch: the given sketch as BYTES.<br>Returns: total weight as INT64 |
+| [tdigest_double_get_min_value](../tdigest/sqlx/tdigest_double_get_min_value.sqlx) | (sketch BYTES) -> FLOAT64 | Returns the minimum value of the input stream.<br><br>Param sketch: the given sketch as BYTES.<br>Returns: min value as FLOAT64 |
+| [tdigest_double_get_rank](../tdigest/sqlx/tdigest_double_get_rank.sqlx) | (sketch BYTES, value FLOAT64) -> FLOAT64 | Returns an approximation to the normalized rank, on the interval \[0.0, 1.0\], of the given value.<br><br>Param sketch: the given sketch in serialized form.<br>Param value: value to be ranked.<br>Returns: an approximate rank of the given value. |
+| [tdigest_double_get_quantile](../tdigest/sqlx/tdigest_double_get_quantile.sqlx) | (sketch BYTES, rank FLOAT64) -> FLOAT64 | Returns a value from the sketch that is the best approximation to a value from the original stream with the given rank.<br><br>Param sketch: the given sketch in serialized form.<br>Param rank: rank of a value in the hypothetical sorted stream.<br>Returns: an approximate quantile associated with the given rank. |
 
 **Examples:**
 
diff --git a/theta/README.md b/theta/README.md
index cef3a08..a01442a 100644
--- a/theta/README.md
+++ b/theta/README.md
@@ -36,32 +36,37 @@
 If you are interested in making contributions to this project please see our
 [Community](https://datasketches.apache.org/docs/Community/) page for how to contact us.
 
-| Function Name | Function Type | Signature | Description |
-|---|---|---|---|
-| [theta_sketch_agg_int64](../theta/sqlx/theta_sketch_agg_int64.sqlx) | AGGREGATE | (value INT64) -> BYTES | Creates a sketch that represents the cardinality of the given INT64 column.<br> <br>Param value: the INT64 column of identifiers.<br>Defaults: lg\_k = 12, seed = 9001, p = 1.0.<br>Returns: a Compact, Compressed Theta Sketch, as BYTES. |
-| [theta_sketch_agg_union](../theta/sqlx/theta_sketch_agg_union.sqlx) | AGGREGATE | (sketch BYTES) -> BYTES | Creates a sketch that represents the union of the given column of sketches.<br><br>Param sketch: the column of sketches. Each as BYTES.<br>Defaults: lg\_k = 12, seed = 9001.<br>Returns: a Compact, Compressed Theta Sketch, as BYTES. |
-| [theta_sketch_agg_string](../theta/sqlx/theta_sketch_agg_string.sqlx) | AGGREGATE | (str STRING) -> BYTES | Creates a sketch that represents the cardinality of the given STRING column.<br> <br>Param str: the STRING column of identifiers.<br>Defaults: lg\_k = 12, seed = 9001, p = 1.0.<br>Returns: a Compact, Compressed Theta Sketch, as BYTES. |
-| [theta_sketch_agg_union_lgk_seed](../theta/sqlx/theta_sketch_agg_union_lgk_seed.sqlx) | AGGREGATE | (sketch BYTES, params STRUCT<lg_k BYTEINT, seed INT64> NOT AGGREGATE) -> BYTES | Creates a sketch that represents the union of the given column of sketches.<br><br>Param sketch: the column of sketches. Each as BYTES.<br>Param lg\_k: the sketch accuracy/size parameter as a BYTEINT in the range \[4, 26\].<br>Param seed: This is used to confirm that the given sketches were configured with t [...]
-| [theta_sketch_agg_int64_lgk_seed_p](../theta/sqlx/theta_sketch_agg_int64_lgk_seed_p.sqlx) | AGGREGATE | (value INT64, params STRUCT<lg_k BYTEINT, seed INT64, p FLOAT64> NOT AGGREGATE) -> BYTES | Creates a sketch that represents the cardinality of the given INT64 column.<br><br>Param value: the INT64 column of identifiers.<br>Param lg\_k: the sketch accuracy/size parameter as a BYTEINT in the range \[4, 26\]. A NULL specifies the default of 12.<br>Param seed: the seed to be used by the [...]
-| [theta_sketch_agg_string_lgk_seed_p](../theta/sqlx/theta_sketch_agg_string_lgk_seed_p.sqlx) | AGGREGATE | (str STRING, params STRUCT<lg_k BYTEINT, seed INT64, p FLOAT64> NOT AGGREGATE) -> BYTES | Creates a sketch that represents the cardinality of the given STRING column.<br><br>Param str: the STRING column of identifiers.<br>Param lg\_k: the sketch accuracy/size parameter as a BYTEINT in the range \[4, 26\]. A NULL specifies the default of 12.<br>Param seed: the seed to be used by the [...]
-| [theta_sketch_get_estimate](../theta/sqlx/theta_sketch_get_estimate.sqlx) | SCALAR | (sketch BYTES) -> FLOAT64 | Gets distinct count estimate from a given sketch.<br><br>Param sketch: The given sketch to query as BYTES.<br>Defaults: seed = 9001.<br>Returns: distinct count estimate as FLOAT64. |
-| [theta_sketch_to_string](../theta/sqlx/theta_sketch_to_string.sqlx) | SCALAR | (sketch BYTES) -> STRING | Returns a summary string that represents the state of the given sketch.<br><br>Param sketch: the given sketch as BYTES.<br>Defaults: seed = 9001.<br>Returns: a STRING that represents the state of the given sketch. |
-| [theta_sketch_get_num_retained](../theta/sqlx/theta_sketch_get_num_retained.sqlx) | SCALAR | (sketch BYTES) -> INT | Returns the number of retained entries in the given sketch.<br> <br>Param sketch: The given sketch to query as BYTES.<br>Defaults: seed = 9001.<br>Returns: number of retained entries as INT. |
-| [theta_sketch_get_theta](../theta/sqlx/theta_sketch_get_theta.sqlx) | SCALAR | (sketch BYTES) -> FLOAT64 | Returns theta \(effective sampling rate\) as a fraction from 0 to 1.<br> <br>Param sketch: The given sketch to query as BYTES.<br>Defaults: seed = 9001.<br>Returns: theta as FLOAT64. |
-| [theta_sketch_get_num_retained_seed](../theta/sqlx/theta_sketch_get_num_retained_seed.sqlx) | SCALAR | (sketch BYTES, seed INT64) -> INT | Returns the number of retained entries in the given sketch.<br> <br>Param sketch: The given sketch to query as BYTES.<br>Param seed: This is used to confirm that the given sketch was configured with the correct seed.<br>Returns: number of retained entries as INT. |
-| [theta_sketch_get_estimate_seed](../theta/sqlx/theta_sketch_get_estimate_seed.sqlx) | SCALAR | (sketch BYTES, seed INT64) -> FLOAT64 | Gets distinct count estimate from a given sketch.<br> <br>Param sketch: The given sketch to query as BYTES.<br>Param seed: This is used to confirm that the given sketch was configured with the correct seed.<br>Returns: distinct count estimate as FLOAT64. |
-| [theta_sketch_to_string_seed](../theta/sqlx/theta_sketch_to_string_seed.sqlx) | SCALAR | (sketch BYTES, seed INT64) -> STRING | Returns a summary string that represents the state of the given sketch.<br><br>Param sketch: the given sketch as BYTES.<br>Param seed: This is used to confirm that the given sketch was configured with the correct seed.<br>Returns: a STRING that represents the state of the given sketch. |
-| [theta_sketch_get_theta_seed](../theta/sqlx/theta_sketch_get_theta_seed.sqlx) | SCALAR | (sketch BYTES, seed INT64) -> FLOAT64 | Returns theta \(effective sampling rate\) as a fraction from 0 to 1.<br> <br>Param sketch: The given sketch to query as BYTES.<br>Param seed: This is used to confirm that the given sketch was configured with the correct seed.<br>Returns: theta as FLOAT64. |
-| [theta_sketch_intersection](../theta/sqlx/theta_sketch_intersection.sqlx) | SCALAR | (sketchA BYTES, sketchB BYTES) -> BYTES | Computes a sketch that represents the scalar intersection of the two given sketches.<br><br>Param sketchA: the first sketch as BYTES.<br>Param sketchB: the second sketch as BYTES.<br>Defaults: seed = 9001.<br>Returns: a Compact, Compressed Theta Sketch, as BYTES. |
-| [theta_sketch_union](../theta/sqlx/theta_sketch_union.sqlx) | SCALAR | (sketchA BYTES, sketchB BYTES) -> BYTES | Computes a sketch that represents the scalar union of the two given sketches.<br><br>Param sketchA: the first sketch as BYTES.<br>Param sketchB: the second sketch as BYTES.<br>Defaults: lg\_k = 12, seed = 9001.<br>Returns: a Compact, Compressed Theta Sketch, as BYTES. |
-| [theta_sketch_a_not_b](../theta/sqlx/theta_sketch_a_not_b.sqlx) | SCALAR | (sketchA BYTES, sketchB BYTES) -> BYTES | Computes a sketch that represents the scalar set difference: sketchA and not sketchB.<br><br>Param sketchA: the first sketch "A" as bytes.<br>Param sketchB: the second sketch "B" as bytes.<br>Defaults: seed = 9001.<br>Returns: a Compact, Compressed Theta Sketch, as BYTES. |
-| [theta_sketch_intersection_seed](../theta/sqlx/theta_sketch_intersection_seed.sqlx) | SCALAR | (sketchA BYTES, sketchB BYTES, seed INT64) -> BYTES | Computes a sketch that represents the scalar intersection of the two given sketches.<br><br>Param sketchA: the first sketch as BYTES.<br>Param sketchB: the second sketch as BYTES.<br>Param seed: This is used to confirm that the given sketches were configured with the correct seed.<br>Returns: a Compact, Compressed Theta Sketch, as BYTES. |
-| [theta_sketch_a_not_b_seed](../theta/sqlx/theta_sketch_a_not_b_seed.sqlx) | SCALAR | (sketchA BYTES, sketchB BYTES, seed INT64) -> BYTES | Computes a sketch that represents the scalar set difference: sketchA and not sketchB.<br><br>Param sketchA: the first sketch "A" as bytes.<br>Param sketchB: the second sketch "B" as bytes.<br>Param seed: This is used to confirm that the given sketches were configured with the correct seed.<br>Returns: a Compact, Compressed Theta Sketch, as BYTES. |
-| [theta_sketch_union_lgk_seed](../theta/sqlx/theta_sketch_union_lgk_seed.sqlx) | SCALAR | (sketchA BYTES, sketchB BYTES, lg_k BYTEINT, seed INT64) -> BYTES | Computes a sketch that represents the scalar union of the two given sketches.<br><br>Param sketchA: the first sketch as BYTES.<br>Param sketchB: the second sketch as BYTES.<br>Param lg\_k: the sketch accuracy/size parameter as an integer in the range \[4, 26\].<br>Param seed: This is used to confirm that the given sketches were con [...]
-| [theta_sketch_get_estimate_and_bounds](../theta/sqlx/theta_sketch_get_estimate_and_bounds.sqlx) | SCALAR | (sketch BYTES, num_std_devs BYTEINT) -> STRUCT<estimate FLOAT64, lower_bound FLOAT64, upper_bound FLOAT64> | Gets distinct count estimate and bounds from a given sketch.<br><br>Param sketch: The given sketch to query as BYTES.<br>Param num\_std\_devs: The returned bounds will be based on the statistical confidence interval<br> determined by the given number of standard deviations [...]
-| [theta_sketch_jaccard_similarity](../theta/sqlx/theta_sketch_jaccard_similarity.sqlx) | SCALAR | (sketchA BYTES, sketchB BYTES) -> STRUCT<lower_bound FLOAT64, estimate FLOAT64, upper_bound FLOAT64> | Computes the Jaccard similarity index with upper and lower bounds.<br>The Jaccard similarity index J\(A,B\) = \(A ^ B\)/\(A U B\) is used to measure how similar the two sketches are to each other.<br>If J = 1.0, the sketches are considered equal. If J = 0, the two sketches are disjoint.<br [...]
-| [theta_sketch_get_estimate_and_bounds_seed](../theta/sqlx/theta_sketch_get_estimate_and_bounds_seed.sqlx) | SCALAR | (sketch BYTES, num_std_devs BYTEINT, seed INT64) -> STRUCT<estimate FLOAT64, lower_bound FLOAT64, upper_bound FLOAT64> | Gets distinct count estimate and bounds from a given sketch.<br><br>Param sketch: The given sketch to query as BYTES.<br>Param num\_std\_devs: The returned bounds will be based on the statistical confidence interval<br> determined by the given number [...] -| [theta_sketch_jaccard_similarity_seed](../theta/sqlx/theta_sketch_jaccard_similarity_seed.sqlx) | SCALAR | (sketchA BYTES, sketchB BYTES, seed INT64) -> STRUCT<lower_bound FLOAT64, estimate FLOAT64, upper_bound FLOAT64> | Computes the Jaccard similarity index with upper and lower bounds.<br>The Jaccard similarity index J\(A,B\) = \(A ^ B\)/\(A U B\) is used to measure how similar the two sketches are to each other.<br>If J = 1.0, the sketches are considered equal. If J = 0, the two ske [...] +## Aggregate Functions +| Function Name | Signature | Description | +|---|---|---| +| [theta_sketch_agg_int64](../theta/sqlx/theta_sketch_agg_int64.sqlx) | (value INT64) -> BYTES | Creates a sketch that represents the cardinality of the given INT64 column.<br> <br>Param value: the INT64 column of identifiers.<br>Defaults: lg\_k = 12, seed = 9001, p = 1.0.<br>Returns: a Compact, Compressed Theta Sketch, as BYTES. | +| [theta_sketch_agg_union](../theta/sqlx/theta_sketch_agg_union.sqlx) | (sketch BYTES) -> BYTES | Creates a sketch that represents the union of the given column of sketches.<br><br>Param sketch: the column of sketches. Each as BYTES.<br>Defaults: lg\_k = 12, seed = 9001.<br>Returns: a Compact, Compressed Theta Sketch, as BYTES. 
| +| [theta_sketch_agg_string](../theta/sqlx/theta_sketch_agg_string.sqlx) | (str STRING) -> BYTES | Creates a sketch that represents the cardinality of the given STRING column.<br> <br>Param str: the STRING column of identifiers.<br>Defaults: lg\_k = 12, seed = 9001, p = 1.0.<br>Returns: a Compact, Compressed Theta Sketch, as BYTES. | +| [theta_sketch_agg_union_lgk_seed](../theta/sqlx/theta_sketch_agg_union_lgk_seed.sqlx) | (sketch BYTES, params STRUCT<lg_k BYTEINT, seed INT64> NOT AGGREGATE) -> BYTES | Creates a sketch that represents the union of the given column of sketches.<br><br>Param sketch: the column of sketches. Each as BYTES.<br>Param lg\_k: the sketch accuracy/size parameter as a BYTEINT in the range \[4, 26\].<br>Param seed: This is used to confirm that the given sketches were configured with the correct s [...] +| [theta_sketch_agg_int64_lgk_seed_p](../theta/sqlx/theta_sketch_agg_int64_lgk_seed_p.sqlx) | (value INT64, params STRUCT<lg_k BYTEINT, seed INT64, p FLOAT64> NOT AGGREGATE) -> BYTES | Creates a sketch that represents the cardinality of the given INT64 column.<br><br>Param value: the INT64 column of identifiers.<br>Param lg\_k: the sketch accuracy/size parameter as a BYTEINT in the range \[4, 26\]. A NULL specifies the default of 12.<br>Param seed: the seed to be used by the underlying h [...] +| [theta_sketch_agg_string_lgk_seed_p](../theta/sqlx/theta_sketch_agg_string_lgk_seed_p.sqlx) | (str STRING, params STRUCT<lg_k BYTEINT, seed INT64, p FLOAT64> NOT AGGREGATE) -> BYTES | Creates a sketch that represents the cardinality of the given STRING column.<br><br>Param str: the STRING column of identifiers.<br>Param lg\_k: the sketch accuracy/size parameter as a BYTEINT in the range \[4, 26\]. A NULL specifies the default of 12.<br>Param seed: the seed to be used by the underlying [...] 
+ +## Scalar Functions +| Function Name | Signature | Description | +|---|---|---| +| [theta_sketch_get_estimate](../theta/sqlx/theta_sketch_get_estimate.sqlx) | (sketch BYTES) -> FLOAT64 | Gets distinct count estimate from a given sketch.<br><br>Param sketch: The given sketch to query as BYTES.<br>Defaults: seed = 9001.<br>Returns: distinct count estimate as FLOAT64. | +| [theta_sketch_to_string](../theta/sqlx/theta_sketch_to_string.sqlx) | (sketch BYTES) -> STRING | Returns a summary string that represents the state of the given sketch.<br><br>Param sketch: the given sketch as BYTES.<br>Defaults: seed = 9001.<br>Returns: a STRING that represents the state of the given sketch. | +| [theta_sketch_get_num_retained](../theta/sqlx/theta_sketch_get_num_retained.sqlx) | (sketch BYTES) -> INT | Returns the number of retained entries in the given sketch.<br> <br>Param sketch: The given sketch to query as BYTES.<br>Defaults: seed = 9001.<br>Returns: number of retained entries as INT. | +| [theta_sketch_get_theta](../theta/sqlx/theta_sketch_get_theta.sqlx) | (sketch BYTES) -> FLOAT64 | Returns theta \(effective sampling rate\) as a fraction from 0 to 1.<br> <br>Param sketch: The given sketch to query as BYTES.<br>Defaults: seed = 9001.<br>Returns: theta as FLOAT64. | +| [theta_sketch_get_num_retained_seed](../theta/sqlx/theta_sketch_get_num_retained_seed.sqlx) | (sketch BYTES, seed INT64) -> INT | Returns the number of retained entries in the given sketch.<br> <br>Param sketch: The given sketch to query as BYTES.<br>Param seed: This is used to confirm that the given sketch was configured with the correct seed.<br>Returns: number of retained entries as INT. 
| +| [theta_sketch_get_estimate_seed](../theta/sqlx/theta_sketch_get_estimate_seed.sqlx) | (sketch BYTES, seed INT64) -> FLOAT64 | Gets distinct count estimate from a given sketch.<br> <br>Param sketch: The given sketch to query as BYTES.<br>Param seed: This is used to confirm that the given sketch was configured with the correct seed.<br>Returns: distinct count estimate as FLOA64. | +| [theta_sketch_to_string_seed](../theta/sqlx/theta_sketch_to_string_seed.sqlx) | (sketch BYTES, seed INT64) -> STRING | Returns a summary string that represents the state of the given sketch.<br><br>Param sketch: the given sketch as BYTES.<br>Param seed: This is used to confirm that the given sketch was configured with the correct seed.<br>Returns: a STRING that represents the state of the given sketch. | +| [theta_sketch_get_theta_seed](../theta/sqlx/theta_sketch_get_theta_seed.sqlx) | (sketch BYTES, seed INT64) -> FLOAT64 | Returns theta \(effective sampling rate\) as a fraction from 0 to 1.<br> <br>Param sketch: The given sketch to query as BYTES.<br>Param seed: This is used to confirm that the given sketch was configured with the correct seed.<br>Returns: theta as FLOAT64. | +| [theta_sketch_intersection](../theta/sqlx/theta_sketch_intersection.sqlx) | (sketchA BYTES, sketchB BYTES) -> BYTES | Computes a sketch that represents the scalar intersection of the two given sketches.<br><br>Param sketchA: the first sketch as BYTES.<br>Param sketchB: the second sketch as BYTES.<br>Defaults: seed = 9001.<br>Returns: a Compact, Compressed Theta Sketch, as BYTES. | +| [theta_sketch_union](../theta/sqlx/theta_sketch_union.sqlx) | (sketchA BYTES, sketchB BYTES) -> BYTES | Computes a sketch that represents the scalar union of the two given sketches.<br><br>Param sketchA: the first sketch as BYTES.<br>Param sketchB: the second sketch as BYTES.<br>Defaults: lg\_k = 12, seed = 9001.<br>Returns: a Compact, Compressed Theta Sketch, as BYTES. 
| +| [theta_sketch_a_not_b](../theta/sqlx/theta_sketch_a_not_b.sqlx) | (sketchA BYTES, sketchB BYTES) -> BYTES | Computes a sketch that represents the scalar set difference: sketchA and not sketchB.<br><br>Param sketchA: the first sketch "A" as bytes.<br>Param sketchB: the second sketch "B" as bytes.<br>Defaults: seed = 9001.<br>Returns: a Compact, Compressed Theta Sketch, as BYTES. | +| [theta_sketch_intersection_seed](../theta/sqlx/theta_sketch_intersection_seed.sqlx) | (sketchA BYTES, sketchB BYTES, seed INT64) -> BYTES | Computes a sketch that represents the scalar intersection of the two given sketches.<br><br>Param sketchA: the first sketch as BYTES.<br>Param sketchB: the second sketch as BYTES.<br>Param seed: This is used to confirm that the given sketches were configured with the correct seed.<br>Returns: a Compact, Compressed Theta Sketch, as BYTES. | +| [theta_sketch_a_not_b_seed](../theta/sqlx/theta_sketch_a_not_b_seed.sqlx) | (sketchA BYTES, sketchB BYTES, seed INT64) -> BYTES | Computes a sketch that represents the scalar set difference: sketchA and not sketchB.<br><br>Param sketchA: the first sketch "A" as bytes.<br>Param sketchB: the second sketch "B" as bytes.<br>Param seed: This is used to confirm that the given sketches were configured with the correct seed.<br>Returns: a Compact, Compressed Theta Sketch, as BYTES. | +| [theta_sketch_union_lgk_seed](../theta/sqlx/theta_sketch_union_lgk_seed.sqlx) | (sketchA BYTES, sketchB BYTES, lg_k BYTEINT, seed INT64) -> BYTES | Computes a sketch that represents the scalar union of the two given sketches.<br><br>Param sketchA: the first sketch as BYTES.<br>Param sketchB: the second sketch as BYTES.<br>Param lg\_k: the sketch accuracy/size parameter as an integer in the range \[4, 26\].<br>Param seed: This is used to confirm that the given sketches were configured w [...] 
+| [theta_sketch_get_estimate_and_bounds](../theta/sqlx/theta_sketch_get_estimate_and_bounds.sqlx) | (sketch BYTES, num_std_devs BYTEINT) -> STRUCT<estimate FLOAT64, lower_bound FLOAT64, upper_bound FLOAT64> | Gets distinct count estimate and bounds from a given sketch.<br><br>Param sketch: The given sketch to query as BYTES.<br>Param num\_std\_devs: The returned bounds will be based on the statistical confidence interval<br> determined by the given number of standard deviations from the [...] +| [theta_sketch_jaccard_similarity](../theta/sqlx/theta_sketch_jaccard_similarity.sqlx) | (sketchA BYTES, sketchB BYTES) -> STRUCT<lower_bound FLOAT64, estimate FLOAT64, upper_bound FLOAT64> | Computes the Jaccard similarity index with upper and lower bounds.<br>The Jaccard similarity index J\(A,B\) = \(A ^ B\)/\(A U B\) is used to measure how similar the two sketches are to each other.<br>If J = 1.0, the sketches are considered equal. If J = 0, the two sketches are disjoint.<br>A Jaccar [...] +| [theta_sketch_get_estimate_and_bounds_seed](../theta/sqlx/theta_sketch_get_estimate_and_bounds_seed.sqlx) | (sketch BYTES, num_std_devs BYTEINT, seed INT64) -> STRUCT<estimate FLOAT64, lower_bound FLOAT64, upper_bound FLOAT64> | Gets distinct count estimate and bounds from a given sketch.<br><br>Param sketch: The given sketch to query as BYTES.<br>Param num\_std\_devs: The returned bounds will be based on the statistical confidence interval<br> determined by the given number of standa [...] +| [theta_sketch_jaccard_similarity_seed](../theta/sqlx/theta_sketch_jaccard_similarity_seed.sqlx) | (sketchA BYTES, sketchB BYTES, seed INT64) -> STRUCT<lower_bound FLOAT64, estimate FLOAT64, upper_bound FLOAT64> | Computes the Jaccard similarity index with upper and lower bounds.<br>The Jaccard similarity index J\(A,B\) = \(A ^ B\)/\(A U B\) is used to measure how similar the two sketches are to each other.<br>If J = 1.0, the sketches are considered equal. If J = 0, the two sketches are [...] 
**Examples:** diff --git a/tuple/README.md b/tuple/README.md index 85a2e03..064f879 100644 --- a/tuple/README.md +++ b/tuple/README.md @@ -36,38 +36,43 @@ If you are interested in making contributions to this project please see our [Community](https://datasketches.apache.org/docs/Community/) page for how to contact us. -| Function Name | Function Type | Signature | Description | -|---|---|---|---| -| [tuple_sketch_int64_agg_union](../tuple/sqlx/tuple_sketch_int64_agg_union.sqlx) | AGGREGATE | (sketch BYTES) -> BYTES | Builds a Tuple Sketch that represents the UNION of the given column of Tuple Sketches.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 summary column.<br><br>Param sketch: the given column of Tuple Sketches with an INT64 summary column. This may not be [...] -| [tuple_sketch_int64_agg_string](../tuple/sqlx/tuple_sketch_int64_agg_string.sqlx) | AGGREGATE | (key STRING, value INT64) -> BYTES | Builds a Tuple Sketch from a STRING Key column and an INT64 value column.<br>Multiple values for the same key are aggregated using the default mode.<br>Note that cardinality estimation accuracy, plots, error tables, and sampling probability p are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with a STRING Key column and an [...] -| [tuple_sketch_int64_agg_int64](../tuple/sqlx/tuple_sketch_int64_agg_int64.sqlx) | AGGREGATE | (key INT64, value INT64) -> BYTES | Builds a Tuple Sketch from an INT64 Key column and an INT64 value column.<br>Multiple values for the same key are aggregated using the default mode.<br>Note that cardinality estimation accuracy, plots, error tables, and sampling probability p are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 Key column and an INT [...] 
-| [tuple_sketch_int64_agg_union_lgk_seed_mode](../tuple/sqlx/tuple_sketch_int64_agg_union_lgk_seed_mode.sqlx) | AGGREGATE | (sketch BYTES, params STRUCT<lg_k BYTEINT, seed INT64, mode STRING> NOT AGGREGATE) -> BYTES | Builds a Tuple Sketch that represents the UNION of the given column of Tuple Sketches.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 summary column.<br><b [...] -| [tuple_sketch_int64_agg_int64_lgk_seed_p_mode](../tuple/sqlx/tuple_sketch_int64_agg_int64_lgk_seed_p_mode.sqlx) | AGGREGATE | (key INT64, value INT64, params STRUCT<lg_k BYTEINT, seed INT64, p FLOAT64, mode STRING> NOT AGGREGATE) -> BYTES | Builds a Tuple Sketch from an INT64 Key column and an INT64 value column.<br>Multiple values for the same key are aggregated using one of the selectable operations: { SUM, MIN, MAX, ONE \(constant 1\) }.<br>Note that cardinality estimation accuracy, [...] -| [tuple_sketch_int64_agg_string_lgk_seed_p_mode](../tuple/sqlx/tuple_sketch_int64_agg_string_lgk_seed_p_mode.sqlx) | AGGREGATE | (key STRING, value INT64, params STRUCT<lg_k BYTEINT, seed INT64, p FLOAT64, mode STRING> NOT AGGREGATE) -> BYTES | Builds a Tuple Sketch from a STRING Key column and an INT64 value column.<br>Multiple values for the same key are aggregated using one of the selectable operations: SUM, MIN, MAX, ONE.<br>Note that cardinality estimation accuracy, plots, error ta [...] -| [tuple_sketch_int64_to_string](../tuple/sqlx/tuple_sketch_int64_to_string.sqlx) | SCALAR | (sketch BYTES) -> STRING | Returns a human readable STRING that is a short summary of the state of this sketch.<br> Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br> This function only applies to Tuple Sketches with an INT64 summary column.<br><br>Param sketch: the sketch to be summarized. This may not be NULL.<br>Defaults: seed = 9001.<br> [...] 
-| [tuple_sketch_int64_get_estimate](../tuple/sqlx/tuple_sketch_int64_get_estimate.sqlx) | SCALAR | (sketch BYTES) -> FLOAT64 | Returns the cardinality estimate of the given Tuple Sketch.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 summary column.<br> <br>Param sketch: the given Tuple Sketch. This may not be NULL.<br>Defaults: seed = 9001.<br>Returns: the cardinality [...] -| [tuple_sketch_int64_get_theta](../tuple/sqlx/tuple_sketch_int64_get_theta.sqlx) | SCALAR | (sketch BYTES) -> FLOAT64 | Returns theta \(effective sampling rate\) as a fraction from 0 to 1.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 summary column.<br> <br>Param sketch: the given Tuple Sketch. This may not be NULL.<br>Defaults: seed = 9001.<br>Returns: theta as FLOAT64. | -| [tuple_sketch_int64_get_num_retained](../tuple/sqlx/tuple_sketch_int64_get_num_retained.sqlx) | SCALAR | (sketch BYTES) -> INT | Returns the number of retained entries in the given sketch.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 summary column.<br> <br>Param sketch: the given Tuple Sketch. This may not be NULL.<br>Defaults: seed = 9001.<br>Returns: number of re [...] -| [tuple_sketch_int64_get_theta_seed](../tuple/sqlx/tuple_sketch_int64_get_theta_seed.sqlx) | SCALAR | (sketch BYTES, seed INT64) -> FLOAT64 | Returns theta \(effective sampling rate\) as a fraction from 0 to 1.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 summary column.<br> <br>Param sketch: the given Tuple Sketch. This may not be NULL.<br>Param seed: This is used t [...] 
-| [tuple_sketch_int64_get_num_retained_seed](../tuple/sqlx/tuple_sketch_int64_get_num_retained_seed.sqlx) | SCALAR | (sketch BYTES, seed INT64) -> INT | Returns the number of retained entries in the given sketch.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 summary column.<br> <br>Param sketch: the given Tuple Sketch. This may not be NULL.<br>Param seed: This is used [...] -| [tuple_sketch_int64_to_string_seed](../tuple/sqlx/tuple_sketch_int64_to_string_seed.sqlx) | SCALAR | (sketch BYTES, seed INT64) -> STRING | Returns a human readable STRING that is a short summary of the state of this sketch.<br> Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br> This function only applies to Tuple Sketches with an INT64 summary column.<br><br>Param sketch: the sketch to be summarized. This may not be NULL.<br>Para [...] -| [tuple_sketch_int64_a_not_b](../tuple/sqlx/tuple_sketch_int64_a_not_b.sqlx) | SCALAR | (sketchA BYTES, sketchB BYTES) -> BYTES | Computes a sketch that represents the set difference of sketchA and not sketchB.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 summary column. <br> <br>Param sketchA: the first sketch "A" as BYTES. This may not be NULL.<br>Param sketchB: th [...] -| [tuple_sketch_int64_from_theta_sketch](../tuple/sqlx/tuple_sketch_int64_from_theta_sketch.sqlx) | SCALAR | (sketch BYTES, value INT64) -> BYTES | Converts the given Theta Sketch into a Tuple Sketch with a INT64 summary column set to the given INT64 value.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br><br>Param sketch: the given Theta Sketch. This may not be NULL.<br>Param value: the given INT64 value. This may not be NULL.<br [...] 
-| [tuple_sketch_int64_get_estimate_seed](../tuple/sqlx/tuple_sketch_int64_get_estimate_seed.sqlx) | SCALAR | (sketch BYTES, seed INT64) -> FLOAT64 | Returns the cardinality estimate of the given Tuple Sketch.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 summary column.<br> <br>Param sketch: the given Tuple Sketch. This may not be NULL.<br>Param seed: This is used to c [...] -| [tuple_sketch_int64_intersection](../tuple/sqlx/tuple_sketch_int64_intersection.sqlx) | SCALAR | (sketchA BYTES, sketchB BYTES) -> BYTES | Computes a sketch that represents the scalar intersection of sketchA and sketchB.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 summary column.<br><br>Param sketchA: the first sketch "A" as BYTES.<br>Param sketchB: the second sketc [...] -| [tuple_sketch_int64_union](../tuple/sqlx/tuple_sketch_int64_union.sqlx) | SCALAR | (sketchA BYTES, sketchB BYTES) -> BYTES | Computes a Tuple Sketch that represents the UNION of sketchA and sketchB.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 summary column.<br><br>Param sketchA: the first sketch "A" as BYTES. This may not be NULL.<br>Param sketchB: the second sketc [...] -| [tuple_sketch_int64_from_theta_sketch_seed](../tuple/sqlx/tuple_sketch_int64_from_theta_sketch_seed.sqlx) | SCALAR | (sketch BYTES, value INT64, seed INT64) -> BYTES | Converts the given Theta Sketch into a Tuple Sketch with a INT64 summary column set to the given INT64 value.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br><br>Param sketch: the given Theta Sketch. This may not be NULL.<br>Param value: the given INT64 value. Th [...] 
-| [tuple_sketch_int64_a_not_b_seed](../tuple/sqlx/tuple_sketch_int64_a_not_b_seed.sqlx) | SCALAR | (sketchA BYTES, sketchB BYTES, seed INT64) -> BYTES | Computes a sketch that represents the scalar set difference of sketchA and not sketchB.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 summary column.<br><br>Param sketchA: the first sketch "A" as BYTES. This may not be [...] -| [tuple_sketch_int64_filter_low_high](../tuple/sqlx/tuple_sketch_int64_filter_low_high.sqlx) | SCALAR | (sketch BYTES, low INT64, high INT64) -> BYTES | Returns a Tuple Sketch computed from the given sketch filtered by the given low and high values. <br>This returns a compact tuple sketch that contains the subset of rows of the give sketch where the<br>summary column is greater\-than or equal to the given low and less\-than or equal to the given high.<br>Note that cardinality estimation [...] -| [tuple_sketch_int64_get_estimate_and_bounds](../tuple/sqlx/tuple_sketch_int64_get_estimate_and_bounds.sqlx) | SCALAR | (sketch BYTES, num_std_devs BYTEINT) -> STRUCT<estimate FLOAT64, lower_bound FLOAT64, upper_bound FLOAT64> | Returns the cardinality estimate and bounds from the given Tuple Sketch.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 summary column.<br> <b [...] -| [tuple_sketch_int64_filter_low_high_seed](../tuple/sqlx/tuple_sketch_int64_filter_low_high_seed.sqlx) | SCALAR | (sketch BYTES, low INT64, high INT64, seed INT64) -> BYTES | Returns a Tuple Sketch computed from the given sketch filtered by the given low and high values. <br>This returns a compact tuple sketch that contains the subset of rows of the give sketch where the<br>summary column is greater\-than or equal to the given low and less\-than or equal to the given high.<br>Note that [...] 
-| [tuple_sketch_int64_jaccard_similarity](../tuple/sqlx/tuple_sketch_int64_jaccard_similarity.sqlx) | SCALAR | (sketchA BYTES, sketchB BYTES) -> STRUCT<lower_bound FLOAT64, estimate FLOAT64, upper_bound FLOAT64> | Computes the Jaccard similarity index with upper and lower bounds.<br>The Jaccard similarity index J\(A,B\) = \(A ^ B\)/\(A U B\) is used to measure how similar the two sketches are to each other.<br>If J = 1.0, the sketches are considered equal. If J = 0, the two sketches are [...] -| [tuple_sketch_int64_get_sum_estimate_and_bounds](../tuple/sqlx/tuple_sketch_int64_get_sum_estimate_and_bounds.sqlx) | SCALAR | (sketch BYTES, num_std_devs BYTEINT) -> STRUCT<sum_estimate FLOAT64, sum_lower_bound FLOAT64, sum_upper_bound FLOAT64> | Returns the estimate and bounds for the sum of the INT64 summary column<br>scaled to the original population from the given Tuple Sketch.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.< [...] -| [tuple_sketch_int64_intersection_seed_mode](../tuple/sqlx/tuple_sketch_int64_intersection_seed_mode.sqlx) | SCALAR | (sketchA BYTES, sketchB BYTES, seed INT64, mode STRING) -> BYTES | Computes a sketch that represents the scalar intersection of sketchA and sketchB.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 summary column.<br><br>Param sketchA: the first sketch "A" [...] -| [tuple_sketch_int64_get_sum_estimate_and_bounds_seed](../tuple/sqlx/tuple_sketch_int64_get_sum_estimate_and_bounds_seed.sqlx) | SCALAR | (sketch BYTES, num_std_devs BYTEINT, seed INT64) -> STRUCT<sum_estimate FLOAT64, sum_lower_bound FLOAT64, sum_upper_bound FLOAT64> | Returns the estimate and bounds for the sum of the INT64 summary column<br>scaled to the original population from the given Tuple Sketch.<br>Note that cardinality estimation accuracy, plots, and error tables are the same [...] 
-| [tuple_sketch_int64_union_lgk_seed_mode](../tuple/sqlx/tuple_sketch_int64_union_lgk_seed_mode.sqlx) | SCALAR | (sketchA BYTES, sketchB BYTES, lg_k BYTEINT, seed INT64, mode STRING) -> BYTES | Computes a Tuple Sketch that represents the UNION of sketchA and sketchB.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 summary column.<br><br>Param sketchA: the first sketch "A" [...] -| [tuple_sketch_int64_get_estimate_and_bounds_seed](../tuple/sqlx/tuple_sketch_int64_get_estimate_and_bounds_seed.sqlx) | SCALAR | (sketch BYTES, num_std_devs BYTEINT, seed INT64) -> STRUCT<estimate FLOAT64, lower_bound FLOAT64, upper_bound FLOAT64> | Returns the cardinality estimate and bounds from the given Tuple Sketch.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 s [...] -| [tuple_sketch_int64_jaccard_similarity_seed](../tuple/sqlx/tuple_sketch_int64_jaccard_similarity_seed.sqlx) | SCALAR | (sketchA BYTES, sketchB BYTES, seed INT64) -> STRUCT<lower_bound FLOAT64, estimate FLOAT64, upper_bound FLOAT64> | Computes the Jaccard similarity index with upper and lower bounds.<br>The Jaccard similarity index J\(A,B\) = \(A ^ B\)/\(A U B\) is used to measure how similar the two sketches are to each other.<br>If J = 1.0, the sketches are considered equal. If J = 0, [...] 
+## Aggregate Functions +| Function Name | Signature | Description | +|---|---|---| +| [tuple_sketch_int64_agg_union](../tuple/sqlx/tuple_sketch_int64_agg_union.sqlx) | (sketch BYTES) -> BYTES | Builds a Tuple Sketch that represents the UNION of the given column of Tuple Sketches.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 summary column.<br><br>Param sketch: the given column of Tuple Sketches with an INT64 summary column. This may not be NULL.<br>De [...] +| [tuple_sketch_int64_agg_string](../tuple/sqlx/tuple_sketch_int64_agg_string.sqlx) | (key STRING, value INT64) -> BYTES | Builds a Tuple Sketch from a STRING Key column and an INT64 value column.<br>Multiple values for the same key are aggregated using the default mode.<br>Note that cardinality estimation accuracy, plots, error tables, and sampling probability p are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with a STRING Key column and an INT64 summar [...] +| [tuple_sketch_int64_agg_int64](../tuple/sqlx/tuple_sketch_int64_agg_int64.sqlx) | (key INT64, value INT64) -> BYTES | Builds a Tuple Sketch from an INT64 Key column and an INT64 value column.<br>Multiple values for the same key are aggregated using the default mode.<br>Note that cardinality estimation accuracy, plots, error tables, and sampling probability p are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 Key column and an INT64 summary c [...] 
+| [tuple_sketch_int64_agg_union_lgk_seed_mode](../tuple/sqlx/tuple_sketch_int64_agg_union_lgk_seed_mode.sqlx) | (sketch BYTES, params STRUCT<lg_k BYTEINT, seed INT64, mode STRING> NOT AGGREGATE) -> BYTES | Builds a Tuple Sketch that represents the UNION of the given column of Tuple Sketches.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 summary column.<br><br>Param sket [...] +| [tuple_sketch_int64_agg_int64_lgk_seed_p_mode](../tuple/sqlx/tuple_sketch_int64_agg_int64_lgk_seed_p_mode.sqlx) | (key INT64, value INT64, params STRUCT<lg_k BYTEINT, seed INT64, p FLOAT64, mode STRING> NOT AGGREGATE) -> BYTES | Builds a Tuple Sketch from an INT64 Key column and an INT64 value column.<br>Multiple values for the same key are aggregated using one of the selectable operations: { SUM, MIN, MAX, ONE \(constant 1\) }.<br>Note that cardinality estimation accuracy, plots, erro [...] +| [tuple_sketch_int64_agg_string_lgk_seed_p_mode](../tuple/sqlx/tuple_sketch_int64_agg_string_lgk_seed_p_mode.sqlx) | (key STRING, value INT64, params STRUCT<lg_k BYTEINT, seed INT64, p FLOAT64, mode STRING> NOT AGGREGATE) -> BYTES | Builds a Tuple Sketch from a STRING Key column and an INT64 value column.<br>Multiple values for the same key are aggregated using one of the selectable operations: SUM, MIN, MAX, ONE.<br>Note that cardinality estimation accuracy, plots, error tables, and sa [...] 
+
+## Scalar Functions
+| Function Name | Signature | Description |
+|---|---|---|
+| [tuple_sketch_int64_to_string](../tuple/sqlx/tuple_sketch_int64_to_string.sqlx) | (sketch BYTES) -> STRING | Returns a human readable STRING that is a short summary of the state of this sketch.<br> Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br> This function only applies to Tuple Sketches with an INT64 summary column.<br><br>Param sketch: the sketch to be summarized. This may not be NULL.<br>Defaults: seed = 9001.<br>Returns: [...]
+| [tuple_sketch_int64_get_estimate](../tuple/sqlx/tuple_sketch_int64_get_estimate.sqlx) | (sketch BYTES) -> FLOAT64 | Returns the cardinality estimate of the given Tuple Sketch.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 summary column.<br> <br>Param sketch: the given Tuple Sketch. This may not be NULL.<br>Defaults: seed = 9001.<br>Returns: the cardinality estimate [...]
+| [tuple_sketch_int64_get_theta](../tuple/sqlx/tuple_sketch_int64_get_theta.sqlx) | (sketch BYTES) -> FLOAT64 | Returns theta \(effective sampling rate\) as a fraction from 0 to 1.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 summary column.<br> <br>Param sketch: the given Tuple Sketch. This may not be NULL.<br>Defaults: seed = 9001.<br>Returns: theta as FLOAT64. |
+| [tuple_sketch_int64_get_num_retained](../tuple/sqlx/tuple_sketch_int64_get_num_retained.sqlx) | (sketch BYTES) -> INT | Returns the number of retained entries in the given sketch.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 summary column.<br> <br>Param sketch: the given Tuple Sketch. This may not be NULL.<br>Defaults: seed = 9001.<br>Returns: number of retained en [...]
+| [tuple_sketch_int64_get_theta_seed](../tuple/sqlx/tuple_sketch_int64_get_theta_seed.sqlx) | (sketch BYTES, seed INT64) -> FLOAT64 | Returns theta \(effective sampling rate\) as a fraction from 0 to 1.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 summary column.<br> <br>Param sketch: the given Tuple Sketch. This may not be NULL.<br>Param seed: This is used to confirm [...]
+| [tuple_sketch_int64_get_num_retained_seed](../tuple/sqlx/tuple_sketch_int64_get_num_retained_seed.sqlx) | (sketch BYTES, seed INT64) -> INT | Returns the number of retained entries in the given sketch.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 summary column.<br> <br>Param sketch: the given Tuple Sketch. This may not be NULL.<br>Param seed: This is used to confir [...]
+| [tuple_sketch_int64_to_string_seed](../tuple/sqlx/tuple_sketch_int64_to_string_seed.sqlx) | (sketch BYTES, seed INT64) -> STRING | Returns a human readable STRING that is a short summary of the state of this sketch.<br> Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br> This function only applies to Tuple Sketches with an INT64 summary column.<br><br>Param sketch: the sketch to be summarized. This may not be NULL.<br>Param seed: T [...]
+| [tuple_sketch_int64_a_not_b](../tuple/sqlx/tuple_sketch_int64_a_not_b.sqlx) | (sketchA BYTES, sketchB BYTES) -> BYTES | Computes a sketch that represents the set difference of sketchA and not sketchB.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 summary column. <br> <br>Param sketchA: the first sketch "A" as BYTES. This may not be NULL.<br>Param sketchB: the second [...]
+| [tuple_sketch_int64_from_theta_sketch](../tuple/sqlx/tuple_sketch_int64_from_theta_sketch.sqlx) | (sketch BYTES, value INT64) -> BYTES | Converts the given Theta Sketch into a Tuple Sketch with a INT64 summary column set to the given INT64 value.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br><br>Param sketch: the given Theta Sketch. This may not be NULL.<br>Param value: the given INT64 value. This may not be NULL.<br>Defaults [...]
+| [tuple_sketch_int64_get_estimate_seed](../tuple/sqlx/tuple_sketch_int64_get_estimate_seed.sqlx) | (sketch BYTES, seed INT64) -> FLOAT64 | Returns the cardinality estimate of the given Tuple Sketch.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 summary column.<br> <br>Param sketch: the given Tuple Sketch. This may not be NULL.<br>Param seed: This is used to confirm th [...]
+| [tuple_sketch_int64_intersection](../tuple/sqlx/tuple_sketch_int64_intersection.sqlx) | (sketchA BYTES, sketchB BYTES) -> BYTES | Computes a sketch that represents the scalar intersection of sketchA and sketchB.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 summary column.<br><br>Param sketchA: the first sketch "A" as BYTES.<br>Param sketchB: the second sketch "B" as [...]
+| [tuple_sketch_int64_union](../tuple/sqlx/tuple_sketch_int64_union.sqlx) | (sketchA BYTES, sketchB BYTES) -> BYTES | Computes a Tuple Sketch that represents the UNION of sketchA and sketchB.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 summary column.<br><br>Param sketchA: the first sketch "A" as BYTES. This may not be NULL.<br>Param sketchB: the second sketch "B" as [...]
+| [tuple_sketch_int64_from_theta_sketch_seed](../tuple/sqlx/tuple_sketch_int64_from_theta_sketch_seed.sqlx) | (sketch BYTES, value INT64, seed INT64) -> BYTES | Converts the given Theta Sketch into a Tuple Sketch with a INT64 summary column set to the given INT64 value.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br><br>Param sketch: the given Theta Sketch. This may not be NULL.<br>Param value: the given INT64 value. This may no [...]
+| [tuple_sketch_int64_a_not_b_seed](../tuple/sqlx/tuple_sketch_int64_a_not_b_seed.sqlx) | (sketchA BYTES, sketchB BYTES, seed INT64) -> BYTES | Computes a sketch that represents the scalar set difference of sketchA and not sketchB.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 summary column.<br><br>Param sketchA: the first sketch "A" as BYTES. This may not be NULL.<br> [...]
+| [tuple_sketch_int64_filter_low_high](../tuple/sqlx/tuple_sketch_int64_filter_low_high.sqlx) | (sketch BYTES, low INT64, high INT64) -> BYTES | Returns a Tuple Sketch computed from the given sketch filtered by the given low and high values. <br>This returns a compact tuple sketch that contains the subset of rows of the give sketch where the<br>summary column is greater\-than or equal to the given low and less\-than or equal to the given high.<br>Note that cardinality estimation accuracy [...]
+| [tuple_sketch_int64_get_estimate_and_bounds](../tuple/sqlx/tuple_sketch_int64_get_estimate_and_bounds.sqlx) | (sketch BYTES, num_std_devs BYTEINT) -> STRUCT<estimate FLOAT64, lower_bound FLOAT64, upper_bound FLOAT64> | Returns the cardinality estimate and bounds from the given Tuple Sketch.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 summary column.<br> <br>Param s [...]
+| [tuple_sketch_int64_filter_low_high_seed](../tuple/sqlx/tuple_sketch_int64_filter_low_high_seed.sqlx) | (sketch BYTES, low INT64, high INT64, seed INT64) -> BYTES | Returns a Tuple Sketch computed from the given sketch filtered by the given low and high values. <br>This returns a compact tuple sketch that contains the subset of rows of the give sketch where the<br>summary column is greater\-than or equal to the given low and less\-than or equal to the given high.<br>Note that cardinali [...]
+| [tuple_sketch_int64_jaccard_similarity](../tuple/sqlx/tuple_sketch_int64_jaccard_similarity.sqlx) | (sketchA BYTES, sketchB BYTES) -> STRUCT<lower_bound FLOAT64, estimate FLOAT64, upper_bound FLOAT64> | Computes the Jaccard similarity index with upper and lower bounds.<br>The Jaccard similarity index J\(A,B\) = \(A ^ B\)/\(A U B\) is used to measure how similar the two sketches are to each other.<br>If J = 1.0, the sketches are considered equal. If J = 0, the two sketches are disjoint. [...]
+| [tuple_sketch_int64_get_sum_estimate_and_bounds](../tuple/sqlx/tuple_sketch_int64_get_sum_estimate_and_bounds.sqlx) | (sketch BYTES, num_std_devs BYTEINT) -> STRUCT<sum_estimate FLOAT64, sum_lower_bound FLOAT64, sum_upper_bound FLOAT64> | Returns the estimate and bounds for the sum of the INT64 summary column<br>scaled to the original population from the given Tuple Sketch.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br>This f [...]
+| [tuple_sketch_int64_intersection_seed_mode](../tuple/sqlx/tuple_sketch_int64_intersection_seed_mode.sqlx) | (sketchA BYTES, sketchB BYTES, seed INT64, mode STRING) -> BYTES | Computes a sketch that represents the scalar intersection of sketchA and sketchB.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 summary column.<br><br>Param sketchA: the first sketch "A" as BYTES [...]
+| [tuple_sketch_int64_get_sum_estimate_and_bounds_seed](../tuple/sqlx/tuple_sketch_int64_get_sum_estimate_and_bounds_seed.sqlx) | (sketch BYTES, num_std_devs BYTEINT, seed INT64) -> STRUCT<sum_estimate FLOAT64, sum_lower_bound FLOAT64, sum_upper_bound FLOAT64> | Returns the estimate and bounds for the sum of the INT64 summary column<br>scaled to the original population from the given Tuple Sketch.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the T [...]
+| [tuple_sketch_int64_union_lgk_seed_mode](../tuple/sqlx/tuple_sketch_int64_union_lgk_seed_mode.sqlx) | (sketchA BYTES, sketchB BYTES, lg_k BYTEINT, seed INT64, mode STRING) -> BYTES | Computes a Tuple Sketch that represents the UNION of sketchA and sketchB.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 summary column.<br><br>Param sketchA: the first sketch "A" as BYTES [...]
+| [tuple_sketch_int64_get_estimate_and_bounds_seed](../tuple/sqlx/tuple_sketch_int64_get_estimate_and_bounds_seed.sqlx) | (sketch BYTES, num_std_devs BYTEINT, seed INT64) -> STRUCT<estimate FLOAT64, lower_bound FLOAT64, upper_bound FLOAT64> | Returns the cardinality estimate and bounds from the given Tuple Sketch.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 summary co [...]
+| [tuple_sketch_int64_jaccard_similarity_seed](../tuple/sqlx/tuple_sketch_int64_jaccard_similarity_seed.sqlx) | (sketchA BYTES, sketchB BYTES, seed INT64) -> STRUCT<lower_bound FLOAT64, estimate FLOAT64, upper_bound FLOAT64> | Computes the Jaccard similarity index with upper and lower bounds.<br>The Jaccard similarity index J\(A,B\) = \(A ^ B\)/\(A U B\) is used to measure how similar the two sketches are to each other.<br>If J = 1.0, the sketches are considered equal. If J = 0, the two [...]
 **Examples:**
