This is an automated email from the ASF dual-hosted git repository.

alsay pushed a commit to branch no_tables in repository https://gitbox.apache.org/repos/asf/datasketches-bigquery.git
commit dff7a6f55705723e607d49e3f066f90126b01f15 Author: AlexanderSaydakov <[email protected]> AuthorDate: Mon Feb 10 18:46:19 2025 -0800 render docs without tables --- cpc/README.md | 51 +++++++++++++++++---------- fi/README.md | 21 +++++++----- hll/README.md | 42 ++++++++++++++--------- kll/README.md | 54 ++++++++++++++++++----------- readme_generator.py | 31 +++++++++-------- req/README.md | 54 ++++++++++++++++++----------- tdigest/README.md | 39 +++++++++++++-------- theta/README.md | 81 +++++++++++++++++++++++++++---------------- tuple/README.md | 99 ++++++++++++++++++++++++++++++++++------------------- 9 files changed, 295 insertions(+), 177 deletions(-) diff --git a/cpc/README.md b/cpc/README.md index b55e9a3..fc71db7 100644 --- a/cpc/README.md +++ b/cpc/README.md @@ -37,25 +37,38 @@ If you are interested in making contributions to this project please see our [Community](https://datasketches.apache.org/docs/Community/) page for how to contact us. -| Function Name | Function Type | Signature | Description | -|---|---|---|---| -| [cpc_sketch_agg_union](../cpc/sqlx/cpc_sketch_agg_union.sqlx) | AGGREGATE | (sketch BYTES) -> BYTES | Creates a sketch that represents the union of the given column of sketches.<br><br>Param sketch: the column of sketches. Each as BYTES.<br>Defaults: lg\_k = 12, seed = 9001.<br>Returns: a Compact, Compressed CPC Sketch, as BYTES. 
| -| [cpc_sketch_agg_string](../cpc/sqlx/cpc_sketch_agg_string.sqlx) | AGGREGATE | (str STRING) -> BYTES | Creates a sketch that represents the cardinality of the given STRING column.<br><br>Param str: the STRING column of identifiers.<br>Defaults: lg\_k = 12, seed = 9001.<br>Returns: a Compact, Compressed CPC Sketch, as BYTES | -| [cpc_sketch_agg_int64](../cpc/sqlx/cpc_sketch_agg_int64.sqlx) | AGGREGATE | (value INT64) -> BYTES | Creates a sketch that represents the cardinality of the given INT64 column.<br><br>Param value: the INT64 column of identifiers.<br>Defaults: lg\_k = 12, seed = 9001.<br>Returns: a Compact, Compressed CPC Sketch, as BYTES | -| [cpc_sketch_agg_string_lgk_seed](../cpc/sqlx/cpc_sketch_agg_string_lgk_seed.sqlx) | AGGREGATE | (str STRING, params STRUCT<lg_k BYTEINT, seed INT64> NOT AGGREGATE) -> BYTES | Creates a sketch that represents the cardinality of the given STRING column.<br><br>Param str: the STRING column of identifiers.<br>Param lg\_k: the sketch accuracy/size parameter as an integer in the range \[4, 26\].<br>Param seed: the seed to be used by the underlying hash function.<br>Returns: a Compact, Compre [...] -| [cpc_sketch_agg_union_lgk_seed](../cpc/sqlx/cpc_sketch_agg_union_lgk_seed.sqlx) | AGGREGATE | (sketch BYTES, params STRUCT<lg_k BYTEINT, seed INT64> NOT AGGREGATE) -> BYTES | Creates a sketch that represents the union of the given column of sketches.<br><br>Param sketch: the column of sketches. Each as BYTES.<br>Param lg\_k: the sketch accuracy/size parameter as an integer in the range \[4, 26\].<br>Param seed: This is used to confirm that the given sketches were configured with the co [...] 
-| [cpc_sketch_agg_int64_lgk_seed](../cpc/sqlx/cpc_sketch_agg_int64_lgk_seed.sqlx) | AGGREGATE | (value INT64, params STRUCT<lg_k BYTEINT, seed INT64> NOT AGGREGATE) -> BYTES | Creates a sketch that represents the cardinality of the given INT64 column.<br><br>Param value: the INT64 column of identifiers.<br>Param lg\_k: the sketch accuracy/size parameter as an integer in the range \[4, 26\].<br>Param seed: the seed to be used by the underlying hash function.<br>Returns: a Compact, Compres [...] -| [cpc_sketch_get_estimate](../cpc/sqlx/cpc_sketch_get_estimate.sqlx) | SCALAR | (sketch BYTES) -> FLOAT64 | Gets cardinality estimate and bounds from given sketch.<br><br>Param sketch: The given sketch to query as BYTES.<br>Defaults: seed = 9001.<br>Returns: a FLOAT64 value as the cardinality estimate. | -| [cpc_sketch_to_string](../cpc/sqlx/cpc_sketch_to_string.sqlx) | SCALAR | (sketch BYTES) -> STRING | Returns a summary string that represents the state of the given sketch.<br><br>Param sketch the given sketch as BYTES.<br>Defaults: seed = 9001.<br>Returns: a STRING that represents the state of the given sketch. | -| [cpc_sketch_get_estimate_seed](../cpc/sqlx/cpc_sketch_get_estimate_seed.sqlx) | SCALAR | (sketch BYTES, seed INT64) -> FLOAT64 | Gets cardinality estimate and bounds from given sketch.<br><br>Param sketch: The given sketch to query as BYTES.<br>Param seed: This is used to confirm that the given sketch was configured with the correct seed.<br>Returns: a FLOAT64 value as the cardinality estimate. | -| [cpc_sketch_to_string_seed](../cpc/sqlx/cpc_sketch_to_string_seed.sqlx) | SCALAR | (sketch BYTES, seed INT64) -> STRING | Returns a summary string that represents the state of the given sketch.<br><br>Param sketch the given sketch as BYTES.<br>Param seed: This is used to confirm that the given sketch was configured with the correct seed.<br>Returns: a STRING that represents the state of the given sketch. 
| -| [cpc_sketch_union](../cpc/sqlx/cpc_sketch_union.sqlx) | SCALAR | (sketchA BYTES, sketchB BYTES) -> BYTES | Computes a sketch that represents the scalar union of the two given sketches.<br><br>Param sketchA: the first sketch as BYTES.<br>Param sketchB: the second sketch as BYTES.<br>Defaults: lg\_k = 12, seed = 9001.<br>Returns: a CPC Sketch, as BYTES. | -| [cpc_sketch_get_estimate_and_bounds](../cpc/sqlx/cpc_sketch_get_estimate_and_bounds.sqlx) | SCALAR | (sketch BYTES, num_std_devs BYTEINT) -> STRUCT<estimate FLOAT64, lower_bound FLOAT64, upper_bound FLOAT64> | Gets cardinality estimate and bounds from given sketch.<br> <br>Param sketch: The given sketch to query as bytes.<br>Param num\_std\_devs: The returned bounds will be based on the statistical confidence interval determined by the given number of standard deviations<br> from the [...] -| [cpc_sketch_union_lgk_seed](../cpc/sqlx/cpc_sketch_union_lgk_seed.sqlx) | SCALAR | (sketchA BYTES, sketchB BYTES, lg_k BYTEINT, seed INT64) -> BYTES | Computes a sketch that represents the scalar union of the two given sketches.<br><br>Param sketchA: the first sketch as BYTES.<br>Param sketchB: the second sketch as BYTES.<br>Param lg\_k: the sketch accuracy/size parameter as an integer in the range \[4, 26\].<br>Param seed: This is used to confirm that the given sketches were configure [...] -| [cpc_sketch_get_estimate_and_bounds_seed](../cpc/sqlx/cpc_sketch_get_estimate_and_bounds_seed.sqlx) | SCALAR | (sketch BYTES, num_std_devs BYTEINT, seed INT64) -> STRUCT<estimate FLOAT64, lower_bound FLOAT64, upper_bound FLOAT64> | Gets cardinality estimate and bounds from given sketch.<br> <br>Param sketch: The given sketch to query as bytes.<br>Param num\_std\_devs: The returned bounds will be based on the statistical confidence interval determined by the given number of standard de [...] 
- -**Examples:** - +## Aggregate Functions +### [cpc_sketch_agg_union(sketch BYTES)](../cpc/sqlx/cpc_sketch_agg_union.sqlx) +Creates a sketch that represents the union of the given column of sketches.<br><br>Param sketch: the column of sketches. Each as BYTES.<br>Defaults: lg\_k = 12, seed = 9001.<br>Returns: a Compact, Compressed CPC Sketch, as BYTES. +### [cpc_sketch_agg_string(str STRING)](../cpc/sqlx/cpc_sketch_agg_string.sqlx) +Creates a sketch that represents the cardinality of the given STRING column.<br><br>Param str: the STRING column of identifiers.<br>Defaults: lg\_k = 12, seed = 9001.<br>Returns: a Compact, Compressed CPC Sketch, as BYTES +### [cpc_sketch_agg_int64(value INT64)](../cpc/sqlx/cpc_sketch_agg_int64.sqlx) +Creates a sketch that represents the cardinality of the given INT64 column.<br><br>Param value: the INT64 column of identifiers.<br>Defaults: lg\_k = 12, seed = 9001.<br>Returns: a Compact, Compressed CPC Sketch, as BYTES +### [cpc_sketch_agg_string_lgk_seed(str STRING, params STRUCT<lg_k BYTEINT, seed INT64> NOT AGGREGATE)](../cpc/sqlx/cpc_sketch_agg_string_lgk_seed.sqlx) +Creates a sketch that represents the cardinality of the given STRING column.<br><br>Param str: the STRING column of identifiers.<br>Param lg\_k: the sketch accuracy/size parameter as an integer in the range \[4, 26\].<br>Param seed: the seed to be used by the underlying hash function.<br>Returns: a Compact, Compressed CPC Sketch, as BYTES +### [cpc_sketch_agg_union_lgk_seed(sketch BYTES, params STRUCT<lg_k BYTEINT, seed INT64> NOT AGGREGATE)](../cpc/sqlx/cpc_sketch_agg_union_lgk_seed.sqlx) +Creates a sketch that represents the union of the given column of sketches.<br><br>Param sketch: the column of sketches. Each as BYTES.<br>Param lg\_k: the sketch accuracy/size parameter as an integer in the range \[4, 26\].<br>Param seed: This is used to confirm that the given sketches were configured with the correct seed.<br>Returns: a Compact, Compressed CPC Sketch, as BYTES. 
+### [cpc_sketch_agg_int64_lgk_seed(value INT64, params STRUCT<lg_k BYTEINT, seed INT64> NOT AGGREGATE)](../cpc/sqlx/cpc_sketch_agg_int64_lgk_seed.sqlx) +Creates a sketch that represents the cardinality of the given INT64 column.<br><br>Param value: the INT64 column of identifiers.<br>Param lg\_k: the sketch accuracy/size parameter as an integer in the range \[4, 26\].<br>Param seed: the seed to be used by the underlying hash function.<br>Returns: a Compact, Compressed CPC Sketch, as BYTES + +## Scalar Functions +### [cpc_sketch_get_estimate(sketch BYTES)](../cpc/sqlx/cpc_sketch_get_estimate.sqlx) +Gets cardinality estimate and bounds from given sketch.<br><br>Param sketch: The given sketch to query as BYTES.<br>Defaults: seed = 9001.<br>Returns: a FLOAT64 value as the cardinality estimate. +### [cpc_sketch_to_string(sketch BYTES)](../cpc/sqlx/cpc_sketch_to_string.sqlx) +Returns a summary string that represents the state of the given sketch.<br><br>Param sketch the given sketch as BYTES.<br>Defaults: seed = 9001.<br>Returns: a STRING that represents the state of the given sketch. +### [cpc_sketch_get_estimate_seed(sketch BYTES, seed INT64)](../cpc/sqlx/cpc_sketch_get_estimate_seed.sqlx) +Gets cardinality estimate and bounds from given sketch.<br><br>Param sketch: The given sketch to query as BYTES.<br>Param seed: This is used to confirm that the given sketch was configured with the correct seed.<br>Returns: a FLOAT64 value as the cardinality estimate. +### [cpc_sketch_to_string_seed(sketch BYTES, seed INT64)](../cpc/sqlx/cpc_sketch_to_string_seed.sqlx) +Returns a summary string that represents the state of the given sketch.<br><br>Param sketch the given sketch as BYTES.<br>Param seed: This is used to confirm that the given sketch was configured with the correct seed.<br>Returns: a STRING that represents the state of the given sketch. 
+### [cpc_sketch_union(sketchA BYTES, sketchB BYTES)](../cpc/sqlx/cpc_sketch_union.sqlx) +Computes a sketch that represents the scalar union of the two given sketches.<br><br>Param sketchA: the first sketch as BYTES.<br>Param sketchB: the second sketch as BYTES.<br>Defaults: lg\_k = 12, seed = 9001.<br>Returns: a CPC Sketch, as BYTES. +### [cpc_sketch_get_estimate_and_bounds(sketch BYTES, num_std_devs BYTEINT)](../cpc/sqlx/cpc_sketch_get_estimate_and_bounds.sqlx) +Gets cardinality estimate and bounds from given sketch.<br> <br>Param sketch: The given sketch to query as bytes.<br>Param num\_std\_devs: The returned bounds will be based on the statistical confidence interval determined by the given number of standard deviations<br> from the returned estimate. This number may be one of {1,2,3}, where 1 represents 68% confidence, 2 represents 95% confidence and 3 represents 99.7% confidence.<br> For example, if the given num\_std\_devs = 2 and the r [...] +### [cpc_sketch_get_estimate_and_bounds_seed(sketch BYTES, num_std_devs BYTEINT, seed INT64)](../cpc/sqlx/cpc_sketch_get_estimate_and_bounds_seed.sqlx) +Gets cardinality estimate and bounds from given sketch.<br> <br>Param sketch: The given sketch to query as bytes.<br>Param num\_std\_devs: The returned bounds will be based on the statistical confidence interval determined by the given number of standard deviations<br> from the returned estimate. This number may be one of {1,2,3}, where 1 represents 68% confidence, 2 represents 95% confidence and 3 represents 99.7% confidence.<br> For example, if the given num\_std\_devs = 2 and the r [...] 
+### [cpc_sketch_union_lgk_seed(sketchA BYTES, sketchB BYTES, lg_k BYTEINT, seed INT64)](../cpc/sqlx/cpc_sketch_union_lgk_seed.sqlx) +Computes a sketch that represents the scalar union of the two given sketches.<br><br>Param sketchA: the first sketch as BYTES.<br>Param sketchB: the second sketch as BYTES.<br>Param lg\_k: the sketch accuracy/size parameter as an integer in the range \[4, 26\].<br>Param seed: This is used to confirm that the given sketches were configured with the correct seed.<br>Returns: a CPC Sketch, as BYTES. +## Examples ```sql # using defaults diff --git a/fi/README.md b/fi/README.md index 3a51be0..04ebeea 100644 --- a/fi/README.md +++ b/fi/README.md @@ -36,15 +36,18 @@ If you are interested in making contributions to this project please see our [Community](https://datasketches.apache.org/docs/Community/) page for how to contact us. -| Function Name | Function Type | Signature | Description | -|---|---|---|---| -| [frequent_strings_sketch_merge](../fi/sqlx/frequent_strings_sketch_merge.sqlx) | AGGREGATE | (sketch BYTES, lg_max_map_size BYTEINT NOT AGGREGATE) -> BYTES | Merges sketches from the given column.<br><br>Param sketch: the column of values.<br>Param lg\_max\_map\_size: the sketch accuracy/size parameter as an integer not less than 3.<br>Returns: a serialized Frequent Strings sketch as BYTES. | -| [frequent_strings_sketch_build](../fi/sqlx/frequent_strings_sketch_build.sqlx) | AGGREGATE | (item STRING, weight INT64, lg_max_map_size BYTEINT NOT AGGREGATE) -> BYTES | Creates a sketch that represents frequencies of the given column.<br><br>Param item: the column of STRING values.<br>Param weight: the amount by which the weight of the item should be increased.<br>Param lg\_max\_map\_size: the sketch accuracy/size parameter as a BYTEINT not less than 3.<br>Returns: a Frequent Strings [...] 
-| [frequent_strings_sketch_to_string](../fi/sqlx/frequent_strings_sketch_to_string.sqlx) | SCALAR | (sketch BYTES) -> STRING | Returns a summary string that represents the state of the given sketch.<br><br>Param sketch: the given sketch as sketch encoded bytes.<br>Returns: a string that represents the state of the given sketch. | -| [frequent_strings_sketch_get_result](../fi/sqlx/frequent_strings_sketch_get_result.sqlx) | SCALAR | (sketch BYTES, error_type STRING, threshold INT64) -> ARRAY<STRUCT<item STRING, estimate INT64, lower_bound INT64, upper_bound INT64>> | Returns an array of rows that include frequent items, estimates, lower and upper bounds<br>given an error\_type and a threshold.<br><br>Param sketch: the given sketch as sketch encoded bytes.<br>Param error\_type: determines whether no false positives o [...] - -**Examples:** - +## Aggregate Functions +### [frequent_strings_sketch_merge(sketch BYTES, lg_max_map_size BYTEINT NOT AGGREGATE)](../fi/sqlx/frequent_strings_sketch_merge.sqlx) +Merges sketches from the given column.<br><br>Param sketch: the column of values.<br>Param lg\_max\_map\_size: the sketch accuracy/size parameter as an integer not less than 3.<br>Returns: a serialized Frequent Strings sketch as BYTES. +### [frequent_strings_sketch_build(item STRING, weight INT64, lg_max_map_size BYTEINT NOT AGGREGATE)](../fi/sqlx/frequent_strings_sketch_build.sqlx) +Creates a sketch that represents frequencies of the given column.<br><br>Param item: the column of STRING values.<br>Param weight: the amount by which the weight of the item should be increased.<br>Param lg\_max\_map\_size: the sketch accuracy/size parameter as a BYTEINT not less than 3.<br>Returns: a Frequent Strings Sketch, as bytes. 
+ +## Scalar Functions +### [frequent_strings_sketch_to_string(sketch BYTES)](../fi/sqlx/frequent_strings_sketch_to_string.sqlx) +Returns a summary string that represents the state of the given sketch.<br><br>Param sketch: the given sketch as sketch encoded bytes.<br>Returns: a string that represents the state of the given sketch. +### [frequent_strings_sketch_get_result(sketch BYTES, error_type STRING, threshold INT64)](../fi/sqlx/frequent_strings_sketch_get_result.sqlx) +Returns an array of rows that include frequent items, estimates, lower and upper bounds<br>given an error\_type and a threshold.<br><br>Param sketch: the given sketch as sketch encoded bytes.<br>Param error\_type: determines whether no false positives or no false negatives are desired.<br>Param threshold: a threshold to include items in the result list.<br>If NULL, the maximum error of the sketch is used as a threshold.<br>Returns: an array of frequent items with frequency estimates, low [...] +## Examples ```sql select bqutil.datasketches.frequent_strings_sketch_to_string(bqutil.datasketches.frequent_strings_sketch_build(str, 1, 5)) from unnest(["a", "b", "c"]) as str; diff --git a/hll/README.md b/hll/README.md index 127a88e..d8ee2f6 100644 --- a/hll/README.md +++ b/hll/README.md @@ -35,22 +35,32 @@ If you are interested in making contributions to this project please see our [Community](https://datasketches.apache.org/docs/Community/) page for how to contact us. -| Function Name | Function Type | Signature | Description | -|---|---|---|---| -| [hll_sketch_agg_string](../hll/sqlx/hll_sketch_agg_string.sqlx) | AGGREGATE | (str STRING) -> BYTES | Creates a sketch that represents the cardinality of the given STRING column.<br><br>Param str: the STRING column of identifiers.<br>Defaults: lg\_k = 12, tgt\_type = HLL\_4.<br>Returns: an HLL Sketch, as BYTES. 
| -| [hll_sketch_agg_union](../hll/sqlx/hll_sketch_agg_union.sqlx) | AGGREGATE | (sketch BYTES) -> BYTES | Creates a sketch that represents the union of the given column of sketches.<br><br>Param sketch: the column of sketches. Each as BYTES.<br>Defaults: lg\_k = 12, tgt\_type = HLL\_4.<br>Returns: an HLL Sketch, as BYTES. | -| [hll_sketch_agg_int64](../hll/sqlx/hll_sketch_agg_int64.sqlx) | AGGREGATE | (value INT64) -> BYTES | Creates a sketch that represents the cardinality of the given INT64 column.<br><br>Param value: the INT64 column of identifiers.<br>Defaults: lg\_k = 12, tgt\_type = HLL\_4.<br>Returns: an HLL Sketch, as BYTES. | -| [hll_sketch_agg_string_lgk_type](../hll/sqlx/hll_sketch_agg_string_lgk_type.sqlx) | AGGREGATE | (str STRING, params STRUCT<lg_k BYTEINT, tgt_type STRING> NOT AGGREGATE) -> BYTES | Creates a sketch that represents the cardinality of the given STRING column.<br><br>Param str: the STRING column of identifiers.<br>Param lg\_k: the sketch accuracy/size parameter as an integer in the range \[4, 21\].<br>Param tgt\_type: The HLL type to use: one of {"HLL\_4", "HLL\_6", "HLL\_8"}.<br>Returns: [...] -| [hll_sketch_agg_union_lgk_type](../hll/sqlx/hll_sketch_agg_union_lgk_type.sqlx) | AGGREGATE | (sketch BYTES, params STRUCT<lg_k BYTEINT, tgt_type STRING> NOT AGGREGATE) -> BYTES | Creates a sketch that represents the union of the given column of sketches.<br><br>Param sketch: the column of sketches. Each as BYTES.<br>Param lg\_k: the sketch accuracy/size parameter as an integer in the range \[4, 21\].<br>Param tgt\_type: The HLL type to use: one of {"HLL\_4", "HLL\_6", "HLL\_8"}.<br>Re [...] 
-| [hll_sketch_agg_int64_lgk_type](../hll/sqlx/hll_sketch_agg_int64_lgk_type.sqlx) | AGGREGATE | (value INT64, params STRUCT<lg_k BYTEINT, tgt_type STRING> NOT AGGREGATE) -> BYTES | Creates a sketch that represents the cardinality of the given INT64 column.<br><br>Param value: the INT64 column of identifiers.<br>Param lg\_k: the sketch accuracy/size parameter as an integer in the range \[4, 21\].<br>Param tgt\_type: The HLL type to use: one of {"HLL\_4", "HLL\_6", "HLL\_8"}.<br>Returns: a [...] -| [hll_sketch_get_estimate](../hll/sqlx/hll_sketch_get_estimate.sqlx) | SCALAR | (sketch BYTES) -> FLOAT64 | Returns a summary string that represents the state of the given sketch.<br><br>Param sketch: the given sketch as BYTES.<br>Returns: the cardinality estimate as FLOAT64 value. | -| [hll_sketch_to_string](../hll/sqlx/hll_sketch_to_string.sqlx) | SCALAR | (sketch BYTES) -> STRING | Returns a summary string that represents the state of the given sketch.<br><br>Param sketch: the given sketch as BYTES.<br>Returns: a STRING that represents the state of the given sketch. | -| [hll_sketch_union](../hll/sqlx/hll_sketch_union.sqlx) | SCALAR | (sketchA BYTES, sketchB BYTES) -> BYTES | Computes a sketch that represents the union of the two given sketches.<br><br>Param sketchA: the first sketch as bytes.<br>Param sketchB: the second sketch as bytes.<br>Defaults: lg\_k = 12, tgt\_type = HLL\_4.<br>Returns: an HLL Sketch, as BYTES. | -| [hll_sketch_union_lgk_type](../hll/sqlx/hll_sketch_union_lgk_type.sqlx) | SCALAR | (sketchA BYTES, sketchB BYTES, lg_k BYTEINT, tgt_type STRING) -> BYTES | Computes a sketch that represents the union of the two given sketches.<br><br>Param sketchA: the first sketch as bytes.<br>Param sketchB: the second sketch as bytes.<br>Param lg\_k: the sketch accuracy/size parameter as an integer in the range \[4, 21\].<br>Param tgt\_type: The HLL type to use: one of {"HLL\_4", "HLL\_6", "HLL\_8"}. [...] 
-| [hll_sketch_get_estimate_and_bounds](../hll/sqlx/hll_sketch_get_estimate_and_bounds.sqlx) | SCALAR | (sketch BYTES, num_std_devs BYTEINT) -> STRUCT<estimate FLOAT64, lower_bound FLOAT64, upper_bound FLOAT64> | Gets cardinality estimate and bounds from given sketch.<br><br>Param sketch: The given sketch to query as BYTES.<br>Param num\_std\_devs: The returned bounds will be based on the statistical confidence interval determined by the given number of standard deviations<br> from the r [...] - -**Examples:** - +## Aggregate Functions +### [hll_sketch_agg_string(str STRING)](../hll/sqlx/hll_sketch_agg_string.sqlx) +Creates a sketch that represents the cardinality of the given STRING column.<br><br>Param str: the STRING column of identifiers.<br>Defaults: lg\_k = 12, tgt\_type = HLL\_4.<br>Returns: an HLL Sketch, as BYTES. +### [hll_sketch_agg_union(sketch BYTES)](../hll/sqlx/hll_sketch_agg_union.sqlx) +Creates a sketch that represents the union of the given column of sketches.<br><br>Param sketch: the column of sketches. Each as BYTES.<br>Defaults: lg\_k = 12, tgt\_type = HLL\_4.<br>Returns: an HLL Sketch, as BYTES. +### [hll_sketch_agg_int64(value INT64)](../hll/sqlx/hll_sketch_agg_int64.sqlx) +Creates a sketch that represents the cardinality of the given INT64 column.<br><br>Param value: the INT64 column of identifiers.<br>Defaults: lg\_k = 12, tgt\_type = HLL\_4.<br>Returns: an HLL Sketch, as BYTES. +### [hll_sketch_agg_string_lgk_type(str STRING, params STRUCT<lg_k BYTEINT, tgt_type STRING> NOT AGGREGATE)](../hll/sqlx/hll_sketch_agg_string_lgk_type.sqlx) +Creates a sketch that represents the cardinality of the given STRING column.<br><br>Param str: the STRING column of identifiers.<br>Param lg\_k: the sketch accuracy/size parameter as an integer in the range \[4, 21\].<br>Param tgt\_type: The HLL type to use: one of {"HLL\_4", "HLL\_6", "HLL\_8"}.<br>Returns: an HLL Sketch, as BYTES. 
+### [hll_sketch_agg_union_lgk_type(sketch BYTES, params STRUCT<lg_k BYTEINT, tgt_type STRING> NOT AGGREGATE)](../hll/sqlx/hll_sketch_agg_union_lgk_type.sqlx) +Creates a sketch that represents the union of the given column of sketches.<br><br>Param sketch: the column of sketches. Each as BYTES.<br>Param lg\_k: the sketch accuracy/size parameter as an integer in the range \[4, 21\].<br>Param tgt\_type: The HLL type to use: one of {"HLL\_4", "HLL\_6", "HLL\_8"}.<br>Returns: an HLL Sketch, as BYTES. +### [hll_sketch_agg_int64_lgk_type(value INT64, params STRUCT<lg_k BYTEINT, tgt_type STRING> NOT AGGREGATE)](../hll/sqlx/hll_sketch_agg_int64_lgk_type.sqlx) +Creates a sketch that represents the cardinality of the given INT64 column.<br><br>Param value: the INT64 column of identifiers.<br>Param lg\_k: the sketch accuracy/size parameter as an integer in the range \[4, 21\].<br>Param tgt\_type: The HLL type to use: one of {"HLL\_4", "HLL\_6", "HLL\_8"}.<br>Returns: an HLL Sketch, as BYTES. + +## Scalar Functions +### [hll_sketch_get_estimate(sketch BYTES)](../hll/sqlx/hll_sketch_get_estimate.sqlx) +Returns a summary string that represents the state of the given sketch.<br><br>Param sketch: the given sketch as BYTES.<br>Returns: the cardinality estimate as FLOAT64 value. +### [hll_sketch_to_string(sketch BYTES)](../hll/sqlx/hll_sketch_to_string.sqlx) +Returns a summary string that represents the state of the given sketch.<br><br>Param sketch: the given sketch as BYTES.<br>Returns: a STRING that represents the state of the given sketch. +### [hll_sketch_union(sketchA BYTES, sketchB BYTES)](../hll/sqlx/hll_sketch_union.sqlx) +Computes a sketch that represents the union of the two given sketches.<br><br>Param sketchA: the first sketch as bytes.<br>Param sketchB: the second sketch as bytes.<br>Defaults: lg\_k = 12, tgt\_type = HLL\_4.<br>Returns: an HLL Sketch, as BYTES. 
+### [hll_sketch_get_estimate_and_bounds(sketch BYTES, num_std_devs BYTEINT)](../hll/sqlx/hll_sketch_get_estimate_and_bounds.sqlx) +Gets cardinality estimate and bounds from given sketch.<br><br>Param sketch: The given sketch to query as BYTES.<br>Param num\_std\_devs: The returned bounds will be based on the statistical confidence interval determined by the given number of standard deviations<br> from the returned estimate. This number may be one of {1,2,3}, where 1 represents 68% confidence, 2 represents 95% confidence and 3 represents 99.7% confidence.<br> For example, if the given num\_std\_devs = 2 and the ret [...] +### [hll_sketch_union_lgk_type(sketchA BYTES, sketchB BYTES, lg_k BYTEINT, tgt_type STRING)](../hll/sqlx/hll_sketch_union_lgk_type.sqlx) +Computes a sketch that represents the union of the two given sketches.<br><br>Param sketchA: the first sketch as bytes.<br>Param sketchB: the second sketch as bytes.<br>Param lg\_k: the sketch accuracy/size parameter as an integer in the range \[4, 21\].<br>Param tgt\_type: The HLL type to use: one of {"HLL\_4", "HLL\_6", "HLL\_8"}.<br>Returns: an HLL Sketch, as BYTES. +## Examples ```sql # expected 3 diff --git a/kll/README.md b/kll/README.md index 30a707d..6ca9918 100644 --- a/kll/README.md +++ b/kll/README.md @@ -35,26 +35,40 @@ If you are interested in making contributions to this project please see our [Community](https://datasketches.apache.org/docs/Community/) page for how to contact us. -| Function Name | Function Type | Signature | Description | -|---|---|---|---| -| [kll_sketch_float_build](../kll/sqlx/kll_sketch_float_build.sqlx) | AGGREGATE | (value FLOAT64) -> BYTES | Creates a sketch that represents the distribution of the given column.<br><br>Param value: the column of FLOAT64 values.<br>Defaults: k = 200.<br>Returns: a KLL Sketch, as bytes. 
| -| [kll_sketch_float_merge](../kll/sqlx/kll_sketch_float_merge.sqlx) | AGGREGATE | (sketch BYTES) -> BYTES | Merges sketches from the given column.<br><br>Param sketch: the column of values.<br>Defaluts: k = 200.<br>Returns: a serialized KLL sketch as BYTES. | -| [kll_sketch_float_merge_k](../kll/sqlx/kll_sketch_float_merge_k.sqlx) | AGGREGATE | (sketch BYTES, k INT NOT AGGREGATE) -> BYTES | Merges sketches from the given column.<br><br>Param sketch: the column of values.<br>Param k: the sketch accuracy/size parameter as an integer in the range \[8, 65535\].<br>Returns: a serialized KLL sketch as BYTES. | -| [kll_sketch_float_build_k](../kll/sqlx/kll_sketch_float_build_k.sqlx) | AGGREGATE | (value FLOAT64, k INT NOT AGGREGATE) -> BYTES | Creates a sketch that represents the distribution of the given column.<br><br>Param value: the column of FLOAT64 values.<br>Param k: the sketch accuracy/size parameter as an INT in the range \[8, 65535\].<br>Returns: a KLL Sketch, as bytes. | -| [kll_sketch_float_get_n](../kll/sqlx/kll_sketch_float_get_n.sqlx) | SCALAR | (sketch BYTES) -> INT64 | Returns the length of the input stream.<br><br>Param sketch: the given sketch as BYTES.<br>Returns: stream length as INT64 | -| [kll_sketch_float_get_min_value](../kll/sqlx/kll_sketch_float_get_min_value.sqlx) | SCALAR | (sketch BYTES) -> FLOAT64 | Returns the minimum value of the input stream.<br><br>Param sketch: the given sketch as BYTES.<br>Returns: min value as FLOAT64 | -| [kll_sketch_float_to_string](../kll/sqlx/kll_sketch_float_to_string.sqlx) | SCALAR | (sketch BYTES) -> STRING | Returns a summary string that represents the state of the given sketch.<br><br>Param sketch: the given sketch as sketch encoded bytes.<br>Returns: a string that represents the state of the given sketch. 
| -| [kll_sketch_float_get_num_retained](../kll/sqlx/kll_sketch_float_get_num_retained.sqlx) | SCALAR | (sketch BYTES) -> INT64 | Returns the number of retained items \(samples\) in the sketch.<br><br>Param sketch: the given sketch as BYTES.<br>Returns: number of retained items as INT64 | -| [kll_sketch_float_get_max_value](../kll/sqlx/kll_sketch_float_get_max_value.sqlx) | SCALAR | (sketch BYTES) -> FLOAT64 | Returns the maximum value of the input stream.<br><br>Param sketch: the given sketch as BYTES.<br>Returns: max value as FLOAT64 | -| [kll_sketch_float_get_normalized_rank_error](../kll/sqlx/kll_sketch_float_get_normalized_rank_error.sqlx) | SCALAR | (sketch BYTES, pmf BOOL) -> FLOAT64 | Returns the approximate rank error of the given sketch normalized as a fraction between zero and one.<br>Param sketch: the given sketch as BYTES.<br>Param pmf: if true, returns the "double\-sided" normalized rank error for the get\_PMF\(\) function.<br>Otherwise, it is the "single\-sided" normalized rank error for all the other queri [...] -| [kll_sketch_float_get_rank](../kll/sqlx/kll_sketch_float_get_rank.sqlx) | SCALAR | (sketch BYTES, value FLOAT64, inclusive BOOL) -> FLOAT64 | Returns an approximation to the normalized rank, on the interval \[0.0, 1.0\], of the given value.<br><br>Param sketch: the given sketch in serialized form.<br>Param value: value to be ranked.<br>Param inclusive: if true the weight of the given value is included into the rank.<br>Returns: an approximate rank of the given value. 
| -| [kll_sketch_float_get_pmf](../kll/sqlx/kll_sketch_float_get_pmf.sqlx) | SCALAR | (sketch BYTES, split_points ARRAY<FLOAT64>, inclusive BOOL) -> ARRAY<FLOAT64> | Returns an approximation to the Probability Mass Function \(PMF\)<br>of the input stream as an array of probability masses defined by the given split\_points.<br><br>Param sketch: the given sketch as BYTES.<br><br>Param split\_points: an array of M unique, monotonically increasing values <br> \(of the same type as the input va [...] -| [kll_sketch_float_kolmogorov_smirnov](../kll/sqlx/kll_sketch_float_kolmogorov_smirnov.sqlx) | SCALAR | (sketchA BYTES, sketchB BYTES, pvalue FLOAT64) -> BOOL | Performs the Kolmogorov\-Smirnov Test between two KLL sketches of type FLOAT64.<br>If the given sketches have insufficient data or if the sketch sizes are too small, this will return false.<br><br>Param sketchA: sketch A in serialized form.<br>Param sketchB: sketch B in serialized form.<br>Param pvalue: Target p\-value. Typicall [...] -| [kll_sketch_float_get_cdf](../kll/sqlx/kll_sketch_float_get_cdf.sqlx) | SCALAR | (sketch BYTES, split_points ARRAY<FLOAT64>, inclusive BOOL) -> ARRAY<FLOAT64> | Returns an approximation to the Cumulative Distribution Function \(CDF\) <br>of the input stream as an array of cumulative probabilities defined by the given split\_points.<br><br>Param sketch: the given sketch as BYTES.<br><br>Param split\_points: an array of M unique, monotonically increasing values<br> \(of the same type as [...] -| [kll_sketch_float_get_quantile](../kll/sqlx/kll_sketch_float_get_quantile.sqlx) | SCALAR | (sketch BYTES, rank FLOAT64, inclusive BOOL) -> FLOAT64 | Returns a value from the sketch that is the best approximation to a value from the original stream with the given rank.<br><br>Param sketch: the given sketch in serialized form.<br>Param rank: rank of a value in the hypothetical sorted stream.<br>Param inclusive: if true, the given rank is considered inclusive \(includes weight of a value\ [...] 
-
-**Examples:**
-
+## Aggregate Functions
+### [kll_sketch_float_build(value FLOAT64)](../kll/sqlx/kll_sketch_float_build.sqlx)
+Creates a sketch that represents the distribution of the given column.<br><br>Param value: the column of FLOAT64 values.<br>Defaults: k = 200.<br>Returns: a KLL Sketch, as bytes.
+### [kll_sketch_float_merge(sketch BYTES)](../kll/sqlx/kll_sketch_float_merge.sqlx)
+Merges sketches from the given column.<br><br>Param sketch: the column of values.<br>Defaults: k = 200.<br>Returns: a serialized KLL sketch as BYTES.
+### [kll_sketch_float_merge_k(sketch BYTES, k INT NOT AGGREGATE)](../kll/sqlx/kll_sketch_float_merge_k.sqlx)
+Merges sketches from the given column.<br><br>Param sketch: the column of values.<br>Param k: the sketch accuracy/size parameter as an integer in the range \[8, 65535\].<br>Returns: a serialized KLL sketch as BYTES.
+### [kll_sketch_float_build_k(value FLOAT64, k INT NOT AGGREGATE)](../kll/sqlx/kll_sketch_float_build_k.sqlx)
+Creates a sketch that represents the distribution of the given column.<br><br>Param value: the column of FLOAT64 values.<br>Param k: the sketch accuracy/size parameter as an INT in the range \[8, 65535\].<br>Returns: a KLL Sketch, as bytes.
+
+## Scalar Functions
+### [kll_sketch_float_get_n(sketch BYTES)](../kll/sqlx/kll_sketch_float_get_n.sqlx)
+Returns the length of the input stream.<br><br>Param sketch: the given sketch as BYTES.<br>Returns: stream length as INT64
+### [kll_sketch_float_get_min_value(sketch BYTES)](../kll/sqlx/kll_sketch_float_get_min_value.sqlx)
+Returns the minimum value of the input stream.<br><br>Param sketch: the given sketch as BYTES.<br>Returns: min value as FLOAT64
+### [kll_sketch_float_to_string(sketch BYTES)](../kll/sqlx/kll_sketch_float_to_string.sqlx)
+Returns a summary string that represents the state of the given sketch.<br><br>Param sketch: the given sketch as sketch encoded bytes.<br>Returns: a string that represents the state of the given sketch.
+### [kll_sketch_float_get_num_retained(sketch BYTES)](../kll/sqlx/kll_sketch_float_get_num_retained.sqlx)
+Returns the number of retained items \(samples\) in the sketch.<br><br>Param sketch: the given sketch as BYTES.<br>Returns: number of retained items as INT64
+### [kll_sketch_float_get_max_value(sketch BYTES)](../kll/sqlx/kll_sketch_float_get_max_value.sqlx)
+Returns the maximum value of the input stream.<br><br>Param sketch: the given sketch as BYTES.<br>Returns: max value as FLOAT64
+### [kll_sketch_float_get_normalized_rank_error(sketch BYTES, pmf BOOL)](../kll/sqlx/kll_sketch_float_get_normalized_rank_error.sqlx)
+Returns the approximate rank error of the given sketch normalized as a fraction between zero and one.<br>Param sketch: the given sketch as BYTES.<br>Param pmf: if true, returns the "double\-sided" normalized rank error for the get\_PMF\(\) function.<br>Otherwise, it is the "single\-sided" normalized rank error for all the other queries.<br>Returns: normalized rank error as FLOAT64
+### [kll_sketch_float_get_rank(sketch BYTES, value FLOAT64, inclusive BOOL)](../kll/sqlx/kll_sketch_float_get_rank.sqlx)
+Returns an approximation to the normalized rank, on the interval \[0.0, 1.0\], of the given value.<br><br>Param sketch: the given sketch in serialized form.<br>Param value: value to be ranked.<br>Param inclusive: if true the weight of the given value is included into the rank.<br>Returns: an approximate rank of the given value.
+### [kll_sketch_float_get_pmf(sketch BYTES, split_points ARRAY<FLOAT64>, inclusive BOOL)](../kll/sqlx/kll_sketch_float_get_pmf.sqlx)
+Returns an approximation to the Probability Mass Function \(PMF\)<br>of the input stream as an array of probability masses defined by the given split\_points.<br><br>Param sketch: the given sketch as BYTES.<br><br>Param split\_points: an array of M unique, monotonically increasing values <br> \(of the same type as the input values\)<br> that divide the input value domain into M\+1 non\-overlapping intervals.<br> <br> Each interval except for the end intervals starts with a split\-poi [...]
+### [kll_sketch_float_kolmogorov_smirnov(sketchA BYTES, sketchB BYTES, pvalue FLOAT64)](../kll/sqlx/kll_sketch_float_kolmogorov_smirnov.sqlx)
+Performs the Kolmogorov\-Smirnov Test between two KLL sketches of type FLOAT64.<br>If the given sketches have insufficient data or if the sketch sizes are too small, this will return false.<br><br>Param sketchA: sketch A in serialized form.<br>Param sketchB: sketch B in serialized form.<br>Param pvalue: Target p\-value. Typically 0.001 to 0.1, e.g. 0.05.<br>Returns: boolean indicating whether we can reject the null hypothesis \(that the sketches<br> reflect the same underlying distribut [...]
+### [kll_sketch_float_get_cdf(sketch BYTES, split_points ARRAY<FLOAT64>, inclusive BOOL)](../kll/sqlx/kll_sketch_float_get_cdf.sqlx)
+Returns an approximation to the Cumulative Distribution Function \(CDF\) <br>of the input stream as an array of cumulative probabilities defined by the given split\_points.<br><br>Param sketch: the given sketch as BYTES.<br><br>Param split\_points: an array of M unique, monotonically increasing values<br> \(of the same type as the input values to the sketch\)<br> that divide the input value domain into M\+1 overlapping intervals.<br> <br> The start of each interval is below the lowes [...]
+### [kll_sketch_float_get_quantile(sketch BYTES, rank FLOAT64, inclusive BOOL)](../kll/sqlx/kll_sketch_float_get_quantile.sqlx)
+Returns a value from the sketch that is the best approximation to a value from the original stream with the given rank.<br><br>Param sketch: the given sketch in serialized form.<br>Param rank: rank of a value in the hypothetical sorted stream.<br>Param inclusive: if true, the given rank is considered inclusive \(includes weight of a value\)<br>Returns: an approximate quantile associated with the given rank.
+## Examples
 ```sql
 create or replace temp table kll_sketch(sketch bytes);
diff --git a/readme_generator.py b/readme_generator.py
index 4029422..196b750 100644
--- a/readme_generator.py
+++ b/readme_generator.py
@@ -87,10 +87,11 @@ def parse_sqlx(file_content: str, filename: str) -> dict:
     # Determine function type
     function_type = "AGGREGATE" if "AGGREGATE FUNCTION" in file_content else "SCALAR"
     return {
-        "function_name": filename[:-5], # Remove file extension .sqlx
-        "signature": f"({', '.join([f'{arg[0]} {arg[1]}' for arg in arg_list])}) -> {return_type}",
+        "name": filename[:-5], # Remove file extension .sqlx
+        "params": f"({', '.join([f'{arg[0]} {arg[1]}' for arg in arg_list])})",
+        "returns": return_type,
         "description": description,
-        "function_type": function_type,
+        "type": function_type,
     }
 
 # Function to walk through directories, parse SQLX files, and collect data for README
@@ -112,9 +113,10 @@ def process_folder(input_folder: str, sketch_type: str) -> dict:
     logging.info(f"Parsed data for {file}: {parsed_data}")
     function_index[sketch_type].append({
-        'function_name': parsed_data['function_name'],
-        'signature': parsed_data['signature'],
-        'function_type':parsed_data['function_type'],
+        'name': parsed_data['name'],
+        'params': parsed_data['params'],
+        'returns': parsed_data['returns'],
+        'type':parsed_data['type'],
         'description': parsed_data['description'],
         'path': sqlx_path
     })
@@ -126,21 +128,22 @@ def generate_readme(template_path: str, function_index: dict, examples_path: str
     with open(template_path, 'r') as template_file:
         output_lines = template_file.readlines()
-    # Generate the table content
-    output_lines += "\n"
-    output_lines += "| Function Name | Function Type | Signature | Description |\n"
-    output_lines += "|---|---|---|---|\n" # table header
+    output_lines += "\n## Aggregate Functions\n"
     # Sort functions by function type (AGGREGATE first, then SCALAR) and then by number of arguments
-    sorted_functions = sorted(function_index, key=lambda x: (x['function_type'], len(x['signature'].split(','))), reverse=False)
+    sorted_functions = sorted(function_index, key=lambda x: (x['type'], len(x['params'].split(','))), reverse=False)
+    is_aggregate = True
     for function in sorted_functions:
-        function_link = f"[{function['function_name']}](../{function['path']})"
-        output_lines += f"| {function_link} | {function['function_type']} | {function['signature']} | {function['description']} |\n"
+        if is_aggregate and function['type'] == 'SCALAR':
+            output_lines += "\n## Scalar Functions\n"
+            is_aggregate = False
+        function_link = f"[{function['name']}{function['params']}](../{function['path']})"
+        output_lines += f"### {function_link}\n{function['description']}\n"
     # Add examples section
     example_files = [f for f in os.listdir(examples_path) if f.endswith("_test.sql")]
     if example_files:
-        output_lines.append("\n**Examples:**\n\n")
+        output_lines.append("## Examples\n")
         for example_file in example_files:
             # Read the example SQL file
             with open(os.path.join(examples_path, example_file), 'r') as f:
diff --git a/req/README.md b/req/README.md
index 66dfafe..0abe1dc 100644
--- a/req/README.md
+++ b/req/README.md
@@ -37,26 +37,40 @@ If you are interested in making contributions to this project please see our
 [Community](https://datasketches.apache.org/docs/Community/) page for how to contact us.
-| Function Name | Function Type | Signature | Description | -|---|---|---|---| -| [req_sketch_float_build](../req/sqlx/req_sketch_float_build.sqlx) | AGGREGATE | (value FLOAT64) -> BYTES | Creates a sketch that represents the distribution of the given column.<br><br>Param value: the column of FLOAT64 values.<br>Defaults: k = 12, hra = true.<br>Returns: a serialized REQ Sketch as BYTES. | -| [req_sketch_float_merge](../req/sqlx/req_sketch_float_merge.sqlx) | AGGREGATE | (sketch BYTES) -> BYTES | Merges sketches from the given column.<br><br>Param sketch: the column of sketches.<br>Defaults: k = 12, hra = true.<br>Returns: a serialized REQ sketch as BYTES. | -| [req_sketch_float_build_k_hra](../req/sqlx/req_sketch_float_build_k_hra.sqlx) | AGGREGATE | (value FLOAT64, params STRUCT<k INT, hra BOOL> NOT AGGREGATE) -> BYTES | Creates a sketch that represents the distribution of the given column.<br><br>Param value: the column of FLOAT64 values.<br>Param k: the sketch accuracy/size parameter as an even INT in the range \[4, 65534\].<br>Param hra: if true, the high ranks are prioritized for better accuracy. Otherwise the low ranks are prioritized [...] -| [req_sketch_float_merge_k_hra](../req/sqlx/req_sketch_float_merge_k_hra.sqlx) | AGGREGATE | (sketch BYTES, params STRUCT<k INT, hra BOOL> NOT AGGREGATE) -> BYTES | Merges sketches from the given column.<br><br>Param sketch: the column of values.<br>Param k: the sketch accuracy/size parameter as an even INT in the range \[4, 65534\].<br>Param hra: if true, the high ranks are prioritized for better accuracy. Otherwise the low ranks are prioritized for better accuracy.<br>Returns: a seria [...] 
-| [req_sketch_float_get_n](../req/sqlx/req_sketch_float_get_n.sqlx) | SCALAR | (sketch BYTES) -> INT64 | Returns the length of the input stream.<br><br>Param sketch: the given sketch as BYTES.<br>Returns: stream length as INT64 | -| [req_sketch_float_get_num_retained](../req/sqlx/req_sketch_float_get_num_retained.sqlx) | SCALAR | (sketch BYTES) -> INT64 | Returns the number of retained items \(samples\) in the sketch.<br><br>Param sketch: the given sketch as BYTES.<br>Returns: number of retained items as INT64 | -| [req_sketch_float_get_min_value](../req/sqlx/req_sketch_float_get_min_value.sqlx) | SCALAR | (sketch BYTES) -> FLOAT64 | Returns the minimum value of the input stream.<br><br>Param sketch: the given sketch as BYTES.<br>Returns: min value as FLOAT64 | -| [req_sketch_float_to_string](../req/sqlx/req_sketch_float_to_string.sqlx) | SCALAR | (sketch BYTES) -> STRING | Returns a summary string that represents the state of the given sketch.<br><br>Param sketch: the given sketch as BYTES.<br>Returns: a string that represents the state of the given sketch. | -| [req_sketch_float_get_max_value](../req/sqlx/req_sketch_float_get_max_value.sqlx) | SCALAR | (sketch BYTES) -> FLOAT64 | Returns the maximum value of the input stream.<br><br>Param sketch: the given sketch as BYTES.<br>Returns: max value as FLOAT64 | -| [req_sketch_float_get_cdf](../req/sqlx/req_sketch_float_get_cdf.sqlx) | SCALAR | (sketch BYTES, split_points ARRAY<FLOAT64>, inclusive BOOL) -> ARRAY<FLOAT64> | Returns an approximation to the Cumulative Distribution Function \(CDF\) <br>of the input stream as an array of cumulative probabilities defined by the given split\_points.<br><br>Param sketch: the given sketch as BYTES.<br><br>Param split\_points: an array of M unique, monotonically increasing values<br> \(of the same type as [...] 
-| [req_sketch_float_get_rank_lower_bound](../req/sqlx/req_sketch_float_get_rank_lower_bound.sqlx) | SCALAR | (sketch BYTES, rank FLOAT64, num_std_dev BYTEINT) -> FLOAT64 | Returns an approximate lower bound of the given normalized rank.<br>Param sketch: the given sketch as BYTES.<br>Param rank: the given rank, a value between 0 and 1.0.<br>Param num\_std\_dev: The returned bounds will be based on the statistical confidence interval determined by the given number of standard deviations<br [...] -| [req_sketch_float_get_pmf](../req/sqlx/req_sketch_float_get_pmf.sqlx) | SCALAR | (sketch BYTES, split_points ARRAY<FLOAT64>, inclusive BOOL) -> ARRAY<FLOAT64> | Returns an approximation to the Probability Mass Function \(PMF\)<br>of the input stream as an array of probability masses defined by the given split\_points.<br><br>Param sketch: the given sketch as BYTES.<br><br>Param split\_points: an array of M unique, monotonically increasing values <br> \(of the same type as the input va [...] -| [req_sketch_float_get_quantile](../req/sqlx/req_sketch_float_get_quantile.sqlx) | SCALAR | (sketch BYTES, rank FLOAT64, inclusive BOOL) -> FLOAT64 | Returns a value from the sketch that is the best approximation to a value from the original stream with the given rank.<br><br>Param sketch: the given sketch in serialized form.<br>Param rank: rank of a value in the hypothetical sorted stream.<br>Param inclusive: if true, the given rank is considered inclusive \(includes weight of a value\ [...] -| [req_sketch_float_get_rank_upper_bound](../req/sqlx/req_sketch_float_get_rank_upper_bound.sqlx) | SCALAR | (sketch BYTES, rank FLOAT64, num_std_dev BYTEINT) -> FLOAT64 | Returns an approximate upper bound of the given normalized rank.<br>Param sketch: the given sketch as BYTES.<br>Param rank: the given rank, a value between 0 and 1.0.<br>Param num\_std\_dev: The returned bounds will be based on the statistical confidence interval determined by the given number of standard deviations<br [...] 
-| [req_sketch_float_get_rank](../req/sqlx/req_sketch_float_get_rank.sqlx) | SCALAR | (sketch BYTES, value FLOAT64, inclusive BOOL) -> FLOAT64 | Returns an approximation to the normalized rank, on the interval \[0.0, 1.0\], of the given value.<br><br>Param sketch: the given sketch in serialized form.<br>Param value: value to be ranked.<br>Param inclusive: if true the weight of the given value is included into the rank.<br>Returns: an approximate rank of the given value. | - -**Examples:** - +## Aggregate Functions +### [req_sketch_float_build(value FLOAT64)](../req/sqlx/req_sketch_float_build.sqlx) +Creates a sketch that represents the distribution of the given column.<br><br>Param value: the column of FLOAT64 values.<br>Defaults: k = 12, hra = true.<br>Returns: a serialized REQ Sketch as BYTES. +### [req_sketch_float_merge(sketch BYTES)](../req/sqlx/req_sketch_float_merge.sqlx) +Merges sketches from the given column.<br><br>Param sketch: the column of sketches.<br>Defaults: k = 12, hra = true.<br>Returns: a serialized REQ sketch as BYTES. +### [req_sketch_float_build_k_hra(value FLOAT64, params STRUCT<k INT, hra BOOL> NOT AGGREGATE)](../req/sqlx/req_sketch_float_build_k_hra.sqlx) +Creates a sketch that represents the distribution of the given column.<br><br>Param value: the column of FLOAT64 values.<br>Param k: the sketch accuracy/size parameter as an even INT in the range \[4, 65534\].<br>Param hra: if true, the high ranks are prioritized for better accuracy. Otherwise the low ranks are prioritized for better accuracy.<br>Returns: a serialized REQ Sketch as BYTES. +### [req_sketch_float_merge_k_hra(sketch BYTES, params STRUCT<k INT, hra BOOL> NOT AGGREGATE)](../req/sqlx/req_sketch_float_merge_k_hra.sqlx) +Merges sketches from the given column.<br><br>Param sketch: the column of values.<br>Param k: the sketch accuracy/size parameter as an even INT in the range \[4, 65534\].<br>Param hra: if true, the high ranks are prioritized for better accuracy. 
Otherwise the low ranks are prioritized for better accuracy.<br>Returns: a serialized REQ sketch as BYTES. + +## Scalar Functions +### [req_sketch_float_get_n(sketch BYTES)](../req/sqlx/req_sketch_float_get_n.sqlx) +Returns the length of the input stream.<br><br>Param sketch: the given sketch as BYTES.<br>Returns: stream length as INT64 +### [req_sketch_float_get_num_retained(sketch BYTES)](../req/sqlx/req_sketch_float_get_num_retained.sqlx) +Returns the number of retained items \(samples\) in the sketch.<br><br>Param sketch: the given sketch as BYTES.<br>Returns: number of retained items as INT64 +### [req_sketch_float_get_min_value(sketch BYTES)](../req/sqlx/req_sketch_float_get_min_value.sqlx) +Returns the minimum value of the input stream.<br><br>Param sketch: the given sketch as BYTES.<br>Returns: min value as FLOAT64 +### [req_sketch_float_to_string(sketch BYTES)](../req/sqlx/req_sketch_float_to_string.sqlx) +Returns a summary string that represents the state of the given sketch.<br><br>Param sketch: the given sketch as BYTES.<br>Returns: a string that represents the state of the given sketch. +### [req_sketch_float_get_max_value(sketch BYTES)](../req/sqlx/req_sketch_float_get_max_value.sqlx) +Returns the maximum value of the input stream.<br><br>Param sketch: the given sketch as BYTES.<br>Returns: max value as FLOAT64 +### [req_sketch_float_get_cdf(sketch BYTES, split_points ARRAY<FLOAT64>, inclusive BOOL)](../req/sqlx/req_sketch_float_get_cdf.sqlx) +Returns an approximation to the Cumulative Distribution Function \(CDF\) <br>of the input stream as an array of cumulative probabilities defined by the given split\_points.<br><br>Param sketch: the given sketch as BYTES.<br><br>Param split\_points: an array of M unique, monotonically increasing values<br> \(of the same type as the input values to the sketch\)<br> that divide the input value domain into M\+1 overlapping intervals.<br> <br> The start of each interval is below the lowes [...] 
+### [req_sketch_float_get_rank_lower_bound(sketch BYTES, rank FLOAT64, num_std_dev BYTEINT)](../req/sqlx/req_sketch_float_get_rank_lower_bound.sqlx) +Returns an approximate lower bound of the given normalized rank.<br>Param sketch: the given sketch as BYTES.<br>Param rank: the given rank, a value between 0 and 1.0.<br>Param num\_std\_dev: The returned bounds will be based on the statistical confidence interval determined by the given number of standard deviations<br> from the returned estimate. This number may be one of {1,2,3}, where 1 represents 68% confidence, 2 represents 95% confidence and 3 represents 99.7% confidence.<br>Retur [...] +### [req_sketch_float_get_pmf(sketch BYTES, split_points ARRAY<FLOAT64>, inclusive BOOL)](../req/sqlx/req_sketch_float_get_pmf.sqlx) +Returns an approximation to the Probability Mass Function \(PMF\)<br>of the input stream as an array of probability masses defined by the given split\_points.<br><br>Param sketch: the given sketch as BYTES.<br><br>Param split\_points: an array of M unique, monotonically increasing values <br> \(of the same type as the input values\)<br> that divide the input value domain into M\+1 non\-overlapping intervals.<br> <br> Each interval except for the end intervals starts with a split\-poi [...] +### [req_sketch_float_get_quantile(sketch BYTES, rank FLOAT64, inclusive BOOL)](../req/sqlx/req_sketch_float_get_quantile.sqlx) +Returns a value from the sketch that is the best approximation to a value from the original stream with the given rank.<br><br>Param sketch: the given sketch in serialized form.<br>Param rank: rank of a value in the hypothetical sorted stream.<br>Param inclusive: if true, the given rank is considered inclusive \(includes weight of a value\)<br>Returns: an approximate quantile associated with the given rank. 
+### [req_sketch_float_get_rank_upper_bound(sketch BYTES, rank FLOAT64, num_std_dev BYTEINT)](../req/sqlx/req_sketch_float_get_rank_upper_bound.sqlx) +Returns an approximate upper bound of the given normalized rank.<br>Param sketch: the given sketch as BYTES.<br>Param rank: the given rank, a value between 0 and 1.0.<br>Param num\_std\_dev: The returned bounds will be based on the statistical confidence interval determined by the given number of standard deviations<br> from the returned estimate. This number may be one of {1,2,3}, where 1 represents 68% confidence, 2 represents 95% confidence and 3 represents 99.7% confidence.<br>Retur [...] +### [req_sketch_float_get_rank(sketch BYTES, value FLOAT64, inclusive BOOL)](../req/sqlx/req_sketch_float_get_rank.sqlx) +Returns an approximation to the normalized rank, on the interval \[0.0, 1.0\], of the given value.<br><br>Param sketch: the given sketch in serialized form.<br>Param value: value to be ranked.<br>Param inclusive: if true the weight of the given value is included into the rank.<br>Returns: an approximate rank of the given value. +## Examples ```sql # using defaults diff --git a/tdigest/README.md b/tdigest/README.md index efb6bee..089a4d2 100644 --- a/tdigest/README.md +++ b/tdigest/README.md @@ -34,21 +34,30 @@ If you are interested in making contributions to this project please see our [Community](https://datasketches.apache.org/docs/Community/) page for how to contact us. -| Function Name | Function Type | Signature | Description | -|---|---|---|---| -| [tdigest_double_build](../tdigest/sqlx/tdigest_double_build.sqlx) | AGGREGATE | (value FLOAT64) -> BYTES | Creates a sketch that represents the distribution of the given column.<br><br>Param value: the column of FLOAT64 values.<br>Defaults: k = 200.<br>Returns: a t\-Digest, as bytes. 
| -| [tdigest_double_merge](../tdigest/sqlx/tdigest_double_merge.sqlx) | AGGREGATE | (sketch BYTES) -> BYTES | Merges sketches from the given column.<br><br>Param sketch: the column of values.<br>Defaults: k = 200.<br>Returns: a serialized t\-Digest as BYTES. | -| [tdigest_double_merge_k](../tdigest/sqlx/tdigest_double_merge_k.sqlx) | AGGREGATE | (sketch BYTES, k INT NOT AGGREGATE) -> BYTES | Merges sketches from the given column.<br><br>Param sketch: the column of values.<br>Param k: the sketch accuracy/size parameter as an integer in the range \[10, 65535\].<br>Returns: a serialized t\-Digest as BYTES. | -| [tdigest_double_build_k](../tdigest/sqlx/tdigest_double_build_k.sqlx) | AGGREGATE | (value FLOAT64, k INT NOT AGGREGATE) -> BYTES | Creates a sketch that represents the distribution of the given column.<br><br>Param value: the column of FLOAT64 values.<br>Param k: the sketch accuracy/size parameter as an INT in the range \[10, 65535\].<br>Returns: a t\-Digest, as bytes. | -| [tdigest_double_get_max_value](../tdigest/sqlx/tdigest_double_get_max_value.sqlx) | SCALAR | (sketch BYTES) -> FLOAT64 | Returns the maximum value of the input stream.<br><br>Param sketch: the given sketch as BYTES.<br>Returns: max value as FLOAT64 | -| [tdigest_double_to_string](../tdigest/sqlx/tdigest_double_to_string.sqlx) | SCALAR | (sketch BYTES) -> STRING | Returns a summary string that represents the state of the given sketch.<br><br>Param sketch: the given sketch as sketch encoded bytes.<br>Returns: a string that represents the state of the given sketch. 
| -| [tdigest_double_get_total_weight](../tdigest/sqlx/tdigest_double_get_total_weight.sqlx) | SCALAR | (sketch BYTES) -> INT64 | Returns the total weight of the input stream.<br><br>Param sketch: the given sketch as BYTES.<br>Returns: total weight as INT64 | -| [tdigest_double_get_min_value](../tdigest/sqlx/tdigest_double_get_min_value.sqlx) | SCALAR | (sketch BYTES) -> FLOAT64 | Returns the minimum value of the input stream.<br><br>Param sketch: the given sketch as BYTES.<br>Returns: min value as FLOAT64 | -| [tdigest_double_get_rank](../tdigest/sqlx/tdigest_double_get_rank.sqlx) | SCALAR | (sketch BYTES, value FLOAT64) -> FLOAT64 | Returns an approximation to the normalized rank, on the interval \[0.0, 1.0\], of the given value.<br><br>Param sketch: the given sketch in serialized form.<br>Param value: value to be ranked.<br>Returns: an approximate rank of the given value. | -| [tdigest_double_get_quantile](../tdigest/sqlx/tdigest_double_get_quantile.sqlx) | SCALAR | (sketch BYTES, rank FLOAT64) -> FLOAT64 | Returns a value from the sketch that is the best approximation to a value from the original stream with the given rank.<br><br>Param sketch: the given sketch in serialized form.<br>Param rank: rank of a value in the hypothetical sorted stream.<br>Returns: an approximate quantile associated with the given rank. | - -**Examples:** - +## Aggregate Functions +### [tdigest_double_build(value FLOAT64)](../tdigest/sqlx/tdigest_double_build.sqlx) +Creates a sketch that represents the distribution of the given column.<br><br>Param value: the column of FLOAT64 values.<br>Defaults: k = 200.<br>Returns: a t\-Digest, as bytes. +### [tdigest_double_merge(sketch BYTES)](../tdigest/sqlx/tdigest_double_merge.sqlx) +Merges sketches from the given column.<br><br>Param sketch: the column of values.<br>Defaults: k = 200.<br>Returns: a serialized t\-Digest as BYTES. 
+### [tdigest_double_merge_k(sketch BYTES, k INT NOT AGGREGATE)](../tdigest/sqlx/tdigest_double_merge_k.sqlx) +Merges sketches from the given column.<br><br>Param sketch: the column of values.<br>Param k: the sketch accuracy/size parameter as an integer in the range \[10, 65535\].<br>Returns: a serialized t\-Digest as BYTES. +### [tdigest_double_build_k(value FLOAT64, k INT NOT AGGREGATE)](../tdigest/sqlx/tdigest_double_build_k.sqlx) +Creates a sketch that represents the distribution of the given column.<br><br>Param value: the column of FLOAT64 values.<br>Param k: the sketch accuracy/size parameter as an INT in the range \[10, 65535\].<br>Returns: a t\-Digest, as bytes. + +## Scalar Functions +### [tdigest_double_get_max_value(sketch BYTES)](../tdigest/sqlx/tdigest_double_get_max_value.sqlx) +Returns the maximum value of the input stream.<br><br>Param sketch: the given sketch as BYTES.<br>Returns: max value as FLOAT64 +### [tdigest_double_to_string(sketch BYTES)](../tdigest/sqlx/tdigest_double_to_string.sqlx) +Returns a summary string that represents the state of the given sketch.<br><br>Param sketch: the given sketch as sketch encoded bytes.<br>Returns: a string that represents the state of the given sketch. 
+### [tdigest_double_get_total_weight(sketch BYTES)](../tdigest/sqlx/tdigest_double_get_total_weight.sqlx) +Returns the total weight of the input stream.<br><br>Param sketch: the given sketch as BYTES.<br>Returns: total weight as INT64 +### [tdigest_double_get_min_value(sketch BYTES)](../tdigest/sqlx/tdigest_double_get_min_value.sqlx) +Returns the minimum value of the input stream.<br><br>Param sketch: the given sketch as BYTES.<br>Returns: min value as FLOAT64 +### [tdigest_double_get_rank(sketch BYTES, value FLOAT64)](../tdigest/sqlx/tdigest_double_get_rank.sqlx) +Returns an approximation to the normalized rank, on the interval \[0.0, 1.0\], of the given value.<br><br>Param sketch: the given sketch in serialized form.<br>Param value: value to be ranked.<br>Returns: an approximate rank of the given value. +### [tdigest_double_get_quantile(sketch BYTES, rank FLOAT64)](../tdigest/sqlx/tdigest_double_get_quantile.sqlx) +Returns a value from the sketch that is the best approximation to a value from the original stream with the given rank.<br><br>Param sketch: the given sketch in serialized form.<br>Param rank: rank of a value in the hypothetical sorted stream.<br>Returns: an approximate quantile associated with the given rank. +## Examples ```sql create or replace temp table tdigest_double(sketch bytes); diff --git a/theta/README.md b/theta/README.md index cef3a08..a35a15d 100644 --- a/theta/README.md +++ b/theta/README.md @@ -36,35 +36,58 @@ If you are interested in making contributions to this project please see our [Community](https://datasketches.apache.org/docs/Community/) page for how to contact us. 
-| Function Name | Function Type | Signature | Description | -|---|---|---|---| -| [theta_sketch_agg_int64](../theta/sqlx/theta_sketch_agg_int64.sqlx) | AGGREGATE | (value INT64) -> BYTES | Creates a sketch that represents the cardinality of the given INT64 column.<br> <br>Param value: the INT64 column of identifiers.<br>Defaults: lg\_k = 12, seed = 9001, p = 1.0.<br>Returns: a Compact, Compressed Theta Sketch, as BYTES. | -| [theta_sketch_agg_union](../theta/sqlx/theta_sketch_agg_union.sqlx) | AGGREGATE | (sketch BYTES) -> BYTES | Creates a sketch that represents the union of the given column of sketches.<br><br>Param sketch: the column of sketches. Each as BYTES.<br>Defaults: lg\_k = 12, seed = 9001.<br>Returns: a Compact, Compressed Theta Sketch, as BYTES. | -| [theta_sketch_agg_string](../theta/sqlx/theta_sketch_agg_string.sqlx) | AGGREGATE | (str STRING) -> BYTES | Creates a sketch that represents the cardinality of the given STRING column.<br> <br>Param str: the STRING column of identifiers.<br>Defaults: lg\_k = 12, seed = 9001, p = 1.0.<br>Returns: a Compact, Compressed Theta Sketch, as BYTES. | -| [theta_sketch_agg_union_lgk_seed](../theta/sqlx/theta_sketch_agg_union_lgk_seed.sqlx) | AGGREGATE | (sketch BYTES, params STRUCT<lg_k BYTEINT, seed INT64> NOT AGGREGATE) -> BYTES | Creates a sketch that represents the union of the given column of sketches.<br><br>Param sketch: the column of sketches. Each as BYTES.<br>Param lg\_k: the sketch accuracy/size parameter as a BYTEINT in the range \[4, 26\].<br>Param seed: This is used to confirm that the given sketches were configured with t [...] 
-| [theta_sketch_agg_int64_lgk_seed_p](../theta/sqlx/theta_sketch_agg_int64_lgk_seed_p.sqlx) | AGGREGATE | (value INT64, params STRUCT<lg_k BYTEINT, seed INT64, p FLOAT64> NOT AGGREGATE) -> BYTES | Creates a sketch that represents the cardinality of the given INT64 column.<br><br>Param value: the INT64 column of identifiers.<br>Param lg\_k: the sketch accuracy/size parameter as a BYTEINT in the range \[4, 26\]. A NULL specifies the default of 12.<br>Param seed: the seed to be used by the [...] -| [theta_sketch_agg_string_lgk_seed_p](../theta/sqlx/theta_sketch_agg_string_lgk_seed_p.sqlx) | AGGREGATE | (str STRING, params STRUCT<lg_k BYTEINT, seed INT64, p FLOAT64> NOT AGGREGATE) -> BYTES | Creates a sketch that represents the cardinality of the given STRING column.<br><br>Param str: the STRING column of identifiers.<br>Param lg\_k: the sketch accuracy/size parameter as a BYTEINT in the range \[4, 26\]. A NULL specifies the default of 12.<br>Param seed: the seed to be used by the [...] -| [theta_sketch_get_estimate](../theta/sqlx/theta_sketch_get_estimate.sqlx) | SCALAR | (sketch BYTES) -> FLOAT64 | Gets distinct count estimate from a given sketch.<br><br>Param sketch: The given sketch to query as BYTES.<br>Defaults: seed = 9001.<br>Returns: distinct count estimate as FLOAT64. | -| [theta_sketch_to_string](../theta/sqlx/theta_sketch_to_string.sqlx) | SCALAR | (sketch BYTES) -> STRING | Returns a summary string that represents the state of the given sketch.<br><br>Param sketch: the given sketch as BYTES.<br>Defaults: seed = 9001.<br>Returns: a STRING that represents the state of the given sketch. | -| [theta_sketch_get_num_retained](../theta/sqlx/theta_sketch_get_num_retained.sqlx) | SCALAR | (sketch BYTES) -> INT | Returns the number of retained entries in the given sketch.<br> <br>Param sketch: The given sketch to query as BYTES.<br>Defaults: seed = 9001.<br>Returns: number of retained entries as INT. 
| -| [theta_sketch_get_theta](../theta/sqlx/theta_sketch_get_theta.sqlx) | SCALAR | (sketch BYTES) -> FLOAT64 | Returns theta \(effective sampling rate\) as a fraction from 0 to 1.<br> <br>Param sketch: The given sketch to query as BYTES.<br>Defaults: seed = 9001.<br>Returns: theta as FLOAT64. | -| [theta_sketch_get_num_retained_seed](../theta/sqlx/theta_sketch_get_num_retained_seed.sqlx) | SCALAR | (sketch BYTES, seed INT64) -> INT | Returns the number of retained entries in the given sketch.<br> <br>Param sketch: The given sketch to query as BYTES.<br>Param seed: This is used to confirm that the given sketch was configured with the correct seed.<br>Returns: number of retained entries as INT. | -| [theta_sketch_get_estimate_seed](../theta/sqlx/theta_sketch_get_estimate_seed.sqlx) | SCALAR | (sketch BYTES, seed INT64) -> FLOAT64 | Gets distinct count estimate from a given sketch.<br> <br>Param sketch: The given sketch to query as BYTES.<br>Param seed: This is used to confirm that the given sketch was configured with the correct seed.<br>Returns: distinct count estimate as FLOA64. | -| [theta_sketch_to_string_seed](../theta/sqlx/theta_sketch_to_string_seed.sqlx) | SCALAR | (sketch BYTES, seed INT64) -> STRING | Returns a summary string that represents the state of the given sketch.<br><br>Param sketch: the given sketch as BYTES.<br>Param seed: This is used to confirm that the given sketch was configured with the correct seed.<br>Returns: a STRING that represents the state of the given sketch. | -| [theta_sketch_get_theta_seed](../theta/sqlx/theta_sketch_get_theta_seed.sqlx) | SCALAR | (sketch BYTES, seed INT64) -> FLOAT64 | Returns theta \(effective sampling rate\) as a fraction from 0 to 1.<br> <br>Param sketch: The given sketch to query as BYTES.<br>Param seed: This is used to confirm that the given sketch was configured with the correct seed.<br>Returns: theta as FLOAT64. 
| -| [theta_sketch_intersection](../theta/sqlx/theta_sketch_intersection.sqlx) | SCALAR | (sketchA BYTES, sketchB BYTES) -> BYTES | Computes a sketch that represents the scalar intersection of the two given sketches.<br><br>Param sketchA: the first sketch as BYTES.<br>Param sketchB: the second sketch as BYTES.<br>Defaults: seed = 9001.<br>Returns: a Compact, Compressed Theta Sketch, as BYTES. | -| [theta_sketch_union](../theta/sqlx/theta_sketch_union.sqlx) | SCALAR | (sketchA BYTES, sketchB BYTES) -> BYTES | Computes a sketch that represents the scalar union of the two given sketches.<br><br>Param sketchA: the first sketch as BYTES.<br>Param sketchB: the second sketch as BYTES.<br>Defaults: lg\_k = 12, seed = 9001.<br>Returns: a Compact, Compressed Theta Sketch, as BYTES. | -| [theta_sketch_a_not_b](../theta/sqlx/theta_sketch_a_not_b.sqlx) | SCALAR | (sketchA BYTES, sketchB BYTES) -> BYTES | Computes a sketch that represents the scalar set difference: sketchA and not sketchB.<br><br>Param sketchA: the first sketch "A" as bytes.<br>Param sketchB: the second sketch "B" as bytes.<br>Defaults: seed = 9001.<br>Returns: a Compact, Compressed Theta Sketch, as BYTES. | -| [theta_sketch_intersection_seed](../theta/sqlx/theta_sketch_intersection_seed.sqlx) | SCALAR | (sketchA BYTES, sketchB BYTES, seed INT64) -> BYTES | Computes a sketch that represents the scalar intersection of the two given sketches.<br><br>Param sketchA: the first sketch as BYTES.<br>Param sketchB: the second sketch as BYTES.<br>Param seed: This is used to confirm that the given sketches were configured with the correct seed.<br>Returns: a Compact, Compressed Theta Sketch, as BYTES. 
| -| [theta_sketch_a_not_b_seed](../theta/sqlx/theta_sketch_a_not_b_seed.sqlx) | SCALAR | (sketchA BYTES, sketchB BYTES, seed INT64) -> BYTES | Computes a sketch that represents the scalar set difference: sketchA and not sketchB.<br><br>Param sketchA: the first sketch "A" as bytes.<br>Param sketchB: the second sketch "B" as bytes.<br>Param seed: This is used to confirm that the given sketches were configured with the correct seed.<br>Returns: a Compact, Compressed Theta Sketch, as BYTES. | -| [theta_sketch_union_lgk_seed](../theta/sqlx/theta_sketch_union_lgk_seed.sqlx) | SCALAR | (sketchA BYTES, sketchB BYTES, lg_k BYTEINT, seed INT64) -> BYTES | Computes a sketch that represents the scalar union of the two given sketches.<br><br>Param sketchA: the first sketch as BYTES.<br>Param sketchB: the second sketch as BYTES.<br>Param lg\_k: the sketch accuracy/size parameter as an integer in the range \[4, 26\].<br>Param seed: This is used to confirm that the given sketches were con [...] -| [theta_sketch_get_estimate_and_bounds](../theta/sqlx/theta_sketch_get_estimate_and_bounds.sqlx) | SCALAR | (sketch BYTES, num_std_devs BYTEINT) -> STRUCT<estimate FLOAT64, lower_bound FLOAT64, upper_bound FLOAT64> | Gets distinct count estimate and bounds from a given sketch.<br><br>Param sketch: The given sketch to query as BYTES.<br>Param num\_std\_devs: The returned bounds will be based on the statistical confidence interval<br> determined by the given number of standard deviations [...] -| [theta_sketch_jaccard_similarity](../theta/sqlx/theta_sketch_jaccard_similarity.sqlx) | SCALAR | (sketchA BYTES, sketchB BYTES) -> STRUCT<lower_bound FLOAT64, estimate FLOAT64, upper_bound FLOAT64> | Computes the Jaccard similarity index with upper and lower bounds.<br>The Jaccard similarity index J\(A,B\) = \(A ^ B\)/\(A U B\) is used to measure how similar the two sketches are to each other.<br>If J = 1.0, the sketches are considered equal. If J = 0, the two sketches are disjoint.<br [...] 
-| [theta_sketch_get_estimate_and_bounds_seed](../theta/sqlx/theta_sketch_get_estimate_and_bounds_seed.sqlx) | SCALAR | (sketch BYTES, num_std_devs BYTEINT, seed INT64) -> STRUCT<estimate FLOAT64, lower_bound FLOAT64, upper_bound FLOAT64> | Gets distinct count estimate and bounds from a given sketch.<br><br>Param sketch: The given sketch to query as BYTES.<br>Param num\_std\_devs: The returned bounds will be based on the statistical confidence interval<br> determined by the given number [...] -| [theta_sketch_jaccard_similarity_seed](../theta/sqlx/theta_sketch_jaccard_similarity_seed.sqlx) | SCALAR | (sketchA BYTES, sketchB BYTES, seed INT64) -> STRUCT<lower_bound FLOAT64, estimate FLOAT64, upper_bound FLOAT64> | Computes the Jaccard similarity index with upper and lower bounds.<br>The Jaccard similarity index J\(A,B\) = \(A ^ B\)/\(A U B\) is used to measure how similar the two sketches are to each other.<br>If J = 1.0, the sketches are considered equal. If J = 0, the two ske [...] - -**Examples:** - +## Aggregate Functions +### [theta_sketch_agg_int64(value INT64)](../theta/sqlx/theta_sketch_agg_int64.sqlx) +Creates a sketch that represents the cardinality of the given INT64 column.<br> <br>Param value: the INT64 column of identifiers.<br>Defaults: lg\_k = 12, seed = 9001, p = 1.0.<br>Returns: a Compact, Compressed Theta Sketch, as BYTES. +### [theta_sketch_agg_union(sketch BYTES)](../theta/sqlx/theta_sketch_agg_union.sqlx) +Creates a sketch that represents the union of the given column of sketches.<br><br>Param sketch: the column of sketches. Each as BYTES.<br>Defaults: lg\_k = 12, seed = 9001.<br>Returns: a Compact, Compressed Theta Sketch, as BYTES. 
+### [theta_sketch_agg_string(str STRING)](../theta/sqlx/theta_sketch_agg_string.sqlx)
+Creates a sketch that represents the cardinality of the given STRING column.<br> <br>Param str: the STRING column of identifiers.<br>Defaults: lg\_k = 12, seed = 9001, p = 1.0.<br>Returns: a Compact, Compressed Theta Sketch, as BYTES.
+### [theta_sketch_agg_union_lgk_seed(sketch BYTES, params STRUCT<lg_k BYTEINT, seed INT64> NOT AGGREGATE)](../theta/sqlx/theta_sketch_agg_union_lgk_seed.sqlx)
+Creates a sketch that represents the union of the given column of sketches.<br><br>Param sketch: the column of sketches. Each as BYTES.<br>Param lg\_k: the sketch accuracy/size parameter as a BYTEINT in the range \[4, 26\].<br>Param seed: This is used to confirm that the given sketches were configured with the correct seed.<br>Returns: a Compact, Compressed Theta Sketch, as BYTES.
+### [theta_sketch_agg_int64_lgk_seed_p(value INT64, params STRUCT<lg_k BYTEINT, seed INT64, p FLOAT64> NOT AGGREGATE)](../theta/sqlx/theta_sketch_agg_int64_lgk_seed_p.sqlx)
+Creates a sketch that represents the cardinality of the given INT64 column.<br><br>Param value: the INT64 column of identifiers.<br>Param lg\_k: the sketch accuracy/size parameter as a BYTEINT in the range \[4, 26\]. A NULL specifies the default of 12.<br>Param seed: the seed to be used by the underlying hash function. A NULL specifies the default of 9001.<br>Param p: up\-front sampling probability. A NULL specifies the default of 1.0.<br>Returns: a Compact, Compressed Theta Sketch, as BYTES.
+### [theta_sketch_agg_string_lgk_seed_p(str STRING, params STRUCT<lg_k BYTEINT, seed INT64, p FLOAT64> NOT AGGREGATE)](../theta/sqlx/theta_sketch_agg_string_lgk_seed_p.sqlx)
+Creates a sketch that represents the cardinality of the given STRING column.<br><br>Param str: the STRING column of identifiers.<br>Param lg\_k: the sketch accuracy/size parameter as a BYTEINT in the range \[4, 26\].
A NULL specifies the default of 12.<br>Param seed: the seed to be used by the underlying hash function. A NULL specifies the default of 9001.<br>Param p: up\-front sampling probability. A NULL specifies the default of 1.0.<br>Returns: a Compact, Compressed Theta Sketch, as BYTES.
+
+## Scalar Functions
+### [theta_sketch_get_estimate(sketch BYTES)](../theta/sqlx/theta_sketch_get_estimate.sqlx)
+Gets distinct count estimate from a given sketch.<br><br>Param sketch: The given sketch to query as BYTES.<br>Defaults: seed = 9001.<br>Returns: distinct count estimate as FLOAT64.
+### [theta_sketch_to_string(sketch BYTES)](../theta/sqlx/theta_sketch_to_string.sqlx)
+Returns a summary string that represents the state of the given sketch.<br><br>Param sketch: the given sketch as BYTES.<br>Defaults: seed = 9001.<br>Returns: a STRING that represents the state of the given sketch.
+### [theta_sketch_get_num_retained(sketch BYTES)](../theta/sqlx/theta_sketch_get_num_retained.sqlx)
+Returns the number of retained entries in the given sketch.<br> <br>Param sketch: The given sketch to query as BYTES.<br>Defaults: seed = 9001.<br>Returns: number of retained entries as INT.
+### [theta_sketch_get_theta(sketch BYTES)](../theta/sqlx/theta_sketch_get_theta.sqlx)
+Returns theta \(effective sampling rate\) as a fraction from 0 to 1.<br> <br>Param sketch: The given sketch to query as BYTES.<br>Defaults: seed = 9001.<br>Returns: theta as FLOAT64.
+### [theta_sketch_get_num_retained_seed(sketch BYTES, seed INT64)](../theta/sqlx/theta_sketch_get_num_retained_seed.sqlx)
+Returns the number of retained entries in the given sketch.<br> <br>Param sketch: The given sketch to query as BYTES.<br>Param seed: This is used to confirm that the given sketch was configured with the correct seed.<br>Returns: number of retained entries as INT.
+### [theta_sketch_get_estimate_seed(sketch BYTES, seed INT64)](../theta/sqlx/theta_sketch_get_estimate_seed.sqlx)
+Gets distinct count estimate from a given sketch.<br> <br>Param sketch: The given sketch to query as BYTES.<br>Param seed: This is used to confirm that the given sketch was configured with the correct seed.<br>Returns: distinct count estimate as FLOA64.
+### [theta_sketch_to_string_seed(sketch BYTES, seed INT64)](../theta/sqlx/theta_sketch_to_string_seed.sqlx)
+Returns a summary string that represents the state of the given sketch.<br><br>Param sketch: the given sketch as BYTES.<br>Param seed: This is used to confirm that the given sketch was configured with the correct seed.<br>Returns: a STRING that represents the state of the given sketch.
+### [theta_sketch_get_theta_seed(sketch BYTES, seed INT64)](../theta/sqlx/theta_sketch_get_theta_seed.sqlx)
+Returns theta \(effective sampling rate\) as a fraction from 0 to 1.<br> <br>Param sketch: The given sketch to query as BYTES.<br>Param seed: This is used to confirm that the given sketch was configured with the correct seed.<br>Returns: theta as FLOAT64.
+### [theta_sketch_intersection(sketchA BYTES, sketchB BYTES)](../theta/sqlx/theta_sketch_intersection.sqlx)
+Computes a sketch that represents the scalar intersection of the two given sketches.<br><br>Param sketchA: the first sketch as BYTES.<br>Param sketchB: the second sketch as BYTES.<br>Defaults: seed = 9001.<br>Returns: a Compact, Compressed Theta Sketch, as BYTES.
+### [theta_sketch_union(sketchA BYTES, sketchB BYTES)](../theta/sqlx/theta_sketch_union.sqlx)
+Computes a sketch that represents the scalar union of the two given sketches.<br><br>Param sketchA: the first sketch as BYTES.<br>Param sketchB: the second sketch as BYTES.<br>Defaults: lg\_k = 12, seed = 9001.<br>Returns: a Compact, Compressed Theta Sketch, as BYTES.
+### [theta_sketch_a_not_b(sketchA BYTES, sketchB BYTES)](../theta/sqlx/theta_sketch_a_not_b.sqlx)
+Computes a sketch that represents the scalar set difference: sketchA and not sketchB.<br><br>Param sketchA: the first sketch "A" as bytes.<br>Param sketchB: the second sketch "B" as bytes.<br>Defaults: seed = 9001.<br>Returns: a Compact, Compressed Theta Sketch, as BYTES.
+### [theta_sketch_get_estimate_and_bounds(sketch BYTES, num_std_devs BYTEINT)](../theta/sqlx/theta_sketch_get_estimate_and_bounds.sqlx)
+Gets distinct count estimate and bounds from a given sketch.<br><br>Param sketch: The given sketch to query as BYTES.<br>Param num\_std\_devs: The returned bounds will be based on the statistical confidence interval<br> determined by the given number of standard deviations from the returned estimate.<br> This number may be one of {1,2,3}, where 1 represents 68% confidence,<br> 2 represents 95% confidence and 3 represents 99.7% confidence.<br> For example, if the given num\_std\_devs [...]
+### [theta_sketch_jaccard_similarity(sketchA BYTES, sketchB BYTES)](../theta/sqlx/theta_sketch_jaccard_similarity.sqlx)
+Computes the Jaccard similarity index with upper and lower bounds.<br>The Jaccard similarity index J\(A,B\) = \(A ^ B\)/\(A U B\) is used to measure how similar the two sketches are to each other.<br>If J = 1.0, the sketches are considered equal. If J = 0, the two sketches are disjoint.<br>A Jaccard of .95 means the overlap between the two sets is 95% of the union of the two sets.<br><br>Param sketchA: the first sketch as bytes.<br>Param sketchB: the second sketch as bytes.<br>Defaults: [...]
+### [theta_sketch_get_estimate_and_bounds_seed(sketch BYTES, num_std_devs BYTEINT, seed INT64)](../theta/sqlx/theta_sketch_get_estimate_and_bounds_seed.sqlx)
+Gets distinct count estimate and bounds from a given sketch.<br><br>Param sketch: The given sketch to query as BYTES.<br>Param num\_std\_devs: The returned bounds will be based on the statistical confidence interval<br> determined by the given number of standard deviations from the returned estimate.<br> This number may be one of {1,2,3}, where 1 represents 68% confidence,<br> 2 represents 95% confidence and 3 represents 99.7% confidence.<br> For example, if the given num\_std\_devs [...]
+### [theta_sketch_intersection_seed(sketchA BYTES, sketchB BYTES, seed INT64)](../theta/sqlx/theta_sketch_intersection_seed.sqlx)
+Computes a sketch that represents the scalar intersection of the two given sketches.<br><br>Param sketchA: the first sketch as BYTES.<br>Param sketchB: the second sketch as BYTES.<br>Param seed: This is used to confirm that the given sketches were configured with the correct seed.<br>Returns: a Compact, Compressed Theta Sketch, as BYTES.
+### [theta_sketch_jaccard_similarity_seed(sketchA BYTES, sketchB BYTES, seed INT64)](../theta/sqlx/theta_sketch_jaccard_similarity_seed.sqlx)
+Computes the Jaccard similarity index with upper and lower bounds.<br>The Jaccard similarity index J\(A,B\) = \(A ^ B\)/\(A U B\) is used to measure how similar the two sketches are to each other.<br>If J = 1.0, the sketches are considered equal. If J = 0, the two sketches are disjoint.<br>A Jaccard of .95 means the overlap between the two sets is 95% of the union of the two sets.<br><br>Param sketchA: the first sketch as bytes.<br>Param sketchB: the second sketch as bytes.<br>Param seed [...]
+### [theta_sketch_a_not_b_seed(sketchA BYTES, sketchB BYTES, seed INT64)](../theta/sqlx/theta_sketch_a_not_b_seed.sqlx) +Computes a sketch that represents the scalar set difference: sketchA and not sketchB.<br><br>Param sketchA: the first sketch "A" as bytes.<br>Param sketchB: the second sketch "B" as bytes.<br>Param seed: This is used to confirm that the given sketches were configured with the correct seed.<br>Returns: a Compact, Compressed Theta Sketch, as BYTES. +### [theta_sketch_union_lgk_seed(sketchA BYTES, sketchB BYTES, lg_k BYTEINT, seed INT64)](../theta/sqlx/theta_sketch_union_lgk_seed.sqlx) +Computes a sketch that represents the scalar union of the two given sketches.<br><br>Param sketchA: the first sketch as BYTES.<br>Param sketchB: the second sketch as BYTES.<br>Param lg\_k: the sketch accuracy/size parameter as an integer in the range \[4, 26\].<br>Param seed: This is used to confirm that the given sketches were configured with the correct seed.<br>Returns: a Compact, Compressed Theta Sketch, as BYTES. +## Examples ```sql # using defaults diff --git a/tuple/README.md b/tuple/README.md index 85a2e03..4867449 100644 --- a/tuple/README.md +++ b/tuple/README.md @@ -36,41 +36,70 @@ If you are interested in making contributions to this project please see our [Community](https://datasketches.apache.org/docs/Community/) page for how to contact us. -| Function Name | Function Type | Signature | Description | -|---|---|---|---| -| [tuple_sketch_int64_agg_union](../tuple/sqlx/tuple_sketch_int64_agg_union.sqlx) | AGGREGATE | (sketch BYTES) -> BYTES | Builds a Tuple Sketch that represents the UNION of the given column of Tuple Sketches.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 summary column.<br><br>Param sketch: the given column of Tuple Sketches with an INT64 summary column. This may not be [...] 
-| [tuple_sketch_int64_agg_string](../tuple/sqlx/tuple_sketch_int64_agg_string.sqlx) | AGGREGATE | (key STRING, value INT64) -> BYTES | Builds a Tuple Sketch from a STRING Key column and an INT64 value column.<br>Multiple values for the same key are aggregated using the default mode.<br>Note that cardinality estimation accuracy, plots, error tables, and sampling probability p are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with a STRING Key column and an [...] -| [tuple_sketch_int64_agg_int64](../tuple/sqlx/tuple_sketch_int64_agg_int64.sqlx) | AGGREGATE | (key INT64, value INT64) -> BYTES | Builds a Tuple Sketch from an INT64 Key column and an INT64 value column.<br>Multiple values for the same key are aggregated using the default mode.<br>Note that cardinality estimation accuracy, plots, error tables, and sampling probability p are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 Key column and an INT [...] -| [tuple_sketch_int64_agg_union_lgk_seed_mode](../tuple/sqlx/tuple_sketch_int64_agg_union_lgk_seed_mode.sqlx) | AGGREGATE | (sketch BYTES, params STRUCT<lg_k BYTEINT, seed INT64, mode STRING> NOT AGGREGATE) -> BYTES | Builds a Tuple Sketch that represents the UNION of the given column of Tuple Sketches.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 summary column.<br><b [...] -| [tuple_sketch_int64_agg_int64_lgk_seed_p_mode](../tuple/sqlx/tuple_sketch_int64_agg_int64_lgk_seed_p_mode.sqlx) | AGGREGATE | (key INT64, value INT64, params STRUCT<lg_k BYTEINT, seed INT64, p FLOAT64, mode STRING> NOT AGGREGATE) -> BYTES | Builds a Tuple Sketch from an INT64 Key column and an INT64 value column.<br>Multiple values for the same key are aggregated using one of the selectable operations: { SUM, MIN, MAX, ONE \(constant 1\) }.<br>Note that cardinality estimation accuracy, [...] 
-| [tuple_sketch_int64_agg_string_lgk_seed_p_mode](../tuple/sqlx/tuple_sketch_int64_agg_string_lgk_seed_p_mode.sqlx) | AGGREGATE | (key STRING, value INT64, params STRUCT<lg_k BYTEINT, seed INT64, p FLOAT64, mode STRING> NOT AGGREGATE) -> BYTES | Builds a Tuple Sketch from a STRING Key column and an INT64 value column.<br>Multiple values for the same key are aggregated using one of the selectable operations: SUM, MIN, MAX, ONE.<br>Note that cardinality estimation accuracy, plots, error ta [...] -| [tuple_sketch_int64_to_string](../tuple/sqlx/tuple_sketch_int64_to_string.sqlx) | SCALAR | (sketch BYTES) -> STRING | Returns a human readable STRING that is a short summary of the state of this sketch.<br> Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br> This function only applies to Tuple Sketches with an INT64 summary column.<br><br>Param sketch: the sketch to be summarized. This may not be NULL.<br>Defaults: seed = 9001.<br> [...] -| [tuple_sketch_int64_get_estimate](../tuple/sqlx/tuple_sketch_int64_get_estimate.sqlx) | SCALAR | (sketch BYTES) -> FLOAT64 | Returns the cardinality estimate of the given Tuple Sketch.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 summary column.<br> <br>Param sketch: the given Tuple Sketch. This may not be NULL.<br>Defaults: seed = 9001.<br>Returns: the cardinality [...] -| [tuple_sketch_int64_get_theta](../tuple/sqlx/tuple_sketch_int64_get_theta.sqlx) | SCALAR | (sketch BYTES) -> FLOAT64 | Returns theta \(effective sampling rate\) as a fraction from 0 to 1.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 summary column.<br> <br>Param sketch: the given Tuple Sketch. This may not be NULL.<br>Defaults: seed = 9001.<br>Returns: theta as FLOAT64. 
| -| [tuple_sketch_int64_get_num_retained](../tuple/sqlx/tuple_sketch_int64_get_num_retained.sqlx) | SCALAR | (sketch BYTES) -> INT | Returns the number of retained entries in the given sketch.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 summary column.<br> <br>Param sketch: the given Tuple Sketch. This may not be NULL.<br>Defaults: seed = 9001.<br>Returns: number of re [...] -| [tuple_sketch_int64_get_theta_seed](../tuple/sqlx/tuple_sketch_int64_get_theta_seed.sqlx) | SCALAR | (sketch BYTES, seed INT64) -> FLOAT64 | Returns theta \(effective sampling rate\) as a fraction from 0 to 1.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 summary column.<br> <br>Param sketch: the given Tuple Sketch. This may not be NULL.<br>Param seed: This is used t [...] -| [tuple_sketch_int64_get_num_retained_seed](../tuple/sqlx/tuple_sketch_int64_get_num_retained_seed.sqlx) | SCALAR | (sketch BYTES, seed INT64) -> INT | Returns the number of retained entries in the given sketch.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 summary column.<br> <br>Param sketch: the given Tuple Sketch. This may not be NULL.<br>Param seed: This is used [...] -| [tuple_sketch_int64_to_string_seed](../tuple/sqlx/tuple_sketch_int64_to_string_seed.sqlx) | SCALAR | (sketch BYTES, seed INT64) -> STRING | Returns a human readable STRING that is a short summary of the state of this sketch.<br> Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br> This function only applies to Tuple Sketches with an INT64 summary column.<br><br>Param sketch: the sketch to be summarized. This may not be NULL.<br>Para [...] 
-| [tuple_sketch_int64_a_not_b](../tuple/sqlx/tuple_sketch_int64_a_not_b.sqlx) | SCALAR | (sketchA BYTES, sketchB BYTES) -> BYTES | Computes a sketch that represents the set difference of sketchA and not sketchB.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 summary column. <br> <br>Param sketchA: the first sketch "A" as BYTES. This may not be NULL.<br>Param sketchB: th [...] -| [tuple_sketch_int64_from_theta_sketch](../tuple/sqlx/tuple_sketch_int64_from_theta_sketch.sqlx) | SCALAR | (sketch BYTES, value INT64) -> BYTES | Converts the given Theta Sketch into a Tuple Sketch with a INT64 summary column set to the given INT64 value.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br><br>Param sketch: the given Theta Sketch. This may not be NULL.<br>Param value: the given INT64 value. This may not be NULL.<br [...] -| [tuple_sketch_int64_get_estimate_seed](../tuple/sqlx/tuple_sketch_int64_get_estimate_seed.sqlx) | SCALAR | (sketch BYTES, seed INT64) -> FLOAT64 | Returns the cardinality estimate of the given Tuple Sketch.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 summary column.<br> <br>Param sketch: the given Tuple Sketch. This may not be NULL.<br>Param seed: This is used to c [...] -| [tuple_sketch_int64_intersection](../tuple/sqlx/tuple_sketch_int64_intersection.sqlx) | SCALAR | (sketchA BYTES, sketchB BYTES) -> BYTES | Computes a sketch that represents the scalar intersection of sketchA and sketchB.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 summary column.<br><br>Param sketchA: the first sketch "A" as BYTES.<br>Param sketchB: the second sketc [...] 
-| [tuple_sketch_int64_union](../tuple/sqlx/tuple_sketch_int64_union.sqlx) | SCALAR | (sketchA BYTES, sketchB BYTES) -> BYTES | Computes a Tuple Sketch that represents the UNION of sketchA and sketchB.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 summary column.<br><br>Param sketchA: the first sketch "A" as BYTES. This may not be NULL.<br>Param sketchB: the second sketc [...] -| [tuple_sketch_int64_from_theta_sketch_seed](../tuple/sqlx/tuple_sketch_int64_from_theta_sketch_seed.sqlx) | SCALAR | (sketch BYTES, value INT64, seed INT64) -> BYTES | Converts the given Theta Sketch into a Tuple Sketch with a INT64 summary column set to the given INT64 value.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br><br>Param sketch: the given Theta Sketch. This may not be NULL.<br>Param value: the given INT64 value. Th [...] -| [tuple_sketch_int64_a_not_b_seed](../tuple/sqlx/tuple_sketch_int64_a_not_b_seed.sqlx) | SCALAR | (sketchA BYTES, sketchB BYTES, seed INT64) -> BYTES | Computes a sketch that represents the scalar set difference of sketchA and not sketchB.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 summary column.<br><br>Param sketchA: the first sketch "A" as BYTES. This may not be [...] -| [tuple_sketch_int64_filter_low_high](../tuple/sqlx/tuple_sketch_int64_filter_low_high.sqlx) | SCALAR | (sketch BYTES, low INT64, high INT64) -> BYTES | Returns a Tuple Sketch computed from the given sketch filtered by the given low and high values. <br>This returns a compact tuple sketch that contains the subset of rows of the give sketch where the<br>summary column is greater\-than or equal to the given low and less\-than or equal to the given high.<br>Note that cardinality estimation [...] 
-| [tuple_sketch_int64_get_estimate_and_bounds](../tuple/sqlx/tuple_sketch_int64_get_estimate_and_bounds.sqlx) | SCALAR | (sketch BYTES, num_std_devs BYTEINT) -> STRUCT<estimate FLOAT64, lower_bound FLOAT64, upper_bound FLOAT64> | Returns the cardinality estimate and bounds from the given Tuple Sketch.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 summary column.<br> <b [...] -| [tuple_sketch_int64_filter_low_high_seed](../tuple/sqlx/tuple_sketch_int64_filter_low_high_seed.sqlx) | SCALAR | (sketch BYTES, low INT64, high INT64, seed INT64) -> BYTES | Returns a Tuple Sketch computed from the given sketch filtered by the given low and high values. <br>This returns a compact tuple sketch that contains the subset of rows of the give sketch where the<br>summary column is greater\-than or equal to the given low and less\-than or equal to the given high.<br>Note that [...] -| [tuple_sketch_int64_jaccard_similarity](../tuple/sqlx/tuple_sketch_int64_jaccard_similarity.sqlx) | SCALAR | (sketchA BYTES, sketchB BYTES) -> STRUCT<lower_bound FLOAT64, estimate FLOAT64, upper_bound FLOAT64> | Computes the Jaccard similarity index with upper and lower bounds.<br>The Jaccard similarity index J\(A,B\) = \(A ^ B\)/\(A U B\) is used to measure how similar the two sketches are to each other.<br>If J = 1.0, the sketches are considered equal. If J = 0, the two sketches are [...] -| [tuple_sketch_int64_get_sum_estimate_and_bounds](../tuple/sqlx/tuple_sketch_int64_get_sum_estimate_and_bounds.sqlx) | SCALAR | (sketch BYTES, num_std_devs BYTEINT) -> STRUCT<sum_estimate FLOAT64, sum_lower_bound FLOAT64, sum_upper_bound FLOAT64> | Returns the estimate and bounds for the sum of the INT64 summary column<br>scaled to the original population from the given Tuple Sketch.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.< [...] 
-| [tuple_sketch_int64_intersection_seed_mode](../tuple/sqlx/tuple_sketch_int64_intersection_seed_mode.sqlx) | SCALAR | (sketchA BYTES, sketchB BYTES, seed INT64, mode STRING) -> BYTES | Computes a sketch that represents the scalar intersection of sketchA and sketchB.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 summary column.<br><br>Param sketchA: the first sketch "A" [...] -| [tuple_sketch_int64_get_sum_estimate_and_bounds_seed](../tuple/sqlx/tuple_sketch_int64_get_sum_estimate_and_bounds_seed.sqlx) | SCALAR | (sketch BYTES, num_std_devs BYTEINT, seed INT64) -> STRUCT<sum_estimate FLOAT64, sum_lower_bound FLOAT64, sum_upper_bound FLOAT64> | Returns the estimate and bounds for the sum of the INT64 summary column<br>scaled to the original population from the given Tuple Sketch.<br>Note that cardinality estimation accuracy, plots, and error tables are the same [...] -| [tuple_sketch_int64_union_lgk_seed_mode](../tuple/sqlx/tuple_sketch_int64_union_lgk_seed_mode.sqlx) | SCALAR | (sketchA BYTES, sketchB BYTES, lg_k BYTEINT, seed INT64, mode STRING) -> BYTES | Computes a Tuple Sketch that represents the UNION of sketchA and sketchB.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 summary column.<br><br>Param sketchA: the first sketch "A" [...] -| [tuple_sketch_int64_get_estimate_and_bounds_seed](../tuple/sqlx/tuple_sketch_int64_get_estimate_and_bounds_seed.sqlx) | SCALAR | (sketch BYTES, num_std_devs BYTEINT, seed INT64) -> STRUCT<estimate FLOAT64, lower_bound FLOAT64, upper_bound FLOAT64> | Returns the cardinality estimate and bounds from the given Tuple Sketch.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 s [...] 
-| [tuple_sketch_int64_jaccard_similarity_seed](../tuple/sqlx/tuple_sketch_int64_jaccard_similarity_seed.sqlx) | SCALAR | (sketchA BYTES, sketchB BYTES, seed INT64) -> STRUCT<lower_bound FLOAT64, estimate FLOAT64, upper_bound FLOAT64> | Computes the Jaccard similarity index with upper and lower bounds.<br>The Jaccard similarity index J\(A,B\) = \(A ^ B\)/\(A U B\) is used to measure how similar the two sketches are to each other.<br>If J = 1.0, the sketches are considered equal. If J = 0, [...] - -**Examples:** - +## Aggregate Functions +### [tuple_sketch_int64_agg_union(sketch BYTES)](../tuple/sqlx/tuple_sketch_int64_agg_union.sqlx) +Builds a Tuple Sketch that represents the UNION of the given column of Tuple Sketches.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 summary column.<br><br>Param sketch: the given column of Tuple Sketches with an INT64 summary column. This may not be NULL.<br>Defaults: lg\_k = 12, seed = 9001, mode = SUM.<br>Returns: a Compact Tuple Sketch as BYTES. +### [tuple_sketch_int64_agg_string(key STRING, value INT64)](../tuple/sqlx/tuple_sketch_int64_agg_string.sqlx) +Builds a Tuple Sketch from a STRING Key column and an INT64 value column.<br>Multiple values for the same key are aggregated using the default mode.<br>Note that cardinality estimation accuracy, plots, error tables, and sampling probability p are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with a STRING Key column and an INT64 summary column.<br> <br>Param key: the STRING column of identifiers. This may not be NULL.<br>Param value: the INT64 value column [...] 
+### [tuple_sketch_int64_agg_int64(key INT64, value INT64)](../tuple/sqlx/tuple_sketch_int64_agg_int64.sqlx) +Builds a Tuple Sketch from an INT64 Key column and an INT64 value column.<br>Multiple values for the same key are aggregated using the default mode.<br>Note that cardinality estimation accuracy, plots, error tables, and sampling probability p are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 Key column and an INT64 summary column.<br><br>Param key: the INT64 key column of identifiers. This may not be NULL.<br>Param value: the INT64 value colu [...] +### [tuple_sketch_int64_agg_union_lgk_seed_mode(sketch BYTES, params STRUCT<lg_k BYTEINT, seed INT64, mode STRING> NOT AGGREGATE)](../tuple/sqlx/tuple_sketch_int64_agg_union_lgk_seed_mode.sqlx) +Builds a Tuple Sketch that represents the UNION of the given column of Tuple Sketches.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 summary column.<br><br>Param sketch: the given column of Tuple Sketches with an INT64 summary column. This may not be NULL.<br>Param lg\_k: the sketch accuracy/size parameter as an integer in the range \[4, 26\]. A NULL specifies the defau [...] 
+### [tuple_sketch_int64_agg_int64_lgk_seed_p_mode(key INT64, value INT64, params STRUCT<lg_k BYTEINT, seed INT64, p FLOAT64, mode STRING> NOT AGGREGATE)](../tuple/sqlx/tuple_sketch_int64_agg_int64_lgk_seed_p_mode.sqlx) +Builds a Tuple Sketch from an INT64 Key column and an INT64 value column.<br>Multiple values for the same key are aggregated using one of the selectable operations: { SUM, MIN, MAX, ONE \(constant 1\) }.<br>Note that cardinality estimation accuracy, plots, error tables, and sampling probability p are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 Key column and an INT64 summary column.<br><br>Param key: the INT64 key column of identifiers. Thi [...] +### [tuple_sketch_int64_agg_string_lgk_seed_p_mode(key STRING, value INT64, params STRUCT<lg_k BYTEINT, seed INT64, p FLOAT64, mode STRING> NOT AGGREGATE)](../tuple/sqlx/tuple_sketch_int64_agg_string_lgk_seed_p_mode.sqlx) +Builds a Tuple Sketch from a STRING Key column and an INT64 value column.<br>Multiple values for the same key are aggregated using one of the selectable operations: SUM, MIN, MAX, ONE.<br>Note that cardinality estimation accuracy, plots, error tables, and sampling probability p are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with a STRING Key column and an INT64 summary column.<br><br>Param key: the STRING key column of identifiers. This may not be NULL. [...] + +## Scalar Functions +### [tuple_sketch_int64_to_string(sketch BYTES)](../tuple/sqlx/tuple_sketch_int64_to_string.sqlx) +Returns a human readable STRING that is a short summary of the state of this sketch.<br> Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br> This function only applies to Tuple Sketches with an INT64 summary column.<br><br>Param sketch: the sketch to be summarized. 
This may not be NULL.<br>Defaults: seed = 9001.<br>Returns: A human readable STRING that is a short summary of the state of this sketch. +### [tuple_sketch_int64_get_estimate(sketch BYTES)](../tuple/sqlx/tuple_sketch_int64_get_estimate.sqlx) +Returns the cardinality estimate of the given Tuple Sketch.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 summary column.<br> <br>Param sketch: the given Tuple Sketch. This may not be NULL.<br>Defaults: seed = 9001.<br>Returns: the cardinality estimate of the given Tuple Sketch +### [tuple_sketch_int64_get_theta(sketch BYTES)](../tuple/sqlx/tuple_sketch_int64_get_theta.sqlx) +Returns theta \(effective sampling rate\) as a fraction from 0 to 1.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 summary column.<br> <br>Param sketch: the given Tuple Sketch. This may not be NULL.<br>Defaults: seed = 9001.<br>Returns: theta as FLOAT64. +### [tuple_sketch_int64_get_num_retained(sketch BYTES)](../tuple/sqlx/tuple_sketch_int64_get_num_retained.sqlx) +Returns the number of retained entries in the given sketch.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 summary column.<br> <br>Param sketch: the given Tuple Sketch. This may not be NULL.<br>Defaults: seed = 9001.<br>Returns: number of retained entries as INT. 
+### [tuple_sketch_int64_get_theta_seed(sketch BYTES, seed INT64)](../tuple/sqlx/tuple_sketch_int64_get_theta_seed.sqlx) +Returns theta \(effective sampling rate\) as a fraction from 0 to 1.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 summary column.<br> <br>Param sketch: the given Tuple Sketch. This may not be NULL.<br>Param seed: This is used to confirm that the given sketches were configured with the correct seed. A NULL specifies the default seed = 9001.<br>Returns: theta as FLOAT64. +### [tuple_sketch_int64_get_num_retained_seed(sketch BYTES, seed INT64)](../tuple/sqlx/tuple_sketch_int64_get_num_retained_seed.sqlx) +Returns the number of retained entries in the given sketch.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 summary column.<br> <br>Param sketch: the given Tuple Sketch. This may not be NULL.<br>Param seed: This is used to confirm that the given sketches were configured with the correct seed. A NULL specifies the default seed = 9001.<br>Returns: number of retained entrie [...] +### [tuple_sketch_int64_to_string_seed(sketch BYTES, seed INT64)](../tuple/sqlx/tuple_sketch_int64_to_string_seed.sqlx) +Returns a human readable STRING that is a short summary of the state of this sketch.<br> Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br> This function only applies to Tuple Sketches with an INT64 summary column.<br><br>Param sketch: the sketch to be summarized. This may not be NULL.<br>Param seed: This is used to confirm that the given sketches were configured with the correct seed. A NULL specifies the default seed = 9001.<br>Re [...] 
+### [tuple_sketch_int64_get_estimate_and_bounds(sketch BYTES, num_std_devs BYTEINT)](../tuple/sqlx/tuple_sketch_int64_get_estimate_and_bounds.sqlx) +Returns the cardinality estimate and bounds from the given Tuple Sketch.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 summary column.<br> <br>Param sketch: the given Tuple Sketch. This may not be NULL.<br>Param num\_std\_devs: The returned bounds will be based on the statistical confidence interval<br> determined by the given number of standard deviations from the re [...] +### [tuple_sketch_int64_jaccard_similarity(sketchA BYTES, sketchB BYTES)](../tuple/sqlx/tuple_sketch_int64_jaccard_similarity.sqlx) +Computes the Jaccard similarity index with upper and lower bounds.<br>The Jaccard similarity index J\(A,B\) = \(A ^ B\)/\(A U B\) is used to measure how similar the two sketches are to each other.<br>If J = 1.0, the sketches are considered equal. If J = 0, the two sketches are disjoint.<br>A Jaccard of .95 means the overlap between the two sets is 95% of the union of the two sets.<br>This function only applies to Tuple Sketches with an INT64 summary column.<br><br>Param sketchA: the firs [...] +### [tuple_sketch_int64_a_not_b(sketchA BYTES, sketchB BYTES)](../tuple/sqlx/tuple_sketch_int64_a_not_b.sqlx) +Computes a sketch that represents the set difference of sketchA and not sketchB.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 summary column. <br> <br>Param sketchA: the first sketch "A" as BYTES. This may not be NULL.<br>Param sketchB: the second sketch "B" as BYTES. This may not be NULL.<br>Defaults: seed = 9001.<br>Returns: a Compact Tuple Sketch as BYTES. 
+### [tuple_sketch_int64_from_theta_sketch(sketch BYTES, value INT64)](../tuple/sqlx/tuple_sketch_int64_from_theta_sketch.sqlx) +Converts the given Theta Sketch into a Tuple Sketch with a INT64 summary column set to the given INT64 value.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br><br>Param sketch: the given Theta Sketch. This may not be NULL.<br>Param value: the given INT64 value. This may not be NULL.<br>Defaults: seed = 9001.<br>Returns: a Tuple Sketch with an INT64 summary column as BYTES. +### [tuple_sketch_int64_get_sum_estimate_and_bounds(sketch BYTES, num_std_devs BYTEINT)](../tuple/sqlx/tuple_sketch_int64_get_sum_estimate_and_bounds.sqlx) +Returns the estimate and bounds for the sum of the INT64 summary column<br>scaled to the original population from the given Tuple Sketch.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 summary column.<br> <br>Param sketch: the given Tuple Sketch. This may not be NULL.<br>Param num\_std\_devs: The returned bounds will be based on the statistical confidence interval<br> [...] +### [tuple_sketch_int64_get_estimate_seed(sketch BYTES, seed INT64)](../tuple/sqlx/tuple_sketch_int64_get_estimate_seed.sqlx) +Returns the cardinality estimate of the given Tuple Sketch.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 summary column.<br> <br>Param sketch: the given Tuple Sketch. This may not be NULL.<br>Param seed: This is used to confirm that the given sketches were configured with the correct seed. A NULL specifies the default seed = 9001.<br>Returns: the cardinality estimate [...] 
+### [tuple_sketch_int64_intersection(sketchA BYTES, sketchB BYTES)](../tuple/sqlx/tuple_sketch_int64_intersection.sqlx) +Computes a sketch that represents the scalar intersection of sketchA and sketchB.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 summary column.<br><br>Param sketchA: the first sketch "A" as BYTES.<br>Param sketchB: the second sketch "B" as BYTES.<br>Defaults: seed = 9001.<br>Returns: a Compact Tuple Sketch as BYTES. +### [tuple_sketch_int64_union(sketchA BYTES, sketchB BYTES)](../tuple/sqlx/tuple_sketch_int64_union.sqlx) +Computes a Tuple Sketch that represents the UNION of sketchA and sketchB.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 summary column.<br><br>Param sketchA: the first sketch "A" as BYTES. This may not be NULL.<br>Param sketchB: the second sketch "B" as BYTES. This may not be NULL.<br>Defaults: seed = 9001.<br>Returns: a Compact Tuple Sketch as BYTES. +### [tuple_sketch_int64_from_theta_sketch_seed(sketch BYTES, value INT64, seed INT64)](../tuple/sqlx/tuple_sketch_int64_from_theta_sketch_seed.sqlx) +Converts the given Theta Sketch into a Tuple Sketch with a INT64 summary column set to the given INT64 value.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br><br>Param sketch: the given Theta Sketch. This may not be NULL.<br>Param value: the given INT64 value. This may not be NULL.<br>Param seed: This is used to confirm that the given sketches were configured with the correct seed. A NULL specifies the default seed = 9001.<br>Ret [...] 
+### [tuple_sketch_int64_get_sum_estimate_and_bounds_seed(sketch BYTES, num_std_devs BYTEINT, seed INT64)](../tuple/sqlx/tuple_sketch_int64_get_sum_estimate_and_bounds_seed.sqlx) +Returns the estimate and bounds for the sum of the INT64 summary column<br>scaled to the original population from the given Tuple Sketch.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 summary column.<br><br>Param sketch: the given Tuple Sketch. This may not be NULL.<br>Param num\_std\_devs: The returned bounds will be based on the statistical confidence interval<br> de [...] +### [tuple_sketch_int64_get_estimate_and_bounds_seed(sketch BYTES, num_std_devs BYTEINT, seed INT64)](../tuple/sqlx/tuple_sketch_int64_get_estimate_and_bounds_seed.sqlx) +Returns the cardinality estimate and bounds from the given Tuple Sketch.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 summary column.<br> <br>Param sketch: the given Tuple Sketch. This may not be NULL.<br>Param num\_std\_devs: The returned bounds will be based on the statistical confidence interval<br> determined by the given number of standard deviations from the re [...] +### [tuple_sketch_int64_a_not_b_seed(sketchA BYTES, sketchB BYTES, seed INT64)](../tuple/sqlx/tuple_sketch_int64_a_not_b_seed.sqlx) +Computes a sketch that represents the scalar set difference of sketchA and not sketchB.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 summary column.<br><br>Param sketchA: the first sketch "A" as BYTES. This may not be NULL.<br>Param sketchB: the second sketch "B" as BYTES. This may not be NULL.<br>Param seed: This is used to confirm that the given sketches were configu [...] 
+### [tuple_sketch_int64_jaccard_similarity_seed(sketchA BYTES, sketchB BYTES, seed INT64)](../tuple/sqlx/tuple_sketch_int64_jaccard_similarity_seed.sqlx) +Computes the Jaccard similarity index with upper and lower bounds.<br>The Jaccard similarity index J\(A,B\) = \(A ^ B\)/\(A U B\) is used to measure how similar the two sketches are to each other.<br>If J = 1.0, the sketches are considered equal. If J = 0, the two sketches are disjoint.<br>A Jaccard of .95 means the overlap between the two sets is 95% of the union of the two sets.<br>This function only applies to Tuple Sketches with an INT64 summary column.<br><br>Param sketchA: the firs [...] +### [tuple_sketch_int64_filter_low_high(sketch BYTES, low INT64, high INT64)](../tuple/sqlx/tuple_sketch_int64_filter_low_high.sqlx) +Returns a Tuple Sketch computed from the given sketch filtered by the given low and high values. <br>This returns a compact tuple sketch that contains the subset of rows of the give sketch where the<br>summary column is greater\-than or equal to the given low and less\-than or equal to the given high.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 summary column.<br><br> [...] +### [tuple_sketch_int64_filter_low_high_seed(sketch BYTES, low INT64, high INT64, seed INT64)](../tuple/sqlx/tuple_sketch_int64_filter_low_high_seed.sqlx) +Returns a Tuple Sketch computed from the given sketch filtered by the given low and high values. <br>This returns a compact tuple sketch that contains the subset of rows of the give sketch where the<br>summary column is greater\-than or equal to the given low and less\-than or equal to the given high.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 summary column.<br><br> [...] 
+### [tuple_sketch_int64_intersection_seed_mode(sketchA BYTES, sketchB BYTES, seed INT64, mode STRING)](../tuple/sqlx/tuple_sketch_int64_intersection_seed_mode.sqlx) +Computes a sketch that represents the scalar intersection of sketchA and sketchB.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 summary column.<br><br>Param sketchA: the first sketch "A" as BYTES.<br>Param sketchB: the second sketch "B" as BYTES.<br>Param seed: This is used to confirm that the given sketches were configured with the correct seed. A NULL specifies the de [...] +### [tuple_sketch_int64_union_lgk_seed_mode(sketchA BYTES, sketchB BYTES, lg_k BYTEINT, seed INT64, mode STRING)](../tuple/sqlx/tuple_sketch_int64_union_lgk_seed_mode.sqlx) +Computes a Tuple Sketch that represents the UNION of sketchA and sketchB.<br>Note that cardinality estimation accuracy, plots, and error tables are the same as the Theta Sketch.<br>This function only applies to Tuple Sketches with an INT64 summary column.<br><br>Param sketchA: the first sketch "A" as BYTES. This may not be NULL.<br>Param sketchB: the second sketch "B" as BYTES. This may not be NULL.<br>Param seed: This is used to confirm that the given sketches were configured with the c [...] +## Examples ```sql # using defaults
