This is an automated email from the ASF dual-hosted git repository.
alsay pushed a commit to branch req_sketch_float
in repository https://gitbox.apache.org/repos/asf/datasketches-bigquery.git
The following commit(s) were added to refs/heads/req_sketch_float by this push:
new 67f882b req_sketch_float
67f882b is described below
commit 67f882b67fc030bf570feaa137e3fca689dfae7e
Author: AlexanderSaydakov <[email protected]>
AuthorDate: Thu Nov 21 15:48:38 2024 -0800
req_sketch_float
---
Makefile | 2 +-
readme_generator.py | 2 +-
req/README.md | 113 +++++++++++++++++++++++++++++++++++++++++++++++++
req/README_template.md | 38 +++++++++++++++++
4 files changed, 153 insertions(+), 2 deletions(-)
diff --git a/Makefile b/Makefile
index 8c7eafe..dcf5791 100644
--- a/Makefile
+++ b/Makefile
@@ -15,7 +15,7 @@
# specific language governing permissions and limitations
# under the License.
-MODULES := theta tuple cpc hll kll fi tdigest
+MODULES := theta tuple cpc hll kll fi tdigest req
$(MODULES):
$(MAKE) -C $@
diff --git a/readme_generator.py b/readme_generator.py
index f294c65..bf3bb80 100644
--- a/readme_generator.py
+++ b/readme_generator.py
@@ -162,7 +162,7 @@ def generate_readme(template_path: str, function_index: dict, examples_path: str
return output_content
if __name__ == "__main__":
- sketch_types = ["cpc", "fi", "hll", "kll", "tdigest", "theta", "tuple"]
+ sketch_types = ["cpc", "fi", "hll", "kll", "tdigest", "theta", "tuple", "req"]
template_name = "README_template.md"
readme_name = "README.md"
for sketch_type in sketch_types:
diff --git a/req/README.md b/req/README.md
new file mode 100644
index 0000000..63a0b9c
--- /dev/null
+++ b/req/README.md
@@ -0,0 +1,113 @@
+<!--
+ Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements. See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership. The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations
+ under the License.
+-->
+
+# Apache DataSketches REQ Sketches for Google BigQuery
+
+The Relative Error Quantiles (REQ) sketch provides extremely high accuracy
+at a chosen end of the rank domain: high rank accuracy (HRA) or low
+rank accuracy (LRA).
+REQ sketches are quantile sketches that provide approximate quantiles
+and ranks for a dataset.
+
+Please visit
+[REQ Sketches](https://datasketches.apache.org/docs/REQ/ReqSketch.html)
+for more information about this sketch family.
+
+Please visit the main
+[Apache DataSketches website](https://datasketches.apache.org)
+for more information about the DataSketches library.
+
+If you are interested in making contributions to this project, please see our
+[Community](https://datasketches.apache.org/docs/Community/)
+page for how to contact us.
+
+| Function Name | Function Type | Signature | Description |
+|---|---|---|---|
+| [req_sketch_float_build](../definitions/req/req_sketch_float_build.sqlx) | AGGREGATE | (value FLOAT64) -> BYTES | Creates a sketch that represents the distribution of the given column.\<br\>\<br\>Param value: the column of FLOAT64 values.\<br\>Defaults: k = 12, hra = true.\<br\>Returns: a serialized REQ Sketch as BYTES. |
+| [req_sketch_float_merge](../definitions/req/req_sketch_float_merge.sqlx) | AGGREGATE | (sketch BYTES) -> BYTES | Merges sketches from the given column.\<br\>\<br\>Param sketch: the column of sketches.\<br\>Defaults: k = 12, hra = true.\<br\>Returns: a serialized REQ sketch as BYTES. |
+| [req_sketch_float_build_k_hra](../definitions/req/req_sketch_float_build_k_hra.sqlx) | AGGREGATE | (value FLOAT64, params STRUCT<k INT, hra BOOL> NOT AGGREGATE) -> BYTES | Creates a sketch that represents the distribution of the given column.\<br\>\<br\>Param value: the column of FLOAT64 values.\<br\>Param k: the sketch accuracy/size parameter as an even INT in the range \[4, 65534\].\<br\>Param hra: if true, the high ranks are prioritized for better accuracy. Otherwise the low ranks a [...]
+| [req_sketch_float_merge_k_hra](../definitions/req/req_sketch_float_merge_k_hra.sqlx) | AGGREGATE | (sketch BYTES, params STRUCT<k INT, hra BOOL> NOT AGGREGATE) -> BYTES | Merges sketches from the given column.\<br\>\<br\>Param sketch: the column of sketches.\<br\>Param k: the sketch accuracy/size parameter as an even INT in the range \[4, 65534\].\<br\>Param hra: if true, the high ranks are prioritized for better accuracy. Otherwise the low ranks are prioritized for better accuracy.\<br\ [...]
+| [req_sketch_float_get_n](../definitions/req/req_sketch_float_get_n.sqlx) | SCALAR | (sketch BYTES) -> INT64 | Returns the length of the input stream.\<br\>\<br\>Param sketch: the given sketch as BYTES.\<br\>Returns: stream length as INT64 |
+| [req_sketch_float_get_num_retained](../definitions/req/req_sketch_float_get_num_retained.sqlx) | SCALAR | (sketch BYTES) -> INT64 | Returns the number of retained items \(samples\) in the sketch.\<br\>\<br\>Param sketch: the given sketch as BYTES.\<br\>Returns: number of retained items as INT64 |
+| [req_sketch_float_get_min_value](../definitions/req/req_sketch_float_get_min_value.sqlx) | SCALAR | (sketch BYTES) -> FLOAT64 | Returns the minimum value of the input stream.\<br\>\<br\>Param sketch: the given sketch as BYTES.\<br\>Returns: min value as FLOAT64 |
+| [req_sketch_float_to_string](../definitions/req/req_sketch_float_to_string.sqlx) | SCALAR | (sketch BYTES) -> STRING | Returns a summary string that represents the state of the given sketch.\<br\>\<br\>Param sketch: the given sketch as BYTES.\<br\>Returns: a string that represents the state of the given sketch. |
+| [req_sketch_float_get_max_value](../definitions/req/req_sketch_float_get_max_value.sqlx) | SCALAR | (sketch BYTES) -> FLOAT64 | Returns the maximum value of the input stream.\<br\>\<br\>Param sketch: the given sketch as BYTES.\<br\>Returns: max value as FLOAT64 |
+| [req_sketch_float_get_cdf](../definitions/req/req_sketch_float_get_cdf.sqlx) | SCALAR | (sketch BYTES, split_points ARRAY<FLOAT64>, inclusive BOOL) -> ARRAY<FLOAT64> | Returns an approximation to the Cumulative Distribution Function \(CDF\) \<br\>of the input stream as an array of cumulative probabilities defined by the given split\_points.\<br\>\<br\>Param sketch: the given sketch as BYTES.\<br\>\<br\>Param split\_points: an array of M unique, monotonically increasing values\<br\> \( [...]
+| [req_sketch_float_get_rank_lower_bound](../definitions/req/req_sketch_float_get_rank_lower_bound.sqlx) | SCALAR | (sketch BYTES, rank FLOAT64, num_std_dev BYTEINT) -> FLOAT64 | Returns an approximate lower bound of the given normalized rank.\<br\>Param sketch: the given sketch as BYTES.\<br\>Param rank: the given rank, a value between 0 and 1.0.\<br\>Param num\_std\_dev: The returned bounds will be based on the statistical confidence interval determined by the given number of standard [...]
+| [req_sketch_float_get_pmf](../definitions/req/req_sketch_float_get_pmf.sqlx) | SCALAR | (sketch BYTES, split_points ARRAY<FLOAT64>, inclusive BOOL) -> ARRAY<FLOAT64> | Returns an approximation to the Probability Mass Function \(PMF\)\<br\>of the input stream as an array of probability masses defined by the given split\_points.\<br\>\<br\>Param sketch: the given sketch as BYTES.\<br\>\<br\>Param split\_points: an array of M unique, monotonically increasing values \<br\> \(of the same t [...]
+| [req_sketch_float_get_quantile](../definitions/req/req_sketch_float_get_quantile.sqlx) | SCALAR | (sketch BYTES, rank FLOAT64, inclusive BOOL) -> FLOAT64 | Returns a value from the sketch that is the best approximation to a value from the original stream with the given rank.\<br\>\<br\>Param sketch: the given sketch in serialized form.\<br\>Param rank: rank of a value in the hypothetical sorted stream.\<br\>Param inclusive: if true, the given rank is considered inclusive \(includes wei [...]
+| [req_sketch_float_get_rank_upper_bound](../definitions/req/req_sketch_float_get_rank_upper_bound.sqlx) | SCALAR | (sketch BYTES, rank FLOAT64, num_std_dev BYTEINT) -> FLOAT64 | Returns an approximate upper bound of the given normalized rank.\<br\>Param sketch: the given sketch as BYTES.\<br\>Param rank: the given rank, a value between 0 and 1.0.\<br\>Param num\_std\_dev: The returned bounds will be based on the statistical confidence interval determined by the given number of standard [...]
+| [req_sketch_float_get_rank](../definitions/req/req_sketch_float_get_rank.sqlx) | SCALAR | (sketch BYTES, value FLOAT64, inclusive BOOL) -> FLOAT64 | Returns an approximation to the normalized rank, on the interval \[0.0, 1.0\], of the given value.\<br\>\<br\>Param sketch: the given sketch in serialized form.\<br\>Param value: value to be ranked.\<br\>Param inclusive: if true the weight of the given value is included into the rank.\<br\>Returns: an approximate rank of the given value. |
+
+**Examples:**
+
+```sql
+
+# using defaults
+
+create or replace table `$BQ_DATASET`.req_sketch(sketch bytes);
+
+insert into `$BQ_DATASET`.req_sketch
+(select `$BQ_DATASET`.req_sketch_float_build(value) from unnest([1,2,3,4,5,6,7,8,9,10]) as value);
+
+insert into `$BQ_DATASET`.req_sketch
+(select `$BQ_DATASET`.req_sketch_float_build(value) from unnest([11,12,13,14,15,16,17,18,19,20]) as value);
+
+select `$BQ_DATASET`.req_sketch_float_to_string(`$BQ_DATASET`.req_sketch_float_merge(sketch)) from `$BQ_DATASET`.req_sketch;
+
+# expected 0.5
+select `$BQ_DATASET`.req_sketch_float_get_rank(`$BQ_DATASET`.req_sketch_float_merge(sketch), 10, true) from `$BQ_DATASET`.req_sketch;
+
+# expected 10
+select `$BQ_DATASET`.req_sketch_float_get_quantile(`$BQ_DATASET`.req_sketch_float_merge(sketch), 0.5, true) from `$BQ_DATASET`.req_sketch;
+
+# expected 0.5, 0.5
+select `$BQ_DATASET`.req_sketch_float_get_pmf(`$BQ_DATASET`.req_sketch_float_merge(sketch), [10.0], true) from `$BQ_DATASET`.req_sketch;
+
+# expected 0.5, 1
+select `$BQ_DATASET`.req_sketch_float_get_cdf(`$BQ_DATASET`.req_sketch_float_merge(sketch), [10.0], true) from `$BQ_DATASET`.req_sketch;
+
+# expected 1
+select `$BQ_DATASET`.req_sketch_float_get_min_value(`$BQ_DATASET`.req_sketch_float_merge(sketch)) from `$BQ_DATASET`.req_sketch;
+
+# expected 20
+select `$BQ_DATASET`.req_sketch_float_get_max_value(`$BQ_DATASET`.req_sketch_float_merge(sketch)) from `$BQ_DATASET`.req_sketch;
+
+# expected 20
+select `$BQ_DATASET`.req_sketch_float_get_n(`$BQ_DATASET`.req_sketch_float_merge(sketch)) from `$BQ_DATASET`.req_sketch;
+
+# expected 20
+select `$BQ_DATASET`.req_sketch_float_get_num_retained(`$BQ_DATASET`.req_sketch_float_merge(sketch)) from `$BQ_DATASET`.req_sketch;
+
+drop table `$BQ_DATASET`.req_sketch;
+
+# using full signatures
+
+create or replace table `$BQ_DATASET`.req_sketch(sketch bytes);
+
+insert into `$BQ_DATASET`.req_sketch
+(select `$BQ_DATASET`.req_sketch_float_build_k_hra(value, struct<int, bool>(10, false)) from unnest([1,2,3,4,5,6,7,8,9,10]) as value);
+
+insert into `$BQ_DATASET`.req_sketch
+(select `$BQ_DATASET`.req_sketch_float_build_k_hra(value, struct<int, bool>(10, false)) from unnest([11,12,13,14,15,16,17,18,19,20]) as value);
+
+select `$BQ_DATASET`.req_sketch_float_to_string(`$BQ_DATASET`.req_sketch_float_merge_k_hra(sketch, struct<int, bool>(10, false))) from `$BQ_DATASET`.req_sketch;
+
+drop table `$BQ_DATASET`.req_sketch;
+```
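
Note (not part of the commit): the expected values in the SQL examples above can be sanity-checked without BigQuery. The sketch below is a minimal, exact (non-sketch) computation in plain Python over the same 20 input values; the helper names `exact_rank`, `exact_quantile`, `exact_cdf`, and `exact_pmf` are hypothetical and are not part of the DataSketches API. It assumes inclusive semantics (`inclusive = true`), and it should match the sketch results only because the example stream is small enough that the sketch retains every item (the README expects `get_num_retained` to return 20).

```python
from bisect import bisect_right

# The 20 values inserted by the two example statements (1..10 and 11..20).
values = [float(v) for v in range(1, 21)]
n = len(values)


def exact_rank(value):
    """Inclusive normalized rank: fraction of items <= value."""
    return bisect_right(values, value) / n


def exact_quantile(rank):
    """Inclusive quantile: smallest item whose inclusive rank is >= rank."""
    return next(v for v in values if exact_rank(v) >= rank)


def exact_cdf(split_points):
    """Cumulative probabilities at each split point, plus a final 1.0."""
    return [exact_rank(s) for s in split_points] + [1.0]


def exact_pmf(split_points):
    """Probability mass in each interval delimited by the split points."""
    cdf = exact_cdf(split_points)
    return [hi - lo for lo, hi in zip([0.0] + cdf[:-1], cdf)]


print(exact_rank(10.0))     # 0.5
print(exact_quantile(0.5))  # 10.0
print(exact_pmf([10.0]))    # [0.5, 0.5]
print(exact_cdf([10.0]))    # [0.5, 1.0]
```

On larger streams the sketch discards items and its answers become approximations, so exact agreement like this should not be expected in general.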
diff --git a/req/README_template.md b/req/README_template.md
new file mode 100644
index 0000000..06d7108
--- /dev/null
+++ b/req/README_template.md
@@ -0,0 +1,38 @@
+<!--
+ Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements. See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership. The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations
+ under the License.
+-->
+
+# Apache DataSketches REQ Sketches for Google BigQuery
+
+The Relative Error Quantiles (REQ) sketch provides extremely high accuracy
+at a chosen end of the rank domain: high rank accuracy (HRA) or low
+rank accuracy (LRA).
+REQ sketches are quantile sketches that provide approximate quantiles
+and ranks for a dataset.
+
+Please visit
+[REQ Sketches](https://datasketches.apache.org/docs/REQ/ReqSketch.html)
+for more information about this sketch family.
+
+Please visit the main
+[Apache DataSketches website](https://datasketches.apache.org)
+for more information about the DataSketches library.
+
+If you are interested in making contributions to this project, please see our
+[Community](https://datasketches.apache.org/docs/Community/)
+page for how to contact us.