This is an automated email from the ASF dual-hosted git repository.
git-site-role pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/datasketches-website.git
The following commit(s) were added to refs/heads/asf-site by this push:
new 02f2675d Automatic Site Publish by Buildbot
02f2675d is described below
commit 02f2675d1a46f6aed8ab20fc17b0dbbae034e472
Author: buildbot <[email protected]>
AuthorDate: Fri Oct 25 18:55:54 2024 +0000
Automatic Site Publish by Buildbot
---
output/docs/Architecture/KeyFeatures.html | 6 +++---
output/docs/Architecture/MajorSketchFamilies.html | 8 ++++----
output/docs/HLL/Hll_vs_CS_Hllpp.html | 2 +-
3 files changed, 8 insertions(+), 8 deletions(-)
diff --git a/output/docs/Architecture/KeyFeatures.html
b/output/docs/Architecture/KeyFeatures.html
index 79a675b4..75d861d8 100644
--- a/output/docs/Architecture/KeyFeatures.html
+++ b/output/docs/Architecture/KeyFeatures.html
@@ -414,9 +414,9 @@ and Difference) on sets of unique identifiers</li>
<h4 id="four-families-of-count-unique-algorithms">Four families of Count
Unique algorithms:</h4>
<ul>
- <li><a href="/docs/HLL/HLL.html">The HLL Sketch</a>. The famous HyperLogLog
algorithm when stored sketch size is of utmost concern.</li>
- <li><a href="/docs/CPC/CPC.html">The CPC Sketch</a>. The Compressed
Probabilistic Counting algorithm when maximizing accuracy per stored sketch
size is of utmost concern.</li>
- <li><a href="/docs/Theta/ThetaSketchFramework.html">The Theta Sketch
Framework</a>. Theta sketches enable real-time set-expression computations and
can operate on or off the java heap.</li>
+ <li><a href="/docs/HLL/HllSketches.html">The HLL Sketch</a>. The famous
HyperLogLog algorithm when stored sketch size is of utmost concern.</li>
+ <li><a href="/docs/CPC/CpcSketches.html">The CPC Sketch</a>. The Compressed
Probabilistic Counting algorithm when maximizing accuracy per stored sketch
size is of utmost concern.</li>
+ <li><a href="/docs/Theta/ThetaSketches.html">The Theta Sketch Framework</a>.
Theta sketches enable real-time set-expression computations and can operate on
or off the java heap.</li>
<li><a href="/docs/Tuple/TupleOverview.html">The Tuple Sketch</a>. Tuple
sketches are associative sketches that are useful for performing approximate
join operations and extracting other kinds of statistical behavior associated
with unique identifiers.</li>
</ul>
diff --git a/output/docs/Architecture/MajorSketchFamilies.html
b/output/docs/Architecture/MajorSketchFamilies.html
index 3a6c3771..70ef83c5 100644
--- a/output/docs/Architecture/MajorSketchFamilies.html
+++ b/output/docs/Architecture/MajorSketchFamilies.html
@@ -327,10 +327,10 @@
<h2 id="cardinality-sketches">Cardinality Sketches</h2>
-<h3
id="cpc-sketch-estimating-stream-cardinalities-more-efficiently-than-the-famous-hll-sketch"><a
href="/docs/CPC/CPC.html">CPC Sketch</a>: Estimating Stream Cardinalities more
efficiently than the famous HLL sketch!</h3>
+<h3
id="cpc-sketch-estimating-stream-cardinalities-more-efficiently-than-the-famous-hll-sketch"><a
href="/docs/CPC/CpcSketches.html">CPC Sketch</a>: Estimating Stream
Cardinalities more efficiently than the famous HLL sketch!</h3>
<p>This sketch was developed by the late Kevin J. Lang, our chief scientist at
the time. It is an amazing <em>tour de force</em> of scientific design and
engineering and has substantially better accuracy / per stored size than the
famous HLL sketch. The theory and demonstration of its performance is detailed
in Lang’s paper <a href="https://arxiv.org/abs/1708.06839">Back to the Future:
an Even More Nearly Optimal Cardinality Estimation Algorithm</a>.</p>
-<h3 id="theta-sketches-estimating-stream-expression-cardinalities"><a
href="/docs/Theta/ThetaSketchFramework.html">Theta Sketches</a>: Estimating
Stream Expression Cardinalities</h3>
+<h3 id="theta-sketches-estimating-stream-expression-cardinalities"><a
href="/docs/Theta/ThetaSketches.html">Theta Sketches</a>: Estimating Stream
Expression Cardinalities</h3>
<p>Internet content, search and media companies like Yahoo, Google, Facebook,
etc., collect many tens of billions of event records from the many millions of
users to their web sites each day. These events can be classified by many
different dimensions, such as the page visited and user location and profile
information. Each event also contains some unique identifiers associated with
the user, specific device (cell phone, tablet, or computer) and the web browser
used.</p>
<p><img class="doc-img-full" src="/docs/img/PeopleCloud.png" alt="PeopleCloud"
/></p>
@@ -338,7 +338,7 @@
<p>These same unique identifiers will appear on every page that the user
visits. In order to measure the number of unique identifiers on a page or
across a number of different pages, it is necessary to discount the identifier
duplicates. Obtaining an exact answer to a <em>COUNT DISTINCT</em> query with
massive data is a difficult computational challenge. It is even more
challenging if it is necessary to compute arbitrary expressions across sets of
unique identifiers. For example, if se [...]
<p>Computing cardinalities with massive data requires lots of computer
resources and time.
-However, if an approximate answer to these problems is acceptable, <a
href="/docs/Theta/ThetaSketchFramework.html">Theta Sketches</a> can provide
reasonable estimates, in a single pass, orders of magnitude faster, even fast
enough for analysis in near-real time.</p>
+However, if an approximate answer to these problems is acceptable, <a
href="/docs/Theta/ThetaSketches.html">Theta Sketches</a> can provide reasonable
estimates, in a single pass, orders of magnitude faster, even fast enough for
analysis in near-real time.</p>
<p>The <a
href="https://github.com/apache/datasketches-java/blob/master/src/main/java/org/apache/datasketches/theta/Sketch.java">theta/Sketch</a>
can operate both on-heap and off-heap, has powerful Union, Intersection, AnotB
and Jaccard operators, has a high-performance concurrent form for
multi-threaded environments, has both immutable compact, and updatable
representations, and is quite fast. Because of its flexibility, it is one of
the most popular sketches in our library.</p>
@@ -354,7 +354,7 @@ However, if an approximate answer to these problems is
acceptable, <a href="/doc
<li><a
href="https://github.com/apache/datasketches-java/blob/master/src/main/java/org/apache/datasketches/tuple/arrayofdoubles/ArrayOfDoublesSketch.java">tuple/ArrayOfDoublesSketch</a>,
which enables the user to specify the number of columns of double values as
the <em>summary</em>. This variant also provides both on-heap and off-heap
operation.</li>
</ul>
-<h3 id="hyperloglog-sketches-estimating-stream-cardinalities"><a
href="/docs/HLL/HLL.html">HyperLogLog Sketches</a>: Estimating Stream
Cardinalities</h3>
+<h3 id="hyperloglog-sketches-estimating-stream-cardinalities"><a
href="/docs/HLL/HllSketches.html">HyperLogLog Sketches</a>: Estimating Stream
Cardinalities</h3>
<p>The HyperLogLog (HLL) is a cardinality sketch similar to the above Theta
sketches except they are anywhere from 2 to 16 times smaller in size. The HLL
sketches can be merged via the Union operator, but set intersection and
difference operations are not provided intrinsically, because the resulting
error would be quite poor. If your application only requires cardinality
estimation and merging and space is at a premium, the HLL or CPC sketches would
be your best choice.</p>
<p>The <a
href="https://github.com/apache/datasketches-java/blob/master/src/main/java/org/apache/datasketches/hll/HllSketch.java">hll/HllSketch</a>
can operate both on-heap and off-heap, provides the Union operators, and has
both immutable compact and updatable representations.</p>
diff --git a/output/docs/HLL/Hll_vs_CS_Hllpp.html
b/output/docs/HLL/Hll_vs_CS_Hllpp.html
index 30d5c6d8..015e4874 100644
--- a/output/docs/HLL/Hll_vs_CS_Hllpp.html
+++ b/output/docs/HLL/Hll_vs_CS_Hllpp.html
@@ -559,7 +559,7 @@ Note that the Y-axis scale is now 100 nanoseconds. Some of
the peaks in these pl
<ul>
<li>[1] <a
href="https://github.com/apache/datasketches-java/tree/master/src/main/java/org/apache/datasketches/hll">DataSketches
HllSketch GitHub</a></li>
- <li>[2] <a
href="https://datasketches.apache.org/docs/HLL/HLL.html">DataSketches HllSketch
JavaDocs (top of page)</a></li>
+ <li>[2] <a
href="https://datasketches.apache.org/docs/HLL/HllSketches.html">DataSketches
HllSketch JavaDocs (top of page)</a></li>
<li>[3] <a
href="https://github.com/addthis/stream-lib/blob/master/src/main/java/com/clearspring/analytics/stream/cardinality/HyperLogLogPlus.java">HyperLogLogPlus
GitHub</a></li>
<li>[4] <a
href="https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/40671.pdf">Google:
HyperLogLog in Practice: Algorithmic Engineering of a State of The Art
Cardinality Estimation Algorithm</a></li>
<li>[5] The Root-Mean-Square of the Relative Error (RMS-RE) is sensitive to
bias of the mean if there is any. However, if the bias is zero RMS-RE will
produce the same results as the theoretical Relative Standard Error (RSE) of
the stochastic process.</li>