This is an automated email from the ASF dual-hosted git repository.
git-site-role pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/datasketches-website.git
The following commit(s) were added to refs/heads/asf-site by this push:
new 8b01e3b8 Automatic Site Publish by Buildbot
8b01e3b8 is described below
commit 8b01e3b8816189e338ae2b65ab7af9855326f817
Author: buildbot <[email protected]>
AuthorDate: Sun Jan 12 20:28:17 2025 +0000
Automatic Site Publish by Buildbot
---
.../RunningCharacterizationRemotely.html | 28 +++---
output/docs/CPC/CpcCppExample.html | 2 +-
output/docs/CPC/CpcHiveExample.html | 4 +-
output/docs/CPC/CpcJavaExample.html | 2 +-
output/docs/CPC/CpcPigExample.html | 8 +-
output/docs/CPC/CpcSketches.html | 10 +-
output/docs/Community/Downloads.html | 4 +-
.../Community/ReleaseProcessForCppComponents.html | 34 +++----
.../Community/ReleaseProcessForJavaComponents.html | 102 ++++++++++-----------
output/docs/Frequency/FrequencySketches.html | 4 +-
.../Frequency/FrequentDistinctTuplesSketch.html | 10 +-
output/docs/Frequency/FrequentItemsCppExample.html | 2 +-
output/docs/Frequency/FrequentItemsHiveUDFs.html | 2 +-
.../docs/Frequency/FrequentItemsJavaExample.html | 2 +-
output/docs/Frequency/FrequentItemsPigUDFs.html | 10 +-
output/docs/Graduation/GraduationCheckList.html | 4 +-
output/docs/HLL/HllCppExample.html | 2 +-
output/docs/HLL/HllHiveUDFs.html | 4 +-
output/docs/HLL/HllJavaExample.html | 2 +-
output/docs/HLL/HllMap.html | 2 +-
output/docs/HLL/HllPigUDFs.html | 12 +--
.../HLL/HllSketchVsDruidHyperLogLogCollector.html | 2 +-
output/docs/HLL/HllSketches.html | 22 ++---
output/docs/KLL/KLLCppExample.html | 2 +-
output/docs/KLL/KLLSketch.html | 2 +-
output/docs/KLL/UnderstandingKLLBounds.html | 2 +-
output/docs/Memory/MemoryPerformance.html | 4 +-
output/docs/Quantiles/QuantilesHiveUDFs.html | 2 +-
output/docs/Quantiles/QuantilesJavaExample.html | 2 +-
output/docs/Quantiles/QuantilesPigUDFs.html | 8 +-
output/docs/Quantiles/QuantilesSketchOverview.html | 18 ++--
output/docs/Sampling/ReservoirSamplingJava.html | 4 +-
output/docs/Sampling/ReservoirSamplingPigUDFs.html | 8 +-
output/docs/Sampling/VarOptPigUDFs.html | 8 +-
output/docs/Sampling/VarOptSamplingJava.html | 8 +-
output/docs/Sampling/VarOptSamplingSketches.html | 8 +-
.../SystemIntegrations/ApachePinotIntegration.html | 2 +-
output/docs/Theta/ConcurrentThetaSketch.html | 8 +-
output/docs/Theta/ThetaAccuracyPlots.html | 2 +-
output/docs/Theta/ThetaHiveUDFs.html | 4 +-
output/docs/Theta/ThetaJavaExample.html | 2 +-
output/docs/Theta/ThetaPigUDFs.html | 10 +-
output/docs/Theta/ThetaSetOpsCornerCases.html | 2 +-
output/docs/Theta/ThetaSketchSetOps.html | 2 +-
output/docs/Theta/ThetaSketches.html | 18 ++--
output/docs/Theta/ThetaSparkExample.html | 10 +-
output/docs/Tuple/TupleEngagementExample.html | 2 +-
output/docs/Tuple/TupleHiveUDFs.html | 2 +-
output/docs/Tuple/TupleJavaExample.html | 2 +-
output/docs/Tuple/TuplePigUDFs.html | 8 +-
50 files changed, 212 insertions(+), 212 deletions(-)
diff --git a/output/docs/Architecture/RunningCharacterizationRemotely.html
b/output/docs/Architecture/RunningCharacterizationRemotely.html
index 20457c44..c9500265 100644
--- a/output/docs/Architecture/RunningCharacterizationRemotely.html
+++ b/output/docs/Architecture/RunningCharacterizationRemotely.html
@@ -345,27 +345,27 @@
<p>The following is a short manual on how to set up running <a
href="https://github.com/apache/datasketches-characterization">characterization
tests</a> remotely on a server (with access to a Linux command line) so that
the characterization code sees local changes in the implementation of the
sketches; here, we choose the <a
href="https://github.com/apache/datasketches-java">Java implementation</a> to
experiment with. This allows us, for example, to verify a bug fix or test
effects of chan [...]
-<p>As a first step, check that the server has Java version 1.8, using <code
class="highlighter-rouge">java -version</code>, as the DataSketches code
requires JDK8. If not, install (or ask administrators to install) OpenJDK 8.
Furthermore, we will need <a href="https://maven.apache.org/">Maven</a>, which
may also be installed locally.</p>
+<p>As a first step, check that the server has Java version 1.8, using <code
class="language-plaintext highlighter-rouge">java -version</code>, as the
DataSketches code requires JDK8. If not, install (or ask administrators to
install) OpenJDK 8. Furthermore, we will need <a
href="https://maven.apache.org/">Maven</a>, which may also be installed
locally.</p>
<p>Next, start with cloning the required repositories into the desired
location, including <a
href="https://github.com/apache/datasketches-memory">DataSketches Memory</a>
which is needed as a dependency. We will assume below that all repositories are
in the same directory.</p>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>$ git clone
https://github.com/apache/datasketches-java.git
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>$ git clone
https://github.com/apache/datasketches-java.git
$ git clone https://github.com/apache/datasketches-characterization.git
$ git clone https://github.com/apache/datasketches-memory.git
</code></pre></div></div>
<p>Then go to the DataSketches Java component, possibly change the code as
desired, and compile and install the Java component using Maven:</p>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>$ mvn clean install
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>$ mvn clean install
</code></pre></div></div>
-<p>While the Java component gets compiled and installed in the <code
class="highlighter-rouge">target/</code> subdirectory, the build may actually
fail due to “Too many files with unapproved license” — this error should not
affect the experiments in any way. When installing the Java component after
some experimental changes, tests may need to be skipped during the installation
(unless we update them), using option <code
class="highlighter-rouge">-DskipTests=true</code>.</p>
+<p>While the Java component gets compiled and installed in the <code
class="language-plaintext highlighter-rouge">target/</code> subdirectory, the
build may actually fail due to “Too many files with unapproved license” — this
error should not affect the experiments in any way. When installing the Java
component after some experimental changes, tests may need to be skipped during
the installation (unless we update them), using option <code
class="language-plaintext highlighter-rouge">-Dsk [...]
-<p>Compile and install the Memory component in the same way, that is using
<code class="highlighter-rouge">mvn clean install</code> in directory <code
class="highlighter-rouge">datasketches-memory</code>.</p>
+<p>Compile and install the Memory component in the same way, that is using
<code class="language-plaintext highlighter-rouge">mvn clean install</code> in
directory <code class="language-plaintext
highlighter-rouge">datasketches-memory</code>.</p>
-<p>The crucial step is to make the characterization components use the locally
compiled Java component and not the one in the public repository. For this, we
need to modify file <code class="highlighter-rouge">pom.xml</code> in the
datasketches-characterization repository, which contains two <code
class="highlighter-rouge">dependency</code> tags with element <code
class="highlighter-rouge">artifactId</code> set to <code
class="highlighter-rouge">datasketches-java</code>:</p>
+<p>The crucial step is to make the characterization components use the locally
compiled Java component and not the one in the public repository. For this, we
need to modify file <code class="language-plaintext
highlighter-rouge">pom.xml</code> in the datasketches-characterization
repository, which contains two <code class="language-plaintext
highlighter-rouge">dependency</code> tags with element <code
class="language-plaintext highlighter-rouge">artifactId</code> set to <code
class="langua [...]
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code><dependency>
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code><dependency>
<groupId>org.apache.datasketches</groupId>
<artifactId>datasketches-java</artifactId>
<version>${datasketches-java.version}</version>
@@ -380,9 +380,9 @@ $ git clone
https://github.com/apache/datasketches-memory.git
</dependency>
</code></pre></div></div>
-<p>We modify both of them by adding the <code
class="highlighter-rouge">scope</code> tag, set to <code
class="highlighter-rouge">system</code>, and the <code
class="highlighter-rouge">systemPath</code> tag as follows:</p>
+<p>We modify both of them by adding the <code class="language-plaintext
highlighter-rouge">scope</code> tag, set to <code class="language-plaintext
highlighter-rouge">system</code>, and the <code class="language-plaintext
highlighter-rouge">systemPath</code> tag as follows:</p>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code><dependency>
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code><dependency>
<groupId>org.apache.datasketches</groupId>
<artifactId>datasketches-java</artifactId>
<version>${datasketches-java.version}</version>
@@ -401,17 +401,17 @@ $ git clone
https://github.com/apache/datasketches-memory.git
</dependency>
</code></pre></div></div>
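As a rough illustration of the edit described above (adding a scope element set to system, plus a systemPath element, to each datasketches-java dependency), here is a minimal Python sketch. It operates on a standalone, namespace-free dependency fragment rather than a full pom.xml. The helper name add_system_scope and the jar path are assumptions made for illustration only; check the target/ subdirectory for the real file name and version.

```python
import xml.etree.ElementTree as ET


def add_system_scope(dependency_xml: str, jar_path: str) -> str:
    """Return the <dependency> fragment with <scope>system</scope> and <systemPath> set."""
    dep = ET.fromstring(dependency_xml)
    for tag, text in (("scope", "system"), ("systemPath", jar_path)):
        child = dep.find(tag)
        if child is None:
            # Element is missing, so append it to the dependency.
            child = ET.SubElement(dep, tag)
        child.text = text
    return ET.tostring(dep, encoding="unicode")


fragment = """<dependency>
  <groupId>org.apache.datasketches</groupId>
  <artifactId>datasketches-java</artifactId>
  <version>${datasketches-java.version}</version>
</dependency>"""

# Hypothetical jar location (an assumption, not taken from the original page).
patched = add_system_scope(
    fragment,
    "${basedir}/../datasketches-java/target/datasketches-java-1.4.0-SNAPSHOT.jar",
)
print(patched)
```

A real pom.xml declares the Maven POM namespace, so a script working on the whole file would need namespace-aware lookups; editing the two dependency blocks by hand is usually simpler.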
-<p>The particular version of the Java component needs to be checked (see the
<code class="highlighter-rouge">target/</code> subdirectory). <strong>Do not
commit the modified pom.xml file.</strong></p>
+<p>The particular version of the Java component needs to be checked (see the
<code class="language-plaintext highlighter-rouge">target/</code>
subdirectory). <strong>Do not commit the modified pom.xml file.</strong></p>
-<p>Having modified <code class="highlighter-rouge">pom.xml</code>, compile the
characterization code using <code class="highlighter-rouge">mvn clean
test</code> (no need to install this time). Some Maven warnings because of the
modified <code class="highlighter-rouge">pom.xml</code> may appear, which we
can nevertheless ignore.</p>
+<p>Having modified <code class="language-plaintext
highlighter-rouge">pom.xml</code>, compile the characterization code using
<code class="language-plaintext highlighter-rouge">mvn clean test</code> (no
need to install this time). Some Maven warnings because of the modified <code
class="language-plaintext highlighter-rouge">pom.xml</code> may appear, which
we can nevertheless ignore.</p>
-<p>Finally, choose a test to run, for example, <code
class="highlighter-rouge">KllSketchAccuracyProfile</code>. After setting up the
test configuration (in file <code
class="highlighter-rouge">datasketches-characterization/src/main/resources/quantiles/KllSketchAccuracyJob.conf</code>)
appropriately, it remains to run the test, for which we need to set the
CLASSPATH environment variable before running <code
class="highlighter-rouge">java</code> by executing the following in the
datasketche [...]
+<p>Finally, choose a test to run, for example, <code class="language-plaintext
highlighter-rouge">KllSketchAccuracyProfile</code>. After setting up the test
configuration (in file <code class="language-plaintext
highlighter-rouge">datasketches-characterization/src/main/resources/quantiles/KllSketchAccuracyJob.conf</code>)
appropriately, it remains to run the test, for which we need to set the
CLASSPATH environment variable before running <code class="language-plaintext
highlighter-rouge"> [...]
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>$ export
CLASSPATH="$CLASSPATH:$PWD/target/test-classes:$PWD/target/classes:$PWD/../datasketches-memory/target/datasketches-memory-1.3.0-SNAPSHOT.jar:$PWD/../datasketches-java/target/datasketches-java-1.4.0-SNAPSHOT.jar"
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>$ export
CLASSPATH="$CLASSPATH:$PWD/target/test-classes:$PWD/target/classes:$PWD/../datasketches-memory/target/datasketches-memory-1.3.0-SNAPSHOT.jar:$PWD/../datasketches-java/target/datasketches-java-1.4.0-SNAPSHOT.jar"
$ java -ea -Dfile.encoding=UTF-8 org.apache.datasketches.Job
src/main/resources/quantiles/KllSketchAccuracyJob.conf
</code></pre></div></div>
-<p>Then the test should run and, when it finishes, it writes the test results
into a file in directory <code
class="highlighter-rouge">datasketches-characterization</code>.</p>
+<p>Then the test should run and, when it finishes, it writes the test results
into a file in directory <code class="language-plaintext
highlighter-rouge">datasketches-characterization</code>.</p>
</div> <!-- End content -->
</div> <!-- End row -->
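The CLASSPATH value used in the page above is easy to mistype, so the following minimal Python sketch assembles it from its parts. The function name build_classpath and its parameters are assumptions for illustration; the directory layout and the two SNAPSHOT versions mirror the export command shown in the diff and should be checked against the jars actually present in each target/ directory.

```python
import posixpath


def build_classpath(base_dir, memory_version, java_version, existing=""):
    """Assemble the CLASSPATH string for running a characterization job.

    base_dir is the datasketches-characterization checkout; the sibling
    repositories are assumed to live next to it, as the page describes.
    """
    entries = [
        posixpath.join(base_dir, "target", "test-classes"),
        posixpath.join(base_dir, "target", "classes"),
        posixpath.join(base_dir, "..", "datasketches-memory", "target",
                       f"datasketches-memory-{memory_version}.jar"),
        posixpath.join(base_dir, "..", "datasketches-java", "target",
                       f"datasketches-java-{java_version}.jar"),
    ]
    if existing:
        # Keep whatever was already on the classpath in front.
        entries.insert(0, existing)
    return ":".join(entries)


print(build_classpath("/work/datasketches-characterization",
                      "1.3.0-SNAPSHOT", "1.4.0-SNAPSHOT"))
```

Unlike the shell one-liner, this sketch omits the leading empty entry when no CLASSPATH was previously set.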
diff --git a/output/docs/CPC/CpcCppExample.html
b/output/docs/CPC/CpcCppExample.html
index d7ba5290..95bc271f 100644
--- a/output/docs/CPC/CpcCppExample.html
+++ b/output/docs/CPC/CpcCppExample.html
@@ -343,7 +343,7 @@
-->
<h1 id="cpc-sketch-c-example">CPC Sketch C++ Example</h1>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>#include <iostream>
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>#include <iostream>
#include <fstream>
#include <cpc_sketch.hpp>
diff --git a/output/docs/CPC/CpcHiveExample.html
b/output/docs/CPC/CpcHiveExample.html
index 6e428571..5ab52bef 100644
--- a/output/docs/CPC/CpcHiveExample.html
+++ b/output/docs/CPC/CpcHiveExample.html
@@ -345,7 +345,7 @@
<h3 id="building-sketches-merging-sketches-and-getting-estimates">Building
sketches, merging sketches and getting estimates</h3>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>add jar datasketches-memory-2.0.0.jar;
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>add jar datasketches-memory-2.0.0.jar;
add jar datasketches-java-3.1.0.jar;
add jar datasketches-hive-1.2.0.jar;
@@ -385,7 +385,7 @@ Output:
<p>Notice the difference between UnionUDF in this example, which takes two
sketches, and UnionUDAF in the previous example, which is an aggregate function
taking a collection of sketches as one parameter.</p>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>add jar datasketches-memory-2.0.0.jar;
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>add jar datasketches-memory-2.0.0.jar;
add jar datasketches-java-3.1.0.jar;
add jar datasketches-hive-1.2.0.jar;
diff --git a/output/docs/CPC/CpcJavaExample.html
b/output/docs/CPC/CpcJavaExample.html
index 067f277b..b7fa6018 100644
--- a/output/docs/CPC/CpcJavaExample.html
+++ b/output/docs/CPC/CpcJavaExample.html
@@ -343,7 +343,7 @@
-->
<h1 id="cpc-sketch-java-example">CPC Sketch Java Example</h1>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>import java.io.FileInputStream;
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>import java.io.FileInputStream;
import java.io.FileOutputStream;
import com.yahoo.memory.Memory;
import com.yahoo.sketches.cpc.CpcSketch;
diff --git a/output/docs/CPC/CpcPigExample.html
b/output/docs/CPC/CpcPigExample.html
index d70c8684..ee87ff0f 100644
--- a/output/docs/CPC/CpcPigExample.html
+++ b/output/docs/CPC/CpcPigExample.html
@@ -356,7 +356,7 @@
<h3
id="cpcpig-script-building-sketches-merging-sketches-and-getting-estimates">cpc.pig
script: building sketches, merging sketches and getting estimates</h3>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>register datasketches-memory-2.0.0.jar;
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>register datasketches-memory-2.0.0.jar;
register datasketches-java-3.1.0.jar;
register datasketches-pig-1.1.0.jar;
@@ -396,18 +396,18 @@ Most of the IDs in these categories overlap, so that
there are 60 unique IDs in
<p>Results:
From ‘dump d’:</p>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>(a,50.09992602861082)
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>(a,50.09992602861082)
(b,50.09992602861082)
</code></pre></div></div>
<p>From ‘dump g’ (merged across categories):</p>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>(60.14445031168714)
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>(60.14445031168714)
</code></pre></div></div>
<p>From ‘dump h’ (with error bounds, 99% confidence interval):</p>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>(60.14445031168714,60.0,63.0)
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>(60.14445031168714,60.0,63.0)
</code></pre></div></div>
</div> <!-- End content -->
diff --git a/output/docs/CPC/CpcSketches.html b/output/docs/CPC/CpcSketches.html
index 8aa46345..614e4523 100644
--- a/output/docs/CPC/CpcSketches.html
+++ b/output/docs/CPC/CpcSketches.html
@@ -358,8 +358,8 @@
</ul>
<p><a id="cpc-sketches"></a></p>
-<h2 id="compressed-probability-counting-cpc-sketches">Compressed Probability
Counting (CPC) Sketches<sup id="fnref:1"><a href="#fn:1"
class="footnote">1</a></sup></h2>
-<p>The cpc package contains implementations of Kevin J. Lang’s CPC sketch<sup
id="fnref:1:1"><a href="#fn:1" class="footnote">1</a></sup>.
+<h2 id="compressed-probability-counting-cpc-sketches">Compressed Probability
Counting (CPC) Sketches<sup id="fnref:1" role="doc-noteref"><a href="#fn:1"
class="footnote" rel="footnote">1</a></sup></h2>
+<p>The cpc package contains implementations of Kevin J. Lang’s CPC sketch<sup
id="fnref:1:1" role="doc-noteref"><a href="#fn:1" class="footnote"
rel="footnote">1</a></sup>.
The stored CPC sketch can consume about 40% less space than an HLL sketch of
comparable accuracy.
Nonetheless, the HLL and CPC sketches have been intentionally designed to offer
different tradeoffs so that, in fact, they complement each other in many
ways.</p>
@@ -374,10 +374,10 @@ Nonetheless, the HLL and CPC sketches have been
intentionally designed to offer di
<hr />
-<div class="footnotes">
+<div class="footnotes" role="doc-endnotes">
<ol>
- <li id="fn:1">
- <p>Kevin J Lang. Back to the future: an even more nearly optimal
cardinality estimation algorithm. arXiv preprint
https://arxiv.org/abs/1708.06839, 2017. <a href="#fnref:1"
class="reversefootnote">↩</a> <a href="#fnref:1:1"
class="reversefootnote">↩<sup>2</sup></a></p>
+ <li id="fn:1" role="doc-endnote">
+ <p>Kevin J Lang. Back to the future: an even more nearly optimal
cardinality estimation algorithm. arXiv preprint
https://arxiv.org/abs/1708.06839, 2017. <a href="#fnref:1"
class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:1:1"
class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a></p>
</li>
</ol>
</div>
diff --git a/output/docs/Community/Downloads.html
b/output/docs/Community/Downloads.html
index 97e28e19..5a4d2fe5 100644
--- a/output/docs/Community/Downloads.html
+++ b/output/docs/Community/Downloads.html
@@ -529,10 +529,10 @@ the release was generated from.</p>
<li>Click on <strong>[Releases]</strong></li>
</ul>
-<h4 id="or-you-can-go-directly-by-apending-releases-to-the-url-as-in">Or, you
can go directly by appending <code class="highlighter-rouge">/releases</code> to
the URL as in</h4>
+<h4 id="or-you-can-go-directly-by-apending-releases-to-the-url-as-in">Or, you
can go directly by appending <code class="language-plaintext
highlighter-rouge">/releases</code> to the URL as in</h4>
<ul>
- <li><code
class="highlighter-rouge">https://github.com/apache/datasketches-java/releases</code></li>
+ <li><code class="language-plaintext
highlighter-rouge">https://github.com/apache/datasketches-java/releases</code></li>
</ul>
<h2 id="release-philosophy">Release Philosophy</h2>
diff --git a/output/docs/Community/ReleaseProcessForCppComponents.html
b/output/docs/Community/ReleaseProcessForCppComponents.html
index abbdd7e3..4b9f0a01 100644
--- a/output/docs/Community/ReleaseProcessForCppComponents.html
+++ b/output/docs/Community/ReleaseProcessForCppComponents.html
@@ -373,27 +373,27 @@
<ul>
<li>To confirm <em>gpg-agent</em> is running type:
<ul>
- <li><code class="highlighter-rouge">ps -axww | grep gpg</code> #
you should see something like:
+ <li><code class="language-plaintext highlighter-rouge">ps -axww |
grep gpg</code> # you should see something like:
<ul>
<li><em>64438 ?? 0:30.33 gpg-agent --homedir
/Users/<name>/.gnupg --use-standard-socket --daemon</em></li>
</ul>
</li>
<li>To start GPG if GPG Agent is not running:
<ul>
- <li><code class="highlighter-rouge">eval $(gpg-agent
--daemon)</code></li>
+ <li><code class="language-plaintext highlighter-rouge">eval
$(gpg-agent --daemon)</code></li>
</ul>
</li>
</ul>
</li>
<li>Confirm GitHub repository is current and git status is clean:
<ul>
- <li><code class="highlighter-rouge">git status</code> # should
return:</li>
+ <li><code class="language-plaintext highlighter-rouge">git
status</code> # should return:</li>
<li>“nothing to commit, working tree clean”</li>
</ul>
</li>
<li>At major version releases, search for deprecated code and remove at
<strong>Major Versions</strong> only.
<ul>
- <li><code class="highlighter-rouge">find . -name "*.?pp" -type f
-print | xargs grep -i -n -s -A0 "deprecated"</code></li>
+ <li><code class="language-plaintext highlighter-rouge">find . -name
"*.?pp" -type f -print | xargs grep -i -n -s -A0 "deprecated"</code></li>
<li>you may need to ignore false positives in the pybind11
directory</li>
</ul>
</li>
@@ -415,7 +415,7 @@
<li><strong>DO NOT MERGE THIS PERMANENT BRANCH INTO MASTER</strong></li>
<li>Do explicit push of tags on branch “A.B.X” to origin:
<ul>
- <li><code class="highlighter-rouge">git push origin
--tags</code></li>
+ <li><code class="language-plaintext highlighter-rouge">git push
origin --tags</code></li>
</ul>
</li>
</ul>
@@ -443,22 +443,22 @@
<ul>
<li>If you have not already, on your system create the two directory
structures that mirror the dist.apache.org/repos/ directories:
<ul>
- <li><code class="highlighter-rouge">mkdir
dist/dev/datasketches/</code></li>
- <li><code class="highlighter-rouge">mkdir
dist/release/datasketches/</code></li>
+ <li><code class="language-plaintext highlighter-rouge">mkdir
dist/dev/datasketches/</code></li>
+ <li><code class="language-plaintext highlighter-rouge">mkdir
dist/release/datasketches/</code></li>
</ul>
</li>
<li>Checkout both “dev” and “release” directories
<ul>
<li>Open a terminal in the dist/dev/datasketches directory and do a
checkout:
<ul>
- <li><code class="highlighter-rouge">svn co
https://dist.apache.org/repos/dist/dev/datasketches/ .</code> #Note the
DOT</li>
- <li><code class="highlighter-rouge">svn status</code> # make sure
it is clean</li>
+ <li><code class="language-plaintext highlighter-rouge">svn co
https://dist.apache.org/repos/dist/dev/datasketches/ .</code> #Note the
DOT</li>
+ <li><code class="language-plaintext highlighter-rouge">svn
status</code> # make sure it is clean</li>
</ul>
</li>
<li>Open a terminal in the dist/release/datasketches directory and do a
checkout:
<ul>
- <li><code class="highlighter-rouge">svn co
https://dist.apache.org/repos/dist/release/datasketches/ .</code> #Note the
DOT</li>
- <li><code class="highlighter-rouge">svn status</code> # make sure
it is clean</li>
+ <li><code class="language-plaintext highlighter-rouge">svn co
https://dist.apache.org/repos/dist/release/datasketches/ .</code> #Note the
DOT</li>
+ <li><code class="language-plaintext highlighter-rouge">svn
status</code> # make sure it is clean</li>
</ul>
</li>
</ul>
@@ -479,27 +479,27 @@
<ul>
<li>To confirm <em>gpg-agent</em> is running type:
<ul>
- <li><code class="highlighter-rouge">ps -axww | grep gpg</code> #
you should see something like:
+ <li><code class="language-plaintext highlighter-rouge">ps -axww |
grep gpg</code> # you should see something like:
<ul>
<li><em>64438 ?? 0:30.33 gpg-agent --homedir
/Users/<name>/.gnupg --use-standard-socket --daemon</em></li>
</ul>
</li>
<li>To start GPG if GPG Agent is not running:
<ul>
- <li><code class="highlighter-rouge">eval $(gpg-agent
--daemon)</code></li>
+ <li><code class="language-plaintext highlighter-rouge">eval
$(gpg-agent --daemon)</code></li>
</ul>
</li>
</ul>
</li>
<li>Run something like:
<ul>
- <li><code class="highlighter-rouge">./bashDeployToDist.sh
/Users/\<name\>/dev/git/Apache/datasketches-\<component\>
datasketches-\<component\> A.B.0-RC1</code></li>
+ <li><code class="language-plaintext
highlighter-rouge">./bashDeployToDist.sh
/Users/\<name\>/dev/git/Apache/datasketches-\<component\>
datasketches-\<component\> A.B.0-RC1</code></li>
<li>Follow the instructions.</li>
<li>NOTE: if you get the error “gpg: signing failed: No pinentry”:
<ul>
<li>open .gnupg/gpg-agent.conf</li>
<li>change to: pinentry-program /usr/local/bin/pinentry-tty</li>
- <li>reload the gpg agent in the terminal: <code
class="highlighter-rouge">gpg-connect-agent reloadagent /bye</code></li>
+ <li>reload the gpg agent in the terminal: <code
class="language-plaintext highlighter-rouge">gpg-connect-agent reloadagent
/bye</code></li>
<li>restart the ./bashDeployToDist script</li>
</ul>
</li>
@@ -561,7 +561,7 @@
<li>At that same GitHub ID hash, create a new tag A.B.0 (without the
RCn).</li>
<li>From the Command Line: Push the new tag to origin:
<ul>
- <li><code class="highlighter-rouge">git push origin --tags</code></li>
+ <li><code class="language-plaintext highlighter-rouge">git push origin
--tags</code></li>
</ul>
</li>
<li>On the GitHub component site document the release</li>
@@ -574,7 +574,7 @@
<li>Make sure your local website directory is pointing to master and
up-to-date.</li>
<li>Run the following with the argument specifying the location of your
local website directory:
<ul>
- <li><code class="highlighter-rouge">./createDownloadsInclude.sh
/Users/\<name\>/ ... /datasketches-website</code></li>
+ <li><code class="language-plaintext
highlighter-rouge">./createDownloadsInclude.sh /Users/\<name\>/ ...
/datasketches-website</code></li>
</ul>
</li>
<li>When this is done, be sure to commit the changes to the website.</li>
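The deprecated-code search from the release checklist above (find piped into grep) can also be approximated in Python, for example on a machine without those tools. This is a minimal sketch under stated assumptions: the function name find_deprecated is hypothetical, and the default glob and case-insensitive match are modeled on the C++ variant of the command.

```python
import re
from pathlib import Path


def find_deprecated(root, glob="*.?pp", pattern="deprecated"):
    """Yield (path, line_number, line) for each case-insensitive match under root."""
    regex = re.compile(pattern, re.IGNORECASE)
    for path in sorted(Path(root).rglob(glob)):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                yield path, number, line.strip()


if __name__ == "__main__":
    # Scan the current directory, mirroring `find . -name "*.?pp" ...`.
    for path, number, line in find_deprecated("."):
        print(f"{path}:{number}:{line}")
```

As with the shell command, hits inside vendored directories such as pybind11 may be false positives and need manual review.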
diff --git a/output/docs/Community/ReleaseProcessForJavaComponents.html
b/output/docs/Community/ReleaseProcessForJavaComponents.html
index adf2e81d..770a4bde 100644
--- a/output/docs/Community/ReleaseProcessForJavaComponents.html
+++ b/output/docs/Community/ReleaseProcessForJavaComponents.html
@@ -359,7 +359,7 @@
<li>Confirm any new bug fixes have corresponding tests</li>
<li>At major version releases, search for deprecated code and remove at
<strong>Major Versions</strong> only.
<ul>
- <li><code class="highlighter-rouge">find . -name "*.java" -type f -print
| xargs grep -n -s -A0 "deprecat"</code> <br /> (This will find “deprecated”,
“Deprecating”, “deprecation”, etc.)</li>
+ <li><code class="language-plaintext highlighter-rouge">find . -name
"*.java" -type f -print | xargs grep -n -s -A0 "deprecat"</code> <br /> (This
will find “deprecated”, “Deprecating”, “deprecation”, etc.)</li>
<li><strong>Note:</strong> When first marking a segment of code
deprecated, please add the current version number. This will make it easier to
know when to remove the deprecated code.</li>
</ul>
</li>
@@ -368,23 +368,23 @@
<h3 id="check-pom-for-correctness">Check POM for Correctness</h3>
<ul>
- <li><code class="highlighter-rouge">mvn clean</code> will check for most
things</li>
+ <li><code class="language-plaintext highlighter-rouge">mvn clean</code> will
check for most things</li>
<li>Import the pom into Eclipse’s POM Editor</li>
</ul>
<h4 id="check-project-dependencies">Check Project Dependencies</h4>
<ul>
- <li><code class="highlighter-rouge">mvn dependency:tree</code></li>
- <li><code class="highlighter-rouge">mvn
versions:display-dependency-updates</code> <br />
+ <li><code class="language-plaintext highlighter-rouge">mvn
dependency:tree</code></li>
+ <li><code class="language-plaintext highlighter-rouge">mvn
versions:display-dependency-updates</code> <br />
For Java 8, the TestNG version must remain at 7.5.1</li>
</ul>
<h4 id="check-pom-plugin-updates">Check POM Plugin Updates</h4>
<ul>
- <li><code class="highlighter-rouge">mvn
versions:display-plugin-updates</code></li>
- <li><code class="highlighter-rouge">mvn
versions:display-property-updates</code> For all POM properties</li>
+ <li><code class="language-plaintext highlighter-rouge">mvn
versions:display-plugin-updates</code></li>
+ <li><code class="language-plaintext highlighter-rouge">mvn
versions:display-property-updates</code> For all POM properties</li>
</ul>
<h3 id="visual-checks-for-correctness">Visual Checks for Correctness</h3>
@@ -404,18 +404,18 @@ Also specifies line separator characters for text
files.</li>
<h3 id="run-maven-tests">Run Maven Tests</h3>
<ul>
- <li><code class="highlighter-rouge">mvn apache-rat:check</code></li>
- <li><code class="highlighter-rouge">mvn clean test</code></li>
- <li><code class="highlighter-rouge">mvn clean test -P check-cpp-files</code>
(only for ds-java)</li>
- <li><code class="highlighter-rouge">mvn clean test -P
check-cpp-historical-files</code> (only for ds-java)</li>
- <li><code class="highlighter-rouge">mvn clean javadoc:javadoc</code></li>
+ <li><code class="language-plaintext highlighter-rouge">mvn
apache-rat:check</code></li>
+ <li><code class="language-plaintext highlighter-rouge">mvn clean
test</code></li>
+ <li><code class="language-plaintext highlighter-rouge">mvn clean test -P
check-cpp-files</code> (only for ds-java)</li>
+ <li><code class="language-plaintext highlighter-rouge">mvn clean test -P
check-cpp-historical-files</code> (only for ds-java)</li>
+ <li><code class="language-plaintext highlighter-rouge">mvn clean
javadoc:javadoc</code></li>
</ul>
<h3 id="run-ide-checks">Run IDE Checks</h3>
<ul>
<li>Run Code Coverage > 90%
<ul>
- <li><strong>Hint:</strong> Run <code class="highlighter-rouge">mvn clean
test</code> first so that /target/ has a complete set of class files before you
attempt the Eclipse <code class="highlighter-rouge">coverage</code> test.</li>
+ <li><strong>Hint:</strong> Run <code class="language-plaintext
highlighter-rouge">mvn clean test</code> first so that /target/ has a complete
set of class files before you attempt the Eclipse <code
class="language-plaintext highlighter-rouge">coverage</code> test.</li>
</ul>
</li>
<li>SpotBugs checks (is it properly configured?)</li>
@@ -446,7 +446,7 @@ Also specifies line separator characters for text
files.</li>
<li>From command line at component root on master/main branch:</li>
<li>Confirm master/main branch is current and git status is clean:
<ul>
- <li><code class="highlighter-rouge">git status</code> # should
return:</li>
+ <li><code class="language-plaintext highlighter-rouge">git status</code>
# should return:</li>
<li>“On branch master/main, your branch is up to date with
‘origin/master(main)’, nothing to commit, working tree clean.”</li>
</ul>
</li>
@@ -474,14 +474,14 @@ Also specifies line separator characters for text
files.</li>
<li><strong>NOW PUSH</strong> Release Branch “A.B.X” with edited pom.xml to
origin <strong>DO NOT MERGE THIS PERMANENT BRANCH INTO MASTER/MAIN</strong></li>
<li>Do explicit push of tags on new branch A.B.X (or A’.0.X) to origin:
<ul>
- <li><code class="highlighter-rouge">git push origin --tags</code></li>
+ <li><code class="language-plaintext highlighter-rouge">git push origin
--tags</code></li>
</ul>
</li>
</ul>
<h3 id="run-maven-install-on-release-branch">Run Maven Install on Release
Branch</h3>
<ul>
- <li><code class="highlighter-rouge">mvn clean install
-DskipTests=true</code></li>
+ <li><code class="language-plaintext highlighter-rouge">mvn clean install
-DskipTests=true</code></li>
<li>Check that the /target/ directory has 5 jars: (may need to refresh)
<ul>
<li>datasketches-<component-version>-javadoc.jar</li>
@@ -523,8 +523,8 @@ Also specifies line separator characters for text
files.</li>
<ul>
<li>On your system create the two directory structures that mirror the
dist.apache.org/repos/ directories:
<ul>
- <li><code class="highlighter-rouge">mkdir
dist/dev/datasketches/</code></li>
- <li><code class="highlighter-rouge">mkdir
dist/release/datasketches/</code></li>
+ <li><code class="language-plaintext highlighter-rouge">mkdir
dist/dev/datasketches/</code></li>
+ <li><code class="language-plaintext highlighter-rouge">mkdir
dist/release/datasketches/</code></li>
</ul>
</li>
</ul>
@@ -533,8 +533,8 @@ Also specifies line separator characters for text
files.</li>
<ul>
<li>Open a terminal in the dist/dev/datasketches directory and do a checkout:
<ul>
- <li><code class="highlighter-rouge">svn co
https://dist.apache.org/repos/dist/dev/datasketches/ .</code> #Note the
DOT</li>
- <li><code class="highlighter-rouge">svn status</code> # make sure it
is clean: does not list any (?) or (!) files
+ <li><code class="language-plaintext highlighter-rouge">svn co
https://dist.apache.org/repos/dist/dev/datasketches/ .</code> #Note the
DOT</li>
+ <li><code class="language-plaintext highlighter-rouge">svn status</code>
# make sure it is clean: does not list any (?) or (!) files
<ul>
<li>If any (?) or (!) files exist they must be resolved before
proceeding.</li>
</ul>
@@ -557,24 +557,24 @@ Also specifies line separator characters for text
files.</li>
<ul>
<li>Confirm <em>gpg-agent</em> is running:
<ul>
- <li><code class="highlighter-rouge">eval $(gpg-agent --daemon)</code>
+ <li><code class="language-plaintext highlighter-rouge">eval
$(gpg-agent --daemon)</code>
<ul>
<li>if it is not running it will start it</li>
<li>if it is already running you will see something like:</li>
- <li><code class="highlighter-rouge">gpg-agent: a gpg-agent is
already running - not starting a new one</code></li>
+ <li><code class="language-plaintext
highlighter-rouge">gpg-agent: a gpg-agent is already running - not starting a
new one</code></li>
</ul>
</li>
</ul>
</li>
<li>Run something like this (you must copy & edit):
<ul>
- <li><code class="highlighter-rouge">./bashDeployToDist.sh
/Users/<name>/dev/git/Apache/datasketches-<component>
datasketches-<component> A.B.0-RC1</code></li>
+ <li><code class="language-plaintext
highlighter-rouge">./bashDeployToDist.sh
/Users/<name>/dev/git/Apache/datasketches-<component>
datasketches-<component> A.B.0-RC1</code></li>
<li>Follow the instructions.</li>
<li>NOTE: if you get the error “gpg: signing failed: No pinentry”:
<ul>
<li>open .gnupg/gpg-agent.conf</li>
<li>change to: pinentry-program
<em>/usr/local/bin/pinentry-tty</em></li>
- <li>reload the gpg agent in the terminal: <code
class="highlighter-rouge">gpg-connect-agent reloadagent /bye</code></li>
+ <li>reload the gpg agent in the terminal: <code
class="language-plaintext highlighter-rouge">gpg-connect-agent reloadagent
/bye</code></li>
<li>restart the <em>./bashDeployToDist</em> script</li>
</ul>
</li>
@@ -589,7 +589,7 @@ Also specifies line separator characters for text
files.</li>
<ul>
<li>Check this web URL ~
<em>https://dist.apache.org/repos/dist/dev/datasketches/<component>/A.B.0-RC1/</em>
<ul>
- <li>Update SVN: <code class="highlighter-rouge">svn up</code></li>
+ <li>Update SVN: <code class="language-plaintext highlighter-rouge">svn
up</code></li>
<li>There should be 3 files: *-src.zip, *-src.zip.asc,
*-src.zip.sha512</li>
<li>Copy the URL for later.</li>
</ul>
@@ -606,24 +606,24 @@ Also specifies line separator characters for text
files.</li>
<ul>
<li>Confirm <em>gpg-agent</em> is running:
<ul>
- <li><code class="highlighter-rouge">eval $(gpg-agent --daemon)</code>
+ <li><code class="language-plaintext highlighter-rouge">eval
$(gpg-agent --daemon)</code>
<ul>
<li>if it is not running it will start it</li>
<li>if it is already running you will see something like:</li>
- <li><code class="highlighter-rouge">gpg-agent: a gpg-agent is
already running - not starting a new one</code></li>
+ <li><code class="language-plaintext
highlighter-rouge">gpg-agent: a gpg-agent is already running - not starting a
new one</code></li>
</ul>
</li>
</ul>
</li>
</ul>
</li>
- <li><code class="highlighter-rouge">git status</code> # make sure you are
still on the release branch: A.B.X</li>
+ <li><code class="language-plaintext highlighter-rouge">git status</code> #
make sure you are still on the release branch: A.B.X</li>
</ul>
<h3 id="trial-run">TRIAL-RUN:</h3>
<ul>
<li><strong>Have your GPG passphrase handy – you may have only a few seconds
to enter it!</strong></li>
- <li><code class="highlighter-rouge">mvn clean install -Pnexus-jars
-DskipTests=true</code>
+ <li><code class="language-plaintext highlighter-rouge">mvn clean install
-Pnexus-jars -DskipTests=true</code>
<ul>
<li>Check target/ that jars & pom exist</li>
</ul>
@@ -633,7 +633,7 @@ Also specifies line separator characters for text
files.</li>
<h3 id="deploy">DEPLOY</h3>
<ul>
<li><strong>Have your GPG passphrase handy – you may have only a few seconds
to enter it, but it may be automatic!</strong></li>
- <li><code class="highlighter-rouge">mvn clean deploy -Pnexus-jars
-DskipTests=true</code></li>
+ <li><code class="language-plaintext highlighter-rouge">mvn clean deploy
-Pnexus-jars -DskipTests=true</code></li>
</ul>
<h4 id="deploy-check">DEPLOY-CHECK</h4>
@@ -668,12 +668,12 @@ Also specifies line separator characters for text
files.</li>
<h4 id="java-only">JAVA ONLY</h4>
<ul>
<li>Place copies of the artifact jars deployed to Nexus under a “maven”
directory. For example see <a
href="https://dist.apache.org/repos/dist/dev/datasketches/memory/3.0.0-RC1/">https://dist.apache.org/repos/dist/dev/datasketches/memory/3.0.0-RC1/</a></li>
- <li>Note that the <code class="highlighter-rouge">jar</code> files with
their <code class="highlighter-rouge">asc</code>, (optional <code
class="highlighter-rouge">md5</code> and <code
class="highlighter-rouge">sha1</code>) signature are all together in the .md2
archive</li>
- <li>Add a <code class="highlighter-rouge">maven</code> directory under the
<code
class="highlighter-rouge">dist/dev/datasketches/<component>/A.B.0/</code></li>
- <li>Bulk copy the <code class="highlighter-rouge">jar, asc, (optional
md5</code> and <code class="highlighter-rouge">sha1</code>) files into the
<code class="highlighter-rouge">maven</code> directory.</li>
- <li><code class="highlighter-rouge">svn status</code> # check to see if it
is ready to add</li>
- <li><code class="highlighter-rouge">svn add . --force</code></li>
- <li><code class="highlighter-rouge">svn ci -m "add nexus jars to
dist/dev/datasketches"</code></li>
+ <li>Note that the <code class="language-plaintext
highlighter-rouge">jar</code> files with their <code class="language-plaintext
highlighter-rouge">asc</code>, (optional <code class="language-plaintext
highlighter-rouge">md5</code> and <code class="language-plaintext
highlighter-rouge">sha1</code>) signature are all together in the .md2
archive</li>
+ <li>Add a <code class="language-plaintext highlighter-rouge">maven</code>
directory under the <code class="language-plaintext
highlighter-rouge">dist/dev/datasketches/<component>/A.B.0/</code></li>
+ <li>Bulk copy the <code class="language-plaintext highlighter-rouge">jar,
asc, (optional md5</code> and <code class="language-plaintext
highlighter-rouge">sha1</code>) files into the <code class="language-plaintext
highlighter-rouge">maven</code> directory.</li>
+ <li><code class="language-plaintext highlighter-rouge">svn status</code> #
check to see if it is ready to add</li>
+ <li><code class="language-plaintext highlighter-rouge">svn add .
--force</code></li>
+ <li><code class="language-plaintext highlighter-rouge">svn ci -m "add nexus
jars to dist/dev/datasketches"</code></li>
</ul>
<h4 id="non-java">NON-JAVA</h4>
@@ -703,11 +703,11 @@ Also specifies line separator characters for text
files.</li>
<ul>
<li>Open Terminal #1
<ul>
- <li>Confirm you are in the <code
class="highlighter-rouge">/dev/</code> directory: <code
class="highlighter-rouge">pwd</code></li>
+ <li>Confirm you are in the <code class="language-plaintext
highlighter-rouge">/dev/</code> directory: <code class="language-plaintext
highlighter-rouge">pwd</code></li>
<li>Perform SVN Checkout:
<ul>
- <li><code class="highlighter-rouge">svn co
https://dist.apache.org/repos/dist/dev/datasketches/ .</code> #note dot at
end</li>
- <li><code class="highlighter-rouge">svn status</code> #make sure
checkout is clean: does not list any (?) or (!) files</li>
+ <li><code class="language-plaintext highlighter-rouge">svn co
https://dist.apache.org/repos/dist/dev/datasketches/ .</code> #note dot at
end</li>
+ <li><code class="language-plaintext highlighter-rouge">svn
status</code> #make sure checkout is clean: does not list any (?) or (!)
files</li>
</ul>
</li>
</ul>
@@ -718,16 +718,16 @@ Also specifies line separator characters for text
files.</li>
<ul>
<li>Open Terminal #2
<ul>
- <li>Confirm you are in the <code
class="highlighter-rouge">/release/</code> directory: <code
class="highlighter-rouge">pwd</code></li>
+ <li>Confirm you are in the <code class="language-plaintext
highlighter-rouge">/release/</code> directory: <code class="language-plaintext
highlighter-rouge">pwd</code></li>
<li>Perform SVN Checkout:
<ul>
- <li><code class="highlighter-rouge">svn co
https://dist.apache.org/repos/dist/release/datasketches/ .</code> #note dot at
end</li>
- <li><code class="highlighter-rouge">svn status</code> #make sure
checkout is clean: does not list any (?) or (!) files</li>
+ <li><code class="language-plaintext highlighter-rouge">svn co
https://dist.apache.org/repos/dist/release/datasketches/ .</code> #note dot at
end</li>
+ <li><code class="language-plaintext highlighter-rouge">svn
status</code> #make sure checkout is clean: does not list any (?) or (!)
files</li>
</ul>
</li>
<li>Create new version directory under appropriate component
directory:
<ul>
- <li><code class="highlighter-rouge">mkdir -p
<component>/A.B.0</code></li>
+ <li><code class="language-plaintext highlighter-rouge">mkdir -p
<component>/A.B.0</code></li>
</ul>
</li>
</ul>
@@ -741,12 +741,12 @@ Also specifies line separator characters for text
files.</li>
</li>
<li>Using Terminal #2 at … /dist/release/datasketches directory:
<ul>
- <li><code class="highlighter-rouge">svn add . --force</code></li>
- <li><code class="highlighter-rouge">svn ci -m "Release
A.B.0"</code></li>
+ <li><code class="language-plaintext highlighter-rouge">svn add .
--force</code></li>
+ <li><code class="language-plaintext highlighter-rouge">svn ci -m
"Release A.B.0"</code></li>
<li>Remove the prior release…</li>
- <li><code class="highlighter-rouge">svn remove
<component>/X.Y.0</code></li>
- <li><code class="highlighter-rouge">svn ci -m "Remove Prior
release"</code></li>
- <li><code class="highlighter-rouge">svn status</code> # should be
empty</li>
+ <li><code class="language-plaintext highlighter-rouge">svn remove
<component>/X.Y.0</code></li>
+ <li><code class="language-plaintext highlighter-rouge">svn ci -m
"Remove Prior release"</code></li>
+ <li><code class="language-plaintext highlighter-rouge">svn
status</code> # should be empty</li>
</ul>
</li>
<li>Using local file system
@@ -785,8 +785,8 @@ reflected in the master/main. Use the <strong>git
cherry-pick</strong> command f
<h3 id="update-apache-reporter">Update Apache Reporter</h3>
<ul>
- <li>Because of the commit to the <code
class="highlighter-rouge">dist/release</code> branch, you should get an
automated email requesting you to update the Apache DataBase about the
release. The email should point you to the <a
href="https://reporter.apache.org/addrelease.html?datasketches">Apache
Committee Report Helper</a>. You can choose to go there directly without
waiting for the notice; there is only one box to fill out.</li>
- <li>Update the full name of the component release. For example: <code
class="highlighter-rouge">Apache datasketches-memory-1.3.0</code></li>
+ <li>Because of the commit to the <code class="language-plaintext
highlighter-rouge">dist/release</code> branch, you should get an automated
email requesting you to update the Apache DataBase about the release. The
email should point you to the <a
href="https://reporter.apache.org/addrelease.html?datasketches">Apache
Committee Report Helper</a>. You can choose to go there directly without
waiting for the notice; there is only one box to fill out.</li>
+ <li>Update the full name of the component release. For example: <code
class="language-plaintext highlighter-rouge">Apache
datasketches-memory-1.3.0</code></li>
</ul>
<h3 id="create--document-release-tag-on-github">Create & Document Release
Tag on GitHub</h3>
@@ -796,7 +796,7 @@ reflected in the master/main. Use the <strong>git
cherry-pick</strong> command f
<li>At that same GitHub ID hash, create a new tag A.B.0 (without the
RCn).</li>
<li>From the Command Line: Push the new tag to origin:
<ul>
- <li><code class="highlighter-rouge">git push origin --tags</code></li>
+ <li><code class="language-plaintext highlighter-rouge">git push origin
--tags</code></li>
</ul>
</li>
<li>On the GitHub component site document the release</li>
@@ -809,7 +809,7 @@ reflected in the master/main. Use the <strong>git
cherry-pick</strong> command f
<li>Make sure your local website directory is pointing to master/main and
up-to-date.</li>
<li>Run the following with the argument specifying the location of your
local website directory:
<ul>
- <li><code class="highlighter-rouge">./createDownloadsInclude.sh
/Users/<name>/ ... /datasketches-website</code></li>
+ <li><code class="language-plaintext
highlighter-rouge">./createDownloadsInclude.sh /Users/<name>/ ...
/datasketches-website</code></li>
</ul>
</li>
<li>When this is done, be sure to commit the changes to the website.</li>
diff --git a/output/docs/Frequency/FrequencySketches.html
b/output/docs/Frequency/FrequencySketches.html
index 75947a84..9ca50f02 100644
--- a/output/docs/Frequency/FrequencySketches.html
+++ b/output/docs/Frequency/FrequencySketches.html
@@ -387,7 +387,7 @@ that are being downloaded from your store.</p>
<p>This is a perfect use-case for the frequencies/ItemsSketch, which is a
Generic class that can be configured to
count the number of occurrences of any arbitrary item. In this case our
song-titles are strings. For example,</p>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>ItemsSketch<String> sketch = new
ItemsSketch<String>();
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>ItemsSketch<String> sketch = new
ItemsSketch<String>();
while (remainingItems) { sketch.update("songTitle"); }
</code></pre></div></div>
@@ -400,7 +400,7 @@ a list of the “most frequently occurring” song titles with
an approximate co
revenue. In this case each item can carry a different “weight” which is the
price. We can use the same sketch as before, but
we update it using a “weight”.</p>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>ItemsSketch<String> sketch = new
ItemsSketch<String>();
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>ItemsSketch<String> sketch = new
ItemsSketch<String>();
while (remainingItems) { sketch.update("songTitle", priceCents); }
</code></pre></div></div>
diff --git a/output/docs/Frequency/FrequentDistinctTuplesSketch.html
b/output/docs/Frequency/FrequentDistinctTuplesSketch.html
index 2e989dce..3089e27e 100644
--- a/output/docs/Frequency/FrequentDistinctTuplesSketch.html
+++ b/output/docs/Frequency/FrequentDistinctTuplesSketch.html
@@ -449,13 +449,13 @@ Let’s assume <i>N = 2</i> and let <i>d1 := IP
address</i>, and <i>d2 := User I
such that items with distinct frequency counts greater than 1% of the total
distinct population will be
detectable and have a distinct frequency estimation error of less than or
equal to 5% with 86% confidence.</p>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>//Construct the sketch
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>//Construct the sketch
FdtSketch sketch = new FdtSketch(0.01, 0.05);
</code></pre></div></div>
<p>Assume the incoming data is a stream of {IP address, User ID} pairs:</p>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>//Populate
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>//Populate
while (inputStream.hasRemainingItems()) {
String[] in = new String[] {Pair.IPaddress, Pair.userID};
sketch.update(in);
@@ -464,7 +464,7 @@ while (inputStream.hasRemainingItems()) {
<p>We are done populating the sketch, now we post process the data in the
sketch:</p>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>int[] priKeyIndices = new int[] {0}; //identifies the
IP address as the primary key
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>int[] priKeyIndices = new int[] {0}; //identifies the
IP address as the primary key
int numStdDev = 2; //for 95% confidence intervals
int limit = 20; //list only the top 20 groups
char sep = '|'; //the separator character for the group dimensions as strings
@@ -478,7 +478,7 @@ while (itr.hasNext()) {
<p>If we want the converse relation we assign the UserID as the primary key.
Note that we do not have to repopulate the sketch!</p>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>int[] priKeyIndices = new int[] {1}; //identifies the
User ID as the primary key
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>int[] priKeyIndices = new int[] {1}; //identifies the
User ID as the primary key
...
</code></pre></div></div>
@@ -533,7 +533,7 @@ input stream of over 100K groups this graph is a view of
the top 500, which is m
<p>The following table is the list of the top 10 results from just one of the
trials. The Group class was extended to include more columns at the end which
were useful for this study. (This was easy to do and does not require any
special access.)</p>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>Count Est UB LB Thresh
RSE PriKey xG yU Err
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>Count Est UB LB Thresh
RSE PriKey xG yU Err
1338 16511.86 16957.05 16078.18 0.019521 0.026962 1,1,16384
1 16384 0.007804
666 8218.91 8536.72 7912.67 0.009717 0.038668 2,1,8192
2 8192 0.003285
660 8144.87 8461.30 7840.01 0.009629 0.038850 2,2,8192
2 8192 -0.005754
diff --git a/output/docs/Frequency/FrequentItemsCppExample.html
b/output/docs/Frequency/FrequentItemsCppExample.html
index 86ae3b80..9d91b571 100644
--- a/output/docs/Frequency/FrequentItemsCppExample.html
+++ b/output/docs/Frequency/FrequentItemsCppExample.html
@@ -343,7 +343,7 @@
-->
<h1 id="frequent-items-sketch-c-example">Frequent Items Sketch C++ Example</h1>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>#include <iostream>
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>#include <iostream>
#include <fstream>
#include <frequent_items_sketch.hpp>
diff --git a/output/docs/Frequency/FrequentItemsHiveUDFs.html
b/output/docs/Frequency/FrequentItemsHiveUDFs.html
index bc7a2347..4ba16928 100644
--- a/output/docs/Frequency/FrequentItemsHiveUDFs.html
+++ b/output/docs/Frequency/FrequentItemsHiveUDFs.html
@@ -343,7 +343,7 @@
-->
<h2 id="frequent-items-sketch-hive-udfs">Frequent Items Sketch Hive UDFs</h2>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>add jar datasketches-memory-2.0.0.jar;
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>add jar datasketches-memory-2.0.0.jar;
add jar datasketches-java-3.1.0.jar;
add jar datasketches-hive-1.2.0.jar;
diff --git a/output/docs/Frequency/FrequentItemsJavaExample.html
b/output/docs/Frequency/FrequentItemsJavaExample.html
index c8594591..7bd2c790 100644
--- a/output/docs/Frequency/FrequentItemsJavaExample.html
+++ b/output/docs/Frequency/FrequentItemsJavaExample.html
@@ -343,7 +343,7 @@
-->
<h1 id="frequent-items-sketch-java-example">Frequent Items Sketch Java
Example</h1>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>import java.io.FileInputStream;
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>import java.io.FileInputStream;
import java.io.FileOutputStream;
import org.apache.datasketches.memory.Memory;
import org.apache.datasketches.ArrayOfStringsSerDe;
diff --git a/output/docs/Frequency/FrequentItemsPigUDFs.html
b/output/docs/Frequency/FrequentItemsPigUDFs.html
index 20610745..582f62b1 100644
--- a/output/docs/Frequency/FrequentItemsPigUDFs.html
+++ b/output/docs/Frequency/FrequentItemsPigUDFs.html
@@ -356,7 +356,7 @@
<h3 id="frequent_itemspig-script">frequent_items.pig script</h3>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>register datasketches-memory-2.0.0.jar;
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>register datasketches-memory-2.0.0.jar;
register datasketches-java-3.1.0.jar;
register datasketches-pig-1.1.0.jar;
@@ -391,24 +391,24 @@ dump g;
<p>From ‘dump d’ (one sketch per category):</p>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>(c1,{(a,7,7,7),(d,2,2,2),(b,1,1,1)})
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>(c1,{(a,7,7,7),(d,2,2,2),(b,1,1,1)})
(c2,{(a,5,5,5),(d,2,2,2),(e,1,1,1),(c,1,1,1)})
</code></pre></div></div>
<p>From ‘dump g’ (merged across categories):</p>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>({(a,12,12,12),(d,4,4,4),(b,1,1,1),(e,1,1,1),(c,1,1,1)})
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>({(a,12,12,12),(d,4,4,4),(b,1,1,1),(e,1,1,1),(c,1,1,1)})
</code></pre></div></div>
<p>From ‘describe g’:</p>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>g: {bag_of_item_tuples: {item_tuple: (item:
chararray,estimate: long,lower_bound: long,upper_bound: long)}}
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>g: {bag_of_item_tuples: {item_tuple: (item:
chararray,estimate: long,lower_bound: long,upper_bound: long)}}
</code></pre></div></div>
<p>In this example the results are exact due to small input.</p>
<h3 id="datatxt-tab-separated"><a href="/docs/Frequency/data.txt">data.txt</a>
(tab separated)</h3>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>a c1
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>a c1
a c1
a c1
a c2
diff --git a/output/docs/Graduation/GraduationCheckList.html
b/output/docs/Graduation/GraduationCheckList.html
index 09f218fc..1add3833 100644
--- a/output/docs/Graduation/GraduationCheckList.html
+++ b/output/docs/Graduation/GraduationCheckList.html
@@ -367,7 +367,7 @@
</ul>
<h2 id="define-a-charter---done">Define a Charter - Done</h2>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>Apache DataSketches consists of software related to
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>Apache DataSketches consists of software related to
an open source, high-performance library of
streaming algorithms commonly called "sketches" in the data sciences.
Sketches are small, stateful programs that process massive data as a
@@ -537,7 +537,7 @@ PMC Chair Elect, Committer</p>
<p>Resolution draft:</p>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code> ESTABLISH THE APACHE DATASKETCHES PROJECT
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code> ESTABLISH THE APACHE DATASKETCHES PROJECT
WHEREAS, the Board of Directors deems it to be in the best interests of
the Foundation and consistent with the Foundation's purpose to
diff --git a/output/docs/HLL/HllCppExample.html
b/output/docs/HLL/HllCppExample.html
index 7ec8b9ef..7d1d098a 100644
--- a/output/docs/HLL/HllCppExample.html
+++ b/output/docs/HLL/HllCppExample.html
@@ -343,7 +343,7 @@
-->
<h1 id="hyper-log-log-sketch-c-example">Hyper Log Log Sketch C++ Example</h1>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>#include <iostream>
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>#include <iostream>
#include <fstream>
#include <hll.hpp>
diff --git a/output/docs/HLL/HllHiveUDFs.html b/output/docs/HLL/HllHiveUDFs.html
index d3ac61c1..6539f668 100644
--- a/output/docs/HLL/HllHiveUDFs.html
+++ b/output/docs/HLL/HllHiveUDFs.html
@@ -345,7 +345,7 @@
<h3 id="building-sketches-computing-unions-and-getting-estimates">Building
sketches, computing unions and getting estimates</h3>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>add jar datasketches-memory-2.0.0.jar;
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>add jar datasketches-memory-2.0.0.jar;
add jar datasketches-java-3.1.0.jar;
add jar datasketches-hive-1.2.0.jar;
@@ -383,7 +383,7 @@ Output:
<p>Notice the difference between UnionUDF in this example, which takes two
sketches, and UnionUDAF in the previous example, which is an aggregate function
taking a collection of sketches as one parameter.</p>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>add jar datasketches-memory-2.0.0.jar;
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>add jar datasketches-memory-2.0.0.jar;
add jar datasketches-java-3.1.0.jar;
add jar datasketches-hive-1.2.0.jar;
diff --git a/output/docs/HLL/HllJavaExample.html
b/output/docs/HLL/HllJavaExample.html
index 88a33e96..e441e9c9 100644
--- a/output/docs/HLL/HllJavaExample.html
+++ b/output/docs/HLL/HllJavaExample.html
@@ -343,7 +343,7 @@
-->
<h1 id="hyper-log-log-sketch-java-example">Hyper Log Log Sketch Java
Example</h1>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>import java.io.FileInputStream;
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>import java.io.FileInputStream;
import java.io.FileOutputStream;
import org.apache.datasketches.memory.Memory;
import org.apache.datasketches.hll.HllSketch;
diff --git a/output/docs/HLL/HllMap.html b/output/docs/HLL/HllMap.html
index cb89d594..c80fac89 100644
--- a/output/docs/HLL/HllMap.html
+++ b/output/docs/HLL/HllMap.html
@@ -371,7 +371,7 @@ count estimation (see method
getAverageSketchMemoryPerKey()) is about 10 bytes.<
<p>Assuming that the frequency distribution of your identifiers per key is
roughly power-law with a log-log slope of about -1, the total storage required
for this sketch can be roughly computed as</p>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>Size = (K + S) * T
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>Size = (K + S) * T
where K = Key size in bytes
S = Average HLL Sketch Memory Per Key
T = Total number of keys seen by the sketch
diff --git a/output/docs/HLL/HllPigUDFs.html b/output/docs/HLL/HllPigUDFs.html
index 1ca3edf2..9eb88335 100644
--- a/output/docs/HLL/HllPigUDFs.html
+++ b/output/docs/HLL/HllPigUDFs.html
@@ -358,7 +358,7 @@
<h3 id="hllpig-script-build-sketches-union-sketches-and-get-estimates">hll.pig
script: build sketches, union sketches and get estimates</h3>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>register datasketches-memory-2.0.0.jar;
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>register datasketches-memory-2.0.0.jar;
register datasketches-java-3.1.0.jar;
register datasketches-pig-1.1.0.jar;
@@ -399,18 +399,18 @@ Most of the IDs in these categories overlap, so that
there are 60 unique IDs in
<p>Results:
From ‘dump d’ (unique count estimate per category):</p>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>(a,50.00000608464168)
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>(a,50.00000608464168)
(b,50.00000608464168)
</code></pre></div></div>
<p>From ‘dump g’ (union across categories):</p>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>(60.00000879168661)
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>(60.00000879168661)
</code></pre></div></div>
<p>From ‘dump h’:</p>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>(### HLL SKETCH SUMMARY:
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>(### HLL SKETCH SUMMARY:
Log Config K : 10
Hll Target : HLL_4
Current Mode : SET
@@ -423,7 +423,7 @@ From ‘dump d’ (unique count estimate per category):</p>
<h3
id="hll_unionpig-script-union-sketches-from-different-columns">hll_union.pig
script: union sketches from different columns</h3>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>register datasketches-memory-2.0.0.jar;
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>register datasketches-memory-2.0.0.jar;
register datasketches-java-3.1.0.jar;
register datasketches-pig-1.1.0.jar;
@@ -451,7 +451,7 @@ dump e;
<p>Result:</p>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>(10.000000223517425,12.000000327825557,18.00000075995926)
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>(10.000000223517425,12.000000327825557,18.00000075995926)
</code></pre></div></div>
</div> <!-- End content -->
diff --git a/output/docs/HLL/HllSketchVsDruidHyperLogLogCollector.html
b/output/docs/HLL/HllSketchVsDruidHyperLogLogCollector.html
index e6fda2d4..4f799814 100644
--- a/output/docs/HLL/HllSketchVsDruidHyperLogLogCollector.html
+++ b/output/docs/HLL/HllSketchVsDruidHyperLogLogCollector.html
@@ -376,7 +376,7 @@
<p>The test consisted of 100 trials of merging 8192 sketches into a single
sketch. Each of the input sketches had been updated with 32768 unique values.
The error is computed</p>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>Sketch Size: lgK = 11, k = 2048 True count:
2.68435456E8
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>Sketch Size: lgK = 11, k = 2048 True count:
2.68435456E8
Distinct keys per sketch = 32768
Number of sketches = 8192
Number of trials = 100 RSE Specification for this size sketch = 1.03896 /
SQRT(2^11) = 0.023 = +/- 2.3% at 68% confidence.
diff --git a/output/docs/HLL/HllSketches.html b/output/docs/HLL/HllSketches.html
index 4f838ef9..293f1a37 100644
--- a/output/docs/HLL/HllSketches.html
+++ b/output/docs/HLL/HllSketches.html
@@ -377,7 +377,7 @@ accuracy than the Theta Sketches mentioned above, and the
CPC sketch is another
<p>A new HLL sketch is created with a simple constructor:</p>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>int lgK = 12; //This is log-base2 of k, so k = 4096.
lgK can be from 4 to 21
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>int lgK = 12; //This is log-base2 of k, so k = 4096.
lgK can be from 4 to 21
HllSketch sketch = new HllSketch(lgK); //TgtHllType.HLL_4 is the default
//OR
HllSketch sketch = new HllSketch(lgK, TgtHllType.HLL_6);
@@ -393,7 +393,7 @@ The HLL_4 is the most compressed but generally slightly
slower than the other tw
<p>All three types share the same API. Updating the HllSketch is very
simple:</p>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>long n = 1000000;
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>long n = 1000000;
for (int i = 0; i < n; i++) {
sketch.update(i);
}
@@ -405,7 +405,7 @@ initialized with a seed so that the sequence is
deterministic.</p>
<p>Obtaining the cardinality results from the sketch is also simple:</p>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>double estimate = sketch.getEstimate();
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>double estimate = sketch.getEstimate();
double estUB = sketch.getUpperBound(1.0); //the upper bound at 1 standard
deviation.
double estLB = sketch.getLowerBound(1.0); //the lower bound at 1 standard
deviation.
//OR
@@ -414,7 +414,7 @@ System.out.println(sketch.toString()); //will output a
summary of the sketch.
<p>Which produces a console output something like this:</p>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>### HLL SKETCH SUMMARY:
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>### HLL SKETCH SUMMARY:
Log Config K : 12
Hll Target : HLL_4
Current Mode : HLL
@@ -437,9 +437,9 @@ The following plot was generated with <i>LgK</i> = 14 using
2<sup>20</sup> trial
<p>The <i>Factor = 0.8326</i> is directly relatable to the Flajolet alpha
factor of 1.04.
As a result, this plot demonstrates that this implementation of the HLL sketch
will be about 20% = (1 - 0.8326/1.04) more accurate than a conventional HLL
sketch using Flajolet’s estimators (or derived estimators).
-This is partially due to the use of the HIP estimator<sup id="fnref:1"><a
href="#fn:1" class="footnote">1</a></sup> for range above the transition point,
which occurs at about 1500 on the graph.
+This is partially due to the use of the HIP estimator<sup id="fnref:1"
role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>
for range above the transition point, which occurs at about 1500 on the graph.
Below this transition point the error is near zero (an RSE of about 50
ppm), which is far better than any known implementation of HLL.
-This is due to a newly developed theory and estimator developed by Kevin
Lang<sup id="fnref:2"><a href="#fn:2" class="footnote">2</a></sup>.</p>
+This is due to a newly developed theory and estimator developed by Kevin
Lang<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote"
rel="footnote">2</a></sup>.</p>
<p>The base Relative Standard Error (RSE) for this sketch (at LgK = 14) is
0.0065 = 0.8326 / sqrt(2<sup>14</sup>).
The horizontal gridlines are configured to be +/- multiples of the base
RSE.</p>
@@ -509,13 +509,13 @@ or difference operations with reasonable accuracy.
Instead, use the Theta sketc
<hr />
-<div class="footnotes">
+<div class="footnotes" role="doc-endnotes">
<ol>
- <li id="fn:1">
- <p>Edith Cohen, All-Distances Sketches, Revisited: HIP Estimators for
Massive Graphs Analysis, PODS 2014. <a href="#fnref:1"
class="reversefootnote">↩</a></p>
+ <li id="fn:1" role="doc-endnote">
+ <p>Edith Cohen, All-Distances Sketches, Revisited: HIP Estimators for
Massive Graphs Analysis, PODS 2014. <a href="#fnref:1" class="reversefootnote"
role="doc-backlink">↩</a></p>
</li>
- <li id="fn:2">
- <p>Kevin Lang, Back to the Future: an Even More Nearly Optimal
Cardinality Estimation Algorithm. https://arxiv.org/abs/1708.06839 <a
href="#fnref:2" class="reversefootnote">↩</a></p>
+ <li id="fn:2" role="doc-endnote">
+ <p>Kevin Lang, Back to the Future: an Even More Nearly Optimal
Cardinality Estimation Algorithm. https://arxiv.org/abs/1708.06839 <a
href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
diff --git a/output/docs/KLL/KLLCppExample.html
b/output/docs/KLL/KLLCppExample.html
index 6fedd379..deb62b12 100644
--- a/output/docs/KLL/KLLCppExample.html
+++ b/output/docs/KLL/KLLCppExample.html
@@ -343,7 +343,7 @@
-->
<h1 id="kll-sketch-c-example">KLL Sketch C++ Example</h1>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>#include <iostream>
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>#include <iostream>
#include <fstream>
#include <random>
#include <chrono>
diff --git a/output/docs/KLL/KLLSketch.html b/output/docs/KLL/KLLSketch.html
index 87673dbd..a0ada927 100644
--- a/output/docs/KLL/KLLSketch.html
+++ b/output/docs/KLL/KLLSketch.html
@@ -475,7 +475,7 @@ The classes are organized in an inheritance hierarchy as
follows:</p>
<p><a id="simple-example"></a></p>
<h3 id="simple-java-kll-floats-example">Simple Java KLL Floats example</h3>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>import org.apache.datasketches.kll.KllFloatsSketch;
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>import org.apache.datasketches.kll.KllFloatsSketch;
KllFloatsSketch sketch = KllFloatsSketch.newHeapInstance();
int n = 1000000;
diff --git a/output/docs/KLL/UnderstandingKLLBounds.html
b/output/docs/KLL/UnderstandingKLLBounds.html
index 68251687..cc15ae5b 100644
--- a/output/docs/KLL/UnderstandingKLLBounds.html
+++ b/output/docs/KLL/UnderstandingKLLBounds.html
@@ -405,7 +405,7 @@
<h2 id="the-output">The Output</h2>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>Normalized Rank Error: +/- 0.0067762427270138155
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>Normalized Rank Error: +/- 0.0067762427270138155
Absolute Rank Error : +/- 6.776242727013815
q1: 500.0
diff --git a/output/docs/Memory/MemoryPerformance.html
b/output/docs/Memory/MemoryPerformance.html
index 1fc7dfe2..1cde5789 100644
--- a/output/docs/Memory/MemoryPerformance.html
+++ b/output/docs/Memory/MemoryPerformance.html
@@ -349,14 +349,14 @@ Each test of size k was repeated up to millions of times
to reduce noise in the
<p>The inner timing loop for writing into a java array:</p>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code> startTime_nS = System.nanoTime();
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code> startTime_nS = System.nanoTime();
for (int i=0; i<arrLongs; i++) { array[i] = i; }
stopTime_nS = System.nanoTime();
</code></pre></div></div>
<p>The inner timing loop for reading from that array:</p>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code> startTime_nS = System.nanoTime();
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code> startTime_nS = System.nanoTime();
for (int i=0; i<arrLongs; i++) { trialSum += array[i]; }
stopTime_nS = System.nanoTime();
</code></pre></div></div>
diff --git a/output/docs/Quantiles/QuantilesHiveUDFs.html
b/output/docs/Quantiles/QuantilesHiveUDFs.html
index 765c9a91..06f1cbba 100644
--- a/output/docs/Quantiles/QuantilesHiveUDFs.html
+++ b/output/docs/Quantiles/QuantilesHiveUDFs.html
@@ -345,7 +345,7 @@
<h3 id="doublessketch-example">DoublesSketch example</h3>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>add jar datasketches-memory-2.0.0.jar;
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>add jar datasketches-memory-2.0.0.jar;
add jar datasketches-java-3.1.0.jar;
add jar datasketches-hive-1.2.0.jar;
diff --git a/output/docs/Quantiles/QuantilesJavaExample.html
b/output/docs/Quantiles/QuantilesJavaExample.html
index 78a26281..35501da8 100644
--- a/output/docs/Quantiles/QuantilesJavaExample.html
+++ b/output/docs/Quantiles/QuantilesJavaExample.html
@@ -343,7 +343,7 @@
-->
<h1 id="quantilesdoublessketch-java-example">Quantiles/DoublesSketch Java
Example</h1>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>// simplified file operations and no error handling for
clarity
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>// simplified file operations and no error handling for
clarity
import java.io.FileInputStream;
import java.io.FileOutputStream;
diff --git a/output/docs/Quantiles/QuantilesPigUDFs.html
b/output/docs/Quantiles/QuantilesPigUDFs.html
index 51d4cf1a..9465c03d 100644
--- a/output/docs/Quantiles/QuantilesPigUDFs.html
+++ b/output/docs/Quantiles/QuantilesPigUDFs.html
@@ -356,7 +356,7 @@
<h3 id="quantilespig-script">quantiles.pig script</h3>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>register datasketches-memory-2.0.0.jar;
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>register datasketches-memory-2.0.0.jar;
register datasketches-java-3.1.0.jar;
register datasketches-pig-1.1.0.jar;
@@ -387,17 +387,17 @@ dump g;
<p>From ‘dump d’:</p>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>(a,6.0)
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>(a,6.0)
(b,16.0)
</code></pre></div></div>
<p>From ‘dump g’ (merged across categories):</p>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>(11.0)
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>(11.0)
</code></pre></div></div>
<h3 id="datatxt-tab-separated"><a href="/docs/Quantiles/data.txt">data.txt</a>
(tab separated)</h3>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>1 a
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>1 a
2 a
3 a
4 a
diff --git a/output/docs/Quantiles/QuantilesSketchOverview.html
b/output/docs/Quantiles/QuantilesSketchOverview.html
index 9b2447d3..548635d3 100644
--- a/output/docs/Quantiles/QuantilesSketchOverview.html
+++ b/output/docs/Quantiles/QuantilesSketchOverview.html
@@ -373,7 +373,7 @@ Probability Mass Function, getPMF(), and the Cumulative
Distribution Function, g
<p>The relevant pseudo-code snippets would look something like this:</p>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>int k = 256; //256 gives < 1% normalized rank error
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>int k = 256; //256 gives < 1% normalized rank error
UpdateDoublesSketch sketch = DoublesSketch.builder().setK(k).build();
while ( remainingValuesExist ) { //stream in all the values, one by one
@@ -399,7 +399,7 @@ the upper and lower bounds returned from the sketch.</p>
<p>In this case the values ranged from one to 1.8 million, which is a little
over 6 orders-of-magnitude. (There are zero values in the raw data, which often
happens, but they can be ignored in
this analysis.) In order to plot such a large dynamic range I used a log
X-axis and a plot resolution of 5 points per factor of 10. Then I computed 36
equally spaced (on the log axis) split points with values from 1.0 to 1E7.
These 36 split points are then provided to the getPMF() function:</p>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>double[] splitpoints = {1.00, 1.58, ... , 6.3E6, 1E7};
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>double[] splitpoints = {1.00, 1.58, ... , 6.3E6, 1E7};
double[] pmf = sketch.getPMF(splitpoints);
</code></pre></div></div>
@@ -425,7 +425,7 @@ way off.</p>
<h4 id="median-and-top-quartile">Median and Top Quartile</h4>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>UpdateDoublesSketch qs =
DoublesSketch.builder().build(); //default k = 128
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>UpdateDoublesSketch qs =
DoublesSketch.builder().build(); //default k = 128
for (int i=0; i < 1000000; i++) { //stream length is generally unknown
qs.update(i); //load the sketch
@@ -444,7 +444,7 @@ Median = 500087.0
<h4 id="simple-frequency-histogram">Simple Frequency Histogram</h4>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>UpdateDoublesSketch qs =
DoublesSketch.builder().build(); //default k = 128
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>UpdateDoublesSketch qs =
DoublesSketch.builder().build(); //default k = 128
for (int i=0; i < 1000000; i++) { //stream length is generally unknown
qs.update(i); //load the sketch
@@ -475,7 +475,7 @@ System.out.println(freq + " >= "+ splitPoints[bins-2]);
<h4 id="merging-quantile-sketches">Merging Quantile Sketches</h4>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>UpdateDoublesSketch qs1 =
DoublesSketch.builder().build(); //default k = 128
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>UpdateDoublesSketch qs1 =
DoublesSketch.builder().build(); //default k = 128
UpdateDoublesSketch qs2 = DoublesSketch.builder().build();
long size = 1000000; //generally unknown
for (int i=0; i < size; i++) { //update each value into the sketch
@@ -520,7 +520,7 @@ The task is to figure out how to equally partition your
data.</p>
<p>First you build a comparator for MyItem:</p>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>public class MyComparator implements
java.util.Comparator<MyItem> {
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>public class MyComparator implements
java.util.Comparator<MyItem> {
@Override
public int compare(MyItem item1, MyItem item2) {
//compute equivalent to ...
@@ -533,7 +533,7 @@ The task is to figure out how to equally partition your
data.</p>
Serialization and deserialization are required to move sketch images across
JVMs.
The methods in this class are called by the sketch toByteArray() and sketch
constructor as necessary.</p>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>import org.apache.datasketches.ArrayOfItemsSerDe;
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>import org.apache.datasketches.ArrayOfItemsSerDe;
import org.apache.datasketches.memory.Memory;
public class ArrayOfMyItemsSerDe extends ArrayOfItemsSerDe<MyItem> {
@@ -561,7 +561,7 @@ public class ArrayOfMyItemsSerDe extends
ArrayOfItemsSerDe<MyItem> {
<p>You are ready to feed all of MyItems into the sketch:</p>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>import org.apache.datasketches.quantiles.ItemsSketch;
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>import org.apache.datasketches.quantiles.ItemsSketch;
ItemsSketch<MyItem> sketch = ItemsSketch.getInstance(128, new
MyComparator());
while ( remainingItemsExist ) {
@@ -571,7 +571,7 @@ while ( remainingItemsExist ) {
<p>Then obtain the split point values that equally partition the data into 10
partitions.</p>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>double[] rankFractions = {0.1, 0.2, 0.3, 0.4, 0.5, 0.6,
0.7, 0.8, 0.9};
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>double[] rankFractions = {0.1, 0.2, 0.3, 0.4, 0.5, 0.6,
0.7, 0.8, 0.9};
MyItem[] itemSplitPoints = sketch.getQuantiles(rankFractions);
</code></pre></div></div>
diff --git a/output/docs/Sampling/ReservoirSamplingJava.html
b/output/docs/Sampling/ReservoirSamplingJava.html
index 67a405f9..d8845f0a 100644
--- a/output/docs/Sampling/ReservoirSamplingJava.html
+++ b/output/docs/Sampling/ReservoirSamplingJava.html
@@ -343,7 +343,7 @@
-->
<h1 id="reservoir-sampling-java-example">Reservoir Sampling Java Example</h1>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>// simplified file operations and no error handling for
clarity
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>// simplified file operations and no error handling for
clarity
import java.io.File;
import java.io.FileInputStream;
@@ -409,7 +409,7 @@ import org.apache.datasketches.sampling.ReservoirItemsUnion;
<p>Output:</p>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>### ReservoirItemsSketch SUMMARY:
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>### ReservoirItemsSketch SUMMARY:
k : 8192
n : 200000
Current size : 8192
diff --git a/output/docs/Sampling/ReservoirSamplingPigUDFs.html
b/output/docs/Sampling/ReservoirSamplingPigUDFs.html
index 7c047fb0..85c6b61f 100644
--- a/output/docs/Sampling/ReservoirSamplingPigUDFs.html
+++ b/output/docs/Sampling/ReservoirSamplingPigUDFs.html
@@ -356,7 +356,7 @@
<h3 id="reservoir_examplepig-script">reservoir_example.pig script</h3>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>register datasketches-memory-2.0.0.jar;
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>register datasketches-memory-2.0.0.jar;
register datasketches-java-3.1.0.jar;
register datasketches-pig-1.1.0.jar;
@@ -400,18 +400,18 @@ DESCRIBE result;
<p>From ‘DUMP result’:</p>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>(24,4,{(30.0,h),(7.0,g),(6.0,f),(5.0,e)})
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>(24,4,{(30.0,h),(7.0,g),(6.0,f),(5.0,e)})
</code></pre></div></div>
<p>Running this script many times, we will see each element appear with equal
probability.</p>
<p>From ‘DESCRIBE result’:</p>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>result: {n: long,k: int,samples: {(scale: double,label:
chararray)}}
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>result: {n: long,k: int,samples: {(scale: double,label:
chararray)}}
</code></pre></div></div>
<h3 id="datatxt-tab-separated"><a href="/docs/Sampling/data.txt">data.txt</a>
(tab separated)</h3>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>1.0 a
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>1.0 a
2.0 b
3.0 c
4.0 d
diff --git a/output/docs/Sampling/VarOptPigUDFs.html
b/output/docs/Sampling/VarOptPigUDFs.html
index 2e3dd0b6..e365c457 100644
--- a/output/docs/Sampling/VarOptPigUDFs.html
+++ b/output/docs/Sampling/VarOptPigUDFs.html
@@ -356,7 +356,7 @@
<h3 id="varopt_examplepig-script">varopt_example.pig script</h3>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>register datasketches-memory-2.0.0.jar;
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>register datasketches-memory-2.0.0.jar;
register datasketches-java-3.1.0.jar;
register datasketches-pig-1.1.0.jar;
@@ -405,7 +405,7 @@ DESCRIBE result;
<p>From ‘DUMP result’:</p>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>(30.0,(30.0,h))
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>(30.0,(30.0,h))
(30.0,(30.0,h))
(28.0,(4.0,d))
(28.0,(7.0,g))
@@ -420,11 +420,11 @@ tuple the original weight value is also stored.</p>
<p>From ‘DESCRIBE result’:</p>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>result: {vo_weight: double,record: (id:
bytearray,weight: bytearray)}
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>result: {vo_weight: double,record: (id:
bytearray,weight: bytearray)}
</code></pre></div></div>
<h3 id="datatxt-tab-separated"><a href="/docs/Sampling/data.txt">data.txt</a>
(tab separated)</h3>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>1.0 a
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>1.0 a
2.0 b
3.0 c
4.0 d
diff --git a/output/docs/Sampling/VarOptSamplingJava.html
b/output/docs/Sampling/VarOptSamplingJava.html
index ee9dce63..86878ba8 100644
--- a/output/docs/Sampling/VarOptSamplingJava.html
+++ b/output/docs/Sampling/VarOptSamplingJava.html
@@ -348,12 +348,12 @@ Romeo and Juliet and Hamlet. The scripts, available from
various
sources including http://shakespeare.mit.edu/, were converted to
(word, count) files with the following Perl command:</p>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>perl -lane 's/^\s+//; s/[;\.,!?:\x27\[\]&]//g;
s/--//g; s/\s+/\n/g; print lc if length > 0' input.txt | sort | uniq -c |
awk '{print $1 "\t" $2}' > output.txt
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>perl -lane 's/^\s+//; s/[;\.,!?:\x27\[\]&]//g;
s/--//g; s/\s+/\n/g; print lc if length > 0' input.txt | sort | uniq -c |
awk '{print $1 "\t" $2}' > output.txt
</code></pre></div></div>
<p>These were then used in the following example, slightly modified to remove
error handling for clarity. Serialization and deserialization are completely
parallel to the Reservoir Sampling sketch, and example code for that may be
found in those Java examples.</p>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>import java.io.BufferedReader;
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
@@ -407,7 +407,7 @@ private static VarOptItemsSketch<String>
loadFile(final String filename,
<p>Sample Output:</p>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>i 567.0
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>i 567.0
i 580.0
to 737.0
of 667.0
@@ -426,7 +426,7 @@ a 560.5978260869568
<p>Lower bound, estimate, upper bound:</p>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>[594.395954, 2242.391304, 5611.681344]
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>[594.395954, 2242.391304, 5611.681344]
</code></pre></div></div>
</div> <!-- End content -->
diff --git a/output/docs/Sampling/VarOptSamplingSketches.html
b/output/docs/Sampling/VarOptSamplingSketches.html
index 50bf920a..707a4c16 100644
--- a/output/docs/Sampling/VarOptSamplingSketches.html
+++ b/output/docs/Sampling/VarOptSamplingSketches.html
@@ -395,7 +395,7 @@ duplicate items if the input stream contains duplicates.</p>
<h3 id="more-on-varopt-sampling">More on VarOpt Sampling</h3>
-<p>The basic VarOpt algorithm was first presented by Cohen et al<sup
id="fnref:1"><a href="#fn:1" class="footnote">1</a></sup>. We have modified and
extended that work to allow for unions of VarOpt samples, producing a result
that remains a valid VarOpt sketch.</p>
+<p>The basic VarOpt algorithm was first presented by Cohen et al<sup
id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote"
rel="footnote">1</a></sup>. We have modified and extended that work to allow
for unions of VarOpt samples, producing a result that remains a valid VarOpt
sketch.</p>
<p>The underlying goal of VarOpt sampling is to provide the best possible
estimate of subset sums of items in the sample. As an example, we might select
a sample of size <tt>k</tt> from the ~3200 counties (a political administrative
region below the level of a state) in the United States, using the county
population as the weight. We could then apply a predicate to our sample – for
instance, counties in the state of California – and sum the resulting weights.
That sum is our estimate of t [...]
@@ -413,10 +413,10 @@ duplicate items if the input stream contains
duplicates.</p>
<p>When using VarOpt with uniform input weights, for instance giving all items
a weight of 1.0, the sketch will perform standard reservoir sampling.
Similarly, if operating over an infinite stream of items with finite weights,
there will also be no heavy items and the sketch will return a standard
weighted sample of items.</p>
-<div class="footnotes">
+<div class="footnotes" role="doc-endnotes">
<ol>
- <li id="fn:1">
- <p>E. Cohen, N. G. Duffield, H. Kaplan, C. Lund, M. Thorup, <em>Efficient
Stream Sampling for Variance-Optimal Estimation of Subset Sums</em>, SIAM J.
Comput. 40(5): 1402-1431, 2011. <a href="#fnref:1"
class="reversefootnote">↩</a></p>
+ <li id="fn:1" role="doc-endnote">
+ <p>E. Cohen, N. G. Duffield, H. Kaplan, C. Lund, M. Thorup, <em>Efficient
Stream Sampling for Variance-Optimal Estimation of Subset Sums</em>, SIAM J.
Comput. 40(5): 1402-1431, 2011. <a href="#fnref:1" class="reversefootnote"
role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
diff --git a/output/docs/SystemIntegrations/ApachePinotIntegration.html
b/output/docs/SystemIntegrations/ApachePinotIntegration.html
index dd55e31d..e59953ba 100644
--- a/output/docs/SystemIntegrations/ApachePinotIntegration.html
+++ b/output/docs/SystemIntegrations/ApachePinotIntegration.html
@@ -382,7 +382,7 @@
<div class="language-sql highlighter-rouge"><div class="highlight"><pre
class="highlight"><code><span class="k">select</span> <span
class="n">percentileRawKll</span><span class="p">(</span><span
class="n">ArrDelayMinutes</span><span class="p">,</span> <span
class="mi">90</span><span class="p">)</span> <span class="k">as</span> <span
class="n">sketch</span>
<span class="k">from</span> <span class="n">airlineStats</span>
</code></pre></div></div>
-<p>Returns Base64 encoded string: <code
class="highlighter-rouge">BQEPC...</code></p>
+<p>Returns Base64 encoded string: <code class="language-plaintext
highlighter-rouge">BQEPC...</code></p>
<p>Output can be processed as:</p>
diff --git a/output/docs/Theta/ConcurrentThetaSketch.html
b/output/docs/Theta/ConcurrentThetaSketch.html
index 1414549e..182f8eee 100644
--- a/output/docs/Theta/ConcurrentThetaSketch.html
+++ b/output/docs/Theta/ConcurrentThetaSketch.html
@@ -368,7 +368,7 @@ A <em>background propagation thread</em> continuously
merges full local sketches
However, it is important that the shared sketch is only used to get the
estimate, while updates only go through the local sketches.
The shared sketch can be allocated either off-heap or on-heap, while the local
sketch is always allocated on-heap.</p>
-<p>Like other Theta sketches, <code
class="highlighter-rouge">UpdateSketchBuilder</code> is used to build the
shared and local sketches.
+<p>Like other Theta sketches, <code class="language-plaintext
highlighter-rouge">UpdateSketchBuilder</code> is used to build the shared and
local sketches.
It is imperative that the shared sketch is built first.
Then, in the context of each application thread that feeds the data, a local
sketch is created and connected to the shared sketch.
This is a list of the configuration parameters for the builder:</p>
@@ -383,7 +383,7 @@ This is a list of the configuration parameters for the
builder:</p>
<h2 id="code-example-for-building-a-concurrent-theta-sketch">Code Example for
Building a Concurrent Theta Sketch</h2>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>import
org.apache.datasketches.memory.WritableDirectHandle;
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>import
org.apache.datasketches.memory.WritableDirectHandle;
import org.apache.datasketches.memory.WritableMemory;
import org.apache.datasketches.theta.Sketch;
import org.apache.datasketches.theta.UpdateSketch;
@@ -472,12 +472,12 @@ class WriterThread extends Thread {
<h2 id="serializing-a-concurrent-sketch">Serializing a Concurrent Sketch</h2>
<p>A concurrent sketch is not a single unit of computation. It is composed of
the shared sketch and the local buffers.
Only the shared sketch supports serialization as it captures the most
up-to-date content of the sketch.
-In the current implementation, deserializing a shared sketch yields an <code
class="highlighter-rouge">UpdateSketch</code>.
+In the current implementation, deserializing a shared sketch yields an <code
class="language-plaintext highlighter-rouge">UpdateSketch</code>.
Therefore, when deserializing a concurrent sketch, both the shared sketch and
the local buffers need to be re-created.</p>
<h2
id="code-example-for-serializing-and-deserializing-a-concurrent-theta-sketch">Code
Example for Serializing and Deserializing a Concurrent Theta Sketch</h2>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>import org.apache.datasketches.memory.WritableMemory;
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>import org.apache.datasketches.memory.WritableMemory;
import org.apache.datasketches.theta.Sketch;
import org.apache.datasketches.theta.UpdateSketch;
import org.apache.datasketches.theta.UpdateSketchBuilder;
diff --git a/output/docs/Theta/ThetaAccuracyPlots.html
b/output/docs/Theta/ThetaAccuracyPlots.html
index 61753ea9..f8bcbd4d 100644
--- a/output/docs/Theta/ThetaAccuracyPlots.html
+++ b/output/docs/Theta/ThetaAccuracyPlots.html
@@ -347,7 +347,7 @@
<p>A QuickSelect Sketch, which is the default sketch family, can be
constructed with code similar to:</p>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>int k = 4096; //the accuracy and
size are a function of k
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>int k = 4096; //the accuracy and
size are a function of k
UpdateSketch sketch = UpdateSketch.builder().build(k); //build an empty sketch
long u = 1000000; //The number of uniques fed to the sketch
diff --git a/output/docs/Theta/ThetaHiveUDFs.html
b/output/docs/Theta/ThetaHiveUDFs.html
index 81c6f746..81de669b 100644
--- a/output/docs/Theta/ThetaHiveUDFs.html
+++ b/output/docs/Theta/ThetaHiveUDFs.html
@@ -345,7 +345,7 @@
<h3 id="building-sketches-merging-sketches-and-getting-estimates">Building
sketches, merging sketches and getting estimates</h3>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>add jar datasketches-memory-2.0.0.jar;
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>add jar datasketches-memory-2.0.0.jar;
add jar datasketches-java-3.1.0.jar;
add jar datasketches-hive-1.2.0.jar;
@@ -379,7 +379,7 @@ Output:
<p>Notice the difference between UnionUDF in this example, which takes two
sketches, and UnionUDAF in the previous example, which is an aggregate function
taking a collection of sketches as one parameter. The same is true about
IntersectSketchUDF and IntersectSketchUDAF.</p>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>add jar datasketches-memory-2.0.0.jar;
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>add jar datasketches-memory-2.0.0.jar;
add jar datasketches-java-3.1.0.jar;
add jar datasketches-hive-1.2.0.jar;
diff --git a/output/docs/Theta/ThetaJavaExample.html
b/output/docs/Theta/ThetaJavaExample.html
index 6e3a76f8..403adc0f 100644
--- a/output/docs/Theta/ThetaJavaExample.html
+++ b/output/docs/Theta/ThetaJavaExample.html
@@ -343,7 +343,7 @@
-->
<h1 id="theta-sketch-java-example">Theta Sketch Java Example</h1>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>// simplified file operations and no error handling for
clarity
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>// simplified file operations and no error handling for
clarity
import java.io.FileInputStream;
import java.io.FileOutputStream;
diff --git a/output/docs/Theta/ThetaPigUDFs.html
b/output/docs/Theta/ThetaPigUDFs.html
index 604b7d13..419a103d 100644
--- a/output/docs/Theta/ThetaPigUDFs.html
+++ b/output/docs/Theta/ThetaPigUDFs.html
@@ -356,7 +356,7 @@
<h3
id="thetapig-script-building-sketches-merging-sketches-and-getting-estimates">theta.pig
script: building sketches, merging sketches and getting estimates</h3>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>register datasketches-memory-2.0.0.jar;
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>register datasketches-memory-2.0.0.jar;
register datasketches-java-3.1.0.jar;
register datasketches-pig-1.1.0.jar;
@@ -392,13 +392,13 @@ Most of the IDs in these categories overlap, so that
there are 60 unique IDs in
<p>Results:
From ‘dump d’:</p>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>(a,46.91487058420659)
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>(a,46.91487058420659)
(b,46.23988568048073)
</code></pre></div></div>
<p>From ‘dump g’ (merged across categories):</p>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>(50.415577215639736)
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>(50.415577215639736)
</code></pre></div></div>
<p>The expected exact result would be (60.0). The estimate has high relative
error because the sketch was configured with only 32 nominal entries
@@ -406,7 +406,7 @@ to show the estimation mode for the purposes of this
example.</p>
<h3 id="theta_setopspig-script-set-operations-on-sketches">theta_setops.pig
script: set operations on sketches</h3>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>register datasketches-memory-2.0.0-incubating.jar;
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>register datasketches-memory-2.0.0-incubating.jar;
register datasketches-java-3.1.0.jar;
register datasketches-pig-1.1.0.jar;
@@ -442,7 +442,7 @@ dump e;
<p>Result:</p>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>(10.0,12.0,18.0,4.0,6.0,8.0)
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>(10.0,12.0,18.0,4.0,6.0,8.0)
</code></pre></div></div>
</div> <!-- End content -->
diff --git a/output/docs/Theta/ThetaSetOpsCornerCases.html
b/output/docs/Theta/ThetaSetOpsCornerCases.html
index e3a1dd2d..dc4ebc11 100644
--- a/output/docs/Theta/ThetaSetOpsCornerCases.html
+++ b/output/docs/Theta/ThetaSetOpsCornerCases.html
@@ -437,7 +437,7 @@ The <em>getLowerBound(…)</em> will be zero because it is
also possible that th
<li><em>empty = true := 1, else 0</em></li>
</ul>
-<p>The octal digit <code class="highlighter-rouge">ID = ((theta == 1.0) ? 4 :
0) | ((retainedEntries > 0) ? 2 : 0) | (empty ? 1 : 0);</code></p>
+<p>The octal digit <code class="language-plaintext highlighter-rouge">ID =
((theta == 1.0) ? 4 : 0) | ((retainedEntries > 0) ? 2 : 0) | (empty ? 1 :
0);</code></p>
<table>
<thead>
diff --git a/output/docs/Theta/ThetaSketchSetOps.html
b/output/docs/Theta/ThetaSketchSetOps.html
index 44e8ce70..30c656c0 100644
--- a/output/docs/Theta/ThetaSketchSetOps.html
+++ b/output/docs/Theta/ThetaSketchSetOps.html
@@ -365,7 +365,7 @@ The AnotB operation, however, is asymmetric (i.e., sketch
order sensitive) and n
<p>This is a Java example of all three operations:</p>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>import static org.testng.Assert.assertEquals;
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>import static org.testng.Assert.assertEquals;
import org.testng.annotations.Test;
public class SetOpsExample {
diff --git a/output/docs/Theta/ThetaSketches.html
b/output/docs/Theta/ThetaSketches.html
index 92496016..6aadd764 100644
--- a/output/docs/Theta/ThetaSketches.html
+++ b/output/docs/Theta/ThetaSketches.html
@@ -407,7 +407,7 @@
<p><a id="theta-sketch-framework"></a></p>
<h2 id="theta-sketch-framework">Theta Sketch Framework</h2>
-<p>Theta Sketches are a generalization of the well known <i>K<sup>th</sup>
Minimum Value</i> (KMV) <sup id="fnref:1"><a href="#fn:1"
class="footnote">1</a></sup><sup>,</sup><sup id="fnref:2"><a href="#fn:2"
class="footnote">2</a></sup>
+<p>Theta Sketches are a generalization of the well known <i>K<sup>th</sup>
Minimum Value</i> (KMV) <sup id="fnref:1" role="doc-noteref"><a href="#fn:1"
class="footnote" rel="footnote">1</a></sup><sup>,</sup><sup id="fnref:2"
role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>
sketches in that KMV sketches are a form of Theta Sketch, but not all Theta
Sketches are KMV.</p>
<p>The <a
href="https://github.com/apache/datasketches-website/tree/master/docs/pdf/ThetaSketchFramework.pdf">Theta
Sketch Framework</a> (TSF)
@@ -449,7 +449,7 @@ we are going to create a separate threshold variable and
call it <i>theta (θ)</
This effectively decouples #3 and #4 above from <i>k</i>. When the sketch is
empty <i>θ</i> = 1.0.
After the sketch has filled with <i>k</i> minimum values <i>θ</i> is still
1.0.
When the next incoming unique value must be inserted into the sketch the
<i>(k+1)<sup>th</sup></i>
-minimum value, is assigned to <i>θ</i> and removed from the cache.<sup
id="fnref:3"><a href="#fn:3" class="footnote">3</a></sup></p>
+minimum value, is assigned to <i>θ</i> and removed from the cache.<sup
id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote"
rel="footnote">3</a></sup></p>
<p>Ultimately, it will be the size of <i>S</i>, <i>|S|</i>, that will
determine the stored size of a
sketch, which decouples #2 above from the value <i>k</i>.
@@ -461,16 +461,16 @@ We will discuss the RSE in a later section.</p>
<p><img class="doc-img-full"
src="https://datasketches.apache.org/docs/img/theta/ThetaSketch1.png"
alt="ThetaSketch1" /></p>
-<div class="footnotes">
+<div class="footnotes" role="doc-endnotes">
<ol>
- <li id="fn:1">
- <p>Z. Bar-Yossef, T. Jayram, R. Kumar, D. Sivakumar, and L. Trevisan.
Counting distinct elements in a data stream. In <i>Randomization and
Approximation Techniques in Computer Science</i>, pages 1–10. Springer, 2002.
<a href="#fnref:1" class="reversefootnote">↩</a></p>
+ <li id="fn:1" role="doc-endnote">
+ <p>Z. Bar-Yossef, T. Jayram, R. Kumar, D. Sivakumar, and L. Trevisan.
Counting distinct elements in a data stream. In <i>Randomization and
Approximation Techniques in Computer Science</i>, pages 1–10. Springer, 2002.
<a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
- <li id="fn:2">
- <p>See <a href="/docs/Theta/InverseEstimate.html">KMV Tutorial</a> for a
brief tutorial on KMV Sketches. <a href="#fnref:2"
class="reversefootnote">↩</a></p>
+ <li id="fn:2" role="doc-endnote">
+ <p>See <a href="/docs/Theta/InverseEstimate.html">KMV Tutorial</a> for a
brief tutorial on KMV Sketches. <a href="#fnref:2" class="reversefootnote"
role="doc-backlink">↩</a></p>
</li>
- <li id="fn:3">
- <p>This is a limited “KMV perspective” on how <i>θ</i> gets assigned.
The attached paper <a
href="https://github.com/apache/datasketches-website/tree/master/docs/pdf/ThetaSketchFramework.pdf">Theta
Sketch Framework</a> presents multiple ways that <i>θ</i> can be assigned
using the <i>Theta Choosing Function (TCF)</i>. Different sketch algorithms
have different TCFs. <a href="#fnref:3" class="reversefootnote">↩</a></p>
+ <li id="fn:3" role="doc-endnote">
+ <p>This is a limited “KMV perspective” on how <i>θ</i> gets assigned.
The attached paper <a
href="https://github.com/apache/datasketches-website/tree/master/docs/pdf/ThetaSketchFramework.pdf">Theta
Sketch Framework</a> presents multiple ways that <i>θ</i> can be assigned
using the <i>Theta Choosing Function (TCF)</i>. Different sketch algorithms
have different TCFs. <a href="#fnref:3" class="reversefootnote"
role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
diff --git a/output/docs/Theta/ThetaSparkExample.html
b/output/docs/Theta/ThetaSparkExample.html
index afbd317f..45fb33e5 100644
--- a/output/docs/Theta/ThetaSparkExample.html
+++ b/output/docs/Theta/ThetaSparkExample.html
@@ -347,7 +347,7 @@
<p>Building one sketch using old Spark API:</p>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>import org.apache.spark.SparkContext;
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>import org.apache.spark.SparkContext;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.JavaRDD;
@@ -392,7 +392,7 @@ public class Aggregate {
<p>Wrapper to make ThetaSketch serializable:</p>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>import java.io.ObjectInputStream;
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.IOException;
import java.io.Serializable;
@@ -462,7 +462,7 @@ public class ThetaSketchJavaSerializable implements
Serializable {
<p>Building one sketch using new Spark 2.x API and reading input from a Hive
table:</p>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>import org.apache.spark.sql.SparkSession;
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
@@ -523,7 +523,7 @@ public class
Spark2DatasetMapPartitionsReduceJavaSerialization {
<p>Building multiple sketches (one sketch per key or dimension):</p>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>import org.apache.spark.SparkContext;
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>import org.apache.spark.SparkContext;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.JavaRDD;
@@ -606,7 +606,7 @@ public class MapPartitionsToPairReduceByKey {
<p>Building multiple sketches using SparkSession and reading input from a Hive
table:</p>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>import org.apache.spark.sql.SparkSession;
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.api.java.JavaPairRDD;
diff --git a/output/docs/Tuple/TupleEngagementExample.html
b/output/docs/Tuple/TupleEngagementExample.html
index 25b8350b..25b666d8 100644
--- a/output/docs/Tuple/TupleEngagementExample.html
+++ b/output/docs/Tuple/TupleEngagementExample.html
@@ -589,7 +589,7 @@ The user system code would call one of these two methods to
update the sketch.
<p>The remainder of the method is just the mechanics of printing out the
results to the console, and computing the error bounds for each row and for the
totals. The output should look something like this:</p>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>Engagement Histogram:
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>Engagement Histogram:
Number of Unique Visitors by Number of Days Visited
Days Visited Estimate LB UB
1 98 92 104
diff --git a/output/docs/Tuple/TupleHiveUDFs.html
b/output/docs/Tuple/TupleHiveUDFs.html
index 03026663..29e33fde 100644
--- a/output/docs/Tuple/TupleHiveUDFs.html
+++ b/output/docs/Tuple/TupleHiveUDFs.html
@@ -343,7 +343,7 @@
-->
<h2 id="tuple-sketch-hive-udfs">Tuple Sketch Hive UDFs</h2>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>add jar datasketches-memory-2.0.0.jar;
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>add jar datasketches-memory-2.0.0.jar;
add jar datasketches-java-3.1.0.jar;
add jar datasketches-hive-1.2.0.jar;
diff --git a/output/docs/Tuple/TupleJavaExample.html
b/output/docs/Tuple/TupleJavaExample.html
index 6070a3dd..bda04c7f 100644
--- a/output/docs/Tuple/TupleJavaExample.html
+++ b/output/docs/Tuple/TupleJavaExample.html
@@ -343,7 +343,7 @@
-->
<h1 id="tuple-sketch-java-example">Tuple Sketch Java Example</h1>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>// simplified file operations and no error handling for
clarity
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>// simplified file operations and no error handling for
clarity
import java.io.FileInputStream;
import java.io.FileOutputStream;
diff --git a/output/docs/Tuple/TuplePigUDFs.html
b/output/docs/Tuple/TuplePigUDFs.html
index c4422eaf..82775368 100644
--- a/output/docs/Tuple/TuplePigUDFs.html
+++ b/output/docs/Tuple/TuplePigUDFs.html
@@ -356,7 +356,7 @@
<h3 id="tuplepig-script">tuple.pig script</h3>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>register datasketches-memory-2.0.0.jar;
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>register datasketches-memory-2.0.0.jar;
register datasketches-java-3.1.0.jar;
register datasketches-pig-1.1.0.jar;
@@ -394,13 +394,13 @@ The numeric parameter has the sum of 500.0 per category,
and 1000.0 in total.</p
<p>Results:
From ‘dump d’:</p>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>(a,(50.0,500.0))
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>(a,(50.0,500.0))
(b,(50.0,500.0))
</code></pre></div></div>
<p>From ‘dump g’ (merged across categories):</p>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>((50.415577215639736,846.0364051499544))
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>((50.415577215639736,846.0364051499544))
</code></pre></div></div>
<p>In this example the sketches have a small size of 32 nominal entries. From
the first dump one can see that the sketches are still in the exact counting
mode.
@@ -408,7 +408,7 @@ Merging across categories pushed the resulting sketch into
the estimation mode.
The estimates have high relative error because the sketch was configured with
only 32 nominal entries.</p>
<h3 id="datatxt-tab-separated"><a href="/docs/Tuple/data.txt">data.txt</a>
(tab separated)</h3>
-<div class="highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>01 a 10.0
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre
class="highlight"><code>01 a 10.0
02 a 10.0
03 a 9.0
04 a 11.0
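
Note on the ThetaSetOpsCornerCases.html hunk above: the octal digit ID expression it shows can be tried out as a small standalone Java method. This is only an illustrative sketch. The class name CornerCaseId and the method name id are hypothetical and are not part of the DataSketches library; only the expression itself is taken from the page. The meanings of the 4 and 2 bits are read off the expression, since the hunk shows only the definition of the "empty" bit.

```java
public class CornerCaseId {

    // Hypothetical helper (not a DataSketches API): combines three sketch
    // state flags into one octal digit, using the expression from the page.
    //   4 -> theta == 1.0
    //   2 -> retainedEntries > 0
    //   1 -> empty == true
    public static int id(double theta, int retainedEntries, boolean empty) {
        return ((theta == 1.0) ? 4 : 0)
             | ((retainedEntries > 0) ? 2 : 0)
             | (empty ? 1 : 0);
    }

    public static void main(String[] args) {
        System.out.println(id(1.0, 0, true));    // prints 5
        System.out.println(id(1.0, 10, false));  // prints 6
        System.out.println(id(0.5, 10, false));  // prints 2
        System.out.println(id(0.5, 0, false));   // prints 0
    }
}
```

Because each flag occupies its own bit, the three conditions can be recovered from the digit independently, which is presumably why the page tabulates the corner cases by this ID.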