(datasketches-website) branch master updated: Fix spelling errors.

leerho Mon, 12 Jan 2026 22:36:08 -0800

This is an automated email from the ASF dual-hosted git repository.

leerho pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/datasketches-website.git



The following commit(s) were added to refs/heads/master by this push:
     new f5c758ee Fix spelling errors.
f5c758ee is described below

commit f5c758ee0ad81896cd5ddd5c03950c9d447c2cee
Author: Lee Rhodes <[email protected]>
AuthorDate: Mon Jan 12 22:35:43 2026 -0800

    Fix spelling errors.
---
 docs/Frequency/FrequentDistinctTuplesSketch.md | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/docs/Frequency/FrequentDistinctTuplesSketch.md b/docs/Frequency/FrequentDistinctTuplesSketch.md
index 20cb083c..dfc8eb3c 100644
--- a/docs/Frequency/FrequentDistinctTuplesSketch.md
+++ b/docs/Frequency/FrequentDistinctTuplesSketch.md
@@ -40,14 +40,14 @@ of the <i>N - M</i> non-primary dimensions.
 * __Equal Distribution Threshold__
 
 Suppose we have a stream of 160 items where the stream consists of four item 
types: A, B, C, and D.
-If the distribution of occurances was shared equally across the four items each would
+If the distribution of occurrences was shared equally across the four items each would
 occur exactly 40 times or 25% of the total distribution of 160 items. Thus the 
equally distributed
 (or fair share) <i>threshold</i> would be 25% or as a fraction 0.25. 
 
 * __Most Frequent__
 
 We define <i>Most Frequent</i> items as those that consume more than the fair 
share threshold of the
-total occurances (also called the <i>weight</i>) of the entire stream. 
+total occurrences (also called the <i>weight</i>) of the entire stream. 
 
 Suppose we have a stream of 160 items where the stream consists of four item 
types: A, B, C, and D,
 which have the following frequency distribution: 
@@ -61,7 +61,7 @@ We would declare that A is the most frequent and B is the next most frequent. We
 declare C and D in a list of most frequent items since their respective 
frequencies are below 
 the threshold of 40 or 25%. 
 
-If all items occured with a frequency of 40, we could not declare 
+If all items occurred with a frequency of 40, we could not declare 
 any item as most frequent. Requesting a list of the "Top 4" items could be a 
list of the 4 items in any random
 order, or a list of zero items, depending on policy.
 
@@ -104,7 +104,7 @@ In this implementation the input tuples presented to the sketch are string array
 
 ### Using the FdtSketch
 
-Let's leverate the challenge at the beginning to crete a concrete example. 
+Let's leverage the challenge at the beginning to create a concrete example. 
 Let's assume <i>N = 2</i> and let <i>d1 := IP address</i>, and <i>d2 := User 
ID</i>.
 
 If we choose <i>{d1}</i> as the Primary Keys, then the sketch will allow us to 
identify the
@@ -133,7 +133,7 @@ We are done populating the sketch, now we post process the data in the sketch:
     int[] priKeyIndices = new int[] {0}; //identifies the IP address as the 
primary key
     int numStdDev = 2; //for 95% confidence intervals
     int limit = 20; //list only the top 20 groups
-    char sep = '|'; //the separator charactor for the group dimensions as strings
+    char sep = '|'; //the separator character for the group dimensions as strings
     List<Group> list = sketch.getResult(priKeyIndices, limit, numStdDev, sep);
     System.out.println(Group.getHeader())
     Iterator<Group> itr = list.iterator()
@@ -183,7 +183,7 @@ The Y-axis is the relative error.
 
 The blue dots represent the error of a single group from the top 500 groups. 
Not all of the top 500 groups are shown on the graph as number of them had true 
cardinalities of less than 256. Also many of the dots represent multiple groups 
since groups with the same Count and the same true cardinality will result in 
the same exact computed error, thus plotted at the same exact point.
 
-The red line is the contour of the quantile(0.84) points of the error 
distribution at each point along the X-axis. This quantile contour would be 
equivalent to the +1 standard deviation from the mean of a Gaussian 
distribution. But since these are quantile measurements of the actual error 
distribution there is no assuption whatsoever that the error distribution is 
Gaussian.  It is just a convenient reference contour. Similarly the black line 
is the contour of the quantile(0.159), which c [...]
+The red line is the contour of the quantile(0.84) points of the error 
distribution at each point along the X-axis. This quantile contour would be 
equivalent to the +1 standard deviation from the mean of a Gaussian 
distribution. But since these are quantile measurements of the actual error 
distribution there is no assumption whatsoever that the error distribution is 
Gaussian.  It is just a convenient reference contour. Similarly the black line 
is the contour of the quantile(0.159), which  [...]
 
 The following table is the list of the top 10 results from just one of the 
trials. The Group class was extended to include more columns at the end which 
were useful for this study. (This was easy to do and does not require any 
special access.)
 


