This is an automated email from the ASF dual-hosted git repository.

leerho pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/datasketches-website.git


The following commit(s) were added to refs/heads/master by this push:
     new 8cff4668 Add Density Sketch Documentation
8cff4668 is described below

commit 8cff46682047ff572266a5c5fa784a091064e9ba
Author: Lee Rhodes <[email protected]>
AuthorDate: Sat Jan 24 16:47:55 2026 -0800

    Add Density Sketch Documentation
---
 _includes/toc.html                 |  7 ++++
 docs/Density/DensitySketch.md      | 66 ++++++++++++++++++++++++++++++++++++++
 src/main/resources/docgen/toc.json |  6 ++++
 3 files changed, 79 insertions(+)

diff --git a/_includes/toc.html b/_includes/toc.html
index 23550378..31def6e4 100644
--- a/_includes/toc.html
+++ b/_includes/toc.html
@@ -82,6 +82,13 @@
       <li><a 
href="{{site.docs_dir}}/Sampling/ReservoirSamplingSketches.html">•Reservoir 
Sampling Sketches</a></li>
       <li><a 
href="{{site.docs_dir}}/Sampling/VarOptSamplingSketches.html">•VarOpt Sampling 
Sketches</a></li>
     </div>
+
+    <p id="density-&-coresets">
+      <a data-toggle="collapse" class="menu collapsed" 
href="#collapse_density_&_coresets">Density & Coresets</a>
+    </p>
+    <div class="collapse" id="collapse_density_&_coresets">
+      <li><a href="{{site.docs_dir}}/Density/DensitySketch.html">•Density 
Sketch</a></li>
+    </div>
   </div>
 
   <p id="system-integrations">
diff --git a/docs/Density/DensitySketch.md b/docs/Density/DensitySketch.md
new file mode 100644
index 00000000..357c0480
--- /dev/null
+++ b/docs/Density/DensitySketch.md
@@ -0,0 +1,66 @@
+---
+layout: doc_page
+---
+<!--
+    Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+    Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+-->
+## Contents
+<!-- TOC -->
+* [Density Sketch](#density-sketch)
+    * [Foundation Paper](#paper)
+        * [Key Highlights](#highlights)
+        * [Practical Implications](#implications)
+    * [Inspirational Code, Example and Tests](#inspiration)
+<!-- TOC -->
+
+<a id="density-sketch"></a>
+## DensitySketch
+**Quick summary:** This sketch builds a coreset from the given set of input 
points as multi-dimensional vectors. Provides density estimate at a given point.
+
+<a id="paper"></a>
+### Our implementation is based on the following paper:<br>
+
+* Zohar Karnin, Edo Liberty "Discrepancy, Coresets, and Sketches in Machine 
Learning"
+https://proceedings.mlr.press/v99/karnin19a/karnin19a.pdf
+    * The paper explores the relationship between class discrepancy and the 
efficiency of data reduction techniques like coresets and streaming sketches.
+    * The core contribution is a general framework for creating small, 
representative subsets of data (coresets) that maintain the statistical 
properties of the original large dataset.
+
+<a id="highlights"></a>
+#### Key Highlights:
+* **New Complexity Measure:** The authors define "class discrepancy" as a way 
to characterize the coreset complexity of different function families, similar 
to how Rademacher complexity is used for generalization.
+* **Improved Coreset Sizes:** They prove the existence of 
$\epsilon$-approximation coresets of size $O(\sqrt{d}/\epsilon)$ for several 
common machine learning problems, including:
+    * Logistic regression
+    * Sigmoid activation loss
+    * Matrix covariance
+    * Kernel density estimation
+* **Gaussian Kernel Resolution:** The paper resolves a long-standing open 
problem by matching the lower bound for the coreset complexity of Gaussian 
kernel density estimation at $O(\sqrt{d}/\epsilon)$.
+* **Streaming Algorithms:** It introduces an exponential improvement to the 
"merge-and-reduce" trick, leading to better streaming sketches for any problem 
with low discrepancy.
+* **Deterministic Algorithm:** The authors provide a simple, deterministic 
algorithm for finding low-discrepancy sequences and coresets for any positive 
semi-definite kernel.
+
+<a id="implications"></a>
+#### Practical Implications:
+The findings allow for significantly faster optimization in large-scale 
machine learning. By reducing a massive dataset into a much smaller coreset, 
researchers can perform complex calculations (like training a logistic 
regression model) with a fraction of the computational cost while maintaining a 
high level of accuracy.
+
+<a id="inspiration"></a>
+### Our implementations was inspired by the following implementation, example, 
and tests by Edo Liberty:
+* **Code:** 
https://github.com/edoliberty/streaming-quantiles/blob/f688c8161a25582457b0a09deb4630a81406293b/gde.py
+* **Example** 
https://github.com/edoliberty/streaming-quantiles/blob/f688c8161a25582457b0a09deb4630a81406293b/gde_example_usage.ipynb
+* **Tests** 
https://github.com/edoliberty/streaming-quantiles/blob/f688c8161a25582457b0a09deb4630a81406293b/gde_test.py
+
+
+*(Note: much of this overview was generated by Gemini AI, it may contain 
errors.)*
diff --git a/src/main/resources/docgen/toc.json 
b/src/main/resources/docgen/toc.json
index a11ba460..d451ee26 100644
--- a/src/main/resources/docgen/toc.json
+++ b/src/main/resources/docgen/toc.json
@@ -68,6 +68,12 @@
             {"class":"Doc",  "desc" : "VarOpt Sampling Sketches",              
    "dir" : "Sampling", "file": "VarOptSamplingSketches" },
           ]
         },
+
+        { "class":"Dropdown", "desc" : "Density & Coresets", "array":
+          [
+            {"class":"Doc",  "desc" : "Density Sketch",                        
    "dir" : "Density", "file": "DensitySketch" },
+          ]
+        },
       ]
     },
 


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to