This is an automated email from the ASF dual-hosted git repository.

git-site-role pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/datasketches-website.git


The following commit(s) were added to refs/heads/asf-site by this push:
     new 1608270b Automatic Site Publish by Buildbot
1608270b is described below

commit 1608270b1358d6c05d482f21933e32cd3492a83b
Author: buildbot <[email protected]>
AuthorDate: Sat Mar 2 23:01:57 2024 +0000

    Automatic Site Publish by Buildbot
---
 output/docs/Architecture/LargeScale.html | 55 ++++++++++++++++++++++++++------
 1 file changed, 46 insertions(+), 9 deletions(-)

diff --git a/output/docs/Architecture/LargeScale.html b/output/docs/Architecture/LargeScale.html
index 9689ea0a..97a8be7c 100644
--- a/output/docs/Architecture/LargeScale.html
+++ b/output/docs/Architecture/LargeScale.html
@@ -514,12 +514,52 @@
 -->
 <h2 id="designed-for-large-scale-computing-systems">Designed for Large-scale 
Computing Systems</h2>
 
+<h4 id="multiple-languages">Multiple Languages</h4>
+
+<ul>
+  <li>The DataSketches library is now available in three languages: Java, C++, 
and Python. A fourth language, Go, is in development.</li>
+</ul>
+
+<h3 
id="compatibility-across-languages-software-versions-and-binary-serialization-versions">Compatibility
 Across Languages, Software Versions And Binary Serialization Versions</h3>
+<p>Large-scale computing environments may mix various platforms that use 
different programming languages, each with the possibility of using 
different Software Versions of our DataSketches library. Cross-version 
compatibility of software is a challenge that all platforms face in general, 
and it is up to the platform maintainers to keep their software up-to-date. 
This is not new, and it is no different with the DataSketches library.</p>
+
+<p>Nonetheless, it is our goal to make it as easy as practically 
possible to serialize our sketches in one of our supported languages on one 
platform and to deserialize them in a different supported language, potentially 
on a different, even remote platform, and perhaps much later in time.</p>
+
+<p>With this goal in mind, here are some of the key strategic decisions we 
have made in the development of the DataSketches library.</p>
+
+<h4 id="two-levels-of-versioning">Two Levels of Versioning</h4>
+
+<ul>
+  <li>
+    <p><strong>Software Version:</strong> This is the release version, 
published via Apache.org and specified in the POM file or equivalent. This can 
change relatively frequently based on bug fixes and introduction of new 
capabilities. We follow the principles of <em>Semantic Versioning</em> as 
specified by <a href="https://semver.org">semver.org</a>.</p>
+  </li>
+  <li>
+    <p><strong>Serialization Version:</strong> (<em>SerVer</em>) This is a 
small integer placed in the preamble of the serialized byte array that 
indicates the version of the serialized structure for the sketch. This is very 
similar to Java’s <a 
href="https://en.wikipedia.org/wiki/Java_class_file"><em>Class File Format 
Version</em></a>. A single <em>SerVer</em> may represent multiple structures 
all based on the same sketch when stored in different states, e.g., <em>Single 
Item</em>, <em> [...]
+  </li>
+</ul>
+
+<p>From the user’s perspective, as long as the <em>SerVer</em> is the same, 
older <em>Software Versions</em> should be able to read sketch images created 
by newer <em>Software Versions</em>, although the APIs may differ. 
An older <em>Software Version</em> will not be able to take advantage of new 
features introduced in new <em>Software Versions</em>, but it should be able to 
do what it did before. In other words, there will be no loss of access to the 
serialized sketch and th [...]
+
+<p>Sketches requiring user-written custom serialize/deserialize code rely on 
users to port that custom code themselves for cross-version, cross-language, 
and cross-platform compatibility.</p>
+
+<h4 id="the-serialized-image-of-a-sketch">The Serialized Image of a Sketch</h4>
+<ul>
+  <li>The structure (or image) of a serialized sketch is independent of the 
language from which it was created.</li>
+  <li>The sketch image contains only little-endian primitives, such as int64, 
int32, int16, int8, double-64, float-32, UTF-8 strings, and simple array 
structures of those. While these serialized primitives may not be strictly 
equal across languages, they can be interpreted as logically equivalent. We do 
not support big-endian serialization.</li>
+  <li>The sketch image is unique for each type of sketch.</li>
+  <li>Simply speaking, a sketch image can be viewed as a blob of bytes, which 
is easily stored and easily transported using many different protocols and 
encodings, including Protobuf, Avro, Thrift, Base64, etc.</li>
+</ul>
+
+<p>As a result, sketches serialized in one supported language can be 
interpreted by a different supported language, with the caveat that due to 
language differences, availability of resources, and time to develop, not all 
sketches may be available in all languages at the same time.</p>
+
 <h3 id="easy-integration-with-minimal-dependencies">Easy Integration with 
Minimal Dependencies</h3>
+<p>We strive to make our sketch library easy to integrate into larger systems 
by keeping the number of external dependencies to a minimum.</p>
+
 <ul>
   <li><a 
href="https://datasketches.apache.org/docs/Community/Downloads.html">Java 
Core</a>
     <ul>
       <li>The Java core library (including Memory) has no dependencies outside 
of the JVM at runtime, allowing simple integration into virtually any 
Java-based system environment.</li>
-      <li>All of the Java components are Maven Deployable and registered with 
<a 
href="https://search.maven.org/classic/#search%7Cga%7C1%7Cg%3A%22org.apache.datasketches%22">The
 Central Repository</a></li>
+      <li>All of the Java components and artifacts are Maven Deployable and 
registered with <a 
href="https://search.maven.org/classic/#search%7Cga%7C1%7Cg%3A%22org.apache.datasketches%22">The
 Central Repository</a></li>
     </ul>
   </li>
   <li><a 
href="https://datasketches.apache.org/docs/Community/Downloads.html">C++ 
Core</a>
@@ -534,11 +574,6 @@
   </li>
 </ul>
 
-<h3 id="cross-language-binary-compatibility">Cross Language Binary 
Compatibility</h3>
-<ul>
-  <li>Sketches serialized from C++ or Python can be interpreted by compatible 
Java sketches and vice versa.</li>
-</ul>
-
 <h3 id="speed">Speed</h3>
 <ul>
   <li>
@@ -555,7 +590,9 @@
   </li>
 </ul>
 
-<h3 id="systems-integrations">Systems Integrations</h3>
+<h3 id="system-integrations">System Integrations</h3>
+<p>The following are system integrations that we have been involved with, but 
there are many more platform integrations out there that were performed by the 
individual platform teams.</p>
+
 <ul>
   <li>
     <p><a 
href="https://datasketches.apache.org/docs/SystemIntegrations/ApacheDruidIntegration.html">Druid
 Integration</a></p>
@@ -592,7 +629,7 @@ The Java sketches utilize this powerful component.</p>
   <li>
     <p>Built-in <b>Upper-Bound and Lower-Bound estimators</b>. 
 You are never in the dark about how good of an estimate the sketch is 
providing. 
-All the sketches are able to estimate the upper and lower bounds of the 
estimate given a 
+Nearly all the sketches are able to estimate the upper and lower bounds of the 
estimate given a 
 confidence level.</p>
   </li>
   <li>
@@ -600,7 +637,7 @@ confidence level.</p>
 tuning options.</p>
   </li>
   <li>
-    <p><b>Small Footprint Per Sketch</b>. The operating and storage footprint 
for both 
+    <p><b>Small Footprint Per Sketch</b>. The in-memory run-time and storage 
footprint for both 
 row and column oriented storage are minimized with compact binary 
representations, which are much smaller 
 than the raw input stream and with a well defined upper bound of size.</p>
   </li>
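
The serialized-image description in the diff above (a small integer SerVer in the preamble, little-endian primitives only, and a byte blob that can travel over text-safe transports) can be illustrated with a minimal Python sketch. The field layout, the names LAYOUT, SUPPORTED_SER_VERS, encode_image and decode_image, and the choice of Base64 as the transport are all assumptions made for illustration; this is not the actual DataSketches preamble format or API.

```python
import base64
import struct

# HYPOTHETICAL layout for illustration only; it is not the real DataSketches
# preamble. "<" selects little-endian byte order with no padding:
# uint8 ser_ver, uint8 flags, int16 k, int32 count, float64 estimate
LAYOUT = struct.Struct("<BBhid")
SUPPORTED_SER_VERS = {1, 2}  # assumed set of serialization versions we can read


def encode_image(ser_ver, k, count, estimate):
    """Pack the toy sketch state into a little-endian byte blob."""
    return LAYOUT.pack(ser_ver, 0, k, count, estimate)


def decode_image(blob):
    """Check the SerVer byte first, then unpack the rest of the image."""
    ser_ver = blob[0]
    if ser_ver not in SUPPORTED_SER_VERS:
        raise ValueError(f"unsupported SerVer: {ser_ver}")
    _, flags, k, count, estimate = LAYOUT.unpack(blob)
    return {"ser_ver": ser_ver, "flags": flags, "k": k,
            "count": count, "estimate": estimate}


# Round trip: serialize, carry the blob as Base64 text, then deserialize.
image = encode_image(ser_ver=2, k=4096, count=1000, estimate=987.5)
as_text = base64.b64encode(image).decode("ascii")
restored = decode_image(base64.b64decode(as_text))
```

Because the reader checks only the SerVer byte and a fixed little-endian layout, the same blob could in principle be decoded by an equivalent reader written in any other language, which is the property the commit text describes.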

