Champion and Mentors wanted for new potential Apache SST incubator project

Lothar Klein Wed, 09 Jun 2021 05:09:48 -0700

Dear IPMC members,

We are looking for a Champion and Mentors for the new SST incubator project 
proposal; see below.
So far I have contacted Justin Mclean and Christofer Dutz, and they are
considering becoming mentors.


Best Regards
Lothar


=== Abstract ===

SST (Semantic STEP Technology) is an optimized RDF/OWL API with
Git-like revision control capabilities as an enabler for a new
comprehensive CAD/CAx data model derived from ISO 10303 (STEP)
and ISO 15926 (Oil & Gas) standards.

=== Rationale: The Problem to solve ===

There are many different kinds of Computer-aided technology
(CAx) systems around. Some are listed here:
=> https://en.wikipedia.org/wiki/Computer-aided_technologies

For all organizations that are involved in the design,
production, maintenance and commercial use of industrial
products, a smooth exchange of data between these tools is
needed. Please also keep in mind that most industrial
production requires coordinating a large supply network, each
member with its own specific tools.

The reality is that data exchange between these systems is often
not possible, and where interface tools are available they are
incomplete, as they exchange only some but not all of the needed
data. The problem gets even worse because the systems often have
to be used in a cyclic, iterative way to deal with design
changes. It is also true that for all CAx systems a big part of
the software development effort (if not the biggest part) is
spent on the interfaces to other systems, not on the core
functionality the system provides.

STEP is a major attempt to address this issue. The
introduction of ISO 10303-1:1994 "Overview and fundamental
principles", extended in the 2021 edition, reads:

"ISO 10303 is an International Standard for the
computer-interpretable representation of product information
and for the exchange of product data. The objective is to
provide a neutral mechanism capable of describing products
throughout their life cycle. This mechanism is suitable not
only for neutral file exchange, but also as a basis for
implementing and sharing product databases, and as a basis for
archiving. The information generated about a product during its
design, manufacture, use, maintenance, and disposal is used for
many purposes. The use can involve many computer systems,
including some that can be located in different organizations.
In order to support such uses, organizations need to be able to
represent their product information in a common
computer-interpretable form that is required to remain complete
and consistent when exchanged among different computer
systems."
=> https://www.iso.org/obp/ui/#iso:std:iso:10303:-1:ed-2:v1:en

In the introduction of ISO 15926-1:2004 "Overview and
fundamental principles" we can see slightly different but
highly overlapping goals:
=> https://www.iso.org/obp/ui/#iso:std:iso:15926:-1:ed-1:v1:en

The goals of the ISO 10303 and 15926 series of standards are
only very partially achieved in practice today.

The aim of the SST project is to make the visions of these
standards a reality by providing an open source environment
that is suitable to directly operate on distributed data on servers
managed by different organizations.

=== Background ===

ISO 10303 and ISO 15926 and other series of standards are
developed by the ISO Technical Committee 184: Automation
Systems and Integration, Subcommittee 4: Industrial data. In
its resolution number 9 from March 1985 the overall goal was
expressed as: "To develop as soon as possible a single
international standard for the exchange of product definition
data to be called the standard for the exchange of product
model data (STEP)." It took almost 10 years (until 1994/95) for
the first set of standards to be published. The geometric data
models and assembly structures developed back then are today
widely implemented by CAD and other systems. Since then the
STEP standard has grown constantly, with many new application
areas and layers of abstraction, but only a few of the new
capabilities have really been implemented and widely used. For
upward compatibility reasons the original data models were kept
in place with only minor extensions.

In parallel to the development of STEP, the process industry
was searching for ways to capture the life-cycle data for their
process plants. As ISO 10303 does not capture the things
around us in a clear and precise way, they developed the ISO
15926 series of standards, which is based on a 4D data model
that can fully capture things in time and space. The generic
data model in part 2 was released in 2003, and now there is
part 12 with a mapping to OWL:
ISO/TS 15926-12:2018 Industrial automation systems and
integration — Integration of life-cycle data for process plants
including oil and gas production facilities — Part 12:
Life-cycle integration ontology represented in Web Ontology
Language (OWL)

There have been a number of attempts to implement ISO 10303 and
ISO 15926 on the basis of OWL/RDF technologies, but they all
have in common that their performance is relatively low. This
is caused on the one hand by the way the data models are
mapped to OWL, and on the other hand by APIs such as Jena that
are not optimized for this kind of data.

=== Proposal ===

The Semantic STEP Technology (SST) establishes a novel way to
operate on industrial product data as needed in
CAD/CAM/CAx systems, part libraries, product life cycle systems
and similar product management systems. SST focuses on recording
and integrating all aspects of particular products such as
machines, vehicles, buildings, electronics and piece parts in
all their life cycle phases (e.g. requirements, design,
manufacturing and planning, usage, maintenance, dismantling).
Data is recorded and integrated in a computer-interpretable way
so that computer systems can process the data and make decisions
on what to do without direct human intervention. SST has a
small memory footprint, allows fast traversal of application
data in any direction and is implemented in the Go programming
language. SST is optimized for web services and cloud
computing, but is also suitable for traditional single-user
applications.

The SST data models directly support data models that are
defined in the ISO 10303 series of standards. ISO 10303 is
widely known as STEP which stands for "STandard for the
Exchange of Product model data". Unfortunately STEP has some
disadvantages: primarily weak semantic definitions,
several modelling layers that more or less replicate the
lower layers in some modified way, and widely overlapping
Application Protocols (AP), resulting in a standard that is
much more complex than it needs to be.

To address the issue with weak semantic definitions in STEP,
the SST data model is founded on the 4D data model of ISO
15926, the standard for "Integration of life-cycle data for
process plants including oil and gas production facilities".
The 4D data model is a superior model to support all
disciplines, supply chain company types and life cycle stages,
regarding information about functional requirements, physical
solutions, types of objects and individual objects as well as
activities. However, ISO 15926 lacks the design-specific
aspects that STEP largely focuses on. A
disadvantage of ISO 15926 is that it fully relies on class
membership to state that several objects share common
characteristics. This results in wide use of 2nd and 3rd degree
higher level classes (named such as class_of_class_of_xxx) that
are quite hard to follow and that are not very suitable for
practical data exchange. SST replaces most of these higher
level classes and relationships by modelling most application
objects as individuals, including designs. Instead of using
class membership, SST is using the "isDefinedBy" relationship
to state that e.g. a physical individual is defined by a design
individual (knowing that there are always some smaller or
bigger deviations).

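The "isDefinedBy" pattern described above can be illustrated
with a minimal Go sketch. The Individual type, its fields and
the SetDefinedBy helper are assumptions made for illustration
only; they are not taken from the SST code base. The sketch
models both a design and the physical items built from it as
individuals, and keeps the inverse link so that the relationship
can be followed in either direction.

```go
package main

import "fmt"

// Individual is a hypothetical in-memory node, not an SST type.
type Individual struct {
	ID        string
	DefinedBy *Individual   // forward link: "isDefinedBy"
	Defines   []*Individual // inverse link, kept for traversal in the other direction
}

// SetDefinedBy states that physical is defined by design and keeps
// both directions of the relationship in sync.
func SetDefinedBy(physical, design *Individual) {
	// Remove the inverse link of a previous definition, if any.
	if old := physical.DefinedBy; old != nil {
		for i, p := range old.Defines {
			if p == physical {
				old.Defines = append(old.Defines[:i], old.Defines[i+1:]...)
				break
			}
		}
	}
	physical.DefinedBy = design
	if design != nil {
		design.Defines = append(design.Defines, physical)
	}
}

func main() {
	// The design is an individual too, not a class.
	design := &Individual{ID: "pump-design-rev-B"}
	p1 := &Individual{ID: "pump-serial-0001"}
	p2 := &Individual{ID: "pump-serial-0002"}

	SetDefinedBy(p1, design)
	SetDefinedBy(p2, design)

	// Forward: from a physical item to its design.
	fmt.Println(p1.ID, "isDefinedBy", p1.DefinedBy.ID)

	// Inverse: from a design to all physical items built from it.
	for _, p := range design.Defines {
		fmt.Println(design.ID, "defines", p.ID)
	}
}
```

Because no class membership is involved, deviations between a
physical item and its design can be recorded on the physical
individual itself without touching the design.
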
As its implementation basis SST relies on the Semantic Web
standards from W3C such as RDF, RDFS and OWL. The basis is the
mapping provided in part 12 of ISO 15926, "Life-cycle
integration ontology represented in Web Ontology Language
(OWL)". SST uses this mapping but adds further
constraints to support efficient implementations, so not every
RDF data set is suitable to be managed by SST. The main
constraints are:
* every subject of a triple must have a base URI that is the
  same as the NamedGraph it is used in
* data is split into one NamedGraph for each application object
  that can be managed independently (e.g. a part of a part
  library, an organization, an assembly design)
* NamedGraphs have to import other NamedGraphs for needed
  application objects by owl:imports (e.g. an assembly design
  imports parts)
* NamedGraphs are stored internally in a binary normalized
  (canonical) format that is optimized for very fast loading,
  saving, diff and merge operations
* application data uses random UUID-URNs for the base URIs and
  also random UUIDs for the fragments
* extensive use of punning, even for application data
  =>  https://www.w3.org/TR/owl2-new-features/#F12:_Punning
* revision control is realized for Namespaces (a collection of
  NamedGraphs), very similar to Git
* replication of Namespaces and history to other servers is
  realized very similar to Git

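Two of the constraints above, UUID-based identifiers and
subjects that share the base URI of their NamedGraph, can be
sketched in Go as follows. This is only an illustration under
assumptions: the Triple type and the NewUUID, BaseURI and
CheckSubjects functions are hypothetical names, and the check
reflects one reading of the constraint (the subject IRI without
its fragment must equal the graph IRI).

```go
package main

import (
	"crypto/rand"
	"fmt"
	"strings"
)

// Triple is a simplified, hypothetical RDF triple; it is not the SST API.
type Triple struct {
	Subject, Predicate, Object string
}

// NewUUID returns a random (version 4) UUID in its textual form.
func NewUUID() (string, error) {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", err
	}
	b[6] = (b[6] & 0x0f) | 0x40 // set version 4 (random)
	b[8] = (b[8] & 0x3f) | 0x80 // set the RFC 4122 variant
	return fmt.Sprintf("%x-%x-%x-%x-%x",
		b[0:4], b[4:6], b[6:8], b[8:10], b[10:16]), nil
}

// BaseURI returns the IRI without its fragment.
func BaseURI(iri string) string {
	if i := strings.IndexByte(iri, '#'); i >= 0 {
		return iri[:i]
	}
	return iri
}

// CheckSubjects returns an error for the first triple whose subject
// does not have graphURI as its base URI.
func CheckSubjects(graphURI string, triples []Triple) error {
	for _, t := range triples {
		if BaseURI(t.Subject) != graphURI {
			return fmt.Errorf("subject %q is outside graph %q", t.Subject, graphURI)
		}
	}
	return nil
}

func main() {
	base, err := NewUUID()
	if err != nil {
		panic(err)
	}
	frag, err := NewUUID()
	if err != nil {
		panic(err)
	}
	graph := "urn:uuid:" + base // UUID-URN used as base URI of the NamedGraph

	const rdfType = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
	const individual = "http://www.w3.org/2002/07/owl#NamedIndividual"
	triples := []Triple{
		{graph + "#" + frag, rdfType, individual},           // subject inside the graph
		{"urn:uuid:some-other-graph#x", rdfType, individual}, // subject of a foreign graph
	}

	fmt.Println(CheckSubjects(graph, triples[:1])) // only the conforming triple
	fmt.Println(CheckSubjects(graph, triples))     // includes the foreign subject
}
```

With this rule every subject can be assigned to exactly one
NamedGraph from its IRI alone, which is what allows each graph
to be loaded, saved and versioned independently.
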
=== Initial Goals or the Mission ===

To turn SST into an Apache top level project, the following
features should be implemented for a 1.0 release:
* Data model adaptations to OWL and integration in SST-dictionary
  for early binding:
  -- ISO 15926-12
  -- STEP integrated resources for representation (part 43),
     geometry & topology (part 42) and more
  -- core parts of the STEP AP242ed2 DO-model
  -- ISO 80000 Quantities and units
  -- formal SST schema with full constraints
     (similar to the capabilities of EXPRESS, but for OWL)
* Internal data storage in binary form:
  -- NamedGraph
  -- Diff between two NamedGraph
  -- Complete/partial history of a NamedGraph
    (including several NamedGraphs and Diffs sections)
  -- Complete stage (all NamedGraph together)
* CRUD API:
  -- late binding (references to resources are not checked)
  -- early binding (references to resources are checked through
     compiled dictionary)
  -- undo / redo since last commit
* Git-like functionality:
  -- user identification, authorization, authentication;
  -- storage of Commit (user, timestamp, comment, Namespace
     with all involved NamedGraphs);
  -- storage of versions of NamedGraphs and Diffs under
     their checksum (SHA1 or SHA256);
  -- replication of Namespaces with history to other
     repositories (via packed file);
  -- delta update to/from remote repository
* Tools:
  -- compiler to convert schema level ontologies into
     early binding applications
  -- schema level validation; e.g. using SHACL
  -- generic filtering and extraction, e.g. using SPARQL
  -- filling NoSQL DB with derived information for general
     searching
* Import & Export:
  -- Turtle (*.ttl) files
  -- STEP part 21 files along AP242 (MIM level)
     for geometric models and assemblies
  -- STEP AP242 XML files for certain application areas
* Applications:
  -- generic web-based viewer and editor for SST data
  -- generic STEP viewer for parts, assembly structures
  -- 3D viewer of already tessellated data

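The goals "Diff between two NamedGraph" and "storage of
versions of NamedGraphs and Diffs under their checksum" can be
sketched in Go as follows. This is a simplified illustration,
not the SST binary format: the Triple type, the tab-separated
text serialization and the function names are assumptions. The
key idea shown is that a canonical (sorted, de-duplicated) form
makes both the checksum and the diff independent of the order in
which triples were written.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// Triple is a simplified, hypothetical RDF triple.
type Triple struct{ S, P, O string }

func less(a, b Triple) bool {
	if a.S != b.S {
		return a.S < b.S
	}
	if a.P != b.P {
		return a.P < b.P
	}
	return a.O < b.O
}

// canonical returns a sorted, de-duplicated copy of g.
func canonical(g []Triple) []Triple {
	out := append([]Triple(nil), g...)
	sort.Slice(out, func(i, j int) bool { return less(out[i], out[j]) })
	n := 0
	for i, t := range out {
		if i == 0 || t != out[n-1] {
			out[n] = t
			n++
		}
	}
	return out[:n]
}

// Checksum returns the SHA-256 of the canonical form as a hex string.
func Checksum(g []Triple) string {
	h := sha256.New()
	for _, t := range canonical(g) {
		fmt.Fprintf(h, "%s\t%s\t%s\n", t.S, t.P, t.O)
	}
	return hex.EncodeToString(h.Sum(nil))
}

// Diff returns the triples only in old (removed) and only in next (added),
// using a single merge pass over both canonical forms.
func Diff(old, next []Triple) (removed, added []Triple) {
	a, b := canonical(old), canonical(next)
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		switch {
		case a[i] == b[j]:
			i++
			j++
		case less(a[i], b[j]):
			removed = append(removed, a[i])
			i++
		default:
			added = append(added, b[j])
			j++
		}
	}
	removed = append(removed, a[i:]...)
	added = append(added, b[j:]...)
	return removed, added
}

func main() {
	v1 := []Triple{{"g#a", "rdf:type", "ex:Part"}, {"g#a", "rdfs:label", "bolt"}}
	v2 := []Triple{{"g#a", "rdfs:label", "bolt M8"}, {"g#a", "rdf:type", "ex:Part"}}

	removed, added := Diff(v1, v2)
	fmt.Println("removed:", removed)
	fmt.Println("added:  ", added)

	// Content-addressed store: each version is filed under its checksum.
	store := map[string][]Triple{}
	store[Checksum(v1)] = canonical(v1)
	store[Checksum(v2)] = canonical(v2)
	fmt.Println("stored versions:", len(store))
}
```

A commit record as listed above (user, timestamp, comment,
Namespace) would then only need to reference the checksums of
the NamedGraph versions or Diffs it involves.
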
=== Current Status ===

After many years of working in this field and trying out new
approaches, the actual implementation of this proposal in the Go
programming language was started from scratch in January 2021
by a core team of two people.

By the end of June 2021 a fully functional prototype will be
available, consisting of:
* core API in late and early binding with complete CRUD
  functionality
* binary RDF file handling for NamedGraphs and Diffs
* import & export of Turtle files (*.ttl)
* STEP AP242 XML import
* a generic web viewer for the data
* and more

These capabilities are sufficient to start with the
implementation of first SST applications.

The next major things to implement are generic query capabilities
(SPARQL), testing for schema level conformance (e.g. SHACL),
Git-like functionality for merging and replication, and NoSQL DB
adoption for querying derived information.

** Meritocracy **
So far the SST project has been developed in private by two
people, each with a different focus. It is clear that this
phase will end when SST becomes an Apache incubator
project.

There are many potential SST extensions and application areas
inside and outside of the ASF that can only be managed by a
real community.

** Community **
In the early days of STEP many universities, research
institutes and CAD related companies were involved in the
development and implementation of the STEP standard. This is
partly documented in the book "The Grand Experience" by
Sharon J. Kemmerer, NIST, 1999. But as the implementations
matured, the standard grew strongly in complexity, and reaching
consensus and running through the standardization process took
a seemingly endless time, the number of participants became
smaller and smaller.

Today, as most of the originally planned capabilities have
finally been standardized, and there is a new way to implement
STEP using RDF/OWL on the basis of major parts of ISO/TS
15926-12, there is a good chance that a new community will come
together, especially as the integration of all this data on the
Internet is in such high demand today.
There are already several communities around for which SST may
play a role, e.g.:
* MBx / CAx-IF implementor forum
  => https://www.cax-if.org/index.php
* LOTAR, Long Term Archiving and Retrieval
  => https://lotar-international.org/
* POSC Caesar Association
  => https://www.posccaesar.org/

** Core Developers **
The core developers are Vaidas Nargelas for the central Go code
and Lothar Klein for the STEP data model and converter.

** Alignment **
Other than the standard Go libraries, SST does not really depend
on other packages. However, SST requires a secondary NoSQL DB
with derived information, for which one of the Apache projects
might be used (maybe Apache Solr). It is also clear that SST
cloud applications (e.g. micro services) will require several of
the capabilities provided by Apache projects.
For some potential SST applications a linkage via Apache PLC4X
might be used to directly drive production lines or a single NC
machine from data stored in SST.

** Known Risk **
Failure to set up a developer community

** Project Name **
We think that "Semantic STEP" is an appropriate name for the
project as it clearly links to the W3C "Semantic Web"
standards and clearly indicates the application area
"STEP".

Within the code we widely use the prefix to indicate the Go
packages:
* sst for the core API
* ssowl for the OWL ontology
* sslci for the adopted life-cycle ontology from ISO 15926
* ssont for the adopted STEP ontology
* ssrep for the adopted STEP representation models
  (e.g. geometry & topology)
* etc.

There are no trademarks on "Semantic STEP", however there are
several trademarks for "SST". But as these letters are so
generic there should be no conflict when using "Apache SST".

Also there are several web-domains registered such as:
* semantic-step.net and semanticstep.net
* semantic-step.org and semanticstep.org

The sponsoring entity is ready to transfer these domains to the
ASF if this proposal is accepted.

** Orphaned products **
Both Lothar Klein (Germany) and Vaidas Nargelas (Lithuania)
have a long-term commitment to making SST a success.

** Inexperience with Open Source **
Back in 2013 the committers released JSDAI under
the AGPL v3 license (see www.jsdai.net). "J" stands for Java
and "SDAI" stands for the "Standard Data Access Interface";
primarily an API according to:

ISO/TS 10303-27:2000 Industrial automation systems and
integration — Product data representation and exchange
— Part 27: Implementation methods: Java TM programming
language binding to the standard data access interface
with Internet/Intranet extensions
=> https://sourceforge.net/projects/jsdai/
=> https://jsdai.net/

** Length of Incubation **
Maybe 6 to 12 months, until an initial community
has formed and a first release with all core components is
available.

** Homogenous Developers **
The two initial founders of SST have quite different focuses
and experiences.
It is clear that during the incubation time the focus must be
to build up a community. As the proposers have various contacts
at universities and in industry around the world, this should be
feasible over some time.
However, there is a chicken-and-egg problem: SST first needs to
become an official Apache incubator project to get people
interested and to assure them that this is real open source
software with an open community.

** Reliance on Salaried Developers **
So far SST development has been sponsored by LKSoftWare GmbH,
and the expectation is that this will continue.

** Relationship with Other Apache Products **
Currently there are no dependencies on other ASF projects but
Apache Solr and Apache Cassandra are being considered.

** An Excessive Fascination with the Apache Brand **
So far SST has been developed without considering making it an
ASF project. Weighing the pros and cons, we have come to the
conclusion that as an ASF project SST has a much better chance
of becoming a success, especially because only in this case can
we hope that SST will be widely used to publish industrial
product data in a computer-interpretable way on the Internet.

** Documentation **
Not ready today. More detailed SST documentation will become
available during the incubator stage.

** Initial Source **
All source files (Go, Turtle) are currently stored in an
internal Git repository of LKSoftWare GmbH.

** Source and Intellectual Property Submission Plan **
Everything is currently under the copyright of LKSoftWare GmbH.
It will be transferred to the ASF if this podling application
is successful.

** External Dependencies **
No

** Cryptography **
There is no cryptography involved in the core code-base.

=== Required Resources ===
** Mailing lists: **
Needed lists are private@..., dev@..., commits@...

** Subversion Directory **
No

** Git Repositories **
Yes

** Issue Tracking **
JIRA

** Other Resources **
A Wiki for the documentation

** Initial Committers **
Vaidas Nargelas <vaidas.narge...@gmail.com>
Lothar Klein <lothar.kl...@lksoft.com>

=== Sponsors ===
** Champion: **
TBD

** Nominated Mentors: **
TBD

** Sponsoring Entity: **
TBD


