On 08.02.22 16:02, Saul Wold wrote:
> This patch will read the beginning of source files and try to find
> the SPDX-License-Identifier to populate the licenseInfoInFiles
> field for each source file. This does not populate licenseConcluded
> at this time, nor rolls it up to package level.
>
> We read as binary file since some source code seems to have some
> binary characters, the license is then converted to ascii strings.
>
> Signed-off-by: Saul Wold <saul.w...@windriver.com>
> ---
> v2: Clean up commit message
> v3: Really fix up regex based on Peter's feedback!
>
>  meta/classes/create-spdx.bbclass | 22 ++++++++++++++++++++++
>  1 file changed, 22 insertions(+)
>
> diff --git a/meta/classes/create-spdx.bbclass b/meta/classes/create-spdx.bbclass
> index 8b4203fdb5..64aada8593 100644
> --- a/meta/classes/create-spdx.bbclass
> +++ b/meta/classes/create-spdx.bbclass
> @@ -37,6 +37,23 @@ SPDX_SUPPLIER[doc] = "The SPDX PackageSupplier field for SPDX packages created f
>
>  do_image_complete[depends] = "virtual/kernel:do_create_spdx"
>
> +def extract_licenses(filename):
> +    import re
> +
> +    lic_regex = re.compile(b'^\W*SPDX-License-Identifier:\s*([ \w\d.()+-]+?)(?:\s+\W*)?$', re.MULTILINE)
Taking inspiration from reuse-tool (https://github.com/fsfe/reuse-tool/blob/master/src/reuse/_comment.py) and the way it parses comment blocks, the updated regex gives good results.

Test sample set:

(* SPDX-License-Identifier: Foo-Bar *)
(* SPDX-License-Identifier: Foo-Bar *)
/* SPDX-License-Identifier: Foo-Bar */
<!-- SPDX-License-Identifier: Foo-Bar -->
<#-- SPDX-License-Identifier: Foo-Bar -->
<%-- SPDX-License-Identifier: Foo-Bar --%>
{# SPDX-License-Identifier: Foo-Bar #}
{/* SPDX-License-Identifier: Foo-Bar */}
{{!-- SPDX-License-Identifier: Foo-Bar --}}
@Comment{ SPDX-License-Identifier: Foo-Bar }   ---> Only this one is missed (BibTeX syntax).

I have no idea whether that is important to anyone; I just wanted to point out that the regex does not catch every possible input line.

> +
> +    try:
> +        with open(filename, 'rb') as f:
> +            size = min(15000, os.stat(filename).st_size)
> +            txt = f.read(size)
> +            licenses = re.findall(lic_regex, txt)
> +            if licenses:
> +                ascii_licenses = [lic.decode('ascii') for lic in licenses]
> +                return ascii_licenses
> +    except Exception as e:
> +        bb.warn(f"Exception reading {filename}: {e}")
> +    return None
> +
>  def get_doc_namespace(d, doc):
>      import uuid
>      namespace_uuid = uuid.uuid5(uuid.NAMESPACE_DNS, d.getVar("SPDX_UUID_NAMESPACE"))
> @@ -232,6 +249,11 @@ def add_package_files(d, doc, spdx_pkg, topdir, get_spdxid, get_types, *, archiv
>                      checksumValue=bb.utils.sha256_file(filepath),
>                  ))
>
> +                if "SOURCE" in spdx_file.fileTypes:
> +                    extracted_lics = extract_licenses(filepath)
> +                    if extracted_lics:
> +                        spdx_file.licenseInfoInFiles = extracted_lics
> +
>                  doc.files.append(spdx_file)
>                  doc.add_relationship(spdx_pkg, "CONTAINS", spdx_file)
>                  spdx_pkg.hasFiles.append(spdx_file.SPDXID)
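For anyone who wants to rerun the sample-set check outside BitBake, here is a minimal standalone sketch. It copies the v3 regex from the patch (written as a raw bytes literal, which behaves the same here) and applies it line by line to the comment styles listed above. The names `LIC_REGEX`, `SAMPLES` and `check_samples` are hypothetical, introduced only for this sketch; they are not part of create-spdx.bbclass.

```python
import re

# Regex copied from extract_licenses() in the v3 patch; rb'' avoids
# invalid-escape warnings but matches exactly the same pattern.
LIC_REGEX = re.compile(
    rb'^\W*SPDX-License-Identifier:\s*([ \w\d.()+-]+?)(?:\s+\W*)?$',
    re.MULTILINE,
)

# Comment styles from the test sample set (duplicate line omitted).
SAMPLES = [
    b"(* SPDX-License-Identifier: Foo-Bar *)",
    b"/* SPDX-License-Identifier: Foo-Bar */",
    b"<!-- SPDX-License-Identifier: Foo-Bar -->",
    b"<#-- SPDX-License-Identifier: Foo-Bar -->",
    b"<%-- SPDX-License-Identifier: Foo-Bar --%>",
    b"{# SPDX-License-Identifier: Foo-Bar #}",
    b"{/* SPDX-License-Identifier: Foo-Bar */}",
    b"{{!-- SPDX-License-Identifier: Foo-Bar --}}",
    b"@Comment{ SPDX-License-Identifier: Foo-Bar }",
]


def check_samples(samples):
    """Map each sample line (decoded) to the license strings the regex finds."""
    results = {}
    for line in samples:
        found = LIC_REGEX.findall(line)
        # Decode to ASCII, as the patch does before storing licenseInfoInFiles.
        results[line.decode("ascii")] = [lic.decode("ascii") for lic in found]
    return results


if __name__ == "__main__":
    for line, lics in check_samples(SAMPLES).items():
        status = "MATCH" if lics else "MISS "
        print(f"{status} {line} -> {lics}")
```

The BibTeX line is missed because `^\W*` may only skip non-word characters before the identifier, and the word `Comment` in `@Comment{` stops it. Since the character class also allows spaces, a compound expression such as `GPL-2.0-only OR MIT` is captured as one string rather than split into separate licenses.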
Archived at: https://lists.openembedded.org/g/openembedded-core/message/161515 (#161515)