On Wed, Jul 28, 2021 at 4:49 AM Miel Vander Sande <[email protected]> wrote:
> Hi Nick,
>
> TBH, it's pretty much a function that converts a Dict or a JSON file in a
> streaming fashion:
> https://github.com/viaacode/construction-site/blob/main/construction_site/parse_functions.py.
> I think it's a stand-alone thing; I don't plan anything extra on that
> specifically, with maybe the exception of a cmd interface (hence the
> proposed refactoring of csv2rdf)
>

Profiling and comparative benchmarks with, e.g., Scalene [1][2] and/or perfplot [3] (or %timeit [4][5]) could be worthwhile.

[1] https://awesomeopensource.com/project/plasma-umass/scalene?categoryPage=26
[2] https://github.com/plasma-umass/scalene
[3] https://github.com/nschloe/perfplot
[4] https://docs.python.org/3/library/timeit.html#timeit-command-line-interface
[5] https://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-timeit

ijson [6] looks like it has some interesting features: iterative parsing, asyncio support, and push (coroutine) interfaces. How does its performance compare?

[6] https://pypi.org/project/ijson/

> I do plan to develop more components that assist scalable ETL, data-to-rdf
> like tasks. This includes a plugin for Apache Airflow ("provider"), which
> would be good as a RDFLib family repository.
>

- The datasette and dogsheep projects have a bunch of *-to-sqlite utils, and an interface that a number of projects on PyPI have implemented:
  https://datasette.io/tools
  https://github.com/dogsheep
  https://pypi.org/search/?q=to-sqlite
  - https://datasette.io/tools/csvs-to-sqlite
    - https://github.com/simonw/csvs-to-sqlite/blob/a8a37a016790dc93270c74e32d0a5051bc5a0f4d/tests/test_csvs_to_sqlite.py#L417-L446
      - parse datetimes in CSVs
        - xsd:dateTime (and schema.org/Date, schema.org/dateCreated, and schema.org/dateModified) expect values in ISO 8601 formats.

What solutions are there for generating an RDFS schema from CSVs and SQL tables?
- https://pypi.org/project/tablib/
  https://github.com/jazzband/tablib/blob/master/tests/test_tablib.py
  - doesn't do anything with datatypes FWICS
- https://sqlite-utils.datasette.io/en/stable/cli.html#showing-the-schema
  https://github.com/simonw/sqlite-utils/blob/main/sqlite_utils/db.py
  - `def suggest_column_types`: https://github.com/simonw/sqlite-utils/blob/c7e8d72be9fe8fe0811f685a18eebc637662d41b/sqlite_utils/utils.py#L29-L58
- https://en.wikipedia.org/wiki/Knowledge_extraction
  https://en.wikipedia.org/wiki/Knowledge_extraction#Relational_databases_to_RDF
- https://pypi.org/project/rdb2rdf/
  https://github.com/nisavid/pyrdb2rdf/blob/master/rdb2rdf/stores.py

  > PyRDB2RDF provides RDFLib with an interface to relational databases as RDF stores. The underlying data is accessed via SQLAlchemy. It is mapped to RDF according to the specifications of RDB2RDF. The corresponding RDF graph is represented as an RDFLib graph.
  >
  > Translating from relational data to RDF via direct mapping is currently supported. Translating in the other direction and mapping with R2RML are planned but not yet implemented.

- https://pypi.org/project/rdfizer/
- https://github.com/RDFLib/pyTARQL
  - Does this handle datetimes?
- Generate JSON Schema from JSON, and SHACL from JSON Schema:
  https://stackoverflow.com/questions/7341537/tool-to-generate-json-schema-from-json-data/30294535#30294535
  - https://pypi.org/project/genson/ has been recently updated
    - Src: https://github.com/wolverdude/genson/
  - https://github.com/mulesoft-labs/json-ld-schema
    https://github.com/mulesoft-labs/json-ld-schema#how-does-it-work

    > JSON-LD Schema defines a simple 'semantics' JSON-Schema vocabulary (effectively a JSON-Schema meta-schema) that reuses the official JSON Schema for JSON-LD to provide definitions for @context and @type properties. These annotations can be used to provide JSON-LD context for a JSON-Schema document.
    > Provided this JSON-LD context, constraints over named 'properties' in a JSON Schema document can be understood as constraints over CURIES of JSON-LD documents following the context rules defined in the JSON-LD specification.

## CSVW: CSV on the Web

- Homepage: https://w3c.github.io/csvw/
- Standard: https://www.w3.org/TR/tabular-data-model/
- Standard: https://www.w3.org/TR/tabular-metadata/
- Standard: https://www.w3.org/TR/csv2json/
- Standard: https://www.w3.org/TR/csv2rdf/
- Namespace: http://www.w3.org/ns/csvw#
- Prefix: `@prefix csvw: <http://www.w3.org/ns/csvw#> .`
- @context: https://www.w3.org/ns/csvw.jsonld

CSVW (*CSV on the Web*) is a set of relatively new standards for representing :ref:`CSV` rows and columns as :ref:`RDF` (and :ref:`JSON` / :ref:`JSON-LD`) along with *metadata*:

* URIs for datatypes (XSD)
* URIs for columns (RDF)
* Document Metadata
* CSV -> JSON (-> JSON-LD -> RDF)
* CSV -> RDF

Could there be a file naming convention for specifying the extra CSVW header to apply to (or transform) zero or more CSV files with? For example:

- filename.csv
- filename.csv.csvw
- filename.csv.csvwheader.jsonld.json
- filename.csv.csvw.jsonld.json

https://www.w3.org/TR/tabular-data-primer/

> Best,
>
> Miel
>
> On Wed, 28 Jul 2021 at 06:09, Nicholas Car <[email protected]> wrote:
>
>> Hi Meil,
>>
>> Yes, all offers of contribution are of interest! The CSV 2 RDF stuff is
>> very old and many tools related to it, such as pyTARQL (
>> https://github.com/RDFLib/pyTARQL), are missing. Are you planning on
>> presenting JSON2RDF as a new plugin to RDFlib? that may be an option,
>> however remember that another option is also just to present your tool's
>> repository within RDFlib's family of repositories (i.e. within
>> https://github.com/RDFLib) and the choice will depend on how stable the
>> tool is and how you see it's future development going.
>>
>> But perhaps you have other things in mind? Whatever the case, we'd love
>> to hear your plans.
>>
>> Cheers,
>>
>> Nick
>>
>> On Tue, Jul 27, 2021 at 5:56 PM Miel Vander Sande <[email protected]> wrote:
>>
>>> Hi all,
>>>
>>> little late to the party, but what a great effort this is! Congrats with
>>> the release and thank you; this library is super essential to my work and
>>> it makes RDF usable in ways other libraries can't.
>>>
>>> Sidenote: I have a streaming direct json-to-rdf mapping implementation
>>> (port of https://github.com/AtomGraph/JSON2RDF) that I'd like to
>>> contribute, possibly in combination with a refactoring of
>>> https://rdflib.readthedocs.io/en/stable/apidocs/rdflib.tools.html#rdflib.tools.csv2rdf.CSV2RDF.
>>> Would that be of interest?
>>>
>>

Does JSON2RDF [need to] implement the W3C json-ld-streaming spec [7]?

[7] https://w3c.github.io/json-ld-streaming/
- https://w3c.github.io/json-ld-streaming/#streaming-document-form
- https://w3c.github.io/json-ld-streaming/#streaming-rdf-form

>
>>> Best,
>>>
>>> Miel
>>>
>>> On Tue, 20 Jul 2021 at 21:58, Natanael Arndt <[email protected]> wrote:
>>>
>>>> I've retweetet the tweet by jarven. But I don't use reddit or hacker
>>>> news, I think also semantic web mailing list would be a good idea.
>>>>
>>>> If you'd like to post something in the channels, please do so.
>>>>
>>>> Natanael
>>>>
>>>> On 20 July 2021 20:14:09 CEST, Wes Turner <[email protected]> wrote:
>>>> >Congrats and thanks!
>>>> >
>>>> >From the release notes on the Release:
>>>> >https://github.com/RDFLib/rdflib/releases/tag/6.0.0
>>>> >
>>>> >```
>>>> >6.0.0 is a major stable release that drops support for Python 2 and
>>>> >Python 3 < 3.7. Type hinting is now present in much
>>>> >of the toolkit as a result.
>>>> >
>>>> >It includes the formerly independent JSON-LD parser/serializer,
>>>> >improvements to Namespaces that allow for IDE namespace
>>>> >prompting, simplified use of g.serialize() (turtle default, no need to
>>>> >decode()) and many other updates to
>>>> >documentation, store backends and so on.
>>>> >
>>>> >Performance of the in-memory store has also improved since Python 3.6
>>>> >dictionary improvements.
>>>> >
>>>> >There are numerous supplementary improvements to the toolkit too, such
>>>> >as:
>>>> >
>>>> >- inclusion of Docker files for easier CI/CD
>>>> >- black config files for standardised code formatting
>>>> >- improved testing with mock SPARQL stores, rather than a reliance on
>>>> >DBPedia etc
>>>> >```
>>>> >
>>>> >Have there been ANN posts to e.g. Hacker news and e.g. /r/semanticweb?
>>>> >
>>>> >On Tue, Jul 20, 2021, 10:23 Florent Georges <[email protected]> wrote:
>>>> >
>>>> >> Congratulations, and thank you all for the hard work!
>>>> >>
>>>> >> --
>>>> >> Florent Georges
>>>> >> H2O Consulting
>>>> >> http://h2o.consulting/
>>>> >>
>>>> >> On Tue, Jul 20, 2021, 16:00 Nicholas Car <[email protected]> wrote:
>>>> >>
>>>> >>> Hi all,
>>>> >>>
>>>> >>> Yes, 6.0.0 is out:
>>>> >>>
>>>> >>> - https://pypi.org/project/rdflib/6.0.0/
>>>> >>> - https://github.com/RDFLib/rdflib/releases/tag/6.0.0
>>>> >>>
>>>> >>> Please publicise this release: it has a lot of stuff since 5.0.0 in
>>>> >>> April last year.
>>>> >>>
>>>> >>> Thank you very much to all of you who contributed, in particular my
>>>> >>> co-maintainers, Ashley & Natanael and Edmond, Iwan, Tom, Remi,
>>>> >>> Harold and all the PR and Issue creators. Thanks also to the
>>>> >>> institutions that provided time for their staff to contribute.
>>>> >>>
>>>> >>> If you see issues, please let the co-maintainers know straight away:
>>>> >>> we keen to get a 6.0.1 release out shortly (like weeks to a month) to
>>>> >>> speed up the RDFlib release cycle.
>>>> >>>
>>>> >>> Cheers,
>>>> >>>
>>>> >>> Nick
>>>> >>>
>>>> >>> --
>>>> >>> kind regards
>>>> >>> Dr Nicholas Car
>>>> >>> Data Systems Architect
>>>> >>>
>>>> >>> SURROUND Australia Pty Ltd and
>>>> >>> SURROUND NZ Limited
>>>> >>>
>>>> >>> Address Level 9, Nishi Building,
>>>> >>> 2 Phillip Law Street
>>>> >>> New Acton Canberra 2601
>>>> >>> Mobile +61 477 560 177
>>>> >>> Email [email protected]
>>>> >>> Website https://www.surroundaustralia.com
>>>> >>>
>>>> >>> Enhancing Intelligence Within Organisations
>>>> >>> delivering evidence that connects decisions to outcomes
>>>> >>>
>>>> >>> Dr Nicholas Car
>>>> >>> Adjunct Senior Lecturer
>>>> >>>
>>>> >>> Research School of Computer Science
>>>> >>>
>>>> >>> The Australian National University,
>>>> >>> Canberra ACT Australia
>>>> >>> +61 477 560 177
>>>> >>> [email protected]
>>>> >>> https://cs.anu.edu.au/people/nicholas-car
>>>> >>> https://orcid.org/0000-0002-8742-7730
>>>> >>>
>>>> >>> --
>>>> >>> http://github.com/RDFLib
>>>> >>> ---
>>>> >>> You received this message because you are subscribed to the Google Groups
>>>> >>> "rdflib-dev" group.
>>>> >>> To unsubscribe from this group and stop receiving emails from it, send an
>>>> >>> email to [email protected].
>>>> >>> To view this discussion on the web visit
>>>> >>> https://groups.google.com/d/msgid/rdflib-dev/CAP7nqh19yjpwB8EoHVqs5QzKug_rSq1X%2BfFHfnFtOJBdZ1RwYg%40mail.gmail.com
>>>> >>> .
>>>> >>>
>>>>
>>>> --
>>>> This message was sent from my Android device with K-9 Mail.
