Testing for a bad namespace with SHACL

Overview

This is a brute-force approach that uses SHACL to report invalid use of a namespace. It is practical only when the number of combinations of bad namespace and matching class [1] to test is small.

Using the SHACL shapes:

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix d1: <http://ns.dataone.org/schema/SO/nsvalidation#> .

d1:DatasetBad1Shape
    a sh:NodeShape ;
    sh:targetClass <https://schema.orgDataset> ;
    sh:message "Expecting SO namespace of <https://schema.org/> not <https://schema.org>" ;
    sh:not [
        sh:path rdf:type ;
        sh:minCount 1;
    ].
d1:DatasetBad2Shape
    a sh:NodeShape ;
    sh:targetClass <http://schema.org/Dataset> ;
    sh:message "Expecting SO namespace of <https://schema.org/> not <http://schema.org/>" ;
    sh:not [
        sh:path rdf:type ;
        sh:minCount 1;
    ].
d1:DatasetBad3Shape
    a sh:NodeShape ;
    sh:targetClass <http://schema.orgDataset> ;
    sh:message "Expecting SO namespace of <https://schema.org/> not <http://schema.org>" ;
    sh:not [
        sh:path rdf:type ;
        sh:minCount 1;
    ].

and a graph with three SO:Dataset sub-graphs that use invalid namespaces:

[
  {
    "@context": {
      "@vocab": "https://schema.org"
    },
    "@id":"demo_0",
    "@type":"Dataset",
    "name": "https, no trailing slash"
  },
  {
    "@context": {
      "@vocab": "http://schema.org"
    },
    "@id":"demo_1",
    "@type":"Dataset",
    "name": "http, no trailing slash"
  },
  {
    "@context": {
      "@vocab": "http://schema.org/"
    },
    "@id":"demo_2",
    "@type":"Dataset",
    "name": "http only"
  }
]
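
To see why a small slip in "@vocab" changes the class of a node, recall that JSON-LD expands a bare term such as "Dataset" by appending it directly to the "@vocab" value. The sketch below imitates only that single concatenation step in plain Python. The helper name expand_type is hypothetical, and a real JSON-LD processor does considerably more than this.

```python
# Simplified illustration of JSON-LD @vocab expansion: the term is appended
# directly to the @vocab string, and no separator is inserted.
def expand_type(vocab: str, term: str) -> str:
    """Return the IRI a bare term expands to under @vocab (hypothetical helper)."""
    return vocab + term


# The @vocab values used by the three invalid examples, plus the correct one.
vocabs = {
    "demo_0": "https://schema.org",
    "demo_1": "http://schema.org",
    "demo_2": "http://schema.org/",
    "valid": "https://schema.org/",
}

expanded = {name: expand_type(vocab, "Dataset") for name, vocab in vocabs.items()}
for name, iri in expanded.items():
    print(f"{name}: <{iri}>")
```

Only the last entry expands to <https://schema.org/Dataset>. The entries without a trailing slash expand to IRIs in which the host name and the class name run together, so each bad variant yields its own distinct class IRI.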

The SHACL tests are applied and results printed:

import rdflib
import pyshacl

# Load the SHACL shapes that target the bad namespace variants.
shape_graph = rdflib.Graph()
shape_graph.parse("examples/shapes/test_namespace.ttl", format="turtle")

# Load the JSON-LD data; publicID is the base IRI for relative @id values.
data_graphs = rdflib.ConjunctiveGraph()
data_graphs.parse("examples/data/ds_bad_namespace.json", format="json-ld", publicID="https://example.net/")

# Validate the data against the shapes and print the text report.
conforms, results_graph, results_text = pyshacl.validate(
    data_graphs,
    shacl_graph=shape_graph,
    inference="rdfs",
    meta_shacl=True,
    abort_on_error=False,
    debug=False
)
print(results_text)
Validation Report
Conforms: False
Results (3):
Constraint Violation in NotConstraintComponent (http://www.w3.org/ns/shacl#NotConstraintComponent):
	Severity: sh:Violation
	Source Shape: d1:DatasetBad2Shape
	Focus Node: <https://example.net/demo_2>
	Value Node: <https://example.net/demo_2>
	Message: Expecting SO namespace of <https://schema.org/> not <http://schema.org/>
Constraint Violation in NotConstraintComponent (http://www.w3.org/ns/shacl#NotConstraintComponent):
	Severity: sh:Violation
	Source Shape: d1:DatasetBad3Shape
	Focus Node: <https://example.net/demo_1>
	Value Node: <https://example.net/demo_1>
	Message: Expecting SO namespace of <https://schema.org/> not <http://schema.org>
Constraint Violation in NotConstraintComponent (http://www.w3.org/ns/shacl#NotConstraintComponent):
	Severity: sh:Violation
	Source Shape: d1:DatasetBad1Shape
	Focus Node: <https://example.net/demo_0>
	Value Node: <https://example.net/demo_0>
	Message: Expecting SO namespace of <https://schema.org/> not <https://schema.org>

For comparison, a valid SO:Dataset:

{
  "@context": {
    "@vocab": "https://schema.org/"
  },
  "@graph": [
    {
      "@type": "Dataset",
      "@id": "./",
      "identifier": "dataset-01",
      "name": "Dataset with metadata about",
      "description": "Dataset snippet with metadata and data components indicated by hasPart and the descriptive metadata through an about association.",
      "license": "https://creativecommons.org/publicdomain/zero/1.0/",
      "hasPart": [
        {
          "@id": "./metadata.xml"
        },
        {
          "@id": "./data_part_a.csv"
        }
      ]
    },
    {
      "@id": "./metadata.xml",
      "@type": "MediaObject",
      "contentUrl": "https://example.org/my/data/1/metadata.xml",
      "dateModified": "2019-10-10T12:43:11.000+00:00",
      "description": "A metadata document describing the Dataset and the data component",
      "encodingFormat":"http://www.isotc211.org/2005/gmd",
      "about": [
        {
          "@id": "./"
        },
        {
          "@id": "./data_part_a.csv"
        }
      ]
    },
    {
      "@id": "./data_part_a.csv",
      "@type": "MediaObject",
      "contentUrl": "https://example.org/my/data/1/data_part_a.csv"
    }
  ]
}

This dataset does not match any of the bad namespace shapes, so it conforms.

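# Start from an empty graph so the invalid examples loaded earlier are not validated again.
data_graphs = rdflib.ConjunctiveGraph()
data_graphs.parse("examples/data/ds_m_about.json", format="json-ld", publicID="https://example.net/")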
conforms, results_graph, results_text = pyshacl.validate(
    data_graphs,
    shacl_graph=shape_graph,
    inference="rdfs",
    meta_shacl=True,
    abort_on_error=False,
    debug=False
)
print(results_text)
Validation Report
Conforms: True

Footnotes

[1] The limitation of this approach stems from the need to identify a target node against which the SHACL constraints are applied. Checking additional SO: types with this pattern requires a separate sh:targetClass shape for each combination of namespace and type. With the three bad namespace variants used here, that means three shapes for each type being tested.
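
The growth described in this footnote can be made concrete with a small generator. The following is only a sketch under stated assumptions: the function build_shapes, the template, and the extra type "MediaObject" are hypothetical and not part of the project. It forms each target class IRI by plain concatenation of the bad namespace and the type name, mirroring "@vocab" expansion, and it omits the @prefix header the output would need in order to parse as Turtle.

```python
# Hypothetical generator for the brute-force shapes: one shape is emitted per
# (type, bad namespace) combination. All names here are illustrative assumptions.
GOOD_NS = "https://schema.org/"
BAD_NAMESPACES = ["https://schema.org", "http://schema.org/", "http://schema.org"]
TYPES = ["Dataset", "MediaObject"]

SHAPE_TEMPLATE = """d1:{type_name}Bad{index}Shape
    a sh:NodeShape ;
    sh:targetClass <{bad_ns}{type_name}> ;
    sh:message "Expecting SO namespace of <{good_ns}> not <{bad_ns}>" ;
    sh:not [
        sh:path rdf:type ;
        sh:minCount 1 ;
    ] .
"""


def build_shapes(types, bad_namespaces, good_ns=GOOD_NS):
    """Return one Turtle shape string per combination of type and bad namespace."""
    shapes = []
    for type_name in types:
        for index, bad_ns in enumerate(bad_namespaces, start=1):
            shapes.append(
                SHAPE_TEMPLATE.format(
                    type_name=type_name,
                    index=index,
                    bad_ns=bad_ns,
                    good_ns=good_ns,
                )
            )
    return shapes


shapes = build_shapes(TYPES, BAD_NAMESPACES)
print(f"{len(shapes)} shapes generated for {len(TYPES)} types")
print(shapes[0])
```

Two types and three bad namespaces already yield six shapes, which illustrates why the pattern is only manageable for a short list of types.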

Running code on this page

All examples on this page can be run live in Binder. To do so:

  1. Click on the “Activate Binder” button.
  2. Wait for Binder to become active. This can take a while; you can watch progress in your browser’s JavaScript console. When a line like Kernel: connected (89dfd3c8... appears, Binder should be ready to go.
  3. Run the following before any other script on the page. This sets the right path context for loading examples etc.
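import os

# Move to the documentation source folder so relative example paths resolve.
try:
    os.chdir("docsource/source")
except OSError:
    # The folder is absent when the working directory is already correct.
    pass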
print("Page is ready. You can now run other code blocks on this page.")