Identifiers for Datasets¶
Overview¶
The purpose of this document is to describe the behavior of DataONE indexers when encountering identifiers in SO:Dataset [3] instances.
In the context of DataONE, a dataset has multiple components. Each component version is preserved and each component version has a persistent, globally unique identifier (PID). Each component may also be assigned a globally unique identifier that always resolves to the most recent version of a component (SeriesID or SID). That context is used in this document.
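The relationship between PIDs and a SID can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions: the `Series` class, its methods, and the `sid:`/`pid:` identifier strings are hypothetical and are not part of any DataONE API.

```python
from dataclasses import dataclass, field


@dataclass
class Series:
    """One dataset component: a SID plus the PIDs of its versions (hypothetical model)."""
    sid: str
    pids: list = field(default_factory=list)  # ordered oldest first

    def add_version(self, pid: str) -> None:
        # Every version gets its own PID; earlier PIDs are never removed.
        if pid in self.pids:
            raise ValueError(f"PID already used: {pid}")
        self.pids.append(pid)

    def resolve(self, identifier: str) -> str:
        # A SID resolves to the most recent version; a PID resolves to itself.
        if identifier == self.sid:
            if not self.pids:
                raise LookupError(f"no versions recorded for {self.sid}")
            return self.pids[-1]
        if identifier in self.pids:
            return identifier
        raise LookupError(f"unknown identifier: {identifier}")


# Placeholder identifiers, invented for this example.
metadata = Series(sid="sid:example-metadata")
metadata.add_version("pid:example-metadata-v1")
metadata.add_version("pid:example-metadata-v2")

latest = metadata.resolve("sid:example-metadata")
first = metadata.resolve("pid:example-metadata-v1")
print(latest, first)
```

In this sketch the SID follows the newest version, while each PID continues to resolve to exactly the version it was assigned to.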
id and identifier¶
The @id property in JSON-LD [1] identifies a node in the RDF graph, and must be an IRI [2].
The SO:identifier is an optional property of a node that may or may not be a URI, and may or may not be the same as the @id for the node.
Ideally, the @id and the SO:identifier would have the same value, though this is often not the case for datasets.
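As a rough illustration of the distinction, the following sketch reads both values from a single JSON-LD node and checks whether they coincide. The dataset URL and DOI are placeholders invented for the example, and `node_identifiers` is a hypothetical helper. It handles only plain string values and does not perform JSON-LD expansion or interpret structured values such as PropertyValue.

```python
import json

# Hypothetical SO:Dataset fragment; the URL and DOI are placeholders, not real records.
doc = json.loads("""
{
  "@context": {"@vocab": "https://schema.org/"},
  "@type": "Dataset",
  "@id": "https://example.org/datasets/abc123",
  "identifier": "doi:10.5555/example"
}
""")


def node_identifiers(node):
    """Return (@id, list of identifier values) for one JSON-LD node."""
    node_id = node.get("@id")
    raw = node.get("identifier", [])
    # The identifier property may hold a single value or a list; normalize to a list.
    values = raw if isinstance(raw, list) else [raw]
    return node_id, values


node_id, identifiers = node_identifiers(doc)
same = node_id in identifiers
print(node_id, identifiers, same)
```

Here the node is addressed by an IRI while its identifier is a DOI string, so the two values differ, which is the common situation described above.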
Identifier Conflation¶
The string “978-1-5387-1847-6” is an identifier, in this case an ISBN. A number of services are available to provide more information about the subject of the identifier. For example, ISBN Search is a lookup service that provides an HTML view of the results. Google provides a Books API that returns structured data, though it requires authentication to use, for example:
curl "https://www.googleapis.com/books/v1/volumes?key=${GAPIKEY}&q=isbn:9781538718476"
{
  "kind": "books#volumes",
  "totalItems": 1,
  "items": [
    {
      "kind": "books#volume",
      "id": "SyqzDwAAQBAJ",
      "etag": "q7NUsBTwiu8",
      "selfLink": "https://www.googleapis.com/books/v1/volumes/SyqzDwAAQBAJ",
      ...
Note that the canonical form of the identifier is "9781538718476", the commonly
used human-readable form is "978-1-5387-1847-6", and the resolvable form
varies with the resolving service, such as the aforementioned Google Books API.
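The three forms can be related with a short sketch. The function names are hypothetical, the check uses the standard ISBN-13 check digit rule, and the resolvable form simply follows the URL pattern of the curl example with the API key omitted. This is an illustration, not how DataONE indexers process identifiers.

```python
from urllib.parse import urlencode


def canonical_isbn(text):
    """Strip hyphens and spaces from a human-readable ISBN."""
    return text.replace("-", "").replace(" ", "")


def is_valid_isbn13(isbn):
    """Check the ISBN-13 check digit (weights alternate 1, 3 from the left)."""
    if len(isbn) != 13 or not (isbn.isascii() and isbn.isdigit()):
        return False
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(isbn))
    return total % 10 == 0


def google_books_url(isbn):
    # Resolvable form for one particular service; other services use other patterns.
    # The key parameter shown in the curl example is left out here.
    query = urlencode({"q": f"isbn:{isbn}"}, safe=":")
    return "https://www.googleapis.com/books/v1/volumes?" + query


isbn = canonical_isbn("978-1-5387-1847-6")
print(isbn, is_valid_isbn13(isbn), google_books_url(isbn))
```

Comparing identifiers in their canonical form avoids treating the hyphenated and unhyphenated spellings as different identifiers.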
Persistence¶
There is no notion of immutability in schema.org.
Footnotes¶
[1] | IRIs are a fundamental concept of Linked Data: for nodes to be truly linked, dereferencing the identifier should result in a representation of that node. https://www.w3.org/TR/json-ld/#node-identifiers |
[2] | An IRI (Internationalized Resource Identifier) is a string that conforms to the syntax defined in RFC 3987 |
[3] | https://schema.org/Dataset |
[4] | http://schema.org/docs/datamodel.html#identifierBg |
Running code on this page¶
All examples on this page can be run live in Binder. To do so:
- Click on the “Activate Binder” button.
- Wait for Binder to be active. This can take a while; you can watch progress in your
  browser’s JavaScript console. When a line like
  Kernel: connected (89dfd3c8...
  appears, Binder should be ready to go.
- Run the following before any other script on the page. This sets the right path context for loading examples etc.
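import os

try:
    # Switch to the documentation source directory so relative example paths resolve.
    os.chdir("docsource/source")
except OSError:
    # The directory is not available from here; keep the current working directory.
    pass
print("Page is ready. You can now run other code blocks on this page.")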