The Genealogy Trust (GT) is an OWL ontology for recording assertions of belief, together with a method for evaluating the trustworthiness of genealogical information. (Note: The actual ontology has not yet been defined. Its use in this document is only suggestive.) GT uses the Semantic Web Publishing (SWP) ontology to assert and sign named graphs; SWP is one of the building blocks of the TriQL.P Trust Architecture.
So long as the digital genealogical information you work with is confined to your own private database, issues of trust and belief can usually be ignored (at least explicitly). For the decentralized human family tree to function, however, there must be a rigorous trust mechanism. RDF is based on an open world model in which anyone can publish documents that add metadata to existing resources. This means that one person can publish an individual's name and birthdate on one server, while another person publishes that individual's name and gender on another. The two names might even differ (correctly or incorrectly). Because the most useful software will treat information from disparate sources as one global database, all data must be considered suspect until proven otherwise. GT provides a vocabulary and a mechanism for managing genealogical information in an untrustworthy environment.
This document introduces and demonstrates the use of GT. It builds on the Genealogy Core and Genealogy Provenance tutorials, which you should read first if you haven't yet.
Digitally signing graphs is necessary because information can come from, and pass through, any number of sources. Digital signatures help assure us that data has not been tampered with. For this example we will use the <#deathCertificateOfNaomi> graph from the Genealogy Provenance Tutorial:
The first step is to use a secure hash to create a digest of the graph (using its canonical representation). The digest is recorded in a secondary graph called the warrant graph (Note: The digests and signatures used in this example are fictional):
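The digest step can be sketched in Python. This is only an illustration under stated assumptions: the triples and the `example.org` predicates are hypothetical placeholders (not the actual contents of the tutorial graph), SHA-256 stands in for whichever secure hash is used, and the "canonical representation" is approximated by sorting N-Triples-style lines, which is only adequate for graphs without blank nodes.

```python
import hashlib

def canonicalize(triples):
    """Serialize a graph deterministically: one N-Triples-style line per
    triple, duplicates removed, lines sorted. Assumes no blank nodes."""
    lines = sorted(f"{s} {p} {o} ." for s, p, o in set(triples))
    return "\n".join(lines).encode("utf-8")

def graph_digest(triples):
    """Return the hex SHA-256 digest of the graph's canonical form."""
    return hashlib.sha256(canonicalize(triples)).hexdigest()

# Hypothetical stand-in for the <#deathCertificateOfNaomi> graph.
graph = [
    ("<#naomi>", "<http://example.org/name>", '"Naomi"'),
    ("<#naomi>", "<http://example.org/deathDate>", '"1949-08-02"'),
]

digest = graph_digest(graph)

# Statement order does not affect the digest, because the
# canonical form is sorted before hashing.
same = graph_digest(list(reversed(graph))) == digest

# Changing any triple changes the digest.
tampered = [graph[0], ("<#naomi>", "<http://example.org/deathDate>", '"1950-08-02"')]
changed = graph_digest(tampered) != digest
```

Hashing a canonical form rather than the raw file matters because the same RDF graph can be serialized in many equivalent ways; without canonicalization, a harmless reordering would look like tampering.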
The warrant graph asserts itself. We then digitally sign the warrant graph and record the signature in that same graph, using the private key of the author (Mike Davis in this case):
Now we can publish or otherwise share the <#deathCertificateOfNaomi> graph, along with its corresponding warrant graph, and others can be confident that this information did indeed come from Mike Davis and that it hasn't been tampered with. They would do this by first recomputing the hash on the graph and comparing it with the digest recorded in the warrant graph. They would then retrieve Mike's public key and verify his signature. This would assure them that Mike signed this information (assuming his key hasn't been compromised). The question then is whether Mike is trustworthy, which is a social problem outside the scope of this discussion.
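The sign-then-verify round trip described above can be sketched as follows. Everything here is an assumption made for illustration: the warrant is modeled as a plain dictionary rather than an RDF graph, the signature covers only the recorded digest, and the key pair is a toy textbook RSA key built from two small Mersenne primes with no padding. It is insecure and shows only the shape of the protocol; a real system would use a vetted cryptographic library.

```python
import hashlib

# Toy textbook-RSA key pair (INSECURE, illustration only).
P, Q = 2**89 - 1, 2**127 - 1             # two Mersenne primes
N = P * Q                                # public modulus
E = 65537                                # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))        # private exponent (Python 3.8+)

def _digest_int(data: bytes) -> int:
    """SHA-256 of data as an integer reduced into the RSA modulus."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N

def sign(data: bytes) -> int:
    """Sign with the author's private key (D, N)."""
    return pow(_digest_int(data), D, N)

def verify_signature(data: bytes, signature: int) -> bool:
    """Check a signature with the author's public key (E, N)."""
    return 0 <= signature < N and pow(signature, E, N) == _digest_int(data)

def make_warrant(graph_bytes: bytes) -> dict:
    """Author side: record the graph digest, then sign it."""
    digest = hashlib.sha256(graph_bytes).hexdigest()
    return {"digest": digest, "signature": sign(digest.encode("ascii"))}

def verify(graph_bytes: bytes, warrant: dict) -> bool:
    """Recipient side: recompute the hash, then check the signature."""
    if hashlib.sha256(graph_bytes).hexdigest() != warrant["digest"]:
        return False                     # graph was altered after signing
    return verify_signature(warrant["digest"].encode("ascii"),
                            warrant["signature"])

# Hypothetical canonical serialization of the signed graph.
graph_bytes = b'<#naomi> <http://example.org/name> "Naomi" .'
warrant = make_warrant(graph_bytes)

ok = verify(graph_bytes, warrant)                   # untouched graph
tampered_ok = verify(graph_bytes + b" ", warrant)   # altered graph
forged = dict(warrant, signature=(warrant["signature"] + 1) % N)
forged_ok = verify(graph_bytes, forged)             # altered signature
```

The two checks catch different problems: the digest comparison detects changes to the graph, while the signature check ties the warrant to the holder of the private key.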
Genealogical information comes from many different sources, and much of it is of dubious value. GT provides a vocabulary and a mechanism for annotating to what degree you trust or distrust a graph of information. For example, we can show that we have a high degree of trust in the information of the <#deathCertificateOfNaomi> graph like so:
If we want to make a belief statement about only some of the information in a graph, we create a subgraph with the relevant information (verbatim) and make statements about it. The rdfg:subGraphOf predicate is used to establish a subgraph relationship between two graphs (it is found in the Named Graphs (RDFG) vocabulary; more information can be found in the Named Graphs, Provenance and Trust paper). If, for example, we find that Naomi's death certificate in fact does not say that she died on August 2, 1949, we could assert this in the following way:
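The "verbatim" requirement on subgraphs can be checked mechanically. The sketch below assumes that graphs are modeled as sets of triples without blank nodes, and that a subgraph relationship holds exactly when every triple of the subgraph also appears in the parent graph. The triples and `example.org` predicates are hypothetical placeholders.

```python
def is_subgraph(sub, parent):
    """True when every triple in `sub` appears verbatim in `parent`."""
    return set(sub) <= set(parent)

# Hypothetical stand-in for the full death certificate graph.
death_certificate = {
    ("<#naomi>", "<http://example.org/name>", '"Naomi"'),
    ("<#naomi>", "<http://example.org/deathDate>", '"1949-08-02"'),
}

# Subgraph isolating only the statement we want to dispute.
death_date_only = {
    ("<#naomi>", "<http://example.org/deathDate>", '"1949-08-02"'),
}

# A graph with a changed literal is not a subgraph: the triple
# no longer matches the parent verbatim.
altered = {
    ("<#naomi>", "<http://example.org/deathDate>", '"1949-08-03"'),
}

valid = is_subgraph(death_date_only, death_certificate)
invalid = is_subgraph(altered, death_certificate)
```

Running such a check before publishing a belief statement guards against accidentally disputing a claim the original graph never made.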
This assertion is specifically about what Naomi's death certificate says. If we want to assert that Naomi did not die on August 2, 1949, regardless of the source, we would do so with a new graph: