While this document continues to be useful, it is out of date.  You can track current progress on my research blog.

Introduction to the Genesis Data Model


Working Draft


The objective of the Genesis project is to lay the foundation for the decentralized human family tree.  This effort requires a data model for genealogical information with the following characteristics:

In light of these requirements, the Resource Description Framework (RDF) was chosen as the underlying data model.  RDF is a very simple data model that allows the user to express statements, such as "Mary Flatley was born in London."  Each statement is composed of three parts: a subject ("Mary Flatley"), a predicate ("was born in"), and an object ("London").

Different statements can of course refer to the same objects and subjects.  A collection of statements therefore makes a graph.  For example, if we add the statements "Mary Flatley is female" and "Cassidy is married to Mary Flatley," we get the following graph:
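
As a rough sketch, the same three statements can be written as subject-predicate-object triples in plain Python. The predicate strings and the helper names `nodes` and `edges_touching` are illustrative assumptions, not terms from the Genesis ontologies or from any RDF library. Real RDF would also identify resources with URIs rather than bare strings.

```python
# Each statement is a (subject, predicate, object) triple.
# The predicate strings are illustrative, not Genesis vocabulary terms.
triples = [
    ("Mary Flatley", "was born in", "London"),
    ("Mary Flatley", "is", "female"),
    ("Cassidy", "is married to", "Mary Flatley"),
]


def nodes(statements):
    """Return every distinct subject or object: the nodes of the graph."""
    found = set()
    for subject, _predicate, obj in statements:
        found.add(subject)
        found.add(obj)
    return found


def edges_touching(statements, node):
    """Return the statements in which `node` appears as subject or object."""
    return [t for t in statements if node in (t[0], t[2])]


graph_nodes = nodes(triples)
mary_edges = edges_touching(triples, "Mary Flatley")
```

Because "Mary Flatley" appears in all three statements, every edge in this small graph touches her node, which is what ties the separate statements into a single graph.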

Because there are quite a few good introductions to RDF on the web, I won't cover it in any more detail here.

The OWL Web Ontology Language is a language for defining RDF vocabularies.  These vocabularies are used to give meaning to RDF data, both for humans and machines.  Humans can use them as a kind of grammar to help understand the relationships between resources and their descriptors.  Software agents can use them to perform inference over data (deducing, for example, that the father of a given person's father is that person's grandfather).
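
To make the grandfather example concrete, here is a minimal sketch of that single inference rule written by hand in Python. The property names `hasFather` and `hasGrandfather`, the function `infer_grandfathers`, and the people in the sample data are all hypothetical. An actual OWL reasoner would derive such statements from axioms declared in the ontology rather than from hard-coded logic like this.

```python
def infer_grandfathers(statements):
    """Apply one rule: if A hasFather B and B hasFather C,
    then A hasGrandfather C. Returns only the newly inferred triples."""
    # Index the explicit hasFather statements by child.
    father_of = {}
    for subject, predicate, obj in statements:
        if predicate == "hasFather":
            father_of.setdefault(subject, set()).add(obj)

    inferred = set()
    for person, fathers in father_of.items():
        for father in fathers:
            # Follow the hasFather link one more step.
            for grandfather in father_of.get(father, ()):
                inferred.add((person, "hasGrandfather", grandfather))
    return inferred


# Hypothetical sample data: property names and people are made up.
family = [
    ("Alice", "hasFather", "Bob"),
    ("Bob", "hasFather", "Carl"),
]
new_statements = infer_grandfathers(family)
```

The inferred triple is never stated in the input data. It follows only from the rule, which is the kind of deduction a shared vocabulary makes possible for software agents.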

The Genesis project makes use of three such ontologies: Genealogy Core (GC), Genealogy Provenance (GP), and Genealogy Trust (GT).  The first two are used to record genealogical information and the provenance of that information, respectively.  I've written a tutorial introducing each of the three:

  1. Genealogy Core Tutorial
  2. Genealogy Provenance Tutorial
  3. Genealogy Trust Tutorial