Store Module
Overview
The store module houses the different types of data stores supported by
the Knora API server. At the moment, only triplestores are supported. The triplestore
support is implemented in the org.knora.webapi.store.triplestore
package.
Lifecycle
At the top level, the store package houses the StoreManager actor,
which is started when the Knora API server starts. The StoreManager then starts
the TripleStoreManagerActor, which in turn starts the actor
implementation for the configured triplestore (e.g., GraphDB, Fuseki, or embedded Jena).
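The selection step can be pictured as a mapping from the configured dbtype to an implementation. The sketch below is not Knora's code: it replaces the Akka actors with a plain function, and the type names are hypothetical. Only the value "embedded-jena-tdb" comes from this document's configuration; the GraphDB and Fuseki dbtype strings are assumptions.

```scala
// Hypothetical model of choosing a triplestore implementation from the dbtype setting.
sealed trait TriplestoreImpl
case object GraphDBImpl extends TriplestoreImpl
case object FusekiImpl extends TriplestoreImpl
case object EmbeddedJenaTdbImpl extends TriplestoreImpl

object TriplestoreSelector {
  /** Maps a dbtype string to an implementation, or returns an error message. */
  def select(dbtype: String): Either[String, TriplestoreImpl] = dbtype match {
    case "graphdb-se" | "graphdb-free" => Right(GraphDBImpl)        // assumed values
    case "fuseki"                      => Right(FusekiImpl)         // assumed value
    case "embedded-jena-tdb"           => Right(EmbeddedJenaTdbImpl) // from application.conf
    case other                         => Left(s"Unsupported triplestore dbtype: $other")
  }
}
```

Returning an Either keeps an unknown dbtype as an explicit startup error instead of silently falling back to some default store.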
HTTP-based Triplestores
HTTP-based triplestore support is implemented in the org.knora.webapi.store.triplestore.http package.
An HTTP-based triplestore is one that is accessed remotely over the HTTP protocol. We have implemented support for the following triplestores:
- Ontotext GraphDB
- Fuseki 2
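Talking to a triplestore over HTTP usually means sending SPARQL requests as defined by the W3C SPARQL 1.1 Protocol. As a minimal sketch, assuming Java 11+ (java.net.http) rather than whatever HTTP client Knora actually uses, a query request could be built as follows. SparqlHttp is a hypothetical name, and the endpoint URL in the usage is a hypothetical Fuseki-style address (a GraphDB repository would have a different URL).

```scala
import java.net.URI
import java.net.http.HttpRequest
import java.time.Duration

object SparqlHttp {
  /** Builds (but does not send) a SPARQL query request using "query via POST directly". */
  def queryRequest(endpoint: String, sparql: String): HttpRequest =
    HttpRequest.newBuilder(URI.create(endpoint))
      .timeout(Duration.ofSeconds(30))
      // The request body is the raw SPARQL query string
      .header("Content-Type", "application/sparql-query")
      // Ask for results in the SPARQL JSON results format
      .header("Accept", "application/sparql-results+json")
      .POST(HttpRequest.BodyPublishers.ofString(sparql))
      .build()
}
```

To send it, pass the request to HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()). Posting the raw query avoids URL-encoding long queries; SPARQL updates use the content type application/sparql-update instead.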
Embedded Triplestores
Support for embedded triplestores is implemented in the org.knora.webapi.store.triplestore.embedded package.
An embedded triplestore is one that runs in the same JVM as the Knora API server.
Apache Jena TDB
Note
Support for embedded Jena TDB has been dropped for now. The documentation and the code remain in the repository; use them at your own risk.
Support for the embedded Jena TDB triplestore is implemented in org.knora.webapi.store.triplestore.embedded.JenaTDBActor.
The following Jena libraries are used:
- Jena API - the library used to work programmatically with RDF data
- Jena TDB - Jena's native triplestore implementation
Concurrency
Jena provides concurrency control at two levels.
At the Jena TDB level, the Dataset object represents the
triplestore. Every access can be wrapped in a transaction, either
read or write.
At the Jena API level, a Model object corresponds to an RDF graph.
A model can be locked to allow MRSW (Multiple Reader, Single Writer)
access.
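MRSW means that any number of readers may hold the lock together, while a writer needs it exclusively. Jena's Model exposes this through enterCriticalSection(Lock.READ) or enterCriticalSection(Lock.WRITE) followed by leaveCriticalSection(). To show the pattern without a Jena dependency, the sketch below swaps in the JDK's ReentrantReadWriteLock and a hypothetical in-memory triple set (MrswGraph); it is an illustration only, not the Knora implementation, which uses Dataset transactions instead.

```scala
import java.util.concurrent.locks.ReentrantReadWriteLock

final class MrswGraph {
  private val lock = new ReentrantReadWriteLock()
  private var triples = Set.empty[(String, String, String)]

  /** Runs `f` under the shared read lock; many readers may hold it at once. */
  def read[A](f: Set[(String, String, String)] => A): A = {
    lock.readLock().lock()
    try f(triples) finally lock.readLock().unlock()
  }

  /** Adds a triple under the exclusive write lock; readers and other writers wait. */
  def add(s: String, p: String, o: String): Unit = {
    lock.writeLock().lock()
    try triples += ((s, p, o)) finally lock.writeLock().unlock()
  }

  /** Number of read locks currently held (for inspection only). */
  def readersHoldingLock: Int = lock.getReadLockCount
}
```

Releasing the lock in a finally block mirrors the usual Jena idiom of calling leaveCriticalSection() in finally, so an exception inside the critical section cannot leave the model locked.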
Implementation
We use transactions at the Dataset level: every thread that accesses the
triplestore starts a read or a write transaction.
The transaction mechanism in TDB is based on write-ahead logging. All changes made inside a write transaction are written to journals and then propagated to the main database at a suitable moment. This design allows read transactions to proceed without locking or other overhead on the base database.
Transactional TDB supports one active write transaction and multiple read transactions at the same time. Read transactions started before a write transaction commits see the database without any of that transaction's changes. Any transaction starting after a write transaction commits sees the database with the changes visible, whether or not they have been fully propagated back to the main database. Read transactions that see the state before the updates and read transactions that see the state after the updates can therefore be active at the same time.
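The visibility rules above can be made concrete with a toy model in which readers capture the committed state when they begin and a single writer publishes its changes atomically on commit. This is a swapped-in technique (immutable snapshots plus a one-permit semaphore), not TDB's write-ahead journal; SnapshotStore and its methods are hypothetical names.

```scala
import java.util.concurrent.Semaphore
import java.util.concurrent.atomic.AtomicReference

final class SnapshotStore[K, V] {
  // Last committed state; replaced as a whole on each commit.
  private val committed = new AtomicReference[Map[K, V]](Map.empty)
  // One permit: at most one write transaction may be active at a time.
  private val writePermit = new Semaphore(1)

  /** A read transaction sees the committed state as of the moment it began. */
  final class ReadTxn(snapshot: Map[K, V]) {
    def get(k: K): Option[V] = snapshot.get(k)
  }

  /** A write transaction buffers its changes privately until commit. */
  final class WriteTxn(start: Map[K, V]) {
    private var working = start
    def put(k: K, v: V): Unit = working += (k -> v)
    /** Publishes all changes at once; only transactions begun afterwards see them. */
    def commit(): Unit = { committed.set(working); writePermit.release() }
    /** Discards the changes and frees the writer slot. */
    def abort(): Unit = writePermit.release()
  }

  def beginRead(): ReadTxn = new ReadTxn(committed.get())

  def beginWrite(): WriteTxn = {
    writePermit.acquire() // blocks while another write transaction is active
    new WriteTxn(committed.get())
  }
}
```

A write transaction must end with exactly one call to commit() or abort(); otherwise the writer slot is never freed.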
Configuration
To use the embedded triplestore, set the following in application.conf:
```
triplestore {
    dbtype = "embedded-jena-tdb"
    embedded-jena-tdb {
        persisted = true // "false" -> memory, "true" -> disk
        loadExistingData = false // "true" -> use data if it exists, "false" -> create a fresh store
        storage-path = "_TMP" // ignored if "memory"
    }
    reload-on-start = false // ignored if "memory" as it will always reload
    rdf-data = [
        {
            path = "../knora-ontologies/knora-base.ttl"
            name = "http://www.knora.org/ontology/knora-base"
        }
        {
            path = "../knora-ontologies/knora-dc.ttl"
            name = "http://www.knora.org/ontology/dc"
        }
        {
            path = "../knora-ontologies/salsah-gui.ttl"
            name = "http://www.knora.org/ontology/salsah-gui"
        }
        {
            path = "_test_data/ontologies/incunabula-onto.ttl"
            name = "http://www.knora.org/ontology/incunabula"
        }
        {
            path = "_test_data/demo_data/incunabula-demo-data.ttl"
            name = "http://www.knora.org/data/incunabula"
        }
        {
            path = "_test_data/ontologies/images-onto.ttl"
            name = "http://www.knora.org/ontology/dokubib"
        }
        {
            path = "_test_data/demo_data/images-demo-data.ttl"
            name = "http://www.knora.org/data/dokubib"
        }
    ]
}
```
Here the storage is set to persistent, meaning that a Jena TDB store
will be created in the directory given by storage-path. If the
reload-on-start flag is set to true, the triplestore is reloaded
with the data referenced in rdf-data.
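The interplay of persisted and reload-on-start can be summarised as a small decision function. The case class and method names are hypothetical and only encode what the comments in the configuration say: an in-memory store is always loaded from rdf-data, and storage-path matters only for a disk store.

```scala
// Hypothetical model of the embedded-jena-tdb settings; field names are illustrative.
final case class EmbeddedTdbSettings(
    persisted: Boolean,    // false -> in-memory store, true -> on-disk store
    storagePath: String,   // only used when persisted = true
    reloadOnStart: Boolean // ignored for in-memory stores, which always reload
)

object EmbeddedTdbSettings {
  /** An in-memory store starts empty, so it must always be loaded from rdf-data. */
  def mustLoadRdfData(s: EmbeddedTdbSettings): Boolean =
    !s.persisted || s.reloadOnStart

  /** The directory actually used for storage, if any. */
  def effectiveStoragePath(s: EmbeddedTdbSettings): Option[String] =
    if (s.persisted) Some(s.storagePath) else None
}
```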
TDB Disk Persisted Store
Note
Make sure to set reload-on-start to true when running for
the first time. This creates a TDB store and loads the data.
If Knora only performs read access, it can be run once with
reloading enabled. After that, reloading can be turned off, and the
data persisted under storage-path will be reused on subsequent starts.
If the TDB storage files become corrupted, delete the storage folder and reload the data.
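Deleting the folder can be done by hand; as a sketch, here is a JDK-only helper that recursively removes a storage directory such as _TMP. TdbStorage.wipe is a hypothetical name, it assumes Scala 2.13+ (for scala.jdk.CollectionConverters), and it should only be run while the Knora API server is stopped.

```scala
import java.nio.file.{Files, Path}
import scala.jdk.CollectionConverters._

object TdbStorage {
  /** Recursively deletes `dir`, deepest entries first. Does nothing if it does not exist. */
  def wipe(dir: Path): Unit =
    if (Files.exists(dir)) {
      val stream = Files.walk(dir) // includes `dir` itself
      try {
        stream.iterator().asScala.toList
          .sortBy(p => -p.getNameCount) // children before their parents
          .foreach(p => Files.delete(p))
      } finally stream.close()
    }
}
```

Deleting the deepest paths first matters because Files.delete refuses to remove a non-empty directory.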
Actor Messages
Request: ResetTripleStoreContent(rdfDataObjects: List[RdfDataObject])
Reply: ResetTripleStoreContentACK()
The embedded Jena TDB actor can receive reset messages and replies with an
ACK once reloading of the data is finished. RdfDataObject is a simple case
class containing the path and name of a data file (the same as the entries in
rdf-data in the config file).
For example, inside a test you could write something like:
```scala
import scala.concurrent.duration._

// The data to load; each entry mirrors an rdf-data entry in application.conf
val rdfDataObjects = List(
  RdfDataObject(path = "../knora-ontologies/knora-base.ttl",
                name = "http://www.knora.org/ontology/knora-base"),
  RdfDataObject(path = "../knora-ontologies/knora-dc.ttl",
                name = "http://www.knora.org/ontology/dc"),
  RdfDataObject(path = "../knora-ontologies/salsah-gui.ttl",
                name = "http://www.knora.org/ontology/salsah-gui"),
  RdfDataObject(path = "_test_data/ontologies/incunabula-onto.ttl",
                name = "http://www.knora.org/ontology/incunabula"),
  RdfDataObject(path = "_test_data/all_data/incunabula-data.ttl",
                name = "http://www.knora.org/data/incunabula")
)

"Reload data" in {
  // Ask the store manager to replace the triplestore content
  storeManager ! ResetTripleStoreContent(rdfDataObjects)
  // Reloading can take a while, so wait up to 300 seconds for the ACK
  expectMsg(300.seconds, ResetTripleStoreContentACK())
}
```
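expectMsg succeeds only if the received message is equal (==) to the expected one, which works because case classes compare by value. The stand-ins below are hypothetical: they mirror the message shapes described in this section but are not Knora's actual definitions, and the plain handle function replaces the actor.

```scala
// Hypothetical stand-ins for the reset protocol messages.
final case class RdfDataObject(path: String, name: String)
final case class ResetTripleStoreContent(rdfDataObjects: List[RdfDataObject])
final case class ResetTripleStoreContentACK()

object ResetProtocol {
  /** Stand-in for the actor: "load" each data object in order, then acknowledge. */
  def handle(msg: ResetTripleStoreContent, load: RdfDataObject => Unit): ResetTripleStoreContentACK = {
    msg.rdfDataObjects.foreach(load)
    ResetTripleStoreContentACK()
  }
}
```

Because a fresh ResetTripleStoreContentACK() equals any other instance, a test can compare against a newly constructed value rather than needing a reference to the one the actor sent.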