Tools: Tech Talk

From MarineLives
Revision as of 12:33, February 7, 2021 by ColinGreenstreet


Rowan Beentje


Rowan Beentje describes the technology behind the MarineLives wiki

Rowan Beentje is the designer of the MarineLives Semantic MediaWiki and volunteer technical advisor to Digital Pop Up Lab. In his day job he works in mobile and web application development for a major media company.

The article below, written by Rowan Beentje, describes the technology behind the MarineLives wiki.



Structure of the wiki


The MarineLives wiki is built on a PHP-based stack:
- MediaWiki
- Semantic MediaWiki extension to allow storage and querying of data across pages
- Semantic Forms extension to allow editing of pages as structured data
- Custom extensions for folio navigation, basic transcription, and improved behaviour to match transcription expectations.
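Semantic Forms edits a page by filling in a template call, and the template then stores each field as a Semantic MediaWiki property. The following is a minimal Python sketch of producing such a page body from one structured record. It assumes a hypothetical template named `Deposition` with hypothetical fields `Witness` and `Folio`; none of these names are taken from the real MarineLives configuration.

```python
from __future__ import annotations


def render_template_call(template: str, fields: dict[str, str]) -> str:
    """Render wikitext such as {{Deposition|Witness=...}}, one field per line.

    Template and field names are supplied by the caller (hypothetical here).
    """
    lines = ["{{" + template]
    for name, value in fields.items():
        # A bare "|" would start a new template parameter, so escape it
        # with MediaWiki's built-in {{!}} magic word.
        safe = value.replace("|", "{{!}}")
        lines.append(f"|{name}={safe}")
    lines.append("}}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical record, for illustration only.
    print(render_template_call("Deposition", {"Witness": "John Smith", "Folio": "12r"}))
```

Inside the template, each parameter would then be annotated with property syntax such as `[[Has witness::{{{Witness|}}}]]` (the property name is again hypothetical), which is what lets Semantic MediaWiki store the value and query it across pages.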



History of the wiki


The wiki was migrated from a collection of separate wikis. Historically, the data was relatively unstructured and spread across one wiki site per volume of depositions (witness statements), plus several further wikis for cross-volume analysis. An importer took the data from each wiki and converted it to structured data on a single wiki, with the analysis wikis moved to namespaced pages.



Technology approaches available to three teams


The three teams within the Digital Pop Up Lab will need different technology approaches:

Team one: semi-automated recognition of handwritten manuscripts


Rather than starting from scratch with a system like Tesseract, which struggles even with neat handwriting, the recommended approach is to integrate with the Transkribus suite. The base software is written in Java, and there are currently several Java clients, together with a JavaScript + LAMP platform known as Transcriptorium, which uses the Transkribus web services.

Team two: tailored and semantic search


While it would be possible to build on top of exported data and use solutions such as GraphQL on top of that, the recommended initial approach is to explore search interfaces and data exploration built on the Semantic MediaWiki data interface. This would probably combine PHP for any custom extensions with wiki pages edited to use the advanced Semantic MediaWiki query syntax. See: https://www.semantic-mediawiki.org/wiki/Ask_API , https://www.semantic-mediawiki.org/wiki/Help:Selecting_pages , and https://www.semantic-mediawiki.org/wiki/Special:Ask
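A minimal Python sketch of querying the Ask API and flattening its JSON response, assuming a hypothetical API endpoint (`https://example.org/w/api.php`), category (`Deposition`) and property (`Has witness`). The `action=ask` request takes a single `query` string in which the conditions come first, followed by `?Property` printouts and parameters, all separated by `|`. The function names are illustrative, not part of any existing MarineLives code.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_ask_url(api_endpoint: str, conditions: str, printouts: list, limit: int = 50) -> str:
    """Build an action=ask request: conditions first, then each ?Property
    printout and the limit parameter, all joined with '|'."""
    parts = [conditions] + ["?" + p for p in printouts] + [f"limit={limit}"]
    params = {"action": "ask", "query": "|".join(parts), "format": "json"}
    return api_endpoint + "?" + urlencode(params)


def parse_ask_results(payload: dict) -> list:
    """Flatten an Ask API JSON response into one dict per matching page."""
    results = payload.get("query", {}).get("results", {})
    if not isinstance(results, dict):  # an empty result set can arrive as []
        return []
    rows = []
    for title, entry in results.items():
        row = {"page": title}
        for prop, values in entry.get("printouts", {}).items():
            # Page-valued properties come back as objects with a "fulltext"
            # key; any other value is kept exactly as returned.
            row[prop] = [
                v["fulltext"] if isinstance(v, dict) and "fulltext" in v else v
                for v in values
            ]
        rows.append(row)
    return rows


def fetch_ask(url: str) -> dict:
    """Perform the HTTP request (needs network access to a live wiki)."""
    with urlopen(url) as response:
        return json.load(response)
```

Against a live wiki, usage would be along the lines of `parse_ask_results(fetch_ask(build_ask_url(endpoint, "[[Category:Deposition]]", ["Has witness"])))`. Keeping URL building and parsing separate from the network call makes each step easy to test offline.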

Team three: visualisation of historical data


Ask API generated output from MarineLives wiki

This team has the most flexibility in the tools it can use to explore the data. The data being visualised may come from custom data sets, or from semantic/annotated data read live from the wiki using the standard MediaWiki API or, more likely, the Semantic MediaWiki Ask API. Custom extensions that present data transformed from internal APIs could also serve as a data source; how that data is then transformed and presented could take many forms.

For an example of Ask API output ready for further digital processing, see the image.
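As one illustration of that further processing, the following minimal Python sketch tallies the values of a single printout in an Ask API JSON response and writes the counts as CSV, a format most charting tools and spreadsheets can read. The property name `Has place` is hypothetical, and the choice of a simple frequency count is only one of many possible transformations.

```python
import csv
import io
from collections import Counter


def tally_printout(payload: dict, prop: str) -> Counter:
    """Count how often each value of one printout occurs across all pages."""
    results = payload.get("query", {}).get("results", {})
    if not isinstance(results, dict):  # an empty result set can arrive as []
        return Counter()
    counts = Counter()
    for entry in results.values():
        for value in entry.get("printouts", {}).get(prop, []):
            # Page-valued properties carry their title in "fulltext"; other
            # structured values fall back to their string form.
            if isinstance(value, dict):
                label = value.get("fulltext", str(value))
            else:
                label = str(value)
            counts[label] += 1
    return counts


def to_csv(counts: Counter, header=("value", "count")) -> str:
    """Serialise the counts, most frequent first, as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    for label, n in counts.most_common():
        writer.writerow([label, n])
    return buf.getvalue()
```

The resulting CSV can be loaded straight into a bar chart or map layer, leaving the team free to choose presentation tools independently of the wiki.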