User:GavinRobinson/Named entity subobjects
This is a demo of possible ways of using semantic tags for named entities mentioned in transcribed documents. The current version concentrates on people, but if it works it could easily be extended to other types of named entity, such as ships or places. You can discuss it on the talk page of this page or on the talk pages of the more specific pages linked below. Everyone should feel free to edit the test pages.
General Principles
- Template everything. Hiding the implementation makes things easier for editors because they only have to learn templates. It also makes it easy to change the implementation in future without editing every page.
- Getting the best out of Semantic MediaWiki means not having to search as much. Most users should be able to find most of what they want by clicking through a trail of wikilinks, although the trail will usually have to start with a simple search, and there will always be a need for some users to construct their own queries.
- Historical data is often uncertain. Semantic markup needs to represent uncertainty.
- TEI (the Text Encoding Initiative guidelines) can help us to think more clearly about the structure and semantics of documents, even if we're not using it for markup.
Subobjects and Properties
This approach uses subobjects to group together the following properties:
- Property:Name transcribed as: a text property containing the name string as transcribed in the text. This allows searching on strings marked up as names even if the person they refer to hasn't been identified.
- Property:Identified as person: a page property linking to a page for the person if they can be positively identified.
- Property:Could be person: a page property similar to the above, but for people whose identity is less certain. It allows multiple values if the name could refer to more than one person.
- Property:Performs role in document: the person's role in the document, e.g. deponent. This is currently a text property but could be changed to a page property in future, as that would allow documenting what each role means.
- Property:Mentioned in: a page property that always defaults to the name of the page containing the subobject. It is used to hide subobjects in query results and avoids the need for an extra level of query to find the parent page.
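To make the grouping concrete, here is a minimal Python sketch of one person subobject as a plain record. It assumes nothing about how Semantic MediaWiki actually stores data; the class, helper and page names (`PersonSubobject`, `make_subobject`, "Example deposition", the candidate person pages) are hypothetical. It shows "Mentioned in" defaulting to the parent page and "Could be person" holding several candidates, which is one way of representing uncertainty.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PersonSubobject:
    """Hypothetical in-memory model of one person subobject.

    Each field mirrors one wiki property; SMW's own storage is not modelled.
    """
    mentioned_in: str                                          # Property:Mentioned in
    name_transcribed_as: Optional[str] = None                  # Property:Name transcribed as (text)
    identified_as_person: Optional[str] = None                 # Property:Identified as person (page)
    could_be_person: List[str] = field(default_factory=list)   # Property:Could be person (page, multi-valued)
    performs_role_in_document: Optional[str] = None            # Property:Performs role in document (text)

    def certainty(self) -> str:
        """Classify how firmly the transcribed name is tied to a person page."""
        if self.identified_as_person:
            return "identified"
        if self.could_be_person:
            return "possible"
        return "unidentified"


def make_subobject(parent_page: str, **properties) -> PersonSubobject:
    """Create a subobject whose 'Mentioned in' defaults to the parent page name."""
    return PersonSubobject(mentioned_in=parent_page, **properties)


# Hypothetical example: a deponent whose identity is uncertain.
deponent = make_subobject(
    "Example deposition",
    name_transcribed_as="John Smyth",
    could_be_person=["John Smith (merchant)", "John Smith (mariner)"],
    performs_role_in_document="deponent",
)
print(deponent.mentioned_in)  # Example deposition
print(deponent.certainty())   # possible
```

Keeping "Identified as person" and "Could be person" as separate fields, rather than one field plus a confidence flag, mirrors the two separate properties described above.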
Templates, Forms and Transcripts
The subobjects can be used in two different ways:
- inline markup: Template:persname can be embedded directly in transcribed text. If the person is positively identified, it creates a wikilink to their page. For example, see User:GavinRobinson/Test page inline.
- semantic forms: Template:persname in form is a repeatable template that can be added to a form. It allows entering metadata about people separately from the transcribed text. For example, see User:GavinRobinson/Test page with form, which uses Form:persname (which also transcludes Template:Get person subobjects).
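The visible effect of the inline approach can be sketched as a small Python function. This is an assumption about behaviour, not the actual code of Template:persname: the function name `render_persname`, its parameters and the example page name are hypothetical. It uses MediaWiki's piped-link syntax `[[Target|label]]` so that the transcribed spelling stays in the text while the link points at the identified person's page; unidentified names are left as plain text.

```python
from typing import Optional


def render_persname(name: str, identified_as: Optional[str] = None) -> str:
    """Return the wikitext a persname-style template might display.

    Hypothetical sketch: a positively identified person becomes a piped
    wikilink to their page; otherwise the transcribed name is returned as is.
    """
    if identified_as:
        if identified_as == name:
            return f"[[{name}]]"  # no pipe needed when label and target match
        return f"[[{identified_as}|{name}]]"
    return name


line = "Deposition of " + render_persname("Jno Smyth", identified_as="John Smith (mariner)")
print(line)  # Deposition of [[John Smith (mariner)|Jno Smyth]]
```

In the form-based approach the same metadata would be entered separately, so the transcribed text itself would stay free of markup.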
There's plenty of room to explore and debate which approach is better. Inline markup is likely to be easier for readers but harder for editors, and it could be inconvenient for text miners. The semantic forms approach is likely to be easier for editors but harder for readers, and it fits better with current practice at Marine Lives.
Person Pages
These are examples of pages for people that can contain a biography, queries for other pages that mention them, and links to external sources:
In future, pages like this could contain structured semantic data about people. They could also contain semantic links to each other, which is another factor to consider when deciding on the semantic markup of names.
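The queries on person pages for other pages that mention them can be sketched in Python over hypothetical subobject records. Roughly, this corresponds to an SMW query such as `{{#ask: [[Identified as person::John Smith (mariner)]] |?Mentioned in }}`, though that exact query is an assumption, as are the record layout, page names and the function `pages_mentioning`. Because each subobject carries "Mentioned in", the parent page comes straight from the matching subobject, and "Could be person" matches can be listed separately as possible mentions.

```python
from typing import Dict, List

# Hypothetical subobject records, one dict per person subobject.
# Keys mirror the wiki property names; values are illustrative only.
SUBOBJECTS: List[Dict] = [
    {"Mentioned in": "Deposition A", "Name transcribed as": "Jno Smyth",
     "Identified as person": "John Smith (mariner)", "Could be person": []},
    {"Mentioned in": "Deposition B", "Name transcribed as": "John Smith",
     "Identified as person": None,
     "Could be person": ["John Smith (mariner)", "John Smith (merchant)"]},
    {"Mentioned in": "Deposition C", "Name transcribed as": "Wm Jones",
     "Identified as person": None, "Could be person": []},
]


def pages_mentioning(person: str, subobjects: List[Dict]) -> Dict[str, List[str]]:
    """Split parent pages into certain and possible mentions of one person.

    'Mentioned in' gives the parent page directly, so no second query is needed.
    """
    certain = sorted({s["Mentioned in"] for s in subobjects
                      if s["Identified as person"] == person})
    possible = sorted({s["Mentioned in"] for s in subobjects
                       if person in s["Could be person"]})
    return {"certain": certain, "possible": possible}


print(pages_mentioning("John Smith (mariner)", SUBOBJECTS))
# {'certain': ['Deposition A'], 'possible': ['Deposition B']}
```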
Example Query
This inline query should return every existing person subobject that has at least a transcribed name:
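In Semantic MediaWiki, a condition like "has at least a transcribed name" is usually written with the `+` wildcard, for example `{{#ask: [[Name transcribed as::+]] |?Name transcribed as |?Mentioned in |?Identified as person }}`; this is an assumed form, not necessarily the query used on this page. The Python sketch below mimics that selection over hypothetical records (the helpers `has_value` and `ask_named_persons` are illustrative names) and prints the parent page instead of the subobject, as "Mentioned in" allows.

```python
from typing import Dict, List

# Hypothetical records; the second has no transcribed name and should be skipped.
SUBOBJECTS: List[Dict] = [
    {"Name transcribed as": "Jno Smyth", "Mentioned in": "Deposition A",
     "Identified as person": "John Smith (mariner)"},
    {"Name transcribed as": "", "Mentioned in": "Deposition B",
     "Identified as person": None},
    {"Name transcribed as": "Wm Jones", "Mentioned in": "Deposition C",
     "Identified as person": None},
]


def has_value(record: Dict, prop: str) -> bool:
    """Rough analogue of the SMW condition [[prop::+]]: the property has some value."""
    return record.get(prop) not in (None, "", [])


def ask_named_persons(subobjects: List[Dict]) -> List[Dict]:
    """Return subobjects that have a transcribed name, sorted by that name."""
    hits = [s for s in subobjects if has_value(s, "Name transcribed as")]
    return sorted(hits, key=lambda s: s["Name transcribed as"])


for row in ask_named_persons(SUBOBJECTS):
    # Show the parent page rather than the subobject itself.
    print(row["Name transcribed as"], "|", row["Mentioned in"], "|",
          row["Identified as person"] or "unidentified")
```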