PundIt MarineLives Forum


Editorial history

30/10/13: CSG, created page



Purpose of this page

This page provides a discussion forum and set of resources for MarineLives project members exploring the functionality of the PundIt tool.


Background

PundIt is an experimental semantic annotation tool for web pages which is currently under further development by Net7, and which is being used by the DM2E project.

The DM2E project is a project of Europeana, which has emerged out of the European Digital Library Network. Dr Christian Morbidioni and Dr Kai Eckert are two of the DM2E project workstream leaders, and have approached the MarineLives project leadership team to explore the potential for them to collaborate with us and with our partners at Bath Spa University and the National Archives.

As a first and important step, MarineLives is working with Dr Christian Morbidioni of the University of Ancona and Simone Fonda of Net7 to explore a working demo of PundIt. The demo can be found here. PundIt was recently awarded the top prize at the 2013 LODLAM summit in Montreal, Canada. See PundIt video at LODLAM.

In parallel, MarineLives is working with Dr Kai Eckert and Dominique Ritze of the University of Mannheim, to explore the potential for automatic and semi-automatic entity recognition for MarineLives transcriptions. The topics of semantic annotation and entity recognition are clearly closely related.



Approach to evaluation

We would like to focus our initial experimentation with PundIt on the High Court of Admiralty deposition book, HCA 13/72.

Roughly 800 pages of HCA 13/72 have now been transcribed and edited, and are available in edited form on the Annotate HCA 13/72 wiki. Digital images of many (but not all) of the same transcribed pages can be viewed, together with the transcriptions, in our tailored transcription software, MarineLives - Transcript.

We suggest that evaluators try annotating web pages from both the wiki version of the transcribed text and the MarineLives - Transcript version of the transcribed text (and indeed the images themselves, and/or image fragments).

The current PundIt demo has been set up with sample custom vocabularies extracted from the Annotate HCA 13/71 wiki. Colin Greenstreet is exploring with Simone Fonda how we can create new custom vocabularies for ships, people, places and materials specifically for HCA 13/72, and how we can then add and edit new individual records in these custom vocabularies. First cut wiki versions of terms for these controlled vocabularies for HCA 13/72 are available as follows:

HCA 13/72 People
HCA 13/72 Materials
HCA 13/72 Places
HCA 13/72 Ships

A discussion is required with the PundIt team on how to handle spelling variants. The MarineLives approach (to date) has been to capture ALL spelling variants in our vocabulary lists.
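As an illustration of the "capture ALL spelling variants" approach, here is a minimal Python sketch that groups variants under a preferred label and writes the result as JSON. It is only a sketch under stated assumptions: Simone Fonda's reply further down this page mentions a `vocab_label` field in the PundIt vocabulary files, but the `items`, `label` and `variants` field names and the `build_vocabulary` function are hypothetical placeholders of our own, not the actual PundIt format. The Mings/Myngs example comes from the National Archives metadata quoted later on this page.

```python
import json


def build_vocabulary(vocab_label, entries):
    """Group spelling variants under a preferred label.

    entries maps a preferred label to a list of spelling variants.
    NOTE: apart from "vocab_label", every field name used here is a
    hypothetical placeholder, not the real PundIt vocabulary format.
    """
    items = []
    for preferred, variants in sorted(entries.items()):
        seen = {preferred}          # never list the preferred form as a variant
        unique_variants = []
        for variant in variants:
            if variant not in seen:  # drop duplicates, keep first-seen order
                seen.add(variant)
                unique_variants.append(variant)
        items.append({"label": preferred, "variants": unique_variants})
    return {"vocab_label": vocab_label, "items": items}


people = {
    "Christopher Myngs": ["Christopher Mings", "Christopher Myngs"],
}
vocabulary = build_vocabulary("HCA 13/72 People", people)
print(json.dumps(vocabulary, indent=2))
```

Keeping one preferred label per entity while retaining every variant would let the wiki lists stay exhaustive, whichever way the PundIt team prefers to handle variants.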






Suggested links


DM2E
Europeana
Pundit

Annotate HCA 13/72 wiki
MarineLives - Transcript: HCA 13/72 pages



Rolling list of questions about PundIt functionality in the context of MarineLives project


Please post your questions here, together with answers and comments as they emerge



ANNOTATING WIKI AND OTHER PAGES & PAGE FRAGMENTS



  • Each of these wikis has a standard front page structure and a standard structure for transcribed pages.


  • The standard structure for the transcribed pages includes standard page fragments, with HTML links to those fragments, so that both the pages and the page fragments can be addressed and accessed from other pages


e.g. http://annotatehca1372.wikispot.org/HCA_13/72_f.4r_Annotate displays our transcription of folio 4 recto of the deposition volume HCA 13/72 (covering the years 1657 and 1658)

CAPTURE Structure HCA 1372 f4r.PNG

e.g. with a standard fragment structure of:

       Suggested links
       Transcription
       Topics
           People
           Places
           Ships
           Materials
           Miscellaneous
       Sources
           Primary sources
           Secondary sources


e.g. with an addressable HTML fragment address for the transcription of http://annotatehca1372.wikispot.org/HCA_13/72_f.4r_Annotate#head-7792b396c165940a2ef3372031f6dbb64b71233e

-QUESTIONS:

-- Can we specify the transcription fragment address as the "Page" in PundIt terminology (see screen grab below) or do we have to specify it as a "Text-Fragment"?
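To make the distinction in the question concrete, the short Python sketch below splits the fragment address given above into its page address and its fragment identifier, using the standard library function `urllib.parse.urldefrag`. It only illustrates how the URL is composed; it says nothing about how PundIt itself treats the two parts.

```python
from urllib.parse import urldefrag

# Fragment address of the transcription section of HCA 13/72 f.4r (from this page)
address = (
    "http://annotatehca1372.wikispot.org/HCA_13/72_f.4r_Annotate"
    "#head-7792b396c165940a2ef3372031f6dbb64b71233e"
)

# urldefrag returns the URL without the fragment, and the fragment itself
page, fragment = urldefrag(address)

print(page)      # the address of the whole wiki page
print(fragment)  # the identifier of the section within that page
```

The page address identifies the whole wiki page, while the fragment identifier points at one section heading inside it.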



ANNOTATING DIGITAL IMAGES


HCA 13 72 f4r MchtMarks LH Margin.PNG
  • Our digital images are held in a MediaWiki picture library, and can be accessed as such directly, but are more conveniently accessed for zooming and viewing through our tailored transcription software MarineLives - Transcript, which uses the open source software SCRIPTO.


-- So, when specifying an "Image" using PundIt terminology, should we go to the image in the MediaWiki library, or through MarineLives - Transcript?

(for your reference, the digital image of HCA 13/72 f.4r in TRANSCRIPT/SCRIPTO can be accessed at http://marinelives-transcript.org/scripto/scripto/?scripto_action=transcribe&scripto_doc_id=2045&scripto_doc_page_id=2280)

  • NEW, 30/10/13: It would be useful to be able to comment on, keyword and otherwise annotate marginalia in fragments of the digital images. For example, symbolic merchants' marks, and the signatures and marks of deponents.


SIGNATURE Hojah Peter Armenian Merchant HCA 1365f53v.PNG


CUSTOM VOCABULARIES


  • How were Marinelive: Persons & Marinelive: Boats custom vocabularies created?


  • Can the two MarineLive custom vocabularies be renamed by us, and if so, how?


  • How do we create and label new custom vocabularies?


  • How do we modify or add to existing custom vocabularies?


  • NEW, 30/10/13: Could we work with TNA in Kew to access custom vocabularies they may have developed or be developing for people and places?


  • NEW, 30/10/13: Could we work with Professor Tim Hitchcock to access GIS codes developed by one of Professor Hitchcock's recent projects for the geolocation of all English historical parishes? These GIS codes could be linked automatically to place record information in a MarineLives place controlled vocabulary, and would then provide a route into semi-automated/automated GIS mapping of MarineLives datasets defined by users




DATA FORMATS, DATA SUSTAINABILITY AND DATA PORTABILITY


  • NEW, 31/10/13: In what format or formats is the data created by PundIt users stored?


- Can PundIt users choose between formats?

  • NEW, 31/10/13: How can data generated in PundIt be ported to other applications for further processing and/or editing?


  • NEW, 31/10/13: In the long term, should PundIt cease to be supported as a software application, or should the data formats used by PundIt cease to be supported by PundIt or at industry level, how can PundIt user-generated data be ported to another application and/or stored for posterity?




DATES


  • How should we and PundIt handle C17th calendars (English old style; English new style, etc)?
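COMMENT: Simone Fonda's reply further down this page says that PundIt expresses dates only in the precise format YYYY-MM-DD. One possible approach, sketched below in Python, is to convert English Old Style dates ourselves before entering them. The sketch assumes that "Old Style" means the Julian calendar with the year beginning on 25 March, and that the target is the Gregorian calendar; the function names are our own and nothing here is part of PundIt. Whether any particular source date really is Old Style is an editorial judgement that the code cannot make.

```python
from datetime import date


def julian_to_jdn(year, month, day):
    """Julian Day Number of a date in the Julian calendar."""
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083


def old_style_to_iso(year_as_written, month, day):
    """Convert an English Old Style date to a Gregorian YYYY-MM-DD string.

    Assumption: the Old Style year began on 25 March, so a date between
    1 January and 24 March written as e.g. 1657 falls in historical 1658.
    """
    if (month, day) < (3, 25):
        historical_year = year_as_written + 1
    else:
        historical_year = year_as_written
    jdn = julian_to_jdn(historical_year, month, day)
    # Python's date ordinal 1 is 0001-01-01 (proleptic Gregorian) = JDN 1721426
    return date.fromordinal(jdn - 1721425).isoformat()


# 18 April 1658, read as Old Style
print(old_style_to_iso(1658, 4, 18))  # 1658-04-28
# 10 February "1657" Old Style, i.e. 1657/8
print(old_style_to_iso(1657, 2, 10))  # 1658-02-20
```

If we go this way, it may be worth recording the date as written in the source alongside the converted value, so that the conversion can always be checked.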




LINKING TO METADATA



-- EXAMPLE: http://discovery.nationalarchives.gov.uk/SearchUI/Details?uri=C7710172

Metadata reads:

Reference: SP 46/99/fo9
Description: Cornelius Burroughs [Steward-General of the fleet at Jamaica] recommending assistance for Capt. Christopher Mings [Myngs].
Date: 1658 Apr. 18
Held by: The National Archives, Kew
Former references: in The National Archives: SP 46/99/fo 9
Legal status: Public Record



SEARCHABLE DATABASES


  • Relevance of Freebase to MarineLives?


  • Relevance of DBPedia to MarineLives?


- NEW, 30/10/13: A limited number of people, place, and material entries in Wikipedia (via DBPedia) are likely to be of direct relevance to annotating MarineLives, e.g. http://en.wikipedia.org/wiki/Christopher_Myngs. SEE: http://marinelives-theshippingnews.org/blog/2013/10/05/christopher-myngs-naval-officer/



SERVERS


  • NEW, 31/10/13: Where are the servers used to store PundIt user-generated data, and under whose operational control are they?


  • NEW, 31/10/13: Can PundIt be installed on a user's own servers, or on third-party servers nominated by a user?


- If so, is the installation process clearly documented, and what provision for version control etc. would there be to update PundIt code as the software is developed?



SOFTWARE LANGUAGES


  • NEW, 31/10/13: What programming language or languages and development tools have been used to write the code for PundIt?




TRIPLES


PundIt MyItems 301013.PNG
  • How do we create new subjects in the drop-down triple menu?


  • How do we create new predicates in the drop-down triple menu?


  • How do we create new objects in the drop-down triple menu?
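COMMENT: For readers new to triples, the Python sketch below models a statement as a subject, a predicate and an object, and shows the "two statements" pattern for periods that Simone Fonda describes in the correspondence further down this page (one statement for the start date, one for the end date, both as YYYY-MM-DD). It is a sketch only: the `Statement` class, the `period_statements` function and the predicate names `hypothetical:startDate` and `hypothetical:endDate` are invented for illustration and are not PundIt's own vocabulary or data model.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Statement:
    """One triple: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


def period_statements(subject, start, end):
    """Express a period as two statements carrying precise dates.

    The predicate names are hypothetical placeholders; the real ones
    would come from the configured PundIt predicate vocabulary.
    """
    if end < start:
        raise ValueError("end date must not precede start date")
    return [
        Statement(subject, "hypothetical:startDate", start.isoformat()),
        Statement(subject, "hypothetical:endDate", end.isoformat()),
    ]


# HCA 13/72 covers the years 1657 and 1658 (boundaries chosen for illustration)
for statement in period_statements("HCA 13/72", date(1657, 1, 1), date(1658, 12, 31)):
    print(statement.subject, statement.predicate, statement.obj)
```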


USER INTERFACE




Correspondence with Simone Fonda, Net7



Simone Fonda to Colin Greenstreet, 30/10/13: 12.30


Simone Fonda
12:30 PM

To me, Jill, Philip, William, Alex, Kai, Christian, Richard, Andrew
On Wed, Oct 30, 2013 at 10:54 AM, Colin Greenstreet <colin.greenstreet@googlemail.com> wrote:

> Dear Simone,
>
> Dr Christian Morbidioni has kindly given me your name as our principal
> technical contact for the MarineLives project when working with PundIt,
> especially over the next ten days when Dr Morbidioni has limited internet
> access. He has also encouraged us to keep Dr Kai Eckert in the loop.

Dear Colin and all,
pleased to meet you! :)

I'm not 100% up-to-date on your needs, your use cases and what
Christian promised you, but i'll do my best to figure it out.
Moreover, i'm not sure about the audience i'm talking with, so
apologies in advance if i go too techie!

> — Can we specify the transcription fragment address as the "Page" in PundIt
> terminology (see screen grab below) or do we have to specify it as a "Text-Fragment"?

Uhm, not sure on what do you mean, moreover there's no screen grab!

When you select a piece of text in any web page (and for example add
it to "my items") Pundit automatically assigns the "Text-fragment"
type to this newly created item. This is not something you can specify
or configure, it is tightly related to pundit internals and can't be
changed.

> — Our digital images are held in a mediawiki picture library, and can be
> accessed as such directly, but are more conveniently accessed for
> zooming and viewing through our tailored transcription software
> MarineLives - Transcript, which uses the open source software
> SCRIPTO. So, when specifying an "Image" using PundIt terminology,
> should we go to the Image in the Mediawiki library, or through
> MarineLives - Transcript?

The image annotation tool is very (very very) simple and does not
support pyramidal images (yet?). If you try to annotate one of such
scripto images, you will face two problems:
- it will annotate just a tile, not the entire image
- it will not annotate any image that has been loaded after pundit
(each zoom/pan/move might load new images).

Your only option, at the moment, to annotate an image or part of it,
is having a good old plain <img> tag into an HTML page.

> CUSTOM VOCABULARIES
> * How were Marinelive: Persons & Marinelive: Boats custom vocabularies created?

The vocabularies follow a very simple json format. For example the ships one:
[] http://metasound.dibet.univpm.it/marinelives/Marinelives-Ships.json

Christian implemented quite a complicated workflow to create that! He
started from a google drive document, exported it to RDF then with an
home-made java software produced the json.

> * Can the two MarineLive custom vocabularies be renamed by us, and
> if so, how?
> * How do we create and label new custom vocabularies?
> * How do we modify or add to existing custom vocabularies?

As of now, you don't have any control over those vocabularies, only
Christian has. Their name can be set directly into them, if you opened
the link above, you noticed the vocab_label: "Marinelive:Boats" bit.

Pundit can be configured with any number of vocabularies and they can
come from any part of the web. Indeed we also have a piece of software
named Korbo (currently not open for the public) which feeds pundit
with vocabularies. In this environment, a user is ideally able to
modify it, add or delete nodes, create hierarchies etc.

Another possibility is configure pundit to read from a file under your
control. This way you have the responsibility of keeping it correct,
but you would be able to update it and see the changes the next time
you load pundit.

> * How should we and PundIt handle C17th calendars
> (English old style; English new style, etc.)

Sorry to say: i'm absolutely clueless about "C17th calendars" and,
surprisingly enough, google ain't helping either!

Pundit is able to express dates only in a "precise" format, which is
YYYY-MM-DD. If you need to express an "imprecise" date ("middle age",
"spring 1919", etc) you will need anyway to express it as a precise
one.

For periods, you need to create two statements with the proper
predicates, specifying the start and end date, again as precise dates.

> * Relevance of Freebase to MarineLives?
> * Relevance of DBPedia to MarineLives?

Those external databases can be switched off in the pundit
configuration. And new components can be developed and used to pull
Linked Data semantic informations from your provider of choice.

> * How create new subjects in drop down triple menu?
> * How create new predicates in drop down triple menu?
> * How create new objects in drop down triple menu?

The answer for the first two is very easy: you can not. The drop down
menu are meant just to help the user find the item they want to insert
in the statement.

For the subject: most of the time it will be a text fragment or an
image fragment, so to create a new one just select a piece of text or
click the icon which appears on an image when you move the mouse over
it, and add it to "my items" or annotate it directly.

For the predicates, it is tightly related to vocabularies. Pundit uses
the same format i was talking about earlier for expressing predicates.
So again it's a configuration, which means you can have the predicate
you want tailored exactly for your needs. We just need Christian to
change the .json files!

For the object, you can create new items from the drop down menu, but
only free text or dates. Other kind of items (persons, places etc) are
pulled from either vocabularies or linked data providers
(freebase-like). You can not create this kind of entities directly in
the pundit client.

I'm pretty sure my answers will not be enough, they might even be
totally out of track and more questions will follow, so don't hesitate
to ask more, i'll do my best to help you out.

Best,
Simone



Colin Greenstreet to Simone Fonda


Colin Greenstreet
30/10/13: 1:38 PM

To Simone, Jill, Philip, William, Alex, Kai, Christian, Richard, Andrew

Hi Simone,

Many thanks for your speedy response.

Rather than firing back another set of questions or clarifications, let me work for the rest of the day and get back to you tomorrow.

Nevertheless, I have already updated my rolling list of questions at: marinelives.wikispot.org/PundIt_MarineLives_Forum

For example, I have provided a link to a Wikipedia article on New Style and Old Style dating (http://en.wikipedia.org/wiki/Old_Style_and_New_Style_dates)

I have also added screen grabs to the wiki version of my questions, and further links, which should help you come up to speed on our use case.

If you are interested in the background on the High Court of Admiralty, see: http://marinelives.wikispot.org/Introduction_to_the_High_Court_of_Admiralty


Best wishes

Colin