Friday, 17 October 2014

In which I am awed by the generosity of others, and have some worthy goals

A quick update from my CENDARI fellowship working on a project that's becoming 'In their own words: linking lived experiences of the First World War'. I've spent the week reading (again a mixture of original diaries and letters, technical stuff like ontology documentation and also WWI history forums and 'amateur' sites) and writing. I put together a document outlining a range of possible goals and some very sketchy tech specs, and opened it up for feedback. The goals I set out are copied below for those who don't want to delve into detail. The commentable document, 'Linking lived experiences of the First World War': possible goals and a bunch of technical questions, goes into more detail.

However, the main point of this post is to publicly thank those who've helped by commenting and sharing on the doc, on twitter or via email. Hopefully I'm not forgetting anyone, as I've been blown away by and am incredibly grateful for the generosity of those who've taken the time to at least skim 1600 words (!). It's all helped me clarify my ideas and find solutions I'm able to start implementing next week. In no order at all - at CENDARI, Jennifer Edmond, Alex O'Connor, David Stuart, Benjamin Štular, Francesca Morselli, Deirdre Byrne; online Andrew Gray @generalising; Alex Stinson @ DHKState; jason webber @jasonmarkwebber; Alastair Dunning @alastairdunning; Ben Brumfield @benwbrum; Christine Pittsley; Owen Stephens @ostephens; David Haskiya @DavidHaskiya; Jeremy Ottevanger @jottevanger; Monika Lechner @lemondesign; Gavin Robinson @merozcursed; Tom Pert @trompet2 - thank you all!

Worthy goals (i.e. things I'm hoping to accomplish, with the help of historians and the public; only some of which I'll manage in the time)

At the end of this project, someone who wants to research a soldier in WWI but doesn't know a thing about how armies were structured should be able to find a personal narrative from a soldier in the same bit of the army, to help them understand experiences of the Great War.

Hopefully these personal accounts will provide some context, in their own words, for the lived experiences of WWI. Some goals listed are behind-the-scenes stuff that should just invisibly make personal diaries, letters and memoirs more easily discoverable. The project needs datasets that provide structures that support relationships between people and documents; participatory interfaces for creating or enhancing information about contemporary materials (which feed into those supporting structures); and interfaces that use the data created.
More specifically, my goals include:
  • A personal account by someone in each unit linked to that unit's record, so that anyone researching a WWI name would have at least one account to read. To populate this dataset, personal accounts (diaries, letters, etc) would need to be linked to specific soldiers, who can then be linked to specific units. Linking published accounts such as official unit histories would be a bonus. [Semantic MediaWiki]
  • Researched links between individual men and the units they served in, to allow their personal accounts to be linked to the relevant military unit. I'm hoping I can find historians willing to help with the process of finding and confirming the military unit the writer was in. [Semantic MediaWiki]
  • A platform for crowdsourcing the transcription and annotation of digitised documents. The catch is that the documents for transcription would be held remotely on a range of large and small sites, from Europeana's collection to library sites that contain just one or two digitised diaries. Documents could be tagged/annotated with the names of people, places, events, or concepts represented in them. [Semantic MediaWiki??]
  • A structured dataset populated with the military hierarchy (probably based on The British order of battle of 1914-1918) that records the start and end dates of each parent-child relationship (an example of how much units moved within the hierarchy)
  • A published webpage for each unit, to hold those links to official and personal documents about that unit in WWI. In future this page could include maps, timelines and other visualisations tailored to the attributes of a unit, possibly including theatres of war, events, campaigns, battles, number of privates and officers, etc. (Possibly related to CENDARI Work Package 9?) [Semantic MediaWiki]
  • A better understanding of what people want to know at different stages of researching WWI histories. This might include formal data gathering, possibly a combination of interviews, forum discussions or surveys.

Goals that are more likely to drop off, or become quick experiments to see how far you can get with accessible tools:

  • Trained 'named entity recognition' and 'natural language processing' tools that could be run over transcribed text to suggest possible people, places, events, concepts, etc [this might drop off the list as the CENDARI project is working on a tool called Pineapple (PDF poster). That said, I'll probably still experiment with the Stanford NER tool to see what the results are like] 
  • A way of presenting possible matches from the text tools above for verification or correction by researchers. Ideally, this would be tied in with the ability to annotate documents 
  • The ability to search across different repositories for a particular soldier, to help with the above.

Friday, 10 October 2014

Linking lived experiences of WWI through battalions?

Another update from my CENDARI Fellowship at Trinity College Dublin, looking at 'In their own words: linking lived experiences of the First World War', which is a small-scale, short-term pilot based on WWI collections. My first post is Defining the scope: week one as a CENDARI Fellow. Over the past two weeks I've done a lot of reading - more WWI diaries and letters; WWI histories and historiography; specialist information like military structures (orders of battle, etc). I've also sketched out lots of snippets of possible functions, data, relationships and other outcomes.

I've narrowed the key goal (or minimum viable product, if you prefer) of my project to linking personal accounts of the war - letters, diaries, memoirs, photographs, etc - to battalions, by creating links from the individual who wrote them to their military unit. Once these personal accounts are linked to particular military units, they can be linked to higher units - from the battalion, ship or regiment to brigade, corps, etc - and to particular places, activities, events and campaigns. The idea behind this is to provide context for an individual's experience of WWI by linking to narratives written by people in the same situation. I'm still working out how to organise the research process of matching the right soldier to the right battalion/regiment/ship so that relevant personal stories are discoverable. I'm also still working out which attributes of a battalion are relevant, how granular the data will be, and how to design for the inevitable variation in data quality (for example, the availability of records for different armies varies hugely). Finally, I’m still working out which bits need computer science tools and which need the help of other historians.
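A minimal sketch of that linking chain (account to writer, writer to unit), assuming a very simple in-memory model. The class names, field names and all the sample records are hypothetical placeholders for illustration, not data from the project or a schema it has settled on:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Account:
    """A personal account (diary, letter, memoir) and who wrote it."""
    account_id: str
    title: str
    author_id: str  # links the document to the person who wrote it


@dataclass(frozen=True)
class Service:
    """A researched link between a person and a unit they served in."""
    person_id: str
    unit_id: str


def accounts_for_unit(unit_id, accounts, services):
    """Return accounts written by anyone recorded as serving in unit_id."""
    members = {s.person_id for s in services if s.unit_id == unit_id}
    return [a for a in accounts if a.author_id in members]


# Placeholder records only; none of these refer to real people or units.
accounts = [
    Account("a1", "Diary of Soldier One", "p1"),
    Account("a2", "Letters of Soldier Two", "p2"),
]
services = [
    Service("p1", "battalion-x"),
    Service("p2", "battalion-y"),
]

found = accounts_for_unit("battalion-x", accounts, services)
print([a.title for a in found])
```

Keeping the person-to-unit link as its own record, rather than a field on the account, is one way to let historians confirm or correct a writer's unit without touching the document record itself.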

Given the number of centenary projects, I was hoping to find more structured data about WWI entities. Trenches to Triples would be a useful source of permanent URLs, and of terms to train named entity recognition, but am I missing other sources?

There's a lot of content, and so much activity around WWI records, but it's spread out across the internet. Individual people and small organisations are digitising and transcribing diaries and letters. Big collecting projects like Europeana have lots of personal accounts, but they're often not transcribed and they don't seem to be linked to structured data about the item itself. Some people have painstakingly transcribed unit diaries, but they're not linked from the official site, so others wouldn't know there's a more easily read version of the diary available. I've been wondering if you could crowdsource the process of transcribing records held elsewhere, and offer the transcripts back to sites. Using dedicated transcription software would let others suggest corrections, and might also make it possible to link sections of the text to external 'entities' like names, places, events and concepts.

Albert Henry Bailey. Image:
Sir George Grey Special Collections,
Auckland Libraries, AWNS-19150909-39-5
To help figure out the issues researchers face and the variations in available resources, I'm researching randomly selected soldiers from different Allied forces. I've posted my notes on Private Albert Henry Bailey, service number 13/970a. You'll see that they're in prose form, and don't contain any structured data. Most of my research used digitised-but-not-transcribed images of documents, with some transcribed accounts. It would definitely benefit from deeper knowledge of military history - for a start, which battalions were in the same place as his unit at the same time?

This account of the arrival and first weeks of the Auckland Mounted Rifles at Gallipoli from the official unit history gives a sense of the density and specificity of local place names, as does the official unit diary, and I assume many personal accounts. I'm not sure how named entity recognition tools will cope, and ideally I'd like to find lists of places to 'train' the tools (including possibly some from the 'Trenches to Triples' project).

If there aren't already any structured data sources for military hierarchies in WWI, do I have to make one? And if so, how? The idea would be to turn prose descriptions like this Australian War Memorial history of the 27th AIF Battalion, this order of battle of the 2nd Australian Division and any other suitable sources into structured data. I can see some ways it might be possible to crowdsource the task, but it's a big task. But it's worth it - providing a service that lets people look up which higher military units, places, activities and campaigns a particular battalion/regiment/ship was linked to at a given time would be a good legacy for my research.
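One possible shape for that lookup, sketched under the assumption that each parent-child link in the hierarchy is stored with a start date and an optional end date. The unit names, dates and function names below are invented placeholders, not real orders of battle:

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Attachment:
    """One dated parent-child link in the military hierarchy."""
    child: str
    parent: str
    start: date
    end: Optional[date]  # None means the end date is open or unknown


def parent_on(unit, when, attachments):
    """Return the parent of `unit` on the date `when`, or None."""
    for a in attachments:
        if a.child == unit and a.start <= when and (a.end is None or when <= a.end):
            return a.parent
    return None


def higher_units(unit, when, attachments):
    """Walk upwards from `unit`, collecting each higher unit on that date."""
    chain = []
    seen = {unit}  # guards against accidental loops in messy data
    current = parent_on(unit, when, attachments)
    while current is not None and current not in seen:
        chain.append(current)
        seen.add(current)
        current = parent_on(current, when, attachments)
    return chain


# Hypothetical data: a battalion that moves between brigades in 1916.
ATTACHMENTS = [
    Attachment("Battalion X", "Brigade A", date(1915, 1, 1), date(1916, 2, 29)),
    Attachment("Battalion X", "Brigade B", date(1916, 3, 1), None),
    Attachment("Brigade A", "Division 1", date(1915, 1, 1), None),
    Attachment("Brigade B", "Division 2", date(1915, 1, 1), None),
]

print(higher_units("Battalion X", date(1915, 6, 1), ATTACHMENTS))
print(higher_units("Battalion X", date(1916, 6, 1), ATTACHMENTS))
```

The same dated-link pattern could in principle carry places, events and campaigns as well as parent units, though real sources will often give only approximate or missing dates, which this sketch does not handle.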

I'm sure I'm forgetting lots of things, and my list of questions is longer than my list of answers, but I should end here. To close, I want to share a quote from the official history of the Auckland Mounted Rifles. The author said he 'would like to speak of the splendid men of the rank and file who died during this three months' struggle. Many names rush to the memory, but it is not possible to mention some without doing an injustice to the memory of others'. I guess my project is driven by a vision of doing justice to the memory of every soldier, particularly those ordinary men who aren't as easily found in the records. I'm hoping that drawing on the work of other historians and re-linking disparate sources will help provide as much context as possible for their experiences of the First World War.

Update, 15 October 2014: if you've made it this far, you might also be interested in chipping in at 'Linking lived experiences of the First World War': possible goals and a bunch of technical questions.

Tuesday, 30 September 2014

It's here! Crowdsourcing our Cultural Heritage is now available

My edited volume, Crowdsourcing our Cultural Heritage, is now available! My introduction (Crowdsourcing our cultural heritage: Introduction), which provides an overview of the field and outlines the contribution of the 12 chapters, is online at Ashgate's site, along with the table of contents and index. There's a 10% discount if you order online.

If you're in London on the evening of Thursday 20th November, we're celebrating with a book launch party at the UCL Centre for Digital Humanities. Register at

Here's the back page blurb: "Crowdsourcing, or asking the general public to help contribute to shared goals, is increasingly popular in memory institutions as a tool for digitising or computing vast amounts of data. This book brings together for the first time the collected wisdom of international leaders in the theory and practice of crowdsourcing in cultural heritage. It features eight accessible case studies of groundbreaking projects from leading cultural heritage and academic institutions, and four thought-provoking essays that reflect on the wider implications of this engagement for participants and on the institutions themselves.
Crowdsourcing in cultural heritage is more than a framework for creating content: as a form of mutually beneficial engagement with the collections and research of museums, libraries, archives and academia, it benefits both audiences and institutions. However, successful crowdsourcing projects reflect a commitment to developing effective interface and technical designs. This book will help practitioners who wish to create their own crowdsourcing projects understand how other institutions devised the right combination of source material and the tasks for their ‘crowd’. The authors provide theoretically informed, actionable insights on crowdsourcing in cultural heritage, outlining the context in which their projects were created, the challenges and opportunities that informed decisions during implementation, and reflecting on the results.

This book will be essential reading for information and cultural management professionals, students and researchers in universities, corporate, public or academic libraries, museums and archives."

Massive thanks to the following authors of chapters for their intellectual generosity and their patience with up to five rounds of edits, plus proofing, indexing and more...

  1. Crowdsourcing in Brooklyn, Shelley Bernstein; 
  2. Old Weather: approaching collections from a different angle, Lucinda Blaser; 
  3. ‘Many hands make light work. Many hands together make merry work’: Transcribe Bentham and crowdsourcing manuscript collections, Tim Causer and Melissa Terras; 
  4. Build, analyse and generalise: community transcription of the Papers of the War Department and the development of Scripto, Sharon M. Leon; 
  5. What's on the menu?: crowdsourcing at the New York Public Library, Michael Lascarides and Ben Vershbow; 
  6. What’s Welsh for ‘crowdsourcing’? Citizen science and community engagement at the National Library of Wales, Lyn Lewis Dafis, Lorna M. Hughes and Rhian James; 
  7. Waisda?: making videos findable through crowdsourced annotations, Johan Oomen, Riste Gligorov and Michiel Hildebrand; 
  8. Your Paintings Tagger: crowdsourcing descriptive metadata for a national virtual collection, Kathryn Eccles and Andrew Greg; 
  9. Crowdsourcing: Crowding out the archivist? Locating crowdsourcing within the broader landscape of participatory archives, Alexandra Eveleigh; 
  10. How the crowd can surprise us: humanities crowdsourcing and the creation of knowledge, Stuart Dunn and Mark Hedges; 
  11. The role of open authority in a collaborative web, Lori Byrd Phillips; 
  12. Making crowdsourcing compatible with the missions and values of cultural heritage organisations, Trevor Owens.