Author Archives: tla

An ontology for critical editions of variant text

In 2020 I had a paper accepted to the DH conference, concerning my attempts to create an ontology for the Stemmarest data model. The motivation for doing this was simply to see how far I could bring my work on Stemmarest in line with community norms for data modelling standards, and certainly in 2019 OWL vocabularies were all the rage.

The Stemmaweb model has been a graph model from the beginning, and it was a natural step in 2015 to move to a graph database to back it. Since we thought (and still think) about the problem domain in terms of texts, witnesses, and versions rather than storehouses of individual tiny facts, we didn’t seriously consider RDF as the backing graph model. Neo4j was our preferred solution in the end, because it provided the graph traversal and path-finding algorithms that we had needed from the beginning to validate our data.

Since the pandemic wiped away the opportunity to present the work in Ottawa, and since the consequences of the pandemic also wiped away the time and energy I would have needed to present this work as part of the online event, I have published here the full abstract (which is, unaccountably, not available from the DH2020 website or book of abstracts) and a link to the ontology, bugs and all, that I had created as of July 2020. I have meanwhile continued this work after a fashion, which I hope to be able to talk about in 2023.


An explanation of the data model behind Stemmaweb

I post here the slides and the abstract for a presentation I gave most recently at the conference of the European Society for Textual Scholarship in 2019. This is intended as a guide to the data model and thinking that informs Stemmaweb and its tools.

This content (both abstract and presentation) is released under a CC-BY 4.0 license, so feel free to download and share!


What are the consequences for data modelling when we think of a critical edition not as a document, but as a process? Our aim in this talk is to open a discussion on the difference between treating a critical edition as a text, and treating it as an intellectual endeavour whose result is a text.

The typical digital representation of a critical edition takes the form of a document, whether it is prepared with a word processor, the Classical Text Editor, LaTeX, or TEI-based tools and specifications. While these formats can certainly represent the features of a published critical edition, there is very little that makes explicit the editorial logic behind the product.

Here we will consider a different approach, adopted in the recently-concluded SNSF-funded project “The Chronicle of Matthew of Edessa Online”, in which the logic of edition is modelled not merely in the data format, but also in the associated computer code, embedding logic that allows the editor to define custom answers to questions such as the following:

  • What constitutes a reading, in what context(s)? A lemma reading? A variant?
  • How should variants be classified? What implicit hierarchy, if any, does the editor’s classification scheme have and what are the implications?
  • How should the text be subdivided, and in what order(s) should these subdivisions be read?
  • What kind of information is carried within the text, and how can that be expressed?

Most crucially, the process model allows the answers to these questions to be enforced consistently within the project, with the useful side effect of compelling the editor to reconsider assumptions that turn out not to be adequate. The result, as we hope to demonstrate, is a digital critical edition that inherently captures, not only the resulting text, but also the intellectual process by which it is produced.



A HOWTO for using Stemmaweb

I have been asked for a guide to using the tools on Stemmaweb – there is documentation under ‘About/Help’, of course, but it would be useful to give an overview of where to start and how to proceed from there. So this guide is meant to be an introduction, of sorts.

The first step is to create yourself a user account. You can do this by clicking ‘Sign In/Register’ at the top, where you will have three options:

  1. Use your Google account, if you have one.
  2. Use any other OpenID account, if you have one.
  3. Use an account created especially for Stemmaweb. To get one of these you must first click the ‘Sign in with Stemmaweb’ bar, and then follow the ‘Register’ link. Once you are registered you can sign in back on the ‘Sign in with Stemmaweb’ tab.

Once you have done one of these three things, you will find that Stemmaweb looks just the same, but the ‘Sign In’ link will be replaced with a greeting to you (or to your email address anyway), and you will see a new button:

Add Button

Now you can upload texts of your own to work with!

Stemmaweb operates on collated text. Someday, I hope, there will be an integrated collation tool that will do this first step for you, but as of today we are not there. So the first thing you need, if you want to work on Stemmaweb, is a collation of some text. You can provide this collation in, broadly speaking, one of three ways:

  • Do it yourself. Align the text in a spreadsheet, one witness per column, with the sigla in the first row. The spreadsheet can be saved as comma-separated format with the extension .csv, tab-separated format with the extension .txt, or as a Microsoft Excel spreadsheet (either XLS or XLSX).
  • Do it yourself, TEI style. Create a TEI parallel-segmentation file for your text and its witnesses. This is somewhat less recommended because there are a million different ways to apply the TEI guidelines and I have only had time and energy to support a few of them. PLEASE review these guidelines if you want to use this option.
  • Do it yourself, CTE style. If you have been preparing your witnesses in Classical Text Editor, you can export your work in TEI double-endpoint-attachment format, and Stemmaweb will make a best-effort attempt at reconstructing the witnesses from your apparatus. There are a lot of caveats about doing this; you can read more here. The upload may well fail due to some confusion encountered by the Stemmaweb parser; I am working on a tool to validate CTE input, but it is not yet generally available.
  • Get a collation tool to do it for you. CollateX is a good option for this; I recommend that you request CollateX results in its GraphML format, in order to preserve any detected reading transpositions. Do NOT use CollateX’s TEI-style parallel-segmentation output.
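To make the first of these options concrete, here is a minimal sketch (in Python, with invented sigla and words) of what the do-it-yourself spreadsheet looks like when saved as CSV: sigla in the first row, one witness per column, one aligned word per subsequent row. I assume here that a word missing from a witness is marked by an empty cell.

```python
import csv
import io

# Hypothetical three-witness alignment. Sigla go in the first row;
# each column is one witness, read top to bottom.
rows = [
    ["A",     "B",     "C"],       # witness sigla
    ["the",   "the",   "the"],
    ["quick", "quick", "quikc"],   # C carries a spelling variant
    ["brown", "",      "brown"],   # B omits a word (empty cell)
    ["fox",   "fox",   "fox"],
]

buf = io.StringIO()
csv.writer(buf).writerows(rows)
print(buf.getvalue())
```

Saving this with a .csv extension (or the tab-separated equivalent as .txt) should give you something Stemmaweb can ingest.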

Now that you have a collation uploaded, if you click on that text in the list you will see some extra buttons. Let’s look at what each of them does.

Graph Viewer Button

This is where you should probably start. Clicking on this button will load your text into the ‘relationship mapper’; you will see the collation in the form of a graph, running in one direction from beginning to end. Each reading in the text is a node in the graph somewhere; each witness is essentially a single long string of these reading nodes, collecting its words from beginning to end. Wherever witnesses agree, they are ‘strung’ through the same node. Wherever they disagree, their ‘strings’ will diverge, and the variant readings will appear stacked roughly on top of each other in the graph.
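For those who like to see the model in code, here is an illustrative sketch of the ‘strings through nodes’ idea (all witness names and words are invented, and this is not Stemmaweb’s internal representation): each witness is simply an ordered path of readings, agreement means two paths share a node, and disagreement means the paths diverge.

```python
# Each witness is a path of reading nodes from beginning to end.
witness_paths = {
    "A": ["#START#", "the", "quick", "fox", "#END#"],
    "B": ["#START#", "the", "quick", "fox", "#END#"],
    "C": ["#START#", "the", "swift", "fox", "#END#"],
}

# Nodes threaded by every witness form the common backbone of the
# graph; readings outside it are variant sites where the paths split.
shared = (set(witness_paths["A"])
          & set(witness_paths["B"])
          & set(witness_paths["C"]))
print(sorted(shared))
```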

The purpose of this tool is not only to look at the pretty picture made by your variant graph, but also to annotate the variants in relation to each other. Are the variant readings synonyms? Spelling variants? Link them as such. Has a word been shifted to a different location in one witness? Link the two occurrences of the word as a transposition. Do you think the variation in question is stemmatically significant? Make a note of it in the dialog when you create the link. More instructions for using the tool can be found here.

At the moment the available types of links are mostly limited to syntactical relationships between words. Someday, the users of Stemmaweb (that’s you, the scholar) will be able to define their own relationship categorization, but that day is not today. If none of the syntactical categories apply, you are very welcome to make liberal use of the ‘Other’ categorization and leave yourself a note in the ‘Annotation’ field.

Add Stemma Button

If you click this button, you will find a fairly arcane (but hopefully well-explained) way of defining a stemma for your witnesses. This is meant to be used for the definition of any stemma at all, so long as it has a root (an archetype) and doesn’t have a cycle (e.g. A->B, B->C, C->A; unless you are working on the New Testament, you probably won’t have one).
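The ‘no cycles’ requirement is easy to check mechanically. As an illustrative sketch (not Stemmaweb’s actual validation code), here is a depth-first search over a hypothetical parent-to-child edge list:

```python
# A stemma must be free of cycles. This checks an edge list
# (parent, child) for a cycle via depth-first search; the witness
# names are invented for illustration.
def has_cycle(edges):
    graph = {}
    for parent, child in edges:
        graph.setdefault(parent, []).append(child)

    visiting, done = set(), set()

    def visit(node):
        if node in visiting:      # back-edge found: a cycle
            return True
        if node in done:
            return False
        visiting.add(node)
        found = any(visit(nxt) for nxt in graph.get(node, []))
        visiting.discard(node)
        done.add(node)
        return found

    return any(visit(n) for n in list(graph))

print(has_cycle([("alpha", "B"), ("B", "C")]))          # a valid chain
print(has_cycle([("A", "B"), ("B", "C"), ("C", "A")]))  # the A->B->C->A loop
```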

Once you add a stemma, there will be a button labelled ‘Edit this stemma’; that brings up the same stemma definition box, but with the stemma in question pre-loaded there. The left- and right-arrow buttons will allow you to page through the stemmata you have defined. There is no limit to how many you can define!

Stemweb Button

Stemmaweb is connected to the Stemweb service provided by the Helsinki Institute for Information Technology, which can run one of a variety of phylogenetic algorithms on a collated text tradition. If you click this button, you will get a dialog box that asks you which algorithm you want to run. Clicking on ‘What is this?’ will bring up a description of the selected algorithm. Some of the algorithms take parameters; if you choose one of these, you will be asked to fill in the parameter.

If you have marked up the relationships between variants in the graph viewer / relationship mapper, then you will also be able to discount selected categories of relationship, if you wish – for example, it is fairly common to want to disregard spelling variation, and this is the option that lets you do it.

The algorithms offered by Stemweb all return unrooted trees; depending on the algorithm you select and the size or amount of variation of your tradition, you may have multiple trees returned. (At present there is no good way to delete a tree, or to reorder the trees that are returned.)

An unrooted tree is not, by definition, a stemma until it has been oriented by selecting a root. In Stemmaweb you can root (or re-root!) a tree by clicking on the witness node that you wish to treat as the archetype, and selecting the green checkbox to ‘Use this node to root the stemma’.

Stexaminer Button

Now you have your text uploaded and marked, and you have your stemma hypothesis (or maybe several hypotheses) – you are ready to click the most exciting button!

The Stexaminer is a program and algorithm that was developed in concert with the DTAI group at KU Leuven; its job is to tell you, for each location in the text where variation occurs, whether that variation fits the stemma. In the case of stemmata without contamination or conflation this is pretty easy to calculate, but for traditions where contamination is known or suspected to have occurred, it has until now been difficult to say for sure whether a given pattern of variation can be explained by the stemma. The Stexaminer can handle as complex a stemma as you care to throw at it.

Like the graph viewer / relationship mapper tool, the Stexaminer has its own help documentation for you to consult. The basic idea is that you can generate an overview of how well your stemma seems to match the textual evidence, and you can also drill down variant-by-variant to see which witnesses carry which reading. Where the pattern of variants does not match the stemma, the Stexaminer deduces where in the stemma the change might have been introduced, so that the number of ‘coincidences’ is kept to a minimum. It will also try to detect reading reversion – that is, when a scribe might have altered a reading in the exemplar to restore an ancestral reading. This is a highly experimental feature, and not one to rest a philological argument on without a lot of caution.

So there you have it! An overview of how Stemmaweb’s tools fit together and pointers to how you might use them. Go wild, and if something goes wrong, get in touch!

New home, new features

Along with my own recent institutional move to the University of Bern, I have taken the opportunity to organize a new home (and a new domain) for Stemmaweb. Now that the migration is finished I am very happy to announce two new major features:

These features are not yet as well documented as I would like, but they are up and running and ready to use. As ever, use of the Stemmaweb tools is free for any scholar anywhere, and if you have any questions or difficulties, don’t hesitate to contact me.

White Paper: Interoperability between Stemmatological Microservices


Tara Andrews (1), Simo Linkola (2), Teemu Roos (2), Joris van Zundert (3)
  1. KU Leuven (BE)
  2. Helsinki Institute for Information Technology HIIT, University of Helsinki (FI)
  3. Huygens Institute for the History of the Netherlands, Royal Netherlands Academy of Arts and Sciences (NL)

7 May 2013

In 2012 two online tools for text stemmatology research were published to the web: Stemweb [source code available at], under development by members of the Helsinki Institute for Information Technology HIIT, and Stemmaweb [; source code available at] developed by the Tree of Texts project at KU Leuven in collaboration with members of the Interedition project []. Stemweb is a resource for many sorts of phylogenetic and other statistical analysis of a text, including the RHM and SemStem methods developed specifically for the case of recovering manuscript text stemmata. Stemmaweb is a complementary resource for the visualization, regularization, and analysis of the variation within a text using graph search methods, and allows the scholar to define as many hypothetical stemmata as she or he would like to explore. On 25-26 April 2013 a meeting was held, with funding generously provided by a Small Project Grant of the European Association for Digital Humanities (ALLC), to create an open blueprint for integration. Our blueprint is meant not only to be open to the two existing tools but also to provide a framework for interoperability with any other tool for stemmatological research that may appear in future. The blueprint is provided here in the form of a white paper; comments are invited from interested developers and scholars, and those received by 31 May will be taken under consideration in time for the first implementation.


Stemmatology is the study of how to derive computationally the copying relationships between ancient and medieval manuscripts. Various statistical and algorithmic approaches have been adapted from the field of evolutionary biology for this over the past few decades, such as maximum parsimony and neighbour joining; others, such as the RHM method and the more recent SemStem, have been developed specifically with the problem of text genealogy in mind. Many of the methods require the scholar to become familiar with software packages for evolutionary biology, and are in that sense not particularly approachable (or even, in the case of non-free software, not easily available). One of the greatest advantages of Stemweb is precisely the collection of several applicable algorithms in one place, so that the scholar can use different methods on the same dataset without having to devise a different technical working process for each.

To allow practical application of and reflection on the various algorithms by as wide a community of scholars as possible, our aim is to provide both open GUI and API access to these tools. The web-based user interfaces allow the integrated facilitation of the various methods for stemmatology developed by different researchers in various locations. Founding this integration on the basis of (web) APIs will allow anyone who develops additional approaches to stemmatology to let their solutions be interoperable with the current published ones.

Where Stemweb provides a collection of algorithms for the creation of stemmatic hypotheses, Stemmaweb is a collection of tools designed to examine and analyse collated sets of texts, the relationships between variant readings, and the logical consequences of one or more stemmatic hypotheses. In this context Stemmaweb is a consumer of the hypotheses that Stemweb can generate.


The aim of this proposal is thus to allow any scholar to:

  1. have open web access to our current technology for stemmatology
  2. provide an open interoperable way to contribute stemmatological algorithms to the framework
  3. provide an open API that allows integration of stemmatological services in any web-enabled GUI

Throughout future stages of the project, we wish to continue an active engagement with the scholarly community concerning the direction and functionality of the ecosystem of tools for stemmatological and text-genetic research. As far as possible given the resources available to us, we will provide an open communication space for scholars to reflect on, comment on, and participate in our work. We also intend to provide or collect information on the properties of different stemmatological methods as well as online guidelines for their use and other documentation.

Request for Comments

At this point in time we have provisionally agreed on the primary APIs to connect and interoperate the Stemweb and Stemmaweb solutions. We present in this white paper the proposed protocol and data formats that will allow the interconnection of these two solutions. We have striven to keep this protocol for interoperability lightweight, extensible and open, so that any developer, researcher, or contributor may scrutinize, comment on, and suggest changes and enhancements to this protocol.

Description of the proposed framework

The protocol proposed is based on the idea of microservices: very small web services with a REST-like API as far as possible, normally using JSON as the means of exchange and communication between server and client. Microservices have been defined in the context of the Interedition project. Individual microservices may be combined in various ways to drive the functionality of multiple web applications. The interoperability of Stemweb and Stemmaweb follows this model, as depicted in Figure 1.

Figure 1: Microservices architecture

In this instance the Stemmaweb visualisation service will be a client of the microservice interface provided by Stemweb, sending collation data and receiving one or more stemmatic tree hypotheses in return. The aim is nevertheless to define an API whose ‘server’ functions can be implemented by any other microservice that provides a stemmatological algorithm, and the ‘client’ functions can be implemented by any consumer of such an algorithm.

Server-side API

  • Discovery: The server must provide a discovery resource, e.g.

  GET /algorithms/available

which will return a JSON response that lists the stemmatology algorithms available on the server, along with descriptions for display in a user interface and option parameters that are required or recommended for their use. The response takes the following form:

[ { model: argument,
    pk: <ID in server database>,
    value: <argument value’s type, see below>,
    verbose_name: <human readable name of the argument>,
    description: <longer description of argument’s behaviour> },
  { model: argument, ... },
  { model: algorithm,
    pk: <ID in server database>,
    name: <human readable name>,
    desc: <longer description of the algorithm>,
    url: <link to original article, if available>,
    args: [<list of argument pk’s for input arguments>] },
  { model: algorithm, ... } ]

Valid argument values at present are:  positive_integer, integer, boolean, input_file, float and String.
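As a sketch of how a client might consume this discovery response, the following fragment splits such a list into arguments and algorithms and resolves each algorithm’s argument pks. Only the field names come from the specification above; the sample payload itself (the “iterations” argument and its values) is invented for illustration.

```python
import json

# An invented discovery response following the field names above.
response = json.loads("""
[ {"model": "argument", "pk": 1, "value": "positive_integer",
   "verbose_name": "iterations", "description": "number of runs"},
  {"model": "algorithm", "pk": 10, "name": "SemStem",
   "desc": "SemStem structural EM method", "url": null, "args": [1]} ]
""")

# Index the arguments by pk, then resolve each algorithm's arg list.
arguments  = {it["pk"]: it for it in response if it["model"] == "argument"}
algorithms = [it for it in response if it["model"] == "algorithm"]

for alg in algorithms:
    needed = [arguments[pk]["verbose_name"] for pk in alg["args"]]
    print(alg["name"], "needs:", needed)
```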

  • Request: The server must implement an address to listen for incoming requests, e.g.

  POST /algorithms/calculate

The client should send the following request data to accompany the POST request:

{ userid: <email / ID of user>
  algorithm: <ID of algorithm>
  parameters: { ... }
  data: <string containing data in specified format> }

The server will indicate its response via appropriate HTTP status codes, e.g.:

200 OK (job was accepted and will now be processed)
  { jobid: id }
400 Bad Request (e.g. request was malformed)
  { error: <error message> }
403 Forbidden (e.g. client not authorized)
  { error: <error message> }

The algorithms for stemmatology calculation can take an arbitrarily long time to run, and we therefore propose an asynchronous callback method for the return of results. For this reason the initial server response to a successful request will consist only of a job ID.

When the calculation is finished, the server must make a POST request to a location implemented on the client side with the results. The return of a 200 response implies a commitment that the job will run and the results, whether success or error, will be returned. See below, ‘Client-side API’, for more information.
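A minimal client-side sketch of this request step might look as follows. The user ID, algorithm ID, and parameter values are placeholders, and no real network call is made; the point is only the shape of the POST body and the handling of the status codes described above.

```python
import json

# Build the body for POST /algorithms/calculate, per the spec above.
def build_request(userid, algorithm_id, parameters, data):
    return json.dumps({
        "userid": userid,
        "algorithm": algorithm_id,
        "parameters": parameters,
        "data": data,
    })

# Interpret the server's reply: a 200 carries only a job ID, since
# results arrive later via the asynchronous callback; 400/403 carry
# an error message.
def interpret_reply(status, body):
    if status == 200:
        return ("accepted", json.loads(body)["jobid"])
    return ("error", json.loads(body)["error"])

payload = build_request("editor@example.org", 10,
                        {"iterations": 100}, "A,B,C\n...")
print(interpret_reply(200, '{"jobid": 42}'))
```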

  • Job status: The server should accept requests for job status, e.g.:

GET /algorithms/jobstatus?jobid=<ID number>
  { jobid: <id number>
    statuscode: <1 = running; 0 = success; greater than 1 = failed>
    *result: <data>
    *result_format: <format> }

The “result” keys should only be included, where applicable, if the job is no longer running. This is intended primarily as a fallback interface in case the server was unable to make the initial POST reporting results to the client as described above.
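A client might interpret the status reply along these lines; the payloads below are invented examples that follow the key names and status codes above.

```python
# Map a job-status reply to a (state, result) pair:
# statuscode 1 = still running, 0 = success, >1 = failed.
# The "result" key is present only once the job has stopped running.
def job_state(reply):
    code = reply["statuscode"]
    if code == 1:
        return ("running", None)
    if code == 0:
        return ("success", reply.get("result"))
    return ("failed", reply.get("result"))

print(job_state({"jobid": 42, "statuscode": 1}))
print(job_state({"jobid": 42, "statuscode": 0,
                 "result_format": "newick", "result": "(A,(B,C));"}))
```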

Client-side API

In addition to ensuring that requests to the server API are well-formed as documented above, a client to the stemmatological algorithm microservice must implement a URL to which the server can post results:

POST /stemmatology/result
 { jobid: <ID number>
   statuscode: 0
   result_format: <format>
   result: <data> }
 { jobid: <ID number>
   statuscode: >1
   result: <error message> }

In case no results are received in a reasonable time, or it is otherwise suspected that the results of a job failed to be returned, the client may request a job status from the server as documented above.
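Here is a sketch of what the client’s result endpoint might do with the posted body, dispatching on the status code. The dict stands in for whatever storage a real client would use for finished jobs, and all names are illustrative.

```python
# Handle the body of POST /stemmatology/result, per the spec above:
# statuscode 0 carries a result and its format; statuscode > 1
# carries an error message in the "result" field.
finished_jobs = {}

def handle_result(payload):
    jobid = payload["jobid"]
    if payload["statuscode"] == 0:
        finished_jobs[jobid] = ("ok", payload["result_format"],
                                payload["result"])
    else:
        finished_jobs[jobid] = ("error", None, payload["result"])
    return 200   # acknowledge receipt to the server

handle_result({"jobid": 7, "statuscode": 0,
               "result_format": "newick", "result": "(A,(B,C));"})
print(finished_jobs[7])
```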

Crediting issues

When building web services based on this framework, the author(s) should in each case ensure that the providers of the various solutions and components on which the services are built are given due credit – for instance, that a user using Stemmaweb will be notified that the stemmatological algorithms are provided by Stemweb. A general technical solution to the problem of “giving credit where credit is due” is beyond the scope of this white paper; comments on the issue are nevertheless welcome for the purpose of creating a suitable policy in the future. At this point we can only recommend that web service authors make liberal use of logos.

Implementation plan

We are presently releasing this white paper for comments on the Digital Medievalist mailing list, Humanist, and the Textual Scholarship blog. Comments are invited on the initial phase until the end of May 2013. The teams at KU Leuven, HIIT, and Huygens ING will begin implementation of the functionality required by the framework and the API in June 2013, with the intent of releasing the complete and tested web services by the end of November 2013.

We hope particularly to solicit comments concerning the API from the point of view of potential future extensions of the services outlined above, such as other tools and resources (both front-end and back-end) that can be implemented in the framework. For instance, it will probably be beneficial to integrate a collation tool such as CollateX into the system, with seamless data storage or sharing between the microservices so that the user need not repeatedly upload and download his or her data in order to perform a full analysis cycle. Another desideratum might be integration with an informational resource such as the Parvum lexicon stemmatologicum [] produced by members of the Studia Stemmatologica group, which is a collection of definitions of terms used in stemmatology. The API should be able to accommodate any such extensions as seamlessly as possible.


Announcing Stemmaweb

The Tree of Texts project formally comes to an end in a few days; it’s been a fun two years and it is now time to look at the fruits of our research. We (that is, Tara) gave a talk at the DH 2012 conference in July about the project and its findings; we also participated in a paper led by our colleagues in the Leuven CS department about computational analysis of stemma graph models, which was presented at the CoCoMILE workshop during the European Conference on Artificial Intelligence. We are now engaged in writing the final project paper; following up on the success of our DH talk, we will submit it for inclusion in the DH-related issue of LLC. Alongside all this, work on the publication of proceedings from our April workshop continues apace; nearly all the papers are in and the collection will soon be sent to the publisher.

More excitingly, from the perspective of text scholars and critical editors who have an interest in stemmatic analysis, we have made our analysis and visualization tools available on the Web! We are pleased to present Stemmaweb, which was developed in cooperation with members of the Interedition project and which provides an online interface for examining text collations and their stemmata. Stemmaweb has two homes: the official KU Leuven site, and Tara’s personal server (less official, but much faster).

If you have a Google account or another OpenID account, you can use that to log in; once there you can view the texts that others have made public, and even upload your own. For any of your texts you can create a stemma hypothesis and analyze it with the tools we have used for the project; we will soon provide a means of generating a stemma hypothesis from a phylogenetic tree, and we hope to link our tools to those emerging soon from the STAM group at the Helsinki Institute for Information Technology.

Like almost all tools for the digital humanities, these are highly experimental. Unexpected things might happen, something might go wrong, or you might have a purpose for a tool that we never imagined.  So send us feedback! We would love to hear from you.

Report on ‘Methods and Means’ workshop

The workshop “Methods and means for digital analysis of ancient and medieval texts and manuscripts”, held on 2-3 April in Leuven and Brussels, was by any measure a resounding success.  We had 40-45 attendees on each day of the workshop; the good attendance resulted in some stimulating discussion after each of the paper sessions.

We began the first day with a set of papers on palaeography and manuscript digitization moderated by Juan Garcés. Ira Rabin (Berlin) presented cutting-edge work on the application of infrared imaging to the chemical identification (and therefore, in many situations, the provenance) of the ink used in medieval manuscripts. Daniel Deckers (Hamburg) followed up this contribution with a look at a range of methods for manuscript imaging, adding ultraviolet and multispectral methods to the infrared method proposed by Rabin. Ainoa Castro Correa (Barcelona) rounded out the session with a presentation of her database of Visigothic palaeography; a lively discussion followed all three papers.

The second session, moderated by Torsten Schaßan, saw a pair of presentations by Patrick Andrist (Fribourg) and David Birnbaum (Pittsburgh) on the topic of manuscript descriptions and cataloguing. Andrist proposed a cataloguing model for online (and print) use that is more suited than common current models for the accurate capture of information, including dating, for the different parts that might comprise an entire manuscript.  Birnbaum discussed the analysis techniques that he has applied to the catalogue descriptions of medieval Slavic manuscripts.

Session three, moderated by Tara Andrews (Leuven), focused on stemmatology – that is, the attempt to recover the history of transmission of a text based on the manuscripts that survive.  Jean-Baptiste Camps and Florian Cafiero (Paris) presented the techniques that they have developed to handle translations within a text tradition; Philipp Roelli presented a neo-Lachmannian method aimed at the automatic identification of Leitfehler, or ‘significant error’ that can be used to reconstruct a text stemma.

Session four, moderated by Aurélien Berra (Paris), concerned statistical and stylistic analysis of texts.  Armen Hoenen (Frankfurt) presented his research into creating a statistical model for scribal error and showed its application in the case of Avestan manuscripts.  Karina van Dalen-Oskam (Amsterdam/Den Haag) demonstrated the use of stylistic analysis applied to the Rijmbijbel of Jacob van Maerlant, not only to examine the ways in which a text was adapted by its various scribes but also to show the effect that modern edition has had.  The paper was followed by a lively discussion that returned to the theme of stemmatology and its uses in the cases of popular and fluid medieval texts such as the Rijmbijbel. The final paper, presented by Mike Kestemont and Kees Schepers (Antwerp), demonstrated the application of stylistic methods to distinguish distinct ‘voices’ in the collection ‘Ex epistolis duorum amantium’, which provides scientific support for the hypothesis that the letters did indeed have two authors.

The first day of the workshop was rounded out with a discussion, led by Joris van Zundert (Den Haag), on the nature of textual scholarship and whether there is any justification for non-digital text edition.  Participants were asked to take five minutes at the end of the discussion to write down what, in their opinion, were the most important points arising from it. A consensus developed over the course of the discussion that, while paper editions are still necessary, digital methods in a variety of forms have become indispensable to well-prepared text editions. A great deal of open question and debate remains, however, on the subject of publication forms, acceptance and use of standards for data formatting, and the sustainability of the digital products of scholarship.

Day two of the workshop was hosted by the Royal Flemish Academy in Brussels. We began the morning with the fifth session, on existing databases for textual analysis and presentation, moderated by Karina van Dalen-Oskam. Eugenio Luján and Eduardo Orduña (Madrid) presented their work on a database of palaeo-Hispanic inscriptions, many of whose scripts remain undeciphered to the present day; the database raises a number of issues for encoding and representation of text that we do not yet have the ability to read.  Nadia Togni (Geneva) presented a database, BIBLION, centered on the representation and display of Italian “giant bibles” of the eleventh and twelfth centuries. Francesco Stella (Siena) gave an overview of the state of the art of digital publication, and presented the publication of the Corpus Rhythmorum Musicorum in this context.

We returned to the subject of stemmatology for session six, moderated by Joris van Zundert. Alberto Cantera (Salamanca) discussed the coherence-based model for ascertaining text genealogy as it applies to the tradition of Avestan religious texts, and Tuomas Heikkilä (Helsinki) discussed the transmission and readership of the Life and Miracles of St. Symeon Treverensis.

Sessions seven and eight, moderated respectively by Caroline Macé (Leuven) and Tuomas Heikkilä, looked at aspects of inter-textual analysis, taking us from scholarship of a single text or corpus to the investigation of relationships across disparate texts.  Charlotte Tupman (London) presented the work of the multi-institutional ‘Sharing Ancient Wisdoms’ project on tracing the provenance and transmission of gnostic sayings throughout medieval literature, including Greek and Arabic works.  Samuel Rubenson and Benjamin Ekman (Lund) presented their work on a database of the Apophthegmata Patrum (sayings of the church fathers) as transmitted throughout medieval Christian literature.  Linda Spinazzè (Venice) presented the Musisque Deoque project and discussed the ongoing research into intertextual aspects of their corpus of medieval Latin poetry up to the Renaissance.  Finally, Maxim Romanov (Michigan) discussed his work on the analysis of public sermons in the Islamic world, as reported in Arabic chronicles.

The organizers of the workshop (Caroline Macé and Tara Andrews) closed the event with a presentation of the Tree of Texts project, wherein we seek to derive an empirical model for textual transmission in the Middle Ages based on the statistical analysis of a variety of texts in several different languages.  It then remained only to thank the speakers and attendees for their enthusiastic participation. The workshop was an excellent showcase for the wide variety of analysis methods and techniques being applied to the study of medieval texts.

Registration and programme for 2-3 April workshop

We are happy to announce that the 2-3 April workshop, “Methods and means for digital analysis of ancient and medieval texts and manuscripts”, is now open for registration.  The workshop will take place at the Leuven Faculty Club on the first day, and the Royal Flemish Academy of Belgium (KVAB) on the second day.

The workshop is sponsored by the Tree of Texts project, Interedition, the Royal Flemish Academy of Belgium, and the KU Leuven Faculty of Arts.  Thanks to the generosity of our sponsors, registration is free and includes coffee and a sandwich lunch, but please be sure to register before 26 March so that we have accurate numbers for catering.


By air: The best airport to fly into is Brussels National (Zaventem); from there, the easiest way to reach Leuven is by train.  Direct trains run about twice per hour and cost €5.40; the trip takes about 25 minutes.

By train: From the east / Germany, the best connection is probably via Liège; otherwise via Brussels.  Intercity trains between these two cities run every half hour and stop at Leuven.


The train station is on the east side of the city center, and the workshop venue and most hotels are within walking distance. Otherwise you can get a taxi in the square in front of the station.


The second day of our workshop will be hosted by the Royal Flemish Academy in Brussels. We have arranged coach transportation from Leuven that morning; we will meet at 8:15 on the Martelarenplein, the large plaza outside the train station (directly across the street for those of you staying at the hotel La Royale).


Your best bet is to use a hotel booking website. There are several hotels in Leuven across the spectrum of price ranges; please do contact the organizers if you need specific advice.



2 April 2012 – Faculty Club, Leuven

9.00 – 9.30: Coffee and registration

9.30 – 10.00: Welcome (Frederik Truyen) + Introduction by the organizers (C. Macé / T. Andrews)

10.00 – 11.30: Digitisation and Palaeography | Chair: Juan Garcès
Rabin, Ira – Ink identification to accompany digitization of the manuscripts
Deckers, Daniel – Special imaging techniques for reading palimpsests and damaged manuscripts
Castro Correa, Ainoa – The application of digital tools in the study of Visigothic script in


12.00 – 13.00: Cataloguing and Codicology | Chair: Torsten Schaßan
Andrist, Patrick – Electronic catalogues of ancient manuscripts: between the wishes of the libraries and the needs of manuscript science
Birnbaum, David – Quantitative Codicology: An analysis of the formal descriptions of medieval Slavic miscellany manuscripts


14.30 – 15.30: Tradition and Genealogy I | Chair: Tara Andrews
Cafiero, Florian / Camps, Jean-Baptiste – Genealogy of Medieval Translations? The Case of the Chanson d’Otinel
Roelli, Philipp – Petrus Alfonsi or on the mutual benefit of traditional and computerised


16.00 – 17.30: Style and Statistics | Chair: Aurélien Berra
Hoenen, Armin – Letter similarity and ancient manuscripts – the meaning of vowel/consonant awareness
van Dalen-Oskam, Karina – Authors, Scribes, and Scholars. Untangling the knot with computational help
Kestemont, Mike / Schepers, Kees – Stylometric Explorations of the Implied Dual Authorship in the Epistolae duorum amantium

18.00 – 18.45: Discussion session: Should textual scholarship be fully digital? | discussion prepared and led by Joris van Zundert

3 April 2012 – KVAB, Brussels

9.30 – 10.00: Welcome (Dirk Van Hulle) + Introduction by the organizers (C. Macé / T. Andrews)

10.00 – 11.30: Primary Sources | Chair: Thomas Crombez
Luján, Eugenio / Orduña, Eduardo – Implementing a database for the analysis of ancient inscriptions: the Hesperia electronic corpus of Palaeohispanic inscriptions
Togni, Nadia – BIBLION: A data processing system for the analysis of medieval manuscripts
Stella, Francesco – Digital models for critical editions of medieval texts and the Corpus Rhythmorum Musicum


12.00 – 13.00: Tradition and Genealogy II | Chair: Joris van Zundert
Cantera, Alberto – The problems of the transmission of the Avesta and the tools for Avestan text criticism
Heikkilä, Tuomas / Roos, Teemu – Tracing the medieval readers of Vita et miracula s. Symeonis Treverensis


14.00 – 15.00: Inter-textual Analysis I | Chair: Caroline Macé
Tupman, Charlotte – Sharing Ancient Wisdoms: developing structures for tracking cultural
Rubenson, Samuel – A database of the Apophthegmata Patrum


15.30-16.30 Inter-textual Analysis II | Chair: Tuomas Heikkilä
Spinazzè, Linda – Intertextual research with digital variants in Musisque Deoque: a case
Romanov, Maxim – Public preaching in the Muslim World (900-1350 AD)

16.30 – 17.15: Discussion session: How many types of “textual scholarship” (new, old, historical, artefactual, genetic…)? | discussion prepared and led by Caroline Macé

Closing of the workshop by the organizers (C. Macé / T. Andrews)

Workshop: Methods and means for digital analysis of ancient and medieval texts and manuscripts

Leuven, 2-3 April 2012 

This workshop aims to map the various ways in which digital tools can help and, indeed, change our scholarly work on “pre-modern” texts, more precisely our means of analyzing the interrelationships between manuscripts and texts produced in the pre-modern era. This includes the history of textual traditions in a very broad sense, encompassing several fields of research such as book history, stemmatology, research on textual sources, and the tracing of borrowings and influences between texts.

We welcome research in any field of textual scholarship carried out on any ancient or medieval textual tradition in any language (Latin, Greek, “vernacular” / “oriental” languages…), using computer-aided methods of analysis.

Possible topics are: stemmatological analysis of manuscript traditions, digital palaeography / codicology, analysis of relationships between texts, textual history, textual criticism…

This workshop is seen as complementary to the Interedition ‘bootcamp’ to be held in Leuven in January 2012.

To participate in the workshop, please submit a short abstract (300-500 words, preferably in English) to Tara Andrews by 15 December 2011. As we seek to encourage the participation of early-stage researchers (PhD students or post-doctoral researchers), a limited number of bursaries are available to cover travel expenses. If you wish to apply for one of these, please submit an additional statement motivating your application (the main criteria are the importance of this workshop for your current research and the absence of other possible funding). Abstracts and applications for bursaries will be evaluated by the scientific committee. The result of this evaluation will be made known by 1 February.

The language of the workshop is primarily English, but we may consider other languages.

Please note that we intend to publish the papers presented at this workshop as a book. If your abstract is accepted, you will also receive guidelines for the publication.

The Tree of Texts project is a CREA (“creative research”) project (3H100334), funded by the KU Leuven from 1/10/2010 to 30/9/2012, with Caroline Macé as promoter and Tara Andrews as main researcher. The project is focused on the field of text stemmatology, and the aim is to arrive at an empirical model for variation in medieval text traditions.

The goal of Interedition is to promote the interoperability of the tools and methodology used in the field of digital scholarly editing and research. Equally, Interedition seeks to raise awareness of the importance of sustainability of the digital artifacts and instruments we create.

Tara Andrews (K.U.Leuven), Aurélien Berra (Université Paris-Ouest), Thomas Crombez (Universiteit Antwerpen), Juan Garcès (Göttingen Centre for Digital Humanities), Tuomas Heikkilä (University of Helsinki), Caroline Macé (K.U.Leuven), Torsten Schaßan (Herzog August Bibliothek Wolfenbüttel), Frederik Truyen (K.U.Leuven), Dirk Van Hulle (Universiteit Antwerpen), Joris van Zundert (Huygens Institute).

Upcoming events: bootcamp and workshop on text analysis

The Tree of Texts project will be hosting two events in 2012 focused on research, methods, and tools for the analysis of classical and medieval texts.  The first event, a development bootcamp, will run from 11-14 January 2012.  The second, a workshop targeted at scholars and researchers, will take place in early April 2012.  Both of these will be funded in whole or in part by the COST action Interedition. Stay tuned for calls for participation, and information on bursaries, for both events.