Stemmaweb - a collection of tools for stemmatic analysis of texts

Stemmaweb is a set of tools that has grown out of the Tree of Texts, a CREA ("creative research") project funded by the KU Leuven. The tools were developed variously within the project, on behalf of the project by Shadowcat Systems, and in collaboration with the Interedition project. The source code for all tools and associated libraries is available on GitHub.

All tools are free for scholarly and nonprofit use and adaptation. Although some data may be viewed publicly without a user account, use of the tools with your own data is possible only by registering as a user. You may log in with a Google account or another OpenID account, or you may register with a local username and password for use on the site. The Tree of Texts project and KU Leuven retain rights to uploaded text traditions only insofar as it is necessary to store and back them up, display them according to the stated preferences, and analyze them with the tools provided and linked.

Tools available

At present the Stemmaweb tools comprise the following:

Uploader ("Add a new text tradition or section")

Any logged-in user may upload a text collation in one of several forms:

Spreadsheet collation (Excel file, CSV file, or tab-separated values). Witness sigla should appear in the first row, one per column; the text of each witness should occur in sequence in the appropriate column, with collated words/readings lined up according to row. CSV and tab-separated value files are assumed to be Unicode, in the UTF-8 encoding.
TEI XML, parallel segmentation format. Please see the documentation here for the expected format of the TEI file.
TEI XML, as exported from the Classical Text Editor tool. Please see the documentation here for some guidelines on how to prepare your CTE file for upload.
GraphML, as exported from the CollateX tool.
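
As an illustration of the spreadsheet layout described above, the following minimal Python sketch builds a small CSV collation with witness sigla in the first row and aligned readings in the rows below. The sigla, the readings, and the function name collation_to_csv are invented for the example; leaving a cell empty where a witness has no reading is an assumption and not something this page specifies.

```python
import csv
import io

# Hypothetical witnesses and readings, invented for illustration only.
# Each list holds one reading per aligned row; "" marks a row where
# the witness has no text (an assumption about how gaps are encoded).
witnesses = {
    "A": ["the", "quick", "brown", "fox"],
    "B": ["the", "quick", "", "fox"],
    "C": ["a", "quick", "brown", "fox"],
}


def collation_to_csv(witnesses):
    """Return CSV text with sigla in the first row, one witness per column."""
    sigla = list(witnesses)
    lengths = {len(readings) for readings in witnesses.values()}
    if len(lengths) != 1:
        raise ValueError("all witnesses must have the same number of aligned rows")
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(sigla)
    # zip(*columns) turns per-witness columns into per-row tuples.
    for row in zip(*(witnesses[siglum] for siglum in sigla)):
        writer.writerow(row)
    return buffer.getvalue()


csv_text = collation_to_csv(witnesses)
# Encode as UTF-8, the encoding the uploader assumes for CSV files.
csv_bytes = csv_text.encode("utf-8")
```

Writing csv_bytes to a file gives something in the shape the uploader describes; a tab-separated file could be produced the same way by passing delimiter="\t" to csv.writer.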

Once a tradition is successfully uploaded, you may change its name and its language, and choose whether others may view (but not edit) it.

Stemma editor ("Add a new stemma / Edit this stemma")

For any text tradition you own, you may associate one or more stemma hypotheses with the tradition. Currently a stemma must be specified in "dot" format (the Graphviz graph description language), as documented within the interface for the "Add/edit stemma" buttons.
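
To give a rough idea of what a stemma in "dot" format looks like, here is a minimal Python sketch that turns a list of parent/child pairs into a plain Graphviz digraph. The witness names, the stemma name, and the function stemma_to_dot are hypothetical. Any additional node attributes that Stemmaweb may expect are not shown here; the documentation in the "Add/edit stemma" interface remains the authority on those.

```python
def stemma_to_dot(name, edges):
    """Render (parent, child) pairs as a Graphviz digraph in dot syntax."""
    lines = [f'digraph "{name}" {{']
    for parent, child in edges:
        # One directed edge per copying relationship: parent -> child.
        lines.append(f'    "{parent}" -> "{child}";')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical stemma: a lost archetype "alpha" with a lost
# intermediate copy "beta" and three surviving witnesses A, B, C.
edges = [("alpha", "A"), ("alpha", "beta"), ("beta", "B"), ("beta", "C")]
dot_text = stemma_to_dot("Example stemma", edges)
print(dot_text)
```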

Stemma generation via Stemweb ("Run a Stemweb algorithm")

Given an uploaded collation, it is possible to generate a stemma hypothesis using one of several algorithms. This feature, Stemweb, is a service originally provided by researchers at the HIIT (Helsinki Institute for Information Technology) and now maintained by the DH group at the University of Vienna. The algorithms currently available include:

Phylip Pars: tree construction based on overall maximum parsimony
RHM: compression-based analysis
Neighbor Joining: tree construction based on sequential joining of nearest neighbours
NeighborNet: based on neighbor-joining, recommended for highly 'contaminated' traditions

The RHM algorithm requires a single parameter, "iterations", to specify the number of times the analysis should be run to arrive at a consensus model. A higher number produces a more certain result, but requires a longer time to run. A suggested minimum is 1000000 iterations.

All of these algorithms run "asynchronously" - when the owner of a tradition makes a request for a generated tree, the result can take some minutes (or even hours, depending on the complexity of the tradition) to return. When a request has been made, the owner can check its progress using the same button; if the stemma has meanwhile been calculated, it will be loaded. If the owner leaves the Stemmaweb site and the stemma is calculated meanwhile, it will appear when Stemmaweb is next used.
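
The request-then-check pattern described above can be pictured with a generic polling loop. This is only a sketch of the idea: on the site itself, checking is done with the button, and the check callable below is a hypothetical placeholder rather than a real Stemmaweb or Stemweb call. The interval and timeout values are likewise arbitrary.

```python
import time


def wait_for_result(check, interval=30.0, timeout=3600.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call check() repeatedly until it returns something other than None.

    check    -- hypothetical callable: returns the finished result,
                or None while the job is still running
    interval -- seconds to wait between checks
    timeout  -- give up after this many seconds
    """
    deadline = clock() + timeout
    while True:
        result = check()
        if result is not None:
            return result
        if clock() >= deadline:
            raise TimeoutError("no result before the timeout expired")
        sleep(interval)


# Usage with a stand-in job that finishes on the third check.
answers = iter([None, None, "finished stemma"])
outcome = wait_for_result(lambda: next(answers), interval=0.0)
```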

Generated stemmas are not oriented - they have no inferred root or origin. This means that, in order to examine the textual variants against a generated stemma (see "Stexaminer" below), a root must first be chosen. This can be done simply by clicking a node in the returned graph, and choosing the option to use that node as the root. Any stemma may be re-rooted any number of times.
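
What choosing a root does to an unoriented tree can be sketched in a few lines of Python: starting from the chosen node, every edge is directed away from it. This is an illustration of the general idea under the assumption that the stemma is a connected tree; it is not Stemmaweb's own code, and the node names and the function orient_from_root are invented for the example.

```python
from collections import deque


def orient_from_root(edges, root):
    """Direct the edges of an unrooted tree away from the chosen root.

    edges -- iterable of (node, node) pairs with no inherent direction
    root  -- the node to treat as the origin
    Returns a list of (parent, child) pairs.
    """
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    if root not in neighbours:
        raise KeyError(f"{root!r} is not a node in the tree")

    directed = []
    seen = {root}
    queue = deque([root])
    # Breadth-first walk: each newly reached node becomes a child
    # of the node it was reached from.
    while queue:
        node = queue.popleft()
        for other in sorted(neighbours[node]):
            if other not in seen:
                seen.add(other)
                directed.append((node, other))
                queue.append(other)
    return directed


# Hypothetical unrooted tree: witnesses A, B, C and inner nodes x, y.
unrooted = [("A", "x"), ("B", "x"), ("x", "y"), ("C", "y")]
rooted_at_y = orient_from_root(unrooted, "y")
rooted_at_a = orient_from_root(unrooted, "A")
```

Because the function always starts again from the undirected edges, the same tree can be re-rooted at any node as often as needed, which mirrors the behaviour described above.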

Stexaminer ("Examine variants against this stemma")

This tool allows visualization of the variants within a text tradition, according to the selected stemma hypothesis. The stemma graph and the variant witness groupings are sent to a calculation service, provided by the Declarative Language and Artificial Intelligence research group of the KU Leuven, that attempts to determine for each variant location within the text:

Whether that location fits the stemma in a genealogical way
If not, the minimum number of coincidental occurrences of a given reading
Whether any coincidental occurrence might be a reversion to an ancestor reading

For more information on this tool and the analysis behind it, please see the following paper.

Relationship mapper ("View collation and relationships")

The relationship mapper tool allows you to define the relationships between variant readings within your text. This is useful for, among other things, later stemma analysis - it allows classification of the sorts of variants that may or may not yield clues as to the history of the text. Please see the "Help/about" link at the top of the relationship mapper for more information about its use.

License

All source code for the Stemmaweb tools and user interface is open-source. The Perl libraries are governed by the Perl license; the remaining software is governed by the GNU General Public License.

Rights to all textual data uploaded to the Stemmaweb system are retained by its original owner. By uploading the data you assert that you have the right to use it, and you grant us rights to it insofar as it is necessary for us to store, back up, and display the data.