Accepting and Modeling Uncertainty

Michael Piotrowski

DOI: 10.17175/sb004_006a

Record in the OPAC of the Herzog August Bibliothek: 1037072987

First published: 31.07.2019

License: Unless otherwise stated, Creative Commons license agreement

Media licenses: media rights are held by the authors

Last verification of all references: 30.07.2019

GND subject headings: Computer-assisted methods | Digital Humanities | Modeling | Uncertainty

Recommended citation: Michael Piotrowski: Accepting and Modeling Uncertainty. In: Die Modellierung des Zweifels – Schlüsselideen und -konzepte zur graphbasierten Modellierung von Unsicherheiten. Ed. by Andreas Kuczera / Thorsten Wübbena / Thomas Kollatz. Wolfenbüttel 2019 (= Zeitschrift für digitale Geisteswissenschaften / Sonderbände, 4). text/html format. DOI: 10.17175/sb004_006a


Abstract



This article aims to outline the challenge of uncertainty for the construction of computational models in the humanities. Since we can neither ignore nor eliminate uncertainty, we need to model it, and so we need computational models of uncertainty. Such models already exist and are being used in practical applications. There is ongoing fundamental research on uncertainty and its representation in mathematics, philosophy, and computer science. Some of these approaches may be suitable for modeling uncertainty in the humanities—but we are still lacking the »bridge« that could relate the uncertainty encountered in the humanities to such formal modeling frameworks. We argue that DH needs to go beyond project-specific models of uncertainty and consider uncertainty more generally; in particular, we must closely examine various types of uncertainty in the humanities and seek to develop more general frameworks for handling it.



1. Introduction

As the saying goes, »nothing can be said to be certain, except death and taxes.« Uncertainty is an unavoidable aspect of life, and thus we have an intuitive understanding of it, but coming up with a strict definition is hard. Uncertainty is generally considered to be related to a lack of information (or ignorance) or to imperfect information. Ignorance is used here in a non-pejorative sense; Smithson remarks that ignorance is usually treated »as either the absence or the distortion of ›true‹ knowledge, and uncertainty as some form of incompleteness in information or knowledge.«[1] He notes that to »some extent these commonsense conceptions are reasonable, but they may have deflected attention away from ignorance by defining it indirectly as nonknowledge.«[2]

Predictions, such as about the weather, are generally uncertain, as we only have limited information about the future. But uncertainty does not only concern the future; the following statements could all be said to express uncertainty:

  1. I know Bob is married, but I don’t know the name of his spouse.

  2. I know Bob is married, but I don’t remember whether his wife’s name was Alice or Alicia (or was it Alison?).

  3. Jack told me that Bob’s wife is called Alice, but John said it was Alicia.

  4. Bob is about 30.

  5. Bob is 30 or 31.

  6. Bob is between 30 and 35 years old.

  7. Bob is a player.

Upon closer inspection, the uncertainty in these statements not only relates to different pieces of information, but also takes different forms. For example, in statement 1, the name of Bob’s wife is completely unknown, whereas in statement 2 it is one of a set of names; statement 4 could be called vague, whereas statement 5 gives a set and statement 6 a range of possible ages. In addition, not all options may be equally likely; in statement 2, for example, the speaker may consider Alice or Alicia more likely than Alison. In statement 3, uncertainty stems from conflicting information; whether the speaker considers one of the two options more likely may also depend on whether Jack or John is considered (or believed to be) more trustworthy. Finally, in statement 7, uncertainty about the meaning of the statement is caused by the lack of context and by the semantic ambiguity of the word »player.«

These examples are not intended to be exhaustive but rather to illustrate that uncertainty can have different causes, take different forms, and is related to other phenomena such as imprecision, vagueness, and ambiguity; it may also involve issues of belief and trust. These different types of uncertainty may thus have different consequences and may need to be addressed in different ways. Some cases of uncertainty may be resolved, or the uncertainty may at least be reduced; for example, we may be able to ask Bob about the name of his wife, or new information may allow us to narrow the range of ages in statement 6. Predictions about the future, on the other hand, will remain uncertain until the prediction can be compared to the actual outcome. Yet other cases of uncertainty are effectively unresolvable because the required information is inaccessible (such as other people’s thoughts, lost documents, or perfectly precise measurements) or nonexistent (e.g., when considering counterfactual questions such as »If John Wooden were alive and coaching in the NCAA today, would he be as successful?«), or because the criteria for deciding an issue are unknown or subjective. In fact, one may argue that in the general case, uncertainty can never fully be resolved, as we will never have perfect knowledge when we are dealing with the real world; Parsons notes that »any intelligent system, human or otherwise, is constrained to have finite knowledge by virtue of its finite storage capacity, so it will always be possible to find some fact that is unknown by a particular system.«[3]

Fig. 1: Smithson’s taxonomy of ignorance. [Piotrowski 2019, redrawn after Smithson 1989, p. 9.]
Fig. 2: Smets’s taxonomy of imperfection. [Piotrowski 2019, drawn after Smets 1997.]

Various taxonomies have been proposed that aim to systematize uncertainty and related concepts, for example by Smithson (see Figure 1), who considers it a sub-type of a broader concept of ignorance,[4] or by Smets, who uses imperfection as the general concept (see Figure 2).[5] Parsons discusses further taxonomies that have been proposed,[6] eventually coming to the conclusion that »while it is far from clear that the taxonomies that they provide are of fundamental importance, they do help to outline what uncertainty is.«[7]

As uncertainty is so pervasive, it obviously also affects research and scholarship; Pollack writes: »The uncertainty arises in many ways, and the nature of the uncertainty may change through time, but the scientific endeavor is never free of uncertainty.«[8] Humans are generally quite good at dealing with uncertainty in everyday life, for example with respect to the weather or to public transit. However, these estimations of uncertainty heavily depend on individual knowledge and previous experience, and are generally hard to communicate. For example, what should an out-of-town visitor make of a statement like »usually the train’s on time, but sometimes it’s delayed quite a bit in the morning, but I think you’re gonna be fine«?

As scholarly research strives for intersubjectivity, it requires transparency with respect to uncertainty; appealing to »common sense,« experience, or intuition is clearly insufficient. Mathematics and the natural sciences have developed intricate formal methods for dealing with (particular types of) uncertainty, which is probably one reason why »people who are not scientists often equate science with certainty, rather than uncertainty.«[9]

In the humanities, uncertainty is usually described verbally; one domain that describes (again, particular types of) uncertainty in a relatively systematic fashion is that of critical scholarly editions: critical apparatuses record illegible passages, uncertain readings, uncertain identifications of persons and places, and other cases of uncertainty pertaining to the edited sources. In addition, research in the humanities does not only need to deal with uncertain, vague, incomplete, or missing information, but also with an irreducible variety of positions (points of view, values, criteria, perspectives, approaches, readings, etc.), resulting in what Paul Ricœur calls »le conflit des interprétations.«[10]

Now, what about digital humanities? If digital humanities is the intersection (or at the intersection?) of computer science and the humanities, as is often said, what does this mean for the handling of uncertainty? For example, Burdick et al. argue that computing »relies on principles that are […] at odds with humanistic methods,«[11] and assert that »ambiguity and implicit assumptions are crucial to the humanities.«[12] »What is at stake,« they conclude, »is the humanities’ unique commitment to wrestle with uncertainty, ambiguity, and complexity«.[13] We believe that one cannot really answer this question without defining what one means by »digital humanities«. In the next section we will therefore first present our definition of digital humanities. As we will see, the concept of models is central to our definition; we will thus, in the subsequent section, outline our notion of models. In the following section, we will give a brief overview of the modeling of uncertainty in computer science and in digital humanities. In the final section we will conclude our discussion by outlining what we believe to be the specific challenges for digital humanities and what the next steps should be to advance the state of the art.

2. Defining digital humanities

2.1 Challenge

Kirschenbaum has argued that the »network effects« of blogs and Twitter have turned the term digital humanities into a »free-floating signifier«.[14] Perhaps it is not unique to digital humanities, but it is still a rather strange situation that a field tries to constitute itself around a marketing term rather than the other way round. The multifariousness of its understandings is succinctly summarized by Ramsay in the volume Defining Digital Humanities:[15]

»[...] the term can mean anything from media studies to electronic art, from data mining to edutech, from scholarly editing to anarchic blogging, while inviting code junkies, digital artists, standards wonks, transhumanists, game theorists, free culture advocates, archivists, librarians, and edupunks under its capacious canvas.«[16]

Some claim that a definition is no longer needed:

»I don’t think many people in DH care about definitions too much now. Thankfully the debates have moved on.«[17]

Others even go as far as to argue that it is impossible to know what digital humanities is:

»In closing, I will be as plain as I can be: we will never know what digital humanities ›is‹ because we don’t want to know nor is it useful for us to know.«[18]

We should pause briefly at this point and remind ourselves that the question is, in fact, neither what digital humanities is ontologically, nor how to exhaustively describe »the disparate activities carried on under its banner.«[19] The question is rather how we want to define it—what Carnap[20] called an explication. We also contend that it is not only »useful« to explicate, but crucial: the development of a research program, as well as the creation of academic positions, departments, and programs require a consensus around an explicit definition. How would one otherwise ensure the relevance and quality of research, the comparability of degree programs (and thus student mobility), or the adequate evaluation of research programs and thus their financing? And how would one want to cooperate with researchers from other fields?

2.2 Approach

We think the problem of defining digital humanities is unnecessarily exacerbated by confounding a number of related, but actually distinct issues. In short, we posit that any coherent field of research (regardless of whether one wants to consider it a discipline or not) is ultimately defined by a unique combination of (1) a research object and (2) a research objective.[21] Research methods constitute a third aspect, but only play a secondary role: research methods depend on the research object and the research objective, as well as on technical and scientific progress, which requires them to adapt and, at the same time, permits them to evolve. The research object and the research objective, however, remain relatively stable over time. We would also like to point out that disciplines have never used a single method: they always use a variety of methods. For example, while qualitative methods may certainly be considered »typical« for many humanities disciplines, quantitative methods have always been used as well.[22] This means that it is not useful to attempt to define digital humanities (or any other field or discipline for that matter) by way of the methods it (currently) happens to use, such as: »Digital Humanities is born of the encounter between traditional humanities and computational methods.«[23]

Despite the hype currently surrounding digital humanities, it is neither the humanities’ first nor only encounter with computer science. One notable example is computational linguistics. Linguistics is the study of human language; like any other field of research, it studies its research object by creating models of it. Computational linguistics has the same research object and the same research objective as »traditional« linguistics—the essential difference is that it creates computational models of language. Computational models have the important advantages that they are 1) formal and 2) executable, and can thus—among other things—be automatically tested against large amounts of actual linguistic utterances.

The construction of computational models of human language poses, however, a number of specific challenges that differ substantially from other modeling tasks in computer science, including related ones, such as the study of formal languages. Computational linguistics thus actually consists of two fields: applied computational linguistics, which creates formal models of particular languages, and theoretical computational linguistics, which serves as a kind of »metascience« for the former, studying the means and methods of constructing computational models in linguistics in general and providing the »building materials.« One could thus argue that applied computational linguistics is essentially linguistics, whereas theoretical computational linguistics is essentially computer science: it does not study human language, but rather computational issues in modeling human language.[24]

If we apply these considerations to digital humanities, we can define digital humanities in the following precise fashion:

  1. theoretical digital humanities: research on and development of means and methods for constructing formal models in the humanities, and

  2. applied digital humanities: the application of these means and methods for the construction of concrete formal models in the humanities.

We thus understand the construction of formal models as the core of digital humanities, a view we notably share with authors such as McCarty, Meunier, and Thaller.[25] This is not surprising, as historically speaking, »computers came into existence for the sake of modeling.«[26] Consequently, models are the foundation for any serious computational work in digital humanities. Consider just a few topics: the production of digital critical editions, visual clustering of artworks, historical network analysis, virtual archaeological reconstruction, or authorship attribution. All of these are only secondarily a question of computing power. Primarily, they are a question of modeling texts, artworks, buildings, relationships, authors, etc., and the findings about them in a way that can be meaningfully processed by computers. The models can take many forms, but some kind of formal model is the precondition for any type of computational processing; we will say a bit more about models in the following section.

3. Models

The construction of models is an everyday task; we construct models all the time. In his influential 1971 paper »Counterintuitive behavior of social systems,«[27] Forrester points out:

»Each of us uses models constantly. Every person in private life and in business instinctively uses models for decision making. The mental images in one’s head about one’s surroundings are models. One’s head does not contain real families, businesses, cities, governments, or countries. One uses selected concepts and relationships to represent real systems. A mental image is a model. All decisions are taken on the basis of models. All laws are passed on the basis of models. All executive actions are taken on the basis of models.«

Fig. 3: Original – model mapping. [Piotrowski 2019, redrawn after Stachowiak 1973, p. 157.]

The construction of models in the humanities is thus not per se new: all research, whatever the domain, is based on models. As in the case of uncertainty, we have an intuitive understanding of the term model, but it is surprisingly hard to come up with a good definition. We use the term in the sense of Stachowiak’s Allgemeine Modelltheorie.[28] The basic assumption is that arbitrary objects can be described as individuals characterized by a finite number of attributes.[29] Attributes can be characteristics and properties of individuals, relations between individuals, properties of properties, properties of relations, etc.[30] Modeling is then a mapping of attributes from the original (which can itself be a model) to the model, as illustrated in Figure 3. According to Stachowiak, models are characterized by three fundamental properties:[31]

Mapping property (Abbildungsmerkmal) Models are always models of something, namely mappings or representations of natural or artificial originals, which can themselves be models.

Reduction property (Verkürzungsmerkmal) Models generally do not capture all attributes of the original they represent, but only those that the model creators and / or model users deem relevant.

Pragmatic property (pragmatisches Merkmal) Models are not per se uniquely assigned to their originals. They fulfill their replacement function

  1. for particular subjects that use the model,

  2. within particular time intervals, and

  3. restricted to particular mental or actual operations.

With respect to the mapping of attributes, three interesting cases shall be briefly mentioned: preterition, abundance, and contrasting. Preterite attributes are attributes that are not mapped from the original to the model; abundant attributes are attributes that do not exist in the original. Contrasting refers to the exaggeration of certain attributes in the model, typically to highlight certain aspects of the original.
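
To make the attribute-mapping view concrete, the following minimal sketch (our own illustration in Python, with invented attributes) represents an original and a model as attribute sets and derives the preterite and abundant attributes:

    # Original (here: a manuscript) described by a finite set of attributes.
    original = {"material": "parchment", "height_cm": 32,
                "ink": "iron gall", "provenance": "unknown"}

    # Reduction property: only the attributes deemed relevant are mapped.
    relevant = {"material", "height_cm"}
    model = {k: v for k, v in original.items() if k in relevant}

    # Abundant attribute: present in the model, without counterpart in the original.
    model["catalog_id"] = "MS-0042"

    preterite = set(original) - set(model)  # attributes that were not mapped
    abundant = set(model) - set(original)   # attributes added by the model
    print(preterite)  # {'ink', 'provenance'}
    print(abundant)   # {'catalog_id'}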

Now, if all disciplines create models, the choice is not »whether to build models; it’s whether to build explicit ones.«[32] In contrast to much of the natural and engineering sciences, which tend to use—explicit and formal—mathematical models, models in the humanities are traditionally often only partially explicit and tend to be expressed informally using natural language.[33] The word formal means nothing more than »logically coherent + unambiguous + explicit.«[34] While there are different degrees of formalization, it should be clear that in the context of digital humanities we are ultimately interested in a degree of formalization that allows models to be processed and manipulated by computers, i.e., computational models. Traditional—informal—models in the humanities do not lend themselves to computational implementation as directly as mathematical models. Furthermore, research questions in the humanities are primarily qualitative rather than quantitative, which, too, has held back the full adoption of the computer as a modeling tool rather than just as a writing tool and a »knowledge jukebox.«[35]

4. Modeling uncertainty

If we accept that »being uncertain is the natural state of things,«[36] it follows that we need to consider uncertainty when constructing models. Furthermore, if we define digital humanities as the construction of formal models in the humanities, we thus also need to reflect upon how to formally (i.e., in a logically coherent, unambiguous, and explicit fashion) represent uncertainty as it occurs in the humanities. The formal representation of uncertainty in digital humanities should have two main objectives:

  1. to make uncertainty explicit, and

  2. to allow reasoning under and about uncertainty.

The following brief example may serve to illustrate these two objectives. Consider a database of historical persons—a common computational model. In some cases, there will be uncertainty with respect to the dates of birth and death of these persons. The uncertainty may take different forms; for example, only one of the dates may be known with some certainty. But it may also happen that both dates are unknown and the existence of a particular person is only inferred from some evidence indicating that they were alive at some point, for example by a document such as a contract.

Suppose the (database) model represented birth and death by a single date each (e.g., of the SQL DATE type): dates would all be represented in the same way, whether certain or uncertain, exact or approximate. Even if the procedure for mapping uncertain dates to a date in the model were documented somewhere, the uncertainty would not be represented formally and would thus be inaccessible to the computer. The computer may respond to a query with an exact date such as 1291-08-01, but there would be no information about the certainty (or the precision) of this date. The computer would also be able to carry out operations such as date arithmetic without any problems—but one date may in fact represent a range and the other just a guess. Taken at face value, this may lead to an »illusion of factuality,« which is obviously problematic, in particular if this information were to be used as a basis for further work, as the uncertainty would propagate.
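
To make the problem concrete, here is a minimal sketch (our own illustration, in Python with SQLite, using invented persons and dates) of such a naive model. The schema gives the computer no way to distinguish a documented date from a guess, and date arithmetic proceeds regardless:

    import sqlite3

    # Naive model: birth and death as single dates, certain or not.
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE person (name TEXT, birth DATE, death DATE)")
    # One person's dates are documented, the other's merely guessed --
    # the schema cannot record the difference.
    con.execute("INSERT INTO person VALUES ('Arnold', '1291-08-01', '1354-03-12')")
    con.execute("INSERT INTO person VALUES ('Bertha', '1293-01-01', '1360-01-01')")

    # Date arithmetic works without complaint, whatever the status of the dates.
    for name, days in con.execute(
            "SELECT name, julianday(death) - julianday(birth) FROM person"):
        print(name, round(days / 365.25, 1), "years")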

If, however, the database used unstructured text fields to represent the dates, users could enter something like »between 1291 and 1295«, »late 13th century«, »probably 1291«, etc., or even describe how an approximate date was inferred, and thus preserve the uncertainty. The obvious downside is that such informal representations cannot be processed by the computer: neither could we perform queries or arithmetic on the dates (reasoning under uncertainty), as searching for »1291« will not find »late 13th century« and vice versa, nor could we perform operations on the uncertainty itself, such as a query for all dates of birth that are known to be exact to at least one year (reasoning about uncertainty).
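
One possible middle ground (a sketch of ours, not a standard schema) is to represent each date as an interval with earliest and latest bounds plus a free-text note. The note preserves the informal description, while the bounds make both reasoning under uncertainty and reasoning about uncertainty available to the machine:

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE birth (name TEXT, earliest TEXT, latest TEXT, note TEXT)")
    con.executemany("INSERT INTO birth VALUES (?, ?, ?, ?)", [
        ("Arnold", "1291-01-01", "1295-12-31", "between 1291 and 1295"),
        ("Bertha", "1275-01-01", "1300-12-31", "late 13th century"),
        ("Conrad", "1291-01-01", "1291-12-31", "probably 1291"),
    ])

    # Reasoning under uncertainty: who may have been born in 1291?
    print(con.execute("""SELECT name FROM birth
                         WHERE earliest <= '1291-12-31'
                           AND latest >= '1291-01-01'""").fetchall())

    # Reasoning about uncertainty: whose birth year is known to within a year?
    print(con.execute("""SELECT name FROM birth
                         WHERE substr(earliest, 1, 4) = substr(latest, 1, 4)""").fetchall())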

4.1 Uncertainty in computer science

None of this is new. The problem of managing uncertain data has received much attention in computer science, motivated by numerous real-world applications that need to deal with uncertain data. For example, in information retrieval, the actual relevance of a document depends on the information needs of the user, which are only vaguely expressed in the query. Uncertainty also occurs with measurement data of all sorts, environmental sensors, or RFID systems due to technical limitations, noise, or transmission errors. It also occurs with biomedical data, e.g., protein–protein interactions, or when working with anonymized data. Data is obviously uncertain when it has been constructed using statistical forecasting, or when it is based on spatiotemporal extrapolation, e.g., in mobile applications.[37] The two most widespread approaches for uncertain databases are fuzzy databases and probabilistic databases;[38] uncertain graphs are increasingly used to represent noisy (linked) data in a variety of emerging application scenarios and have recently become a topic of interest in the database and data mining communities.[39]
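
To give a flavor of the probabilistic approach, the following sketch (our own illustration of the tuple-independent model described by Suciu et al., with invented tuples and probabilities) shows how a query answer itself receives a probability:

    from math import prod

    # Each tuple carries the marginal probability that it belongs to the
    # true state of the world; tuples are assumed independent.
    residence = [
        ("Bob", "Paris", 0.7),
        ("Bob", "Lyon", 0.2),
    ]

    def prob_any(tuples, pred):
        """P(at least one tuple satisfying pred exists), assuming independence."""
        return 1 - prod(1 - p for *t, p in tuples if pred(t))

    # Probability that Bob appears in the relation at all: 1 - 0.3 * 0.8 = 0.76
    print(prob_any(residence, lambda t: t[0] == "Bob"))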

Classic (i.e., symbolic) artificial intelligence (AI) is a second area in computer science in which extensive research on the representation of uncertainty and on reasoning under and about uncertainty has been carried out. This research was primarily motivated by the need to model uncertainty in knowledge representation (KR), which was in turn driven by the development of expert systems, starting in the early 1970s;[40] the objective of these systems was »to capture the knowledge of an expert in a particular problem domain, represent it in a modular, expandable structure, and transfer it to other users in the same problem domain.«[41] To accomplish this goal, research needed to address knowledge acquisition, knowledge representation, inference mechanisms, control strategies, user interfaces, common-sense reasoning—and dealing with uncertainty. A large number of formal methods have been proposed for managing uncertainty in AI systems.[42] Issues of uncertainty are obviously not limited to expert systems, but concern all artificially intelligent agents, from dialog systems to self-driving cars. »Modern« AI research now mostly focuses on deep learning;[43] however, symbolic representations recently gained renewed interest in the context of the Semantic Web,[44] where there is no central authority for resolving contradictions,[45] and issues relating to the trustworthiness of information.[46]
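
As a classic example from this line of research, the MYCIN expert system (Buchanan / Shortliffe 1984) attached certainty factors to rules; the sketch below (our paraphrase of the combination rule, with invented values) shows how support for a hypothesis from several rules is aggregated incrementally:

    def combine_cf(cf1: float, cf2: float) -> float:
        """Combine two certainty factors in [-1, 1] (MYCIN-style rule)."""
        if cf1 >= 0 and cf2 >= 0:
            return cf1 + cf2 * (1 - cf1)
        if cf1 < 0 and cf2 < 0:
            return cf1 + cf2 * (1 + cf1)
        return (cf1 + cf2) / (1 - min(abs(cf1), abs(cf2)))

    cf = combine_cf(0.6, 0.4)   # two supporting rules: 0.76
    cf = combine_cf(cf, -0.2)   # one weakly contradicting rule: 0.7
    print(cf)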

Computational approaches for dealing with uncertainty draw heavily on research in mathematics—in particular probability theory, statistics, and information theory—and logic, in particular epistemic logic; at the same time, many modern formal approaches in mathematics and logic are computational and directly motivated by requirements from KR.[47] Two specific examples are McBurney’s dialectical argumentation framework for qualitative representation of epistemic uncertainty in scientific domains and the fuzzy model for representing uncertain, subjective, and vague temporal knowledge in ontologies by Nagypál.[48] Shannon’s information theory[49] measures information as the reduction of uncertainty achieved by receiving a message, and uncertainty (i.e., a lack of information) by entropy. Even though very abstract, this theory has had a huge impact on the development of concrete information and communication systems. Information theory is still a very active field of research; of particular interest in this context is work on uncertain information,[50] where it is known that some piece of information is valid under certain assumptions, but it is unknown whether these assumptions actually hold. Also relevant is the extensive work on fuzzy sets and related concepts;[51] Zadeh later outlined a Generalized Theory of Uncertainty[52] aiming to integrate a number of different approaches. Another strand of research is based on the Dempster–Shafer theory (DST),[53] which also continues to produce a number of interesting approaches, such as the theory of hints.[54]
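
To illustrate the flavor of DST, the sketch below (our own; the mass values are invented) applies belief and plausibility to the names from statement 2 in the introduction. The interval between the two measures makes the remaining ignorance explicit: nothing specifically confirms »Alice«, yet the name remains highly plausible:

    frame = frozenset({"Alice", "Alicia", "Alison"})
    mass = {
        frozenset({"Alice", "Alicia"}): 0.7,  # evidence for one of these two
        frozenset({"Alison"}): 0.1,
        frame: 0.2,                           # unassigned mass: pure ignorance
    }

    def belief(a):        # total mass that necessarily supports a
        return sum(m for s, m in mass.items() if s <= a)

    def plausibility(a):  # total mass that could support a
        return sum(m for s, m in mass.items() if s & a)

    target = frozenset({"Alice"})
    print(belief(target), plausibility(target))  # 0.0 and 0.9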

In epistemic logic, there are several approaches that explicitly take uncertainty and source trust into account, e.g., subjective logic, a probabilistic logic for uncertain probabilities. Uncertainty is preserved throughout the analysis and is made explicit in the results so that it is possible to distinguish between certain and uncertain conclusions. One interesting recent development is justification logic.[55] Justification logics are epistemic logics that allow knowledge and belief modalities to be made explicit in the form of so-called justification terms. There are also a number of extensions to basic justification logic; particularly interesting in this context is Milnikel’s logic of uncertain justifications, a variant in which one can formally express statements like »I have degree r of confidence that t is evidence for the truth of X«,[56] and generalizations to multi-agent justification logic, where multiple agents share common knowledge.[57]
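
The following sketch (our illustration; the numbers are invented) shows the basic object of subjective logic, an opinion, which carries belief, disbelief, and explicit uncertainty together with a base rate, rather than a single probability:

    from dataclasses import dataclass

    @dataclass
    class Opinion:
        belief: float       # b
        disbelief: float    # d
        uncertainty: float  # u, with b + d + u = 1
        base_rate: float    # a, the prior in the absence of evidence

        def expected(self) -> float:
            """Projected probability: E = b + a * u."""
            return self.belief + self.base_rate * self.uncertainty

    # »The charter is authentic«: some supporting evidence, much left open.
    op = Opinion(belief=0.5, disbelief=0.1, uncertainty=0.4, base_rate=0.5)
    print(op.expected())  # 0.7, while u = 0.4 records how tentative this is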

4.2 Uncertainty in digital humanities

In the preceding section we have briefly (and eclectically) reviewed some of the research on the representation of uncertainty in computer science; the point of this overview was not to give readers a complete summary of the state of the art but rather to highlight that computer science is far from oblivious to the issue of uncertainty and that its modeling remains an area of active research in computer science as well as in mathematics and logic.

What is the state of the art in digital humanities? Searching the archives of the journal Digital Scholarship in the Humanities (DSH), the leading journal in the field (founded in 1986 under the name Literary and Linguistic Computing) for the term uncertainty yields (at the time of this writing) 87 articles.[58] We obviously do not claim this to be a thorough literature review, but we nevertheless consider this result to reflect, at least to some extent, the state of research concerning uncertainty in DH: on the one hand, there is definitely an awareness of the problem, also witnessed, for example, by this special issue and the preceding workshop at the Academy of Sciences and Literature in Mainz that prompted this paper. On the other hand, over a period of 32 years, the number of articles explicitly touching on the topic is not very high—and most only mention it in passing. Just to give an impression, here are some examples of digital humanities publications explicitly dealing with uncertainty (not limited to LLC/DSH):

  • In The Virtual Monastery: Re-Presenting Time, Human Movement, and Uncertainty at Saint-Jean-des-Vignes, Soissons, Bonde et al. discuss the representation of uncertainty in visual archaeological reconstructions.[59]

  • In Artefacts and Errors: Acknowledging Issues of Representation in the Digital Imaging of Ancient Texts, Terras studies the additional uncertainty introduced in paleography and epigraphy by digitization. She concludes that uncertainty is »an important issue to address when building computational systems to aid papyrologists. Unfortunately, encapsulating uncertainty in computational systems is not a straightforward process.«[60]

  • The paper Digitizing the act of papyrological interpretation: negotiating spurious exactitude and genuine uncertainty by Tarte discusses similar issues. The author stresses that an important aspect of digital papyrology, beyond the mere digitization of the artifact, is »to enable the digital capture of the thought process that builds interpretations of ancient and damaged texts«, i.e., to document the choices made by the editor in the face of uncertainty.[61]

  • Known Unknowns: Representing Uncertainty in Historical Time and On Close and Distant Reading in Digital Humanities: A Survey and Future Challenges are examples of papers discussing the visualization of temporal and spatiotemporal uncertainty in timelines and maps.[62]

  • Uncertain about Uncertainty: Different Ways of Processing Fuzziness in Digital Humanities Data focuses on practical aspects, such as how to »ensure that such information [on persons and places] is added in a coherent way, while allowing the data to be vague or apparently contradictory.«[63]

Our general impression is that most papers discuss uncertainty as it occurs in the context of a particular research project, and in most cases aim to also solve the issues in this context. This is not surprising, as currently most publications in digital humanities belong to the applied digital humanities. One notable exception is the paper by Tarte already mentioned above, as it also discusses the modeling of uncertainty in more general terms. The author notes that »[c]apturing uncertainty is vital to the recording process«,[64] and explicitly bases her approach on theoretical frameworks, namely argumentation theory and theory of justification to »provide a formal, yet invisible, epistemological framework that allows us to point out inconsistencies without forbidding them.«[65]

The state of the art in digital humanities is perhaps best illustrated by the Guidelines of the Text Encoding Initiative (TEI), the de facto standard for digital critical editions. There are some provisions for modeling uncertainty in Chapter 21, Certainty, Precision, and Responsibility, which defines »several methods of recording uncertainty about the text or its markup«.[66] For example, editors can use the elements and attributes provided to indicate that some aspects of the encoded text are problematic or uncertain, and to record who is responsible for making certain choices. The certainty element defined in this chapter may be used to record the nature and degree of uncertainty in a structured way, which allows encoders to express quite complex phenomena. To take an example from the chapter, one could express that in the passage »Elizabeth went to Essex; she had always liked Essex,« one thinks that there is a 60 percent chance that »Essex« refers to the county, and a 40 percent chance that it refers to the earl, and furthermore that the occurrences of the word are not independent: if the first occurrence refers to the county, one may decide that it is excluded that the second refers to the earl. However, we are not aware of any TEI application using these facilities, for which there are probably two main reasons: the complexity of the markup and, perhaps even more importantly, the fact that the TEI Guidelines leave it open how one may determine the »probability«, »chance«, or »certainty« of an interpretation. As Tarte points out, »quantifying uncertainty is always risky and usually presupposes that problems are complete, i.e. that all the alternatives to a given situation are known […], which is far from being the case in a papyrological context«,[67] and, one may add, neither in most other contexts in the humanities. Binder et al. also mention the additional challenge of communicating the uncertainty recorded in this way to human users.[68]
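
The dependence between the two occurrences is exactly what a pair of independent percentages cannot express; a small sketch (our own, with the conditional values invented for illustration) makes the difference explicit:

    # First occurrence of »Essex«: county or earl?
    p_first = {"county": 0.6, "earl": 0.4}
    # Reading of the second occurrence, conditional on the first:
    p_second_given = {
        "county": {"county": 1.0, "earl": 0.0},  # same referent is kept
        "earl":   {"county": 0.2, "earl": 0.8},
    }
    joint = {(f, s): p_first[f] * p_second_given[f][s]
             for f in p_first for s in p_second_given[f]}
    print(joint)  # P(first=county, second=earl) = 0.0, not 0.6 * 0.4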

We are only aware of one framework in the wider field of digital humanities aiming for a more general modeling of uncertainty, the Integrated Argumentation Model (IAM) by Doerr, Kritsotaki, and Boutsika.[69] The original motivation for IAM comes from the domains of archaeology and cultural heritage; consequently, IAM is intended as an addition to the CIDOC Conceptual Reference Model.[70] Instead of trying to quantify »certainty« (or quietly assuming that it can be quantified), the IAM approaches the problem as an argumentation process, during which hypotheses may be strengthened or weakened, similar to the approach used by Tarte.[71]

5. Conclusion

In this article we have tried to show the challenge of uncertainty for the construction of computational models in the humanities—digital humanities. It is clear that we cannot ignore uncertainty, and we cannot eliminate it either: we need to model it, and we thus need computational models of uncertainty.

Computational models of uncertainty already exist in various research disciplines and are being used in commercial and industrial applications. There is ongoing fundamental research on uncertainty and its representation in mathematics, philosophy, and computer science. Some of these approaches may also be suitable frameworks for computational modeling of uncertainty in the humanities—what is lacking, however, is the »bridge« that could relate the uncertainty encountered in humanities research to these formal modeling frameworks. What is missing, in particular, is a systematic account of uncertainty in humanities research, which would aim to document the causes of uncertainty as well as its behavior, i.e., questions such as: Could this type of uncertainty be resolved (in principle)? What information would be required to resolve it? What happens when new information becomes available? How is it taken into account? And so on.

We would like to stress that the goal is not to come up with an answer to the ontological question of the »true nature« of uncertainty, but rather to find ways of modeling this omnipresent but still elusive phenomenon. We would also like to stress that it is unlikely that anyone will ever come up with a »grand theory of uncertainty« allowing for a single universal model—we recall Stachowiak’s pragmatic property of models. What may very well be possible, though, are more general modeling frameworks that provide researchers with the building materials required for constructing models of particular types of uncertainty in a particular domain.

As we have noted above, there are numerous approaches for the formal modeling of uncertainty, which differ in many respects. Smets notes: »Newcomers to the area of uncertainty modeling are often overwhelmed by the multitude of models. One frequent reaction is to adopt one of the models, and use it in every context. Another reaction is to accept all the models and to apply them more or less randomly. Both attitudes are inappropriate.«[72] We are not sure whether the (digital) humanities have even reached this point; in any case, there is no catalog of uncertainty modeling frameworks from which humanities scholars could pick a framework according to a set of criteria. Neither are there best practices in the humanities that could guide the selection; whether an approach that was successfully used in one digital humanities project can be transferred to another is not much easier to answer than the question of whether the methods used for dealing with uncertain environmental measurement data could be adapted to the domain of medieval manuscripts.

Intuitively, the kinds of uncertainty encountered in the humanities tend to differ in some respects from many more »traditional« applications, such as in engineering, meteorology, or demography; for example,

  • humanities data is often more like expert knowledge than measurements;

  • the amount of available data tends to be relatively small and often concerns singular, non-repeatable events;

  • the reasoning is evidential rather than predictive;

  • intercausal, counterfactual, and deductive reasoning may be of more importance;

  • phenomena similar to those known as selectively reported data and missing data in statistics may be more frequent;[73]

  • uncertainty is qualitative rather than quantitative, or at least hard to quantify;

  • belief is likely to play an important role with respect to conflicts of interpretation.

This is obviously just a very superficial assessment. All of these phenomena may occur in other fields as well (knowledge representation in classical AI comes to mind), and there are certainly cases in the humanities that do not exhibit them. What we are trying to say is that uncertainty in humanities research likely has »typical« characteristics, just as uncertainty in, say, gambling and structural engineering is caused and influenced by different factors and thus exhibits particular characteristics; the two fields are also driven by different concerns, so some modeling approaches will be more pertinent than others.[74]

As Bradley notes, »[w]hen dealing with uncertainty, it often seems like the right approach is to try to quantify that uncertainty,« but, he stresses, »there are limits to what we can (and should) quantify.«[75] This certainly applies to many types of uncertainty in the humanities; mathematical and logical approaches to uncertainty are important as potential modeling and reasoning frameworks to target, but there is still the open problem of how (if at all) to quantify this uncertainty—which is not incidentally the main issue in the approach suggested by the TEI Guidelines (see above).

From this we conclude that we need theoretical digital humanities and theory formation to study questions such as: What types of uncertainty do we find in the humanities? What are the specifics? What can be generalized? Building on these insights, we can then evaluate approaches from mathematics, logic, and computer science and develop methods and recommendations for applied digital humanities. In other words: it is the task of theoretical digital humanities to develop, in close dialog with the humanities disciplines, modeling frameworks and methods that can specifically address these challenges. Theoretical digital humanities is crucial for laying the theoretical groundwork for the development of models and methods appropriate for the humanities—instead of using »second-hand« methods originally developed for completely different purposes. This is, we believe, what Meister refers to when he characterizes digital humanities as »a methodology that cuts across disciplines, systematically as well as conceptually«.[76]

In this way, digital humanities can in fact play an important role in the transformation of the humanities within the larger digital transformation of society as a whole, and mean more than just »contemporary humanities«.


Footnotes


Bibliographic References

  • Charu Chandra Aggarwal / Philip S. Yu: A survey of uncertain data algorithms and applications. In: IEEE Transactions on Knowledge and Data Engineering 21 (2009), no. 5, pp. 609–623. [Nachweis im GVK]

  • Managing and mining uncertain data. Ed. by Charu Chandra Aggarwal. Boston, MA 2009. [Nachweis im GVK]

  • Sergei Artemov: Explicit provability and constructive semantics. In: Bulletin of Symbolic Logic 7 (2001), no. 1, pp. 1–36. [Nachweis im GVK]

  • Sergei Artemov: Justified common knowledge. In: Theoretical Computer Science 357 (2006), no. 1–3, pp. 4–22. DOI: 10.1016/j.tcs.2006.03.009 [Nachweis im GVK]

  • David M. Berry. twitter.com. @berrydm. June 2017. https://twitter.com/berrydm/status/877532752735764480. Tweet no longer available.

  • Frank Binder / Bastian Entrup / Ines Schiller / Henning Lobin: Uncertain about uncertainty. Different ways of processing fuzziness in digital humanities data. In: Proceedings of Digital Humanities 2014. (DH2014, Lausanne, 07.–11.07.2014) Lausanne 2014, pp. 95–98. [online]

  • Sheila Bonde / Clark Maines / Elli Mylonas / Julia Flanders: The virtual monastery. Re-presenting time, human movement, and uncertainty at Saint-Jean-des-Vignes, Soissons. In: Visual Resources 25 (2009), no. 4, pp. 363–377. [Nachweis im GVK]

  • Piero P. Bonissone / Richard M. Tong: Editorial: Reasoning with uncertainty in expert systems. In: International Journal of Man-Machine Studies 22 (1985), no. 3, pp. 241–250. [Nachweis im GVK]

  • Seamus Bradley: Uncertain reasoning. In: The Reasoner 12 (2018), no. 4, pp. 31–32. PDF. [online]

  • Rule-based expert systems: The MYCIN experiments of the Stanford Heuristic Programming Project. Ed. by Bruce G. Buchanan / Edward Hance Shortliffe. Reading, MA 1984. [online] [Nachweis im GVK]

  • Samuel Bucheli / Roman Kuznets / Thomas Studer: Justifications for common knowledge. In: Journal of Applied Non-Classical Logics 21 (2012), no. 1, pp. 35–60. [Nachweis im GVK]

  • William M. Bulleit: Uncertainty in structural engineering. In: Practice Periodical on Structural Design and Construction 13 (2008), no. 1, pp. 24–30. [Nachweis im GVK]

  • Anne Burdick / Johanna Drucker / Peter Lunenfeld / Todd Presner / Jeffrey Schnapp: Digital Humanities. Cambridge, MA 2012. [Nachweis im GVK]

  • Yi Cai / Ching-man Au Yeung / Ho-fung Leung (2012a): Fuzzy computational ontologies in contexts. Berlin et al. 2012. [Nachweis im GVK]

  • Yi Cai / Ching-man Au Yeung / Ho-fung Leung (2012b): Modeling uncertainty in knowledge representation. In: Fuzzy computational ontologies in contexts. Berlin et al. 2012, pp. 37–47. [Nachweis im GVK]

  • Rudolf Carnap: Logical foundations of probability. Chicago, IL 1950. [Nachweis im GVK]

  • A. P. Dawid / James M. Dickey: Likelihood and Bayesian inference from selectively reported data. In: Journal of the American Statistical Association 72 (1977), no. 360a, pp. 845–850. [Nachweis im GVK]

  • Definition of the CIDOC Conceptual Reference Model. Ed. by Nick Crofts / Martin Doerr / Tony Gill / Stephen Stead / Matthew Stiff. Paris 2011. PDF. [online]

  • Arthur P. Dempster: Upper and lower probabilities induced by a multivalued mapping. In: The Annals of Mathematical Statistics 38 (1967), no. 2, pp. 325–339. DOI: 10.1214/aoms/1177698950 [Nachweis im GVK]

  • Martin Doerr / Athina Kritsotaki / Katerina Boutsika: Factual argumentation - a core model for assertions making. In: Journal on Computing and Cultural Heritage 3 (2011), no. 3, pp. 8:1–8:34. [Nachweis im GVK]

  • Joshua M. Epstein: Why model? In: Journal of Artificial Societies and Social Simulation 11 (2008), no. 4. [online] [Nachweis im GVK]

  • Jay Wright Forrester: Counterintuitive behavior of social systems. 1995. PDF. [online]

  • Jay Wright Forrester: Counterintuitive behavior of social systems. In: MIT Technology Review 73 (1971), no. 3, pp. 52–68. [Nachweis im GVK]

  • Zoubin Ghahramani: Probabilistic machine learning and artificial intelligence. In: Nature 521 (2015), no. 7553, pp. 452–459. [Nachweis im GVK]

  • Aleksej Vsevolodovič Gladkij / Igor Aleksandrovič Mel’čuk: Elementy matematičeskoj lingvistiki. Moskva 1969. [Nachweis im GVK]

  • Susan Haack: Evidence and inquiry towards reconstruction in epistemology. Reprinted. London 2001. [Nachweis im GVK]

  • Olaf Hartig: Foundations of RDF* and SPARQL*: An alternative approach to statement-level metadata in RDF. In: Proceedings of the 11th Alberto Mendelzon International Workshop on Foundations of Data Management and the Web (AMW 2017). Ed. by Juan Reutter / Divesh Srivastava. Montevideo 2017. PDF. [online]

  • Olaf Hartig: Querying trust in RDF data with tSPARQL. In: The Semantic Web: Research and applications. Proceedings of the 6th European Semantic Web Conference. Ed. by Lora Aroyo / Paolo Traverso / Fabio Ciravegna / Philipp Cimiano / Tom Heath / Eero Hyvönen / Riichiro Mizoguchi / Eyal Oren / Marta Sabou / Elena Simperl. (ESWC: 6, Heraklion, 31.05.-04.06.2009) Berlin et al. 2009, pp. 5–20. DOI: 10.1007/978-3-642-02121-3_5 [Nachweis im GVK]

  • Applications of uncertainty formalisms. Ed. by Anthony Hunter / Simon Parsons. Berlin et al. 1998. [Nachweis im GVK]

  • Stefan Jänicke / Greta Franzini / Muhammad F. Cheema / Gerik Scheuermann: On close and distant reading in digital humanities. A survey and future challenges. In: Eurographics Conference on Visualization (EuroVis) – STARs. Ed. by Rita Borgo / Fabio Ganovelli / Ivan Viola. (EuroVis: 17, Cagliari, 25.–29.05.2015) Geneve 2015. PDF. [online] [Nachweis im GVK]

  • Janusz Kacprzyk / Sławomir Zadrożny / Guy De Tré: Fuzziness in database management systems. Half a century of developments and future prospects. In: Fuzzy Sets and Systems 281 (2015), pp. 300–307. [Nachweis im GVK]

  • Arijit Khan / Yuan Ye / Lei Chen: On uncertain graphs. In: Synthesis Lectures on Data Management 10 (2018), no. 1, pp. 1–94. DOI: 10.2200/s00862ed1v01y201807dtm048

  • Adam Kirsch: Technology is taking over English departments. In: New Republic. Article from 02.05.2014. [online]

  • Matthew G. Kirschenbaum: What is Digital Humanities, and why are they saying such terrible things about it? In: Differences 25 (2014), no. 1, pp. 46–63. [Nachweis im GVK]

  • Matthew G. Kirschenbaum: What is digital humanities and what’s it doing in English departments? In: Debates in the digital humanities. Ed. by Matthew K. Gold. Minneapolis, MN 2012, pp. 3–11. [online] [Nachweis im GVK]

  • Jürg Kohlas / Paul-André Monney: An algebraic theory for statistical information based on the theory of hints. In: International Journal of Approximate Reasoning 48 (2008), no. 2, pp. 378–398. DOI: 10.1016/j.ijar.2007.05.003 [Nachweis im GVK]

  • Jürg Kohlas / Christian Eichenberger: Uncertain information. In: Formal theories of information. Lecture notes in computer science. Ed. by Giovanni Sommaruga. Berlin et al. 2009, pp. 128–160. [Nachweis im GVK]

  • Florian Kräutli / Stephen Boyd Davis: Known unknowns: Representing uncertainty in historical time. In: Electronic visualisation and the arts. Ed. by Kia Ng et al. (EVA 2013, London, 29.–31.07.2013) Swindon et al. 2013, pp. 61–68. [online] [Nachweis im GVK]

  • Zongmin Ma / Fu Zhang / Li Yan / Jingwei Cheng: Fuzzy knowledge management for the Semantic Web. Heidelberg et al. 2014. [Nachweis im GVK]

  • Michael S. Mahoney: Historical perspectives on models and modeling. In: Scientific Models: Their Historical and Philosophical Relevance. (DHS-DLMPS: 13, Zürich, 19.–22.10.2000) Zürich 2000. [online]

  • Peter McBurney / Simon Parsons: Representing epistemic uncertainty by means of dialectical argumentation. In: Annals of Mathematics and Artificial Intelligence 32 (2001), no. 1–4, pp. 125–169. [Nachweis im GVK]

  • Willard McCarty: Humanities computing. Paperback. Basingstoke 2014. [Nachweis im GVK]

  • Jean-Guy Meunier: Humanités numériques et modélisation scientifique. In: Questions de communication 31 (2017), no. 1, pp. 19–48. [Nachweis im GVK]

  • Jan Christoph Meister: DH is us or on the unbearable lightness of a shared methodology. In: Historical Social Research 37 (2012), no. 3, pp. 77–85. [online]

  • Robert S. Milnikel: The logic of uncertain justifications. In: Annals of Pure and Applied Logic 165 (2014), no. 1, pp. 305–315. DOI: 10.1016/j.apal.2013.07.015 [Nachweis im GVK]

  • Amihai Motro: Management of uncertainty in database systems. In: Modern database systems. The object model, interoperability, and beyond. Ed. by Won Kim. New York, NY 1995, pp. 457–476. [Nachweis im GVK]

  • Gábor Nagypál / Boris Motik: A fuzzy model for representing uncertain, subjective, and vague temporal knowledge in ontologies. In: On the move to meaningful Internet systems 2003: CoopIS, DOA, and ODBASE. Lecture notes in computer science. Ed. by Robert Meersman / Zahir Tari / Douglas C. Schmidt. (Conference, Catania, 03.–07.11.2003) Berlin et al. 2003, pp. 906–923. [Nachweis im GVK]

  • Keung-Chi Ng / Bruce Abramson: Uncertainty management in expert systems. In: IEEE Expert 5 (1990), no. 2, pp. 29–48. DOI: 10.1109/64.53180

  • Vít Nováček / Pavel Smrž: Empirical merging of ontologies—a proposal of universal uncertainty representation framework. In: The Semantic Web: Research and applications. Ed. by York Sure / John Domingue. (ESWC: 3, Budva, 11.–14.06.2006) Berlin et al. 2006, pp. 65–79. DOI: 10.1007/11762256_8 [Nachweis im GVK]

  • Simon Parsons / Anthony Hunter: A review of uncertainty handling formalisms. In: Applications of uncertainty formalisms. Ed. by Anthony Hunter / Simon Parsons. Berlin et al. 1998, pp. 8–37. [Nachweis im GVK]

  • Simon Parsons: Qualitative approaches for reasoning under uncertainty. Cambridge, MA 2001. [Nachweis im GVK]

  • Michael Piotrowski: Digital humanities: An explication. In: Workshop der GI Fachgruppe „Informatik und Digital Humanities“: Im Spannungsfeld zwischen Tool-Building und Forschung auf Augenhöhe – Informatik und die Digital Humanities. Workshop Proceedings. Ed. by Manuel Burghardt / Claudia Müller-Birn. Gesellschaft für Informatik e.v. (INF-DH, Berlin, 25.09.2018) Bonn 2018. DOI: 10.18420/infdh2018-07

  • Henry N. Pollack: Uncertain science … uncertain world. Cambridge 2005. [Nachweis im GVK]

  • Marc Pouly / Jürg Kohlas / Peter Y. A. Ryan: Generalized information theory for hints. In: International Journal of Approximate Reasoning 54 (2013), no. 1, pp. 228–251. DOI: 10.1016/j.ijar.2012.08.004 [Nachweis im GVK]

  • Stephen Ramsay: Who’s in and who’s out. In: Defining digital humanities. Ed. by Melissa Terras / Julianne Nyhan / Edward Vanhoutte. Farnham et al. 2013, pp. 239–241. [Nachweis im GVK]

  • Paul Ricœur: De l’interprétation: Essai sur Freud. Paris 1965. [Nachweis im GVK]

  • Donald B. Rubin: Inference and missing data. In: Biometrika 63 (1976), no. 3, pp. 581–592. DOI: 10.1093/biomet/63.3.581 [Nachweis im GVK]

  • Glenn Shafer: A mathematical theory of evidence. Princeton, NJ 1976. [Nachweis im GVK]

  • Claude E. Shannon: A mathematical theory of communication. In: The Bell System Technical Journal 27 (1948), pp. 379–423. [Nachweis im GVK]

  • Sarvjeet Singh / Chris Mayfield / Rahul Shah / Sunil Prabhakar / Susanne Hambrusch / Jennifer Neville / Reynold Cheng: Database support for probabilistic attributes and tuples. In: 2008 IEEE 24th International Conference on Data Engineering. 3 Vol. (ICDE: 24, Cancun, 07.–12.04.2008) Piscataway, NJ 2008. Vol. 2, pp. 1053–1061. [Nachweis im GVK]

  • Michael Smithson: Ignorance and uncertainty. New York, NY 1989. [Nachweis im GVK]

  • Philippe Smets: Imperfect information: Imprecision and uncertainty. In: Uncertainty management in information systems. Ed. by Amihai Motro / Philippe Smets. Boston, MA et al. 1997, pp. 225–254. [Nachweis im GVK]

  • Herbert Stachowiak: Allgemeine Modelltheorie. Wien et al. 1973. [Nachweis im GVK]

  • Dan Suciu / Dan Olteanu / Christopher Ré / Christoph Koch: Probabilistic databases. San Rafael, CA 2011. DOI: 10.2200/s00362ed1v01y201105dtm016 [Nachweis im GVK]

  • Ségolène M. Tarte: Digitizing the act of papyrological interpretation: Negotiating spurious exactitude and genuine uncertainty. In: Literary and Linguistic Computing 26 (2011), no. 3, pp. 349–358. DOI: 10.1093/llc/fqr015

  • Melissa Terras: Artefacts and errors: Acknowledging issues of representation in the digital imaging of ancient texts. In: Kodikologie und Paläographie im digitalen Zeitalter 2 / Codicology and palaeography in the digital age. Ed. by Franz Fischer / Christiane Fritze / Georg Vogeler. 4 Vol. Norderstedt 2010. Vol. 2, pp. 43–61. URN: urn:nbn:de:hbz:38-43429 [Nachweis im GVK]

  • Defining digital humanities. Ed. by Melissa Terras / Julianne Nyhan / Edward Vanhoutte. Farnham 2013. [Nachweis im GVK]

  • Manfred Thaller: Between the chairs: An interdisciplinary career. In: Historical Social Research Supplement 29 (2017), pp. 7–109. DOI: 10.12759/hsr.suppl.29.2017.7-109 [Nachweis im GVK]

  • Stephen E. Toulmin: The uses of argument. Updated edition. Cambridge et al. 2003. [Nachweis im GVK]

  • Lotfi A. Zadeh: Fuzzy sets. In: Information and Control 8 (1965), no. 3, pp. 338–353. DOI: 10.1016/s0019-9958(65)90241-x [Nachweis im GVK]

  • Lotfi A. Zadeh: Generalized theory of uncertainty (GTU) - principal concepts and ideas. In: Computational Statistics & Data Analysis 51 (2006), no. 1, pp. 15–46. DOI: 10.1016/j.csda.2006.04.029 [Nachweis im GVK]


List of Figures with Captions

  • Fig. 1: Smithson’s taxonomy of ignorance. [Piotrowski 2019, redrawn after Smithson 1989, p. 9.]
  • Fig. 2: Smets’s taxonomy of imperfection. [Piotrowski 2019, drawn after Smets 1997.]