I was pleased to work on a topic modeling project in Thursday’s workshop, as I have been learning about and practicing topic modeling for Alison Langmead’s digital humanities seminar this semester. In my own project, I am using the command-line tool MALLET with a poetry corpus, experimenting with the tool’s ability to model figurative language. Any suspicion that the quantitative methods of topic modeling replace more humanist hermeneutics has been more or less erased as I have grappled with the interpretative questions that arise in “training” the model. Throughout the process I have also returned again and again to the question that is often asked about algorithmic textual analysis: is this generating anything meaningful? In my own project, I truly can’t say yet. In Thursday’s workshop, the example provided us with a list of topics that demonstrates what I think may be a more useful application of topic modeling.
I have been inspired by Lisa Rhody’s assertion that “topic modeling poetry works, in part, because of its failures.” (She also provides a very nice produce-based analogy to explain how topic modeling works; I find food to be a helpful point of entry for any subject.) Rhody’s explanation of her use of topic modeling in studying ekphrastic poetry, for me, echoes Ramsay’s statement that “in literary criticism, as in the humanities more generally, the goal has always been to arrive at the question.” The models provide alternate means of exploration, not answers. Between my own project and our workshop, I have been able to imagine some scenarios, in areas more closely related to my actual research, in which topic modeling (or potentially other data mining methods) could provide a useful way to begin asking questions about a large body of work. But again, I see this merely as a way to get started. When it comes to the digital humanities, one of my concerns is whether the learning curve or technological barrier is too high for tools that simply allow us to prepare data for interpretation. It does seem that this work requires a dedication to computational methods themselves, beyond dedication to the interpretation of the content, which may be a (very reasonable) deterrent for many researchers.