I Want to Make Digital Humanities Happen

Our unit on Reading Machines and Matt’s presentation have me excited about digital humanities. I didn’t previously know much about digital humanities practices. My understanding was that Moretti could use computers to prove that Hamlet is the main character of Hamlet, and other such nifty trivialities. It seemed like kind of a cool tool for other people to play with, but I never thought of identifying myself with it or using the practices myself. For some reason it never occurred to me that one might integrate digital humanities techniques into a larger project that combines close and distant reading.

I have been thinking a bit about how I might incorporate some of the potential of digital humanities into my own work. There might be a lot of promise in modelling grammatical structures to develop an initial hypothesis about how various texts are oriented towards material objects and towards the future. I could then combine that with close reading and standard critical analysis on materialism/realism and futurity. Moretti, for instance, on the grammar of instrumental reason and temporal continuity in Robinson Crusoe: “Past gerund; past tense; infinitive: wonderful three part sequence.” Now I just need to figure out similar constructions and let the computer search the universe for me.
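
If I were guessing at a first step, it might look something like the sketch below: NLTK part-of-speech tags used to flag sentences whose tags line up as gerund, past tense, then infinitive. The file name is a placeholder, and the strictly adjacent tag pattern is only my rough approximation of Moretti’s sequence, not an actual method.

```python
# Rough sketch (not a finished method): use NLTK part-of-speech tags to flag
# sentences containing a gerund, a past-tense verb, and then an infinitive
# in strict adjacency. The file name and the tag pattern are placeholders.
import nltk

nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

with open("robinson_crusoe.txt", encoding="utf-8") as f:
    text = f.read()

for sentence in nltk.sent_tokenize(text):
    tags = [tag for _, tag in nltk.pos_tag(nltk.word_tokenize(sentence))]
    # VBG = gerund/present participle, VBD = past tense, TO + VB = infinitive
    for i in range(len(tags) - 3):
        if tags[i] == "VBG" and tags[i + 1] == "VBD" and tags[i + 2] == "TO" and tags[i + 3] == "VB":
            print(sentence)
            break
```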

Certainly it seems like a great skill to have in the toolbox. Now just to learn how to actually make it happen…

Program (n): a plan or scheme of any intended proceedings (whether in writing or not)

“Any reading of a text that is not a recapitulation of that text relies on a heuristic of radical transformation,” writes Stephen Ramsay. Yes. So much yes. In part, this observation is what research into analog and digital procedural making (e.g. making things from others’ texts) hinges on. Even when we only use our eyes or other technologies of reading (like highlighters, or the boxing tool in Adobe’s PDF reader), we are selecting texts and isolating words and phrases that appear to our minds to be the keystones of the text we are attempting to “understand.” We all have our programs of reading. Some of those programs are more involved, more constrained, or more deliberate than others. For example, when I’m reading I almost always write down what I perceive as key terms in the margins next to particularly meaningful passages. My key terms might not be exactly the author’s, and they might be mine but not anyone else’s key terms for the same article.

As I look through the key term results in my data set for “machinic,” I am struck by the knowledge that these key terms are author-generated, while the other data represent computer-generated repeated terms and include things like “of the.” But when you think about the rhetorical moves that using “of the” necessitates, a reading emerges. Someone is trying (at least in one portion of the thing) to compare or to narrow. We need not know what is on either end; we only need to reflect on how such a combination of tiny words is used. Certainly the program finds what the eye is not capable of, and the eye is interestingly fallible, but largely the programs Ramsay discusses and the program JSTOR has created to do this search for me are matters of scale and redirection. Either version — me scanning the results and coming to a reading just from that glance, or the programming we’re about to write to deal with these data — is a program of reading.
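
For what it’s worth, the “of the” observation is easy to poke at directly. Here is a minimal sketch (the file name is made up) that counts every two-word sequence in a text, so the little function-word pairs surface right alongside the “meaningful” terms:

```python
# Minimal sketch: count two-word sequences ("bigrams") in a text so that
# function-word pairs like "of the" show up next to the content words.
from collections import Counter
import re

with open("some_article.txt", encoding="utf-8") as f:   # placeholder file name
    words = re.findall(r"[a-z']+", f.read().lower())

bigrams = Counter(zip(words, words[1:]))
for (w1, w2), count in bigrams.most_common(20):
    print(f"{count:5d}  {w1} {w2}")
```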

“Distant Reading” and the Complexity of Language

While I do believe that Ramsay does a good deal of work to show how all manner of readings end up deforming their text(s), forming the “paratext” that a classmate described in another blog post, I still find myself unconvinced that “distant reading” revolutionizes our understanding of how texts are experienced. This, of course, has a lot to do with my own academic training — I jumped ship from a PhD program in literature that loathed the push, taking place all across universities (particularly state-funded ones), towards quantification and towards alignment with STEM disciplines as The Important Disciplines.

My beef with distant reading has to do with the way in which it is so often employed to produce word lists like these:
[Screenshot: word list generated by the DFR query]
To articulate this beef further: literary texts may be nothing more than an aggregate of words, but those words are always shaped by their given grammatical context! Our experience of literature hinges on much more than the lexicon of a given text. I believe that some of Ramsay’s own textual examples prove this point; a text like the I Ching is embedded within a tradition of Chinese literature and philosophy that makes use of a near-infinite polysignifying discourse (a popular example of this, articulated in linguistics texts and courses, is how the word [ma], in Chinese, can mean “mother,” “hemp,” “horse,” or “scold,” depending on context and the tone attached to it). Or, regardless of how Ramsay seems to situate him, an author like Faulkner is challenging not for the number of unique lexical items in his vocabulary but because his syntax is always threatening to overflow the prescriptive limits of written English grammar (or so I believe).

Given my training as a de Manian, always looking at the way metonymy plays out in a given text, I would be interested to see how natural language processing might be used on literary syntax in order to better uncover the workings of given lexical items in context. Such an approach might still be deforming, but it would certainly be less deforming than the word-list approach. How might a computer deal with something like the first sentence of Ben Lerner’s second novel, 10:04?

The city had converted an elevated length of abandoned railway spur into an aerial greenway and the agent and I were walking south along it in the unseasonable warmth after an outrageously expensive celebratory meal in Chelsea that included baby octopuses the chef had literally massaged to death.

It’s true that octopi and the notion of their “proprioception” appear all over Lerner’s text, but how do these figures play out within what Lerner calls the “prosody” of his sentences? Do computers yet distant-read syntax, and could they ever do so within the competing discourses of Prescriptive Written English and the infinite spoken varieties of the English language?
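
Partly to answer my own question, here is roughly what asking a machine to “deal with” that sentence might look like: a dependency parse from spaCy’s off-the-shelf English model. This is only one guess at a tool, it assumes the small English model is installed, and what it yields is grammatical relations rather than anything like prosody.

```python
# A sketch of machine-read syntax: spaCy's dependency parse of Lerner's
# opening sentence. This shows grammatical relations, not "prosody."
import spacy

nlp = spacy.load("en_core_web_sm")  # assumes the small English model is installed

sentence = (
    "The city had converted an elevated length of abandoned railway spur into "
    "an aerial greenway and the agent and I were walking south along it in the "
    "unseasonable warmth after an outrageously expensive celebratory meal in "
    "Chelsea that included baby octopuses the chef had literally massaged to death."
)

doc = nlp(sentence)
for token in doc:
    # each word, its part of speech, its syntactic role, and the word it depends on
    print(f"{token.text:15} {token.pos_:6} {token.dep_:10} <- {token.head.text}")
```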

Playing with Algorithms

I really enjoyed Ramsay’s Reading Machines, and I think he’s done a good job pointing at something that is really fundamental to working in the digital humanities: as he says, algorithmic criticism is “simply an attitude toward the relationship between mechanism and meaning that is expansive enough to imagine building as a form of thinking” (85). This reminded me of something that happened at a recent talk given by a prominent member of the video essay community. After the speaker had talked for an hour about the possibilities of video essays and shown us wonderful examples of the critical tools they could provide, a professor began the Q&A with the offhand assertion, “well, play can never be criticism,” before going on to his question, which the speaker handled quite well. But the thing that many of us in the room remembered was that simple assertion: that somehow there are prominent academics who honestly believe that play and criticism are fundamentally separate activities.

When it comes down to it, criticism is play, and play is criticism. The questions that the digital humanities force us to grapple with carry with them the fundamental understanding that this is the case, and it’s easy for us to forget that this is a somewhat heretical position. For my DFR (Data for Research) query I did something very simple: a while back I proposed a paper that tackled the question of contemporary animation and simultaneity, pointing out the “many things at once”-ness of a particular anime series, so I just searched for “animation” and “simultaneity” and got 408 articles that mention both. I’m looking forward immensely to playing with these data, if only because I know that simultaneity is something that comes up all the time in animation but is not often thought of as a primary mode for understanding the medium. Figuring out where simultaneity comes up and within what contexts could be a hugely powerful heuristic for my project. This is something so simple, but algorithmic criticism allows me to study it in a unique way using resources that would otherwise take months to sift through.

Which is exactly the kind of thing that the video essayist was trying to get at. Putting things side by side, mapping them out, playing with them, applying some sort of algorithm to them — these actions bring us back into an understanding of close reading and analysis, not push us further away, which is obviously something essential to the very concept of this class.

Distant Reading

Like many of you, I, too, am excited for today’s workshop and I’m interested to see how we can make use of the data we’ve gathered–well, we’ve asked JSTOR to gather for us, I guess. I’ve been interested in distant reading for a while now, and I’m excited to see how we can make use of it.

I suppose I should start by mentioning that I decided to do a query on something somewhat unrelated to this class. Perhaps I might have found more interesting patterns if I’d run a search for “social media” and “computation” and “literacy,” or something to that effect which would more closely relate to my final project. However, I’m working independently on the way the conventions of the novel construct the subject and how queer literature resists or complicates that construction. Anyway, I thought it would be interesting to see how “queer” and “subjectivity” related to each other, so those are the primary key terms in my search. In retrospect, I probably should have chosen some different key words to complicate the search and get some more interesting data, but really I just wanted a sample that I could use in today’s workshop and play around some more with later. Besides, I don’t know how to make heads or tails of what I just got back from that query anyway–but I’m sure this workshop will help!

Downloading my query data was fairly easy, though the first time around I downloaded “citations only” and got a big list of citations, which seemed pretty useless. It didn’t take me very long to realize my mistake, though, and now I have a much more useful and seemingly informative file. I, too, found that my second query with the n-grams etc. took much longer than my first query, but I suppose that’s to be expected.

Here’s my theory on distant reading: I think it can be really useful and cool as a tool for further research inquiry–you use Google’s Ngram Viewer, for example, to analyze occurrences of the word “marriage” across Google Books and notice peaks and valleys. You try a second term, like “heterosexual,” and see a correlation. Eureka! There’s something to investigate there! But sometimes, I think, distant reading just leaves you at a dead end. Maybe you were hoping to find some cool pattern in keyword use, but it turns out that no such pattern exists. Maybe you were hoping to find a correlation between the rise in popularity of two terms in classical literature, but actually they don’t relate to each other at all. While finding patterns can lead to interesting questions for further research, I don’t know how useful distant reading is when it doesn’t reveal any pattern at all. Maybe that’s obvious, but it’s what I tend to think about when I consider distant reading.
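
To make that “Eureka” step a little more concrete, here is a toy sketch: given yearly frequencies for two terms (the numbers below are invented placeholders, not real Ngram Viewer output), a couple of lines can say how strongly the two series rise and fall together.

```python
# Toy sketch: given yearly relative frequencies for two terms (made-up
# numbers standing in for Ngram Viewer output), measure how strongly
# the two series rise and fall together.
import numpy as np

years = list(range(1950, 1960))
marriage     = [0.0021, 0.0022, 0.0024, 0.0023, 0.0026, 0.0028, 0.0027, 0.0030, 0.0031, 0.0033]
heterosexual = [0.0001, 0.0001, 0.0002, 0.0002, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004, 0.0005]

r = np.corrcoef(marriage, heterosexual)[0, 1]
print(f"Pearson correlation across {years[0]}-{years[-1]}: {r:.2f}")
```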

Distant Reading

I decided to do some searches on JSTOR that were more directly related to the final project that I have planned for this class, dealing with the work of John Keats. To that end, I did (or think I did, at least) a simple search for John Keats. This was after doing a couple of searches just to get used to the system, both of which yielded almost instant results. My search for John Keats, however, has been running for about 15-20 minutes now and has yet to yield any results, which is confusing to me. I am still not entirely sure how the DFR works, but my assumption is that I did not place enough constraints on the search and, getting excited and requesting word counts and quadgrams as well, I might be taxing the search process more than I intended to. There is also a chance that my understanding of this is wrong, or that I haven’t even made a real query, so I will keep playing around with it to see what comes back.

I’ll admit, I had heard of distant reading before, but I never really knew what it was, or looked into it too deeply for that matter.  I am torn on the subject, and I look forward to discussing it more in class.  On one hand I have (consistently, I think) some type of ingrained opposition to machines doing work for us, whether logical or not.  I don’t like the detachment.  On the other hand, I love large amounts of data easily interpreted and analyzed without myself having to do a large amount of reading.

JSTOR DFR ok!

I’m very pleased to be investigating the DFR elements built into JSTOR. Adding and removing constraints seems much like constructing any regular JSTOR search. Reading Ramsay reminds me that the chart options, revealing disciplinary origin of texts and similar metadata, are themselves paratexts, intermediaries between the invention and application of constraints to a corpus and the more typical critical activity of interpreting the results. Ramsay writes of a word count table, for example, “The list is a paratext that now stands alongside the other, impressing itself upon it and upon our sense of what is meaningful” (12).

Perhaps it’s worth mentioning that JSTOR as a corpus has been methodically coded (thank you for that labor, someone!) so that its texts are legible to algorithmic search protocols. For some purposes, such as questions about which disciplines might be most engaged with certain topics, the “distant reading” of its data visualizations may of course be more informative than traditional language-driven engagement with specific texts.

I received my data promptly from JSTOR, perhaps because my query (basically: refugees, freedom of movement, and the Middle East) only produced 29 hits. I’m still a bit unclear about the meaning of the various n-grams, though I suspect the # symbol comes to stand in for one of my search terms? By the way, this Ngram Viewer is a thing on Google Books:

[Screenshot: Google Books Ngram Viewer]

Not that I suppose this is relevant to where we are headed: I see we are generating CSV (comma-separated values) data so that we might be able to manipulate it with Python, going beyond the algorithmic features currently built into JSTOR. I suspect this is where I am going to soon be pleasantly befuddled, or at least highly dependent on copying and pasting according to what I hope will be a very, very carefully scaffolded workshop.
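
In the spirit of getting slightly less befuddled ahead of time, my guess at a first step is just to peek at one of the files and see what the columns actually are; the file name below is a placeholder for whatever JSTOR delivered.

```python
# First-steps sketch: open one of the DFR .csv files with Python's csv
# module just to see what the columns actually are, then print a few rows.
# The file name is a placeholder.
import csv

with open("wordcounts_sample.csv", newline="", encoding="utf-8") as f:
    reader = csv.reader(f)
    header = next(reader)
    print("columns:", header)
    for i, row in enumerate(reader):
        print(row)
        if i >= 9:          # only peek at the first ten rows
            break
```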

I’m aware that the concept of difficulty is operative in this class, and as any good student of Mariolina Salvatori (formerly of Pitt English) would, I’m willing to encounter this cognitive-affective state. I would note, however, that the seminal Elements and Pleasures of Difficulty, which turned on transmuting interpretive difficulty into generative writing, included no computational examples. Perhaps the transposition of this concept to code-based learning merits more explicit theorization.

Ready to do some stuff

I’m excited about this workshop later, especially since I plan on doing a distant reading project for this class, so, yeah. I have my .csv file, and I even figured out how to combine the 1000 .csv files into a single file. I had to use the command prompt, and it’s a big deal for me that I used it well (even if it took me way longer than it should have). Now I just have to figure out something to do with all of this information.
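
In case the command prompt route feels shaky to anyone else, here is a Python sketch of the same merge. It assumes every file shares a single identical header row, which may or may not hold for the DFR downloads, and the folder name is made up.

```python
# Sketch: merge many .csv files in a folder into one, keeping only the
# first file's header row. Assumes all files share the same columns.
import glob

paths = sorted(glob.glob("dfr_data/*.csv"))   # placeholder folder name

with open("combined.csv", "w", encoding="utf-8", newline="") as out:
    for i, path in enumerate(paths):
        with open(path, encoding="utf-8", newline="") as f:
            lines = f.readlines()
        if i == 0:
            out.writelines(lines)          # keep the header from the first file
        else:
            out.writelines(lines[1:])      # skip the header in the rest
```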

I have never done a distant reading before, though I took Steve Carr’s class and learned a bunch about it. I’m glad I’m going to actually try to do something this time around. So I guess I’ve sort of gotten to know distant reading in the way that I know Marxist criticism or something like that. Yet I do think distant reading is kind of a hybrid of a method and a theoretical lens: it’s a school of theory, but also a nuts-and-bolts approach to reading at the same time. A weird sort of hybrid. But maybe all theories are hybrids like this? I guess there could be a case for that. Maybe it is a matter of degree or of kind.

Anyway, Ramsay’s take on algorithmic criticism is, for me, a convincing reason why it is kind of both a method and a theoretical lens—which I find directly relatable to many of the conversations we’ve had about what computers do and how they might make us see the world differently: “If algorithmic criticism is to have a central hermeneutical tenet, it is this: that the narrowing constraints of computational logic…is fully compatible with the goals of criticism set forth above….such procedures can be made to conform to the methodological project of inventio without transforming the nature of computation or limiting the rhetorical range of critical inquiry. This is possible because critical reading practices already contain elements of the algorithmic” (16). These constraints are methodological, but the claim that critical reading practices “already contain elements of the algorithmic” is, I think, a claim about how to view reality, much like our earlier discussions of programming as world-making or of customer service as procedural for Bogost. I like this approach, and it feels like many of the readings we’ve done have staged these sorts of tennis matches between computers as tools and computers as aesthetic or theoretical objects in themselves.

Distance and the Digital

Though I knew little to nothing about the concept of “distant reading” and the critical contexts from which it emerged, I spent much of my undergraduate career conducting what I now recognize as distant readings. I don’t recall how this preoccupation emerged — but I do remember obsessively scouring Chaucer concordances for all occurrences of “sigh” and “sick” (and their other Middle English iterations) to make arguments about Medieval lovesickness. I don’t recall the legitimacy of these arguments, or many specifics — but I do remember how refreshingly expansive the project was. I got to read widely and bring together scholarship in literature, writing, history, and medical anthropology. In some way, experimenting with this form of distant reading (even if I didn’t know it by name) gave me my first taste of cross-disciplinary research, which is something I am still invested in today.
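
(Looking back, that concordance-scouring could now be compressed into a few lines. The sketch below uses NLTK’s concordance view on a plain-text Chaucer file whose name I am making up, and a real pass would need a proper list of the Middle English spelling variants.)

```python
# Sketch: a quick keyword-in-context view of "sigh"-type words in a
# plain-text Chaucer file (the file name and the spelling list are
# placeholders; Middle English variants would need to be filled in).
import nltk

nltk.download("punkt", quiet=True)

with open("chaucer.txt", encoding="utf-8") as f:
    tokens = nltk.word_tokenize(f.read().lower())

text = nltk.Text(tokens)
for term in ["sigh", "siken", "sick", "sik"]:
    print(f"--- {term} ---")
    text.concordance(term, width=60, lines=5)
```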

I am slightly better informed in the way I conduct and represent research now — but despite my best efforts, I am often surprised by how many digital tools escape my knowledge. Not only do they present themselves to me in coursework (Agate? Help?), but they also turn up quite randomly in daily life. While avoiding work and chatting with an art history scholar at a Halloween party, I got an impromptu lesson on mapping technologies (which I should have taken notes on…). When I shared my latest research on the etymology of [ornithological] jizz with a non-academic interlocutor, he pulled up an n-gram for me and was surprised that I hadn’t already done so for my own research.

To that end, one of my JSTOR DFR queries was “jizz ornithology” (I’m not sure why I chose that phrasing, and am curious as to what different versions of it would turn up — I am slightly afraid of just inputting “jizz,” but maybe I need to get over that fear for the sake of thorough research). Another bird-related single-word query I entered was “alerion” (a footless mythological bird I have been able to find too little information on). Finally, I queried “infinite monkey theorem,” my best effort at a topic relevant to computation. (My current plan is to discuss it alongside the recently trending story of Betty, the Tweeting chicken.)

One of my enduring questions about distant reading is how to keep up with the technology — knowing which digital tools will be most efficient, productive, and effective for the research I am trying to do.

Did I even submit a query?

First, let me take a moment to “chuckle” at the pun and response to the pun in Simula’s post.

I don’t quite know how I feel about distant reading. The majority of me is of the opinion that “yes, this is totally a valid way to analyze literature and is amazingly useful in conjunction with familiar close reading.” The other part of me is looking at the Agate tutorials and cookbooks and is wordlessly expressing confusion and doubt. Maybe the fact that the tutorial is working with something inherently more quantifiable than literature makes me wary of how well my experiment will work in tomorrow’s workshop.

I’m very uncertain about my queries. First, I tried to search for “frock,” and the results compiled very quickly. Then I searched for “dress.” The results were also quick, which makes me nervous. I wish I could think of something like “queer world making” or another delightfully specific phrase. I would believe in the results of such a query. As it stands, I’ll be interested in seeing if my results even form a table tomorrow.