Data Cleaning

Thinking about last week’s class, it seemed like it might be useful to share this resource on data preparation. We read excerpts from Trina Chiasson and Dyanna Gregory’s Data + Design in Dr. Langmead’s Digital Humanities seminar this term, and I found the chapters on “Getting Data Ready” to be both helpful and insightful. What do we do to data before we present it? Considering our conversation about the subjectivity of databases last week, I thought this might be worth a glance!

From Moretti to coffee…

from Moretti’s Graphs, Maps, Trees

My first encounter with “distant reading,” beyond Roberto Busa and his famous concordance, occurred last summer in a digital humanities seminar I took as a Master’s student. For the course, we divided up Franco Moretti’s Graphs, Maps, Trees: Abstract Models for Literary History, and focused individually on different sections of the book. As an initial exposure to distant reading, I must admit I found the visualizations in the book to be lackluster, and therefore was not convinced that distant reading was a useful analytical tool. I suppose I’m not convinced that taking a quantitative approach to literature is a “good thing”?

I am, nonetheless, excited to see what happens in our workshop tomorrow. I ran a query through the JSTOR DFR portal and downloaded a substantial CSV file. My query, “coffee,” yielded 1,000 results from a variety of journals. The usefulness of the results, of course, is not clear from the data aggregated in the CSV file, as very few of the relevant hits are accompanied by an abstract, but I’m intrigued by the prospect of doing more with this…
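One small way to “do more” with a file like this is to measure how many rows actually carry an abstract. Below is a minimal Python sketch of that check. The column name `abstract` and the sample rows are assumptions made up for illustration; the real export's header may well differ, so treat this as a starting point rather than a description of the DFR format.

```python
import csv
import io

def abstract_coverage(csv_file, column="abstract"):
    """Return (rows_with_abstract, total_rows) for an open CSV file.

    The default column name is a guess, not the confirmed export header.
    """
    reader = csv.DictReader(csv_file)
    total = 0
    with_abstract = 0
    for row in reader:
        total += 1
        # Treat missing, empty, or whitespace-only cells as "no abstract".
        if (row.get(column) or "").strip():
            with_abstract += 1
    return with_abstract, total

# Made-up sample rows standing in for the real download.
sample = io.StringIO(
    "title,journal,abstract\n"
    "Coffee and Commerce,Journal A,A study of the coffee trade.\n"
    "Coffee Houses,Journal B,\n"
    "On Coffee,Journal C,   \n"
)
found, total = abstract_coverage(sample)
print(f"{found} of {total} rows have an abstract")
```

For the real file, the same function could be called on `open("results.csv", newline="", encoding="utf-8")`, where the filename is again a placeholder.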

A fortress without walls…

In embarking on my first ever venture with Dwarf Fortress, I must admit that I relied heavily on the Wiki. Unlike Machine, I loathe the music that accompanies the game: it is annoyingly simple and melodic (apparently played on a 6-string guitar by Tarn Adams himself) and seems to taunt me as I toil over the intricacies of the game itself. Indeed, I turned my laptop’s volume off and appreciated the early 2000s hip-hop issuing forth from the coffee shop’s speakers instead.

So vintage Jay-Z accompanied me into the Dwarf Fortress. I opted to follow the Fortress Mode tutorial step-by-step, as I’m a complete beginner, and this guide offered the basics for building a “minimal fortress”. I’ve not only never played DF, but have also never played a “construction and management simulation” type of game, like DF or SimCity. I’ve only ever played in worlds where the virtual structures were already present, and my main goal was to guide the protagonist through a series of riddles.

So I feel like a complete novice. To add an extra dose of anxiety to this experience, I read that Dwarf Fortress is unique because: “unlike many games, the world that your game takes place in will always be procedurally randomly generated by you or someone else.” Whoa! This game is already making me feel disoriented! (Is DF a type of “metanovel,” as Wardrip-Fruin defined it? A computer program that tells stories that only a computer can tell?). Is this why I feel like I have zero control and keep pressing the spacebar to pause the game?

My sea of clubs…

Anyway, the Wiki provided helpful advice as I conjured up my first world: no aquifers and low savagery levels, please! I had to do a few searches before I found a suitable region, and then it took me an additional while to assign all of the tasks to my various dwarves (and avoid the useless stray cats and dogs and yaks). I could only find six dwarves in my virtual world, so I had to do my best with them. I’m not sure if the seventh was hiding somewhere, but I scoured the entire region and found no trace of him or her. With Ast, Tirist, Tulon, Urdim, Udib, and Rigòth as my faithful (sometimes) companions, I attempted to do some mining and channeling. The channeling seemed successful, as evidenced by a rectangle of upside-down triangles. However, I couldn’t seem to successfully mine the area. Or, at least, there were no visual cues beyond the original blinking “+” signs to indicate that the area had been zoned for mining. Was the mining happening, and I just couldn’t see this process? After leaving the dwarves alone for five minutes to work on their invisible mining, I found that much of my region had been overtaken by white club symbols. It seems I must restart again…

Beyond being frustrated by the absence of visual cues to indicate certain processes (or being unaware of their presence because of my own stupidity), I was also annoyed by the fact that I couldn’t seem to successfully “save” my progress in the game. I had to restart my fort-building three times with three different sets of semi-cooperative dwarves because I had encountered obstacles or had had to temporarily shut down my computer because it was lunchtime and I had to walk home, etc.

I haven’t given up, but I also haven’t turned the volume back on.

Fiction out of context

I must admit that, as it is Tuesday afternoon, I’ve only read the first third of Noah Wardrip-Fruin’s Expressive Processing (2009). Thus far, I’ve attempted to absorb some definitions (“operational processing” and “ideology machine,” for example), and developed a better understanding of Joseph Weizenbaum’s ELIZA, the psychotherapist chat bot that Ian Bogost prompted me to engage with a few weeks ago. I realized, through Wardrip-Fruin’s explanation of the Eliza Effect, that I’m the type of user who provokes nonsense from Eliza. In other words, I’m an uncooperative collaborator. I want to coax her into unwittingly revealing her machinic, non-comprehending self. However, I’ve also realized that I underestimated Eliza. Reading about the Eliza Effect and its inherent weaknesses certainly made me gain a greater appreciation for Weizenbaum’s machine. As the author reveals, the machine’s flaws actually provide insights about the underlying system processes (as described on p. 38).

Implementation, Nick Montfort and Scott Rettberg

Conversely, I was particularly intrigued by how digital fictions confront the issue of the Eliza effect by explicitly revealing underlying mechanisms. Rather than try to perpetuate the illusion of seamless cognition and appropriate response (Eliza), these fictions embrace the notion of exposing the messy processes that occur during the creation and delivery of narrative. I was reminded of so-called cell phone novels when I read about Nick Montfort and Scott Rettberg’s Implementation Project. As excerpts are printed and posted in random locations, the story unfolds in a compelling way…yet the intrinsic processes (of printing and pasting, for example) are very straightforward and immediately recognizable. Interestingly, the project has now been published as a book (2012) with a linear narrative that was intentionally absent from the original novel.

Anyway, let’s return to the cell phone novel, or “keitai shosetsu,” popularized in Japan. This novel, frequently distributed in text messages, represents an intimate type of digital fiction that, similar to Montfort’s project, is serialized and encountered without context. Just as models of the world are represented in video games, and Turing’s “imitation game” is still discussed in this regard (as reiterated by Wardrip-Fruin), cell phone novels are often like folktales in that they convey familiar narratives in new ways (I found this article, I *heart* Novels in the New Yorker to be helpful in understanding this genre of writing). As we discussed last week, some aspects of this new use of cell phones…or this skewing of our media ideologies may seem threatening to the “traditional” author, as these novels are often published anonymously or under pseudonyms. However, these novels are seen as an extension of oral storytelling rather than an intrusion on literature.

Perhaps I’ve strayed too far from computer games, but when I read Wardrip-Fruin’s call to action regarding moving beyond the rigidness of quest flags and dialogue trees, the cell phone novel struck me as one medium that seems to continually captivate users. However, these users are not necessarily responding to the novel; I’d be curious to learn more about how readers’ suggestions are incorporated into new additions to cell phone novels.

I must finish the rest of the book, but will also just mention this lecture that Wardrip-Fruin gave at UC Santa Barbara because it is helping me better understand certain concepts as I read about them in the book: Saying it with Systems.

Real people make fake people

I really appreciated having the opportunity to work in the new Digital Scholarship space last night. The Codecademy.com tutorials have been helpful, and I do appreciate how the company simulates a community environment in their lively forums, but there’s nothing like sitting down with a group of people in the same room and trying to do something new together.

I am brand new to Twitter and honestly didn’t truly understand Twitterbots until we started writing our own Python code under the guidance of Matt Burton. For some reason I had this preconception that Twitterbots and Spambots were one and the same, and that they were inherently evil (yes I’m a bit of an alarmist, still, despite learning more about computers every day). Troubleshooting with a group of humans, exchanging notes face-to-face, and witnessing each other’s reactions to various steps of the process, made this exercise enjoyable and far more fruitful than some of my individual struggles with coding.

As regards thinking about what the Twitterbot of my dreams could potentially do, I am curious about its potential in the podcasting world. I wonder how it could generate suspense by Tweeting out excerpts of podcast stories, in sequence, in the lead-up to the actual full-length audio episodes being released? Hmm…I probably need to think through this a bit more, but there might be something there?
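To think this through a little further, here is a rough Python sketch of just the serializing step: cutting a story into numbered, tweet-sized excerpts that could be released in order. It does not talk to Twitter at all. The function name `serialize_story`, the 140-character default, and the sample text are all assumptions for illustration, and the numbering prefix only leaves room for up to 999 excerpts.

```python
import textwrap

def serialize_story(text, limit=140, prefix_room=8):
    """Split text into numbered excerpts of at most `limit` characters.

    `prefix_room` reserves space for a prefix such as "12/345 ",
    which is enough for stories of up to 999 excerpts.
    """
    # textwrap.wrap breaks on whitespace and never exceeds the given width.
    chunks = textwrap.wrap(text, width=limit - prefix_room)
    total = len(chunks)
    return [f"{i}/{total} {chunk}" for i, chunk in enumerate(chunks, start=1)]

# Placeholder text standing in for a podcast story transcript.
story = (
    "It was lunchtime when the dwarves stopped digging. "
    "Nobody could say where the seventh had gone, "
    "and the white clubs kept spreading across the map."
)
for excerpt in serialize_story(story, limit=60):
    print(excerpt)
```

Posting each excerpt on a schedule would be a separate step, handled by whatever Twitter library the bot already uses.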

 

Persuasion through Spectacle?

As I read Ian Bogost’s book, Persuasive Games, I found myself subconsciously substituting “coding” for “gaming” quite frequently. So, for example, I read that “It is common…to equate videogame playing [CODING] with idle time” or that CODING (and gaming) are “easily denigrated as trivial” (Preface, vii). It seems that since coding has not yet found its place in my average “work day,” I often perceive it as belonging to those stolen moments in between one activity and the next. Is this how others view coding and gaming?

The Eliza Chat bot: a disappointing therapy session

In addition to drawing these parallels, I attempted to embrace Bogost’s thesis of persuasion as accomplished through procedurality. When Bogost described ELIZA, an early example of Natural Language Processing (I believe “her” program was written in the mid-1960s), I opted to try the Eliza Chat bot to get some sense of how these conversations ran on procedures (see image). My session with Eliza quickly revealed why she was/is called a “Rogerian psychotherapist,” as she expressed an unnerving degree of empathy and was constantly reaffirming my feelings by regurgitating phrases I’d previously entered. Of course, this type of therapy “logic” aligns well with the way subroutines in code are established, and the façade of language fluency quickly disappeared from this interaction. I had a few “conversations” with Eliza, and they all left me feeling frustrated and, in some way, inadequate.
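To make the “regurgitation” concrete, here is a toy Python sketch of the general idea: match a phrase, swap the pronouns, and hand the user’s own words back as a question. This is not Weizenbaum’s script or the code behind the chat bot I tried; the three rules, the pronoun table, and the fallback line are hypothetical and kept deliberately tiny.

```python
import re

# Swap first-person words for second-person ones so input can be echoed back.
REFLECTIONS = {"i": "you", "me": "you", "my": "your", "am": "are", "i'm": "you are"}

# (pattern, response template) pairs, checked in order. Hypothetical rules.
RULES = [
    (re.compile(r"\bi feel (.*)", re.IGNORECASE), "Why do you feel {}?"),
    (re.compile(r"\bi am (.*)", re.IGNORECASE), "How long have you been {}?"),
    (re.compile(r"\bmy (.*)", re.IGNORECASE), "Tell me more about your {}."),
]

def reflect(fragment):
    """Lowercase the fragment and swap pronouns word by word."""
    return " ".join(REFLECTIONS.get(word, word) for word in fragment.lower().split())

def respond(statement):
    """Return the first matching rule's response, or a stock fallback."""
    cleaned = statement.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(cleaned)
        if match:
            return template.format(reflect(match.group(1)))
    return "Please go on."

print(respond("I feel frustrated by my code."))
# prints: Why do you feel frustrated by your code?
```

Anything that falls outside the rules gets the same stock reply, which is roughly where the façade of fluency gives way.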

Progressing beyond ELIZA and to the games themselves, I found Chapter 4, “Digital Democracy,” incredibly compelling. Looking for a form of effective expression and a “desirable possibility space for interpretation” (28), as described early in Bogost’s tome, I was intrigued by the macabre games simulating such events as JFK’s assassination, the September 11th terrorist attacks, and the Waco siege of 1993. These games require players to embody the roles of victim or assassin subjected to randomly assigned circumstances. So, during one session, a player might escape the World Trade Center, but in another they may be forced to jump to their death. Although these games may create a type of “meaningful engagement with procedurality” (124), I’m still unsure of the purpose of this type of engagement. Are these games attempting to generate an even greater degree of empathy for the victims? Or are they satisfying some strange curiosity about these events, and feeding our somewhat disturbing appetite for spectacle? Speaking of which…Bogost writes about the spectacularity of JFK’s death in JFK Reloaded.

The Strappado. Courtesy of Wikipedia.

This immediately transported me to Michel Foucault’s Discipline and Punish, and his descriptions of punishment as public spectacle. This, in turn, brought me to Francisco Goya’s etchings, Los Desastres de la Guerra or The Disasters of War, and Jacques Callot’s Les misères et les malheurs de la guerre or The Miseries and Misfortunes of the War. Both of these series are unique in their unapologetic depiction of war as horrific spectacle. What is the difference between Callot’s 17th-century engravings or Goya’s 19th-century etchings and a game like Waco Resurrection, for instance? Both the game and Callot’s series:

  • are staged,
  • place the viewer in the midst of the action and the various gruesome procedures of war,
  • and adhere to a pre-defined sequence controlled by the creator.

Perhaps I’m oversimplifying things in order to make my argument work!

I’ll leave you with a couple of additional questions:

  • How can political campaigns incorporate “meaningful engagement with procedurality” when the outcomes are unknown? The examples provided in the book are of historical events defined by their outcomes…but how could Bernie Sanders, for example, incorporate procedurality in his current campaign tactics? Bogost’s examples (including Howard Dean) all pre-date Twitter and seem *somewhat* outdated, so I’m wondering what a game might look like now?
  • This is a reference back to the beginning of the book, but Bogost describes a videogame as a “medium,” in the sense that film and radio are media. Is “code” a medium?

Week 1: My Coding Origins

I do not have substantial experience with coding. I’m not exactly computer illiterate, but I’m also not comfortable with building something from the ground up. I exist in this somewhat precarious middle-ground where I can sometimes “get by” through a combination of code-borrowing, Lynda.com tutorials, and trial-and-error. But I do not feel as though I always understand what I’m really doing to reach my desired endpoint. For me, there’s still a significant chasm between what I want to do and how I go about achieving it (or at least achieving a semblance of it).

My coding “career,” in sum:

  • HTML: I started a slipshod website in the mid-2000s that functioned to display my artwork and drawings in a fairly limited and not-entirely-pleasing way.
  • I’ve played around with TextWrangler a bit, and was pleased to see it mentioned in Paul Ford’s piece, but was then worried that that automatically tagged me as someone that Ford wouldn’t like to hang out with.
  • I’ve installed and maintained sites using Drupal, which is fairly straightforward, but has never felt entirely “creative” to me, which sounds bizarre and probably says more about my own conception of creativity than anything else.
  • I took a Coursera class in Python but didn’t complete all of the work because I (a) didn’t prioritize the class and (b) didn’t have any real bearing on the context in which Python would be used. It felt like I was literally doing algebra in a dark void (represented by the black box in which I was typing script).

So here I am.

In relation to my experience of English studies, I have found that coding, or doing work in the digital humanities, has consistently required a level of concision and efficiency that I hadn’t previously encountered using the tools of pen and paper or Microsoft Office. Although there are obviously ways of incorporating parameters in writing (word or page limits, etc.), the physical tool does not enforce these restrictions in the latter examples. However, in coding I’m consistently pushed to limit words or decide what is really worth including or what is expected to be included within the boundaries of a webpage or a blog post or a Tweet (I’m asking myself about media ideologies, even as I write this post).