The subtle relationship between machines and language has evolved over a reasonably long period but is now accelerating. Soon after the first computers were built, various abstract languages were formulated in order to link these machines’ inner mechanisms to processes coded by humans. The man-machine relationship has evolved dramatically since then, especially through the languages used both to instruct the machine and to relate to it. These languages now have a double role: meta-knowledge (language used for the functional description of processes) and content (language processed in various forms but in the end reduced to readable text). The digitalization of everything, by both institutions and private companies, is progressively producing impressive “corpuses” which, in their ethereal digital nature, can be goldmines for neural network software. There is still little awareness, though, of the sophisticated strategies that online giants are pursuing with a view to building advanced AI and creating new monopolies in strategic services. While Google continuously refines its knowledge corpus through the scanning and indexing of texts from the whole Web and all of the printed realm, Apple enhances the credibility (and the emotionality) of Siri in order to affectively engage users; and Facebook attempts to customize and shape our entertainment environment as no friend has ever done before. All this is based on data, and text and words are among the purest data (basic in structure and extremely rich in meaning) that can be used, once properly contextualized.
There’s a small selection of software “literature” in various formats that has almost no perceivable machine “accent” whatsoever. Tweetbots (software that algorithmically composes tweets according to certain strategies), for example, have to be very concise. Among the literary ones we find portmanteau_bot, which creates new portmanteau words every hour, including some quite interesting ones from time to time (a “portmanteau word” is a fusion of different words or parts of words, a term derived from Lewis Carroll’s Through the Looking-Glass). Or the ingenious @pentametron by artist Ranjit Bhatnagar, which searches for tweets in iambic pentameter and retweets them in rhyming couplets. The “human quality” expressed in these two examples can be clearly perceived, and even if some level of redundancy means they remain a bit “distant”, once one becomes familiar with their structures and outcomes they become quite attractive. As with every interesting stream of content, we’re compelled to test the limits of our enjoyment against the next outcome, and when machines are involved this process can be endless.
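As a toy illustration of the kind of rule such a bot might apply (this is not portmanteau_bot’s actual algorithm, just a minimal sketch), a portmanteau can be built by fusing two words wherever a suffix of one overlaps a prefix of the other:

```python
def portmanteau(a, b, min_overlap=2):
    """Fuse `a` into `b` where a suffix of `a` matches a prefix of `b`.

    Prefers the longest overlap; returns None if no overlap of at
    least `min_overlap` letters exists.
    """
    for size in range(min(len(a), len(b)) - 1, min_overlap - 1, -1):
        if a.endswith(b[:size]):
            return a + b[size:]
    return None

print(portmanteau("breakfast", "fastidious"))  # prints "breakfastidious"
```

A real bot would run this over a large vocabulary and filter the results, which is where most of the “interesting ones” come from.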
Sarah Harmon, a PhD student at the University of California, takes a more complex approach. Her software FIGURE8 doesn’t write entire poems but single similes, focusing on figurative language. It attempts to combine likeness with unexpectedness, basing its choices on a database of all the public domain stories that Harmon could find and then taking into account the compatible characteristics of their elements and associated actions. Here the results are more sophisticated, as the software has learned to deduce rules and started to display more autonomous behaviour, such as using two or three adjectives in a row.
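The core move, stripped of FIGURE8’s learning machinery, can be sketched as follows. The property base here is invented for illustration (FIGURE8 mines its properties from public domain stories); the principle is the same: likeness comes from a shared property, unexpectedness from borrowing a semantically distant noun.

```python
import random

# Toy property base: each noun maps to salient adjectives (invented entries).
PROPERTIES = {
    "moon": ["pale", "distant", "silent"],
    "surgeon": ["precise", "calm"],
    "glacier": ["slow", "pale", "vast"],
    "clock": ["precise", "relentless"],
}

def simile(subject, rng=random):
    """Pick a property of `subject`, then borrow a different noun that shares it."""
    props = list(PROPERTIES[subject])
    rng.shuffle(props)
    for prop in props:
        vehicles = [n for n, ps in PROPERTIES.items() if n != subject and prop in ps]
        if vehicles:  # likeness: shared property; unexpectedness: unrelated noun
            return f"as {prop} as a {rng.choice(vehicles)}"
    return None  # no property of `subject` is shared by any other noun

print("The moon was " + simile("moon"))  # prints "The moon was as pale as a glacier"
```

With a base of a few invented nouns the output is predictable; the interest of Harmon’s system lies precisely in the scale and strangeness of the corpus it draws from.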
Again, it is “how” machines are being taught that makes the difference, together with the knowledge base that is used. Among these bases, ConceptNet, developed at MIT, is a freely available common-sense knowledge base that supports many textual-reasoning tasks. It is used in Definitions by Brian Ma, an installation that incorporates 15 networked LCDs searching ConceptNet. Each screen displays what appear to be random words, one at a time, in a sequence, restarting with a new sequence once the last word is displayed on the last screen. The relationships between the words are semantic, so the sequences sometimes sound unexpected and could potentially continue forever, but it’s noticeable that the first and last words are always “people” and “money.” The work creates a specific narrative supported by human semantics and machine choices.
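The mechanism behind such a piece can be imagined as a constrained random walk over a semantic network: wander from relation to relation, but only accept walks that end on the chosen word. The graph below is an invented miniature (the installation draws its relations from ConceptNet itself):

```python
import random

# Invented slice of a semantic network: each word points to related words.
RELATED = {
    "people": ["work", "dream", "talk"],
    "work": ["office", "salary"],
    "dream": ["sleep", "future"],
    "talk": ["language"],
    "salary": ["money"],
    "office": ["paper"],
    "sleep": ["night"],
    "future": ["plan"],
    "plan": ["money"],
    "language": ["word"],
}

def chain(start="people", goal="money", max_len=6, rng=random):
    """Random walk over RELATED; restart until the walk reaches `goal`."""
    while True:
        walk = [start]
        while len(walk) < max_len and walk[-1] in RELATED:
            walk.append(rng.choice(RELATED[walk[-1]]))
            if walk[-1] == goal:
                return walk

random.seed(0)
print(" / ".join(chain()))  # e.g. "people / work / salary / money"
```

Every accepted sequence is semantically motivated step by step, yet reads as an oblique narrative from “people” to “money”, which is the effect the installation stages.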
Poetry may seem easier to handle than longer writings, despite being much denser in meaning. Fiction (and non-fiction, too) is longer and more “diluted” as a literary form, but it has the same hard-to-formalize qualities that we perceive while reading, above all one in particular: style.
In The Death of the Authors, 1941 edition, An Mertens and Femke Snelting (part of Constant collective) have taken an innovative approach in skilfully handling works on which the copyright has expired: they wrote software that generates a novel entitled The Death of the Authors based on texts by Virginia Woolf, James Joyce, Rabindranath Tagore, Elizabeth Von Arnim, Sherwood Anderson and Henri Bergson. The resulting novel is freely downloadable from a webpage once complete. All of the authors concerned died in 1941, so the copyright of their whole work expired on 1 January 2012. This literary gesture is purely the result of a remix totally governed by software. The respective authors return with a text that incorporates all of their individual styles as part of an unpublished, unexpected and ever-changing collaboration. In this context, “stylometry” is the scientific discipline of studying linguistic style to attribute authorship using statistical analysis, genetic algorithms and neural networks (again). So the technical infrastructure to simulate or create an author is already there, and the gesture of programming becomes a political one that implies certain decisions that need to be taken, and social consequences too. This is important in the ongoing non-reversible process now taking place: humans are working hard to render the best written cultural products of the last few thousand years in a machine-digestible form that can be physically stored in a device no bigger than a USB stick. In turn, humans are nurturing machines to “learn” about the knowledge that is considered the height of human achievement, in order to create new knowledge. But it will be essential not to lose human-based criticism in the process and to shape this new knowledge into something that is relevant to humans themselves.
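Stylometry in miniature can convey how little is needed to get started. The sketch below (not the toolchain Mertens and Snelting used, and far cruder than real methods such as Burrows’ Delta) attributes a text to whichever author’s reference sample has the closest function-word frequency profile; the two-author corpus is invented, and real stylometry needs thousands of words per author:

```python
from collections import Counter
import math

# A handful of "function words": style markers largely independent of topic.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "it", "was", "her"]

def profile(text):
    """Relative frequency of each function word in `text`."""
    words = text.lower().split()
    counts = Counter(words)
    total = max(len(words), 1)
    return [counts[w] / total for w in FUNCTION_WORDS]

def distance(p, q):
    """Euclidean distance between two frequency profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def attribute(unknown, corpus):
    """Attribute `unknown` to the author whose sample is stylistically closest."""
    target = profile(unknown)
    return min(corpus, key=lambda author: distance(target, profile(corpus[author])))

# Invented two-author corpus for illustration.
corpus = {
    "A": "the cat sat on the mat and the dog barked at the door",
    "B": "it was her dream and it was her hope that it was true",
}
print(attribute("it was her garden", corpus))  # prints "B"
```

Run the same machinery in reverse, generating rather than attributing, and the “simulated author” the paragraph above alludes to comes into view, along with the political questions it raises.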