Some Thoughts On Interactive Storytelling

Recently, Chris Crawford announced that he was ending his lifelong pursuit of interactive storytelling, a vision he had spent decades trying to kickstart in the hope of at least inspiring a new generation of developers to take it up. I had been passively following Crawford for some time, eagerly awaiting another chance to try out the Storytron engine, which the recent open source release has finally given me. The end of his pursuit saddened me a bit, in large part because he is ending his time working on interactive storytelling without even a fraction of the attention that his exit from the games industry brought over two and a half decades ago. However, what he said got me thinking and, while I have yet to read through all of his findings and thoughts on his vision, I have begun to form some opinions on the subject. I am approaching it from the perspective of a game designer and someone who critiqued games for a living for four years, rather than that of an interactive fiction writer, a viewpoint that I feel is important in large part because of Crawford’s struggles to make his more “gameplay”-heavy ideas fun.

Crawford’s vision is notoriously AI-driven. The way that I’ve interpreted it is that there’s a world of AI actors that can interact with and influence each other. The player interacts with those actors and they respond based on various traits/qualities/what-have-you, including what they may have heard about interactions that the player has had with other actors. Player choices matter, and their effects shape the narrative greatly. Perhaps most importantly, however, the narrative isn’t particularly set in stone. The author defines scenes and the narrative emerges from the way that the player interacts with those scenes – this distinction is important because it’s what separates the output from that of similar, simpler games. The overall gameplay is largely socially oriented, which makes sense, as Crawford has long held that games don’t challenge players’ social intelligence nearly enough. The way he’s gone about reaching this goal over the years has been increasingly choice-based and firmly rooted in a tradition of largely text-based systems.
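
To make that mental model concrete, here is roughly how I picture it in code: a minimal sketch in Python, built entirely on my own assumptions, with every class, field, and number invented for illustration rather than taken from Storytron’s actual data model.

```python
# A minimal sketch of how I read Crawford's model: actors with traits who react
# to the player directly and to secondhand reports ("gossip") about the player's
# dealings with other actors. All names are my own invention, not Storytron's.

from dataclasses import dataclass, field

@dataclass
class Actor:
    name: str
    traits: dict                                     # e.g. {"trust": 0.9}
    opinions: dict = field(default_factory=dict)     # subject name -> -1.0..1.0
    heard: list = field(default_factory=list)        # secondhand reports

    def witness(self, subject: str, deed: str, weight: float) -> None:
        """Shift this actor's opinion of `subject` based on a deed they saw."""
        trust = self.traits.get("trust", 0.5)
        self.opinions[subject] = self.opinions.get(subject, 0.0) + weight * trust

    def gossip_to(self, other: "Actor", subject: str, deed: str, weight: float) -> None:
        """Pass the report along; the listener discounts it as secondhand."""
        other.heard.append((self.name, subject, deed))
        other.witness(subject, deed, weight * 0.5)

alice = Actor("Alice", {"trust": 0.9})
bob = Actor("Bob", {"trust": 0.3})
alice.witness("Player", "kept a promise", +0.4)
alice.gossip_to(bob, "Player", "kept a promise", +0.4)
print(alice.opinions, bob.opinions)
```

The point of the sketch is the last line: Bob’s opinion of the player shifts without Bob ever witnessing anything firsthand, which is exactly the kind of secondhand social consequence that the vision hinges on.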

My immediate question is “how does this (incredibly complex concept) improve the player experience?” and my immediate answer is that it doesn’t. There are a number of reasons that it doesn’t, many of which are explicit design decisions that Crawford has made over the years and some that I gather just can’t be done at present due to the insane level of complexity that they imply. However, for the sake of this essay, I am going to distill them down to three key problems.

Problem 1: The choice to continue to use a text-based system

Before I proceed, I want to be clear: There’s nothing wrong with text-based systems. I simply don’t believe that they are the right choice for what Crawford wants to achieve. Because of the way that you have to portray space in a text-based system, any behavior dealing with movement or space has to be abstracted. This matters because movement matters in real life. Even something as seemingly innocuous as where you stand, when you stand there, and how long you stand there can drastically alter a person’s perception of you – not least in the context of the scenes that are likely to be created in a system like Storytron. Reason tells you that everyone participating in a conversation is within two feet of at least one other participant, but that may never be explicitly mentioned, in part because authors expect you to intuit such details from context. This alone railroads players into a specific way of perceiving the storyworld, one that ultimately shapes their interactions with the game.

This extends to explicitly socially damning actions, such as peering over a person’s shoulder to sneak a peek at what they are currently doing. Not only would it be difficult to implement every such action, and more difficult still to implement them in a way that wouldn’t make gameplay awkward, but important details are simply impossible to rationally define. Who saw you peer over that person’s shoulder? From a game design standpoint, the most obvious answer is “everyone in the room/scene,” but that would make any such action completely useless and isn’t a reasonable assumption. At any given point in time, the actors could be wrapped up in their own conversations or whatever else they are currently doing, but exactly where they are and where they’re looking is difficult to define in a text-based system, even if not impossible. For truly meaningful interaction, you then have to allow them the same actions as the player, which complicates matters further.
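
To show why I think this is a spatial problem rather than an authoring problem, here is a toy visibility check (entirely my own invention, not anything from Storytron) in which “who saw you?” falls out of position and facing instead of being hand-decided by the author for every scene:

```python
# A toy visibility check, my own invention: with even a crude spatial model,
# whether an actor witnessed your shoulder-peek can be computed from position
# and facing rather than authored per scene.

import math

def can_see(observer_pos, observer_facing_deg, target_pos, fov_deg=120, max_dist=6.0):
    """Is the target inside the observer's view cone and close enough to notice?"""
    dx = target_pos[0] - observer_pos[0]
    dy = target_pos[1] - observer_pos[1]
    if math.hypot(dx, dy) > max_dist:
        return False
    angle_to_target = math.degrees(math.atan2(dy, dx))
    delta = abs((angle_to_target - observer_facing_deg + 180) % 360 - 180)
    return delta <= fov_deg / 2

# Two bystanders in the same spot, one facing the player and one facing away:
print(can_see((0, 0), 0, (3, 0)))     # True: they were looking right at you
print(can_see((0, 0), 180, (3, 0)))   # False: back turned, they never noticed
```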

However, even if you were to create such a system to simulate AI movement, it would be odd not to allow the player to move as the AI do, yet it would be nearly impossible to implement player movement that both mirrored the AI’s capabilities and wasn’t incredibly clunky in a text-based interface.

But there’s a greater underlying problem here. In graphical games, you can use a few actions and phrases to sort of fake the notion that something completely new is happening. You can’t really do that in text-based games. Everything that happens has to be explicitly defined, which will inevitably lead to overlap. What’s more, that overlap is more damning than that seen in graphical games because the patterns are easier to see and, when it comes to actions, there’s generally no room for interpretation.

Problem 2: The choice to remain choice-based

It’s odd to me that, in a system that is idealized as having a narrative that is largely shaped by the player’s interaction with the world, player agency is heavily restricted by a choice-based system. However, beyond that apparent contradiction, the choice to incorporate choice presents a number of issues.

The first major issue is that players can only do what the author tells them they are able to do. Players could think of 20 different actions that could be taken at any given moment, but, if the developer only implements five, then they can only perform one of those five. Rather than responding to the narrative naturally, they are railroaded into the developer’s way of thinking, which, in my opinion, defeats the purpose of having the advanced AI in the first place. However, because of the way that text-based systems work, the process of adding each individual action is costly and, in a case such as this, should be tailored to each unique situation. It could also lead to tons of ultimately meaningless actions – the kinds of actions usually relegated to emotes in a graphical game – having their own unique text describing how they did nothing. Conversely, these actions may be significant in other contexts, so you can’t exclude them. Furthermore, you may be thinking “can’t I just include them situationally?” and my answer would be no. If an otherwise pointless action only appears in the situations where it matters, its appearance telegraphs that it matters; part of its value comes from it being available all along. You shouldn’t underestimate the value of an action suddenly having an effect. The moment that an otherwise pointless emote means something is a moment that could have a lasting impact on your player. It could be the moment that is burned into their brain as memorable.

It’s because of this that a choice-based system simply isn’t ideal for achieving this goal. To make any one choice truly meaningful, you would have to create too many choices with too many passages of text, many of which would only be of use contextually – and regardless of how you decided to go about it, be it stitching paragraphs together based on tons of stat checks or explicitly writing out each result, you’d still be writing a lot of text. That’s not to say that it couldn’t be done, but simply that the benefits of creating such a game probably wouldn’t be worth the time invested.

But the choice to remain choice-based presents another major problem: the abstraction of time. Because of that decision, actions are carried out in a turn-based manner. This means that, if the AI are going to interact, they can only interact in between your decisions. The decision you have to make is set in stone and won’t be altered by waiting. You can stay at that decision for eternity and it will still be exactly the same decision with exactly the same underlying stats.

Games like Telltale’s try to model a real-time system by incorporating a timer that essentially says “if you don’t answer before time is up, you will remain silent,” but it’s still not ideal. Truly challenging players’ social intelligence means that you have to allow them the opportunity to break their silence, which isn’t an option that is given very often. Even when it is given, it’s given on a follow-up question/statement, with the previous question/statement’s silence etched in eternity. The reality of conversation is that it’s not just the act of being silent that affects others, but how long you choose to remain silent. Details like this are important in a real social setting, but are abstracted away in a choice-based system.
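
If I were to sketch the detail that I think gets lost, it would look something like this (my own illustrative framing, not Telltale’s or Storytron’s actual logic), with the length of the pause, rather than just its existence, being what the other actor reads:

```python
# My own illustrative framing of the detail that gets abstracted away: not just
# whether the player stayed silent, but how long the silence lasted before they
# broke it.

def read_silence(seconds_quiet: float, patience: float = 6.0) -> str:
    """Map the length of a pause to how the other actor is likely to take it."""
    if seconds_quiet < 1.0:
        return "barely noticed"
    if seconds_quiet < patience / 2:
        return "a thoughtful pause"
    if seconds_quiet < patience:
        return "an awkward hesitation"
    return "a pointed, suspicious silence"

for pause in (0.5, 2.0, 5.0, 10.0):
    print(f"{pause:>4}s of silence reads as {read_silence(pause)}")
```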

Problem 3: An inability to describe how actions are carried out

In his essay “What is the Essence of Computing?”, Crawford emphasizes that actions and objects are equally important, seemingly arguing that games are far too object-driven. While that may be true, there is another issue that holds existing systems back when it comes to simulating social interactions: you can’t describe how you are carrying an action out.

Imagine that someone asks you to move a box, as some game might actually make you do in some sort of tutorial. You could do it cheerfully – after all, you’re helping them out and that makes you happy – or you could do it begrudgingly – after all, why can’t they just move it themselves? It’s this lack of definition that keeps actions from being meaningful. If you move the box cheerfully, it could make the actor who requested the help happy, but it could also make any others that are helping to move boxes frustrated with you, as you are cheery while they’re miserable. If you move the box begrudgingly, any others that are helping to move boxes could commiserate with you, whereas the actor who asked you to move the box might resent you for it and wonder why you didn’t just decline to help.

Distinctions like these are fairly important, but the details are left up to the author. You perform the action the way that the author envisions your nameless character performing it, which can be frustrating enough on its own when an action isn’t perceived the way that you expected it to be. Some authors attempt to fix this by defining two or three ways that you can perform an action, but you’re still stuck on their path, which is problematic in a setting filled with socially aware AI.

To be completely fair, the Storytron system does include the ability to define adverbs, but not really in the way that I’m describing. Storytron turns adverbs into a series of values, and those values are used to express additional options for, say, how hard you are attempting to pressure Afghanistan. It allows for the difference between a feeble and a forceful attempt at pressuring an actor, but it doesn’t – easily, at least – allow for distinctions between cheerfully, begrudgingly, angrily, or neutrally accepting an offer to help move a box. Due to the way that its algorithms work, the system is best suited to a scale, wherein the option that affects the numbers the least is the weakest version of a single form of augmentation and the option that affects the numbers the most is the strongest version. Parallel adverbs that affect the numbers to similar extents, but in different ways, aren’t really supported. As described on the Storytron Wiki, “Some Sentences may also have an Adverb. This is really just a Quantifier, describing the intensity of the Sentence in question.” Furthermore, you are still locked into the author’s way of thinking: each action has a set of adverbs assigned to it, and adverbs may be conditionally removed depending on the action. Even if it makes sense that you would be able to both begrudgingly accept an offer to move a box and begrudgingly look for food in the garage freezer because the fridge inside is empty, the author may not see it that way and may only allow you to begrudgingly accept the offer to move the box.
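
To make the distinction clearer, here are the two models side by side as I understand them, a rough sketch with invented names and numbers rather than Storytron’s actual algorithms:

```python
# The two models of adverbs as I understand the distinction, with invented names
# and numbers. The first treats the adverb as a single intensity value on one
# scale, per the wiki quote; the second treats each adverb as a distinct manner
# that pushes on a different mix of variables.

# 1) Adverb as quantifier: one number, one axis (feeble ... forceful).
def pressure(target: dict, intensity: float) -> None:
    target["fear"] = target.get("fear", 0.0) + intensity
    target["goodwill"] = target.get("goodwill", 0.0) - intensity * 0.5

# 2) Adverb as manner: parallel adverbs with qualitatively different effects.
MANNERS = {
    "cheerfully":   {"warmth": +0.3, "resentment":  0.0},
    "begrudgingly": {"warmth": -0.1, "resentment": +0.3},
    "angrily":      {"warmth": -0.3, "resentment": +0.5},
    "neutrally":    {"warmth":  0.0, "resentment":  0.0},
}

def accept_request(requester: dict, manner: str) -> None:
    for stat, delta in MANNERS[manner].items():
        requester[stat] = requester.get(stat, 0.0) + delta

boss = {}
accept_request(boss, "begrudgingly")
print(boss)   # the kind of acceptance changed the outcome, not just its strength
```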

The fairly obvious problem with any attempts to fix this is that it’s simply too complicated. Every single adverb that you add to the list of possible adverbs potentially multiplies the complexity. You have to write text for each and every adverb-verb pair relative to the nouns that they are being used with, which simply isn’t practical. Simply throwing more numbers at a system can’t exactly solve this either, as they still imply predefined responses to each action, even if they are on an actor-to-actor basis. As such, I’m not sure that there is a good solution to this problem just yet.

Perhaps a starting point would be a graphical game where players can opt to select moods – even the happy/sad/angry/neutral lineup would allow for a decent amount of player agency. The player’s chosen mood when performing actions could then affect the way that actors perceive them.
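
A bare-bones version of that idea might look something like this, with hypothetical names throughout and the mood acting as a standing state that shades every onlooker’s read of the player until it changes:

```python
# A bare-bones version of the mood idea, with hypothetical names: the player
# toggles a visible mood in a graphical game, and that standing mood shades how
# onlookers read every subsequent action until the player changes it.

MOODS = {"happy": +0.2, "neutral": 0.0, "sad": -0.1, "angry": -0.3}

class Player:
    def __init__(self) -> None:
        self.mood = "neutral"

    def set_mood(self, mood: str) -> None:
        self.mood = mood                  # a persistent UI toggle, not a per-action menu

def perceive(action_value: float, player: Player) -> float:
    """An onlooker's read of an action, shaded by the player's current mood."""
    return action_value + MOODS[player.mood]

p = Player()
print(perceive(0.5, p))                   # 0.5: helping while neutral
p.set_mood("angry")
print(perceive(0.5, p))                   # 0.2: the same help lands worse done angrily
```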

Personally, I feel that the field would be better suited to a system in which authors define actions and reactions, rather than interactions, a distinction that I may go over in more detail in a different post. In short, however, players should have a set of social actions that can be performed at any time and a set of adverbs that can be used to augment each of them. Each actor should have a default response to each action-adverb pair, both when it is directed at them and when it is directed at any other individual actor, something that could be defined algorithmically by that actor’s relationships with each of the others. They should then have exemptions to those rules – for example, Actor A may enjoy you being rude to Actor B on a normal day, but, if one of their relatives just died, such behavior would be socially unacceptable. These building blocks would then be used to algorithmically create an interaction, rather than having each interaction explicitly defined. It’s the difference between “you chose this action with this adverb, so it has this result” and “you chose this action while augmenting it with this adverb, so the computer chose this result.”
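
As a rough sketch of what I mean, with every name and number invented purely for illustration, the author would supply baseline reaction rules and exemptions, and the program would assemble the interaction from them:

```python
# A sketch of the action/reaction split described above, with every name and
# number invented for illustration: the author supplies baseline reaction rules
# plus exemptions, and the program assembles the interaction from them.

from dataclasses import dataclass, field

@dataclass
class Actor:
    name: str
    relationships: dict                              # other actor's name -> -1.0..1.0
    exemptions: list = field(default_factory=list)   # callables that may override

    def react(self, source: str, action: str, adverb: str, target: str) -> float:
        """Default rule: my reaction scales with how I feel about the target."""
        base = {"compliment": +0.4, "insult": -0.6}[action]
        manner = {"sincerely": 1.0, "sarcastically": -0.5, "begrudgingly": 0.5}[adverb]
        score = base * manner * self.relationships.get(target, 0.0)
        for exemption in self.exemptions:            # author-defined exceptions
            override = exemption(self, source, action, adverb, target)
            if override is not None:
                score = override
        return score

def grieving(actor, source, action, adverb, target):
    """Exemption: while grieving, any insult is unacceptable, whoever it targets."""
    return -1.0 if action == "insult" else None

actor_a = Actor("Actor A", {"Actor B": -0.7}, exemptions=[grieving])
# On a normal day Actor A would enjoy you insulting Actor B, but not today:
print(actor_a.react("Player", "insult", "sincerely", "Actor B"))   # -1.0
```

The grieving exemption is the Actor A example from above: the default rule says that A would enjoy the insult, but the exemption overrides that result for as long as the circumstance applies, which is exactly the kind of outcome that I would rather have computed than authored line by line.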

Interactive storytelling, as defined by Crawford, is a complex topic, a goal that is seemingly out of reach at present, which is likely why many have opted to keep working on more traditional efforts. There are many considerations to be made, many of which I’m not entirely sure have been made yet, as Crawford seems to be devoted to the singular vision of interactive storytelling that he has chased for years. The thing is that, judging by many of his blog posts and by player comments about his games, I don’t think that this vision is conducive to creating something that’s fun. It’s a vision so entirely devoted to the systems that it almost seems to have forgotten how the player interacts with said systems.

Note: To that end, I’d love to see Storytron’s Storyteller client remade in something like Electron with a completely new, more modern interface.

Now, that’s not to say that I don’t like the work that he’s done over the years – in fact, I rather like the Storytron system, even if actually building something with it is insanely difficult. Rather, I simply don’t believe that the Storytron system was the best step towards his definition of interactive storytelling. I believe that, for the reasons outlined above, a graphical system is better suited to the sort of social awareness required for such a simulation.

While some of this may seem nitpicky or unreasonable, think about it. The only way to truly challenge players’ social intelligence is to semi-accurately model social interactions, even if the interactions are ultimately abstracted and simplified. That simply cannot be done when so much about your interactions is left up to the author. In the system that Crawford has created, the author defines the abstraction of space. The author defines how actions are carried out. The author defines how actions are perceived on an actor-to-actor basis, rather than defining the context in which certain actions should be perceived a certain way and letting the computer decide the actual perception in each individual situation.

But I guess that some of that comes down to confusion about the ultimate goal. Is the goal to create algorithm-driven literature or is it to create emergent stories that challenge players’ social intelligence with a backing social simulation? Does either really need fun gameplay or, despite his dramatically exiting the games industry in 1992, does Crawford believe that gamers are the only ones that he can market his creations to? If the goal is algorithm-driven literature, does it even need all of the complexities that Storytron has? Furthermore, if the goal is simply algorithm-driven literature, how does it challenge players’ social intelligence any more than socially-inclined games that already exist? These are questions that whoever takes up the reins next perhaps needs to answer before progress can be made in the field.

I decided to ask Crawford the first question, as well as whether order of events was important. He rather cryptically responded that he was aiming for “interactive storytelling,” rather than an “interactive story” and proceeded to say that, in this vision of interactive storytelling, there was no order of events, as such. He also laced the relatively short response with multiple insinuations that the difference was hard to grasp.

Regardless of the actual difficulty of grasping the concept, the question we have to ask ourselves is, again, how it affects the user experience. Does this improve it? Will the user be able to tell the difference? Does this actually advance the field in a way that isn’t strictly academic, as the many closed source academic attempts at such a thing have been? My own reasoning would tell me that the answer to all of these questions is no, that such complexities would be nice, but are ultimately unnecessary.

At the point where there is no longer an order of events, where the experience is so personalized that the order of events is driven by how the computer interprets your actions, there are really only two ways that the writing can be handled. Either it is stripped of most of its dramatic implications, because it has to be able to happen in any order, or the computer itself has to generate it, which Crawford himself says removes the “soul” and artistic vision of the work. Regardless of which method you choose, it is still a long and arduous process, one that is incredibly messy and prone to error. It’s a process that, if you get it right, probably shouldn’t be noticed by the player, but may not actually provide a better experience than that of a more traditional story. However, as mentioned above, I believe that a graphical approach would be more conducive to such a goal, as the details are explicitly defined by visuals and changes to those visuals don’t need to be explicitly written out, leaving only the speech to be written by the author.

In gaming, the kind of independent AI interactions described as important to interactive storytelling has been attempted in simpler forms many times, and it rarely pans out. Oftentimes, it devolves into the player carrying out an action that will be viewed negatively, such as thievery; getting caught by a single AI; and everyone in the vicinity instantly knowing that you are, for example, a thief. Similarly, several wargames over the years have tried to model chain of command and the time it takes to transmit orders in various eras of warfare, but that usually comes down to timers that seem reasonably realistic in systems that already provide layer upon layer of abstraction, rather than any sort of actual AI interaction. More serious attempts have been made, but I’ve never seen one turn out right. Every attempt that I’ve seen to create independent AI that you can interact with directly has either provided no visible benefit, because the models were too shallow to actually provide one, or created an awkward user experience. Now, that’s not to say that it can’t be done, but simply that it hasn’t been done yet – not even in a traditional game.

But the question that I keep asking myself is why go to such lengths at all? Literature is dramatically interesting because it was written that way. Similarly, can’t you use a quality-based system like StoryNexus or Varytale to make a story that is dramatically interesting, but also provides enough variation to allow the player to believe that they have some agency or, at the very least, the same amount of agency that they would believe was provided by a system like Storytron?

Therein seems to lie my greatest problem with Chris Crawford’s vision of interactive storytelling and why I think that the overall vision may need a reboot. Crawford has been chasing down the same goal since he left the games industry in the 90s. He’s been using the same games that he first made in the 80s as proofs of concept for his new software. Using that mindset will inherently lead to results that are less than satisfactory and, ultimately, I think that it’s led to a situation where several parallel goals have been wrapped into a single goal that is more cryptic than any of them alone. Is it important to challenge players’ social intelligence? Is it important to have dramatically interesting choices? Is it important that the narrative is driven by the way that the computer interprets your responses? Any one of these goals could be achieved independently, perhaps even in existing software, but the problem seems to be that he’s been so laser-focused on achieving them all simultaneously that he’s failed to achieve any of them in a way that lives up to his standards, which may in large part be because the technology that he’s using has barely changed since the days of Erasmatron.

We need a new take, a new approach, one that isn’t so firmly rooted in the traditions of literature – because, let’s face it, it’s not important that storytelling be done with words, a point that Crawford seems to have missed despite viewing interactive storytelling as analogous to cinema. In Crawford’s own words, we aren’t using computers to their full potential, so let’s start doing that by taking this field into the graphical realm that we all know they’re capable of.
