The Facts of IF – A Model of IF Comp Scoring

I’ve come up with a predictive model for Comp scores.  It’s accurate to well within my margin of error, except for two games, and these I can account for.

It’s based on three variables, namely Immersion, Puzzle Design, and Playability.  The coefficients for these, respectively, are:  .6, .2, and .2.  If you’re good at linear algebra and you want to out-face me on the variable coefficients (or indeed on variable selection), email me and I’ll send you the data set.

The model is:

 60% (Immersion) + 20% (Puzzle Design) + 20% (Playability)
= Comp Score
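In code, the model is nothing but a weighted sum of three 1-10 quality ratings.  Here is a minimal Python sketch; the example game's ratings are invented for illustration:

```python
def predict_comp_score(immersion, puzzle_design, playability):
    """Predicted Comp score: a weighted sum of three 1-10 quality ratings."""
    return 0.6 * immersion + 0.2 * puzzle_design + 0.2 * playability

# A hypothetical game rated 8 on immersion, 6 on puzzle design, 7 on playability:
print(round(predict_comp_score(8, 6, 7), 2))  # prints 7.4
```

The boosts and penalties discussed below are then just residuals: a game's actual Comp score minus this prediction.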

Now, as I say, this *didn’t* work for two games.  Those are _Rover’s Day Out_, which picks up +1.09 somewhere, and _Earl Grey_, which gains a mysterious +1.39 boost.  But I’m happy with that, because these are both games that many reviewers considered surprisingly creative, and creativity isn’t something I measured.

But, this could be flat wrong:  _Byzantine Perspective_, which was remarkably creative, suffers a -0.68 penalty from somewhere.  That’s within my margin of error, which is about 1 Comp point, but I still find it puzzling.  (In fact, I find _BP_’s score puzzling:  I’d expected it to do much better.)

It may be that _Rover_ was creative in ways that dazzled people, while _Byzantine_ was creative in ways that confused them.

I’ll be working through the Comp games according to this model, showing how they were rated by quality and how the model predicts those qualities would have shaped their Comp scores.  But first, let’s talk about the model itself.

METHODOLOGY

I asked a bunch of people a bunch of questions which were designed to use subtle psychological gimmicks to ascertain how they would rate the game according to certain qualities.  For example, to get them to tell me how they would rate a game according to “Immersiveness,” I designed a question that asked:

Immersiveness.  How much did you get into the game?

and in this way I ranked each game on a 1-10 scale for ten qualities.

Once the Comp ended, I closed the survey, collected the final data, and (just a few hours later) downloaded the final scores and compared them to the survey data.  What I was interested in was how strongly the different qualities I’d asked about predicted the final Comp scores.

As you probably know, the math you use to answer a question like this, assuming you’re dealing with linear influences, is the Pearson product-moment correlation, or:

 r = Σ(xᵢ − x̄)(yᵢ − ȳ) / √( Σ(xᵢ − x̄)² · Σ(yᵢ − ȳ)² )

which can take a while to calculate, so of course you can make your life easier by doing it this way:

 r = ( nΣxᵢyᵢ − ΣxᵢΣyᵢ ) / √( (nΣxᵢ² − (Σxᵢ)²)(nΣyᵢ² − (Σyᵢ)²) )

which is exactly the same.  But in my case, I was using the OpenOffice spreadsheet, so I did it like this:

 =CORREL(range1; range2)

which, when I applied it to the data, told me some interesting things.
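If you’re not working in a spreadsheet, the Pearson computation translates directly into a few lines of Python.  A self-contained sketch, with invented ratings standing in for the survey data:

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

immersion_ratings = [8, 5, 9, 3, 7]  # hypothetical survey answers
final_scores = [7, 5, 9, 4, 6]       # hypothetical Comp scores
print(round(pearson(immersion_ratings, final_scores), 2))  # prints 0.95
```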

CONCLUSIONS

Primarily, what Pearson’s magic formula told me when I used it to hook together my survey and the Comp scores was that Comp games are rated on immersion.  Emily Short has said that she rates games based on whether they have worthy ambitions, and whether they achieve those ambitions; which I understand to mean that she rates puzzle games based on whether they are good puzzle games, and so on.

But that is certainly *not* how most Comp players rate games.  Comp players rate according to their own standards, and they give games only very limited leeway to redefine those standards.  Players may have a great deal of respect for games with knock-out puzzles, but their final ratings do not reflect that respect.  At most, they nudge their immersion-based score up or down for other factors like puzzle design.

Immersion is the quality that correlated most strongly with the Comp scores, at .95 — on a scale where 1 is a perfect positive correlation, 0 is no relationship at all, and -1 is a perfect inverse relationship.  The only quality that ranks equally highly is ‘writing,’ which also hits .95.  Immersiveness and Writing correlate with each other at .94, and Immersiveness and Story correlate at .96.  Story correlates with the final score at .92.

Since these seemed to be measuring very similar things, I could only use one of them in the predictive model.  I picked Immersiveness, on the theory that Writing was a statement about the game, while Immersiveness was a statement about the player’s response to the game.  So, you create immersion through writing — at least, through .95 of your writing.

Comparing Immersiveness to the other qualities, I found two that correlated poorly to it, and with each other, while correlating respectably well to the final score.  Therefore, I reasoned, these were my best candidates for independent variables.  Those qualities are Puzzle Design, which correlates to the score at .85, and Playability, which correlates at .79.  I’ve tied these together in the formula I’ve given above.

This is the full list, showing how the various qualities measured by the survey stack up:

Correlation to the Final Score
Writing:  .95
Immersion:  .95
Story:  .92
PC Characterization:  .88
Puzzle Design:  .85
NPC Characterization:  .83
Emotional Meaningfulness:  .82
Playability:  .79
Agency:  .74
Game Flow:  .53
Noise:  .15

 

That last one bears some explanation.  The more numbers you collect — the more survey completions, the more Comp judges — the more clearly your mathematical lenses can focus the statistical image:  the more precisely you can tell when sets overlap and when they don’t, and so on.

So, for demonstration purposes, I created some random numbers and treated them as data.  As you can see, the random numbers correlate with the Comp scores at .15.  If we had an infinite number of games this year, we could expect Noise to correlate to the scores at 0, or thereabouts.

That tells us how blurry our picture is:  as correlations get within .15 of each other, we must become increasingly uncertain of which really correlates more strongly.  We can be very confident that Immersiveness is more important to people than Playability, since those correlations sit .16 apart.  But when we say Immersiveness is more important than Puzzle Design, or that Puzzle Design is more important than Playability, we’re just playing the odds.

RECOMMENDATIONS

Now, we can compare this prioritization with a picture of the average Comp game, to see if we as an IF authoring culture are spending our resources — our development time, our thinkums, our beta-testing hours, and so on — efficiently.  And we can see that, on average, we’re not.

We over-refine our games’ playability and game flow (the Bos-Osam effect).  Game flow means not getting ‘stuck,’ and it correlates with the final score only at .53.  Apparently players are willing to get stuck on a game if it’s for a good reason — if the game is hard; if they’re into it.  If you’ve been following this blog, you’ll notice that this is at odds with something I said before:

Reading the Comp 09 reviews, I argued earlier that puzzles were a bad investment of time and energy, because they tend to get players stuck.  But now it seems that’s not exactly the issue.  On average, we’re putting an appropriate amount of energy into puzzle design for the return:  but we’re spending *too much* energy on playability, and especially on game flow.  These things are important, but not so important that we should neglect immersion — writing and story — as much as we have been.

We tend to skimp on agency, and that’s fine — agency is expensive, and it’s not important to players that they’re given much.  However, they must be given some, and we’ll talk about what that means later.  Worthwhile investments, on average, would be in NPC characterization and in developing emotional meaningfulness.  Both of these correlate pretty well with immersion, and currently you don’t need to do much to stand out.

So, that does it for the broad overview of what the IF Comp scores and my recent survey tell us about how Comp games get the scores they get.  In the coming weeks, I’ll be writing about this statistical model in more detail, and taking a fresh look at several of the Comp games, to armchair-general them in ways that will hopefully help us all understand IF writing better.

More coming.

Published in: on November 17, 2009 at 1:01 pm  Comments (9)  

The URI to TrackBack this entry is: https://onewetsneaker.wordpress.com/2009/11/17/the-facts-of-if-a-model-of-if-comp-scoring/trackback/


9 Comments

  1. I really don’t want to spoil your fun, but it seems to me that your logic is somewhat flawed. By asking about “immersion”, you’re basically asking “How much did you like the game?” — that this should correlate strongly with the final scores is not really surprising.

    Granted, the question was phrased differently, but in the end I can’t think of an instance of answering “How much did you get into the game?” with a very high score and then *disliking* the game.

    It’s a common mistake when you design a survey; you need to make sure that your terms are well-defined, do not overlap and are neutral, otherwise you’re asking something, but not gaining any new information, merely rephrasing the answers. “Immersion” is, IMHO, too vague and ill-defined — it’s an emotional response that subsumes a lot of other factors.

  2. By asking about “immersion”, you’re basically asking “How much did you like the game?”

    Correct. Almost correct, anyway. I’m rating their response to the game-as-narrative. There are cases where immersiveness does not alone predict a game’s final score accurately.

    This result is important because people have had various opinions about the importance of immersion to game value. Those disputes, to the extent one accepts our metric of Comp score = game value, have now been settled. The follow-up question is, “What fosters a strong sense of immersion?” — which will be addressed also.

  3. I wonder if people are more willing to get stuck in Comp games because most of the games come with walkthroughs.

    The point about the expense of agency is an interesting one, and one that maybe goes to what the first comment said about immersion; there are some factors that may have a large impact on how much someone likes the game but that may also be hard to implement. Ideally you want to look for factors that are relatively easy to improve and have a relatively large impact on scoring. (“Running spell-check” is perhaps one such factor; presumably it falls under “writing.”)

    By the way, this post is by Matt Weiner; the comment immediately above is from another person named Matt W. (the Grounded in Space author, perhaps?) I assume the icon next to our posts will distinguish us.

  4. Just to give you a hard time…

    “I picked Immersiveness, on the theory that Writing was a statement about the game, while Immersiveness was a statement about the player’s response to the game. So, you create immersion through writing — at least, through .95 of your writing.”

    Doubtless a minor quibble, but I do believe ‘immersiveness’ is a statement about the game — specifically, how well the game _tends_ to produce a response in the player (vs. ‘immersion’, which states the magnitude of the response). Admittedly we will judge ‘immersiveness’ on the basis of our own responses; but we can often tell if we’re biased. Whether people actually vote according to this is less clear, but if the goal is to write better IF, the tendency is probably more important.

    To defend the implicit premise: a story (or IF) is composed of a number of elements like ‘plot’ and ‘character’ and ‘writing’ and so forth; but these are designed to produce a mental image (or world-model, for IF), which in turn is designed to affect us somehow, if only by being perceived (as beautiful, for instance). So the technical qualities, like ‘writing’, are only means to forming this mental model, which is only a means to producing some effect (sorry, best argument I could think of right now). So in trying to answer ‘what are stories (or IF) for’ we will prefer a measure of the effect itself, rather than a measure of the means used to produce that effect. Hence ‘immersiveness’ over ‘writing’.

    But…

    “This result is important because people have had various opinions about the importance of immersion to game value.”

    I wonder: these statistics tend to correlate immersion with game value, but don’t specify how. Immersion might not be the reason for game value, but a consequence: we are immersed _because_ we find something of value. But maybe you’ll discuss this in your next post.

  5. By the way, though it is not obvious on the main Comp site, “Byzantine Perspective” tied for third place (with “Broken Legs”) in the authors-only voting.

    I think it is a little uncharitable to assume that people who voted low for “Byzantine Perspective” were necessarily confused by it. I did see a handful of “this is confusing and therefore terrible” reviews, but many were more like, “this was clever, but way too short to count as a Real Comp Game.” (In fact, the only statistic I’ve personally been keeping is a tally of the times the phrase “one-trick pony” appeared. It comes to, at present, five times out of thirty-one reviews.) I’m not too deeply offended by this evaluation, since several of my favorite games are clever but quite short, like Suveh Nux. I was mildly confused, since I thought the entire point of the Comp was to showcase short games; some of my testers did take the full two hours, and I certainly didn’t want anyone to run out of time without finishing. It seems like the current standard for judging involves a lot more peeking at the walkthrough, so I’ll keep that in mind in the future.

  6. Just for the record; the first comment is not by me, the Grounded in Space author. I guess we now have 3 Matt Ws in the IF community now…

  7. Yeesh. Note to self: proofread before hitting submit.

    “I guess we now have 3 Matt Ws in the IF community…”

  8. Hi, all! I need to be brief today, which goes against everything I stand for.

    Newbot,

    I suppose it shouldn’t really matter if we consider a property of a game to be “in” the game, the person, or indeed the survey. The goal is simply to figure out what we need to do as authors to write a good game, and we’ll measure how good games are with Comp scores (until we find something better).

    “I wonder: these statistics tend to correlate immersion with game value, but don’t specify how. Immersion might not be the reason for game value, but a consequence…”

    Well, really it doesn’t matter. If we had a property we called “sporkiness,” which correlated in an interesting way with a game’s Comp score, and we could get a clear idea of what kinds of games had a lot of sporkiness and which had little, then we ought to try writing a lot of sporkiness into our games.

    We’re just trying to get specific enough feedback that we can tell what a game needs more of, or less of, to score highly.

    Lea,

    ‘I think it is a little uncharitable to assume that people who voted low for “Byzantine Perspective” were necessarily confused by it. I did see a handful of “this is confusing and therefore terrible” reviews, but many were more like, “this was clever, but way too short to count as a Real Comp Game.”’

    It didn’t occur to me until a few days ago to measure game length. I can’t really do it after the fact. I have a sense of how long each game took me, but I’d have to poll people to get a valid metric.

    But you’re right; that may be a factor influencing score.

    It may not be accurate to say that everyone who rated _Byzantine Perspective_ low did so because they couldn’t solve the puzzle. But it seems likely that was a contributing factor.

    But in fact, it seems even people who *did* solve the puzzle didn’t rate it as highly as I personally would have liked to see. From the game’s profile and the statistical model that’s shaken out of my survey, it seems that’s because the game was too lacking in narrative. It needed a story; all the better if that story had emotional or thematic resonance with the idea of perspective.

    All Matt W.’s,

    Could I ask you to sign your last names, or to come up with handles? You’re killing me here.

  9. I pretty much always use either my full name (like I do here) or “mwigdahl”.

