Facts of IF: Why Did Byzantine Perspective Do So Poorly?

In general, I don’t much like puzzle IF.  Nevertheless, I am a big fan of this game.  (I also appreciated the author’s implementing ‘nab.’)  This puzzle is so nifty, so creative, and so cool that I was really looking forward to it placing well in the Comp.  Instead, it got a 5.76, putting it in ninth place out of 24:  not even in the top third.

Partly that’s because we had a very good Comp this year.  Even so:  as an end product, it was a better work of IF than the higher-scoring _Earl Grey_, and as an inspired work it was much better than _Snow Quest_ — all due respect to those authors.

Really, it should have done better.  Why didn’t it?

If we take a look at its profile, this is what we see:


People had *enormous* respect for it as a puzzle game.  They considered it highly playable.  Its game flow rating is surprisingly high for a game that’s so challenging.  But nevertheless, people rated it almost exactly at its Immersiveness level.

Why don’t Comp judges reward excellent puzzle design like _Byzantine Perspective_?

–I don’t know why, actually.  Wrote myself into a corner there.  But they just don’t.  Puzzle design has a lower correlation to a game’s comp score (.85) than does immersiveness.  _Byzantine Perspective_ was considerably more immersive than we would expect from its story or emotion rating.  It even has a higher immersiveness than its PC characterization rating, which is unusual.
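For readers who want to see what a correlation figure like the one above measures, here is a minimal sketch of a Pearson correlation between a rating category and the Comp score. The function name `pearson` and every number in the lists are made up for illustration; they are not the survey data, and I am assuming, not confirming, that the figures in this post are ordinary Pearson coefficients.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations, and the spread of each list around its own mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# HYPOTHETICAL per-game averages, one entry per game (not real Comp data).
comp_score = [7.4, 6.8, 6.1, 5.8, 5.0, 4.2]
puzzle = [6.9, 7.5, 5.2, 8.1, 4.8, 4.9]
immersion = [7.6, 6.9, 6.3, 5.7, 5.2, 4.0]

print("puzzle vs. comp score:   ", round(pearson(puzzle, comp_score), 2))
print("immersion vs. comp score:", round(pearson(immersion, comp_score), 2))
```

In this toy data one game has a very high puzzle rating but a middling score, which is enough to pull the puzzle correlation below the immersion one. That is the shape of the situation described here, not a reproduction of it.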

Doubtless this is a rare case of a game’s immersiveness rating being largely formed by interesting puzzle design.  But this is not an efficient way to get immersion, and we can see from the game’s profile that it was so lacking in story and character that this hurt it badly in the eyes of the judges.

From a private email or two with the author, I gather that she had a single idea for a puzzle, and wanted the game to purely be about that.  But we can predict from the math that her game would have done far better if she included even a minimally interactive storyline.

This story could be basic:  We have an introductory scene where the PC is talking to her cat-burglar friend about her financial woes.  She gets the idea of turning to a life of crime; her friend tries to dissuade her; but he gives in in the end.  After the puzzle scene, we can have her return victoriously to her cat-burglar friend, who promises to set her up with a fence.

Even this minimal amount of story — two brief scenes — would have raised the score into the high 6’s or low 7’s.

Alternatively, the story could have been more dramatic:  the cat-burglar is a kind of fallen father figure, whose lifestyle our heroine disapproves of.  But he’s suddenly taken ill and needs money to get a cure (having spent his ill-gotten gains on a life of luxury, or having had it confiscated when he got caught).  Now the PC must go against her values in order to save her friend, who unfortunately is not well enough to explain the burglarizing devices to her.

Or, we could have had a betrayal-trust subplot, perhaps involving a lackey who might go to the police, or be a cop, or steal the goods from her, and who the PC could be romantically entangled with.

Any of these would have added a potentially interesting NPC, given the player something to respond to emotionally, and provided a story.  None of them are terribly creative (they’re different mixtures of Audrey Hepburn and Sean Connery movies; doubtless the author could do better), but any of them would have posed the PC’s problem narratively, shown the working-out of that problem interactively, and then allowed the player to bask in the reward of having successfully solved that problem, all in-game.

If our model is correct, adding a narrative would have done better than other ways of modifying the game:  for example, the addition of a night-time security guard who must be avoided.  That would have made the game more lively, but probably would not have fed the judges’ hunger for story, which appears to be the greatest determiner of immersion.

Finally, the problem with puzzle-heavy games is that puzzles only seem to add value when the player solves them on his own.  That means the author could have probably boosted the game’s score simply by cluing the main puzzle a little better, thus “catching” a lot of the judges who couldn’t solve the game.  For example, there is a button on the goggles which the player should find.  He will only know to look for it if he reads the paper or if he feels the goggles.  He wouldn’t normally know how to read the paper without first knowing how the goggles work, and he has no reason to feel the goggles either.

The simplest solution would be to prompt the player to feel the goggles if he tries to examine them.  The problem with this approach, as with adding narrative, is that it compromises the game’s ability to misdirect the player.  You will recall that, when you started the game, the direction that had an exit was one of the two directions you could not move.  Nevertheless, the puzzle here is *so* strong that I think the game could lose that misdirection, simply landing the player in a dark room until the goggles are put on, without compromising the game’s basic coolness.

So, the lesson we have to learn from this is that Comp judges do *not* respond well to puzzle games:  even immaculately-programmed, well-written and highly playable puzzle games will be scored far lower than they deserve on the strength of their puzzles, if they are not accompanied by a minimally engaging story.

Published on November 18, 2009 at 6:34 am


11 Comments

  1. I think there’s another issue you’re not counting here: people didn’t trust the game. Especially those who were playing it early on, before many reviews had come out, and those who had done their best to avoid seeing any reviews, reacted to the opening sequence of the game as displaying extreme bugginess, rather than a clever bit of misdirection. And little is quite as fatal to immersion as the feeling that the game itself is broken and that you can’t trust the author.

    I’m not sure what to suggest to fix this. It’s not something that’s wrong with the nature of the game, but with the context in which it appears — which is one reason I expect it will be more warmly received outside the competition, when prospective players may come upon it already knowing/assured that it’s not one of those bottom-of-the-barrel-scraping comp disasters.

  2. I didn’t rate the game that highly and thought its position was fair (I’m pretty disappointed that ROVER won, but it certainly deserved to come above this, for instance). Good puzzle, yes, but there was basically nothing else in the game – the writing was at best okay, generally the whole thing was sparse, etc. That doesn’t take away from the game, it’s a perfectly decent implementation of an interesting puzzle (so kudos to the author), but I don’t think that alone deserves a ranking above 6 say. Also, the game was only about twenty minutes long, with the first fifteen being spent figuring out what was going on. :)

    I agree that an introductory scene would have helped. Adding more sugar in this way (including a nicely-written story part, plus some ‘normal’ interaction before you get to the museum, which could also include clues for the fancy goggles) would also have helped with the trust issues; if players can tell they’re playing a high-quality product which generally works as expected, then they are going to assume that something weird which then happens is intended. If you’re dropped right into it, maybe you might just put it down as buggy. (The title should’ve given a hint, but…)

  3. Emily,

    That may be. Do you think that trust has an influence on game value other than through immersion?

    It seems to me that poor trust can break immersion, but high trust doesn’t add much.


    Thanks for the report on how you experienced the game. I was especially interested in the score for _Byzantine Perspective_ because of its profile: would judges rate it strictly on its puzzle, or on some kind of averaging?

    It seemed we might see a judging function where people went with the peak value — if something was a good puzzle game, they’d respond to that; if a good narrative, they’d respond to that. But _BP_ (and the other games, which we’ll get to) seems to show pretty clearly that judges like to take everything into account.

  4. This is interesting, because I had basically the opposite reaction to everyone else; I thought that adding more frame material would have distracted from the main puzzle and in fact made it harder for me to solve. Indeed, I thought there was one object too many in the game. (Though I agree that more descriptions of the museum would’ve been nice.) I guess I think that one thing IF is good at is delivering a puzzle like this, and it’s not necessarily good to lard such a puzzle up with other things. I had sort of the opposite reaction to Grounded in Space, where I wanted to try out the central puzzle but found myself getting frustrated by the frame material. [Admittedly it was unfair to GiS to play it this way, because the YA-Heinlein story obviously was important to the author, and I was playing it pretty hastily at the end of the comp.]

    I wonder if I feel this way partly because I play a lot of graphical escape-the-room games, which generally have little to no plot (and if they do have a plot, it’s often in Japanese), and also aren’t usually supposed to take more than fifteen minutes. It certainly helped that I’d read enough of Emily’s review to know that the game looked buggy but wasn’t (though it also meant I spent a little while wondering if the Parchment illegal object bug was supposed to be significant).

  5. Matt,

    We all have our own taste. I’m finding the judges’ cumulative decisions quite often at odds with mine, too: but that’s why this is useful. Because I can’t rely on my personal tastes to steer me true when it comes to popular opinion.

  6. Part of it was that it felt like (and was) a gimmick game. There’s nothing intrinsically wrong with that, and I might have fun with a gimmick. But once you’ve uncovered the gimmick, you’re pretty much done.

    One of the issues for me was that there’s really only two scenarios of character knowledge here: either the character knows how the goggles work, and that knowledge is withheld from the player *deliberately* even though ostensibly the character is you, or the character doesn’t know either, but doesn’t show any signs of cottoning to the weirdness that is seeing an exit to the south and then running into a wall getting there. This isn’t an unusual set-up, but I find it irritating. (It’s not quite as bad as playing someone who has a very clever plan whose steps you must replicate without actually knowing what the plan is, but it’s in the same genre.)

    I didn’t finish BP, but I didn’t understand, even after I found out how the goggles worked, why you’d need them. Why not simple night-vision goggles and a crawl through some ventilation shafts instead? Was security *so tight* that this was the only way to get in? It didn’t *feel* tight. Maybe this was clear later, but I didn’t get it at the time. So even after I was told to trust the game, I didn’t trust the game.

    What would have made the game stronger would have been doing what the author didn’t want to do – taking the gimmick and making it into something bigger, something that you could use within a story, rather than forcing the story around the puzzle. I absolutely get why the author might not want to do that, and authors should write what they want. On the other hand, it wasn’t a game I enjoyed much.

  7. I should have said “stronger for me, personally”. That wasn’t meant to be a sweeping declaration of universal potential value or anything.

  8. “It seems to me that poor trust can break immersion, but high trust doesn’t add much.”

    For me it makes a world of difference. If I trust a game, I’m likely to spend more time working on puzzles before going to the walkthrough; suspend judgment on perceived flaws; and mull longer over the possible meaning of the plot and ending(s). I don’t know whether those are all things that fall under your immersion category or not.

  9. [Matt Weiner here. Unfortunately, since I’m logged into wordpress, I have no choice but to post as “matt w.”]

    It’s interesting to look at the Jayisgames commenters for this game, who I expect are generally IF newbies who are pretty likely to have played a lot of escape-the-room games like me. Most of them seem pretty confused; one clearly doesn’t know about the “inventory” command.

  10. One question that interests me with judging, maybe something you might address in a future post if you have any ideas about it, is: do judges try to be fair or just completely subjective? Are there two camps or is it just a continuum somewhere in the middle?

    (For example, people who don’t like military SF scoring Duel of the Ages, or people who really can’t stand the stage-kid theme of Broken Legs even as comedy, etc.)

    If most judges do not try to be fair, then issues such as the theme of the game could become very important in scoring (pick an unpopular theme and even if your game is great, you’re screwed). On the other hand, if most judges do try to be fair, then games like this one (Byzantine Perspective) might be disadvantaged because even if people really like the clever puzzle, being ‘fair’ means they might not give it an 8 or 9 because compared to other games, it wasn’t as impressive an effort overall.

    (By the way, certainly nothing says judges have to be fair. I’m not using this as a derogatory statement.)

  11. On reflection, I think a puzzle-heavy game that players don’t trust will be seen as broken. So its profile would probably be more like that of a broken game — low everything.

    _BP_ really pushes the boundaries in this way: it doesn’t establish much trust, and it’s heavy with the misdirection. I suspect some players considered it broken for that reason.

    Probably, my sample selection is biased toward experienced game-players. Newbies to IF would be more likely to have heard of the Comp than of my survey, and more likely to consider the game broken or otherwise impossible, so that might well account for the discrepancy between the predicted score and the real score.
