Designing an automated tester for ChoiceScript

I’m currently having a go at writing a Python-based tool to automatically test ChoiceScript code. It started out as a good excuse to get back into using Python, but now that the tool is taking shape, I’m keen to try and release something that is actually useful to the community. So I have a few questions I wanted to pose in order to help me on my way.

A quick overview of what I’m doing and hope to be able to achieve:

  • Checks all your defined variables (creates and temps) and flags any that you haven’t called in your code
  • Checks all called variables and flags any that haven’t been defined in the code (or where the temp definition is potentially after you have called it)
  • Checks all your indents to see if any are not a multiple of 4 - hopefully also finding cases where you fall out of commands incorrectly (we’ll see)
  • Validates all your choice blocks
  • Provides stats and metrics on ‘runs’ of the game
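As a rough illustration of the indent check in the list above, here is a minimal Python sketch. It assumes the game file has already been read into a list of lines; the function name `check_indents` and the `unit` parameter are made up for the example and are not part of any existing tool.

```python
def check_indents(lines, unit=4):
    """Return (line_number, message) pairs for suspicious indentation."""
    issues = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # blank lines carry no indentation meaning
        indent = line[: len(line) - len(line.lstrip())]
        if " " in indent and "\t" in indent:
            issues.append((number, "mixes tabs and spaces"))
        elif "\t" not in indent and len(indent) % unit != 0:
            issues.append(
                (number, f"indent of {len(indent)} is not a multiple of {unit}")
            )
    return issues


sample = [
    "*choice",
    "    #Left",
    "        You go left",
    "      *goto left",  # six spaces: flagged
]
print(check_indents(sample))
```

Every line is checked and every problem is collected, so a single pass reports all indent issues rather than stopping at the first one.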

The main way this is different from (and, dare I hope, better than) randomtest is that the tool will test your whole game in one go. Whilst randomtest ‘plays’ your game in a linear manner and conks out at the first error, this tool will attempt to find all the errors in one go. I’m currently thinking through ideas of how to get the tool to play all possible paths simultaneously.

Question 1: Would you find it useful to have a tool which just spots attributes that haven’t been called, or called attributes that haven’t been defined and checks all your indents? That’s a realistic goal for me to achieve in the next week and if it’ll be useful to people, then I can look to make that available.

Question 2: Which common errors in your code is randomtest not so good at finding? Which are a nightmare to pin down the exact cause of, or typically rear their ugly heads after you’ve published the game? Simply, what are the key areas you would like a tool to help you with?

Question 3: What specific information would you find useful to be output from a tool that has played many/most/all possible paths through your game?

Question 4: Would you find it useful to be able to add comments to your code that the tool can read as instructions to do specific things? One idea I had was to label certain choices as ‘key choices’; the tool would then provide specific metrics on which options each path could pick or did pick, what the stats were at that point, and so on.

Question 5: Do you feel that the existing test tools already have you covered and you don’t need additional functionality?


Honestly, I’m not 100% sure I have a use for more testers. I guess having all the errors in a single list without having to go over and over would be good, but I’m not sure how you’d do that, since the stuff that it’s picking up will often be game-breaking in some way unless it’s just an improperly defined variable. (I.e. if there’s a missing # in a choice, how will the tester know if it’s a missing # or an incorrect spacing issue? Wouldn’t that potentially cause compounding errors that aren’t actually errors, or missed sections of text downstream?) The only thing that can be annoying with the current ones is that very occasionally the tester doesn’t pinpoint the correct line or character that has caused the issue, and it requires hunting.

It’s late and I may have misunderstood your points a bit, so please do clarify if I have.

The idea with my tool is that it wouldn’t actually ‘play’ the game. So, if it encounters something that would cause Choicescript (or randomtest) to encounter a fatal error, the tool itself just logs the issue and carries on - it won’t encounter a fatal error itself. In this way it is able to do lots of validation (variable names, indents, etc.) as a block check on the whole code in one go and return everything that fails.
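To make the "log the issue and carry on" idea concrete, here is a small Python sketch. Everything in it is hypothetical: the `Issue` record, the `run_checks` runner and the two example checks are invented for illustration. It also assumes label names are compared exactly as written.

```python
from dataclasses import dataclass


@dataclass
class Issue:
    line: int
    severity: str  # "error" or "warning"
    message: str


def check_goto_targets(lines):
    """Flag every *goto whose target has no matching *label in the file."""
    labels = set()
    for line in lines:
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "*label":
            labels.add(parts[1])
    issues = []
    for number, line in enumerate(lines, start=1):
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "*goto" and parts[1] not in labels:
            issues.append(
                Issue(number, "error", f"*goto target '{parts[1]}' has no *label")
            )
    return issues


def check_broken(lines):
    """Stand-in for a check that has a bug of its own."""
    raise ValueError("simulated bug inside a check")


def run_checks(lines, checks):
    """Run every check; a failing check is logged instead of stopping the run."""
    issues = []
    for check in checks:
        try:
            issues.extend(check(lines))
        except Exception as exc:
            issues.append(Issue(0, "error", f"{check.__name__} failed: {exc}"))
    return sorted(issues, key=lambda issue: issue.line)


sample = ["*label start", "*goto start", "*goto nowhere", "*goto missing"]
result = run_checks(sample, [check_goto_targets, check_broken])
for issue in result:
    print(issue)
```

Because each check returns a list rather than raising, both bad `*goto` lines are reported in the same run, and a crash inside one check does not hide the findings of the others.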

The tricky bit is in getting it to simulate playing the game for multiple paths and doing the correct checks on the code - without it encountering its own fatal errors, and as you say, without it then picking up subsequent issues as errors which aren’t. The ideas for that functionality are half formed right now - and are separate to the above mentioned checks on variables and indents.

You mentioned an issue of a missing # leading to a missed option. Presumably that can only be spotted manually by eye anyway? I’m not aware that randomtest applies any logic to the code it reads to determine if the actual content is correct?
Having said that, as I type this, I think there is a way I could reasonably identify when a # has been missed off.

It’d be good to get a list of errors that usually can only be found by manually playing the game and spotting them on the screen. I think there are a few neat little tricks that can be used to infer the context of the text.


Oh no that’s cool, let’s see if I can explain better. So take for example

*choice
   #Left
      You go left
      *goto left

   #Right
      You go right
      *goto right

   # You take the center path
      You stay on the main path
      *goto straight

*label alternateroute
*comment label is coming from an earlier choice

So let’s say this happens:

*choice
   #Left
      You go left
      *goto left

   #Right
      You go right
      *goto right

   # You take the center path
      You stay on the main path
      goto straight
(-> missing * on the goto statement)

*label alternateroute
*comment label is coming from an earlier choice where the storyline has split.

What should the tester do? Usually you’d get a fall-out error. In this case, if it skipped onwards, it would go to alternateroute, meaning your “straight path” storyline would not get tested. You could also see ongoing errors in the alternate route if it assumes certain variables have or have not been set. Those wouldn’t be real problems, because you never should have been on that part of the storyline in the first place.

This could also potentially happen if you lost the # on the choice if the game didn’t know where to go with the indented text, as it’d no longer look like a choice option.

I could be over thinking issues here.

Edit: On rereading your response, I think you’re going for something different than what I was thinking of. So I’m guessing it’s more a case of the tool reading sections of code in isolation, working down the page.

What you’re trying to build is a static checker or code sniffer. You might wanna look those up.

You can already catch indentation errors with CSIDE and the VS Code plugin if I’m not mistaken.

That sounds more trouble than it’s worth. ChoiceScript optimization is a futile endeavor. ChoiceScript is the opposite of optimized by nature. So, saving a few bytes of memory by removing unused variables won’t make much of a difference. Unless you want to pursue it for the funs.

Checking if a variable is called before declaration is hard, because ChoiceScript code can be executed in an arbitrary order.

Instead, you could create a tool that separates declaration and assignment, and hoists all declarations to the top of a file (except for variables declared inside subroutines). This way you can always guarantee a variable exists before it’s used.

This is surprisingly simple. Turn the source code into a weighted directed graph, then perform an algorithm to find all possible paths. There are many implementations available on the internet. The worst part would be to build a custom parser that compiles ChoiceScript source into a graph data structure.
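A minimal sketch of the path-finding step, assuming the parser has already produced a graph as a plain dictionary that maps each node to the nodes it can lead to. The node names are invented, edge weights are left out for brevity, and `all_paths` is a hypothetical helper rather than a library function.

```python
def all_paths(graph, start, end, path=None):
    """Yield every path from start to end that never revisits a node."""
    path = (path or []) + [start]
    if start == end:
        yield path
        return
    for nxt in graph.get(start, []):
        if nxt not in path:  # skip nodes already on this path to avoid cycles
            yield from all_paths(graph, nxt, end, path)


# Hypothetical graph: each key is a label, each value lists its *goto targets.
graph = {
    "start": ["left", "right", "straight"],
    "left": ["ending"],
    "right": ["left", "ending"],
    "straight": ["ending"],
}
paths = list(all_paths(graph, "start", "ending"))
for found in paths:
    print(" -> ".join(found))
```

The number of distinct paths can grow exponentially with the number of choices, so a real game would probably need pruning or sampling on top of this.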

I think variable mutation statistics would help in balancing the game.


I don’t mean to steal your thunder, but my first reaction whenever I see a thread like this is: why don’t you offer to help improve one of the already existing tools instead?

There’s a few reasons why I suggest this.

Firstly, while I don’t mean to make presumptions about your technical ability, if it was an easy endeavour, do you not think it would have already been done (by CoG themselves, no less)?

Second, even if getting something started was easy, there’s actually a lot of responsibility that comes with developing and publishing a tool like this. Once people from the community start relying on and using your tool, there is an implicit expectation that you’ll stick around and maintain and improve it. Imagine if I just disappeared tomorrow and never contributed to CSIDE again? I’m sure the world wouldn’t end, but over time (even if only because its version of ChoiceScript didn’t get updated), that would inconvenience a not insignificant number of people. So not only does it need to keep up-to-date with things like the latest versions of ChoiceScript, but it also needs to be of a very high quality as well. There’s little worse than a tool – that’s meant to be helpful – having its own bugs and issues that distract you from writing or debugging your game. Tests for something like this will be all but essential, and writing good ones isn’t trivial (never mind implementing the actual functionality). Don’t forget documentation either.

Third and final point: Both CSIDE and the vscode plugin by Sargent have (or will have) a lot of behaviour similar to what you list above. I’m obviously biased, but I do think it’d be more beneficial for the community to have a smaller number of higher quality tools. Not only because expertise and effort can be focused, but also simply because it makes a new author’s introduction to ChoiceScript easier: there’s less choice and opinion, and the suggested tool paths are obvious, clear and well supported (just imagine if we had three different wikis; how confusing would that be?!).

So, to finish: I’d personally be delighted if anyone offered to help accelerate and/or improve features like this in CSIDE. Heck, I don’t even think I’d mind helping mentor/teach people programming to do so, if they’re committed. I’m fairly sure Sargent would feel somewhat the same about the vscode plugin.
Therefore, if you have this energy and drive to create such tooling, I would implore you to consider offering your services to improving already existing solutions. I truly think it would be the best thing for everyone (including you).

That said, it is just a request/suggestion. You need to do what you want to do in the end (that’s the only thing that’ll keep you going). There’s no monopoly here, and I don’t intend to try and start one. But hopefully this at least gives you some food for thought :slight_smile:

If you do go ahead on your own, I wish you the best of luck.


Figuring out why a crash happens, sometimes. It’d be nice to know what all the variables are at that moment. It would also be nice to page back a step or two and see what the variables were then, too.

How often each variable is tested, and against which target values. With a per-chapter breakdown, maybe statistical analysis of each variable and the spread of target values it’s tested against. The user could limit it to only specified variables (say, only the character stats) to reduce the amount of information.

Maybe. Key choices would be cool.

I’d worry about infinite loops being encountered by this tool, though. How are you planning to detect and escape them?


Somehow I very much doubt that. But you make a great point.


It depends on implementation, but generally speaking a static analyzer would not execute the code. It would only map the nodes and the edges between them, skipping nodes that have already been visited. Algorithms that calculate the cyclomatic complexity of code do this.
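A small sketch of why a visited set makes loops harmless: each node is expanded at most once, so the walk always terminates. The graph and its node names are invented, and `reachable` is a hypothetical helper.

```python
from collections import deque


def reachable(graph, start):
    """Breadth-first walk that expands each node at most once."""
    visited = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in visited:  # already-seen nodes are never queued again
                visited.add(nxt)
                queue.append(nxt)
    return visited


# Hypothetical graph containing a loop (loop -> loop, loop -> start).
graph = {
    "start": ["loop"],
    "loop": ["loop", "start", "exit"],
    "exit": [],
    "orphan": ["exit"],
}
seen = reachable(graph, "start")
print(sorted(seen))
print(sorted(set(graph) - seen))  # labels nothing leads to
```

As a side effect, the labels left out of `seen` are candidates for dead code that no `*goto` can ever reach.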


Absolutely.


Hmm. That would duplicate the behavior of Quicktest, though, not Random. Without executing the code and state-tracking, it wouldn’t be able to determine which paths are accessible and which are theoretically plausible but actually unreachable because the variables can’t hit the target values. If the author has a *goto or an #option gated behind (var > 90), but var never exceeds 80, this tester won’t be able to tell unless it executes the code and determines the variable states.


Do you think it could calculate the minimum and maximum range of variables by certain parts of the game?

E.g. it would be nice to know the min and max of a variable like strength for each scene.

End of Scene 1:
Strength, min: 35 to max: 65

End of Scene 2:
Strength, min: 30 to max: 75

Or something like that.
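If the tool recorded a snapshot of the stats at the end of each scene for every simulated run, the ranges could be reduced with something like the sketch below. The data layout and the name `stat_ranges` are assumptions made for the example, and the numbers are made up to mirror the ones above.

```python
def stat_ranges(runs):
    """runs: one dict per run, shaped {scene_name: {stat_name: value}}."""
    ranges = {}
    for run in runs:
        for scene, stats in run.items():
            for stat, value in stats.items():
                low, high = ranges.get((scene, stat), (value, value))
                ranges[(scene, stat)] = (min(low, value), max(high, value))
    return ranges


# Made-up snapshots from three simulated runs.
runs = [
    {"Scene 1": {"strength": 35}, "Scene 2": {"strength": 30}},
    {"Scene 1": {"strength": 65}, "Scene 2": {"strength": 75}},
    {"Scene 1": {"strength": 50}, "Scene 2": {"strength": 60}},
]
ranges = stat_ranges(runs)
for (scene, stat), (low, high) in ranges.items():
    print(f"End of {scene}: {stat}, min: {low} to max: {high}")
```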


I second this; this would be an amazing feature for any testing program. I’m guessing we all have Excel sheets where we crunch the possible stat ranges at various points in our games and yeah the math is elementary school level stuff, but it’s still time intensive, and it’s time spent not writing.


In a way, you’re right. But there are ways around it, if you treat the predicates as the weight of an edge. The algorithm would definitely be slower, but it would be worth it.

Some occurrences can be accounted for with a bit of flexibility in the coding, which allows it to infer the purpose of what it is reading. If it encounters a *choice and then a string that has a # with a space after it, it knows it is in a choice and can infer it is now reading an option. Depending on what functionality you’re trying to build, that might be enough.
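One way to sketch that inference in Python: remember the indent of the first line under a *choice, then flag any later line at that same indent that starts with neither # nor a * command. The function name `find_missing_hash` is hypothetical, and this simplified version does not handle nested choices.

```python
def find_missing_hash(lines):
    """Flag option-level lines inside a *choice that do not start with '#'."""
    issues = []
    choice_indent = None  # indent of the current *choice line, if any
    option_indent = None  # indent of the first line under that *choice
    for number, line in enumerate(lines, start=1):
        stripped = line.strip()
        if not stripped:
            continue
        indent = len(line) - len(line.lstrip())
        if stripped.startswith(("*choice", "*fake_choice")):
            choice_indent, option_indent = indent, None
            continue
        if choice_indent is None:
            continue  # not inside a choice block
        if indent <= choice_indent:
            choice_indent = option_indent = None  # left the choice block
            continue
        if option_indent is None:
            option_indent = indent
        # Options may also start with a command such as *if or *selectable_if.
        if indent == option_indent and not stripped.startswith(("#", "*")):
            issues.append((number, "expected an option starting with '#'"))
    return issues


sample = [
    "*choice",
    "    #Left",
    "        You go left",
    "        *goto left",
    "    Right",  # the '#' is missing here
    "        You go right",
    "        *goto right",
    "*label left",
]
print(find_missing_hash(sample))
```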

The missing * is a good one, or an errant space between the * and the command. A static read of the code wouldn’t know that the * has been misplaced (though you can code it to check for this if you want). I’ve noted that down as something to consider - it’ll depend a lot on how any parsing of the code actually works as to how to pick up on those errors.
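Both cases can be approximated with pattern matching, as in the sketch below. The command list is a small illustrative subset, the bare-word pattern is deliberately narrow to limit false positives on ordinary prose, and `suspicious_commands` is a hypothetical name. Findings would be better reported as warnings than as errors.

```python
import re

# Illustrative subset of command names, not a complete list.
COMMANDS = {
    "goto", "goto_scene", "gosub", "gosub_scene", "label", "set", "if",
    "else", "elseif", "choice", "fake_choice", "finish", "create", "temp",
    "comment",
}
# "* goto x": whitespace between the asterisk and the command word.
STRAY_SPACE = re.compile(r"^\*\s+(\w+)")
# "goto x": a jump-style command with exactly one argument and no asterisk.
BARE_COMMAND = re.compile(r"^(goto|goto_scene|gosub|gosub_scene|label)\s+\S+$")


def suspicious_commands(lines):
    """Return (line_number, message) pairs for likely mistyped commands."""
    issues = []
    for number, line in enumerate(lines, start=1):
        stripped = line.strip()
        spaced = STRAY_SPACE.match(stripped)
        bare = BARE_COMMAND.match(stripped)
        if spaced and spaced.group(1).lower() in COMMANDS:
            issues.append(
                (number, f"space between '*' and '{spaced.group(1)}'")
            )
        elif bare:
            issues.append((number, f"'{bare.group(1)}' may be missing its '*'"))
    return issues


sample = ["*goto left", "goto straight", "* goto right", "You go left"]
found = suspicious_commands(sample)
print(found)
```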

Thanks for the thoughts @cup_half_empty, given me some things to think about.

I’m not familiar with the VS Code plugin, but I must be missing where CSIDE picks up on indents (unless you’re referring to the fact that random/quick test will spot them for you?)

Identifying unused variables (and duplicates) sort of comes for free. I have a list of defined and called variables to find the ones that are called without being defined, so doing the check in reverse is pretty simple. As you say, it’s largely pointless and more for those that want to be a little anal about their code!
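With the two collections held as sets, both directions of the check are a single set difference. The sketch below is a simplification: it only treats `*set` lines and `${...}` interpolations as "calls", compares names case-insensitively as an assumption, and the name `variable_report` is made up.

```python
import re

DEFINE = re.compile(r"^\s*\*(create|temp)\s+(\w+)", re.IGNORECASE)
SET = re.compile(r"^\s*\*set\s+(\w+)", re.IGNORECASE)
# Matches ${name}, $!{name} and $!!{name} in body text.
INTERPOLATION = re.compile(r"\$!{0,2}\{(\w+)\}")


def variable_report(lines):
    """Compare defined variables with referenced ones, in both directions."""
    defined, called = set(), set()
    for line in lines:
        match = DEFINE.match(line)
        if match:
            defined.add(match.group(2).lower())
            continue
        match = SET.match(line)
        if match:
            called.add(match.group(1).lower())
        called.update(name.lower() for name in INTERPOLATION.findall(line))
    return {
        "unused": sorted(defined - called),
        "undefined": sorted(called - defined),
    }


sample = [
    "*create strength 50",
    "*create charm 50",
    '*temp mood "calm"',
    "*set strength +10",
    "You feel ${mood} and ${luck}.",
]
report = variable_report(sample)
print(report)
```

A fuller version would also need to read variable names out of `*if` conditions and other expressions.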

On the declaration before calling - absolutely. So the check will first look in the startup and see if it was created in there, if so, then you’re golden. As you say, for temps, the code itself may not be in order - so that is an assumption I have written into the code (at least for the static check of variables), the idea is that any errors here would be written out as warnings rather than errors so that the user can go check themselves.

I’ve never really wrangled with graph theory. Are you able to direct me to a resource that would help explain how I would pursue all the valid edges of the graph for each distinct ‘run’ of the code?
I have ideas for how to do this myself - the main challenge I had foreseen, as you say, was representing the game schema inside some kind of shell (graph, dictionaries, whatever).

I very much appreciate your thoughts and where you’re coming from.

I started this as a route to playing with python on a regular basis. I wasn’t sure I could even achieve the basic ideas I had, but once those started to come into place I figured that I could see if this would actually be useful - hence the thread. I guess I’m not really looking to pick up another language and become a developer on another project (I fear my motivation would slip away in an instant). Mainly I just wanted to see if I could build something that fit a useful gap.

I think my first step is to build what is in my head and share it - and then we go from there. I certainly wouldn’t be looking to compete with anything already established. As you say, there’s a whole world of complexity sitting under a piece of released software and I’ve spent the past 2 weeks coding enough unit tests at work to last me a while.

That is all theoretically possible - if I can figure out an efficient way of parsing all the code.

My idea is that the tool would take the string:
*if (variable > 90)
    *goto scene

And the trick would be to link the *goto to the *if. So that it can take each path and its current stats, evaluate against the condition and then send the qualifying paths to the scene - whilst the non-qualifying paths continue on.
This requires a method of constructing a schema for the game in some way that correctly links all the commands together. @cup_half_empty has suggested using graphing, which intuitively makes sense as a method for doing this - I just have no idea how it actually works.
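Setting the parsing problem aside, the splitting step itself could look like this sketch. It assumes each simulated path is a dictionary carrying its own stats, and that the condition has already been broken into a variable name, an operator and a target value. `split_paths` and the path layout are hypothetical.

```python
import operator

OPERATORS = {
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
    "=": operator.eq,
    "!=": operator.ne,
}


def split_paths(paths, variable, op, target):
    """Split paths into those that satisfy the condition and those that do not."""
    compare = OPERATORS[op]
    qualifying, remaining = [], []
    for path in paths:
        if compare(path["stats"][variable], target):
            qualifying.append(path)
        else:
            remaining.append(path)
    return qualifying, remaining


# Three made-up paths arriving at "*if (variable > 90)".
paths = [
    {"id": 1, "stats": {"variable": 95}},
    {"id": 2, "stats": {"variable": 80}},
    {"id": 3, "stats": {"variable": 91}},
]
to_scene, carry_on = split_paths(paths, "variable", ">", 90)
print([path["id"] for path in to_scene])
print([path["id"] for path in carry_on])
```

If no path ever lands in `to_scene`, that is a hint that the gated branch cannot be reached with the stats the game actually produces.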

Yes, that would be possible (again, predicated on having a methodology to achieve the fundamental parsing of the game). You probably wouldn’t want to output those stats too many times (for all your unique paths) - instead, that’s the idea of using *comment to flag to the tool some key things. There could be one called *comment stat_check - every time the tool gets to one, it outputs the current state of every run.
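Spotting such markers is the easy part, as the sketch below suggests. The directive names `stat_check` and `key_choice` are placeholders taken from the ideas in this thread, not an agreed syntax, and `find_directives` is a hypothetical helper.

```python
DIRECTIVES = {"stat_check", "key_choice"}  # placeholder directive names


def find_directives(lines):
    """Return (line_number, directive, arguments) for tool-readable comments."""
    found = []
    for number, line in enumerate(lines, start=1):
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "*comment" and parts[1] in DIRECTIVES:
            found.append((number, parts[1], parts[2:]))
    return found


sample = [
    "*comment key_choice path_split",
    "*choice",
    "    #Left",
    "        *goto left",
    "*comment just a normal note",
    "*comment stat_check",
]
directives = find_directives(sample)
print(directives)
```

Since these are ordinary comments, the game itself would ignore them and only the tester would act on them.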


Thanks everyone for your thoughts, ideas and suggestions.
Doing ‘simple’ things like checking variable declaration and indents should be relatively straightforward and I think I have a core design that will let me do those kinds of things.

Doing anything in relation to evaluating ‘runs’ of the game will require a robust and efficient methodology for building the schema of the game (choices, options, checks, labels and making sure they are all correctly linked) and then building the functionality on top of that. In other words - don’t get too excited.

Graph theory is not your greatest challenge. There are many interesting problems that can be represented as graph problems, such as “what is the shortest valid route between two cities?” or “what are the common friends of any two random users in a social network?” Solutions for these already exist and there’s quite substantial material available for free online.

Off the top of my head, ChoiceScript could be represented as a graph like this:

[Image: node.drawio]

Your greatest challenge will be creating a custom parser for ChoiceScript. You could use ChoiceScript’s current parser, but 1) the code is proprietary, and 2) it is tightly coupled with the engine itself. @Sargent built an LSP for his VS Code plugin using XText if I’m not mistaken. I have no idea if you can repurpose that, but it’s worth investigating. I think @CJW also built his own LSP for CSIDE.


That’s what I undoubtedly should have done for my LSP, or used ANTLR, but I ended up hand-rolling a parser. I fiddled with making a language definition, but there’s no formal language definition, and I ended up needing to make sure my plugin’s behavior matched how the ChoiceScript code performed parsing.


That’s basically what made me give up when I was writing an ANTLR grammar. :confused: