Designing an automated tester for Choicescript

I’m currently having a go at writing a python based tool to automatically test Choicescript code. It started out as a good excuse to get back into using python, but now that the tool is taking some good shape, I’m keen to try and release something that is actually useful to the community. So I have a few questions I wanted to pose in order to help me on my way.

A quick overview of what I’m doing and hope to be able to achieve:

  • Checks all your defined variables (creates and temps) and flags any that you haven’t called in your code
  • Checks all called variables and flags any that haven’t been defined in the code (or where the temp definition is potentially after you have called it)
  • Checks all your indents to see if any are not a multiple of 4, hopefully also finding cases where you fall out of commands incorrectly (we’ll see)
  • Validates all your choice blocks
  • Provides stats and metrics on ‘runs’ of the game

The main way this is different from (and, dare I hope, better than) randomtest is that the tool will test your whole game in one go. Whilst randomtest ‘plays’ your game in a linear manner and conks out at the first error, this tool will attempt to find all the errors in one go. I’m currently thinking through ideas of how to get the tool to play all possible paths simultaneously.

Question 1: Would you find it useful to have a tool which just spots attributes that haven’t been called, or called attributes that haven’t been defined and checks all your indents? That’s a realistic goal for me to achieve in the next week and if it’ll be useful to people, then I can look to make that available.

Question 2: What common errors in your code does randomtest not do so well at finding? Or are a nightmare to figure out the exact problem for? Or typically rear their ugly head after you’ve published the game? Simply, what are the key areas you would like a tool to help you with?

Question 3: What specific information would you find useful to be output from a tool that has played many/most/all possible paths through your game?

Question 4: Would you find it useful to be able to add comments to your code that the tool can read and that tell it to do specific things? One idea I had was to label certain choices as ‘key choices’; the tool would then provide specific metrics around which options each path could pick/did pick, what the stats were at that point, or whatever.

Question 5: Do you feel that the existing test tools already have you covered and you don’t need additional functionality?


Honestly, I’m not 100% sure I have a use for more testers. I guess having all the errors in a single list without having to go over and over would be good, but I’m not sure how you’d do that since the stuff that it’s picking up will often be game breaking in some way unless it’s just an improperly defined variable. (i.e. if there’s a missing # in a choice, how will the tester know if it’s a missing # or an incorrect spacing issue? Wouldn’t that potentially cause compounding errors that aren’t actually errors, or missed sections of text downstream?) The only thing that can be annoying with the current ones is that very occasionally the tester doesn’t pinpoint the correct line or character that has caused the issue, which requires hunting.

It’s late and I may have misunderstood your points a bit, so please do clarify if I have.

The idea with my tool is that it wouldn’t actually ‘play’ the game. So, if it encounters something that would cause Choicescript (or randomtest) to encounter a fatal error, the tool itself just logs the issue and carries on - it won’t encounter a fatal error itself. In this way it is able to do lots of validation (variable names, indents, etc.) as a block check on the whole code in one go and return everything that fails.
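A minimal sketch of that log-and-carry-on idea, in Python since that is what the tool is written in. The function name `check_indents` and the sample lines are made up for illustration, and the multiple-of-4 rule is the one proposed in the first post rather than a rule taken from ChoiceScript itself:

```python
def check_indents(lines, unit=4):
    """Return (line_number, message) for every suspicious indent, never stopping early."""
    issues = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # blank lines carry no indentation information
        indent = line[: len(line) - len(line.lstrip())]
        if "\t" in indent and " " in indent:
            issues.append((number, "mixed tabs and spaces"))
        elif "\t" not in indent and len(indent) % unit != 0:
            issues.append((number, f"indent of {len(indent)} is not a multiple of {unit}"))
    return issues


# Hypothetical sample with two deliberate mistakes (lines 3 and 5).
sample = [
    "*choice",
    " " * 4 + "#Left",
    " " * 7 + "You go left",
    " " * 8 + "*goto left",
    " " * 3 + "#Right",
]

for number, message in check_indents(sample):
    print(f"line {number}: {message}")
```

Both bad lines are reported in a single pass, which is the difference from a tester that halts at the first problem.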

The tricky bit is in getting it to simulate playing the game for multiple paths and doing the correct checks on the code - without it encountering its own fatal errors, and as you say, without it then picking up subsequent issues as errors which aren’t. The ideas for that functionality are half formed right now - and are separate to the above mentioned checks on variables and indents.

You mentioned an issue of a missing # leading to a missed option. Presumably that can only be spotted manually by eye anyway? I’m not aware that randomtest applies any logic to the code it reads to determine if the actual content is correct?
Having said that, as I type this, I think there is a way I could reasonably identify when a # has been missed off.

It’d be good to get a list of errors that usually can only be found by manually playing the game and spotting them on the screen. I think there’s a few neat little tricks that can be used to infer the context of the text.


Oh no that’s cool, let’s see if I can explain better. So take for example

*choice
   #Left
      You go left
      *goto left

   #Right
      You go right
      *goto right

   # You take the center path
      You stay on the main path
      *goto straight

*label alternateroute
*comment label is coming from an earlier choice

So let’s say this happens

*choice
   #Left
      You go left
      *goto left

   #Right
      You go right
      *goto right

   # You take the center path
      You stay on the main path
      goto straight
(-> missing * on the goto statement)

*label alternateroute
*comment label is coming from an earlier choice where the storyline has split.

What should the tester do? Usually you’d get a fall out error. In this case it would go to the alternateroute if it skipped onwards, meaning your “straight path” storyline would not get tested, and you could experience ongoing errors in the alternate route if it assumes certain variables have or have not been set. Those aren’t actually a problem, because you never should have been on that part of the storyline in the first place.

This could also potentially happen if you lost the # on the choice if the game didn’t know where to go with the indented text, as it’d no longer look like a choice option.

I could be over thinking issues here.

Edit: On rereading your response, I think you’re going for something different than what I was thinking of. So I’m guessing it’s more of a reads sections of code in isolation down the page sort of situation.

What you’re trying to build is a static checker or code sniffer. You might wanna look those up.

You can already catch indentation errors with CSIDE and the VS Code plugin if I’m not mistaken.

That sounds more trouble than it’s worth. ChoiceScript optimization is a futile endeavor. ChoiceScript is the opposite of optimized by nature. So, saving a few bytes of memory by removing unused variables won’t make much of a difference. Unless you want to pursue it for the funs.

Checking if a variable is called before declaration is hard, because ChoiceScript code can be executed in an arbitrary order.

Instead, you could create a tool that separates declaration and assignment, and hoists all declarations to the top of a file (except for variables declared inside subroutines). This way you can always guarantee a variable exists before it’s used.

This is surprisingly simple. Turn the source code into a weighted directed graph, then perform an algorithm to find all possible paths. There are many implementations available on the internet. The worst part would be to build a custom parser that compiles ChoiceScript source into a graph data structure.
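As a rough illustration of the graph idea, here is a sketch in Python that enumerates every path through a small hand-built graph. The graph is hypothetical (loosely modelled on the left/right/straight example earlier in the thread), the edges are unweighted for simplicity, and building such a graph from real ChoiceScript source is the hard part that this sketch skips:

```python
def all_paths(graph, start, path=None):
    """Yield every path from start to a node that has no outgoing edges."""
    path = (path or []) + [start]
    targets = graph.get(start, [])
    if not targets:
        yield path
        return
    for target in targets:
        if target in path:
            continue  # skip edges that would loop back onto the current path
        yield from all_paths(graph, target, path)


# Hypothetical graph: each key is a label, each value lists where it can lead.
graph = {
    "start": ["left", "right", "straight"],
    "left": ["ending"],
    "right": ["ending"],
    "straight": ["alternateroute"],
    "alternateroute": ["ending"],
}

paths = list(all_paths(graph, "start"))
for path in paths:
    print(" -> ".join(path))
```

The number of paths grows very quickly with the number of choices, so a real tool would probably need to cap or sample them.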

I think variable mutation statistics would help in balancing the game.


I don’t mean to steal your thunder, but my first reaction whenever I see a thread like this is: why don’t you offer to help improve one of the already existing tools instead?

There’s a few reasons why I suggest this.

Firstly, while I don’t mean to make presumptions about your technical ability, if it was an easy endeavour, do you not think it would have already been done (by CoG themselves no less?).

Second, even if getting something started was easy, there’s actually a lot of responsibility that comes with developing and publishing a tool like this. Once people from the community start relying on and using your tool, there is an implicit expectation that you’ll stick around and maintain and improve it. Imagine if I just disappeared tomorrow and never contributed to CSIDE again? I’m sure the world wouldn’t end, but over time (even if only because its version of ChoiceScript didn’t get updated), that would inconvenience a not insignificant number of people. So not only does it need to keep up-to-date with things like the latest versions of ChoiceScript, but it also needs to be of a very high quality as well. There’s little worse than a tool – that’s meant to be helpful – having its own bugs and issues that distract you from writing or debugging your game. Tests for something like this will be all but essential, and writing good ones isn’t trivial (never mind implementing the actual functionality). Don’t forget documentation either.

Third and final point: Both CSIDE and the vscode plugin by Sargent have (or will have) a lot of behaviour similar to what you list above. I’m obviously biased, but I do think it’d be more beneficial for the community to have a smaller number of higher quality tools. Not only because expertise and effort can be focused, but also simply because it makes a new author’s introduction to ChoiceScript easier: there’s less choice and opinion, and the suggested tool paths are obvious, clear and well supported (just imagine if we had three different wikis; how confusing would that be?!).

So, to finish: I’d personally be delighted if anyone offered to help accelerate and/or improve features like this in CSIDE. Heck, I don’t even think I’d mind helping mentor/teach people programming to do so, if they’re committed. I’m fairly sure Sargent would feel somewhat the same about the vscode plugin.
Therefore, if you have this energy and drive to create such tooling, I would implore you to consider offering your services to improving already existing solutions. I truly think it would be the best thing for everyone (including you).

That said, it is just a request/suggestion. You need to do what you want to do in the end (that’s the only thing that’ll keep you going). There’s no monopoly here, and I don’t intend to try and start one. But hopefully this at least gives you some food for thought :slight_smile:

If you do go ahead on your own, I wish you the best of luck.


Figuring out why a crash happens, sometimes. It’d be nice to know what all the variables are at that moment. Also would be nice to page back a step or two and see what the variables were there then, too.

How often each variable is tested, and against which target values. With a per-chapter breakdown, maybe statistical analysis of each variable and the spread of target values it’s tested against. The user could limit it to only specified variables (say, only the character stats) to reduce the amount of information.

Maybe. Key choices would be cool.

I’d worry about infinite loops being encountered by this tool, though. How are you planning to detect and escape them?


Somehow I very much doubt that. But you make a great point.


It depends on implementation, but generally speaking a static analyzer would not execute the code; it would only map the nodes and the edges between them, skipping nodes that have already been visited. Algorithms that calculate the cyclomatic complexity of code do this.
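A small Python sketch of why a visited set sidesteps infinite loops: the traversal below terminates even though one node points back at itself. The graph and the name `reachable` are illustrative only.

```python
from collections import deque


def reachable(graph, start):
    """Breadth-first walk that records each node once, so cycles cannot trap it."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for target in graph.get(node, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


# Hypothetical graph: "loop" jumps back to itself, "orphan" is never jumped to.
graph = {
    "intro": ["loop"],
    "loop": ["loop", "exit"],
    "exit": [],
    "orphan": ["exit"],
}

unreached = set(graph) - reachable(graph, "intro")
print(sorted(unreached))
```

A side benefit is that labels nothing ever jumps to fall out of the same walk for free.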


Absolutely.


Hmm. That would duplicate the behavior of Quicktest, though, not Random. Without executing the code and state-tracking, it wouldn’t be able to determine which paths are accessible and which are theoretically plausible but actually unreachable because the variables can’t hit the target values. If the author has a *goto or an #option gated behind (var > 90), but var never exceeds 80, this tester won’t be able to tell unless it executes the code and determines the variable states.


Do you think it could calculate the minimum and maximum range of variables by certain parts of the game?

Eg) It would be nice to know the min and max of a variable like strength for each scene.

End of Scene 1:
Strength, min: 35 to max: 65

End of Scene 2:
Strength, min: 30 to max: 75

Or something like that.
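For the simplest case, those ranges can be worked out without playing every path. The Python sketch below assumes each choice is independent and changes the stat by a fixed amount (plain addition, so it would not be accurate for fairmath or for options hidden behind conditions). The scene data is invented so that the numbers match the example above.

```python
def stat_range(low, high, choices):
    """Widen a (low, high) range by the smallest and largest change each choice allows."""
    for deltas in choices:
        low += min(deltas)
        high += max(deltas)
    return low, high


# Hypothetical data: one inner list per choice, one number per option.
scene_1 = [[-10, 0, 5], [-5, 10]]
scene_2 = [[-5, 0, 10]]

after_1 = stat_range(50, 50, scene_1)   # starting strength assumed to be 50
after_2 = stat_range(*after_1, scene_2)

print(f"End of Scene 1: Strength, min: {after_1[0]} to max: {after_1[1]}")
print(f"End of Scene 2: Strength, min: {after_2[0]} to max: {after_2[1]}")
```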


I second this; this would be an amazing feature for any testing program. I’m guessing we all have Excel sheets where we crunch the possible stat ranges at various points in our games and yeah the math is elementary school level stuff, but it’s still time intensive, and it’s time spent not writing.


In a way, you’re right. But there are ways around it, if you treat the predicates as the weights of the edges. The algorithm would definitely be slower, but it would be worth it.

Some occurrences can be accounted for with a bit of flexibility in the coding, which allows it to infer the purpose of what it is reading (if it encounters a *choice and then a string that has a # but a space after it, it knows it is in a choice and can infer it is now reading an option - depending what functionality you’re trying to build, that might be enough).

The missing * is a good one, or an errant space between the * and the command. A static read of the code wouldn’t know that the * has been misplaced (though you can code it to check for this if you want). I’ve noted that down as something to consider - it’ll depend a lot on how any parsing of the code actually works as to how to pick up on those errors.
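One possible heuristic for both cases, sketched in Python. It only raises warnings, because a line of prose can legitimately start with one of these words. The command list is deliberately partial and the function name is made up for the example.

```python
import re

# Partial list of ChoiceScript command names, enough for the illustration.
COMMANDS = {"choice", "goto", "goto_scene", "gosub", "label", "finish", "set", "temp", "create"}

BARE_WORD = re.compile(r"^\s*([a-z_]+)\b")
SPACED_STAR = re.compile(r"^\s*\*\s+([a-z_]+)\b")


def suspicious_commands(lines):
    """Flag lines that look like a command with a missing or detached asterisk."""
    warnings = []
    for number, line in enumerate(lines, start=1):
        spaced = SPACED_STAR.match(line)
        bare = BARE_WORD.match(line)
        if spaced and spaced.group(1) in COMMANDS:
            warnings.append((number, f"space between * and '{spaced.group(1)}'"))
        elif bare and bare.group(1) in COMMANDS:
            warnings.append((number, f"'{bare.group(1)}' looks like a command missing its *"))
    return warnings


sample = [
    "      You stay on the main path",
    "      goto straight",
    "* label alternateroute",
    "*goto left",
]

for number, message in suspicious_commands(sample):
    print(f"line {number}: {message}")
```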

Thanks for the thoughts @cup_half_empty, given me some things to think about.

I’m not familiar with the VS code plugin, but I must be missing where CSIDE picks up on indents (unless you’re referring to the fact that random/quick test will spot them for you?)

Identifying unused variables (and duplicates) sort of comes for free. I have a list of defined and called variables to find the ones that are called without being defined, so doing the check in reverse is pretty simple. As you say, it’s largely pointless and more for those that want to be a little anal about their code!

On the declaration before calling - absolutely. So the check will first look in the startup and see if it was created in there, if so, then you’re golden. As you say, for temps, the code itself may not be in order - so that is an assumption I have written into the code (at least for the static check of variables), the idea is that any errors here would be written out as warnings rather than errors so that the user can go check themselves.
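The error-versus-warning split described above could look roughly like this in Python. This is only a sketch, not the tool's actual code: it recognises just `*create`, `*temp`, `*set` and `${...}` references, and it treats the order of lines in the file as the order of execution, which is exactly the assumption that makes the second case a warning rather than an error.

```python
import re

DEFINE = re.compile(r"^\s*\*(create|temp)\s+(\w+)")
SET = re.compile(r"^\s*\*set\s+(\w+)")
INTERPOLATION = re.compile(r"\$\{(\w+)\}")


def check_variables(startup_lines, scene_lines):
    """Report uses of undefined variables and uses that precede their *temp."""
    created = set()
    for line in startup_lines:
        match = DEFINE.match(line)
        if match and match.group(1) == "create":
            created.add(match.group(2))

    temps = {}  # variable name -> line number of its first *temp
    for number, line in enumerate(scene_lines, start=1):
        match = DEFINE.match(line)
        if match and match.group(1) == "temp":
            temps.setdefault(match.group(2), number)

    findings = []
    for number, line in enumerate(scene_lines, start=1):
        names = INTERPOLATION.findall(line)
        match = SET.match(line)
        if match:
            names.append(match.group(1))
        for name in names:
            if name in created:
                continue  # created in startup, so always available
            if name not in temps:
                findings.append(("error", number, f"{name} is never defined"))
            elif temps[name] > number:
                findings.append(
                    ("warning", number, f"{name} is used before its *temp on line {temps[name]}")
                )
    return findings


# Hypothetical input files.
startup = ["*create strength 50"]
scene = [
    "*set courage +5",
    "*temp courage 0",
    "Your strength is ${strength}.",
    "You have ${gold} coins.",
]

for level, number, message in check_variables(startup, scene):
    print(f"{level}, line {number}: {message}")
```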

I’ve never really wrangled with graph theory. Are you able to direct me to a resource that would help explain how I would pursue all the valid edges of the graph for each distinct ‘run’ of the code?
I have ideas for how to do this myself - the main challenge I had foreseen, as you say, was representing the game schema inside some kind of shell (graph, dictionaries, whatever).

I very much appreciate your thoughts and where you’re coming from.

I started this as a route to playing with python on a regular basis. I wasn’t sure I could even achieve the basic ideas I had, but once those started to come into place I figured that I could see if this would actually be useful - hence the thread. I guess I’m not really looking to pick up another language and become a developer on another project (I fear my motivation would slip away in an instant). Mainly I just wanted to see if I could build something that fit a useful gap.

I think my first step is to build what is in my head and share it - and then we go from there. I certainly wouldn’t be looking to compete with anything already established. As you say, there’s a whole world of complexity sitting under a piece of released software and I’ve spent the past 2 weeks coding enough unit tests at work to last me a while.

That is all theoretically possible - if I can figure out an efficient way of parsing all the code.

My idea is that the tool would take the string:
*if (variable > 90)
    *goto scene

And the trick would be to link the *goto to the *if. So that it can take each path and its current stats, evaluate against the condition and then send the qualifying paths to the scene - whilst the non-qualifying paths continue on.
This requires a method of constructing a schema for the game in some way that correctly links all the commands together. @cup_half_empty has suggested using graphing, which intuitively makes sense as a method for doing this - I just have no idea how it actually works.
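Setting the linking problem aside, the evaluate-and-split step on its own might look like the Python sketch below. It assumes each in-flight path is a dictionary of stats and only understands conditions of the form `*if (name OP number)`. Anything more complex (and/or, strings, nested brackets) is out of scope here, and all names are invented for the example.

```python
import operator
import re

OPERATORS = {
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
    "=": operator.eq,
}
CONDITION = re.compile(r"^\s*\*if\s*\(\s*(\w+)\s*(>=|<=|>|<|=)\s*(\d+)\s*\)")


def parse_if(line):
    """Turn a simple *if line into a function that tests a dictionary of stats."""
    match = CONDITION.match(line)
    if match is None:
        raise ValueError(f"unsupported condition: {line!r}")
    name, symbol, number = match.groups()
    compare = OPERATORS[symbol]
    return lambda stats: compare(stats[name], int(number))


def split_paths(paths, line):
    """Separate the paths that pass the condition from those that carry on."""
    passes = parse_if(line)
    qualifying = [path for path in paths if passes(path)]
    continuing = [path for path in paths if not passes(path)]
    return qualifying, continuing


# Hypothetical in-flight paths, each carrying its own stats.
paths = [
    {"id": 1, "variable": 95},
    {"id": 2, "variable": 80},
    {"id": 3, "variable": 91},
]

to_scene, carry_on = split_paths(paths, "*if (variable > 90)")
print([path["id"] for path in to_scene], [path["id"] for path in carry_on])
```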

Yes, that would be possible (again, predicated on having a methodology to achieve the fundamental parsing of the game). You probably wouldn’t want to output those stats too many times (for all your unique paths) - instead, that’s the idea of using *comment to flag to the tool some key things. There could be one called *comment stat_check - every time the tool gets to one, it outputs the current state of every run.
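Spotting such markers is the easy half. A Python sketch, where `stat_check` is the marker name floated above and everything else (function names, the shape of `runs`) is assumed for illustration:

```python
import re

MARKER = re.compile(r"^\s*\*comment\s+stat_check\b")


def stat_check_lines(lines):
    """Return the 1-based numbers of lines that carry a stat_check marker."""
    return [number for number, line in enumerate(lines, start=1) if MARKER.match(line)]


def dump_runs(line_number, runs):
    """Build one report row per simulated run at a given marker."""
    return [
        f"line {line_number}, run {run_id}: {stats}"
        for run_id, stats in sorted(runs.items())
    ]


# Hypothetical scene text and run states.
scene = [
    "*comment stat_check",
    "You open the door.",
    "*comment just a note",
    "    *comment stat_check",
]
runs = {1: {"strength": 35}, 2: {"strength": 65}}

for number in stat_check_lines(scene):
    for row in dump_runs(number, runs):
        print(row)
```

Because the marker is an ordinary `*comment`, the game itself would ignore it.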


Thanks everyone for your thoughts, ideas and suggestions.
Doing ‘simple’ things like checking variable declaration and indents should be relatively straightforward and I think I have a core design that will let me do those kinds of things.

Doing anything in relation to evaluating ‘runs’ of the game will require a robust and efficient methodology for building the schema of the game (choices, options, checks, labels and making sure they are all correctly linked) and then building the functionality on top of that. In other words - don’t get too excited.

Graph theory is not your greatest challenge. There are many interesting problems that can be represented as graph problems, such as “what is the shortest valid route between two cities?” or “what are the common friends of any two random users in a social network?” Solutions for these already exist and there’s quite substantial material available for free online.

Off the top of my head, ChoiceScript could be represented as a graph like this:

node.drawio

Your greatest challenge will be creating a custom parser for ChoiceScript. You could use ChoiceScript’s current parser, but 1) the code is proprietary, and 2) tightly coupled with the engine itself. @Sargent built an LSP for his VS Code plugin using XText if I’m not mistaken. I have no idea if you can repurpose that, but it’s worth investigating. I think @CJW also built his own LSP for CSIDE.


That’s what I undoubtedly should have done for my LSP, or used ANTLR, but ended up hand-rolling a parser. I fiddled with making a language definition, but there’s no formal language definition, and I ended up needing to make sure my plugin’s behavior matched how the Choicescript code performed parsing.


That’s basically what made me give up when I was writing an ANTLR grammar. :confused:

Thanks @cup_half_empty for the pointers. I’ve been working away at getting my basic ideas into a usable ‘tool’ - I have a couple of things I want to take a look at along those lines before I tackle the big challenge of the simultaneous playthroughs.

On that note, I have a working python script which:

  • Just needs to sit inside a folder alongside your CS projects. Run the script, put in the folder name of the project you want to test and off it goes.
  • Takes less than 10 seconds to run (I’ve borrowed the latest version of Zombie Exodus off DashingDon to test against)
  • Generates two easy to read csv outputs that list all of the errors found
  • Parses all code and prose, in all files in the project.
  • Identifies variables created twice, or defined as temps twice in same file
  • Identifies variables that are defined but never called
  • Identifies variables that are called but not defined (or defined as a temp on a later row)
  • Handles variables using # to denote a character in another variable value
  • Handles variables using [ ] to call the value of another variable to construct the variable name (partial: it just checks that the variable in the [ ] is valid and that the trunk of the variable using the [ ] exists, i.e. the part before the underscore)
  • Identifies lines that are incorrectly indented in a manner that would cause a fatal exception in Choicescript

As said, this has largely been a project for me to play around with python and learn some new skills. I’ve had a lot of fun and frustration trying to figure it out and handle all the weird edge cases I could find.
I’m fairly happy that what I have produced is pretty accurate in finding errors (though there will no doubt be many edge cases).

I would greatly appreciate anyone who is interested taking the tool for a spin and:

  • Finding cases where it reports an error that is not actually an error
  • Finding cases where it doesn’t report an error that causes a fatal exception in CS
  • Suggesting any usability, documentation or other bits and pieces that you’d like to see
  • Suggesting any specific errors in CS or issues that are common that you’d like to see the tool handle.

You can get the code from my GitHub: GitHub - SinNation/CStest: Choicescript testing tool in python
You just need python installed on your computer, then put the script in a folder alongside your other projects, like so:
image
Then double click to run the code, type in your project folder name. Then the output for the test will be inside a newly created folder within the CStest folder - it will give you the directory when the test runs.

The next thing I am going to work on is getting it to identify when the code falls out of a choice incorrectly.

As a quick test of the tool, @JimD I hope you don’t mind but I borrowed your code from DashingDon to run the tool against. You have a lot of words and a lot of complex code, so it was an excellent benchmark.
My last run reported two variable errors (which are not real errors) and one indentation error.
On line 3,178 in Chapter 10, you have a line of prose with an indent of 1 and then a line below with no indent. Are you able to confirm that is indeed a bug in your code?


Hey! Way to go! Congrats! :partying_face:


You should also check for variables of the same name defined with both a *create command in startup and a *temp in later files. That can cause very subtle bugs if you’re not careful.
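A sketch of that check in Python. It assumes the project is already loaded as lists of lines, and it compares names case-insensitively, which is an assumption about how ChoiceScript treats variable names. The function and sample data are hypothetical.

```python
import re

CREATE = re.compile(r"^\s*\*create\s+(\w+)")
TEMP = re.compile(r"^\s*\*temp\s+(\w+)")


def shadowed_temps(startup_lines, scenes):
    """List every *temp whose name clashes with a *create from startup."""
    created = set()
    for line in startup_lines:
        match = CREATE.match(line)
        if match:
            created.add(match.group(1).lower())

    clashes = []
    for scene_name, lines in scenes.items():
        for number, line in enumerate(lines, start=1):
            match = TEMP.match(line)
            if match and match.group(1).lower() in created:
                clashes.append((scene_name, number, match.group(1)))
    return clashes


# Hypothetical project contents.
startup = ["*create strength 50", '*create name "Sam"']
scenes = {
    "chapter1": ['*temp mood "calm"', "*temp strength 10"],
    "chapter2": ['    *temp Name "Alex"'],
}

for scene_name, number, name in shadowed_temps(startup, scenes):
    print(f"{scene_name}, line {number}: *temp {name} clashes with a *create in startup")
```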

This is giving me an error when I try to run it:

Traceback (most recent call last):
  File "C:\...\CStest-main\CStest.py", line 6, in <module>
    import pandas as pd
ModuleNotFoundError: No module named 'pandas'