Hosted Games Data Sheet: For All You Number Nerds Out There

The popularity shown on the included bar graph is the average for each genre; the boxplot also shows the median and quartiles. I calculated popularity simply by adding the omnibus ratings count and the Google ratings count for each game (probably not the best metric, but it’s what we have). It just so happens that the genres with the most games are also the genres with the highest average review counts per game.
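For anyone who wants to reproduce the popularity metric, here is a minimal sketch of the calculation described above: add the two ratings counts per game, then summarize by genre. The game rows and the function name `popularity_by_genre` are made up for illustration; the real data lives in the spreadsheet.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical rows: (title, genre, omnibus_ratings_count, google_ratings_count)
games = [
    ("Game A", "Fantasy", 1200, 800),
    ("Game B", "Fantasy", 300, 100),
    ("Game C", "Horror", 50, 30),
    ("Game D", "Horror", 20, 40),
    ("Game E", "Horror", 10, 0),
]

def popularity_by_genre(rows):
    """Popularity = omnibus count + Google count; summarize it per genre."""
    grouped = defaultdict(list)
    for _title, genre, omnibus_count, google_count in rows:
        grouped[genre].append(omnibus_count + google_count)
    return {
        genre: {"mean": mean(values), "median": median(values), "n": len(values)}
        for genre, values in grouped.items()
    }

summary = popularity_by_genre(games)
```

Keeping `n` next to the mean makes it easy to spot genres whose average rests on only one or two games.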

Check out this hover-able graph of games by genre and popularity.

Also, I’ve done a regression analysis of popularity that includes sub-genres, encoding them in the same way as primary genres (since they’re mostly the same words):

Summary of regression results including sub-genres

Well, the R^2 is a bit higher here, at 0.363 vs 0.351 for the primary genre-only model. So that’s good: it indicates that the sub-genre provides some additional information. The coefficients for Word Count and is_free are pretty similar to before.
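One possible reading of "encoding them in the same way as primary genres" is that a game gets a 1 in a genre's column if that genre appears as either its primary genre or its sub-genre. The sketch below shows that reading only; the vocabulary and the `encode_genres` helper are hypothetical, and the notebook may do it differently.

```python
# Hypothetical, shortened genre vocabulary for illustration
GENRES = ["Fantasy", "Romance", "Superhero"]

def encode_genres(primary, sub=None, vocabulary=GENRES):
    """Return one 0/1 indicator per genre.

    Assumption: a sub-genre switches on the same column its word would
    switch on as a primary genre.
    """
    labels = {primary}
    if sub:
        labels.add(sub)
    return [1 if genre in labels else 0 for genre in vocabulary]

fantasy_romance = encode_genres("Fantasy", "Romance")
superhero_only = encode_genres("Superhero")
```

Under this encoding a Fantasy game with a Romance sub-genre contributes to both coefficients, which would explain how Romance can change sign once sub-genres are included.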

The interesting thing is that it totally changes the directions of some of the coefficients for the genres:

  • Romance is now the best genre, giving 1080 to 5558 additional reviews (it was negative when only considering primary genre; I’m blaming Wayhaven).
  • The bonus from Fantasy is no longer significant (the range is -408 to 2485).
  • Superhero and Supernatural still give significant bonuses.

In order of highest to lowest predicted popularity gain, the genres are:

  1. Romance*
  2. Superhero*
  3. Supernatural*
  4. Post-apocalyptic
  5. Fantasy
  6. Historical
  7. Slice-of-life
  8. Spy
  9. Steampunk
  10. Mystery
  11. Humor

and these genres are predicted to cause popularity loss:

  1. War
  2. School
  3. Sci-Fi
  4. Puzzle
  5. Horror
  6. Adventure

A * indicates a statistically significant result at p<0.05.
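A quick way to read the ranges quoted above, assuming they are 95% confidence intervals for the coefficients: a result is significant at p<0.05 exactly when the interval does not contain zero. The helper name `is_significant` is just for illustration.

```python
def is_significant(ci_low, ci_high):
    """True if the confidence interval excludes zero (assumes a 95% interval)."""
    return ci_low > 0 or ci_high < 0

# Ranges quoted in the post
romance = is_significant(1080, 5558)   # interval entirely above zero
fantasy = is_significant(-408, 2485)   # interval straddles zero
```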

I’ve also done a bit of analysis on the ratings. Here’s an interactive plot of games with ratings by genre.

Omnibus vs GPS ratings - They generally correlate pretty well.

Ratings vs popularity - on both platforms, games with more ratings also tend to have higher ratings.


Summary of regression results for predicting omnibus ratings given genre and word count

Interpretation:

  • The R^2 is 0.434, which is not bad.
  • The “baseline” rating is about a 4.
  • Higher word counts give higher ratings.
  • Free games are rated about 0.24 stars lower.

Genres sorted by rating bonuses:

  1. Adventure
  2. Spy
  3. Fantasy*
  4. Superhero*
  5. War*
  6. Supernatural*
  7. Slice of Life*
  8. Post-Apocalyptic*
  9. Crime*
  10. Horror*
  11. Sci-Fi*
  12. Historical
  13. Romance
  14. Mystery
  15. Steampunk

And here are the predicted “negative” genres:

  1. Puzzle
  2. School
  3. Humor

Again, a * indicates statistical significance at p<0.05.

Interestingly, Adventure and Spy were some of the worst performing genres in terms of popularity. But they’re both literally sample sizes of 1. So is School.

13 Likes

Okay, since I’m unhinged I did another analysis on “underrated” games. In order to determine which games are underrated, I did a linear regression of rating vs popularity, and took the games with the highest residuals. That is, which games are most highly rated compared to what their popularity would predict? Here are the results:

Most underrated games on omnibus
  1. Trees Don’t Tell
  2. The Dryad’s Riddle
  3. Lost in the Pages
  4. Diamant Rose
  5. Captive of Fortune
  6. Foundation of Nightmares
  7. Highlands, Deep Waters
  8. Divided We Fall
  9. Guns of Infinity
  10. Starship Adventures
Most underrated games on Google Play
  1. Keeper of the Day and Night (probably due to recency)
  2. Guns of Infinity
  3. The Butler Did It
  4. The Dryad’s Riddle
  5. Trees Don’t Tell
  6. The Saga of Oedipus Rex
  7. Relics of the Lost Age (again, recency)
  8. The Harbinger’s Head
  9. The Volunteer Firefighter
  10. A Study in Steampunk
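The residual ranking behind these lists can be sketched as follows. This is a simplified stand-in, assuming a plain least-squares line of rating against raw popularity; the notebook may transform popularity or handle ties differently, and the four games here are invented.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def rank_underrated(games):
    """games: list of (title, popularity, rating).

    Returns titles ordered by residual (actual rating minus predicted
    rating), largest first, so the most "underrated" game comes first.
    """
    xs = [popularity for _title, popularity, _rating in games]
    ys = [rating for _title, _popularity, rating in games]
    slope, intercept = fit_line(xs, ys)
    residuals = [
        (rating - (slope * popularity + intercept), title)
        for title, popularity, rating in games
    ]
    return [title for _residual, title in sorted(residuals, reverse=True)]

# Hypothetical data: D is rated well above what its popularity predicts
sample = [("A", 1, 4.0), ("B", 2, 4.1), ("C", 3, 4.2), ("D", 2, 4.6)]
ranking = rank_underrated(sample)
```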

So, what do you think of Trees Don’t Tell?

Detailed methodologies are in the analysis notebook I linked earlier. Choice of Games should pay me for doing this.

18 Likes

First off, I love all of this. You are doing some great work here. One thing to keep in mind with the omnibus is that it would have an inherent skew against older titles that never had a sequel (or their sequel was also older, like Way Walkers); if the games released before the omnibus itself did in 2018 (or released early into the life of the omnibus when fewer people had downloaded it), many people that owned it as a standalone might never have gotten it downloaded onto the omnibus as well. This could lead to disparity where it has a seemingly outsized number of GPS or Steam reviews compared to omnibus ones.

I mention the sequel thing because for games like Wayhaven or Evertree, the presence of newer sequels causes the predecessors to rocket up the omnibus charts once the followup comes out. Wayhaven 1 used to ‘only’ have like 2,000 omnibus reviews before 2 released, then it rocketed up about 6-8k literally overnight. While some of this is people buying the old one with the new one, a lot of it is likely also people getting around to converting their older standalone purchase onto the omnibus.

@autumnchen As for Trees Don’t Tell, I cannot speak to it individually, as I haven’t played it. But I think that in the last couple of years there’s been a rise in titles that have an unfortunate perfect storm of being in unpopular genres (horror and mystery are both tough rows to hoe in HG), having low word counts, yet still having relatively high prices. See also Journey Into Darkness, which has no score because it hasn’t had enough reviews yet on GPS, or Macabre Mansion, the only title in the last few years to not yet reach even 1,000 installs on Google Play. It’s not at all a comment on their quality, just their curb appeal, which is basically what I was trying to measure here.

As for Trees having a high score but a low number of reviews, that one is easy. If the writer and a couple of their friends or more loyal readers read a title and give it 10/10, normally that has little real impact. But if that game’s only got 9 total omnibus reviews, suddenly it’s a big honkin’ deal. Odds are that the score will normalize as people slowly find it and rate it. I remember Floating City was at the top of the highest rated chart on the omnibus for a bit, because it had 5.0 on 1 or 2 reviews. But ultimately that’s not sustainable.

@Lan The asterisks reflect the handful of games whose links on the main CoG website for the Google Play store actually redirect to the HG omnibus instead of the page for that particular game. I noted that since it could lead to them having relatively fewer reviews for their GPS standalone offering, as it would be harder to find. I imagine that was due to the standalone offering releasing later than expected, as it sometimes does. Happens a lot more often with Amazon, but I didn’t include their info on the chart because they are such a pitiful amount of sales that it doesn’t really have relevance. I wish I had also marked for all the ones that didn’t have a Steam link on the website, because I know some of the older ones didn’t. I would usually check a game’s name on steamdb.info to make sure it didn’t release there. This is every HG that is on the omnibus, which I believe is all of them. Not sure what the deal is with Wizard’s Choice so I left it alone.

As for whether I would do CoG, I don’t know. It took a while to do this, but it’s not out of the question.

15 Likes

Thank you! I actually saw that there wasn’t a significant relationship between popularity and year of release, or between popularity and # of works by author. I was combining GPS and omnibus ratings so the effects might have canceled out.

In the omnibus alone, there was a slight trend of increasing popularity by year, but it was weak (R^2=0.05). In GPS, there was a slightly decreasing trend, but it was very weak (R^2=0.02). But again, this is very small data with a lot of outliers. So, when doing the regression for popularity, adding either a years term or a >=2018 term does not have a significant effect. Neither does adding a # of works by author term. (They do change some of the genre rankings around, but most of those weren’t significant anyway.)
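For reference, the R^2 of a single-predictor trend like popularity by year is just the squared correlation between the two columns. Here is a minimal sketch with invented numbers; `r_squared` is an illustrative helper, not the notebook's code.

```python
def r_squared(xs, ys):
    """R^2 of a simple linear fit of ys on xs (squared Pearson correlation)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return (sxy * sxy) / (sxx * syy)

# Hypothetical release years and popularity values
years = [2016, 2017, 2018, 2019]
perfect_trend = r_squared(years, [100, 200, 300, 400])  # popularity rises linearly
no_trend = r_squared(years, [300, 100, 100, 300])       # no linear relationship
```

Values like 0.05 or 0.02 sit very close to the "no trend" end of that scale, meaning release year explains only a few percent of the variation in popularity.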

However, when only looking at the omnibus ratings count, adding a >=2018 term does have a pretty significant effect.

With regards to the high-rated games, yeah, you’re right about the effect of low numbers of reviews (but beyond the first two, the others do have decent numbers of ratings). I should be using Bayesian averaging. But I feel like I’ve spent way too much time on this already :weary:
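For the curious, a Bayesian average pulls each game's rating toward a prior mean, with the pull weakening as the number of ratings grows. A minimal sketch follows; the prior mean of 4.0 and the prior weight of 50 "phantom" ratings are assumptions picked for illustration, not values from the analysis.

```python
def bayesian_average(rating_mean, rating_count, prior_mean=4.0, prior_weight=50):
    """Shrink a game's mean rating toward prior_mean.

    prior_weight acts like a number of phantom ratings at prior_mean;
    both defaults are illustrative assumptions.
    """
    total = prior_weight * prior_mean + rating_count * rating_mean
    return total / (prior_weight + rating_count)

# Hypothetical games: a perfect score on 9 ratings vs 4.6 on 2000 ratings
few_ratings = bayesian_average(5.0, 9)
many_ratings = bayesian_average(4.6, 2000)
```

With these numbers the 9-rating game drops below the 2000-rating game, which is the behavior you would want when ranking games with very few reviews.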

8 Likes

That’s a good point about how the relatively low omnibus ratings for older titles would largely be canceled out by their extremely high GPS ones simply by virtue of being on the store for so long. Some of the early titles that have reached 10k or 50k GPS installs would only be at 1k or 5k if they released nowadays, but the combination of the relative lack of options back then and the sheer impact of 9 or 10 years of availability counts for a lot.

6 Likes

Interesting bits of insight.

It should! :joy:

5 Likes

@hustlertwo did you collect it by hand or did you use some kind of scraper?

2 Likes

Can anyone explain what is going on?

2 Likes

By hand. I’d finish this or that task at work and then do whatever the next row of three games was on the omnibus when they were ordered by release date, going from the bottom up. Look it up on GPS in one tab, Steamdb.info on another and then get the omnibus data, word count, genre classification and release date from the omnibus itself. Rinse and repeat 161 times.

@Jaydeepsinh_Dabhi Nothing to see here, just nerds being dorks.

9 Likes

Oh my, this thread made my little HG heart flutter!! Thank you for this diligent and excellent work :heart::heart: Not only is this thread insightful it goes to show how much y’all care about these games and I’m just :sob::sob:

I will certainly be poring over this data later, so thank you again @hustlertwo and @autumnchen

15 Likes

Not a problem, glad you like it!

Thanks for this. Just an FYI UnNatural does have a Steam page.

2 Likes

I appreciate the heads-up; that must have been before I realized some games were missing Steam links on the website. I’ve added that data in.

2 Likes

No worries. It makes for interesting reading indeed.

1 Like

Could you explain what you meant?

Honestly, I thought that was the explanation. That’s how I gathered the data. I looked up each game on the Google Play store, Steam, and the omnibus, then wrote down what I found on an Excel sheet.

3 Likes

That’s dedication right there, people. :point_up:

The good news is that’s what scrapers are for. Really handy. The bad news is that you can’t download a ready-made solution. You’d have to build it. Well, apparently you can.

2 Likes

If I do it again I’ll just do it the same way. I enjoy it. I was probably meant to be an actuary, but that’s not really a career you think about when you’re younger.

5 Likes

Horribly late to this thread, but I must say: I love it. Thank you SO MUCH for doing all this maths @hustlertwo and @autumnchen . Incredibly insightful and important! :star_struck:

4 Likes