Data Bits: App Reviews Are Warmer And Fuzzier Than You May Think

We recently introduced a brand new section to the blog called Data Bits. This is where we choose an interesting set of data, analyze it, and turn it into bite-sized blog posts for your reading pleasure.

Last time we looked at app reviews and how many of them a typical developer can expect to have. This time we’re diving a bit deeper into the data to find out exactly what sort of message those reviews are trying to communicate.

The data

While there are tons of reviews in almost every language, in this post we’ll be focusing specifically on ones written in English. Most text analysis tools are built around English, which makes those reviews easier to analyze. It also happens to be the language we’re most familiar with.

After some internal discussion over the best way to slice the data by language, we decided to grab a slice of iOS and Mac App Store reviews from these major English-speaking countries: US, Canada, UK, Australia, and New Zealand. Our sample comes out to roughly 25 million individual reviews, more than enough to give us an idea of what’s going on in there.

Visualizing it all

Next we needed to decide how to analyze and present so much content. After much brainstorming we settled on a method which is both simple and effective: word clouds. Once all the pieces were in place, we threw all 25 million reviews into the blender and here’s what came out:


We were pretty surprised at how positive this word cloud seems to be. Being app developers ourselves, we’re quite familiar with how picky reviewers tend to get, and we assumed that reviews would be a bit less glowing and slightly more critical. So we ran the numbers again, but the same results came out. ‘Great’, ‘love’, ‘fun’, and ‘good’ are used way more often than words like ‘poor’, ‘useless’, ‘waste’, and ‘sucks’.
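For the curious, the counting behind a word cloud like this can be sketched in a few lines of Python. This is only a minimal illustration and not the exact pipeline used for this post: the stop-word list, the tokenizing rule, and the sample reviews are all made up for the example.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "a", "an", "and", "is", "it", "this", "to", "i", "of", "but", "my"}

def word_frequencies(reviews):
    """Count how often each non-stop word appears across all reviews."""
    counts = Counter()
    for text in reviews:
        # Lowercase the text and treat runs of letters/apostrophes as words.
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts

# Made-up sample reviews, for illustration only.
sample = [
    "Great app, love it",
    "This is a great game and fun to play",
    "Crashes a lot, waste of money",
]
freqs = word_frequencies(sample)
print(freqs.most_common(3))
```

In a word cloud, each word is then drawn at a size proportional to its count, which is why the most frequent words dominate the picture.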

And that’s it… NOT

Just because a word is positive or negative on its own doesn’t mean there aren’t other words in the sentence modifying it. While evolution has fine-tuned us humans to identify such language nuances, it’s not so easy for a computer. So we started tinkering with the data to see if there’s anything clever we can do to get a better idea of the context around each word.

We started off by sectioning the reviews according to their star rating. We figured that the star rating (1 – 5) of a review is usually a good indication of its overall sentiment.

We turned to the blender once more, this time creating a cloud of words from only 5-star reviews.

Compare that with a word cloud of all 1-star reviews:

There’s a definite contrast here, showing that words like ‘love’ and ‘beautiful’ aren’t thrown around as much in very negative reviews, while words like ‘crashes’ and ‘waste’ aren’t very popular in positive ones. We did the same breakdown with star ratings 2 through 4, and, as expected, there was a gradual change in the use of positive and negative words.
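Splitting the counts by star rating, as described above, might look something like the sketch below. It assumes each review is available as a (rating, text) pair; that layout, the function name, and the sample data are assumptions made for the example.

```python
import re
from collections import Counter, defaultdict

def frequencies_by_rating(reviews):
    """Build one word counter per star rating from (rating, text) pairs."""
    by_rating = defaultdict(Counter)
    for rating, text in reviews:
        # Same simple tokenizing rule: lowercase runs of letters/apostrophes.
        by_rating[rating].update(re.findall(r"[a-z']+", text.lower()))
    return by_rating

# Made-up sample reviews, for illustration only.
rated_sample = [
    (5, "Love it, beautiful app"),
    (5, "Beautiful and fun"),
    (1, "Crashes on launch"),
    (1, "Waste of money, crashes constantly"),
]
clouds = frequencies_by_rating(rated_sample)
print(clouds[5].most_common(2))
print(clouds[1].most_common(2))
```

Each counter can then be fed into its own word cloud, one per star rating.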

Adding some color

Armed with this new information we decided to try something crazy: we assigned a ‘positivity’ score to each word depending on how often it appears in positive (highly rated) reviews and how often it appears in negative (low-rated) reviews. We then recreated the original word cloud, this time coloring words with a high score green, those with a low score red, and everything in the middle gray.
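One way such a score could be computed is sketched below. The exact formula isn't spelled out here, so everything specific in this example is an assumption: treating 4 and 5 stars as positive and 1 and 2 stars as negative, normalizing by the total word count of each group, the score formula itself, and the 0.3 color threshold.

```python
import re
from collections import Counter

def positivity_scores(reviews):
    """Score each word from -1 (only in negative reviews) to +1 (only in positive ones)."""
    pos, neg = Counter(), Counter()
    for rating, text in reviews:
        words = re.findall(r"[a-z']+", text.lower())
        if rating >= 4:      # assumed cutoff for "positive" reviews
            pos.update(words)
        elif rating <= 2:    # assumed cutoff for "negative" reviews
            neg.update(words)
    # Normalize by group size so the larger group does not dominate.
    pos_total = sum(pos.values()) or 1
    neg_total = sum(neg.values()) or 1
    scores = {}
    for word in set(pos) | set(neg):
        p = pos[word] / pos_total
        n = neg[word] / neg_total
        scores[word] = (p - n) / (p + n)
    return scores

def color_for(score, threshold=0.3):
    """Map a score to a cloud color; the threshold is a hypothetical choice."""
    if score > threshold:
        return "green"
    if score < -threshold:
        return "red"
    return "gray"

# Made-up sample reviews, for illustration only.
scored_sample = [
    (5, "love this great app"),
    (4, "great fun"),
    (1, "crashes constantly waste of money"),
    (2, "app crashes, not great"),
    (3, "it is okay"),
]
scores = positivity_scores(scored_sample)
for word in ("love", "great", "app", "crashes"):
    print(word, round(scores[word], 2), color_for(scores[word]))
```

Words that show up about equally often in both groups, like ‘app’ in this sample, land near zero and stay gray, while words that lean heavily one way pick up a color.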

We weren’t sure what to expect out of this experimental analysis method, but it turned out to be pretty spot-on. We were surprised at how well the algorithm colors words with a negative connotation (such as ‘crashes’, ‘waste’, and ‘useless’) red, while highlighting the positive ones (like ‘great’, ‘love’, and ‘good’) in green.

So it looks like what we suspected originally about the critical and picky reviewer was wrong, and that the first word cloud above was pretty telling on its own: there are way more positive things being said about iOS and Mac apps than negative. Who would have thought?
