Users often leave reviews to share their experiences with the apps they use. The purpose of this project is to analyze app reviews. Because I am particularly interested in movie-related apps, I selected two popular ones, IMDb and Fandango, and scraped their publicly available user reviews. My goal is to analyze and compare them to better understand the user experience of each app.
How do users leave reviews for apps?
What are their needs and pain points?
What does the experience of each app look like?
What insights can guide improving each app?
Understand the review data and how reviews relate to time
Understand users' needs and pain points, and identify opportunities from the reviews
Analyze review data to understand the user experience of each app
Combine all these aspects to identify key opportunities
Basic Exploration Analysis
Time-based Analysis
Sentiment Analysis
Language Pattern Analysis
Developer Response Analysis
Python
title: Title of the review.
rating: Numeric rating given by the user (likely from 1 to 5).
isEdited: Whether the review has been edited or not.
review: Text of the user's review.
userName: Username of the reviewer.
date: Timestamp of when the review was posted.
developerResponse: Response from the developer, if any.
app_name: The name of the app, which is 'imdb-movies-tv-shows' for all entries.
app_id: The application ID, which is the same (342792525) for all entries.
title: The title of the review.
rating: The user's rating, ranging from 1 to 5 (integer).
isEdited: Boolean indicating whether the review was edited.
review: The text of the review.
userName: The username of the reviewer.
date: The date the review was posted, formatted as YYYY-MM-DD HH:MM:SS.
developerResponse: The response from the developer, if any, mostly null but some are JSON objects containing an ID and response body.
app_name: The name of the app, which is 'fandango-movies' for all entries.
app_id: The application ID, which is the same (307906541) for all entries.
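The two datasets above share the same schema, so they can be loaded and concatenated directly. The sketch below builds a tiny synthetic DataFrame mirroring that schema (the actual scraped files and their paths are not shown here, so the rows are illustrative assumptions):

```python
# Illustrative sketch of preparing the scraped review data.
# The rows are synthetic stand-ins that mirror the documented schema.
import pandas as pd

rows = [
    {"title": "Great app", "rating": 5, "isEdited": False,
     "review": "Love browsing movie info.", "userName": "user1",
     "date": "2020-01-15 10:30:00", "developerResponse": None,
     "app_name": "imdb-movies-tv-shows", "app_id": 342792525},
    {"title": "Crashes a lot", "rating": 1, "isEdited": True,
     "review": "The app crashes after every update.", "userName": "user2",
     "date": "2020-02-01 08:00:00", "developerResponse": None,
     "app_name": "imdb-movies-tv-shows", "app_id": 342792525},
]
df = pd.DataFrame(rows)
# Parse the documented YYYY-MM-DD HH:MM:SS timestamp into a datetime column.
df["date"] = pd.to_datetime(df["date"])

print(df.dtypes["date"])                 # datetime64[ns]
print(df["rating"].between(1, 5).all())  # ratings stay within the 1-5 range
```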
Total Entries: 41,580 rows
Mean Rating: 4.62
Standard Deviation: 0.96
Minimum Rating: 1
25%: 5
50%: 5
75%: 5
Maximum Rating: 5
The distribution is strongly skewed toward the high end (negative skew: most ratings are 5, with a long tail toward low ratings)
Total Entries: 23,540 rows
Mean Rating: 4.41
Standard Deviation: 1.21
Minimum Rating: 1
25%: 4
50%: 5
75%: 5
Maximum Rating: 5
Ratings are skewed towards the higher end, with 50% of ratings at the maximum of 5.
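The summary statistics above come straight from a `describe()` call; the skew direction can be confirmed numerically. A sketch on synthetic ratings (made up to mimic the heavy 5-star concentration):

```python
# Sketch: summary statistics and skewness check on synthetic ratings
# shaped to resemble the observed 5-star-heavy distribution.
import pandas as pd

ratings = pd.Series([5] * 70 + [4] * 15 + [3] * 7 + [2] * 3 + [1] * 5)
print(ratings.describe())  # count, mean, std, min, quartiles, max
print(ratings.skew())      # negative value confirms the high-end skew
```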
Rating Volume:
IMDb received significantly more 5-star ratings and slightly more 4-star ratings compared to Fandango, as shown in the frequency graph.
Distribution Pattern:
The overall rating patterns for both apps are quite similar, with a strong tendency towards the highest rating, indicating high levels of user satisfaction. This is evident in the percentage graph.
Most reviews on both IMDb and Fandango are medium-length (51-300 characters) and are generally positive (5 stars).
Short reviews (0-50 characters) also tend to be 5 stars on both platforms.
Very long reviews (over 500 characters) are rare but still tend to be positive (5 stars).
IMDb has slightly more 4-star ratings in the medium-length reviews compared to Fandango.
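The length buckets above (0-50, 51-300, and over 500 characters, implying a 301-500 bucket) can be built with `pd.cut` and cross-tabulated against ratings. A minimal sketch on synthetic reviews:

```python
# Sketch: binning review lengths and cross-tabulating against ratings.
# Bin edges follow the buckets used above; the reviews are synthetic.
import pandas as pd

df = pd.DataFrame({
    "review": ["Great!", "x" * 120, "x" * 400, "x" * 800],
    "rating": [5, 5, 4, 5],
})
bins = [0, 50, 300, 500, float("inf")]
labels = ["0-50", "51-300", "301-500", "500+"]
df["length_bin"] = pd.cut(df["review"].str.len(), bins=bins, labels=labels)
print(pd.crosstab(df["length_bin"], df["rating"]))
```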
Early Fluctuations (2011-2012): Ratings varied a lot, with some months very high and others quite low, indicating initial instability or mixed user experiences.
Stable High Ratings (2012-2013): Ratings became more stable and mostly high, suggesting the app might have undergone improvements.
Periodic Drops (2013-2015): There were several drops in ratings, especially around 2015, where ratings briefly fell below 3.0, indicating user dissatisfaction due to bugs, changes in the app, or other issues.
Fluctuations with a Decline (2016-2018): Ratings fluctuated more and generally declined, but mostly stayed above 4.0, suggesting there are recurring issues affecting user satisfaction.
Drop and Recovery (2018-2020): A sharp drop in ratings occurred around 2018, followed by a gradual recovery, likely due to addressing major issues.
Recent Trends (2020-2021): Ratings have fluctuated more recently, but issues appear to be actively managed, leading to recovery.
Initial Variability (2011-2012): Ratings varied significantly, with some months reaching near-perfect scores and others dropping below 3.0, indicating mixed user experiences.
Stable High Ratings (2012-2017): Ratings became more stable and consistently high, mostly around 4.5 to 5.0, suggesting improvements in the app and high user satisfaction.
Significant Drop (2018-2020): Around 2018, there is a noticeable drop in ratings, indicating a period of user dissatisfaction. Ratings show a gradual decline with more fluctuations, often dropping below 4.0, suggesting recurring issues affecting user satisfaction.
Major Drop and Recovery (2020-2021): A significant drop occurs around 2020, with ratings briefly falling to around 1.0, followed by a recovery. This suggests a major issue that was eventually resolved.
Recent Stability with Fluctuations (2021-2022): The ratings show more fluctuations but generally hover around 4.0, indicating ongoing efforts to maintain and improve user satisfaction.
Stability:
IMDb: Generally more stable with ratings frequently above 4.0.
Fandango: Less stable with more frequent drops below 4.0.
Average Ratings:
IMDb: Maintains higher average ratings over the period, indicating better overall user satisfaction.
Fandango: Lower average ratings due to more frequent dips, suggesting more recurring issues.
Significant Drops:
IMDb: Occasional dips below 4.0 but quick recoveries.
Fandango: More pronounced and frequent drops below 4.0, with some prolonged periods of lower ratings.
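The trend observations above rest on a monthly average-rating series. A sketch of how that series can be computed (with synthetic dates and ratings):

```python
# Sketch: monthly average rating over time, the basis for the
# trend comparisons above. Dates and ratings here are synthetic.
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime(["2018-01-05", "2018-01-20",
                            "2018-02-10", "2018-03-01"]),
    "rating": [5, 3, 4, 2],
})
# Group by calendar month and average the ratings.
monthly = df.groupby(df["date"].dt.to_period("M"))["rating"].mean()
print(monthly)
```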
High User Satisfaction Period: From around 2011 to 2013, the app received a very high number of ratings, mostly 5 stars, indicating a period of very high user satisfaction.
Decline in Ratings: After 2013, the number of ratings dropped significantly but still maintained a majority of high ratings.
Stabilized Trends: From 2014 onwards, the number of ratings has been stable but much lower than the peak, with consistent high ratings indicating sustained user satisfaction.
High User Satisfaction Period: From around 2013 to 2016, the app received predominantly high ratings, indicating a period of high user satisfaction.
Increased Variability and Decline: After 2016, there is an increase in the number of ratings but also more variability, with a higher number of lower ratings indicating mixed user experiences.
Recent Trends: From 2018 to 2021, there is a decline in the total number of ratings and increased variability, suggesting fluctuating user satisfaction and potentially unresolved issues.
Rating Volume:
IMDb: Peak around 2012 with very high numbers, followed by a sharp decline and stabilization at a lower volume.
Fandango: Increasing trend until 2018, followed by fluctuations and a decline.
User Satisfaction:
IMDb: Mostly high ratings throughout, even during lower volume periods.
Fandango: High initial ratings with more variability and significant drops in recent years.
Stability:
IMDb: More stable in maintaining high ratings over time despite the drop in volume.
Fandango: More fluctuations with periods of significant drops indicating more frequent user dissatisfaction.
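The volume comparison above pairs the yearly number of ratings with their average. A sketch of that aggregation on synthetic data:

```python
# Sketch: rating volume and average rating per year,
# as used in the volume comparison above. Data is synthetic.
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime(["2012-03-01", "2012-07-12", "2015-05-05"]),
    "rating": [5, 5, 1],
})
per_year = df.groupby(df["date"].dt.year)["rating"].agg(["count", "mean"])
print(per_year)
```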
Correlation with Updates/Events:
Correlate peaks and valleys in ratings with related events, such as app updates, marketing campaigns, or other significant events.
Sentiment Analysis:
Analyze the sentiment of reviews to provide more context to the ratings trends and related events.
Investigate Anomalies:
Further investigate time periods with unusual increases in low ratings along with related events.
Positive: 72.08%
Neutral: 23.10%
Negative: 4.82%
Positive: 74.79%
Neutral: 18.57%
Negative: 6.64%
I examined how effectively the sentiment analysis categorized reviews. Overall it is mostly accurate, but there are occasional mistakes. For example, "I cannot stand Rotten Tomatoes! Terrible reviews." and "I've used other movie apps. They are all horrible compared to fandango!!!" are rated as strongly negative, even though both actually praise the app by criticizing its competitors.
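The percentages above come from bucketing a continuous polarity score into three categories. The report does not name the scoring library or the cutoffs, so the ±0.1 thresholds below are an assumption:

```python
# Sketch of bucketing polarity scores into sentiment categories.
# The scoring tool and the +/-0.1 thresholds are assumptions;
# the analysis above does not state which library was used.
def categorize(polarity: float) -> str:
    """Map a polarity score in [-1, 1] to a coarse sentiment label."""
    if polarity > 0.1:
        return "Positive"
    if polarity < -0.1:
        return "Negative"
    return "Neutral"

scores = [0.8, 0.0, -0.6, 0.05]
print([categorize(s) for s in scores])
# -> ['Positive', 'Neutral', 'Negative', 'Neutral']
```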
Positive: 62.14%
Neutral: 34.51%
Negative: 3.35%
Positive: 58.28%
Neutral: 36.64%
Negative: 5.09%
I examined how effectively the sentiment analysis categorized review titles. Overall it is mostly accurate, but there are occasional mistakes. For example, "Not happy with this App" and "Not user friendly" are rated as positive, likely because the negation is not handled correctly.
Positive Correlation: Both platforms show that higher ratings tend to have more positive sentiment, but this relationship is stronger for Fandango compared to IMDb. Moderate positive correlation (0.4459) for Fandango vs Weaker positive correlation (0.2746) for IMDb.
IMDb: There is a wider spread of sentiment polarity for each rating, indicating more variability in the language used in reviews across different ratings.
Fandango: The sentiment polarity is more tightly clustered around higher ratings. This means that users giving higher ratings tend to use more positive language.
Positive Correlation: Both platforms show that higher ratings tend to have more positive sentiment, but this relationship is stronger for Fandango compared to IMDb. Moderate positive correlation (0.4198) for Fandango vs Weaker positive correlation (0.3242) for IMDb.
IMDb: There is a wider spread of sentiment polarity for each rating, indicating more variability in the language used in review title across different ratings.
Fandango: The sentiment polarity is more tightly clustered around higher ratings. This means that users giving higher ratings tend to use more positive language.
Positive Correlation: Both platforms show that higher ratings tend to have more positive sentiment, but this relationship is stronger for Fandango compared to IMDb. Moderate positive correlation (0.2871) for Fandango vs Weaker positive correlation (0.1474) for IMDb.
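The correlation coefficients quoted above are Pearson correlations between the numeric rating and the sentiment polarity score. A sketch with synthetic values:

```python
# Sketch: Pearson correlation between ratings and sentiment polarity,
# the statistic quoted above. The values here are synthetic.
import numpy as np

ratings = np.array([1, 2, 3, 4, 5, 5, 4, 1])
polarity = np.array([-0.7, -0.2, 0.0, 0.3, 0.8, 0.6, 0.2, -0.5])
r = np.corrcoef(ratings, polarity)[0, 1]
print(round(r, 4))  # a strong positive correlation for this toy data
```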
Processing Title Text:
Tokenization: split text into words, remove punctuation, and remove stopwords (e.g., "app")
Stemming and Lemmatization: Many word forms in English share the same meaning, and there are two main methods for recognizing "strike, striking, struck" as the same word. Stemming removes suffixes like "ing", "ly", "s", etc. with a simple rule-based approach, while lemmatization maps each word to its dictionary form using vocabulary and morphology. I implemented the PorterStemmer method from nltk.
N-grams: N-grams are the combination of multiple words used together to help identify common phrases or collocations in text.
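The pipeline above can be sketched end to end. The stemmer is nltk's PorterStemmer as stated; the regex tokenizer, the tiny stopword list, and the hand-rolled n-gram helper are simplified stand-ins for the full preprocessing:

```python
# Sketch of the title-processing pipeline described above: simple
# tokenization, stopword removal, Porter stemming, and n-gram extraction.
# The stopword list is a tiny stand-in for the full nltk list.
import re
from nltk.stem import PorterStemmer

STOPWORDS = {"the", "this", "a", "an", "is", "app"}  # "app" removed as noted above

def tokenize(text):
    """Lowercase, keep word characters only, drop stopwords."""
    words = re.findall(r"[a-z']+", text.lower())
    return [w for w in words if w not in STOPWORDS]

def ngrams(tokens, n):
    """All consecutive n-token windows."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

stemmer = PorterStemmer()
tokens = tokenize("Please bring back the message board")
stems = [stemmer.stem(t) for t in tokens]
print(stems)
print(ngrams(tokens, 3))  # trigrams, e.g. ('bring', 'back', 'message')
```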
Prominent Words: "update," "search," "ad," "terrible," "crashes"
Interpretation: Users giving a 1-star rating frequently mention issues with updates, search functionality, and advertisements. Words like "terrible," "crashes," and "broken" indicate significant dissatisfaction and frustration.
Prominent Words: "great," "awesome," "best," "love," "perfect"
Interpretation: Users giving a 5-star rating highlight positive aspects such as the app being "great," "awesome," and "perfect." Words like "love," "useful," and "best" reflect high satisfaction and appreciation.
Prominent Words: "convenience," "fee," "customer service," "gift card"
Interpretation: Users giving a 1-star rating frequently mention issues with customer service, convenience fees, and gift cards. Words like "terrible," "error," and "horrible" indicate significant dissatisfaction and frustration.
Prominent Words: "great," "love," "convenient," "awesome," "easy"
Interpretation: Users giving a 5-star rating highlight positive aspects such as the app being "great," "convenient," and "easy" to use. Words like "love" and "awesome" reflect high satisfaction and appreciation.
"Back Message Board" and "Bring Back Message" are the most frequent trigrams, indicating users with lower ratings often mentioned wanting the return of the message board feature.
Phrases like "Board Gone," "Search Bar," "Ad Ad Ad," and "Please Bring Back" suggest that users are expressing dissatisfaction with the removal of certain features and the presence of ads.
Other common trigrams such as "Worse Every Update," "Can't Watch Anything," and "App Broke" show frustration with app updates and functionality issues.
High-rated reviews are characterized by strong positive affirmations and frequent use of the app, indicating a high level of user satisfaction.
"Poor Customer Service," "Worst Customer Service," and "Horrible Customer Service" are the most frequent trigrams. This shows that users with low ratings often complain about the customer service.
Phrases like "Fandango Gift Card," "Convenience Fee," and "Can't Use Gift" suggest problems with gift cards and additional fees are common complaints.
Other frequent phrases include "Worst App Ever," "Convenience Fee Rip," and "App Doesn't Work," highlighting dissatisfaction with the app’s functionality and extra charges.
Users frequently praise the app as the best for buying movie tickets, highlighting its convenience, ease of use, and overall positive experience.
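The trigram frequencies behind these findings are a straightforward count over the token windows. A minimal sketch on synthetic low-rated reviews:

```python
# Sketch: counting the most frequent trigrams across low-rated reviews,
# as in the findings above. The reviews here are synthetic examples.
from collections import Counter

reviews = [
    "please bring back message board",
    "bring back message board now",
    "worst customer service ever",
]

def trigrams(text):
    """All consecutive three-word windows in a review."""
    words = text.split()
    return [tuple(words[i:i + 3]) for i in range(len(words) - 2)]

counts = Counter(t for r in reviews for t in trigrams(r))
print(counts.most_common(2))
```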
IMDb: The developer response rate is extremely low, with only 1.6% of reviews receiving a response. This suggests a significant opportunity for developers to improve user engagement by addressing more reviews.
Fandango: The developer response rate is low, with only 6.4% of reviews being responded to. This indicates that developers might need to engage more actively with their users.
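The response rates above are the share of reviews whose developerResponse field is non-null. A sketch on synthetic rows:

```python
# Sketch: computing the developer response rate quoted above
# (share of reviews with a non-null developerResponse). Data is synthetic.
import pandas as pd

df = pd.DataFrame({
    "developerResponse": [None, None, {"id": 1, "body": "Thanks!"}, None, None],
    "rating": [5, 4, 1, 5, 3],
})
response_rate = df["developerResponse"].notna().mean()
print(f"{response_rate:.1%}")  # 1 of 5 reviews answered -> 20.0%
```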
IMDb: Developers' responses tend to be varied, with a mix of positive, neutral, and negative sentiments. Positive responses are the most frequent, but there is still a considerable number of negative responses, which might affect user perception.
Fandango: Developers predominantly provide neutral responses, which can seem impersonal or non-committal, likely because they rely on a standard, universal reply. The absence of positive responses indicates an opportunity to improve by engaging more positively with users.
IMDb: The graph illustrates that reviews with developer responses are mostly low-rated, around 1 to 2 stars, whereas reviews without responses are primarily high-rated, around 4.5 to 5 stars. This suggests that developers are more likely to respond to negative feedback.
Fandango: The trend is consistent with IMDb, where developer responses are mainly directed towards low ratings. This strategy aims to mitigate dissatisfaction and possibly improve user experience.
Common Pattern: Both IMDb and Fandango show a significant peak in developer responses in early 2020, followed by a sharp and sustained decline. This could be due to a specific event, policy change, or shift in strategy around that time.
IMDb: Developer engagement increased steadily until early 2020, then dropped and stayed low. This indicates a potential shift in response strategy or resource allocation.
Fandango: Similar to IMDb, but with more pronounced fluctuations before the early 2020 peak. The dramatic drop to nearly zero responses suggests a significant change in approach to handling reviews and user feedback.
IMDb: Developer response lengths vary considerably over time, with several peaks and troughs. This indicates a less consistent approach in how developers craft their responses, possibly reflecting varying strategies or levels of detail depending on the issue addressed.
Fandango: Developer responses are highly consistent in length, suggesting a standardized approach. This could indicate the use of template responses or a fixed style of engagement with users.