Build a Movie Recommendation Flask Based Deployment

Tharun P

Published in

Analytics Vidhya

8 min read · Aug 28, 2021

Create your own movie recommendation API for other developers to use in their app or website

“Every time I go to a movie, it’s magic, no matter what the movie’s about”.
— Steven Spielberg

The live demo of the application can be checked HERE

Hi there, I am going to explain the following:

Data Pre-processing
Building Movie Recommender Machine Learning Model
Build a REST API using Flask
Test it on localhost (127.0.0.1:5000), where the API goes online

The recommendation system generates a list of recommendations in one of two ways:

Collaborative filtering: Collaborative filtering methods build a model based on past user behavior (i.e. items purchased or searched for by users) and similar decisions made by other users. This model is then used to predict items (or item ratings) that the user might be interested in.

Content-based filtering: Content-based filtering methods take advantage of the characteristics of an item to suggest additional items with similar properties. Content filtering is based solely on the item’s description and a profile of the user’s interests, and recommends items based on the user’s past interests.

So, without further ado, let’s get started. Feel free to skip any part you are already familiar with.

Section 1: Data Pre-processing

Let’s look at the dataset first. We are going to use TMDB5000 DATASET from Kaggle.com. The dataset consists of 2 files, namely, tmdb_5000_credits.csv & tmdb_5000_movies.csv. There is another dataset called THE MOVIES DATASET which has more than a million movie reviews and ratings. However, I did not use it for 2 reasons.

The dataset is too large for my system and requires an estimated 45–50 GB of RAM.
The machine learning model produced is also too large for Heroku. Heroku does not allow us to store more than 250MB on a free account.

Let me go through the dataset very briefly so that we can focus on building the machine learning model part.

We load the 2 CSV files into df1 & df2 pandas data frames.
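As a rough illustration of this loading step, here is a minimal sketch. The function name load_tmdb_files and the default file paths are assumptions made for this example (they assume the two Kaggle files sit in the working directory); the original loading code is not shown here.

```python
import pandas as pd

# Assumed locations of the two Kaggle files; adjust to where they were saved.
CREDITS_CSV = "tmdb_5000_credits.csv"
MOVIES_CSV = "tmdb_5000_movies.csv"


def load_tmdb_files(credits_path=CREDITS_CSV, movies_path=MOVIES_CSV):
    """Hypothetical helper: read both CSV files into pandas data frames."""
    df1 = pd.read_csv(credits_path)  # credits file
    df2 = pd.read_csv(movies_path)   # movies file
    return df1, df2
```

Calling df1, df2 = load_tmdb_files() followed by df1.head() and df2.head() gives a quick look at both frames.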

Return value of get_data() :-

Fig 1: DataFrame 1 — tmdb_5000_credits.csv

Fig 2: DataFrame 2 — tmdb_5000_movies.csv

Instead of dealing with both data frames, I have combined them, so we only have to work with one data frame. Fortunately, the dataset doesn’t have a lot of empty values. Let’s deal with them one by one. Here is an overview of all columns.

Return value of combine_data() :-

As for the ID column that is unique to each movie, we don’t need it because it doesn’t contribute to the recommendation. It is also a good idea to remove the tagline column: most movies already have an overview, so the tagline would only add more of a similar context. By removing those 2 columns, we get a data frame with 21 attributes.
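The combining and column removal described above might look roughly like the sketch below. It runs on two tiny made-up frames rather than the real files. The helper name merge_and_trim, the join key, the "_credits" suffix and the column name tagline are assumptions based on the columns of the TMDB files, not the author's confirmed code.

```python
import pandas as pd

# Toy stand-ins for the two files, keeping only a few of their columns.
credits = pd.DataFrame({
    "movie_id": [1, 2],
    "title": ["Movie A", "Movie B"],
    "cast": ["[]", "[]"],
    "crew": ["[]", "[]"],
})
movies = pd.DataFrame({
    "id": [1, 2],
    "title": ["Movie A", "Movie B"],
    "overview": ["First overview.", "Second overview."],
    "tagline": ["First tagline.", "Second tagline."],
})


def merge_and_trim(credits, movies):
    """Hypothetical helper: join the two frames, then drop unused columns."""
    # Give both frames the same key name so they can be joined on it.
    credits = credits.rename(columns={"movie_id": "id"})
    merged = movies.merge(credits, on="id", suffixes=("", "_credits"))
    # The id and the tagline are not used for the recommendation.
    return merged.drop(columns=["id", "tagline"])


df = merge_and_trim(credits, movies)
print(df.columns.tolist())
```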

There are several columns (see Figure 3) where each cell is a string that contains a list of dictionaries. We can use literal_eval from the ast module to parse those strings and get the embedded dictionaries. So, we apply literal_eval to the cast, keywords, crew and genres attributes. Now that we have them in dictionary form, we can extract important features such as the director’s name, which is a very important factor for our recommendation system. For the cast, keywords and genres attributes, we return the first 3 names from each category as a list. We can now create one column that combines all 4 of these attributes (cast, keywords, director and genres). Let’s call this column “soup” (because it’s like a soup, a combination of the 4 attributes).
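The parsing and "soup" steps above can be sketched on a single made-up row. The helper names (get_director, top_names, clean, create_soup) are hypothetical. The clean step, which lowercases names and strips spaces so that a full name becomes one token, is an added assumption that the text does not mention.

```python
from ast import literal_eval

import pandas as pd

# One toy row shaped like the TMDB columns: each cell is a string
# holding a list of dictionaries.
df = pd.DataFrame({
    "cast": ['[{"name": "Actor A"}, {"name": "Actor B"}, '
             '{"name": "Actor C"}, {"name": "Actor D"}]'],
    "crew": ['[{"job": "Editor", "name": "Person E"}, '
             '{"job": "Director", "name": "Person D"}]'],
    "keywords": ['[{"name": "space"}, {"name": "future"}]'],
    "genres": ['[{"name": "Action"}, {"name": "Science Fiction"}]'],
})

# Turn each string into a real Python list of dictionaries.
for col in ["cast", "crew", "keywords", "genres"]:
    df[col] = df[col].apply(literal_eval)


def get_director(crew):
    """Return the name of the first crew member whose job is Director."""
    for member in crew:
        if member.get("job") == "Director":
            return member["name"]
    return ""


def top_names(items, n=3):
    """Return the names of the first n entries."""
    return [item["name"] for item in items[:n]]


def clean(value):
    """Assumed extra step: lowercase and remove spaces."""
    if isinstance(value, list):
        return [v.lower().replace(" ", "") for v in value]
    return value.lower().replace(" ", "")


df["director"] = df["crew"].apply(get_director).apply(clean)
for col in ["cast", "keywords", "genres"]:
    df[col] = df[col].apply(top_names).apply(clean)


def create_soup(row):
    """Join keywords, cast, director and genres into one string."""
    parts = row["keywords"] + row["cast"] + [row["director"]] + row["genres"]
    return " ".join(parts)


df["soup"] = df.apply(create_soup, axis=1)
print(df.loc[0, "soup"])
```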

Let’s check out the dataset for NaN values now.

Since there are a lot of blank values in the homepage column, we have no choice but to drop that column. We can fill the empty runtime values with the mean. There is also one movie that has not been released yet, so we can delete that particular row. We now have the final dataset for our machine learning model.
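The cleanup steps just described could be written along the following lines, shown here on a small made-up frame. Identifying the unreleased movie by a missing release_date is an assumption made for this sketch; the text does not say how that row is found.

```python
import numpy as np
import pandas as pd

# Toy frame with the same kinds of gaps as described in the text.
df = pd.DataFrame({
    "title": ["Movie A", "Movie B", "Movie C", "Movie D"],
    "homepage": [np.nan, np.nan, "http://example.com", np.nan],
    "runtime": [100.0, np.nan, 120.0, 90.0],
    "release_date": ["2009-12-10", "2007-05-19", "2012-07-16", np.nan],
})

# Count the NaN values in every column.
print(df.isnull().sum())

# The homepage column is mostly empty, so drop the whole column.
df = df.drop(columns=["homepage"])

# Fill missing runtimes with the mean runtime.
df["runtime"] = df["runtime"].fillna(df["runtime"].mean())

# Assumption: the unreleased movie is the row without a release date.
df = df.dropna(subset=["release_date"]).reset_index(drop=True)
```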

Section 2: Building Movie Recommender Machine Learning Model

To build our model, we first create a count matrix using scikit-learn’s CountVectorizer. We create the count vectorizer with English stop words, then fit and transform the soup column we created in the previous section. To compare movies, a good technique is cosine similarity. It is simply a metric used to determine the similarity of documents regardless of their size. After constructing a cosine similarity matrix for our dataset, we can sort the results to find the top 10 similar movies. We return the movie titles and indexes to the user.
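A minimal sketch of the count matrix and the cosine similarity matrix, using scikit-learn's CountVectorizer and cosine_similarity on three made-up soup strings (the real input would be the soup column of the full dataset):

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy soup strings standing in for the real soup column.
soups = pd.Series([
    "space alien marine directorx action",
    "space robot pilot directorx action",
    "wedding comedy romance singer",
])

# Count how often each token appears in each soup, ignoring
# English stop words.
count = CountVectorizer(stop_words="english")
count_matrix = count.fit_transform(soups)

# Entry [i, j] is the cosine similarity between movie i and movie j.
cosine_sim = cosine_similarity(count_matrix, count_matrix)
print(cosine_sim.round(2))
```

The first two toy rows share three of their five tokens, so their similarity is 3/5 = 0.6, while the third row shares nothing with either and scores 0.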

recommend_movies()

recommend_movies() takes four parameters.

title : Name of the movie
data : Return value of get_data()
combine : Return value of combine_data()
transform : Return value of transform_data()

Create a Pandas Series with indices of all the movies present in our dataset.
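One way such a function could look is sketched below. This is a simplified version under stated assumptions, not the original code: it assumes data has a title column with unique titles, that transform is the cosine similarity matrix, and that 10 results are wanted (the TOP_N constant is hypothetical). The combine argument is accepted only to match the four-parameter signature and is not used here.

```python
import numpy as np
import pandas as pd

TOP_N = 10  # assumed number of similar movies to return


def recommend_movies(title, data, combine, transform):
    # Series that maps each movie title to its row index.
    indices = pd.Series(data.index, index=data["title"])
    if title not in indices.index:
        # Unknown title: return an empty result.
        return pd.DataFrame(columns=["index", "title"])
    idx = indices[title]

    # Pair every movie with its similarity to the requested movie,
    # most similar first.
    sim_scores = sorted(enumerate(transform[idx]),
                        key=lambda pair: pair[1], reverse=True)

    # Skip the requested movie itself, then keep the top N.
    movie_indices = [i for i, _ in sim_scores if i != idx][:TOP_N]
    return pd.DataFrame({
        "index": movie_indices,
        "title": data["title"].iloc[movie_indices].values,
    })


# Toy usage with three movies and a hand-written similarity matrix.
data = pd.DataFrame({"title": ["a", "b", "c"]})
transform = np.array([
    [1.0, 0.6, 0.0],
    [0.6, 1.0, 0.2],
    [0.0, 0.2, 1.0],
])
print(recommend_movies("a", data, None, transform))
```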

Section 3: Build a REST API using Flask

Flask:-

To understand this part of the article I suggest you have a basic idea about Flask. For our task, we just need to know about some beginner level functionalities.

To install Flask on your system head over to the terminal/Command Prompt and type pip install Flask. That’s it, Flask is now installed on your system.

To know the basics about a simple Flask application head on to the link below.

Flask Hello World

Now that we have a basic idea about Flask, let’s move on to the next topic, REST APIs.

REST API:-

Creating a RESTful API With Flask — GET Requests

File 2 — app.py

In this file, we will code our Flask application and use the recommendation system we built before.

Import the required packages

First we import the Flask class, then the request object, which gives us access to the data of the incoming HTTP request (such as the query string), and finally jsonify to return our results in JSON format.

We import the flask_cors to enable cross-origin requests for our API.

What is a cross-origin request?

Cross-origin resource sharing (CORS) is a mechanism that allows restricted resources on a web page to be requested from another domain outside the domain from which the first resource was served.

To know more about the CORS policy do go through the below-mentioned link. It explains all you need to know about CORS policy.

What is CORS?

Now we import our recommendation.py file as a module to use it in our app.py file.

Flask Code:-

Line 1: We create an instance of the Flask class. The first argument is the name of the application’s module or package.
Line 2: We use the CORS() method to enable the CORS policy on our API.
Line 4: We then use the route() decorator to tell Flask what URL should trigger our function. In this case, we use the /movie endpoint with the base URL.
Line 5: Now, we define a function named recommend_movies() which will be used to return the top 20 recommendations.
Line 6: In this line, we call the results() function from our recommendation.py file and store the recommendations in a variable named res. The movie name is read from the query string using request.args.get() and passed to results(); the parameter name is title.
Line 7: Lastly, we take the results that recommendation.py returned in dictionary format, convert them to JSON with jsonify and return them.
Line 9: This line indicates that what follows is executed only if we run our app.py file directly from the terminal/command prompt.
Line 10: We run the app when app.py is called directly in the terminal/command prompt. We set our port number to 5000 when running on localhost and we set debug=True to trace back any errors that occur whilst running our application.

Section 4: Test it on localhost (127.0.0.1:5000)

Now that we are done with the coding part, let’s test our application on localhost and see if it’s working.

If you want to use Postman for testing our API then download it from the below link.

Download Postman

You can use your browser for testing as well if you prefer that over Postman. We will test on both of them and you will get to see the results.

Testing our API:-

Step — 1: Open up your command prompt if in Windows or terminal if you are using Linux.

Step — 2: Navigate to the folder where you have stored the dataset, recommendation.py file and app.py file using command line.

We store our files in a folder named Recommendation 2.0. Below is our directory structure.

Our directory structure (Recommendation 2.0):
movie_data.csv.zip
recommendation.py
app.py

All the files and the dataset should be present in a single folder for ease of use when developing the application.

Step — 3: When we are in our Recommendation 2.0 folder type the following commands in the command line.

set FLASK_APP=app.py

(The set command is for the Windows command prompt. On Linux or macOS, use export FLASK_APP=app.py instead.)

for running the application:-

flask run

After executing both commands, we will see our application running on localhost.

Test our API on localhost using Postman or any browser.

Let’s see our results when we pass a movie to our API.
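Besides Postman or a browser, the same GET request can be sent from a short Python script. The sketch below uses only the standard library. The helper names are hypothetical, and it assumes the app is running locally on port 5000 with the /movie endpoint and the title query parameter described earlier.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://127.0.0.1:5000/movie"


def recommendation_url(title):
    """Build the request URL with the movie name as the title parameter."""
    return f"{BASE_URL}?{urlencode({'title': title})}"


def fetch_recommendations(title):
    """Send the GET request and parse the JSON reply (needs a running app)."""
    with urlopen(recommendation_url(title)) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the Flask app running, fetch_recommendations("avatar") returns the parsed JSON response; recommendation_url("avatar") shows the exact URL you could paste into a browser.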

Postman:-

Finally, our Movie Recommendation API is now online.

Hurray! Now we go live