Product Recommendation Systems -1

Mohit Shukla
3 min read · Feb 26, 2025


To build a recommendation system that predicts which products customers are likely to purchase next, we can draw on several techniques. One common approach is collaborative filtering, where predictions are based on the behavior of similar users; another is content-based filtering, where we recommend products similar to those the user has already interacted with. Hybrid models combine both methods.
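
To give a feel for the content-based side (the rest of this post focuses on collaborative filtering), here is a minimal sketch using scikit-learn rather than surprise. The tiny product catalogue and its descriptions are made up purely for illustration: each product is represented by TF-IDF features of its description, and items are compared with cosine similarity.

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical product catalogue with short text descriptions
products = pd.DataFrame({
    'item_id': [101, 102, 103, 104],
    'description': [
        'wireless mouse accessories',
        'wireless keyboard accessories',
        'running shoes sportswear',
        'gaming mouse accessories'
    ]
})

# Represent each product by TF-IDF features of its description
tfidf = TfidfVectorizer()
features = tfidf.fit_transform(products['description'])

# Item-to-item cosine similarity: recommend the items most similar to item 101
similarity = cosine_similarity(features)
scores = pd.Series(similarity[0], index=products['item_id']).drop(101)
print(scores.sort_values(ascending=False).head(3))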

Below is a simple collaborative filtering approach using matrix factorization with Singular Value Decomposition (SVD), one of the most widely used techniques for recommendation systems.

Step-by-Step Guide:

  1. Data Preprocessing: Start by preparing the customer interaction data (e.g., clickstream, ratings, purchase history). We need to build a user-item interaction matrix, where rows represent users, columns represent products, and the values represent the interaction level (e.g., rating or purchase frequency); a small sketch of this step follows the list.
  2. Matrix Factorization: We use matrix factorization techniques (like SVD) to decompose the user-item matrix into lower-dimensional matrices representing latent factors, which can then be used to predict future purchases.
  3. Prediction: Using the decomposed matrices, we can predict missing values in the user-item interaction matrix (which correspond to unseen products for a given user).
  4. Recommendation: Based on the predictions, we can recommend the top N products for each user.
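
To make step 1 concrete, here is a minimal sketch of turning a raw purchase log into a user-item interaction matrix with pandas. The column names and the tiny purchase log are made up for illustration; in practice you would substitute your own interaction data.

import pandas as pd

# Hypothetical purchase log: one row per (user, product) interaction
purchases = pd.DataFrame({
    'user_id': [1, 1, 2, 2, 3],
    'item_id': [101, 102, 101, 103, 102],
    'quantity': [2, 1, 1, 3, 1]
})

# Pivot into a user-item interaction matrix: rows = users, columns = products,
# values = interaction strength (here purchase frequency), 0 = no interaction
interaction_matrix = purchases.pivot_table(
    index='user_id', columns='item_id', values='quantity',
    aggfunc='sum', fill_value=0
)
print(interaction_matrix)

For implicit signals such as clicks, the values could just as well be binary flags or log-scaled counts instead of raw quantities.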

The Python code below implements this approach with the surprise library to build a collaborative filtering recommendation system.

# Install the required libraries
# pip install scikit-surprise

from surprise import SVD, Reader, Dataset
from surprise.model_selection import train_test_split
from surprise import accuracy
import pandas as pd
import matplotlib.pyplot as plt

# Sample data format: user_id, item_id, rating
data = {
    'user_id': [1, 1, 1, 2, 2, 3, 3, 4, 4, 5],
    'item_id': [101, 102, 103, 101, 104, 102, 105, 103, 106, 101],
    'rating': [5, 4, 2, 5, 3, 5, 4, 4, 5, 3]
}

# Create a DataFrame from the data
df = pd.DataFrame(data)

# Load data into surprise
reader = Reader(rating_scale=(1, 5))
dataset = Dataset.load_from_df(df[['user_id', 'item_id', 'rating']], reader)

# Split the dataset into train and test sets
trainset, testset = train_test_split(dataset, test_size=0.2)

# Build the SVD model
model = SVD()
model.fit(trainset)

# Predict ratings on the test set
predictions = model.test(testset)

# Calculate RMSE (Root Mean Squared Error)
rmse = accuracy.rmse(predictions, verbose=False)
print(f"RMSE: {rmse:.4f}")

# Making recommendations: Predict ratings for all items a user hasn't rated yet
def recommend_products(user_id, num_recommendations=3):
    # Get a list of all item ids
    all_items = df['item_id'].unique()
    # Get the items the user has already interacted with
    rated_items = df[df['user_id'] == user_id]['item_id'].values
    # Filter out items that the user has already rated
    items_to_predict = [item for item in all_items if item not in rated_items]

    # Predict ratings for the remaining items
    predictions = [model.predict(user_id, item) for item in items_to_predict]

    # Sort the predictions by predicted rating in descending order
    predictions.sort(key=lambda x: x.est, reverse=True)

    # Get the top N recommendations
    top_n = predictions[:num_recommendations]

    return [(pred.iid, pred.est) for pred in top_n]

# Recommend top 3 products for user 1
recommendations = recommend_products(user_id=1, num_recommendations=3)
print(f"Top recommendations for User 1: {recommendations}")

# Visualization: predict ratings for the items User 1 has not interacted with yet
user_id = 1
rated_items = df[df['user_id'] == user_id]['item_id'].values
unseen_items = [item for item in df['item_id'].unique() if item not in rated_items]
user_predictions = [model.predict(user_id, item) for item in unseen_items]

items = [pred.iid for pred in user_predictions]
predicted_ratings = [pred.est for pred in user_predictions]

plt.figure(figsize=(10, 6))
plt.bar(items, predicted_ratings, color='skyblue')
plt.xlabel('Item ID')
plt.ylabel('Predicted Rating')
plt.title('Predicted Ratings for User 1')
plt.show()

Key Steps:

  1. Data Preparation: We simulate user-item interaction data in the form of ratings. This would be replaced with the actual dataset in your case.
  2. Model Training: We use Singular Value Decomposition (SVD) from the surprise library, which is a popular collaborative filtering technique. The train_test_split function splits the data into a training and test set, and we train the SVD model on the training set.
  3. Prediction: After training, we make predictions on the test set and calculate the RMSE (Root Mean Squared Error) to measure the model’s performance; because the dummy dataset is tiny, the cross-validation sketch after this list gives a more stable error estimate than a single split.
  4. Recommendation: The function recommend_products predicts ratings for products that the user has not interacted with, sorts them by predicted rating, and returns the top recommendations.
  5. Visualization: We visualize the predicted ratings for a given user (e.g., User 1) to get insights into which products they might prefer based on the model’s predictions.
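
With only a handful of ratings, the RMSE from one train/test split is noisy. As a rough sketch, surprise’s cross_validate helper averages the error over several folds; the 5-fold setting below is just an illustrative choice, and the snippet reuses the dataset object built from df in the main example above.

from surprise import SVD
from surprise.model_selection import cross_validate

# Reuses the `dataset` object built from `df` in the main example above
results = cross_validate(SVD(), dataset, measures=['RMSE', 'MAE'], cv=5, verbose=True)
print("Mean RMSE across folds:", results['test_rmse'].mean())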

Output:

  1. RMSE: A lower RMSE indicates a better model fit.
  2. Top Recommendations: The top N predicted products for a user are displayed.
  3. Graph: A bar plot showing the predicted ratings for items not yet interacted with by a specific user.

Next Steps:

  • Data Expansion: You can replace the dummy dataset with your actual customer interaction data.
  • Hyperparameter Tuning: Fine-tune the model with different parameters (e.g., number of latent factors, learning rate, regularization) for better accuracy; a grid-search sketch follows this list.
  • Hybrid Models: Combine collaborative filtering with content-based filtering if you have additional metadata (e.g., product categories) to improve recommendations.
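
As a rough sketch of the hyperparameter-tuning step, surprise ships a GridSearchCV helper. The parameter grid below (latent factors, epochs, learning rate, regularization) is only an illustrative starting point, not a recommended setting, and the snippet again reuses the dataset object built from df in the main example above.

from surprise import SVD
from surprise.model_selection import GridSearchCV

# Illustrative grid over SVD hyperparameters (not a recommended setting)
param_grid = {
    'n_factors': [20, 50, 100],
    'n_epochs': [10, 20],
    'lr_all': [0.002, 0.005],
    'reg_all': [0.02, 0.1]
}

# Reuses the `dataset` object built from `df` in the main example above
gs = GridSearchCV(SVD, param_grid, measures=['rmse', 'mae'], cv=3)
gs.fit(dataset)

print("Best RMSE:", gs.best_score['rmse'])
print("Best params:", gs.best_params['rmse'])

# The winning parameters can then be passed back into SVD before refitting:
# model = SVD(**gs.best_params['rmse'])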
