The collaborative filtering module collab and the recommender system D-Recommend
Loading the movie rating data
We use the MovieLens dataset https://grouplens.org/datasets/movielens/.
First, load the movie rating data.
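import pandas as pd
from fastai.collab import *  # provides untar_data and URLs

# Download (if necessary) and unpack MovieLens 100k, then read the tab-separated ratings.
path = untar_data(URLs.ML_100k)
ratings = pd.read_csv(path/'u.data', delimiter='\t', header=None,
                      usecols=(0,1,2), names=['user','movie','rating'])
ratings.head()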
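To make the file layout concrete, here is a minimal sketch that parses a few rows in the same shape as u.data (user id, movie id, rating, timestamp, separated by tabs, with no header row). The three rows are invented for illustration and are not taken from the dataset.

```python
import io
import pandas as pd

# Three invented rows: user id, movie id, rating, timestamp (tab-separated).
sample = "1\t10\t5\t1000\n1\t20\t3\t1001\n2\t10\t4\t1002\n"

toy_ratings = pd.read_csv(
    io.StringIO(sample),
    delimiter="\t",
    header=None,                      # the file has no header row
    usecols=(0, 1, 2),                # keep the first three columns, drop the timestamp
    names=["user", "movie", "rating"],
)
print(toy_ratings)
```

Only the first three columns are kept, so the resulting frame has exactly the user, movie, and rating columns used in the rest of this section.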
Next, load the movie data. It shares the movie column with the rating data, so merging it into the rating data adds a movie title column.
movies = pd.read_csv(path/'u.item', delimiter='|', encoding='latin-1',
                     usecols=(0,1), names=('movie','title'), header=None)
# Merge on the shared movie column to attach each title to its ratings.
ratings = ratings.merge(movies, on='movie')
ratings.head()
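The effect of the merge can be seen on a small example. The sketch below uses two invented frames (the ids and titles are placeholders) and joins them on the shared movie column, so that every rating row gains the title of its movie.

```python
import pandas as pd

# Invented toy data; ids and titles are placeholders.
toy_ratings = pd.DataFrame(
    {"user": [1, 1, 2], "movie": [10, 20, 10], "rating": [5, 3, 4]}
)
toy_movies = pd.DataFrame({"movie": [10, 20], "title": ["Movie A", "Movie B"]})

# Inner join on the shared key: one output row per rating, with the title attached.
merged = toy_ratings.merge(toy_movies, on="movie")
print(merged)
```

Because every movie id in the ratings also appears in the movie table, the merged frame has the same number of rows as the rating table.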
Prepare the user data. We use the Faker package to generate fictitious users and add them to the rating data.
from faker import Faker

user = list(set(ratings.user))
movie = list(set(ratings.movie))
print(len(user), len(movie))

# Seed Faker so that the generated names are reproducible.
fake = Faker(['en_US', 'ja_JP', 'zh_CN', 'ko_KR'])
Faker.seed(1)
name_dic = {}
for i in user:
    name_dic[i] = fake.name()
#name_dic
# Attach each user's generated name to every rating row.
ratings["name"] = ratings.user.map(name_dic)
ratings.columns = ["user","movie","rating","title","name"]
ratings_df = ratings.reindex(columns=["user","name","movie","title","rating"])
#ratings_df.to_csv(folder+"rating.csv", index=False)
ratings_df
users = pd.DataFrame( {"id": user, "name": [name_dic[i] for i in user]} )
#users.to_csv(folder+"users.csv",index=False)
ratings_df = pd.read_csv(folder+"rating.csv")
ratings_df.head()
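The step that attaches a name to every rating row is a dictionary lookup keyed by user id. A minimal sketch of that lookup, using a hand-written dictionary of placeholder names instead of Faker so that it runs without extra packages:

```python
import pandas as pd

# Invented rating rows; user 1 appears twice.
toy = pd.DataFrame({"user": [1, 2, 1, 3], "rating": [5, 3, 4, 2]})

# Placeholder names standing in for the Faker-generated ones.
toy_names = {1: "Alice Example", 2: "Bob Example", 3: "Carol Example"}

# Series.map looks up each user id in the dictionary, row by row.
toy["name"] = toy["user"].map(toy_names)
print(toy)
```

Every row with the same user id receives the same name, which is what lets the name column serve as a readable label for the user column.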
Compute the average rating of each movie.
movies_df = pd.read_csv(folder+"movies.csv")
ave_rate = pd.pivot_table(ratings_df, index="movie", values="rating", aggfunc= "mean")
movies_df["average rating"] = list(ave_rate.rating)
movies_df.head()
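Assigning the averages as a plain list relies on the movie table being in the same order as the pivot table (sorted by movie id) and on every movie having at least one rating. The following sketch shows an alternative that aligns the averages by id. It assumes the movie table has an id column named movie, which is not confirmed here for movies.csv, and all data is invented.

```python
import pandas as pd

# Invented toy data; the movie table is deliberately not sorted by id.
toy_ratings = pd.DataFrame({"movie": [20, 10, 10, 30], "rating": [1, 5, 3, 2]})
toy_movies = pd.DataFrame({"movie": [30, 10, 20], "title": ["C", "A", "B"]})

# Mean rating per movie id, indexed by movie id.
ave = toy_ratings.groupby("movie")["rating"].mean()

# Look up each movie's average by id rather than by row position.
toy_movies["average rating"] = toy_movies["movie"].map(ave)
print(toy_movies)
```

With this approach a movie that has no ratings receives NaN instead of causing a length mismatch or shifting the values of the other rows.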
Compute the average rating given by each user.
users_df = pd.read_csv(folder+"users.csv")
ave_user = pd.pivot_table(ratings_df, index="user", values="rating", aggfunc="mean")
# Align by user id rather than by row position.
users_df["average rating"] = users_df["id"].map(ave_user.rating)
users_df.head(11)
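# Build the collaborative filtering learner from the rating data.
learn = colab_learn(ratings_df)
# Predictions, targets, decoded predictions and losses (in fastai, ds_idx=0 is the training set).
preds0, target0, decoded0, loss0 = learn.get_preds(ds_idx=0, with_decoded=True, with_loss=True)
loss0
# Recommend movies for the user with id 10.
recommend_df = colab_predict(learn, movies_df, user_id=10)
recommend_df.head()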
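The internals of colab_learn are not shown in this section. As a rough illustration of the idea behind collaborative filtering, the sketch below fits a small matrix factorization model with plain NumPy: each user and each movie gets a short vector of latent factors, and a rating is predicted as the global mean plus the dot product of the two vectors. The data, the number of factors, and the learning rate are all arbitrary choices made for this example, not values used by the module.

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented (user index, movie index, rating) triples.
triples = np.array([
    [0, 0, 5], [0, 1, 3], [1, 0, 4],
    [1, 2, 1], [2, 1, 2], [2, 2, 5],
])
users, items = triples[:, 0], triples[:, 1]
y = triples[:, 2].astype(float)

n_users, n_items, n_factors = 3, 3, 2
P = 0.1 * rng.standard_normal((n_users, n_factors))  # user factors
Q = 0.1 * rng.standard_normal((n_items, n_factors))  # movie factors
mu = y.mean()                                        # global mean rating

def mse():
    """Mean squared error of the current model on the observed ratings."""
    pred = mu + np.sum(P[users] * Q[items], axis=1)
    return float(np.mean((pred - y) ** 2))

loss_before = mse()
lr = 0.05
for epoch in range(500):
    for u, i, r in zip(users, items, y):
        err = mu + P[u] @ Q[i] - r   # prediction error for this rating
        p_u = P[u].copy()            # keep the old user vector for the movie update
        P[u] -= lr * err * Q[i]      # gradient step on the user factors
        Q[i] -= lr * err * p_u       # gradient step on the movie factors
loss_after = mse()
print(loss_before, loss_after)
```

After training, the predicted rating for any user and movie pair, including pairs that were never observed, is mu + P[u] @ Q[i]; ranking unseen movies by this value is one common way to produce recommendations.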
Check the actual ratings given by this user.
ratings_df[ ratings_df.user==10 ].head()
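import plotly

# Item (movie) map; plotly.offline.plot writes an HTML file and opens it in the browser.
fig, movies = show_item_map(learn, movies_df)
plotly.offline.plot(fig);
movies.head()
# User map.
fig, users = show_user_map(learn, users_df)
plotly.offline.plot(fig);
users.head()
# Plot of the recommendation result (best=100).
fig = show_recommend(learn, movies_df, recommend_df, best=100)
plotly.offline.plot(fig);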
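How show_item_map and show_user_map compute their coordinates is not described here. One common way to draw such a map is to project the learned embedding vectors onto their first two principal components. The sketch below does this with NumPy on random stand-in embeddings; it is an assumption about the general technique, not a description of these functions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for 6 learned movie embeddings of width 5 (random, for illustration only).
emb = rng.standard_normal((6, 5))

# Center the embeddings, then take the SVD; the rows of Vt are the principal
# directions, ordered by decreasing singular value.
centered = emb - emb.mean(axis=0)
U, S, Vt = np.linalg.svd(centered, full_matrices=False)

# Project onto the first two directions to get x/y positions for a scatter plot.
coords = centered @ Vt[:2].T
print(coords.shape)
```

Items whose embeddings are similar end up close together in the resulting two-dimensional coordinates, which is what makes such a map useful for browsing.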