Big Data in Recommender Systems


COMP03210/COMP6210 Workshop Week 7
Tutorial Questions
1. Can you provide some examples of big data in recommender systems?
Answer: Large volumes of user-item interaction data (e.g., users' clicks and comments on items) on e-commerce websites (e.g., yelp.com) and online social networks (e.g., Facebook). For example, in one minute, Yelp users post 26,380 reviews and Facebook users share 2,460,000 pieces of content.
2. What are the main challenges in streaming recommender systems?
Answer: The overload problem, learning users' long-term preferences, and capturing user preference drift.
3. What are the main strategies used for handling streaming data in recommender systems?
Answer: Effective sampling, reservoir maintenance, and user preference drift detection.
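The "reservoir maintenance" strategy is typically built on reservoir sampling. As a minimal illustration (not part of the workshop code), here is the classic Algorithm R in Python, which maintains a uniform random sample of k items from a stream whose length is not known in advance:

```python
import random

def reservoir_sample(stream, k, rng=None):
    """Keep a uniform random sample of k items from a stream (Algorithm R).

    The first k items fill the reservoir; after that, the i-th item
    replaces a random reservoir slot with probability k/i.
    """
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream, start=1):
        if i <= k:
            reservoir.append(item)
        else:
            j = rng.randint(1, i)  # uniform in [1, i]
            if j <= k:
                reservoir[j - 1] = item
    return reservoir

# e.g., keep 10 interactions sampled uniformly from a stream of 1000
sample = reservoir_sample(range(1000), 10)
```

A streaming recommender can then retrain periodically on the reservoir instead of the full, unbounded interaction stream.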
4. What are the main target problems and the corresponding recommendation algorithms in Netflix?
Answer: Ranking videos is the core problem at Netflix. Top-N video ranking and continue-watching ranking are the two main recommendation algorithms used to address it.
Practical Questions
1. Design a Map-Reduce job that counts the total number of occurrences of a certain word (such as 'MapReduce') in 'MapReduce_wiki.txt'.
Input: We use the input file 'MapReduce_wiki.txt' from 'Map-Reduce Example for Python'.
Run the job from the command prompt (cmd.exe) with:
python Practical2_Q1_Solution.py MapReduce_wiki.txt > output_Practical2_Q1.txt
Output: The total number of occurrences of 'MapReduce' in 'MapReduce_wiki.txt' is 12.
Code (a single-step job):
"""
Practical 2 - Question 1: Single-step job
Description: Design a Map-Reduce job that counts the total number of occurrences of a
certain word in the input.
This is a single-step job, which only needs to subclass MRJob and
override a few methods.
If you want to learn more, visit
https://mrjob.readthedocs.io/en/latest/guides/writing-mrjobs.html#single-step-jobs
"""
from mrjob.job import MRJob
import re

WORD_RE = re.compile(r"[\w']+")

# Here, we count the total number of occurrences of 'MapReduce' in the input file.
query_word = 'MapReduce'


class MRMostUsedWord(MRJob):

    # step 1: mapper, count the occurrences of query_word in each mapper
    def mapper(self, _, line):
        for word in WORD_RE.findall(line):
            if word == query_word:
                yield word, 1

    # step 2: reducer, count the total occurrences of query_word in the whole input
    def reducer(self, word, counts):
        yield word, sum(counts)


if __name__ == '__main__':
    MRMostUsedWord.run()
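To see what this job computes without running Hadoop or mrjob, the mapper/reducer pair can be simulated in plain Python: group the mapper's (key, value) pairs by key (the "shuffle" phase), then hand each group to the reducer. This is an illustrative sketch of the data flow with made-up input lines, not mrjob's actual runner:

```python
import re
from collections import defaultdict

WORD_RE = re.compile(r"[\w']+")
query_word = 'MapReduce'

def mapper(line):
    # emit (word, 1) for each occurrence of the query word in a line
    for word in WORD_RE.findall(line):
        if word == query_word:
            yield word, 1

def reducer(word, counts):
    # sum all partial counts for one key
    yield word, sum(counts)

def run_job(lines):
    # shuffle phase: collect all mapper output values under their key
    groups = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            groups[key].append(value)
    # reduce phase: one reducer call per key
    return dict(kv for key, values in groups.items()
                for kv in reducer(key, values))

lines = ["MapReduce is a model", "MapReduce MapReduce", "no match here"]
print(run_job(lines))  # {'MapReduce': 3}
```

On a cluster, the shuffle is what routes all values for the same key to the same reducer; the simulation makes that step explicit.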
2. Design a Map-Reduce job that finds the most commonly used word in 'MapReduce_wiki.txt'.
Input: We again use 'MapReduce_wiki.txt' as the input file.
Run the job from the command prompt (cmd.exe) with:
python Practical2_Q2_Solution.py MapReduce_wiki.txt > output_Practical2_Q2.txt
Output: The most commonly used word in 'MapReduce_wiki.txt' is 'the', with 24 occurrences.
Code (a multi-step job):
"""
Practical 2 - Question 2: Multi-step job
Description: Design a Map-Reduce job that finds the most commonly used word in the input.
This is a multi-step job, which needs to override steps() to return a
list of MRSteps.
From: https://mrjob.readthedocs.io/en/latest/guides/quickstart.html#writing-your-second-job
"""
from mrjob.job import MRJob
from mrjob.step import MRStep
import re

WORD_RE = re.compile(r"[\w']+")


class MRMostUsedWord(MRJob):

    # This is a multi-step job, so we need to override the steps() function.
    def steps(self):
        # step 1: mapper, get each word in each line
        # step 2: combiner, count the words after each mapper (this
        #         decreases total data transfer)
        # step 3: reducer, count the total occurrences of each word in the whole input
        # step 4: reducer, find the most commonly used (maximum-count) word in
        #         the whole input
        return [
            MRStep(mapper=self.mapper_get_words,
                   combiner=self.combiner_count_words,
                   reducer=self.reducer_count_words),
            MRStep(reducer=self.reducer_find_max_word)
        ]

    # step 1: mapper, get each word in each line
    def mapper_get_words(self, _, line):
        # yield each word in the line
        for word in WORD_RE.findall(line):
            yield (word.lower(), 1)

    # step 2: combiner, count the words after each mapper.
    # This step decreases total data transfer; you can remove it, but the job
    # will take longer.
    def combiner_count_words(self, word, counts):
        # optimization: sum the words we've seen so far
        yield (word, sum(counts))

    # step 3: reducer, count the total occurrences of each word in the whole input
    def reducer_count_words(self, word, counts):
        # send all (num_occurrences, word) pairs to the same reducer;
        # putting num_occurrences first lets us use Python's max() directly
        yield None, (sum(counts), word)

    # step 4: reducer, find the most commonly used (maximum-count) word in the
    # whole input; the key is discarded (it is just None)
    def reducer_find_max_word(self, _, word_count_pairs):
        # each item of word_count_pairs is (count, word),
        # so yielding the max gives key=count, value=word
        yield max(word_count_pairs)


if __name__ == '__main__':
    MRMostUsedWord.run()
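The data flow across the two MRSteps can likewise be sketched in plain Python: step 1 sums the per-word counts and re-keys every (count, word) pair under the single key None, so that step 2's lone reducer sees all pairs together and can take the maximum. The input lines below are made up for illustration:

```python
import re
from collections import defaultdict

WORD_RE = re.compile(r"[\w']+")

def run_two_step(lines):
    # step 1: map each word to (word, 1) and sum per word
    counts = defaultdict(int)
    for line in lines:
        for word in WORD_RE.findall(line):
            counts[word.lower()] += 1
    # step-1 reducer output: every pair is emitted as (count, word) under
    # key None, so the shuffle routes them all to one step-2 reducer
    pairs = [(n, w) for w, n in counts.items()]
    # step 2: the single reducer takes the maximum (count, word) pair
    return max(pairs)

lines = ["the quick fox", "the lazy dog", "The end"]
print(run_two_step(lines))  # (3, 'the')
```

Ordering the tuple as (count, word) is what makes max() compare by count first, which is exactly why the step-3 reducer in the mrjob solution emits its pairs in that order.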
