1. Assume you have three documents with the following terms:

  • D1 = “computer”, “web”, “storage”, “options”
  • D2 = “computer”, “game”, “development”
  • D3 = “web”, “development”, “frameworks”

If the query Q consists of the terms “computer” and “development”, what is the relevance of each document to the query under the TF.IDF measure?
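
A worked sketch of the computation. It assumes one common convention: TF(t, d) = f(t, d) / max_k f(k, d), IDF(t) = log2(N / n_t), where n_t is the number of documents containing t, and a document's relevance is the sum of the TF.IDF weights of the query terms. Other conventions (raw counts, length normalization, natural log, cosine similarity) give different numbers. The helper names are illustrative only.

```python
import math
from collections import Counter

# Toy corpus and query from the question.
docs = {
    "D1": ["computer", "web", "storage", "options"],
    "D2": ["computer", "game", "development"],
    "D3": ["web", "development", "frameworks"],
}
query = ["computer", "development"]


def tf(term, tokens):
    """Term frequency normalized by the most frequent term in the document (assumed convention)."""
    counts = Counter(tokens)
    return counts[term] / max(counts.values())


def idf(term, corpus):
    """log2(N / n_t); a term that appears in no document gets weight 0."""
    n_t = sum(1 for tokens in corpus.values() if term in tokens)
    return math.log2(len(corpus) / n_t) if n_t else 0.0


def score(tokens, query, corpus):
    """Relevance = sum of the TF.IDF weights of the query terms in this document."""
    return sum(tf(t, tokens) * idf(t, corpus) for t in query)


scores = {name: score(tokens, query, docs) for name, tokens in docs.items()}
for name, s in scores.items():
    print(f"{name}: {s:.3f}")  # D1: 0.585, D2: 1.170, D3: 0.585
```

Under these assumptions each query term appears in two of the three documents, so both have IDF = log2(3/2) ≈ 0.585. D2 contains both terms and ranks first (≈ 1.170), while D1 and D3 each contain one and tie (≈ 0.585).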

2. Explain in detail how the Hadoop system deals with DataNode failures.
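
A toy model of the NameNode side of failure handling, assuming the usual mechanism: DataNodes send periodic heartbeats, a node that stays silent past a timeout is declared dead, its replicas are dropped from the block map, and each under-replicated block is copied from a surviving replica to another live node. The timeout values are assumed stock defaults (3-second heartbeat, 300-second recheck interval, giving 10.5 minutes). `NameNodeModel` and its methods are hypothetical, and the sketch ignores rack-aware placement, replication priority and block reports, which real HDFS also uses.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

# Assumed stock defaults: dfs.heartbeat.interval = 3 s and
# dfs.namenode.heartbeat.recheck-interval = 300 s; clusters may override both.
HEARTBEAT_INTERVAL_S = 3
RECHECK_INTERVAL_S = 300
# A DataNode silent for 2 * recheck + 10 * heartbeat = 630 s (10.5 min) is declared dead.
DEAD_TIMEOUT_S = 2 * RECHECK_INTERVAL_S + 10 * HEARTBEAT_INTERVAL_S
REPLICATION_FACTOR = 3  # assumed default dfs.replication


@dataclass
class NameNodeModel:
    """Toy NameNode state: last heartbeat time per DataNode, replica holders per block."""
    last_heartbeat: Dict[str, float] = field(default_factory=dict)
    replicas: Dict[str, Set[str]] = field(default_factory=dict)

    def heartbeat(self, node: str, now: float) -> None:
        """Record a heartbeat from a DataNode."""
        self.last_heartbeat[node] = now

    def detect_dead(self, now: float) -> Set[str]:
        """Declare silent nodes dead and remove them as replica holders."""
        dead = {n for n, t in self.last_heartbeat.items() if now - t > DEAD_TIMEOUT_S}
        for n in dead:
            del self.last_heartbeat[n]
        for holders in self.replicas.values():
            holders -= dead  # in-place: the block map no longer lists dead nodes
        return dead

    def plan_rereplication(self) -> List[Tuple[str, str, str]]:
        """Return (block, source, target) copy tasks for under-replicated blocks."""
        live = sorted(self.last_heartbeat)
        tasks = []
        for block, holders in sorted(self.replicas.items()):
            if not holders:
                continue  # all replicas lost: the block is missing, nothing to copy from
            missing = max(0, REPLICATION_FACTOR - len(holders))
            source = sorted(holders)[0]  # any surviving replica can serve as the source
            targets = [n for n in live if n not in holders][:missing]
            tasks.extend((block, source, t) for t in targets)
        return tasks


nn = NameNodeModel()
for node in ["dn1", "dn2", "dn3", "dn4"]:
    nn.heartbeat(node, now=0)
nn.replicas = {"blk_1": {"dn1", "dn2", "dn3"}, "blk_2": {"dn2", "dn3", "dn4"}}
for node in ["dn2", "dn3", "dn4"]:  # dn1 stops reporting
    nn.heartbeat(node, now=600)
dead = nn.detect_dead(now=700)
tasks = nn.plan_rereplication()
print(dead)   # {'dn1'}
print(tasks)  # [('blk_1', 'dn2', 'dn4')]
```

Because detection is timeout-based, a briefly slow node is not treated as failed. Clients reading a block can also fall back to another replica immediately, without waiting for the NameNode to declare the node dead.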

3. Explain, and write pseudocode for, a Mapper and Reducer that take as input a large file of integers (possibly split into chunks) and output:

  1. The sum of the squares of all the integers
  2. The maximum integer
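
A minimal sketch in Python that simulates the job locally. It assumes each split arrives as an iterable of text lines with one integer per line. The mapper aggregates within its split (in-mapper combining), so only two records per split cross the shuffle, one keyed `sum_sq` and one keyed `max`. The reducer then sums or takes the maximum per key. `mapper`, `reducer` and the `run_job` driver are hypothetical stand-ins for what the Hadoop framework does, not Hadoop's API.

```python
from collections import defaultdict
from typing import Iterable, Iterator, List, Tuple


def mapper(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Process one input split: emit a partial sum of squares and a partial max."""
    sum_sq = 0
    local_max = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        x = int(line)
        sum_sq += x * x
        local_max = x if local_max is None else max(local_max, x)
    if local_max is not None:  # an empty split emits nothing
        yield ("sum_sq", sum_sq)
        yield ("max", local_max)


def reducer(key: str, values: Iterable[int]) -> Tuple[str, int]:
    """Combine the partial results that share a key."""
    if key == "sum_sq":
        return key, sum(values)
    if key == "max":
        return key, max(values)
    raise ValueError(f"unexpected key: {key}")


def run_job(splits: Iterable[List[str]]) -> dict:
    """Local stand-in for the framework: map each split, group by key, reduce."""
    grouped = defaultdict(list)
    for split in splits:
        for key, value in mapper(split):
            grouped[key].append(value)
    return dict(reducer(k, vs) for k, vs in grouped.items())


splits = [["3", "-7", "2"], ["5", "", "1"]]
print(run_job(splits))  # {'sum_sq': 88, 'max': 5}
```

Pre-aggregating per split is safe because both sum and max are associative and commutative. In a Java Hadoop job the running sum would need a 64-bit type such as `LongWritable` and could still overflow on very large inputs; Python integers are unbounded.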

4. Explain in detail why MapReduce may be a better solution than OLAP for some problems. Provide concrete examples.
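
As one illustration of the kind of workload often cited here, consider building an inverted index from raw text. There is no natural fact table or set of dimensions to load into an OLAP cube, yet the task maps directly onto map and reduce. A minimal local sketch, reusing the three documents from question 1. The function names are hypothetical, and the in-process grouping stands in for Hadoop's shuffle.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def index_mapper(doc_id: str, text: str) -> Iterator[Tuple[str, str]]:
    """Emit (term, doc_id) once per distinct term in a raw, unparsed document."""
    for term in set(text.lower().split()):
        yield term, doc_id


def index_reducer(term: str, doc_ids: Iterable[str]) -> Tuple[str, List[str]]:
    """Build the postings list for one term; its length is the document frequency."""
    return term, sorted(set(doc_ids))


def build_index(corpus: Dict[str, str]) -> Dict[str, List[str]]:
    """Local stand-in for the job: map every document, group by term, reduce."""
    grouped = defaultdict(list)
    for doc_id, text in corpus.items():
        for term, value in index_mapper(doc_id, text):
            grouped[term].append(value)
    return dict(index_reducer(t, ids) for t, ids in sorted(grouped.items()))


corpus = {
    "D1": "computer web storage options",
    "D2": "computer game development",
    "D3": "web development frameworks",
}
index = build_index(corpus)
print(index["development"])    # ['D2', 'D3']
print(len(index["computer"]))  # 2, the document frequency n_t used by IDF
```

The mapper needs no schema beyond "split on whitespace", and the same pattern scales by adding more map tasks over more input splits.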
