Information Retrieval Systems
The purpose of this exercise is to gain some "hands-on" experience in the process of performing relevance judgments, which we will use for evaluating information retrieval systems when we get to Module 5. You will be assessing documents that are retrieved in response to two "topics" (statements of information needs) from the TREC-5 evaluation (which we will read about in Module 5). The search engine sees only the short Web-like query in the "<title> Title:" field, but you will use the full topic description as a basis for judging the relevance of each "hit" (i.e., each retrieved document). The two search engines we'll be comparing are Google and Bing, whose results have been stored for you.
<top>
<num> Number: 278
<title> Topic: DNA Information about Human Ancestry
<desc> Description:
A relevant document will discuss geneticists' findings concerning the ancestry of the world's peoples.
<narr> Narrative:
To be relevant, a chosen item will discuss the genetic code research currently being done to determine the mysteries of mankind's origins and migrations.
</top>

[Google results] [Bing results] [Bing results continued]
<top>
<num> Number: 294
<title> Topic: Animal husbandry for exotic animals
<desc> Description:
This topic will seek out reporting on the commercial growth of animal husbandry relating to "different" or "exotic" animals as opposed to the usual poultry, cattle, pigs, sheep, etc.
<narr> Narrative:
This study will attempt to discover the viability and economic prospects of commercial attempts to raise "exotic" animals. Some of the animals currently being raised are: llamas, emus, ostriches, mohair goats, alpacas, buffalo, catfish, crawfish, reindeer, rhea, trout, salmon, oyster, and shrimp.
</top>

[Google results] [Bing results]
To ensure that everyone evaluates the same hits, results from each search engine have been cached for you. To see them, click on the links above, which lead to PDF documents. (Note that most of the content in the Bing PDFs is on page 2, because Bing inserts a page break after the query box when printing; note also that Bing returned only 8 results in the first result set for Topic 278, so a second PDF with the subsequent hits is included.) You can't click on the links within these documents, because they sometimes redirect through the search engine so that it can record what you clicked on; the PDFs are provided only to show you what the actual search results looked like. In selecting the hits to be judged, I have ignored sponsored ads that may appear at the top, bottom, or side of the page, suggestions for related searches, and "deep links" that a search engine shows indented below a primary link.
Use the URLs in this Excel spreadsheet to guide and record your relevance judgments (students can download Excel and other parts of Microsoft Office at no charge from TerpWare). In the column marked "Relevance", enter "1" (one) if you think the document is relevant, or "0" (zero) if you think it is not. If the document won't load for you, leave the cell blank (but try a different browser before giving up on it!). Include your name in the spreadsheet and rename the file to your last name (.xls) so that I don't wind up with a dozen files that all have the same filename!
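To preview where these judgments lead in Module 5, here is a minimal sketch (not part of the assignment) of how a 1/0 judgment column like the one in your spreadsheet can be turned into a precision score for a result set. The function name and the example judgment list are hypothetical illustrations, not data from the actual Google or Bing results:

```python
# Hypothetical sketch: judgments use 1 = relevant, 0 = not relevant,
# and None for a page that would not load (a blank cell in the sheet).
def precision(judgments):
    """Fraction of loadable hits that were judged relevant."""
    scored = [j for j in judgments if j is not None]  # skip unloadable pages
    return sum(scored) / len(scored) if scored else 0.0

# Made-up example: 10 hits, one page failed to load, 4 judged relevant.
example_judgments = [1, 0, 1, None, 0, 1, 0, 0, 1, 0]
print(precision(example_judgments))  # 4 relevant out of 9 judged
```

Blank (unloadable) entries are simply excluded from the denominator here; how such hits should be counted is one of the evaluation questions we will return to in Module 5.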
You should submit your assignment on ELMS. Instructions for doing this are on the course ELMS site.
This assignment was adapted from James Allan's CMPSCI 646 course (Fall 2004) at the University of Massachusetts.