Information Retrieval Systems
Project Batch Evaluation Design (Assignment P8)
This document applies only to instructor-designed projects.
The goal of this project component is to produce a plan for conducting
a batch evaluation of the assigned system variants. Given the limited
time in a semester, it might not be practical to design a new batch
evaluation for the same system that is the focus of your user study.
For the batch evaluation, we will therefore use the University of
Delaware Virtual IR Lab instead. I will design two new systems for you
to evaluate, and you
will then compare those systems to baseline systems of your choosing
using test collections of your choosing and analyze the results.
Batch evaluations are designed to be conducted fully automatically.
They include at least the following:
- A "canned" set of information that is to be searched.
- A set of requests to which the system will be expected to respond.
- A set of "ground truth" responses that are expected as answers to
those requests.
- Definitions for one or more evaluation measures that can be used
to characterize the system's effectiveness.
The evaluation design needs to balance four desirable characteristics:
In your evaluation design, you will need to specify each of the four
components of the first group in a way that (in aggregate) reasonably
balances the four desirable characteristics in the second group.
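As a rough illustration of how the four components listed above fit
together, here is a minimal Python sketch. Everything in it is an
assumption made for illustration: the toy documents, topics, relevance
judgments, and run are invented, and mean average precision is only one
possible evaluation measure, not one this assignment requires.

```python
from statistics import mean

# Component 1: a "canned" collection (hypothetical toy documents).
collection = {
    "d1": "neural ranking models for ad hoc search",
    "d2": "query expansion with pseudo relevance feedback",
    "d3": "index compression techniques",
    "d4": "learning to rank with neural networks",
    "d5": "stemming and stopword removal",
}

# Component 2: a set of requests (hypothetical topics).
topics = {"T1": "neural ranking", "T2": "query expansion"}

# Component 3: ground truth, mapping topic id -> relevant document ids.
qrels = {"T1": {"d1", "d4"}, "T2": {"d2"}}

# Output of one system variant: topic id -> ranked document ids, best first.
run = {"T1": ["d1", "d3", "d4"], "T2": ["d5", "d2"]}


# Component 4: an evaluation measure.
def average_precision(ranking, relevant):
    """Uninterpolated average precision for a single topic."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant document
    return precision_sum / len(relevant)


def mean_average_precision(run, qrels):
    """Mean of per-topic average precision over all judged topics."""
    return mean(
        average_precision(run.get(topic_id, []), relevant)
        for topic_id, relevant in qrels.items()
    )


if __name__ == "__main__":
    for topic_id in topics:
        ap = average_precision(run[topic_id], qrels[topic_id])
        print(f"{topic_id}: AP = {ap:.4f}")
    print(f"MAP = {mean_average_precision(run, qrels):.4f}")
```

Because every input is fixed in advance, the same scoring can be rerun
unchanged for each system variant and each baseline, which is what
makes the evaluation fully automatic.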
The best way to see what an evaluation design looks like is to read a
TREC, CLEF, NTCIR or FIRE track overview paper. For example, here are
two that I have written:
Your plan need not be as detailed as these, of course, because these
were written AFTER the evaluation. To see what we had before the
evaluation, look at:
Of course, you won't need to specify all the submission format issues
that we did (since you won't actually be submitting anything). So 3
or 4 pages should probably suffice for what you will write up as a plan.
Submit your batch evaluation plan using ELMS.
This assignment will be graded, but (as with all the pieces) the
overall project grade will be assigned holistically rather than being
determined by a fixed formula.
Last modified: Sat Oct 18 22:39:33 2014