Chelsea Finn, MIT Earth Signals and Systems Group. Cambridge, MA
Behind the scenes of the Sloop system are several computer vision algorithms that match images of individual animals across hundreds or thousands of photos. The most recent species being added to the system is the whale shark (Rhincodon typus; Sloop-WS).
The goal of each of the matching algorithms in Sloop is to search for and retrieve all images of the same individual animal already in the database, somewhat analogous to searching for a fingerprint match in a database of human fingerprints. To do this, the system compares pairs of images and calculates a score for each pair, a measure of the likelihood that the two images show the same animal. For the whale shark, we started with a dataset of 35,000 underwater images of whale sharks across 3,000 individuals. We used this database to develop a matching algorithm specific to whale shark images that can automatically compare two photographs and rate how alike they appear.
The images in the dataset capture a spot pattern behind the gills of the whale shark on one or both sides. Unlike matching for the other species in the Sloop system, whale shark matching uses only this pattern of spots to compare the images. The coordinates of the spots are specified by users or extracted, and the identification algorithm then matches pairs of coordinate sets rather than the images themselves. The primary challenge with this approach is inconsistency in spot identification. Frequently, spots are left unidentified in some images, either because they look unclear or dark or because they are cropped out of the photo entirely. We built the point-set matching algorithm to address these issues. A preliminary version of the identification algorithm was briefly described in [1], and the final version was recently presented in [2].
The final version of the algorithm used an iterated correspondence and alignment approach to match feature coordinates (namely the spots). The correspondence step consists of choosing which spots in one image correspond to spots in the other image. This is not as simple as one might imagine, especially when considering that some points in one image may not exist in the other. The alignment step uses the calculated point-correspondence to best align the points on top of each other.
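As a rough illustration of these two steps, the sketch below pairs points by mutual nearest neighbour and then aligns them with a pure translation. It is a minimal stand-in, not the Sloop implementation: the function names, the `max_dist` cutoff, and the mutual-nearest-neighbour rule are all assumptions chosen for brevity.

```python
import math


def correspond(a, b, max_dist):
    """Pair points of a with points of b (hypothetical helper).

    A pair (i, j) is kept only if a[i] and b[j] are each other's nearest
    neighbour and lie within max_dist. Points without such a partner are
    left out, which is how spots missing from one image are tolerated.
    """
    def nearest(p, pts):
        return min(range(len(pts)), key=lambda k: math.dist(p, pts[k]))

    pairs = []
    for i, p in enumerate(a):
        j = nearest(p, b)
        if nearest(b[j], a) == i and math.dist(p, b[j]) <= max_dist:
            pairs.append((i, j))
    return pairs


def align_by_translation(a, b, pairs):
    """Shift every point of a by the mean offset between paired points.

    The mean offset minimises the mean squared distance between pairs,
    used here as a simple proxy for the average distance.
    """
    if not pairs:
        raise ValueError("no corresponding points to align")
    dx = sum(b[j][0] - a[i][0] for i, j in pairs) / len(pairs)
    dy = sum(b[j][1] - a[i][1] for i, j in pairs) / len(pairs)
    return [(x + dx, y + dy) for x, y in a]
```

With `a = [(0, 0), (1, 0), (0, 2), (5, 5)]` and `b` equal to the first three points shifted by (0.2, 0.1), the fourth point of `a` finds no mutual partner and is dropped from the correspondence, and the alignment recovers the shift.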
The two-iteration correspondence/alignment technique consists of the following steps. In the first pass, the correspondence is calculated via “shape contexts.” Here each point is characterized by the “shape” of its surrounding points (its “context”). Spots in the first image with very similar shape contexts to spots in the second image are paired together. Points with no good potential match are left unpaired, and thus left out of the correspondence. In the subsequent alignment step, the point sets are shifted atop each other such that the average distance between pairs of corresponding points is minimized. That concludes the first iteration. The second iteration pairs the points by their position within the first alignment and then realigns them through an affine transformation. The correspondence in the second iteration improves on that of the first because the points are already mostly well-aligned. This allows a more accurate calculation of the affine transform via the RANSAC algorithm.
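To make the first-pass descriptor more concrete, here is a small sketch of a shape context as it is commonly defined: a log-polar histogram of where the other points lie relative to a given point, compared with a chi-squared cost. The bin counts, the radius limits, and the function names are illustrative assumptions, not values taken from Sloop.

```python
import math


def shape_context(points, index, n_r=5, n_theta=12, r_min=0.125, r_max=2.0):
    """Log-polar histogram of the other points around points[index].

    Assumes at least two distinct points. Bin counts and radius limits
    are illustrative defaults, not Sloop's settings.
    """
    # Normalise radii by the mean pairwise distance so the descriptor
    # does not depend on image scale.
    dists = [math.dist(p, q) for i, p in enumerate(points) for q in points[i + 1:]]
    mean_dist = sum(dists) / len(dists)
    log_step = (math.log(r_max) - math.log(r_min)) / n_r
    px, py = points[index]
    hist = [0.0] * (n_r * n_theta)
    for k, (x, y) in enumerate(points):
        if k == index:
            continue
        dx, dy = x - px, y - py
        r = math.hypot(dx, dy) / mean_dist
        if r == 0.0 or r > r_max:
            continue  # coincident, or too far away to count as context
        r_bin = int((math.log(r) - math.log(r_min)) / log_step)
        r_bin = min(max(r_bin, 0), n_r - 1)  # radii below r_min go to the innermost ring
        angle = math.atan2(dy, dx) % (2 * math.pi)
        t_bin = int(angle / (2 * math.pi) * n_theta) % n_theta
        hist[r_bin * n_theta + t_bin] += 1.0
    total = sum(hist) or 1.0
    return [h / total for h in hist]


def chi2_cost(g, h):
    """Chi-squared distance between two normalised histograms (0 = identical)."""
    return 0.5 * sum((a - b) ** 2 / (a + b) for a, b in zip(g, h) if a + b > 0)
```

Because the histogram is built from relative positions and normalised radii, a point keeps the same descriptor when the whole pattern is translated or uniformly scaled. Pairing then amounts to matching points whose descriptors have a low `chi2_cost` and leaving the rest unpaired.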
After the second pass of alignment, the pair of images is scored based on how successfully the points aligned to each other. We determine the score by corresponding the aligned points one final time and then analyzing the distribution of distances between corresponding points.
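The exact scoring formula is not given here, so the following is only one plausible way to summarise the distance distribution. It assumes a nearest-point residual as the final correspondence and a hypothetical `inlier_dist` tolerance, and it reports the fraction of points that land close to a partner along with the median residual of those points.

```python
import math
from statistics import median


def match_score(a_aligned, b, inlier_dist):
    """Summarise how well the aligned point set a lands on point set b.

    Hypothetical scoring, not Sloop's: each point of a is paired with its
    nearest point in b, and pairs within inlier_dist count as inliers.
    """
    if not a_aligned or not b:
        return {"inlier_fraction": 0.0, "median_residual": math.inf}
    residuals = [min(math.dist(p, q) for q in b) for p in a_aligned]
    inliers = [r for r in residuals if r <= inlier_dist]
    return {
        "inlier_fraction": len(inliers) / len(residuals),
        "median_residual": median(inliers) if inliers else math.inf,
    }
```

A high inlier fraction with a small median residual suggests the two spot patterns belong to the same animal. How the two numbers are combined into a single score is a design choice left open in this sketch.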
To test our algorithm, we queried 147 images of 18 individual whale sharks, comparing each of the 147 images to all of the other images in the database in order to find matches. The algorithm was able to retrieve roughly 85% of matching individuals in the database with a false-positive rate of less than 2%. These results can be improved further through relevance feedback, in which the algorithm uses feedback from human users to score images in the database more accurately. In this method, the user identifies which of the top-scoring images are actually correct matches. The algorithm uses this information to improve performance, increasing the recall of matches to more than 90% with a comparable false-positive rate. A more specific display of results is shown below.
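For readers unfamiliar with the two rates quoted in this paragraph, the sketch below shows how recall and false-positive rate are conventionally computed from scored image pairs with known ground truth. The function name, the score threshold, and the input format are assumptions for illustration. The sketch is not the evaluation code used in the study and produces none of its results.

```python
def recall_and_fpr(scored_pairs, threshold):
    """Compute (recall, false-positive rate) for a score threshold.

    scored_pairs is an iterable of (score, is_same_individual) tuples;
    a pair is predicted to be a match when score >= threshold.
    """
    tp = fp = fn = tn = 0
    for score, same in scored_pairs:
        predicted = score >= threshold
        if predicted and same:
            tp += 1
        elif predicted and not same:
            fp += 1
        elif same:
            fn += 1
        else:
            tn += 1
    recall = tp / (tp + fn) if tp + fn else 0.0  # share of true matches retrieved
    fpr = fp / (fp + tn) if fp + tn else 0.0     # share of non-matches wrongly retrieved
    return recall, fpr
```

Raising the threshold generally lowers the false-positive rate at the cost of recall, so the two numbers are best read together.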
Vision algorithms such as this latest coordinate-matching technique could also be adapted and applied to other species. For example, one could imagine using this algorithm for other species with individually distinctive spot patterns in future versions of the Sloop system.
As a whole, our results demonstrate the continued success of computer vision algorithms paired with human interaction to enable large-scale conservation efforts via Sloop.
[1] Sai Ravela, James Duyck, Chelsea Finn: Vision-Based Biometrics for Conservation. MCPR 2013: 10-19.
[2] James Duyck, Chelsea Finn, Andy Hutcheon, Pablo Vera, Joaquin Salas, Sai Ravela: Sloop: A Pattern Retrieval Engine for Individual Animal Identification. Submitted to Pattern Recognition.