Generic Item Detection with LaneHawk

LaneHawk has two methods for detecting bottom-of-basket (BOB) items.  The primary method matches the item against a reference image using ViPR.  This works well as long as a reference image is available.  Sometimes, however, a package design changes or an unusual item is found on the basket bottom, and there is no reference image to compare against.  In that case, LaneHawk falls back to generic item detection (GID).


The relationship between ViPR and GID can be explained with an analogy.  Suppose that someone goes to a dog kennel and takes a picture of each of 100 dogs.  He then picks one of the 100 dogs and takes a second picture of it.  He wants a computer to work out which of the 100 dogs the second picture belongs to (obviously he knows which dog it is, but suppose that, for some reason, he wants the computer to tell him).  The computer can do this by comparing the new picture to each of the 100 pictures, one by one.


Now suppose that he instead took 50 pictures of St. Bernards and 50 pictures of Poodles.  Suppose further that he finds another Poodle (not one of the 50) and takes its picture as well.  This time he wants the computer to tell him whether the new picture shows a Poodle or a St. Bernard (as before, he clearly knows it is a Poodle, but he wants to see whether the computer can tell).  In this case, the computer can't just compare the new picture to the 100 pictures, because it won't be an exact match to any of them.  The computer needs to use a fuzzy comparison.


Using ViPR to find BOB items is like the first case where we know precisely what we are looking for.  Using GID is like the second case where we use a fuzzy comparison.
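
The two cases in the analogy can be sketched in a few lines of Python.  This is only an illustration of the idea, not how ViPR or GID are implemented: it assumes each picture has already been reduced to a short array of numbers, and the function names identify_individual and classify_breed are made up for this example.  The fuzzy comparison shown here is a simple nearest-average rule, chosen only because it is easy to read.

```python
import numpy as np

def identify_individual(query, references):
    """First case: return the index of the single closest reference picture."""
    distances = [np.linalg.norm(query - ref) for ref in references]
    return int(np.argmin(distances))

def classify_breed(query, examples_by_breed):
    """Second case: return the breed whose average example is closest to the query."""
    best_breed, best_distance = None, float("inf")
    for breed, examples in examples_by_breed.items():
        # Average all examples of this breed into one typical picture.
        centroid = np.mean(examples, axis=0)
        distance = np.linalg.norm(query - centroid)
        if distance < best_distance:
            best_breed, best_distance = breed, distance
    return best_breed
```

In the first function the answer is one specific picture out of the set.  In the second, the new picture matches none of the examples exactly, so the answer is only the group it most resembles.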


In GID, there are two parts to the fuzzy comparison.  The first is what we call the appearance vector: an array of numbers that measures what the image looks like in general.  You can think of the appearance vector as a blurred-out version of the image.
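
To make the "blurred-out version" idea concrete, here is a minimal Python sketch.  It assumes the image is a grayscale 2-D array and that blurring is done by averaging over a coarse grid.  The actual numbers LaneHawk puts in its appearance vector are not described here, so the function appearance_vector and its grid parameter are hypothetical.

```python
import numpy as np

def appearance_vector(image, grid=(4, 4)):
    """Average a grayscale image over a coarse grid and flatten the result."""
    image = np.asarray(image, dtype=float)
    rows, cols = grid
    height, width = image.shape
    if height < rows or width < cols:
        raise ValueError("image is smaller than the requested grid")
    # Crop so the image divides evenly into grid cells.
    cell_h, cell_w = height // rows, width // cols
    cropped = image[:cell_h * rows, :cell_w * cols]
    # View the image as (rows, cell_h, cols, cell_w) and average inside each cell.
    cells = cropped.reshape(rows, cell_h, cols, cell_w)
    return cells.mean(axis=(1, 3)).ravel()
```

With the default grid, any image is reduced to 16 numbers.  Fine detail such as lettering is averaged away, while the overall layout of light and dark areas survives.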


The second part is the motion between successive images.  This is shown in the picture above.  The red lines show the parts of the Pepsi logo that correspond between the two images.  Using this correspondence, the motion can be computed.
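
Once corresponding points are known, computing the motion is simple arithmetic.  The Python sketch below assumes the correspondences are already available as two lists of (x, y) positions, one per image, and it summarizes the motion as the median displacement.  The function name estimate_motion and the choice of the median are assumptions made for this example, not a description of LaneHawk's method.

```python
import numpy as np

def estimate_motion(points_prev, points_next):
    """Return the median (dx, dy) displacement between matched points."""
    prev = np.asarray(points_prev, dtype=float)
    nxt = np.asarray(points_next, dtype=float)
    if prev.shape != nxt.shape or prev.ndim != 2 or prev.shape[1] != 2:
        raise ValueError("expected two equally sized lists of (x, y) points")
    # Each row is how far one matched point moved between the two images.
    displacements = nxt - prev
    # The median is less sensitive to a few bad matches than the mean.
    return np.median(displacements, axis=0)
```

For example, if most matched points move 5 pixels to the right and one bad match jumps somewhere else entirely, the estimate is still 5 pixels to the right.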


The appearance vector and the motion are fed to a pattern classifier, which decides whether a BOB item is present.
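
As a rough picture of this last step, the Python sketch below joins the two inputs into one list of numbers and applies a simple logistic rule.  The type of classifier LaneHawk uses is not stated here, so the logistic rule is only a stand-in, and the names bob_item_score and bob_item_present, along with the weights, bias, and threshold, are hypothetical.  In practice, such values would be learned from labeled examples.

```python
import numpy as np

def bob_item_score(appearance, motion, weights, bias):
    """Combine both inputs into one feature list and return a score in (0, 1)."""
    features = np.concatenate([np.ravel(appearance), np.ravel(motion)])
    # Weighted sum of the features, squashed to the range 0..1.
    return float(1.0 / (1.0 + np.exp(-(features @ weights + bias))))

def bob_item_present(appearance, motion, weights, bias, threshold=0.5):
    """Decide present / not present by comparing the score to a threshold."""
    return bob_item_score(appearance, motion, weights, bias) >= threshold
```

The point of the sketch is only the data flow: appearance and motion go in together, and a single yes-or-no decision comes out.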