What is an Edge?
An edge is a continuity break in an image. It marks the regions where one object ends and another begins.
There are many techniques for edge detection; some use simple convolution operations, while others rely on more advanced algorithms.
An edge is an abrupt variation in the image's intensity, so mathematically it corresponds to a strong image gradient (large values of the first derivative).
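In one dimension this is easy to see: an abrupt intensity change produces a spike in the first derivative. A minimal NumPy sketch:

```python
import numpy as np

# A 1-D "image" with a step edge: a dark region followed by a bright region.
signal = np.array([10, 10, 10, 10, 200, 200, 200, 200], dtype=float)

# The first derivative (finite difference) is zero everywhere except at
# the transition, where it spikes.
gradient = np.diff(signal)
print(gradient)  # spike of 190 at index 3, zeros elsewhere
```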
Edge Detection Algorithms
1. Using Convolution
2. Multistage Gauss Convolution
3. Salient Structures detection
Edge detection is a "conditioning and simplification" approach, which means that, as we saw previously when we talked about pre-processing and filtering, we are still transforming one image into another; only this time we are entirely focused on simplifying the image.
What we do when trying to detect edges is to build another image in which the edges of the objects are the most important and relevant information.
To do that, we need to find the image's gradients for the pixels in the x and y directions and decide whether each gradient represents an edge or not.
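This decision step can be sketched directly with NumPy: compute the per-pixel gradients, combine them into a magnitude, and keep only the pixels above some threshold. The threshold value below (30) is an arbitrary choice for illustration, not a recommended setting:

```python
import numpy as np
from skimage.data import camera

image = camera().astype(float)

# Per-pixel gradients along the rows (y) and columns (x).
gy, gx = np.gradient(image)

# Gradient magnitude: large values indicate abrupt intensity changes.
magnitude = np.hypot(gx, gy)

# The "decision": keep only pixels whose gradient exceeds a threshold.
edges = magnitude > 30
print(edges.shape, edges.dtype)
```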
Why are we simplifying the image?
What simplifying an image means is that we are trying to build another image that has a subset of information from the original one. For example, the new image could contain only the edges of an object, which is what we're investigating here.
In computer vision, what we aim for at the end of the pipeline is to transform the image into a meaning.
For example, let's say we are classifying apples into three groups: good quality, medium quality, low quality.
Let's say that good quality apples should be sold as a top-shelf product, with a higher price tag, while medium quality apples could be sold to the general public in lower prices. Low quality apples should not be sold directly. Let's assume that they're better served economically as juice or turned into apple pies, apple biscuits, candies or any kind of industrialized food.
So, for our use case, it doesn't matter whether a particular apple has a redder patch of skin at a certain angle, or whether another one looks better when viewed from above... what we want here is simply to reduce ALL of our images to a single integer. When it's "1", it's a top-shelf apple and should be sold with a higher price tag. When it's "2", it's medium quality and should be sold at a normal price. When it's "3", it's low quality and should become juice or another product.
To make a long story short, what we do in a traditional computer vision pipeline is remove unwanted information at every step, until we end up with something we can ultimately associate with a concrete programming meaning, such as an integer, an enum value and so on.
The Roberts Cross is one of the oldest and simplest techniques for edge detection. It starts by convolving the image with the following 2×2 kernels:

Gx:  [ +1   0 ]        Gy:  [  0  +1 ]
     [  0  -1 ]             [ -1   0 ]

The first kernel produces Gx and the second produces Gy. The gradient magnitude is given by

G = √(Gx² + Gy²)

The direction of the gradient can also be defined as

Θ = atan2(Gy, Gx)
Pros:
- Simple to understand
- The convolution operation is very easy
Cons:
- The resulting image has one row and one column less than the original image
- The gradient calculation involves a square root of a sum of squares, which is computationally expensive
- Since the convolution kernel is so small, it tends to pick up a lot of noise in the image and to miss some larger edges.
```python
import numpy as np
import matplotlib.pyplot as plt
from skimage.data import camera
from skimage.filters import roberts

image = camera()
edge_roberts = roberts(image)

fig, ax = plt.subplots(ncols=2, sharex=True, sharey=True, figsize=(8, 4))
ax[0].imshow(image, cmap=plt.cm.gray)
ax[0].set_title('Original')
ax[1].imshow(edge_roberts, cmap=plt.cm.gray)
ax[1].set_title('Roberts Edge Detection')
for a in ax:
    a.axis('off')
plt.tight_layout()
plt.show()
```
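For illustration, the same operator can also be implemented by hand with explicit kernels. This is only a sketch using scipy.ndimage.convolve; skimage's roberts handles normalization and border effects more carefully:

```python
import numpy as np
from scipy.ndimage import convolve
from skimage.data import camera

image = camera().astype(float)

# The two 2x2 Roberts Cross kernels.
kx = np.array([[1, 0],
               [0, -1]], dtype=float)
ky = np.array([[0, 1],
               [-1, 0]], dtype=float)

gx = convolve(image, kx)
gy = convolve(image, ky)

# Gradient magnitude: the expensive square root of a sum of squares.
magnitude = np.sqrt(gx**2 + gy**2)
print(magnitude.shape)
```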
The Sobel operator uses two 3×3 kernels which are convolved with the original image A, producing Gx and Gy:

Gx:  [ -1   0  +1 ]        Gy:  [ -1  -2  -1 ]
     [ -2   0  +2 ]             [  0   0   0 ]
     [ -1   0  +1 ]             [ +1  +2  +1 ]

The x-coordinate is defined here as increasing in the "right" direction, and the y-coordinate as increasing in the "down" direction. At each point in the image, the resulting gradient approximations can be combined to give the gradient magnitude. The exact value is √(Gx² + Gy²), but it is commonly approximated by the much cheaper

G = |Gx| + |Gy|

Using this information, we can also calculate the gradient's direction:

Θ = atan2(Gy, Gx)
```python
import numpy as np
import matplotlib.pyplot as plt
from skimage.data import camera
from skimage.filters import sobel

image = camera()
outp = sobel(image)

fig, ax = plt.subplots(ncols=2, sharex=True, sharey=True, figsize=(8, 4))
ax[0].imshow(image, cmap=plt.cm.gray)
ax[0].set_title('Original')
ax[1].imshow(outp, cmap=plt.cm.gray)
ax[1].set_title('Sobel Edge Detection')
for a in ax:
    a.axis('off')
plt.tight_layout()
plt.show()
```
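As a sketch, the kernels and the cheap |Gx| + |Gy| approximation can also be applied by hand (unlike skimage's sobel, this version does no normalization):

```python
import numpy as np
from scipy.ndimage import convolve
from skimage.data import camera

image = camera().astype(float)

# Sobel kernels: x increases to the right, y increases downward.
kx = np.array([[-1, 0, 1],
               [-2, 0, 2],
               [-1, 0, 1]], dtype=float)
ky = kx.T  # the y kernel is the transpose of the x kernel

gx = convolve(image, kx)
gy = convolve(image, ky)

# Cheap magnitude approximation and gradient direction.
magnitude = np.abs(gx) + np.abs(gy)
direction = np.arctan2(gy, gx)
print(magnitude.shape)
```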
Pros:
- Simple to understand
- Less sensitive to noise
- The convolution operation is very easy
- The operation G = |Gx| + |Gy| is computationally inexpensive
Cons:
- Since each kernel has a line of zeroes through its center, the operator responds best to vertical and horizontal edges and is less accurate on diagonal ones.
It is similar to Sobel, but uses a set of eight "masks", one per compass direction (N, NE, E, SE, S, SW, W, NW).

The magnitude of the gradient is the maximum value obtained after applying all eight masks to the pixel.
The angle of the gradient is approximated by the angle of the line of zeroes in the mask that produced the maximum value. For example, if the maximum comes from the North mask, the angle is 90°.
Pros:
- Simple to understand
- Less sensitive to noise
- |G| is more precise
- The angle is easy to find
Cons:
- Computationally expensive, since eight convolutions (one per mask) are needed for every pixel
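A sketch of the eight-mask idea, using the classic Kirsch compass masks as an assumption (the exact mask values may differ from the ones shown in the figures above). The North mask is rotated in 45° steps to produce the other seven, and the per-pixel maximum response gives the magnitude:

```python
import numpy as np
from scipy.ndimage import convolve
from skimage.data import camera

image = camera().astype(float)

# One classic compass mask (Kirsch "North"); the other seven are rotations.
north = np.array([[ 5,  5,  5],
                  [-3,  0, -3],
                  [-3, -3, -3]], dtype=float)

def rotate_mask(m):
    # Rotate the eight border cells of a 3x3 mask one step (45 degrees).
    out = m.copy()
    ring = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    vals = [m[r] for r in ring]
    for (r, c), v in zip(ring, vals[-1:] + vals[:-1]):
        out[r, c] = v
    return out

masks = [north]
for _ in range(7):
    masks.append(rotate_mask(masks[-1]))

# Magnitude: the maximum response over all eight masks at each pixel.
responses = np.stack([convolve(image, m) for m in masks])
magnitude = responses.max(axis=0)
# The index of the winning mask approximates the edge direction.
direction_index = responses.argmax(axis=0)
print(magnitude.shape)
```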