Classical Computer Vision: Labeling


Labeling is an image transformation that aims to turn an image into a model. In other words, it is a mapping function I -> M that transforms the input image into a model, which can be an array of numbers, a string, a graph, a set of images, or whatever makes sense for the problem being tackled.

The goal here is to leave the pure image domain and finally enter the meaning domain, by interpreting the image as meaningful information. For example, given a photo, we want to detect that a car has crossed the intersection on a red light. This final model is merely a boolean (True: it crossed; False: it didn't).

Most of these algorithms are concerned with pattern recognition.


As an example, let's try to label the following image. What we're interested in here is counting how many "persons" are shown in the image and identifying each one with a number.

    

import skimage.io as io
import numpy as np
import matplotlib.pyplot as plt
from skimage.color import rgb2gray

def print_img(title, image):
    plt.title(title)
    if len(image.shape) >= 3 and min(image.shape) > 1:
        plt.imshow(image)
    else:
        plt.imshow(image, cmap=plt.cm.gray)
    plt.show()

image = io.imread('3186535.jpg')
print_img('original image', image)

    
So, the first thing we need to do is greatly simplify the image, removing any unnecessary information from it. First and foremost, let's convert it to grayscale, since the colors will not provide any useful information to us.


image_gray = rgb2gray(image)
print_img('gray scaled', image_gray)



Now, let's apply a threshold, eliminating the inner details of the objects in the image.
  

image_threshold = image_gray.copy()
# Everything darker than 0.95 becomes object (0); brighter pixels stay as background
image_threshold[image_threshold < 0.95] = 0
image_threshold = image_threshold * 255  # scale so the background becomes ~255
print_img('threshold applied', image_threshold)


     
Nice! So all we have to do now is count how many contiguous objects there are in this simplified image, and we'll have the number of people in it.

Let's use the connected components algorithm to label those objects and detect them in the image.

Just for the sake of learning, we'll implement this algorithm from scratch.

The connected components algorithm works by labeling contiguous pixels with the same label. So, for every row, we'll look at each pixel, checking whether it is background (255) or object (0). We'll label the non-background pixels with, say, label 1 until we find another background pixel. Every time that happens, we'll increment the label value by a small step and keep looking for other non-background pixels.

    
img = image_threshold.copy()
label = 1
# Small step so that every label stays below the background value (255)
label_rate = 255 / (img.shape[0] * img.shape[1])
max_i = img.shape[0]
max_j = img.shape[1]

for i in range(max_i):
    labeling = False
    for j in range(max_j):
        if img[i, j] == 255:  # background pixel: close the current run
            if labeling:
                label += label_rate
                labeling = False
        else:  # object pixel: start or continue the current run
            labeling = True
            img[i, j] = label



Eventually, we'll end up with a fully labeled image. Some labels are neighbors of each other, and some are not.

[  1.          1.0000425   1.000085  ...   1.5267025   1.526745    255.       ]


We should then merge the neighboring labels, so that each object, or at least every connected part of it, is mapped to a single label.

The way we implemented that is by visiting every pixel and, for each one, checking its neighbors. If a neighbor has a different, bigger-valued label, we propagate our label to it and visit it next.

We chose to implement this visiting with an explicit stack, since we're talking about a potentially huge chain of calls for every pixel. If we were to implement it using recursion, the code would be easier on the eyes, but it wouldn't work: it would blow the call stack.

Bear in mind, though, that this algorithm is educational. Its worst case is very expensive, since the same pixels may be revisited and relabeled many times. In a production environment, one should improve it or use better options, which usually means those already implemented in a library.

We chose to check only the y and x axes, ignoring the diagonals, since during our tests it didn't make much difference; if it's not helping, why bother?
    

def propagate_pixel(img, pi, pj, background):
    # Explicit stack of pixels to visit, to avoid deep recursion
    lstp = [(pi, pj)]
    max_i = img.shape[0]
    max_j = img.shape[1]

    while len(lstp) > 0:
        i, j = lstp.pop()

        # y-axis neighbors
        if i > 0 and img[i - 1, j] != background and img[i - 1, j] > img[i, j]:
            img[i - 1, j] = img[i, j]
            lstp.append((i - 1, j))

        if i + 1 < max_i and img[i + 1, j] != background and img[i + 1, j] > img[i, j]:
            img[i + 1, j] = img[i, j]
            lstp.append((i + 1, j))

        # x-axis neighbors
        if j > 0 and img[i, j - 1] != background and img[i, j - 1] > img[i, j]:
            img[i, j - 1] = img[i, j]
            lstp.append((i, j - 1))

        if j + 1 < max_j and img[i, j + 1] != background and img[i, j + 1] > img[i, j]:
            img[i, j + 1] = img[i, j]
            lstp.append((i, j + 1))

        
We call the function above for every non-background pixel:


for i in range(img.shape[0]):
    for j in range(img.shape[1]):
        if img[i, j] == 255:  # skip background pixels
            continue
        propagate_pixel(img, i, j, 255)


So, now we have a much smaller set of distinct labels, correctly identifying that the original image has 6 objects (the label 255 represents the background):

[  1.          1.0104125   1.0131325   1.02465     1.069445    1.077605  255.       ]
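Counting the objects from an array like the one above is then just a matter of counting the distinct values and discarding the background entry. A minimal sketch on a hypothetical tiny label matrix:

```python
import numpy as np

# Hypothetical tiny labeled image: fractional labels for objects, 255 for background
img = np.array([
    [1.0,   1.0,   255.0],
    [255.0, 1.5,   255.0],
    [2.2,   255.0, 1.5],
])

labels = np.unique(img)        # sorted distinct values, background included
num_objects = len(labels) - 1  # drop the single 255 background entry
print(num_objects)  # 3
```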


Finally, as a bonus, we'll rename those labels to be more human-readable:

labels_old = np.unique(img)
# New labels: 1..n for the objects, keeping 255 for the background
labels = [x + 1 for x in range(len(labels_old) - 1)]
labels.append(255)

for i in range(len(labels_old)):
    if labels_old[i] != 255:
        img[img == labels_old[i]] = i + 1


       [1, 2, 3, 4, 5, 6, 255]

        Now, we can easily access every object in the original image by using the label matrix as a filter:


for label in labels:
    if label == 255:  # skip the background label
        continue
    print(f"person #{label}")
    img2 = image.copy()           # start from the original image
    img2[img != label] = 255      # blank out everything except this object
    print_img(f'image {label}', img2)



person #1
person #2
person #3

        Therefore, we've correctly labeled every object in the original image!
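As noted earlier, in production one would normally reach for an existing implementation rather than the educational version above. A minimal sketch using `skimage.measure.label` (the toy mask below is illustrative, not the article's image):

```python
import numpy as np
from skimage.measure import label

# Toy binary mask: 1 = object, 0 = background (two separate blobs)
mask = np.array([
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 0, 1],
], dtype=np.uint8)

# connectivity=1 uses only horizontal/vertical neighbors,
# matching the 4-connectivity of our hand-rolled version
labeled, num = label(mask, connectivity=1, return_num=True)
print(num)  # 2 objects found
```

One call replaces the scan, the merge pass, and the renaming step, and runs in linear time.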

        
