Classical Computer Vision: Image Segmentation with K-Means

 Intro

    Image segmentation is the process of partitioning an image into multiple segments. The idea is to simplify the image, changing its representation into something more useful and meaningful to analyze.

   It can be applied to automatic product inspection in factories, quality control, face detection, brake light detection, locating an object in an image, machine vision, etc.


How to segment an image?

    There are several image segmentation techniques, ranging from simple ones to more complex AI-related algorithms.

    In classical computer vision, we deliberately exclude anything AI-related, just for the sake of keeping things organized. So, let's first focus on classical segmentation algorithms.


Threshold

   The first one we already talked about when we were dealing with the value domain: thresholding. As we saw earlier, the threshold technique allows us to exclude a portion of an image based on pixel values, regardless of pixel position.

   Sometimes, all we need is to exclude pixels that are below a certain value, and things will magically work. That's true when we work with simple images, but for complex ones, thresholding can be useless.
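The thresholding idea can be sketched in a few lines. This is a minimal illustration on a hypothetical synthetic grayscale image (not one of the images used in this article): every pixel is compared against a single value, with no regard for where it sits in the image.

```python
import numpy as np

# Hypothetical grayscale image: dark background with a brighter square.
img = np.zeros((100, 100), dtype=np.uint8)
img[30:70, 30:70] = 200  # a bright 40x40 "object"

# Keep only pixels above a chosen threshold, regardless of position.
threshold = 128
mask = img > threshold                       # boolean mask of "object" pixels
segmented = np.where(mask, 255, 0).astype(np.uint8)
```

Every pixel inside the bright square survives and every background pixel is zeroed out, which is exactly why thresholding works so well on simple images and breaks down when the object and the background share pixel values.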

More advanced techniques...

  In order to process more advanced images, we need to take pixel position into consideration. So now we care about a pixel's neighborhood, pixel variations within a neighborhood, and statistical values about a group of pixels.


K-means


    K-means is an algorithm of the clustering class. Before we talk about k-means itself, let's explore the idea of clustering.

    Clustering algorithms are exploratory data techniques that help us get an intuition about the structure of the data. In image processing, it is the computer that gets a glimpse of the image's structure.

    The idea behind clustering is to try to find homogeneous subgroups in the data, such that elements of the same subgroup are similar to each other while different from elements of other groups.

    It is an unsupervised learning technique, since we don't train our model against a set of previously known answers. Obviously, we still need to tune and analyze the parameters in order to make it work.

    K-means is an iterative algorithm that tries to partition the dataset into "k" distinct, non-overlapping subgroups. When an element belongs to one group, it does not belong to any other. There's no intersection whatsoever.

    The idea is to calculate the distance between each data point and the centroid of a cluster. The data points assigned to a cluster are the ones that have the lowest distance to its centroid. A cluster's centroid is the arithmetic mean of all its data points.

    So, what changes at each step is actually the cluster's centroid, not the data points.

    Therefore, k-means works as follows:

       1. First, we specify the number of clusters (K) that will divide our data.
       2. We initialize the centroids by randomly selecting K data points as the centroids.
       3. K clusters are then created by associating every element with the nearest centroid.
       4. The mean of each cluster becomes its new centroid, so we re-associate all the elements with the nearest centroid.
       5. The 4th step is repeated until we reach the maximum number of iterations or convergence.
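The steps above can be sketched directly in NumPy. This is a minimal, illustrative implementation (not the one OpenCV uses internally), run here on hypothetical 2-D data made of two well-separated blobs:

```python
import numpy as np

rng = np.random.default_rng(0)

def kmeans(points, k, max_iters=100):
    # Steps 1-2: pick K random data points as the initial centroids.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(max_iters):
        # Step 3: assign every point to its nearest centroid.
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 4: recompute each centroid as the mean of its cluster.
        new_centroids = np.array(
            [points[labels == i].mean(axis=0) for i in range(k)])
        # Step 5: stop on convergence (centroids no longer move).
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Two obvious blobs, around (0, 0) and (10, 10).
pts = np.vstack([rng.normal(0, 0.5, (50, 2)),
                 rng.normal(10, 0.5, (50, 2))])
labels, centroids = kmeans(pts, k=2)
```

With data this well separated, each blob ends up in its own cluster regardless of which points were drawn as initial centroids.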



    So, now that we understand k-means, let's apply it to image segmentation.

    Let's start by trying to segment a very simple image:



    It's easy to see that this image is already "segmented", because it has three well-defined colors that are not even overlapping. This is intentional, in order to get a better grasp of the algorithm.

    The first thing we need to do is to load the image and make sure it is in RGB format:

import matplotlib.pyplot as plt
import numpy as np
import cv2

image = cv2.imread("k-means.png")
image=cv2.cvtColor(image,cv2.COLOR_BGR2RGB)


    Just to illustrate how easy this image is to segment, let's show it channel by channel:

fig, axes = plt.subplots(nrows=1, ncols=4)
ax = axes.ravel()

r,g,b = cv2.split(image)
ax[0].imshow(image)
ax[0].set_title("Original image")

ax[1].imshow(r, cmap='gray')
ax[1].set_title("Red channel")

ax[2].imshow(g, cmap='gray')
ax[2].set_title("Green channel")

ax[3].imshow(b, cmap='gray')
ax[3].set_title("Blue channel")

plt.tight_layout()
plt.show()


    Obviously, if all the images we had to work with were as easy as this one, we wouldn't need a segmentation algorithm such as k-means, because all we would need to do is split the image into channels and work on each channel at a time.

    Nevertheless, let's assume that the image is not that simple and we still need to "discover" where the red, green, and blue portions are in it.

    So, to do that with k-means, the first thing to do is to reshape the image from an (X, Y, 3) matrix into a (V, 3) vector of (r, g, b) values (or, to be more precise, from shape 830 x 986 x 3 to shape 818380 x 3).

    Also, k-means expects float values, so we need to convert our vector to that format too:

vectorized = image.reshape((-1,3))
vectorized = np.float32(vectorized)

print (f'original format: {image.shape} - reshaped format: {vectorized.shape}')
  
Result: original format: (830, 986, 3) - reshaped format: (818380, 3)

    Now, we call the OpenCV k-means implementation to classify every single pixel in our vector into one of the four clusters we are using to divide the image.

    Why four clusters? Well, we chose four to get the same result as splitting by channel, like we did before. Our hope is that we will have one cluster for each color in the image (one for red, one for green, one for blue, and one for the black background).

criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
K = 4
ret, labels, centers = cv2.kmeans(
    vectorized, K, None, criteria, 10, cv2.KMEANS_PP_CENTERS)

    
    This method gives us the labels vector, which is an (818380, 1) vector classifying every pixel of the image into one of the four clusters.

    So now we can use this labels vector to obtain every "split" version of our image:
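One way this could look is the sketch below. To keep it self-contained, it uses a tiny hypothetical 2x2 stand-in for `image`, `labels`, and `centers` (the real ones come from the `cv2.kmeans` call above, with `labels` having shape (818380, 1)); the indexing logic is the same at full size.

```python
import numpy as np

# Stand-ins for the variables above: a tiny 2x2 RGB image, its per-pixel
# cluster labels (cv2.kmeans returns shape (n_pixels, 1)), and the centers.
image = np.array([[[250, 0, 0], [0, 250, 0]],
                  [[0, 0, 250], [5, 5, 5]]], dtype=np.uint8)
labels = np.array([[2], [3], [0], [1]])
centers = np.float32([[0, 0, 250],    # cluster 0: blue
                      [5, 5, 5],      # cluster 1: background
                      [250, 0, 0],    # cluster 2: red
                      [0, 250, 0]])   # cluster 3: green

# Rebuild the segmented image: replace every pixel with its cluster center.
centers_u8 = np.uint8(centers)
segmented = centers_u8[labels.flatten()].reshape(image.shape)

# "Split" views: keep only the pixels of one cluster, black out the rest.
k = centers.shape[0]
splits = []
for i in range(k):
    mask = (labels.flatten() == i).reshape(image.shape[:2])
    splits.append(np.where(mask[..., None], image, 0))
```

`splits[i]` then shows only the pixels that k-means assigned to cluster `i`, which is the per-cluster view discussed next.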

    We can now see that k-means classified the background as label 1, the red portion of the image as label 2, the green portion as label 3, and the blue portion as label 0. (Note that this numbering is arbitrary and can change between runs, since the initial centroids are chosen randomly.)


    Finally, let's list some interesting links about image segmentation using k-means.
