Classical Computer Vision: Value Domain III: Point Operations

What is a Point Operation?

Value-domain transformations can be seen as point operations. A point operation modifies a pixel using only that pixel's own value, without looking at or affecting its neighbors. It doesn't change the size, structure or geometry of the image, and it doesn't depend on what the image depicts: the same rule is applied blindly to every pixel.
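Because the result depends only on the pixel's own value, a point operation on an 8-bit image can be written down as a 256-entry lookup table. The sketch below illustrates this with a photographic negative; the names `negative_lut` and `apply_lut` are made up for this example, and it assumes a `uint8` image.

```python
import numpy as np

# Hypothetical point operation: the photographic negative, out = 255 - in.
negative_lut = (255 - np.arange(256)).astype(np.uint8)

def apply_lut(image, lut):
    """Replace every pixel value v by lut[v]; neighbors are never consulted."""
    return lut[image]

# Tiny synthetic grayscale image, used here instead of a real picture.
demo = np.array([[0, 100], [255, 30]], dtype=np.uint8)
print(apply_lut(demo, negative_lut))
```

The output has the same shape as the input, which matches the idea that a point operation leaves the geometry of the image untouched.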

Point Operations with Scalars

By applying point operations between pixels and scalars, we can make several simple changes to an image, such as adjusting its brightness or contrast.

As an example, consider the code below:

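from skimage import io
import matplotlib.pyplot as plt

def trunc(px):
    # Clamp a single channel value to the valid 8-bit range [0, 255].
    if px > 255:
        return 255
    if px < 0:
        return 0
    return px

def point_op(image, fn):
    # Apply fn to every pixel of a multi-channel image, modifying it in place.
    for row in range(image.shape[0]):
        for col in range(image.shape[1]):
            # Work in float so that fn cannot wrap around in uint8.
            res = fn(image[row][col].astype(float))
            # Truncate every channel (range(0, l-1) skipped the last one).
            for i in range(len(res)):
                res[i] = trunc(res[i])
            image[row][col] = res

    return image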

This code defines 'point_op', which applies a function 'fn' to every pixel of an image, one by one. It also automatically truncates each resulting channel value to the valid range of 0 to 255.

Now we can use it to apply any transformation to every color channel of any image. As an example, let's reduce the brightness of channel R (by subtracting 70) and increase the contrast of channels G and B by 10% and 20% respectively (by multiplying them by 1.1 and 1.2) on the original "Lena" image.

if __name__ == '__main__':
    image = io.imread('lena.png')

    # Original image.
    plt.figure()
    plt.imshow(image)

    # Transformed copy: R darkened by 70, G scaled by 1.1, B scaled by 1.2.
    plt.figure()
    plt.imshow(point_op(image.copy(), lambda px: [px[0] - 70, 1.1 * px[1], 1.2 * px[2]]))
    plt.show()
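The same per-channel transformation can also be sketched without explicit loops, using NumPy broadcasting. This is only an illustration under a few assumptions: the function name `point_op_vectorized` and its `offsets` and `gains` parameters are invented here, the input is assumed to be a `uint8` image with at least three channels, and a tiny synthetic array stands in for 'lena.png'.

```python
import numpy as np

def point_op_vectorized(image, offsets=(0, 0, 0), gains=(1.0, 1.0, 1.0)):
    """Compute gain * px + offset per channel, then clamp to [0, 255]."""
    # Leave uint8 before doing arithmetic, otherwise values wrap around.
    work = image[..., :3].astype(np.float64)
    work = work * np.asarray(gains) + np.asarray(offsets)
    return np.clip(work, 0, 255).astype(np.uint8)

# Tiny synthetic 1x2 RGB image (an assumption, not the real picture).
demo = np.array([[[50, 100, 250], [200, 240, 10]]], dtype=np.uint8)

# Same transformation as above: R - 70, G * 1.1, B * 1.2.
result = point_op_vectorized(demo, offsets=(-70, 0, 0), gains=(1.0, 1.1, 1.2))
print(result)
```

Here `np.clip` plays the role of `trunc`, applied to the whole array at once.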

Image Math / Logic Operations between Images

We can also apply math or logic operations between two images. In that case, every pixel of image 1 is combined with its counterpart in image 2, provided image 2 has a pixel at that (row, col) position. Otherwise, the pixel is left as it is (no operation is performed).

The code for the algorithm above could be implemented as:

from skimage import io
import matplotlib.pyplot as plt

def trunc(px):
    # Clamp a single channel value to the valid 8-bit range [0, 255].
    if px > 255:
        return 255
    if px < 0:
        return 0
    return px


def image_merge_op(image1, image2, fn):
    max_row_img2 = image2.shape[0]
    max_col_img2 = image2.shape[1]
    res_img = image1.copy()

    for row in range(image1.shape[0]):
        for col in range(image1.shape[1]):
            # Only combine pixels that exist in both images.
            if row < max_row_img2 and col < max_col_img2:
                # Cast to int so arithmetic inside fn cannot wrap around in uint8.
                res_pixel = fn(image1[row][col].astype(int), image2[row][col].astype(int))
                # Truncate every channel (range(0, l-1) skipped the last one).
                for i in range(len(res_pixel)):
                    res_pixel[i] = trunc(res_pixel[i])

                res_img[row][col] = res_pixel

    return res_img

Now, using our code, we can apply a mask to cut out part of an image, as an example of a bitwise operation between images. For the same "Lena" example, let's remove everything except her face:

def bitwise_and(img1_px, img2_px):
    return [img1_px[0] & img2_px[0], img1_px[1] & img2_px[1], img1_px[2] & img2_px[2]]


if __name__ == '__main__':

    image = io.imread('lena.png')
    mask_image = io.imread('lena_cut_mask.png')

    fig, axes = plt.subplots(nrows=1, ncols=3)

    ax = axes.ravel()

    ax[0].imshow(image)
    ax[0].set_title("Original")

    ax[1].imshow(mask_image)
    ax[1].set_title("Mask")

    ax[2].imshow(image_merge_op(image, mask_image, bitwise_and))
    ax[2].set_title("Result")

    plt.tight_layout()
    plt.show()
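As a rough alternative to the pixel-by-pixel loop, the masking step can be sketched with NumPy slicing and `np.bitwise_and`. The function name `apply_mask` is invented for this example, and it assumes both arrays are `uint8` with the same number of channels; as in the rule described earlier, pixels of the image that have no counterpart in the mask are left untouched.

```python
import numpy as np

def apply_mask(image, mask):
    """Bitwise AND over the region both arrays share; the rest is kept as is."""
    rows = min(image.shape[0], mask.shape[0])
    cols = min(image.shape[1], mask.shape[1])
    result = image.copy()
    result[:rows, :cols] = np.bitwise_and(image[:rows, :cols], mask[:rows, :cols])
    return result

# Synthetic data (assumption): a 2x3 RGB image and a smaller 2x2 mask.
image = np.full((2, 3, 3), 200, dtype=np.uint8)
mask = np.zeros((2, 2, 3), dtype=np.uint8)
mask[0, 0] = 255  # keep only the top-left pixel

masked = apply_mask(image, mask)
print(masked[:, :, 0])
```

A mask value of 255 (all bits set) keeps the pixel unchanged, while 0 turns it black, which is why a black-and-white mask works as a cut-out.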

A nice example of a math operation between images is movement detection. We compare a reference image with each later one by subtracting them pixel by pixel, obtaining a resulting image that contains what changed.

If enough pixels changed, we can alert a human operator that something happened: a simple intrusion detector.

import matplotlib.pyplot as plt
import av


def change_detector(base_img, target_img, px_tolerance, total_tolerance):
    # Both images are expected to be 2-D arrays with one value per position.
    max_row_img2 = target_img.shape[0]
    max_col_img2 = target_img.shape[1]

    # Number of pixels whose value changed by more than px_tolerance.
    diff = 0

    for row in range(base_img.shape[0]):
        for col in range(base_img.shape[1]):
            if row < max_row_img2 and col < max_col_img2:
                # abs() counts pixels that got darker as well as brighter.
                if abs(int(target_img[row][col]) - int(base_img[row][col])) > px_tolerance:
                    diff = diff + 1

    if diff > total_tolerance:
        print(diff)

    # Report a change only when enough pixels differ.
    return diff > total_tolerance


if __name__ == '__main__':
    video = av.open('20211126_180531.mp4')
    i = 0                  # index of the current frame
    first = None           # reference frame, set once frame 300 has passed
    showed_first = False
    search = True
    for packet in video.demux():
        if search:
            for frame in packet.decode():
                img = frame.to_ndarray()
                # Skip the first 300 frames before picking the reference.
                if (i > 300):
                    if (not showed_first):
                        first = img
                        plt.imshow(img)
                        plt.show()
                        showed_first = True
                    # Compare every later frame against the reference.
                    elif (change_detector(first, img, 100, 100)):
                        print("detected movement on frame #" + str(i))
                        plt.imshow(img)
                        plt.show()
                        search = False
                        break
                i = i + 1

Obviously, all of this code is conceptual and no optimization is involved. It is meant for learning only and runs very, very slowly, so please keep that in mind.

[Figure: reference image]

[Figure: detected movement on frame #371]
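For comparison, here is a sketch of how the same idea could be written with NumPy array operations instead of nested loops. The name `change_detector_vectorized` is invented for this example; it assumes 2-D grayscale frames and uses the absolute difference, so pixels that became darker count as well as pixels that became brighter.

```python
import numpy as np

def change_detector_vectorized(base_img, target_img, px_tolerance, total_tolerance):
    """Return True when more than total_tolerance pixels changed noticeably."""
    rows = min(base_img.shape[0], target_img.shape[0])
    cols = min(base_img.shape[1], target_img.shape[1])
    # Signed integers avoid uint8 wrap-around when subtracting.
    base = base_img[:rows, :cols].astype(np.int32)
    target = target_img[:rows, :cols].astype(np.int32)
    changed = np.abs(target - base) > px_tolerance
    return int(np.count_nonzero(changed)) > total_tolerance

# Synthetic frames (assumption): a black scene, then a bright 10x15 object appears.
reference = np.zeros((20, 20), dtype=np.uint8)
current = reference.copy()
current[5:15, 0:15] = 255

print(change_detector_vectorized(reference, current, 100, 100))
```

The two tolerances keep their meaning from the loop version: `px_tolerance` decides whether a single pixel counts as changed, and `total_tolerance` decides how many changed pixels are needed before reporting movement.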

Operations using Statistics

Histograms are one of the most widely used tools for pre-processing images. For example, if we want to apply a threshold to an image, we can use its histogram to choose the threshold value automatically, based on the distribution of the image's pixel values.

import numpy as np
import skimage.io as io
import matplotlib.pyplot as plt

if __name__ == '__main__':
    image = io.imread('lena.png')
    # Split the image into its R, G and B channels.
    image_red, image_green, image_blue = image[:,:,0], image[:,:,1], image[:,:,2]

    # Top row: each channel shown as a grayscale image.
    fig, ax = plt.subplots(2,3)
    ax[0,0].imshow(image_red, cmap='gray')
    ax[0,1].imshow(image_green, cmap='gray')
    ax[0,2].imshow(image_blue, cmap='gray')

    # Bottom row: one histogram per channel, one bin per intensity level (0 to 255).
    bins = np.arange(-0.5, 255+1,1)
    ax[1,0].hist(image_red.flatten(), bins=bins, color='r')
    ax[1,1].hist(image_green.flatten(), bins=bins, color='g')
    ax[1,2].hist(image_blue.flatten(), bins=bins, color='b')

    plt.show()
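The code above only plots the histograms. To show how a threshold might be derived from one, here is a sketch of Otsu's method, which is one common choice and not necessarily what the author had in mind. The function name `otsu_threshold` is invented for this example, and it assumes a single-channel `uint8` image.

```python
import numpy as np

def otsu_threshold(gray):
    """Pick the level t that best separates pixels <= t from pixels > t."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    levels = np.arange(256)

    w0 = np.cumsum(prob)            # fraction of pixels with value <= t
    w1 = 1.0 - w0                   # fraction of pixels with value > t
    mu = np.cumsum(prob * levels)   # cumulative mean up to t
    mu_total = mu[-1]

    # Between-class variance for every candidate t; empty classes score 0.
    denom = w0 * w1
    with np.errstate(divide='ignore', invalid='ignore'):
        sigma_b = (mu_total * w0 - mu) ** 2 / denom
    sigma_b[denom == 0] = 0.0
    return int(np.argmax(sigma_b))

# Synthetic two-tone image (assumption): dark row of 50s, bright row of 200s.
gray = np.array([[50, 50, 50, 50], [200, 200, 200, 200]], dtype=np.uint8)
t = otsu_threshold(gray)
binary = gray > t
print(t, int(binary.sum()))
```

On a real photograph the histogram is rarely this clean, so the chosen level is best treated as a starting point rather than a guaranteed separation.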

Combined Operations

We can combine any of these operations to pre-process an image. For example, we can use the histogram to choose a threshold, apply that threshold, and then use a mask to obtain the resulting image.
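One possible reading of that pipeline is sketched below, with the thresholded image itself serving as the mask. Everything here is an assumption made for illustration: the function names are invented, the threshold is simply the mean intensity computed from the histogram (a deliberately simple rule), and a tiny synthetic RGB array stands in for a real picture.

```python
import numpy as np

def mean_threshold(gray):
    """Mean intensity computed from the histogram (a very simple rule)."""
    hist = np.bincount(gray.ravel(), minlength=256)
    levels = np.arange(256)
    return float((hist * levels).sum() / hist.sum())

def threshold_and_mask(rgb):
    """Histogram -> threshold -> binary mask -> masked color image."""
    gray = rgb.mean(axis=2).astype(np.uint8)              # crude grayscale
    t = mean_threshold(gray)
    binary = np.where(gray > t, 255, 0).astype(np.uint8)  # 255 keeps, 0 removes
    # Broadcast the single-channel mask over the three color channels.
    return np.bitwise_and(rgb, binary[..., np.newaxis])

# Synthetic 2x2 RGB image (assumption): two dark and two bright pixels.
rgb = np.array([[[10, 20, 30], [200, 210, 220]],
                [[5, 5, 5], [250, 240, 230]]], dtype=np.uint8)
print(threshold_and_mask(rgb))
```

The dark pixels fall below the threshold and are blacked out, while the bright ones pass through unchanged.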
