How to crop white patches in image and make passport size photo using OpenCV

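Problem description

I have images that need to be cropped to perfect passport-size photos. There are thousands of them, and each has to be cropped and straightened automatically. If an image is too blurry and cannot be cropped, it should be copied to a rejected folder. I tried using a Haar cascade, but that approach gives me only the face, whereas I need the face with the background cropped. Can you tell me how to code this in OpenCV or anything else?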

gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
faceCascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
faces = faceCascade.detectMultiScale(
    gray,
    scaleFactor=1.3,
    minNeighbors=3,
    minSize=(30, 30)
)
if(len(faces) == 1):
    for (x, y, w, h) in faces:
        if(x-w < 100 and y-h < 100):
            ystart = int(y-y*int(y1)/100)
            xstart = int(x-x*int(x1)/100)
            yend = int(h+h*int(y1)/100)
            xend = int(w+w*int(y2)/100)
            roi_color = img[ystart:y + yend, xstart:x + xend]
            cv2.imwrite(path, roi_color)

        else:
            rejectedCount += 1
            cv2.imwrite(path, img)

Reference solutions

Method 1:

I would handle your problem as follows:

 - First, grab the points we are interested in.
 - Know the size of a normal passport photo in pixels.

How to grab the points of interest.

There are several ways to do this:

 - You can use the Windows Paint application.
 - To be more programmatic, we can use cv2. I'm going to show you how to do that using cv2.

Also note that this does not yield a high-resolution image; you will have to play around with the code yourself.

# imports 
import numpy as np
import cv2

width = height = 600 # normal passport photo size in pixels

# global variable that will update the points when we clicked on the image
pt1 = []
pt2 = np.float32([[0, 0], [height, 0], [0, width], [height, width]])
def mouseEvent(event, x, y, flags, param):
    if event == cv2.EVENT_LBUTTONDOWN:
        global pt1
        if len(pt1) == 4:
            pt1 = []
        else:
            pt1.append([x, y])

while 1:
    image = cv2.imread("img.jpg", cv2.IMREAD_UNCHANGED)
    cv2.imshow("Original Image", image)
    cv2.setMouseCallback("Original Image", mouseEvent)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
    if len(pt1) == 4:
        break

Then we use two cv2 functions, getPerspectiveTransform and warpPerspective. getPerspectiveTransform() takes two sets of four points, our pt1 (the clicked corners) and pt2 (where those corners should end up). We then call warpPerspective() with three positional arguments: the image, the matrix and the output size as (width, height):

image = cv2.imread("img.jpg", 0)  # 0 loads the image as grayscale
matrix = cv2.getPerspectiveTransform(np.float32(pt1), pt2)
# the output size is (width, height), matching the corners listed in pt2
image = cv2.warpPerspective(image, matrix, (width, height))
cv2.imshow("Warp Perspective", image)
cv2.waitKey(0)
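To make the role of the matrix a bit more concrete, here is a minimal sketch of how a 3x3 perspective matrix, such as the one getPerspectiveTransform() returns, maps a single source point to its destination: multiply in homogeneous coordinates, then divide by the third component. The helper name apply_homography and the sample matrix are assumptions made up for illustration; they are not part of OpenCV.

```python
def apply_homography(matrix, point):
    """Map an (x, y) point through a 3x3 perspective matrix."""
    x, y = point
    # Homogeneous multiplication: [xh, yh, wh] = matrix * [x, y, 1]
    xh = matrix[0][0] * x + matrix[0][1] * y + matrix[0][2]
    yh = matrix[1][0] * x + matrix[1][1] * y + matrix[1][2]
    wh = matrix[2][0] * x + matrix[2][1] * y + matrix[2][2]
    if wh == 0:
        raise ValueError("point maps to infinity")
    # Dividing by wh is what makes the transform a perspective one
    return (xh / wh, yh / wh)


# Hypothetical matrix: scale by 2 and shift by (10, 20), no perspective term
sample = [[2.0, 0.0, 10.0],
          [0.0, 2.0, 20.0],
          [0.0, 0.0, 1.0]]
print(apply_homography(sample, (100, 50)))
```

The function only uses indexing, so it should accept either nested lists or the NumPy array returned by OpenCV.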

That is only a brief explanation, but it should give you the idea. The whole program looks as follows:


import numpy as np
import cv2

width = height = 600 # normal passport photo size in pixels
pt1 = []
pt2 = np.float32([[0, 0], [height, 0], [0, width], [height, width]])
def mouseEvent(event, x, y, flags, param):
    if event == cv2.EVENT_LBUTTONDOWN:
        global pt1
        if len(pt1) == 4:
            pt1 = []
        else:
            pt1.append([x, y])
while 1:
    image = cv2.imread("img.jpg", cv2.IMREAD_UNCHANGED)
    cv2.imshow("Original Image", image)
    cv2.setMouseCallback("Original Image", mouseEvent)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
    if len(pt1) == 4:
        break

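image = cv2.imread("img.jpg", 0)  # 0 loads the image as grayscale
matrix = cv2.getPerspectiveTransform(np.float32(pt1), pt2)
# the output size is (width, height), matching the corners listed in pt2
image = cv2.warpPerspective(image, matrix, (width, height))
cv2.imshow("Warp Perspective", image)
cv2.waitKey(0)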

When you run the code above, the image will be shown.
To use this program, click four points in order from a to d. For example, if this is your image:

-------------------
| (a)          (b)|
|                 |
|                 |
|                 |
|                 |
|                 |
| (c)          (d)|
-------------------

where a, b, c and d are the corners of the region you want to crop.

Demo

Click point 1 then 2 then 3 and lastly 4 to get the results above
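If clicking in a fixed order is inconvenient, the four points could be sorted automatically before they are passed to getPerspectiveTransform(). The sketch below is one common heuristic, not part of the answer's code: it assumes the region is roughly upright (rotated by well under 45 degrees), and the function name order_corners and the sample coordinates are hypothetical.

```python
def order_corners(points):
    """Return four (x, y) points as [top-left, top-right, bottom-left, bottom-right]."""
    if len(points) != 4:
        raise ValueError("expected exactly four points")
    pts = [tuple(p) for p in points]
    # x + y is smallest at the top-left corner and largest at the bottom-right
    top_left = min(pts, key=lambda p: p[0] + p[1])
    bottom_right = max(pts, key=lambda p: p[0] + p[1])
    # x - y is largest at the top-right corner and smallest at the bottom-left
    top_right = max(pts, key=lambda p: p[0] - p[1])
    bottom_left = min(pts, key=lambda p: p[0] - p[1])
    return [top_left, top_right, bottom_left, bottom_right]


# Hypothetical clicks made in an arbitrary order
clicked = [(420, 510), (40, 30), (35, 500), (410, 25)]
print(order_corners(clicked))
```

The returned order matches a, b, c, d in the diagram, so the result can be used in place of pt1.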

Method 2:

Here is one way to extract the photo in Python/OpenCV by keying on the black lines surrounding the image.

 - Read the input
 - Pad the image with white so that the lines can be extended until intersection
 - Threshold on black to extract the lines
 - Apply morphology close to try to connect the lines somewhat
 - Get the contours and filter on area, drawing the contours on a black background
 - Apply morphology close again to fill the line centers
 - Skeletonize to thin the lines
 - Get the Hough lines and draw them as white on a black background
 - Floodfill the center of the rectangle of lines to fill with mid-gray. Then convert that image to binary so that the gray becomes white and all else is black.
 - Get the coordinates of all non-black pixels and then from the coordinates get the rotated rectangle.
 - Use the angle and center of the rotated rectangle to unrotate both the padded image and this mask image via an affine warp
 - (Alternately, get the four corners of the rotated rectangle from the mask and then project that to the padded input domain using the affine matrix)
 - Get the coordinates of all non-black pixels in the unrotated mask and compute its rotated rectangle.
 - Get the bounding box of the (un-)rotated rectangle
 - Use those bounds to crop the padded image
 - Save the results

import cv2
import numpy as np
import math
from skimage.morphology import skeletonize

# read image
img = cv2.imread('passport.jpg')
ht, wd = img.shape[:2]

# pad image with white by 20% on all sides
padpct = 20
xpad = int(wd*padpct/100)
ypad = int(ht*padpct/100)
imgpad = cv2.copyMakeBorder(img, ypad, ypad, xpad, xpad, borderType=cv2.BORDER_CONSTANT, value=(255,255,255))
ht2, wd2 = imgpad.shape[:2]

# threshold on black
low = (0,0,0)
high = (20,20,20)

# threshold
thresh = cv2.inRange(imgpad, low, high)

# apply morphology to connect the white lines
kernel = np.ones((5,5), np.uint8)
morph = cv2.morphologyEx(thresh, cv2.MORPH_CLOSE, kernel)

# get contours
contours = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
contours = contours[0] if len(contours) == 2 else contours[1]

# filter on area
mask = np.zeros((ht2,wd2), dtype=np.uint8)
for cntr in contours:
    area = cv2.contourArea(cntr)
    if area > 20:
        cv2.drawContours(mask, [cntr], 0, 255, 1)

# apply morphology to connect the white lines and divide by 255 to make image in range 0 to 1
kernel = np.ones((5,5), np.uint8)
bmask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)/255

# apply thinning (skeletonizing)
skeleton = skeletonize(bmask)
skeleton = (255*skeleton).clip(0,255).astype(np.uint8)

# get hough lines
line_img = np.zeros_like(imgpad, dtype=np.uint8)
lines= cv2.HoughLines(skeleton, 1, math.pi/180.0, 90, np.array([]), 0, 0)
a,b,c = lines.shape
for i in range(a):
    rho = lines[i][0][0]
    theta = lines[i][0][1]
    a = math.cos(theta)
    b = math.sin(theta)
    x0, y0 = a*rho, b*rho
    pt1 = ( int(x0+1000*(-b)), int(y0+1000*(a)) )
    pt2 = ( int(x0-1000*(-b)), int(y0-1000*(a)) )
    cv2.line(line_img, pt1, pt2, (255, 255, 255), 1)

# floodfill with mid-gray (128)
xcent = int(wd2/2)
ycent = int(ht2/2)
ffmask = np.zeros((ht2+2, wd2+2), np.uint8)
mask2 = line_img.copy()
mask2 = cv2.floodFill(mask2, ffmask, (xcent,ycent), (128,128,128))[1]

# convert mask2 to binary
mask2[mask2 != 128] = 0
mask2[mask2 == 128] = 255
mask2 = mask2[:,:,0]

# get coordinates of all non-zero pixels
# NOTE: must transpose since numpy coords are y,x and opencv uses x,y
coords = np.column_stack(np.where(mask2.transpose() > 0))

# get rotated rectangle from coords
rotrect = cv2.minAreaRect(coords)
(center), (width,height), angle = rotrect
# from https://www.pyimagesearch.com/2017/02/20/text-skew-correction-opencv-python/
# the `cv2.minAreaRect` function returns values in the
# range [-90, 0); as the rectangle rotates clockwise the
# returned angle trends to 0 -- in this special case we
# need to add 90 degrees to the angle
if angle < -45:
    angle = -(90 + angle)

# otherwise, just take the inverse of the angle to make
# it positive
else:
    angle = -angle

# compute correction rotation
rotation = -angle - 90

# compute rotation affine matrix
M = cv2.getRotationMatrix2D(center, rotation, scale=1.0)

# unrotate imgpad and mask2 using affine warp
rot_img = cv2.warpAffine(imgpad, M, (wd2, ht2), flags=cv2.INTER_CUBIC, borderValue=(0,0,0))
rot_mask2= cv2.warpAffine(mask2, M, (wd2, ht2), flags=cv2.INTER_CUBIC, borderValue=(0,0,0))

# get coordinates of all non-zero pixels
# NOTE: must transpose since numpy coords are y,x and opencv uses x,y
coords2 = np.column_stack(np.where(rot_mask2.transpose() > 0))

# get bounding box
x,y,w,h = cv2.boundingRect(coords2)
print(x,y,w,h)

# crop rot_img
result = rot_img[y:y+h, x:x+w]

# save resulting images
cv2.imwrite('passport_pad.jpg',imgpad)
cv2.imwrite('passport_thresh.jpg',thresh)
cv2.imwrite('passport_morph.jpg',morph)
cv2.imwrite('passport_mask.jpg',mask)
cv2.imwrite('passport_skeleton.jpg',skeleton)
cv2.imwrite('passport_line_img.jpg',line_img)
cv2.imwrite('passport_mask2.jpg',mask2)
cv2.imwrite('passport_rot_img.jpg',rot_img)
cv2.imwrite('passport_rot_mask2.jpg',rot_mask2)
cv2.imwrite('passport_result.jpg',result)

# show thresh and result    
cv2.imshow("imgpad", imgpad)
cv2.imshow("thresh", thresh)
cv2.imshow("morph", morph)
cv2.imshow("mask", mask)
cv2.imshow("skeleton", skeleton)
cv2.imshow("line_img", line_img)
cv2.imshow("mask2", mask2)
cv2.imshow("rot_img", rot_img)
cv2.imshow("rot_mask2", rot_mask2)
cv2.imshow("result", result)
cv2.waitKey(0)
cv2.destroyAllWindows()

Intermediate and final images (not reproduced here): padded image, threshold image, morphology-cleaned image, mask1 image, skeleton image, (Hough) line image, floodfilled line image (mask2), unrotated padded image, unrotated mask2 image, and cropped image.
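The alternative mentioned in the step list (projecting the four corners of the rotated rectangle through the affine matrix instead of warping the mask) might look roughly like the following plain-Python sketch. The matrix layout follows the one documented for cv2.getRotationMatrix2D; in the real script the corners would presumably come from cv2.boxPoints(rotrect) and the matrix from M. The helper names and all sample numbers here are hypothetical.

```python
import math


def rotation_matrix_2d(center, angle_deg, scale=1.0):
    """Build a 2x3 matrix with the layout documented for cv2.getRotationMatrix2D."""
    cx, cy = center
    alpha = scale * math.cos(math.radians(angle_deg))
    beta = scale * math.sin(math.radians(angle_deg))
    return [[alpha, beta, (1 - alpha) * cx - beta * cy],
            [-beta, alpha, beta * cx + (1 - alpha) * cy]]


def transform_points(matrix, points):
    """Apply a 2x3 affine matrix to a list of (x, y) points."""
    return [(matrix[0][0] * x + matrix[0][1] * y + matrix[0][2],
             matrix[1][0] * x + matrix[1][1] * y + matrix[1][2])
            for x, y in points]


def bounding_box(points):
    """Return (x, y, w, h) of the axis-aligned box around the points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x0, y0 = math.floor(min(xs)), math.floor(min(ys))
    x1, y1 = math.ceil(max(xs)), math.ceil(max(ys))
    return x0, y0, x1 - x0, y1 - y0


# Hypothetical corners of a slightly tilted photo and a correcting rotation
corners = [(120, 80), (420, 110), (400, 510), (100, 480)]
matrix = rotation_matrix_2d(center=(260, 295), angle_deg=5.0)
print(bounding_box(transform_points(matrix, corners)))
```

Transforming four points is much cheaper than warping a full-size mask, which is the main reason to consider this variant; the crop bounds it yields should be close to, but not necessarily identical with, those from the mask-based route.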

Method 3:

If all photos have that thin white-black border around them, you can just

 - threshold the pictures
 - get all contours and
 - select those contours that
    - have the correct gradient
    - are large enough
    - reduce to 4 corners when passed through approxPolyDP
 - get an oriented bounding box
 - construct an affine transformation
 - apply the affine transformation

If those photos aren't scans but were taken with a camera from an angle (not top-down), you'll need to use a perspective transformation calculated from the corner points themselves.

If the photos aren't flat but warped, that's an entirely different problem.

import numpy as np
import cv2 as cv

im = cv.imread("Zh8QV.jpg")
gray = cv.cvtColor(im, cv.COLOR_BGR2GRAY)

gray = 255 - gray # invert so findContours' implicit black border doesn't bother us

height, width = gray.shape
minarea = (height * width) * 0.20

# (th_level, thresholded) = cv.threshold(gray, thresh=128, maxval=255, type=cv.THRESH_OTSU)

# threshold relative to estimated brightness of "white"
th_level = 255 - (255 - np.median(gray)) * 0.98
(th_level, thresholded) = cv.threshold(gray, thresh=th_level, maxval=255, type=cv.THRESH_BINARY)

(contours, hierarchy) = cv.findContours(thresholded, mode=cv.RETR_LIST, method=cv.CHAIN_APPROX_SIMPLE)

# black-to-white contours have negative area...
#areas = sorted([cv.contourArea(c, oriented=True) for c in contours])

large_areas = [ c for c in contours if cv.contourArea(c, oriented=True) <= -minarea ]

quads = [
    c for c in large_areas
    if len(cv.approxPolyDP(c, epsilon=0.02 * cv.arcLength(c, True), closed=True)) == 4
]

# if there is no quad, or multiple, that's an error (for this example)
assert len(quads) == 1, quads
[quad] = quads

bbox = cv.minAreaRect(quad)
(bcenter, bsize, bangle) = bbox
bcenter = np.array(bcenter)
bsize = np.array(bsize)

# keep orientation upright, fix up bbox size
(rot90, bangle) = divmod(bangle + 45, 90)
bangle -= 45
if rot90 % 2 != 0:
    bsize = bsize[::-1]

# construct affine transformation
M1 = np.eye(3)
M1[0:2,2] = -bcenter

R = np.eye(3)
R[0:2] = cv.getRotationMatrix2D(center=(0,0), angle=bangle, scale=1.0)

M2 = np.eye(3)
M2[0:2,2] = +bsize * 0.5

M = M2 @ R @ M1

bwidth, bheight = np.ceil(bsize)
dsize = (int(bwidth), int(bheight))

output = cv.warpAffine(im, M[0:2], dsize=dsize, flags=cv.INTER_CUBIC)

cv.imshow("output", output)
cv.waitKey(-1)
cv.destroyWindow("output")
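The "keep orientation upright" step in the code above is compact, so here is a small standalone sketch of what the divmod trick appears to do: it folds whatever angle minAreaRect reports into the range [-45, 45) and swaps width and height whenever an odd number of quarter turns was removed. The function name upright and the sample values are only illustrative.

```python
def upright(angle, size):
    """Fold a rectangle angle into [-45, 45) and swap width/height when needed."""
    width, height = size
    # Count how many quarter turns separate the angle from the [-45, 45) range
    quarter_turns, remainder = divmod(angle + 45, 90)
    angle = remainder - 45
    # An odd number of quarter turns exchanges the roles of width and height
    if quarter_turns % 2 != 0:
        width, height = height, width
    return angle, (width, height)


for sample_angle in (10, 80, -80):
    print(sample_angle, upright(sample_angle, (100, 200)))
```

With this normalization the photo is only ever rotated by the small residual tilt, so a portrait photo stays portrait regardless of which angle convention the installed OpenCV version uses.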

Method 4:

What I would do is the following 3 steps (I'm not going to code it for you, sorry; if you need help with one of the stages, I'll be happy to help):

 - Use the Hough transform to detect the 4 strongest lines in the picture.
 - Compute the 4 intersection points of the lines.
 - Apply a perspective transformation.

And you should have the cropped image as desired.
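The middle step can be sketched without OpenCV. The code below assumes the lines arrive as (rho, theta) pairs, which is the form cv2.HoughLines reports, where each line satisfies x*cos(theta) + y*sin(theta) = rho. The function names, the parallel-line tolerance and the sample lines are assumptions for illustration only.

```python
import math
from itertools import combinations


def intersect(line1, line2, eps=1e-9):
    """Intersect two (rho, theta) lines; return (x, y) or None if nearly parallel."""
    rho1, theta1 = line1
    rho2, theta2 = line2
    a1, b1 = math.cos(theta1), math.sin(theta1)
    a2, b2 = math.cos(theta2), math.sin(theta2)
    det = a1 * b2 - a2 * b1
    if abs(det) < eps:
        return None
    # Cramer's rule for the 2x2 system a*x + b*y = rho
    x = (rho1 * b2 - rho2 * b1) / det
    y = (a1 * rho2 - a2 * rho1) / det
    return (x, y)


def corners_from_lines(lines, width, height):
    """Keep only the pairwise intersections that fall inside the image."""
    corners = []
    for l1, l2 in combinations(lines, 2):
        p = intersect(l1, l2)
        if p is not None and 0 <= p[0] < width and 0 <= p[1] < height:
            corners.append(p)
    return corners


# Hypothetical lines: x = 10, x = 110, y = 20, y = 220 in a 200x300 image
lines = [(10, 0.0), (110, 0.0), (20, math.pi / 2), (220, math.pi / 2)]
print(corners_from_lines(lines, width=200, height=300))
```

The four surviving points can then be ordered and passed to cv2.getPerspectiveTransform() for the last step. On real photos the opposite edges are rarely exactly parallel, which is why intersections outside the image are filtered out rather than relying on the parallel check alone.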

Method 5:

The Concept

Process each image to enhance the edges of the photos.

Get the 4 corners of the photo of each processed image by first finding the contour with the greatest area, getting its convex hull and approximating the convex hull until only 4 points are left.

Warp each image according to the 4 corners detected.

The Code

import cv2
import numpy as np

def process(img):
    img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    img_blur = cv2.GaussianBlur(img_gray, (1, 1), 1)
    img_canny = cv2.Canny(img_blur, 350, 150)
    kernel = np.ones((3, 3))
    img_dilate = cv2.dilate(img_canny, kernel, iterations=2)
    return cv2.erode(img_dilate, kernel, iterations=1)

def get_pts(img):
    contours, _ = cv2.findContours(img, cv2.RETR_TREE, cv2.CHAIN_APPROX_NONE)
    cnt = max(contours, key=cv2.contourArea)
    peri = cv2.arcLength(cnt, True)
    return cv2.approxPolyDP(cv2.convexHull(cnt), 0.04 * peri, True)

files = ["1.jpg", "2.jpg", "3.jpg"]
width, height = 350, 450
pts2 = np.float32([[width, 0], [0, 0], [width, height], [0, height]])

for file in files:
    img = cv2.imread(file)
    pts1 = get_pts(process(img)).squeeze()
    pts1 = np.float32(pts1[np.lexsort(pts1.T)])
    matrix = cv2.getPerspectiveTransform(pts1, pts2)
    out = cv2.warpPerspective(img, matrix, (width, height))[5:-5, 5:-5]
    cv2.imshow(file, out)

cv2.waitKey(0)
cv2.destroyAllWindows()

The Output

I placed the outputs side by side to fit in one image:

The Explanation

Import the necessary libraries:

import cv2
import numpy as np

Define a function, process(), that takes in a BGR image array and returns the image processed with the Canny edge detector for more accurate detection of the edges of each photo later. The values used in the function can be tweaked to be more suitable for other images if needed:

def process(img):
    img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    img_blur = cv2.GaussianBlur(img_gray, (1, 1), 1)
    img_canny = cv2.Canny(img_blur, 350, 150)
    kernel = np.ones((3, 3))
    img_dilate = cv2.dilate(img_canny, kernel, iterations=2)
    return cv2.erode(img_dilate, kernel, iterations=1)

Define a function, get_pts(), that takes in a processed image and returns 4 points of the convex hull of the contour with the greatest area. In order to get 4 points out of the convex hull, we use the cv2.approxPolyDP() method:

def get_pts(img):
    contours, _ = cv2.findContours(img, cv2.RETR_TREE, cv2.CHAIN_APPROX_NONE)
    cnt = max(contours, key=cv2.contourArea)
    peri = cv2.arcLength(cnt, True)
    return cv2.approxPolyDP(cv2.convexHull(cnt), 0.04 * peri, True)

Define a list, files, containing the names of the files you want to extract the photos from, and the dimensions you want the resulting images to have, width and height:

files = ["1.jpg", "2.jpg", "3.jpg"]
width, height = 350, 450

Using the dimensions defined above, define the destination coordinates that each of the 4 soon-to-be-detected corners will be mapped to:

pts2 = np.float32([[width, 0], [0, 0], [width, height], [0, height]])

Loop through each filename, read each file into a BGR image array, get the 4 points of the photo within the image, use the cv2.getPerspectiveTransform() method to get the solution matrix for the warping, and finally warp the photo portion of the image with that matrix using the cv2.warpPerspective() method. The slice at the end trims 5 pixels from every edge of the warped result:

for file in files:
    img = cv2.imread(file)
    pts1 = get_pts(process(img)).squeeze()
    pts1 = np.float32(pts1[np.lexsort(pts1.T)])
    matrix = cv2.getPerspectiveTransform(pts1, pts2)
    out = cv2.warpPerspective(img, matrix, (width, height))[5:-5, 5:-5]
    cv2.imshow(file, out)

Finally, add a delay and after that destroy all the windows:

cv2.waitKey(0)
cv2.destroyAllWindows()
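The question also asks for unusable images to be moved to a rejected folder, which the code above does not cover. One possible check, sketched here under assumptions, is to reject an image when get_pts() does not return exactly four corners or when the detected quadrilateral covers too little of the image. The function names and the 20% threshold are hypothetical, and the points are expected in contour order, that is, before the lexsort step.

```python
def polygon_area(points):
    """Shoelace formula for the area of a simple polygon given as (x, y) points."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def should_reject(points, image_size, min_fraction=0.2):
    """Return True when the detected corners do not look like a usable photo outline."""
    if len(points) != 4:
        return True
    width, height = image_size
    # Assumption: a real photo should fill at least min_fraction of the image
    return polygon_area(points) < min_fraction * width * height


# Hypothetical detections in a 200x200 image
good = [(0, 0), (100, 0), (100, 100), (0, 100)]
tiny = [(0, 0), (10, 0), (10, 10), (0, 10)]
print(should_reject(good, (200, 200)), should_reject(tiny, (200, 200)))
```

In the loop above, a file for which should_reject() returns True could be copied to the rejected folder instead of being warped. This does not measure blur directly; it only catches images where no plausible photo outline was found.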

(by Harshith VA、crispengari、fmw42、Christoph Rackwitz、Binyamin Even、Ann Zen)

Reference

How to crop white patches in image and make passport size photo using OpenCV (CC BY-SA 2.5/3.0/4.0)
