Document Skewness Detection and Correction
In computer vision, the performance of machine learning systems that work with document data is often degraded by document skew. If the skew is not corrected before the document is fed to a model, we cannot expect accurate predictions. For example, in an information extraction system, if we pass a skewed image to an OCR (optical character recognition) model, it may fail to detect the text properly, and the text alignment will not be preserved. Another example is a table detection system: if we have not corrected the skew of the image before table detection, the model may not predict the corners and edges of the table correctly. If you are facing a similar problem, you are in the right place!
Document skewness refers to the degree of tilt or slant in a scanned or digitally captured document. Skew can occur during scanning or image capture when the paper is not fed into the scanner perfectly straight. Skew estimation is a vital task in document processing systems, especially for scanned document images, because its accuracy directly affects the subsequent steps. I wrote two blogs on image alignment a few months ago. You can find these blogs here.
- Text Document Alignment using Probabilistic Houghline Transform
- Text Documents Skewness Correction using OpenCV
In this blog, I will explain a novel skew estimation method that extracts the dominant skew angle of the given document image by applying an Adaptive Radial Projection on the 2D Discrete Fourier Magnitude spectrum. If you want to learn more about how the skewness is calculated, you can refer to the paper ADAPTIVE RADIAL PROJECTION ON FOURIER MAGNITUDE SPECTRUM FOR DOCUMENT IMAGE SKEW ESTIMATION.
In the above paper, the skew estimation process involves several key steps focusing on the use of the Fourier Transform and an adaptive radial projection method. Here’s a breakdown of the process:
- Fourier Transform: The method starts by applying a 2D Discrete Fourier Transform (DFT) to the document image. This transform converts the spatial domain of the image into the frequency domain, resulting in a spectrum where the intensity of each point represents the magnitude of a particular frequency in the image. The DFT is central to this method because the regular spacing of text lines concentrates energy along particular directions in the spectrum, and those directions rotate with the skew of the image.
- Magnitude Spectrum Analysis: The next step involves analyzing the Fourier magnitude spectrum of the image. Skewness in the document image manifests as dominant orientations in this spectrum. The method first identifies these orientations in order to estimate the skew angle.
- Adaptive Radial Projection: This is the novel part of the method. It involves two separate radial projections:
A. Initial Radial Projection: This first projection is used to estimate an initial skew angle. It does this by projecting the magnitudes of the Fourier spectrum along various radial lines emanating from the spectrum’s centre. The radial line that results in the highest projection value is indicative of the primary orientation of the text in the image, which correlates to the skew angle.
B. Correction Projection: This step refines the initial estimate. It recognizes that the initial projection might be influenced by factors like text alignment or the presence of non-text elements in the image. The correction projection adapts to these factors to offer a more accurate estimate.
- Angle Calculation: Once the dominant orientation is identified through these radial projections, the corresponding skew angle is calculated. This angle represents the rotation needed to align the text in the image with the horizontal or vertical axes, effectively ‘deskewing’ the image.
- Accuracy Enhancement: The method also includes additional steps to enhance accuracy, such as considering the DC component and low frequencies in the Fourier spectrum, which are crucial for dealing with various types of document images.
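To make the idea behind these steps concrete, here is a minimal NumPy-only sketch. It is not the paper's method: instead of a radial projection it simply locates the strongest non-DC peak of the magnitude spectrum, which is enough to show that a tilt in the image appears as a rotated direction in the spectrum. The helper names make_tilted_stripes and dominant_spectrum_angle, the synthetic stripe image, and the sign convention are all assumptions made for illustration.

```python
import numpy as np


def make_tilted_stripes(n=512, period=4.0, skew_deg=5.0):
    """Synthetic 'text lines': a cosine pattern tilted by skew_deg (hypothetical helper)."""
    theta = np.radians(skew_deg)
    y, x = np.mgrid[0:n, 0:n]
    return np.cos(2 * np.pi * (x * np.sin(theta) + y * np.cos(theta)) / period)


def dominant_spectrum_angle(image):
    """Angle in degrees of the strongest non-DC peak, measured from the vertical frequency axis."""
    c = image.shape[0] // 2
    # Remove the mean so the DC component does not dominate, then centre the spectrum
    spectrum = np.fft.fftshift(np.fft.fft2(image - image.mean()))
    magnitude = np.abs(spectrum)
    magnitude[c, c] = 0.0  # ignore any residual DC
    row, col = np.unravel_index(np.argmax(magnitude), magnitude.shape)
    dy, dx = row - c, col - c
    # Peaks of a real image come in mirrored pairs; fold onto the half-plane dy >= 0
    if dy < 0 or (dy == 0 and dx < 0):
        dy, dx = -dy, -dx
    return np.degrees(np.arctan2(dx, dy))


stripes = make_tilted_stripes(skew_deg=5.0)
estimate = dominant_spectrum_angle(stripes)
print(f"estimated tilt: {estimate:.2f} degrees")
```

The estimate lands close to the 5 degrees used to build the stripes, limited by the resolution of the frequency grid. Real documents are far less clean than a single cosine, which is why the paper sums energy along whole radial lines rather than relying on a single peak.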
Now, let’s dive deep into the implementation. I am using the following document image as the skewed input image.
First, we will calculate the magnitude of the fast Fourier transform using the _get_fft_magnitude() function, as below:
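import cv2
import numpy as np


def _ensure_gray(image):
    """Return a grayscale version of the image."""
    try:
        image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    except cv2.error:
        # cvtColor raises on single-channel input, i.e. the image is already gray
        pass
    return image


def _ensure_optimal_square(image):
    """Pad the image with white to a square whose side is an efficient DFT size."""
    assert image is not None, image
    nw = nh = cv2.getOptimalDFTSize(max(image.shape[:2]))
    output_image = cv2.copyMakeBorder(
        src=image,
        top=0,
        bottom=nh - image.shape[0],
        left=0,
        right=nw - image.shape[1],
        borderType=cv2.BORDER_CONSTANT,
        value=255,
    )
    return output_image


def _get_fft_magnitude(image):
    """Return the centred FFT magnitude of the thresholded image."""
    gray = _ensure_gray(image)
    opt_gray = _ensure_optimal_square(gray)
    # Invert so that text becomes bright, then binarize with an adaptive threshold
    opt_gray = cv2.adaptiveThreshold(
        ~opt_gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 15, -10
    )
    # Perform the 2D FFT and move the zero-frequency (DC) term to the centre
    dft = np.fft.fft2(opt_gray)
    shifted_dft = np.fft.fftshift(dft)
    # Get the magnitude (modulus) of the complex spectrum
    magnitude = np.abs(shifted_dft)
    return magnitude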
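The call to np.fft.fftshift matters for the next step, because the radial lines are drawn from the centre of the spectrum. The small sketch below, using an assumed 8x8 blank "page" as input, shows where the zero-frequency (DC) term sits before and after the shift.

```python
import numpy as np

# A blank page: constant intensity, so all energy sits in the DC term
page = np.full((8, 8), 255.0)

raw = np.abs(np.fft.fft2(page))
shifted = np.abs(np.fft.fftshift(np.fft.fft2(page)))

# Locate the strongest coefficient in each layout
raw_peak = np.unravel_index(np.argmax(raw), raw.shape)
shifted_peak = np.unravel_index(np.argmax(shifted), shifted.shape)

print("DC before shift:", raw_peak)     # top-left corner, (0, 0)
print("DC after shift:", shifted_peak)  # centre of the array, (4, 4)
```

After the shift, index size // 2 along each axis is the origin of the frequency plane, which is the point the radial projection starts from.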
Now, we will calculate the skew angle using a radial projection, which sums the magnitudes of the Fourier spectrum along radial lines drawn from the centre of the spectrum at a range of candidate angles.
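def _get_angle_radial_projection(m, angle_max=None, num=None, W=None):
    """Get the skew angle (in degrees) via radial projection.

    :param m: square, centred FFT magnitude (output of _get_fft_magnitude)
    :param angle_max: candidate angles span [-angle_max, angle_max] degrees
    :param num: number of candidate angles per degree
    :param W: unused in this snippet
    :return: the candidate angle with the largest radial sum, in degrees
    """
    assert m.shape[0] == m.shape[1]
    r = c = m.shape[0] // 2
    if angle_max is None:
        # Assumed default search range of +/-15 degrees; adjust it to your data
        angle_max = 15
    if num is None:
        num = 20
    # Candidate angles, converted to radians
    tr = np.linspace(-1 * angle_max, angle_max, int(angle_max * num * 2)) / 180 * np.pi
    profile_arr = tr.copy()

    def f(t):
        # Sum the magnitudes along the radial line at angle t, starting at the centre
        _f = np.vectorize(
            lambda x: m[c + int(x * np.cos(t)), c + int(-1 * x * np.sin(t))]
        )
        _l = _f(range(0, r))
        val_init = np.sum(_l)
        return val_init

    vf = np.vectorize(f)
    li = vf(profile_arr)
    # The angle with the largest projection is taken as the skew angle
    a = tr[np.argmax(li)] / np.pi * 180
    if a == -1 * angle_max:
        # A maximum at the very first candidate is treated as "no skew found"
        return 0
    return a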
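The nested np.vectorize calls above run a Python-level loop over every angle and every radius. As a rough alternative sketch, the same sampling can be done with NumPy broadcasting. The function radial_projection_angle and the draw_ray test helper are hypothetical names introduced here, and the default of 15 degrees for angle_max is an assumption; the sampling formula itself mirrors the code above.

```python
import numpy as np


def radial_projection_angle(magnitude, angle_max=15.0, steps_per_degree=20):
    """Angle (degrees) whose radial line from the centre has the largest sum."""
    assert magnitude.shape[0] == magnitude.shape[1]
    c = magnitude.shape[0] // 2
    count = int(2 * angle_max * steps_per_degree)
    angles = np.radians(np.linspace(-angle_max, angle_max, count))
    radii = np.arange(c)
    # One row of sample coordinates per angle; astype(int) truncates like int()
    rows = c + (radii[None, :] * np.cos(angles[:, None])).astype(int)
    cols = c + (-radii[None, :] * np.sin(angles[:, None])).astype(int)
    profile = magnitude[rows, cols].sum(axis=1)
    return np.degrees(angles[np.argmax(profile)])


def draw_ray(size, angle_deg):
    """Synthetic 'spectrum': zeros with one bright ray from the centre."""
    m = np.zeros((size, size))
    c = size // 2
    t = np.radians(angle_deg)
    for x in range(c):
        m[c + int(x * np.cos(t)), c + int(-x * np.sin(t))] = 1.0
    return m


spectrum = draw_ray(256, 4.0)
estimate = radial_projection_angle(spectrum)
print(f"recovered angle: {estimate:.2f} degrees")
```

On this synthetic input the recovered angle is close to the 4 degrees used to draw the ray. The sketch omits the special case that returns 0 when the maximum falls on the first candidate angle.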
Once we have the skew angle, we use it to correct the skew of the above image. Here, the skew-corrected image is stored in the rotated_image variable.
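def get_skewed_angle(image, angle_max=15):
    """Estimate the skew angle in degrees.

    Assumed glue code: it chains the two helpers defined earlier, and the
    default angle_max of 15 degrees is an assumption.
    """
    magnitude = _get_fft_magnitude(image)
    return _get_angle_radial_projection(magnitude, angle_max=angle_max)


def correct_text_skewness(image):
    """
    Rotate the image by its estimated skew angle.
    :param image: input image (BGR or grayscale)
    :return: skew-corrected image
    """
    # Use only the first two dimensions so grayscale inputs work too
    h, w = image.shape[:2]
    x_center, y_center = (w // 2, h // 2)
    # Find the angle by which to rotate the image
    rotation_angle = get_skewed_angle(image)
    print(f"[INFO]: Rotation angle is {rotation_angle}")
    # Rotate the image by that angle around its center, filling borders with white
    M = cv2.getRotationMatrix2D((x_center, y_center), rotation_angle, 1.0)
    borderValue = (255, 255, 255)
    rotated_image = cv2.warpAffine(image, M, (w, h), flags=cv2.INTER_CUBIC, borderValue=borderValue)
    return rotated_image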
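To see what the rotation step does geometrically, the sketch below rebuilds the 2x3 affine matrix in plain NumPy, following the formula given in the OpenCV documentation for cv2.getRotationMatrix2D. The function name rotation_matrix_2d and the 640x480 image centre are assumptions for illustration; in the pipeline itself you would keep using the OpenCV call.

```python
import numpy as np


def rotation_matrix_2d(center, angle_deg, scale=1.0):
    """2x3 affine matrix for a rotation about `center` (hypothetical helper)."""
    cx, cy = center
    a = scale * np.cos(np.radians(angle_deg))
    b = scale * np.sin(np.radians(angle_deg))
    return np.array([
        [a, b, (1 - a) * cx - b * cy],
        [-b, a, b * cx + (1 - a) * cy],
    ])


M = rotation_matrix_2d((320, 240), 5.0)

# The center of rotation is a fixed point of the transform
center_mapped = M @ np.array([320, 240, 1.0])

# A point to the right of the center moves up (smaller y) for a positive angle,
# i.e. positive angles rotate counter-clockwise with the origin at the top-left
right_mapped = M @ np.array([330, 240, 1.0])

print(center_mapped, right_mapped)
```

Because the output size passed to cv2.warpAffine is the original (w, h), corners of the page can be clipped for larger angles; the white borderValue fills the areas that the rotation uncovers.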
The skew of the input image above has been corrected, and this is the final skew-corrected image.
In summary, this approach is robust: it can handle a wide range of skew angles and copes with common variations in document images, such as varying text sizes, fonts, and layouts. The adaptive nature of the radial projections makes this method flexible and accurate for different kinds of document images. The above code is available in my GitHub repository blog_code_snippets.
I hope this blog helps you detect and correct the skew of text-based document images. If you found it helpful, please don’t forget to clap and share it with your colleagues. See you in the next blog…
References: