Electronic de-skewing has two components: skew measurement and skew removal. Electronic skew removal may be performed on either grayscale or binary data.
A variety of techniques for skew measurement have been developed. These may be broadly classified into three groups: the use of pre-printed registration marks, measurement of the document surround, and measurement from the image data itself.
Specially constructed, pre-printed documents such as forms may be designed to incorporate registration marks or other fiducial marks specifically for use by the scanner or by forms processing OCR software.
This is a very reliable method, but it cannot be used for general document input over which the scanning organization has no control.
Measuring the document surround is the most common and most reliable method. It works on raw, uncropped scanner data from scanners in which the region beyond the edges of the document is black, providing contrast with the (generally not all-black) document.
Determining the image skew via analysis of the document surround relies on the assumption that the textual data on the page is not skewed relative to the paper on which it has been printed or photocopied. This is a good assumption in some application environments (such as Census forms, books, and bank checks), but can be a poor assumption for applications such as tax forms, which are multiple-generation photocopies with some non-negligible frequency. It is worth noting that a similar assumption is inherent in all mechanical de-skew methods.
Either the entire document surround or just the leading edge of the document may be used. The advantage of the leading edge measurement is that only a small image data buffer is required to hold data while the measurement is being made.
If the entire document surround is used, a full-image buffer is required, but some useful additional information for Automatic Image Quality Analysis (AIQA) purposes can be obtained, such as whether the document has a folded corner.
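As an illustration, the following minimal C sketch estimates skew from the leading edge alone. It assumes an 8-bit grayscale buffer scanned against a black background; the function name, buffer layout, and threshold parameter are assumptions for this sketch, not any particular scanner's interface.

```c
#include <math.h>
#include <stddef.h>

/*
 * Illustrative leading-edge skew measurement. In each column, the first
 * pixel brighter than the threshold marks the top (leading) edge of the
 * paper; a least-squares line fit through those edge points gives the
 * skew angle.
 */
double measure_leading_edge_skew(const unsigned char *img,
                                 int width, int height,
                                 unsigned char threshold)
{
    double sx = 0, sy = 0, sxx = 0, sxy = 0;
    int n = 0;

    for (int x = 0; x < width; x++) {
        for (int y = 0; y < height; y++) {
            if (img[(size_t)y * width + x] > threshold) {
                sx += x;  sy += y;          /* (x, y) is on the leading edge */
                sxx += (double)x * x;
                sxy += (double)x * y;
                n++;
                break;
            }
        }
    }
    if (n < 2)
        return 0.0;                         /* no edge found: report no skew */

    /* slope of the least-squares fit y = a*x + b; skew angle = atan(a) */
    double denom = n * sxx - sx * sx;
    return (denom == 0.0) ? 0.0 : atan((n * sxy - sx * sy) / denom);
}
```

Note that `height` here need only cover the small band of rows buffered while the page is still feeding, which is precisely the buffer-size advantage of the leading-edge variant.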
A variety of methods exist for measuring skew from the image data itself.
Most of these methods are not fast; they require a full-image buffer (or several of them), and some cannot run in anything close to real time.
These methods may be broadly classified into two groups: those based on projection profiles or nearest neighbor angles and those based on the Hough transform or related techniques.
A list of some papers on this topic is found in the Resources/References section of this white paper.
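As one illustration of the projection-profile family, the C sketch below scores candidate angles on a binary image by how sharply the horizontal projection peaks; the exhaustive angle sweep, names, and parameters are assumptions for illustration, not drawn from any of the referenced papers.

```c
#include <math.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

/*
 * Illustrative projection-profile skew measurement on a binary image
 * (one byte per pixel, nonzero = black, row-major). For each candidate
 * angle, black pixels are accumulated into bins along that angle; text
 * lines make the profile spiky, so the sum of squared bin counts peaks
 * at the true skew angle.
 */
double measure_skew_projection(const unsigned char *img,
                               int width, int height,
                               double max_deg, double step_deg)
{
    int nbins = height + 2 * width;         /* generous range for shifted bins */
    long *bins = malloc(sizeof *bins * nbins);
    double best_angle = 0.0, best_score = -1.0;

    if (bins == NULL)
        return 0.0;

    for (double deg = -max_deg; deg <= max_deg; deg += step_deg) {
        double t = tan(deg * M_PI / 180.0);
        double score = 0.0;

        memset(bins, 0, sizeof *bins * nbins);
        for (int y = 0; y < height; y++)
            for (int x = 0; x < width; x++)
                if (img[(size_t)y * width + x]) {
                    int bin = (int)(y - x * t) + width; /* offset keeps bins >= 0 */
                    if (bin >= 0 && bin < nbins)
                        bins[bin]++;
                }
        for (int i = 0; i < nbins; i++)
            score += (double)bins[i] * bins[i];

        if (score > best_score) { best_score = score; best_angle = deg; }
    }
    free(bins);
    return best_angle;                      /* estimated skew, in degrees */
}
```

The nested sweep over angles and pixels makes plain why such methods are slow and buffer-hungry, as noted above.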
Though it is an interesting technical problem, skew measurement from image data is seldom necessary when the raw data is available from the scanner. Measuring the black pattern surrounding the document is simpler, more reliable, and produces results at least as good as those of mechanical de-skew methods, since both use paper-edge alignment as the definition of zero skew.
The most widely deployed form of electronic de-skewing uses binary, 1 bit per pixel data.
These techniques may be applied either in hardware or in software. Binary de-skewing in software can be performed long after the scanning has been completed, as long as no portion of the document was cropped off at scan time.
The key fact to remember is that binary de-skewing does not remove the jags that skew places into the straight lines of the document, but rather adds compensating jags in the opposite direction. These extra jags do not occur exactly where the original jags were located, so the de-skew process doubles the number of jags.
Binary de-skewing produces an image whose features, when viewed at a zoomed-out, full-page view, are nominally straight relative to the bounding rectangle of the image. This is pleasing to the eye and assists forms template software in its work.
When viewed at a higher zoom level, however, it becomes apparent that features in the image are actually twice as ragged as they were in the un-de-skewed image.
Depending on the type of recognition algorithm used, an OCR program may either perform more poorly on a binary de-skewed image or see no effect from the distortions introduced by that process. In no case will a properly designed OCR program (one which tracks slanted baselines) perform better on binary de-skewed characters.
As a general rule, it is better to leave OCR-bound images alone than to binary de-skew them. One could imagine an OCR program that benefits from receiving a measurement of the skew as a parameter, so that it might better recognize characters as co-linear; some programs operate this way internally.
A variety of methods exist for binary de-skewing.
These techniques typically rely on a variation of the Bresenham algorithm, which formulates a needed skew adjustment as a pair of pixel counts along the row and column axes of the scanning device. This is similar to the way a builder sets the pitch of a roof using different measurements along the two legs of a builder's square.
The simplest application of this technique does not actually result in a rotation of the document, but rather in a shearing, with an inherent inaccuracy proportional to the angle of the skew.
By simply changing where one retrieves a given pixel at read time via manipulation of row and column addresses, the slant of the top and bottom edges of the document may be removed to +/- one pixel accuracy. A similar shear on the other axis will make the left and right document edges appear straight.
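A minimal C sketch of such a shear follows, assuming a binary image stored one byte per pixel in row-major order; a Bresenham-style integer error term advances the shift by one pixel every `run` rows, mirroring the pixel-count formulation described above. All names and the layout are assumptions for illustration.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Illustrative one-axis binary de-skew by shearing. The skew is given as
 * `rise` pixels of horizontal drift per `run` rows (run > 0). This shear
 * straightens the left and right document edges; the analogous per-column
 * shear straightens the top and bottom edges.
 */
void shear_rows(unsigned char *img, int width, int height,
                int rise, int run, unsigned char fill)
{
    int offset = 0, err = 0;
    int dir = (rise < 0) ? -1 : 1;
    int mag = abs(rise);

    for (int y = 0; y < height; y++) {
        unsigned char *row = img + (size_t)y * width;

        if (offset >= width || offset <= -width)
            break;                           /* skew too large for this sketch */
        if (offset > 0) {                    /* shift the row to the right */
            memmove(row + offset, row, (size_t)(width - offset));
            memset(row, fill, (size_t)offset);
        } else if (offset < 0) {             /* shift the row to the left */
            memmove(row, row - offset, (size_t)(width + offset));
            memset(row + width + offset, fill, (size_t)-offset);
        }
        err += mag;                          /* Bresenham-style error term */
        if (err >= run) { err -= run; offset += dir; }
    }
}
```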
In the true-rotation method, the sine and cosine of the skew angle are used to find the ideal coordinate in the input grid that is the source of the output pixel currently being generated. The nearest-neighbor method is then typically applied: the black or white state of the actual pixel closest to that ideal coordinate sets the color of the output pixel.
This method can be improved upon somewhat by taking a combination of the pixels nearest to that ideal location to create the output pixel.
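A C sketch of this inverse mapping with nearest-neighbor sampling follows (the blended variant appears with the grayscale discussion below); rotation about the image center, the one-byte-per-pixel layout, and all names are assumptions for illustration.

```c
#include <math.h>
#include <stddef.h>

/*
 * Illustrative binary de-skew by true rotation. For each output pixel,
 * the ideal source coordinate in the input grid is found by rotating
 * back through the skew angle; the output copies the input pixel
 * nearest that coordinate.
 */
void rotate_nearest(const unsigned char *src, unsigned char *dst,
                    int width, int height, double angle_rad,
                    unsigned char fill)
{
    double c = cos(angle_rad), s = sin(angle_rad);
    double cx = width / 2.0, cy = height / 2.0;

    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            /* inverse-rotate the output coordinate into the input grid */
            double u =  c * (x - cx) + s * (y - cy) + cx;
            double v = -s * (x - cx) + c * (y - cy) + cy;
            int iu = (int)lround(u), iv = (int)lround(v);

            dst[(size_t)y * width + x] =
                (iu >= 0 && iu < width && iv >= 0 && iv < height)
                    ? src[(size_t)iv * width + iu]
                    : fill;
        }
    }
}
```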
By an appropriate successive application of one dimensional shears, it is possible to accomplish a good approximation to a true rotation. These methods are not described here.
Grayscale de-skewing overcomes many of the drawbacks of binary de-skewing, producing images that look as if they were fed perfectly orthogonally, without any skew. The resulting binary images have no jags in them beyond those introduced by the binarization or thresholding process itself.
Grayscale images resulting from a grayscale de-skew algorithm look similarly unflawed; the algorithm used is highly accurate, not just accurate enough to have its flaws effectively hidden by binarization.
Grayscale de-skewing requires hardware to cope with the large number of calculations per pixel used in the algorithm. It also requires access to the raw or compressed grayscale data, making it best applied either in the scanner or in a scanner interface board having access to that data.
The algorithm for grayscale de-skewing uses bi-linear interpolation, in which a pixel in the rotated output image is formed from a weighted blending of its nearest neighbors in the original image. Higher-order blending algorithms exist that use a larger neighborhood, but no perceptible difference results from the extra effort.
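A C sketch of the bi-linear variant for 8-bit grayscale follows, under the same illustrative layout and center-of-rotation assumptions as above; each output pixel blends the four input pixels surrounding the ideal (fractional) source coordinate.

```c
#include <math.h>
#include <stddef.h>

/*
 * Illustrative grayscale de-skew with bi-linear interpolation. The ideal
 * source coordinate is found as in the rotation sketch above; the output
 * pixel is the weighted average of the four surrounding input pixels,
 * with weights given by the fractional parts of the coordinate.
 */
void rotate_bilinear(const unsigned char *src, unsigned char *dst,
                     int width, int height, double angle_rad,
                     unsigned char fill)
{
    double c = cos(angle_rad), s = sin(angle_rad);
    double cx = width / 2.0, cy = height / 2.0;

    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            double u =  c * (x - cx) + s * (y - cy) + cx;
            double v = -s * (x - cx) + c * (y - cy) + cy;
            int u0 = (int)floor(u), v0 = (int)floor(v);
            double fu = u - u0, fv = v - v0;

            if (u0 < 0 || u0 + 1 >= width || v0 < 0 || v0 + 1 >= height) {
                dst[(size_t)y * width + x] = fill;
                continue;
            }
            const unsigned char *p = src + (size_t)v0 * width + u0;
            double top = p[0] * (1 - fu) + p[1] * fu;
            double bot = p[width] * (1 - fu) + p[width + 1] * fu;
            dst[(size_t)y * width + x] =
                (unsigned char)(top * (1 - fv) + bot * fv + 0.5);
        }
    }
}
```

The four multiplications per output pixel, repeated over every pixel of the page, illustrate why this method benefits from hardware assistance, as noted above.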