OpenLink Software

About: Image binarization is one of the most relevant preprocessing operations, as it strongly influences the results of subsequent image analysis performed for many purposes. A significant loss of information occurs during this step, and the use of inappropriate thresholding methods may hamper further shape analysis or even make it impossible to recognize the shapes of objects or characters. Typical applications based on the analysis of binary images are Optical Character Recognition (OCR) and Optical Mark Recognition (OMR). These may also be applied to unevenly illuminated natural images and to challenging, degraded historical document images, which are commonly used as benchmarks for image binarization algorithms. To address the still open challenge of relatively fast and simple, yet robust, binarization of degraded document images, the paper proposes a novel two-step algorithm. Its initial thresholding step models a simplified image histogram using a Gaussian Mixture Model (GMM) and the Monte Carlo method. This approach can be considered an extension of a recently developed image preprocessing method based on the Generalized Gaussian Distribution (GGD), which assumes that the GGD resembles the histograms of ground-truth binary images distorted by Gaussian noise. The processing time of the first step, which produces intermediate images with partially removed background information, can be significantly reduced by using the Monte Carlo method. The improved approach yields even better results, not only on the well-known DIBCO benchmark databases but also on the more demanding Bickley Diary dataset. It also allows some well-known classical binarization methods, including global ones, to be used in the second step of the algorithm.
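The abstract does not spell out the algorithm's details, so the following pure-Python sketch illustrates only the general two-step structure it describes. It makes several assumptions:

- The grey-level histogram is estimated from a random (Monte Carlo) sample of pixels.
- A two-component Gaussian mixture is fitted to that histogram with EM.
- As a hypothetical "initial thresholding", every pixel brighter than two standard deviations below the brighter component's mean is set to white. This assumes dark text on a light background.
- Otsu's global threshold is then applied as the classical second-step method.

The function names, the sample size, the cutoff rule and the choice of Otsu are illustrative assumptions, not the authors' published method. The paper's histogram simplification and its relation to the earlier GGD-based approach may differ substantially.

```python
import math
import random

# Hypothetical two-step binarization sketch: Monte Carlo histogram -> 2-component
# GMM -> background removal (assumed cutoff rule) -> Otsu global threshold.
# Not the paper's algorithm; all parameters below are illustrative assumptions.

def monte_carlo_histogram(img, n_samples, rng):
    """Estimate a 256-bin grey-level histogram from randomly drawn pixels."""
    h, w = len(img), len(img[0])
    hist = [0] * 256
    for _ in range(n_samples):
        hist[img[rng.randrange(h)][rng.randrange(w)]] += 1
    return hist

def fit_gmm_histogram(hist, iters=100):
    """Fit a two-component 1-D Gaussian mixture to a histogram using EM."""
    total = sum(hist)
    levels = [g for g in range(256) if hist[g]]
    mean = sum(g * hist[g] for g in levels) / total
    var0 = max(sum(hist[g] * (g - mean) ** 2 for g in levels) / total, 1.0)
    mu = [float(levels[0]), float(levels[-1])]  # start at darkest / brightest level
    var = [var0, var0]
    pi = [0.5, 0.5]
    for _ in range(iters):
        # E-step: responsibility of each component for each observed grey level
        resp = []
        for g in levels:
            p = [pi[k] * math.exp(-(g - mu[k]) ** 2 / (2 * var[k]))
                 / math.sqrt(2 * math.pi * var[k]) for k in range(2)]
            s = sum(p) or 1e-300
            resp.append([pk / s for pk in p])
        # M-step: histogram counts act as sample weights
        for k in range(2):
            nk = sum(hist[g] * r[k] for g, r in zip(levels, resp))
            if nk == 0:
                continue
            pi[k] = nk / total
            mu[k] = sum(hist[g] * r[k] * g for g, r in zip(levels, resp)) / nk
            var[k] = max(sum(hist[g] * r[k] * (g - mu[k]) ** 2
                             for g, r in zip(levels, resp)) / nk, 1.0)
    return pi, mu, var

def remove_background(img, mu, var, n_sigma=2.0):
    """Step 1 (assumed rule): whiten pixels likely to belong to the bright background."""
    bg = max(range(2), key=lambda k: mu[k])
    cutoff = mu[bg] - n_sigma * math.sqrt(var[bg])
    return [[255 if v >= cutoff else v for v in row] for row in img]

def otsu_threshold(img):
    """Step 2: Otsu's global threshold maximising between-class variance."""
    hist = [0] * 256
    for row in img:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(g * hist[g] for g in range(256))
    w_b, sum_b = 0, 0.0
    best_t, best_var = 0, -1.0
    for t in range(256):
        w_b += hist[t]
        if w_b == 0:
            continue
        w_f = total - w_b
        if w_f == 0:
            break
        sum_b += t * hist[t]
        m_b, m_f = sum_b / w_b, (sum_all - sum_b) / w_f
        between = w_b * w_f * (m_b - m_f) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t

def binarize(img, t):
    """Map pixels <= t to black (0) and the rest to white (255)."""
    return [[0 if v <= t else 255 for v in row] for row in img]

def two_step_binarize(img, n_samples=1000, seed=0):
    rng = random.Random(seed)
    hist = monte_carlo_histogram(img, n_samples, rng)
    _, mu, var = fit_gmm_histogram(hist)
    intermediate = remove_background(img, mu, var)
    return binarize(intermediate, otsu_threshold(intermediate))
```

The GMM is fitted to a histogram built from `n_samples` random pixels rather than from every pixel. As a result, the cost of histogram estimation and mixture fitting depends on the sample size, not on the image size. This is the kind of time saving the abstract attributes to the Monte Carlo method.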

This work is licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License.
Copyright © 2009-2025 OpenLink Software