However, there are many use cases, which we may call non-traditional OCR, where conventional solutions are not quite the right fit. One example is detecting arbitrary text in natural scene images. In contrast to structured documents, the objective here is to extract text that might appear on road signs, house numbers, storefronts, advertisements, and so on. OCR in this regime requires detecting text objects that vary in font, length, and orientation and that may be partially occluded. In such pictures, a primary challenge lies in segmenting the objects in an image appropriately so that text blocks can be distinguished. Non-conventional OCR also poses other novel challenges, including background separation, object detection at multiple scales, coloration, text orientation, text length diversity, font diversity, distracting objects, and occlusions.

How the Scene Text Recognition (STR) model recognizes text in an image

In the first stage, a text localization model is built to find the text patches in the image, expressed as bounding boxes that correspond to parts of text (words or text lines). In the second stage, a text recognition model is built that takes the text locations produced by the text localization model and transcribes the text within them. The scene text localization model identifies text areas effectively by examining each character and the affinity between characters.
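
The two-stage flow described above (localize text as bounding boxes, then transcribe each box) can be sketched roughly as follows. This is only an illustrative outline, not the actual STR implementation: `Box`, `crop`, `read_scene_text`, `toy_localize`, and `toy_recognize` are hypothetical names, the image is assumed to be a plain grid of grayscale pixel values, and the two toy functions merely stand in for trained localization and recognition models.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Assumption: a grayscale image stored as rows of pixel values.
Image = Sequence[Sequence[int]]
Patch = List[List[int]]


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box; right and bottom edges are exclusive."""
    left: int
    top: int
    right: int
    bottom: int


def crop(image: Image, box: Box) -> Patch:
    """Cut the patch described by `box` out of `image`."""
    return [list(row[box.left:box.right]) for row in image[box.top:box.bottom]]


def read_scene_text(
    image: Image,
    localize: Callable[[Image], List[Box]],
    recognize: Callable[[Patch], str],
) -> List[Tuple[Box, str]]:
    """Stage 1 finds text boxes; stage 2 transcribes each cropped patch."""
    boxes = localize(image)
    return [(box, recognize(crop(image, box))) for box in boxes]


# Toy stand-ins for the two trained models (hypothetical, illustration only).
def toy_localize(image: Image) -> List[Box]:
    # Pretend every image holds one word in its top-left 2x2 corner.
    return [Box(left=0, top=0, right=2, bottom=2)]


def toy_recognize(patch: Patch) -> str:
    # Pretend the "transcription" is simply the patch size.
    return f"{len(patch)}x{len(patch[0])}"


if __name__ == "__main__":
    image = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
    print(read_scene_text(image, toy_localize, toy_recognize))
```

Passing the two stages in as plain callables keeps the pipeline independent of any particular model, so a real detector or recognizer could be swapped in without changing `read_scene_text`.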