Recognition functions

Module with data classes.

class pygats.recog.KeypointsCluster(keypoints: list, labels: list, coord_rect: tuple)[source]

Data class for storing a cluster of keypoints, labels, and rectangle coordinates.

Parameters:
  • keypoints (list) – A list of keypoints representing the cluster.

  • labels (list) – A list of labels associated with the keypoints.

  • coord_rect (tuple) – Coordinates of the rectangle that bounds the cluster. Expected format is (x_min, y_min, x_max, y_max).

__repr__()[source]

Returns a string representation of the KeypointsCluster instance, including keypoints, labels, and rectangle coordinates.

class pygats.recog.ROI(x: int, y: int, w: int, h: int)[source]

Data class to store the coordinates of a region of interest.

Parameters:
  • x (int), y (int) – coordinates of the top-left point of the rectangle where the text resides

  • w (int), h (int) – width and height of the rectangle where the text resides

rectangle_center_coords()[source]

Returns the center of the rectangle.

Returns:

coordinates of the rectangle center

Return type:

tuple
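
As an illustration of the center calculation, here is a minimal sketch. It does not import pygats: RegionSketch is a hypothetical stand-in for ROI, and the assumption that the center is the top-left corner plus half the width and height (returned as floats) is ours, not confirmed by the source.

```python
from dataclasses import dataclass


@dataclass
class RegionSketch:
    """Hypothetical stand-in for pygats.recog.ROI (same four fields)."""
    x: int  # left edge of the rectangle
    y: int  # top edge of the rectangle
    w: int  # width of the rectangle
    h: int  # height of the rectangle

    def rectangle_center_coords(self):
        # Assumption: center = top-left corner shifted by half the size.
        return (self.x + self.w / 2, self.y + self.h / 2)


roi = RegionSketch(x=100, y=50, w=40, h=20)
center = roi.rectangle_center_coords()
print(center)
```

A tuple of this kind is what a click or cursor-move helper would typically consume as its target point.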

class pygats.recog.SearchedText(content: str, lang: str, area: str)[source]

Data class to store the text content, language, and crop area that are passed as parameters to the Tesseract functions.

pygats.recog.check_text(ctx, img: PIL.Image, txt)[source]

Checks whether the text (txt) is present in the image (img), recognized using the language given in txt.

Parameters:
  • ctx (Context) – An object that contains information about the current context.

  • img (Image) – image in which to search for the text

  • txt (pygats.recog.SearchedText) – text to search

pygats.recog.check_text_on_screen(ctx, txt)[source]

Checks if text (txt) exists on the screen

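Parameters:
  • ctx (Context) – An object that contains information about the current context.

  • txt (pygats.recog.SearchedText) – text to search
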
pygats.recog.click_text(ctx, txt, button='left', skip=0)[source]

Finds text on the screen and clicks it with the given mouse button.

Parameters:
  • ctx (Context) – An object that contains information about the current context.

  • txt (pygats.recog.SearchedText) – text to be searched and clicked

  • button (string, optional) – mouse button to press: left, right, or middle

  • skip (int) – number of occurrences of the text to skip

pygats.recog.combine_lines(lines, one_word=False)[source]

Translates lines from the Tesseract output format into a list of result tuples.

Parameters:
  • lines (List) – Tesseract result containing box boundaries, confidences, and other information.

  • one_word (bool, optional) – whether only a single word is being searched for

Returns:

list of (ROI, text) tuples

Return type:

list

Notes

A magic number of 5 is used to decide whether words are on the same line. It should be reworked in the future.
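
To illustrate the kind of grouping this note describes, here is a minimal sketch. It is not the pygats implementation: group_words_into_lines is a hypothetical helper, the input is simplified to (x, y, w, h, text) word tuples, and treating 5 as a vertical tolerance in pixels is an assumption.

```python
def group_words_into_lines(words, tolerance=5):
    """Group word boxes into lines (hypothetical helper, not pygats code).

    words: list of (x, y, w, h, text) tuples in reading order.
    Returns a list of ((x, y, w, h), text) tuples, one per line.
    """
    lines = []
    for x, y, w, h, text in words:
        current = lines[-1] if lines else None
        # Assumption: a word belongs to the current line when its top edge
        # is within `tolerance` pixels of the line's first word.
        if current is not None and abs(y - current["top"]) <= tolerance:
            current["x_min"] = min(current["x_min"], x)
            current["y_min"] = min(current["y_min"], y)
            current["x_max"] = max(current["x_max"], x + w)
            current["y_max"] = max(current["y_max"], y + h)
            current["texts"].append(text)
        else:
            lines.append({"top": y, "x_min": x, "y_min": y,
                          "x_max": x + w, "y_max": y + h, "texts": [text]})
    return [
        ((ln["x_min"], ln["y_min"],
          ln["x_max"] - ln["x_min"], ln["y_max"] - ln["y_min"]),
         " ".join(ln["texts"]))
        for ln in lines
    ]


words = [(5, 10, 40, 12, "Hello"), (50, 12, 42, 12, "world"), (5, 40, 30, 12, "Next")]
result = group_words_into_lines(words)
print(result)
```

A fixed pixel tolerance does not scale with font size or screen resolution, which is one reason such a constant may need reworking.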

pygats.recog.contrast(img: PIL.Image)[source]

Determines the minimum and maximum brightness values and the contrast of the image. The metrics are calculated using the YCbCr color model. Image.convert supports all possible conversions between “L”, “RGB” and “CMYK”. See https://pillow.readthedocs.io/en/latest/reference/Image.html#PIL.Image.Image.convert

Parameters:

img (Image) – PIL.Image that is converted from the BGR color space to YUV

Returns:

contr (float): contrast value on the image

Return type:

(contr)
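
The source does not state which contrast formula is used. As one possible reading, the sketch below computes the Michelson contrast from the minimum and maximum luma of the pixels. The helpers luma and michelson_contrast are hypothetical, the input is a plain list of RGB tuples rather than a PIL image, and the luma weights are the standard ITU-R BT.601 ones used for the Y channel of YCbCr.

```python
def luma(rgb):
    """Y (brightness) component of an RGB pixel, ITU-R BT.601 weights."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


def michelson_contrast(pixels):
    """Hypothetical contrast metric: (Y_max - Y_min) / (Y_max + Y_min).

    Assumption: this formula is a stand-in; pygats may define contrast differently.
    """
    values = [luma(p) for p in pixels]
    y_min, y_max = min(values), max(values)
    if y_max + y_min == 0:
        return 0.0  # an all-black image has no measurable contrast
    return (y_max - y_min) / (y_max + y_min)


pixels = [(0, 0, 0), (255, 255, 255), (128, 128, 128)]
value = michelson_contrast(pixels)
print(value)
```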

pygats.recog.crop_image(img: PIL.Image, width: int | None = 0, height: int | None = 0, extend: bool | None = False) → PIL.Image[source]

Crops a portion of the input image based on the specified width and height multipliers. If width and height are not specified, the original image is returned.

Parameters:
  • img (Image) – The input image to crop.

  • width (int, optional) – The multiplier to determine the beginning of the crop area by width.

  • height (int, optional) – The multiplier to determine the beginning of the crop area by height

  • extend (bool, optional) – Whether to extend the crop area by a factor of 2.

Returns:

x_offset (int), y_offset (int): offsets along the x and y coordinates

img_crop (Image): the cropped image area

Return type:

(x_offset, y_offset, img_crop)

pygats.recog.find_crop_image(img: PIL.Image, crop_area: str | None = 'all', extend: bool | None = False) → PIL.Image[source]

Detects the crop area for the input image and crops the image based on the specified crop area.

Parameters:
  • img (Image) – The input image to crop.

  • crop_area (str, optional) – The crop area to use. Defaults to ‘all’.

  • extend (bool, optional) – Whether to extend the crop area by a factor of 2. Defaults to False.

Returns:

x_offset (int), y_offset (int): offsets along the x and y coordinates

img_crop (Image): the cropped image area

Return type:

(x_offset, y_offset, img_crop)

pygats.recog.find_cropped_text(ctx, img: PIL.Image, txt: pygats.recog.SearchedText, skip: int | None = 0, one_word: bool | None = False)[source]

Finds text in an image using several passes. The first pass locates the areas of the image that contain text; each area is then passed through recognition again to improve the recognition results.

Parameters:
  • ctx (Context) – An object that contains information about the current context.

  • img (Image) – image to search text in

  • txt (SearchedText) – text to search

  • skip (int, optional) – number of occurrences of the text to skip.

  • one_word (bool, optional) – flag indicating that only one word is being searched for.

Returns:

roi (ROI): region of interest

found (bool): whether the text is found in the image

Return type:

(roi, found)

pygats.recog.find_fuzzy_text(recognized_list, search: str)[source]

Performs a fuzzy search for text in a list using the Levenshtein ratio. The return value is a list of tuples in the format described below.

Parameters:
  • recognized_list (list[tuple]) – list of text to match with pattern (format: ROI,text)

  • search (str) – substring to search

Returns:

roi (ROI): region of interest

text (str): full text which resides in the rectangle

substring (str): substring found in the text

Return type:

(roi,text, substring)
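
The following sketch illustrates the idea of fuzzy matching over recognized text. It is not the pygats implementation: fuzzy_find and its threshold parameter are hypothetical, plain tuples stand in for ROI objects, and difflib.SequenceMatcher from the standard library is swapped in for the Levenshtein ratio, so the scores will differ from those of a true Levenshtein implementation.

```python
from difflib import SequenceMatcher


def fuzzy_find(recognized, search, threshold=0.7):
    """Hypothetical fuzzy search returning (roi, text, substring) tuples.

    recognized: list of (roi, text) tuples.
    Assumption: SequenceMatcher.ratio() replaces the Levenshtein ratio.
    """
    results = []
    n = len(search)
    for roi, text in recognized:
        best_ratio, best_sub = 0.0, ""
        # Slide a window of len(search) characters over the text and
        # remember the window that is most similar to the search string.
        for start in range(max(1, len(text) - n + 1)):
            window = text[start:start + n]
            ratio = SequenceMatcher(None, window.lower(), search.lower()).ratio()
            if ratio > best_ratio:
                best_ratio, best_sub = ratio, window
        if best_ratio >= threshold:
            results.append((roi, text, best_sub))
    return results


# "Fi1e" simulates an OCR misread of "File".
recognized = [((10, 20, 80, 12), "Fi1e  Edit  View"), ((10, 40, 60, 12), "Help")]
matches = fuzzy_find(recognized, "File")
print(matches)
```

Fuzzy matching of this kind tolerates single-character OCR errors that an exact substring search would miss; the threshold controls how much deviation is accepted.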

pygats.recog.find_keypoints(img: PIL.Image)[source]

Function that uses the SIFT algorithm to find keypoints in an image. The function returns three values, one of which contains the coordinates of the key points, which simplifies further use of the data.

Parameters:

img (Image) – PIL.Image which is used to search for keypoints

Returns:

keypoints (tuple): The detected keypoints

descriptors (numpy.ndarray): Computed descriptors

coord_list (numpy.ndarray): Array of coordinates of keypoints

Return type:

(keypoints, descriptors, coord_list)

pygats.recog.find_regexp_text(recognized_list: list, pattern)[source]

Finds text in a list by regular expression. The return value is a list of tuples in the format described below.

Parameters:
  • recognized_list (list) – list of text to match with pattern (format: ROI,text)

  • pattern (str) – regexp pattern to match

Returns:

roi (ROI): region of interest

text (str): full text which resides in the rectangle

substring (str): substring found in the text

Return type:

(roi,text, substring)
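
As a rough illustration of the documented input and output formats, here is a minimal sketch built on the standard re module. It is not the pygats implementation: regexp_find is a hypothetical helper, and plain tuples stand in for ROI objects.

```python
import re


def regexp_find(recognized, pattern):
    """Hypothetical regexp search returning (roi, text, substring) tuples.

    recognized: list of (roi, text) tuples; pattern: regular expression string.
    """
    compiled = re.compile(pattern)
    results = []
    for roi, text in recognized:
        match = compiled.search(text)
        if match:
            # Keep the region, the full line, and the part that matched.
            results.append((roi, text, match.group(0)))
    return results


recognized = [((0, 0, 120, 14), "Order #4521 confirmed"), ((0, 20, 90, 14), "Thank you")]
found = regexp_find(recognized, r"#\d+")
print(found)
```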

pygats.recog.find_text(ctx, img: PIL.Image, txt, skip=0, extend=False, one_word=False)[source]

Finds text in an image using Tesseract.

Parameters:
  • ctx (Context) – An object that contains information about the current context.

  • img (Image) – image where text will be recognized

  • txt (pygats.recog.SearchedText) – text which will be searched

  • skip (int) – number of findings to skip

  • extend (bool, optional) – whether to use the extended crop area

  • one_word (bool, optional) – whether only a single word is being searched for

Returns:

roi (ROI): region of interest

found (bool): whether the text is found in the image

Return type:

(roi,found)

pygats.recog.find_text_on_screen(ctx, txt, skip=0, one_word=False)[source]

Finds text on the screen.

Parameters:
  • ctx (Context) – An object that contains information about the current context.

  • txt (pygats.recog.SearchedText) – text to find

  • skip (int, optional) – number of findings to skip

  • one_word (bool, optional) – search for only one word

Returns:

roi (ROI): region of interest

found (bool): whether the text is found on the screen

Return type:

(roi, found)

pygats.recog.hdbscan_cluster(keypoints: tuple, coord_list: ndarray, min_cluster_size: int | None = 5, min_samples: int | float | None = None, cluster_selection_epsilon: float | None = 0.0, margins: tuple | None = (0, 0))[source]

Clusters keypoints by their coordinates using HDBSCAN. The function is intended for keypoints and coordinates that have already been found. See https://scikit-learn.org/stable/modules/generated/sklearn.cluster.HDBSCAN.html#r6f313792b2b7-5

Parameters:
  • keypoints (tuple) – Distinctive points in an image

  • coord_list (np.ndarray) – Array of coordinates of keypoints

  • min_cluster_size (int) – Minimum number of samples required for a group to be considered a cluster

  • min_samples (int | float) – The number k used to calculate the distance between a point and its k-th nearest neighbor

  • cluster_selection_epsilon (float) – Distance threshold; clusters below this value are merged

  • margins (tuple) – Tuple of values for symmetrical boundary changes along x, y

Returns:

clusters(list): list of cluster objects containing detailed information about labels, keypoints and rectangles

Return type:

(clusters)
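
To show how cluster labels, bounding rectangles, and margins could fit together, here is a minimal sketch. It is not the pygats implementation and does not run HDBSCAN: cluster_rectangles is a hypothetical helper that takes labels as if a clustering step had already produced them, treats the label -1 as noise (as scikit-learn's HDBSCAN does), and assumes that margins widen each rectangle symmetrically along x and y.

```python
from collections import defaultdict


def cluster_rectangles(coords, labels, margins=(0, 0)):
    """Hypothetical helper: bounding rectangle per cluster label.

    coords: list of (x, y) keypoint coordinates.
    labels: cluster label per coordinate; -1 marks noise points.
    Returns {label: (x_min, y_min, x_max, y_max)}.
    """
    groups = defaultdict(list)
    for (x, y), label in zip(coords, labels):
        if label == -1:
            continue  # noise points do not belong to any cluster
        groups[label].append((x, y))

    margin_x, margin_y = margins
    rects = {}
    for label, points in groups.items():
        xs = [p[0] for p in points]
        ys = [p[1] for p in points]
        # Assumption: margins expand the box equally on both sides.
        rects[label] = (min(xs) - margin_x, min(ys) - margin_y,
                        max(xs) + margin_x, max(ys) + margin_y)
    return rects


coords = [(10, 10), (12, 14), (11, 9), (200, 150), (205, 148), (90, 90)]
labels = [0, 0, 0, 1, 1, -1]
rects = cluster_rectangles(coords, labels, margins=(2, 3))
print(rects)
```

The rectangles follow the (x_min, y_min, x_max, y_max) format documented for KeypointsCluster.coord_rect.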

pygats.recog.image_difference(img_1: PIL.Image, img_2: PIL.Image)[source]

Function that calculates the difference between two images and returns the coordinates of rectangles enclosing the areas where these differences are observed.

Parameters:
  • img_1 (Image) – First image

  • img_2 (Image) – Second image

Returns:

coord_rect(tuple): Tuple with the coordinates of all the bounding boxes that enclose the regions of difference between the two images

Return type:

(coord_rect)

pygats.recog.move_to_text(ctx, txt, skip=0)[source]

Finds text on the screen and moves the cursor to it

Parameters:
  • ctx (Context) – An object that contains information about the current context.

  • txt (pygats.recog.SearchedText) – text to be searched and moved to

  • skip (int) – number of occurrences of the text to skip

pygats.recog.recognize_text(img, lang)[source]

Recognizes text in an image using Tesseract, combines the results into lines, and returns them as a list of tuples.

Parameters:
  • img (PIL.Image) – image where text will be recognized

  • lang (string) – language of text (tesseract-ocr)

Returns:

x (int), y (int): coordinates of the top-left point of the rectangle where the text resides

w (int), h (int): width and height of the rectangle where the text resides

text (string): full text which resides in the rectangle

Return type:

(x,y,w,h,text)

Notes

This is a wrapper function around pytesseract.image_to_data. The results of image_to_data are combined into lines.

pygats.recog.recognize_text_with_data(img, lang)[source]

Recognizes all text in the image using Tesseract.

Parameters:
  • img (PIL.Image) – input image to recognize text

  • lang (string) – language in tesseract format

Returns:

recognized text

Return type:

list