Recognition functions
Module with data classes and recognition functions.
- class pygats.recog.KeypointsCluster(keypoints: list, labels: list, coord_rect: tuple)[source]
Data class for storing a cluster of keypoints, labels, and rectangle coordinates.
- Parameters:
keypoints (list) – A list of keypoints representing the cluster.
labels (list) – A list of labels associated with the keypoints.
coord_rect (tuple) – Coordinates of the rectangle that bounds the cluster. Expected format is (x_min, y_min, x_max, y_max).
- class pygats.recog.ROI(x: int, y: int, w: int, h: int)[source]
Data class to store the coordinates of a region of interest.
- Parameters:
x (int), y (int) – coordinates of the top-left point of the rectangle where the text resides
w (int), h (int) – width and height of the rectangle where the text resides
- class pygats.recog.SearchedText(content: str, lang: str, area: str)[source]
Data class to store the text content, language, and crop area that are passed as parameters to the Tesseract function.
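The data classes above can be pictured with the following minimal sketch. The stand-in definitions only mirror the documented constructor signatures and are not the library's source. The `center` helper and the sample values ('Save', 'eng', 'all') are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ROI:
    """Stand-in mirroring the documented fields of pygats.recog.ROI."""
    x: int
    y: int
    w: int
    h: int


@dataclass
class SearchedText:
    """Stand-in mirroring the documented fields of pygats.recog.SearchedText."""
    content: str
    lang: str
    area: str


def center(roi: ROI) -> tuple:
    """Hypothetical helper: center point of a region, e.g. a click target."""
    return (roi.x + roi.w // 2, roi.y + roi.h // 2)


roi = ROI(x=10, y=20, w=100, h=30)
txt = SearchedText(content="Save", lang="eng", area="all")
print(center(roi))  # (60, 35)
```

A center point like this is one plausible target for a function such as click_text, although the exact point the library clicks is not stated here.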
- pygats.recog.check_text(ctx, img: PIL.Image, txt)[source]
Checks whether the text (txt) exists in the image (img). The language is taken from the lang field of txt.
- Parameters:
ctx (Context) – An object that contains information about the current context.
img (Image) – image in which to search for the text
txt (pygats.recog.SearchedText) – text to search
- pygats.recog.check_text_on_screen(ctx, txt)[source]
Checks if text (txt) exists on the screen
- Parameters:
ctx (Context) – An object that contains information about the current context.
txt (pygats.recog.SearchedText) – text to search on screenshot
- pygats.recog.click_text(ctx, txt, button='left', skip=0)[source]
Finds text on the screen and presses the mouse button on it
- Parameters:
ctx (Context) – An object that contains information about the current context.
txt (pygats.recog.SearchedText) – text to be searched and clicked
button (string, optional) – mouse button to press: left, right, or middle
skip (int) – number of occurrences of the text to skip
- pygats.recog.combine_lines(lines, one_word=False)[source]
Translates lines from the Tesseract output format into result tuples
- Parameters:
lines (List) – Tesseract result containing box boundaries, confidences, and other information.
one_word (bool, optional) – flag indicating that only one word is searched for
- Returns:
list of (ROI, text) tuples
- Return type:
list
Notes
The magic number 5 is used to decide whether words are on the same line. It should be reworked in the future.
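The same-line check mentioned in the note can be illustrated with a small sketch. This is not the library's implementation: the function name, the (x, y, w, h, text) input format, and the rule of comparing top edges against a 5-pixel tolerance are assumptions. For brevity it returns only the joined text rather than (ROI, text) tuples.

```python
SAME_LINE_TOLERANCE = 5  # assumed interpretation of the "magic number" in the note


def group_words_into_lines(words, tolerance=SAME_LINE_TOLERANCE):
    """Group (x, y, w, h, text) word boxes into lines by their top coordinate."""
    lines = []
    for word in sorted(words, key=lambda b: (b[1], b[0])):
        # Same line if the top edge is within `tolerance` pixels
        # of the top edge of the line's first word
        if lines and abs(word[1] - lines[-1][0][1]) <= tolerance:
            lines[-1].append(word)
        else:
            lines.append([word])
    # Order the words of each line left to right and join their text
    return [" ".join(w[4] for w in sorted(line, key=lambda b: b[0]))
            for line in lines]


words = [
    (120, 12, 40, 10, "world"),
    (10, 10, 50, 10, "Hello"),
    (10, 40, 30, 10, "Next"),
    (50, 43, 30, 10, "line"),
]
print(group_words_into_lines(words))  # ['Hello world', 'Next line']
```

A fixed pixel tolerance breaks down for very large or very small fonts, which is presumably why the note flags it for rework.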
- pygats.recog.contrast(img: PIL.Image)[source]
Determines the minimum and maximum brightness and the contrast of the image itself. The metrics are calculated using the YCbCr color model. Image.convert supports all possible conversions between “L”, “RGB” and “CMYK”. https://pillow.readthedocs.io/en/latest/reference/Image.html#PIL.Image.Image.convert
- Parameters:
img (Image) – PIL.Image that is converted from the BGR color space to YUV
- Returns:
contr (float): contrast value on the image
- Return type:
(contr)
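How the contrast value is derived from the brightness extremes is not specified above. As a hedged illustration only, the sketch below computes the Y (luma) component with the ITU-R BT.601 weights and then applies the Michelson formula. The choice of Michelson contrast, the function names, and the use of plain (R, G, B) tuples instead of a PIL image are all assumptions.

```python
def luma(rgb):
    """Y component of YCbCr (ITU-R BT.601 weights) for an (R, G, B) pixel."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


def michelson_contrast(pixels):
    """Assumed metric: (Ymax - Ymin) / (Ymax + Ymin) over all pixels."""
    values = [luma(p) for p in pixels]
    y_min, y_max = min(values), max(values)
    if y_max + y_min == 0:
        return 0.0  # completely black image, avoid division by zero
    return (y_max - y_min) / (y_max + y_min)


pixels = [(0, 0, 0), (255, 255, 255), (128, 128, 128)]
print(michelson_contrast(pixels))
```

With pure black and pure white both present the value reaches its maximum of 1, and a uniformly colored image yields 0.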
- pygats.recog.crop_image(img: PIL.Image, width: int | None = 0, height: int | None = 0, extend: bool | None = False) → PIL.Image[source]
Crops a portion of the input image based on the specified width and height multipliers. If width and height aren’t specified, the original image is returned.
- Parameters:
img (Image) – The input image to crop.
width (int, optional) – The multiplier to determine the beginning of the crop area by width.
height (int, optional) – The multiplier to determine the beginning of the crop area by height
extend (bool, optional) – Whether to extend the crop area by a factor of 2.
- Returns:
x_offset (int), y_offset (int): offsets along the x and y coordinates
img_crop (Image): the cropped image area
- Return type:
(x_offset, y_offset, img_crop)
- pygats.recog.find_crop_image(img: PIL.Image, crop_area: str | None = 'all', extend: bool | None = False) → PIL.Image[source]
Detects the crop area for the input image and crops the image based on the specified crop area.
- Parameters:
img (Image) – The input image to crop.
crop_area (str, optional) – The crop area to use. Defaults to ‘all’.
extend (bool, optional) – Whether to extend the crop area by a factor of 2. Defaults to False.
- Returns:
x_offset (int), y_offset (int): offsets along the x and y coordinates
img_crop (Image): the cropped image area
- Return type:
(x_offset, y_offset, img_crop)
- pygats.recog.find_cropped_text(ctx, img: PIL.Image, txt: pygats.recog.SearchedText, skip: int | None = 0, one_word: bool | None = False)[source]
Finds text in an image using several passes. The first pass finds the areas of the image that contain text; each area is then passed through recognition again to improve the recognition results.
- Parameters:
ctx (Context) – An object that contains information about the current context.
img (Image) – image to search text in
txt (SearchedText) – text to search
skip (int, optional) – number of occurrences of the text to skip.
one_word (bool, optional) – flag indicating that only one word is searched for.
- Returns:
roi (ROI): region of interest
found (bool): whether the text is found in the image
- Return type:
(roi, found)
- pygats.recog.find_fuzzy_text(recognized_list, search: str)[source]
Performs a fuzzy search for text in a list using the Levenshtein ratio. The return value is a list of tuples in the following format:
- Parameters:
recognized_list (list[tuple]) – list of text to match with pattern (format: ROI,text)
search (str) – substring to search
- Returns:
roi (ROI): region of interest
text (str): full text which resides in the rectangle
- Return type:
(roi,text, substring)
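The idea of a Levenshtein-based fuzzy search can be sketched as follows. This is an illustration, not the library's code: `levenshtein`, `similarity`, and `fuzzy_find` are hypothetical names, the 0.8 threshold is an assumption, and `similarity` is a simplified normalization (1 minus distance over the longer length) rather than the exact ratio of any particular Levenshtein package. Regions are shown as plain tuples instead of ROI objects.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Simplified ratio in [0, 1]: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def fuzzy_find(recognized, search, threshold=0.8):
    """Return (roi, text) pairs whose text is similar enough to `search`."""
    return [(roi, text) for roi, text in recognized
            if similarity(text.lower(), search.lower()) >= threshold]


recognized = [((0, 0, 50, 10), "Settlngs"), ((0, 20, 50, 10), "Help")]
print(fuzzy_find(recognized, "Settings"))  # [((0, 0, 50, 10), 'Settlngs')]
```

Fuzzy matching of this kind tolerates typical OCR confusions, such as "l" read in place of "i", that an exact string comparison would reject.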
- pygats.recog.find_keypoints(img: PIL.Image)[source]
Function that uses the SIFT algorithm to find keypoints in an image. The function returns three values, one of which contains the coordinates of the key points, which simplifies further use of the data.
- Parameters:
img (Image) – PIL.Image which is used to search for keypoints
- Returns:
keypoints (tuple): the detected keypoints
descriptors (numpy.ndarray): computed descriptors
coord_list (numpy.ndarray): array of coordinates of the keypoints
- Return type:
(keypoints, descriptors, coord_list)
- pygats.recog.find_regexp_text(recognized_list: list, pattern)[source]
Finds text in a list by regular expression. The return value is a list of tuples in the following format:
- Parameters:
recognized_list (list) – list of texts to match against the pattern (tuple format: ROI, text)
pattern (str) – regexp pattern to match
- Returns:
roi (ROI): region of interest
text (str): full text which resides in the rectangle
substring (str): substring found in the text
- Return type:
(roi,text, substring)
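A minimal sketch of this kind of search, assuming the documented (ROI, text) input and (roi, text, substring) output formats. `find_by_regexp` is a hypothetical name, regions are shown as plain tuples, and the use of `re.search` (first match anywhere in the text) is an assumption about the matching mode.

```python
import re


def find_by_regexp(recognized, pattern):
    """Return (roi, text, substring) for every text that matches `pattern`."""
    compiled = re.compile(pattern)
    results = []
    for roi, text in recognized:
        match = compiled.search(text)  # first match anywhere in the text
        if match:
            results.append((roi, text, match.group(0)))
    return results


recognized = [
    ((0, 0, 80, 10), "Order #1042 shipped"),
    ((0, 20, 80, 10), "No orders pending"),
]
print(find_by_regexp(recognized, r"#\d+"))
# [((0, 0, 80, 10), 'Order #1042 shipped', '#1042')]
```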
- pygats.recog.find_text(ctx, img: PIL.Image, txt, skip=0, extend=False, one_word=False)[source]
Finds text in an image with Tesseract
- Parameters:
ctx (Context) – An object that contains information about the current context.
img (Image) – image where text will be recognized
txt (pygats.recog.SearchedText) – text to search for
skip (int) – number of found occurrences to skip
extend (bool, optional) – whether to use the extended crop area
one_word (bool, optional) – flag indicating that only one word is searched for
- Returns:
roi (ROI): region of interest
found (bool): whether the text is found in the image
- Return type:
(roi,found)
- pygats.recog.find_text_on_screen(ctx, txt, skip=0, one_word=False)[source]
Function finds text on the screen
- Parameters:
ctx (Context) – An object that contains information about the current context.
txt (pygats.recog.SearchedText) – text to find
skip (int, optional) – number of found occurrences to skip
one_word (bool, optional) – search for only one word
- Returns:
roi (ROI): region of interest
found (bool): whether the text is found on the screen
- Return type:
(roi, found)
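The skip parameter and the (roi, found) return pair can be pictured with a short sketch. `pick_match` is a hypothetical helper, not part of the library, and returning None as the region when nothing is found is an assumption.

```python
def pick_match(matches, skip=0):
    """Return (roi, found) for the match that follows `skip` earlier matches."""
    if 0 <= skip < len(matches):
        return matches[skip], True
    return None, False  # assumed placeholder when there are too few matches


# Three occurrences of the same text, listed top to bottom as (x, y, w, h)
matches = [(10, 10, 40, 12), (10, 50, 40, 12), (10, 90, 40, 12)]
print(pick_match(matches, skip=1))  # ((10, 50, 40, 12), True)
print(pick_match(matches, skip=5))  # (None, False)
```

With skip=0 the first occurrence is selected, so the default behaves like a plain search.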
- pygats.recog.hdbscan_cluster(keypoints: tuple, coord_list: ndarray, min_cluster_size: int | None = 5, min_samples: int | float | None = None, cluster_selection_epsilon: float | None = 0.0, margins: tuple | None = (0, 0))[source]
Clusters keypoints by their coordinates using HDBSCAN. The function operates on keypoints and coordinates that have already been found (for example, by find_keypoints). https://scikit-learn.org/stable/modules/generated/sklearn.cluster.HDBSCAN.html#r6f313792b2b7-5
- Parameters:
keypoints (tuple) – Distinctive points in an image
coord_list (np.ndarray) – Array of coordinates of keypoints
min_cluster_size (int) – Minimum number of samples for a group to be considered a cluster
min_samples (int | float) – Parameter used to calculate the distance between a point and its nearest neighbor
cluster_selection_epsilon (float) – Distance threshold
margins (tuple) – Tuple of values for symmetrical boundary changes along x, y
- Returns:
clusters(list): list of cluster objects containing detailed information about labels, keypoints and rectangles
- Return type:
(clusters)
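Once cluster labels are available, the bounding rectangle of each cluster can be derived from the coordinates. The sketch below skips the clustering step itself and starts from hand-written labels. `cluster_rectangles` is a hypothetical name, and the interpretation of margins as a symmetric widening of the box along x and y is an assumption based on the parameter description.

```python
def cluster_rectangles(coords, labels, margins=(0, 0)):
    """Bounding rectangle (x_min, y_min, x_max, y_max) per cluster label."""
    grouped = {}
    for (x, y), label in zip(coords, labels):
        if label == -1:  # scikit-learn's HDBSCAN marks noise points with -1
            continue
        grouped.setdefault(label, []).append((x, y))
    mx, my = margins
    rects = {}
    for label, points in grouped.items():
        xs = [p[0] for p in points]
        ys = [p[1] for p in points]
        # Assumed meaning of margins: widen the box symmetrically on both axes
        rects[label] = (min(xs) - mx, min(ys) - my, max(xs) + mx, max(ys) + my)
    return rects


coords = [(10, 10), (12, 14), (11, 9), (200, 200), (205, 210), (90, 90)]
labels = [0, 0, 0, 1, 1, -1]
print(cluster_rectangles(coords, labels, margins=(2, 3)))
# {0: (8, 6, 14, 17), 1: (198, 197, 207, 213)}
```

The resulting rectangles use the same (x_min, y_min, x_max, y_max) layout that KeypointsCluster documents for coord_rect.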
- pygats.recog.image_difference(img_1: PIL.Image, img_2: PIL.Image)[source]
Function that calculates the difference between two images and returns the coordinates of rectangles enclosing the areas where these differences are observed.
- Parameters:
img_1 (Image) – First image
img_2 (Image) – Second image
- Returns:
coord_rect(tuple): Tuple with the coordinates of all the bounding boxes that enclose the regions of difference between the two images
- Return type:
(coord_rect)
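The core idea, locating where two images differ and enclosing that area in a rectangle, can be shown with a simplified sketch. It is not the library's algorithm: images are represented as nested lists of pixel values rather than PIL images, `difference_bbox` is a hypothetical name, and a single rectangle around all changed pixels is returned instead of one rectangle per region.

```python
def difference_bbox(img_1, img_2):
    """Bounding box (x_min, y_min, x_max, y_max) of differing pixels, or None."""
    changed = [(x, y)
               for y, (row_1, row_2) in enumerate(zip(img_1, img_2))
               for x, (p_1, p_2) in enumerate(zip(row_1, row_2))
               if p_1 != p_2]
    if not changed:
        return None  # the images are identical
    xs = [x for x, _ in changed]
    ys = [y for _, y in changed]
    return (min(xs), min(ys), max(xs), max(ys))


before = [[0] * 5 for _ in range(4)]   # 5x4 black image
after = [row[:] for row in before]     # copy with two changed pixels
after[1][2] = 255
after[2][3] = 255
print(difference_bbox(before, after))  # (2, 1, 3, 2)
```

Splitting the changed pixels into separate regions, as the documented return value implies, would need an additional grouping step such as connected-component labeling.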
- pygats.recog.move_to_text(ctx, txt, skip=0)[source]
Finds text on the screen and moves the cursor to it
- Parameters:
ctx (Context) – An object that contains information about the current context.
txt (pygats.recog.SearchedText) – text to search for and move the cursor to
skip (int) – number of occurrences of the text to skip
- pygats.recog.recognize_text(img, lang)[source]
Recognizes text in an image with Tesseract, combines the results into lines, and returns them as a list of tuples
- Parameters:
img (PIL.Image) – image where text will be recognized
lang (string) – language of text (tesseract-ocr)
- Returns:
x (int), y (int): coordinates of the top-left point of the rectangle where the text resides
w (int), h (int): width and height of the rectangle where the text resides
text (string): full text which resides in the rectangle
- Return type:
(x,y,w,h,text)
Notes
This is a wrapper around pytesseract.image_to_data. The results of image_to_data are combined into lines.
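To illustrate what combining image_to_data results into lines can look like, the sketch below merges word boxes that share the same block, paragraph, and line number into (x, y, w, h, text) tuples. It works on a hand-written dictionary in the shape pytesseract returns when called with output_type=Output.DICT, so it runs without Tesseract. `words_to_lines` is a hypothetical name, and grouping by line_num is an assumption rather than the library's actual rule.

```python
def words_to_lines(data):
    """Merge word boxes that share a block, paragraph, and line number."""
    lines = {}
    for i, text in enumerate(data["text"]):
        if not text.strip():
            continue  # entries for non-word levels carry empty text
        key = (data["block_num"][i], data["par_num"][i], data["line_num"][i])
        left, top = data["left"][i], data["top"][i]
        right, bottom = left + data["width"][i], top + data["height"][i]
        if key in lines:
            x0, y0, x1, y1, words = lines[key]
            # Grow the line's box so it also encloses this word
            lines[key] = (min(x0, left), min(y0, top),
                          max(x1, right), max(y1, bottom), words + [text])
        else:
            lines[key] = (left, top, right, bottom, [text])
    return [(x0, y0, x1 - x0, y1 - y0, " ".join(words))
            for x0, y0, x1, y1, words in lines.values()]


# Hand-written sample in the shape of a pytesseract image_to_data dictionary
data = {
    "block_num": [1, 1, 1, 1],
    "par_num": [1, 1, 1, 1],
    "line_num": [1, 1, 1, 2],
    "left": [10, 10, 70, 10],
    "top": [10, 10, 12, 40],
    "width": [100, 50, 40, 60],
    "height": [14, 12, 12, 12],
    "text": ["", "Hello", "world", "Bye"],
}
print(words_to_lines(data))
# [(10, 10, 100, 14, 'Hello world'), (10, 40, 60, 12, 'Bye')]
```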