How to extract a DNI number with Google Cloud Vision
Recently I started looking into artificial intelligence a little. Specifically, I needed a tool that would let me extract text from images for a project I am working on… And since Saint Google has everything, I found Google Cloud Vision.
In short, Google Cloud Vision is a service that Google provides to developers and companies for image recognition. It can recognize and extract text from images, detect inappropriate content, detect faces, and identify logos in an image.
Let's do it
To work with Python without having to configure everything from scratch, I used another service that Google provides, Colaboratory… I will write about this service in another post :D.
Well… on to what we came for
1. Install the Google Cloud Vision package
pip install google-cloud-vision
2. Import the libraries
# Import the Google Cloud Vision API
from google.cloud import vision
# To access operating system functionality
# and authenticate with the Google API
import os
# To use regular expressions
import re
3. Authenticate
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = '/content/vision_key.json'
The vision_key.json file is the service account key that you download when you set up a project in Google Cloud.
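If the key file has not been uploaded to the Colaboratory session yet, the error only shows up later, when the client is created. A minimal sketch of a check that fails earlier, assuming a helper name of my own (set_credentials is hypothetical, not part of any Google library):

```python
import os

def set_credentials(key_path):
    """Point the Google client libraries at a service account key file.

    Hypothetical helper: raises FileNotFoundError if the file is missing.
    """
    # Fail early with a clear message if the key file is not where we expect it
    if not os.path.isfile(key_path):
        raise FileNotFoundError(f"Key file not found: {key_path}")
    os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = key_path
    return key_path
```

In the notebook it would be called as set_credentials('/content/vision_key.json') instead of assigning the environment variable directly.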
4. Using the library
vision_client = vision.ImageAnnotatorClient()
image = vision.Image()
5. Get the image to process
# A public image URL or a gs:// Cloud Storage URI
IMAGE_URI = 'path-image-to-process'
image.source.image_uri = IMAGE_URI
6. Process the image to detect text
response = vision_client.text_detection(image=image)
response
The response has a lot of content, but what interests us is the description…
text = response.text_annotations[0].description
print(text)
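Indexing text_annotations[0] raises an IndexError when the image contains no readable text, and a failed request reports its problem in response.error.message. A small sketch of a safer way to read the description, assuming a helper name of my own (first_description is hypothetical); it only touches the two response fields mentioned here:

```python
def first_description(response):
    """Return the full detected text, or '' when nothing was detected.

    Hypothetical helper that wraps the response of text_detection.
    """
    # The API reports per-image failures in response.error.message
    if response.error.message:
        raise RuntimeError(response.error.message)
    annotations = response.text_annotations
    # An image without readable text yields an empty list
    if not annotations:
        return ''
    # The first annotation holds the whole text found in the image
    return annotations[0].description
```

With this in place, text = first_description(response) can replace the direct indexing above.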
7. Extract the DNI number
To extract only the DNI we can use regular expressions.
dni = re.findall('[0-9]{8}', text)
print(dni)
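One thing to watch: the pattern '[0-9]{8}' also matches the first eight digits of any longer number printed on the card. A possible refinement, assuming the DNI is exactly eight digits as in the pattern above (extract_dni and the sample text are made up for illustration):

```python
import re

# Eight digits that are not part of a longer run of digits
DNI_PATTERN = re.compile(r'(?<![0-9])[0-9]{8}(?![0-9])')

def extract_dni(text):
    """Return every standalone 8-digit number found in the text."""
    return DNI_PATTERN.findall(text)

# Made-up sample: one 8-digit number and one 10-digit number
sample = "DNI 12345678\nRef 2019123456"
print(extract_dni(sample))  # ['12345678']
```

The lookbehind and lookahead reject a match that has a digit right before or after it, so the 10-digit number is skipped instead of being cut down to eight digits.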
In conclusion, this tool is extremely simple to use… at least for what I’m working on.