Speech and image processing have been under research for more than two decades. I remember how I struggled with the first-generation Dragon speech recognition software. Speech and image pattern recognition has advanced by leaps and bounds since then, and good commercially available examples include Google's voice search, Microsoft's Cortana and Apple's Siri. Now Microsoft is working on a project code-named "Oxford". Project Oxford is a bundle of technologies that aims to give the gadgets of the future human-like recognition capabilities.
Project Oxford is based on the same technology as Microsoft Cortana and Skype Translator. In addition to improvements in speech recognition, Project Oxford offers web-based REST APIs for emotion recognition, spell checking, video processing, speaker recognition and more. The emotion recognition service scans an image, detects the faces in it and classifies the expression on each face as one of eight basic human emotions. Social media posts are already analyzed for sentiment, giving marketers deep insight. Analyzing photos and videos as well will feed even more detail into big data analysis engines. The Project Oxford team is also working on services for spell checking, tracking a specific person in video footage and recognizing the speaker in an audio stream.
The future just got more interesting, and the gadgets of the future a little spookier.
For more details and to play with some of the recognition concepts, visit https://www.projectoxford.ai/