SELECTED PATENTS

Patent #10628741
Abstract: Techniques are described for machine-trained analysis for multimodal machine learning. A computing device captures a plurality of information channels, wherein the plurality of information channels includes contemporaneous audio information and video information from an individual. A multilayered convolutional computing system learns trained weights using the audio information and the video information from the plurality of information channels, wherein the trained weights cover both the audio information and the video information and are trained simultaneously, and wherein the learning facilitates emotional analysis of the audio information and the video information. A second computing device captures further information and analyzes the further information using the trained weights to provide an emotion metric based on the further information.
Type: Grant
Filed: September 11, 2018
Date of Patent: April 21, 2020
Assignee: Affectiva, Inc.
Inventors: Rana el Kaliouby, Seyedmohammad Mavadati, Taniya Mishra, Timothy Peacock, Panu James Turcot
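As a concrete illustration of the joint training described above, here is a minimal PyTorch sketch. It is not the patented architecture; the layer sizes, input shapes, and the seven-way emotion head are assumptions chosen for brevity. The point it shows is a single set of trained weights spanning both modalities, updated simultaneously through one shared head.

```python
import torch
import torch.nn as nn

class MultimodalEmotionNet(nn.Module):
    """Toy two-branch convolutional network; one loss trains both branches."""
    def __init__(self, n_emotions=7):
        super().__init__()
        # Video branch: a face frame as a 3-channel image.
        self.video = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        # Audio branch: e.g. a mel-spectrogram treated as a 1-channel image.
        self.audio = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        # Shared head: gradients flow back into both branches at once.
        self.head = nn.Linear(32, n_emotions)

    def forward(self, video_frame, audio_spec):
        fused = torch.cat([self.video(video_frame), self.audio(audio_spec)], dim=1)
        return self.head(fused)  # emotion metric logits

model = MultimodalEmotionNet()
logits = model(torch.randn(4, 3, 64, 64), torch.randn(4, 1, 64, 64))
```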
Patent #10628985
Abstract: Techniques are described for image generation for avatar image animation using translation vectors. An avatar image is obtained for representation on a first computing device. An autoencoder is trained, on a second computing device comprising an artificial neural network, to generate synthetic emotive faces. A plurality of translation vectors is identified corresponding to a plurality of emotion metrics, based on the training. A bottleneck layer within the autoencoder is used to identify the plurality of translation vectors. A subset of the plurality of translation vectors is applied to the avatar image, wherein the subset represents an emotion metric input. The emotion metric input is obtained from facial analysis of an individual. An animated avatar image is generated for the first computing device, based on the applying, wherein the animated avatar image is reflective of the emotion metric input and the avatar image includes vocalizations.
Type: Grant
Filed: November 30, 2018
Date of Patent: April 21, 2020
Assignee: Affectiva, Inc.
Inventors: Taniya Mishra, George Alexander Reichenbach, Rana el Kaliouby
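The latent-space mechanism above can be sketched in a few lines: shift an avatar's bottleneck code along an emotion-specific direction and decode. Everything here is illustrative; the 32-dimensional bottleneck, the random "joy" vector, and the 0.8 intensity stand in for translation vectors that the patent identifies from training on synthetic emotive faces.

```python
import torch
import torch.nn as nn

# Untrained stand-ins for a trained autoencoder's encoder and decoder.
enc = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, 32))  # image -> bottleneck
dec = nn.Linear(32, 64 * 64)                               # bottleneck -> image

# Hypothetical translation vector for "joy"; in practice it would be identified
# at the bottleneck layer, e.g. mean(joyful codes) - mean(neutral codes).
joy_vec = torch.randn(32)

avatar = torch.rand(1, 1, 64, 64)
code = enc(avatar)
emotion_metric = 0.8                       # intensity from facial analysis
animated = dec(code + emotion_metric * joy_vec).view(1, 1, 64, 64)
```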
Patent #10573313
Abstract: Audio analysis learning is performed using video data. Video data is obtained, on a first computing device, wherein the video data includes images of one or more people. Audio data is obtained, on a second computing device, which corresponds to the video data. A face within the video data is identified. A first voice, from the audio data, is associated with the face within the video data. The face within the video data is analyzed for cognitive content. Audio features corresponding to the cognitive content of the video data are extracted. The audio data is segmented to correspond to an analyzed cognitive state. An audio classifier is learned, on a third computing device, based on the analyzing of the face within the video data. Further audio data is analyzed using the audio classifier.
Type: Grant
Filed: February 11, 2019
Date of Patent: February 25, 2020
Assignee: Affectiva, Inc.
Inventors: Taniya Mishra, Rana el Kaliouby
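The core idea, transferring labels from the visual channel to the time-aligned audio so that an audio-only classifier can be learned, can be sketched as follows. The face analyzer, the acoustic features, and the random data are all placeholders.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def face_cognitive_state(frame):
    # Placeholder for facial analysis of cognitive content.
    return int(frame.mean() > 0.5)

def audio_features(segment):
    # Placeholder acoustic features for one audio segment.
    return [segment.mean(), segment.std()]

frames = np.random.rand(100, 8, 8)     # one face frame per segment
audio = np.random.rand(100, 1600)      # time-aligned audio segments

labels = [face_cognitive_state(f) for f in frames]   # labels come from video
X = [audio_features(s) for s in audio]
clf = LogisticRegression().fit(X, labels)            # learned audio classifier

# Further audio can now be analyzed with no video present at all.
print(clf.predict([audio_features(np.random.rand(1600))]))
```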
Patent #10373603
Abstract: Systems, methods, and computer-readable storage devices for receiving an utterance from a user and analyzing the utterance to identify the demographics of the user. The system then analyzes the utterance to determine the prosody of the utterance, and retrieves from the Internet data associated with the determined demographics. Using the retrieved data, the system retrieves, also from the Internet, recorded speech matching the identified prosody. The recorded speech, which is based on the demographic data of the utterance and has a prosody matching the utterance, is then saved to a database for future use in generating speech specific to the user.
Type: Grant
Filed: April 24, 2017
Date of Patent: August 6, 2019
Assignee: AT&T Intellectual Property I, L.P.
Inventors: Srinivas Bangalore, Taniya Mishra
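A rough sketch of the pipeline: estimate demographics and prosody from the utterance, retrieve matching recorded speech, and persist it for later synthesis. The estimators, the retrieval step, and the schema are all invented for illustration.

```python
import sqlite3

def estimate_demographics(utterance):
    return {"age_group": "25-34", "region": "US-NE"}   # hypothetical classifier

def estimate_prosody(utterance):
    return {"pitch_mean_hz": 180.0, "rate_wpm": 160}   # hypothetical analysis

def retrieve_recordings(demographics, prosody):
    # Stand-in for retrieving Internet speech matching demographics + prosody.
    return [("clip_001.wav", demographics, prosody)]

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE speech (clip TEXT, age_group TEXT, pitch_hz REAL)")

demo = estimate_demographics(b"raw-audio")
pros = estimate_prosody(b"raw-audio")
for clip, d, p in retrieve_recordings(demo, pros):
    # Saved for future use in generating speech specific to this user.
    db.execute("INSERT INTO speech VALUES (?, ?, ?)",
               (clip, d["age_group"], p["pitch_mean_hz"]))
```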
Patent #10319370
Abstract: Systems, methods, and computer-readable storage devices for generating speech using a presentation style specific to a user, and in particular the user’s social group. Systems configured according to this disclosure can then use the resulting, personalized, text and/or speech in a spoken dialogue or presentation system to communicate with the user. For example, a system practicing the disclosed method can receive speech from a user, identify the user, and respond to the received speech by applying a personalized natural language generation model. The personalized natural language generation model provides communications which can be specific to the identified user.
Type: Grant
Filed: May 14, 2018
Date of Patent: June 11, 2019
Assignee: AT&T Intellectual Property I, L.P.
Inventors: Taniya Mishra, Alistair D. Conkie, Svetlana Stoyanchev
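To see what a presentation style specific to a user's social group might look like operationally, here is a toy sketch. The style table and the speaker identification are invented; a real system would learn the group's language patterns from data rather than use a fixed lexicon.

```python
# Hypothetical per-group style lexicons; stand-ins for a learned NLG model.
STYLES = {
    "formal": {"greeting": "Good afternoon", "ack": "Certainly"},
    "casual": {"greeting": "Hey", "ack": "Sure thing"},
}

def identify_user(speech):
    # Placeholder speaker identification returning the user's social group.
    return {"name": "Alex", "group": "casual"}

def generate(user, content):
    style = STYLES[user["group"]]
    return f'{style["greeting"]}, {user["name"]}. {style["ack"]}: {content}.'

print(generate(identify_user(b"raw-audio"), "your order has shipped"))
```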
Patent #10204625
Abstract: Audio analysis learning is performed using video data. Video data is obtained, on a first computing device, wherein the video data includes images of one or more people. Audio data is obtained, on a second computing device, which corresponds to the video data. A face is identified within the video data. A first voice, from the audio data, is associated with the face within the video data. The face within the video data is analyzed for cognitive content. Audio features are extracted corresponding to the cognitive content of the video data. The audio data is segmented to correspond to an analyzed cognitive state. An audio classifier is learned, on a third computing device, based on the analyzing of the face within the video data. Further audio data is analyzed using the audio classifier.
Type: Grant
Filed: January 4, 2018
Date of Patent: February 12, 2019
Assignee: Affectiva, Inc.
Inventors: Taniya Mishra, Rana el Kaliouby
Patent #10121476
Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media relating to speaker verification. In one aspect, a system receives a first user identity from a second user, and, based on the identity, accesses voice characteristics. The system randomly generates a challenge sentence according to a rule and/or grammar, based on the voice characteristics, and prompts the second user to speak the challenge sentence. The system verifies that the second user is the first user if the spoken challenge sentence matches the voice characteristics. In an enrollment aspect, the system constructs an enrollment phrase that covers a minimum threshold of unique speech sounds based on speaker-distinctive phonemes, phoneme clusters, and prosody. The user then utters the enrollment phrase, and the system extracts voice characteristics for the user from the uttered enrollment phrase.
Type: Grant
Filed: March 21, 2016
Date of Patent: November 6, 2018
Assignee: Nuance Communications, Inc.
Inventors: Ilija Zeljkovic, Taniya Mishra, Amanda Stent, Ann K. Syrdal, Jay Wilpon
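The random challenge generation can be illustrated with a tiny rule-based grammar. This toy expands nonterminals recursively; in the patent, word choice would be biased toward the claimed speaker's distinctive phonemes and prosody, which this sketch does not model.

```python
import random

# Toy grammar; a real system would populate it from the enrolled speaker's
# distinctive phonemes, phoneme clusters, and prosody.
GRAMMAR = {
    "S": ["NP VP"],
    "NP": ["the zebra", "a treasure ship", "this azure vase"],
    "VP": ["jumps over the fence", "measures the usual voyage"],
}

def generate_challenge(symbol="S"):
    expansion = random.choice(GRAMMAR[symbol])
    return " ".join(generate_challenge(tok) if tok in GRAMMAR else tok
                    for tok in expansion.split())

challenge = generate_challenge()   # unique per verification request
print("Please say:", challenge)
# The spoken response is then scored against the enrolled voice characteristics.
```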
Patent #10042877
Abstract: Information is aggregated and made available to users. A system monitors over the internet a first set of external information sources for a first user based on instructions from a first user profile that specifies information to aggregate for the first user. The system detects, based on the monitoring, new data at one of the first set of information sources. The system obtains the new data at the one of the first set of information sources, independent of preferences of the one of the first set of information sources. The system updates aggregated information for the first user with the new data from the one of the first set of information sources. The updated aggregated information for the first user is made available to the first user.
Type: Grant
Filed: June 5, 2015
Date of Patent: August 7, 2018
Assignee: AT&T Intellectual Property I, L.P.
Inventors: Junlan Feng, Srinivas Bangalore, Michael James Robert Johnston, Taniya Mishra
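One simple way to realize "detect new data at a source" is to hash each fetched payload and compare it against the last seen digest, as in this sketch. The profile structure and the URL are illustrative.

```python
import hashlib
import urllib.request

def fetch(url):
    # Simplified monitor; real sources would include feeds and social media.
    return urllib.request.urlopen(url, timeout=10).read()

profile = {"user": "u1", "sources": ["https://example.com"]}  # illustrative
seen, aggregated = {}, []

for url in profile["sources"]:
    body = fetch(url)
    digest = hashlib.sha256(body).hexdigest()
    if seen.get(url) != digest:          # new data detected at this source
        seen[url] = digest
        aggregated.append((url, body[:200]))   # update the user's aggregate
```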
Patent #10002608
Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for approximating relevant responses to a user query with voice-enabled search. A system practicing the method receives a word lattice generated by an automatic speech recognizer based on user speech and a prosodic analysis of the user speech, generates a reweighted word lattice based on the word lattice and the prosodic analysis, approximates, based on the reweighted word lattice, one or more relevant responses to the query, and presents to a user the responses to the query. The prosodic analysis examines metalinguistic information of the user speech and can identify the most salient subject matter of the speech, assess how confident a speaker is in the content of his or her speech, and identify the attitude, mood, emotion, sentiment, etc. of the speaker. Other information not described in the content of the speech can also be used.
Type: Grant
Filed: September 17, 2010
Date of Patent: June 19, 2018
Assignee: Nuance Communications, Inc.
Inventors: Srinivas Bangalore, Junlan Feng, Michael Johnston, Taniya Mishra
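A stripped-down numerical sketch of the reweighting step: recognizer posteriors over competing words are scaled by a prosodic prominence score before responses are retrieved. All numbers are invented, and a real lattice would carry time-aligned arcs rather than a flat dictionary.

```python
# Recognizer posteriors for competing lattice words (illustrative values).
asr_posteriors = {"flights": 0.6, "fights": 0.4, "boston": 0.9}
# Prosodic prominence per word span, from the metalinguistic analysis.
prominence = {"flights": 1.3, "fights": 0.7, "boston": 1.0}

# Reweight the lattice: prosodically salient words gain weight.
reweighted = {w: p * prominence.get(w, 1.0) for w, p in asr_posteriors.items()}
query_terms = sorted(reweighted, key=reweighted.get, reverse=True)
print(query_terms)   # terms used to approximate relevant responses
```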
Patent #9984679
Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for assigning saliency weights to words of an ASR model. The saliency values assigned to words within an ASR model are based on human perception judgments of previous transcripts. These saliency values are applied as weights to modify an ASR model such that the results of the weighted ASR model in converting a spoken document to a transcript provide a more accurate and useful transcription to the user.
Type: Grant
Filed: July 18, 2016
Date of Patent: May 29, 2018
Assignee: Nuance Communications, Inc.
Inventors: Andrej Ljolje, Diamantino Antonio Caseiro, Mazin Gilbert, Vincent Goffin, Taniya Mishra
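The weighting idea can be shown with a toy rescoring pass: words carry saliency values derived from human judgments, and hypotheses whose salient words survive recognition score higher. The values and hypotheses are invented.

```python
# Saliency weights per word, notionally from human perception judgments
# of previous transcripts.
saliency = {"refund": 2.0, "account": 1.5, "um": 0.2}

def rescore(hypothesis, base_score):
    words = hypothesis.split()
    boost = sum(saliency.get(w, 1.0) for w in words) / len(words)
    return base_score * boost

hyps = [("i want a refund", 0.55), ("i want a re fund", 0.60)]
best = max(hyps, key=lambda h: rescore(*h))
print(best)   # the saliency-weighted model prefers the useful transcript
```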
Patent #9972309
Abstract: Systems, methods, and computer-readable storage devices for generating speech using a presentation style specific to a user, and in particular the user’s social group. Systems configured according to this disclosure can then use the resulting, personalized, text and/or speech in a spoken dialog or presentation system to communicate with the user. For example, a system practicing the disclosed method can receive speech from a user, identify the user, and respond to the received speech by applying a personalized natural language generation model. The personalized natural language generation model provides communications which can be specific to the identified user.
Type: Grant
Filed: August 5, 2016
Date of Patent: May 15, 2018
Assignee: AT&T Intellectual Property I, L.P.
Inventors: Taniya Mishra, Alistair D. Conkie, Svetlana Stoyanchev
Patent #9799323
Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for reducing latency in web-browsing TTS systems without the use of a plug-in or Flash® module. A system configured according to the disclosed methods allows the browser to send prosodically meaningful sections of text to a web server. A TTS server then converts intonational phrases of the text into audio and responds to the browser with the audio file. The system saves the audio file in a cache, with the file indexed by a unique identifier. As the system continues converting text into speech, when identical text appears the system uses the cached audio corresponding to the identical text without the need for re-synthesis via the TTS server.
Type: Grant
Filed: December 14, 2015
Date of Patent: October 24, 2017
Assignee: Nuance Communications, Inc.
Inventors: Alistair D. Conkie, Mark Charles Beutnagel, Taniya Mishra
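The caching behavior is easy to sketch: each intonational phrase is keyed by a unique identifier (here an MD5 digest), synthesized once, and served from the cache on any repeat. The synthesize stub stands in for the round trip to the TTS server.

```python
import hashlib

cache = {}  # unique identifier -> synthesized audio bytes

def synthesize(phrase):
    # Stand-in for the TTS server converting one intonational phrase.
    return b"WAV:" + phrase.encode()

def tts_with_cache(phrase):
    key = hashlib.md5(phrase.encode()).hexdigest()   # unique identifier
    if key not in cache:                             # synthesize only once
        cache[key] = synthesize(phrase)
    return cache[key]

# Phrases sent by the browser; the repeated one is served from the cache,
# avoiding re-synthesis via the TTS server.
for phrase in ["Welcome back.", "Your order has shipped.", "Welcome back."]:
    tts_with_cache(phrase)
print(len(cache))   # 2 entries for 3 requests
```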
Patent #9767221
Abstract: Delivering targeted content includes collecting, via at least one tangible processor, user activity data for users during a specified time period. Questions asked by the users during the specified time period are extracted from the user activity data, via the at least one tangible processor, and stored in user profiles for the users. The user profiles are clustered, via the at least one tangible processor, based on the questions asked. Targeted content is delivered, via the at least one tangible processor, to a subset of the users based on the clustering.
Type: Grant
Filed: October 8, 2010
Date of Patent: September 19, 2017
Assignee: AT&T Intellectual Property I, L.P.
Inventors: Srinivas Bangalore, Junlan Feng, Michael James Robert Johnston, Taniya Mishra
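A compact sketch of the clustering step, using TF-IDF over each user's asked questions plus k-means. This is one plausible realization, not necessarily the patent's method, and the profiles are fabricated.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Each profile stores the questions the user asked in the time period.
profiles = {
    "u1": "how do i reset my router what is my data cap",
    "u2": "best pizza near me what movies are playing tonight",
    "u3": "why is my wifi slow my modem keeps rebooting",
}

X = TfidfVectorizer().fit_transform(profiles.values())
clusters = KMeans(n_clusters=2, n_init=10).fit_predict(X)

# Deliver targeted content to the subset of users in one cluster.
targeted = [user for user, c in zip(profiles, clusters) if c == clusters[0]]
print(targeted)
```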
Patent #9697206
Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for approximating responses to a user speech query in voice-enabled search based on metadata that include demographic features of the speaker. A system practicing the method recognizes received speech from a speaker to generate recognized speech, identifies metadata about the speaker from the received speech, and feeds the recognized speech and the metadata to a question-answering engine. Identifying the metadata about the speaker is based on voice characteristics of the received speech. The demographic features can include age, gender, socio-economic group, nationality, and/or region. The metadata identified about the speaker from the received speech can be combined with or override self-reported speaker demographic information.
Type: Grant
Filed: October 7, 2015
Date of Patent: July 4, 2017
Assignee: Interactions LLC
Inventors: Michael J. Johnston, Srinivas Bangalore, Junlan Feng, Taniya Mishra
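A minimal sketch of feeding voice-derived demographic metadata, alongside the recognized text, into a question-answering step. The estimator and the override rule are illustrative.

```python
def demographics_from_voice(speech):
    # Hypothetical estimator based on voice characteristics alone.
    return {"age": "25-34", "region": "US-South"}

def answer(recognized_text, voice_metadata, self_reported=None):
    # Voice-derived metadata can be combined with, or override, self-reports.
    merged = {**(self_reported or {}), **voice_metadata}
    return f"Answering '{recognized_text}' for region {merged['region']}"

meta = demographics_from_voice(b"raw-audio")
print(answer("good restaurants nearby", meta, {"region": "US-West"}))
```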
Patent #9633649
Abstract: Systems, methods, and computer-readable storage devices for receiving an utterance from a user and analyzing the utterance to identify the demographics of the user. The system then analyzes the utterance to determine the prosody of the utterance, and retrieves from the Internet data associated with the determined demographics. Using the retrieved data, the system retrieves, also from the Internet, recorded speech matching the identified prosody. The recorded speech, which is based on the demographic data of the utterance and has a prosody matching the utterance, is then saved to a database for future use in generating speech specific to the user.
Type: Grant
Filed: May 2, 2014
Date of Patent: April 25, 2017
Assignee: AT&T Intellectual Property I, L.P.
Inventors: Srinivas Bangalore, Taniya Mishra
Patent #9570092
Abstract: Devices, systems, methods, media, and programs for detecting an emotional state change in an audio signal are provided. A number of segments of the audio signal are analyzed based on separate lexical and acoustic evaluations, and, for each segment, an emotional state and a confidence score of the emotional state are determined. A current emotional state of the audio signal is tracked for each of the number of segments. For a particular segment, it is determined whether the current emotional state of the audio signal changes to another emotional state based on the emotional state and a comparison of the confidence score of the particular segment to a predetermined threshold.
Type: Grant
Filed: April 26, 2016
Date of Patent: February 14, 2017
Assignee: AT&T Intellectual Property I, L.P.
Inventors: Dimitrios Dimitriadis, Mazin E. Gilbert, Taniya Mishra, Horst J. Schroeter
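The tracking rule reduces to a few lines: the current state persists until a segment reports a different state whose confidence clears the threshold. The threshold and the per-segment scores below are illustrative.

```python
THRESHOLD = 0.7   # illustrative predetermined confidence threshold

# Per-segment (emotional state, confidence) from lexical + acoustic evaluation.
segments = [("neutral", 0.90), ("neutral", 0.80), ("angry", 0.50),
            ("angry", 0.85), ("angry", 0.90)]

current = None
for state, confidence in segments:
    if current is None:
        current = state
    elif state != current and confidence >= THRESHOLD:
        # The low-confidence third segment is ignored; the switch fires here.
        print(f"state change: {current} -> {state}")
        current = state
```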
Patent #9431009
Abstract: Systems, methods, and computer-readable storage media relate to performing a search. A system configured to practice the method first receives from an automatic speech recognition (ASR) system a word lattice based on a speech query and receives indexed documents from an information repository. The system composes, based on the word lattice and the indexed documents, at least one triple including a query word, selected indexed document, and weight. The system generates an N-best path through the word lattice based on the at least one triple and re-ranks ASR output based on the N-best path. The system aggregates each weight across the query words to generate N-best listings and returns search results to the speech query based on the re-ranked ASR output and the N-best listings. The lattice can be a confusion network, the arc density of which can be adjusted for a desired performance level.
Type: Grant
Filed: September 8, 2014
Date of Patent: August 30, 2016
Assignee: AT&T Intellectual Property I, L.P.
Inventors: Srinivas Bangalore, Taniya Mishra
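The aggregation over (query word, document, weight) triples can be shown directly; the triples below are fabricated, as if composed from lattice posteriors and an inverted index.

```python
from collections import defaultdict

# (query word, indexed document, weight), e.g. lattice posterior x relevance.
triples = [("boston", "doc1", 0.54), ("boston", "doc2", 0.36),
           ("austin", "doc2", 0.08), ("flights", "doc1", 0.70)]

scores = defaultdict(float)
for word, doc, weight in triples:
    scores[doc] += weight            # aggregate each weight across query words

n_best = sorted(scores, key=scores.get, reverse=True)
print(n_best)   # ['doc1', 'doc2'] -- results returned for the speech query
```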
Patent #9412358
Abstract: Systems, methods, and computer-readable storage devices for generating speech using a presentation style specific to a user, and in particular the user’s social group. Systems configured according to this disclosure can then use the resulting, personalized, text and/or speech in a spoken dialogue or presentation system to communicate with the user. For example, a system practicing the disclosed method can receive speech from a user, identify the user, and respond to the received speech by applying a personalized natural language generation model. The personalized natural language generation model provides communications which can be specific to the identified user.
Type: Grant
Filed: May 13, 2014
Date of Patent: August 9, 2016
Assignee: AT&T Intellectual Property I, L.P.
Inventors: Taniya Mishra, Alistair D. Conkie, Svetlana Stoyanchev
Patent #9396725
Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for assigning saliency weights to words of an ASR model. The saliency values assigned to words within an ASR model are based on human perception judgments of previous transcripts. These saliency values are applied as weights to modify an ASR model such that the results of the weighted ASR model in converting a spoken document to a transcript provide a more accurate and useful transcription to the user.
Type: Grant
Filed: May 27, 2014
Date of Patent: July 19, 2016
Assignee: AT&T Intellectual Property I, L.P.
Inventors: Andrej Ljolje, Diamantino Antonio Caseiro, Mazin Gilbert, Vincent Goffin, Taniya Mishra
Patent #9355650
Abstract: Devices, systems, methods, media, and programs for detecting an emotional state change in an audio signal are provided. A plurality of segments of the audio signal is received, with the plurality of segments being sequential. Each segment of the plurality of segments is analyzed, and, for each segment, an emotional state and a confidence score of the emotional state are determined. The emotional state and the confidence score of each segment are sequentially analyzed, and a current emotional state of the audio signal is tracked throughout each of the plurality of segments. For each segment, it is determined whether the current emotional state of the audio signal changes to another emotional state based on the emotional state and the confidence score of the segment.
Type: Grant
Filed: May 4, 2015
Date of Patent: May 31, 2016
Assignee: AT&T Intellectual Property I, L.P.
Inventors: Dimitrios Dimitriadis, Mazin E. Gilbert, Taniya Mishra, Horst J. Schroeter
Patent #9318114
Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media relating to speaker verification. In one aspect, a system receives a first user identity from a second user, and, based on the identity, accesses voice characteristics. The system randomly generates a challenge sentence according to a rule and/or grammar, based on the voice characteristics, and prompts the second user to speak the challenge sentence. The system verifies that the second user is the first user if the spoken challenge sentence matches the voice characteristics. In an enrollment aspect, the system constructs an enrollment phrase that covers a minimum threshold of unique speech sounds based on speaker-distinctive phonemes, phoneme clusters, and prosody. The user then utters the enrollment phrase, and the system extracts voice characteristics for the user from the uttered enrollment phrase.
Type: Grant
Filed: November 24, 2010
Date of Patent: April 19, 2016
Assignee: AT&T Intellectual Property I, L.P.
Inventors: Ilija Zeljkovic, Taniya Mishra, Amanda Stent, Ann K. Syrdal, Jay Wilpon
Patent #9240180
Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for reducing latency in web-browsing TTS systems without the use of a plug-in or Flash® module. A system configured according to the disclosed methods allows the browser to send prosodically meaningful sections of text to a web server. A TTS server then converts intonational phrases of the text into audio and responds to the browser with the audio file. The system saves the audio file in a cache, with the file indexed by a unique identifier. As the system continues converting text into speech, when identical text appears the system uses the cached audio corresponding to the identical text without the need for re-synthesis via the TTS server.
Type: Grant
Filed: December 1, 2011
Date of Patent: January 19, 2016
Assignee: AT&T Intellectual Property I, L.P.
Inventors: Alistair D. Conkie, Mark Charles Beutnagel, Taniya Mishra
Patent #9218815
Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for performing speaker verification. A system configured to practice the method receives a request to verify a speaker, generates a text challenge that is unique to the request, and, in response to the request, prompts the speaker to utter the text challenge. Then the system records a dynamic image feature of the speaker as the speaker utters the text challenge, and performs speaker verification based on the dynamic image feature and the text challenge. Recording the dynamic image feature of the speaker can include recording video of the speaker while speaking the text challenge. The dynamic feature can include a movement pattern of head, lips, mouth, eyes, and/or eyebrows of the speaker. The dynamic image feature can relate to phonetic content of the speaker speaking the challenge, speech prosody, and the speaker’s facial expression responding to content of the challenge.
Type: Grant
Filed: November 24, 2014
Date of Patent: December 22, 2015
Assignee: AT&T Intellectual Property I, L.P.
Inventors: Ann K. Syrdal, Sumit Chopra, Patrick Haffner, Taniya Mishra, Ilija Zeljkovic, Eric Zavesky
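A toy rendering of the dynamic image feature idea: summarize frame-to-frame motion while the challenge is spoken and compare it to an enrolled pattern. The motion signature and tolerance are stand-ins for the patent's richer features (lip and eye movement, prosody-linked motion, facial expression).

```python
import numpy as np

def motion_signature(frames):
    # Crude dynamic image feature: frame-to-frame change in mean intensity.
    return np.diff(frames.mean(axis=(1, 2)))

def verify(enrolled_sig, live_frames, tol=0.5):
    live_sig = motion_signature(live_frames)
    # Accept only if the live movement pattern matches the enrolled one.
    return float(np.linalg.norm(live_sig - enrolled_sig)) < tol

enrolled = motion_signature(np.random.rand(30, 64, 64))   # toy enrollment video
print(verify(enrolled, np.random.rand(30, 64, 64)))
```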
Patent #9189483
Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for approximating responses to a user speech query in voice-enabled search based on metadata that include demographic features of the speaker. A system practicing the method recognizes received speech from a speaker to generate recognized speech, identifies metadata about the speaker from the received speech, and feeds the recognized speech and the metadata to a question-answering engine. Identifying the metadata about the speaker is based on voice characteristics of the received speech. The demographic features can include age, gender, socio-economic group, nationality, and/or region. The metadata identified about the speaker from the received speech can be combined with or override self-reported speaker demographic information.
Type: Grant
Filed: March 19, 2013
Date of Patent: November 17, 2015
Assignee: Interactions LLC
Inventors: Michael Johnston, Srinivas Bangalore, Junlan Feng, Taniya Mishra
Patent #9076146
Abstract: Aggregating information includes configuring, by at least one processor, a user profile that indicates user preferences for aggregated information. The at least one processor monitors information sources including the World Wide Web, business websites of interest, and online social media, based on the user preferences. Data obtained from the information sources is presented, based on the monitoring, by the at least one processor, in accordance with a presentation format, as the aggregated information, based on the user preferences. The at least one processor triggers updating of the presented aggregated information based on at least one of a change to the data at the information sources and a change to the user profile.
Type: Grant
Filed: October 15, 2010
Date of Patent: July 7, 2015
Assignee: AT&T Intellectual Property I, L.P.
Inventors: Junlan Feng, Srinivas Bangalore, Michael James Robert Johnston, Taniya Mishra
Patent #9047871
Abstract: Devices, systems, methods, media, and programs for detecting an emotional state change in an audio signal are provided. A plurality of segments of the audio signal is received, with the plurality of segments being sequential. Each segment of the plurality of segments is analyzed, and, for each segment, an emotional state and a confidence score of the emotional state are determined. The emotional state and the confidence score of each segment are sequentially analyzed, and a current emotional state of the audio signal is tracked throughout each of the plurality of segments. For each segment, it is determined whether the current emotional state of the audio signal changes to another emotional state based on the emotional state and the confidence score of the segment.
Type: Grant
Filed: December 12, 2012
Date of Patent: June 2, 2015
Assignee: AT&T Intellectual Property I, L.P.
Inventors: Dimitrios Dimitriadis, Mazin E. Gilbert, Taniya Mishra, Horst J. Schroeter
Patent #8897500
Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for performing speaker verification. A system configured to practice the method receives a request to verify a speaker, generates a text challenge that is unique to the request, and, in response to the request, prompts the speaker to utter the text challenge. Then the system records a dynamic image feature of the speaker as the speaker utters the text challenge, and performs speaker verification based on the dynamic image feature and the text challenge. Recording the dynamic image feature of the speaker can include recording video of the speaker while speaking the text challenge. The dynamic feature can include a movement pattern of head, lips, mouth, eyes, and/or eyebrows of the speaker. The dynamic image feature can relate to phonetic content of the speaker speaking the challenge, speech prosody, and the speaker’s facial expression responding to content of the challenge.
Type: Grant
Filed: May 5, 2011
Date of Patent: November 25, 2014
Assignee: AT&T Intellectual Property I, L.P.
Inventors: Ann K. Syrdal, Sumit Chopra, Patrick Haffner, Taniya Mishra, Ilija Zeljkovic, Eric Zavesky
Patent #8831944
Abstract: Disclosed herein are systems, methods, and computer-readable storage media for performing a search. A system configured to practice the method first receives from an automatic speech recognition (ASR) system a word lattice based on a speech query and receives indexed documents from an information repository. The system composes, based on the word lattice and the indexed documents, at least one triple including a query word, selected indexed document, and weight. The system generates an N-best path through the word lattice based on the at least one triple and re-ranks ASR output based on the N-best path. The system aggregates each weight across the query words to generate N-best listings and returns search results to the speech query based on the re-ranked ASR output and the N-best listings. The lattice can be a confusion network, the arc density of which can be adjusted for a desired performance level.
Type: Grant
Filed: December 15, 2009
Date of Patent: September 9, 2014
Assignee: AT&T Intellectual Property I, L.P.
Inventors: Srinivas Bangalore, Taniya Mishra
Patent #8738375
Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for assigning saliency weights to words of an ASR model. The saliency values assigned to words within an ASR model are based on human perception judgments of previous transcripts. These saliency values are applied as weights to modify an ASR model such that the results of the weighted ASR model in converting a spoken document to a transcript provide a more accurate and useful transcription to the user.
Type: Grant
Filed: May 9, 2011
Date of Patent: May 27, 2014
Assignee: AT&T Intellectual Property I, L.P.
Inventors: Andrej Ljolje, Diamantino Antonio Caseiro, Mazin Gilbert, Vincent Goffin, Taniya Mishra
Patent #8401853
Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for approximating responses to a user speech query in voice-enabled search based on metadata that include demographic features of the speaker. A system practicing the method recognizes received speech from a speaker to generate recognized speech, identifies metadata about the speaker from the received speech, and feeds the recognized speech and the metadata to a question-answering engine. Identifying the metadata about the speaker is based on voice characteristics of the received speech. The demographic features can include age, gender, socio-economic group, nationality, and/or region. The metadata identified about the speaker from the received speech can be combined with or override self-reported speaker demographic information.
Type: Grant
Filed: September 22, 2010
Date of Patent: March 19, 2013
Assignee: AT&T Intellectual Property I, L.P.
Inventors: Michael Johnston, Srinivas Bangalore, Junlan Feng, Taniya Mishra
