LANGAWARE FOR
Pioneering Research

We are dedicated to setting the industry standard in AI technology research and to playing a pivotal role in harnessing voice and language biomarkers for the assessment of cognitive and mental health and overall well-being.

Linguistic cues for automatic assessment of Alzheimer’s disease across languages
Vassiliki Rentoumi, Evangelos Vassiliou, Nikiforos Pittaras, Admir Demiraj, Manolis Papageorgiou, Dimitra Sali, Athina Papatriantafyllou, Panagiotis Griziotis, Artemis Chardouveli, Konstantinos Pattakos, and George Paliouras

Abstract

 

Background
Most common forms of dementia, including Alzheimer’s disease, are associated with alterations in spoken language.

 

Objective
This study explores the potential of a speech-based machine learning (ML) approach in estimating cognitive impairment, using inputs of speech audio recordings.

 

Methods
We develop an automatic ML pipeline that ingests multimodal inputs of audio and transcribed text, mapping speech and language to domain-specific biomarkers optimized for high explainability and predictive ability. The resulting features are fed through a multi-stage pipeline to determine efficient classification configurations.

 

Results
We evaluated the system on large real-world datasets, achieving above 90% and 70% weighted average F1 scores for two-class (AD versus normal controls) and three-class (AD versus mild cognitive impairment versus normal controls) classification tasks, respectively. Model performance remains stable across different population characteristics.

 

Conclusions
The study introduces a robust, non-invasive method for gauging the cognitive status of AD and MCI patients from speech samples, with the potential of generalizing effectively to multiple types of diseases/disorders which may burden language.

Towards Automatic Early Detection: Assessing LANGaware’s Language and Speech Biomarkers in Neurocognitive and Affective Disorders
Vassiliki Rentoumi, Evangelos Vassiliou, Nikiforos Pittaras, Admir Demiraj, George Paliouras, Dimitra Sali

Abstract

 

Background: Recent advancements in automatic language and speech analysis, coupled with machine learning (ML) methods, showcase the effectiveness of digital biomarkers in non-invasively detecting subtle changes in cognitive status. While successfully distinguishing between Alzheimer’s Disease (AD) and Normal Control (NC) individuals, classifying Mild Cognitive Impairment (MCI) proves to be a more challenging task. MCI can progress to AD or result from various factors, including affective disorders, necessitating multiple expert examinations for accurate detection. Building upon previous research, we create an experimental setup to assess LANGaware’s biomarkers pool on three objectives: a) binary separation into Dementia and NC cohorts, b) broad three-class separation into Dementia, NC, and MCI groups, c) binary differentiation into Depression coupled with Anxiety disorder and NC cohorts.

 

Method: Patient audio recordings and ASR-generated transcripts were fed into LANGaware’s multimodal ML pipeline, extracting hundreds of linguistic and audio features, distilled into interpretable categories with a neural network assigning weights. These categorical values served as inputs to a final neural network layer generating probabilities for target labels (Dementia, NC). Similar methodologies were applied to our second (Dementia vs MCI vs NC) and third discrimination task (Depression/Anxiety vs NC), where the neural network allocated varying weights to input features for each of the aforementioned cases.

 

Result: In all scenarios, data were split into a 70% training set and a 30% testing set, validated against medical expert diagnosis. For binary separation, with 2927 Dementia and 815 NC instances, the model demonstrated 89% accuracy and an 85% macro-averaged F1 score. For three-class separation (3752 Dementia, 1117 NC, 5993 MCI instances), the model achieved 70% accuracy and a 71% F1 score. Discriminating affective disorders (1016 Depression/Anxiety, 1630 NC instances) resulted in 71% accuracy and a 71% F1 score.
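To make the difference between accuracy and macro-averaged F1 concrete, the following minimal sketch computes both on a small, made-up label set. The labels and predictions are illustrative assumptions, not study data, and the function names are hypothetical. Because the cohorts are imbalanced, macro-averaging weights each class equally, so it can sit below accuracy when the minority class is harder to classify.

```python
def f1_for_label(y_true, y_pred, label):
    """F1 score for one class, treating that class as the positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        return 0.0  # no correct positives, so F1 is taken as zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true))
    return sum(f1_for_label(y_true, y_pred, label) for label in labels) / len(labels)


def accuracy(y_true, y_pred):
    """Fraction of instances whose predicted label matches the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Toy, imbalanced example (assumed values, for illustration only).
y_true = ["Dementia"] * 8 + ["NC"] * 2
y_pred = ["Dementia"] * 8 + ["Dementia", "NC"]

print(f"accuracy = {accuracy(y_true, y_pred):.2f}")
print(f"macro F1 = {macro_f1(y_true, y_pred):.2f}")
```

In this toy case one of the two NC instances is misclassified, which barely moves accuracy but pulls the macro-averaged F1 down noticeably.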

 

Conclusion: The assessment suggests that our modelling approach aptly discerns language and speech patterns, distinguishing individuals with MCI from those with Dementia or in optimal health (NC). These outcomes contribute significantly to automatic evaluation, offering early diagnosis and timely treatment access. Our third experiment showcases the methodology’s applicability in detecting affective disorders, specifically Depression and Anxiety, which may co-occur with or precede MCI.

LANGaware: Robust Multimodal Machine Learning Methods for the Early Detection of Neurodegenerative and Psychiatric Diseases
Vassiliki Rentoumi, Evangelos Vassiliou, Nikiforos Pittaras, Admir Demiraj, Martha Alexandridou, Maria Danezi, Maria Hatzopoulou, Vassiliki Kamtsadeli, John D Papatriantafyllou, George Paliouras

Abstract

 

Background
Multiple neurodegenerative and psychiatric diseases can affect speech, manifesting as subtle changes in spoken language. We perform a rigorous evaluation of a Machine Learning (ML) pipeline on a large-scale multiaxial experimental setting, to investigate its capacity for detecting early signs of cognitive decline as manifested in speech. Using a diverse dataset of speech samples spanning different languages, patient cohorts and cognitive status diagnoses, we propose that the developed pipeline showcases robust performance, facilitating efficient decision support for early cognitive decline detection.

 

Method
We build upon previous research efforts on data-driven cognitive decline prediction. Random forest models are constructed from a dataset of more than 3000 audio recordings and ASR-generated transcripts, constituting cognitive assessment test tasks (e.g. picture description, everyday activity recounting, etc.). A multimodal NLP and audio analysis procedure derives sophisticated predictive biomarkers, using three methods for feature selection, feature decorrelation preprocessing and rigorous, grid search-based hyperparameter optimization. We introduce novel experimental axes of investigation, including the spoken language (English and Greek), single and multi-task verbal task settings, and different neurodegenerative diseases for the pathological cohorts (Alzheimer’s Disease or Dementia, along with age and education-matched healthy cohorts). Four different NLP analysis tools for tokenization, syntax and dependency extraction are considered, using both ML and Deep Learning (DL) techniques. Model stability and generalization are supported via 5-fold cross-validation and the use of a separate test set.
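The 5-fold cross-validation mentioned above can be sketched as follows. This is a minimal illustration rather than the study's implementation: the abstract does not say whether folds were stratified, so stratification by diagnosis label is an assumption here, and the function name `stratified_k_fold` and the toy labels are hypothetical.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=5, seed=0):
    """Yield (train_indices, test_indices) for k folds.

    Indices of each class are shuffled and dealt round-robin into the folds,
    so every fold keeps roughly the class proportions of the full dataset.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, index in enumerate(indices):
            folds[position % k].append(index)

    for i in range(k):
        test_indices = sorted(folds[i])
        train_indices = sorted(
            index for j, fold in enumerate(folds) if j != i for index in fold
        )
        yield train_indices, test_indices


# Toy labels (assumed): 10 pathological and 5 healthy recordings.
labels = ["AD"] * 10 + ["NC"] * 5
for train_indices, test_indices in stratified_k_fold(labels, k=5):
    print(len(train_indices), "train /", len(test_indices), "test")
```

Each recording appears in exactly one test fold, and a model would be fitted on the remaining four folds in every round; a separate held-out test set, as described above, stays outside this loop entirely.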

 

Result
Results are validated against medical expert diagnosis. Average performance across all investigative axes reaches high scores for accuracy (0.73), sensitivity (0.72), specificity (0.74), precision (0.77), F1 (0.74), and AUC (0.78). Additional findings suggest that single-task models outperform multitask ones (e.g. reaching AUC scores of 0.85). Moreover, cognitive decline detection is more efficient for pathological cohorts of Alzheimer’s, versus patients with Dementias.

 

Conclusion
The broad empirical evaluation verifies the effectiveness of the deployed ML system, adding evidence in favor of multimodal AI solutions for predicting early cognitive decline and showcasing their robustness in diverse evaluation settings. Future work includes leveraging DL methods for classification and evaluation on differential diagnosis discrimination tasks.

LANGaware: Introducing the right solution for the early detection of neurodegenerative and psychiatric diseases
Vassiliki Rentoumi, Evangelos Vassiliou, Admir Demiraj, Nikiforos Pittaras, Petros Mandalis, Martha Alexandridou, Hollie Kemp, Ioanna Eleftheriou, Maria Danezi, Maria Hatzopoulou, Vasiliki Kamtsadeli, George Paliouras, John D Papatriantafyllou

Abstract

 

Background: There are multiple neurodegenerative diseases that directly affect speech [1]. However, its utilization as a robust indicator for cognitive impairment is under-investigated. In many cases, mild cognitive decline progresses to a neurodegenerative disease and its detection is of utmost importance, since it is at this stage that treatment is most effective. One of our core goals is developing techniques for differentiating between patients with cognitive decline and healthy cohorts, by utilizing only speech samples [2]. Such samples are obtained from verbal elicitation tasks designed for cognitive assessment, e.g. picture descriptions and narration of everyday activities.

 

Method: Audio recordings from cognitive assessment tasks are fed through our platform to a Natural Language Processing and Machine Learning pipeline, employing an automatic discovery procedure of predictive salient biomarkers, to train an advanced classification system. The biomarker collection includes features that characterize voice, speech, language structure, composition and usage, and are engineered to highlight symptoms of neurodegenerative disorders. Biomarkers undergo multiple stages of filtering, processing and transformation to train and fine-tune the final model. Diagnostic performance of the output classifier is obtained on an unseen test set, to ensure a robust generalization of the platform.

 

Result: Our platform utilizes the set of automatically selected, cross-linguistic digital biomarkers to obtain sensitivity and specificity scores of 81% and 84% respectively, compared against medical expert diagnosis. The most salient biomarkers with respect to identifying our pathological cohort, relate to feature categories of Syntactic Complexity, Content Word usage, Lexical Repetition, Syntactic Errors and Function Words, with weight contributions of ∼ 16%, 14%, 13%, 12% and 11% respectively. Our platform provides additional, detailed population and patient-based descriptive analytics to enhance transparency and explainability of the results.
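For readers less familiar with the two metrics reported above, the sketch below shows how sensitivity and specificity are derived from predictions compared against a reference diagnosis. The labels and predictions are made-up toy values, not platform output, and the function name is hypothetical.

```python
def sensitivity_specificity(y_true, y_pred, positive):
    """Return (sensitivity, specificity) for a binary task.

    Sensitivity is the share of truly positive cases that are flagged;
    specificity is the share of truly negative cases that are cleared.
    Assumes both classes occur at least once in y_true.
    """
    tp = fn = tn = fp = 0
    for truth, predicted in zip(y_true, y_pred):
        if truth == positive:
            if predicted == positive:
                tp += 1
            else:
                fn += 1
        else:
            if predicted == positive:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Toy reference diagnoses and model predictions (assumed values).
y_true = ["patient"] * 5 + ["control"] * 4
y_pred = ["patient"] * 4 + ["control"] + ["control"] * 3 + ["patient"]

sensitivity, specificity = sensitivity_specificity(y_true, y_pred, positive="patient")
print(f"sensitivity = {sensitivity:.2f}, specificity = {specificity:.2f}")
```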

 

Conclusion: Early detection of cognitive decline facilitates early intervention, treatment and proactive care, delaying disease progression and reducing symptom severity. We believe that our platform provides an effective solution for risk factor estimation and our findings incentivize further research into speech analysis techniques for the prediction of cognitive decline.

[1] Boschi, Veronica, et al., *Frontiers in Psychology* 8 (2017): 269.
[2] Vassiliki Rentoumi et al., *Alzheimer’s & Dementia*, Wiley, volume 16, 2020.

Multilingual System for Early Detection of Neurodegenerative and Psychiatric Disorders
Lang Aware, Inc.

Abstract

The present disclosure provides a system for predicting a disease state based on speech occurrences. A feature extraction module extracts a plurality of lingual features from a speech record of the speech occurrence. The lingual features are chosen based on a correlation between the lingual features and the disease state in at least a first language and a second language. The lingual features are consistent for transcripts in at least the first language and the second language. A prediction module including a trained classification model generates a prediction of the disease state for speech occurrences in at least the first language and the second language using the lingual features extracted from the speech records.

Automatic Detection of Linguistic indicators as a means of early detection of Alzheimer’s disease and of related dementias: A Computational Linguistics analysis
Vassiliki Rentoumi, George Paliouras, Dimitra Arfani, Katerina Fragkopoulou, Spyridoula Varlokosta, Eva Danasi, Spyros Papadatos

Abstract

In the present study, we analyzed written samples obtained from Greek native speakers diagnosed with Alzheimer’s in mild and moderate stages and from age matched cognitively normal controls (NC). We adopted a computational approach for the comparison of morpho-syntactic complexity and lexical variety in the samples. We used text classification approaches to assign the samples to one of the two groups. The classifiers were tested using various features: morpho-syntactic and lexical characteristics. The proposed method excels in discerning AD patients in mild and moderate stages from NC leading to the in-depth understanding of language deficits.

Features and Machine Learning Classification of Connected Speech Samples from Patients with Autopsy Proven Alzheimer’s Disease with and without Additional Vascular Pathology
Vassiliki Rentoumi, Ladan Raoufian, Samrah Ahmed, Celeste A. de Jager and Peter Garrard

Abstract

Mixed vascular and Alzheimer-type dementia and pure Alzheimer’s disease are both associated with changes in spoken language. These changes have, however, seldom been subjected to systematic comparison. In the present study, we analyzed language samples obtained during the course of a longitudinal clinical study from patients in whom one or other pathology was verified at post mortem. The aims of the study were twofold: first, to confirm the presence of differences in language produced by members of the two groups using quantitative methods of evaluation; and secondly to ascertain the most informative sources of variation between the groups. We adopted a computational approach to evaluate digitized transcripts of connected speech along a range of language-related dimensions. We then used machine learning text classification to assign the samples to one of the two pathological groups on the basis of these features. The classifiers’ accuracies were tested using simple lexical features, syntactic features, and more complex statistical and information theory characteristics. Maximum accuracy was achieved when word occurrences and frequencies alone were used. Features based on syntactic and lexical complexity yielded lower discrimination scores, but all combinations of features showed significantly better performance than a baseline condition in which every transcript was assigned randomly to one of the two classes. The classification results illustrate the word content specific differences in the spoken language of the two groups. In addition, those with mixed pathology were found to exhibit a marked reduction in lexical variation and complexity compared to their pure AD counterparts.

Machine learning approaches to diagnosis and laterality effects in semantic dementia discourse
Peter Garrard, Vassiliki Rentoumi, Benno Gesierich, Bruce Miller and Maria Luisa Gorno-Tempini

Abstract

Advances in automatic text classification have been necessitated by the rapid increase in the availability of digital documents. Machine learning (ML) algorithms can ‘learn’ from data: for instance an ML system can be trained on a set of features derived from written texts belonging to known categories, and learn to distinguish between them. Such a trained system can then be used to classify unseen texts. In this paper, we explore the potential of the technique to classify transcribed speech samples along clinical dimensions, using vocabulary data alone. We report the accuracy with which two related ML algorithms [naive Bayes Gaussian (NBG) and naive Bayes multinomial (NBM)] categorized picture descriptions produced by: 32 semantic dementia (SD) patients versus 10 healthy, age-matched controls; and SD patients with left- (n = 21) versus right-predominant (n = 11) patterns of temporal lobe atrophy. We used information gain (IG) to identify the vocabulary features that were most informative to each of these two distinctions. In the SD versus control classification task, both algorithms achieved accuracies of greater than 90%. In the right- versus left-temporal lobe predominant classification, NBM achieved a high level of accuracy (88%), but this was achieved by both NBM and NBG when the features used in the training set were restricted to those with high values of IG. The most informative features for the patient versus control task were low frequency content words, generic terms and components of meta narrative statements. For the right versus left task the number of informative lexical features was too small to support any specific inferences. An enriched feature set, including values derived from Quantitative Production Analysis (QPA) may shed further light on this little understood distinction.
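The information gain criterion used above to rank vocabulary features can be illustrated with a short sketch. It assumes each transcript is reduced to the set of words it contains and scores a word by how much knowing its presence or absence reduces uncertainty about the group label. The toy transcripts, labels and function names are assumptions for illustration and are not taken from the study.

```python
import math
from collections import Counter


def label_entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(documents, labels, word):
    """Reduction in label entropy from splitting documents on presence of `word`."""
    with_word = [label for doc, label in zip(documents, labels) if word in doc]
    without_word = [label for doc, label in zip(documents, labels) if word not in doc]
    total = len(labels)
    conditional = sum(
        (len(subset) / total) * label_entropy(subset)
        for subset in (with_word, without_word)
        if subset  # skip an empty side of the split
    )
    return label_entropy(labels) - conditional


# Toy picture-description vocabularies (assumed), one set of words per speaker.
documents = [
    {"thing", "there"},
    {"thing", "boy"},
    {"cookie", "stool"},
    {"cookie", "boy"},
]
labels = ["SD", "SD", "control", "control"]

for word in ("thing", "boy"):
    print(word, round(information_gain(documents, labels, word), 3))
```

In this toy data the generic term "thing" separates the groups perfectly and so carries the maximum gain, while "boy" appears equally in both groups and carries none.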

Linguistic biomarkers of Hubris syndrome
Peter Garrard, Vassiliki Rentoumi, Christian Lambert and David Owen

Abstract

Owen and Davidson coined the term ‘Hubris Syndrome’ (HS) for a characteristic pattern of exuberant self-confidence, recklessness, and contempt for others, shown by some individuals holding substantial power. Meaning, emotion and attitude are communicated intentionally through language, but psychological and cognitive changes can be reflected in more subtle ways, of which a speaker remains unaware. Of the fourteen symptoms of HS, four imply lexical choices: use of the third person/‘royal we’; excessive confidence; exaggerated self-belief; and supposed accountability to God or History. One other feature (recklessness) could influence language complexity if impulsivity leads to unpredictability. These hypotheses were tested by examining transcribed spoken discourse samples produced by two British Prime Ministers (Margaret Thatcher and Tony Blair) who were said to meet criteria for HS, and one (John Major) who did not. We used Shannon entropy to reflect informational complexity, and temporal correlations (words or phrases whose relative frequency correlated negatively with time in office) and keyness values to identify lexical choices corresponding to periods during which HS was evident. Entropy fluctuated in all three subjects, but consistent (upward) trends in HS-positive subjects corresponded to periods of hubristic behaviour. The first person pronouns ‘I’ and ‘me’ and the word ‘sure’ were among the strongest positive temporal correlates in Blair’s speeches. Words and phrases that correlated in the speeches of Thatcher and Blair but not in those of Major included the phrase ‘we shall’ and ‘duties’ (both negative). The keyness ratio of ‘we’ to ‘I’ was clearly higher throughout the terms of office of Thatcher and Blair than at any point in the premiership of Major, and this difference was particularly marked in the case of Blair. The findings are discussed in the context of historical evidence and ideas for enhancing the signal to noise ratio put forward.
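As a rough illustration of using Shannon entropy as a measure of informational complexity, the sketch below computes the entropy of the word-frequency distribution of a transcript. The abstract does not specify the exact formulation or preprocessing used in the study, so whitespace tokenization, lowercasing and unigram frequencies are assumptions here, and the sample strings are invented.

```python
import math
from collections import Counter


def word_entropy(text):
    """Shannon entropy (in bits per word) of the unigram distribution of `text`."""
    words = text.lower().split()
    total = len(words)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(words).values()
    )


# Invented sample strings: repetitive wording versus varied wording.
repetitive = "we shall we shall"
varied = "we shall meet our duties"

print(word_entropy(repetitive))
print(word_entropy(varied))
```

Under this formulation, text that reuses a small set of words scores lower than text of similar length with a more varied vocabulary, which is the sense in which entropy tracks lexical unpredictability.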

The acute mania of King George III: A computational linguistic analysis
Vassiliki Rentoumi, Timothy Peters, Jonathan Conlin, Peter Garrard

Abstract

We used a computational linguistic approach, exploiting machine learning techniques, to examine the letters written by King George III during mentally healthy and apparently mentally ill periods of his life. The aims of the study were: first, to establish the existence of alterations in the King’s written language at the onset of his first manic episode; and secondly to identify salient sources of variation contributing to the changes. Effects on language were sought in two control conditions (politically stressful vs. politically tranquil periods and seasonal variation). We found clear differences in the letter corpus, across a range of different features, in association with the onset of mental derangement, which were driven by a combination of linguistic and information theory features that appeared to be specific to the contrast between acute mania and mental stability. The paucity of existing data relevant to changes in written language in the presence of acute mania suggests that lexical, syntactic and stylometric descriptions of written discourse produced by a cohort of patients with a diagnosis of acute mania will be necessary to support the diagnosis independently and to look for other periods of mental illness over the course of the King’s life, and in other historically significant figures with similarly large archives of handwritten documents.
