Alzheimer's Dementia Recognition through Spontaneous Speech


The ADReSS Challenge



Dementia is a category of neurodegenerative diseases that entails a long-term and usually gradual decrease of cognitive functioning. The main risk factor for dementia is age, and therefore its greatest incidence is amongst the elderly. Due to the severity of the situation worldwide, institutions and researchers are investing considerably in dementia prevention and early detection, focusing on disease progression. There is a need for cost-effective and scalable methods for detection of dementia from its most subtle forms, such as the preclinical stage of Subjective Memory Loss (SML), to more severe conditions like Mild Cognitive Impairment (MCI) and Alzheimer's Dementia (AD) itself.

While a number of studies have investigated speech and language features for the detection of Alzheimer's Disease and mild cognitive impairment, and proposed various signal processing and machine learning methods for this prediction task, the field still lacks balanced and standardised data sets on which these different approaches can be systematically compared.

The main objective of the ADReSS Challenge is to make available a benchmark dataset of spontaneous speech, which is acoustically pre-processed and balanced in terms of age and gender, defining a shared task through which different approaches to AD recognition in spontaneous speech can be compared. We expect that this challenge will bring together groups working on this active area of research, and provide the community with the very first comprehensive comparison of different approaches to AD recognition using this benchmark dataset.

In sum:

  • The ADReSS Challenge will target a difficult automatic prediction problem of societal and medical relevance, namely, the detection of cognitive impairment and Alzheimer's Dementia (AD). To the best of our knowledge, this will be the first such shared-task event focused on AD.
  • While a number of researchers have proposed speech processing and natural language processing approaches to AD recognition through speech, their studies have used different, often unbalanced and acoustically varied data sets, consequently hindering reproducibility and comparability of approaches. The ADReSS Challenge will provide a forum for those different research groups to test their existing methods (or develop novel approaches) on a new shared standardized dataset.
  • The ADReSS Challenge dataset has been carefully selected so as to mitigate common biases often overlooked in evaluations of AD detection methods, including repeated occurrences of speech from the same participant (common in longitudinal datasets), variations in audio quality, and imbalances of gender and age distribution.
  • Unlike some tests performed in clinical settings, where short speech samples are collected under controlled conditions, this task focuses on AD recognition using spontaneous speech.

How to participate

The ADReSS Challenge consists of two tasks:
  1. an AD classification task, where you are required to produce a model to predict the label (AD or non-AD) for a speech session. Your model can use speech data, language data (transcripts are provided), or both.
  2. an MMSE score regression task, where you will create a model to infer the subject's Mini-Mental State Examination (MMSE) score based on speech and/or language data.
You may choose to do one of these tasks, or both. You will be provided with access to a training set (see relevant section below), and two weeks prior to the paper submission deadline you will be given access to a separate set on which to test your model. You may send your results to us for scoring up to 5 times. You are required to submit all your attempts (up to 5 per task) together, in separate files (see detailed instructions in the README file distributed with the test set, below).

You will also be expected to submit a paper to INTERSPEECH 2020, describing your approach and results. If your paper is accepted, it will be presented at the conference in the ADReSS special session.

Access to the data set

In order to gain access to the ADReSS data, you will need to become a member of DementiaBank (free of charge) by emailing Brian MacWhinney. You should include your contact information and affiliation, as well as a general statement on how you plan to use the data, with specific mention of the ADReSS Challenge. If you are a student, please ask your supervisor to join as a member as well. This membership will give you full access to the DementiaBank database, where the ADReSS data set will be available and clearly identified. For further information, visit DementiaBank.

Once you have become a member of DementiaBank, please email us at Fasih.Haider@ed.ac.uk for further instructions.

The test data are now available! Please email ADReSS_is2020@ed.ac.uk for instructions on how to download it.

The data set

The DementiaBank directory to which you will gain access will contain only the training data for the ADReSS Challenge. This consists of four folders of data (full enhanced audio, normalised sub-chunks, transcriptions) as well as two text files with information on age, gender and MMSE scores for participants with and without a diagnosis of AD (cc_meta_data.txt, cd_meta_data.txt). A README file is also included for further details. The composition of the full dataset is shown below:

                        AD               non-AD
Age Interval     Male   Female      Male   Female
[50, 55)            2        0         2        0
[55, 60)            7        6         7        6
[60, 65)            4        9         4        9
[65, 70)            9       14         9       14
[70, 75)            9       11         9       11
[75, 80)            4        3         4        3
Total              35       43        35       43

Each session was segmented for voice activity using a voice activity detection system based on a signal energy threshold. We set the log energy threshold parameter to 65 dB with a maximum duration of 10 seconds per speech segment. The segmented dataset contains 1,955 speech segments from 78 non-AD subjects and 2,122 speech segments from 78 AD subjects. The average number of speech segments produced per participant was 24.86 (standard deviation sd = 12.84). Audio volume was normalised across all speech segments to control for variation caused by recording conditions, such as microphone placement.
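
As an illustration only, the segmentation and normalisation steps described above could be sketched as follows. This is not the organisers' pipeline: the function names `energy_vad` and `peak_normalise`, the 25 ms frame length, the use of non-overlapping frames, and peak normalisation as the volume normalisation method are all assumptions. In this sketch `threshold_db` is measured relative to unit amplitude; the reference level behind the 65 dB setting quoted above is not stated, so the two scales should not be assumed to match.

```python
import numpy as np


def energy_vad(signal, sample_rate, threshold_db, frame_s=0.025, max_segment_s=10.0):
    """Return (start_s, end_s) pairs for runs of frames above an energy threshold.

    Hypothetical sketch: frames are non-overlapping and threshold_db is
    relative to unit amplitude (both are assumptions, not from the source).
    """
    signal = np.asarray(signal, dtype=np.float64)
    frame_len = max(1, int(round(frame_s * sample_rate)))
    hop_s = frame_len / sample_rate
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    # Log energy per frame in dB; the small constant avoids log10(0) on silence.
    log_energy = 10.0 * np.log10(np.mean(frames ** 2, axis=1) + 1e-12)
    active = log_energy > threshold_db
    max_frames = max(1, int(round(max_segment_s / hop_s)))

    segments, start = [], None
    for i, is_active in enumerate(active):
        if is_active and start is None:
            start = i
        elif start is not None and (not is_active or i - start >= max_frames):
            # Close the segment at silence, or split it at the maximum duration.
            segments.append((start * hop_s, i * hop_s))
            start = i if is_active else None
    if start is not None:
        segments.append((start * hop_s, n_frames * hop_s))
    return segments


def peak_normalise(segment, peak=1.0):
    """Scale a segment so its largest absolute sample equals `peak` (assumed method)."""
    segment = np.asarray(segment, dtype=np.float64)
    max_abs = np.max(np.abs(segment)) if segment.size else 0.0
    return segment if max_abs == 0.0 else segment * (peak / max_abs)
```

Each returned pair can be used to slice the original signal, and each slice can then be passed through `peak_normalise` so that segments recorded at different levels end up on a comparable scale.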

Performance Metrics

Task 1 (AD classification) will be evaluated through the following metrics:
\[ \operatorname{Accuracy} = \frac{TN + TP}{N} \]
and
\[ F_1 = 2 \, \frac{\pi \times \rho}{\pi + \rho}, \]
where precision \(\pi\) and recall \(\rho\) are
\[ \pi = \frac{TP}{TP + FP}, \qquad \rho = \frac{TP}{TP + FN}, \]
N is the number of patients, TP is the number of true positives, TN is the number of true negatives, FP is the number of false positives and FN the number of false negatives.
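
For participants who want to check their own predictions before submitting, the Task 1 metrics can be computed as in the minimal sketch below. The function name `classification_metrics` and the label strings "AD" and "non-AD" are illustrative assumptions; the official scoring script and label encoding may differ.

```python
def classification_metrics(y_true, y_pred, positive="AD"):
    """Compute accuracy, precision, recall and F1 from two lists of labels.

    `positive` names the class counted as a positive; the label strings
    used here are assumptions, not the official encoding.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    n = tp + tn + fp + fn

    # Guard each ratio against an empty denominator.
    accuracy = (tp + tn) / n if n else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

Returning 0.0 when a denominator is empty is a design choice of this sketch; other conventions (for example, reporting the metric as undefined) are equally defensible.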

Task 2 (MMSE prediction) will be evaluated using the root mean squared error:
\[ \operatorname{RMSE} = \sqrt{\frac{\sum_{i=1}^{N} (\hat{y}_i - y_i)^2}{N}}, \]
where $\hat{y}_i$ is the predicted MMSE score and $y_i$ is the actual MMSE score of patient $i$.
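
The RMSE definition above translates directly into a few lines of code. The sketch below uses a hypothetical helper name, `rmse`, and is not the official scoring script.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between actual and predicted MMSE scores."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - t) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

For example, predictions of 28, 23 and 25 against actual scores of 30, 20 and 25 give squared errors of 4, 9 and 0, so the RMSE is the square root of 13/3.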

Baseline Results

Stay tuned.

Important Dates

  • January 24, 2020: ADReSS Challenge announced, training data made available
  • March 15, 2020: test data made available
  • April 23, 2020 (previously March 17, 2020): Submission of results opens (period for submission: April 23 to May 8)
  • May 8, 2020: Paper submission deadline
  • July 24, 2020: Paper acceptance/rejection notification
  • October 26-29, 2020: INTERSPEECH 2020, in Shanghai, China.
See other important dates on the INTERSPEECH 2020 website.

Paper Submission

Please format your paper following the INTERSPEECH 2020 guidelines, and submit it indicating that it is meant for the ADReSS Challenge.

Organizers

Saturnino Luz is a Reader at the Usher Institute, University of Edinburgh's Medical School. He works in medical informatics, devising and applying machine learning, signal processing and natural language processing methods in the study of behaviour and communication in healthcare contexts. His main research interest is the computational modelling of behavioural and biological changes caused by neurodegenerative diseases, with a focus on the analysis of vocal and linguistic signals in Alzheimer's disease.
Fasih Haider is a Research Fellow at the Usher Institute, University of Edinburgh's Medical School, UK. His areas of interest are Social Signal Processing and Artificial Intelligence. Before joining the Usher Institute, he was a Research Engineer at the ADAPT Centre where he worked on methods of Social Signal Processing for video intelligence. He holds a PhD in Computer Science from Trinity College Dublin, Ireland. Currently, he is investigating the use of social signal processing and machine learning for monitoring cognitive health.
Sofia de la Fuente graduated in Psychology (BSc Hons) from the Universidad Complutense de Madrid in 2015, and later in Methodology for Behavioural and Health Sciences (MSc Hons) from the Universidad Autonoma de Madrid in 2017. Recently, she became an Associate Fellow of the Higher Education Academy, and is currently finishing a Doctoral Training Programme in Precision Medicine at the University of Edinburgh. Her doctoral research is an exploratory study of psycholinguistic, linguistic, paralinguistic and acoustic features that may help predict dementia onset later in life.
Davida Fromm is a Special Faculty member in the Psychology Department at Carnegie Mellon University. Her research interests have focused on aphasia, dementia, and apraxia of speech in adults. For the past 12 years, she has helped to develop a large shared database of multi-media discourse samples for a variety of neurogenic communication disorders. The database includes educational resources and research tools for an increasing number of automated language analyses.
Brian MacWhinney is Teresa Heinz Professor of Psychology, Computational Linguistics, and Modern Languages at Carnegie Mellon University. He received his Ph.D. in psycholinguistics in 1974 from the University of California at Berkeley. With Elizabeth Bates, he developed a model of first and second language processing and acquisition based on competition between item-based patterns. In 1984, he and Catherine Snow co-founded the CHILDES (Child Language Data Exchange System) Project for the computational study of child language transcript data. This system has extended to 13 additional research areas such as aphasiology, second language learning, TBI, Conversation Analysis, developmental disfluency and others in the shape of the TalkBank Project. MacWhinney's recent work includes studies of online learning of second language vocabulary and grammar, situationally embedded second language learning, neural network modeling of lexical development, fMRI studies of children with focal brain lesions, and ERP studies of between-language competition. He is also exploring the role of grammatical constructions in the marking of perspective shifting, the determination of linguistic forms across contrasting time frames, and the construction of mental models in scientific reasoning. Recent edited books include The Handbook of Language Emergence (Wiley) and Competing Motivations in Grammar and Usage (Oxford).

