Remote monitoring of biometric data in the elderly population is an important asset for improving the quality of life and level of independence of elderly people living alone. However, the design and implementation of health technology solutions often disregard the physiological and psychological abilities of the elderly, leading to low adoption of these technologies. We evaluate the usability of a remote patient monitoring solution, VITASENIOR-MT, which is based on interaction with a television set. Twenty senior participants (over 64 years) and a control group of 20 participants underwent systematic tests with the health platform and assessed its usability through several questionnaires. Elderly participants rated the usability of the platform highly, very close to the evaluation of the control group. Sensory, motor and cognitive limitations were the issues that most contributed to the difference in usability assessment between the elderly group and the control group. The solution showed high usability and acceptance regardless of age, digital literacy, education and impairments (sensory, motor and cognitive), which demonstrates its viability for use and implementation as a consumer product in the senior market.
2021
SMC
How Does the Spotify API Compare to the Music Emotion Recognition State-of-the-Art?
Features are arguably the key factor in any machine learning problem. Over the decades, myriad audio features and, more recently, feature-learning approaches have been tested in Music Emotion Recognition (MER) with scarce improvements. Here, we shed some light on the suitability of the audio features provided by the Spotify API, the leading music streaming service, when applied to MER. To this end, 12 Spotify API features were obtained for 704 songs of our 900-song dataset, annotated in terms of Russell’s quadrants. These are compared to emotionally-relevant features obtained previously, using feature ranking and emotion classification experiments. We verified that the energy, valence and acousticness features from Spotify are highly relevant to MER. However, the 12-feature set is unable to match the performance of the features available in the state-of-the-art (58.5% vs. 74.7% F1-measure). Combining the Spotify and state-of-the-art sets leads to small improvements with fewer features (top5: +2.3%, top10: +1.1%), while not improving the maximum results (100 features). From this we conclude that Spotify provides some higher-level emotionally-relevant features. Such extractors are desirable, since they are closer to human concepts and allow interpretable rules to be extracted (harder with hundreds of abstract features). Still, additional emotionally-relevant features are needed to improve MER.
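For illustration only, a minimal sketch of how per-track features of this kind can be pulled from the Spotify Web API using the spotipy client. The credentials setup, placeholder track IDs and the exact 12-feature selection below are assumptions rather than the paper's setup, and availability of the audio-features endpoint depends on Spotify's current API terms.

```python
# Hedged sketch: retrieve per-track audio features from the Spotify Web API via
# spotipy. Credentials are read from SPOTIPY_CLIENT_ID / SPOTIPY_CLIENT_SECRET;
# track IDs are placeholders and the feature selection is an assumption.
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials

sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials())

track_ids = ["<spotify-track-id-1>", "<spotify-track-id-2>"]  # placeholders
keep = ["energy", "valence", "acousticness", "danceability", "instrumentalness",
        "speechiness", "liveness", "loudness", "tempo", "mode", "key",
        "time_signature"]

feature_vectors = [[track[k] for k in keep]
                   for track in sp.audio_features(tracks=track_ids)]
```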
2020
TAFFC
Audio Features for Music Emotion Recognition: a Survey
The design of meaningful audio features is a key need to advance the state-of-the-art in Music Emotion Recognition (MER). This work presents a survey on the existing emotionally-relevant computational audio features, supported by the music psychology literature on the relations between eight musical dimensions (melody, harmony, rhythm, dynamics, tone color, expressivity, texture and form) and specific emotions. Based on this review, current gaps and needs are identified and strategies for future research on feature engineering for MER are proposed, namely ideas for computational audio features that capture elements of musical form, texture and expressivity that should be further researched. Finally, although the focus of this article is on classical feature engineering methodologies (based on handcrafted features), perspectives on deep learning-based approaches are discussed.
TAFFC
Novel Audio Features for Music Emotion Recognition
This work advances the music emotion recognition state-of-the-art by proposing novel emotionally-relevant audio features. We reviewed the existing audio features implemented in well-known frameworks and their relationships with the eight commonly defined musical concepts. This knowledge helped uncover musical concepts lacking computational extractors, for which we propose algorithms, namely related to musical texture and expressive techniques. To evaluate our work, we created a public dataset of 900 audio clips, with subjective annotations following Russell’s emotion quadrants. The existing audio features (baseline) and the proposed features (novel) were tested using 20 repetitions of 10-fold cross-validation. Adding the proposed features improved the F1-score to 76.4% (by 9%), when compared to a similar number of baseline-only features. Moreover, analysing feature relevance and the results uncovered interesting relations, namely the weight of specific features and musical concepts in each emotion quadrant, and warrants promising new directions for future research in the field of music emotion recognition, interactive media, and novel music interfaces.
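As an illustration of the evaluation protocol mentioned above (20 repetitions of 10-fold cross-validation), a minimal sketch assuming scikit-learn, a precomputed feature matrix X and quadrant labels y; the SVM configuration here is an assumption, not necessarily the paper's exact setup.

```python
# Minimal sketch: 20 repetitions of stratified 10-fold cross-validation with an
# RBF-kernel SVM, scored with macro-averaged F1. X is assumed to be an
# (n_clips, n_features) matrix and y the Russell-quadrant labels.
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def evaluate(X, y):
    model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=20, random_state=0)
    scores = cross_val_score(model, X, y, cv=cv, scoring="f1_macro")
    return scores.mean(), scores.std()
```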
2019
PhD
Emotion-based Analysis and Classification of Audio Music
This research work addresses the problem of music emotion recognition using audio signals. Music emotion recognition research has been gaining ground over the last two decades. In it, the typical approach starts with a dataset, composed of music files and associated emotion ratings given by listeners. This data, typically audio signals, is first processed by computational algorithms in order to extract and summarize its characteristics, known as features (e.g., beats per minute, spectral metrics). Next, the feature set is fed to machine learning algorithms that look for patterns connecting it to the given emotional annotations. As a result, a computational model is created, which is able to infer the emotion of a new and unlabelled music file based on the previously found patterns. Although several studies have been published, two main issues remain open and are the current barrier to progress in the field. First, a high-quality public and sizeable audio dataset is needed, which can be widely adopted as a standard and used by different works. Currently, the publicly available ones suffer from known issues such as low-quality annotations or limited size. Also, we believe novel emotionally-relevant audio features are needed to overcome the plateau of recent years. Supporting this idea is the fact that the vast majority of previous works focused on the computational classification component, typically using a similar set of audio features originally proposed to tackle other audio analysis problems (e.g., speech recognition). Our work focuses on these two problems. Proposing novel emotionally-relevant audio features requires knowledge from several fields. Thus, our work started with a review of the music and emotion literature to understand how emotions can be described and classified, how music and music dimensions work and, as a final point, to merge both fields by reviewing the identified relations between musical dimensions and emotional responses. Next, we reviewed the existing audio features, relating them to one of the eight musical dimensions: melody, harmony, rhythm, dynamics, tone color, expressive techniques, musical texture and musical form. As a result, we observed that audio features are unbalanced across musical dimensions, with expressive techniques, musical texture and form said to be emotionally-relevant but lacking audio extractors. To address the above-mentioned issues, we propose several audio features. These were built on previous work to estimate the main melody notes from the low-level audio signals. Next, various musically-related metrics were extracted, e.g., glissando presence, articulation information, changes in dynamics and others. To assess their relevance to emotion recognition, a dataset containing 900 audio clips, annotated in four classes (Russell’s quadrants), was built. Our experimental results show that the proposed features are emotionally-relevant and their inclusion in emotion recognition models leads to better results. Moreover, we also measured the influence of both existing and novel features, leading to a better understanding of how different musical dimensions influence specific emotion quadrants. Such results give us insights into the open issues and help us define possible research paths for the near future.
WF-IoT
VITASENIOR-MT: A distributed and scalable cloud-based telehealth solution
Mendes, Diogo,
Panda, Renato,
Dias, Pedro,
Jorge, Dário,
António, Ricardo,
Oliveira, Luis,
and Pires, Gabriel
In IEEE 5th World Forum on Internet of Things
2019
VITASENIOR-MT is a telehealth platform for remote monitoring of biometric and environmental data in a domestic environment, designed specifically for the elderly population. This paper proposes a highly scalable and efficient architecture to transport, process, store and visualize the data collected by devices in an Internet of Things (IoT) scenario. The cloud infrastructure follows a microservices architecture to provide computational scalability, better fault isolation, easy integration and automatic deployment. This solution is complemented with pre-processing and validation of the collected data at the edge of the network, following the fog computing concept and allowing a better distribution of computation. The presented approach provides personal data security and a simplified way to collect and present the data to the different actors, offering caregivers dynamic and intuitive management of patients and equipment. The presented load tests showed that this solution is more efficient than a monolithic approach, promoting better access to and control over the data flowing from heterogeneous equipment.
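The abstract does not detail the transport between the edge and the cloud; purely as an illustrative sketch of the edge-side validation idea, assuming a plain HTTPS endpoint with a hypothetical URL, field names and plausibility ranges.

```python
# Illustrative only: validate a biometric reading at the edge before forwarding
# it to a cloud endpoint. URL, field names and ranges are hypothetical and do
# not describe the actual VITASENIOR-MT schema or protocol.
import requests

PLAUSIBLE_RANGES = {"heart_rate": (30, 220), "body_temperature": (30.0, 43.0)}

def forward_if_valid(reading, endpoint="https://example.org/api/measurements"):
    low, high = PLAUSIBLE_RANGES.get(reading["type"], (float("-inf"), float("inf")))
    if not (low <= reading["value"] <= high):
        return False  # discard implausible readings before they reach the cloud
    return requests.post(endpoint, json=reading, timeout=5).ok
```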
2018
HealthCom
VITASENIOR-MT: a telehealth solution for the elderly focused on the interaction with TV
Remote monitoring of health parameters is a promising approach to improve the health condition and quality of life of particular groups of the population, and it can also alleviate the current expenditure and demands of healthcare systems. The elderly, usually affected by chronic comorbidities, are a specific group of the population that can strongly benefit from telehealth technologies, which allow them to lead a more independent life by living longer in their own homes. Usability of telehealth technologies and their acceptance by end-users are essential requirements for the success of telehealth implementation. Older people are often resistant to new technologies or have difficulty using them due to vision, hearing and cognitive impairments. In this paper, we describe the implementation of an IoT-based telehealth solution designed specifically to address the needs of the elderly. The end-user interacts with a TV set to record biometric parameters and to receive warnings and recommendations related to health and environmental sensor recordings. The familiarity of older people with the TV is expected to provide a more user-friendly interaction, ensuring the effective integration of the end-user in the overall telehealth solution.
TAFFC
Emotionally-Relevant Features for Classification and Regression of Music Lyrics
This research addresses the role of lyrics in the music emotion recognition process. Our approach is based on several state-of-the-art features complemented by novel stylistic, structural and semantic features. To evaluate our approach, we created a ground-truth dataset containing 180 song lyrics, annotated according to Russell’s emotion model. We conduct four types of experiments: regression and classification by quadrant, arousal and valence categories. Compared to the state-of-the-art features (n-grams, the baseline), adding other features, including the novel ones, improved the F-measure from 69.9%, 82.7% and 85.6% to 80.1%, 88.3% and 90%, respectively, for the three classification experiments. To study the relation between features and emotions (quadrants), we performed experiments to identify the features that best describe and discriminate each quadrant. To further validate these experiments, we built a validation set comprising 771 lyrics extracted from the AllMusic platform, achieving 73.6% F-measure in the classification by quadrants. We also conducted experiments to identify interpretable rules that show the relation between features and emotions, as well as the relations among features. Regarding regression, results show that, compared to similar studies for audio, we achieve a similar performance for arousal and a much better performance for valence.
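For context, a hedged sketch of an n-gram baseline of the kind referred to above, using scikit-learn; the paper's novel stylistic, structural and semantic features are not reproduced here, and the vectorizer settings are assumptions.

```python
# Sketch of an n-gram (TF-IDF) baseline for lyric emotion classification with a
# linear SVM. train_lyrics / train_quadrants are assumed lists of lyric strings
# and Russell-quadrant labels; this is not the paper's exact feature set.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

baseline = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 3), min_df=2),  # word uni-, bi- and trigrams
    LinearSVC(),
)
# baseline.fit(train_lyrics, train_quadrants)
# predicted = baseline.predict(test_lyrics)
```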
GESTEC
VITASENIOR–MT: Architecture of a Telehealth Solution
VITASENIOR-MT is a telehealth solution under development that aims to monitor and improve the healthcare of elderly people living in the region of Médio Tejo. This solution performs both remote and local monitoring of biometric parameters of the elderly, as well as of environmental parameters of their homes. The biometric variables include heart rate and temperature measurements collected automatically throughout the day by means of a bracelet. Blood pressure, body weight and other biometric parameters are measured on a daily basis on the senior’s own initiative and automatically recorded. The environmental parameters include temperature, carbon monoxide and carbon dioxide measurements. A TV set is used as a means of interaction between the user and the medical devices. The TV set is also used to receive medical warnings and recommendations according to clinical profiles, and to receive environmental alerts. All data and alerts are accessible to the senior’s family and healthcare providers. In alarm situations, an automatic operational procedure is triggered, establishing communication with predefined entities.
ISMIR
Musical Texture and Expressivity Features for Music Emotion Recognition
We present a set of novel emotionally-relevant audio features to help improve the classification of emotions in audio music. First, a review of the state-of-the-art regarding emotion and music was conducted, to understand how the various music concepts may influence human emotions. Next, well-known audio frameworks were analyzed, assessing how their extractors relate to the studied musical concepts. The intersection of this data showed an unbalanced representation of the eight musical concepts. Namely, most extractors are low-level and related to tone color, while musical form, musical texture and expressive techniques are lacking. Based on this, we developed a set of new algorithms to capture information related to musical texture and expressive techniques, the two most lacking concepts. To validate our work, a public dataset containing 900 30-second clips, annotated in terms of Russell’s emotion quadrants, was created. The inclusion of our features improved the F1-score obtained using the best 100 features by 8.6% (to 76.0%), using support vector machines and 20 repetitions of 10-fold cross-validation.
2016
KDIR
Classification and Regression of Music Lyrics: Emotionally-Significant Features
This research addresses the role of lyrics in the music emotion recognition process. Our approach is based on several state-of-the-art features complemented by novel stylistic, structural and semantic features. To evaluate our approach, we created a ground-truth dataset containing 180 song lyrics, annotated according to Russell’s emotion model. We conduct four types of experiments: regression and classification by quadrant, arousal and valence categories. Compared to the state-of-the-art features (n-grams, the baseline), adding other features, including the novel ones, improved the F-measure from 68.2%, 79.6% and 84.2% to 77.1%, 86.3% and 89.2%, respectively, for the three classification experiments. To study the relation between features and emotions (quadrants), we performed experiments to identify the features that best describe and discriminate between arousal hemispheres and valence meridians. To further validate these experiments, we built a validation set comprising 771 lyrics extracted from the AllMusic platform, achieving 73.6% F-measure in the classification by quadrants. Regarding regression, results show that, compared to similar studies for audio, we achieve a similar performance for arousal and a much better performance for valence.
ECML/PKDD
Bi-modal music emotion recognition: Novel lyrical features and dataset
In 9th International Workshop on Music and Machine Learning – MML 2016 – in conjunction with the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases – ECML/PKDD 2016
2016
This research addresses the role of audio and lyrics in music emotion recognition. Each dimension (e.g., audio) was studied separately, as well as in the context of a bimodal analysis. We perform classification by quadrant categories (4 classes). Our approach is based on several audio and lyric state-of-the-art features, as well as novel lyric features. To evaluate our approach we created a ground-truth dataset. The main conclusions show that, unlike in most similar works, lyrics performed better than audio. This suggests the importance of the newly proposed lyric features, and that bimodal analysis is always better than each dimension alone.
2015
AAI
Music Emotion Recognition with Standard and Melodic Audio Features
We propose a novel approach to music emotion recognition by combining standard and melodic features extracted directly from audio. To this end, a new audio dataset, organized similarly to the one used in the MIREX mood task comparison, was created. From the data, 253 standard and 98 melodic features are extracted and used with several supervised learning techniques. Results show that, generally, melodic features perform better than standard audio features. The best result, 64% F-measure with only 11 features (9 melodic and 2 standard), was obtained with ReliefF feature selection and Support Vector Machines.
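A minimal sketch of the ReliefF-plus-SVM setup described above, here assuming the skrebate implementation of ReliefF together with scikit-learn (an assumption; the original work used its own tooling), with X as the 351-feature matrix and y the emotion labels.

```python
# Sketch: rank features with ReliefF, keep a small subset (e.g. 11, as in the
# best reported result) and classify with an SVM. skrebate's ReliefF is assumed.
from skrebate import ReliefF
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

model = make_pipeline(
    StandardScaler(),
    ReliefF(n_features_to_select=11, n_neighbors=10),
    SVC(kernel="rbf"),
)
# model.fit(X_train, y_train); predictions = model.predict(X_test)
```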
2013
CMMR
Dimensional music emotion recognition: Combining standard and melodic audio features
We propose an approach to the dimensional music emotion recognition (MER) problem, combining both standard and melodic audio features. The dataset proposed by Yang is used, which consists of 189 audio clips. From the audio data, 458 standard features and 98 melodic features were extracted. We experimented with several supervised learning and feature selection strategies to evaluate the proposed approach. Employing only standard audio features, the best attained performance was 63.2% and 35.2% for arousal and valence prediction, respectively (R2 statistics). Combining standard audio with melodic features, results improved to 67.4% and 40.6% for arousal and valence, respectively. To the best of our knowledge, these are the best results attained so far with this dataset.
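For illustration, a sketch of the regression setup, assuming scikit-learn, a feature matrix X and annotated arousal/valence targets; the exact regressor and validation scheme used in the paper may differ.

```python
# Sketch: one Support Vector Regressor per dimension (arousal, valence),
# evaluated with the R2 statistic via 10-fold cross-validation.
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

def r2_cv(X, y):
    model = make_pipeline(StandardScaler(), SVR(kernel="rbf"))
    return cross_val_score(model, X, y, cv=10, scoring="r2").mean()

# r2_arousal = r2_cv(X, arousal_targets)
# r2_valence = r2_cv(X, valence_targets)
```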
ECML/PKDD
Music Emotion Recognition: The Importance of Melodic Features
In 6th International Workshop on Music and Machine Learning – MML 2013 – in conjunction with the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases – ECML/PKDD 2013
2013
We study the importance of a melodic audio (MA) feature set in music emotion recognition (MER) and compare its performance to an approach using only standard audio (SA) features. We also analyse the fusion of both types of features. Employing only SA features, the best attained performance was 46.3%, while using only MA features the best outcome was 59.1% (F-measure). A combination of SA and MA features improved results to 64%. These results might have an important impact in helping to break the so-called glass ceiling in MER, as most current approaches are based on SA features.
ECML/PKDD
Music Emotion Recognition from Lyrics: A Comparative Study
In 6th International Workshop on Music and Machine Learning – MML 2013 – in conjunction with the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases – ECML/PKDD 2013
2013
We present a study on music emotion recognition from lyrics. We start from a dataset of 764 samples (audio+lyrics) and perform feature extraction using several natural language processing techniques. Our goal is to build classifiers for the different datasets, comparing different algorithms and using feature selection. The best results (44.2% F-measure) were attained with SVMs. We also perform a bi-modal analysis that combines the best feature sets of audio and lyrics. The combination of the best audio and lyrics features achieved better results than the best feature set from audio only (63.9% F-measure against 62.4% F-measure).
CMMR
Multi-Modal Music Emotion Recognition: A New Dataset, Methodology and Comparative Analysis
We propose a multi-modal approach to the music emotion recognition (MER) problem, combining information from distinct sources, namely audio, MIDI and lyrics. We introduce a methodology for the automatic creation of a multi-modal music emotion dataset resorting to the AllMusic database, based on the emotion tags used in the MIREX Mood Classification Task. Then, MIDI files and lyrics corresponding to a subset of the obtained audio samples were gathered. The dataset was organized into the same 5 emotion clusters defined in MIREX. From the audio data, 177 standard features and 98 melodic features were extracted. As for MIDI, 320 features were collected. Finally, 26 lyrical features were extracted. We experimented with several supervised learning and feature selection strategies to evaluate the proposed multi-modal approach. Employing only standard audio features, the best attained performance was 44.3% (F-measure). With the multi-modal approach, results improved to 61.1%, using only 19 multi-modal features. Melodic audio features were particularly important to this improvement.
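A minimal early-fusion sketch in the spirit of the multi-modal approach above: feature matrices extracted separately from audio, MIDI and lyrics for the same songs are concatenated before feature selection and classification. The file names and shapes below are placeholders, not the paper's artefacts.

```python
# Early fusion of per-song feature matrices from the three modalities.
import numpy as np

X_audio = np.load("audio_features.npy")   # e.g. (n_songs, 275) - placeholder
X_midi = np.load("midi_features.npy")     # e.g. (n_songs, 320) - placeholder
X_lyrics = np.load("lyric_features.npy")  # e.g. (n_songs, 26)  - placeholder

X_multimodal = np.hstack([X_audio, X_midi, X_lyrics])  # one fused row per song
```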
In 8th Music Information Retrieval Exchange – MIREX 2012, as part of the 13th International Society for Music Information Retrieval Conference – ISMIR 2012
2012
In this work, three audio frameworks – Marsyas, MIR Toolbox and PsySound3 – were used to extract audio features from the audio samples. These features are then used to train several classification models, resulting in the different versions submitted to the MIREX 2012 mood classification task.
DAFx
Music Emotion Classification: Dataset Acquisition and Comparative Analysis
In this paper we present an approach to emotion classification in audio music. The process is conducted with a dataset of 903 clips and mood labels, collected from the AllMusic database and organized into five clusters similar to the dataset used in the MIREX Mood Classification Task. Three different audio frameworks – Marsyas, MIR Toolbox and PsySound – were used to extract several features. These audio features and annotations are used with supervised learning techniques to train and test various classifiers based on support vector machines. To assess the importance of each feature, several different combinations of features, obtained with feature selection algorithms or manually selected, were tested. The performance of the solution was measured with 20 repetitions of 10-fold cross-validation, achieving an F-measure of 47.2% with precision of 46.8% and recall of 47.6%.
MML/ICML
Music Emotion Classification: Analysis of a Classifier Ensemble Approach
In 5th International Workshop on Music and Machine Learning – MML 2012 – in conjunction with the 19th International Conference on Machine Learning – ICML 2012
2012
We propose a system of five regression models to classify music emotion. To this end, a dataset similar to the MIREX contest dataset was used. Songs from each cluster are separated into five sets and labeled 1. A similar number of songs from other clusters is then added to each set and labeled 0, training regression models to output a value representing how closely a song relates to the specific cluster. The five outputs are combined and the highest score is used as the classification. An F-measure of 68.9% was obtained. Results were validated with 10-fold cross-validation and feature selection was tested.
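A hedged sketch of the ensemble idea described above: one regressor per MIREX cluster trained on a 1-vs-0 target, with the highest output taken as the predicted cluster. The paper balances the negative examples per cluster, which is omitted here, and SVR is an assumed choice of regressor.

```python
# One regression model per cluster; prediction is the argmax over the five
# per-cluster outputs. X is the feature matrix, y holds cluster indices 0..4.
import numpy as np
from sklearn.svm import SVR

def train_cluster_models(X, y, n_clusters=5):
    models = []
    for c in range(n_clusters):
        target = (y == c).astype(float)  # 1 for songs of this cluster, else 0
        models.append(SVR(kernel="rbf").fit(X, target))
    return models

def predict_cluster(models, X_new):
    scores = np.column_stack([m.predict(X_new) for m in models])
    return scores.argmax(axis=1)
```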
2011
AES
Using Support Vector Machines for Automatic Mood Tracking in Audio Music
In this paper we propose a solution for automatic mood tracking in audio music, based on supervised learning and classification. To this end, various music clips with a duration of 25 seconds, previously annotated with arousal and valence (AV) values, were used to train several models. These models were used to predict the quadrants of Thayer’s taxonomy and the AV values of small segments from full songs, revealing the mood changes over time. The system accuracy was measured by calculating the matching ratio between the predicted results and full-song annotations performed by volunteers. Different combinations of audio features, frameworks and other parameters were tested, resulting in an accuracy of 56.3% and showing there is still much room for improvement.
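To make the tracking idea concrete, a sketch that splits a full song into fixed-length segments and predicts one quadrant per segment with an already-trained classifier; extract_features stands in for a per-segment feature extractor and is hypothetical.

```python
# Segment-wise mood tracking: predict one Thayer quadrant per fixed-length
# segment of a full song. 'model' is a trained classifier; extract_features is
# a hypothetical per-segment feature extractor returning a 1-D feature vector.
import numpy as np

def track_mood(audio, sample_rate, model, segment_seconds=25.0):
    hop = int(segment_seconds * sample_rate)
    quadrants = []
    for start in range(0, len(audio) - hop + 1, hop):
        feats = extract_features(audio[start:start + hop], sample_rate)
        quadrants.append(int(model.predict(np.atleast_2d(feats))[0]))
    return quadrants  # one predicted quadrant per segment, in temporal order
```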
SMC
Automatic creation of mood playlists in the thayer plane: A methodology and a comparative study
We propose an approach for the automatic creation of mood playlists in the Thayer plane (TP). Music emotion recognition is tackled as a regression and classification problem, aiming to predict the arousal and valence (AV) values of each song in the TP, based on Yang’s dataset. To this end, a high number of audio features are extracted using three frameworks: PsySound, MIR Toolbox and Marsyas. The extracted features and Yang’s annotated AV values are used to train several Support Vector Regressors, each employing different feature sets. The best performance, in terms of R2 statistics, was attained after feature selection, reaching 63% for arousal and 35.6% for valence. Based on the predicted location of each song in the TP, mood playlists can be created by specifying a point in the plane, from which the closest songs are retrieved. Using one seed song, the accuracy of the created playlists was 62.3% for 20-song playlists, 24.8% for 5-song playlists and 6.2% for the top song.
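As a sketch of the playlist step, assuming the predicted (arousal, valence) coordinates for each song are already available as an (n_songs, 2) array; the function and variable names are illustrative.

```python
# Retrieve the songs closest (Euclidean distance) to a seed point in the
# arousal-valence plane. 'av' holds predicted coordinates, one row per song;
# 'titles' is a parallel list of song identifiers.
import numpy as np

def playlist_from_seed(av, titles, seed_av, size=20):
    distances = np.linalg.norm(av - np.asarray(seed_av), axis=1)
    order = np.argsort(distances)
    return [titles[i] for i in order[:size]]
```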
INForum
MOODetector: A Prototype Software Tool for Mood-based Playlist Generation
We propose a prototype software tool for the automatic generation of mood-based playlists. The tool works as a typical music player, extended with mechanisms for automatic estimation of arousal and valence values in the Thayer plane (TP). Playlists are generated based on one seed song or a desired mood trajectory path drawn by the user, according to the distance to the seed(s) in the TP. Besides playlist generation, a mood tracking visualization tool is also implemented, where individual songs are segmented and classified according to the quadrants in the TP. Additionally, the methodology for music emotion recognition, tackled in this paper as a regression and classification problem, is described, along with the process for feature extraction and selection. Experimental results for mood regression are slightly higher than the state of the art, indicating the viability of the followed strategy (in terms of R2 statistics, arousal and valence estimation accuracy reached 63% and 35.6%, respectively).
In this thesis, music emotion recognition is approached, with detection of mood changes over a song as the main focus. The first semester served entirely to gain knowledge of the area and to produce an initial design. This strong theoretical knowledge was further developed during the second semester, serving as a starting point to build a music analysis Qt application based on Thayer’s model of mood, using Marsyas for feature extraction and the libSVM library for classification and regression. Several experimental tests were run to study different approaches and the respective results.