Results of the 2018 OMG-Empathy Prediction Challenge


| Date | Team | Submission | Description | Repository | Paper | Personalized Track | Generalized Track | Modality |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 12.2018 | Alpha-City | Manual | - | Link | Link | 0.17 | 0.17 | Audio+Images+Text |
| 12.2018 | Alpha-City | Filters | - | Link | Link | 0.12 | 0.12 | Audio+Images+Text |
| 12.2018 | Alpha-City | KNN | - | Link | Link | 0.03 | 0.03 | Audio+Images+Text |
| 12.2018 | EIHW | Submission1 | Generalized model trained with multimodal data using a BLSTM with 40 cell units. | Link | Link | - | 0.05 | Audio+Images |
| 12.2018 | EIHW | Filters | Generalized model trained with multimodal data using a BLSTM with 50 cell units. | Link | Link | - | 0.06 | Audio+Images |
| 12.2018 | EIHW | KNN | Personalized models trained with multimodal data using a BLSTM with 50 cell units. | Link | Link | - | 0.11 | Audio+Images |
| 12.2018 | USTC-AC | Result1 | The first result is predicted by our model trained on stories 2 and 8. | Link | Link | 0.14 | 0.14 | Audio+Images+Time |
| 12.2018 | USTC-AC | Result2 | The second and third results are predicted by our model trained on stories 1, 2, and 8. | Link | Link | 0.11 | 0.11 | Audio+Images+Time |
| 12.2018 | USTC-AC | Result3 | The second and third results are predicted by our model trained on stories 1, 2, and 8. | Link | Link | 0.13 | 0.13 | Audio+Images+Time |
| 12.2018 | A*STAR AI | G1_Predictions_AT | G1: Audio+Text multimodal LSTM with local attention. Submission G1 is a multimodal LSTM model using audio and text modalities, with local attention applied to the past 3 seconds. Audio features were extracted with OpenSMILE, and text features were GloVe word embeddings averaged over 1-second chunks. It was our best-performing model when Story 1 was used as the validation set, achieving a CCC value of 0.29. | Link | Link | - | 0.14 | Audio+Images+Text |
| 12.2018 | A*STAR AI | G2_Predictions_T1 | G2: Text LSTM. Submission G2 is a text-only LSTM model. This was our best-performing model on average under leave-one-out cross-validation, with an averaged CCC value of 0.133. We chose to use cross-validation to see which combination of features may be most robust when predicting different stories. The predictions we submit are from a text-only model trained only on the train set; it achieved a CCC value of 0.183 on the validation set. | Link | Link | - | 0.11 | Audio+Images+Time |
| 12.2018 | A*STAR AI | G3_Predictions_ATV | G3: Audio+Text+Visual multimodal LSTM with local attention. Submission G3 is a multimodal LSTM model using audio, text, and visual modalities, with local attention applied to the past 3 seconds. Audio features were extracted with OpenSMILE, text features were GloVe word embeddings averaged over 1-second chunks, and visual features were VGG facial features extracted for each subject (but not the actor). Of our multimodal models, it had the best cross-validated CCC score, 0.109. On the original validation set (Story 1), it had a CCC score of 0.228. | Link | Link | - | 0.07 | Audio+Images+Text |
| 12.2018 | A*STAR AI | P1_Predictions_AT | P1: Audio+Text multimodal LSTM, fine-tuned for each subject. Submission P1 contains predictions from a set of audio+text multimodal LSTM models fine-tuned from model G1. For each subject, we fine-tuned model G1 by training for 250 epochs on that subject's videos, using early stopping to select the epoch with the highest CCC score. Averaged across subjects, the personalized CCC score on the validation set (Story 1) is 0.323. | Link | Link | 0.14 | - | Audio+Images+Text |
| 12.2018 | A*STAR AI | P2_Predictions_T1 | P2: Text LSTM, fine-tuned for each subject. Submission P2 contains predictions from a set of text LSTM models fine-tuned from model G2. For each subject, we fine-tuned model G2 by training for 200 epochs on that subject's videos. Averaged across subjects, the personalized CCC score on the validation set (Story 1) is 0.211. | Link | Link | 0.07 | - | Audio+Images+Text |
| 12.2018 | A*STAR AI | P3_Predictions_ATV | P3: Audio+Text+Visual multimodal LSTM with local attention, fine-tuned for each subject. Submission P3 contains predictions from a set of audio+text+visual LSTM models fine-tuned from model G3. For each subject, we fine-tuned model G3 by training for 250 epochs on that subject's videos, using early stopping to select the epoch with the highest CCC score. Averaged across subjects, the personalized CCC score on the validation set (Story 1) is 0.284. | Link | Link | 0.07 | - | Audio+Images+Text |
| 12.2018 | USF Affective Vision | Submission1 | - | Link | Link | 0.00 | 0.00 | - |
| 12.2018 | Affective Bulls | CNN_RF_Fusion | - | Link | Link | - | 0.03 | Audio+Images |
| 12.2018 | Affective Bulls | RF_Land_Sub_Act | - | Link | Link | - | 0.04 | Audio+Images |
| 12.2018 | Affective Bulls | SubjectActorImages | - | Link | Link | - | -0.03 | Audio+Images |
| 12.2018 | Affective Bulls | SubjectImages | - | Link | Link | 0.02 | - | Audio+Images |
| 12.2018 | Baseline | Baseline | Barros, P., Barakova, E., & Wermter, S. (2018). A Deep Neural Model of Emotion Appraisal. arXiv preprint arXiv:1808.00252. Trained on the OMG-Emotion Recognition dataset. | Link | Link | 0.06 | 0.06 | Audio+Images |
| 12.2018 | Rosie | SVM | For submission 1, valence values are predicted mostly by SVMs trained on two features (one visual, one semantic). | Link | Link | 0.08 | 0.08 | Audio+Images+Semantic |
| 12.2018 | Rosie | NeuralNet | For submission 2, valence values are predicted by a neural network trained on five features (verbal and non-verbal). | Link | Link | 0.07 | 0.07 | Audio+Images+Semantic |
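
The track scores in the table are Concordance Correlation Coefficient (CCC) values between the predicted and gold-standard valence traces, the same metric the A*STAR AI entries report on their validation splits. A minimal sketch of the metric (the helper name and the toy traces below are ours, not part of any submission):

```python
import numpy as np

def ccc(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Concordance Correlation Coefficient between two 1-D valence traces."""
    mean_t, mean_p = y_true.mean(), y_pred.mean()
    var_t, var_p = y_true.var(), y_pred.var()
    cov = ((y_true - mean_t) * (y_pred - mean_p)).mean()
    return float(2 * cov / (var_t + var_p + (mean_t - mean_p) ** 2))

# Toy check: a prediction that tracks the gold trace but is scaled and shifted
gold = np.sin(np.linspace(0, 10, 500))
pred = 0.8 * gold + 0.1
print(f"CCC = {ccc(gold, pred):.3f}")  # below 1.0: CCC penalizes scale/offset error
```

Unlike plain Pearson correlation, CCC also penalizes disagreement in mean and variance, which suits continuous valence prediction.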
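Several entries (the EIHW BLSTMs, the A*STAR AI LSTMs) are recurrent regressors over frame-level multimodal features. The sketch below shows one plausible shape for such a model, assuming pre-extracted per-frame audio and visual feature vectors; the 40 hidden units follow the EIHW Submission1 description, while the feature dimensions, fusion by concatenation, and output head are our assumptions:

```python
import torch
import torch.nn as nn

class MultimodalBLSTM(nn.Module):
    """Bi-LSTM valence regressor over concatenated audio+visual frame features.

    The 40 hidden units follow the EIHW Submission1 description; the feature
    dimensions and all other choices here are illustrative assumptions.
    """
    def __init__(self, audio_dim: int = 88, visual_dim: int = 512, hidden: int = 40):
        super().__init__()
        self.blstm = nn.LSTM(audio_dim + visual_dim, hidden,
                             batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, 1)  # per-frame valence in [-1, 1]

    def forward(self, audio: torch.Tensor, visual: torch.Tensor) -> torch.Tensor:
        x = torch.cat([audio, visual], dim=-1)          # (batch, time, feat)
        out, _ = self.blstm(x)                          # (batch, time, 2*hidden)
        return torch.tanh(self.head(out)).squeeze(-1)   # (batch, time)

# Smoke test on random features: 2 clips, 100 frames each
model = MultimodalBLSTM()
valence = model(torch.randn(2, 100, 88), torch.randn(2, 100, 512))
print(valence.shape)  # torch.Size([2, 100])
```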
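The A*STAR AI personalized submissions (P1, P3) fine-tune the corresponding generalized model on each subject's videos and keep the epoch with the highest validation CCC. A sketch of that loop, reusing `ccc` and `MultimodalBLSTM` from the blocks above; the Adam optimizer, learning rate, and MSE loss are our assumptions, while the 250-epoch budget and CCC-based early stopping follow the P1 description:

```python
import copy
import torch

def finetune_for_subject(model, train_batches, val_batch, epochs=250, lr=1e-4):
    """Fine-tune a copy of a generalized model on one subject's videos,
    keeping the epoch weights with the highest validation CCC."""
    model = copy.deepcopy(model)  # leave the generalized model intact
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    best_ccc, best_state = -1.0, copy.deepcopy(model.state_dict())
    for _ in range(epochs):
        model.train()
        for audio, visual, target in train_batches:
            opt.zero_grad()
            loss = torch.mean((model(audio, visual) - target) ** 2)  # MSE (assumed)
            loss.backward()
            opt.step()
        model.eval()
        with torch.no_grad():
            audio, visual, target = val_batch
            pred = model(audio, visual).flatten().numpy()
            score = ccc(target.flatten().numpy(), pred)
        if score > best_ccc:  # early stopping on validation CCC
            best_ccc, best_state = score, copy.deepcopy(model.state_dict())
    model.load_state_dict(best_state)
    return model, best_ccc
```

Deep-copying the generalized model means each subject starts from the same shared weights, matching the described setup of one personalized model per subject.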