Results 2018
Results of the 2018 OMG-Empathy Prediction Challenge
Date | Team | Submission | Description | Repository | Paper | Personalized Track | Generalized Track | Modality |
---|---|---|---|---|---|---|---|---|
12.2018 | Alpha-City | Manual | - | Link | Link | 0.17 | 0.17 | Audio+Images+Text |
12.2018 | Alpha-City | Filters | - | Link | Link | 0.12 | 0.12 | Audio+Images+Text |
12.2018 | Alpha-City | KNN | - | Link | Link | 0.03 | 0.03 | Audio+Images+Text |
12.2018 | EIHW | Submission1 | Generalized model trained with multimodal data using a BLSTM with 40 cell units. | Link | Link | - | 0.05 | Audio+Images |
12.2018 | EIHW | Filters | Generalized model trained with multimodal data using a BLSTM with 50 cell units. | Link | Link | - | 0.06 | Audio+Images |
12.2018 | EIHW | KNN | Personalized models trained with multimodal data using a BLSTM with 50 cell units. | Link | Link | - | 0.11 | Audio+Images |
12.2018 | USTC-AC | Result1 | The first result is predicted by our model trained on stories 2 and 8. | Link | Link | 0.14 | 0.14 | Audio+Images+Time |
12.2018 | USTC-AC | Result2 | The second and third results are predicted by our model trained on stories 1, 2, and 8. | Link | Link | 0.11 | 0.11 | Audio+Images+Time |
12.2018 | USTC-AC | Result3 | The second and third results are predicted by our model trained on stories 1, 2, and 8. | Link | Link | 0.13 | 0.13 | Audio+Images+Time |
12.2018 | A*STAR AI | G1_Predictions_AT | G1: Audio+Text multimodal LSTM with local attention. Submission G1 is a multimodal LSTM model using audio and text modalities, with local attention applied to the past 3 seconds. Audio features were extracted with OpenSMILE, and text features were GloVe word embeddings averaged over 1-second chunks. It was our best-performing model when Story 1 was used as the validation set, achieving a CCC value of 0.29. | Link | Link | - | 0.14 | Audio+Images+Text |
12.2018 | A*STAR AI | G2_Predictions_T1 | G2: Text LSTM. Submission G2 is a text-only LSTM model. It was our best-performing model on average under leave-one-out cross-validation, with an average CCC value of 0.133. We chose cross-validation to see which combination of features would be most robust when predicting different stories. The submitted predictions come from a text-only model trained only on the train set, which achieved a CCC value of 0.183 on the validation set. | Link | Link | - | 0.11 | Audio+Images+Text |
12.2018 | A*STAR AI | G3_Predictions_ATV | G3: Audio+Text+Visual multimodal LSTM with local attention. Submission G3 is a multimodal LSTM model using audio, text, and visual modalities, with local attention applied to the past 3 seconds. Audio features were extracted with OpenSMILE, text features were GloVe word embeddings averaged over 1-second chunks, and visual features were VGG facial features extracted for each subject (but not the actor). Among our multimodal models, it had the best cross-validated CCC score of 0.109. On the original validation set (Story 1), it had a CCC score of 0.228. | Link | Link | - | 0.07 | Audio+Images+Text |
12.2018 | A*STAR AI | P1_Predictions_AT | P1: Audio+Text multimodal LSTM, fine-tuned for each subject. Submission P1 contains predictions from a set of audio+text multimodal LSTM models fine-tuned from model G1. For each subject, we fine-tuned model G1 by training for 250 epochs on that subject's videos, using early stopping to select the epoch with the highest CCC score. Averaged across subjects, the personalized CCC score on the validation set (Story 1) is 0.323. | Link | Link | 0.14 | - | Audio+Images+Text |
12.2018 | A*STAR AI | P2_Predictions_T1 | P2: Text LSTM, fine-tuned for each subject. Submission P2 contains predictions from a set of text LSTM models fine-tuned from model G2. For each subject, we fine-tuned model G2 by training for 200 epochs on that subject's videos. Averaged across subjects, the personalized CCC score on the validation set (Story 1) is 0.211. | Link | Link | 0.07 | - | Audio+Images+Text |
12.2018 | A*STAR AI | P3_Predictions_ATV | P3: Audio+Text+Visual multimodal LSTM, fine-tuned for each subject. Submission P3 contains predictions from a set of audio+text+visual LSTM models fine-tuned from model G3. For each subject, we fine-tuned model G3 by training for 250 epochs on that subject's videos, using early stopping to select the epoch with the highest CCC score. Averaged across subjects, the personalized CCC score on the validation set (Story 1) is 0.284. | Link | Link | 0.07 | - | Audio+Images+Text |
12.2018 | USF Affective Vision | Submission1 | - | Link | Link | 0.00 | 0.00 | - |
12.2018 | Affective Bulls | CNN_RF_Fusion | - | Link | Link | - | 0.03 | Audio+Images |
12.2018 | Affective Bulls | RF_Land_Sub_Act | - | Link | Link | - | 0.04 | Audio+Images |
12.2018 | Affective Bulls | SubjectActorImages | - | Link | Link | - | -0.03 | Audio+Images |
12.2018 | Affective Bulls | SubjectImages | - | Link | Link | 0.02 | - | Audio+Images |
12.2018 | Baseline | Baseline | Barros, P., Barakova, E., & Wermter, S. (2018). A Deep Neural Model of Emotion Appraisal. arXiv preprint arXiv:1808.00252. Trained on the OMG-Emotion Recognition dataset. | Link | Link | 0.06 | 0.06 | Audio+Images |
12.2018 | Rosie | SVM | For submission 1, valence values are predicted mostly by SVMs trained on two features (one visual, one semantic). | Link | Link | 0.08 | 0.08 | Audio+Images+Semantic |
12.2018 | Rosie | NeuralNet | For submission 2, we used neural networks trained on five features (verbal and non-verbal) to predict the valence values. | Link | Link | 0.07 | 0.07 | Audio+Images+Semantic |
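
The Personalized Track and Generalized Track columns above report the Concordance Correlation Coefficient (CCC) between the predicted and the self-annotated valence traces, which is also the metric the teams quote in their submission descriptions. For readers who want to score their own prediction files the same way, here is a minimal NumPy sketch of CCC (population statistics; the function name and the toy traces are illustrative, not taken from any team's code):

```python
import numpy as np

def ccc(gold: np.ndarray, pred: np.ndarray) -> float:
    """Concordance Correlation Coefficient between two 1-D valence traces."""
    gold = np.asarray(gold, dtype=float)
    pred = np.asarray(pred, dtype=float)
    mean_g, mean_p = gold.mean(), pred.mean()
    var_g, var_p = gold.var(), pred.var()               # population variances
    cov = ((gold - mean_g) * (pred - mean_p)).mean()    # population covariance
    return 2.0 * cov / (var_g + var_p + (mean_g - mean_p) ** 2)

# Example: a prediction that tracks the gold trace but is scaled and shifted
t = np.linspace(0.0, 10.0, 500)
gold = 0.5 * np.sin(t)                  # gold valence in [-0.5, 0.5]
pred = 0.3 * np.sin(t) + 0.1            # attenuated, offset prediction
print(f"CCC = {ccc(gold, pred):.3f}")   # below 1.0: scale and offset errors are penalized
```

Unlike plain Pearson correlation, CCC also penalizes differences in scale and offset, which is why the toy prediction above scores below 1.0 even though it is perfectly correlated with the gold trace.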
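
Several personalized-track entries (for example A*STAR AI's P1-P3) describe the same recipe: take a generalized model and fine-tune it separately on each subject's videos for a fixed number of epochs, keeping the epoch with the best validation CCC. The sketch below illustrates that loop in PyTorch under stated assumptions: `model`, `train_loader`, `val_features`, `val_gold`, and the reuse of the `ccc` function above are illustrative names, not the teams' actual code.

```python
import copy
import torch

def finetune_for_subject(model, train_loader, val_features, val_gold,
                         epochs=250, lr=1e-4):
    """Fine-tune a copy of a generalized valence model on one subject's videos,
    early-stopping on the validation CCC (see the `ccc` sketch above)."""
    model = copy.deepcopy(model)        # each subject starts from the same generalized weights
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = torch.nn.MSELoss()
    best_ccc, best_state = float("-inf"), copy.deepcopy(model.state_dict())

    for _ in range(epochs):
        model.train()
        for features, valence in train_loader:           # this subject's videos only
            optimizer.zero_grad()
            loss = loss_fn(model(features).squeeze(-1), valence)
            loss.backward()
            optimizer.step()

        model.eval()
        with torch.no_grad():                            # keep the best-CCC epoch
            pred = model(val_features).squeeze(-1)
            score = ccc(val_gold.numpy(), pred.numpy())
        if score > best_ccc:
            best_ccc, best_state = score, copy.deepcopy(model.state_dict())

    model.load_state_dict(best_state)
    return model, best_ccc
```

Deep-copying the generalized model means every subject starts from the same weights, mirroring how the P1-P3 predictions were derived from G1-G3.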