Research on Multi-Modal Question Answering Emotion Recognition Method Based on User Preference

Songyu Ji

Authors

Songyu Ji, School of Management, Hefei University of Technology, Hefei 230009, China

Abstract

Current multimodal sentiment recognition methods fall short in handling dynamic changes in modal weights and in modeling modal consistency. Specifically, the MELD dataset has not been subjected to multi-round structured processing and feature optimization, and the Word2Vec similarity-based sentiment lexicon expansion strategy remains weak in semantic consistency and emotional accuracy. In addition, the original experimental setup trained the model with cross-entropy loss alone, overlooking the uncertainty and inconsistency of information fusion across modalities. This work therefore proposes a multimodal question-answering sentiment recognition method based on user preferences. By introducing a multimodal attention mechanism guided by sentiment preferences and sentiment prototypes in a three-dimensional sentiment representation space (Valence-Arousal-Dominance, VAD), the method strengthens multimodal information fusion and the modeling of modal consistency. A sentiment lexicon expansion strategy and a context-dependent modeling mechanism are also designed to improve the accuracy and stability of dialogue sentiment recognition. Systematic ablation and comparative experiments on MELD, the standard multimodal dialogue sentiment dataset, show that the proposed method outperforms existing representative models in accuracy, precision, and F1 score, which validates its effectiveness and its potential for multimodal question-answering sentiment recognition tasks.
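The abstract does not give the exact form of the preference-guided attention or of the VAD prototypes, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes each modality has already produced a VAD estimate, scores each modality by its dot product with a hypothetical user preference vector, turns the scores into softmax attention weights, and assigns the fused point to the nearest prototype. The names `PROTOTYPES`, `fuse`, and `nearest_prototype`, and all numeric values, are illustrative and are not taken from the paper.

```python
import math

# Hypothetical VAD prototypes (valence, arousal, dominance) in [-1, 1].
# The values are illustrative assumptions, not figures from the paper.
PROTOTYPES = {
    "joy": (0.8, 0.5, 0.4),
    "anger": (-0.6, 0.7, 0.3),
    "sadness": (-0.7, -0.4, -0.5),
    "neutral": (0.0, 0.0, 0.0),
}


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(modal_vads, preference):
    """Fuse per-modality VAD estimates with preference-guided weights.

    Each modality is scored by its dot product with the preference
    vector (an assumed scoring rule); softmax turns scores into weights.
    """
    names = sorted(modal_vads)
    scores = [
        sum(a * b for a, b in zip(modal_vads[n], preference)) for n in names
    ]
    weights = softmax(scores)
    fused = tuple(
        sum(w * modal_vads[n][i] for w, n in zip(weights, names))
        for i in range(3)
    )
    return fused, dict(zip(names, weights))


def nearest_prototype(vad):
    """Return the label whose prototype is closest in Euclidean distance."""
    return min(PROTOTYPES, key=lambda label: math.dist(vad, PROTOTYPES[label]))


# Toy per-modality VAD estimates for one utterance (made-up numbers).
modal_vads = {
    "text": (0.7, 0.4, 0.3),
    "audio": (0.5, 0.6, 0.2),
    "video": (-0.1, 0.1, 0.0),
}
preference = (0.8, 0.5, 0.4)  # hypothetical user sentiment preference

fused, weights = fuse(modal_vads, preference)
label = nearest_prototype(fused)
print(weights, fused, label)
```

In this toy setup the modality that agrees most with the preference vector receives the largest weight, so a noisy modality pulls the fused estimate less. A trained model would learn the scoring function rather than use a fixed dot product.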
