CINACS Lecture Series: Cross-modal Interaction in Natural and Artificial Cognitive Systems
Winter Semester 2010/2011
Christopher Habel, Wolfgang Menzel, Stefan Wermter, Jianwei Zhang
Mon. 14:15-15:45, weekly
Natural cognitive systems - like humans - profit from combining the input of the different sensory systems not only because each modality provides information about different aspects of the world but also because the different senses can jointly encode particular aspects of events, e.g. the location or meaning of an event. However, the gains of cross-modal integration come at a cost: since each modality uses very specific representations, information needs to be transferred into a code that allows the different senses to interact. Corresponding problems arise in human communication when information about one topic is expressed using combinations of different formats such as written or spoken language and graphics.
In this lecture, we will focus on models and methods suitable for realizing processes and representations for cross-modal interaction in artificial cognitive systems, i.e. computational systems. After introducing the core phenomena of cross-modal interaction, we illustrate its mono-modal basis and the current development of informatics-oriented research in this field with four topics:
- Cross-modal information fusion for a range of non-sensory, i.e. categorial, data in the area of speech and language processing, where visual stimuli have to be merged with the available acoustic evidence. Among the language-related information sources, lip reading certainly provides one of the major contributions of additional evidence, but more recently eyebrow movement and its relationship to suprasegmental features of human speech have attracted considerable attention as well.
- The interaction of representational modalities, such as language and maps, and their interdependence with sensory modalities, in particular vision, auditory perception and haptics. The computational analysis of multi-modal documents or dialogues is a prerequisite for advanced intelligent information systems as well as for human-computer interaction, in particular human-robot interaction. Furthermore, such computational devices can be used in systems that assist people with impairments, e.g. blind, visually impaired, or deaf people.
- Multimodal memory plays an important role for the next generation of mobile robots and service robots. Using grounded memories of robot actions, built from real-world visual, audio and tactile data collected by the robot, instead of relying solely on a sensorimotor controller enriches the robot's memory and thus increases the robustness of both the representations and the retrieval processes of autonomous agents.
- Neural architectures for multiple modalities. The brain plays the central role in all animal and human behavior. The integration of information from the various senses with cognitive processing in neural architectures is therefore particularly relevant. Examples of computational neural architectures are described, from spiking neural networks to supervised and self-organizing artificial neural networks based on midbrain and cortical brain areas. The focus will be on the auditory and visual modalities, illustrated by some examples of robotic behavior.