This article in New Scientist on Project Natal got me thinking about the pros and cons of monitoring overt expression of psychological states via sophisticated cameras and covert expression via psychophysiology. The great thing about the depth-sensing cameras (summarised nicely by one commentator in the article as like having a Wii controller attached to each foot, each hand and your head) is that: (1) it’s wireless technology, (2) interactions are naturalistic, and (3) it’s potentially robust (provided nobody else walks into the camera view). Also, because it captures overt expression of body position/posture or changes in facial expression/voice tone (the latter being mooted as a phase two development), it measures those signs and signals that people are usually happy to share with their fellow humans – so the feel of the interaction should be as naturalistic as regular discourse.
So why bother monitoring psychophysiology in real time to represent the user? Let’s face it – there are big question marks over its reliability, it’s largely unproven in the field and it normally involves attaching wires to the person, even if those sensors are wearable.
But to view this as a face-off between the two approaches in terms of sensor technology is to miss the point. The purpose of depth cameras is to give computer technology a set of eyes and ears to perceive & respond to overt visual or vocal cues from the user, whilst psychophysiological methods have been developed to capture covert changes that remain invisible to the eye. For example, a camera system may detect a frown in response to an annoying email, whereas a facial EMG recording will often detect increased activity from the corrugator or frontalis (i.e. the frown muscles) even in the absence of any visible change on the person’s face.
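To make that distinction concrete, here’s a rough Python sketch of how the two channels report on the same event – the numbers, function names and threshold are purely illustrative, not a real implementation. The camera can only flag an expression it can actually see, whereas an EMG channel can flag raised corrugator amplitude relative to baseline even when the face stays neutral.

```python
import numpy as np

def corrugator_activation(emg_signal, baseline_rms, threshold_ratio=1.5):
    """Flag covert frown-muscle activity from a corrugator EMG trace (microvolts).

    baseline_rms is the RMS amplitude recorded during a resting baseline;
    threshold_ratio is an illustrative cut-off, not an empirically validated one.
    """
    rms = np.sqrt(np.mean(np.square(emg_signal)))   # overall amplitude of the trace
    return rms > threshold_ratio * baseline_rms     # can be True with no visible frown

def camera_detects_frown(expression_label):
    """A camera-based classifier only reports what it can actually see on the face."""
    return expression_label == "frown"              # neutral face => nothing to report
```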
One approach is geared towards the detection of visible cues whereas the physiological computing approach is concerned with invisible changes in brain activity, muscle tension and autonomic activity. That last sentence makes the physiological approach sound superior, doesn’t it? But the truth is that both approaches do different things, and the question of which one is best depends largely on what kind of system you’re trying to build. For example, if I’m building an application to detect high levels of frustration in response to shoot-em-up gameplay, overt behavioural cues (facial expression, vocal changes, postural changes) will probably suffice to detect that extreme state. On the other hand, if my system needed to resolve low vs. medium vs. high vs. critical levels of frustration, I’d have more confidence in psychophysiological measures to provide the necessary level of fidelity – the sketch below shows the kind of graded mapping I have in mind.
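The feature names, weights and cut-offs in this sketch are purely illustrative (a real system would calibrate them per user), but they show how a handful of physiological measures could be combined into a graded frustration level that overt cues alone would struggle to support.

```python
def clamp01(x):
    """Clamp a value to the 0-1 range."""
    return max(0.0, min(x, 1.0))

def frustration_level(scr_rate, heart_rate_delta, corrugator_ratio):
    """Map hypothetical physiological features onto a graded frustration scale.

    scr_rate: skin conductance responses per minute.
    heart_rate_delta: beats per minute above the user's resting baseline.
    corrugator_ratio: corrugator EMG amplitude relative to baseline.
    """
    # Crude composite: normalise each feature to a rough 0-1 range and average.
    score = (clamp01(scr_rate / 10.0) +
             clamp01(heart_rate_delta / 20.0) +
             clamp01((corrugator_ratio - 1.0) / 2.0)) / 3.0

    if score < 0.25:
        return "low"
    elif score < 0.5:
        return "medium"
    elif score < 0.75:
        return "high"
    return "critical"
```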
Of course the two approaches aren’t mutually exclusive, and it’s easy to imagine naturalistic input control going hand-in-hand with real-time system adaptation based on psychophysiological measures.
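As a back-of-the-envelope sketch of what that pairing might look like (the camera, biosensors and game objects and their methods are hypothetical stand-ins, and frustration_level is the toy function from the previous sketch): gestures drive the moment-to-moment input while the physiological channel quietly steers adaptation in the background.

```python
def hybrid_loop(camera, biosensors, game):
    """Naturalistic input control plus physiology-driven adaptation in one loop."""
    while game.running:
        # Overt channel: depth-camera gestures as the primary input device.
        game.apply_input(camera.read_gesture())

        # Covert channel: adapt the game in real time to the inferred user state.
        level = frustration_level(*biosensors.read_features())
        if level in ("high", "critical"):
            game.reduce_difficulty()
        elif level == "low":
            game.increase_challenge()
```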
But that’s the next step – Project Natal and similar systems will allow us to interact using naturalistic gestures and, to an extent, to construct a representation of user state based on overt behavioural cues. It’s logical (sort of) that we begin on this road by extending the awareness of a computer system in a way that mimics our own perceptual apparatus. If we supplement that technology by granting the system access to subtle, covert changes in physiology, who knows what technical possibilities will open up?