This post represents some thoughts on the use of psychophysiology to evaluate the player experience during a computer game. As such, it’s tangential to the main business of this blog, but it’s a topic that I think is worth some discussion and debate, as it raises a whole bunch of pertinent issues for the design of physiological computer games.
Psychophysiological methods are combined with computer games in two types of context: applied psychology research and game evaluation in a commercial setting. With respect to the former, a researcher may use a computer game as a platform to study a psychological concept, such as the effects of game play on aggression, or how playing against a friend rather than a stranger influences the experience of the player (see this recent issue of Entertainment Computing for examples). In both cases, we’re dealing with the application of an experimental psychology methodology to an issue where the game merely serves as a task, virtual world or context within which to study human behaviour. This approach is characterised by several features: (1) comparisons are made between carefully controlled conditions, (2) statistical power is important (if you want to see your work published), so large numbers of participants are run through the design, (3) the selection of participants is carefully controlled (equal numbers of males and females, comparable age ranges if groups are compared) and (4) designs are counterbalanced, i.e. if participants play two different games, half of them play game 1 then game 2 whilst the other half play game 2 and then game 1; this is important because the order in which games are presented often influences the response of the participants.
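As a toy illustration of point (4), order assignment in a counterbalanced design can be sketched in a few lines of Python. This is a hypothetical helper of my own, not code from any study mentioned here:

```python
import itertools

def counterbalance(participants, conditions):
    """Assign each participant a condition order, cycling through all
    permutations so that orders are balanced across the sample."""
    orders = list(itertools.permutations(conditions))
    return {p: orders[i % len(orders)]
            for i, p in enumerate(participants)}

assignment = counterbalance(range(1, 5), ["game 1", "game 2"])
# Participants 1 and 3 play game 1 first; 2 and 4 play game 2 first.
```

With more than two conditions, a full permutation cycle grows quickly (n! orders), which is why larger designs often fall back on Latin squares rather than exhaustive counterbalancing.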
It has been said that every cloud has a silver lining and the only positive from chronic jet lag (Kiel and I arrived in Vancouver yesterday for the CHI workshop) is that it does give you a chance to catch up with overdue tasks. This is a post I’d been meaning to write for several weeks about my involvement in the REFLECT project.
For the last three years, our group at LJMU has been working on a collaborative project called REFLECT, funded by the EU Commission under the Future and Emerging Technologies initiative. This project centred on the concept of “reflective software” that responds implicitly, and in real time, to changes in user needs. A variety of physiological sensors are applied to the user in order to inform this kind of reflective adaptation. So far, this is regular fare for anyone who’s read this blog before, being a standard set-up for a biocybernetic adaptation system.
First of all, an apology – Kiel and I try to keep this blog ticking over, but for most of 2011, we’ve been preoccupied with a couple of large projects and getting things organised for the CHI workshop in May. One of the “things” that led to this hiatus on the blog is a new research project funded by the EU called ARtSENSE, which is the topic of this post.
This is a short post to inform regular readers that I’ve made some changes to the FAQ document for the site (link to the left). Normally people alter the FAQ because the types of popular questions have changed. In our case, it is my answers to those questions that have changed in the time since I wrote my original responses – hence the need to revise the FAQ.
The original document firmly identified physiological computing with affective computing/biocybernetic adaptation. There was even a question drawing a firm division between BCI technology and physiological computing. In the revised FAQ, I’ve dropped this distinction and attempted to view BCI as part of a broad continuum of computing devices that rely on real-time physiological data for input. This change has not been made to arrogantly subsume BCI within the physiological computing spectrum, but to reconcile perspectives from different research communities working on common measures and technologies across different application domains. In my opinion, the distinctions between research topics and application domains (including my own) are largely artificial, and the advancement of this technology is best served by keeping an open mind about mash-ups and hybrid systems.
I’ve also expanded the list of indicative references to include contributions from BCI, telemedicine and adaptive automation in order to highlight the breadth of applications that are united by physiological data input.
The FAQ is written to support the naive reader, who may have stumbled across our site, but as ever, I welcome any comments or additional questions from domain experts.
The Emotiv system is an EEG headset designed for the development of brain-computer interfaces. It uses 12 dry electrodes (i.e. no gel necessary), communicates wirelessly with a PC and comes with a range of development software for creating applications and interfaces. If you watch this 10-minute video from TEDGlobal, you get a good overview of how the system works.
First of all, a caveat: I haven’t had any hands-on experience with the Emotiv headset, and these observations are based on what I’ve seen and read online. But the TED talk prompted a number of technical questions that I’ve been unable to resolve in the absence of working directly with the system.
I recently read a paper by Rosalind Picard entitled “emotion research for the people, by the people.” In this article, Prof. Picard has some fun contrasting engineering and psychological perspectives on the measurement of emotion. Perhaps I’m being defensive, but she seemed to have more fun poking at the psychologists than the engineers. The central impasse she identifies goes something like this: engineers have developed sensor apparatus that can deliver a whole range of objective data, whilst psychologists have decades of experience with theoretical concepts related to emotion, so why haven’t people really benefited from their union through the field of affective computing? Prof. Picard correctly identifies a reluctance on the part of psychologists to define concepts with sufficient precision to aid the work of the engineers. What I felt was glossed over in the paper was the other side of the problem, namely the willingness of engineers to attach emotional labels to almost any piece of psychophysiological data, usually in the context of badly designed experiments (apologies to any engineers reading this, but I wanted to add a little balance to the debate).
I just watched a TEDMED talk about the iBrain device via this link on the excellent Medgadget resource. The iBrain is a single-channel EEG device that records via ‘dry’ electrodes and stores the data on a conventional handheld device such as a cellphone. In my opinion, the clever part of this technology is the application of mathematics to wring detailed information out of a limited data set – it’s a very efficient strategy.
The hardware looks to be fairly standard – a wireless EEG link to a mobile device. But its simplicity provides an indication of where this kind of physiological computing application could be going in the future – mobile monitoring for early detection of medical problems piggy-backing onto conventional technology. If physiological computing applications become widespread, this kind of proactive medical monitoring could become standard. And the main barrier to that is non-intrusive, non-medicalised sensor development.
In the meantime, NeuroVigil, the company behind the product, recently announced a partnership with the Swiss pharmaceutical giant Roche, who want to apply this technology to clinical drug trials. I guess the methodology encourages the drug companies to consider covert changes in physiology as a sensitive marker of drug efficacy or side-effects.
I like the simplicity of the iBrain (one channel of EEG), but the speaker makes some big claims for the analysis; the implicit ones concern the potential of EEG to identify neuropathologies. That may be possible, but I’m sceptical about whether one channel is sufficient. The company has obviously applied its pared-down analysis to sleep stages with some success, but I was left wondering what added value the device provides compared to less intrusive movement sensors used to analyse sleep behaviour, e.g. the Actiwatch.
Like a lot of people, I came to the area of physiological computing via affective computing. The early work I read placed enormous emphasis on how systems may distinguish different categories of emotion, e.g. frustration vs. happiness. This is important for some applications, but most of all I was interested in user states that related to task performance, specifically those states that might precede and predict a breakdown of performance. The latter can take several forms: the quality of performance can collapse because the task is too complex to figure out, or because you’re too tired or too drunk, etc. What really interested me was how performance collapsed when people simply gave up, or ‘exhibited insufficient motivation’ as the psychology textbooks would say.
People can give up for all kinds of reasons – they may be insufficiently challenged (i.e. bored), they may be frustrated because the task is too hard, they may simply have something better to do. The prediction of motivation or task engagement seems very important to me for biocybernetic adaptation applications, such as games and educational software. Several psychology research groups have looked at this issue by studying psychophysiological changes accompanying changes in motivation and responses to increased task demand. A group led by Alan Gevins performed a number of studies where they incrementally ramped up task demand; they found that theta activity in the EEG increased in line with task demands. They noted this increase was specific to the frontal-central area of the brain.
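Frontal theta is typically quantified as spectral power in roughly the 4–7 Hz band of the EEG. As a rough illustration of what that means (this is not Gevins’ actual pipeline, which would involve proper spectral estimation, artifact rejection and electrode-site selection), band power can be computed from a sampled signal with a naive DFT:

```python
import math, cmath

def band_power(samples, fs, f_lo=4.0, f_hi=7.0):
    """Power of a real signal in the [f_lo, f_hi] Hz band via a naive DFT.
    Illustrative only; a real pipeline would use Welch's method
    (e.g. scipy.signal.welch) on artifact-cleaned epochs."""
    n = len(samples)
    mean = sum(samples) / n
    x = [s - mean for s in samples]          # remove DC offset
    power = 0.0
    for k in range(1, n // 2):
        f = k * fs / n                       # frequency of bin k
        if f_lo <= f <= f_hi:
            coeff = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n)
                        for t in range(n))
            power += (abs(coeff) ** 2) / n   # accumulate theta-bin power
    return power

# Synthetic check: a 6 Hz 'theta' sine should carry far more band power
# than a 20 Hz 'beta' sine of equal amplitude.
fs, n = 128, 256
theta = [math.sin(2 * math.pi * 6 * t / fs) for t in range(n)]
beta = [math.sin(2 * math.pi * 20 * t / fs) for t in range(n)]
print(band_power(theta, fs) > band_power(beta, fs))  # → True
```

In a study like those described above, this quantity would be computed per epoch at frontal-central electrode sites and compared across levels of task demand.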
We partially replicated one of Gevins’ studies last year and found support for changes in frontal theta. We tried to make the task very difficult so people would give up but were not completely successful (when you pay people to come to your lab, they tend to try really hard). So we did a second study, this time making the ‘impossible’ version of the task really impossible. The idea was to expose people to low, high and extremely high levels of memory load. In order to make the task impossible, we also demanded participants hit a minimum level of performance, which was modest for the low demand condition and insanely high for the extremely high demand task. We also had our participants do each task on two occasions; once with the chance to win cash incentives and once without.
The results for the frontal theta are shown in the graphic below. You can clearly see the frontal-central location of the activity (nb: the more red the area, the more theta activity was present). What’s particularly interesting and especially clear in the incentive condition (top row of graphic) is that our participants reduced theta activity when they thought they didn’t have a chance. As one might suspect, task engagement includes a strong component of volition and brain activity should reflect the decision to give up and disengage from the task. We’ll be following up this work to investigate how we might use the ebb and flow of frontal theta to capture and integrate task engagement into a real-time system.
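As a loose sketch of how the ebb and flow of frontal theta might feed a real-time system, a monitor could flag epochs where theta power drops well below its recent running average. Everything here (window size, threshold ratio) is hypothetical and would need per-user calibration; this is not our actual implementation:

```python
from collections import deque

def engagement_monitor(theta_stream, window=10, drop_ratio=0.7):
    """Flag epochs where frontal theta power falls below drop_ratio of
    the running-window mean -- a crude proxy for task disengagement."""
    baseline = deque(maxlen=window)
    flags = []
    for power in theta_stream:
        if len(baseline) == window and power < drop_ratio * (sum(baseline) / window):
            flags.append(True)    # possible give-up / disengagement
        else:
            flags.append(False)
        baseline.append(power)
    return flags

# Steady theta followed by a sharp drop should raise flags at the drop.
stream = [1.0] * 12 + [0.4, 0.4]
print(engagement_monitor(stream)[-2:])  # → [True, True]
```

A biocybernetic loop would then use such flags as a trigger for adaptation, e.g. lowering game difficulty or offering help when disengagement is detected.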
I attended a workshop earlier this year entitled aBCI (affective Brain Computer Interfaces) as part of the ACII conference in Amsterdam. In the evening we discussed what we should call this area of research on systems that use real-time psychophysiology as an input to a computing system. I’ve always called it ‘Physiological Computing’ but some thought this label was too vague and generic (which is a fair criticism). Others were in favour of something that involved BCI in the title – such as Thorsten Zander‘s definitions of passive vs. active BCI.
As the debate went on, it seemed that what we were discussing was an exercise in ‘branding’ as opposed to literal definition. There’s nothing wrong with that; it’s important that nascent areas of investigation represent themselves in a way that is attractive to potential sponsors. However, I have three main objections to the BCI label as an umbrella term for this research: (1) BCI research is identified with EEG measures, (2) BCI remains a highly specialised domain with the vast majority of research conducted on clinical groups and (3) BCI is associated with the use of psychophysiology as a substitute for input control devices. In other words, BCI isn’t sufficiently generic to cope with: autonomic measures, real-time adaptation, muscle interfaces, health monitoring etc.
My favoured term is vague and generic, but it is very inclusive. In my opinion, the primary obstacle facing the development of these systems is the fractured nature of the research area. Research on these systems is multidisciplinary, involving computer science, psychology and engineering. A number of different system concepts are out there, such as BCI vs. concepts from affective computing. Some are intended to function as alternative forms of input control, others are designed to detect discrete psychological states. Others use autonomic variables as opposed to EEG measures, and some try to combine psychophysiology with overt changes in behaviour. This diversity makes the area fun to work in but also makes it difficult to pin down. At this early stage, there’s an awful lot going on, and I think we need a generic label both to fully exploit synergies and, most importantly, to make sure nothing gets ruled out.
Just read a very interesting and provocative paper entitled “How emotion is made and measured” by Kirsten Boehner and colleagues. The paper provides a counter-argument to the perspective that emotion should be measured/quantified/objectified in HCI and used as part of an input to an affective computing system or evaluation methodology. Instead they propose that emotion is a dynamic interaction that is socially constructed and culturally mediated. In other words, the experience of anger is not a score of 7 on a 10-point scale that is fixed in time, but an unfolding iterative process based upon beliefs, social norms, expectations etc.
This argument seems fine in theory (to me) but difficult in practice. I get the distinct impression the authors are addressing the way emotion may be captured as part of an HCI evaluation methodology. But they go on to question the empirical approach in affective computing. In this part of the paper, they choose their examples carefully. Specifically, they focus on the category of ‘mirroring’ (see earlier post) technology, wherein representations of affective states are conveyed to other humans via technology. The really interesting idea here is that emotional categories are not given by a machine intelligence (e.g. happy vs. sad vs. angry) but generated via an interactive process. For example, friends and colleagues provide the semantic categories used to classify the emotional state of the person. Or literal representations of facial expression (a web-cam shot, for instance) are provided alongside a text or email to give the receiver an emotional context that can be freely interpreted. This is a very interesting approach to how an affective computing system may provide feedback to its users. Furthermore, I think once affective computing systems are widely available, the interpretive element of the software may be adapted or adjusted via an interactive process of personalisation.
So, the system provides an affective diagnosis as a first step, which is refined and developed by the person – or even by others as time goes by. Much like the way Amazon makes a series of recommendations based on your buying patterns that you can edit and tweak (if you have the time).
My big problem with this paper was that a very interesting debate was framed as an either/or position. So, if you use psychophysiology to index emotion, you’re disregarding the experience of the individual by using objective conceptualisations of that state. If you use self-report scales to quantify emotion, you’re rationalising an unruly process by imposing a bespoke scheme of categorisation, etc. The perspective of the paper reminded me of the tiresome debate in psychology between objective/quantitative data and subjective/qualitative data over which method delivers “the truth.” I say ‘tiresome’ because I tend towards the perspectivist view that both approaches provide ‘windows’ on a phenomenon, each with its own advantages and disadvantages.
But it’s an interesting and provocative paper that gave me plenty to chew over.