The phrase “smart technology” has been around for a long time. We have smart phones and smart televisions with functional capability that is massively enhanced by internet connectivity. We also talk about smart homes that scale up into smart cities. This hybrid between technology and the built environment promotes connectivity but with an additional twist – smart spaces monitor activity within their confines for the purposes of intelligent adaptation: to switch off lighting and heating if a space is uninhabited, to direct music from room to room as the inhabitant wanders through the house.
If smart technology is equated with enhanced connectivity and functionality, do those things translate into an increase in machine intelligence? In his 2007 book ‘The Design of Future Things’, Donald Norman defined the ‘smartness’ of technology with respect to the way in which it interacted with the human user. Inspired by J.C.R. Licklider’s (1960) definition of man-computer symbiosis, he claimed that smart technology was characterised by a harmonious partnership between person and machine. Hence, the ‘smartness’ of technology is defined by the way in which it responds to the user and vice versa.
One prerequisite for a relationship between person and machine that is cooperative and compatible is to enhance the capacity of technology to monitor user behaviour. Like any good butler, the machine needs to increase its awareness and understanding of user behaviour and user needs. The knowledge gained via this process can subsequently be deployed to create intelligent forms of software adaptation, i.e. machine-initiated responses that are both timely and intuitive from a human perspective. This upgraded form of human-computer interaction is attractive to technology providers and their customers, but is it realistic and achievable and what practical obstacles must be overcome?
The obvious strategy for a budding machine intelligence is to take a longitudinal approach. In other words, monitor the behaviour of the individual over a period of time and use patterns derived from those data to predict future behaviour. This determinism underpins the recommendations that you receive from Amazon or the iTunes store. Alternatively, behaviour can be predicted via profiling. In this case, the individual is categorised with respect to demographics (age, gender, socio-economic status) associated with a large population – and their choices are predicted by generalising from the behaviour of a like-minded group. Both approaches rely upon ex-ante models where behaviour can be forecast in advance based upon known properties of the individual, which (and this is the important part) are assumed to be static.
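To make the contrast concrete, here is a minimal sketch of both strategies in Python. The function names, the profile tuples and the music choices are all invented for illustration, and real recommender systems are far more elaborate than a frequency count. Both functions treat the person as static, which is exactly the assumption at issue.

```python
from collections import Counter

def predict_longitudinal(history):
    """Predict the next choice as the individual's most frequent past choice."""
    if not history:
        return None
    return Counter(history).most_common(1)[0][0]

def predict_by_profile(profile, population):
    """Predict a choice from the most common choice among people sharing a profile."""
    choices = [p["choice"] for p in population if p["profile"] == profile]
    if not choices:
        return None
    return Counter(choices).most_common(1)[0][0]

# Hypothetical data: one person's listening history and a small population.
history = ["jazz", "jazz", "rock", "jazz"]
population = [
    {"profile": ("25-34", "urban"), "choice": "rock"},
    {"profile": ("25-34", "urban"), "choice": "rock"},
    {"profile": ("55-64", "rural"), "choice": "folk"},
]

print(predict_longitudinal(history))                       # jazz
print(predict_by_profile(("25-34", "urban"), population))  # rock
```

Neither function has any input for the here and now: nothing about the room, the time of day or the mood of the person can change the forecast.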
Unfortunately (or not), in the real world there is a multitude of dynamic, probabilistic variables capable of blowing huge holes in those deterministic models of user behaviour. People can deviate from established patterns of behaviour because the room happened to be hot, or perhaps they are hungry, or maybe they just got an email that made them really angry. Models of human behaviour and their associated predictions are inherently stochastic in the sense that established patterns of behaviour can be dramatically remoulded by the specifics of a certain place at a particular moment in time.
In order to be smart, technology must draw inferences about user behaviour from the available data that accurately represent a dynamic awareness of the here and now. One starting point might be to monitor what particular task(s) the user is engaged with at present: which apps are open? Are they composing email or writing text? Are they browsing the web? The same approach could be used to gauge productivity: how many key presses per minute during text composition, how long to complete a particular part of a game? But these measures can be deeply misleading. Just because a person isn’t pressing any keys does not mean they aren’t thinking or have disengaged from text composition.
Moving away from productivity at the interface, we can also derive behaviour from overt markers such as posture, gestures, eye movements, facial expression and vocalisation. This category yields information about the way in which people are working at the computer: are they happy or sad? Are they talking? Are they leaning forward in their seat to focus on the task? The measurement of user behaviour via techniques derived from psychophysiology and neuroscience represents a third approach, which differs from the other two in a number of significant ways. Unlike measures of productivity and overt behaviour, these measures do not require any explicit or observable manifestation of behaviour. Applied psychophysiology and neuroscience also tap mental activity that is covert and unconscious and may occur without any awareness on the part of the individual.
This kind of “under-the-hood” monitoring may sound like a great way to enhance machine awareness of behaviour. After all, these measures seem to pick up on everything, even stuff that doesn’t break into conscious awareness, but the sensitivity of this third category of measure is also its Achilles heel.
Psychophysiological and neuroscientific techniques are developed and derived in laboratory environments where the context of testing is both known and stable. By controlling variables in a rigorous way, we are able to make inferences about behaviour with a degree of confidence. But once these measures migrate from the lab to the real world, measures from brain and body are subject to a massive range of unknown influences. This problem isn’t unique to psychophysiology; the same is true, albeit to a lesser extent, of overt markers such as posture, gaze and facial expression. One user may lean forward because his chair is uncomfortable; another may smile because she just thought about something funny that happened over the weekend.
One potential resolution to this problem is to capture data simultaneously from all three categories of measure and look for points of correspondence. The logic of this multidimensional approach is based upon convergent validity. It also goes by the label – multimodal measures of user state. There are two approaches to making inferences about behaviour from multimodal data. One is to capture data from productivity, facial expression and autonomic psychophysiology (as examples) and enter them all into a machine learning algorithm. Taking variables from different methods has the advantage that each category should make a unique and cumulative contribution to classification accuracy.
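A rough sketch of this first, all-in-one approach might look like the Python below. Everything in it is an assumption made for illustration: the three features (typing rate, smile intensity, heart rate, each scaled to 0-1), the two labels and the training values are invented, and a nearest-centroid classifier stands in for whatever machine learning algorithm would actually be used.

```python
import math

def fuse(productivity, face, physiology):
    """Concatenate features from the three categories into one vector."""
    return list(productivity) + list(face) + list(physiology)

def centroid(vectors):
    """Mean of a list of equal-length vectors, computed column by column."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def train(samples):
    """samples: list of (feature_vector, label). Returns {label: centroid}."""
    by_label = {}
    for vec, label in samples:
        by_label.setdefault(label, []).append(vec)
    return {label: centroid(vecs) for label, vecs in by_label.items()}

def classify(model, vec):
    """Return the label whose centroid is closest (Euclidean) to vec."""
    return min(model, key=lambda label: math.dist(model[label], vec))

# Hypothetical training data: [typing rate], [smile intensity], [heart rate].
training = [
    (fuse([0.8], [0.6], [0.3]), "engaged"),
    (fuse([0.7], [0.5], [0.4]), "engaged"),
    (fuse([0.2], [0.1], [0.8]), "frustrated"),
    (fuse([0.3], [0.0], [0.9]), "frustrated"),
]
model = train(training)

print(classify(model, fuse([0.75], [0.55], [0.35])))  # engaged
```

Note that the classifier sees one flat vector. It has no idea which number came from the keyboard and which from the heart, and no idea what the person was doing at the time.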
The alternative to this ‘blender’ approach is to use different multimodal measures strategically to construct a multifaceted representation of behaviour. For example, we may use keypresses and the state of the software to establish different task activities (e.g. listening to music, composing email, browsing a Twitter feed). Each category of task activity provides context for the interpretation of dynamic data derived from overt behaviour or psychophysiological measures. A smile whilst browsing a Twitter feed signifies something different to a smile after your word processor has just crashed. An increase of heart rate during the first 30 seconds of a new computer game indicates increased motivation or workload, whereas the same spike in heart rate after 10 minutes on the same challenge and 25 failed attempts is equated with frustration.
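The idea that task context changes the meaning of a signal can be sketched as a small lookup of rules. The Python below is only an illustration: the task and signal names are hypothetical, the readings attached to the two smiles ("amusement", "exasperation") are my own placeholders, and the 30-second, 10-minute and 25-attempt figures are lifted from the examples above rather than from any validated threshold.

```python
def interpret(task, signal, seconds_on_task=0, failed_attempts=0):
    """Return a tentative reading of a signal given the current task context."""
    if signal == "smile":
        if task == "browsing_feed":
            return "amusement"       # placeholder reading
        if task == "app_crashed":
            return "exasperation"    # placeholder reading
    if signal == "heart_rate_increase" and task == "gaming":
        if seconds_on_task <= 30:
            return "motivation_or_workload"
        if seconds_on_task >= 600 and failed_attempts >= 25:
            return "frustration"
    # No rule covers this combination, so make no inference.
    return "unknown"

print(interpret("gaming", "heart_rate_increase", seconds_on_task=20))
# motivation_or_workload
print(interpret("gaming", "heart_rate_increase", seconds_on_task=600,
                failed_attempts=25))
# frustration
```

The "unknown" fallback matters as much as the rules: a smile during email composition matches nothing here, and the sketch declines to guess.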
This is a hierarchical approach to monitoring user behaviour where task activity provides a context for the interpretation of dynamic data. The same approach can be applied to dynamic data originating from different categories. If a camera detects an angry expression, is this expression accompanied by a change in heart rate or blood pressure? The former is used to signify an overt expression of an emotional state, the latter to assess the cardiovascular manifestation of that emotional state. This kind of dynamic cross-referencing requires an expert system to frame interpretation, which would take time to develop, but offers the possibility of great rewards in terms of our understanding of human-computer interaction.
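As a toy version of that cross-referencing, the Python below checks whether an angry expression is backed up by a cardiovascular change. The function name, the labels it returns and the 10 bpm threshold are all arbitrary assumptions. A real expert system would need baselines and thresholds established for each individual.

```python
def cross_reference(expression, heart_rate_bpm, baseline_bpm, threshold_bpm=10):
    """Compare an overt marker (expression) with a covert one (heart rate)."""
    overt = expression == "angry"
    # Assumed rule: a rise of threshold_bpm or more above baseline counts as a change.
    covert = (heart_rate_bpm - baseline_bpm) >= threshold_bpm
    if overt and covert:
        return "corroborated"
    if overt:
        return "expression_only"
    if covert:
        return "physiology_only"
    return "no_evidence"

print(cross_reference("angry", heart_rate_bpm=88, baseline_bpm=70))    # corroborated
print(cross_reference("angry", heart_rate_bpm=72, baseline_bpm=70))    # expression_only
print(cross_reference("neutral", heart_rate_bpm=90, baseline_bpm=70))  # physiology_only
```

The interesting cases are the two mismatches, where one channel reports something the other does not, because that is where an expert system would have to decide which source to trust.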
Smart technology is a category of device with the potential to massively enhance the way in which we interact with machines. But in order to be smart, technology must be informed by a process of data analysis and interpretation that is equally intelligent.