
Human Factor in Human-Computer Interaction

2023-06-29 · 29 min read
HCI · English

This is my Rinkou script from 2023-06-20.

The Human Factor is crucial in research fields such as human-computer interaction and human-robot interaction. When designing user-centered three-dimensional interfaces, considering the Human Factor is of utmost importance.

The Human Factor takes into account human abilities, characteristics, and limitations, while also considering the human body, senses, and brain. Products or designs that take into consideration the Human Factor can enable people to use them more safely, efficiently, and comfortably.

To consider the Human Factor, we first need to have a clear understanding of it. Therefore, Chapter Three will provide an overview of how the Human Factor influences UI design.

Next, I will introduce the Human Factor from three aspects: Perception, Cognition, and Physical Ergonomics.

Before introducing the Human Factor, we first need to understand some concepts related to information processing.

First, let's confirm one thing: What is the word for "perception" in Japanese? And how about "cognition"?

Perception focuses on the process through which humans acquire and interpret external information through sensory organs. It involves the reception, transmission, filtering, and interpretation of information. Perception allows us to directly experience the real world.

On the other hand, cognition focuses on processing and interpreting the perceived information in order to establish cognition and understanding. Cognition involves steps such as attention, memory, thinking, and learning.

Let's use the example of seeing Sawabe-sensei to illustrate the information processing flow of perception and cognition. When we see Sawabe-sensei, our eyes receive the light reflected from him, and at the same time, we hear his voice. Our attentional resources are then required to focus on Sawabe-sensei. Drawing upon our experience and long-term memory, we can effectively recognize him as a person, as a male, and specifically as Sawabe-sensei. This is the process of perception.

Next, cognition comes into play. We notice that Sawabe-sensei is coming over to greet us. We temporarily store this information in our short-term memory and then make decisions and responses, such as greeting Sawabe-sensei in return.

That is the entire information processing flow when we see Sawabe-sensei.

Here, we mentioned working memory, which is essentially the same as short-term memory. It has a limited capacity and can only operate in the present moment. For example, when Sawabe-sensei approaches us, if we focus our attention on observing his gestures, we may not pay attention to what clothes he is wearing. In contrast, long-term memory has a larger capacity and is not influenced by attention.

Now, let's shift our focus to the discussion on attention because understanding attention is crucial for designers to better design user interfaces.

Attention can be divided into different forms:

  • Selective Attention: This refers to the ability to selectively focus on and process one stimulus while ignoring others when faced with multiple sensory stimuli. Through selective attention, we can concentrate on specific information in complex environments while disregarding irrelevant or secondary information. For example, when Sawabe-sensei approaches, we can choose to focus our attention on him or on making Rinko's presentation slides.
  • Focused Attention: This refers to the ability to concentrate all attention on a single task or stimulus. Focused attention plays a crucial role when we need to solve complex problems, learn new knowledge, or engage in detailed observations. In a state of focused attention, we direct most of our attention to a specific task or stimulus to better process and understand it. For example, when I choose to focus my attention on making Rinko's presentation slides, I may not notice Sawabe-sensei greeting me.
  • Divided Attention: This refers to the ability to allocate attention simultaneously to multiple tasks or stimuli. It involves switching and allocating attentional resources among different tasks or stimuli. Divided attention allows us to handle multiple tasks or perceive multiple stimuli at the same time, but it spreads attentional resources thin and can reduce task performance. For example, during a lecture, we have to listen to the teacher while also reading their presentation slides.

These three attentional states frequently occur in our daily lives. Selective attention helps us filter information, focused attention aids us in deep thinking and problem-solving, and divided attention allows us to handle multiple tasks or stimuli at the same time. Depending on the specific task and environment, we need to switch between these attentional states.

However, due to physiological limitations, our attention is limited. For instance, when we see a lot of information in front of us, we may struggle to determine where to start reading. Continuous and rapid stimuli can also leave us uncertain about where to direct our attention.

The decision-making stage heavily relies on behavior and skills. Behavior is influenced by many factors. For example, in China, when you see Sawabe-sensei, it is common to wave and exchange greetings. However, in Japan, you might need to bow and greet. These behavioral differences are determined by past experiences and can be influenced by emotional states. Additionally, with the formation of habits, the body's response to stimuli may decrease or even stop. For example, if I suddenly touch your body, initially you might have a strong reaction. But after multiple stimuli, you may gradually stop responding to it. This theory also forms the basis for VR therapy for phobias.

User behavior is based on skills and can be divided into the cognitive stage, the associative stage, and the autonomous stage. This concept is easy to understand. For example, suppose you are using HoloLens 2 and see something resembling a button in your field of view, but you don't know how to interact with it. If I tell you that you can tap it directly in the air or pinch it with two fingers, that is the cognitive stage. Then, when you see similar buttons, you will try to interact with them by tapping or pinching with two fingers, which is the associative stage. Finally, you learn that distant buttons are activated by pinching with two fingers, while nearby buttons can be tapped directly in the air; this is the autonomous stage.

Next comes action execution and feedback. There is a trade-off between speed and accuracy: if you want a quick response, you may have to sacrifice accuracy. Fitts's Law describes the relationship between time and accuracy when selecting and executing actions.

Fitts's Law states that the time needed to complete a pointing movement increases with the distance to the target and decreases as the target gets wider. It can be mathematically expressed as:
MT = a + b * log2(2D/W)
ID = log2(2D/W) [bits]
IP = ID / MT [bits/s]

MT: movement time
a, b: parameters used to fit the specific model, obtained through experiments
D: distance from the starting position to the target
W: width of the target
ID: index of difficulty, representing the task difficulty
IP: index of performance, describing the efficiency of action execution
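
To make these formulas concrete, here is a minimal Python sketch that evaluates MT, ID, and IP for a pointing task. The coefficients a and b below are placeholder values, not measured ones; in practice they must be fitted from experimental data.

```python
import math

def fitts_movement_time(distance, width, a=0.1, b=0.15):
    """Predict movement time with Fitts's Law.

    a, b are placeholder regression coefficients; real values
    must be estimated from user experiments.
    """
    index_of_difficulty = math.log2(2 * distance / width)        # ID in bits
    movement_time = a + b * index_of_difficulty                  # MT in seconds
    index_of_performance = index_of_difficulty / movement_time   # IP in bits/s
    return movement_time, index_of_difficulty, index_of_performance

# A small, distant target has a larger ID and a longer predicted MT
# than a large, nearby one.
print(fitts_movement_time(distance=0.60, width=0.02))  # small target, far away
print(fitts_movement_time(distance=0.20, width=0.10))  # large target, nearby
```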

Additionally, there is a law called Steering Law. For example, in a virtual reality game, you need to navigate through a tunnel without touching the walls. This law is described as follows:

T = a + b \int_{C} \frac{ds}{W(s)}

C is a parameterized path, and W(s) represents the tunnel width that varies along the path. The parameters a and b are obtained through experimentation.
According to the Steering Law, when the width of the path remains constant, the length of the path affects the execution time of the action. A simple path refers to a path with a constant width. In a simple path, the execution time of the action can be represented by the following formula:
T = a + b(A/W)
Where T represents the execution time of the action, A represents the length of the path, W represents the width of the path, and a and b are parameters obtained through experimentation.

From the formula, it can be observed that as the width of the path increases, the execution time of the action decreases. This is similar to the trade-off between speed and accuracy in Fitts's Law.
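
For completeness, here is a sketch that approximates the Steering Law integral numerically for a sampled path; as with Fitts's Law, the coefficients a and b are placeholders that would be fitted from experiments. For a straight tunnel of constant width, the result reduces to the simple-path formula T = a + b(A/W).

```python
import numpy as np

def steering_time(path_points, widths, a=0.1, b=0.05):
    """Approximate T = a + b * integral(ds / W(s)) along a sampled path.

    path_points: (N, 2) array of points along the path
    widths:      (N,)  tunnel width at each sample
    a, b:        placeholder coefficients (fit from experiments in practice)
    """
    segments = np.diff(path_points, axis=0)        # consecutive displacements
    ds = np.linalg.norm(segments, axis=1)          # segment lengths
    w_mid = (widths[:-1] + widths[1:]) / 2         # width at segment midpoints
    steering_id = np.sum(ds / w_mid)               # discretized integral
    return a + b * steering_id

# Straight tunnel of length A = 1.0 and constant width W = 0.1:
# reduces to T = a + b * (A / W) = 0.1 + 0.05 * 10 = 0.6
pts = np.column_stack([np.linspace(0, 1, 101), np.zeros(101)])
print(steering_time(pts, np.full(101, 0.1)))
```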

Page 75

Now let's turn to perception.
First is visual perception. In the design of 3D UIs, visual perception plays a crucial role, and the visual scene must be conveyed accurately in order to effectively attract the user's attention. Therefore, it is necessary to provide users with some depth cues.
First, there are monocular static cues. Objects appear larger when they are closer and smaller when they are farther away, and objects nearer the horizon appear more distant. For example, in the demo, the green cube remains at the same distance from your head, while the white cube keeps a fixed position in the environment.
Occlusion of objects also helps in understanding the relative positions of objects. For example, the occlusion between cube 5 and cube 4. Sometimes, shadows can also provide information, such as between cube 3 and cube 4.
Linear perspective also helps in explaining the depth relationships of objects (as seen in the example of HoloLens). The sides of nearby cubes are visible, while those of distant cubes are not.
The atmospheric perspective effect shows that objects closer to us have higher saturation, while objects farther away have lower saturation. Can you determine which cube, cube 1 or cube 2, is farther away?
Oculomotor cues come from the eyes themselves: they converge and accommodate differently when focusing on objects at different distances, and we sense these adjustments.
Motion parallax: as we walk, nearby objects appear to move quickly across our view, while distant objects appear to move slowly or even to move with us. You can observe this in the example of the white cube.
Binocular vision. Observe the white cube and alternate between using your left eye and right eye. The closer the object, the greater the difference between the images perceived by the left and right eyes.
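
Several of these cues can also be described quantitatively. Below is a minimal Python sketch of the relative-size cue using a simple pinhole-projection model; the eye's focal length and the cube size are assumed values chosen only for illustration.

```python
def projected_size(object_size, distance, focal_length=0.017):
    """Pinhole-camera approximation: image size = focal_length * object_size / distance.

    A focal length of ~17 mm is a rough stand-in for the human eye; the exact
    value does not matter for the relative comparison below.
    """
    return focal_length * object_size / distance

# The same 0.5 m cube viewed from 1 m, 2 m, and 4 m away:
for d in (1.0, 2.0, 4.0):
    print(f"distance {d:.0f} m -> projected image ≈ {projected_size(0.5, d) * 1000:.1f} mm")
# Doubling the distance halves the projected size, which is why nearer
# objects look larger (the relative-size cue).
```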

Page 82

Next, let's experience auditory perception.
Auditory perception is the sensory system that comes second only to vision. The human auditory system relies primarily on interaural time differences (ITD), interaural level differences (ILD), and spectral information for sound localization. Here, I would like to explain head-related transfer functions (HRTFs).

When we listen to sounds using headphones, we often find that although they create a stereo effect between the left and right channels, we struggle to locate the sound in space. This indicates that something else is involved in our perception and localization of sound. To explain this phenomenon more comprehensively, scientists have proposed a theory known as the pinna filtering effect. As a sound source emits sound waves from a specific point in space and the waves travel toward the ears, they are scattered and reflected by the listener's torso, head, and outer ears (pinnae). By the time the sound waves reach the eardrums, certain frequencies have been boosted while others have been attenuated, and the phase of the waves has been altered. This process of spectral transformation can be described using the concept of Head-Related Transfer Functions (HRTFs).

Measuring HRTFs directly is currently the most accurate approach, but it is also very time-consuming. Therefore, computational methods are commonly used to simulate HRTFs.
You may have heard of AirPods, which feature spatial audio functionality that allows for more accurate perception of sound from different directions. This is achieved through the calculation and simulation of HRTFs to mimic the frequency and phase differences of sound reaching the ears from different sources.
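
To make the interaural time difference mentioned above concrete, here is a minimal Python sketch using Woodworth's spherical-head approximation; the head radius is an assumed average value, and a real HRTF additionally encodes level and spectral cues that this toy model ignores.

```python
import math

SPEED_OF_SOUND = 343.0   # m/s in air at roughly 20 °C
HEAD_RADIUS = 0.0875     # m, assumed average head radius

def interaural_time_difference(azimuth_deg):
    """Woodworth's spherical-head approximation of ITD in seconds.

    azimuth_deg: source direction, 0 = straight ahead, 90 = directly
    to one side.
    """
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS / SPEED_OF_SOUND) * (theta + math.sin(theta))

for az in (0, 30, 60, 90):
    print(f"azimuth {az:2d}° -> ITD ≈ {interaural_time_difference(az) * 1e6:.0f} µs")
# A source directly to the side arrives roughly 650 µs earlier at the nearer
# ear; the auditory system combines this with ILD and spectral cues to
# localize the sound.
```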
Now, let's take a look at a demonstration of three-dimensional audio: HoloLens.

Page 87

Somatosensory (haptic)
Somatosensory refers to the sensory system related to touch and bodily sensations.
Tactile perception is based on the sense of touch, while haptic perception involves the perception of forces or pressure. Let's imagine two balls, both weighing 2 kg, one made of iron and the other wrapped in leather: they exert the same force when lifted, yet their surfaces feel very different to the touch.
The somatosensory perception can vary in terms of spatial and temporal resolution across different parts of the body.
Pain is also considered a somatosensory sensation. Interestingly, research has shown that the perception of pain can be influenced by factors such as expectations and attention. The intensity of pain can be modulated by an individual's expectations and their level of attention to the painful stimulus.

Page 91

The chemical sensory system primarily involves taste and smell. Olfaction, or the sense of smell, has been shown to trigger emotional and memory responses. Whether consciously or unconsciously, odors can have an impact on our emotions. For example, when we encounter a particular scent, it can evoke memories or associations with past experiences, a phenomenon known as the Proustian effect.
Taste combines both olfactory and gustatory sensations. This also explains why our appetite decreases during a cold or hay fever episode, as we are unable to perceive the aroma of food through olfaction.

Page 95
Methods for evaluation will be discussed in detail in Chapter 4 and Chapter 11, so I won't provide a detailed description here.

Page 97
Situation Awareness (SA) is the first concept we'll discuss under cognition. It refers to a person's ability to perceive and comprehend the surrounding environment and relevant events. This includes the perception, understanding, and assessment of people, objects, events, and circumstances in the environment. Situation awareness helps individuals develop a comprehensive understanding of their surroundings, enabling them to make accurate decisions and take appropriate actions.

Using Figure 3-10 from the book as an example, we can explore the cognitive process involved in 3D UI navigation.

A cognitive map is a mental representation of the environment and spatial information that individuals construct in their brains. It serves as a cognitive tool for organizing, storing, and manipulating information about geographic space and the environment. Cognitive maps can be personal mental representations or collective cognition shared by teams, organizations, or societies.

There are several types of spatial knowledge, including:

  1. Landmark Knowledge: Landmark knowledge refers to the recognition of visual features in the environment. It includes visually salient objects or landmarks and other visual characteristics such as shape, size, and texture. For example, in London, landmarks like Big Ben and the London Eye are immediately recognizable to many tourists.
  2. Procedural Knowledge or Path Knowledge: Procedural knowledge describes the sequence of actions required to navigate along specific paths or move between different locations. Only minimal visual information is needed to correctly utilize procedural knowledge. For instance, someone visiting London would quickly learn the path from their hotel to the nearest subway station.
  3. Survey Knowledge: Survey knowledge describes the structure or topological relationships of the environment, including object locations, distances between objects, and orientations. This type of knowledge is akin to a mental map and can be acquired through maps, although map-derived knowledge tends to be orientation-based. Among these three types of spatial knowledge, survey knowledge represents the highest level of quality and often requires the longest time to construct mentally.

These different types of spatial knowledge play crucial roles in our spatial cognition and navigation processes. They complement each other and assist in understanding the environment, planning routes, and navigating. By studying and understanding these different types of spatial knowledge, we can gain better insights into human spatial cognition abilities and behavioral performance.

Reference Frames and Spatial Judgments

Reference frames are the frameworks or points of reference we use to describe and understand positions, orientations, and movements in space. In spatial judgment processes, we utilize reference frames to assess and determine the positions, directions, and relationships of objects, locations, or actions.

Spatial judgments involve our ability to evaluate and infer various relationships and properties in space. It encompasses our perception and understanding of spatial attributes such as location, direction, distance, and shape.

During motion in real-life scenarios, we perceive ourselves as being at the center of space, which is known as egomotion. In such motion processes, we need to align the information from our egocentric (first-person) perspective with our cognitive map, which typically stores information from an object-centered (third-person) perspective. The egocentric reference frame is defined with respect to a part of the human body, while the object-centered reference frame is defined with respect to objects or the world. In egocentric tasks, judgments are made based on the egocentric reference frame (Figure 3.11), which includes the station point (the viewpoint of the eyes), the retinal center (the fovea), the head center (focusing solely on the head), the body center (the torso), and the proprioceptive subsystem (visual and non-visual cues from body parts such as the hands and legs).

The egocentric reference frame provides us with important information such as distance (through physical feedback like step count or arm length) and direction (obtained through the orientation of the eyes, head, and torso). The positions, orientations, and movements of objects are related to the positions and orientations of the eyes, head, and body.

In object-centered tasks, the positions, orientations, and movements of objects are defined in an external coordinate system, which is based on the shape, orientation, and motion of the object itself.

The properties of object-centered reference frames are not influenced by our own orientation or position. In 3D user interfaces, multiple reference frames can be used to achieve different viewpoints. The egocentric reference frame corresponds to the first-person viewpoint, while the object-centered reference frame is associated with a third-person (bird's-eye or external) viewpoint.

For example, in many video games, users typically see a first-person (egocentric) view of the environment during navigation, but they can also access an overview map of the environment that shows their current location (object-centered). When we find a path in the environment, we build a representation based on the object-centered reference frame (survey knowledge). However, when we first enter an environment, we primarily rely on egocentric information (landmark and procedural knowledge). Therefore, we often rely on landmarks initially, then establish paths between them, and eventually generalize egocentric spatial information into object-centered survey knowledge. However, it is currently not clear how the human brain determines the relationship between egocentric and object-centered spatial knowledge.
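
To illustrate the relationship between the two reference frames in a purely geometric sense, the 2D Python sketch below converts a landmark stored in world (object-centered) coordinates into the user's egocentric frame, given an assumed user position and heading. This is only a toy coordinate transform, not a claim about how the brain performs the mapping.

```python
import numpy as np

def world_to_egocentric(point_world, user_position, user_heading_deg):
    """Express an object-centered (world) point in the user's egocentric frame.

    In the egocentric frame the x-axis points in the user's heading direction
    ("ahead") and the y-axis points to the user's left (2D toy example).
    """
    theta = np.radians(user_heading_deg)
    # Rotate world coordinates into the user's body-centered axes.
    rotation = np.array([[ np.cos(theta), np.sin(theta)],
                         [-np.sin(theta), np.cos(theta)]])
    return rotation @ (np.asarray(point_world) - np.asarray(user_position))

landmark = (5.0, 5.0)  # hypothetical landmark in world coordinates
print(world_to_egocentric(landmark, user_position=(2.0, 1.0), user_heading_deg=0))
# -> [3. 4.]  : 3 m ahead, 4 m to the left while facing along +x
print(world_to_egocentric(landmark, user_position=(2.0, 1.0), user_heading_deg=90))
# -> [4. -3.] : after turning to face +y, it is 4 m ahead and 3 m to the right
```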

Figure 3.11 illustrates the human reference frames (on the right) and the corresponding viewpoints (on the left).
In the top-left corner of the figure, there is an egocentric view, representing the user's perspective from within the environment. This viewpoint is centered around the user themselves, allowing for the evaluation and understanding of spatial positions and directions based on self-perception. In this view, the user feels situated at the center of the environment and can see surrounding objects and scenes.

In the bottom-left corner, there is an exocentric view, representing the user's perspective from outside the environment, looking in. This viewpoint is based on external references, such as objects or other points of reference within the environment, for evaluating and understanding spatial positions and directions. In this view, the user can see the relative positions and relationships of objects within the environment, as well as the overall layout and structure.

The human reference frames (on the right) in the figure depict the visual, motor, and body perception associated with different reference points. These include the viewpoint (eye position), retina center (retina), head center (focused on the head only), body center (trunk), and proprioceptive subsystem (visual and non-visual cues from body parts such as hands and legs). These reference points and perceptual systems play important roles in spatial judgments and behavior, helping us locate and understand the environment.

By combining viewpoints and reference frames, humans can switch between egocentric and exocentric perspectives to better comprehend and navigate space. This switching can occur in different contexts and tasks, such as using an egocentric view for observing the surrounding environment during everyday navigation, and utilizing an exocentric view when examining a map to understand the overall layout.

This description provides an overview of the visual representation of human reference frames and viewpoints, helping us understand the perceptual and judgment processes involved in human spatial cognition.

The methods for assessing mental workload can be divided into two types: subjective measures and objective measures. Subjective measures rely on users' self-reports and subjective experiences. Common instruments in this category include the SBSOD (Santa Barbara Sense of Direction scale) and the NASA TLX (NASA Task Load Index). The SBSOD is a self-report questionnaire used to assess users' spatial abilities, such as their sense of direction in the environment. The NASA TLX is a task load index that evaluates the level of cognitive workload based on users' subjective ratings of a task. These methods are primarily used to measure users' spatial abilities and mental workload.
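
As a small, made-up illustration of how a NASA TLX score is computed: the "raw TLX" variant simply averages the six subscale ratings, while the weighted variant multiplies each rating by the number of pairwise comparisons (out of 15) in which that scale was chosen. The numbers below are invented example values, not real data.

```python
# NASA TLX subscales, each rated 0-100 by the user (invented example values).
ratings = {
    "Mental Demand": 70,
    "Physical Demand": 20,
    "Temporal Demand": 55,
    "Performance": 40,
    "Effort": 65,
    "Frustration": 35,
}

# Raw TLX: unweighted mean of the six ratings.
raw_tlx = sum(ratings.values()) / len(ratings)

# Weighted TLX: each scale weighted by how many of the 15 pairwise
# comparisons it won (weights sum to 15).
weights = {
    "Mental Demand": 5, "Physical Demand": 1, "Temporal Demand": 3,
    "Performance": 2, "Effort": 3, "Frustration": 1,
}
weighted_tlx = sum(ratings[k] * weights[k] for k in ratings) / 15

print(f"Raw TLX: {raw_tlx:.1f}, Weighted TLX: {weighted_tlx:.1f}")
```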

Objective measures assess cognitive workload through performance indicators. One method in this category is SAGAT (Situation Awareness Global Assessment Technique), which evaluates situation awareness during a task by querying users about their perception of the situation. Additionally, methods such as map drawing and motion estimation can be used to assess users' spatial knowledge. Furthermore, human errors can be evaluated through techniques such as task analysis and human reliability analysis.

Psychophysiological methods are based on the relationship between cognitive workload and physiological responses, using various techniques for measurement. These techniques include measuring heart rate, pupil dilation, eye movements, and electroencephalography (EEG) to measure brain activity. EEG has shown potential in evaluating cognitive workload, but obtaining good data and interpreting the results can be challenging. EEG is used to detect event-related potentials, such as the P300 component, which reflects responses to discrete events. However, EEG measurements can also be recorded in the absence of these events, which can be useful when monitoring slow-changing content for operators.
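
To illustrate the event-related-potential idea mentioned above, the sketch below averages synthetic EEG epochs around repeated events so that a P300-like component emerges from the single-trial noise; the signal shape, noise level, and sampling rate are invented for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 250                                    # sampling rate in Hz (assumed)
t = np.arange(-0.2, 0.8, 1 / fs)            # epoch window: -200 ms to +800 ms

# Synthetic single trials: a positive deflection peaking ~300 ms after the
# event (a stand-in for the P300), buried in much larger background noise.
p300 = 5.0 * np.exp(-((t - 0.3) ** 2) / (2 * 0.05 ** 2))        # in µV
trials = p300 + rng.normal(0.0, 10.0, size=(200, t.size))       # 200 noisy epochs

erp = trials.mean(axis=0)                   # averaging suppresses random noise
print(f"Averaged ERP peaks at ~{t[np.argmax(erp)] * 1000:.0f} ms")
```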

Additionally, psychological workload and human errors are closely related to performance research and human-computer interaction evaluation. When analyzing performance issues in three-dimensional user interfaces, it is beneficial to consider different dimensions of resource allocation, such as processing stages, processing codes, perceptual modalities, and visual channels. These dimensions help in better understanding resource allocation and cognitive requirements.

Physical ergonomics plays a crucial role in the design and analysis of 3D user interfaces. It focuses on the human body's musculoskeletal system, making a fundamental understanding of human anatomy and physiology essential.

The muscles, bones, and lever systems of the human body are crucial for the ability to perform specific tasks. The human body consists of approximately 600 different muscles, each composed of a mixture of fast and slow muscle fibers. Muscles function by exerting tension on their attachment points on the bones, which form the lever systems. The human body is capable of various movements, some requiring greater force while others necessitating longer ranges of motion. These movements correspond to the lever systems and the length of the muscles. Muscle contractions can be either isometric (constant length) or isotonic (constant tension), which is important to consider in the design of 3D input devices.

Human movement is generated by the interaction of joints and muscles, typically in response to stimuli. Control tasks involve the peripheral nervous system triggering effectors through electrical signals, resulting in voluntary and involuntary actions. Most human outputs can be defined as control tasks and can take the form of interactive tasks. Control tasks can be characterized by features such as accuracy, speed and frequency, degrees of freedom, direction, and duration, and are influenced by the anatomical capabilities of the human body. The characteristics of a task directly influence the choices made in mapping control onto the human body. Control tasks can be performed using various body parts, including hands, arms, eyes, the brain, and are not limited to the musculoskeletal system alone. The distribution of the sensorimotor cortex is crucial for the performance of different body parts. There are mappings between different body parts and regions in the cortex, with larger body parts generally providing more precise movements.

The hand and arm are the primary channels of control for humans, allowing us to perform a wide range of actions. The musculoskeletal structure of the hand includes the wrist, palm, and fingers, enabling various movements such as wrist sliding, angular movements of the fingers and wrist, and oppositional movements of the grasping fingers. The combination of the hand's components forms a lever system that enables control in multiple dimensions. Hand movements can be classified into power grip and precision grip. Power grip refers to gripping and manipulating objects within the palm of the hand, while precision grip allows for finer movement control. The design of handles and gripping shapes is crucial for handheld input or output devices, considering the different requirements of power grip and precision grip. Designers need to choose appropriate shapes and gripping methods to ensure device stability and user comfort.

In conclusion, physical ergonomics is a crucial factor in designing comfortable and effective systems. Understanding the human body's musculoskeletal system, types of movements, and control tasks is essential for designing and analyzing 3D user interfaces.

In the evaluation of physical ergonomics, fatigue and user comfort are two closely related and inseparable issues. The following are methods and techniques used to assess physical ergonomics:

Subjective assessment: Customized questionnaires are used to evaluate user comfort. The questionnaires should consider the type and duration of specific tasks, as well as the different postures users may adopt. It is also helpful to ask for user opinions on the physical aspects of the input devices used. Subjective assessments often combine user comfort and fatigue because they are interrelated and difficult for users to distinguish. References to Neuberger (2003) and Marras (2012) can provide insights into questionnaires designed for specific populations with physical impairments and fatigue-related muscle issues.

Performance evaluation: Although assessing performance in relation to fatigue or user comfort presents certain challenges, there are still methods available. By analyzing task performance and errors, correlations can be made with factors such as fatigue that change over time. Investigate whether task performance decreases or remains stable with fatigue. While performance changes over time can be influenced by multiple factors (such as learning effects), signs of decreased performance related to fatigue may be evident. Video observation can also be used to mark signs of users regripping the device or taking short breaks and compare them with task execution time.

Psychophysiological methods: Various physiological methods are used to assess fatigue. These methods often combine specific models that define the biomechanical principles. For example, electromyography (EMG) can measure muscle activity and evaluate muscle tension and fatigue. Although implementing EMG can be complex, it provides valuable information that cannot be reliably predicted solely based on models. Additionally, there are physical devices available for measuring specific muscle and joint groups, such as devices used to measure spinal motion. For more detailed discussions, refer to Marras (2012).

During the evaluation process, it is important to measure the specific physiological limitations of individual users, as comfort and fatigue depend on the musculoskeletal characteristics and abilities of each user. Comparing and correlating the results from different assessment methods, including subjective assessments and user observations, is crucial.