
From Research To The Real-world: A Looming Example

Marc Green


It takes knowledge to understand the limitations of knowledge.

Investigators often must apply research data to analyze real-world events. A good example is collision analysis, where four absolute values are often used: contrast threshold, looming threshold, perception-response time (PRT) and visibility distance, most commonly of pedestrians. The values can only originate from research studies. However, the research-world and the real-world are very different places. Research data cannot be used directly in the analysis of a specific real-world event. At the very least, there must be:

1. A research study conducted under conditions reasonably similar to the event being investigated. Since research studies have many limitations (see below), there are often no such studies;

2. A "field factor" to adjust for the advantages that research subjects have over actual drivers. Very few well-documented field factors exist. Even those are at best ballpark estimates;

3. A measure of variability to assess the range of normal behavior. Behavior is variable within and across people performing every task. Simple means and medians are not enough to define normal behavior. As explained below, research studies are designed to minimize variability, so they drastically underestimate real-world variability. There must be a field factor for variability, but none exists.

Lastly, there are also qualitative differences between the research-world and the real-world that are difficult or impossible to quantify. These are discussed at length elsewhere, but I briefly outline some of the differences here.

Looking For Applicable Research

Problems begin immediately when trying to find applicable research studies that are suitable data sources. There are usually major differences between conditions in any research study and in any real-world collision. In fact, research studies in most areas generally employ only a very narrow range of test conditions, ones that are simple and easily controlled. As a result, many real-world scenarios have never been investigated at all so there are no applicable data. PRT studies, for example, generally have a driver responding to a clear, expected, unambiguous hazard presented in good visibility conditions. There are no data for many variables, including some of the most important.

Data obtained in one set of environmental conditions are generally not applicable to others. Applying daytime thresholds to nighttime conditions is a very common error. For example, there isn't a single research study that has ever measured PRT for the common nighttime scenarios of a pedestrian crossing the road in front of a vehicle or of a driver traveling at high speed toward an unexpectedly stopped tractor-trailer. In such cases, accurate PRT estimation is so uncertain that it is better not to try. Citing data from a study performed under different conditions may give the illusion of scientific rigor, but it is only an illusion. The lack of data for so many common situations is not simply due to oversight by researchers. It is often too difficult to perform the needed research because there are too many variables to quantify and/or control. Since the required data are for emergency conditions, performing the required research is also often unsafe. Fortunately, new techniques such as naturalistic studies are beginning to provide some useful data.

Field Factors: Research Subjects Are Different from Real Drivers

Research drivers differ from real drivers in many ways. Most of the differences give the research driver great performance advantages over the real driver who must suddenly respond to an emergency. In some cases, "field factor" multipliers are used to equate research and real performance. For example, it is common to apply a field factor of two for pedestrian visibility distance. The other common example is contrast threshold, which uses a multiplier called the "visibility level", that is often set at 10. However, situational factors are likely to change the value. There are no absolute field factors. A few attempts have been made to determine field factors for other behaviors, but none has gained much traction - and for good reason. Human behavior is too context-dependent for absolute values to be common.

Research Drivers Vs. Real Drivers

The advantages of research over real drivers are broad and deep. They constitute one of the major reasons that research data cannot be directly transferred to the real-world.

1. Heightened Arousal. Research drivers know that their behavior is being monitored. They are in an unusual situation, putting them in a high state of arousal and in an active information-seeking mode. Behavior operates in two fundamentally different modes, automatic and controlled. In normal driving, behavior is largely automatic. Drivers are not blank slates, but largely guide behavior automatically based on expectations derived from learned scripts and schemata. A sudden emergency requires a shift to controlled mode. This sudden switch is likely to disrupt the driver and lead to an accident (Kay, 1971). In contrast, research subjects are in controlled mode at all times. The artificially increased performance due to this arousal factor has already been noted, e.g., "the presence of an authority figure (the experimenter) in the back seat may have made them more cautious and attentive than usual" (Olson, Cleveland, Fancher & Kostyniuk, 1984).

2. High Anticipation. Real drivers are largely acting in an automatic mode during normal circumstances. A sudden emergency creates a shock of surprise that catches the drivers off guard. Research drivers wait for something to happen. Subjects know that they are being tested for something, even if they don't know exactly what. "The knowledge that the driving is under test conditions causes the driver to wait for something to happen which is very different from normal driving conditions" (Prynne & Martin, 1995).

3. Expectations. Human drivers develop schemata. Research drivers know that they are in an unusual situation, so their normal schemata might not apply.

4. Lack of Stress. In most studies, drivers do not face real emergencies and real consequences, which can create emotional effects that drastically degrade real-world performance (e.g., Malaterre, Ferrandez, Fleury & Lechner, 1988; Dilich, Kopernik, Goebelbecker, & Michael, 2002). They do not experience the sensorimotor disintegration, hypervigilance, and perceptual narrowing that accompany intense emotion and time stress. The reason is that research subjects believe that they cannot suffer harm (Prynne & Martin, 1995). This is most obvious in simulator studies where a real crash cannot occur. Unlike the real world, the driver can make extreme braking and steering responses with complete impunity. This is doubtless one of the reasons that response times in simulators are faster than in real driving (e.g., Guzek, Lozia, Zdanowicz, Jurecki, Stanczyk & Pieniazek, 2012).

5. Demand characteristics. To some extent, research studies have inherent properties that force subjects to respond in a given way. Research subjects feel compelled to act in conformity with the research protocol and researcher expectations even if it is not their normal behavior, e.g., they can't simply stop because they are bored or tired, drive at a slower or faster speed, or do anything else outside the experimental protocol. They can't engage in risk compensatory behavior. If they intuit the purpose of the study, they may attempt to perform in accordance with the experimenter's desires.

6. Practice. In most research, subjects supply multiple data points. In other words, they are exposed to the same conditions over and over, so they get to practice their response. In the real world, drivers have one chance to get it right. This also means that research drivers know what to expect, what will happen, etc.

7. Initial Responses are eliminated. Since the test subject must learn the task, the initial responses are usually not included in the data. As a result, the very responses that are most representative of a real driver are eliminated. The reason for doing this is to achieve one of the major goals of any research study, low variability (see below).

8. Lack of representativeness. Virtually all studies screen potential subjects for any visual, cognitive, motivational or other performance impairment. The test population is seldom a fair representation of the general population and is almost certain to perform better. The lack of representativeness is greatest in special populations such as older people, who are always highly screened.

Researchers and Investigators Have Different Goals

Some of the differences are due to the different goals of researchers and accident investigators. This point is often overlooked although it is critical. Culture, values and goals always matter in every human endeavor. It is important to understand the mindset and incentive structure of researchers generating the data to understand why there are many limitations when interpreting the data for real-world application.

Research is most often concerned with "effect," i.e., how manipulating some independent variable alters the outcome. It is the effect size, or more precisely its statistical significance, that determines whether the research will ever be published. The goal of statistical significance has several practical consequences. One is the need to keep variability to a minimum. The greater the variability, the more difficult it is to obtain a statistically significant result (usually the 0.05 probability of chance) and the less likely the research is to be published. Even very large effects can fail to be publishable if variability is also very large. Moreover, high variability increases the research's expense, since more subjects are required. Since the researcher's career depends on being published, the motivation to ruthlessly stamp out any source of variability is strong. This invariably compromises ecological validity. More importantly, the need for low variability dictates many methodological choices in the subject pool, the experimental design, instructions and protocol. It is why subjects are screened, given practice and their initial responses are discarded.
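The link between variability and cost can be illustrated with the standard normal-approximation formula for comparing two groups, n = 2(z_alpha + z_power)^2 (SD / effect)^2 subjects per group. This is a textbook approximation, not a calculation taken from any study cited here. The function name, the 0.2-second effect, the standard deviations and the 80% power target are all hypothetical values chosen only for illustration.

```python
import math
from statistics import NormalDist

def subjects_per_group(effect, sd, alpha=0.05, power=0.80):
    """Approximate subjects per group needed to detect a mean difference
    `effect` between two groups that share standard deviation `sd`
    (two-sided test, normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_power = NormalDist().inv_cdf(power)          # quantile for the desired power
    return math.ceil(2 * (z_alpha + z_power) ** 2 * (sd / effect) ** 2)

# Hypothetical case: a 0.2 s difference in PRT between two test conditions.
for sd in (0.2, 0.4, 0.8):
    n = subjects_per_group(effect=0.2, sd=sd)
    print(f"SD = {sd:.1f} s -> about {n} subjects per group")
```

Because the required number grows with the square of the standard deviation, doubling the variability roughly quadruples the number of subjects needed for the same effect. That is the arithmetic behind the pressure to screen subjects, give practice and discard early trials.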

1. Uniformity. I have already explained why minimizing variability is a prime research goal. Subjects are screened, initial data are discarded, practice is employed to obtain a steady baseline, etc. The research drivers have almost identical experiences, see the identical road, perform the identical task, follow the identical instructions, etc. They almost invariably get to practice the identical task repeatedly. This is why subjects identified as aberrant are removed from the study and initial responses are eliminated. Studies also do not include trials where the driver does not respond at all or makes a response outside the protocol. The result is very misleading data at times.

2. Simplified Environment. Clean experimental control means that the result is due only to the independent variable which is being manipulated. The experimental situation must then be highly simplified compared to the real world, where many factors may be operating at once. For example, studies are often run on rural roads or test tracks with no other traffic and not on normal streets. A rural road or test track provides few viewer distractions to draw attention and little clutter to create masking and crowding. There is no need to monitor other traffic, check mirrors, etc.

The simplification is due to, you guessed it, the goal of having low variability. When drivers are tested on more urban landscapes, the PRT will likely be longer and more variable. For example, the presence of vehicles parked on the roadway was sufficient to almost double PRT to suddenly appearing pedestrians (Edquist, Rudin-Brown, & Lenné, 2012). The greater traffic of urban areas also slows situational awareness (Gugerty, 1997). This is a far different world from the laboratory, test track or dark rural road where most research is performed.

Simulators allow more complex and controllable visual environments, but their relationship to real-world behavior is uncertain, to say the least. Even the better ones provide only a limited field of view and have no stereo, vestibular and usually no auditory information. They create neither real emergencies nor the hypervigilance and emotional upheaval of real emergencies. In any event, they do not remove the need to keep the environment simple so that the result can be attributed to a particular variable.

This list of artificial factors is far from complete. Studies are generally conducted in good visibility, good weather conditions and during daylight, so they cannot be extrapolated to poor visibility, bad weather, background clutter, late-night/early-morning hours when circadian rhythms are at a low point, etc.

Looming Threshold Example

Looming is a good example. In some fields such as visibility, for example, contrast thresholds are routinely multiplied by a field factor of 10 to 20, increasing the research thresholds to account for the complexities of the real world. While the need for field factors is taken for granted in visibility, the concept is apparently unknown to those who (mis)apply research values like looming threshold, perception-reaction time, etc. The fundamental problem is that most of the people who attempt to use research numbers are not scientists and don't understand how science works. They just want a simple number that they can use in order to cast blame or to build a model. They want psychological variables to be like gravity, virtual constants that are independent of context and can be used without any understanding of the underlying science.

The 0.003 radian/second looming threshold is a particularly abused example. It has become like the 1.5 second PRT, a cookbook number that allows the user to avoid unpleasant complications, such as reality. People who use looming thresholds seem never to have heard of field factors or why they are necessary. Instead, they parrot the consensus research value that arises primarily from several controlled studies by Hoffman and Mortimer.

Interestingly, the Hoffman and Mortimer studies did not use real drivers or even a simulator. They didn't actually measure looming threshold either. They used film clips of driving scenes and only inferred the threshold from the accuracy of time-to-collision (TTC) judgments. A very few analyses use 0.006 rad/second from Muttart et al. (2005), but its relationship to real driving is completely unfathomable because 1) drivers were alert and aware and got to practice, 2) they were performing a concurrent task, 3) the looming estimate relied on guesses as to PRT, 4) the study was performed in a very low fidelity simulator with simple visuals, 5) the viewing distance was very short and 6) 0.006 rad/second wasn't even a threshold, it was an asymptotic value. Does this method seem like it would produce data that could be directly or even indirectly applied to real drivers in the real world? Besides, both numbers ignore variability, which is just as important as the mean in characterizing "normal" and expected behavior.

Virtually all looming thresholds found in research studies have been obtained in the same scenario. The studies use the same specific situation - car-following - and may not generalize to other scenarios for many reasons. In car-following, the driver follows another car at a relatively short fixed distance for a period of time and then responds when the lead vehicle decelerates (usually without brake lights to provide a cue). 1) The change in relative speed is quick and the TTC changes rapidly from infinity to some moderate/high value. 2) The speeds are invariably low to moderate and following distance is short. 3) The drivers are expecting the lead vehicle to brake suddenly. 4) The studies are performed in good daylight visibility when the driver can see the entire lead vehicle clearly.

Many of the most severe real-world crashes follow a very different scenario. For example, collisions often occur when a driver on a high-speed road approaches a slowed or stopped vehicle at 60-70 mph with only taillights visible. There are no PRT or looming data for anything resembling this situation because it differs from the research in many ways:

*Research in car-following is performed only with sudden motion changes. In the approach of a driver toward a stopped vehicle in the distance, looming rate increases gradually over a long period. Humans are much poorer at detecting gradual changes than sudden ones (e.g., Simons, Franconeri & Reimer, 2000), so the passing of the looming threshold may not be so obvious. Car-following thresholds are not necessarily representative of other collision scenarios.

*Expectation. In car-following studies, the drivers anticipate the deceleration and have their eyes fixed on the lead vehicle, looking for the slightest change in motion. This is not real-world behavior, except when spacing is short and speeds are high, i.e., when temporal headway is small.

*The situation mimics the classic psychophysical method of "ascending limits", which produces higher thresholds.

*While looming can trigger an innate fear and an avoidance response at short distance, it is also questionable whether an object that is 300-400 feet away is likely to elicit the innate mechanism for collision avoidance to looming objects. At that distance, a tractor-trailer rear creates a retinal image of 1.5°. That's the size of the thumbnail held at arm's length. Such a small image is unlikely to trigger any emotion or feeling or need to suddenly respond.

*In car-following the sudden deceleration makes the front end of braking vehicles buck down, offering a potential additional cue to deceleration in car-following paradigms. Responding drivers may become tuned to respond to the bucking and not to looming.

*There are few or no looming thresholds for nighttime driving, for retroreflective tape as opposed to an entire vehicle, for Type 1 tau, low visibility, bad weather, etc. Data from ideal daytime conditions should not be assumed valid for visually impoverished conditions, which doubtless raise looming thresholds. It is more difficult to perceive the motion of small objects (e.g., two-inch high retroreflective tape) than large objects (e.g., the eight-foot high box of a tractor-trailer).

*It is unclear what this threshold represents. Is it a simple motion detection threshold? A hazard perception threshold? Do real drivers even use looming? If so, when and how do they use it? Some evidence (DeLucia, 2008), for example, suggests that drivers only use optical information at short distances. Determining the threshold, problematic as it may be, holds little value without knowing how the driver will use it to make his decision to respond.
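The geometry behind the points above about gradual looming and small image size can be sketched with a short calculation. It treats the vehicle rear as a flat object of width W viewed head-on at distance d, so the visual angle is 2*atan(W / 2d) and its rate of change at a constant closing speed v is W*v / (d^2 + W^2/4). The 8-foot width and the 65 mph closing speed are assumptions made for illustration, not values from any study, and the function names are hypothetical.

```python
import math

FT_PER_S_PER_MPH = 5280 / 3600  # feet per second in one mile per hour

def visual_angle_deg(width_ft, distance_ft):
    """Visual angle (degrees) subtended by an object of the given width."""
    return math.degrees(2 * math.atan(width_ft / (2 * distance_ft)))

def looming_rate(width_ft, distance_ft, closing_speed_mph):
    """Rate of change of visual angle (rad/s) at a constant closing speed.
    This is the time derivative of theta = 2*atan(W / 2d)."""
    v = closing_speed_mph * FT_PER_S_PER_MPH
    return width_ft * v / (distance_ft ** 2 + width_ft ** 2 / 4)

WIDTH_FT = 8.0    # assumed trailer width, not a value from the article
SPEED_MPH = 65.0  # assumed closing speed toward a stopped vehicle

for d in (1000, 800, 600, 400, 300, 200):
    print(f"{d:5d} ft: angle = {visual_angle_deg(WIDTH_FT, d):.2f} deg, "
          f"looming = {looming_rate(WIDTH_FT, d, SPEED_MPH):.4f} rad/s")
```

With these assumed inputs the angle comes out at roughly 1.5° at 300 feet and somewhat less at 400 feet. The looming rate rises smoothly as the distance closes, with no abrupt onset of the kind a braking lead vehicle produces in car-following.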

In sum, the data from car-following studies performed in daylight to expected braking likely underestimate the looming needed for a driver approaching a slowed/stopped vehicle to judge TTC. How much is the underestimate? A recent naturalistic study (Markkula, Engström, Lodin, Bärgman, & Victor, 2016) found that most real drivers responded at 0.02 rad/second, a looming rate 7 times higher than the research value. Looming threshold is probably much like contrast threshold - it is the point where sensing is possible. However, interpreting the sensation requires a suprathreshold sensation level many times greater than simple sensory threshold. In car-following research, research drivers are told to brake when they see a change in distance (presumably looming). This does not mean that they perceived the need to brake. It is a demand characteristic of the study.
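A rough sense of what the gap between 0.003 and 0.02 rad/second means in distance and time can be had by inverting the looming geometry: for an object of width W approached at constant closing speed v, the looming rate W*v / (d^2 + W^2/4) reaches a given value at d = sqrt(W*v/rate - W^2/4). The sketch below assumes an 8-foot-wide vehicle rear and a 65 mph closing speed toward a stopped vehicle. Both are illustrative assumptions, the function name is hypothetical, and the output says nothing about what any actual driver perceived or when a response was warranted.

```python
import math

FT_PER_S_PER_MPH = 5280 / 3600  # feet per second in one mile per hour

def threshold_distance_ft(width_ft, closing_speed_mph, rate_rad_s):
    """Distance (ft) at which the looming rate reaches `rate_rad_s`
    for an object of the given width at a constant closing speed."""
    v = closing_speed_mph * FT_PER_S_PER_MPH
    d_squared = width_ft * v / rate_rad_s - width_ft ** 2 / 4
    return math.sqrt(max(d_squared, 0.0))

WIDTH_FT = 8.0    # assumed vehicle width, not a value from the article
SPEED_MPH = 65.0  # assumed closing speed toward a stopped vehicle

for rate in (0.003, 0.02):
    d = threshold_distance_ft(WIDTH_FT, SPEED_MPH, rate)
    ttc = d / (SPEED_MPH * FT_PER_S_PER_MPH)  # time to collision at that point
    print(f"{rate:.3f} rad/s reached at about {d:.0f} ft (TTC about {ttc:.1f} s)")
```

Because looming rate grows roughly with the inverse square of distance, a roughly sevenfold higher looming rate corresponds to a distance only about 2.6 times shorter, not seven times shorter. Even so, under these assumptions the choice between the two values shifts the estimated distance by several hundred feet and the remaining time by several seconds.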

Conclusion

There are many reasons why research data cannot be accepted at face value. It is safe to say that real-world performance is almost certain to be worse, more variable and sometimes just plain different than predicted by research. The explanation lies in the different goals of researchers, in the inherent nature of controlled research and in the advantages that the research subjects have over real-world drivers.

The main lesson is that any attempt to determine an absolute value must come from research and any attempt to use research relies on the ability to interpret the research despite the problems. This can only be done by people who have an extensive research background - and sometimes not even then. It certainly cannot be done solely by any computer program. In the case of looming, the 0.003 rad/second looming threshold by itself is virtually worthless without an attempt to compensate for the problems discussed above. Specifically, application of the research data requires:

1. Data from a similar situation. Data from night cases and Type 1 tau are very rare. Data from non car-following situations (a driver approaches an unexpectedly stopped/slow moving vehicle from a long distance) are nonexistent;

2. A field factor to apply it to the real world. Data from real-world crashes suggest a value as high as seven, but this may not represent the motion detection threshold;

3. An account of how situational factors affect the threshold. There are likely many situational factors that would cause the number to vary. There are few or no data for many situations and the existing data rely on research paradigms and conditions of questionable generalizability;

4. A measure of variability to characterize the range of the normal behavior by the population as a whole - not just those screened by the researcher; and

5. A model of driver behavior that explains how the driver interprets and incorporates the perceived looming into his decision making. It doesn't seem to be the only or even prime cue for driver response (Markkula, Engström, Lodin, Bärgman, & Victor, 2016; Green, 2024).

Lastly, I am not saying that research data are useless. They are often good at revealing variables which are likely to increase or decrease performance, produce slower PRT, shorter pedestrian visibility distance and poorer contrast and motion thresholds. When it comes to determining absolute values for real behavior, they can at best get you in the general ballpark.