Investigating behavior assessment instruments to predict aggression in dogs

To best understand this article in the context of the behavior evaluation literature, please see the National Canine Research Council’s complete analysis.

Article Citation:

Bennett, S. L., Litster, A., Weng, H., Walker, S. L., & Luescher, A. U. (2012). Investigating behavior assessment instruments to predict aggression in dogs. Applied Animal Behaviour Science, 141(3-4), 139-148. doi:

National Canine Research Council Summary and Analysis:  

The retrospective 2012 study by Bennett, Litster, Weng, Walker, and Luescher raises questions about the practical application of behavioral tests. The researchers evaluated two behavioral assessment tools commonly used in shelters to try to predict behavior in adoptive homes: the Meet Your Match (MYM) Safety Assessment for Evaluating Rehoming (SAFER) and a modified version of Assess-A-Pet (mAAP). The purpose was to determine the extent to which each instrument could predict “aggression” and to compare the tests’ sensitivity and specificity. The study was masked; that is, the evaluators were unaware of the dogs’ behavior histories.

The validated Canine Behavior Assessment and Research Questionnaire (C-BARQ) was used to collect behavior histories for the 67 pet dogs in the sample, and these histories were then used to assess the validity of SAFER and mAAP. The subjects were recruited through convenience sampling and were owned dogs, not shelter dogs. Every dog had been living in its home environment for at least 3 months, which, as discussed in the literature review, raises doubts about whether the results can be extrapolated to predict future behavior in dogs living in a shelter environment.

The authors chose the C-BARQ, an instrument in which owners report their observations of their dog’s behavior, as the standard for assigning dogs to categories (“low/no aggression” and “moderate/severe aggression,” the latter further subdivided into moderate and severe). They chose it because it had previously been validated for measuring stranger-directed threats or biting, owner-directed threats or biting, dog-directed threats or biting, dog-directed fear, stranger-directed fear, nonsocial fear, separation-related problems, and attachment/attention seeking. Three common targets of threatening and biting behavior in dogs (strangers, owners, and other dogs) were used as the factors for types of aggression. Each factor was scored from 4-10 survey questions on a 5-point ordinal scale (0-4). A score of 0 was reserved for no history of growling, snarling, barking, snapping, or biting, and a score of 4 represented snapping, biting, or “attempting to bite.” Dogs were then labeled on each factor according to their maximum item score: a maximum of 0 or 1 resulted in a “0” label, a maximum of 2 or 3 resulted in a “1” label, and a maximum of 4 resulted in a “2” label. Thus, each dog was assigned three labels, one for each potential target. Finally, the three factor labels were combined into an overall score: dogs labeled 0 on all three factors received an overall score of “0,” dogs whose highest label was 1 received an overall score of “1,” and dogs with at least one label of 2 received an overall score of “2.”
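The C-BARQ labeling rules above reduce to two “take the maximum” steps. The sketch below is a hypothetical Python illustration, assuming each factor’s answers are available as a list of integer item scores on the 0-4 scale; the function names, factor keys, and example dog are invented for illustration and are not taken from the study or from C-BARQ’s own scoring materials.

```python
# Hypothetical sketch of the C-BARQ grouping described above.
# Assumption: each factor's answers are integer item scores on the 0-4 scale.
# Factor keys, function names, and the example dog are illustrative only.

FACTORS = ("stranger", "owner", "dog")


def factor_label(item_scores: list[int]) -> int:
    """Collapse one factor's item scores into a 0/1/2 label via the maximum."""
    if not item_scores or any(s not in range(5) for s in item_scores):
        raise ValueError("expected a non-empty list of scores from 0 to 4")
    top = max(item_scores)
    if top <= 1:
        return 0  # maximum of 0 or 1
    if top <= 3:
        return 1  # maximum of 2 or 3
    return 2      # maximum of 4


def overall_label(dog: dict[str, list[int]]) -> tuple[dict[str, int], int]:
    """Return the three per-factor labels and the overall score (their maximum)."""
    labels = {factor: factor_label(dog[factor]) for factor in FACTORS}
    return labels, max(labels.values())


# Illustrative (made-up) owner answers for one dog.
example_dog = {"stranger": [0, 1, 0, 2], "owner": [0, 0, 0, 0], "dog": [1, 1, 0, 0]}
labels, overall = overall_label(example_dog)
print(labels, overall)  # {'stranger': 1, 'owner': 0, 'dog': 0} 1
```

Because the overall score is simply the maximum of the factor labels, a dog with a single severe rating toward any one target lands in the highest category regardless of the other two factors.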

SAFER – Seven subtests were administered. First, the experimenter held the dog’s head and gazed into its eyes (“Look”). The second test involved gently grasping fur and skin along the dog’s body (“Sensitivity”). The third subtest was an attempt to initiate play by speaking excitedly and lightly poking the dog (“Tag”). During the fourth subtest the evaluator said, “squeeze” and then gently squeezed the dog’s leg and paw (“Squeeze”). This was repeated to determine whether the dog appeared to learn to anticipate the squeeze based on the vocal cue. The fifth and sixth subtests used a plastic hand to take away food and toy items, respectively (“Food Behavior” and “Toy Behavior”). Finally, in the seventh subtest the subject was led into a room occupied by a second, passive dog. Initial approach behavior was recorded, but the dogs were not allowed to touch or interact further (“Dog To Dog Behavior”).

mAAP – The mAAP consisted of nine subtests, administered in the following order. First, the evaluator stood in front of the dog’s cage for 5 s, made eye contact, and then knelt and spoke to the dog in a friendly manner (“Cage Presentation”). Next, while holding the dog on a leash, the evaluator ignored the dog for 60 s, then stroked the dog’s back, sat down and ignored the dog briefly, and finally spoke to the dog in a friendly manner (“Sociability”). The third subtest required the evaluator to lift the dog’s lips to expose its teeth for 5 s and to repeat this 5 times (“Teeth Examination”). Next, the evaluator touched the dog all over, including pressing on its shoulders, tugging its tail and ears, and lifting its paw (“Handling”). In the fifth subtest (“Arousal”), the evaluator initiated play using toys for 30 s. For the “Food Bowl” and “Possessions” subtests, dogs were presented with a bowl of dog food and a basted pig’s ear, respectively, and a plastic hand was then used to attempt to remove each item from the dog. Next, a person wearing a hat and coat knocked on the door, entered the room, and approached the dog; the stranger made eye contact, knelt down, and talked to the dog in a friendly manner (“Stranger”). Finally, an unfamiliar, neutral, leashed dog was presented to the subject (“Dog Introduction”).

For SAFER, each subtest was scored from behavioral descriptions on a 1-5 scale, with higher numbers indicating higher levels of aggression. A legend was used to flag some of the dogs’ scores: “P” corresponded to a 3 and indicated some concerning behavior for which behavior modification might be necessary; “R” corresponded to a 4 and indicated moderate aggression, with behavior modification strongly recommended; and “S” indicated that a subtest had to be stopped for safety reasons, which was scored as a 5.

Next, the data were transformed in two ways to allow different types of analyses. First, a binary score was created: dogs who did not receive a “P,” “R,” or “S” on any subtest were categorized as “0” (no concerning behavior), and dogs who received at least one “P,” “R,” or “S” were categorized as “1” (concerning behavior). Second, an ordinal score was created: dogs with no “P,” “R,” or “S” were marked “0”; dogs with at least one “P” but no “R” or “S” were marked “1”; dogs with at least one “R” but no “S” were marked “2”; and dogs with any “S” were marked “3.”
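Because the SAFER ordinal score depends only on the most serious legend letter a dog received, both transformations can be expressed compactly. The hypothetical Python sketch below assumes each dog’s seven subtest results are stored as integers from 1 to 5; the function names and example scores are illustrative, not from the study.

```python
# Hypothetical sketch of the two SAFER transformations described above.
# Assumption: each subtest result is an integer from 1 to 5, where only
# 3, 4, and 5 carry the legend letters "P", "R", and "S".

LEGEND = {3: "P", 4: "R", 5: "S"}


def _letters(subtest_scores: list[int]) -> set[str]:
    """Collect the legend letters a dog earned across its subtests."""
    if not subtest_scores or any(s not in range(1, 6) for s in subtest_scores):
        raise ValueError("expected a non-empty list of scores from 1 to 5")
    return {LEGEND[s] for s in subtest_scores if s in LEGEND}


def safer_binary(subtest_scores: list[int]) -> int:
    """0 = no P, R, or S on any subtest; 1 = at least one P, R, or S."""
    return 1 if _letters(subtest_scores) else 0


def safer_ordinal(subtest_scores: list[int]) -> int:
    """0 = none; 1 = some P but no R/S; 2 = some R but no S; 3 = any S."""
    letters = _letters(subtest_scores)
    if "S" in letters:
        return 3
    if "R" in letters:
        return 2
    if "P" in letters:
        return 1
    return 0


# Illustrative (made-up) results for the seven subtests of one dog.
scores = [1, 2, 3, 1, 1, 4, 2]
print(safer_binary(scores), safer_ordinal(scores))  # 1 2
```

Under these rules the binary score is 1 exactly when the ordinal score is greater than 0, so the binary version discards only the information about how serious the worst flag was.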

Two parallel transformations were applied to the mAAP data, producing a binary score of “0” (no issue or unsocial) or “1” (borderline or fail) for the first set of analyses, and an ordinal score of “0” (no issue), “1” (unsocial), “2” (borderline), or “3” (fail) for the second.
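The mAAP transformations amount to a lookup table. This hypothetical Python sketch assumes each dog’s overall mAAP result is recorded as one of the four category names used above; how that overall result is derived from the nine subtests is not described here, so the sketch starts from it.

```python
# Hypothetical sketch of the two mAAP transformations described above.
# Assumption: each dog has one overall mAAP outcome, recorded as one of the
# four category names below (how the outcome is derived is not shown here).

MAAP_ORDINAL = {"no issue": 0, "unsocial": 1, "borderline": 2, "fail": 3}


def maap_ordinal(outcome: str) -> int:
    """Map an outcome name to the ordinal score 0-3."""
    key = outcome.strip().lower()
    if key not in MAAP_ORDINAL:
        raise ValueError(f"unknown mAAP outcome: {outcome!r}")
    return MAAP_ORDINAL[key]


def maap_binary(outcome: str) -> int:
    """0 for 'no issue' or 'unsocial'; 1 for 'borderline' or 'fail'."""
    return 1 if maap_ordinal(outcome) >= 2 else 0


for outcome in ("no issue", "unsocial", "borderline", "fail"):
    print(outcome, maap_binary(outcome), maap_ordinal(outcome))
```

Note the design difference from SAFER: the mAAP binary cut groups “unsocial” with “no issue” rather than treating every non-zero category as concerning.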

A series of statistical tests was conducted to determine sensitivity, specificity, false positives, false negatives, and odds ratios for both instruments. mAAP had better sensitivity (0.73) and specificity (0.59) than SAFER (0.6 and 0.5, respectively), and both tests were more likely to classify a dog with a history of threatening or biting behavior as aggressive than one without such a history (odds ratios of 4.1 for mAAP and 1.5 for SAFER). However, both tests produced false positives and false negatives, even when the data were categorized more finely. With the transformed data, there was a statistically significant positive correlation between mAAP scores and threatening and biting behavior history (based on C-BARQ results), but the correlation was not statistically significant for SAFER.
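The paragraph above reports results rather than the calculations behind them. As a hedged illustration of the standard definitions (not the authors’ analysis code), the Python sketch below computes sensitivity, specificity, and the odds ratio from paired binary test results and binary C-BARQ-based labels. All names are hypothetical, and the example labels are invented rather than taken from the study.

```python
# Sketch of how sensitivity, specificity, and an odds ratio follow from a
# 2x2 table of binary test results versus binary reference labels.
# Assumption: 1 = flagged / has a history of threats or biting, 0 = not.
# The example labels below are made up and are not the study's data.
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # test flagged the dog; reference shows a history
    fp: int  # test flagged the dog; reference shows no history
    fn: int  # test did not flag the dog; reference shows a history
    tn: int  # test did not flag the dog; reference shows no history


def confusion_from_labels(test: list[int], reference: list[int]) -> Confusion:
    """Cross-tabulate paired binary labels into a 2x2 table."""
    if len(test) != len(reference):
        raise ValueError("test and reference must be the same length")
    pairs = list(zip(test, reference))
    return Confusion(
        tp=pairs.count((1, 1)),
        fp=pairs.count((1, 0)),
        fn=pairs.count((0, 1)),
        tn=pairs.count((0, 0)),
    )


def sensitivity(c: Confusion) -> float:
    """Share of dogs with a history that the test flagged."""
    return c.tp / (c.tp + c.fn)


def specificity(c: Confusion) -> float:
    """Share of dogs without a history that the test did not flag."""
    return c.tn / (c.tn + c.fp)


def odds_ratio(c: Confusion) -> float:
    """(tp * tn) / (fp * fn); raises ZeroDivisionError if fp or fn is 0."""
    return (c.tp * c.tn) / (c.fp * c.fn)


def odds_ratio_from_rates(sens: float, spec: float) -> float:
    """The same odds ratio expressed through sensitivity and specificity."""
    return (sens / (1 - sens)) / ((1 - spec) / spec)


c = confusion_from_labels([1, 1, 0, 0, 1], [1, 0, 1, 0, 1])
print(c, sensitivity(c), specificity(c), odds_ratio(c))
print(round(odds_ratio_from_rates(0.6, 0.5), 2))  # 1.5
```

Plugging the reported SAFER sensitivity and specificity (0.6 and 0.5) into `odds_ratio_from_rates` gives 1.5, in line with the odds ratio reported for SAFER; whether the authors computed their odds ratios exactly this way is an assumption. The sketch also makes the false-positive and false-negative counts explicit as the `fp` and `fn` cells.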

The study has notable strengths. The researchers meticulously followed the written instructions for both instruments; they used the exact materials described (e.g., an English slip lead for mAAP versus a buckle collar and 1.8 m nylon leash for SAFER) and performed the evaluations precisely as specified. This attention to detail increases the study’s ecological validity.

However, the data fall short of validating either assessment, primarily because of the validation method used. The C-BARQ was the standard against which mAAP and SAFER were compared, but the C-BARQ is based on owner-reported observations. Like all self-reported data, C-BARQ results are subject to the owner’s memory, honesty, knowledge, and clarity. Furthermore, 28 dogs in the sample had been adopted from shelters, and 24 of those 28 (86%) were categorized in the moderate to severe aggression group. This is curious and raises several questions. Because these dogs were adopted from a shelter, is it possible that they had previously undergone a behavioral assessment, perhaps even one of the tests used in this study (no information was presented either way), so that learning or history effects could have affected their assessment scores? Is it also possible that, when owners adopted these dogs, they were told the dogs’ assessment scores or given information about the dogs’ behavior from prior owners (again, subject to honesty, memory, etc.)? Either possibility could affect how the owners responded on the C-BARQ. Thus, the main issue is whether the C-BARQ is an appropriate standard for validating these assessments in this particular sample.

To conclude, this study has shown a discrepancy between two assessments meant to measure the same thing, so shelter staff and other canine professionals should pause before using a behavior evaluation to determine a dog’s fate. The authors accordingly recommend caution when applying these assessments in shelters and suggest using them only in conjunction with other data, such as intake history and staff observations.

Abstract and Link to Purchase Full Text of the Original Article: