Behavioral analyses are used across a wide range of topics, from ethology to more applied studies, making them invaluable for assessing animal welfare and adaptability. In this context, the scan sampling method has become widespread: although it results in partial data loss compared with continuous sampling, it is a useful approach when large amounts of data are collected. Continuous sampling remains the gold standard for behavioral assessment because true frequencies and durations, the times at which behaviors start and stop, and sequences of behaviors can be obtained for the entire duration of the video17,19. Nevertheless, Martin and Bateson17 argue that “trying to record everything can mean nothing is measured reliably.” Scan sampling is less demanding than continuous sampling and facilitates the observation of multiple animals in multiple categories over long periods17, thus offering a compromise between precision and time savings. In particular, in studies similar to ours, characterized by numerous video recordings and a large number of animals, continuous sampling is not feasible, and scan sampling appears to be the only practical option. However, the choice of sampling interval is complex and can affect the final results. Researchers often lack reference points for making a reasoned choice and must rely solely on their own experience; they must also take into account the objective of the study, the characteristics of the animal, and the factors that can influence the behavior of interest, such as the experimental setting and the environmental context. One of the main objectives of this study was to provide a reliable methodology for broadly comparing different chicken genotypes without compromising the final results, while taking into account the time observers spend on video analysis.
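The difference between the two recording rules can be illustrated with a minimal sketch. Assuming one animal's continuous record is stored as a hypothetical list of (start time in seconds, behavior) bouts, each lasting until the next begins, a scan sample simply reads off the behavior in progress at fixed instants. The bouts, session length, and function names below are invented for illustration and are not taken from the study's data or software.

```python
from bisect import bisect_right

# Hypothetical continuous record for one bird: (start time in s, behavior).
# Each bout lasts until the next one starts; the session ends at SESSION_S.
SESSION_S = 3600
BOUTS = [(0, "resting"), (900, "walking"), (1000, "grass pecking"), (2400, "resting")]


def state_at(bouts, t):
    """Behavior in progress at time t, or None before the first bout."""
    starts = [start for start, _ in bouts]
    i = bisect_right(starts, t) - 1
    return bouts[i][1] if i >= 0 else None


def scan_sample(bouts, session_s, interval_s):
    """Instantaneous scans every interval_s seconds, starting at time 0."""
    return [state_at(bouts, t) for t in range(0, session_s, interval_s)]


def continuous_proportion(bouts, session_s, behavior):
    """Exact share of the session spent in `behavior` (continuous sampling)."""
    ends = [start for start, _ in bouts[1:]] + [session_s]
    total = sum(end - start for (start, b), end in zip(bouts, ends) if b == behavior)
    return total / session_s


def scan_proportion(bouts, session_s, interval_s, behavior):
    """Share of scans in which `behavior` was recorded (scan sampling)."""
    scans = scan_sample(bouts, session_s, interval_s)
    return scans.count(behavior) / len(scans)


for interval_min in (10, 15, 30):
    est = scan_proportion(BOUTS, SESSION_S, interval_min * 60, "walking")
    print(f"{interval_min}-min scans: walking = {est:.3f}")
print(f"continuous: walking = {continuous_proportion(BOUTS, SESSION_S, 'walking'):.3f}")
```

With these invented bouts, the 100-s walking bout is missed entirely by the 10- and 30-minute scans but is caught by one of the four 15-minute scans, so a short bout can be either lost or over-represented depending on where the scans fall.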
Thus, this study examined the behavioral analysis of chickens with a rigorous statistical approach which, in addition to feasibility, investigated (i) reliability (i.e., agreement between observers and between sampling methods), (ii) accuracy (i.e., error rates and bias), and (iii) validity (i.e., the accuracy of inferences in practical application) of different sampling intervals20. First, the 10-minute interval was considered “the shortest applicable” in our context on the basis of preliminary assessments that primarily considered feasibility (i.e., the time required for each analysis and the number of analyses per session) and that excluded a 5-minute sampling interval. The 10-minute interval was also chosen as the reference because, compared with the continuous method, it showed the best ability to predict real events and the smallest absolute error.
Next, the behaviors in the chicken ethogram were classified into three main groups based on their frequency of occurrence (i.e., low, medium, and high), as frequency is known to affect the reliability of recordings20. This classification is consistent with other studies21,22 reporting that broilers spend 80% of their time budget on locomotor, stationary, and foraging behaviors, as confirmed here by the high frequency of grass pecking, other pecking, walking, resting, and self-grooming.
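The grouping step amounts to binning each behavior by its mean occurrence. The section does not restate the cutoffs that defined the three groups, so the sketch below uses hypothetical thresholds (1% and 10% of scans) and invented occurrence values purely to show the mechanics.

```python
# Hypothetical cutoffs (percent of scans in which a behavior was recorded).
# They are placeholders for illustration, NOT the thresholds used in the study.
LOW_CUTOFF = 1.0
HIGH_CUTOFF = 10.0


def occurrence_class(mean_percent, low_cutoff=LOW_CUTOFF, high_cutoff=HIGH_CUTOFF):
    """Assign a behavior to the low, medium, or high occurrence group."""
    if not 0 <= mean_percent <= 100:
        raise ValueError("mean_percent must be between 0 and 100")
    if mean_percent < low_cutoff:
        return "low"
    if mean_percent < high_cutoff:
        return "medium"
    return "high"


# Invented mean occurrences, chosen only to exercise each branch.
example = {"Allo-grooming": 0.4, "dust bathing": 3.0, "walking": 18.0}
groups = {behavior: occurrence_class(p) for behavior, p in example.items()}
print(groups)
```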
The frequency of behaviors influenced interobserver agreement, with the lowest indices obtained for low-occurrence behaviors such as Allo-grooming. As expected, occurrence categories were also relevant for agreement between sampling methods. In particular, the shortest sampling interval (10 minutes) identified low- and medium-frequency behaviors better than the 15- and 30-minute intervals, whereas no differences between sampling intervals were found for the most frequent behaviors. Comparison with previous studies is not straightforward because, despite the use of the same video analysis software, animal rearing conditions can differ substantially. Indeed, when investigating alternative observation methods for the behavioral assessment of young broilers raised individually indoors, Ross et al.23 concluded that scan sampling with intervals longer than 5 min was inaccurate because the average duration of each behavior was less than 30 s. The rearing system is well known to affect the expression of various behaviors, in terms of both frequency and duration; indeed, some specific behaviors, such as social interactions, were not expressed in the setting of Ross et al.23 Whereas Ross et al.23 reported an average duration of less than 30 s for each behavior, each of our scans lasted 10 s, regardless of which of the three sampling intervals was adopted. Given the large number of animals (50 birds per enclosure), this gave observers the best possible conditions for evaluating the animals’ behavior. In addition, the tools available in the Observer XT software (zoom, slow motion, etc.) allowed us to analyze the individual behaviors in each scan as precisely as possible.
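Why agreement indices tend to drop for rare behaviors can be sketched with Cohen's kappa, a common chance-corrected index for two observers coding the same scans. The section does not restate which agreement index was used, so kappa here is an assumption, and the presence/absence codes are invented. With the same 90% raw agreement, the rare behavior gets a lower kappa, because the agreement expected by chance is higher when one category (absence) dominates.

```python
from collections import Counter


def cohens_kappa(codes_a, codes_b):
    """Cohen's kappa for two observers' categorical codes of the same scans."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("need two non-empty code lists of equal length")
    n = len(codes_a)
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    # Agreement expected by chance from each observer's marginal frequencies.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a.keys() | freq_b.keys()) / n**2
    if expected == 1:
        return 1.0  # both observers used one and the same category throughout
    return (observed - expected) / (1 - expected)


# Invented presence (1) / absence (0) codes for 10 scans; one disagreement each.
rare_a, rare_b = [1] + [0] * 9, [1, 1] + [0] * 8
frequent_a, frequent_b = [1] * 5 + [0] * 5, [1] * 6 + [0] * 4

print("rare:", round(cohens_kappa(rare_a, rare_b), 3))
print("frequent:", round(cohens_kappa(frequent_a, frequent_b), 3))
```

Here the rare behavior yields a kappa of about 0.62 and the frequent one 0.8, even though both pairs of observers agree on 9 of 10 scans.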
The present study indicated that the effect of the sampling interval on accuracy depended on the frequency of occurrence. In particular, error analysis confirmed that the 15- and 30-minute sampling intervals underestimated the occurrence of rare behaviors. Interestingly, however, the estimated errors of these two intervals did not differ. This could suggest that the 15- and 30-minute intervals are interchangeable and that scan sampling every 30 minutes does not reduce data accuracy compared with the 15-minute interval. On the other hand, half of the sampling points of the 15-minute interval coincided with those of the 30-minute interval, which could explain the lack of difference between the errors obtained with the 15- and 30-minute intervals.
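The overlap argument can be checked by listing scan instants. Assuming, hypothetically, that all three schedules start at the same moment within a 120-minute session (a length chosen only for illustration), every 30-minute scan coincides with a 15-minute scan and with a 10-minute scan; the 30-minute points therefore make up half of the 15-minute points but only a third of the 10-minute points.

```python
def scan_times(session_min, interval_min):
    """Scan instants (minutes) for one interval, all schedules starting at 0."""
    return set(range(0, session_min, interval_min))


def shared_fraction(interval_a, interval_b, session_min):
    """Fraction of interval_a's scans that coincide with an interval_b scan."""
    a = scan_times(session_min, interval_a)
    b = scan_times(session_min, interval_b)
    return len(a & b) / len(a)


SESSION_MIN = 120  # hypothetical session length
for a, b in [(15, 30), (10, 30), (30, 15), (30, 10)]:
    print(f"{a}-min scans shared with {b}-min: {shared_fraction(a, b, SESSION_MIN):.2f}")
```

Under this assumption, the 15- and 30-minute estimates rest on a larger common core of observations than the 10- and 30-minute estimates do, which is consistent with their errors being similar.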
The accuracy of, and differences between, the methods were further investigated using Bland–Altman plots. Bland–Altman analysis is a graphical approach typically used to validate clinical measurements and to assess agreement24,25, because it quantifies and visualizes the differences between values recorded with different methods. In agreement studies, these plots are preferable to the correlation coefficient, which measures the strength of a linear relationship between two variables rather than their agreement. In animal behavior studies, this method has recently been used to determine the agreement between data obtained from collar-based sensors and human observations in dairy cows26. In our study, Bland–Altman plots confirmed that the 15- and 30-minute intervals underestimated the percentage of animals performing a given behavior and indicated that the absolute bias increased with the occurrence of the behavior. This bias could exceed 10% for the 15- and 30-minute sampling intervals.
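The core Bland–Altman quantities are simple to compute: for each paired observation, the difference between methods is plotted against their mean, the bias is the mean difference, and the 95% limits of agreement are the bias ± 1.96 SD of the differences. The sketch below takes the 10-minute interval as the reference, as described earlier in this section, and uses invented percentages of animals performing a behavior; differences are test minus reference, so negative values indicate underestimation.

```python
from statistics import mean, stdev


def bland_altman(test, reference):
    """Return per-pair means and differences, the bias, and 95% limits of agreement."""
    if len(test) != len(reference) or len(test) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [t - r for t, r in zip(test, reference)]
    means = [(t + r) / 2 for t, r in zip(test, reference)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    return {
        "means": means,
        "diffs": diffs,
        "bias": bias,
        "limits": (bias - 1.96 * sd, bias + 1.96 * sd),
    }


# Invented % of animals performing a behavior in matched scans.
reference_10min = [2.0, 5.0, 12.0, 30.0]
test_30min = [1.0, 4.0, 9.0, 24.0]

result = bland_altman(test_30min, reference_10min)
print("bias:", result["bias"])
print("limits of agreement:", [round(x, 2) for x in result["limits"]])
# Plotting result["diffs"] against result["means"] gives the Bland-Altman plot.
```

In this invented example the differences grow in magnitude with the mean, which is the kind of pattern that shows up on the plot as an absolute bias increasing with occurrence.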
Bland–Altman plots in which the bias was expressed as a percentage of the expected values better showed the practical impact of these biases. They indicated that the largest relative differences occurred for rare behaviors. Specifically, the 15- and 30-minute intervals overestimated or underestimated the “true” value by up to twofold. Such a bias could have important consequences for data interpretation. However, the relative differences decreased and tended to be negligible for high-frequency behaviors such as perching, walking, grass pecking, and resting. In this regard, the following considerations apply to the behaviors observed in chickens raised outdoors: (i) frequent behaviors account for approximately 80% of the chicken time budget21,22; (ii) they are the behaviors most studied to characterize and compare genetic strains27 and are used to evaluate the adaptability of chickens to different rearing conditions28. Therefore, when a study requires the assessment of frequent behaviors, the 30-minute sampling interval may be a good compromise between resources (funding and time) and results, since the estimated values of frequent behaviors can be assumed not to differ across the tested sampling intervals. In contrast, studies of rare behaviors (such as the assessment of behavioral sequences in relation to stressful conditions29, the fear response, or positive behaviors such as play30) require very short intervals or continuous recording.
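Expressing each difference as a percentage of the expected value makes the relative size of the bias visible. The sketch assumes that the reference (10-minute) value is the expected value and uses invented numbers; a difference of +100% corresponds to a twofold overestimate. The percentage is undefined when the expected value is zero, which can happen for rare behaviors, so such pairs are rejected here rather than silently dropped.

```python
def percent_differences(test, reference):
    """Differences (test - reference) as a percentage of the expected (reference) value."""
    if len(test) != len(reference):
        raise ValueError("test and reference must be paired")
    if any(r == 0 for r in reference):
        raise ValueError("percentage difference is undefined when the expected value is 0")
    return [100 * (t - r) / r for t, r in zip(test, reference)]


# Invented values: a rare behavior (expected 2%) and a frequent one (expected 30%).
reference_10min = [2.0, 30.0]
test_30min = [4.0, 29.0]

pct = percent_differences(test_30min, reference_10min)
print([round(p, 1) for p in pct])  # relative bias for each pair
```

The same one-point absolute error that is negligible for the frequent behavior (about -3%) corresponds, for the rare behavior, to a doubling of the expected value.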
These considerations were strongly supported by applying the three sampling intervals to assess behavioral differences between genotypes. For a rare behavior such as Attacking, the scan interval influenced the odds ratio (OR), leading to conflicting conclusions about the genotype-behavior relationship. In particular, the 15- and 30-minute sampling intervals indicated a difference in Attacking between LD and CB chickens that was not found with the 10-minute interval. Similar misinterpretations of this behavior also occurred in the comparisons between LD and Red chickens and between LD and NN chickens. Quantitative differences could also appear for high- and medium-frequency behaviors (e.g., walking and dust bathing), but in these cases they did not lead to different interpretations of the relationship between genotype and behavior.
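The sensitivity of an odds ratio to rare events can be illustrated with a simple 2 × 2 table of scans with and without the behavior for two genotypes. This is a sketch under assumptions: the section does not restate the model used to estimate the ORs (which may include covariates or random effects), the counts below are invented, and the 95% confidence interval uses the standard Woolf (log) method, with the Haldane–Anscombe correction (adding 0.5 to every cell) when any cell is zero. With so few events the interval is wide, and a handful of missed or extra scans can move the estimate across 1.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Woolf 95% CI for a 2x2 table.

    a, b: scans with / without the behavior in genotype 1
    c, d: scans with / without the behavior in genotype 2
    """
    if 0 in (a, b, c, d):
        # Haldane-Anscombe correction so the OR and its log SE are finite.
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Invented counts: a rare behavior seen in 6 of 300 scans for one genotype
# and 5 of 300 for the other.
or_, lo, hi = odds_ratio_ci(6, 294, 5, 295)
print(f"OR = {or_:.2f}, 95% CI {lo:.2f} to {hi:.2f}")
```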
The results of this applied inferential analysis (i.e., the OR) confirmed the indications of the Bland–Altman plots, in particular those in which the bias was expressed as a percentage of the expected values. Both approaches suggest that the 30-minute sampling interval provides a valid assessment of high-frequency behaviors, useful for broadly characterizing chicken genotypes, but that bias in rare behaviors such as Attacking could compromise the validity of these estimates. Thus, Bland–Altman analysis appears to be a useful tool for comparing sampling intervals and informing researchers’ choices, because it provides an effective visual representation and allows the practical importance of bias to be judged.
The main limitation of this study is that the comparison between continuous and scan sampling was performed for only a single session per genotype. However, this choice was largely dictated by feasibility, given the large number of animals per enclosure and the numerous behavioral variables included in the ethogram.