# Set-up

This vignette describes the decision rules used in the original method of Song (2013) and the High CMC method of Tong et al. (2015). For illustrative purposes, we will consider a comparison between a known match and a known non-match pair of cartridge cases from the study performed by Fadul et al. (2011). The raw cartridge case scans can be downloaded from the NIST Ballistics Toolmark Research Database. The scans were preprocessed using functions available in the cmcR package; the preprocessing is not discussed here. Refer to the fadul_examples.R script available on the cmcR GitHub page for how these scans were preprocessed. We will also not discuss how similarity features are extracted from two processed scans. Refer to the documentation of the comparison_allTogether function on the cmcR website for information regarding this procedure.

library(cmcR)
library(dplyr)
library(ggplot2)
library(purrr)
library(tidyr)
library(gridExtra)

We will consider comparisons between three cartridge case scans. Fadul 1-1 and Fadul 1-2 are known matches (i.e., were fired from the same firearm) while Fadul 2-1 is a non-match. The comparisons considered are Fadul 1-1 vs. Fadul 1-2 and Fadul 1-1 vs. Fadul 2-1.

data("fadul1.1_processed")
data("fadul1.2_processed")
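#Download a non-matching cartridge case to Fadul 1-1 and Fadul 1-2
#NOTE: fadul2.1_raw is a placeholder name for the raw Fadul 2-1 scan
#(e.g., read with x3ptools::read_x3p); crop arguments other than region are omitted
fadul2.1_processed <- fadul2.1_raw %>%
  preProcess_crop(region = "exterior") %>%
  preProcess_crop(region = "interior") %>%
  preProcess_removeTrend(statistic = "quantile",
                         tau = .5,
                         method = "fn") %>%
  preProcess_gaussFilter() %>%
  x3ptools::sample_x3p()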

The three processed cartridge cases are shown below.

The cell-based comparison procedure implemented in the comparison_allTogether function returns a data frame/tibble of similarity features between two cartridge case scans. For each cell in the “reference” scan (Fadul 1-1 in this example), the similarity features include

• Estimated horizontal and vertical translations, (x,y), required to align the reference cell in the target scan
• Cross-correlation between the reference cell and its associated target region after aligning by the estimated (x,y) values
• Rotation performed on the target scan that resulted in the (x,y,CCF) feature set

The fundamental assumption underlying all CMC decision rules is that truly matching cartridge case pairs should have similarity features that are consistent across the cell/region pairs. In particular, a plurality of cell/region pairs should “vote” for similar $$(x, y, \theta)$$ alignment values. In contrast, the cell/region pairs of truly non-matching cartridge cases should have seemingly random $$(x, y, \theta)$$ votes. The two decision rules implemented in the cmcR package can be understood as two different systems by which cells vote for the $$(x, y, \theta)$$ values that they “believe” to be the true alignment values for the overall cartridge case scans.

# Original method of Song (2013)

An implementation of the original method of Song (2013) is described in Song et al. (2014). The decision rule that Song et al. (2014) describe is based on

“a virtual reference with three reference registration parameters $$\theta_{\text{ref}}$$, $$x_{\text{ref}}$$ and $$y_{\text{ref}}$$ generated by the median values of the collective $$\theta$$, and $$x$$-, $$y$$-translation values of all cell pairs.”

That is, a consensus is determined by finding the median registration phase values across the cell/region pairs for a particular cartridge case pair comparison. Then, the distances between the consensus registration values and the cell comparison values are assessed to determine whether they are within a specified distance of the consensus. This consensus assessment introduces threshold parameters $$T_{x}, T_{y}, T_\theta, T_{\text{CCF}}$$.

Let $$x_i, y_i, \theta_i$$ denote the translation and rotation parameters which produce the highest CCF for the alignment of cell/region pair $$i$$. Also let $$x_{\text{ref}}, y_{\text{ref}}, \theta_{\text{ref}}$$ be the median over alignment values for a particular cartridge case comparison (these are the “virtual reference” values). A cell/region pair $$i$$ is declared a match if all of the following conditions hold:

• $$|x_i - x_{\text{ref}}| \leq T_{x}$$
• $$|y_i - y_{\text{ref}}| \leq T_{y}$$
• $$|\theta_i - \theta_{\text{ref}}| \leq T_{\theta}$$
• CCF$$_{\max,i} \geq T_{\text{CCF}}$$.
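The four conditions above can be sketched in a few lines of base R. This is only an illustration under stated assumptions: `classify_original` is a hypothetical helper (it is not a cmcR function), the feature values are made up, and the reference values are taken as medians over all cells, following the quoted description.

```r
# Hypothetical helper (not part of cmcR): flag congruent cells under the
# original-method rule, given one "top vote" (x, y, theta, CCF) per cell.
classify_original <- function(x, y, theta, ccf,
                              T_x = 20, T_y = 20, T_theta = 6, T_ccf = 0.5) {
  # Virtual reference: medians of the per-cell registration values
  x_ref <- median(x)
  y_ref <- median(y)
  theta_ref <- median(theta)

  # A cell is congruent only if all four conditions hold
  abs(x - x_ref) <= T_x &
    abs(y - y_ref) <= T_y &
    abs(theta - theta_ref) <= T_theta &
    ccf >= T_ccf
}

# Made-up similarity features for five cells
x     <- c(3, 5, -2, 60, 4)
y     <- c(-10, -8, -12, 35, -9)
theta <- c(-24, -24, -21, 12, -27)
ccf   <- c(0.71, 0.64, 0.42, 0.66, 0.58)

is_cmc <- classify_original(x, y, theta, ccf)
is_cmc       # TRUE TRUE FALSE FALSE TRUE
sum(is_cmc)  # CMC count for this comparison direction: 3
```

In this toy data the third cell fails only the CCF condition and the fourth fails the translation and rotation conditions, so three cells are counted as congruent.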

With respect to the voting system analogy, we might interpret this decision rule as a single-choice voting system similar to the system used in U.S. presidential elections. That is, every cell is allowed to submit one vote corresponding to the registration phase with the highest CCF$$_{\max}$$ value. A vote is discarded if its associated CCF$$_{\max}$$ value is below the $$T_{\text{CCF}}$$ threshold. A consensus is determined by counting the number of votes that are close to the reference values $$x_{\text{ref}}, y_{\text{ref}}, \theta_{\text{ref}}$$, where “close” is defined by the $$T_x,T_y,T_{\theta}$$ thresholds.

The plot below shows the values of $$x_i, y_i, \theta_i$$, and CCF$$_{\max,i}$$ for each cell/region pair between Fadul 1-1 and Fadul 1-2 as well as Fadul 1-1 and Fadul 2-1. These values are shown as blue/red bars. The purple bands indicate the ranges of values that are declared “congruent”: within $$T_{x} = T_{y} = 20$$ and $$T_{\theta} = 6$$ of $$x_{\text{ref}}, y_{\text{ref}}, \theta_{\text{ref}}$$, and above $$T_{\text{CCF}} = .5$$. As we might expect, a larger proportion of $$x_i, y_i, \theta_i$$, and CCF$$_{\max,i}$$ values are within these acceptable ranges for the comparison between Fadul 1-1 and Fadul 1-2 than for the comparison between Fadul 1-1 and Fadul 2-1. This indicates that there is a clearer “consensus” about the true alignment values for the matching cartridge case pair than for the non-matching pair.

# High CMC method of Tong et al. (2015)

The first step in the High CMC method is to count the CMCs under the original method of Song (2013) in both comparison “directions,” meaning each scan plays the role of both the “reference” and the “target” scan. After these CMCs are counted, Tong et al. (2015) propose using the minimum of the two CMC counts as an initial CMC count prior to applying the High CMC decision rule. The figure below shows the behavior of the $$x_i, y_i, \theta_i$$, and CCF$$_{\max,i}$$ values in each direction via a parallel-coordinates plot, which is useful for visualizing multi-dimensional data sets. Each connected path represents a single cell/region pair. The purple regions again represent the acceptable regions that are sufficiently “close” to the reference values (or above .5 in the case of the CCF). Paths that only traverse purple regions are deemed congruent under the decision rule of the original method of Song (2013) and are colored blue. We can see that 19 cells are deemed congruent for the comparison in which Fadul 1-1 is treated as the reference, while 18 are considered congruent in the other direction. As such, the initial CMC count used for the High CMC method would be 18.
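As a small sketch of the “minimum of the two directions” step, the base R snippet below takes per-cell congruence flags for each direction and returns the initial CMC count. The helper name `initial_cmc_count` is hypothetical, and the numbers of non-congruent cells are made up; only the counts of 19 and 18 come from the text.

```r
# Hypothetical helper: initial CMC count as the smaller of the two
# directional counts under the original method.
initial_cmc_count <- function(cmc_ref_to_target, cmc_target_to_ref) {
  min(sum(cmc_ref_to_target), sum(cmc_target_to_ref))
}

# 19 congruent cells in one direction and 18 in the other; the numbers of
# non-congruent cells (10 and 11) are made up for illustration.
dir_a <- rep(c(TRUE, FALSE), times = c(19, 10))
dir_b <- rep(c(TRUE, FALSE), times = c(18, 11))

initial_count <- initial_cmc_count(dir_a, dir_b)
initial_count  # 18
```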

By considering only the “top vote” of each cell as is done in the decision rule of the original method of Song (2013), information is lost regarding other registration phases for which a cell might also rank highly. As Tong et al. (2015) observe:

“some of the valid cell pairs may be mistakenly excluded from the CMC count because by chance their correlation yields a higher CCF value at a rotation angle outside the threshold range $$T_\theta$$.”

The High CMC method lifts the single-choice restriction by allowing each cell to cast a vote for the translation phase at every $$\theta$$ value for which it has a sufficiently large associated CCF$$_{\max}$$ value. Under this system, each vote represents the translation phase that the cell considers to be the true translation phase of the overall scans conditional on a particular $$\theta$$ value. In this way, the High CMC method might be viewed as an approval voting system in which an individual may cast a vote for as many candidates as they would like. For each $$\theta$$ value, the number of translation phase votes that are close to the $$\theta$$-specific reference values $$x_{\text{ref},\theta}, y_{\text{ref},\theta}$$ is counted (closeness is now defined based only on the $$T_x,T_y$$ thresholds). This yields what Tong et al. (2015) refer to as a “CMC-$$\theta$$” distribution representing, as they consider it, the number of “congruent cells” per $$\theta$$ value. Thus, there may be more than one $$\theta$$ value for which a single cell/region pair is considered congruent. While this is seemingly contradictory (as there should be only one “true” $$\theta$$ alignment value), Tong et al. (2015) justify their method by the empirical observation:
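To make the per-$$\theta$$ counting concrete, here is a minimal base R sketch that builds a CMC-$$\theta$$ distribution from a long-format table of features (one row per cell and rotation). The function `cmc_theta_distribution`, the column names, and the data are all hypothetical. The $$\theta$$-specific reference values are assumed here to be medians over all cells at that rotation, which is one plausible reading of the description above rather than a confirmed detail of the method or of cmcR.

```r
# Hypothetical helper: count congruent cells separately at each rotation.
cmc_theta_distribution <- function(features, T_x = 20, T_y = 20, T_ccf = 0.5) {
  per_theta <- split(features, features$theta)
  counts <- vapply(per_theta, function(d) {
    # Assumed theta-specific reference: medians over the cells at this theta
    x_ref <- median(d$x)
    y_ref <- median(d$y)
    sum(abs(d$x - x_ref) <= T_x & abs(d$y - y_ref) <= T_y & d$ccf >= T_ccf)
  }, integer(1))
  data.frame(theta = as.numeric(names(counts)), cmc = unname(counts))
}

# Made-up features: four cells compared at three rotations
features <- data.frame(
  cell  = rep(1:4, times = 3),
  theta = rep(c(-24, 0, 24), each = 4),
  x     = c(2, 4, 3, 5,   -40, 35, 2, 70,   60, -55, 10, -20),
  y     = c(-9, -10, -8, -11,   50, -45, 3, -60,   -30, 40, 5, 65),
  ccf   = c(0.70, 0.60, 0.65, 0.40,   0.30, 0.55, 0.35, 0.60,
            0.52, 0.30, 0.45, 0.50)
)

cmc_theta <- cmc_theta_distribution(features)
cmc_theta  # 3 congruent cells at theta = -24, none at 0 or 24
```

In this toy example the translation votes agree only at $$\theta = -24$$, which produces the kind of peak that the High CMC method looks for.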

“[i]f two images are truly matching, the CMC-$$\theta$$ distribution of matching image pairs should have a prominent peak located near the initial phase angle $$\Theta_0$$, while non-matching image pairs may have a relatively flat and random CMC-$$\theta$$ distribution pattern.”

The assumption underlying the High CMC method is that the number of cells classified as congruent should be larger near the true $$\theta$$ value (the “initial phase angle $$\Theta_0$$”, as they call it) than for other $$\theta$$ values if the cartridge case pair is indeed a match. These phenomena are illustrated by the CMC-$$\theta$$ distributions of the two comparisons considered here. For the known match pair Fadul 1-1 and Fadul 1-2, the CMC counts per rotation value show a clear mode around $$\theta = -24$$ in one direction and $$\theta = 21$$ in the other, which is to be expected for a known match pair. For the known non-match pair Fadul 1-1 and Fadul 2-1, on the other hand, no such CMC count mode is achieved.

An example of the CMC-$$\theta$$ distribution for the comparison between Fadul 1-1 and Fadul 1-2 is shown below. We can see that, conditional on $$\theta = -24$$ degrees, more cells tend to have similar x, y values than conditional on $$\theta = 30$$. The CCF values are also larger. This indicates that $$\theta = -24$$ is likely closer to the “true” rotation than $$\theta = 30$$ or elsewhere. The two darker-shaded bars represent the $$\theta$$ values that have a “High CMC” count as described above. Because these $$\theta$$ values are adjacent rather than being far from each other, there is evidence that the “true” $$\theta$$ value is approximately $$\theta = -24$$ or $$-27$$ degrees. We say that this comparison direction would “pass” the High CMC criteria because the $$\theta$$ values with high CMC counts are adjacent.

Based on this observation, Tong et al. (2015) outline the following procedure for the High CMC method:

1. Conduct both forward and backward correlations at each rotation and record the registration based on CCF$$_{\max}$$, $$x$$, and $$y$$ for each cell at each rotation. These data will be used in the next two steps separately.

2. At every rotation angle, each cell in the reference image finds a registration position in the compared image with a maximum CCF value. By selecting the registration with the maximum CCF value for each cell, the two CMC numbers determined by the four thresholds can be obtained based on the original algorithm. The lower CMC number is used as the initial result.

3. Build CMC-$$\theta$$ distributions using the data generated in step 1, by counting the number of cells that have congruent positions at each individual rotation angle. Calculate the angular range of “high CMCs” using both the forward and backward CMC-$$\theta$$ distributions, as illustrated in Figs. 2 and 3.

4. If the angular range of the “high CMCs” is within the range $$T_\theta$$, identify the CMCs for each rotation angle in this range and combine them to give the number of CMCs for this comparison in place of the original CMC number. In this step, if the range is narrower than $$T_\theta$$, the nearby angles are included to make the range equal to $$T_\theta$$; CMCs with same index in each rotation are only counted once.
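The last part of the final step, counting CMCs with the same cell index only once across the rotations in the high CMC range, can be sketched as follows. The helper `count_high_cmcs`, the column names, and the data are hypothetical. The rotation range is passed in directly, since how nearby angles are added to widen a narrow range is not spelled out in the quoted procedure and is not attempted here.

```r
# Hypothetical helper: combine the congruent cells over a range of rotations,
# counting each cell index at most once.
count_high_cmcs <- function(congruent, theta_range) {
  in_range <- congruent$theta >= theta_range[1] &
    congruent$theta <= theta_range[2]
  length(unique(congruent$cell[in_range & congruent$is_cmc]))
}

# Made-up per-rotation congruence flags
congruent <- data.frame(
  cell   = c(1, 2, 3, 1, 2, 4, 5),
  theta  = c(-27, -27, -27, -24, -24, -24, 30),
  is_cmc = c(TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE)
)

n_cmc <- count_high_cmcs(congruent, theta_range = c(-27, -24))
n_cmc  # 3: cells 1, 2 and 4 (cells 1 and 2 are congruent at both rotations)
```

Cell 5 is congruent only at $$\theta = 30$$, outside the assumed range, so it does not contribute to the count.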

Tong et al. (2015) introduce an additional criterion to identify a mode in the CMC-$$\theta$$ distribution. Let $$\{\text{CMC}_{\theta} : \theta \in \Theta \}$$ denote the CMC-$$\theta$$ distribution, where $$\Theta$$ is the set of rotation values considered for the comparison. Define CMC$$_{\max} \equiv \max_{\theta} \{\text{CMC}_{\theta} : \theta \in \Theta\}$$. They define a “high” CMC threshold as CMC$$_{\text{high}} \equiv$$ CMC$$_{\max} - \tau$$ for some constant $$\tau$$ (they choose $$\tau = 1$$). Now let $$\Theta_{\text{high}} \equiv \{\theta \in \Theta : \text{CMC}_{\theta} \geq \text{CMC}_{\text{high}}\}$$. That is, $$\Theta_{\text{high}}$$ consists of the $$\theta$$ values with “high” CMC counts. They propose calculating $$R = \max \Theta_{\text{high}} - \min \Theta_{\text{high}}$$. If $$R \leq T_{\theta}$$, then there is evidence that a single mode exists in the CMC-$$\theta$$ distribution (and thus that the cartridge case pair is a match). Otherwise, no such mode exists (by their definition) and the cartridge case pair is likely not a match. The horizontal dashed lines in the CMC-$$\theta$$ distribution plots represent the CMC$$_{\text{high}}$$ thresholds. The $$\theta \in \Theta_{\text{high}}$$ are represented by blue bars. For the matching pair, the range of $$\Theta_{\text{high}}$$ is less than the threshold $$T_{\theta} = 6$$ degrees, so this pair would “pass” the High CMC criterion. In contrast, the range of $$\Theta_{\text{high}}$$ is larger than $$T_{\theta} = 6$$ degrees for the non-match pair. Thus, the non-match pair would “fail” the High CMC criterion.
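The criterion can be written directly from these definitions. The sketch below is base R only; `high_cmc_check` is a hypothetical name, and the two CMC-$$\theta$$ distributions are made-up numbers chosen to mimic a peaked (match-like) and a flat (non-match-like) pattern, not the actual Fadul results.

```r
# Hypothetical helper: apply the "high CMC" range criterion to one
# CMC-theta distribution (one comparison direction).
high_cmc_check <- function(theta, cmc, tau = 1, T_theta = 6) {
  cmc_high   <- max(cmc) - tau             # CMC_high = CMC_max - tau
  theta_high <- theta[cmc >= cmc_high]     # Theta_high
  R <- max(theta_high) - min(theta_high)   # range of Theta_high
  list(cmc_high = cmc_high, theta_high = theta_high,
       range = R, pass = R <= T_theta)
}

theta <- seq(-30, 30, by = 3)  # 21 rotation values

# Made-up distributions: one with a prominent peak, one flat
cmc_peaked <- c(4, 17, 18, 9, 3, 1, 0, 1, 0, 2, 1, 0, 1, 0, 0, 1, 0, 2, 1, 0, 1)
cmc_flat   <- c(1, 0, 2, 1, 1, 0, 2, 1, 0, 1, 1, 2, 0, 1, 1, 0, 1, 2, 0, 1, 1)

res_peaked <- high_cmc_check(theta, cmc_peaked)
res_flat   <- high_cmc_check(theta, cmc_flat)

res_peaked$theta_high  # -27 -24
res_peaked$pass        # TRUE  (R = 3 <= 6)
res_flat$pass          # FALSE (R = 60 > 6)
```

With $$\tau = 1$$, the flat distribution has a low CMC$$_{\text{high}}$$ threshold that many scattered rotations exceed, so its range $$R$$ is far larger than $$T_\theta$$.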

The “prominent peak” empirical observation upon which the High CMC method is based does seem to hold for many known match (KM) and known non-match (KNM) pairs in our experience. However, we’ve observed that the behavior of the CMC-$$\theta$$ distributions depends heavily on the preprocessing procedures used and the thresholds set. In particular, the CMC-$$\theta$$ distributions for some KNM pairs exhibit the prominent peak behavior for a wide range of threshold values, making them difficult to distinguish from KM pairs.