Background:

Watch the entire CSAFE webinar (including questions at the end) posted here. (The actual presentation starts about 7 minutes in.) While you are watching, keep in mind the following questions to respond to when you’re finished:

1. What data do Hare et al. use?

Hare discusses using the Hamby data and the NIST database of 3D ballistics scans.

2. In what ways do the methods used by Hare et al. differ from the “traditional” methods of bullet matching?

In traditional bullet matching, the threshold for declaring a match is six consecutively matching striae. The traditional method does not report error rates or include reliability tests. It is worth noting, however, that when specialists evaluated bullets using the traditional method, there were no false positives.
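To make the threshold concrete, here is a minimal sketch of how a count of consecutively matching striae could be checked against the threshold of six. The function name `longest_matching_run` and the list of matches are hypothetical, made up for illustration; this is not the examiners' actual procedure or anything from Hare et al.

```python
def longest_matching_run(matches):
    """Return the length of the longest run of consecutive True values."""
    longest = 0
    current = 0
    for is_match in matches:
        # Extend the current run on a match, reset it on a non-match.
        current = current + 1 if is_match else 0
        longest = max(longest, current)
    return longest


CMS_THRESHOLD = 6  # the threshold of six consecutively matching striae

# Hypothetical comparison: True where a stria on one land lines up with
# a stria on the other land, False where it does not.
striae = [True, True, False, True, True, True, True, True, True, False, True]

cms = longest_matching_run(striae)
print(cms, cms >= CMS_THRESHOLD)
```

In this made-up example the longest run is six striae, so the pair would just meet the threshold.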

Hare uses seven steps to evaluate bullets. The bullets are scanned with a high-resolution microscope (at the micron scale), and the striae are then evaluated computationally.

3. How do Hare et al. use clustering to help perform bullet matching tasks?

Hare uses a random forest, which is a classification method rather than a clustering method. “For each of the trees in a random forest, only two thirds of the observations are used for fitting, while the remaining third is used to evaluate the tree’s predictive power and accuracy, or its reverse, the error rate. Because errors are determined from the one third of held-back observations, this error rate is called the out of bag (OOB) error. Figure 18 shows the cumulative out-of-bag error (OOB) rate for 300 trees. After about 100 trees, the error rate of land-to-land comparisons stabilizes at 0.0039. This is a weighted average between false positive error rate of 0.0001 and an error rate of false negatives of 0.2267.” (Hare et al., pp. 20-21) Because each tree is tested on observations it was not fit to, the random forest provides estimates of its own error rates.
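The "two thirds / one third" split in the quote comes from bootstrap sampling: each tree is fit to a sample drawn with replacement, which leaves roughly a third of the observations out of the bag. The sketch below simulates only that sampling step, not a full random forest. The function name `oob_fraction` and the sample sizes are assumptions for illustration. The second part works through the quoted weighted average, assuming (this is my assumption, not stated in the quote) that the weights are the shares of non-matching and matching comparisons.

```python
import random


def oob_fraction(n_obs, n_trees, seed=0):
    """Average share of observations left out of each bootstrap sample."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trees):
        # Draw n_obs indices with replacement; the set keeps the unique ones.
        in_bag = {rng.randrange(n_obs) for _ in range(n_obs)}
        total += (n_obs - len(in_bag)) / n_obs
    return total / n_trees


# Hypothetical sizes: 1000 observations, 300 trees (300 as in the quote).
frac = oob_fraction(n_obs=1000, n_trees=300)
print(round(frac, 3))  # close to 1/e, i.e. a bit more than one third

# Weighted average of the two quoted error rates.
fpr, fnr, overall = 0.0001, 0.2267, 0.0039
# ASSUMPTION: overall = (1 - p) * fpr + p * fnr, where p is the share of
# comparisons that are true matches. Solving for p:
match_share = (overall - fpr) / (fnr - fpr)
print(round(match_share, 4))
```

Under that assumption, only a small share of the land-to-land comparisons are true matches, which is why the overall error rate sits so close to the false positive rate.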

4. Identify one statistics and/or probability concept in the presentation that you have not heard of before.

Loess Regression:

Within the study:

This regression model fits a smooth curve to the bullet data; the residuals from that fit are then used as the signature of each land. (https://github.com/CSAFE-ISU/REU-blog/blob/master/Assignments/Week7b/bulletmatchingpaper.pdf)

Wikipedia:

Loess regression is computationally intensive. It is a type of local regression rather than a single global linear fit: it fits a smooth curve to the data by fitting low-degree polynomials to nearby subsets of points, weighting closer points more heavily. I’d probably draw a picture when trying to explain loess regression, to have something to visualize and reference. It works best with large, densely sampled data sets. (https://en.wikipedia.org/wiki/Local_regression)

www.statsdirect.com:

Lowess (locally weighted scatter-plot smoother) is a “non-parametric model because the linearity assumptions of conventional regression have been relaxed” (StatsDirect). Lowess is a weighted model. There is a trade-off between variance and bias when using this model. Degrees of freedom are set empirically and thus are not necessarily integers. (http://www.statsdirect.com/help/nonparametric_methods/loess.htm)
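Since I said a picture would help, here is the next best thing: a minimal sketch of a loess-style smoother in plain Python. It fits a weighted straight line around each point using tricube weights, which is one common choice and not necessarily the exact settings used in the study. The function names, the `span` value, and the simulated profile are all assumptions made for illustration. The `span` controls the bias-variance trade-off: a larger span gives a smoother, more biased curve.

```python
import math


def tricube(u):
    """Tricube weight: 1 at distance 0, falling to 0 at distance 1."""
    u = abs(u)
    return (1 - u ** 3) ** 3 if u < 1 else 0.0


def loess(xs, ys, span=0.3):
    """Local linear regression with tricube weights, evaluated at each x."""
    n = len(xs)
    k = max(3, math.ceil(span * n))  # number of neighbours in each window
    fitted = []
    for x0 in xs:
        # Bandwidth: distance from x0 to its k-th nearest x value.
        h = sorted(abs(x - x0) for x in xs)[k - 1] or 1e-12
        w = [tricube((x - x0) / h) for x in xs]
        sw = sum(w)
        mx = sum(wi * x for wi, x in zip(w, xs)) / sw
        my = sum(wi * y for wi, y in zip(w, ys)) / sw
        sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        # Weighted least-squares line evaluated at x0.
        fitted.append(my + slope * (x0 - mx))
    return fitted


# Simulated profile (made up): a broad curve plus small ripples.
xs = [float(i) for i in range(101)]
ys = [-0.01 * (x - 50) ** 2 + 0.5 * math.sin(x) for x in xs]

smooth = loess(xs, ys, span=0.3)
# Subtracting the smooth curve leaves the small-scale ripples.
residuals = [y - s for y, s in zip(ys, smooth)]
print(len(residuals))
```

The smooth curve picks up the broad shape, and what is left over after subtracting it is the fine detail, which is the same idea as removing a bullet's curvature to look at the striae.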