This page gives a little more information on integration, along with an example of how we used it.
As input, handwriter takes a .png image of handwriting. This input is taken through the core steps defined in the methods section. As a result, handwriter outputs the information and measurements for each glyph as a list of lists. Keep this structure in mind as you integrate handwriter into your own workflow.
The handwriter package came about as part of our larger effort to automate the identification of handwriting from the same individual in a closed set. The package is one of 3 distinct parts of our workflow, which are as follows:
We are conducting a large data collection study to gather handwriting samples from a variety of participants across the world (most in the Midwest). Each participant provides handwriting samples at three sessions. Session packets are prepared, mailed to participants, completed, and mailed back.
Once received, we scan all surveys and writing samples. Scans are loaded, cropped, and saved using a Shiny app. The app also facilitates survey data entry, saving each participant's data as a row in an Excel spreadsheet.
A public database of handwriting samples we have collected can be found at forensicstats.org/handwritingdatabase.
A data article regarding these samples was accepted at Data in Brief:
Crawford, A., Ray, A., & Carriquiry, A. (2020). A database of handwriting samples for applications in forensic statistics. Data in Brief, 28, 105059.
Information on computational tools can be found in the methods section.
Rather than impose rigid grouping rules (the previously used "adjacency grouping"), we consider a more robust, dynamic K-means type clustering method that is focused on major glyph structural components.
For a clustering algorithm we need two things: a distance measure between glyphs and a measure of center for each cluster.
We begin by defining edge to edge distances. Edge to edge distances are subsequently combined for an overall glyph to glyph distance.
Consider the following single edge glyphs e1 and e2. Make 3 edits to e1 to match e2. Then combine the magnitude of each edit.
Measure 1 (Left) - Shift: Anchor to the nearest endpoint by shifting. In our example, the shift value is 1.4.
Measure 2 (Center) - Stretch: Make the end points the same distance apart. In our example, the stretch value is 9.9.
Measure 3 (Right and Bottom) - Shape: Bend and twist the edge using 7 shape points. Shape points are 'matched' and the distances between matched points are averaged to obtain the shape value. In our example, the shape value is 8.4 after averaging.
So our edge to edge distance is D(e1, e2) = 1.4 + 9.9 + 8.4 = 19.7.
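As a rough illustration of how the three edits combine, here is a minimal Python sketch. The function names and the exact geometric definitions of each measure (nearest-endpoint shift, difference in endpoint separation for stretch, mean distance between index-matched shape points) are assumptions made for illustration and are not taken from handwriter itself. Only the final sum mirrors the worked example above.

```python
import math

def shift(e1_ends, e2_ends):
    """Smallest distance between an endpoint of e1 and an endpoint of e2 (assumed definition)."""
    return min(math.dist(p, q) for p in e1_ends for q in e2_ends)

def stretch(e1_ends, e2_ends):
    """Absolute difference in endpoint-to-endpoint separation (assumed definition)."""
    return abs(math.dist(*e1_ends) - math.dist(*e2_ends))

def shape(e1_pts, e2_pts):
    """Mean distance between shape points, assumed to be already matched by index."""
    return sum(math.dist(p, q) for p, q in zip(e1_pts, e2_pts)) / len(e1_pts)

def edge_distance(shift_value, stretch_value, shape_value):
    """Combine the magnitudes of the three edits by simple addition."""
    return shift_value + stretch_value + shape_value

# Values from the worked example: shift 1.4, stretch 9.9, shape 8.4.
print(round(edge_distance(1.4, 9.9, 8.4), 1))  # 19.7
```

In practice each edge would be reduced to its two endpoints and 7 shape points before these helpers are applied.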
For the measure of center, we take the weighted average of the endpoints, the 7 shape points, and the edge length.
We implement a standard K-means algorithm. We begin with a fixed K and a set of exemplars, then iterate between the following steps until cluster assignments no longer change:
1. Assign each glyph to its nearest exemplar under the distance measure defined above.
2. Calculate each cluster mean as defined. Find the exemplar nearest the cluster center.
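The two steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: glyphs are represented as plain numeric tuples, Euclidean distance stands in for the glyph to glyph distance, and a coordinate-wise mean stands in for the weighted glyph mean. The names `exemplar_kmeans` and `cluster_mean` are hypothetical and not part of handwriter.

```python
import math

def cluster_mean(members):
    """Coordinate-wise mean of member vectors (stand-in for the glyph mean)."""
    n = len(members)
    return tuple(sum(m[j] for m in members) / n for j in range(len(members[0])))

def exemplar_kmeans(glyphs, exemplar_idx, dist=math.dist, max_iter=100):
    """Alternate assignment and exemplar updates until assignments stop changing.

    glyphs       : list of numeric feature tuples (hypothetical glyph encoding)
    exemplar_idx : starting exemplar indices into glyphs, one per cluster (fixed K)
    """
    exemplar_idx = list(exemplar_idx)
    assignment = None
    for _ in range(max_iter):
        # Step 1: assign each glyph to its nearest exemplar.
        new_assignment = [
            min(range(len(exemplar_idx)), key=lambda k: dist(g, glyphs[exemplar_idx[k]]))
            for g in glyphs
        ]
        if new_assignment == assignment:
            break  # converged: no assignment changed
        assignment = new_assignment
        # Step 2: compute each cluster mean, then pick the member nearest to it.
        for k in range(len(exemplar_idx)):
            members = [i for i, a in enumerate(assignment) if a == k]
            if not members:
                continue  # keep the old exemplar for an empty cluster
            center = cluster_mean([glyphs[i] for i in members])
            exemplar_idx[k] = min(members, key=lambda i: dist(glyphs[i], center))
    return assignment, exemplar_idx

glyphs = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
assignment, exemplars = exemplar_kmeans(glyphs, exemplar_idx=[1, 4])
print(assignment, exemplars)
```

Because the exemplar is always an actual glyph, the `dist` argument can be swapped for any glyph to glyph distance without changing the loop.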
An example cluster when K = 40 is shown below. The exemplar is shown on the left in red, with the cluster members shown in black behind it. On the right is the cluster mean.
During clustering, glyphs that are more than a certain distance from their exemplar are considered outliers. The algorithm sets a ceiling on the allowable number of clusters.
The most appropriate approach to modeling rotational angles is a wrapped model.
We consider the rotational angles from the earlier measurements in the polar coordinate system and treat them as spanning the full circle. So we map the upper half plane values to (0, 2π), where values above the x-axis indicate right-leaning graphs and values below the x-axis indicate left-leaning graphs. Graphs that are relatively straight up and down will have values near 0/2π if they are wider than tall, and near π if they are taller than wide.
We consider two distributions to approach this circular data:
The von Mises distribution is a close approximation to the wrapped normal distribution, the circular analog of the normal distribution. It is the go-to model for unimodal circular data. It is specified through the mean, μ, and the concentration, κ (1/κ is analogous to σ^{2}).
The wrapped Cauchy distribution is the wrapped version of the Cauchy distribution. Like the Cauchy, this distribution is symmetric, unimodal, and specified by a location parameter, μ, and a concentration parameter, ρ. It is a heavy-tailed distribution in the sense that it places more density on the "back" of the circle, opposite the peak of the distribution.
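To make the heavy-tail comparison concrete, the sketch below evaluates the standard density formulas for both distributions using only the Python standard library. The function names are hypothetical, and the parameter values (κ = 4, ρ = 0.6) are illustrative assumptions chosen only so that the two peaks have roughly similar heights. They are not estimates from the handwriting data.

```python
import math

def bessel_i0(x, terms=30):
    """Modified Bessel function I0 via its power series (adequate for moderate x)."""
    return sum((x * x / 4.0) ** m / math.factorial(m) ** 2 for m in range(terms))

def von_mises_pdf(theta, mu, kappa):
    """Von Mises density on the circle with mean mu and concentration kappa."""
    return math.exp(kappa * math.cos(theta - mu)) / (2.0 * math.pi * bessel_i0(kappa))

def wrapped_cauchy_pdf(theta, mu, rho):
    """Wrapped Cauchy density with location mu and concentration rho in [0, 1)."""
    return (1.0 - rho ** 2) / (
        2.0 * math.pi * (1.0 + rho ** 2 - 2.0 * rho * math.cos(theta - mu))
    )

mu = math.pi          # illustrative peak location
back = mu + math.pi   # point on the circle opposite the peak

# Density at the peak and at the "back" of the circle for each model.
print("von Mises     :", von_mises_pdf(mu, mu, 4.0), von_mises_pdf(back, mu, 4.0))
print("wrapped Cauchy:", wrapped_cauchy_pdf(mu, mu, 0.6), wrapped_cauchy_pdf(back, mu, 0.6))
```

With these illustrative values, the wrapped Cauchy density opposite the peak is far larger than the von Mises density at the same point, which is the sense in which it is heavy-tailed.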
Now consider:
Where RA_{w,k,i}, i = 1,...,n_{w,k}, are rotation angles from cluster k and writer w on the full circle (0, 2π). For a given writer and cluster, μ_{w,k} and ϕ_{w,k} are the location and concentration parameters, respectively. We place non-informative uniform priors on each. Again, there is no borrowing set up in the model for rotation angles. Each writer/cluster combination gets its own estimated rotation angle distribution.