Integrating handwriter into a workflow


Interested in how to integrate handwriter into your own project?

This page gives a little more information on integration, along with an example of how we use it.


Inputs & Outputs

As input, handwriter takes a .png image of handwriting. The image must be taken through the core steps defined in the methods section. As a result, handwriter outputs the information and measurements for each glyph as a list of lists. Keep this structure in mind as you integrate handwriter into your own workflow.


How we use it

The handwriter package came about as part of our larger effort to automate the identification of handwriting from the same individual in a closed set. The package is one of three distinct parts of our workflow:

  1. Data Collection
    • Collect handwriting samples
    • Scan, load, and crop images via batch processing
  2. Computational Tools
    • Binarize: Turn image to black and white
    • Skeletonize: Reduce writing to 1 pixel wide
    • Break into glyphs: Decompose into manageable pieces
    • Measure: Extract various measurements of these glyphs
  3. Statistical Analysis
    • Clustering: Separate glyphs based on shape
    • Model: Fit a statistical model to the data
    • Identify: Identify a writer in a closed set

Step 1: Data Collection

We are conducting a large data collection study to gather handwriting samples from a variety of participants across the world (most in the Midwest). Each participant provides handwriting samples at three sessions. Session packets are prepared, mailed to participants, completed, and mailed back.

Once received, we scan all surveys and writing samples. Scans are loaded, cropped, and saved using a Shiny app. The app also facilitates survey data entry, saving each participant's data as a row in an Excel spreadsheet.

A public database of handwriting samples we have collected can be found at forensicstats.org/handwritingdatabase.

A data article regarding these samples was accepted at Data in Brief:
Crawford, A., Ray, A., & Carriquiry, A. (2020). A database of handwriting samples for applications in forensic statistics. Data in Brief, 28, 105059.


Step 2: Computational Tools

Information on computational tools can be found in the methods section.


Step 3: Statistical Analysis

Clustering

Rather than impose rigid grouping rules (the previously used "adjacency grouping"), we consider a more robust, dynamic K-means-type clustering method that focuses on the major structural components of glyphs.

For a clustering algorithm we need two things:

  1. A distance measure - For us, a way to measure the discrepancy between glyphs.
  2. A measure of center - A glyph-like structure that is the exemplar representation of a group of glyphs.

Glyph Distance Measurement

We begin by defining edge-to-edge distances. These are subsequently combined into an overall glyph-to-glyph distance.

Consider the following single-edge glyphs e1 and e2. We make three edits to e1 so that it matches e2, then combine the magnitudes of the edits.

Measure 1 (Left) - Shift: Anchor to the nearest endpoint by shifting. In our example, the shift value is 1.4.

Measure 2 (Center) - Stretch: Make the endpoints the same distance apart. In our example, the stretch value is 9.9.

Measure 3 (Right and Bottom) - Shape: Bend and twist the edge using 7 shape points. Shape points are 'matched' and the distances between them are averaged to obtain the shape value. In our example, the shape value is 8.4 after averaging.

[Figure: edge_to_edge_6]

[Figure: edge_to_edge_2, shape measurements averaged]


So our edge-to-edge distance is D(e1, e2) = 1.4 + 9.9 + 8.4 = 19.7.
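The three measures can be sketched in a few lines of Python. This is only an illustration under assumptions of our own, not the package's implementation: the edge representation (a dict with two endpoints and 7 shape points), the function names, and the exact formulas for shift and stretch are hypothetical, and the shape points are assumed to be stored in matched order and in a frame where they can be compared directly.

```python
import math


def point_distance(p, q):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def edge_to_edge_distance(e1, e2):
    """Sum of shift, stretch, and shape values for two single edges.

    Each edge is a hypothetical dict: {"ends": (p, q), "shape": [7 points]}.
    """
    # Shift (assumed form): distance between the closest pair of endpoints.
    shift = min(point_distance(p, q) for p in e1["ends"] for q in e2["ends"])
    # Stretch (assumed form): difference in endpoint-to-endpoint separation.
    stretch = abs(point_distance(*e1["ends"]) - point_distance(*e2["ends"]))
    # Shape: average distance between matched shape points.
    pairs = list(zip(e1["shape"], e2["shape"]))
    shape = sum(point_distance(p, q) for p, q in pairs) / len(pairs)
    return shift + stretch + shape


# Two made-up edges, purely for illustration.
e1 = {"ends": ((0.0, 0.0), (10.0, 0.0)), "shape": [(j, 0.0) for j in range(7)]}
e2 = {"ends": ((3.0, 4.0), (3.0, 24.0)), "shape": [(j, 2.0) for j in range(7)]}
d = edge_to_edge_distance(e1, e2)
```

For these made-up edges the shift is 5, the stretch is 10, and the shape value is 2, so the three parts add up in the same way as the worked example above.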


Measure of Glyph Centers

For this measurement, we take the weighted average of the endpoints, the 7 shape points, and the edge length.

[Figure: measure_center_1]
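A minimal sketch of such a weighted average is shown here. The edge representation, the function name, and the choice of weights are assumptions made for illustration; the text above does not specify how the weights are chosen, so equal weights are used by default. Endpoints and shape points are assumed to be stored in matched order across edges.

```python
def weighted_mean_edge(edges, weights=None):
    """Weighted average of endpoints, 7 shape points, and edge length.

    Each edge is a hypothetical dict:
    {"ends": (p, q), "shape": [7 points], "length": float}.
    """
    if weights is None:
        weights = [1.0] * len(edges)  # assumption: equal weights by default
    total = sum(weights)

    def average_point(points):
        # Weighted average of one matched point across all edges.
        x = sum(w * p[0] for w, p in zip(weights, points)) / total
        y = sum(w * p[1] for w, p in zip(weights, points)) / total
        return (x, y)

    return {
        "ends": tuple(average_point([e["ends"][j] for e in edges]) for j in range(2)),
        "shape": [average_point([e["shape"][j] for e in edges]) for j in range(7)],
        "length": sum(w * e["length"] for w, e in zip(weights, edges)) / total,
    }


# Two made-up horizontal edges; the second gets three times the weight.
edge_a = {"ends": ((0.0, 0.0), (4.0, 0.0)), "shape": [(j, 0.0) for j in range(7)], "length": 4.0}
edge_b = {"ends": ((0.0, 4.0), (4.0, 4.0)), "shape": [(j, 4.0) for j in range(7)], "length": 8.0}
center = weighted_mean_edge([edge_a, edge_b], weights=[1.0, 3.0])
```

The result is a glyph-like structure with the same fields as its inputs, which is what allows it to be compared with real glyphs using the distance measure.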


K-means clustering algorithm for glyphs

We implement a standard K-means algorithm. We begin with a fixed K and a set of exemplars, then iterate between the following steps until the cluster assignments no longer change:

1. Assign each glyph to its nearest exemplar, using the distance measure defined above.

2. Calculate each cluster mean as defined above, then find the exemplar nearest the cluster center.

[Figure: k_means_1]
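The two-step iteration can be sketched generically in Python. The function below is an illustration rather than the package's code: the distance and mean are passed in as functions, the handling of empty clusters and the iteration cap are our own assumptions, and the demo clusters plain numbers instead of glyphs.

```python
def cluster_with_exemplars(items, exemplars, distance, mean, max_iter=100):
    """Alternate between assignment and exemplar update until stable."""
    exemplars = list(exemplars)
    assignments = None
    for _ in range(max_iter):
        # Step 1: assign each item to its nearest exemplar.
        new_assignments = [
            min(range(len(exemplars)), key=lambda k: distance(item, exemplars[k]))
            for item in items
        ]
        if new_assignments == assignments:
            break  # assignments did not change, so we are done
        assignments = new_assignments
        # Step 2: compute each cluster mean, then pick the member nearest to it.
        for k in range(len(exemplars)):
            members = [it for it, a in zip(items, assignments) if a == k]
            if not members:
                continue  # assumption: an empty cluster keeps its old exemplar
            center = mean(members)
            exemplars[k] = min(members, key=lambda m: distance(m, center))
    return assignments, exemplars


# Demo on numbers, with absolute difference as the distance.
values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
labels, final_exemplars = cluster_with_exemplars(
    values,
    exemplars=[1.0, 3.0],
    distance=lambda a, b: abs(a - b),
    mean=lambda xs: sum(xs) / len(xs),
)
```

Because the exemplar is always an actual member of the cluster, this behaves like K-means with the extra step of snapping each cluster mean to its nearest real item. For glyphs, the distance and mean arguments would be the glyph distance and the measure of center described earlier.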


An example cluster when K = 40 is shown below. The exemplar is shown on the left in red, with the cluster members shown in black behind it. On the right is the cluster mean.

During clustering, glyphs that lie more than a certain distance from their exemplar are considered outliers. The algorithm sets a ceiling on the allowable number of clusters.


Statistical Modeling

We found the most appropriate modeling approach to be a wrapped model for rotation angles.


Wrapped Model for Rotation Angles

We take the rotation angles measured earlier, express them in the polar coordinate system, and treat them as spanning the full circle. We therefore map the upper half-plane values to (0, 2π), where values above the x-axis indicate right-leaning graphs and values below the x-axis indicate left-leaning graphs. Graphs that are relatively straight up and down will have values near 0/2π if they are wider than tall, and near π if they are taller than wide.
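One mapping that is consistent with this description is to double each angle, a common treatment for axial data. Whether the package uses exactly this mapping is an assumption on our part, and the function name is hypothetical; the sketch only shows that doubling sends horizontal orientations to 0/2π and vertical orientations to π.

```python
import math


def to_full_circle(angle):
    """Map an angle in [0, pi) onto [0, 2*pi) by doubling it (assumed mapping)."""
    return (2.0 * angle) % (2.0 * math.pi)


horizontal = to_full_circle(0.0)        # wider than tall
vertical = to_full_circle(math.pi / 2)  # taller than wide
```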

We consider two distributions to approach this circular data:

The von Mises distribution is a close approximation to the wrapped normal distribution, the circular analog of the normal distribution. It is the go-to model for unimodal circular data. It is specified through the mean, μ, and the concentration, κ (1/κ is analogous to σ²).

The wrapped Cauchy distribution is the wrapped version of the Cauchy distribution. Like the Cauchy, this distribution is symmetric, unimodal, and specified by a location parameter, μ, and a concentration parameter, ρ. It is a heavy-tailed distribution in the sense that it places more density on the "back" of the circle, opposite the peak of the distribution.
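To make the comparison concrete, the sketch below evaluates the standard density formulas for both distributions using only the Python standard library. The parameter values (κ = 2, ρ = 0.5) are arbitrary choices for illustration, picked because they give peaks of similar height; the function names are our own.

```python
import math


def bessel_i0(x, terms=50):
    """Modified Bessel function I0(x), computed from its power series."""
    total, term = 0.0, 1.0
    for m in range(terms):
        if m > 0:
            term *= (x / 2.0) ** 2 / (m * m)
        total += term
    return total


def von_mises_pdf(theta, mu, kappa):
    """Density of the von Mises distribution on the circle."""
    return math.exp(kappa * math.cos(theta - mu)) / (2.0 * math.pi * bessel_i0(kappa))


def wrapped_cauchy_pdf(theta, mu, rho):
    """Density of the wrapped Cauchy distribution, with 0 <= rho < 1."""
    denom = 2.0 * math.pi * (1.0 + rho ** 2 - 2.0 * rho * math.cos(theta - mu))
    return (1.0 - rho ** 2) / denom


mu = math.pi         # peak location
back = 0.0           # point on the circle opposite the peak
vm_back = von_mises_pdf(back, mu, kappa=2.0)
wc_back = wrapped_cauchy_pdf(back, mu, rho=0.5)
```

With these values the wrapped Cauchy density at the back of the circle is several times larger than the von Mises density there, which is the heavy-tail behavior described above.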

Now consider:

[Equation: wrapped_cauchy_1]

Here RA_{w,k,i}, i = 1, ..., n_{w,k}, are the rotation angles from cluster k and writer w on the full circle (0, 2π). For a given writer and cluster, μ_{w,k} and ϕ_{w,k} are the location and concentration parameters, respectively. We place non-informative uniform priors on each. Again, no borrowing is set up in the model for rotation angles: each writer/cluster combination gets its own estimated rotation angle distribution.
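Written out in LaTeX, a model matching this description might read as follows. This is a reconstruction from the surrounding text, not a copy of the original equation. The prior ranges (0, 2π) and (0, 1) are assumptions based on the usual parameter ranges of the wrapped Cauchy distribution.

```latex
\begin{align*}
RA_{w,k,i} \mid \mu_{w,k}, \phi_{w,k}
  &\sim \text{Wrapped Cauchy}(\mu_{w,k}, \phi_{w,k}),
  \qquad i = 1, \dots, n_{w,k}, \\
\mu_{w,k} &\sim \text{Uniform}(0, 2\pi), \\
\phi_{w,k} &\sim \text{Uniform}(0, 1).
\end{align*}
```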