Research Pipeline: K-means Clustering

Handwriter also supports K-means clustering of graphs.

To create a clustering template, use createClusterTemplates().

createClusterTemplates() has only one required parameter, documentDirectory, which is the folder containing the documents to process. You can also provide paths for the log files, temporary files, and results file; otherwise, they are stored within documentDirectory.

Typical use cases:

                      # The most basic call, with only documentDirectory provided
                      createClusterTemplates("path/to/directory")

                      # A call with documentDirectory, resultsFile, 25 clusters (K) and 5 cluster sets (numberToRun)
                      createClusterTemplates(documentDirectory = "path/to/directory",
                                             resultsFile = "path/to/resultsfile",
                                             K = 25, numberToRun = 5)

The parameters for createClusterTemplates() include:

  • documentDirectory | STRING | REQUIRED
      The directory where the .png files are located.
  • logDirectory | STRING | OPTIONAL | DEFAULT: documentDirectory
      The directory where the logs will be saved.
  • dataDirectory | STRING | OPTIONAL | DEFAULT: documentDirectory
      The directory where the temporary data will be saved.
  • resultsFile | STRING | OPTIONAL | DEFAULT: documentDirectory
      The file where the results will be saved.
  • K | INTEGER | OPTIONAL | DEFAULT: 40
      The number of clusters to create.
  • numberToRun | INTEGER | OPTIONAL | DEFAULT: 1
      The number of cluster sets (clustering templates) to create.
  • numCores | INTEGER | OPTIONAL | DEFAULT: 1
      The number of cores to use. Each clustering template is created on a different core, so if you have the necessary resources, this can significantly reduce processing time.
  • numDistCores | INTEGER | OPTIONAL | DEFAULT: 1
      The number of cores to use for distance calculations. If you have the necessary resources, this can significantly reduce processing time.
  • iter.max | INTEGER | OPTIONAL | DEFAULT: 500
      The maximum number of iterations allowed for the cluster centers to converge.
  • gamma | FLOAT | OPTIONAL | DEFAULT: 3
      Parameter used to calculate the outlier cutoff. If numOutliers is zero, gamma has no effect.
  • meanGraph | STRING | OPTIONAL | DEFAULT: 'slow_change'
      The algorithm used to calculate mean graphs. The choices are 'basic', 'slow_change' and 'kmeans'.
  • meanGraphOrder | STRING | OPTIONAL | DEFAULT: 'sequential'
      The order in which graphs are added to the mean graph calculations. The choices are 'sequential' and 'random'.
  • fillClusters | STRING | OPTIONAL | DEFAULT: 'farthest'
      How to choose which graph is reassigned to an empty cluster. The choices are 'random' and 'farthest'.
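To make the roles of K, iter.max and fillClusters = 'farthest' more concrete, here is a minimal sketch in base R. It is not handwriter's implementation: it clusters numeric rows using squared Euclidean distance instead of clustering graphs with a graph distance, and the function name kmeans_farthest is hypothetical. It also assumes that 'farthest' means "the point farthest from its currently assigned center"; handwriter's exact rule may differ.

```r
# Hypothetical sketch, NOT handwriter's implementation: plain K-means on
# numeric rows with squared Euclidean distance (standing in for a graph
# distance). An empty cluster is refilled with the point farthest from its
# assigned center, as an illustration of fillClusters = 'farthest'.
kmeans_farthest <- function(x, K, iter.max = 500, seed = 1) {
  x <- as.matrix(x)
  n <- nrow(x)
  stopifnot(K >= 1, K <= n)
  set.seed(seed)
  # Use K different rows, chosen at random, as the initial centers
  centers <- x[sample.int(n, K), , drop = FALSE]
  cluster <- integer(n)
  for (iter in seq_len(iter.max)) {
    # n x K matrix of squared distances from each point to each center
    d <- matrix(0, n, K)
    for (k in seq_len(K)) d[, k] <- rowSums(sweep(x, 2, centers[k, ])^2)
    # Assign each point to its nearest center (ties go to the lower index)
    new_cluster <- max.col(-d, ties.method = "first")
    # 'farthest' fill: give each empty cluster the point farthest from its
    # current center, never taking the only member of another cluster
    for (k in setdiff(seq_len(K), new_cluster)) {
      own <- d[cbind(seq_len(n), new_cluster)]
      sizes <- tabulate(new_cluster, nbins = K)
      own[sizes[new_cluster] <= 1] <- -Inf
      new_cluster[which.max(own)] <- k
    }
    # Stop once the assignments no longer change
    if (identical(new_cluster, cluster)) break
    cluster <- new_cluster
    # Move each center to the mean of its members
    for (k in seq_len(K)) {
      centers[k, ] <- colMeans(x[cluster == k, , drop = FALSE])
    }
  }
  list(cluster = cluster, centers = centers, iterations = iter)
}
```

For example, with five copies of (0, 0) and five copies of (10, 10), kmeans_farthest(x, K = 2) puts each group in its own cluster. If both initial centers land on the same group, the second cluster starts out empty and is seeded with a point from the other group, because those points are farthest from their assigned center.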