KMeans

Runs the KMeans algorithm over the input dataset.

Workload output vs. benchmark output

The KMeans workload allows users to optionally capture the results of running KMeans over their input dataset by specifying a file for the workload output.

Whether or not users specify an output file for the workload, the benchmark output is always passed up to the workload suite and written out with the suite's output.
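
As a rough illustration of the distinction above, a workload block that also captures the KMeans results might look like the sketch below. The output path is a hypothetical placeholder, and save-mode is set explicitly only to show where it goes.

  {
    name = "kmeans"
    input = "/tmp/kmeans-data.parquet"
    // hypothetical path: the KMeans results themselves are written here
    output = "/tmp/kmeans-results.parquet"
    // replace an existing file instead of failing (the default is errorifexists)
    save-mode = "overwrite"
  }

Omitting output and save-mode only skips saving the KMeans results; the benchmark output is still reported with the workload suite.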

Parameters

Name	Required	Default	Description
name	yes	–	“kmeans”
input	yes	–	the input dataset
output	no	–	file in which to save the actual results of the KMeans algorithm; if omitted, the results are not captured
save-mode	no	errorifexists	Options are “errorifexists”, “ignore” (no-op if exists), and “overwrite”
k	no	2	number of clusters
seed	no	127L	random seed used to choose the initial values
maxiterations	no	2	maximum number of times the algorithm should iterate

Examples

  {
    name = "kmeans"
    input = "/tmp/kmeans-data.parquet"
    k = 10
  }
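
A sketch that also overrides the remaining optional parameters, assuming they are set in the same block as k. The values are illustrative only and not tuned for any dataset.

  {
    name = "kmeans"
    input = "/tmp/kmeans-data.parquet"
    k = 10
    // illustrative values; the defaults are maxiterations = 2 and seed = 127L
    maxiterations = 20
    seed = 42
  }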
