Runs the KMeans algorithm over the input dataset.

output from the workload vs. benchmark output

The KMeans workload allows users to optionally capture the results of running KMeans over their input dataset by specifying a file for the workload output.

Whether or not users specify an output file for the workload, the benchmark output will be passed upwards and outputted with the workload suite.

Parameters

Name Required Default Description
name yes “kmeans”
input yes the input dataset
output no If users wish to capture the actual results of the kmeans algorithm, they can specify an output file here.
save-mode no errorifexists Options are “errorifexists”, “ignore” (no-op if exists), and “overwrite”
k no 2 number of clusters
seed no 127L initial values
maxiterations no 2 maximum number of times the algorithm should iterate

Examples

  {
    name = "kmeans"
    input = "/tmp/kmeans-data.parquet"
    k = 10
  }
  {
    name = "kmeans"
    input = "/tmp/kmeans-data.parquet"
    k = 10
  }