Parameters
Name |
Required |
Default |
Description |
name |
yes |
– |
“data-generation-kmeans” |
rows |
yes |
– |
number of rows to generate |
cols |
yes |
– |
number of columns to generate |
output |
yes |
– |
output file |
save-mode |
no |
errorifexists |
Options are “errorifexists”, “ignore” (no-op if exists), and “overwrite” |
k |
no |
2 |
number of clusters generated |
scaling |
no |
0.6 |
scaling factor of the the dataset |
partitions |
no |
2 |
number of partitions |
Examples
{
name = "data-generation-kmeans"
rows = 100000000
cols = 24
output = "/tmp/kmeans-data.csv"
}
{
name = "data-generation-kmeans"
rows = 100000000
cols = 24
output = "/tmp/kmeans-data.parquet"
k = 4500
scaling = 1.6
parititions = 10
}