Partitioning Software
From New Wiki
The (h)METIS algorithms (and apparently most modern multilevel partitioning algorithms) have several stages:
- coarsening: gradually reduce to simpler, smaller graph
- initial partitioning: operate on the simpler graph
- uncoarsening & refinement: gradually expand back into regular graph, refining the partitions as more edges/vertices introduced
I found http://www2.research.att.com/~yifanhu/PROJECT/pdcp_siam/node18.html to be instructive in visualizing what these meant.
METIS and hMETIS are both serial algos. hMETIS doesn't have source available, though.
I also looked at some other software, including Karypis' own CLUTO clustering toolkit, which provides a whole bunch of stuff like agglomerative hierarchical clustering (here's an idea for the future of relcloud: multi-level locality, e.g. NUMA-, rack-, and datacenter-level). It's a different formulation from the edges-and-weights graph formulation, since graphs in these types of problems are typically characterized mainly by their distance/similarity measures. Interesting quote from http://glaros.dtc.umn.edu/gkhome/projects/gp/products:
"In fact, the quality of the partitionings produced by CLUTO are in general better than those produced by the serial graph partitioning algorithm in METIS."
Anyway, there are a couple of parameters that stood out in hMETIS 2.0.
The first is the algorithm used, which could have far-reaching consequences in itself:
-ptype=string
Specifies the scheme to be used for computing the k-way partitioning.
The possible values are:
rb - Recursive bisection [default]
kway - Direct k-way partitioning.
This is the balance factor among the partitions:
-ufactor=float (rb, kway)
Specifies the unbalance factor. The meaning of this parameters depends
on the partitioning scheme (ptype):
For ptype=rb
Specifies the maximum difference between each successive bisection.
For instance, a value of 5 leads to a 45-55 split at each bisection.
For ptype=kway
Specifies the maximum load imbalance among all the partitions. For
instance, a value of 5 produces partitions in which the weight of
the largest partition over the average weight is bounded by 5%.
These are randomized algorithms, so this is the number of "trials" the algorithm makes before taking the best one. Not sure what the default value is.
-nruns=int (rb, kway) Specifies the number of different bisections to be computed at each level. The final bisection corresponds to the one that has the smallest cut
Which of those trials should we try to refine using a technique called V-cycle refinement before finally picking the winner? I believe this is separate from the refinement defined earlier.
-nvcycles=int (rb, kway) Specifies the number of solutions to be further refined using V-cycle refinement. If the supplied number is k, then the best k bisections are further refined using V-cycling. This number should be less or equal to that specified for -nruns. The default value is one.
This is a very basic form of constraint, which we may use if we want to *require* certain items to be in the same partition:
-fixed=string (rb, kway) Instructs hmetis to read the file specified as the argument of this parameter for specifying the groups of cells to be placed in the same partition
Important for repeatable results:
-seed=int (rb, kway) Selects the seed of the random number generator.