Partitioning Software

From New Wiki

Jump to: navigation, search

The (h)METIS algorithms (and apparently most modern multilevel partitioning algorithms) have several stages:

  • coarsening: gradually reduce to simpler, smaller graph
  • initial partitioning: operate on the simpler graph
  • uncoarsening & refinement: gradually expand back into regular graph, refining the partitions as more edges/vertices introduced

I found http://www2.research.att.com/~yifanhu/PROJECT/pdcp_siam/node18.html to be instructive in visualizing what these meant.

METIS and hMETIS are both serial algos. hMETIS doesn't have source available, though.

I also looked at some other software, including Karypis' own CLUTO clustering toolkit, which provides a whole bunch of stuff like agglomerative hierarchical clustering (here's an idea for the future of relcloud: multi-level locality, e.g. NUMA-, rack-, and datacenter-level). It's a different formulation from the edges-and-weights graph formulation, since graphs in these types of problems are typically characterized mainly by their distance/similarity measures. Interesting quote from http://glaros.dtc.umn.edu/gkhome/projects/gp/products:

"In fact, the quality of the partitionings produced by CLUTO are in general better than those produced by the serial graph partitioning algorithm in METIS."

Anyway, there are a couple of parameters that stood out in hMETIS 2.0.

The first is the algorithm used, which could have far-reaching consequences in itself:

-ptype=string
   Specifies the scheme to be used for computing the k-way partitioning.
   The possible values are:
      rb       - Recursive bisection [default]
      kway     - Direct k-way partitioning.

This is the balance factor among the partitions:

-ufactor=float                (rb, kway)
   Specifies the unbalance factor. The meaning of this parameters depends
   on the partitioning scheme (ptype):
     For ptype=rb
       Specifies the maximum difference between each successive bisection.
       For instance, a value of 5 leads to a 45-55 split at each bisection.
     For ptype=kway
       Specifies the maximum load imbalance among all the partitions. For
       instance, a value of 5 produces partitions in which the weight of
       the largest partition over the average weight is bounded by 5%.

These are randomized algorithms, so this is the number of "trials" the algorithm makes before taking the best one. Not sure what the default value is.

-nruns=int                    (rb, kway)
   Specifies the number of different bisections to be computed
   at each level. The final bisection corresponds to the one that
   has the smallest cut

Which of those trials should we try to refine using a technique called V-cycle refinement before finally picking the winner? I believe this is separate from the refinement defined earlier.

-nvcycles=int                 (rb, kway)
   Specifies the number of solutions to be further refined using
   V-cycle refinement. If the supplied number is k, then the best
   k bisections are further refined using V-cycling. This number
   should be less or equal to that specified for -nruns. The default
   value is one.

This is a very basic form of constraint, which we may use if we want to *require* certain items to be in the same partition:

-fixed=string                 (rb, kway)
   Instructs hmetis to read the file specified as the argument
   of this parameter for specifying the groups of cells to be
   placed in the same partition

Important for repeatable results:

-seed=int                     (rb, kway)
   Selects the seed of the random number generator.
Personal tools