This page is no longer maintained. The graph generator package
will soon be shifted to a new place.
We provide two different graph generator tools for external memory BFS algorithms. The first one is based on the external memory BFS code by Frederik Juul Christiani and Vitaly Osipov. The other one is based on my external memory BFS code and requires STXXL . Both tools generate graphs in the format of DIMACS shortest path challenge and can be used for shortest paths as well (though all the edge weights are one).
This graph generator is based on linux mmap function and popt.h. In order to be able to generate large graphs using this tool, one needs a 64-bit architecture.
Generated graph types:
- Random This graph type is generated by adding edges with random end-points, until the desired number of edges has been added.
- Layered The layered graph is a graph constructed such that all the BFS levels have approximately the same size. The source is connected to all vertices in the first level. For every vertex in a level edges to 3 random vertices in the next level are added. Two adjacent levels thus forms a bipartite subgraph. To ensure connectivity of the graph, the vertices in the last level are connected with a Hamiltonian cycle.
-
Torus A 3D lattice of vertices, where all vertices are connected to their neighbors (no
diagonal edges). Opposite surfaces of the lattice are then connected to form a
hypertorus giving all vertices a degree of 6. This is an extension of the grid graphs in order to get the desired density. The diameter is 3/2|V|1/3.
- Hamiltonian Cycle As the name suggests, this graph type contains a Hamiltonian cycle. In this graph type, the vertices have the same degree with very high probability and the graph itself has a very small diameter. First a random Hamiltonian cycle is added to the empty graph. The edges corresponding to a random pairing of all the vertices is then added. In total four
random pairings is added to give all vertices an approximate degree of 6.
-
Simple Line Let all vertices of a graph be sorted by the vertex index. A graph whose edges are in the form (v,v+1), where v<|V| is a Simple Line graph.
-
Random Line Unlike the case of the simple line, nodes of a line graph are not stored in a contiguous manner, but randomly distributed over the disk.
- Spider Web graph This graph class is a specialization of web graph defined in http://mathworld.wolfram.com/WebGraph.html. It consists
of B levels, each having n/B nodes. All nodes in a level
are connected in a cyclic fashion and a node has an edge
to it's corresponding node in the level before and after.
The command-line options for this tool are:
| -t, --type=TYPE |
which type of graph to generate (random,layer,3dgrid,hamilton,bullseye,line,spiderweb) |
| -v, --vertices=NUM |
number of vertices in the generated graph |
| -e, --edges=NUM |
number of edges in the generated graph |
| -r, --ratio=DOUBLE |
ratio of edges to vertices in the generated graph |
| -l, --layer=NUM |
number of vertices in each layer for Layered graphs |
| -s, --shuffle |
shuffle vertices (use this option with the -t line in order to create Random Line graph) |
| -n, --name=STRING |
name of the generated graph |
| -o, --filename=FILE |
output filename of the generated graph |
| -d, --verbose |
increase verbosity |
| -V, --version |
show version information |
| -b, --blocksize=NUM |
memory block size (this is a parameter B used for generating Spider Web graphs) |
This graph generator tool runs on all architectures, but requires STXXL .
It generates the following graphs (Most graphs are also described in [EM_BFS_AJW]):
- Random graph: On a n node graph, we randomly select m edges with replacement (i.e., m times selecting a source and target node such that source and target are different) to obtain random graphs.
- B-level random graph: This graph consists of B levels, each (except the level 0 containing only the source node) having n/B nodes. The edges are randomly distributed between consecutive levels, such that these B levels approximate the BFS levels. The initial layout of the nodes on the disk is such that the n/B nodes in the same level are all in different blocks. This graph causes MR_BFS ([MR_BFS])to incur it's worst case of Ω(n) I/Os.
- B-level spider web graph: This graph class is a specialization of web graph defined in http://mathworld.wolfram.com/WebGraph.html. Again, it consists of B levels, each having n/B nodes. All nodes in a level are connected in a cyclic fashion and a node has an edge to it's corresponding node in the level before and after. The initial layout of the nodes on the disk is random.
- One dimensional geometric distance graph: Consider the nodes as points uniformly spread on a straight line. In this graph, the probability of an edge existing between two nodes is exponentially decreasing with the distance between the corresponding points. The base of the exponent is α
- Grid graph: It consists of a ⌊ n ⌋ x ⌈ n ⌉ grid, where each edge exists with a probability p.
- Web graph translator: This is a translator for the web-crawl based graph from Stanford WebBase Project
- MM BFS worst graph: This graph causes the (randomized pre-processing variant) of MM_BFS ([MM_BFS]) to incur it's worst case of Θ(n sqrt{log(n)/B} + sort(n)) I/Os.
- Line graphs: A line graph consists of n nodes and n-1 edges such that there exists two nodes u and v, with the path from u to v consisting of all the n-1 edges. We took three different initial layouts - simple, in which all blocks consists of B consecutively lined nodes; B-interleaved in which the consecutive nodes are all in different but consecutive blocks; random in which the arrangement of nodes on disk is given by a random permutation.
The command-line options for this tool are:
| -a, --alpha |
One of the parameters for generating 1-dimensional geometric graphs |
| -e, --edges |
Specify the number of edges for the graph to be generated. For line_graphs, grid _graphs and some other graph classes, the number of edges are directly computed from the number of nodes and this parameter becomes dysfunctional. If the number of edges are greater than 231, please use -f instead. |
| -f, --log_edges |
Specify the (base 2) log of number of edges. It has same restrictions as -e, --e dges. For large number of edges, it's recommended to use this parameter. |
| -h, --help |
Displays this usage guide and exit succesfully. No other output is generated. |
| -l, --log_nodes |
Specify the (base 2) log of number of nodes. For large number of nodes, it's recommended to use this parameter. |
| -n, --nodes |
Specify the number of nodes for the graph to be generated. If the number of nodes are greater than 231, please use -l instead. |
| -p, --p_grid |
A grid graph is a X_nodes x Y_nodes grid graph in which each edge in the grid exists with a probability p_grid. |
| -t, --type |
Specify the graph type. Valid graph types are "random", "grid", "geometric", "mr_worst", "mm_worst", "real_web", "spider_web", "simple_line", "random_line" and "worst_line". |
| -u, --noU |
One of the parameters for generating MM BFS worst graphs |
| -w, --web_graph |
Specify the path of the web-graph. Only required if you are using this generator as a DIMACS format translator of web-crawl graph. |
| -x, --X_nodes |
A grid graph is a X_nodes x Y_nodes grid graph in which each edge in the grid exists with a probability p_grid. |
| -y, --Y_nodes |
A grid graph is a X_nodes x Y_nodes grid graph in which each edge in the grid exists with a probability p_grid. |
| --mu |
One of the parameters for generating MM BFS worst graphs |