# Artificial project data

This webpage gives an overview of the use of artificial project data in research and presents the contributions of the OR&S group, both in developing artificial project data generators and in creating datasets that can be shared among researchers.

For a summary of all artificial data, visit the summary data page.

## 1 Network Generators

A short overview is given along the following lines:

**RanGen 1.** The RanGen1 network generator was proposed in the Journal of Scheduling to generate networks for various project scheduling problems. It relies on a very efficient generation process that produces networks with pre-specified values for the order strength and the complexity index in very little CPU time. The reference is Demeulemeester, E., Vanhoucke, M. and Herroelen, W., 2003, "RanGen: A random network generator for activity-on-the-node networks", Journal of Scheduling, 6(1), 17–38.

**RanGen 2.** The generation principle of RanGen1 has been extended to the "RanGen2" network generator, which uses six topological measures to describe the structure of a network. In this section, you can download the two-dimensional scatterplots described in the 'computational results' section of the paper mentioned below. The dataset used in this paper is the RG30 dataset, which can be downloaded elsewhere on this webpage.

**Beyond RanGen.** Students who use RanGen to generate project data are in the right place. RanGen is an ideal tool for generating project data. Its output is given in the well-known Patterson format, which RanGen users are often not familiar with. Therefore, we give more information here. Students who wish to do more than just generate project data are better off using P2 Engine, which includes the RanGen data generator and much more. Note that RanGen runs only on Windows. If you want to generate networks on Mac or Linux, use P2 Engine instead. The software tool ProTrack also generates random networks, and runs on Windows.
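
Since the Patterson format is mentioned above but not spelled out, the following minimal sketch shows how such a file could be read. It assumes the commonly described layout: a first pair of numbers giving the number of activities (dummies included) and the number of renewable resource types, then the resource availabilities, then one record per activity with its duration, its resource demands, its number of successors and the successor numbers. The names `Activity`, `Project` and `parse_patterson` are hypothetical and are not part of RanGen or P2 Engine; check the layout against your own files before relying on it.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Activity:
    duration: int
    demands: List[int]     # one demand per renewable resource type
    successors: List[int]  # activity numbers, 1-based as in the file


@dataclass
class Project:
    availabilities: List[int]
    activities: List[Activity]


def parse_patterson(text: str) -> Project:
    """Parse a Patterson-style instance (assumed layout, see lead-in)."""
    # Reading token by token makes the parser independent of line breaks.
    tokens = [int(tok) for tok in text.split()]
    pos = 0

    def take(count: int) -> List[int]:
        nonlocal pos
        chunk = tokens[pos:pos + count]
        if len(chunk) != count:
            raise ValueError("unexpected end of data")
        pos += count
        return chunk

    n_activities, n_resources = take(2)
    availabilities = take(n_resources)
    activities = []
    for _ in range(n_activities):
        duration = take(1)[0]
        demands = take(n_resources)
        n_successors = take(1)[0]
        successors = take(n_successors)
        activities.append(Activity(duration, demands, successors))
    return Project(availabilities, activities)


# Hand-made toy instance: 4 activities (first and last are dummies), 1 resource.
EXAMPLE = """
4 1
3
0 0 2 2 3
2 2 1 4
3 1 1 4
0 0 0
"""

project = parse_patterson(EXAMPLE)
```

In this toy instance, activity 1 is a dummy start with successors 2 and 3, and activity 4 is a dummy end without successors.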

**Reference:** When using the RanGen1 and/or RanGen2 generators, please cite the following papers:

- RanGen1: Demeulemeester, E., Vanhoucke, M. and Herroelen, W., 2003, "RanGen: A random network generator for activity-on-the-node networks", Journal of Scheduling, 6(1), 17–38 (doi:10.1023/a:1022283403119).
- RanGen2: Vanhoucke, M., Coelho, J., Debels, D., Maenhout, B. and Tavares, L.V., 2008, "An evaluation of the adequacy of project network generators with systematically sampled networks", European Journal of Operational Research, 187(2), 511–524 (doi:10.1016/j.ejor.2007.03.032).

## 2 Datasets

This section summarizes some of the important existing datasets used in project scheduling. Many of these datasets have been generated by the OR&S group in the well-known Patterson format and are available for download using the links below. If a dataset has been generated by another research group, a link to that research group is provided. A summary of these datasets and more information on the generation process can be found in a paper published in the Journal of Modern Project Management.

**Reference:** When you make use of the information on this website (e.g. when you download the MS Excel table), please cite the following paper:

- Paper: Vanhoucke, M., Coelho, J. and Batselier, J., 2016, "An overview of project data for integrated project management and control", Journal of Modern Project Management, 3(2), 6–21.
- MS Excel file: The MS Excel file with detailed calculations for all parameters is also available, and gives a summary of each separate dataset.
- Rather than downloading each dataset separately, you can download everything at once (375 MB) by clicking on the download button below.

### 2.1 RCPSP

The __Patterson__ dataset has played an important role in testing algorithms for the resource-constrained project scheduling problem. Although its 110 project instances have been shown to be too easy for current algorithms, its format is still used in the RanGen generators described above. Nowadays, the majority of the research on the RCPSP is tested on the well-known __PSPLIB__ test set, which contains four subsets J30, J60, J90 and J120, where Jx refers to projects with x activities. Additionally, a new set __RG300__ has been proposed by Debels and Vanhoucke (2007) that contains 480 instances with 300 activities each.

Dataset | Subsets | # Instances | Reference | Generator | Parameters | OR&S |
---|---|---|---|---|---|---|
RG300 | RG300 | 480 | Debels and Vanhoucke (2007) | RanGen1 | OS, RU, RC | Yes |
RG30 | Set 1, Set 2, Set 3, Set 4, Set 5 | 1,800 | Vanhoucke et al. (2008) | RanGen2 | I2, I3, I4, I5, I6 | Yes |
PSPLIB | J30, J60, J90, J120 | 2,040 | Kolisch and Sprecher (1996) | ProGen | CNC, RF, RS | No |
Patterson | - | 110 | Patterson (1984) | - | - | No |

### 2.2 RCPSPDC

OR&S has presented two datasets for the well-known resource-constrained project scheduling problem with discounted cash flows. The first set (set __DC1__) was generated with ProGen/Max, contains instances with 10, 20, 30, 40 and 50 activities, and was used to solve the problem to optimality in the paper "On maximizing the net present value of a project under renewable resource constraints" (Management Science, 2001). It is advised to use this benchmark set when solving project scheduling problems with discounted cash flows using exact algorithms. The second set (set __DC2__) was used for solving the problem heuristically and contains instances with 25, 50, 75 and 100 activities, proposed in the paper "A scatter search heuristic for maximising the net present value of a resource-constrained project with fixed activity cash flows" (International Journal of Production Research, 2010).

Dataset | Subsets | # Instances | Reference | Generator | Parameters | OR&S |
---|---|---|---|---|---|---|
DC1 | mv | 1,800 | Vanhoucke and Demeulemeester (2001) | ProGen/Max | OS, RF, RS | Yes |
DC2 | npv25, npv50, npv75, npv100 | 720 | Vanhoucke (2010) | RanGen1 | OS, RU, RC | Yes |

The datasets only contain network and resource information for each instance. In order to use them for solving the RCPSPDC, additional data is required, such as activity cash flows and project deadlines. More information can be found at the RCPSP webpage. Moreover, these sets have been used to extend the RCPSP-DC with other payment models, which has resulted in other papers using extended cash flow models, for which information is also provided.

### 2.3 MMRCPSP

The best-known library used for testing algorithms to solve the multi-mode resource-constrained project scheduling problem is the multi-mode version of the PSPLIB, which contains instances with 10, 12, 14, 16, 18, 20 and 30 activities. However, Van Peteghem and Vanhoucke (2014) have shown that the multi-mode PSPLIB suffers from a number of shortcomings. Therefore, three new sets have been proposed to solve the multi-mode RCPSP, known as sets MMLIB50, MMLIB100 and MMLIB+. Sets __MMLIB50__ and __MMLIB100__ each contain 540 instances with 50 and 100 activities, respectively, and 3 modes per activity, while set __MMLIB+__ contains 3,240 instances with 50 or 100 activities and 3, 6 or 9 modes per activity.

Dataset | Subsets | # Instances | Reference | Generator | Parameters | OR&S |
---|---|---|---|---|---|---|
MMLIB | MMLIB50, MMLIB100, MMLIB+ | 4,320 | Van Peteghem and Vanhoucke (2014) | RanGen1 | OS, RF, RS | Yes |
PSPLIB | J10, J12, J14, J16, J18, J20, J30 | 3,840 | Kolisch and Sprecher (1996) | ProGen | CNC, RF, RS | No |
Boctor | boct50, boct100 | 360 | Boctor (1993) | ? | ? | No |

### 2.4 MT

Four sets have been generated using the RanGen2 generator that do not contain resource data, but are instead generated under a wide range of values for the topological structure. More precisely, the network structure has been varied using network topology indicators such as SP, AD, LA and TF, which were originally defined in Vanhoucke et al. (2008) and redefined in Vanhoucke (2010). Set 1 contains 900 instances, set 2 contains 800 instances and sets 3 and 4 each contain 1,200 instances. Each instance has 30 activities.

Dataset | Subsets | # Instances | Reference | Generator | Parameters | OR&S |
---|---|---|---|---|---|---|
MT | Set 1, Set 2, Set 3, Set 4 | 4,100 | Vanhoucke (2010) | RanGen2 | SP, AD, LA, TF | Yes |

### 2.5 Other datasets

The summary page lists more datasets, including the hard CV set, datasets for project portfolio management and data for an extension of the RCPSP, known as the RCPSP-AS.

### 2.6 Resource data (to create new instances): NetRes = MT + ResSet

This new set is not discussed in the summary paper "An overview of project data for integrated project management and control" and has only been added in a later paper discussed here. The set contains only resource data (and is therefore called ResSet) and should be merged with the MT set (which contains only project networks, without resources) to create new project instances. The new instances are then assembled in the dataset called NetRes. To do so, visit the upload system first and download the "**SolutionUpdater**" to merge the ResGen files (resource data) with the MT files (project networks). Note that the resource files contain no project network data, only resource data for 30 activities. The set consists of 4 subsets, and each subset consists of a single file.

Dataset | Subsets | # Instances | Reference | Generator | Parameters | OR&S |
---|---|---|---|---|---|---|
ResSet | Set 1. Basic R4 | 600 | Working paper | RanGen2 | NR, RU, RC | Yes |
ResSet | Set 2. R4 with extended RC | 1,800 | Working paper | RanGen2 | NR, RU, RC | Yes |
ResSet | Set 3. R10 with extended RU | 900 | Working paper | RanGen2 | NR, RU, RC | Yes |
ResSet | Set 4. R10 with extended RC | 1,800 | Working paper | RanGen2 | NR, RU, RC | Yes |
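
The idea of combining a network-only instance with a separate block of resource data can be sketched as follows. This is not the logic of the SolutionUpdater and does not read the actual MT or ResSet files, whose layouts are not described on this page. It only illustrates the principle on in-memory data and writes the result in a Patterson-style layout (number of activities and resource types, then the availabilities, then per activity the duration, demands, number of successors and successors), which is itself an assumption. The function name `to_patterson` is hypothetical.

```python
from typing import List


def to_patterson(durations: List[int],
                 successors: List[List[int]],
                 availabilities: List[int],
                 demands: List[List[int]]) -> str:
    """Combine network data and resource data into one Patterson-style text."""
    # The network part and the resource part must describe the same activities.
    if not (len(durations) == len(successors) == len(demands)):
        raise ValueError("network and resource data must cover the same activities")
    if any(len(row) != len(availabilities) for row in demands):
        raise ValueError("each activity needs one demand per resource type")

    lines = [
        f"{len(durations)} {len(availabilities)}",
        " ".join(str(a) for a in availabilities),
    ]
    for duration, demand_row, succ in zip(durations, demands, successors):
        fields = [duration, *demand_row, len(succ), *succ]
        lines.append(" ".join(str(f) for f in fields))
    return "\n".join(lines)


# Network part (as an MT-like instance would provide it): no resources.
durations = [0, 2, 3, 0]
successors = [[2, 3], [4], [4], []]

# Resource part (as a ResSet-like file would provide it): no network.
availabilities = [3]
demands = [[0], [2], [1], [0]]

merged = to_patterson(durations, successors, availabilities, demands)
```

The length checks reflect the one constraint stated above: the resource data only fits networks with a matching number of activities.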

**Abbreviations**

The network topology and resource parameters used above are abbreviated.

Network topology metrics:

- CNC: Coefficient of Network Complexity
- OS: Order Strength
- SP: Serial/Parallel indicator (also equal to I2)
- AD: Activity Distribution indicator (also equal to I3)
- LA: Length of Arcs indicator (also equal to I4)
- I5: Length of Long Arcs indicator (while the I4 is equal to the length of short arcs)
- TF: Topological Float indicator (also equal to I6)
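
To make three of these indicators concrete, here is a minimal sketch that computes them for a small activity-on-the-node network. It assumes the usual definitions: OS is the number of precedence relations (transitive ones included) divided by the maximum possible number n(n-1)/2, CNC is the number of arcs per node, and SP is (m-1)/(n-1) with m the largest number of activities on a single path. Whether dummy activities and redundant arcs are counted differs between papers, so treat the treatment below as an assumption; all function names are hypothetical.

```python
from typing import Dict, List, Set

Network = Dict[int, List[int]]  # activity -> list of direct successors


def transitive_successors(succ: Network) -> Dict[int, Set[int]]:
    """Return, per activity, every activity reachable from it (acyclic input)."""
    closure: Dict[int, Set[int]] = {}

    def visit(node: int) -> Set[int]:
        if node not in closure:
            reach: Set[int] = set()
            for nxt in succ[node]:
                reach.add(nxt)
                reach |= visit(nxt)
            closure[node] = reach
        return closure[node]

    for node in succ:
        visit(node)
    return closure


def order_strength(succ: Network) -> float:
    n = len(succ)
    if n < 2:
        raise ValueError("order strength needs at least two activities")
    relations = sum(len(r) for r in transitive_successors(succ).values())
    return relations / (n * (n - 1) / 2)


def coefficient_of_network_complexity(succ: Network) -> float:
    arcs = sum(len(s) for s in succ.values())
    return arcs / len(succ)


def serial_parallel(succ: Network) -> float:
    n = len(succ)
    if n == 1:
        return 1.0
    depth: Dict[int, int] = {}

    def longest(node: int) -> int:
        # Number of activities on the longest path starting at this node.
        if node not in depth:
            depth[node] = 1 + max((longest(nxt) for nxt in succ[node]), default=0)
        return depth[node]

    m = max(longest(node) for node in succ)
    return (m - 1) / (n - 1)


# Diamond-shaped toy network: 1 -> {2, 3} -> 4.
network = {1: [2, 3], 2: [4], 3: [4], 4: []}
os_value = order_strength(network)        # 5 of the 6 possible pairs are ordered
cnc_value = coefficient_of_network_complexity(network)  # 4 arcs over 4 nodes
sp_value = serial_parallel(network)       # longest path has 3 of the 4 activities
```

A fully serial network gives OS = SP = 1, while a network without any precedence relations gives OS = SP = 0.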

Resource metrics:

- RF: Resource Factor
- RS: Resource Strength
- RU: Resource Use
- RC: Resource Constrainedness
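
As an illustration of three of the resource metrics, the sketch below computes RF, RU and RC from a demand matrix. It assumes the usual definitions: RF is the fraction of activity/resource combinations with a positive demand, RU is the number of resource types an activity uses, and RC is, per resource type, the average demand over the activities that use it divided by its availability. RS is left out because it also depends on the network and on schedule information. The function names and the toy data are hypothetical, and dummy activities are simply treated like any other row.

```python
from typing import List


def resource_factor(demands: List[List[int]]) -> float:
    """Fraction of (activity, resource type) pairs with a positive demand."""
    n_activities = len(demands)
    n_resources = len(demands[0])
    used = sum(1 for row in demands for d in row if d > 0)
    return used / (n_activities * n_resources)


def resource_use(demands: List[List[int]]) -> List[int]:
    """Number of resource types used by each activity."""
    return [sum(1 for d in row if d > 0) for row in demands]


def resource_constrainedness(demands: List[List[int]],
                             availabilities: List[int]) -> List[float]:
    """Average positive demand per resource type, relative to its availability."""
    result = []
    for k, availability in enumerate(availabilities):
        positive = [row[k] for row in demands if row[k] > 0]
        mean_demand = sum(positive) / len(positive) if positive else 0.0
        result.append(mean_demand / availability)
    return result


# Toy data: 4 activities (rows), 2 renewable resource types (columns).
demands = [
    [2, 0],
    [1, 3],
    [0, 0],
    [3, 1],
]
availabilities = [4, 5]

rf = resource_factor(demands)                            # 5 positive entries out of 8
ru = resource_use(demands)                               # per-activity counts
rc = resource_constrainedness(demands, availabilities)   # one value per resource type
```

Higher RF and RC values generally indicate instances in which resources play a larger role, which is why they are used as control parameters when generating data.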

More information on these parameters can be found in Vanhoucke et al. (2003, 2008) and in an overview paper that provides a summary of all this data.