Artificial project data

This webpage gives an overview of the use of artificial project data for research. It shows the contributions of the OR&S group to both the development of artificial project data generators and the creation of datasets that researchers can share and use in their own work.

For a summary of all artificial data, visit the summary data page.

1 Network Generators

In this section, you can download our two network generators to construct project networks under a controlled design. Both generators rely on the same efficient generation principle proposed by Demeulemeester, Vanhoucke and Herroelen (2003). Each generator has two types of input parameters: parameters measuring the topological structure of a network and resource-related parameters. The choice between RanGen1 and RanGen2 is completely determined by the input parameters used to measure the topological structure of a network.

RanGen (1 and 2)
RanGen tutorial

A short overview is given along the following lines:

RanGen 1. The RanGen1 network generator has been proposed in the Journal of Scheduling to generate networks for various project scheduling problems. The generator relies on a very efficient generation process to generate networks with a pre-specified value for the order strength and the complexity index in a very small amount of CPU time. The reference is Demeulemeester, E., Vanhoucke, M. and Herroelen, W., 2003, "RanGen: A random network generator for activity-on-the-node networks", Journal of Scheduling, 6(1), 17–38.
RanGen 2. The generation principle of RanGen1 has been extended to the "RanGen2" network generator, which makes use of six topological measures to describe the structure of a network. In this section, you can download the two-dimensional scatterplots described in the 'computational results' section of the paper mentioned below. The dataset used in this paper is the RG30 dataset that can be downloaded elsewhere on this webpage.
Beyond RanGen. Students who use RanGen to generate project data have come to the right place: RanGen is an ideal tool for that purpose. The output of RanGen is given in the Patterson format, which is widely used in the literature but often unfamiliar to RanGen users, so more information is given here. Students who wish to do more than generate project data should use P2 Engine, which includes the RanGen data generator and much more. Note that RanGen runs only on Windows; to generate networks on Mac or Linux, use P2 Engine instead. The software tool ProTrack also generates random networks and runs on Windows.
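
Since the Patterson format is often unfamiliar, a minimal reading sketch in Python may help. It assumes the commonly described layout: a first line with the number of activities (dummies included) and the number of renewable resources, a line with the resource availabilities, and then one record per activity holding its duration, its resource demands, its number of successors and the 1-based successor ids. This layout and the names `Activity`, `Project` and `parse_patterson` are assumptions made for illustration; check them against the RanGen documentation before relying on them.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Activity:
    duration: int
    demands: List[int]      # one entry per renewable resource
    successors: List[int]   # 1-based activity ids


@dataclass
class Project:
    availabilities: List[int]
    activities: List[Activity]


def parse_patterson(text: str) -> Project:
    """Parse a Patterson-style instance (assumed layout, see lead-in).

    The file is read as a flat stream of integers, so blank lines and
    line wrapping do not matter.
    """
    tokens = iter(int(t) for t in text.split())
    n_activities = next(tokens)
    n_resources = next(tokens)
    availabilities = [next(tokens) for _ in range(n_resources)]
    activities = []
    for _ in range(n_activities):
        duration = next(tokens)
        demands = [next(tokens) for _ in range(n_resources)]
        n_successors = next(tokens)
        successors = [next(tokens) for _ in range(n_successors)]
        activities.append(Activity(duration, demands, successors))
    return Project(availabilities, activities)


# Hypothetical 5-activity instance (activities 1 and 5 are dummies).
EXAMPLE = """\
5 2
4 3

0 0 0 2 2 3
3 2 1 1 4
2 1 0 1 4
4 3 2 1 5
0 0 0 0
"""

project = parse_patterson(EXAMPLE)
```

Reading the file as a token stream rather than line by line keeps the sketch tolerant of the blank lines and wrapped successor lists that appear in some files.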

Reference: When using the RanGen1 and/or RanGen2 generators, please cite the following papers:

RanGen1: Demeulemeester, E., Vanhoucke, M. and Herroelen, W., 2003, "RanGen: A random network generator for activity-on-the-node networks", Journal of Scheduling, 6(1), 17–38 (doi:10.1023/a:1022283403119).
RanGen2: Vanhoucke, M., Coelho, J., Debels, D., Maenhout, B. and Tavares, L.V., 2008, "An evaluation of the adequacy of project network generators with systematically sampled networks", European Journal of Operational Research, 187(2), 511–524 (doi:10.1016/j.ejor.2007.03.032).

2 Datasets

In this section, a summary is given of some of the important existing datasets used in project scheduling. Many of these datasets have been generated by the OR&S group in the well-known Patterson format and are available for download using the links below. In case the dataset has been generated by another research group, a link to this research group is provided. A summary of these datasets and more information on the generation process can be found in a paper published in the Journal of Modern Project Management.

Reference: When you make use of the information on this website (e.g. you download the MS Excel table), please cite the following paper:

Paper: Vanhoucke, M., Coelho, J. and Batselier, J., 2016, "An overview of project data for integrated project management and control", Journal of Modern Project Management, 3(2), 6–21.
MS Excel file: The MS Excel file with detailed calculations for all parameters is also available, and gives a summary of each separate dataset.
Rather than downloading each dataset separately, you can download everything at once (375 MB) by clicking on the download button below.

2.1 RCPSP

The Patterson dataset has played an important role in testing algorithms for the resource-constrained project scheduling problem (RCPSP). Although it has been shown that its 110 project instances are now too easy for current algorithms, the format is still used in the RanGen generators described above. Nowadays, the majority of the research on the RCPSP is tested on the well-known PSPLIB test set containing four subsets J30, J60, J90 and J120, where Jx refers to projects with x activities. Additionally, a new set RG300 has been proposed by Debels and Vanhoucke (2007) that contains 480 instances with 300 activities each.

Dataset	Subsets	# Instances	Reference	Generator	Parameters	OR&S
RG300	RG300	480	Debels and Vanhoucke (2007)	RanGen1	OS, RU, RC	Yes
RG30	Set 1, Set 2, Set 3, Set 4, Set 5	1,800	Vanhoucke et al. (2008)	RanGen2	I2, I3, I4, I5, I6	Yes
PSPLIB	J30, J60, J90, J120	2,040	Kolisch and Sprecher (1996)	ProGen	CNC, RF, RS	No
Patterson	-	110	Patterson (1984)	-	-	No

2.2 RCPSPDC

OR&S has presented two datasets for the well-known resource-constrained project scheduling problem with discounted cash flows (RCPSPDC). The first set (set DC1) was generated with ProGen/Max, contains instances with 10, 20, 30, 40 and 50 activities, and was used to solve the problem to optimality in the paper "On maximizing the net present value of a project under renewable resource constraints" (Management Science, 2001). It is advised to use this benchmark set when solving project scheduling problems with discounted cash flows using exact algorithms. The second set (set DC2) has been used for heuristically solving the problem with discounted cash flows and contains instances with 25, 50, 75 and 100 activities. It was proposed in the paper "A scatter search heuristic for maximising the net present value of a resource-constrained project with fixed activity cash flows" (International Journal of Production Research, 2010).

Dataset	Subsets	# Instances	Reference	Generator	Parameters	OR&S
DC1	mv	1,800	Vanhoucke and Demeulemeester (2001)	ProGen/Max	OS, RF, RS	Yes
DC2	npv25, npv50, npv75, npv100	720	Vanhoucke (2010)	RanGen1	OS, RU, RC	Yes

The datasets only contain network and resource information for each instance. In order to use them for solving the RCPSPDC, additional data is required, such as activity cash flows and project deadlines. More information can be found at the RCPSP webpage. Moreover, these sets have been used to extend the RCPSPDC with other payment models, which has resulted in other papers using extended cash flow models, for which information is also provided.
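
To illustrate why cash flows and a deadline are needed on top of the network and resource data, the sketch below evaluates a given schedule. It assumes that each activity has a single cash flow received at its finish time and that continuous discounting with rate `alpha` is used; both are assumptions made for this example, and the papers cited above define the exact convention. The function names are hypothetical.

```python
import math
from typing import Sequence


def net_present_value(cash_flows: Sequence[float],
                      finish_times: Sequence[float],
                      alpha: float) -> float:
    """Sum of cash flows discounted to time 0 (assumed convention).

    Each cash flow c_i is assumed to occur at the finish time f_i of its
    activity and is discounted as c_i * exp(-alpha * f_i).
    """
    return sum(c * math.exp(-alpha * f)
               for c, f in zip(cash_flows, finish_times))


def meets_deadline(finish_times: Sequence[float], deadline: float) -> bool:
    """A schedule is only admissible if the project ends by the deadline."""
    return max(finish_times) <= deadline


# Hypothetical three-activity schedule: one cost followed by two revenues.
cash_flows = [-50.0, 80.0, 40.0]
finish_times = [2, 5, 9]
npv = net_present_value(cash_flows, finish_times, alpha=0.01)
feasible = meets_deadline(finish_times, deadline=10)
```

Under this convention, delaying an activity with a negative cash flow raises the net present value, which is why a project deadline is required to keep the problem bounded.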

2.3 MMRCPSP

The best-known library used for testing algorithms to solve the multi-mode resource-constrained project scheduling problem (MMRCPSP) is the multi-mode version of the PSPLIB, which contains instances with 10, 12, 14, 16, 18, 20 and 30 activities. However, Van Peteghem and Vanhoucke (2014) have shown that the multi-mode PSPLIB suffers from a number of shortcomings. Therefore, three new sets have been proposed to solve the multi-mode RCPSP, known as sets MMLIB50, MMLIB100 and MMLIB+. Sets MMLIB50 and MMLIB100 each contain 540 instances with 50 and 100 activities, respectively, and 3 modes per activity. Set MMLIB+ contains 3,240 instances with 50 or 100 activities and 3, 6 or 9 modes per activity.

Dataset	Subsets	# Instances	Reference	Generator	Parameters	OR&S
MMLIB	MMLIB50, MMLIB100, MMLIB+	4,320	Van Peteghem and Vanhoucke (2014)	RanGen1	OS, RF, RS	Yes
PSPLIB	J10, J12, J14, J16, J18, J20, J30	3,840	Kolisch and Sprecher (1996)	ProGen	CNC, RF, RS	No
Boctor	boct50, boct100	360	Boctor (1993)	?	?	No

2.3 MT

Four sets have been generated using the RanGen2 generator that do not contain resource data, but are instead generated under a wide range of values for the topological structure. More precisely, the network structure has been varied using the network topology indicators SP, AD, LA and TF, which were originally defined in Vanhoucke et al. (2008) and redefined in Vanhoucke (2010). Set 1 contains 900 instances, set 2 contains 800 instances and sets 3 and 4 each contain 1,200 instances. Each instance has 30 activities.

Dataset	Subsets	# Instances	Reference	Generator	Parameters	OR&S
MT	Set 1, Set 2, Set 3, Set 4	4,100	Vanhoucke (2010)	RanGen2	SP, AD, LA, TF	Yes

These four sets have been used in Schedule Risk Analysis (SRA) research (e.g. Vanhoucke (2010) and Elshaer (2013)) and Earned Value Management (EVM) research (Vanhoucke (2011), Colin and Vanhoucke (2014), Wauters and Vanhoucke (2014) and many others), where it has been shown that the network topology is one of the main drivers of the accuracy of SRA and EVM forecasts. An overview can be found in the books "Measuring Time" and "Integrated Project Management and Control".

2.4 Other datasets

The summary page lists more datasets, including the hard CV set, datasets for project portfolio management and data for an extension of the RCPSP, known as the RCPSP-AS.

2.5 Resource data (to create new instances): NetRes = MT + ResSet

This new set is not discussed in the summary paper "An overview of project data for integrated project management and control" and has only been added in a later paper discussed here. This set contains only resource data (and is therefore called ResSet), and should be merged with the MT set (that contains only project networks, but no resources) to create new project instances. The new instances are then assembled in the dataset called NetRes. To do so, you should visit the upload system first and download the "SolutionUpdater" to merge the ResSet files (resource data) with the MT files (project networks). Note that the resource files do not contain project network data, but only resource data for 30 activities. The set consists of 4 subsets, and each subset contains only one single file.

Dataset	Subsets	# Instances	Reference	Generator	Parameters	OR&S
ResSet	Set 1. Basic R4	600	Working paper	RanGen2	NR, RU, RC	Yes
	Set 2. R4 with extended RC	1,800			NR, RU, RC	Yes
	Set 3. R10 with extended RU	900			NR, RU, RC	Yes
	Set 4. R10 with extended RC	1,800			NR, RU, RC	Yes
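
The idea of combining a network-only instance with separate resource data can be sketched as follows. This is not the behaviour of the SolutionUpdater, whose file formats are not described on this page; the function `merge_to_patterson`, its in-memory inputs and the Patterson-style output layout are all assumptions made for illustration.

```python
from typing import List, Tuple


def merge_to_patterson(network: List[Tuple[int, List[int]]],
                       availabilities: List[int],
                       demands: List[List[int]]) -> str:
    """Combine a project network with resource data into one instance.

    network        -- per activity: (duration, 1-based successor ids)
    availabilities -- one availability per renewable resource
    demands        -- per activity: one demand per renewable resource
    The result is written in an assumed Patterson-style layout.
    """
    if len(network) != len(demands):
        raise ValueError("network and resource data must cover the same activities")
    if any(len(row) != len(availabilities) for row in demands):
        raise ValueError("each activity needs one demand per resource")

    lines = [f"{len(network)} {len(availabilities)}",
             " ".join(map(str, availabilities))]
    for (duration, successors), row in zip(network, demands):
        fields = [duration, *row, len(successors), *successors]
        lines.append(" ".join(map(str, fields)))
    return "\n".join(lines) + "\n"


# Hypothetical 4-activity network (no resources) and matching resource data.
network = [(0, [2, 3]), (3, [4]), (2, [4]), (0, [])]
availabilities = [4]
demands = [[0], [2], [1], [0]]
instance_text = merge_to_patterson(network, availabilities, demands)
```

Because every resource file can be paired with every network file of matching size, a small number of resource files yields a large number of new instances.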

Abbreviations

The network topology and resource parameters used above are abbreviated.

Network topology metrics:

CNC: Coefficient of Network Complexity
OS: Order Strength
SP: Serial/Parallel indicator (also equal to I2)
AD: Activity Distribution indicator (also equal to I3)
LA: Length of Arcs indicator (also equal to I4)
I5: Length of Long Arcs indicator (whereas I4 measures the length of short arcs)
TF: Topological Float indicator (also equal to I6)
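
Two of these indicators can be sketched in a few lines of Python. The sketch assumes the usual definitions: OS is the number of precedence relations, transitive ones included, divided by the maximum possible number n(n-1)/2, and SP equals (m-1)/(n-1), with m the number of activities on the longest chain and n the number of activities. Dummy activities are assumed to be excluded, every activity must appear as a key, and the function names are hypothetical; the exact definitions are given in the papers cited on this page.

```python
from typing import Dict, List, Set


def transitive_closure(successors: Dict[int, List[int]]) -> Dict[int, Set[int]]:
    """Direct and indirect successors of every activity (memoised DFS)."""
    closure: Dict[int, Set[int]] = {}

    def visit(node: int) -> Set[int]:
        if node not in closure:
            reach: Set[int] = set()
            for s in successors[node]:
                reach.add(s)
                reach |= visit(s)
            closure[node] = reach
        return closure[node]

    for node in successors:
        visit(node)
    return closure


def order_strength(successors: Dict[int, List[int]]) -> float:
    """OS: precedence pairs (transitive included) over n(n-1)/2."""
    n = len(successors)
    if n < 2:
        return 0.0  # undefined for a single activity; 0.0 is this sketch's choice
    pairs = sum(len(r) for r in transitive_closure(successors).values())
    return pairs / (n * (n - 1) / 2)


def sp_indicator(successors: Dict[int, List[int]]) -> float:
    """SP (I2): (m - 1) / (n - 1), with m the longest chain in activities."""
    n = len(successors)
    if n == 1:
        return 1.0
    chain: Dict[int, int] = {}

    def longest(node: int) -> int:
        if node not in chain:
            chain[node] = 1 + max((longest(s) for s in successors[node]), default=0)
        return chain[node]

    m = max(longest(node) for node in successors)
    return (m - 1) / (n - 1)


# Diamond network: 1 -> {2, 3} -> 4
diamond = {1: [2, 3], 2: [4], 3: [4], 4: []}
os_value = order_strength(diamond)   # 5 of 6 possible pairs
sp_value = sp_indicator(diamond)     # longest chain has 3 activities
```

A fully serial network gives OS = SP = 1 and a fully parallel network gives OS = SP = 0, so both indicators place a network between these two extremes.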

Resource metrics:

RF: Resource Factor
RS: Resource Strength
RU: Resource Use
RC: Resource Constrainedness
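
Three of the resource metrics can be computed directly from the demand matrix. The sketch below assumes the common definitions: RF is the average fraction of resource types an activity uses, RU is the number of resource types used by each activity, and RC is, per resource type, the average positive demand divided by the availability. RS is left out because it also requires schedule information. The function names are hypothetical, and the cited papers give the exact definitions.

```python
from typing import List


def resource_factor(demands: List[List[int]]) -> float:
    """RF: share of (activity, resource) pairs with a positive demand."""
    n, k = len(demands), len(demands[0])
    used = sum(1 for row in demands for r in row if r > 0)
    return used / (n * k)


def resource_use(demands: List[List[int]]) -> List[int]:
    """RU: number of resource types used by each activity."""
    return [sum(1 for r in row if r > 0) for row in demands]


def resource_constrainedness(demands: List[List[int]],
                             availabilities: List[int]) -> List[float]:
    """RC: mean positive demand per resource type over its availability."""
    rc = []
    for k, availability in enumerate(availabilities):
        positive = [row[k] for row in demands if row[k] > 0]
        mean_demand = sum(positive) / len(positive) if positive else 0.0
        rc.append(mean_demand / availability)
    return rc


# Hypothetical data: 3 non-dummy activities, 2 renewable resource types.
demands = [[2, 1], [1, 0], [3, 2]]
availabilities = [4, 3]
rf = resource_factor(demands)
ru = resource_use(demands)
rc = resource_constrainedness(demands, availabilities)
```

In this example five of the six demand entries are positive, so RF is 5/6, and both resource types have an RC of 0.5.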

More information on these parameters can be found in Demeulemeester et al. (2003) and Vanhoucke et al. (2008), and in an overview paper that provides a summary of all this data (explained in the movie below).

Movie

Random Network Generation