The test problems used for the evaluation of the proposed algorithm was also used in 33, 35, 36 and is available at the Scheduling Research website (http://www.schedulingResearch.com). However, the structure of the benchmark problems is of two categories first is the processing times and second is the setup times. The two datasets are randomly generated using a discrete uniform distribution on the range of U50, 100 and U125, 175 respectively. The uniform distribution bounds for both processing and setup times determine the level of dominance. What this means is that when processing and setup times are balanced, they are both drawn from U50, 100. In the case when the processing times are dominant, then both the processing and setup times are drawn from U125, 175 and U50, 100. When setup times are dominant, then both processing and setup times are drawn from U50, 100 and U125, 175 respectively. The test problem components consist of jobs n and machines m, where n is in the range of 20, 40, 60, 80 and 120 jobs respectively, while m is in the range of 2, 4, 6, 8, 10 and 12 machines respective. Previous research such as the work presented by Rabadi et al. 34 and Arnaout et al. 36 generated 15 instances for each combination of machine number, job number, processing time distribution and setup time distribution, which gave a total of 6×6×3×15=1620 test problem instances. The same test instances were used to evaluate the performance of the FA in this paper. In the analysis of the different methods described in this paper, the firefly and other existing algorithms’ parameters are configured as shown in Table 1 (experimentation based on number of generations) and Table 2 (experimentation based on the number of fitness evaluation consumed). However, the parameter settings for some existing algorithms such as PH, TS and Meta-RaPS were not provided in the literature and therefore, was not covered in these tables.