I can only think that the sampler is processor intensive - some of them can be very heavy on resources. Running multiple instances could easily be the problem. Does the CPU load go up much with each instance you add?

ROG.