Because this is spoken word over a musical backdrop, you need to focus on the recording and treatment of the voice.
De-essing is the first thing you really need to do, whether with a plug-in, by manual level attenuation, or by side-chain compression.
You should also consider using a gate or manual level rides to remove some of the mouth noise.
Finally, the reverb/delay, or whatever effect you are using, is accentuating both of the issues above.
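
If it helps to see what the side-chain de-esser and the gate are actually doing to the signal, here is a rough sketch in Python with NumPy. It is only an illustration, not how any particular plug-in works: the function names (`de_ess`, `gate`, and the helpers), the 5 kHz detector cutoff, the thresholds, the ratio, and the attack/release times are all made-up starting points that you would tune by ear. It also assumes a mono voice track held as floating-point samples in the range -1 to 1.

```python
import numpy as np


def one_pole_highpass(x, cutoff_hz, sample_rate):
    """First-order high-pass, used only to feed the detector (side chain)."""
    rc = 1.0 / (2.0 * np.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = rc / (rc + dt)
    y = np.zeros_like(x)
    for n in range(1, len(x)):
        y[n] = alpha * (y[n - 1] + x[n] - x[n - 1])
    return y


def envelope(x, sample_rate, attack_ms=1.0, release_ms=50.0):
    """Peak-style envelope follower with separate attack and release times."""
    att = np.exp(-1.0 / (sample_rate * attack_ms / 1000.0))
    rel = np.exp(-1.0 / (sample_rate * release_ms / 1000.0))
    env = np.zeros_like(x)
    level = 0.0
    for n, v in enumerate(np.abs(x)):
        coeff = att if v > level else rel
        level = coeff * level + (1.0 - coeff) * v
        env[n] = level
    return env


def de_ess(voice, sample_rate, cutoff_hz=5000.0, threshold=0.05, ratio=4.0):
    """Wideband de-esser: the high band drives the detector, and the
    resulting gain reduction is applied to the whole voice signal.
    All default values are assumptions, not recommended settings."""
    voice = np.asarray(voice, dtype=float)
    side = one_pole_highpass(voice, cutoff_hz, sample_rate)
    env = envelope(side, sample_rate)
    gain = np.ones_like(voice)
    over = env > threshold
    # Standard compressor curve: level above threshold is divided by `ratio`.
    gain[over] = (env[over] / threshold) ** (1.0 / ratio - 1.0)
    return voice * gain


def gate(voice, sample_rate, threshold=0.01, floor=0.1):
    """Simple gate: below the threshold the voice is turned down to `floor`
    rather than muted. The gain switches abruptly here; a real gate would
    smooth it to avoid clicks."""
    voice = np.asarray(voice, dtype=float)
    env = envelope(voice, sample_rate, attack_ms=1.0, release_ms=100.0)
    gain = np.where(env >= threshold, 1.0, floor)
    return voice * gain
```

You would run these on the dry voice, for example `gate(de_ess(voice, sample_rate), sample_rate)`, and only then send the result to the reverb or delay, so the effect has less sibilance and mouth noise to exaggerate.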


Cheers
rayc
"What's so funny about peace, love & understanding?" - N.Lowe