I guess the only other thing I would say is be consistent. If you're going to have professional voice work done, go all out and have it done everywhere. Not half here, half there, text in one spot and no voice in another. Make it consistent.
The hard part is programming the trigger mechanics in an MMO, though. A lot of content is repeatable, so the same audio gets triggered again every time someone reruns it. Plus, when you're grouped, you need skip features for the party members who've already watched and heard that cutscene before.
No easy task.
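To give a rough idea of the grouped case, here's a minimal sketch in Python. It assumes the game keeps a per-player record of which cutscenes each player has already seen. Every name in it (`Player`, `seen_cutscenes`, `cutscene_plan`, `mark_seen`) is made up for illustration and isn't from any real engine.

```python
from dataclasses import dataclass, field


@dataclass
class Player:
    # Hypothetical player record; a real game would persist this server-side.
    name: str
    seen_cutscenes: set = field(default_factory=set)


def cutscene_plan(party, cutscene_id):
    """Decide, per party member, how a triggered cutscene is presented.

    First-time viewers get "play" (full voiced cutscene, no skip prompt).
    Repeat viewers get "skippable" (they are offered a skip option).
    """
    plan = {}
    for player in party:
        if cutscene_id in player.seen_cutscenes:
            plan[player.name] = "skippable"
        else:
            plan[player.name] = "play"
    return plan


def mark_seen(party, cutscene_id):
    # Record the cutscene as seen so the next run offers a skip.
    for player in party:
        player.seen_cutscenes.add(cutscene_id)


if __name__ == "__main__":
    ana = Player("Ana")
    bo = Player("Bo", {"intro"})
    print(cutscene_plan([ana, bo], "intro"))
    mark_seen([ana, bo], "intro")
    print(cutscene_plan([ana, bo], "intro"))
```

Even this toy version leaves out the messy parts: what the rest of the party does while one person watches, how the voice audio stays in sync, and what happens if someone joins or drops mid-cutscene.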