It is extracted, for one, because most video files do not interweave the structure of the audio and video, so it's not about counting bits or multiplying. They are separate inside the file.
I can see that when I (yet again) asked Mike to explain how the extraction is done, my tongue-in-cheek question was taken literally. So let me try again, this time being more obvious.
Does their software skip every 3rd byte in the source file, but only if you ask nicely and dance around the flagpole 4 times under a full moon in the rain while howling with lit candles in both hands, and then... poof, audio appears?
If you haven't noticed, I'm evidence-driven. Personal beliefs, hearsay and opinions don't mean much to me.
Q. As for semantics, could it be that you, and not RealNetworks, are the one who is wrong about semantics/terminology?
A. Yes
Here is my evidence. But I am flexible. Get RealNetworks to change their "semantics" and I'll certainly change mine.