If I recall correctly, the most common upsampling scheme is to simply fill in the "???" in DCUNY's example with zeros, then lowpass-filter the result.
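
To make that concrete, here's a rough sketch in Python/NumPy of zero-stuffing followed by a lowpass filter. The function name and filter parameters are just mine for illustration, not from anyone's post above:

```python
import numpy as np
from scipy import signal

def upsample_by_zero_stuffing(x, factor):
    """Insert (factor - 1) zeros between samples, then lowpass filter
    to remove the spectral images the zero-stuffing creates."""
    stuffed = np.zeros(len(x) * factor)
    stuffed[::factor] = x  # original samples land on every factor-th slot
    # Lowpass at the original Nyquist (1/factor of the new Nyquist);
    # a gain of 'factor' restores the original amplitude.
    b = signal.firwin(numtaps=101, cutoff=1.0 / factor) * factor
    return signal.lfilter(b, [1.0], stuffed)

# Tiny demo: upsample a short ramp by a factor of 2
y = upsample_by_zero_stuffing(np.array([0.0, 1.0, 2.0, 3.0]), 2)
```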

Probably the biggest 'aha' moment I had in EE-638 at Purdue was in the Proakis and Manolakis text, when it showed how an analog signal is reconstructed from digitally sampled data: by superposing the individual analog impulse responses.

This kind of sounds like gobbledegook, I understand, but just think of it this way:

A digital-to-analog converter has a certain impulse response - that is, you hit it with a little spike of voltage (the actual digitized sample value), and it rings out in a certain way in the analog world. Not unlike hitting a bell with a striker.

Ever so slightly later in time, you hit that filter again with a spike of voltage, and it rings again, but it's still ringing from the previous spike.

Repeat ad infinitum.

If you add up all of the ringing from each of those spikes - in other words, superpose them - you get a smooth analog voltage which you send to your analog amp/speakers/etc.
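
If you want to see the superposition idea in code, here's a minimal sketch assuming an ideal reconstruction filter, whose "ring" is a sinc function (a real DAC's impulse response is different, but the adding-up is the same). The function name, sample values, and rates are made up for illustration:

```python
import numpy as np

def reconstruct(samples, fs, t):
    """Superposition sketch: each sample fires a sinc-shaped 'ring'
    centered at its own instant, and the analog waveform is the sum
    of all those rings evaluated at time(s) t."""
    n = np.arange(len(samples))
    rings = samples[:, None] * np.sinc(fs * (np.asarray(t)[None, :] - n[:, None] / fs))
    return rings.sum(axis=0)

# Evaluate a smooth curve between a handful of samples taken at 8 samples/s
t_fine = np.linspace(0, 0.5, 200)
smooth = reconstruct(np.array([0.0, 0.7, 1.0, 0.7, 0.0]), 8, t_fine)
```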

I've long forgotten most of what I learned in the course (1995 time frame). The long and short of it as it pertains to this discussion: the data compression algorithms used for online video obliterate most of the real concerns about recording at 44.1 kHz vs. 48 kHz, and most DAW software can handle the up/down sampling without issue or much CPU load.
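
For what it's worth, the conversion itself is a one-liner with common DSP libraries, which is roughly what a DAW is doing under the hood. A sketch using scipy (the 1 kHz test tone is just something I made up for the example):

```python
import numpy as np
from scipy import signal

# One second of a 1 kHz tone "recorded" at 44.1 kHz
fs_in, fs_out = 44100, 48000
t = np.arange(fs_in) / fs_in
tone = np.sin(2 * np.pi * 1000 * t)

# 48000/44100 reduces to 160/147; resample_poly does the whole
# upsample / filter / downsample chain in one call.
converted = signal.resample_poly(tone, up=160, down=147)
print(len(converted))  # 48000 samples, i.e. one second at 48 kHz
```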