In theory, if something is learnable, AI will eventually find a method of learning it.
For AI training, audio has to be converted into a representation that a network can work with. That's typically an FFT-based spectrum, which is then grouped into frequency bands to better approximate human hearing.
If that's the case, getting "missing" audio data and filling in holes is a matter of extrapolation for the neural network.
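To make the FFT-and-bands step concrete, here is a minimal sketch in Python with NumPy. It is only an illustration under stated assumptions: the function name `band_energies`, the frame and hop sizes, and the log-spaced band edges (a crude stand-in for a perceptual scale such as mel) are all made up for this example and are not taken from any particular product or model.

```python
import numpy as np

def band_energies(signal, sample_rate, frame_len=1024, hop=512, n_bands=8):
    """Short-time FFT power grouped into log-spaced frequency bands.

    Hypothetical helper for illustration; all parameter values are assumptions.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # Power spectrum of each frame: shape (n_frames, frame_len // 2 + 1).
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    # Log-spaced band edges from 50 Hz up to the Nyquist frequency.
    edges = np.geomspace(50.0, sample_rate / 2, n_bands + 1)
    # Sum the FFT bins that fall inside each band.
    bands = np.stack([
        power[:, (freqs >= lo) & (freqs < hi)].sum(axis=1)
        for lo, hi in zip(edges[:-1], edges[1:])
    ], axis=1)
    return bands, edges

# Usage: one second of a 440 Hz tone sampled at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440.0 * t)
bands, edges = band_energies(tone, sr)
loudest = int(bands.mean(axis=0).argmax())
print(bands.shape, edges[loudest], edges[loudest + 1])
```

Each row of `bands` is one time frame and each column one frequency band, which is the kind of grid a network would be trained on. A "hole" in the audio would show up as missing or damaged rows in that grid.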
Who can deny that David always brings intelligent perspectives to any AI discussion?
The way I understand "if something is learnable, AI will eventually find a method of learning it" is that human areas of study can be broken down into "languages", whether they are languages in the usual sense like English, French, Fortran, music and math, or areas of study like chemistry, physics, biology or engineering. Because the information in these languages can be encoded in the architecture and parameters of a neural net, an AI can learn it. Indeed, we all carry neural nets in our heads, of the biological kind.
It's funny you mention FFT. In "another life" I actually programmed an FFT analyzer from scratch in Mathematica to process accelerometer data captured during a flight test I participated in over the Pacific Ocean at 37,000 ft in a Boeing test aircraft. The final result was a set of broadband random vibration profiles that we could then run in our test lab . . . one of the most fun projects I ever participated in.
My idea for capturing the missing audio data (if that's even required) is to use a separate AI, perhaps one that takes more of an LLM approach. But so far, based on my very limited test cases, Studio One ver 7 doesn't seem to suffer from the "missing audio data" phenomenon. Who knows, though, how well it will perform with music of higher complexity, like that of Snarky Puppy and similar.
Attached is one slide of the slide deck I'm working on. One point of this slide is to compare the 13,002 parameters of my simple example with the 24+ million parameters of Banquet and with ChatGPT-4, for which I've seen reported values of 1.8 trillion.
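As a rough illustration of where a count like 13,002 can come from, here is a small Python sketch that counts the weights and biases in a fully connected network. The layer sizes 784-16-16-10 are an assumption, chosen only because they happen to total 13,002; the example on the slide may be laid out differently. The 24 million and 1.8 trillion figures are simply the numbers quoted above, not something the code derives.

```python
def dense_param_count(layer_sizes):
    """Count weights plus biases in a fully connected network.

    Each layer with n_in inputs and n_out outputs contributes
    n_in * n_out weights and n_out biases.
    """
    return sum(
        n_in * n_out + n_out
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )

# Assumed layer sizes, picked because they add up to 13,002.
sizes = [784, 16, 16, 10]
small = dense_param_count(sizes)
print(small)  # 13002

# Scale comparison using the figures quoted in the post.
banquet = 24e6   # "24+ million", as quoted
gpt4 = 1.8e12    # reported value, as quoted
print(f"Banquet is about {banquet / small:,.0f}x the small example")
print(f"The 1.8 trillion figure is about {gpt4 / small:,.0f}x the small example")
```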