Wikipedia offers these explanations from experts:
On May 16, 2018, The New York Times reported that a spectrogram analysis showed where the extra sounds for "Yanny" appear in the mixed re-recording.[3][18] The sounds were also simulated by combining syllables of the same Vocabulary.com voice saying the words "Yangtze" and "uncanny"; the mash-up produced a spectrogram similar to that of the extra sounds graphed in the "Laurel" re-recording.[3]
Benjamin Munson, a professor of audiology at the University of Minnesota, suggested that "Yanny" can be heard in higher frequencies while "Laurel" can be heard in lower frequencies.[1] Older people, whose ability to hear higher frequencies is more likely to have degraded, usually hear "Laurel". Kevin Franck, the director of audiology at the Boston hospital Massachusetts Eye and Ear, said that the clip exists on a "perceptual boundary" and compared it to the Necker Cube illusion.[19] Professor David Alais from the University of Sydney's school of psychology also compared the clip to the Necker Cube or the face/vase illusion, calling it a "perceptually ambiguous stimulus".[15]
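To make the frequency argument concrete, here is a minimal sketch that is not taken from any of the cited analyses. It builds a toy clip from two pure tones, measures how much energy sits at each one, and shows how the balance shifts when the high tone is attenuated, roughly as it might be for a listener with reduced high-frequency hearing. The sample rate, the 300 Hz and 3000 Hz tones, and the 0.5 gain are all assumptions chosen for illustration; the real clip is speech, not two tones, and this is not a model of perception.

```python
import math

SAMPLE_RATE = 8000  # Hz (assumed for this toy example)
NUM_SAMPLES = 800   # 0.1 s of audio
LOW_HZ, HIGH_HZ = 300, 3000  # hypothetical stand-ins for the low and high cues


def make_clip(low_hz, high_hz, high_gain=1.0):
    """Sum of a low tone and a high tone; high_gain scales the high tone."""
    return [
        math.sin(2 * math.pi * low_hz * i / SAMPLE_RATE)
        + high_gain * math.sin(2 * math.pi * high_hz * i / SAMPLE_RATE)
        for i in range(NUM_SAMPLES)
    ]


def goertzel_power(samples, freq_hz):
    """Squared DFT magnitude at the bin nearest freq_hz (Goertzel algorithm)."""
    n = len(samples)
    k = round(n * freq_hz / SAMPLE_RATE)
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


def high_to_low_ratio(samples):
    """Energy at the high tone divided by energy at the low tone."""
    return goertzel_power(samples, HIGH_HZ) / goertzel_power(samples, LOW_HZ)


full = make_clip(LOW_HZ, HIGH_HZ)                    # both cues equally strong
reduced = make_clip(LOW_HZ, HIGH_HZ, high_gain=0.5)  # high cue attenuated

print("high/low energy, full clip:   ", high_to_low_ratio(full))
print("high/low energy, reduced clip:", high_to_low_ratio(reduced))
```

Halving the amplitude of the high tone cuts its energy to a quarter, so the low-frequency cue dominates the reduced clip. That is the direction of the effect Munson describes, though the size of the effect in real listeners is not something this sketch can say anything about.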
Brad Story, a professor of speech, language, and audiology at the University of Arizona, said that the low quality of the recording creates ambiguity.[20] Dr. Hans Rutger Bosker, psycholinguist and phonetician at the Max Planck Institute for Psycholinguistics, showed that it is possible to make the same person hear the same audio clip differently by presenting it in different acoustic contexts: if one hears the ambiguous audio clip after a lead-in sentence without any high frequencies (above 1000 Hz), the higher frequencies in the following ambiguous clip stand out more, making people report "Yanny" where they might previously have heard "Laurel".[21]
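The lead-in manipulation described above amounts to low-pass filtering the context sentence at about 1000 Hz. The following is a rough sketch of such a filter, not Bosker's actual procedure: it designs a Hamming-windowed sinc FIR filter and applies it to two test tones standing in for the lead-in's low and high content. The 8000 Hz sample rate, the 101 taps, and the 300 Hz and 3000 Hz test tones are assumptions made for illustration only.

```python
import math

SAMPLE_RATE = 8000  # Hz (assumed)
CUTOFF_HZ = 1000    # threshold mentioned in the text


def lowpass_taps(cutoff_hz, sample_rate, num_taps=101):
    """Hamming-windowed sinc low-pass coefficients, normalized to unity DC gain."""
    fc = cutoff_hz / sample_rate  # cutoff in cycles per sample
    mid = (num_taps - 1) / 2
    taps = []
    for i in range(num_taps):
        t = i - mid
        ideal = 2 * fc if t == 0 else math.sin(2 * math.pi * fc * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * i / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [c / total for c in taps]


def apply_fir(signal, taps):
    """Filter the signal, keeping only outputs where the taps fully overlap it.

    The taps are symmetric, so this sliding dot product equals a convolution.
    """
    span = len(taps)
    return [
        sum(taps[j] * signal[i + j] for j in range(span))
        for i in range(len(signal) - span + 1)
    ]


def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def tone(freq_hz, num_samples=800):
    """Unit-amplitude sine tone at freq_hz."""
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(num_samples)]


taps = lowpass_taps(CUTOFF_HZ, SAMPLE_RATE)
low_out = apply_fir(tone(300), taps)    # below the cutoff: should pass
high_out = apply_fir(tone(3000), taps)  # above the cutoff: should be removed

print("300 Hz level after filtering: ", rms(low_out))
print("3000 Hz level after filtering:", rms(high_out))
```

A lead-in processed this way keeps its low-frequency content and loses nearly everything above the cutoff, which is the contrast that, according to Bosker, makes the high frequencies in the following clip stand out.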