Brilliant article by Art Dudley.

Actually, JJ is a strong proponent of & has been continually calling for the use of internal positive & negative controls in blind tests that do not follow the ITU/MUSHRA standard, i.e. the majority of blind tests run in audio.
 
Thing is, every test has to show its limitations before any useful conclusions can be drawn.
Yes, I absolutely agree, but I've yet to meet any of those who strongly propose these blind tests (except JJ) actually including such internal controls. As you say, without them no useful conclusions can or should be drawn from such tests, no matter how much hand-waving, talk of science, etc. is engaged in.
 
The argument that ITU/MUSHRA testing is the only true criterion for the evaluation of audio devices seems to lead to the logical conclusion that any audio devices not backed up by such tests are dubious & any manufacturers not willing to invest the resources in such tests are again questionable.
Then why did you posit it???
I agree that it's foolish to throw out everything between Fletcher-Munson in 1933 and Amir's results in 2014 because it was non-ITU/MUSHRA, so I'm surprised you stated:
The case for well-controlled DBTs using ITU or MUSHRA standards seems to be well established, but I don't see many tests of this standard appearing in audio. The reason for these standards is to attempt to control the known biases that could influence the results. As a result one has to assume the vast majority of blind tests are probably biased & flawed.
What is your actual position, as of 10/30/2014???

cheers,

AJ
 
I believe this went sailing over the heads of those who strongly advocate blind testing.
No need to engage in this type of insult. The mods have been very lenient here and the conversation has remained cordial and informative to most, I would assume.
I doubt this medical term "flew over the head" of Amir when he took, and you both touted, that online unsupervised computer-file blind test without a shred of "internal control".
In audio science, it's referred to as a hidden reference/anchor, as used in MUSHRA etc.
It simply adds robustness to the test. Feel free to suggest it to Amir if you feel he missed it.

cheers,

AJ
 
Then why did you posit it???
Huh? Am I positing it? I didn't think so. I was attempting to show the logical conclusion of the position, i.e. everyone should buy Harman speakers & all other speaker manufacturers (who don't run valid DBTs) should be shunned. Mind you, I would question the validity of Harman's tests, but I use them just as an example of where the logic leads you. The same applies to other audio device manufacturers (not just high-end): they should be boycotted if they don't run DBTs. Is this your position? If not, why not?
I agree that it's foolish to throw out everything between Fletcher-Munson in 1933 and Amir's results in 2014 because it was non-ITU/MUSHRA, so I'm surprised you stated:

What is your actual position, as of 10/30/2014???

cheers,

AJ

I'm not sure I understand what that part of your post is about, but my position is clear, I hope? Most blind tests conducted in audio are not of a sufficiently high standard to be deemed reliable, so why the big deal demanding such flawed blind tests?

Conducting rigorous blind tests is impractical & too expensive for most, so let's be pragmatic & deal with reality - what we have are flawed tests.

The question then becomes which flawed tests you prefer - I prefer listening tests for the reasons I stated: I can repeat them on many different systems, at many different times, etc. Being able to do this gives me a better handle on the sonic nature of a particular device.

In my experience the time involved in setting up & administering blind tests means that this variety of listening will not be entered into. I have no problem with anyone doing blind tests (as long as they know they are most likely flawed) & I certainly wouldn't demand that they use a sighted test to "prove" to me that their listening "results" were valid. Yet I see the reverse of this happening all the time on forums - people demanding of others a blind test to "prove" what they report hearing. It just doesn't make sense to me!
 
Blind trials are used for drug testing for a specific reason.
Yes, to eliminate confounders associated with humans. Ditto for blind audio tests of cell phone intelligibility, like the one in your pocket. Ditto for blind audio tests of TV audio, like the one in your living room. Ditto for blind auditions of orchestra members, which have resulted in more female performers, as you may have noticed the last time you attended a symphony.
But perhaps the best reason why blind audio tests have worked since Fletcher-Munson in the 1930s is that you can see that entire 80+ year foundation of audio science/blind tests at work today, in the form of how Harman has coalesced it into predicting preference: a preference for smooth on-axis response (low FR deviation, as determined by blind audio tests) coupled with smooth off-axis response (likewise determined by blind audio tests). Here is a nice example in a non-Harman product, the Magico S5:
[Attachment: fr_on1530.gif - Magico S5 on/off-axis frequency response]


Smooth on/off axis is preferred by most (there will always be outliers, just like in drug tests, wine tests, etc, etc). Exactly as predicted by the KEF-Eureka/NRC-Athena/Toole/Olive>Harman blind audio tests, built from the foundations laid by Fletcher-Munson all the way back in the 1930s.
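For illustration, here is a minimal sketch of the kind of "low FR deviation" statistic being described - a best-fit line on a log-frequency axis, with the RMS of the residuals as the smoothness score. It's only a toy illustration of the idea, not Olive's actual preference-rating model:

```python
# Toy "FR deviation" (smoothness) statistic for a measured response.
# Assumes freqs (Hz) and spl (dB) arrays from a measurement; this is an
# illustration of the idea only, NOT the actual Olive preference model.
import numpy as np

def fr_deviation(freqs, spl, f_lo=100.0, f_hi=10_000.0):
    """RMS deviation (dB) of the response from its best-fit slope,
    computed over f_lo..f_hi on a log-frequency axis."""
    freqs = np.asarray(freqs, dtype=float)
    spl = np.asarray(spl, dtype=float)
    band = (freqs >= f_lo) & (freqs <= f_hi)
    logf = np.log10(freqs[band])
    # A gentle overall tilt is tolerated by the fit; wiggles are not.
    slope, intercept = np.polyfit(logf, spl[band], 1)
    residuals = spl[band] - (slope * logf + intercept)
    return float(np.sqrt(np.mean(residuals ** 2)))

# Usage: a flat response scores ~0; one with a +5 dB resonance scores worse.
f = np.logspace(np.log10(20), np.log10(20_000), 500)
flat = np.zeros_like(f)
peaky = flat + 5.0 * np.exp(-((f - 3_000) / 300) ** 2)  # narrow 3 kHz bump
print(fr_deviation(f, flat), fr_deviation(f, peaky))
```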
There is no question that blind audio tests work, whether one is or is not in denial of these facts. That's the beauty of science.:)

cheers,

AJ
 

AJ, I'm speaking to you, not JJ. Nor are we talking about cell phones, etc.
Hi Myles,
Yes, I'm aware you're speaking to me.:)
We are talking about blind audio tests, so yes, we are talking about cell phones too. Perhaps because your focus is on high-end audio, you are unaware that things like cell phone audio SQ/intelligibility are subject to....blind tests.

You are the one who says you carry out blind testing. Specifically, what standards are being used to establish the validity and sensitivity of your tests?
That isn't an important question in terms of the discussion. Its only use would be to introduce a red herring, an ad hominem, or possibly a straw man into the argument.
The science behind blind audio testing doesn't change based on what blind tests AJ has personally performed, or their sensitivity/validity.

But I'll bite anyway.:cool:
The type of "blind" test I performed was on show attendees, who may or may not have heard tube/SS amplification, and may or may not have heard very expensive cabling, etc., in the system they were viewing/hearing.
Hopefully, not being aware there was any form of test might have alleviated that pesky sameness/deafness-inducing "tension" that can plague the aware-of-blind-test variety.
More for curiosity's sake than anything to submit to the AES.

The real problem is that these tests are set up by engineers who are simply clueless when it comes to biology. And reductionism does not work when it comes to biological systems, whatever the engineering books tell them.

An even greater problem would be if folks didn't understand that Toole, Griesinger, JJ et al have a much firmer grasp of physiology and the human ear>brain interaction than is assumed by casual observation.
But of course, cell phones, TVs, VoIP and things like PSR (JJ), LARES (Griesinger), and Revel Salons/Magico S5s seem to work just fine based on this physiology knowledge.

cheers,

AJ
 
A lot of time is being wasted on this thread that should be used for listening to music.

This is beginning to look a lot like the thread Amir initiated on WBF, which is at ~1475 posts with not one individual having changed their position. Like politics in the US.
 
No need to engage in this type of insult. The mods have been very lenient here and the conversation has remained cordial and informative to most, I would assume.
Sorry if I insulted you; I apologise. I was trying to make the point that in my experience the concept of internal controls generally goes over the head of (or is ignored by) DBT advocates. I have seldom, if ever, seen these internal controls used in DBTs.

I doubt this medical term "flew over the head" of Amir when he took, and you both touted, that online unsupervised computer-file blind test without a shred of "internal control".
Some things are inconsistent here. When this exact same test with the exact same files stood for 15 years without positive results, nobody was calling for controls. Now that some positive results have been returned, there is a call for them. So what are we to conclude? The test was flawed for 15 years, yet its negative results were often cited to prove high-res is not differentiable from RB audio? The negative results were never questioned. Does this not show, as do other examples of blind tests, that there is no real desire to make the test valid by using internal controls that verify & calibrate its results? This test design, lacking internal controls, is an instance of a well-known bias called "experimenter's bias" - designing a test which will likely give the expected outcome.
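To put a number on how easily such a test produces false negatives, here is a minimal sketch (the 16-trial count, 12-correct pass mark and 70% detection rate are purely illustrative assumptions, not figures from the actual test):

```python
# How easily an insensitive blind test yields false negatives. The numbers
# (16 trials, 12-correct pass mark, a listener who truly detects the
# difference on 70% of trials) are illustrative assumptions only.
from math import comb

def p_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))

n_trials, pass_mark = 16, 12
# Chance of "passing" by pure guessing (the false-positive risk): ~3.8%
print(p_at_least(pass_mark, n_trials, 0.5))
# Chance that a genuine 70%-detector FAILS to pass (a false negative): ~55%
print(1 - p_at_least(pass_mark, n_trials, 0.7))
```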


In audio science, it's referred to as a hidden reference/anchor, as used in MUSHRA etc.
It simply adds robustness to the test. Feel free to suggest it to Amir if you feel he missed it.

cheers,

AJ
I don't think robustness is the correct term, as robustness is a sliding scale & this isn't. Internal controls verify that the test/participants are capable of audibly differentiating what is under test - they should test for both false negatives & false positives. The likelihood is false negatives for blind tests, false positives for sighted tests. For blind tests the controls are geared towards verifying that the whole test is capable of differentiating between known audible differences. Without them the test results are unqualified & unusable. I liken it to calibrating a measuring device - without calibration we have no way of trusting the measurements. What is being done in a blind test is a measurement - a measurement of the perception of those involved in hearing an audible difference between devices. There are many obstacles that can interfere with this measurement &, short of going whole hog & using ITU/MUSHRA-standard tests, I believe these internal controls (yes, the HRA part of MUSHRA) need to be used by anybody who wants their blind test to be considered more trustworthy than a sighted test. I haven't seen this happen, have you? Do you use these controls?
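As a concrete sketch of what I mean by calibrating a blind test with internal controls - the trial kinds, labels and pass thresholds below are my own illustration, not anything prescribed by ITU/MUSHRA:

```python
# Sketch of scoring a blind session that embeds hidden internal controls.
# Trial kinds, thresholds and wording are my own illustration, not taken
# from ITU/MUSHRA or any other standard.
from math import comb

def p_at_least(k, n, p=0.5):
    """Chance of k-or-more 'hits' out of n by pure guessing."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))

def validate_session(trials):
    """trials: list of (kind, heard_difference) tuples. kind is 'test'
    (the pair actually under test), 'pos' (hidden known-audible difference)
    or 'neg' (hidden identical pair presented as if different)."""
    pos = [h for k, h in trials if k == "pos"]
    neg = [h for k, h in trials if k == "neg"]
    test = [h for k, h in trials if k == "test"]
    # Positive controls calibrate sensitivity: if a KNOWN audible difference
    # is missed, a 'no difference' result on the test pair means nothing.
    if pos and sum(pos) / len(pos) < 0.8:
        return "invalid: positive controls missed (test is insensitive)"
    # Negative controls catch false positives: 'hearing' a difference
    # between identical stimuli flags bias or a broken rig.
    if neg and sum(neg) / len(neg) > 0.2:
        return "invalid: negative controls failed (false positives)"
    hits = sum(test)
    return (f"controls passed: {hits}/{len(test)} on test trials, "
            f"p(guessing) = {p_at_least(hits, len(test)):.3f}")

# Usage: all controls behave, 9/12 correct on the real pairs.
session = [("pos", True)] * 5 + [("neg", False)] * 5 \
        + [("test", True)] * 9 + [("test", False)] * 3
print(validate_session(session))
```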
 
Huh? Am I positing it?
Unambiguously:
The case for well controlled DBTs using ITU or MUSHRA standards seems to be well established but I don't see many tests of this standard appearing in audio. The reason for these standards is to attempt to control the known biases that possibly influence the results. As a result one has to assume the vast majority of blind tests are probably biased & flawed.
Then you said:
I'm not sure if you are a frequenter of the WBF forum, but Amir's thread showing his positive ABX results for ArnyK's high-res vs RB files and for Winer's A/D-D/A loopback tests (both of which stood for 15 years or so as always providing negative results) caused the reaction you predict, but this time from those who swear by blind tests - suggestions of cheating, of IMD, of bad resampling software, etc. None of these was ever proven to be the reason for the positive results - so I guess nobody is immune to what you say?
Why would you not assume this to be a biased and flawed non-ITU/MUSHRA test?
Why would anyone need to "prove" anything other than that these positive results were the result of a "biased and flawed" methodology (your assertion)??

Most blind tests conducted in audio are not of a sufficiently high standard to be deemed reliable
Based on what evidence? Where is your evidence to support this assertion? Where is your evidence that non-ITU/MUSHRA blind test X has been rendered flawed/biased by results showing the opposite, using method Y, be it Triangle (not "Triangulation";)), ANOVA, MUSHRA, etc?
When exactly the opposite is reality, when Fletcher-Munson et al are re-run....

cheers,

AJ
 
Sorry if I insulted you; I apologise.
No need, it's impossible to insult me.:lol:
If you met me you would understand. I meant it simply in the general sense, of those doing amateur public tests, like Amir, Arny, etc.

I was trying to make the point that in my experience the concept of internal controls generally goes over the head of (or is ignored by) DBT advocates. I have seldom, if ever, seen these internal controls used in DBTs.
I think that is presumptuous, but also extraneous to the Dudley article, which throws out the baby with the bathwater, i.e. claims blind controlled testing is inapplicable to "audio", which is really a synonym for "high-end audio".

Some things are inconsistent here. When this exact same test with the exact same files stood for 15 years without positive results, nobody was calling for controls. Now that some positive results have been returned, there is a call for them. So what are we to conclude? The test was flawed for 15 years, yet its negative results were often cited to prove high-res is not differentiable from RB audio? The negative results were never questioned. Does this not show, as do other examples of blind tests, that there is no real desire to make the test valid by using internal controls that verify & calibrate its results? This test design, lacking internal controls, is an instance of a well-known bias called "experimenter's bias" - designing a test which will likely give the expected outcome.
The conclusion is that a flawed, amateur, gamable online test is just that. Flawed.
Hardly evidence against "blind tests".

I don't think robustness is the correct term, as robustness is a sliding scale & this isn't. Internal controls verify that the test/participants are capable of audibly differentiating what is under test - they should test for both false negatives & false positives. The likelihood is false negatives for blind tests, false positives for sighted tests. For blind tests the controls are geared towards verifying that the whole test is capable of differentiating between known audible differences. Without them the test results are unqualified & unusable. I liken it to calibrating a measuring device - without calibration we have no way of trusting the measurements. What is being done in a blind test is a measurement - a measurement of the perception of those involved in hearing an audible difference between devices. There are many obstacles that can interfere with this measurement &, short of going whole hog & using ITU/MUSHRA-standard tests, I believe these internal controls (yes, the HRA part of MUSHRA) need to be used by anybody who wants their blind test to be considered more trustworthy than a sighted test. I haven't seen this happen, have you? Do you use these controls?
There is no question that controls/anchors increase the robustness/validity of the tests....but that does not in itself invalidate methods that do not include anchors.
If that were the case, you would have to throw out the results of the (non-anchor) blind auditions that resulted in a large increase in female orchestra members. Do you consider these hopelessly biased/flawed?
Is there evidence MUSHRA et al tests have upended loads of non-anchor tests? Where?

cheers,

AJ
 
Unambiguously:

Then you said:

Why would you not assume this to be a biased and flawed non-ITU/MUSHRA test?
I guess I was imprecise in my post - I think this is getting into debating/nit-picking techniques rather than discussion. I didn't mean to exclude ABX with my phrase "non-ITU/MUSHRA", so again, apologies for the confusion.
Why would anyone need to "prove" anything other than that these positive results were the result of a "biased and flawed" methodology (your assertion)??
OK, you are pulling me up on my mistake - touché.


Based on what evidence? Where is your evidence to support this assertion? Where is your evidence that non-ITU/MUSHRA blind test X has been rendered flawed/biased by results showing the opposite, using method Y, be it Triangle (not "Triangulation";)), ANOVA, MUSHRA, etc?
When exactly the opposite is reality, when Fletcher-Munson et al are re-run....

cheers,

AJ
Ah, I think this is descending now (readers probably thought this happened pages ago?) into a terribly boring nit-picking exercise, so I'll restrict my participation.
 
The conclusion is that a flawed, amateur, gamable online test is just that. Flawed.
Hardly evidence against "blind tests".

cheers,

AJ
Just want to ask one question about your post rather than get into the nit-picking downward spiral. You had already decided that ArnyK's ABX test was flawed during the last 15 years of its existence? Can I ask what your criteria were in reaching this opinion?
 
BTW, this thread now reminds me more of a blind test in that we learn more about the participants than about the subject/device under discussion/test :)
 
Just want to ask one question about your post rather than get into the nit-picking downward spiral. You had already decided that ArnyK's ABX test was flawed during the last 15 years of its existence? Can I ask what your criteria were in reaching this opinion?
Absolutely. One was a guy on AVS (MZillich IIRC) who easily scored like Amir on the test...by detecting an audible glitch in the file, rather than anything in the music. The second was a gent on HA who simply modified the filename in Foobar and immediately scored 100%...without listening:).
I had a similar experience once when I took an online Klippel test. The bell curve had an odd spike at the top that didn't seem right. So I re-ran it and realized I had been listening to the wrong thing (the music). There was a barely audible "pop" glitch in one of the files. Instead of parading my "win"....I reported it to Klippel;).
Best to have the likes of Toole, Olive et al running these things. The online stuff like the Philips GE test can be fun, for what it is.

Maybe that's the difference. I don't find blind testing/basic honesty controls in audio to be incompatible with "high-end" audio. They are a threat neither to me nor to my product.
And as I've pointed out, regardless of one's views/beliefs on this, the world still turns, cell phone calls and VoIP and TV audio are still intelligible, orchestras get more gender diverse, Myles and I still like the same type of sounding speakers:P.....etc, etc.
We still end up with whatever we prefer.

cheers,

AJ
 
What I was asking was whether you have doubted these non-proctored ABX tests during the last 15 years. If so, why didn't you say anything?

I don't remember seeing anyone questioning the negative ABX results from the last 15 years, so they were taken as valid & used as proof that high-res sounds no different to RB audio.

Proctoring is not a hidden anchor & reference internal control of the type specified in ITU/MUSHRA that Myles & I are referring to - it is a procedural control for administering the test, the same as specifying the quality of equipment to be used (free of IMD with ultrasonics). The internal controls are designed to avoid false positive or false negative results. An example of the internal controls is the ultrasonic tones included in the later test files - to determine whether your equipment creates IMD, i.e. for testing for false positives. But, of course, this also relies on honesty :)
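For illustration, a minimal numpy sketch of the idea: the stimulus contains only ultrasonic tones, so any energy that shows up in the audible band after playback must be IMD manufactured by the chain (a toy even-order nonlinearity stands in for a hypothetical misbehaving playback chain here):

```python
# Sketch of the ultrasonic IMD control described above: the stimulus holds
# ONLY ultrasonic tones, so any audible-band energy after playback is IMD.
# The quadratic term below is a toy even-order nonlinearity standing in
# for a hypothetical misbehaving playback chain.
import numpy as np

fs = 96_000                          # hi-res sample rate, 1 second of signal
t = np.arange(fs) / fs
f1, f2 = 23_000, 25_000              # both tones above audibility
stim = 0.4 * np.sin(2 * np.pi * f1 * t) + 0.4 * np.sin(2 * np.pi * f2 * t)

played = stim + 0.1 * stim ** 2      # even-order distortion -> IMD products

spec = np.abs(np.fft.rfft(played)) / len(played)
freqs = np.fft.rfftfreq(len(played), d=1 / fs)

def level_db(f_target, bw=50.0):
    """Peak level (dB re full scale, roughly) near f_target."""
    band = (freqs > f_target - bw) & (freqs < f_target + bw)
    return 20 * np.log10(spec[band].max() + 1e-12)

# f2 - f1 = 2 kHz sits squarely in the audible band. If this product rises
# above the noise floor, the chain is manufacturing an audible cue that is
# not in the source - i.e. a route to false positives.
print("2 kHz IMD product :", round(level_db(2_000), 1), "dB")
print("23 kHz stimulus   :", round(level_db(23_000), 1), "dB")
```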


I believe that you are on a very slippery slope here with the proctoring, as the honesty of the proctor comes into question. The idea of a company using blind tests that show its speakers are superior to others really becomes very questionable.

So can I check if I understand your position correctly? You are saying that all non-proctored ABX tests are questionable & should be ignored? This rules out most ABX tests run so far (BTW, I would like to see you take that line on Hydrogen Audio). You are saying that ITU/MUSHRA standards are the scientific blind-testing standard but you will accept lower standards. I still don't know how you qualify these lower-standard tests as reliable?

Sure, the world still turns - this is only a hobby & nothing to get exercised about.
 
What I was asking was whether you have doubted these non-proctored ABX tests during the last 15 years. If so, why didn't you say anything?
Umm, I just gave you an example above where I doubted and did say something (to Klippel)! Perhaps it failed to make worldwide headlines, but hey, I tried.:D

I don't remember seeing anyone questioning the negative ABX results from the last 15 years, so they were taken as valid & used as proof that high-res sounds no different to RB audio.
We must be seeing different things, because I saw a lot of audiophiles question that sort of comparison, especially those who had been swooning over the glory of SACD/DVD-A/"hi-rez" media....which turned out to be upsampled RB:lol:

Proctoring is not a hidden anchor & reference internal control of the type specified in ITU/MUSHRA that Myles & I are referring to - it is a procedural control for administering the test, the same as specifying the quality of equipment to be used (free of IMD with ultrasonics). The internal controls are designed to avoid false positive or false negative results. An example of the internal controls is the ultrasonic tones included in the later test files - to determine whether your equipment creates IMD, i.e. for testing for false positives. But, of course, this also relies on honesty :)
You've lost me here. You were both talking about "Positive Controls". I've known what the "HRA" in MUSHRA stands for, for a long time. I also know what proctoring is, now that you've mentioned it.

I believe that you are on a very slippery slope here with the proctoring, as the honesty of the proctor comes into question. The idea of a company using blind tests that show its speakers are superior to others really becomes very questionable.
Fair enough, but the topic is blind tests/Dudley's thoughts, rather than a referendum on Harman's practices.

So can I check if I understand your position correctly? You are saying that all non-proctored ABX tests are questionable & should be ignored?
Nope. I'm saying the results of non-proctored tests, where it has been clearly shown that gaming and file corruption are factors, should be discarded as invalid.

This rules out most ABX tests run so far (BTW, I would like to see you take that line on Hydrogen Audio).
If those factors exist for "most ABX tests run so far". And the evidence to support this can be found where...?

You are saying that ITU/MUSHRA standards are the scientific blind-testing standard but you will accept lower standards. I still don't know how you qualify these lower-standard tests as reliable?
I'm saying ITU/MUSHRA is the most robust, but unless there is specific evidence to the contrary, the "lower standards" can also be valid. It really depends on what is being tested. Both are far better than "I heard it, I said so", if you are concerned with sound>ears.;)
I'll accept the results of more women in orchestras also, despite the "lower standards" utilized.

Sure, the world still turns - this is only a hobby & nothing to get exercised about.
Exactly. There are those who pooh-pooh pharmaceutical blind testing also. I'll take my drugs, thanks.:P

I accept the reality that (the science of) blind tests apply to audio....and I accept the fact that I'm going to buy/listen to whatever I prefer, regardless.
I really don't see any need for fear, loathing or conflict between the two. YMMV.

cheers,

AJ
 
Your position is not logical to me, but I'll not go any further in the discussion, as I see it's become the usual forum debating style, which really is of no interest to anyone.
 
What all this boils down to is that all you folks have your stereo systems and you like them (at the moment, perhaps! ha ha), and if we all sat and listened to your systems in the same room, with some sort of correction so they were equalised to some target curve at the listening position, well, we would like some more than others.

So the only fear from all this idea of blind testing (even if it's not full of controls) is that we might find out that we like the sound of a system that costs a few thousand dollars over one that costs tens of thousands. And if you perceive value based on cost in audio, then you would be crushed. No matter; if you preferred that lower-cost system, over time you will end up with some other system anyway...and so it goes.

The real question is the point of diminishing returns, the point of different sound as opposed to "better" sound, and apparently some folks don't really know what they like in their "sound". For some it's cost as a reassuring value proposition, the more the better; for others it's the type of product, SS or tubes, or the type of speaker, etc. But in any case, all these systems will sound different, and golden ears would perceive this, but different vs better certainly is a preference thing and blah blah.
 