> Thing is, every test has to show the limitations before any useful conclusions can be drawn.

Yes, absolutely agree, but I've yet to meet any of those who strongly propose these blind tests (except JJ) actually including such internal controls. As you say, without them no useful conclusions can or should be drawn from such tests, no matter how much hand-waving, talk of science, etc. is engaged in.
> The argument that ITU/MUSHRA testing is the only true criteria for the evaluation of audio devices seems to lead to the logical conclusion that any audio devices not backed up by such tests are dubious & any manufacturers not willing to invest the resources into such tests are again questionable

Then why did you posit it???
> The case for well controlled DBTs using ITU or MUSHRA standards seems to be well established but I don't see many tests of this standard appearing in audio. The reason for these standards is to attempt to control the known biases that possibly influence the results. As a result one has to assume the vast majority of blind tests are probably biased & flawed.

What is your actual position, as of 10/30/2014???
> I believe this went sailing over the heads of those who strongly advocate blind testing.

No need to engage in this type of insult. The mods have been very lenient here and the conversation has remained cordial and informative to most, I would assume.
> Then why did you posit it???

Huh? Am I positing it? I didn't think so. I was attempting to show the logical conclusion of the position, i.e. everyone should buy Harman speakers & all other speaker manufacturers (who don't run valid DBTs) should be shunned. Mind you, I would question the validity of Harman's tests but use them just as an example of where the logic leads you. The same applies to other audio device manufacturers (not just high-end): they should be boycotted if they don't run DBTs. Is this your position? If not, why not?
I agree that it's foolish to throw out everything between Fletcher-Munson in 1933 through to Amir's results in 2014, because they were non-ITU/MUSHRA, so I'm surprised you stated:
What is your actual position, as of 10/30/2014???
cheers,
AJ
> Blind trials are used for drug testing for a specific reason.

Yes, to eliminate confounders associated with humans. Ditto for blind audio tests for cell phone intelligibility, like the one in your pocket. Ditto for blind audio tests of TV audio, like the one in your living room. Ditto for blind audio tests of orchestra members, which has resulted in more female performers, as you may have noticed the last time you attended a symphony.
Hi Myles,

> AJ I'm speaking to you not JJ. Nor are we talking about cell phones, etc.
> You are the one who says you carry out blind testing. Specifically what standards are being used in your tests to establish the validity and sensitivity of your tests.

That isn't an important question in terms of the discussion. The only use would be to try to red herring the argument, ad hominem or possibly straw man the argument.
The real problem is that these tests are set up by engineers who are simply clueless when it comes to biology. And reductionism does not work when it comes to biological systems, whatever the engineering textbooks tell them.
> No need to engage in this type of insult. The mods have been very lenient here and the conversation has remained cordial and informative to most, I would assume.

Sorry if I insulted you; I apologise. I was trying to make the point that in my experience the concept of internal controls generally goes over the head of (or is ignored by) DBT advocates. I have seldom, if ever, seen these internal controls used in DBTs.
> I doubt this medical term "flew over the head" of Amir when he took and (you both) touted that online unsupervised computer files blind test, without a shred of "internal control".

Some things are inconsistent here. When this exact same test with the exact same files stood for 15 years without positive results, nobody was calling for controls. Now that some positive results have been returned there is a call for them. So what are we to conclude? The test was flawed for 15 years, yet its negative results are often cited to prove high-res is not differentiable from RB audio? Negative results were not questioned. Does this not show, as do other examples of blind tests, that there is no real desire to make the test valid by using internal controls that verify & calibrate its results? This test design with its lack of internal controls is a well-known bias called "experimenter's bias": designing a test which will likely give the expected outcome.
> In audio science, it's referred to as a hidden reference/anchor. As used in MUSHRA etc.

I don't think robustness is the correct term, as robustness is a sliding scale & this isn't. Internal controls verify that the test/participants are capable of audibly differentiating what is under test - they should test for false negatives & false positives. The likelihood is false negatives for blind tests, false positives for sighted tests. For blind tests the controls are geared towards verifying that the whole test is capable of differentiating between known audible differences. Without them the test results are unqualified & unusable. I liken it to calibrating a measuring device - without calibration we have no way of trusting the measurements. What is being done in blind tests is a measurement - a measurement of the perception of those involved in hearing an audible difference between devices. There are many obstacles that can interfere with this measurement &, short of going whole hog & using ITU/MUSHRA standard tests, I believe these internal controls (yes, the HRA part of MUSHRA) need to be used by anybody who wants their blind test to be considered more trustworthy than a sighted test. I haven't seen this happen, have you? Do you use these controls?
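The hidden-reference/anchor idea described above amounts to a screening rule: mix control trials whose correct answer is known (identical pairs, and pairs with a known-audible difference) in with the real comparisons, and only count a listener's verdicts if they get nearly all the controls right. A minimal sketch of that rule; the function name and the 90% pass mark are illustrative assumptions, not taken from the ITU/MUSHRA texts:

```python
def screen_listener(answers, controls):
    """Discard a listener's real-trial verdicts if they fail the hidden controls.

    answers:  dict trial_id -> 'same' or 'different' (the listener's verdict)
    controls: dict trial_id -> known truth for the control trials:
              'same' for a hidden reference pair, 'different' for a
              known-audible anchor pair. Trials absent from `controls`
              are the real comparisons under test.
    """
    hits = sum(answers[t] == truth for t, truth in controls.items())
    passed = hits / len(controls) >= 0.9  # illustrative 90% threshold
    real = {t: a for t, a in answers.items() if t not in controls}
    return passed, real  # only use `real` when `passed` is True

# Listener nails all four hidden controls, so their four real-trial
# verdicts (trials 2, 4, 7, 9) can be counted.
controls = {1: 'same', 3: 'different', 6: 'same', 8: 'different'}
answers = {1: 'same', 2: 'different', 3: 'different', 4: 'same',
           6: 'same', 7: 'different', 8: 'different', 9: 'same'}
ok, real_trials = screen_listener(answers, controls)
print(ok)                    # True
print(sorted(real_trials))   # [2, 4, 7, 9]
```

A listener who misses the controls is evidence the test (or that listener) cannot resolve known-audible differences, so their real-trial results say nothing either way - the calibration point made above.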
It simply adds robustness to the test. Feel free to suggest it to Amir if you feel he missed it.
cheers,
AJ
> Huh? Am I positing it?

Unambiguously:
Then you said:

> The case for well controlled DBTs using ITU or MUSHRA standards seems to be well established but I don't see many tests of this standard appearing in audio. The reason for these standards is to attempt to control the known biases that possibly influence the results. As a result one has to assume the vast majority of blind tests are probably biased & flawed.
> I'm not sure if you are a frequenter of WB forum but Amir's thread showed his positive ABX results for ArnyK's High res Vs RB files; for Winer's A/D D/A loopback tests (both of which stood for 15 years or so as always providing negative results) caused the reaction you predict but this time from those that swear by blind tests - suggestions of cheating, of IMD, of bad resampling software, etc. None of which was ever proven to be the reason for the positive results - so I guess nobody is immune to what you say?

Why would you not assume this as a biased and flawed non-ITU/MUSHRA test?
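For reference, the statistical bar behind these "positive ABX result" arguments is the one-sided binomial tail: the probability of scoring at least that well by guessing alone (p = 0.5 per trial). A minimal sketch; the function name is mine, illustrative only:

```python
from math import comb

def abx_p_value(correct: int, trials: int) -> float:
    """One-sided binomial tail: probability of getting at least `correct`
    right out of `trials` forced-choice ABX trials by pure guessing."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

print(round(abx_p_value(8, 10), 4))   # 0.0547
print(round(abx_p_value(14, 16), 4))  # 0.0021
```

By this yardstick 8/10 correct sits just above the conventional 5% significance line (p ≈ 0.055), while 14/16 is much harder to attribute to chance - one reason short ABX runs are so easy to argue about.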
> Most blind tests conducted in audio are not of a sufficiently high enough standard to be deemed reliable

Based on what evidence? Where is your evidence to support this assertion? Where is your evidence that non-ITU/MUSHRA blind test X has been rendered flawed/biased by results showing the opposite, using method Y, be it Triangle (not "Triangulation"), ANOVA, MUSHRA, etc?
> Sorry, if I insulted you & I apologise.

No need, it's impossible to insult me. :lol:
> I was trying to make the point that in my experience the concept of internal controls generally goes over the head of (or is ignored by) DBT advocates. I have seldom, if ever seen these internal controls used in DBTs.

I think that is presumptuous, but also extraneous to the Dudley article, which throws out the baby with the bathwater, i.e. blind controlled testing is inapplicable to "audio", which is really a synonym for "high end audio".
> Some things are inconsistent here. When this exact same test with the exact same files stood for 15 years without positive results, nobody was calling for controls. Now that some positive results have been returned there is a call for them. So what are we to conclude? The test was flawed for 15 years yet it's negative results often cited to prove high-res is not differentiable from RB audio? Negative results were not questioned. Does this not show, as do other examples of blind tests, that there is no real desire to make the test valid by using internal controls that verify & calibrate it's results? This test design with lack of internal controls is a well known bias called "experimenter's bias", designing a test which will likely give the expected outcome.

The conclusion is that a flawed, amateur, gamable online test is just that. Flawed.
> I don't think robustness is the correct term as it's a sliding scale & this isn't. Internal controls verifies that the test/participants are capable of audibly differentiating what is under test - it should test for false negatives & false positives. The likelihood is false negatives for blind tests, false positives for sighted tests. For blind tests the controls are geared towards verifying that the whole test is capable of differentiating between known audible differences. Without it the test results are unqualified & unusable. I liken it to calibrating a measuring device - without calibration we have no way of trusting the measurements. What is being done in blind tests is a measurement - a measurement of the perception of those involved in hearing an audible difference between devices. There are many obstacles that can interfere with this measurement & short of going full hog & using ITU/MUSHRA standard tests, I believe these internal controls (yes, the HRA part of MUSHRA) need to be used by anybody who wants their blind test to be considered more trustworthy than a sighted test. I haven't seen this happen, have you? Do you use these controls?

There is no question that controls/anchors increase the robustness/validity of the tests... but that does not in itself invalidate methods that do not include anchors.
> Unambiguously:

I guess I was imprecise in my post - I think this is getting into debating/nit-picking techniques rather than discussion. I didn't mean to exclude ABX with my phrase "non ITU/MUSHRA", so again, apologies for this confusion.
> Then you said:

> Why would you not assume this as a biased and flawed non-ITU/MUSHRA test?
> Why would anyone need "prove" anything other than these positive results being the result of a "biased and flawed" methodology (your assertion)??

OK, you are pulling me up on my mistake - touché.
> Based on what evidence? Where is your evidence to support this assertion? Where is your evidence that non-ITU/MUSHRA blind test X has been rendered flawed/biased by results showing the opposite, using method Y, be it Triangle (not "Triangulation"), ANOVA, MUSHRA, etc?

Ah, I think this is descending now (readers probably thought this happened pages ago?) into a terribly boring nit-picking exercise so I'll restrict my participation.
When exactly the opposite is reality, when Fletcher-Munson et al are re-run....
cheers,
AJ
> The conclusion is a flawed amateur gamable online test is just that. Flawed.

Just want to ask one question about your post rather than get into the nit-picking downward spiral. You had already decided that ArnyK's ABX test was flawed during the last 15 years of its existence? Can I ask what were your criteria in reaching this opinion?
Hardly evidence against "Blind tests".
cheers,
AJ
> Just want to ask one question about your post rather than get into the nit-picking downward spiral. You already had decided that ArnyK's ABX test was flawed during the last 15 years of it's existence? Can I ask what were your criteria in reaching this opinion?

Absolutely. One was a guy on AVS (MZillich iirc) who easily scored like Amir on the test... by detecting an audible glitch in the file, rather than anything in the music. The 2nd was a gent on HA who simply modified the filename in Foobar and immediately scored 100%... without listening.
> What I was asking was if you have doubted these non-proctored ABX tests during the last 15 years? If so why didn't you say anything?

Umm, I just gave you an example above where I doubted and did say something! (to Klippel). Perhaps it failed to make worldwide headlines, but hey, I tried.
> I don't remember seeing anyone questioning the negative ABX results from the last 15 years so they were taken as valid & used as proof that high-res sounds no different to RB audio.

We must be seeing different things, because I saw a lot of audiophiles question that sort of comparison, especially those that had been swooning over the glory of SACD/DVDA/"Hi Rez" media... which turned out to be upsampled RB :lol:
> Proctoring is not a hidden anchor & reference internal control of the type specified in ITU/MUSHRA that Myles & I are referring to - this is a procedural control for administering the test, same as specifying the quality of equipment to be used (free of IMD with ultrasonics). The internal controls are designed to avoid false positive or false negative results. An example of the internal controls is the ultrasonic tones included later in the test files - to determine if your equipment creates IMD, i.e. for testing false positives. But, of course, this also relies on honesty!

You've lost me here. You were both talking about "Positive Controls". I've known what the "HRA" in MUSHRA stands for, for a long time. I also know what proctoring is, now that you've mentioned it.
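The ultrasonic false-positive control quoted above can be sketched as a signal-generation step: a probe whose content lies entirely above the audible band, so that "hearing" anything in it flags intermodulation distortion in the playback chain (two tones at 30 kHz and 33 kHz would fold down to an audible 3 kHz difference tone in a non-linear rig). The frequencies, level, and 96 kHz sample rate here are illustrative assumptions, not values from any published test file:

```python
import math

RATE = 96_000  # sample rate high enough to carry ultrasonic content

def ultrasonic_probe(seconds=1.0, f1=30_000.0, f2=33_000.0, level=0.25):
    """Two-tone probe entirely above the audible band. A transparent chain
    renders this as silence; a chain with IMD produces an audible f2 - f1
    difference tone (here 3 kHz), exposing false positives."""
    n = int(RATE * seconds)
    return [level * (math.sin(2 * math.pi * f1 * t / RATE)
                     + math.sin(2 * math.pi * f2 * t / RATE))
            for t in range(n)]

probe = ultrasonic_probe(0.1)
print(len(probe))                          # 9600 samples
print(max(abs(s) for s in probe) <= 0.5)   # True: peaks stay within ±0.5
```

The point of such a trial, as the quoted post says, is to distinguish "I heard the music differ" from "my equipment manufactured an audible artefact".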
> I believe that you are on a very slippery slope here about the proctoring, as the honesty of the proctor comes into question. The idea of a company using blind tests that show its speakers are superior to others really becomes very questionable.

Fair enough, but the topic is blind tests/Dudley's thoughts, rather than a referendum on "Harman's practices".
> So can I check if I understand your position correctly? You are saying that all non-proctored ABX tests are questionable & should be ignored?

Nope. I'm saying the results of non-proctored tests, where it has been clearly shown gaming and file corruption are factors, should be discarded as invalid.
> This rules out most ABX tests run so far (BTW, I would like to see you take that line on Hydrogen Audio).

If those factors exist for "most ABX tests run so far". And the evidence to support this can be found where...?
> You are saying that ITU/MUSHRA standards are the scientific blind testing standard but you will accept lower standards. I still don't know how you qualify these lower standard tests as reliable?

I'm saying ITU/MUSHRA is the most robust, but unless there is specific evidence to the contrary, the "lower standards" can also be valid. It really depends on what is being tested. Both are far better than "I heard it, I said so", if you are concerned with sound > ears.
> Sure, the world still turns - this is only a hobby & nothing to get exercised about.

Exactly. There are those who pooh-pooh pharmaceutical blind testing also. I'll take my drugs, thanks.