Lies, Damned Lies, and Statistics

I don't get it. The test has nothing to do with your chances of having the disorder (.1%) ...

So...you're claiming that, after you know the test results, your chances are the same as before?

Yeah, pretty much, I guess.

So, here's the deal. As Mike Caro (a brilliant professional gambler) has observed, "in the beginning, everything was even money." In other words, lacking any other information, one's best guess as to the probability of ANYTHING is 50-50.
Now, consider the problem I posed. If all I told you was a certain population was (partially) afflicted with a disorder, and I asked you what the chances were that a given individual in that population is afflicted, your best guess would be 50-50, because you have absolutely NO other information upon which to base an estimate.
So, now I feed you another datum: the population is afflicted with an incidence of .1%. You immediately change your answer from 50-50 to 1 in 1000.
Nothing has changed except the amount of information you possess, yet you've just profoundly altered your estimate (and correctly so).
So, I ask you, why would my giving you a second datum (your test result) not cause you to further revise your answer?

@mzimmers said in Lies, Damned Lies, and Statistics:
So, I ask you, why would my giving you a second datum (your test result) not cause you to further revise your answer?
The second piece of information relates to the accuracy of the test, not the incidence level. The incidence level is unchanged by the reliability of the test.
I am one person, not a population to base measure on. So with some probability (99%) the test is correct and if you average the test measure you'd get that from the 0.1% of people that have the condition 99% were correctly diagnosed and 1% were incorrectly diagnosed (have had false positives). Still, this does not affect the incidence level, just the reliability of the testing.

But I'm not asking what the incidence level is; I'm asking, what are the chances that you have the disorder? Your goal is to use the available information to make the best guess/estimate possible.
With no other information, your best estimate is 50-50.
With knowledge that your population has an incidence rate of .1%, your best estimate is 1 in 1000 (or 999-1 against, to express it as odds).
With knowledge that your test came back positive, your best estimate is...?

Yeah, I got it now, but I have to point out I really hated statistics in the university and Bayes' theorem wasn't one of my favorite topics. I would have the particular disease with probability of 1% and change ...

I'll wait to see if anyone else wants to hazard a guess before I give the answer.

Okay but you do realize this is different from gambling (i.e. the lottery), where every run is independent.

@mzimmers said in Lies, Damned Lies, and Statistics:
I'll wait to see if anyone else wants to hazard a guess before I give the answer.
Can you wait 24 hours on that? I want to read & get my head around what you're saying so I can try to answer, but it's way too late tonight now .... :)

@JonB heh...sure, I'm not going anywhere. Anyone who can't wait for the answer can message me...

@mzimmers ... tell me tomorrow how many ppl messaged you ... :)

@mzimmers
Right, let's start my logical analysis :) First, let me see if I've got the figures from what you have said:
 Out of every 1,000 people, 1 has the affliction.
 The test will always identify that one person as being afflicted.
 Additionally, the test will report 10* other people as being afflicted who in fact are healthy.
[* Actually, the remaining population is 999, so really 9.99 rather than 10.0. This would affect my final figure, but I imagine you're not looking for that degree of accuracy, so my answer will be right to nearest couple of decimal places!]
Obviously, if I have misunderstood them, I reserve the right to be corrected by you and then reanalyse! Otherwise, please continue....
So, I take the test, and it reports me positive. (I knew it! Just my luck :( This is about my smoking, isn't it?)
Well, in this case, the test has reported 11 people as positive. 1 is genuinely positive, while 10 are false positive.
My conclusion:
 Before the test result I had 1 in 1,000 chance of the terminal illness you are imposing.
 After the test I have a 1 in 11 chance of being the positive one, and a 10 in 11 chance of being one of the falsies.
If it helps any, you can also think of this as balls in a bag:
 There is 1 black ball, which has "You're toast" on a piece of paper inside it.
 There are 10 black balls, which have "Only kidding" on a piece of paper inside them.
 There are 989 white balls.
You put your hand in the bag and pull out a ball. It's black :( Given that, until you open the ball and look at the piece of paper, there's a 1 in 11 chance it contains the fateful news.
Right?
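
The balls-in-a-bag count above can be sketched in a few lines of Python. This is only an illustration under the reading used in this post (per 1,000 people, 1 true positive and 10 false positives); the variable names are made up for the example.

```python
from fractions import Fraction

# Assumed reading of the puzzle: per 1,000 people tested,
# 1 is afflicted (and always detected) and 10 healthy people are flagged anyway.
POPULATION = 1000
true_positives = 1
false_positives = 10

black_balls = true_positives + false_positives   # everyone told "positive"
white_balls = POPULATION - black_balls           # everyone told "negative"

# Chance that a black ball is the "You're toast" one.
p_afflicted_given_positive = Fraction(true_positives, black_balls)

print(white_balls)                  # 989
print(p_afflicted_given_positive)   # 1/11
```

Using `Fraction` keeps the answer exact, so the result reads as "1 in 11" rather than a rounded decimal.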
======================================================
Meanwhile....
You also wrote: As Mike Caro (a brilliant professional gambler) has observed, "in the beginning, everything was even money." In other words, lacking any other information, one's best guess as to the probability of ANYTHING is 50-50.
I don't know if there was a context in which he wrote this which you have omitted, but that's a very strange statement. Lacking any information at all, one's "best guess" of a probability should not be anything like "50-50". I can only think a gambler might think that way!
BTW, a quick analysis:
 I tell you I have a bag of balls, which you cannot see.
 I ask you to guess how many balls are in the bag.
 This is an example of "you have absolutely NO [other] information upon which to base an estimate".
 You say: There are 23 balls in the bag.
 According to you/him, the odds of this being correct are 0.5.
 You decide to guess again. This time you predict 587.
 Again, you/he claim the odds of this being right are 0.5.
 Finally, you decide to change your mind to 77.
 One more time, it's 0.5 likely you're right.
3 guesses, each of which has a 0.5 chance of being right? I don't think so!
Now, we could reanalyse precisely what you mean by "one's best guess as to the probability of ANYTHING is 50-50", because perhaps you didn't have just the case above in mind.
But the point is: "lacking any other information, one's best guess as to the probability of ANYTHING is 50-50." is not a "good guess". The correct answer is: "Lacking any information, a 'probability' is simply meaningless." Probability requires some information in order to have anything to say.

Ok, I give it a try myself

We have the starting position, you either have the illness or you don't, with a 0.1% chance that you have it.

The test always has a result, but there's a 1 % chance the result is the exact opposite.

it is asked only for the cases that the test says "You have it"
 you have it 0.001 and the test shows it 0.99 => 0.00099
 you don't have it 0.999 but the test says you have it 0.01 => 0.00999
=> 0.01098 ~ 1.1 % chance you're diagnosed with the illness when only 0.1% of all people have it ?


@J.Hilk
The question posed is: "Given that your result is reported as positive, what is the probability that you actually do have the disease?" Are you claiming that the answer to that is your "1.1%"? I say it's ~ 1 in 11, more like "9.09%".

@JonB said in Lies, Damned Lies, and Statistics:
@J.Hilk
The question posed is: "Given that your result is reported as positive, what is the probability that you actually do have the disease?" Are you claiming that the answer to that is your "1.1%"? I say it's ~ 1 in 11, more like "9.09%".
well it is 0.099 % you have it and it is diagnosed
to 0.999 % you don't have it and it is diagnosed => ~10% chance you actually have it, when it is diagnosed ?

@J.Hilk
Well, your "~ 10%" is not far off my "~ 9.1%", so we're close, though I'll stick (as per my black-balls-in-bag) to my 9.1% being closer than 10%. BTW, your:
well it is 0.099 % you have it and it is diagnosed
is slightly off. We know (before the test) there is a 0.1% chance you have the ailment, and that the test "correctly diagnoses all [actual] cases". So, depending on your phrasing, this should remain at 0.1%. The bit where you originally wrote:
you have it 0.001 and the test shows it 0.99 => 0.00099
should have read:
you have it 0.001 and the test shows it 1.0 => 0.001
Then you have:
The test always has a result, but there's a 1 % chance the result is the exact opposite.
Not quite. It does not do "the exact opposite". There is a 1% chance it reports positive when it should be negative. But the opposite is not the case: it does not report negative when it should be positive ever.
 There is a 0.1% chance I have the disease, in which case I will deffo be told I do.
 There is a 1% chance I don't have the disease, but will be told I do.
 [Note that the above 2 cases are mutually exclusive, with no dependencies.]
 The test will report 1.1% total positives: 11 people out of 1,000. 10 will be incorrect, 1 will be correct. You have a 1 in 11 chance of being the positive one, and a 10 in 11 chance of being one of the false ones. Period.

JonB nailed it. His analysis was spot-on, with one minor nit: The problem stated:
Someone devises a test for this disorder which correctly diagnoses all cases, but also reports a false positive exactly 1% of the time. As stated, the false positive rate is given without regard to any true positives; it occurs at a rate of 1%. In a population of 1,000, it will "lie" about 10 individuals. So, the correct answer is exactly (not nearly) 1 in 11.
Regarding the "everything is 50-50" assertion: this has its roots in philosophy as much as it does in probability, but it's still valid IMO. I'd love to hear how Mike Caro would respond to your interesting point.

I like algebra, so here's some notation from probability theory:
 P(A): Probability that A occurs
 P(A ∩ B): Probability that A and B both occur
 P(A | B): Probability that A occurs, given that B occurs
In this example,
 A: Have the disease
 B: Get a positive test result
@JonB said in Lies, Damned Lies, and Statistics:
 Out of every 1,000 people, 1 has the affliction.
P(A) = 0.001
 The test will always identify that one person as being afflicted.
P(B | A) = 1
By the definition of conditional probability, P(B ∩ A) = P(B | A) * P(A) so
P(B ∩ A) = 0.001
 Additionally, the test will report 10* other people as being afflicted who in fact are healthy.
[* Actually, the remaining population is 999, so really 9.99 rather than 10.0. This would affect my final figure, but I imagine you're not looking for that degree of accuracy, so my answer will be right to nearest couple of decimal places!]
P(B | ¬A) = 0.01
Similarly to before,
P(B ∩ ¬A) = 0.00999
Well, in this case, the test has reported 11 people as positive. 1 is genuinely positive, while 10 are false positive.
P(B) = P(B ∩ A) + P(B ∩ ¬A) so
P(B) = 0.01099
My conclusion:
 Before the test result I had 1 in 1,000 chance of the terminal illness you are imposing.
Yep, before the test, nothing was given, so you can only use P(A). We already know
P(A) = 0.001
 After the test I have a 1 in 11 chance of being the positive one, and a 10 in 11 chance of being one of the falsies.
Given that you got a positive test result, what is the probability that you have the disease?
P(A | B) = P(A ∩ B) / P(B) = 0.001 / 0.01099 so
P(A | B) = 0.0909918...
which is a teensy bit more than 1 in 11.
If it helps any, you can also think of this as balls in a bag:
 There is 1 black ball, which has "You're toast" on a piece of paper inside it.
 There are 10 black balls, which have "Only kidding" on a piece of paper inside them.
 There are 989 white balls.
You put your hand in the bag and pull out a ball. It's black :( Given that, until you open the ball and look at the piece of paper, there's a 1 in 11 chance it contains the fateful news.
Right?
Haha, awesome analogy!
As Mike Caro (a brilliant professional gambler) has observed, "in the beginning, everything was even money." In other words, lacking any other information, one's best guess as to the probability of ANYTHING is 50-50.
I don't know if there was a context in which he wrote this which you have omitted, but that's a very strange statement. Lacking any information at all, one's "best guess" of a probability should not be anything like "50-50". I can only think a gambler might think that way!
I don't think that "50-50" means "Each answer has a 50% chance to be correct". Rather, it means "Each answer has the same chance of being correct as every other possible answer".
So, if you're guessing heads or tails, there are only 2 possible answers so you have a 50% chance of getting it right. However, with the bag of balls, if the bag is big enough to hold 99 balls then there are 100 possible answers, so you have a 1% chance of getting it right.

JKSH got it right as well (with the same very minor glitch as JonB).
@J.Hilk said in Lies, Damned Lies, and Statistics:
well it is 0.099 % you have it and it is diagnosed
Actually 0.1% (as discussed above).
to 0.999 % you don't have it and it is diagnosed
1%.
=> ~10% chance you actually have it, when it is diagnosed ?
10 false positives, one real positive: your chances are 1 in 11, or about 9%. You were pretty close.

Since you guys did so well on that one, here's another: I hand you a bag, inside which are three coins. The coins appear identical, but while two are "fair," one will always land heads-up.
You pull a coin from the bag, and toss it three times. You get a head every time. What are the chances you pulled the unfair coin?
(Those who get this right might be ready for the extremely unintuitive Monty Hall problem...)

@JKSH
Two quick observations:
P(A | B) = P(A ∩ B) / P(B) = 0.001 / 0.01099 so P(A | B) = 0.0909918... which is a teensy bit more than 1 in 11.
There is still something wrong here with where you go about calculating these figures, but I'm too tired to spot it. @mzimmers said of my solution above:
His analysis was spot-on, with one minor nit:
[...]
So, the correct answer is exactly (not nearly) 1 in 11.
In my first attempt, at the end I stated:
After the test I have a 1 in 11 chance of being the positive one, and a 10 in 11 chance of being one of the falsies.
And in my second clarification earlier, I had come to the same conclusion when I wrote:
The test will report 1.1% total positives: 11 people out of 1,000. 10 will be incorrect, 1 will be correct. You have a 1 in 11 chance of being the positive one, and a 10 in 11 chance of being one of the false ones. Period.
That was my attempt to say ("Period") that I had realized my previous talk about "999" & "roundings" was unnecessary & inaccurate. Like @mzimmers I conclude the chance is exactly 1 in 11.
I don't think that "50-50" means "Each answer has a 50% chance to be correct". Rather, it means "Each answer has the same chance of being correct as every other possible answer".
The second sentence might be a better way of phrasing it. Which, certainly to my mind/understanding, should never be referred to as "50-50".

@mzimmers said in Lies, Damned Lies, and Statistics:
(Those who get this right might be ready for the extremely unintuitive Monty Hall problem...)
Darn, I was going to quote that one! :) (If you do, I won't say a word, till it's solved by someone who doesn't know.)
P.S.
Are you old enough to have watched the show live in the USA? ;)

@mzimmers said in Lies, Damned Lies, and Statistics:
You pull a coin from the bag, and toss it three times. You get a head every time. What are the chances you pulled the unfair coin?
8 in 10

@JonB said in Lies, Damned Lies, and Statistics:
@mzimmers said in Lies, Damned Lies, and Statistics:
You pull a coin from the bag, and toss it three times. You get a head every time. What are the chances you pulled the unfair coin?
8 in 10
Correct (though I would have said 4 in 5). Care to share with the other students how you arrived at this answer?

@mzimmers
I chose to write "8 in 10" rather than "4 in 5" deliberately, because of the way I reached the figure mentally. I thought I would not explain, at least for now, so that others might have their opportunity to think it through and see what they came up with. Like you did for the other one, perhaps I should wait for 24 hours before explaining! BTW, I found this one easier to think through than the first one, for some reason; perhaps because the other one gave me medical frights? ;)

Hah...fair enough, though I'm now curious as to how you ended up at 8 in 10...but if everyone else can wait for the answer, I suppose I can wait for the explanation.

@mzimmers I'll post over the weekend... :) Probably only you & I care now!

@mzimmers said in Lies, Damned Lies, and Statistics:
JonB nailed it. His analysis was spot-on, with one minor nit: The problem stated:
Someone devises a test for this disorder which correctly diagnoses all cases, but also reports a false positive exactly 1% of the time. As stated, the false positive rate is given without regard to any true positives; it occurs at a rate of 1%. In a population of 1,000, it will "lie" about 10 individuals.
So you meant "1% of the whole population receives a false positive" (P(B ∩ ¬A) = 0.01).
I thought you meant "1% of the healthy people receive a false positive" (P(B | ¬A) = 0.01).
@JonB said in Lies, Damned Lies, and Statistics:
P(A | B) = P(A ∩ B) / P(B) = 0.001 / 0.01099 so P(A | B) = 0.0909918... which is a teensy bit more than 1 in 11.
There is still something wrong here with where you go about calculating these figures, but I'm too tired to spot it.
It boiled down to the interpretation of the false-positive rate (see above). With the correct interpretation, we have:
 P(A) = 0.001 (0.1% of the population have the disorder)
 P(B | A) = 1 (The test detects the disorder 100% of the time)
 P(B ∩ ¬A) = 0.01 (The test has a 1% false positive rate within the whole population)
Finding intermediate parameters,
 P(B ∩ A) = P(B | A) * P(A) ⇒ P(B ∩ A) = 0.001 (0.1% of the whole population have the disorder AND get a positive result)
 P(B) = P(B ∩ A) + P(B ∩ ¬A) ⇒ P(B) = 0.011 (1.1% of the whole population get a positive test result)
Finally,
 P(A | B) = P(A ∩ B) / P(B) ⇒ P(A | B) = 1/11 (Given that I got a positive result, I have 1 in 11 chance of having the disorder)
All good! :D
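
To see how much the two readings of the 1% figure differ, here is a small exact-arithmetic sketch. The variable names are illustrative only; the two "readings" are the ones described in this post (1% of healthy people versus 1% of the whole population).

```python
from fractions import Fraction

p_a = Fraction(1, 1000)      # P(A): have the disorder
p_b_given_a = Fraction(1)    # P(B | A): the test catches every real case
rate = Fraction(1, 100)      # the quoted 1% false positive rate

p_b_and_a = p_b_given_a * p_a   # P(B ∩ A)

# Reading 1: the 1% applies to healthy people only, i.e. P(B | ¬A) = 0.01
p_b_and_not_a_cond = rate * (1 - p_a)
posterior_cond = p_b_and_a / (p_b_and_a + p_b_and_not_a_cond)

# Reading 2: the 1% applies to the whole population, i.e. P(B ∩ ¬A) = 0.01
p_b_and_not_a_joint = rate
posterior_joint = p_b_and_a / (p_b_and_a + p_b_and_not_a_joint)

print(posterior_cond)    # 100/1099
print(posterior_joint)   # 1/11
```

The first reading gives 100/1099 (about 0.09099, the "teensy bit more than 1 in 11"), while the second gives exactly 1/11.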
@mzimmers said in Lies, Damned Lies, and Statistics:
Since you guys did so well on that one, here's another: I hand you a bag, inside which are three coins. The coins appear identical, but while two are "fair," one will always land headsup.
You pull a coin from the bag, and toss it three times. You get a head every time. What are the chances you pulled the unfair coin?
I used the same method as my first attempt. Same equations, just different starting numbers.
P(X | Y) = 0.8
where X: Got the unfair coin
 Y: Flipped 3 times and got 3 heads
P.S. Thanks for the fun puzzles, @mzimmers! I used to do them in school/university but haven't done any in a while.

I don't use @JKSH's equations; too much brainache!
The method is just:
 There are 8 permutations from flipping a coin 3 times.
 The unfair coin produces 3 heads in all of its permutations.
 The fair coins each produce 1 set of 3 heads in each of theirs.
 Thus of the possible 24 outcomes, there are 10 with all heads, and of those 2 are produced by the fair coins while 8 are produced by the weighted one.
Hence my initial writing of "8 in 10", rather than simplifying :)
You should probably now throw Monty Hall at @JKSH :)
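
The counting method above can be checked by brute force. A minimal sketch, assuming the always-heads coin is modelled as having heads on both faces so that it still contributes 8 equally likely "permutations"; the coin names are made up for the example.

```python
from fractions import Fraction
from itertools import product

# Each coin is modelled by the faces a single toss can show.
# Assumption: the trick coin is treated as having heads on both sides.
coins = {
    "fair_1": "HT",
    "fair_2": "HT",
    "unfair": "HH",
}

all_heads = {}
total_outcomes = 0
for name, faces in coins.items():
    sequences = list(product(faces, repeat=3))   # 8 equally likely sequences per coin
    total_outcomes += len(sequences)
    all_heads[name] = sum(1 for seq in sequences if set(seq) == {"H"})

heads_total = sum(all_heads.values())
p_unfair = Fraction(all_heads["unfair"], heads_total)

print(total_outcomes)   # 24
print(all_heads)        # {'fair_1': 1, 'fair_2': 1, 'unfair': 8}
print(p_unfair)         # 4/5
```

The 10 all-heads outcomes split 8 to 2 in favour of the unfair coin, which is where "8 in 10" comes from before simplifying.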

Well done, and well presented. When I was faced with this problem, I did it slightly differently (1/3 * 100%) vs. (2/3 * 12.5%). The underlying logic is the same.
JKSH's notations are just a formal representation of what we're doing. Given that I took my only statistics class nearly 40 years ago, I've forgotten all the notation, though I remember most of the principles. As long as we all get to the right answers, the various approaches are equally valid.
I'll bring up Monty Hall if JKSH chimes in. And yes, I can remember watching that show live...good entertainment (if you're 12 years old).

Given that I took my only statistics class nearly 40 years ago, I've forgotten all the notation, though I remember most of the principles
In that case, please remind me what the "Chi squared" test thingy is? I remember the teacher banging on about that one. And no, you are not allowed to look it up. :)

Chi squared...ew.
"Math's hard; let's go shopping!" (Barbie from the pre-men-are-pigs era)

@mzimmers said in Lies, Damned Lies, and Statistics:
"Math's hard; let's go shopping!" (Barbie from the pre-men-are-pigs era)
LOL.

@JonB said in Lies, Damned Lies, and Statistics:
I don't use @JKSH's equations; too much brainache!
I do find verbal descriptions more meaningful and intuitive, but I also find equations more systematic and comprehensive.
Descriptions help me to understand the "reality" of a problem, while equations help me to see connections and patterns (either within the same problem, or across different problems)
@JonB I've taken the liberty of translating English into Equations :) (Your statements in bold)
 X: Got the unfair coin
 Y: Flipped 3 times and got 3 heads
Starting info:
 P(X) = 1/3 (I have a 1 in 3 chance of getting the unfair coin)
 P(Y | X) = 1 (The unfair coin produces 3 heads in all of its permutations / Given that I got the unfair coin, I'm guaranteed to flip 3 heads in a row)
 P(Y | ¬X) = 1/8 (There are 8 permutations from flipping a coin 3 times. The fair coins each produce 1 set of 3 heads in each of theirs. / Given that I didn't get the unfair coin, I have a 1 in 2^3 chance of flipping 3 heads in a row)
Intermediate parameters:
 P(¬X) = 1 - P(X) ⇒ P(¬X) = 2/3 (I have a 2 in 3 chance of getting a fair coin)
 P(Y ∩ X) = P(Y | X) * P(X) ⇒ P(Y ∩ X) = 1/3 (I have a 1 in 3 chance of getting the unfair coin AND flipping 3 heads in a row)
 P(Y ∩ ¬X) = P(Y | ¬X) * P(¬X) ⇒ P(Y ∩ ¬X) = 1/12 (I have a 1 in 12 chance of getting a fair coin AND flipping 3 heads in a row)
 P(Y) = P(Y ∩ X) + P(Y ∩ ¬X) ⇒ P(Y) = 5/12 (of the possible 24 outcomes, there are 10 with all heads)
Finally:
 P(X | Y) = P(X ∩ Y) / P(Y) ⇒ P(X | Y) = 4/5 (...[of these 10,] 8 are produced by the weighted [coin]. / Given that I flipped 3 heads in a row, there is a 4 in 5 chance that I have the unfair coin)
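
As a sanity check on the 4/5 figure, the experiment can also be simulated. This is a rough Monte Carlo sketch, not part of the derivation; the function name, trial count and seed are arbitrary choices for the example.

```python
import random

def estimate_unfair_given_three_heads(trials=200_000, seed=1):
    """Estimate P(X | Y): share of three-heads runs that came from the unfair coin."""
    rng = random.Random(seed)
    three_heads = 0
    three_heads_and_unfair = 0
    for _ in range(trials):
        unfair = rng.randrange(3) == 0   # coin 0 of the three is the always-heads coin
        if unfair:
            got_three_heads = True
        else:
            got_three_heads = all(rng.random() < 0.5 for _ in range(3))
        if got_three_heads:
            three_heads += 1
            three_heads_and_unfair += unfair
    return three_heads_and_unfair / three_heads

print(estimate_unfair_given_three_heads())   # should land close to 0.8
```

Only the runs that actually produced three heads are counted in the denominator, which is exactly what "given that Y occurred" means.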
You should probably now throw Monty Hall at @JKSH :)
Sorry, I looked up the Wikipedia article when it was first mentioned here!
@JonB said in Lies, Damned Lies, and Statistics:
In that case, please remind me what the "Chi squared" test thingy is? I remember the teacher banging on about that one. And no, you are not allowed to look it up. :)
I don't remember how to use it anymore, but I remember using it lots in biology class to test for mutations in a population.
@mzimmers said in Lies, Damned Lies, and Statistics:
"Math's hard; let's go shopping!" (Barbie from the pre-men-are-pigs era)
For me, shopping is hard. Too many choices; need to guard against marketers' tactics; need to research to find a good deal; need to haggle or negotiate...
...let's do math! It's just me, my comfy chair, and my trusty pen+paper.

@JonB said in Lies, Damned Lies, and Statistics:
In that case, please remind me what the "Chi squared" test thingy is?
You pose a hypothesis (e.g. you have a model of something) and you want to test how well your model fits the experimental data you have: you calculate the χ squared and you get your answer. There's a lot of theory behind it, but you can think of it in simple terms as the (quadratic) measure of the population's dispersion around your model, i.e. how far the real population is from the modelled population.
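
One common form of this is Pearson's statistic, which compares observed counts with the counts the model predicts. The sketch below uses made-up numbers (100 tosses of a coin modelled as fair) purely to show the arithmetic; `chi_squared` is a helper written for this example, not a library function.

```python
def chi_squared(observed, expected):
    """Pearson's statistic: sum of (observed - expected)^2 / expected."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical data: 100 tosses of a coin we model as fair.
observed = [60, 40]   # heads, tails actually seen (made-up numbers)
expected = [50, 50]   # what the "fair coin" model predicts

statistic = chi_squared(observed, expected)
print(statistic)   # 4.0
```

The larger the statistic, the further the data sit from the model. With two categories there is one degree of freedom, for which the usual 5% threshold is about 3.84, so a value of 4.0 would count as (just) significant.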

@JKSH said in Lies, Damned Lies, and Statistics:
@mzimmers said in Lies, Damned Lies, and Statistics:
"Math's hard; let's go shopping!" (Barbie from the pre-men-are-pigs era)
For me, shopping is hard. Too many choices; need to guard against marketers' tactics; need to research to find a good deal; need to haggle or negotiate...
In that case, you don't seem to have a woman. If you did, I would expect her to insist on making all the shopping choices on your behalf, so it wouldn't be an issue... ;)
["JB: Unreconstructed from the pre-PC era."]


So that's a rather different thing from standard deviation, right? So you make a model, calculate with it, then discover how inaccurate it is by going back and examining the real population, and then make something squared out of it. Is that it?

Bonus point for finding the χ key on your keyboard.
While I'm at it.... The other thing I remember the teach banging on about forever was to do with (unlike you I haven't a clue/the will to go find symbols to type) "x-bar" [x with a horizontal bar on top of it] versus "mu" [the Greek letter]. x-bar was the mean you got from a sample, while mu was the actual mean, which you didn't know.
Now, the problem was something about how you had to phrase what you said about x-bar & mu in your conclusion. I presume this was to do with confidence limits, you were trying to say something like "I'm 95% sure x-bar is within one standard deviation of mu". Only there was some deep rule you had to adhere to in phrasing it some way round with some wording. Like, you couldn't say x-bar or mu was likely to be whatever, because it wasn't subject to probability (perhaps that was for mu, because the mean of the population just is whatever it is, even if you don't know what that is, or somesuch). So what was that one all about? :)


@JonB said in Lies, Damned Lies, and Statistics:
 So that's a rather different thing from standard deviation, right? So you make a model, calculate with it, then discover how inaccurate it is by going back and examining the real population, and then make something squared out of it. Is that it?
Yep. If you say you're doing a least squares fit, then chi squared would be how good the fit was: basically the sum of the square of distances between the sampled data and the actual regression curve.
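
A minimal sketch of that idea for a straight-line fit, assuming every point is given equal weight (so the quantity is simply the sum of squared residuals). The data points and function names are invented for the illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def sum_squared_residuals(xs, ys, a, b):
    """Sum of squared vertical distances between the data and the line."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Made-up sample points, roughly along y = 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.1, 1.9, 4.2, 5.8]

a, b = fit_line(xs, ys)
ssr = sum_squared_residuals(xs, ys, a, b)
print(a, b, ssr)
```

Any other line through the same points gives a larger sum, which is what makes the fitted one the "least squares" line; the size of the remaining sum is the measure of how good the fit is.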
 Bonus point for finding the χ key on your keyboard.
I'm well versed in the greek alphabet, being a physicist and all. ;)
While I'm at it.... The other thing I remember the teach banging on about forever was to do with (unlike you I haven't a clue/the will to go find symbols to type) "x-bar" [x with a horizontal bar on top of it] versus "mu" [the Greek letter]. [...] So what was that one all about? :)
In principle the real expectation value (or mean) will not coincide with the one you got by sampling. So there's some probability that the sampling mean will be in some range around the real one. That's what this is about: Student's distribution.
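
A small simulation can make this concrete. The sketch below assumes normally distributed data, samples of 10, and the standard two-sided 95% Student's t value for 9 degrees of freedom (2.262); all names and numbers are chosen for the example. It repeatedly draws a sample, builds an interval around that sample's x-bar, and counts how often the interval contains the fixed true mean.

```python
import random
import statistics

TRUE_MEAN = 10.0     # "mu": fixed, but unknown to the experimenter
TRUE_SD = 2.0
SAMPLE_SIZE = 10
T_CRITICAL = 2.262   # two-sided 95% Student's t value for 9 degrees of freedom

def interval_covers_mu(rng):
    """Draw one sample and report whether its 95% interval contains mu."""
    sample = [rng.gauss(TRUE_MEAN, TRUE_SD) for _ in range(SAMPLE_SIZE)]
    x_bar = statistics.mean(sample)
    std_err = statistics.stdev(sample) / SAMPLE_SIZE ** 0.5
    half_width = T_CRITICAL * std_err
    return x_bar - half_width <= TRUE_MEAN <= x_bar + half_width

def coverage(trials=5000, seed=1):
    """Fraction of simulated samples whose interval contains mu."""
    rng = random.Random(seed)
    return sum(interval_covers_mu(rng) for _ in range(trials)) / trials

print(coverage())   # expected to land near 0.95
```

Note what varies here: mu never changes, while x-bar and the interval around it move from sample to sample. The 95% describes how often the procedure captures mu, not a probability attached to mu itself.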

@kshegunov said in Lies, Damned Lies, and Statistics:
In principle the real expectation value (or mean) will not coincide with the one you got by sampling. So there's some probability that the sampling mean will be in some range around the real one. That's what this is about: Student's distribution.
Yeah, but the nightmare recollection is something about what you were/were not allowed to "say" about something to do with the probability/confidence limits of the relationship between mu & x-bar, if you phrased it wrong you lost all your marks....
BTW, on that subject there was something similar (though not as hard to remember as the mu/x-bar one) when you did "proof by induction". You did your k, then you did your k + 1. But when you wrote the final conclusion for n instead of k, you had to phrase that one in a particular way too... !