Why does increasing the sample size of coin flips not improve the normal curve approximation?

I'm reading the Statistics (Freedman, Pisani, Purves) book and I'm trying to reproduce an example where a coin is tossed, say, 50 times, the number of heads is counted, and this is repeated, say, 1,000 times.

First, I kept the number of tosses (sample size) at 1000 and increased the repetitions. The more repetitions, the better the data fit the normal curve.

So next, I tried keeping the number of repetitions fixed at 1,000 and increased the sample size. The larger the sample size, the worse the normal curve seemed to fit the data. This seems to contradict the book example, which better approximates the normal curve as the sample size increases.

I wanted to see what would happen if I increased the sample size, but with a larger number of repetitions, fixed at 10,000. This also seems to contradict the book.

Any ideas what I'm doing wrong?

Code and graphs below.

%matplotlib inline

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

def plot_hist(num_repetitions, num_tosses):
    # One row per repetition, one column per toss (1 = heads)
    tosses = np.random.randint(0, 2, size=[num_repetitions, num_tosses])
    sums = tosses.sum(axis=1)  # number of heads in each repetition

    xmin, xmax = min(sums), max(sums)
    lnspc = np.linspace(xmin, xmax, len(sums))

    m, s = stats.norm.fit(sums)  # get mean and standard deviation
    pdf_g = stats.norm.pdf(lnspc, m, s)  # now get theoretical values in our interval

    # One unit-wide bin per possible count; the +2 keeps the bin for xmax
    bins = np.arange(xmin, xmax + 2) - 0.5
    step = max(1, int((xmax - xmin) / 5))  # guard against a zero slice step

    fig, ax = plt.subplots()
    _ = ax.hist(sums, bins, edgecolor='black', linewidth=1.2, density=True)
    _ = ax.plot(lnspc, pdf_g, label="Norm", color='red')
    _ = ax.set_xticks(bins[::step] + 0.5)
    _ = ax.set_title('{:,} tosses - {:,} repetitions'.format(num_tosses, num_repetitions))

1. Experiment with increasing number of repetitions (fixed sample size of 1000)

plot_hist(1000, 1000)

[Figure: 1,000 tosses - 1,000 repetitions]

plot_hist(10000, 1000)

[Figure: 1,000 tosses - 10,000 repetitions]

plot_hist(100000, 1000)

[Figure: 1,000 tosses - 100,000 repetitions]

2. Experiment with increasing sample size (fixed at 1000 repetitions)

plot_hist(1000, 100)

[Figure: 100 tosses - 1,000 repetitions]

plot_hist(1000, 1000)

[Figure: 1,000 tosses - 1,000 repetitions]

plot_hist(1000, 10000)

[Figure: 10,000 tosses - 1,000 repetitions]

3. Experiment with increasing sample size (fixed at 10,000 repetitions)

plot_hist(10000, 100)

[Figure: 100 tosses - 10,000 repetitions]

plot_hist(10000, 1000)

[Figure: 1,000 tosses - 10,000 repetitions]

plot_hist(10000, 10000)

[Figure: 10,000 tosses - 10,000 repetitions]

plot_hist(10000, 100000)

[Figure: 100,000 tosses - 10,000 repetitions]

edited 42 mins ago by osuka_; asked 4 hours ago by Chris Snow

You need a better histogram plotter--this one is particularly poor and it's creating artifacts. Better yet, compare distributions with a probability plot rather than a histogram.
– whuber♦, 2 hours ago

This is useful to know and has increased my knowledge today. Thanks!
– Chris Snow, 1 hour ago
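The probability-plot suggestion in the comment can be tried without any binning at all. The following is only a minimal sketch using the Python standard library; the helper names `heads_counts`, `normal_probability_points` and `straightness` are made up for illustration, and the plotting positions `(i + 0.5) / n` are one common convention, assumed here rather than taken from the thread. It pairs each sorted, standardized heads count with the matching standard normal quantile and summarizes how straight the resulting line is.

```python
import random
from statistics import NormalDist, mean, pstdev

def heads_counts(num_repetitions, num_tosses, seed=0):
    """Simulate num_repetitions runs of num_tosses fair coin flips; return heads per run."""
    rng = random.Random(seed)
    # getrandbits(n) draws n fair bits at once; counting the 1-bits gives the heads
    return [bin(rng.getrandbits(num_tosses)).count("1") for _ in range(num_repetitions)]

def normal_probability_points(data):
    """Return (theoretical quantile, standardized observed value) pairs for a normal Q-Q plot."""
    n = len(data)
    m, s = mean(data), pstdev(data)
    std_normal = NormalDist()
    ordered = sorted(data)
    # Plotting positions (i + 0.5) / n avoid the infinite quantiles at 0 and 1
    return [(std_normal.inv_cdf((i + 0.5) / n), (x - m) / s) for i, x in enumerate(ordered)]

def straightness(points):
    """Pearson correlation of the Q-Q points; values near 1 mean a nearly straight line."""
    xs, ys = zip(*points)
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Fixed 1,000 repetitions, increasing number of tosses
for tosses in (100, 1000, 10000):
    pts = normal_probability_points(heads_counts(1000, tosses))
    print(tosses, round(straightness(pts), 4))
```

Plotting the returned pairs as a scatter plot gives the probability plot itself; points close to the line y = x indicate a good normal fit, and no choice of bin width is involved.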

normal-distribution normal-approximation

2 Answers

In the second case, by increasing the number of tosses, you increase the number of bins a single trial can fall into. While the first case of experiment 2 has at most 101 bins that can be filled (0 to 100 heads), the last case has 10,001. You increased the "resolution" of your experiment by a factor of 100. The counts actually concentrate within a few standard deviations of the mean, and the standard deviation (√n/2 for a fair coin) grows from 5 to 50, so the number of bins that really receive data grows by roughly a factor of 10. Either way, the same 1,000 repetitions are spread over many more bins, so each bar is based on fewer observations and you would need correspondingly more data to fill your bins.
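The thinning-out described in this answer can be checked directly by counting how many unit-wide bins receive any data. This is a rough sketch using only the standard library; `heads_counts` and `bin_occupancy` are hypothetical helper names, and the fixed seed is just an assumption to make runs repeatable.

```python
import random
from collections import Counter

def heads_counts(num_repetitions, num_tosses, seed=0):
    """Simulate num_repetitions runs of num_tosses fair coin flips; return heads per run."""
    rng = random.Random(seed)
    # getrandbits(n) draws n fair bits at once; counting the 1-bits gives the heads
    return [bin(rng.getrandbits(num_tosses)).count("1") for _ in range(num_repetitions)]

def bin_occupancy(counts):
    """With one bin per integer value, return (occupied bins, repetitions per occupied bin)."""
    per_bin = Counter(counts)
    occupied = len(per_bin)
    return occupied, len(counts) / occupied

# Fixed 1,000 repetitions, increasing number of tosses (as in experiment 2)
for tosses in (100, 1000, 10000):
    occupied, avg = bin_occupancy(heads_counts(1000, tosses))
    print(f"{tosses:>6} tosses: {occupied} occupied bins, {avg:.1f} repetitions per occupied bin")
```

With the number of repetitions held fixed, the average number of repetitions per occupied bin drops as the number of tosses grows, which is what makes the bar heights noisier.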

edited 1 hour ago by Chris Snow; answered 2 hours ago by Ulfgard

Ah, makes sense. By setting the number of bins to 35 (_ = ax.hist(sums, bins=35, edgecolor='black', linewidth=1.2, density=True)) the third experiment now approximates the normal curve.
– Chris Snow, 1 hour ago
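The effect of using fewer, wider bins can also be measured rather than judged by eye. The sketch below is one possible way to do that with the standard library, not what the comment's `ax.hist(..., bins=35)` call does internally: it assumes integer bin widths (so every bin covers the same number of possible outcomes), and the names `heads_counts`, `density_histogram` and `worst_relative_gap` are invented for this example.

```python
import math
import random
from statistics import NormalDist, mean, pstdev

def heads_counts(num_repetitions, num_tosses, seed=0):
    """Simulate num_repetitions runs of num_tosses fair coin flips; return heads per run."""
    rng = random.Random(seed)
    return [bin(rng.getrandbits(num_tosses)).count("1") for _ in range(num_repetitions)]

def density_histogram(counts, bin_width):
    """Density histogram with integer-wide bins; returns (bin centres, densities)."""
    lo = min(counts)
    num_bins = (max(counts) - lo) // bin_width + 1
    tallies = [0] * num_bins
    for c in counts:
        tallies[(c - lo) // bin_width] += 1
    n = len(counts)
    # Bin k covers the integers lo + k*w .. lo + (k+1)*w - 1; its centre is their midpoint
    centres = [lo + k * bin_width + (bin_width - 1) / 2 for k in range(num_bins)]
    densities = [t / (n * bin_width) for t in tallies]
    return centres, densities

def worst_relative_gap(counts, bin_width):
    """Largest |histogram - fitted normal pdf|, as a fraction of the pdf's peak height."""
    fitted = NormalDist(mean(counts), pstdev(counts))
    centres, densities = density_histogram(counts, bin_width)
    peak = fitted.pdf(fitted.mean)
    return max(abs(d - fitted.pdf(c)) for c, d in zip(centres, densities)) / peak

counts = heads_counts(1000, 10000)
target_bins = 35  # the bin count mentioned in the comment
width = max(1, math.ceil((max(counts) - min(counts) + 1) / target_bins))
print("unit-wide bins:", round(worst_relative_gap(counts, 1), 3))
print(f"{width}-wide bins:", round(worst_relative_gap(counts, width), 3))
```

Choosing an integer width is a deliberate design choice here: with integer-valued data, bins of non-integer width cover unequal numbers of possible outcomes, which can itself produce uneven bar heights.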

You can think of an individual coin flip as an independent Bernoulli trial. One trial will give you either heads or tails, i.e. success or failure. If you repeat this, say, 100,000 times, the proportion of heads will be very close to 0.5 if the coin is fair.

Now if you increase the number of trials to 1,000 and keep the number of repetitions at 1, you will get a single sequence of 1,000 successes/failures and cannot say much about the probability of observing around 500 heads unless you increase the number of repetitions of that experiment. As the number of repetitions increases, you will get a better and better approximation to the normal distribution.

For me it is easier to think of the trials not as “tosses” or “sample sizes” but as separate coins, and of the repetitions as the number of flips of each of those coins. Then it also makes intuitive sense that by increasing the number of coins (or trials) while keeping the number of repetitions constant, the data approximate the normal distribution less well.
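It can help to separate the noise in a simulated histogram from the quality of the normal approximation itself, which is what the book's statement is about. As a hedged side check, the sketch below computes the exact distribution of the number of heads for a fair coin and its largest distance from the normal CDF with mean n/2 and standard deviation √n/2. The function name `max_cdf_gap` is made up, and the continuity correction (evaluating the normal CDF at k + 0.5) is an assumption of this example rather than something stated in the thread.

```python
from math import comb, sqrt
from statistics import NormalDist

def max_cdf_gap(num_tosses):
    """Largest gap between the exact CDF of the heads count and its normal approximation."""
    approx = NormalDist(mu=num_tosses / 2, sigma=sqrt(num_tosses) / 2)
    total = 2 ** num_tosses  # number of equally likely toss sequences
    cumulative = 0
    worst = 0.0
    for k in range(num_tosses + 1):
        cumulative += comb(num_tosses, k)
        exact = cumulative / total  # P(heads <= k), from exact integer counts
        # k + 0.5 is the usual continuity correction for an integer-valued count
        worst = max(worst, abs(exact - approx.cdf(k + 0.5)))
    return worst

for n in (10, 100, 1000):
    print(n, max_cdf_gap(n))
```

No simulation is involved here, so the number of repetitions plays no role: the gap depends only on the number of tosses.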

edited 2 hours ago; answered 3 hours ago by Stefan
