The following is a guest post by Kurtis Frank from

More and more athletes and gym-goers have turned to the internet to achieve their goals. As a result, they have begun to rely on scientific data and evidence to guide their decisions. This sounds incredible and some would argue it heralds the end of broscience (the practice of passing information along by word of mouth, usually based on things that ‘should work’).

Broscience is not necessarily wrong. Since there is no quality control on word of mouth, information can be right, or completely wrong, but disregarding broscience by claiming “it’s not science” is rash.

Science doesn’t always investigate what you want it to. A study that finds an increased power output in newbies, assessed by a leg extension, does not mean that leg extensions will increase an advanced athlete’s running speed or squatting abilities. It could, but that study alone is not proper evidence.

Science is fantastic, but what you find online may not always be science. Science is a process, one known as the scientific method, which fills databases around the world. Science is also a highly marketable buzzword, designed to sell products to people with a university education. Not everything claimed to be scientific is, because science sells.

Still, science is a great tool, and should be considered the main tool in your better-physique toolbox.

When should science be used? Whenever it applies. Below are some guidelines for assessing when and how to use science.

Statistical power and the structure of studies

Sample size is a measure of how many people were included in the study. A study on one person is not reliable, as individual results can vary wildly. This is why case studies, or studies with one or a few individuals in uncontrolled conditions, are poor evidence. The larger a sample size is, the less any one individual’s quirks will influence the final results. Though a larger sample size cannot guarantee quality evidence, statistically speaking, larger studies are better.
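To see why sample size matters, here is a rough simulation sketch. Every number in it is made up for illustration (a true average effect of 5.0, individual variation of 10.0), and the function name is hypothetical. It runs many pretend studies of the same size and reports how much their average results bounce around.

```python
import random
import statistics

def spread_of_study_means(sample_size, n_studies=2000, seed=0):
    """Simulate many same-sized studies and return how much their averages vary."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_studies):
        # Each participant responds with the true effect plus individual noise.
        # Both values (5.0 and 10.0) are assumptions chosen for illustration.
        sample = [rng.gauss(5.0, 10.0) for _ in range(sample_size)]
        means.append(statistics.fmean(sample))
    # Standard deviation of the study averages: smaller means more consistent studies
    return statistics.stdev(means)

for n in (1, 10, 100):
    print(f"{n} participants per study: spread of results = {spread_of_study_means(n):.2f}")
```

The spread shrinks as the studies get bigger, which is the statistical reason a 100-person trial deserves more trust than a case study.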

The design of a study is vital, and can be broken down into a few categories:

  • A case study, mentioned above, is a study on isolated individuals in uncontrolled settings. If an interesting result occurs, it can lead to a paper on the topic and an interesting discussion. A case study can lead to hypothesis formation, but should never be used as evidence for a decision. A case study lacks both sample size and a controlled setting.
  • The study could be a survey. A survey could go one of two ways. It could be a prospective study, where researchers would follow a group of people over a period of time and study them, or it could be a retrospective study, which means researchers ask a group of people questions on their past. Surveys have large samples, but are not controlled studies. They tend to be used to determine what future studies should investigate. For example, a survey could uncover a potential link between fish oil and aggravated prostate cancer, but future controlled studies would be required before claiming the new link as fact.
  • Studies that can be used as evidence are controlled trials. This kind of study includes a defined methodology (how the study is conducted) and a set of drawn conclusions. The nature of the study allows for the control of variables, which will limit result interference. Another variant of the controlled study is the double blind trial, where the placebo effect is also controlled for. In many cases, a double blind trial is very important, as the placebo effect can be very potent.
  • The controlled trial can also be carried out in a crossover design. This means the treatment and control (or placebo) groups switch places at least once, to ensure every individual is in each group. This is another layer of controls on the study and can improve results, assuming the break between the two phases is long enough for the first treatment to wear off.

Use controlled trials for evidence. Double blind studies are the go-to, and in some cases a crossover study can improve results. Numerous trials are the ideal. One trial is usually not sufficient to guide a decision.

Who are the subjects of the study?

The demographics of a study, the people that are being investigated, matter. A study conducted on elderly people may not apply to young adults, and a study of men may not apply to women. Ideally, the demographics of a study should match the person that will be affected by any decisions drawn from the study, or at the very least, care should be taken to confirm that there are no known differences between the groups being tested and the individual seeking to make a decision.

Similarly, animal studies are important to critically assess. Something that occurs in a rat could occur in a human, but unless there is some evidence to show that this is the case, information from the rat study should be taken with a grain of salt. Animal studies are valuable, but it would be prudent to avoid basing conclusions solely on one. They can, however, add valuable insight into a conclusion involving human studies, by providing an example of how things happen, rather than just what happens.

The baseline conditions of the people being studied also matter. A compound that improves insulin sensitivity in diabetics or the obese may not do the same thing in lean athletes. Try to match the participants of the study to whoever the study is being applied to. The closer conditions match, the better.

Try to match the subjects of the study to the person you’re applying scientific evidence to. Possible differences do not nullify the data, but they may explain why something worked in the study but does not work for you.

What happened in the study?

Several aspects of the methodology are important to consider when evaluating a study. Keep in mind:

Dosage, or how much treatment was administered. This is of particular interest in rat studies, since the animal may be force fed to force a particular result, one that would not normally occur. This also applies to exercise. What kind of exercise was done and how much?

How was the dose administered? Sometimes a dosage prior to a specific event (sleep or exercise, for example) is mandatory during the study. Other studies might involve random dosages or meal-timed dosages.

Duration, or how long the study lasted. Measuring bone mass or muscle growth during a study that lasts a few weeks or days is impossible. Similarly, a study looking into a pre-workout supplement should be a single dose study that lasts less than a day.

Just like you have to pay attention to the demographics and conditions of a study, you should also pay attention to what was administered, and for how long. Usually, this is held constant between studies, but in some cases these changes will impact the observations.

What were the results?

A conclusion is the final takeaway of a study. It is important to determine whether the results rely on correlation or causation and whether the final result was a measure of a biomarker or a phenomenon.

A correlation means that A and B both occurred and were either positively or negatively connected. A positive correlation means that when A increased, B increased. A negative correlation means that when A increased, B decreased, or vice versa. A correlation is usually the best you can hope for in a human study.
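A correlation can be put into a number. The sketch below computes the Pearson correlation coefficient, which runs from -1 (perfect negative) to +1 (perfect positive). The data are invented for illustration, and the helper name is just a label chosen here.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How the two variables move together around their averages
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variation_x = sum((x - mean_x) ** 2 for x in xs)
    variation_y = sum((y - mean_y) ** 2 for y in ys)
    return covariation / math.sqrt(variation_x * variation_y)

# Invented example data for five hypothetical trainees
weekly_sessions = [1, 2, 3, 4, 5]
strength_gain_kg = [2.0, 4.1, 5.9, 8.2, 9.8]   # rises with sessions
resting_heart_rate = [72, 70, 67, 65, 62]      # falls with sessions

print("sessions vs strength:", round(pearson_r(weekly_sessions, strength_gain_kg), 3))
print("sessions vs heart rate:", round(pearson_r(weekly_sessions, resting_heart_rate), 3))
```

The first pair comes out strongly positive and the second strongly negative. Neither number says anything about whether the training sessions caused the change.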

A causation is usually found in animal studies. Researchers are able to genetically modify an animal to block the relationship between A and B, to observe the effects. For example, eliminating the creatine kinase enzyme will block the ability of creatine to increase levels of ATP in cells. If a specific compound produced a result in the presence of creatine kinase, but did not produce the same result without creatine kinase, a causation can be inferred.

The difference between correlation and causation is why researchers use animal models and in vitro studies. If a correlation is uncovered in a human study, further non-human studies may be required to determine how a process works. When evaluating studies, keep in mind that a correlation is not a bad thing, and may serve as enough evidence if the how of a process is less important than what happened.

A phenomenon is an end point that researchers want to influence.

A biomarker is measurable and thought to accurately represent a phenomenon. Measuring a biomarker allows researchers to indirectly measure the associated phenomenon.

For example, dying from heart disease is a negative phenomenon. Biomarkers for heart disease include the concentrations of homocysteine and triglycerides in the blood. Homocysteine and triglycerides are not inherently bad, but they are positively associated with dying from heart disease. Sometimes, reducing biomarkers also targets an unknown factor that actually causes the phenomenon.

Another example is the relationship between muscle protein synthesis (the biomarker) and actual muscle growth (the phenomenon). Nobody wants to increase muscle protein synthesis unless it actually results in building muscle.

Remember to differentiate between correlation and causation when evaluating a study. Know the difference between biomarkers and phenomena. Both correlations and biomarkers can provide great information, but care should be taken not to confuse them with their trickier and harder-to-evaluate cousins.

Magnitude of Effect and Clinical Significance

Statistical significance and clinical significance are two very different things. Unless the word ‘clinical’ is specifically referenced in the study, any significant effect most likely refers to the statistical.

Statistical Significance indicates whether a particular outcome is likely to be more than random noise. This significance is based on probability. Researchers decide on an alpha value, usually 0.05, or 5%, though they may go as low as 0.01 (1%) or 0.001 (0.1%). If a result is statistically significant at an alpha level of 0.05, that means a result at least this large would show up less than 5% of the time if the intervention actually did nothing.

Clinical Significance is a judgement call made by the researchers as to whether the result or change has enough significance to make a difference in the real world.

For example, in a study that shows a reduction of triglycerides by 2% (P < 0.05), the result is statistically significant, since a reduction this large would be seen less than 5% of the time if the intervention had no real effect. However, the result is probably not clinically significant, since a 2% reduction in triglycerides is very small. Practically, the intervention used in the study technically reduces triglycerides, but not by enough to matter to you, the evaluator of the study.
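The gap between the two kinds of significance shows up clearly with a quick calculation. The sketch below uses a simple z-test on two equally sized groups. The baseline (150 mg/dL), the spread between individuals (30 mg/dL), and the group sizes are all assumptions picked for illustration, not figures from any real trial.

```python
import math

def two_sided_p_value(mean_difference, sd, n_per_group):
    """Two-sided p-value for a difference between two group means (z-test)."""
    # Standard error of the difference between two equally sized groups
    standard_error = sd * math.sqrt(2 / n_per_group)
    z = abs(mean_difference) / standard_error
    # erfc(z / sqrt(2)) equals 2 * (1 - Phi(z)), the two-sided tail area
    return math.erfc(z / math.sqrt(2))

# Hypothetical numbers: baseline triglycerides of 150 mg/dL, a 2% drop,
# and a standard deviation of 30 mg/dL between individuals.
baseline = 150.0
drop = 0.02 * baseline
sd = 30.0

for n in (50, 2000):
    p = two_sided_p_value(drop, sd, n)
    print(f"{n} per group: 2% drop, p = {p:.4f}, significant at 0.05: {p < 0.05}")
```

The same 2% drop fails the test with 50 people per group and passes it with 2,000. The statistics changed, but the real-world size of the effect did not.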

If a result fails to reach statistical significance, it could be due to the methodology. Usually, a study that fails to reach statistical significance is forgotten about and not reinvestigated. However, if the results are clinically significant (if, for example, triglyceride levels fell by 50%), future studies would be conducted and researchers would take care to change the methodology to reach statistical significance. The next study might be conducted on only people with high triglycerides.

This is why the replication of studies is so important. Even when a treatment does nothing, a study using P < 0.05 still has a 5% chance of producing a false positive. If two independent studies both report a statistically significant result, the chance that both are false positives drops to 0.25% (0.05 × 0.05).
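The arithmetic behind replication is simple multiplication. This sketch assumes the treatment truly does nothing, the studies are fully independent, and every study gets published (none of which is guaranteed in practice). The function name is a hypothetical label.

```python
def chance_all_false_positives(alpha, n_studies):
    """Chance that every one of n independent studies is a false positive."""
    # Independent events multiply: alpha * alpha * ... (n_studies times)
    return alpha ** n_studies

for n in (1, 2, 3):
    chance = chance_all_false_positives(0.05, n)
    print(f"{n} study/studies at alpha 0.05: {chance * 100:.4f}% chance all are flukes")
```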

In a nutshell, statistical significance represents the researcher’s confidence that their result is due to their study, not random chance, while clinical significance is a marker of real-world relevance. The best results meet both criteria.

Consensus of information and cherry-picking

Even if you find the perfect study, it still may not be sufficient to draw a conclusion. Scientific evidence requires a consensus. Data can be chosen specifically (cherry-picked) by a writer to present their side of a story. Cherry-picking can be intentional or accidental, but the harm is done to the reader, who does not get the full story.

Databases play a vital role in creating consensus. Look for independent resources like Medline (PubMed). When you find a good study, compare it to other studies with a similar methodology and determine whether the topic is debatable or there is consensus on the issue.

Unfortunately, the only way to determine if a study has been cherry-picked is to compare it to the other studies on the topic.

Reading science and other tools

Science should be your favorite tool, but don’t discount the other gadgets at your disposal. Experience, word of mouth, appeal to authority, and broscience can also be useful, when science doesn’t apply.

A good example is exercise physiology. Compared to pharmaceutical sciences, exercise science tends to:

  • Have smaller sample sizes. A pharmaceutical study on aspirin or fish oil may have up to 200 people. Some glucosamine studies have 500 or more, whereas many studies on squatting use 12-20 college-aged males, since they were available to researchers.
  • Have skewed demographics. Most studies are on new trainees that the researchers could recruit, since the people interested in improving their physical performance have already done the research and are at a moderate or advanced level.
  • Have non-matching methodology. Leg extensions and VO2 max training on a treadmill are very common methods to assess physical function. Meanwhile, many supplement users prefer barbells or run the track.

Sometimes, the science behind a supplement or drug, as is the case with fish oil and aspirin, is so well-established that it can answer all of your questions, and other tools are unnecessary. When addressing more difficult questions like “How can I be the best athlete I can be?” you may need to combine science with other tools.

Read science, let it guide you, then turn to the experts in the field and see what they have to say. It can be fun to try to find a bridge between broscience and science. Try starting at ‘the pump’ and nitric oxide signalling. If you defer to an authority, investigate their track record of helping people.

The scientific method and its collection of evidence, when examined properly, are very powerful tools. Find what works for you, using whatever tool is best suited to answer the questions at hand.

Author Bio:

Kurtis Frank is the Director of Research at, an independent and unbiased site that investigates the scientific evidence behind supplement claims.

Go here to check out The Supplement-Goals Reference Guide.