
A Basic Guide to Drug Studies

Kenn Brody August 14, 2021

In this pandemic age, every pundit and panic-monger has an opinion. Masks or no masks for kids? Vaccine or not for COVID survivors? Lockdowns or no lockdowns? Who is right? What is the best thing to do?

There is rarely any evidence presented. It all comes down to which authority we trust, as if authorities had no political biases. But they do. They all claim “science” as if it were a religion. Science is not an article of faith.

 

Science is based on evidence, not on opinions and not on authorities. Evidence in COVID research is presented in published papers, generally reviewed by panels of experts before publication. Not everyone is going to be able to read these papers. But it is now necessary for at least SOME of us to be able to read and understand what a scientific study is, and whether it is good evidence that supports some policy. At least a few people in every community should have this level of understanding and then interpret it for the rest of us. That is the only way we are going to overcome the mainstream narrative.

The goal of this article is to give you a start on this understanding, some of the methods, some of the terminology and some of the sources of data. You, the reader, will form your own opinions about the issues after you read a few scientific papers.

Unlike physics or mathematics, medicine relies less on fundamental theory and more on comparisons. What works versus what does not work seems like a simple concept, but untangling the threads of causes and effects takes a bit of cleverness.

Charlatans all know that giving you a sugar pill with a fancy name and a rigmarole will cure aches and pains, at least for a short time. This is the placebo effect. To count as effective, a drug has to do better than a placebo. So we have two-branched, double-blind, placebo-controlled studies. What this means is that a study population is divided roughly in half. One half gets the drug, the other gets the placebo. Now imagine running such a study with thousands of subjects, dozens of nurses and data collectors, over months of time, under the supervision of doctors and academic specialists approved by an institutional study committee. It is a significant task, not something done on a whim.

There are two branches. In one branch subjects get the drug; in the other branch subjects get the placebo. Neither the subjects nor the administrators know who is getting the drug and who is getting the placebo. The study is blind to the subjects and blind to the administrators: it is double-blind. Each subject is assigned a random number. Half the numbers, selected at random, get the drug and the rest get the placebo, or some equivalent arrangement.
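If you like to see things concretely, here is a minimal sketch of that random assignment in Python. The subject IDs, group sizes and the fixed random seed are made up for illustration; a real trial uses a registered randomization protocol.

import random

# Hypothetical list of subject IDs enrolled in the study.
subjects = list(range(1, 1001))          # 1000 subjects, numbered 1..1000

random.seed(42)                          # fixed seed so the example is repeatable
random.shuffle(subjects)                 # scramble the order

half = len(subjects) // 2
drug_group    = set(subjects[:half])     # first half gets the drug
placebo_group = set(subjects[half:])     # second half gets the placebo

# In a real double-blind trial this table is held by a third party;
# neither the subjects nor the people administering doses ever see it.
assignment = {sid: ("drug" if sid in drug_group else "placebo") for sid in subjects}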

The administrators follow the subjects until some pre-selected effect is observed, or some time limit is reached. That pre-selected effect is called the endpoint. It may be the onset of a measurable response, such as a negative blood test for COVID, or a release from an ICU. The design of the study and the endpoint are critical. It isn’t always possible to find a perfect endpoint. For example, we cannot always determine which variant of the virus was involved in a particular subject, so we may use a proxy endpoint. In the U.K. they used a general test for COVID, the paqPCR, to distinguish between the original alpha variant and the delta variant: the test detected the other virus genes but not the spike (S) gene for alpha (a deletion in alpha makes the spike target drop out), while it detected the spike gene along with the other targets for delta. Was this appropriate to the purpose of the study? Did the paper clearly explain why they used a proxy endpoint? You read and decide.

Sometimes you will find that the endpoint has little or no relation to the purpose of the study. This is a big red flag.

The number of subjects is referred to as n. A study with a small n is going to yield a less reliable result. Larger studies are not always better, though; results can get messy. What you’re looking for is an n in the range of thousands of subjects, not tens or even hundreds, in most cases. The precision of a study improves with the square root of n: the error bars shrink in proportion to 1 over the square root of n. Larger studies have smaller error bars.

Some math is unavoidable here.

Suppose you take a list of all the subjects of some study with their endpoint virus clearance results. This might be the number of virus particles detected per nanoliter of saliva sampled, ranging from 0 to 1000. You add them all up and divide by n. This gives you an average. But how good is that average? The data might be all 0s and 1000s and nothing in between, or it might be a good distribution of numbers. So you do another calculation, the standard deviation. The raw numbers, if you plot them on a graph, will give you a curve. You are hoping for a normal distribution curve with a single hump in the middle, pretty close to the average. The standard deviation tells you how wide the skirts of this curve spread. Sigma is one standard deviation. The result of the measurement is properly reported as the average plus or minus its uncertainty; for example, the average, say 55, might be quoted as accurate to within plus or minus (+/-) 1.2 at 3 sigma. The higher n is, the smaller that uncertainty on the average will be: it shrinks like sigma divided by the square root of n.
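Here is what those calculations look like in a short Python sketch, using made-up virus-count numbers. Notice that sigma describes the spread of the raw data, while the uncertainty of the average is sigma divided by the square root of n:

import statistics, math

# Made-up endpoint values: virus particles per nanoliter for each subject.
results = [55, 48, 61, 52, 57, 49, 63, 50, 58, 54]

n       = len(results)
average = statistics.mean(results)            # the average
sigma   = statistics.stdev(results)           # one standard deviation of the data
sem     = sigma / math.sqrt(n)                # uncertainty of the average itself

print(f"n = {n}")
print(f"average = {average:.1f} +/- {3 * sem:.1f} (3 sigma on the average)")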

Remember we have TWO branches. We have to reduce each branch to a separate average and sigma, and compare the two results. Suppose we find that the drug branch has an average of 22 with a sigma of 2 and the placebo branch has an average of 43 with a sigma of 3. These ranges do not overlap. We can say that the drug, whatever it is, is effective, with a ratio of 43/22, or about 1.95, between the branches. This is the OR, the odds ratio. (Strictly speaking, an odds ratio compares the odds of an event, such as infection, between the two branches, but the idea is the same: how much better one branch did than the other.)
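A sketch of that two-branch comparison, with made-up numbers chosen to roughly mirror the example above:

import statistics

# Hypothetical endpoint results for each branch (invented for illustration).
drug    = [22, 20, 25, 19, 23, 24, 21, 22, 20, 24]
placebo = [43, 40, 46, 41, 45, 44, 39, 47, 42, 43]

for name, branch in [("drug", drug), ("placebo", placebo)]:
    avg   = statistics.mean(branch)
    sigma = statistics.stdev(branch)
    print(f"{name:8s} average = {avg:.1f}, sigma = {sigma:.1f}")

# If the ranges (average +/- 3 sigma) do not overlap, the separation is convincing.
ratio = statistics.mean(placebo) / statistics.mean(drug)
print(f"placebo/drug ratio = {ratio:.2f}")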

There is one other test that is common in medicine. It is possible that the result we got, an OR of about 1.95, is just an accident; maybe some other effect is hidden here. So we ask: what is the probability that the same lists of numbers could yield this result by chance? We mix up the results of the two branches at random, do the averages and sigmas again, and compare this mixed, random result with the result we got before. Do the error bars overlap? We want the real result to be separated by more than two sigma. In ordinary medical parlance, this gives us an acceptable p-value of about .02. In plain language, the chance that some weird random fluke in the data caused our OR is about 1 chance in fifty.
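What I just described is essentially a permutation (shuffle) test. Here is a rough sketch, reusing the made-up branch numbers from above; for numbers this well separated, the p-value comes out essentially zero.

import random, statistics

drug    = [22, 20, 25, 19, 23, 24, 21, 22, 20, 24]   # made-up results, drug branch
placebo = [43, 40, 46, 41, 45, 44, 39, 47, 42, 43]   # made-up results, placebo branch

observed_gap = abs(statistics.mean(placebo) - statistics.mean(drug))

pooled  = drug + placebo
random.seed(0)
extreme = 0
trials  = 10_000
for _ in range(trials):
    random.shuffle(pooled)                            # mix the two branches together
    fake_drug, fake_placebo = pooled[:len(drug)], pooled[len(drug):]
    gap = abs(statistics.mean(fake_placebo) - statistics.mean(fake_drug))
    if gap >= observed_gap:                           # is the shuffled data as extreme as the real data?
        extreme += 1

p_value = extreme / trials
print(f"p-value ~= {p_value:.4f}")   # the chance random mixing reproduces the observed gap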

Beware of sources that quote something like, “Your chances are 26% better with our product.” Better than what? Suppose your chance of getting across a highway without being hit is 26% greater if you wear a red shirt than a blue shirt. OK, I’d better wear a red shirt all the time, you say. But they never gave you the incidence of people in blue shirts getting hit on that highway. It turns out to be 12 in 100,000 crossings. Hardly anything to be concerned about, no matter what color shirt you wear.

Anytime you are given a percent improvement without both a comparison and an incidence, you are getting propaganda, not data.
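To see why the baseline matters, here is a quick sketch using the red-shirt numbers above, reading “26% better” as a 26% relative reduction in the risk of being hit (that reading is my assumption; the quoted claim never says):

# Baseline incidence from the example: 12 hits per 100,000 crossings in blue shirts.
blue_risk = 12 / 100_000

# "26% better" taken as a relative improvement:
red_risk = blue_risk * (1 - 0.26)

print(f"blue shirt risk: {blue_risk:.5%} per crossing")
print(f"red shirt risk:  {red_risk:.5%} per crossing")
print(f"absolute difference: {blue_risk - red_risk:.5%}")   # about 3 crossings per 100,000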

Let’s do a quick review here:

• Is the study from a peer-reviewed journal?
• Is it double-blind and placebo-controlled?
• Is the endpoint or the proxy endpoint reasonable for the purpose?
• Is n a reasonably high number?
• Is the OR significant?
• Is the p-value good enough?

It is customary for researchers to make their raw data available for further analysis by other researchers. I would be somewhat concerned if that data were being withheld; in for-profit Big Pharma, this is not uncommon. However, since we are talking among ourselves as amateurs, this is a nit.

Let’s take a look at some of the complications of studies and how they can be designed to correct for other factors.

Usually, a well-designed study looks for one effect as determined by a single endpoint. There are such things as trial-and-error studies, but the expense and difficulty of doing a large enough study with a significant n make such trials uncommon. We won’t consider them here.

How much drug do we give? How much is needed to work? What are the safe limits and side effects? FDA requirements separate new drug studies into three phases: Phase 1 tests the safety of the drug. Phase 2 tests the effectiveness of the drug. Phase 3 is a mass study of dosage, tolerance and side effects. You can get more details here:

https://www.fda.gov/drugs/information-consumers-and-patients-drugs/fdas-drug-review-process-continued

Each phase has a separate set of statistics, endpoints and results. You should get the Phase 3 study if you want the best picture. The earlier phases are usually too small to be of benefit to a casual reader.

What about confounding factors? Suppose we look at studies to determine whether masks work for COVID. How would you design such a study? Not all masks are equal. An N95 mask filters out particles down to 300 nanometers in diameter. Who knows what your hankie-gaiter filters? How often do you wash that mask in your pocket? How susceptible are you to COVID? Do you have natural immunity? Did you take the shots? Are you outside in sunlight or trapped in an elevator? Age and access to health care are significant in COVID infections. How do you separate out those factors?

What would you choose for an endpoint? Whether you get sick, go to the hospital, get put in an ICU or die?

It’s a mess. Here’s how you begin to separate this out. It’s called a cohort-matched multi-phasic study.

Get a lot of subjects together. Thousands. You’ll need a lot. For each subject, gather the following data: mask vs no mask; age; co-morbidities like cancer or heart disease that may make them ineligible; economic sector; sex; ethnicity; geographic location.

You take all this data and separate the subjects into the usual two branches; call them A and B. What are you looking for? Mask efficacy. So you start with mask users vs. non-maskers. Then you try to match subjects in A with subjects in B that have the same data: the same geographic location, age, etc. You need a significant number of subjects in each data classification. These are your matched cohorts.

The simple idea is that if two people with the same age, location, etc. differ only in mask usage, then looking at how many in each branch get COVID will tell you something about how well masks work. You cannot just see how many people in Florida get sick vs. how many people in New Jersey get sick; there are too many confounding factors, such as weather, population age, economic sector, etc.
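Here is a deliberately simplified sketch of the matching step: bucket subjects by everything except mask usage, then compare mask wearers with non-wearers only inside buckets that contain both. The field names and records are invented for illustration.

from collections import defaultdict

# Invented records: (age band, location, mask user?, got COVID?)
subjects = [
    ("40-49", "FL", True,  False),
    ("40-49", "FL", False, True),
    ("40-49", "FL", False, False),
    ("70-79", "NJ", True,  True),
    ("70-79", "NJ", False, True),
    # ... thousands more in a real study
]

# Bucket subjects by everything EXCEPT mask usage.
cohorts = defaultdict(lambda: {"mask": [], "no_mask": []})
for age, location, masked, infected in subjects:
    key = (age, location)
    cohorts[key]["mask" if masked else "no_mask"].append(infected)

# Compare infection rates only inside cohorts that have members on both sides.
for key, groups in cohorts.items():
    if groups["mask"] and groups["no_mask"]:
        mask_rate    = sum(groups["mask"]) / len(groups["mask"])
        no_mask_rate = sum(groups["no_mask"]) / len(groups["no_mask"])
        print(key, f"mask: {mask_rate:.0%}  no mask: {no_mask_rate:.0%}")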

At this writing, there is no such cohort-matched study on masking.

Finally, let’s mention other types of studies.

A retrospective study is done on data or studies that already exist and are all relevant to a single determination. Instead of running a new trial, the researchers are effectively mining previous work for new insights. This can be done by re-examining older studies in light of newer results, or by combining the data of several studies into a meta-analysis and attempting to winnow out confounding factors with regression analysis.
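One common way to combine several studies is an inverse-variance weighted average: studies with tighter error bars count for more. A toy sketch with invented numbers:

# Invented per-study results: (effect estimate, standard error of that estimate)
studies = [(1.8, 0.4), (2.3, 0.6), (1.5, 0.3)]

# Inverse-variance weighting: more precise studies get more weight.
weights = [1 / (se ** 2) for _, se in studies]
pooled  = sum(w * est for (est, _), w in zip(studies, weights)) / sum(weights)

print(f"pooled estimate ~= {pooled:.2f}")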

The ways studies can be designed vary with the science. Physics is different from medicine. Medicine is different from biology. In each science a researcher must apply cleverness to yield insight into some natural phenomenon, and that is a creative thing.

I hope you have found this guide useful. I welcome your feedback.

Where to find COVID data:
World-wide COVID data on WorldOmeter:

https://www.worldometers.info/coronavirus/

COVID deaths by age in US:

https://www.statista.com/statistics/1191568/reported-deaths-from-covid-by-age-us/

Children state by state COVID report:

https://www.aap.org/en/pages/2019-novel-coronavirus-covid-19-infections/children-and-covid-19-state-level-data-report/

Deaths per Million Population by Country:

https://www.worldometers.info/coronavirus/

(scroll down on this page)

New York City COVID statistics by age, sex, morbidity:

https://www.worldometers.info/coronavirus/coronavirus-age-sex-demographics/

P-values, confidence intervals and odds ratios:

http://www.bandolier.org.uk/painres/download/whatis/What_are_Conf_Inter.pdf

Standard deviation calculator:

https://www.calculator.net/standard-deviation-calculator.html
