I compared the Mevo ($450) to the current gold standard of Doppler radar launch monitors, the Trackman 4 ($19,000).
The Mevo performs extremely well for ball speed (on any given shot, less than 1% off)
The Mevo performs very well for club speed (usually less than 3% off)
The Mevo performs pretty well for carry (usually less than 5% off)
Spin and launch angle are highly erratic and are almost unusable (20-60% off)
Below are the simplest and fastest graphs to summarize the accuracy:
The Mevo ($450) is a portable radar launch monitor, which uses Doppler radar to measure different parameters of a golf shot. There is a wide variety of launch monitors at various price points, but at the top end for radar models is the Trackman 4 (starting at $18,995), which is the gold standard I used for these tests. Obviously owning one is impractical for almost everyone outside of pros and coaches, but thankfully they can be rented for $30/hour where I live. I spent a couple sessions hitting on one and comparing its read of each shot to the Mevo. Both devices were calibrated for the altitude I usually play at in Akron (1,000 ft. above sea level).
Data was collected indoors with 8 feet of unobstructed ball flight. Reflective metallic stickers were used on each ball to improve accuracy, per the recommendations of Flightscope. I used Taylormade TP5x’s for all shots. I include “2022” in the title because a disgusting amount of data interpretation happens behind the scenes with these companies and their proprietary calculations, and Flightscope’s firmware updates improve accuracy even as the hardware remains the same.
I think it’s realistic to break this down into two parts – accuracy for individual shots, and accuracy for club gapping purposes (i.e., looking at averages for 5-10 shots with each club). For the latter, if someone hits 10 shots with each club, the individual variation doesn’t matter as much so long as the averages are close to the true (Trackman) averages. So for individual shots, I’m going to avoid using “average % error”, as that could smooth out the nuance if the Mevo mixes over-reads with under-reads – it would improperly return a value very close to zero. Instead, I’ll use the absolute value of the % difference on each shot. I’ll show the % difference for individual shots for a couple of sets of data (there is too much data for this to be practical for each shot and club). For session averages, these will be true averages.
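To make the difference between the two error measures concrete, here is a minimal Python sketch. The function names and the two carry readings are made up for illustration; they are not taken from my sessions.

```python
def percent_diff(measured, reference):
    """Signed percent difference of a reading from the reference value."""
    return (measured - reference) / reference * 100.0


def mean_signed_error(pairs):
    """Average of signed % differences; over- and under-reads cancel out."""
    return sum(percent_diff(m, r) for m, r in pairs) / len(pairs)


def mean_absolute_error(pairs):
    """Average of absolute % differences; every miss counts."""
    return sum(abs(percent_diff(m, r)) for m, r in pairs) / len(pairs)


# Hypothetical (Mevo, Trackman) carry readings in yards:
# one over-read and one under-read of the same size.
shots = [(210.0, 200.0), (190.0, 200.0)]

signed = mean_signed_error(shots)
absolute = mean_absolute_error(shots)
print(f"signed: {signed:.1f}%  absolute: {absolute:.1f}%")
```

With these two made-up shots the signed average comes out to zero even though each reading was 5% off, which is exactly the problem the absolute value avoids.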
Below is some of the raw data after I pasted everything from both Trackman and Flightscope’s websites to Excel:
Results: Comparing session averages (5-15 shots with each club)
Results: Comparing the parameters for individual shots:
For 16 swings of a driver, I compared the ball speed (mph) that Trackman measured and the ball speed that Mevo measured. Below is the percent difference between Mevo’s reading and Trackman’s on each of these 16 individual swings. The Mevo never missed a shot and never differed by more than 0.8%. On average, it slightly misread on the high side. It is important to point out that the differences here are extremely minute – the session average on Trackman was 149.2mph, whereas Mevo’s was 148.8mph. This difference is too small to have any actionable impact, even on a professional fitting.
Below is the same data for the other parameters:
Results: Average difference from Trackman (Abs):
As discussed above, I think using the absolute value of the difference from Trackman on each individual shot is a fair way to gauge “how far from the true value might this be?” when you are using your Mevo on the range and receive a carry distance or ball speed. So below is the average of this difference on each shot:
Mevo performed admirably given the price point and its competitors. Its ball speed measurements are effectively indistinguishable from Trackman. At 160mph, a 0.5% difference would be reading 159.2-160.8mph, which is not significant enough for anyone to care about, outside of robotic equipment testing. Club speed is largely consistent, if slightly more inaccurate. Launch angle was consistently over-estimated. Spin was highly erroneous. Carry is influenced by spin and launch, and poorly read spin numbers influenced carry distance at times. To give an example, here is a tale of two reads with a 4 Hybrid shot that I struck thin:
The gruesomely over-estimated spin and launch led to a low carry distance, even though Mevo nailed the ball speed.
I think the best use of the Mevo is if you have some sort of baseline familiarity with your launch monitor numbers, particularly ball speed. For people trying to build swing speed, ball speed is an important parameter to watch when using your driver. And the Mevo’s ball speed is effectively indistinguishable from your true (Trackman) ball speed. If Mevo reads a spin of 6,000rpm on a well struck driver shot, it is useful to be able to say “I know that’s not true” and throw out the carry distance, which would be skewed, while understanding the ball speed is still fine to interpret.
Trackman isn’t infallible, and even though its readings were treated as the ‘true’ values, they are only estimates. In indoor settings some professionals prefer photometric launch monitors (e.g., GC Quad). Trackman’s true strength lies in outdoor use, where it can track the full flight of a shot. As these tests were performed indoors, they are limited. However, Trackman likely represents the technical limitation of Doppler radar launch monitors as of 2022, and my goal was to see how close a $450 device can come to this standard.
I will add more data with irons this fall as I have time.
Can be successfully approximated and multiple methods will work ✅
Below is an image of an ECG I made from my own heart, following the methods characterized in this article:
At the bottom of the article I include an actual heart attack as measured by an Apple Watch (compared to the patient’s professional ECG)
The art of using a 1-lead device (or 2-electrode device) to record all 12 ECG leads is actually not new. In 2008, Dr. Grier at NDSU outlined the framework for doing this with handheld devices, long before smart watches even hit the market! However, unlike those 15-year-old devices and even new home ECG readers (e.g., Kardia’s products), Apple Watches offer a ubiquity that makes them very interesting for this purpose if chest pain strikes in a location with delayed healthcare access (skiing, hiking, airplane rides).
Measuring I, II, III, aVL, aVF, aVR:
Leads I, II, and III can be easily measured on the Apple Watch. The watch uses its back crystal as the positive electrode and the crown as the negative electrode. Below is a sketch of how Lead I can be measured. Leads II and III can be similarly measured.
Lead I: Watch on left wrist, right finger on crown
Lead II: Watch on lower abdomen, right finger on crown
Lead III: Watch on lower abdomen, left finger on crown
The augmented leads are a bit more complicated. Unlike the other leads which are largely just the potential difference between two different points on our body, the augmented leads are calculated. Their calculation averages two locations and then uses the actual potential from another location – the same professor mentioned above, Dr. Grier, has a very nice graphic that explains it better than I can:
Measuring these leads would thus involve using a wire to connect different body parts (e.g., left arm and left leg), then holding that to the watch, while the other part (e.g., right arm) completes the circuit on the watch. I did not have wire handy and also wanted to simulate a situation where someone could be in a remote or inaccessible location (e.g., skiing, on a plane, etc.) where an Apple Watch is the only tool they have.
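The arithmetic behind the augmented leads (Goldberger's definition: one limb's potential minus the average of the other two) can be sketched in Python. The electrode potentials below are hypothetical values in millivolts chosen only to show the calculation, and the function names are mine.

```python
def limb_leads(ra, la, ll):
    """Bipolar limb leads from right arm, left arm, and left leg potentials."""
    return {"I": la - ra, "II": ll - ra, "III": ll - la}


def augmented_leads(ra, la, ll):
    """Augmented leads: one limb minus the average of the other two."""
    return {
        "aVR": ra - (la + ll) / 2,
        "aVL": la - (ra + ll) / 2,
        "aVF": ll - (ra + la) / 2,
    }


# Hypothetical instantaneous potentials in mV (not measured data).
ra, la, ll = -0.2, 0.3, 0.5

bipolar = limb_leads(ra, la, ll)
augmented = augmented_leads(ra, la, ll)
print(bipolar)
print(augmented)
```

The "average of two locations" is why a wire joining two limbs would be needed to record these directly: the watch only has one electrode on each side of the circuit.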
Measuring Precordial Leads, V1-V6:
In ECGs, leads I-III have easy to understand measurements, as there is a positive and negative electrode, and the difference in potential between them is measured. The precordial leads (V1-V6) overlie the chest wall at 6 different positions, however they don’t have a clearly apparent second electrode. There is no actual second electrode, but instead a theoretical position deep in the chest (under the heart) known as Wilson’s Central Terminal to which the precordial leads are compared. That theoretical point obviously doesn’t have a lead placed on it (that sounds painful!) but is approximated by averaging the potentials of the limb electrodes and treating that average as the reference to which the precordial leads are compared. The derivation of this is discussed in detail in section 15.3 of this text.
In summary, Wilson’s Central Terminal cannot be reproduced on an Apple Watch without connecting all of your limbs with copper wires and then connecting that to one of the Apple Watch electrodes. This is unwieldy and defeats the purpose of a portable, wearable ECG. However, there are effective ways to approximate it. I found two approaches in the literature to this approximation:
Using the right arm (wrist) as an approximation was the simplest approach, and was described and validated by a cardiologist in this 2019 publication
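As a rough sketch of what the right-arm approximation changes, the Python below compares a precordial lead referenced to Wilson's Central Terminal (the average of the three limb electrodes) against the same chest electrode referenced to the right arm alone. All potentials are hypothetical numbers in millivolts, and the function names are mine, not from the cited publication.

```python
def wilson_central_terminal(ra, la, ll):
    """Reference potential: the average of the three limb electrodes."""
    return (ra + la + ll) / 3


def precordial_vs_wct(chest, ra, la, ll):
    """Standard precordial lead: chest electrode minus Wilson's Central Terminal."""
    return chest - wilson_central_terminal(ra, la, ll)


def precordial_vs_right_arm(chest, ra):
    """Watch-style approximation: chest electrode minus the right arm only."""
    return chest - ra


# Hypothetical instantaneous potentials in mV (not measured data).
ra, la, ll, chest = -0.2, 0.3, 0.5, 1.0

standard = precordial_vs_wct(chest, ra, la, ll)
approximate = precordial_vs_right_arm(chest, ra)
offset = approximate - standard
print(f"standard: {standard:.2f} mV  approximate: {approximate:.2f} mV  offset: {offset:.2f} mV")
```

In this toy example the approximation shifts the reading by the gap between the right arm and the central terminal, so the waveform shape from the chest electrode is preserved but its amplitude is not identical to a standard recording.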
Below is the monstrosity I spent the last 30 minutes recording, outputting as 9 separate PDFs, then stitching together in Photoshop. Of note, these are not simultaneous readings from each lead, but 9 discrete intervals recorded individually.
Brief overview of how this 9-lead ECG does and doesn’t provide more information than a 1-lead ECG:
could be obtained without this 9-lead nonsense
likewise to above
likewise to above
likewise to above
Lack of aVF would initially appear limiting, however lead I being positive and lead III being (roughly) isoelectric suggests my QRS axis is normal and something like +30 degrees. So the added leads do give us more information!
Definitely not possible on a 1-lead ECG. As for the quality of this reading, the technician who placed these leads might need to review his basic anatomy (the technician is me). I am unaware of any current or prior infarct (I’m a healthy 25 y/o M) so I will use my best judgement to say the curious R-wave progression (near-nonexistent in V1-V3 -> massive increase in V6) is a combination of me being a thin young male as well as sub-optimal precordial lead placement. Even with the questionable recordings here, some sort of progression is still clearly apparent.
Definitely not possible on a 1-lead ECG unless changes are only in lateral leads. My ST segments are visualized and lie on the baseline on every lead. I’m a bad patient as I (hopefully) don’t have any current ischemia, and this ECG supports that. In studies where MI patients with actual ST segment elevations have been recorded with these sorts of Apple Watch 6 or 9-lead recreations, the elevations have routinely been detected!
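The axis estimate from leads I and III mentioned in the list above can be checked with a little trigonometry. This is a minimal sketch assuming Einthoven's idealized equilateral triangle, with lead I pointing at 0 degrees and lead III at +120 degrees; the inputs are net QRS deflections, and the example amplitudes are hypothetical rather than measured from my tracing.

```python
import math


def qrs_axis_degrees(lead_i, lead_iii):
    """Estimate the frontal QRS axis from net QRS deflections in leads I and III.

    Assumes an idealized Einthoven triangle: lead I = x and
    lead III = -x/2 + y*sqrt(3)/2, where (x, y) is the heart vector
    with y pointing toward the feet.
    """
    x = lead_i
    y = (lead_i + 2 * lead_iii) / math.sqrt(3)
    return math.degrees(math.atan2(y, x))


# Hypothetical net deflections: lead I clearly positive, lead III isoelectric.
axis = qrs_axis_degrees(lead_i=1.0, lead_iii=0.0)
print(f"estimated axis: {axis:.0f} degrees")
```

With lead I positive and lead III flat, the estimate lands at about +30 degrees, which matches the reasoning above that the axis must be perpendicular to lead III and pointing toward lead I.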
Academics have published interesting work on this topic, such as recording actual STEMI patients, comparing the Watch ECGs to professionally performed ones, and having cardiologists perform blind reads of the ECGs to assess sensitivity and specificity. The results have consistently shown that the Apple Watch, while unwieldy, can perform largely accurate ECGs.
An undergraduate student’s December 2020 review article on this topic, “Einthoven and precordial lead accuracy of smartwatch-acquired electrocardiographs: a review of the literature”, can be found here. It is a great discussion of the underlying electrophysiology, successes, and limitations of this topic.
I think the most interesting cases are taking actual STEMI patients and seeing how accurately the Apple Watch could measure them. Out of Italy, this 2020 JAMA Cardiology article used Series 4 Apple Watch ECGs to record 100 patients (54 STEMIs, 27 NSTEMIs, 19 controls). In their supplemental materials, they included a comparison of a STEMI as perceived by the Apple Watch vs. their standard ECG equipment:
I think there is true utility to these, but only in very specific scenarios where access to medicine is forcibly delayed. If transit to a hospital is possible, then precious door-to-balloon time would be wasted as you place an Apple Watch in various spots over your bare chest. However, situations that come to mind are remote recreation (skiing, hiking), airplane travel, and catastrophe, where most owners of Apple Watches would be expected to have them on-hand, and there is a forced delay until emergency personnel can get you to a hospital / cath lab.
The natural follow up question to this is whether or not this hyper-early detection (not only pre-ED but pre-ambulance!) would actually change outcomes. I’m not sure I’m qualified to answer that. The impact of early ECGs on STEMI care has been explored in several studies, however those have been more directed at whether or not having paramedics perform ECGs pre-hospital will improve outcomes.
This article was last updated on August 22nd, 2022, and I will add to it as I make better graphics and receive access to some articles I am waiting on.
I am honored to present the 2021-22 NFL All-COVID team, presented by the PFFA*, with the motto “COVID got their guy”. Defensive selections to be announced in a follow up post.
*PFFA, the Pro Football Fans Association, is not a real association and its only member is me. For legal reasons I’d like to emphasize that I have no relationship to the NFL.
Brief selection notes:
This is my selection of an All-Pro team that only consists of those who contracted COVID this season. Players who contracted COVID only during the 2020 season were not considered eligible (e.g., Trent Williams). Players who contracted it during the camps and OTAs that preceded the 2021 season were considered eligible (e.g., Penei Sewell). Players needed to test positive for the virus – only landing on the COVID list doesn’t count as that includes players in the protocol for a close contact, etc.
The Pro 7 with an i3 or i5 processor doesn’t come with a fan and relies on heat conduction through a heatsink and its metal frame to cool itself (‘passive cooling’). This is problematic when the device is under high load, or doing hard work. In most computers, a fan overlying the CPU would speed up to counteract the higher temperatures. The Surface Pro does not have this luxury and therefore has no safety net to avoid unsafely high temperatures. Because of this, its only option when things get too hot is to throttle the power to its own CPU. As the CPU is the engine that runs the laptop, this slows the laptop down noticeably. To keep it from throttling I wanted to do something to improve its ability to stay cool.
Linked above, I used a liquid cooler from Aliexpress that was intended for mobile phone gamers. I ripped off the clasp that holds one end of the phone down and it is perfectly suited to sit on the back of a laptop.
The heatsink would be most effectively placed over the hottest point on the device. In a Surface Pro, this is the part of the case that overlies the CPU (center-upper right if you are looking at the screen). I have included a thermal image below which supports my experience of this being true.
Ideally we would have some sort of thermally conductive paste gluing the metal case to our liquid cooler’s heatsink, but as anyone who has installed a CPU can tell you, it’s gross and I don’t want it on my laptop. I found that the heatsink was able to do a satisfactory job just resting against the laptop.
The graph I attached at the bottom uses Passmark’s PerformanceTest 10.1, specifically testing the CPU benchmark. The grey line demonstrates what will happen under typical conditions – after only 6 minutes of heavy use, CPU benchmarks have dropped by ~35%.
With the liquid cooler running during these serial benchmarks, the CPU was able to maintain close to full speeds after several minutes of work. The raw data here involved a drop from 9944.6 to 9330.6, only a ~6% decrease in CPU benchmarks.
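For anyone who wants to check the percentage, the drop can be worked out from the two benchmark scores quoted above. This is just the arithmetic in Python; the function name is mine.

```python
def percent_drop(start, end):
    """Percent decrease from a starting benchmark score to a later one."""
    return (start - end) / start * 100.0


# CPU benchmark scores with the liquid cooler running (from the text above).
cooled_drop = percent_drop(9944.6, 9330.6)
print(f"drop with cooler: {cooled_drop:.1f}%")
```

That works out to roughly 6.2%, compared with the ~35% drop seen without the cooler.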
If you are seeing this post, you’ve stumbled onto something I’m working on that is incomplete. I’ve published it anyway as I work on it
This is the 2nd half of an essay I wrote for an upper division Classics course at Ohio State University. With the scarcity of recent writing on this film, I wanted to give others a chance to stumble on it. I didn’t change it much from the PDF I submitted, so sorry for any formatting errors…
The Siege of Sarajevo was a 1,425-day long siege that is among the most brutal attacks on a civilian population in recent memory. For background, there were three prominent ethnic groups in Bosnia and Herzegovina: Croats, Serbs, and Bosniaks. Ethnic tensions were at a breaking point. After the early 1990s breakup of Yugoslavia, nationalism was at a peak. In a 1991 census, “44% of the population considered themselves Muslim (Bosniak), 32.5% Serb and 17% Croat, with 6% describing themselves as Yugoslav” (Klemencic 2004). The three ethnic groups each had a vision for their state’s future. An independence referendum was decided completely along ethnic lines. As war began, Serbs fought to keep territories from claiming independence, and Croats and Bosniaks fought together against them. Eventually war broke out between Bosniaks and Croats as well.
The Serbs began an ethnic cleansing of Bosniaks, which Serbia and some others still deny to this day (several prominent Serb leaders – both Bosnian Serbs and Serbians – were found guilty of genocide in UN war tribunals). By April 1992, the rebel Serbs controlled 70% of the country and began shelling the Bosnian capital of Sarajevo (Carroll 2009). This war-torn city has become one of the most identifiable parts of the Bosnian War.
Though it may be only a few paragraphs in history textbooks on our side of the Atlantic, it was a World War-esque Hell in Bosnia. In September 1992 (in the 5th month of the siege, which would last 41 months more), William Pfaff wrote:
“They will not be conquered because a large modern industrial city of 350,000 people cannot be taken other than by a street-by-street infantry and tank assault, which is entirely beyond the abilities of the Serb militia. The actual resistance of Sarajevo’s people consists mainly of getting up each morning, going to whatever work they are doing, doing it, carrying on, finding something to eat, avoiding the snipers, and sleeping fitfully through the nighttime bombardments.” (Pfaff 1992)
Serb forces fired an average of 300 shells per day into the city. Snipers were described as shooting “anything that moved”. This ‘anything’ very much included children. Thousands died in the siege, many (although not all) by the tactical, calculated bullets of snipers.
Various accounts of their actions were brought up in UN war tribunals, however I did not find any information on snipers being indicted for war crimes. In a quest to avoid biased sources (and regarding the Bosnian War, there is no shortage of these), I found YouTube footage from a 2012 UN International Criminal Tribunal for the former Yugoslavia (ICTY) prosecution, where prosecutor Dermot Groome lays out a case against former Bosnian Serb general Ratko Mladic. I realize that a prosecutor is, by nature, a biased party, but I am drawn to the legitimacy of the UN. Here, he described the sniper fire:
“Mladic’s use of snipers in the context of the attack on the civilian population was not at all like the use of snipers in armed conflict. It was a strategy of shooting civilians from a hiding spot, giving them no warning or reasonable prospect of taking cover. It was about creating insecurity, about creating terror.” (ICTY 2012)
Groome also presents the observations of a volunteer firefighter in Sarajevo:
“The thing I noticed about certain attacks was that Serb shooters would go after the youngest in the family… in a crowd of girls, it seems the most attractive would be shot. It seems there was something very personal, almost grudge attacks, doing whatever would cause the most pain to survivors.” (ICTY 2012)
All in all, I have probably spent enough time delving into the horrors of the Siege of Sarajevo for the purposes of this essay. In short – it was Hell.
Knowing what we do about the Siege, it makes sense now that a foggy day was such a special occasion for Sarajevans in the final scenes of Ulysses’ Gaze. Under the blanket of fog, the civilians were shrouded from sniper fire. I found this quote from the archive curator, Ivo Levy, especially powerful:
“Footsteps and voices? […] the fog… I sensed it. In this city, the fog is man’s best friend. Does it sound strange? It’s because it’s the only time the city gets back to normal. Almost like it used to be. The snipers have zero visibility. Foggy days are festive days here, so let’s celebrate!”
The curator hears music in the background, and excitedly says:
“Music… oh yes. A youth orchestra… Serbs, Croats, [Bosniak] Muslims… they come out when there is a ceasefire. They go from place to place and make music in the city. How about it, shall we go out too?”
After spending several paragraphs above discussing the atrocities of the Siege, this scene is especially meaningful. In a bloody, genocidal war, this youth orchestra is symbolic and idealistic, representing the three ethnic groups coming together in a time of peace. It represents how things could be, or maybe how they used to be.
A arrives at Sarajevo after escaping Calypso’s island, notably wearing another man’s clothes. However, A’s time in Sarajevo does not end as happily as Odysseus’ journey ended in Ithaca. As he and Levy enjoy the foggy day in Sarajevo, they run into Naomi, the daughter of Levy (played by Morgenstern, the same actress who plays essentially every woman A interacts with). They dance, as she likely represents a Penelope figure. Naomi, A, and Levy begin to walk down the river, but a car of soldiers rolls up off-screen. Levy tells A to stay back as he follows his daughter to investigate. Following a Classical Greek tradition of off-screen deaths, Levy and his daughter are gunned down by soldiers behind the wall of fog. A finds their bodies and wails in a long shot where the camera backs away just enough for him to be completely obscured by the fog.
In a great stroke of luck, I was able to find a PDF of ‘Interviews’, 195 pages of combined Theo Angelopoulos interviews translated and published as a book. Here, I found the transcript of an Israeli radio show where Angelopoulos described his fascination with Sarajevo:
“My interest in politics and the Balkans is very easy to explain. Look at the history of this century and you will notice that its first momentous event took place in Sarajevo, and now, as we approach the end of the century, we are again in Sarajevo. This proves to what extent we have all failed. Living in the Balkans, I am naturally much closer to the events, and much more concerned than the rest of Europe. I wanted to shoot in Sarajevo, but couldn’t. Everything was lined up for us to go there. We were all ready to go, waiting for our plane in Ancona, when the plane that left before us was turned back because the bombing had started again.” (Interviews 2001)
The 20th century history of Bosnia began and ended with tragedies in Sarajevo. To some degree, the entire 20th century history of Europe is bookended by these tragedies. In 1914, Archduke Franz Ferdinand was assassinated in Sarajevo (Backhouse 2018). At the time of the film’s release in 1995, Sarajevo was in the midst of its siege.
This argument about the cyclical nature of the film originally came from Marinos Pourgouris, a Greek academic, and his chapter in “Mythistory and Narratives of the Nation in the Balkans”. The famous journey of Odysseus had a beginning and an end in Ithaca, and his return brings a sense of order back to Ithaca. In Ulysses’ Gaze, references are made to the cyclical nature of A’s journey. In the beginning of the film, as he has returned to Greece from America, he states, “my end is my beginning”. 
This film is far from the typical Hollywood fare I usually engage with. Angelopoulos himself has expressed a great deal of disdain for where film is heading and how Hollywood has changed cinema in Greece:
“We know that European cinema is not doing very well lately; less tickets are sold. The theatres today are no longer that privileged place of encounter between the creative artist and his audience. There is a small elitist minority still looking for that encounter, but the vast majority is favoring the American movies, which, as far as I am concerned, are not films but just images printed on celluloid.”
As I discussed on page 2 (note – this refers to part of the essay that I did not upload), the reviews for this film were highly polarized. Considering the raving endorsements and scathing critiques of Ulysses’ Gaze, I think this film is a great example of how a film can be received in such profoundly different ways based on the lived experiences of different people. Long spans of it still don’t make sense to me, but having learned more about the history of the region, I can appreciate the depth of the film. As someone who generally doesn’t understand deep, symbolic art and film, it was very exciting to take a deep dive into some of this film’s scenes as they related to Balkan history and Homer’s Odyssey.
“In any case, Greeks are a nation of emigrants. At the turn of the century, half of them went to America. There are one and a half million Greeks in the U.S. There are already 300,000 in Germany. They are everywhere, and instead of contributing to Greek economy at home, they are working for others. The Americans are coming into Greece now, claiming they wish to industrialize the country, but of course they will do it only if it is profitable for them. And Greece, for many, is now the fifty-first state of the Union.”
Question: You are implying Greece is a Third World country.
Angelopoulos: That is the way things are. The Third World is not limited to Africa and Latin America. If you ask me, it includes Greece and Turkey too. We do not belong to the West, we are not part of Eastern Europe – we live at the crossroads of modern civilization. However, we happen to occupy a strategic point in the Middle East; therefore, we are important to American politics. Had it not been the case, their attitude towards us would have been completely different.
Q: You mentioned that Ulysses’ Gaze is, among other things, a love story. But is it really about love or the impossibility of loving? At one point, your protagonist says, “I am crying because I cannot love you.”
A: That phrase is taken from Homer’s Odyssey. Ulysses remained seven years on Calypso’s island, but he would often go down to the sea and cry. For he could not love Calypso; he was always thinking of Penelope. He wanted to love her, but couldn’t. As a matter of fact, at the end of my film, the hero meets once again his first love. It’s a film about firsts – first love, first look, the initial emotions that will always be the most important in one’s life.
: Available from Amazon here – https://www.amazon.com/dp/1578062160
: timestamp 13:08
Andersen, Odd. A Young Boy Plays 22 April 1996 on a Tank. April 22, 1996. Accessed December 5, 2018. https://www.gettyimages.co.uk/detail/news-photo/young-boy-plays-22-april1996-on-a-tank-in-the-sarajevo-news-photo/134244637.
Getty Images states “Photo credit should read ODD ANDERSEN/AFP/Getty Images”
Angelopoulos, Theo. Greece, 1997. Accessed November 29, 2018. https://www.youtube.com/watch?v=hO3b-bHmu1Q.
(Ulysses’ Gaze available on YouTube here)
Alberó, Pere. “A Gaze by Ulysses towards the Balkans.” Quaderns De La Mediterrània 23 (2016): 115-23. Accessed November 27, 2018. https://www.iemed.org/observatori/arees-danalisi/arxius-adjunts/quaderns-de-la-mediterrania/qm23/A Gaze by Ulysses towards the Balkans_Pere_Albero_QM23.pdf.
(3/26/18 – this is an unfinished post that I’ve put up so the data can be of use to others while I add to it / revise it. Also for best experience view on a desktop, the graphs are butchered on mobile)
The AAMC only has 3 scored practice exams released, leaving prospective test takers in a tough situation – to know where they stand, they have to take an AAMC scored exam. But those exams are a precious resource, as they’re the closest thing we can get to the real MCAT. If their score isn’t where they want it to be, they’re left with only 2 AAMC scored exams. Test takers of the past had 10 AAMC exams, but the post-2015 MCAT renders those practice tests worthless. Here I attempt to predict MCAT scores using 3rd party practice exam scores and user-submitted data from Reddit.
The data has been taken from this user-submitted score spreadsheet, which contains 844 user-submitted scores. The submissions come from users of the MCAT subreddit and the Student Doctor Network forums. I only included individuals who took the MCAT between January and September 2017.
There is a tremendous amount of self-reporting bias in this data, which I’ll touch on at the end. Impossible scores were thrown out (one user reported a ‘406’ on NextStep Exam 1, which, like the real MCAT, is scored 472-528). I also excluded data from one other user who reported a 472 on the real exam after reporting 505, 504, 509, and 509 on NextStep Exams 1-4, and a 509 and 510 on AAMC #1 and #2. I don’t know if it’s possible to drop the ball that badly on test day so I’m calling it a fluke. These were the only two scores excluded.
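The range check used to throw out impossible scores can be sketched in a few lines of Python. The list below reuses a handful of the values quoted above purely as an illustration; the names are mine and this is not the actual spreadsheet workflow.

```python
# Valid scoring range shared by the real MCAT and the NextStep exams.
MCAT_MIN, MCAT_MAX = 472, 528


def is_valid_score(score):
    """True if a reported score falls inside the possible 472-528 range."""
    return MCAT_MIN <= score <= MCAT_MAX


# Illustrative reports: the impossible '406' plus a few plausible scores.
reported = [406, 505, 504, 509]
kept = [score for score in reported if is_valid_score(score)]
print(kept)
```

Note that a range check like this only catches impossible values. The second exclusion described above (a 472 on test day after practice scores in the 500s) is within range, so it was a judgment call rather than something a filter would flag.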
The short story is – Kaplan’s scores are heavily, heavily, deflated, but still have predictive power. As an extremely crude conversion, you can add 10 points to your Kaplan score to get your AAMC score. This becomes less predictive at the upper and lower extremes. Kaplan and NextStep had the strongest correlation to actual MCAT scores, though this isn’t necessarily saying they’re the best practice material. It does mean that their scaling is the most accurate.
NextStep exams were slightly less deflated than Kaplan, but they had a similarly tight distribution (r² of .536). An extremely crude conversion factor would be to add 7 to your NextStep average to estimate your actual score. NextStep seems to pride themselves on giving accurate scaled scores, which makes me wonder why theirs are still so deflated. That said, no single ‘crude conversion’ fits NextStep scores well, as they are less deflated for average scores (498-510) and heavily deflated for high scores (510+).
The Princeton Review (n=190):
Princeton Review’s exams are absurdly deflated. The average person who scores a 503 on a TPR exam gets a 518 on the real exam. Princeton Review’s exams had the worst correlation to actual MCAT scores (this graph is an unfinished stand-in for one I’ll post later but the data should be the same).
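The crude conversions discussed above can be written as a small lookup in Python. The Kaplan (+10) and NextStep (+7) offsets are the rough figures given earlier; the Princeton Review offset of +15 is an assumption I am inferring from the 503 to 518 example, not a fitted value. The function and dictionary names are hypothetical, and the result is clamped to the 472-528 scoring range.

```python
# Crude additive offsets. Only Kaplan and NextStep come from the text;
# the Princeton Review value is an assumption inferred from 503 -> 518.
CRUDE_OFFSETS = {"kaplan": 10, "nextstep": 7, "princeton_review": 15}
MCAT_MIN, MCAT_MAX = 472, 528


def crude_estimate(company, practice_average):
    """Very rough actual-score estimate from a third-party practice average."""
    estimate = practice_average + CRUDE_OFFSETS[company]
    # Keep the estimate inside the possible MCAT range.
    return max(MCAT_MIN, min(MCAT_MAX, estimate))


print(crude_estimate("kaplan", 505))
print(crude_estimate("princeton_review", 503))
```

A flat offset like this is least trustworthy at the upper and lower extremes, where the deflation is not constant, so treat the output as a ballpark and nothing more.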
So why are these tests so deflated? Part of it is my skewed data set (see next section). But I think it’s primarily because the test prep companies prefer someone scoring 505 on their practice exams and a 515 on the actual MCAT, rather than the opposite. Their “100% money back guarantees” rely on you outperforming your practice test score. Kaplan’s “Higher Score Guarantee” program will only redeem if your actual MCAT score is below your diagnostic score. They’ve built a ~10 point cushion into the scaling of their practice exams to ensure this won’t be redeemed often. Kaplan is the most popular MCAT prep company, and they have a treasure trove of student data that could be used to give accurate scores, if they desired.
Princeton Review offers a similar program, but the baseline score can be either your previous actual MCAT score (if taken within 90 days of the start of the review course) or the Princeton Review diagnostic exam taken at the beginning of your course. If you aren’t a retaker, and the latter is used, there is almost no chance you fail to beat that score.
I don’t think it’s a coincidence that NextStep, a prep company without a full-refund ‘Better Score Guarantee’, doesn’t deflate their practice test scores as heavily. They do have a guarantee for their tutoring, but it only redeems a free 2-hour tutoring session.
Personally, I was dejected when I got a 500 on my first Princeton Review practice exam. In reality, most people who score that do fine on the actual exam. Had I not worked with this data, I would have thought I was on track for a 500 on the real exam – a score that has a dreadful mean acceptance rate of 22.3%, per the AAMC. Princeton Review at no point notes how deflated their scores are, and many students probably don’t realize this. I now understand it’s their way of protecting their revenue while still offering a money-back score guarantee.
When I have more time, I’d like to add more test prep companies (Altius and ExamKrackers), and compare individual sections to actual MCAT section scores to see if any companies have a notably higher correlation on one subsection.
Among the many shortcomings of this data, the most damning is the fact that the 3rd party test prep companies can change their scaling algorithms anytime without warning (and they do). NextStep has stated via email that they are constantly fine-tuning their algorithm, with a major overhaul in January 2017. Kaplan appears to do the same (as discussed above, I think NextStep may actually desire to be accurate in their score reporting and I imagine they adjust more frequently than other companies).
Then there are all the biases that played into the data I had:
I am betting that users visiting the MCAT subreddit and other pre-medical internet forums are more dedicated than the average test taker.
Massive self-reporting bias – people are more likely to submit their scores if they’re impressive (the average score in my data was an absurdly high 515).
These practice tests are expensive, and the demographics of these test takers are almost certainly skewed towards rich folk.
3rd party exam scores were averaged and no consideration was given to how many exams they took. An individual who took 10 Kaplan exams and got 510 each time was treated exactly the same as an individual who took 1 Kaplan exam and got a 510, even though I suspect the person who took 10 exams is better prepared to score well on the actual exam. I’ll look into this one when I have more time but I’m guessing it’s less important than the scores themselves.
All in all, this is an extremely imperfect science with many pitfalls, but it was able to predict my score within two points, and serves as a guide for those wondering where they stand without any AAMC exam scores in hand.
1: Applicants with MCAT scores between 498 and 501 had an acceptance rate of 22.3% (1,241 / 5,571) for 2016-2017 and 2017-2018, from AAMC’s Table A-23.