Youyang Gu

Data Scientist

Six Months Later

September 28th, 2020

It’s been six months since I made my first COVID-19 projections. What started as a small side project became a months-long endeavor. 180 days and 180 forecasts later, the pandemic shows no signs of abating as we head towards winter. When I started, there were only a handful of existing models, and very few of them were accurate. Now, there are over 30 models on the CDC Forecasting page. A lot of great progress has been made in the modeling space over the past six months, and I hope others will get to know these other models better in the months to come. After much consideration, I have come to the difficult decision not to extend my projections beyond November 1, 2020. I plan to make the last forecast update on Monday, October 5. This was undoubtedly a tough choice for me, and I hope to convey my thoughts in this post.

Winding Down

Several reasons went into my decision, which I describe below:

  • Back in March and April, I was concerned by the lack of high-quality models being cited in the news and media. The numbers being referenced ranged widely from 60,000 deaths to 2,200,000 deaths in the US by August. My goal was to create a more realistic and accurate model, and hence this project was born. Looking back now, I believe I was able to achieve what I had set out to accomplish. In the months since, several other reliable models have emerged. Hence, I believe this is an appropriate time for me to wind down.
  • In the beginning, the majority of my time was spent on building a model from scratch: learning about infectious diseases, incorporating a machine learning layer, and iteratively learning the various epidemiological parameters. As time went on and my model became more mature, the focus of the work changed. Lately, a lot of my time has been spent making minor adjustments and tweaks to refine the model’s performance. I feel that I am now spending more time maintaining the model than making new advancements, which is something I hope to change.
  • As one can imagine, building and maintaining a COVID-19 model takes a lot of time and effort. Many models have entire groups dedicated to the project, as well as the funding and resources necessary to continue it for the foreseeable future. Unfortunately, I do not fall under this category. Since Day 1, I have been the sole author of the model and have not relied on any external funding. The only things I used to build this model were a laptop, a Twitter account, and $20 to buy the domain name. As much as I would like to continue working on this project, my current setup is not sustainable or scalable in the long run.
  • Looking at COVID-19 data on a daily basis for the past six months has been exhausting. Taking a step back would allow me to explore new ideas.

With that said, I firmly believe that the modeling community is in good hands. Below, I will present a few models that I have found to be the most reliable.

Model Alternatives

I know this news is disappointing for the many people who have been closely following my model over the past few months, so I want to provide a few reputable alternatives. It’s important that we focus on models which have a proven track record and not just those that have generated the biggest headlines. No single model is perfect, which is why I believe it is important to look at different models and understand the assumptions of each one in order to interpret the forecasts. For this reason, I recommend the COVID-19 Forecast Hub, which aggregates forecasts from over 30 models and sends them to the CDC each week to help inform public health decision making.

From the Forecast Hub, below is a list of models that I have found to be the most reliable over the past few weeks and months. You can find a visualization of all the models listed below here. In addition to forecasting reported deaths, the models below also have forecasts for confirmed cases. The UCLA, COVIDAnalytics, USC and LANL models also have forecasts for international countries.

  • COVIDhub Ensemble - An aggregation of the forecasts of ~30 models submitted to the COVID-19 Forecast Hub. The combined forecast is then published on the CDC website. You can find the pre-print here. Because it is able to combine the forecasts of so many models, it is more accurate than any single model alone. Hence, if one were to only use one model, this would be the one to use.
  • UMass Amherst - An early model that has consistently performed well since its release in May. It is made by the Reich Lab, the same group that runs the COVID-19 Forecast Hub. The downside is that it only forecasts 4 weeks out and has no visualizations (other than on the Forecast Hub).
  • UCLA - Another early model that has consistently performed well. It also has estimates for the reproduction number (Rt). The visualizations are well-done.
  • Oliver Wyman - A new model released in June that quickly became one of the top-performing models. It is one of the few other models to have estimates of true infections. It only has public forecasts 4 weeks into the future.
  • COVIDAnalytics (MIT DELPHI) - A top-performing model for US nationwide forecasts.
  • USC - A new model released in July that has made great improvements over the past few weeks. It is one of the few other models to make daily updates.
  • Los Alamos National Lab (LANL) - One of the top-performing models from April-July, but has been under-forecasting recently.
  • London School of Hygiene & Tropical Medicine - While its forecasts are unproven, it is one of the few other models to have US and global Rt estimates.
  • Other up-and-coming Forecast Hub models that have performed well thus far: CMU, LNQ, JCB
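
To give a rough sense of why combining models helps, here is a minimal Python sketch of a median ensemble. The model names and numbers are made up for illustration, and taking the median is an assumption on my part, not necessarily how the COVIDhub Ensemble is actually computed.

```python
from statistics import median

# Hypothetical point forecasts of cumulative US deaths four weeks out.
# These names and values are illustrative only, not real model output.
forecasts = {
    "model_a": 215_000,
    "model_b": 222_000,
    "model_c": 209_000,
    "model_d": 240_000,
    "model_e": 218_000,
}


def ensemble_median(point_forecasts):
    """Combine point forecasts by taking their median (an assumed method)."""
    return median(point_forecasts.values())


def absolute_errors(point_forecasts, observed):
    """Absolute error of each forecast against an observed value."""
    return {name: abs(value - observed) for name, value in point_forecasts.items()}


combined = ensemble_median(forecasts)
print(f"Ensemble forecast: {combined:,}")
```

The median ignores how far the highest and lowest forecasts stray, so a single model that is badly off (like the hypothetical `model_d` here) does not drag the combined forecast with it.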

I highly recommend that those who have been following my work take some time to study the aforementioned models. I have personally spoken to most of the groups and have listened to their presentations. I can attest to their proven track record and hope they can continue to provide reliable forecasts in the weeks and months to come. When viewed in tandem, these models can help provide a clearer picture of what will most likely happen in the upcoming weeks. While these forecasts are not crystal balls, I believe they can be very useful tools for researchers and policy makers.

The above list is not necessarily an exhaustive list of reliable models. You can learn more about my weekly evaluations of the different models here. I hope to continue updating these model evaluations in the near future.

What’s Next

Ending my model forecasts does not mean that my work on COVID-19 is over. This decision will allow me to dedicate my freed-up time to other areas of COVID-19 data analysis. In this day and age, misinterpretation of data (both intentional and unintentional) is pervasive. Anyone can cherry-pick data to support his or her narrative. My goal is to continue presenting COVID-19 data in a rigorous, unbiased manner. Follow me on Twitter at @youyanggu to stay up to date with my latest analysis.

I am forever grateful to have the support of so many people from across the US and around the world. I want to thank everyone who believed in my work from the early days, especially Nicholas Reich, his group at UMass Amherst, and the scientists at the CDC. I also want to thank all the scientists, researchers, and everyone else with whom I’ve had the pleasure of interacting online; at a time when in-person contact has been limited, these interactions have been tremendously helpful. I feel honored to be able to contribute to the scientific community in improving our understanding of the disease. This was certainly not something that I, a data scientist with no background in infectious diseases, expected just a year ago. I haven’t always been right, but I’m thankful to be part of a community that is constantly helping me learn.

I am currently working on a piece that outlines the things I have learned over the past six months. I hope to post it here in the next week or two. Stay tuned!

Get in Touch

While I will no longer be making public forecasts, I hope to continue to be involved in the forecasting space in some shape or form. If you are interested in hearing more about my work, feel free to send me a message.

I still don’t know what the future holds. I am always open to new challenges and projects, especially those involving the use of data-oriented modeling to tackle public health problems. If you have any suggestions or ideas, please don’t hesitate to reach out to me using the contact button below. I would love to get in touch.

Contact Me

In the meantime, let’s all work together to continue fighting this pandemic. Each one of us can make a difference.

- Youyang
