Youyang Gu

Data Scientist

One Year Later

February 26th, 2021

It’s hard to fathom that it’s been almost one year since I launched and five months since I first started making daily US infection estimates. I began estimating true infections in November 2020 because I couldn’t find any good models that were doing that in real time during a critical moment in the pandemic (though there were 30+ models for forecasting deaths). While there are still a lot of uncertainties today, things have improved significantly over the past two months. Our main data source, the COVID Tracking Project (CTP), is stopping data collection on March 7. After much thought, I have decided to also release the last daily update on Sunday, March 7.

Winding Down (one more time)

Below are the main reasons why I believe ending the daily updates is the right path forward:

  • As the COVID Tracking Project (CTP) reiterates in their announcement: “the work [of] compiling, cleaning, standardizing, and making sense of COVID-19 data from 56 individual states and territories is properly the work of federal public health agencies”. In the same spirit, the work of modeling and forecasting COVID-19 is properly the work of epidemiologists and the greater public health community. I am hopeful that there is considerable progress being made to tackle the country’s massive deficits in public health data and modeling infrastructure. We are more prepared now than we were a year ago. As an “outsider”, it is important for me to come to terms with the fact that a prolonged stay in this space would hamper my ability to have fresh perspectives and bring forth new ideas.
  • I use daily case and testing data from CTP to estimate true infections. While this generates a pretty good heuristic that can be easily updated on a daily basis, it is not without flaws. Even though we are a year into the pandemic, there are still no standardized methods across states for the reporting of cases and tests, which can lead to skewed estimates for a model that’s agnostic to these differences (more detail here). The simplicity of my model makes it particularly helpful for real-time use cases, such as during a pandemic. But as the pandemic nears an end and normal life resumes, I expect better, more robust methods will be developed to estimate true prevalence in each region.
  • From a feasibility side, many challenges exist in integrating alternate data sources to replace the soon-to-be-decommissioned CTP data. The federal testing data provided by the US Department of Health and Human Services (HHS) contain many discrepancies with the existing CTP data. Reconciling the two sources does not appear to be a straightforward task, making a simple “drop and replace” of the data difficult.

For our Path to Normality page, I will also end the daily updates on vaccination progress based on CDC data. I started the “Path to Herd Immunity” page in December 2020 because I felt that there were too many unfounded claims and misinformation being floated around regarding the vaccine (e.g. calculations that completely ignore immunity acquired from natural infection). The CDC also did not publish any vaccination data at the time. This has changed significantly in the months since. You can now see daily snapshots of the latest vaccination progress, vaccination time series, and vaccinations by demographics or long-term care facilities. This has made my work somewhat redundant and obsolete (in a good way).

That said, I plan to continue to update the “Path to Normality” plots on an as-needed basis. When I launched the page in early December, I estimated a return to normal in Summer 2021, which initially faced criticism from both sides as being either too optimistic or too pessimistic. Almost three months later, it is seeming more and more likely that normality may indeed return in the summer.

Data & Alternatives

For a list of resources that I have found to be useful over the past few months, click here.

What’s Next

I will not be completely ending my work on COVID-19. Instead, I will be shifting some of my efforts to better understand and model emerging COVID-19 variants, especially the B.1.1.7 variant that was first identified in the UK. There have been a lot of misleading communications in recent weeks regarding the future of the variant, as I alluded to here. I hope to spend more time analyzing the data surrounding these variants.

In the meantime, I believe it is critical that we prioritize 1) vaccination rollouts, especially in the neighborhoods that are most at risk, 2) the reopening of schools, and 3) the expansion of rapid testing as a bridge to normal activities.

Be sure to follow me on Twitter at @youyanggu to stay up to date with my latest thoughts and findings. I am always open to new challenges and projects, especially in the area of making public health efforts more data-driven. If you have suggestions, let’s talk.

My Advice to Others

The most frequent question I receive is how I was able to do what I did with no background in infectious disease modeling. This is my advice for others, especially my fellow millennials and Gen Zers: You don’t need decades of experience to be able to think critically and adapt to new information. In fact, being an outsider and bringing in a fresh perspective can often be an advantage. In this digital age where information is so readily available, don’t let a lack of domain expertise or experience deter you from pursuing what interests you! Don’t be afraid to ask questions and challenge the status quo - innovation has always come from non-traditional approaches and non-traditional individuals.

The road won’t always be easy. For example, when I first launched in early April 2020, I emailed/messaged dozens of reporters and scientists, and none of them got back to me. I thought about giving up the project but decided to keep going for two more weeks. During that time my model performed very well, and my efforts on social media finally paid off as Twitter became influential in helping me get the word out. That same week, my model was added to the CDC website with a little help from the Reich Lab. This experience taught me that progress is not always linear, so please be patient!

Final Words

My goal when I started was to create the most accurate COVID-19 model. I continued the project for the next year because I saw a striking need for an unbiased, apolitical, data-driven take on the pandemic. I feel incredibly fortunate to be in the situation I am in today. I have made countless mistakes, but each one has taught me to be a better scientist. I’m thankful that I was able to use my skillset to help improve our understanding of this pandemic. A year ago, I could not have remotely imagined that I would be where I am today. I am also extremely grateful for the work of scientists whose dedicated research I’ve relied on over the past year. Science is and will always be a collaborative effort - no single person’s work stands alone.

The next few months are still somewhat uncertain, but the finish line is in sight and I look forward to emerging on the other side.

If you have any questions, suggestions, or comments, don’t hesitate to drop me a message.

- Youyang
