Monday, April 27, 2020

Youyang Gu

He made a machine-learning model in a week and ran it daily on his laptop (it only took an hour), generating remarkably accurate covid-19 predictions.

Siobhan Roberts

April 27, 2021

The data scientist Youyang Gu thinks of himself as a realist—he declares it in his Twitter profile: “Presenter of unbiased takes. Realist.”

When he noticed the scattershot covid-19 projections last spring—one model projected 2 million US deaths by the summer, another predicted 60,000—Gu questioned whether that was as good as the modeling could be. He decided to take a shot at making a covid-19 model himself. “My whole entire goal was to produce the most accurate model possible,” Gu says, from his apartment in Manhattan. “No ‘if this’ or ‘if that.’ Basically, no ‘ifs.’ It doesn’t really matter what the scenarios are. I just wanted to lay it out: ‘This is the most likely or realistic forecast for what’s going to happen.’”

Within a week, he’d built a machine-learning model and launched his COVID-19 Projections website. He ran the model every day—it only took one hour on his laptop—and posted covid-19 death projections for 50 US states, 34 counties, and 71 countries.

By the end of April, he was attracting attention—ultimately, millions checked his website daily. Carl Bergstrom, a professor of biology at the University of Washington, took notice and commented on Twitter that Gu’s model was “making predictions that seem as good as any I’ve seen.”

“I can be a bit of an ML skeptic. But in this case, don’t let the ‘machine learning’ text fool you into thinking this is snake oil,” Bergstrom tweeted.

An MIT grad with a master’s degree in electrical engineering and computer science (plus a degree in math), Gu, 27, had been working on a sports analytics startup when the pandemic hit. But he put that venture on pause as major league sports shut down. And then, by simply googling “epidemiology,” he began his foray into covid-19 modeling.

“I had zero background in infectious-disease modeling,” he says. But he did have a few years’ experience as a data scientist in finance, working with statistical models—models that, based on certain statistical assumptions, analyze data and make projections about, say, where the price of a stock will be in the future.

“It turns out that a lot of infectious-disease modeling is basically statistical modeling,” says Gu. And the finance industry’s profit-driven goal for accuracy served him well in the epidemiological domain. “If you can’t make an accurate model in finance, you won’t have a job anymore,” he says. By contrast, the goal in academia—from Gu’s perspective, at least—is not so much to make accurate models, but rather to publish papers and inform public policy. “That’s not to say they don’t make accurate models—just that they don’t optimize specifically for accuracy,” he says.

Gu’s model combines machine learning with a classic infectious-disease simulator called an SEIR model (factoring in individuals in the population who are susceptible, exposed, infectious, recovered, or removed due to death).

The SEIR component uses as input a simulated set of parameters—a best-guess range for variables such as the basic reproduction number (the rate at which new cases arise in an entirely susceptible population at the start of an outbreak, before interventions or immunity), infection rate, lockdown date, reopening date, and effective reproduction number (the rate at which new cases arise after some interventions). In terms of outputs, the SEIR simulator first computes the infections over time, and then computes the deaths (multiplying infections by the infection fatality rate).
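The two-step output described above (infections first, then deaths as infections times the infection fatality rate) can be sketched as a toy daily-step simulator. This is a minimal illustration, not Gu's code: the function name `simulate_seir`, the five-day incubation and infectious periods, the 20-day lag between infection and death, and the initial count of 100 infectious people are all assumptions made for the example.

```python
def simulate_seir(population, r0, r_after, lockdown_day, ifr, days,
                  incubation_days=5.0, infectious_days=5.0,
                  death_lag_days=20, initial_infectious=100.0):
    """Toy daily-step SEIR run; returns (new_infections, deaths) per day.

    All default values are illustrative assumptions, not fitted numbers.
    """
    sigma = 1.0 / incubation_days   # daily rate of leaving the exposed state
    gamma = 1.0 / infectious_days   # daily rate of leaving the infectious state
    s = population - initial_infectious
    e, i, r = 0.0, initial_infectious, 0.0
    new_infections = []
    for day in range(days):
        # The reproduction number switches once the intervention starts.
        rt = r0 if day < lockdown_day else r_after
        beta = rt * gamma
        newly_exposed = beta * s * i / population
        newly_infectious = sigma * e
        newly_removed = gamma * i
        s -= newly_exposed
        e += newly_exposed - newly_infectious
        i += newly_infectious - newly_removed
        r += newly_removed
        new_infections.append(newly_exposed)
    # Deaths: infections times the IFR, shifted by the assumed lag.
    deaths = [0.0] * days
    for day in range(death_lag_days, days):
        deaths[day] = ifr * new_infections[day - death_lag_days]
    return new_infections, deaths


infections, deaths = simulate_seir(
    population=1_000_000, r0=2.5, r_after=0.8,
    lockdown_day=30, ifr=0.01, days=120)
print(f"peak daily infections: {max(infections):.0f}")
print(f"total deaths over 120 days: {sum(deaths):.0f}")
```

In this sketch a single switch from `r0` to `r_after` stands in for the lockdown; a reopening date would be one more switch of the same kind.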

Gu’s machine-learning layer then generates thousands of different combinations for those parameter sets in trying to find the real-life parameters for each geographical region. It learns which parameters generate the most accurate death projections by comparing the SEIR predictions with real data on daily deaths from Johns Hopkins University. “It tries to learn what parameter sets generate deaths that most closely match the actual observed data, looking back,” says Gu. “And then it uses those parameters to forecast and make projections about deaths into the future.”
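The search Gu describes, trying many parameter sets and keeping the one whose simulated deaths best match the past, can be sketched roughly as follows. This is a hedged illustration under several assumptions: plain random search and a squared-error score stand in for whatever Gu's layer actually did, the parameter ranges are invented, the compact simulator and its constants are hypothetical, and the "observed" series is synthetic (generated from known parameters) rather than the Johns Hopkins data.

```python
import random


def simulate_deaths(r0, r_after, lockdown_day, days, ifr=0.01,
                    population=1_000_000, death_lag_days=20):
    """Compact toy SEIR run returning daily deaths (all constants assumed)."""
    sigma = gamma = 0.2  # assumed five-day incubation and infectious periods
    s, e, i = population - 100.0, 0.0, 100.0
    infections = []
    for day in range(days):
        rt = r0 if day < lockdown_day else r_after
        newly_exposed = rt * gamma * s * i / population
        # Tuple assignment: the right-hand side uses the previous day's values.
        s, e, i = (s - newly_exposed,
                   e + newly_exposed - sigma * e,
                   i + sigma * e - gamma * i)
        infections.append(newly_exposed)
    lagged = [0.0] * death_lag_days + [ifr * x for x in infections]
    return lagged[:days]


def squared_error(predicted, observed):
    return sum((p - o) ** 2 for p, o in zip(predicted, observed))


def fit_by_random_search(observed, n_trials=2000, seed=0):
    """Sample parameter sets and keep the one that best matches past deaths."""
    rng = random.Random(seed)
    best_params, best_error = None, float("inf")
    for _ in range(n_trials):
        params = {
            "r0": rng.uniform(1.5, 4.0),
            "r_after": rng.uniform(0.5, 1.5),
            "lockdown_day": rng.randint(10, 50),
        }
        predicted = simulate_deaths(days=len(observed), **params)
        error = squared_error(predicted, observed)
        if error < best_error:
            best_params, best_error = params, error
    return best_params, best_error


def forecast(params, fitted_days, horizon_days):
    """Re-run the simulator past the fitted window and return the future part."""
    full = simulate_deaths(days=fitted_days + horizon_days, **params)
    return full[fitted_days:]


# Synthetic "observed" series from known parameters, standing in for real data.
observed = simulate_deaths(r0=2.5, r_after=0.9, lockdown_day=30, days=90)
best_params, best_error = fit_by_random_search(observed)
future = forecast(best_params, fitted_days=len(observed), horizon_days=14)
print(best_params, round(best_error, 2), [round(d, 1) for d in future[:3]])
```

The only input to the fit is the past deaths series, which mirrors the "past deaths to predict future deaths" design; everything else is a parameter the search is free to vary.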

The forecasts proved remarkably accurate. For instance, on May 3, he made an appearance on CNN Tonight and shared his model’s projections that the US would reach 70,000 deaths on May 5, 80,000 deaths on May 11, 90,000 deaths on May 18, and 100,000 deaths on May 27. On May 28, he tweeted, “covid19-projections.com got all 4 dates exactly correct.” With some rounding, that was true.

The model wasn’t perfect, of course, but it impressed Nicholas Reich, a biostatistician and infectious-disease researcher at the University of Massachusetts, Amherst, whose lab, in collaboration with the US Centers for Disease Control and Prevention, aggregates results from about 100 international modeling teams. Among all the aggregated models, Reich observed, Gu’s model was “consistently among the top.”

On October 6, Gu posted his final death forecast, just before the fall wave. The model projected there would be 231,000 deaths in the US by November 1. The total recorded by that date: 230,995.

Gu shut down his first model in early October because by then there were lots of teams doing good death forecasts. He turned instead to modeling true infections versus reported infections. And then in December he started tracking vaccine rollout and the elusive “path to herd immunity”—which in early 2021 he revised to “path to normality.” Whereas herd immunity is achieved when a sufficient portion of a population is immune to the virus, thus curtailing further spread, Gu defines normality as “the lifting of all covid-19-related restrictions for the majority of US states.”

“It became clear that we’re not going to reach herd immunity in 2021, at least definitely not across the whole country,” he says. “And I think it’s important, especially if you’re trying to instill confidence, that we make sensible paths to when we can go back to normal. We shouldn’t be pegging that on an unrealistic goal like reaching herd immunity. I’m still cautiously optimistic that my original forecast in February, for a return to normal in the summer, will be valid.”

In early March, he packed up shop entirely—he figured he’d made what contribution he could. “I wanted to step back and let the other modelers and experts do their work,” he says. “I don’t want to muddle the space.”

He’s still keeping an eye on the data, doing research and analysis—on the variants, the vaccine rollout, and the fourth wave. “If I see anything that’s particularly troubling or worrisome that I think people aren’t talking about, I’ll definitely post it,” he says. But for the time being he is focusing on other projects, such as “YOLO Stocks,” a stock ticker analytics platform. His main pandemic work is as a member of the World Health Organization’s technical advisory group on covid-19 mortality assessment, where he shares his outsider’s expertise.

“I’ve definitely learned a lot this past year,” Gu says. “It was very eye-opening.”

Lesson #1: Focus on fundamentals

“From the data science perspective, my models have shown the importance of simplicity, which is often undervalued,” says Gu. His death forecasting model was simple in not only its design—the SEIR component with a machine-learning layer—but also its very pared-down, “bottom-up” approach regarding input data. Bottom-up means “start from the bare-bones minimum and add complexity as needed,” he says. “My model only uses past deaths to predict future deaths. It doesn’t use any other real data source.”

Gu noticed that other models drew on an eclectic variety of data about cases, hospitalizations, testing, mobility, mask use, comorbidities, age distribution, demographics, pneumonia seasonality, annual pneumonia death rate, population density, air pollution, altitude, smoking data, self-reported contacts, airline passenger traffic, point of care, smart thermometers, Facebook posts, Google searches, and more.

“There is this belief that if you add more data to the model, or make it more sophisticated, then the model will do better,” he says. “But in real-world situations like the pandemic, where data is so noisy, you want to keep things as simple as possible.”

“I decided early on that past deaths are the best predictor of future deaths. It’s very simple: input, output. Adding more data sources will just make it more difficult to extract the signal from the noise.”

Lesson #2: Minimize assumptions

Gu considers that he had an advantage in approaching the problem with a blank slate. “My goal was to just follow the data on covid to learn about covid,” he says. “That’s one of the main benefits of an outsider’s perspective.”

But not being an epidemiologist, Gu also had to be sure that he wasn’t making incorrect or inaccurate assumptions. “My role is to design the model such that it can learn the assumptions for me,” he says.

“When new data comes along that goes against our beliefs, sometimes we tend to overlook that new data or ignore it, and that can cause repercussions down the road,” he notes. “I certainly found myself falling victim to that, and I know that lots of other people have as well.”

“So being aware of the potential bias that we have and recognizing it, and being able to adjust our priors—adjusting our beliefs if new data disproves them—is really important, especially in a fast-moving environment like what we’ve seen with covid.”

Lesson #3: Test the hypothesis

“What I’ve seen over the last few months is that anyone can make claims or manipulate data to fit the narrative of what they want to believe in,” Gu says. This highlights the importance of simply making testable hypotheses.

“For me, that is the whole basis of my projections and forecasts. I have a set of assumptions, and if those assumptions are true, then this is what we predict will happen in the future,” he says. “And if the assumptions end up being wrong, then of course we have to admit that the assumptions we make are not true and adjust accordingly. If you don’t make testable hypotheses, then there is no way to show whether you are actually right or wrong.”

Lesson #4: Learn from mistakes

“Not all the projections that I made were correct,” Gu says. In May 2020, he projected 180,000 deaths in the US by August. “That is much higher than we saw,” he recalls. His testable hypothesis proved incorrect—“and that forced me to adjust my assumptions.”

At the time, Gu was using a fixed infection fatality rate of approximately 1% as a constant in the SEIR simulator. When in the summer he lowered the infection fatality rate to about 0.4% (and later to about 0.7%), his projections returned to a more realistic range.
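One way to see why this constant matters, assuming the simple relation described earlier (deaths equal infections times the infection fatality rate): for a given number of recorded deaths, a lower rate implies many more past infections. The snippet below is illustrative arithmetic only, not Gu's code; the helper name `implied_infections` is hypothetical and 100,000 is used purely as a round number.

```python
def implied_infections(deaths, ifr):
    """Infections implied by a death count, assuming deaths = infections * IFR."""
    return deaths / ifr


deaths = 100_000  # round number chosen for illustration
for ifr in (0.01, 0.004):
    infections = implied_infections(deaths, ifr)
    print(f"IFR {ifr:.1%}: about {infections:,.0f} implied infections")
```

Under this assumption, moving from 1% to 0.4% multiplies the implied infections by 2.5, which changes how much of the population a model treats as already infected.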

Lesson #5: Engage critics

“Not everyone will agree with my ideas, and I welcome that,” says Gu, who used Twitter to post his projections and analysis. “I try to respond to people as much as I can, and defend my position, and debate with people. It forces you to think about what your assumptions are and why you think they are correct.”

“It goes back to confirmation bias,” he says. “If I am not able to properly defend my position, then is it really the right claim, and should I be making these claims? It helps me understand, by engaging with other people, how to think about these problems. When other people present evidence that counters my positions, I have to be able to acknowledge when I may be incorrect in some of my assumptions. And that has actually helped me tremendously in improving my model.”

Lesson #6: Exercise healthy skepticism

“I am now much more skeptical of science—and it’s not a bad thing,” Gu says. “I think it’s important to always question results, but in a healthy way. It’s a fine line. Because a lot of people just flat-out reject science, and that’s not the way to go about it either.”

“But I think it’s also important to not just blindly trust science,” he continues. “Scientists aren’t perfect.” It is appropriate, he says, if something doesn’t seem right, to ask questions and find explanations. “It’s important to have different perspectives. If there is anything we’ve learned over the past year, it’s that no one is 100% right all the time.”

“I can’t speak for all scientists, but my job is to cut through all the noise and get to the truth,” he says. “I’m not saying I’ve been perfect over this past year. I’ve been wrong many times. But I think we can all learn to approach science as a method of finding the truth, rather than the truth itself.”

.................................................................................................

help! with top 20 Economist challenges - eg why china is world leader to partner on all youth's sustainability goal www.economistchina.net

what's purpose of spending thousands times more on communications technologies than 1948 unless health and happiness for all 10 times more affordable

when you look at the proposed 12 supercities of sustainability, i wonder what use dc-baltimore unless it leads on health

also published in 1984 2025 report by norman and chris macrae- timelined how as an integral system a global village world could only result in 2 opposite end games - our stories on positive ways forward clarified opposite risks -

chapter 20 will optimistic economics lead local-global space 1984-2024

i note bloomberg march 2020 refers to a half time 2004 report on doomsday scenario of communities not being prepared to be resilient to virus

For the prognosticators on the U.S. National Intelligence Council who sat down in 2004 to consider what the world might look like in 2020, the answer hinged heavily on one big question: What did the future of globalization look like?

Their answer: Not great.

By 2020, they predicted, globalization would face a political backlash in a world increasingly plagued by identity politics. Yet if anything was going to really derail economic integration, it would likely be the mass spread of a virulent new disease.

“Short of a major global conflict, which we regard as improbable, another large-scale development that we believe could stop globalization would be a pandemic,” the council warned in a report laying out the findings of its “Project 2020.” A death toll in the millions and a virus that “put a halt to global travel and trade during an extended period” would certainly leave globalization “endangered.”

Just a bit over two months into 2020 and it’s not hard to make the case for why that rings true.

There is an alternative view that holds globalization may actually be a lot more resilient today than it seemed in 2004, in the halcyon days before smartphones had taken over our lives.

But what would it take in the months ahead to get to Doomsday for globalization? It all hinges on the reaction from policy makers to the coronavirus crisis. So here are three things to watch for. If these happen, we should be ready for the shape-shifting in globalization we’ve seen in recent years to morph into a deep freeze.

New barriers to exports. White House trade hawk Peter Navarro, in a recent Financial Times interview, criticized the export controls some countries have placed on medicines and medical supplies like face masks. His motivation may be pure. But Navarro tends to like anything that makes his argument for a shift away from globalization. So what if he used those export controls by others to argue for the U.S. to do the same? Navarro has said he wants to repatriate supply chains for national security reasons and advocated stricter controls on tech exports to China. What if he convinced President Donald Trump to ban exports of not just face masks or medicines but shipments of an eventual vaccine? And other countries followed suit? What if the controls shifted to food stockpiles?
New import restrictions. Chinese trade data for January and February pointed to the damage so far from China’s industrial shutdown last month. Exports were down 17.2% in dollar terms. But what if the U.S. and other countries started limiting imports of goods coming by air and sea not just from China but from South Korea, Italy and other affected countries? And those countries retaliated and did the same? So far the focus on supply chain vulnerabilities has focused on China. But what if all trade was deemed contaminated?
A collapse in global governance. The weekend emergence of a battle between Saudi Arabia and Russia over oil production caused crude prices to tumble dramatically on Monday. What if such discord spills to the G-7 or the G-20? What happens if, driven by fear of a virus, global economic policy makers can’t get on the same page? Or, worse, actively start working against each other in an area like, say, currencies?

Robert Hutchings, the former diplomat and Princeton academic who led the National Intelligence Council as it prepared its 2004 report, said in a recent email exchange that the point they were trying to make was “that globalization is a ubiquitous force that carries with it bad consequences as well as good.”

Ominously, he added: “We particularly wanted to argue that globalization is not irreversible.”

—Shawn Donnan in Washington

2013 has seen khanac labs spread from maths to coding to healthcare -please tell us the next billion jobs alumni app of khan labs

2014 sees first coursera of a social good summit- atlanta and 25000 youth have 22 months to work out how to turn its greatest ever youth celebration into an ongoing curriculum

help linkin Number 1 collaborations in Economics for Youth and millennium goal action networks

in 2013, The Economist celebrates its 170th anniversary as the world leading media of end hunger. Its end year xmas issue 2012 celebrated Free Education's coming of Massive Open Online Curriculum.

Quiz - what need to be the top 10 MOOCS of 2013 to get youth back to work everywhere and so that the net generation can believe in collaboration around millennium goals?

Transparency note: the last time The Economist carried as important a xmas issue contribution may have been 1976's Entrepreneurial Revolution (ER) by dad. The Economist. Saturday, 25 December 1976

ER's Ten green bottles

Breakthrough erroneous mindsets of macroeconomics before there is nothing left at all:

#1 Entrepreneurs-and good news media owners - are not political- they connect left right and centre dialogues

Verify Top 2 pro-youth economists: Norman Macrae 1923-2010 & the most exciting microeconomist of our epoch & net generation : Muhammad Yunus born 1940 ...

egs ECONOMIES OF HEALTH:

infant and maternal health services can be the world's most social and economical- benchmark bangladesh villages

wellbeing and infectious disease prevention markets ought to be worldwide and very affordable the more openly connected worldwide youth can map

markets that involve surgery are always going to be as expensive as health gets; markets depending on global pharma need a totally different constitution if they are ever to be economical

markets specialising in elderly depend on how a place's communities and family valuing structures are designed
