Proposal to Increase NSF
Funding for
Theoretical Computer Science
INTRODUCTION
AND SUMMARY
As computer science enters an era of new challenges and multidisciplinary opportunities, there is a pressing need for new approaches, new conceptual models, and unconventional ideas. Theoretical computer science (TCS) has a proven track record in providing all of these. TCS innovations such as analysis of algorithms, NP-completeness, complexity-based cryptography, use of probabilistic choices in algorithms, etc., are mainstays of CS research and education and one result is the growing popularity of quantitative reasoning and formal models in all of CS. In the past decade, TCS has invented new sub-disciplines (quantum computing), contributed new paradigms for emerging areas (web-search, internet content distribution, data-mining, computational biology etc.), and made important breakthroughs in a continuing study of intrinsic complexity, NP-completeness, randomness, efficient algorithms, networks, etc. Its long-term focus is at the root of its high impact --scientific, conceptual--- and this has benefited the strategic national interest. Finally, TCS has also had considerable commercial impact: several IT and Biotech companies --Google, Amazon, Celera Genomics, Akamai, Verity, Digital Fountain, etc.—rely crucially on good algorithms originating from TCS. In many cases their chief scientists were also TCS researchers (at Amazon, the title is “Chief Algorithms Officer”).
Looking ahead into the 21st century, security and reliability will be as
important in CS as performance was in the 20th,
and TCS is poised to do for
these issues what it did for cryptography and probabilistic algorithms
in the
20th century. Similarly, just as TCS used its unique computational perspective to forge
connections with physics (quantum
computing) and biology (computational
biology) in the late 20th century, it will do the same
with economics (a computational perspective
of economics, game theory, mechanism design, learning, etc., currently
being
developed may impact e-commerce) and neuroscience
(specifically,
understanding the brain). Beyond its direct applicability, computing
is
one of the key new conceptual paradigms of modern science. But despite TCS’s stunning success stories (some of them
told below),
computing as a scientific concept is still mostly not
understood. TCS is
not unlike pre-Einstein physics: a growing and useful body of knowledge
with a
revolution coming ahead. TCS was largely invented in the
This document suggests that TCS’s research agenda for the 21st century is vastly underfunded at NSF/CISE. The budget dedicated to TCS has remained almost stationary for a long time—it was $6.4M in 1989 and was $5.1M in 2004. (The proposed budget for 2005 is $7.2M.) Meanwhile, CISE’s budget tripled from 1990 to 2005. Though TCS’s importance has been well-recognized within CS (TCS work won 5 out of 15 Turing awards and 9 out of 19 ACM Dissertation awards since 1990; TCS researchers constitute at least 10% of CS faculty at the top 25 CS depts) the same does not appear to be true at CISE, which devotes only around 1% of its budget to TCS. Grant sizes have remained almost unchanged for two decades.
Though past applications of TCS are well
appreciated at NSF
(e.g., computational genomics and quantum computing are now funded out
of a
special program, and total NSF funding for quantum computing exceeds
$16M/yr)
there appears to be an under-appreciation of the danger that such
exciting
applications will stop in the future unless the long-term TCS agenda is
also pursued.
The dominant application-driven funding model will hamper future TCS
breakthroughs:
it is worth noting that the breakthroughs from 1990-1995 (including quantum computing, PCPs) were unanticipated
in 1989, and those from
the 1995-2000 period (including web
search, internet content distribution) were equally unpredicted in
1994.
Finally, much of TCS’s impact was achieved
by
training a small, flexible research force in its vast body of previous
work and
imbuing them with TCS’s holistic bent.
This well-trained
workforce allowed CS to respond to various challenges in the 1990s and
also
pushed it into new opportunities. It is unclear how the current funding
model
can allow the continued training of such a TCS workforce for the
future. These
and related issues are discussed in Section 3.
Finally, we note that recent CRA surveys show an alarming
disinterest in
computer science among the young, who are increasingly drawn to other
sciences. One suggested
solution by CMU dean Peter Lee is to emphasize
the "science" aspect of computer science in education. TCS can
play an important role here, as will be clear from the examples of TCS
work in the rest of the document.
To provide a basis for
the
discussion of the shortcomings of current TCS funding, Section 2
surveys the history
of TCS’s varied impacts, especially those
from the
past decade.
CHIEF RECOMMENDATION:
NSF/CISE should revitalize its research and training program in TCS
with a
budget of $18M per year (3% of CISE budget), which could support
roughly 100 researchers
with
grants of $125K/yr/PI on average (summer salary + 1 student + travel/computer
expenses), a postdoc program, and a CAREER program.
Such a move would vastly benefit 21st century science
and the
training of the IT workforce. (See Section 4.)
This
budget increase could come out of funds that will become available from
the
expiration of the ITR program (as estimated $200M).
1) THE TCS
AGENDA FOR THE 21st
CENTURY
TCS is a multidisciplinary field that takes a holistic view of much of computer science and makes connections to many outside disciplines. A broad overview of TCS research appears in a 50-page document "Challenges for Theoretical Computer Science" available from www.sigact.acm.org . As mentioned, it is difficult to predict TCS breakthroughs, so one cannot predict TCS's impacts in 2006-2016. Nevertheless, some plausible examples are sketched below.
The problem of handling large data sets is already at the forefront of 21st century needs for national security, market intelligence, health care, and cyber-infrastructure for scientific research, and it has become customary to acknowledge that “better algorithms” will play a decisive role. (Pioneering computer architect John Cocke predicted this almost two decades ago in his Turing award lecture.) But it would be a mistake to think of “better algorithms” simply as a faster implementation of a known method – such advances will predictably occur, and TCS will of course play a role. The harder problem is to invent new algorithmic frameworks and mathematical analyses to tame the data deluge. Here TCS will be crucial. In the past few years, link analysis for web search, data stream analysis, and auction design for keywords have been areas with significant impact from TCS, whereby one or more publications in TCS conferences and/or from TCS researchers gave rise to a groundswell of research and subsequent applications. Many prominent researchers in the web research community are from TCS. (Note: the claim is not that TCS is the only contributor, but that TCS often helped focus attention on core issues.) Similarly, recent TCS research directions in sublinear algorithms and streaming algorithms also look very promising: Here, the spotlight is on the data, which is huge in size and dimensionality and often comes attached with uncertainty and varying costs. It must be emphasized that these contributions are merely TCS’s rapid response to a new challenge; what more could be achieved if a sustained and deeper study were funded?
Much of TCS’s scientific and conceptual impact is the direct result of its computational perspective (one of the major scientific insights of the past 50 years), which views many natural and man-made environments as a collection of computationally limited agents interacting with each other. TCS’s discoveries spring from it, whether they are in the context of age-old endeavors such as cryptography (new insight: “the adversary is computationally limited”), or of new ones such as the effort to understand computational complexity, quantum computation, or network phenomena (Web, brain, biological systems, etc.).
In the 21st century, TCS plans to employ this computational perspective to forge a new kind of economics. It is engaged in an ongoing effort to rework much of economics, game theory, mechanism design, etc., using the idea of computationally limited players (an ambitious extension of the bounded rationality idea in economics). Similarly, the ability to reason formally about previously nebulous concepts such as trust and privacy ---visited earlier by TCS in context of cryptography---is fundamental to emerging eBusiness models. For instance, economists have quantified the premium commanded by a reliable seller at an auction site such as eBay. How does this relate mathematically to the propagation of trust through members of an online community? What new forms of auctions can arise, and how to design mechanisms by which individuals in a community are motivated to align with the collective good? These are some of the critical questions from computational economics under examination by the TCS community. It is conceivable all the above developments could impact e-commerce in future the same way cryptography impacted computer security. It could also impact many other areas. For instance, TCS work on viral propagation and influence networks is having unanticipated links to current research in protein functional annotation.
Similarly, TCS will likely develop a connection to neuroscience as it did to genomics in the 1990s. More than one researcher has suggested that an understanding of the brain will arise from a holistic approach marrying neuroscience work with insights from TCS’s vast conceptual framework of networks, algorithms, efficient storage and manipulation of information. Valiant has suggested a computational view of the neuron network in the brain using random graphs. The hope is that the vastly complicated picture of the brain could be understood via a process of reverse engineering: how would we implement algorithms for cognition tasks on such unreliable, simple networks? It is conceivable that the space of feasible algorithms is fairly small. Such theoretical work will need to proceed hand in hand with experimental neuroscience.
System security and reliability will be as important to CS in the 21st century as performance was in the 20th. TCS may provide the kind of integrative, long-term study--- combining examination of intrinsic complexity and invention of new theoretical concepts rather than quick application of known techniques---of these issues in the same way that the field did so effectively for cryptography and probabilistic computation in the 1980s-90s. In those celebrated developments, there was a thrilling interplay of practical issues informing the theoretical breakthroughs, with in turn the theoretical breakthroughs leading to practical applications. TCS’s 21st century study of security and reliability could conceivably combine traditional strengths such as cryptography and program verification, recent developments concerning reliable encoding and verification of information, and its sophisticated analytic understanding of networks and protocols. It has been suggested that security of very large systems and networks may ultimately be designed by analogy to biological systems (which are also based upon large numbers of extremely unreliable elements) and here again TCS may contribute crucial insights.
Another future impact might come from ongoing efforts in TCS to overcome computational intractability in many situations via some notion of approximation (competitive analysis in many online settings, approximation for NP-hard problems, approximate data representation in geometric problems, etc.). This area is beginning to have practical and conceptual impact and important connections have also emerged to classical questions studied in mathematics.
The area of probabilistically checkable proofs (PCPs) was a big surprise when invented in the early ‘90s and has continued to yield a steady stream of breakthroughs. The underlying ideas of efficient representation and checking of information seem powerful and very counterintuitive and will surely lead to other breakthroughs, including important applications.
The study of intrinsic complexity has developed startling connection with an ongoing study of derandomization. The latter was originally motivated in the 1980s by the poor quality of real-life random number generators but led to many theoretical advances such as the 2002 breakthrough result on deterministic primality testing (which made the front page of the New York Times) as well as progress in data management and hashing (which had great commercial impact). Two years ago it was realized, to one’s great surprise, that further progress in derandomization may hold the key to complexity lower bounds (the holy grail of TCS ever since the P versus NP question was formulated).
2) TCS’S DIVERSE
IMPACT AND WHAT IT SAYS ABOUT THE FIELD
This section briefly describes TCS’s varied impacts in the recent past. These may be the most (or only) reliable way to judge its future impact since TCS breakthroughs are usually unanticipated. They also illustrate TCS’s interconnected, nature and its “unexpected connections.”
Commercial impact: Increasingly,
commercially successful (and paradigm-changing) computational
technologies
arise from good scientific ideas,
especially TCS ideas---the RSA encryption
scheme that underlies much of the
e-commerce infrastructure; the Hubs and
Authorities algorithm (1999), which is a new tool of information
retrieval and web search and influenced several
search engines; the Consistent Hashing
algorithm (1997) used by Akamai
Technologies in the
past 5 years to provide great speedups in internet content
distribution. It is
clear why this happened. As CS focused on manipulation, analysis, and
search of
large data sets, the need for faster algorithms as well as new
modes of design and analysis became apparent. TCS is
well-poised to provide all of these. Recognizing this, IT and biotech
companies
--Google, Amazon, Celera Genomics, Akamai,
Yahoo,
Verity, Digital
Fountain,etc.--hired TCS researchers as
chief scientists.
Impact on other scientific disciplines: The impact of CS on science also traces in no small part to TCS. In the 1990s, a small group of TCS researchers, acting on a suggestion from physicists such as Feynman and Deutsch, invented the quantum model of computation. This invention, followed in short order by Peter Shor’s breakthrough discovery of an algorithm for integer factoring, helped revitalize a good chunk of quantum physics, and is today considered an important example of multidisciplinary science. Physics Nobel laureate David Gross lists a better understanding of quantum computation as one of the 25 challenges for 21st century physics. Federal funding agencies are actively supporting efforts to find a practical implementation of this model.
In genomics, a group of TCS
researchers in the 1990s helped invent a body of algorithmic techniques
and
models ---now part of computational genomics--that
allowed biologists to overcome errors in sequencing data. These sped up
the
historic project (a collaboration between
government,
industry, and academia) for sequencing the human genome. It is clear
that
computation-inspired models will continue to be crucial to molecular
biology
and neuroscience.
In mathematics and operations research,
ideas such as NP-completeness, polynomial interior point methods, and
number-theoretic cryptography introduced an entire new perspective and
a set of
research problems to these older fields. The P versus NP
problem is on a list of six outstanding open problems
for 21st century mathematics. The Clay Math Institute, which
maintains
this list, honored a TCS breakthrough on primality
testing in 2002 with a special award. In the late 1990s, the Institute
for
Advanced Study --- the former home of Einstein, Goedel,
and von Neumann-- started a research program in TCS.
Conceptual impact: The above-mentioned impacts are not a sequence of lucky coincidences, but a direct result of one of the major scientific insights of the past 50 years: The computational perspective, which is the essence of TCS, views many natural and man-made environments as a collection of computationally limited agents interacting with each other. Another far-reaching idea popularized by TCS (only 30 years ago!) is the use of randomness in algorithms and computation. Initially used as a simple algorithmic tool, as in network routing (randomized routers), it also led to new ways of looking at computer security (the definition of “security” of algorithms and protocols), artificial intelligence (PAC Learning, Boosting), and data management (hashing, internet content distribution).
The conceptual impact of such TCS innovations is at least as significant as its commercial one. Boosting--combining weak heuristics into a powerful learning algorithm—has been a pillar of AI and data mining since 1996: a recent text in information retrieval calls it the “the best off-the-shelf classifier in the world.” This hints at its vast practical impact, but one should realize that even the idea of boosting was completely counterintuitive to practitioners, let alone the fact that it can be practical. (This counterintuitive idea, surprisingly, came out of theoretical work in cryptography and PAC learning.) Similarly, Kleinberg’s Hubs and Authorities work (1999) was important not only as an enabler of a new technology (web search), but also as a new way of thinking about the web. It is one of three most cited papers in web research; the other two being papers by Tim Berners-Lee et al., and Google founders Brin and Page.
Creation of a coherent, multidisciplinary science. TCS’s status as an academic sub-discipline of CS ---analogous to graphics, networks, scientific computation, AI, etc.— often obscures the fact that TCS by itself is a coherent, multidisciplinary science that takes a holistic view of many areas of CS (including all the ones just named), and contributes to them. Though its inquiry spans a broad spectrum from abstract to concrete, all TCS research shares a common perspective and set of techniques. Like any healthy science, its theory is of inherent interest, yet also leads to important applications. In turn, practical issues inform its greatest theoretical breakthroughs.
One good example of this interplay is the following sequence of developments over two decades. Starting in the late 1970s, computational complexity theory, fresh from its NP-completeness success, took on the practical issues of cryptography and random number generation and transformed them ---along the way inventing a computation-based framework of information, privacy, knowledge, etc. Cryptographic motivations then introduced new ideas into complexity theory, namely interactive proofs (1985), which in the 1990s led to probabilistically checkable proofs (PCPs) and an important extension of the theory of NP-completeness--- the demonstration that computing approximate solutions to many NP-hard problems is itself NP-hard. This work continued for a decade and negatively answered questions long asked by practitioners. Practical offshoots of the theoretical PCP work in late 1990s were applications in information theory (a practical algorithm for list decoding of error-correcting codes; see Appendix A) and cryptography.
Also, theoretical work often fertilizes the field with new techniques that later allow it to respond to new issues. (This is not surprising, because ultimately the theoretical work also deals with issues of efficiency, information management and representation, etc.) Recent practical algorithmic impact mentioned above came out of this body of techniques.
3)
General note: From
now on, budgets will be discussed in
terms of number of regular grants
they can support, which cover PI’s summer salary + 1 Ph.D. student +
travel and
computer expenses. Depending on the PI’s seniority level and home
institution,
a regular grant ranges from $115K/yr to $150K/yr and we use an average of $125K/yr. This is also the
number used in new NSF programs such as the Emerging Models cluster.
(Currently, TCS grants are far smaller.)
The model of single-PI
grants (“many small bets” and occasional
“medium bets” consisting of grants that pay for two PhD students) is
reasonable
for TCS because (a) the full impact of individual projects is harder to
predict
(b) TCS researchers often set up collaborations anyway.
There are two modes of funding TCS work at NSF today. First, like all other CS researchers, TCS researchers can apply to application-area programs (sometimes also called priority areas). Second, they can apply to a dedicated TCS program whose budget in 2005 will be $7.2M (not much more than the budget of $6.4M in 1989). Since this program is now so tiny, one must carefully weigh the pros and cons of funding TCS work from other application-area programs.
Pros:
The application-driven funding model helps
TCS researchers realize the practical impact of their ideas, and put
together
large projects to serve the nation’s immediate priorities. An example
is an
ongoing large ITR project (a collaboration among TCS and non-TCS
researchers;
budget $2.5M/yr) that will, among other things, develop practical
implementations of far-reaching TCS work from the 1980s by
Cons:
(A) Inability to
anticipate issues that may become priorities in more than 5 years.
The
above example of
Though individual program officers at CISE find ways to fund some long-term TCS research (e.g., the TCS program at the Institute for Advanced Study receives $300K/yr from a medium ITR grant), the question remains how much these are happy coincidences --the TCS researcher teaming up with the right project; getting reviewed by the right panel and program officer-- and whether it is prudent to entrust an important part of the nation’s research agenda to chance events. Also, rising proposal pressure in CISE programs is likely to make such “happy coincidences” more infrequent.
(B) Failure to realize the benefits of TCS’s unique holistic view and “unexpected connections.” TCS is a coherent science rather than a collection of disparate algorithms for disparate applications and relies for its impact on a large conceptual framework. Funding only a few pockets of TCS fails to feed this framework and also will miss out on “unexpected connections.” A look at recent TCS breakthroughs listed in Appendix A shows that, although cryptography is thought of as related to computer security (and thus funded exclusively through Cybertrust now), work in cryptography also motivated –via an indirect path-- breakthroughs in AI/Data Mining (boosting), Information Theory (list decoding of error-correcting codes), and complexity theory (PCPs). In turn, some of those breakthroughs (e.g., PCPs) then proved useful in cryptography research.
One should also realize that boosting was completely unanticipated by AI researchers, and list decoding by information theory researchers. None of TCS’s breakthroughs from 1990-2000 mentioned in this document were foreseen in 1989, not even by the researchers who made them. Because of such uncertainties, NSF historically maintained a viable TCS program that funded a diverse body of good researchers (albeit with very modest grant sizes). In today’s funding model, TCS breakthroughs of the 1980s and 1990s would probably have been missed.
(C) Failure to train the right workforce. As detailed earlier, TCS PhDs of the 1980s and 1990s helped computer science face many new challenges and pushed it into new opportunities. This group’s impact is surprising given its small size ---TCS’s PhD production has always been modest owing to chronic funding problems. The likely explanation is that they were highly trained in a large set of analytic techniques (the large body of existing TCS work) and imbued with TCS’s multidisciplinary and integrative worldview. This made them effective and flexible. It is unclear if today’s funding model is training such a workforce.
4) NEEDED, A
REVITALIZED TCS PROGRAM AT NSF
Given the high impact of TCS work in the past
decade, the
ambitious agenda for the future sketched in Section 2, and the
important role
played by TCS alumni in industry and science in recent years, there is
a strong
argument for NSF to revitalize the important research and training
agenda of
its TCS program. The fact that the current budget pays for only 13 new
regular
grants per year and trains only 10-12 new TCS
Ph.D.s
each year in all of
Main recommendation: NSF
should revitalize the TCS program and fund it at the level of
$18M/year, about
3% of the CISE budget. This modest figure is derived from weighing the
ongoing
acute budgetary crisis at CISE against the vast likely benefits to the
nation’s
future scientific, technological, security, and education
infrastructure.
Though the design of the program would require
careful
thought, this budget can support for example roughly 100 researchers
(in most
cases, with a regular 3-year grant). Since the number of TCS
researchers at the
top 25 CS departments (US News ranking, 2002) alone exceeds 100, the
proposal
quality is expected to be very high. In a steady state the program
would
produce about 25 new PhDs per year ---a reasonable figure given
industrial
demand and the replacement rate for existing pool of TCS faculty in the
The expanded TCS budget could support a CAREER
program (5
grants per year) and a postdoc program (8 per year). The latter is
crucial since 21st century science requires exposing
researchers to
a variety of problems and modes of thinking. As TCS grows more
multidisciplinary, there is a strong need to adopt this training model
(which
is common in Physics and Biology).
Strategic
considerations: 21st century competitors of the
Recently, the Chinese government managed to
attract Andrew Yao, a Turing award winner
and TCS pioneer (one of the
founding fathers of cryptography and quantum computing), to Tsinghua
University to jumpstart an ambitious new program in CS research and
training.
Appendix A: a
brief history of recent TCS breakthroughs.