view · edit · attach · print · history

Nugget Ideas: Pre-workshop Brainstorming

Click here for all the nugget ideas in MS Word format.

  • Security with Certainty
    • proposed by: Salil Vadhan
    • image: a lock or safe, but maybe these are overused for crypto?
    • tagline: All of the cryptographic protocols currently used to protect electronic transactions on the Internet could potentially be rendered completely insecure by a future algorithmic breakthrough. However, researchers expect to remove this threat one day, by designing protocols that have unconditional proofs of security.
    • summary: While modern research in cryptography has transformed a significant portion of computer security from an art into a science, ultimately the security of all current cryptographic protocols (encryption, digital signatures, etc.) rely on conjectures, e.g. that factoring large integers is intractable, or that finding collisions in certain "hash functions" is hard. These conjectures can turn out to be false, as was discovered by 3 Chinese researchers in 2004 for the widely used hash function MD5?. If a similar discovery was made in the future and kept secret, devastating damage could be inflicted on the electronic economy and cyberinfrastructure before the vulnerability is noticed. Thus, an extremely important research challenge is to eliminate the need for conjectures in cryptography. It is plausible that this goal can be achieved, but it will require major advances in theoretical computer science, including resolving the famous P vs. NP question. Along the way, society will benefit from the continual progress researchers are making in minimizing the role of conjectures in cryptography.
    • This Vision is very similar to Boaz Barak's Grand Challenges for the Foundations of Cryptography, and some of his text might be useful to incorporate.

  • Natural Algorithms
    • proposed by: Bernard Chazelle
    • tagline: Classical mathematics lacks the expressive power to model large, complex systems in a way that allows reliable predictions. Indeed, it is unlikely that the language of PDEs? can model, let alone explain, the behavior of proteomes, animal groups, or social networks. For this, one needs to turn to algorithms, which offer a more versatile modeling language.
    • summary: Through millions of years of evolution, nature has designed robust algorithms that are still, by and large, a mystery to us. There are fairly reliable algorithmic models for bird flocking, fish schooling, and other sorts of collective behavior, but we lack the tools to analyze them and make predictions. While such systems can often be engineered to be undecidable, in their natural habitat they are likely to operate in computational regimes far removed from universality. Central computer science questions such as abstraction (or "renormalization") need to be addressed before we can make any significant progress. While the modeling is in the hands of natural scientists, building analytical tools for "natural algorithms" (the heart of biocomplexity) should be a golden opportunity for theoretical computer scientists.

  • Secure Program Obfuscation or Functional Secrets
    • proposed by: Amit Sahai
    • image: ? maybe an image of a cute-looking obfuscated c code contest winner?
    • tagline: Can we publish algorithms that contain secret information?
    • summary: Intellectual discoveries are increasingly taking the form of algorithms. In many contexts, it is important to keep the inner workings of such algorithms secret, whether to protect an investment in proprietary research, or to protect algorithms for electronic commerce or national security applications that will be executed in insecure computers. But can we maintain secrecy of an algorithm even while it is being executed? Note that traditional notions like encryption fall far short -- when data is encrypted, it becomes inert -- useless until decrypted. Progress in research on secure program obfuscation would lead to progress on important and long-standing open questions in cryptography with numerous applications. Tackling this problem will require major advances in theoretical computer science, and this question deals directly with the nature of computation. To add: Interesting relaxations of this problem, which include: (1) how to make hardware that can keep a secret even if it is tampered with, (2) encryption that allows computation to be performed on encrypted values

  • The Factor-Discrete Log Connection
    • proposed by: Dick Lipton
    • tagline: Why do factoring and discrete log seem to have the same complexity?
    • summary: Factoring large integers and taking discrete log's modulo large primes both seem to be hard to compute. Many modern cryptographic protocols rely on assuming one or the other is hard. (Ay least for classic computers.) There seems to be no direct or even indirect connection between these two problems. Solving one efficiently seems to have no direct consequences for the other. Yet there seems to be a hidden connection. Every advance in computing one is followed by a similar advance in the other. This is true for classic algorithms as well as quantum. Recall, for example, that Shor's breakthrough paper showed how to solve both in quantum polynomial time. This seems to be a great puzzle. How can their complexity seem to "match" yet there be no apparent connection between them? Thus, an extremely important research challenge is to unravel the hidden connection between these problems. If this goal can be achieved it seems likely that it will advance our understanding of basic computational number theory. Such an advance could have important consequences in cryptography.

  • Programming/algorithm models for the processor-of-the-future
    • proposed by: Uzi Vishkin
    • image: See picture of the PRAM-On-Chip computer board at the top of http://www.ece.umd.edu/news/news_story.php?id=2289
    • tagline: All current and future computers are parallel machines. It is currently too difficult to program them effectively.
    • summary: After improving the clock frequency of computer chips by a factor of nearly a Million since the 1940s, vendors have all but given up on further clock frequency improvement. This means that the only way for running a computer application faster is to program it for parallelism. As the number of transistors that fit on a single chip has increased by a factor of a Million over 3 decades, vendors can now build 1000-processor chips. Unfortunately, this huge potential computing power is greatly underused. Vendors are incorporating much fewer processors into their designs, and application programmers are greatly under-using them as the mainstream programming models are too difficult for many of them. Vendors have also made it public knowledge that they will be building parallel machines with an ever increasing number of processors. It is also widely understood that the programming model is likely to change in currently unforeseen ways in order to make these future machines easier to program. Consequently, application software vendors tend to avoid investing rather than risk betting on the wrong programming approach. Identifying good parallel programming and algorithms models for incorporation in the processor-of-the-future is also crucial for the productivity of the information technology sector of the economy as a whole, as it affects run time of computer programs and their development time. The PRAM algorithmic model that was developed by the theoretical computer science community in prior decades is already the basis for the 64-processor PRAM-On-Chip computer prototype at the University of Maryland http://www.umiacs.umd.edu/users/vishkin/XMT/CompFrontiers08.pdf. The PRAM-On-Chip architecture scales to 1000 processors. This demonstrates that, contrary to the common wisdom that algorithm work should start only after a machine is built, a theory that explores algorithmic thinking paradigms within an abstract mathematical model can pave the way for the computer architecture work that comes later. In fact, it makes more sense to figure out how a complex modern computer is to be programmed before it is even designed, than after it is already fully built. The theory community should continue to play the same pivotal role in coming up with refined and new models and study existing options till convergence to a framework for a commodity parallel machine is achieved. At that point in time it should be called upon to develop more algorithmic techniques.

  • Incorporating Incentives into Algorithm Design
    • proposed by: Anna Karlin
    • tagline: In many distributed algorithms, e.g., Internet protocols, the participants cannot be assumed to follow the algorithm but rather their own self-interest. Given desired goals, such as maximizing profit or social welfare, how do we design algorithms in such a clever way, that rational participants, motivated solely by their self-interests, end up achieving the designers' goals?
    • summary: Algorithms and protocols in diverse settings ranging from electronic commerce to resource allocation to routing and congestion control are designed to achieve objectives such as global optimality, social efficiency, fairness and profit maximization. However, when these algorithms are deployed their outputs are often computed from inputs provided by participants (agents) that have a selfish interest in the output of the algorithm. It is not reasonable to assume that these agents will behave as instructed if such behavior goes against their selfish interests. Yet, most work on algorithm design makes precisely this assumption. A fundamental challenge, therefore, is to incorporate “incentive engineering” into algorithm design. How do we design algorithms so that rational participants, motivated solely by their self-interest, will end up achieving the designer’s goals? Solving these problems will require merging algorithmic principles and complexity considerations with concepts and techniques from game theory and economics.
    • Useful text can be found in Nisan's survey entitled "Algorithms for Selfish Agents: Mechanism Design for Distributed Computation" and Papadimitriou's survey "Algorithms, Games and the Internet" among others, and I have taken a few lines from both.

  • Unraveling the web of interactions in cells
    • proposed by: Bhaskar DasGupta?
    • image: Some nice colored picture of biological network (plenty available). Click here for some possibilities.
    • tagline: Can theoretical computer science help in understanding cell computation?
    • summary: A central concern of contemporary cell biology is that of unraveling the web of interactions among the components of complex protein and genetic regulatory networks. Notwithstanding the remarkable progress in genetics and molecular biology in the sequencing of the genomes of a number of species, the inference and quantification of interconnections in signaling and genetic networks that are critical to cell function is still a challenging practical and theoretical problem. Investigations of central questions in this area lead to many specialized combinatorial and graph connectivity problems that may require sophisticated algorithmic methodologies. Thus, a bridge between graph algorithms community and system biologists would enable us to understand cell computation better.

  • Energy Efficient Computing and Communication
    • Proposed by: Lisa Zhang
    • Image: Something green? (Need more thoughts.)
    • Tagline: How can theory of computing help with green computing and green communication?
    • Summary: Green computing and green communication are receiving increasingly more attention, e.g. IBM's Project Big Green, Lawrence Berkeley Lab's climate computer. While the primary focus is on issues such as building green hardware and discovering alternative power sources, what role can algorithms and optimization play?
      There are examples of energy optimization from algorithmic perspective, e.g. the recent wave of work in sensor networks. Problems in related areas can also be reexamined. For example, wireless scheduling algorithms often aim to maximize throughput assuming given channel rates and fixed power. What can we propose for the dual problem, namely minimizing energy consumption subject to fixed throughput? Is there a satisfactory tradeoff?
      More broadly, is it meaningful to have power consumption as a measure of algorithmic performance, in addition to the traditional notion of time and space? Would it be possible to define this new measure on a "universal" platform that is independent of particulars such as operating system, compiler or the networking environment? A specific example to this question could be how to build new data structures that are sensitive to energy consumption.
      Another thread to explore is the gain in energy efficiency that can be obtained from parallel computing.

  • Safety-Critical System Verification
    • Proposed by: Kristin Yvonne Rozier
    • Image: composite of many types of safety-critical systems:
      • a flying airplane
      • space shuttle
      • air traffic control screen
      • autopilot screen shot
      • nuclear power plant (could use nuclear symbol here)
      • medical radiation device (could use red cross here)
      • CPU (from Intel -- their chips are verified this way)
    • Tagline: Safety-critical systems hold our lives in their hands yet how do we know we can trust them? Formal methods harness the power of logic and mathematics to provide the highest possible level of assurance.
    • Summary: Safety-critical systems have become ubiquitous is our everyday lives and grow increasingly complex every year. They fly our planes, control our nuclear power plants, run our medical devices, and so much more. Yet, how do we know they are safe? Verification techniques such as exhaustive testing and simulation could take millions of years, or longer, to certify a system to a high level of assurance. In contrast, we can use formal methods to logically analyze safety-critical systems and provide absolute assurances of software algorithms in reasonable timeframes. Recent algorithmic advances have vastly increased the size and complexity of the systems which we can analyze using formal methods but there is still more research to be done.

  • A cloud connecting earth
    • Proposed by: Rajmohan Rajaraman
    • Image: Maybe a cloud showing a range of devices and services, with a map of the world in the background
    • Tagline: We now have the technology to connect together a wide range of computers, from tiny devices to powerful servers. This offers us the unprecedented and immensely challenging opportunity to tap into this cloud of billions of computers.
    • Summary: The ubiquity and the ever-increasing complexity of cell phones, pdas, laptops, and other personal computing devices allows them to do complex computational and communication tasks. At the same time, the increasing interconnectedness of these devices among themselves and to the Internet presents to us an amazingly diverse massively distributed network that can vastly increase and improve the kind of tasks we want to accomplish. Some limited versions of this paradigm have been around, of course, and variously referred to as peer-to-peer systems, grid computing, and off-late, cloud computing. Computing, communicating, and sharing information through this cloud of interconnected computers raises a number of interesting algorithmic questions. How can complex computational tasks be distributed within this heterogeneous cloud with very little state being maintained at any individual element of this cloud? What kind of incentives can be provided to ensure that various elements of the cloud contribute appropriate resources? How can performance improvements be provided without sacrificing anonymity and security? How can the cloud be used to maintain a permanent archival of important data that is easily and always accessible to all interested users? We need to build on fundamental work in distributed computing, data storage systems, overlay routing. The new research needs to focus on the widely heterogeneous and highly dynamic nature of the distributed network and the diversity of interests.

  • The evolution of networks, great and small
    • Proposed by: Rajmohan Rajaraman
    • Tagline: Complex networks abound: the web, social networks, the Internet, biological networks, organizational networks. We need models that both explain and impact the design and evolution of these networks, taking into account the motives of the network agents.
    • Summary: Complex networks arise in diverse domains including social interactions, digital communications, information exchange, organizational behavior, biology, and genetics. There have been several studies for modeling such networks; e.g., the preferential attachment model, power-law graphs, the bow-tie model for the Web, evolving copying models, and duplication models. These models primarily focus on the global, structural, and statistical properties of the network. They do not always capture how individual agents forming these networks act and change their behavior in response to other agents, based on their own selfish motives. What kind of networks are formed by selfish agents with diverse utility functions and asymmetric information? How do coalitions affect the way these networks evolve? What is the impact of locality of information? What is the impact of certain special agents -- such as the feedback loop between search engines and users placing weblinks? What incentives can achieve desirable network designs? The ultimate goal is to develop hybrid game-theoretic and statistical models for real networks, help us reason about their behavior, and influence socially-effective network design in non-cooperative environments. Simplified game-theoretic models for network formation have been proposed both in the CS and economics communities. We need new game-theoretic and statistical models that capture information asymmetry, bounded rationality, altruistic and adversarial interests, and the complex dynamics of real networks.

  • Locality, Locality, Locality: A Communication-centric Perspective on Algorithms
    • Proposed by: David Wise
    • Tagline: For years the practical measure of performance of a computer or of an algorithm has been either uniprocessor cycles or space in main memory. The rapid scaling of multicore chips makes it clear that now communication, among both processors and memories, dominates aggregate time and performance. It's bandwidth---even within one computer---not just cycles.
    • Image: here is a rough sketch which should be better framed. The first photo is of 1890 telephone wires from here --- public domain. The second one is from here. It is an old Marchant calculator, of which I can find an older version---if this one is yet in copyright. Not difficult.
    • Summary: Computing theory now has a rare chance to lead other branches of computer science and engineering, which are already struggling with these new definitions of efficiency. Theory can explain this altered future by developing new
      • models that expose communication and memory hierarchy,
      • techniques for anaysis of different transfer protocols and routing,
      • and abstract measures of these complexities.
    • Summary (cont): Since locality of computation in both processors and memory already dominates aggregate performance, an abstract measure of this locality is a primary goal. Progress here will show
      • computer architechts how to allocate their nanoacrage for optimal use of local data;
      • designers of programming languages what tools to develop that can reduce communication and promote effective use of what remains;
      • new algorithms and new characteristics of the best ones for designers to leverage into their products;
      • new content for the standard computing curriculum to prepare today's students for using good locality in their professional careers;
      • tools for software engineers to design our local/global future.

  • Rethinking the Internet's economic structure
    • Proposed by: Shuchi Chawla
    • Tagline: A big challenge facing today's Internet is the lack of economic incentives to adopt new technology (e.g. IPv6?) and improve user experience. Can theoretical computer science guide the design of a new economic structure for the Internet that enables such incentives?
    • Summary: Today's Internet industry suffers from several well-known pathologies, but none is as destructive in the long term as its resistance to evolution. The last two decades have seen an explosive growth of the Internet, however it's core architecture has remained more or less static over this time. In recent years, numerous ideas have been developed in universities, implemented in code, and even written into the routers and end systems of the network, only to languish as network operators fail to turn them on on a large scale. The list includes IPMulticast?, IPv6?, IntServ?, and DiffServ?. These new proposals, if adopted, would greatly improve quality of service on the Internet, enable new applications, and prevent crises such as address space exhaustion. Rather than introducing new services, ISPs? are presently moving towards greater commoditization. The prevailing wisdom now is that ISPs? have little incentive to deploy new architectures; the costs of universally deploying a new architecture are immense, whereas the corresponding increase in revenue is not commensurate. What is needed is a greater understanding of the Internet's economic structure and the incentives it creates, as well as the development of a new economic structure that provides incentives for efficient resource allocation as well as innovation and growth. This inherently entails an algorithmic and game theoretic approach. Analytic tools from algorithmic mechanism design and price of anarchy, developed over the past few years by the theoretical computer science community, will be immensely helpful here.

  • Algorithms for mining the modern scientific datasets.
    • Proposed by: Petros Drineas
    • Image: some raster plot of microarray data or SNP data.
    • Tagline: As data generation becomes increasingly automated in most scientific domains, mining the resulting datasets is at the forefront of scientific research. Widely applicable, provably accurate algorithms for tasks such as dimensionality reduction, regression, clustering, classification, feature selection, etc., are of continuing interest.
    • Summary: Technological developments over the last two decades in many scientific domains permit the automatic generation of large data sets. Consider, for example, the 1000 Genomes Project, which intends to sequence the full genome of 1000 individuals over the next three years, thus offering unique insight and a gold-standard reference set for analysis of human genetic variation. The key research question is to visualize the big picture from the sea of available data by extracting easily interpretable information from such complex data sets. Towards that end, algorithmic tools such as dimensionality reduction, regression, clustering, classification, etc. have been successfully applied to data analysis. As the size and the complexity of the datasets grows, the need for new techniques is growing as well. Simple, computationally efficient, and provably accurate algorithmic approaches typically have a greater chance of being widely applied across scientific disciplines and thus enjoy greater impact. It should be noted that such data analysis algorithms typically operate on inputs with random noise and significant amounts of structure. Thus, worst-case analysis of the proposed algorithms might be less relevant than, eg., smoothed analysis.

  • The Limits of Sampling
    • Proposed by: Santosh Vempala
    • Image: a set of points, inside a square, forming a circle
    • Tagline: What can/cannot be inferred about the ground set from a sample? What can/cannot be inferred efficiently?
    • Summary: A small sample can reveal a lot. Or can it? For example, the centroid of a convex body in n-dimensional space can be estimated with O(n) uniform samples, but learning the body needs exponential in n samples. What functions of a convex body need only poly(n) samples to estimate? It is open to compute the volume using fewer than exponential samples (note: this is from a purely random sample with no membership queries). An analogous result in graph theory says that all hereditary properties can be tested (approximately) using a small sample. Is there a common generalization of these questions? An information theory of sampling discrete and geometric sets? Such a theory could provide systematic ways to deal with large or high-dimensional data sets.

  • P != NP as a law of nature.
    • proposed by: Avi Wigderson
    • tagline: What is a good explanation of natural phenomena? When is a scientific theory useful? Being predictive is essential to these. And if prediction in the given model is intractable, it is probably wrong, since nature does not solve intractable problems.
    • summary: The efficient version of the Church-Turing thesis asserts that the class P captures *everything* computable efficiently by reasonable means. This should certainly include all natural phenomena, which we can hope to integrate into our computers. Thus whenever some natural phenomena seem to solve a problem faster than we can on our computers, we have a win-win situation. Either we can use it to enhance computation, or our model of this phenomena is faulty.
      Quantum computing research is a perfect example of this scenario, in which we still don't know who "wins" - can we build quantum mechanical computers, or should we revise quantum mechanics. But more likely than not, the extremely efficient mechanism of protein folding cannot be used to enhance computers. Rather the NP-completeness of this problem in current models suggests that we need to modify them (the actual objective function, the allowed operations, and/or the set of legal inputs) to better explain that efficiency.
    • This nugget complements well the "Natural Algorithms" nugget of Bernard Chazelle.

  • Truly Universal Heuristics
    • Proposed by: Leonid Levin.
    • I do not use drugs, thus have no visions. I even think, only mediocre results can be seriously envisioned in advance. Real breakthroughs are the results that are fundamentally unpredictable. The attempts of bureaucratic machines to guide researchers are deadly for science. (This, of course, does not apply to projects that involve large groups or expensive equipment and require advance planning or coordination.) Committees are notorious for having additive shallowness; thus even groups of very bright scientist make silly committees, their meddling having a chilling effect. I fervently hope, that what I write would be only read by politicians, and not used to guide researchers! With that stipulation, here is my random example of a Nugget (Agrh!):
    • Tag-line: How do humans solve "unsolvable" problems.
    • Logo/image: A meditating (possibly in the air :) Yogi.
    • Summary: Great many crucial problems that defeat current computer arts can be stated in a form of inverting easily computable functions. Still other problems, such as extrapolation, are related to this form. We have no idea whether our difficulties are intrinsic to these problems or just reflect our ignorance. We will remain in the dark till major advances in our fundamental questions, such as, e.g., P=NP. And yet, traveling salesmen do get to their destinations, mathematicians do find proofs of their theorems, and physicists do find patterns in transformations of their elementary particles! How do they do this, and how could computers emulate their success?
      In fact, whatever the difficulty of inversion problems, we know an optimal algorithm for them all, the algorithm that cannot be sped-up by more than a constant factor, even on a subset of instances. It searches for solutions, but in order of increasing complexity, not increasing length. A fairly short solution may be hard to find, while a much longer one could be straightforward. Complexity can be roughly defined as the minimal sum of the length of a prefixless binary program p that transfers the given instance into its solution and the log of the running time of p. Extrapolations could be done by double-use of this concept: weight extrapolations (consistent with know data) according to 2^{-k) where k is the length of their shortest description, and use the above optimal search method to find short descriptions.
      Yet, these methods are optimal only up to constant factors. Nothing is known about these factors and simplistic attempts make these factors completely unreasonable. Current theory cannot even answer straight questions, such as, e.g., is it true that some such optimal algorithm cannot be sped-up 10-fold on infinitely many instances? Yet humans do seem to use such generic methods successfully, raising hopes for a reasonable approach to these factors. Something interesting may lie here.

  • Toward a Computational Understanding of the Brain
    • proposed by: Rocco Servedio
    • image: a brain, a flowchart, a circuit, etc; maybe something like the cover of Valiant's "Circuits of the Mind," which is a big source of inspiration for this nugget
    • tagline: Can theoretical computer science help unlock the secret of how our brains work?
    • summary: For millenia humans have pondered the mystery of how intelligence and consciousness arise from brain functioning. How is information represented, stored, and processed in our brains? How do we learn? These are fascinating and deeply important questions that are inextricably linked both to computer science and biology, since they are really questions about how a biological system performs certain information-processing tasks. Fortunately, the past few decades have witnessed parallel scientific advances in biology and computer science that that each hold tremendous promise for helping answer these questions. Our knowledge of brain structure and functional organization has exploded in recent years (biology and neuroscience), and there have also been tremendous advances in our understanding of the abilities and limitations of efficient computational processes in general and of computationally feasible learning in particular (theoretical computer science and computational learning theory). We do not know what data structures and algorithms the brain uses to store and process information, or how these algorithms and data structures are implemented in the brain. But in order to make these discoveries, we will surely need both a thorough understanding of the brain in particular and a thorough understanding of what is (and is not) possible in general for algorithms, data structures, and learning. By working together in the decades to come, theoretical computer scientists and neuroscientists are well positioned to make breakthrough progress in our understanding of how the brain functions as a biological computer.

  • Theory of Stored Information
    • proposed by: John Grisinger
    • image: Attach:storedinfo-nugget.bmp
    • tagline: What is Information?
    • summary: We stored information in computers without having a fundamental understanding what it is or how it relates to human comprehension. The storage methods we use were developed in a previous era when the goal was storage efficiency. Those methods are analogs of oral and written forms of human communication; they are antithetical to how humans comprehend the world. Mathematics and logic provide a means of describing information, but neither provide guidance on why it should be stored one way or another. A fundamental understanding stored information can only arise from the study of the first principles of information itself, not from mathematics or logic, nor from the study of information processes or the machines used to execute those processes. Mathematical expressions (as well as the human mind) integrate data and its organization so that data and its organization are inseparable. In contrast, each current method stores the data's organization externally (e.g., schema, file format specifications, software and documentation) and is limited by what organization it can specify. A storage method that is built from the first principles of information that integrates data and its organization has the potential to store any kind of information. One attempt to develop such a storage method is described in the "first principles" paper at http://www.varadata.com/papers.html.
    • longer rationale: We stored information in computers without having a fundamental understanding what it is or how it relates to human comprehension. The storage methods we use were developed in a previous era when storage efficiency was the primary goal. Those methods are analogs of oral and written forms of human communication, primarily complex text strings, tables, files and software; they are antithetical to how humans comprehend the world. They survive today because of the prodigious efforts by system developers to overcome their limitations. Those limitations are now being further appreciated by those attempting to parallelize application logic.
      Mathematics and logic provide a means of describing information however we chose to store it, but neither provide guidance on why it should be stored one way or another. A fundamental understanding stored information can only arise from the study of the first principles of information itself, not from mathematics or logic, nor from the study of information processes or the machines used to execute those processes. A lesson to be taken from the other sciences is that one must address fundamental questions by resolving the problem into its most fundamental elements. For information, those elements are easily identified because they appear in some manner in all methods of storing information of any kind. They are (1) data (stored as bits) and (2) their organizing associations.
      Both mathematical expressions and the human mind integrate data and its organization. In each case, data and its organization are inseparable and their conventions/rules are minimized. In contrast, languages (both human and programming) and the current information storage methods each store the organization of words/data externally and are not minimized. An information storage method's data organization is specified by conventions/rules in the form of schema, file format specifications, software and documentation. Each storage method is limited by what organization it can specify.
      A storage method (1) built from the first principles of information and fundamental elements, (2) that integrates data and its organization and (3) minimizes its rules/conventions has the potential to store any kind of information and possibly provide a more fundamental understanding of information itself. It may also provide a means of parallelizing application logic. One attempt to develop such a storage method is described in the "first principles" paper at http://www.varadata.com/papers.html.

  • Understanding patterns
    • proposed by: Valentine Kabanets
    • tagline: is there a way to tell random patterns from non-random ones? can one extract an efficient algorithm responsible for creating a given pattern?
    • summary: Every object/phenomenon/pattern in nature can be viewed as the result of some algorithmic process. (A car is the result of a certain manufacturing process; a book is the result of some creative writing process by authors; the shape of Hawaii is the result of some geological process; etc.) One way to understand natural patterns is to recover algorithms that generated them; ideally such an algorithm must be the simplest possible (most efficient). Obviously such "understanding of patterns" would have many interesting applications. In a specific setting where objects are finite sequences of data, recovering a simplest algorithm responsible for the data sequence (or even distinguishing a random data sequence from a non-random one) is related to one of the most basic open problem in computational complexity: determining the circuit complexity of a given Boolean function (which in turn is related to the "P vs NP" problem). A related problem is when we know an (efficient) algorithm that generated a given pattern, but we don't know the initial conditions (input to the algorithm). The complexity of this problem in its full generality is not known. (For special cases, say for certain error-correcting encoding algorithms, we know how to recover the original message, even from very corrupted received words.)

  • Semantics of Data
    • proposed by: Madhu Sudan
    • image: bits flowing into a computer with a human at the keyboard; with multiple "speech clouds" saying "Nice picture" and "clever proof", "strange music".
    • tagline: Information is not of interest in its own right, but rather relative to the meaning one associates to it. How can we build systems that protect not only the information, but also the associated meaning?
    • summary: Loss of information is increasingly associated not so much to the loss of individual bits, but rather to the loss of the entire method to understand those bits. Day to day examples include the loss of players for vinyl records, loss of players for videotapes (it is no longer simple to buy them in many countries!), and the inability to compile some documents produced using old text processors (e.g., troff, and even certain styles of TeX?). Increasingly it is becoming clear that the method of ``interpretation'' of data has to be integrated with the data itself. But such an interpretation is not an absolute entity. It is always presented relative to an interpreter.
      This cyclical process begs the question: Is it possible to build a theory of information that preserves the meaning of data, even as the ability to interpret it keeps evolving? Even more basic is question: What is the "meaning" of information? We believe that answers to such questions could lead a new domain of problems/challenges/opportunities for computer science.

  • Refining the meaning of "tractable"
    • proposed by: Piotr Indyk
    • summary: Over the last few decades, polynomial-time computability has emerged as the gold standard of computational tractability. The class P enjoys a powerful combination of features:
      • it is expressive: it contains a great wealth of interesting computational problems;
      • it is robust: it is closed under natural operations (notably composition), and is invariant under natural choices of computational models
      • it is predictive: the membership in P correlates surprisingly well with the empirical tractability of a problem.
        All of this made this class an indispensable tool for investigating the boundary between tractable and non-tractable.
    • summary (cont): It has been observed, however, that the predictive power of the class P becomes diminished when the input size n becomes sufficiently large. For example, a quadratic-time algorithm is typically not efficient on, say, gigabyte sized inputs. This motivated investigation of alternative notions of tractability, often inspired by technological considerations. Examples of such notions include sub-linear space/time computability, variants of parallelizability (e.g., via Map-Reduce), etc. Alternative definitions of intractability (e.g., 3-SUM hardness) have been investigated as well. Each of those notions carves out a slice of computational landscape that can be plausibly labeled as tractable or intractable. Despite significant effort though, vast fraction of the landscape remains unlabeled.
      The ultimate definition of tractability, it seems, might need to take into account the limitations imposed by the laws of physics. This might become more apparent as the performance of computing devices keeps getting closer to the physical limits. There are many research directions along these lines (many of them have been investigated in the literature):
      • Given that the total amount of matter available to store/process the data is bounded, the amount of information that can be stored in our universe can be bounded from above by a (somewhat large) constant. Is it possible that such arguments could lead to meaningful impossibility results for large enough, yet realistic (e.g., terabyte scale) problems?
      • The 3-dimensional structure of space imposes limitations on communication patterns between processing units. Does this imply any constraints on traditional complexity measures such as time or space ?

  • Efficient computation in the physical world (?)
    • proposed by: Scott Aaronson
    • summary: What is the most powerful kind of computer compatible with the laws of physics? Which problems can computers ever solve within reasonable constraints of time and memory? These questions took on a new character fourteen years ago, with Peter Shor's discovery that a quantum computer could factor integers in polynomial time, and thereby break the cryptographic codes on which most of the world's electronic commerce depends. A quantum computer is a proposed device that would exploit the counterintuitive nature of quantum physics to solve certain problems exponentially faster than we know how to solve them with any computer today.
      In the wake of Shor's algorithm, one can identify three basic questions:
      • First, can quantum computers actually be built? Can they cope with realistic rates of decoherence -- that is, unwanted interaction between a quantum computer and its external environment? Alternatively, can we find any plausible change to currently-accepted laws of physics such that quantum computing would *not* be possible?
      • Second, what would the actual capabilities of quantum computers be? For example, could they efficiently solve NP-complete problems? Though quantum computers would break many of today's cryptographic codes (including RSA), can other practical codes be found that are secure against quantum attacks?
      • Third, does quantum computing represent the actual limit of what is efficiently computable in the physical world? Or could (for example) quantum gravity lead to yet more powerful kinds of computation?
    • summary (cont): Theoretical computer scientists have already contributed basic insights about all three of these questions. For example, they played a central role in proving and extending the Fault-Tolerance Theorem, which says that once a quantum computer's noise rate falls below a certain critical point, decoherence can be corrected faster than it occurs, and hence arbitrarily long quantum computations are possible in principle. On the other hand, theoretical computer scientists also discovered evidence that quantum computers would face major limitations in solving NP-complete problems, as well as related tasks such as breaking cryptographic hash functions. Finally, they studied hypothetical models beyond quantum computing, for instance involving topological quantum field theories and closed timelike curves.
      In summary, quantum computing has revealed an unexpected link between the central unsolved problems of theoretical computer science (such as P versus NP, and the security of public-key cryptography) and the nature of physical law. One guesses that there is more to discover at this intersection.

  • What can be computed insensitively?
    • proposed by: Cynthia Dwork and Udi Wieder
    • Tagline: For which problems can we build algorithms that are insensitive to input modifications
    • Summary: More and more modern computational tasks have the property that the input may change and is sometimes not entirely known. For instance, in optimization problems over large and distributed systems it is expected that a solution not degrade much even if a small number of input entries are added, removed or modified. One may prefer a slightly sub-optimal solution which is robust under these operations over the optimal but sensitive solution. Another example is related to privacy of individual entries in a database. Here the queries for which a meaningful privacy preserving reply exists correspond (more or less) to queries which are insensitive to small data modifications.
      Related notions of sensitivity (or insensitivity) have been the target of investigation for quite some time. Examples include the affect of small perturbations of data values on the computation (smooth analysis, perturbation theory), dynamic data structures, graceful degradation and so on. In contrast, here we consider arbitrary perturbations on a small number of elements.

  • Approximation in economics
    • proposed by: Jason Hartline
    • tagline: when simple isn't optimal, how close is it?
    • summary: Economic mechanisms specify the rules for interaction between people and computers in society and networks. In eBay sellers post items for sale in an ascending auction with a reserve price (minimum bid value). In real estate markets, a seller posts a price, a buyer accepts, and the real estate agents take six percent. Advertisements on Internet search engines are sold keyword auctions: advertisers bid for keywords, on-the-fly when a user enters a search query, all advertiser keyword-bids that match the query are entered in an auction to sell several advertising slots that appear on the right- hand-side of the search page. These mechanisms are very simple and even natural, but are they optimal? When they are not optimal, how far from optimal are they? When they are far from optimal, are there other natural mechanisms that are closer to optimal? The key to answering these questions is a theory of approximation for economic mechanisms. eBay's auction is optimal, the real estate exchange is optimal for the agents in some settings and suboptimal in others, Internet ad auction are suboptimal.
      Economic theory literature focus almost exclusively on optimal mechanisms, it describes settings where known mechanisms are optimal, what the optimal mechanisms are in more general settings, and gives impossibility for mechanisms in still more general settings. Theoretical computer science design and analysis techniques from approximation algorithms, online algorithms, and randomized algorithms, where an algorithms performance is measured relatively to a performance benchmark, provide a rigorous framework for approximation can be applied to economic mechanism design. Notable examples of approximation for economics include three subareas of the theoretical computer science literature, (a) "price of anarchy" measures the approximation achieved by selfish agents in equilibrium against the performance of globally optimal outcome, (b) "prior-free mechanism design" looks at design and analysis paradigms where a single mechanism across many settings is sought to simultaneously approximates the performance the optimal mechanisms for each setting, and (c) "algorithmic pricing" where an algorithm is sought to give simple, approximately optimal, pricings of alternatives under a known demand. This last area addresses one of the main economic impossibility results for designing optimal mechanisms for agents with complex preferences.
      While consideration of economic objectives through the lens of approximation will help address crucial challenges of economic theory, it will also provide the foundations for Internet systems design where both economic and computational constraints and methodologies are necessary.

  • Spectral data analysis for the masses
    • proposed by: James Lee

  • Complexity and rationality
    • proposed by: Tim Roughgarden
    • summary: using computational complexity to contribute to the debate on e.g. different game-theoretic equilibrium concepts

  • Self-assembly of nano-structures
    • proposed by: Diane Souvaine
    • tagline: Can rod-like portions of viruses be self-assembled into guaranteed structures such as fine-grained meshes?

  • Pseudorandomness of the Primes
    • proposed by: Luca Trevisan
    • image: a "sieve of Eratosthenes" picture like http://mgccl.com/files/primespiral500.png
    • tagline: what do complexity theory and the foundations of cryptography have to do with fundamental questions in number theory?
    • summary: several traditional problems in number theory, such as whether there are infinitely many primes p such that p+2 is prime, can be approached as questions about the "pseudorandom" behavior of primes. The recent work of Green, Tao, and others on patterns in the primes combines methods from Analysis, Combinatorics and Ergodic Theory to, ultimately, prove results about pseudorandomness. Complexity theorists and cryptographers have been studying pseudorandomness for nearly thirty years, and the tools and theories they have developed seem to be genuinely different, and likely quite useful. (Both giving new ways to attack old questions, and giving a new language in which it is easier to formulate new problems and conjectures.)

  • Computing in large-dimension state-spaces
    • proposed by: John Sidles
    • Image: a Goldilocks and the Three Bears picture like http://courses.washington.edu/goodall/MRFM/three_bears.png
    • Tagline: What are our options when state-space is too large or too small?
    • Summary for science advisors: More and more modern computational tasks have the property that the size of the state space is a matter of discretion and design. For example, in simulating quantum systems researchers can choose to work in the native Hilbert state-space (thus preserving invariances), or alternatively in reduced-order spaces (thus bringing geometry to bear), or alternatively in the larger-than-Hilbert dictionary spaces of compressive sensing and sampling theory (thus bringing information theory to bear).
         These new design choices are ubiquitous in both classical and quantum state-spaces. Increasingly, they are relevant to large-scale calculations in every branch of science and engineering.
          A program that specifically focussed on the question "What general computational principles and techniques are broadly applicable to systems having large-dimension state-spaces?", as defined in a series of conferences, white papers, test-beds, and rapid-prototyping software, would cross-fertilize and invigorate many disciplines. Industrial participation will be encouraged.
    • Summary for senators: Science and engineering work best---generate the most new ideas, new enterprises, jobs, and prosperity---when there is a vigorous two-way flow of ideas, from the abstract to the concrete and from the concrete to the abstract. Large-scale state-spaces are an emerging arena within which this two-way flow of ideas is outstandingly vigorous. The United States science and technology community is uniquely well-positioned to take advantage of these new opportunities, by virtue of its culture of enterprise and innovation. Mathematics and computation are the two keys that open this new door.
         Then the following two slides need to be made more general, and more broad-spectrum, while maintaining their present focus on practical applications and enterprise, and especially, their present senator-friendliness: http://courses.washington.edu/goodall/MRFM/cs_front.png and http://courses.washington.edu/goodall/MRFM/ISH.png

view · edit · attach · print · history
Page last modified on May 20, 2008, at 08:00 AM