Probabilistic Computation Project
A great deal of enthusiasm has been focused on building increasingly large neural models. We are pursuing an analogous scaling roadmap for probabilistic programming, and for the applications and cognitive AI architectures that it enables. The probabilistic source code for these AI systems is partly written by AI engineers, partly learned from data, and partly synthesized from natural language. This approach integrates the best of large-scale generative modeling and deep learning with probabilistic inference and symbolic programming. Unlike neural networks, probabilistic programs can report what they know, what they don’t, and why; they can be modularly designed, debugged, and tested; and they can learn new symbolic code rapidly and accurately from sparse data.
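As a toy illustration of what "reporting what they know and what they don't" can mean, here is a minimal sketch in plain Python. It is not code from the platforms described here: the function names (`model`, `likelihood`, `infer`), the coin-flip setting, and the use of simple likelihood weighting are all assumptions chosen for brevity. It infers a coin's bias from a handful of flips and returns both a point estimate and a 90% credible interval.

```python
import random

def model(rng):
    """Hypothetical generative model: draw a coin bias from a uniform prior."""
    return rng.random()

def likelihood(bias, flips):
    """Probability of the observed flips (True = heads) given the bias."""
    p = 1.0
    for heads in flips:
        p *= bias if heads else (1.0 - bias)
    return p

def infer(flips, num_samples=20000, seed=0):
    """Likelihood weighting: sample from the prior, weight by the data."""
    rng = random.Random(seed)
    samples = [model(rng) for _ in range(num_samples)]
    weights = [likelihood(b, flips) for b in samples]
    total = sum(weights)

    # Posterior mean as a weighted average of the prior samples.
    mean = sum(b * w for b, w in zip(samples, weights)) / total

    # Weighted quantiles give a credible interval for the bias.
    ordered = sorted(zip(samples, weights))

    def quantile(q):
        acc = 0.0
        for b, w in ordered:
            acc += w
            if acc >= q * total:
                return b
        return ordered[-1][0]

    return mean, (quantile(0.05), quantile(0.95))

if __name__ == "__main__":
    # Sparse data: the interval is wide, and the program says so.
    few = [True, True, True, False]
    print("4 flips: ", infer(few))

    # More data with the same proportion of heads: the interval narrows.
    many = [True] * 30 + [False] * 10
    print("40 flips:", infer(many))
```

With only four flips the interval is wide; with forty flips at the same proportion of heads it tightens. Real probabilistic programming systems use far more capable inference than this, but the uncertainty report has the same basic shape.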
Starting in 2019, open-source MIT probabilistic programming platforms have yielded state-of-the-art results that outperform machine learning in multiple application domains. These results depend on fast, robust inference of model structure with calibrated uncertainty, and on automated Bayesian learning of probabilistic program source code. Existing deep learning fabrics, although well suited for inferring model parameters, do not support the hybrids of deep learning with sequential Monte Carlo, Markov chain Monte Carlo, variational inference, dynamic programming, and symbolic meta-programming that are required for large-scale Bayesian structure learning and real-time inference. Our open-source probabilistic programming tools have, for the first time, yielded productivity and scalability gains analogous to the gains from deep learning platforms such as TensorFlow and PyTorch. Compared to expert handwritten code, implementations built on these platforms require ~20x fewer lines of code, are competitive (same or better) in runtime, and are easier to develop, optimize, and scale further. In addition, since 2008, MIT has been developing massively parallel, ultra-low-precision, stochastic digital hardware that can be 100x to 10,000x more power efficient for probabilistic programming workloads, as well as spiking neural Monte Carlo circuits that further narrow the efficiency gap with the brain.
These results suggest that it may be possible to achieve AI scaling that is orders of magnitude more efficient and controllable than scaling with large neural models: using just tens of watts of power for real-time learning from just one or a few unlabeled examples; generalizing across tasks, environments, and sensors without retraining; robustly interpreting real-world data without the inexplicable failures and vulnerability to adversarial examples that seem inescapable with deep learning; and reporting appropriate uncertainty in common-sense terms to downstream clients.
In short, this is scaling AI the human way: an alternative shot on goal for AGI, with a distinctive value proposition very different from (and, if we’re right, far more valuable than) today’s deep-learning-focused efforts.