The Rise of Cognitive AI

Structured, explicit, and intelligible knowledge can provide a path toward higher machine intelligence

Photo credit: Ting Ling Goay / stock.adobe.com

Deep learning (DL) is generating a great deal of progress and revolutionizing entire industries across all aspects of life, including healthcare, retail, manufacturing, autonomous vehicles, security and fraud prevention, and data analytics. However, to build the future of artificial intelligence (AI), it is necessary to define a set of goals and expectations that will drive a new generation of technologies beyond the deployments we are seeing today. By 2025, we are likely to see a categorical jump in the competencies demonstrated by AI, with machines growing markedly wiser.

Many of the current DL applications address perception tasks related to object recognition, natural language processing (NLP), and translation, as well as tasks that involve broad correlation processing of data, such as recommendation systems. DL systems provide exceptional results based on differentiable programming and sophisticated data-based correlation, and they are expected to drive transformation across industries for years to come. At the same time, a number of fundamental limitations inherent to the nature of DL itself must be overcome so that machine learning, or more broadly AI, can come closer to realizing its potential. A concerted effort in the following three areas is needed to achieve non-incremental innovation:

  • Materially improve model efficiency (e.g., reduce the number of parameters by two to three orders of magnitude without loss in accuracy)
  • Substantially enhance model robustness, extensibility, and scaling
  • Categorically increase machine cognition

Among other developments, the creation of transformers and their application in language modeling has driven computational requirements to double roughly every 3.5 months in recent years, highlighting the urgency of improvements in model efficiency. Despite advances in the acceleration and optimization of neural networks, without improvements in model efficiency, current model growth trends will not be sustainable over the long haul.
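To put that rate in perspective, a quick back-of-the-envelope calculation (a Python snippet, included here purely for illustration) shows what doubling every 3.5 months compounds to over a year:

    # Compute demand doubling every 3.5 months compounds to roughly
    # an order of magnitude of growth per year:
    print(2 ** (12 / 3.5))   # ~10.8x per year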

Figure 1 shows the exponential growth of the number of parameters in DL-based language models (based on https://www.microsoft.com/en-us/research/blog/turing-nlg-a-17-billion-parameter-language-model-by-microsoft/).

Techniques such as pruning, sparsity, compression, distillation, and graph neural networks (GNNs) offer helpful advancements in efficiency but ultimately yield incremental improvements. A model size reduction of orders of magnitude without compromise in the quality of results will likely require a more fundamental change in the methods for capturing and representing information itself, as well as in the learning capabilities within a DL model. Continued progress will require dramatically more computationally efficient DL methods or a move to other machine learning methods. A promising class of AI systems that uses retrieval from an auxiliary information repository, rather than embedding very large sets of facts and data in model parameters, is quickly gaining traction. AI systems that integrate neural networks with information injected on a per-need basis might alleviate some of these model growth trends.
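As an illustration of that retrieval pattern, here is a minimal, self-contained Python sketch; the hashing encoder is a toy stand-in for a trained neural text encoder, and the repository contents are invented for the example:

    import numpy as np

    def encode(text: str, dim: int = 64) -> np.ndarray:
        """Toy bag-of-words hashing embedding; a real system would
        use a trained neural text encoder."""
        vec = np.zeros(dim)
        for token in text.lower().split():
            vec[hash(token) % dim] += 1.0
        return vec / (np.linalg.norm(vec) + 1e-9)

    def retrieve(question: str, docs: list[str], k: int = 2) -> list[str]:
        """Return the k repository entries most similar to the question."""
        q = encode(question)
        sims = [float(encode(d) @ q) for d in docs]
        return [docs[i] for i in np.argsort(sims)[::-1][:k]]

    # Auxiliary information repository: facts stored outside the model weights.
    repository = [
        "Turing-NLG is a 17-billion-parameter language model.",
        "Hydroplaning occurs when a tire loses contact with a wet road.",
        "Paris is the capital of France.",
    ]

    # Retrieved facts are injected into the model input per need,
    # instead of being memorized in billions of parameters.
    print(retrieve("What is the capital of France?", repository))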

At the same time, statistical machine learning methods rely on the assumption that the distribution of training samples is representative of what must be handled during inference, creating major deficiencies in real-life uses. In particular, DL models are challenged when encountering situations sparsely sampled in the training dataset, or even absent from the training data. The effects of so-called black swan events — events that are unpredictable and carry a massive impact — can be especially detrimental when applying a pre-trained model for inference across domains.

Despite advancements in transfer learning and few-shot/zero-shot inference, results remain far from satisfactory. Ineffective extensibility of models hinders scaling AI to the many domains that are not as rich in datasets and data scientists. Applying AI to a much broader set of business cases calls for a substantially new approach that integrates information and knowledge into DL-based systems to handle the long-tail distribution of real-life cases. DL is also highly susceptible to variations in data and can produce implausible classifications, shortcomings that improved robustness and extensibility would address.

Finally, for the most part, neural networks cannot properly provide cognition, reasoning and explainability. Deep learning lacks the cognitive mechanisms to address tasks fundamental to human intelligence, missing competencies such as abstraction, context, causality, explainability, and intelligible reasoning.

There is a strong push for AI to reach into the realm of human-like understanding. Leaning on the paradigm defined by Daniel Kahneman in his book, Thinking, Fast and Slow, Yoshua Bengio equates the capabilities of contemporary DL to what he characterizes as “System 1” — intuitive, fast, unconscious, habitual, and largely resolved. In contrast, he stipulates that the next challenge for AI systems lies in implementing the capabilities of “System 2” — slow, logical, sequential, conscious, and algorithmic, such as the capabilities needed in planning and reasoning. In a similar fashion, Francois Chollet describes an emergent new phase in the progression of AI capabilities based on broad generalization (“Flexible AI”), capable of adaptation to unknown unknowns within a broad domain. Both these characterizations align with DARPA’s Third Wave of AI, characterized by contextual adaptation, abstraction, reasoning, and explainability, with systems constructing contextual explanatory models for classes of real-world phenomena. These competencies cannot be addressed just by playing back past experiences. One possible path to achieve these competencies is through the integration of DL with symbolic reasoning and deep knowledge. I will use the term “Cognitive AI” to refer to this new phase of AI.

While not expected to reach the goals of open-ended artificial general intelligence (AGI), AI with higher cognitive capabilities will play a more involved role in technology and business, both through this set of shared cognitive competencies and by navigating through the shared values that are fundamental in the relationship between humans and machines. Once AI can make reliable decisions in unforeseen environments, it will eventually be trusted with higher autonomy and become significant in areas such as robotics, autonomous transportation, as well as in control points of logistics, industrial, and financial systems. Finally, an increased level of human-machine collaboration can be expected as AI represents an active and persistent agent that communicates and collaborates with people as it serves and learns from them.

There is a divide in the field of AI between those who believe categorically higher machine intelligence can be achieved by advancing DL further and those who see the need for incorporating additional fundamental mechanisms. I fall in the latter camp — let me explain why.

DL masters the underlying statistics-based mapping from an input, through the multi-dimensional structures of the embedding space, to a predicted output. This helps DL excel in the classification of broad and shallow data (for example, a sequence of words, or pixels/voxels in an image). The input data carry limited positional information (such as the location of a pixel, voxel, or character in relation to others) and limited structural depth; it is the task of the AI system to discover the features, structures, and relationships. DL is equally effective at indexing very large sources (such as Wikipedia) and retrieving answers from the best-matching places in the corpus, as demonstrated in benchmarks such as NaturalQA or EfficientQA. As defined by Bengio, System 1 tasks rely on a statistical mapping function created during training. For these tasks, DL delivers.

In contrast, knowledge that is structured, explicit, and intelligible could provide one path to higher machine intelligence or System 2 type capabilities. Structured knowledge can capture and represent the full richness associated with human intelligence, and therefore constitutes a key ingredient for higher intelligence. One essential knowledge construct is the ability to capture declarative knowledge about elements and concepts and encode abstract notions such as hierarchical property inheritance among classes. For example, knowledge about birds, with added particulars on passerine species, plus specifics on sparrows, provides a wealth of implied information about chestnut sparrows even when not specifically spelled out. Other knowledge constructs include causal and predictive models.
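A minimal sketch of that inheritance idea, using Python classes as a stand-in for a frame-style knowledge base (the specific classes and properties are illustrative):

    class Bird:
        has_feathers = True
        lays_eggs = True

    class Passerine(Bird):              # perching birds
        toe_arrangement = "anisodactyl"

    class Sparrow(Passerine):
        diet = "seeds and insects"

    class ChestnutSparrow(Sparrow):
        plumage = "chestnut"

    # Nothing about feathers or diet was stated for chestnut sparrows,
    # yet both facts are implied through the hierarchy.
    print(ChestnutSparrow.has_feathers)   # True
    print(ChestnutSparrow.diet)           # seeds and insects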

Such constructs rely on explicit concepts and well-identified, overtly defined relations rather than machine embeddings in the latent space, and the resulting models will have more extensive potential for explanations and predictions well beyond the capabilities of a statistical mapping function. By capturing an underlying model of relationships between factors and forces, current events can be worked backward for causality or forward for predicted outcomes. A model of the underlying dynamics can be richer with context, internal states, and simulated sequences that were never yet encountered.
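To make the backward/forward idea concrete, here is a toy explicit causal model in Python; the events and causal links are invented for illustration:

    # Overt, explicitly defined causal relations rather than latent embeddings.
    causes = {
        "rain": ["wet_road"],
        "wet_road": ["hydroplaning_risk"],
        "hydroplaning_risk": ["accident"],
    }

    def forward(event, seen=None):
        """Predict: follow causal links forward to possible outcomes."""
        seen = set() if seen is None else seen
        for effect in causes.get(event, []):
            if effect not in seen:
                seen.add(effect)
                forward(effect, seen)
        return seen

    def backward(event):
        """Explain: which known causes could have produced this event?"""
        return [c for c, effects in causes.items() if event in effects]

    print(forward("rain"))       # {'wet_road', 'hydroplaning_risk', 'accident'}
    print(backward("accident"))  # ['hydroplaning_risk']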

The human brain demonstrates the ability to ‘imagine’, simulating and assessing potential futures never yet encountered through experience or observation. These capabilities provide an evolutionary advantage to human intelligence. In a complex world, individuals have to make choices involving scenarios they have not yet experienced. Mental simulations of possible future episodes, within environments not bounded by clear rules, are based on an underlying model of world dynamics and provide great adaptive value in planning and problem-solving. The resulting human ability to adapt and make choices engages parts of the brain not available to other mammals, which rely primarily on fast, automatic mapping functions.

Essential for higher cognition, procedural modeling mechanisms are based on covert mathematical, physical, or psychological principles beyond input-to-output observable statistical correlations. For example, a physics model can capture the phenomenon of hydroplaning and provide a concise predictor of the motion of a car under various conditions. Addressing hydroplaning through physical modeling instead of (or in addition to) leveraging statistical extrapolation from measured training data allows for effective handling of out-of-distribution circumstances and the long tail of real-life eventualities. On a higher level, having a model of special relativity that states “E=mc²” captures the genius of Albert Einstein in expressing the fundamental relationship between elements, rather than the statistical correlation function extracted from multiple tests. Such a procedural model can be combined with a DL-based approach to expand current capabilities.
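As a simplified sketch of such a procedural model, the following uses Horne's widely cited empirical NASA relation for the onset of dynamic hydroplaning (roughly V ≈ 10.35·√p, with speed V in mph and tire pressure p in psi); a real predictor would also account for water depth, tread design, and load:

    import math

    def hydroplaning_onset_mph(tire_pressure_psi: float) -> float:
        """Simplified procedural model: Horne's empirical relation
        V ~= 10.35 * sqrt(p), with p in psi and V in mph."""
        return 10.35 * math.sqrt(tire_pressure_psi)

    # Unlike statistical extrapolation from measured training data,
    # the model answers for conditions never sampled during training:
    for psi in (28, 32, 36):
        print(f"{psi} psi -> onset near {hydroplaning_onset_mph(psi):.0f} mph")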

Knowledge bases can capture (otherwise implicit) commonsense assumptions and the underlying logic not always overtly presented in the training data of DL systems. This implied, “obvious” understanding of the world and its dynamics is highly instrumental in addressing many tasks of higher machine intelligence. Finally, well-structured knowledge representation can address aspects of disambiguation (separating the senses of ‘club’ as a playing bat, weapon, card suit, or place for parties) in contextualized and aggregated content.
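A toy illustration of knowledge-based disambiguation in Python (the sense inventory and context cues are invented for the example):

    # Each sense of "club" carries overtly defined context cues.
    SENSES = {
        "club.weapon":    {"hit", "strike", "wooden", "heavy"},
        "club.card_suit": {"spade", "heart", "diamond", "deck"},
        "club.venue":     {"dance", "music", "night", "party"},
        "club.golf":      {"golf", "swing", "ball", "tee"},
    }

    def disambiguate(context_words: set[str]) -> str:
        """Pick the sense whose knowledge-base cues best overlap the context."""
        return max(SENSES, key=lambda s: len(SENSES[s] & context_words))

    print(disambiguate({"they", "met", "at", "the", "night", "dance", "party"}))
    # -> club.venue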

In the coming years, major advances in DL-based System 1-type systems can be expected, as underlying shallow mapping functions become significantly more elaborate and knowledgeable, and as compute becomes cheaper and faster. Cognitive AI will bring an additional level of more sophisticated capabilities. We can already see promising early efforts to integrate structured knowledge with DL to build more generative and dynamic systems (for example, by formulating traditional tasks such as commonsense question answering as inference over dynamically generated knowledge graphs).

Overall, the emerging focus on symbolic-based approaches founded on overt, structured knowledge leads me to believe that a new set of Cognitive AI competencies will emerge by 2025, unlocking the capabilities needed for systems that are not only more explainable, but also able to apply a level of autonomous reasoning closer to that of a human being than the current DL-based systems. The next level of machine intelligence will require reasoning over deep knowledge structures, including facts and deep structures of declarative (know-that), causal (know-why), conditional/contextual (know-when), relational (know-with), and other types of models. The capture and use of deep knowledge can address fundamental challenges of AI such as the difficulties presented by the explosion in DL model size and gaps in model robustness, extensibility, and scaling.

We have established Cognitive Computing Research at Intel Labs to drive Intel's innovation at the intersection of machine intelligence and cognition and to address these emerging Cognitive AI competencies. Our efforts combine the latest in deep learning with knowledge structures and neuro-symbolic AI in order to build self-learning AI that can make informed decisions in complex, context-rich situations.

Deep learning makes AI systems incredibly effective in recognition, perception, translation and recommendation system tasks. The nascent technologies for the next wave of machine learning and AI will create a new class of AI solutions with higher understanding and cognition. We look forward to building next-generation AI systems that will one day understand this blog post and other informative content — and deliver even greater benefits to our lives.

This article is based on insights presented on an earlier series published on LinkedIn:

  • Age of Knowledge Emerges:
  1. Part 1: Next, Machines Get Wiser
  2. Part 2: Efficiency, Extensibility and Cognition: Charting the Frontiers
  3. Part 3: Deep Knowledge as the Key to Higher Machine Intelligence

Gadi Singer is vice president at Intel Labs and director of Cognitive Computing Research.