November 27, 2022

Knowledge Graph Embedding and Applications

  1. Background
    1.1. Knowledge graph

A knowledge graph represents knowledge about entities and their relationships. It is a standard format for multi-relational data. Technically, a knowledge graph is simply a collection of triples (h, t, r), each representing a relation r from a head entity h to a tail entity t. In a graph-theoretic view, it is a labeled directed multigraph, with an edge labeled r from node h to node t.

This is a simple but very versatile format. It can represent many types of data, such as information networks, bibliographic data, and user–item transactions.

Figure 1: Example of a bibliographic knowledge graph.
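
To make the triple format concrete, here is a minimal Python sketch of a tiny bibliographic knowledge graph in the spirit of Figure 1, stored as (h, t, r) triples and viewed as a labeled directed multigraph; all entity and relation names are hypothetical examples.

    from collections import defaultdict

    # A tiny bibliographic knowledge graph as (head, tail, relation) triples.
    # All entity and relation names are hypothetical examples.
    triples = [
        ("author_A", "paper_1", "writes"),
        ("paper_1", "paper_2", "cites"),
        ("paper_1", "venue_X", "published_in"),
    ]

    # Graph-theoretic view: a labeled directed multigraph, where each triple
    # (h, t, r) becomes an edge from node h to node t labeled r.
    out_edges = defaultdict(list)
    for h, t, r in triples:
        out_edges[h].append((r, t))

    print(out_edges["paper_1"])  # [('cites', 'paper_2'), ('published_in', 'venue_X')]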

    1.2. Contextual link prediction in knowledge graphs

The contextual link prediction task aims to predict the link between entities given a relation as the context. It is a self-supervised representation learning task. The task originates from knowledge graph completion, but several other tasks can be cast as, and solved by, contextual link prediction, such as recommendation, entity alignment, information retrieval, data browsing, and question answering.

Figure 2: Example of link prediction.
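
In practice, the task is usually operationalized as ranking: given a query (h, ?, r), score every candidate tail entity and sort. Here is a minimal sketch, assuming a model-provided scoring function score(h, t, r) of the kind introduced in the next section.

    def predict_tail(h, r, entities, score):
        """Rank all candidate tail entities for the query (h, ?, r),
        highest matching score first."""
        return sorted(entities, key=lambda t: score(h, t, r), reverse=True)

In knowledge graph completion, the top-ranked candidates are the predicted missing links; in recommendation, they would be the recommended items.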

    1.3. Knowledge graph embedding

Knowledge graph embedding (KGE) is a representation learning approach to link prediction. It aims to represent entities and relations as embeddings (vectors) h, t, r and to compute a matching score S(h, t, r) between the embeddings.
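
As a concrete, simplified instance of this idea, the sketch below implements embedding lookup and a trilinear matching score in the style of DistMult; the model choice and all sizes are illustrative assumptions, not the specific models discussed below.

    import numpy as np

    rng = np.random.default_rng(0)
    dim, n_entities, n_relations = 64, 1000, 20  # illustrative sizes

    # Embedding lookup tables: one vector per entity and per relation.
    entity_emb = rng.normal(size=(n_entities, dim))
    relation_emb = rng.normal(size=(n_relations, dim))

    def score(h_id, t_id, r_id):
        """Matching score S(h, t, r): a DistMult-style trilinear product."""
        h, t, r = entity_emb[h_id], entity_emb[t_id], relation_emb[r_id]
        return float(np.sum(h * t * r))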

There are many knowledge graph embedding models, notably:

  • Neural-network-based models: ERMLP [KDD’14], ConvE [AAAI’18], InteractE [AAAI’20], CompGCN [ICLR’20]
  • Translation-based models: TransE [NeurIPS’13], TorusE [AAAI’18]
  • Tensor-decomposition-based models: RESCAL [ICML’11], ComplEx [ICML’16], CPh [ICML’18], SimplE [NeurIPS’18], RotatE [ICLR’19], QuatE [NeurIPS’19], MEI [ECAI’20], MEIM [IJCAI’22]

Tensor-decomposition-based models are faster than neural-network-based models and more accurate than translation-based models. They use some form of tensor representation format to compute the matching scores.
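
For example, in the bilinear view, RESCAL scores a triple as h^T W_r t with a full interaction matrix W_r per relation, while cheaper formats restrict the structure of W_r, such as to a diagonal. A hedged sketch with illustrative shapes:

    import numpy as np

    rng = np.random.default_rng(0)
    d = 64
    h, t = rng.normal(size=d), rng.normal(size=d)

    # RESCAL: a full interaction matrix per relation, maximally expressive
    # but with d * d parameters per relation.
    W_full = rng.normal(size=(d, d))
    s_rescal = float(h @ W_full @ t)  # h^T W_r t

    # A sparser format restricts the interaction pattern, e.g. a diagonal
    # matrix with only d parameters, which is much cheaper to compute.
    w_diag = rng.normal(size=d)
    s_diag = float(np.sum(h * w_diag * t))  # equals h^T diag(w) t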

Figure 3: Overview architecture of a knowledge graph embedding model, with three main components: embedding lookup, an interaction mechanism that computes the matching score, and prediction that uses the matching score to predict link existence.

    1.4. Efficient and expressive knowledge graph embedding

When data is large, the model size usually needs to increase to fit the data. However, real-world knowledge graphs are very large, with billions of entities, so even the largest practical model sizes are relatively small compared to the data sizes.

The relationship between the computational cost of a large model and the benefit in its accuracy is called the efficiency–expressiveness trade-off. On large real-world knowledge graphs, a good trade-off between efficiency and expressiveness is crucial. Previous tensor-decomposition-based models usually achieve a good trade-off by manually designing special interaction mechanisms with sparse yet expressive interaction patterns. However, because these mechanisms are hand-designed and fixed, they can be suboptimal or difficult to extend.

  2. Multi-partition Embedding Interaction and Beyond

The Multi-partition Embedding Interaction (MEI) model with block term tensor format generalizes previous tensor-decomposition-based models, provides a general solution for the efficiency–expressiveness trade-off, and achieves state-of-the-art results on the link prediction task. 

MEI divides the embedding vector into multiple partitions for efficient sparsity, automatically learns the local interaction patterns on each partition for expressiveness, and then combines the local scores to get the full interaction score. The trade-off between efficiency and expressiveness can be systematically controlled by changing the partition size and learning the interaction patterns through the core tensor of the block term tensor format. For more information, please see our ECAI 2020 paper. The source code and results are available at https://github.com/tranhungnghiep/MEI-KGE.

Figure 4: Architecture of MEI with block term format in three different views for the local interaction: Tucker format, parameterized bilinear format, and neural network format.
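
The sketch below is a hedged illustration of the Tucker-format view of MEI's local interaction, under simplifying assumptions: a single core tensor shared across partitions, and hypothetical sizes K and C. Each embedding is split into K partitions of size C, each partition triple is contracted with the learned core tensor, and the local scores are summed.

    import numpy as np

    K, C = 4, 16  # number of partitions and partition size (hypothetical values)
    rng = np.random.default_rng(0)
    core = rng.normal(size=(C, C, C))  # learned core tensor: the interaction pattern

    def mei_score(h, t, r):
        """h, t, r: embedding vectors of shape (K * C,), viewed as K partitions."""
        h_p, t_p, r_p = (x.reshape(K, C) for x in (h, t, r))
        # Sum of local Tucker interactions: contract the core with h_k, r_k, t_k.
        return float(sum(
            np.einsum("abc,a,b,c->", core, h_p[k], r_p[k], t_p[k])
            for k in range(K)
        ))

The partition size C controls efficiency (the core has only C^3 weights), while learning the core keeps each local interaction fully expressive.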

The Multi-partition Embedding Interaction iMproved (MEIM) model goes beyond the block term tensor format by introducing independent core tensors for an ensemble boosting effect and soft orthogonality for max-rank relational mapping, in addition to multi-partition embedding. MEIM improves expressiveness while remaining highly efficient, helping it outperform strong baselines and achieve state-of-the-art results on difficult link prediction benchmarks using fairly small model sizes. For more information, please see our IJCAI 2022 paper. The source code and results are available at https://github.com/tranhungnghiep/MEIM-KGE.

Figure 5: Architecture of MEIM with independent core tensors and max-rank mapping matrices in three different views: Tucker format, parameterized bilinear format, and neural network format.
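
Here is a hedged sketch of the two added ingredients, with illustrative shapes: each partition gets its own independent core tensor (the ensemble effect), and a soft orthogonality penalty, written here as the Frobenius deviation of W^T W from the identity, pushes relational mapping matrices toward orthogonal, max-rank form. The exact regularizer in the paper may differ.

    import numpy as np

    K, C = 4, 16  # hypothetical sizes
    rng = np.random.default_rng(0)
    cores = rng.normal(size=(K, C, C, C))  # one independent core tensor per partition

    def meim_score(h, t, r):
        """Like the MEI sketch above, but partition k uses its own core tensor."""
        h_p, t_p, r_p = (x.reshape(K, C) for x in (h, t, r))
        return float(sum(
            np.einsum("abc,a,b,c->", cores[k], h_p[k], r_p[k], t_p[k])
            for k in range(K)
        ))

    def soft_orthogonality_penalty(W):
        """Soft orthogonality: penalize deviation of W^T W from the identity,
        added to the training loss to keep the mapping near max-rank."""
        return float(np.linalg.norm(W.T @ W - np.eye(W.shape[1]), "fro") ** 2)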

  3. Open research topics

There are two main promising research directions:

  • Analysis and development of new knowledge graph embedding (KGE) models. It is interesting to analyze the performance of previous KGE models in new settings. For developing new KGE models, it is promising to continue the approach of MEI and MEIM that combines tensor decomposition and deep learning techniques. 
  • Application of KGE to other tasks by converting them to contextual link prediction. Many real-world tasks can be converted into contextual link prediction by defining the appropriate “entity” and “relation”, such as the recommendation and entity alignment tasks; a sketch of the recommendation case follows this list. It is very interesting to directly apply KGE models such as MEI and MEIM and evaluate their performance on these tasks.
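
For example, here is a minimal sketch of the recommendation case, with hypothetical names: user-item interactions become (h, t, r) triples under a single relation, and recommending items for a user reduces to the contextual link prediction query (user, ?, interacts_with).

    # Hypothetical user-item interaction data.
    interactions = [("user_42", "item_7"), ("user_42", "item_19"), ("user_8", "item_7")]

    # Cast to knowledge graph triples: users and items become entities,
    # and "interacts_with" is the relation serving as the prediction context.
    triples = [(u, i, "interacts_with") for u, i in interactions]

    # Recommending items for user_42 is then the ranking query
    # (user_42, ?, "interacts_with") over all item entities, answered by a KGE model.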

References:

  1. Hung-Nghiep Tran and Atsuhiro Takasu. MEIM: Multi-partition Embedding Interaction Beyond Block Term Format for Efficient and Expressive Link Prediction. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), 2022.
  2. Hung-Nghiep Tran and Atsuhiro Takasu. Multi-Partition Embedding Interaction with Block Term Format for Knowledge Graph Completion. In Proceedings of the European Conference on Artificial Intelligence (ECAI), 2020.
  3. Hung-Nghiep Tran and Atsuhiro Takasu. Analyzing Knowledge Graph Embedding Methods from a Multi-Embedding Interaction Perspective. In Proceedings of DSI4 at EDBT/ICDT, 2019.
  4. Hung-Nghiep Tran and Atsuhiro Takasu. Exploring Scholarly Data by Semantic Query on Knowledge Graph Embedding Space. In Proceedings of the International Conference on Theory and Practice of Digital Libraries (TPDL), 2019.