For Multi-source and Heterogeneous Knowledge Representation
Our goal is to provide a unified programming framework for KGE tasks and a series of knowledge representations for downstream tasks.
Key Features of CogKGE
We contribute an open-source toolkit that bridges KGE models and multi-source heterogeneous data through plug-and-play knowledge adapters.
Multi-source and heterogeneous knowledge representation
CogKGE explores the unified representation of knowledge from diverse sources. In addition to triple-based (fact-based) embedding models, the toolkit supports fused representations that incorporate additional information, including text descriptions, node types and temporal information.
Comprehensive models and benchmark datasets
CogKGE implements many classic KGE models in four categories: translation distance models, semantic matching models, graph neural network-based models and transformer-based models. Besides these out-of-the-box models, we release two large benchmark datasets, EventKG240K and CogNet360K, for further evaluation of KGE methods.
Extensible and modularized framework
CogKGE provides a programming framework for KGE tasks. Its extensible architecture supports module extension and secondary (custom) development, and the pre-trained knowledge embeddings can be applied directly to downstream tasks.
Open source and visualization demo
Besides the toolkit, we also release an online CogKGE demo for exploring knowledge visually. Source code, datasets and pre-trained embeddings are publicly available on GitHub.
The BaseModel class is the base class of all models in CogKGE.
Translation Distance Models
Translation distance models use distance-based measures to score a pair of entities and the relation between them: the smaller the distance, the more plausible the fact. In CogKGE, we implement several translation distance models, including TransE, TransH, TransR, TransD, TransA, BoxE and PairRE.
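As an illustration of a distance-based score, the following minimal sketch computes the TransE score, the negative norm of h + r - t, in plain Python. It is not CogKGE's implementation or API; the function name `transe_score` and the toy embeddings are assumptions made up for this example.

```python
import math


def transe_score(head, relation, tail, norm=2):
    """Return -||head + relation - tail||; a higher score means a more plausible triple."""
    if not (len(head) == len(relation) == len(tail)):
        raise ValueError("head, relation and tail must have the same dimension")
    residual = [h + r - t for h, r, t in zip(head, relation, tail)]
    if norm == 1:
        distance = sum(abs(x) for x in residual)
    else:
        distance = math.sqrt(sum(x * x for x in residual))
    return -distance


# Toy 3-dimensional embeddings, invented purely for illustration.
paris = [0.9, 0.1, 0.0]
capital_of = [-0.4, 0.5, 0.2]
france = [0.5, 0.6, 0.2]
germany = [0.1, 0.9, 0.7]

true_score = transe_score(paris, capital_of, france)
false_score = transe_score(paris, capital_of, germany)
print(true_score > false_score)
```

In this toy setup the tail `france` sits almost exactly at `paris + capital_of`, so its score is close to zero, while the corrupted tail `germany` receives a clearly lower (more negative) score.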
Semantic Matching Models
Semantic matching models use similarity-based scoring functions rather than the distance-based measures of translation distance models. They measure the plausibility of facts by matching the latent semantics of entities and relations embodied in their vector space representations. RESCAL, SimplE, RotatE and TuckER have been built into CogKGE.
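To make the idea of a similarity-based score concrete, here is a minimal sketch of the RESCAL bilinear score h^T M_r t in plain Python. It is an illustration only, not CogKGE's code; the function name `rescal_score` and the toy vectors and matrix are hypothetical.

```python
def rescal_score(head, relation_matrix, tail):
    """Return the bilinear score head^T * relation_matrix * tail."""
    dim = len(head)
    if len(tail) != dim or len(relation_matrix) != dim:
        raise ValueError("dimension mismatch between entities and relation matrix")
    if any(len(row) != dim for row in relation_matrix):
        raise ValueError("relation matrix must be square")
    return sum(
        head[i] * relation_matrix[i][j] * tail[j]
        for i in range(dim)
        for j in range(dim)
    )


# Toy 2-dimensional example, invented for illustration.
entity_a = [1.0, 0.0]
entity_b = [0.0, 1.0]
# A non-symmetric relation matrix lets the model score (a, r, b) and (b, r, a) differently.
relation = [[0.0, 2.0],
            [0.0, 0.0]]

forward = rescal_score(entity_a, relation, entity_b)
backward = rescal_score(entity_b, relation, entity_a)
print(forward, backward)
```

Because the relation is a full matrix, the score depends on the direction of the triple, which is how a bilinear model of this kind can represent asymmetric relations.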
Graph Neural Network-based Models
Graph neural networks (GNNs) have recently proven quite successful at modeling graph-structured data. Since a KG is itself graph-structured, a GNN can integrate its topological structure and node features to produce a more refined vector representation. We implement R-GCN and CompGCN to represent multi-relational data.
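The way a GNN combines topology and node features can be sketched with one simplified R-GCN-style propagation step: each node adds a self-loop term to the normalised, relation-specific messages from its incoming neighbours and then applies a ReLU. This is a plain-Python sketch under simplifying assumptions (no inverse relations, no basis decomposition, no bias), not CogKGE's implementation; all names and numbers below are hypothetical.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def rgcn_layer(features, triples, relation_weights, self_weight):
    """One simplified R-GCN step.

    features: dict mapping node -> feature vector
    triples: list of (head, relation, tail); messages flow from head to tail
    relation_weights: dict mapping relation -> weight matrix W_r
    self_weight: weight matrix W_0 for the self-loop
    """
    # Group the incoming neighbours of each node by relation type.
    incoming = {}
    for head, relation, tail in triples:
        incoming.setdefault(tail, {}).setdefault(relation, []).append(head)

    updated = {}
    for node, vector in features.items():
        total = matvec(self_weight, vector)  # self-loop term W_0 h_i
        for relation, neighbours in incoming.get(node, {}).items():
            scale = 1.0 / len(neighbours)  # normalisation 1 / c_{i,r}
            for neighbour in neighbours:
                message = matvec(relation_weights[relation], features[neighbour])
                total = [acc + scale * m for acc, m in zip(total, message)]
        updated[node] = [max(0.0, x) for x in total]  # ReLU activation
    return updated


# Toy graph with three nodes and two relation types, invented for illustration.
features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
triples = [("a", "r1", "c"), ("b", "r1", "c"), ("a", "r2", "b")]
relation_weights = {
    "r1": [[2.0, 0.0], [0.0, 2.0]],
    "r2": [[0.0, 1.0], [1.0, 0.0]],
}
self_weight = [[1.0, 0.0], [0.0, 1.0]]

new_features = rgcn_layer(features, triples, relation_weights, self_weight)
print(new_features)
```

Node `c` averages the messages of its two `r1` neighbours, node `b` receives a single `r2` message, and node `a` has no incoming edges and keeps only its self-loop term.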
Transformer-based Models
The Transformer has been widely used in pre-trained language models and machine translation. Its deep network architecture can jointly learn contextual representations of entities and relations in a KG by aggregating information from graph neighborhoods. Transformer-based models can also exploit the text descriptions in KGs, encoding texts and facts into a unified semantic space. CogKGE supports KEPLER and HittER.
The knowledge module mainly integrates three kinds of knowledge representation: world, commonsense and linguistic knowledge.
World KGs such as Freebase, DBpedia and Wikidata mainly focus on explicit world knowledge. In CogKGE, we implement entity-centric knowledge representation based on Wikidata and event-centric knowledge representation based on EventKG. World knowledge representations have been widely used in knowledge-enhanced pre-trained language models, entity disambiguation and event extraction.
Commonsense knowledge tries to capture implicit general facts and regular patterns in our daily life. Compared with a world KG, the nodes in a commonsense KG are semantically rich natural language phrases rather than entities, e.g., (Rocket, is used for, Fly to the moon). CogKGE supports the commonsense knowledge representation of ConceptNet, which can help improve commonsense completion and commonsense reasoning.
Linguistic knowledge includes considerable information about lexical, conceptual and predicate-argument semantics. For example, rocket has a hyponymy relation to skyrocket in WordNet, while rocket can evoke the Change_position_on_a_scale frame in FrameNet. In CogKGE, the knowledge representation of FrameNet is provided for downstream tasks such as word sense disambiguation and machine reading comprehension.