Corresponding author: Victor Telnov ( telnov@bk.ru ) Academic editor: Georgy Tikhomirov
© 2019 Victor Telnov, Yuri Korovin.
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Telnov V, Korovin Y (2019) Semantic web and knowledge graphs as an educational technology of personnel training for nuclear power engineering. Nuclear Energy and Technology 5(3): 273-280. https://doi.org/10.3897/nucet.5.39226
The paper considers technologies of knowledge representation and inference in an artificial intelligence system focused on the domain of nuclear physics and nuclear power engineering. It discusses how description logics and graph databases of nuclear knowledge can be used to generate cognitive hypotheses by means of inductive inference and reasoning by analogy, in addition to deduction. The use of an adequate description logic and of semantic similarity measures is substantiated. Interactive visual navigation and reasoning on the knowledge graphs are performed by means of special retrieval widgets and the smart RDF browser. Operations with semantic repositories are implemented on cloud platforms using SPARQL queries and RESTful services. The proposed software solutions are based on cloud computing using DBaaS and PaaS service models to ensure scalability of data warehouses and network services. An example of the use of the proposed technologies and software is given.
Nuclear education, semantic web, knowledge graph, cloud computing
Since the 1960s, in the framework of research on artificial intelligence, various formalisms for knowledge representation (semantic networks, frame systems, etc.) have been developed (
Reports at the International Conference on Semantic Systems and the International Workshops on Description Logics noted the growing interest of IT industry giants (Google, Facebook, Wikimedia) in graph models of knowledge representation and in description logics. As of 2019, educational web portals of universities, national centers for the exchange of scientific information, and world nuclear data centers underuse semantic web technologies. Inductive inference rules on graphs are useful for two reasons. First, inductive inference rules based on consideration of possible alternatives (precedents) make it possible to generate cognitive hypotheses (fuzzy knowledge) that cannot be obtained directly by deductive reasoning on the graph. Second, inductive inference is one of the basic technologies of semantic annotation of network content, when it is necessary to redesign, expand and update existing graphs with new knowledge. Inductive inference is also used to solve the problems of classification and clustering of new concepts and individuals in the semantic base of nuclear knowledge.
The aim of the work presented in the paper is to create a semantic web portal of knowledge in the domain of nuclear physics and nuclear power engineering based on ontology and using graph databases deployed on cloud platforms. The task of the study was to create the following graphs of nuclear knowledge:
The potential beneficiaries of information solutions and technologies that are proposed in the paper are students, teachers, experts, engineers, managers and specialists in the domain of nuclear physics and nuclear power engineering (target audience).
The choice of adequate description logic (DL) for project (
The description logic with signature SROIQ extends the earlier description logic SHOIN with all the expressive means that were suggested by ontology developers and that do not affect its decidability and practicability. Among others, complex role inclusion axioms of the form R ◦ S ⊑ R or S ◦ R ⊑ R have been added; they express the propagation of one property along another and have proven useful in many domains. Furthermore, SHOIN has been expanded with reflexive, antisymmetric, and irreflexive roles, disjoint roles, and a universal role. Named individuals occur naturally in ontologies as names for specific things, persons, institutes, etc. Nominals can be viewed as an artificial supplement to the ABox, which provides additional expressive power to the DL.
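The effect of a role inclusion axiom of the form R ◦ S ⊑ R can be illustrated with a small sketch. It is not taken from the project; the role names `located_in` and `part_of` and the individuals are hypothetical. The function repeatedly applies the rule "if R(x, y) and S(y, z), then R(x, z)" until no new pairs appear.

```python
def apply_role_chain(r_pairs, s_pairs):
    """Close role R under the axiom R o S [= R by fixpoint iteration.

    r_pairs and s_pairs are sets of (subject, object) pairs.
    """
    closed = set(r_pairs)
    changed = True
    while changed:
        changed = False
        for (x, y) in list(closed):
            for (y2, z) in s_pairs:
                # R(x, y) and S(y, z) entail R(x, z)
                if y == y2 and (x, z) not in closed:
                    closed.add((x, z))
                    changed = True
    return closed


# Hypothetical facts, used only for illustration.
located_in = {("reactor_core", "reactor_vessel")}
part_of = {
    ("reactor_vessel", "reactor_unit"),
    ("reactor_unit", "power_plant"),
}

inferred = apply_role_chain(located_in, part_of)
for pair in sorted(inferred):
    print(pair)
```

Here the location of the core propagates along the part-of relation, so the closure also contains the pairs linking `reactor_core` to `reactor_unit` and to `power_plant`.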
In project (
Specific values (temporal parameters, spatial characteristics, etc.) cannot be represented in the concept description language directly, because when moving from one interpretation to another, these specific elements and the relationships between them may change, while it is required that they remain unchanged. The solution is to select a separate “concrete” domain (
The corresponding expansion of DL is commonly referred to as SROIQ(D). Summary of the syntax and semantics of the description logic used in the project (
Summary of the syntax and semantics of the SROIQ(D) description logic used in the project.
Nuclear knowledge sometimes involves various degrees of uncertainty. For this reason, in the semantic web context, difficulties arise when modeling real-world domains using only classical logical formalisms. Alternative approaches often suggest probabilistic knowledge, although this is not always appropriate and justifiable (
Today standard ontology markup languages are supported by mature semantics of DL along with a number of available reasoning algorithms (
Generally, the induction of structural knowledge turns out to be a difficult task in first-order logic or equivalent representations. To overcome these difficulties, research over the last decades has addressed the calculation of similarity measures for concepts and individuals in ontologies. A similarity measure plays an important role in information retrieval and information integration as a means of comparing concepts and/or concept instances that can be retrieved or integrated across heterogeneous knowledge databases. The most significant results from a practical point of view appear to have been obtained by
Naive semantic similarity can be defined as a path distance between entities in the hierarchical structure of the ontology. More meaningful methods of assessing semantic similarity within a single ontology are feature matching and information content. Measures have also been developed to compute similarity values among classes belonging to different ontologies. For instance, a similarity function can detect similar entity classes by using a matching process that makes use of special dictionaries, semantic neighborhood, and discriminating features. Of particular interest is the approach proposed by
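The naive path-distance notion mentioned above can be sketched as follows. The class hierarchy is hypothetical, and the conversion of distance into a similarity value, 1 / (1 + distance), is only one common convention assumed here for illustration.

```python
from collections import deque


def path_distance(parent_of, a, b):
    """Shortest number of edges between a and b in the class hierarchy.

    parent_of maps each class to its direct superclass; edges are
    treated as undirected. Returns None if no path exists.
    """
    adjacency = {}
    for child, parent in parent_of.items():
        adjacency.setdefault(child, set()).add(parent)
        adjacency.setdefault(parent, set()).add(child)
    if a not in adjacency or b not in adjacency:
        return None
    queue = deque([(a, 0)])
    seen = {a}
    while queue:
        node, dist = queue.popleft()
        if node == b:
            return dist
        for nxt in adjacency[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def path_similarity(parent_of, a, b):
    """Map a path distance to (0, 1]; unrelated classes get 0.0."""
    dist = path_distance(parent_of, a, b)
    return 0.0 if dist is None else 1.0 / (1.0 + dist)


# Hypothetical hierarchy, used only for illustration.
HIERARCHY = {
    "PWR": "ThermalReactor",
    "BWR": "ThermalReactor",
    "ThermalReactor": "NuclearReactor",
    "FastReactor": "NuclearReactor",
}

print(path_distance(HIERARCHY, "PWR", "BWR"))
print(path_similarity(HIERARCHY, "PWR", "FastReactor"))
```

Such a measure looks only at the shape of the hierarchy and ignores what the concepts actually contain, which is why the text treats it as naive.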
Let there be a knowledge database KB = 〈T, A〉 that contains two components: a TBox T and an ABox A. Let C and D be two concept descriptions in T. Given a concept C in T, it is possible to consider its extension C^I, where I is the interpretation function. In what follows, the canonical interpretation of the ABox is considered, in which constants in the ABox are interpreted as themselves and different names for individuals stand for different domain objects. The semantic similarity measure is defined as follows (
Definition 1 (Semantic Similarity Measure). Let L be the set of all concepts in DL and let A be an ABox with canonical interpretation I. The Semantic Similarity Measure s is a function
s : L × L → [0, 1]
which is defined as follows:
where X = C ⊓ D and (·)^I computes the concept extension w.r.t. the interpretation I.
The measure can be explained as follows. If the concepts C and D are semantically equivalent, the maximum value of the similarity is obtained. If they are disjoint, the minimum value of similarity is assigned because the two concepts are totally different: their extensions do not overlap. Finally, in the case of overlapping concepts, a value in the range ]0, 1[ is computed (
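The three cases can be checked numerically with a short sketch. It assumes the measure has the form s(C, D) = |X^I| / (|C^I| + |D^I| − |X^I|) · max(|X^I| / |C^I|, |X^I| / |D^I|), which is consistent with the behavior described above; this form is an assumption rather than a quotation of the definition. Concept extensions are modeled as plain Python sets of hypothetical individuals.

```python
def semantic_similarity(ext_c, ext_d):
    """Extension-based similarity of two concepts (assumed form).

    ext_c and ext_d are the sets of individuals that belong to
    concepts C and D under the canonical interpretation.
    """
    ext_c, ext_d = set(ext_c), set(ext_d)
    common = ext_c & ext_d              # extension of X = C AND D
    if not common:
        return 0.0                      # disjoint (or empty) concepts
    union_size = len(ext_c) + len(ext_d) - len(common)
    overlap = max(len(common) / len(ext_c), len(common) / len(ext_d))
    return len(common) / union_size * overlap


# Hypothetical extensions, used only for illustration.
equivalent = semantic_similarity({"a", "b"}, {"a", "b"})
disjoint = semantic_similarity({"a", "b"}, {"c", "d"})
overlapping = semantic_similarity({"a", "b", "c", "d"}, {"c", "d", "e", "f"})
print(equivalent, disjoint, overlapping)
```

Only three cardinalities are needed (for C, for D, and for their intersection), which matches the three instance-checking passes counted in the complexity estimate.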
Definition 2 (Most Specific Concept). Let there be a knowledge database KB = 〈T, A〉. Given an ABox A and an individual a, the Most Specific Concept of a w.r.t. A is the concept C, denoted MSC_A(a), such that A ⊨ C(a) and, for every concept D such that A ⊨ D(a), it holds that C ⊑ D. Here ⊨ stands for the standard semantic entailment.
Once the most specific concept MSC_A(a) of an individual a is known, to decide whether KB ⊨ D(a) holds for an arbitrary concept D, it suffices to test whether T ⊨ MSC_A(a) ⊑ D. Unfortunately, this method loses its simplicity and efficiency when applied to large and complex ontologies, as it tends to generate very large MSCs that could lead to intractable reasoning. A revised MSC method for DL, which generates much simpler and smaller concepts that are still specific enough to answer a given query, has been proposed by
Let c and d be two individuals in a given ABox. Then it is possible to calculate C = MSC_A(c) and D = MSC_A(d). According to
∀c, d : s(c, d) = s(C, D) = s(MSC_A(c), MSC_A(d))
The similarity value between a concept C and an individual a can be computed by determining the MSC of the individual and then applying the similarity measure:
∀a : s(a, C) = s(MSC_A(a), C)
The complexity of calculating s depends on the complexity of the instance checking task for the adopted DL language, denoted C (InstanceChecking). Similarity between concepts: since s is a numerical measure, all arithmetic operations have constant complexity, and instance checking is repeated three times (for the concepts C and D and for their intersection), so:
C (s) = 3 ⋅ C (InstanceChecking)
Similarity between an individual and a concept: in this case, besides the instance checking operations required in the previous case, the MSC of the considered individual has to be computed. Thus, denoting the complexity of computing the MSC by C (MSC), the complexity estimate is:
C (s) = C (MSC) + 3 ⋅ C (InstanceChecking)
Similarity between individuals: this case is analogous to the previous one; the only difference is that now two MSCs have to be computed, one for each argument. So the complexity in this case is:
C (s) = 2 ⋅ C (MSC) + 3 ⋅ C (InstanceChecking)
From the previous formulas it is clear that the computational complexity of the similarity measure is sensitive to the choice of the DL. For the ALC logic, C (InstanceChecking) is of polynomial-space (PSPACE) complexity. Computation of the MSC also involves instance checking and depends on the properties of the algorithm used.
From a practical point of view, knowledge graphs are placed in data warehouses that are called RDF repositories or triple stores. The project (
Each of the knowledge graphs contains thousands of triples. Search widgets, shown in Figure
The RDF browser is an essential attribute of the project (
The visual way of specifying the inference rules on the graph sets it apart from the more traditional reasoner interfaces, where inference rules are specified using the SWRL language, logical predicates or a SPARQL-like syntax. An intuitively clear, interactive, visual way of specifying inference rules appears to be friendlier for unsophisticated users of knowledge graphs.
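For comparison, the textual alternative that the visual interface replaces can be sketched as a SPARQL query sent to a repository over HTTP. The endpoint URL and the node IRI below are hypothetical placeholders, not addresses of the project; the sketch only builds the request in the style of the SPARQL 1.1 Protocol (a GET request with a `query` parameter) and does not send it.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint and node IRI, used only for illustration.
ENDPOINT = "https://example.org/repositories/nuclear"
NODE_IRI = "https://example.org/kg/TrainingCourse"


def build_neighbors_request(endpoint, node_iri, limit=50):
    """Build a GET request asking for the outgoing edges of one node."""
    query = "SELECT ?p ?o WHERE { <%s> ?p ?o } LIMIT %d" % (node_iri, limit)
    url = endpoint + "?" + urlencode({"query": query})
    # Ask the server for results in the SPARQL JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


req = build_neighbors_request(ENDPOINT, NODE_IRI)
print(req.get_method())
print(req.full_url)
```

Passing `req` to `urllib.request.urlopen` would execute the query against a live repository. Clicking on a node in a visual browser corresponds to roughly this kind of request, without requiring the user to write the query text.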
The knowledge graphs, presented in the project (
As an example, consider the following situation. A student is preparing to take the exam in nuclear physics at the Physics Faculty of Moscow State University. Suppose the student knows only the name of the training course, “Physics of the atomic nucleus and particles”, and the name of the professor, “I.M. Kapitonov”. Let us formulate the task: using the semantic educational web portal (
Suppose also that the student has found a video lecture on YouTube titled “Lecture 1. Physics of the atomic nucleus and particles”. The student supposes that this video lecture may be relevant to the training course being studied. Let us formulate the hypothesis: “Lecture 1. Physics of the atomic nucleus and particles” is taught by the professor “I.M. Kapitonov” at the Physics Department of Moscow State University and is included in the training course “Physics of the Atomic Nucleus and Particles”.
To solve the task and test the validity of the hypothesis, it is necessary to perform the following obvious inductive reasoning on the knowledge graph step by step.
Step 1. Go to the educational web portal (
Step 2. Suppose the student decides to begin the reasoning from the class “Training course”. The RDF browser workspace opens and the node of the graph with this name appears.
Step 3. The node of the graph that appeared in Figure
Step 4. Continuing to similarly disclose neighboring nodes for the object “Physical Faculty of Moscow State University” (node number 6 in Figure
Step 5. The result obtained in Step 4 could also be achieved in the course of deductive reasoning, without considering possible alternatives. However, the use of inductive inference allows one to naturally extract from the graph additional knowledge that will not be easy to obtain with a simple deductive inference. Acting as described above, it is easy to find that some video lectures on the training course “Physics of the atomic nucleus and particles” at the Physics Faculty of Moscow State University are also taught by professor B.S. Ishkhanov, see node 1 in Figure
As was shown in the above example, the process of inductive inference on knowledge graphs resembles a computer adventure game, does not require special skills, and is accessible to the inexperienced user. Knowledge graphs, similar to the above, are used in the educational process at the NRNU MEPhI. Practice shows that university students master the techniques of interactive work with knowledge graphs within a few minutes.
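The step-by-step disclosure of nodes in the example can be mimicked on a toy set of triples. The predicate names (`isA`, `offeredBy`, `partOfCourse`, `taughtBy`, `worksAt`) and the triples themselves are illustrative assumptions that mirror the hypothesis in the text; they are not the contents of the project's repository.

```python
COURSE = "Physics of the atomic nucleus and particles"
LECTURE = "Lecture 1. Physics of the atomic nucleus and particles"
PROFESSOR = "I.M. Kapitonov"
FACULTY = "Physics Faculty of Moscow State University"

# Toy knowledge graph (assumed data, for illustration only).
TRIPLES = {
    (COURSE, "isA", "Training course"),
    (COURSE, "offeredBy", FACULTY),
    (LECTURE, "partOfCourse", COURSE),
    (LECTURE, "taughtBy", PROFESSOR),
    (PROFESSOR, "worksAt", FACULTY),
}


def neighbors(triples, node):
    """Disclose a node: every triple in which it is subject or object."""
    return {t for t in triples if t[0] == node or t[2] == node}


def hypothesis_holds(triples, required):
    """The hypothesis is confirmed if every required triple is present."""
    return all(t in triples for t in required)


hypothesis = [
    (LECTURE, "taughtBy", PROFESSOR),
    (LECTURE, "partOfCourse", COURSE),
    (PROFESSOR, "worksAt", FACULTY),
]

for triple in sorted(neighbors(TRIPLES, COURSE)):
    print(triple)
print(hypothesis_holds(TRIPLES, hypothesis))
```

Each call to `neighbors` plays the role of one click in the RDF browser; inspecting the alternatives it returns is what allows additional facts, such as other lecturers of the same course, to be noticed along the way.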
The metrics of the computational processes presented in Table
Some results of testing the performance of the knowledge graphs.
Test script | Test data set | Computing process metrics |
---|---|---|
Launching the main page of the web portal (Search widgets) | Knowledge graph “World nuclear data centers” | Number of network requests: 162. Web page loading time: 1140ms. Repository loading time: 850ms. Server timeout: 150ms. |
Discovery and visualization of the knowledge graph (RDF Browser) | Knowledge graph “World nuclear data centers”, object “Center for Photonuclear Experimental Data” | Number of network requests: 39. Web page loading time: 893ms. Repository loading time: 402ms. Server timeout: 44ms. |
Discovery and visualization of the knowledge graph (RDF Browser) | Knowledge graph “Nuclear physics at MSU and MEPhI”, object “Physics of the atomic nucleus and particles” | Number of network requests: 57. Web page loading time: 1050ms. Repository loading time: 643ms. Server timeout: 17ms. |
Discovery and visualization of the DBpedia knowledge base (RDF Browser) | DBpedia knowledge base, object “World War II” | Number of network requests: 37. Web page loading time: 2060ms. Repository loading time: 753ms. Server timeout: 15ms. |
The software architecture is presented in (
The consumption of computing resources is detailed in the debugging panels of browsers (Google Chrome, Mozilla Firefox); the corresponding parameters can be observed live when working with the semantic web portal (
The University of Manchester, Stanford University, the University of Bari and a number of other universities work on theory development and technology implementation for the semantic web, description logics and versions of the OWL ontology description language. Special mention should be made of the project (
As for the issues of visualizing linked data (
In contrast to the above solutions, the project (
The reported study was funded by the Russian Foundation for Basic Research under research projects No. 18-07-00583 and No. 19-47-400002, by the Government of the Kaluga Region under research project No. 19-47-400002, and by the Vladimir Potanin Foundation under project No. GC190001383.