Many industries accumulate text documents in DOCX, PDF, TXT, and other formats that describe incidents and an organization's norms, routines, and procedures. For example, the healthcare sector produces an enormous number of medical reports and assessments daily for every patient. Similarly, the cybersecurity industry frequently generates incident reports (for example, following the analysis of a data breach) and playbooks (descriptions of organization-specific procedures for handling security events and incidents). These documents, sometimes available in a more structured format such as JSON or XML, are used to guide human and computer actions, such as incident response.
The problem is that the utility of these documents decreases as their number grows, since their shallow structure offers no easy way to access and manipulate the knowledge inside the texts. In other words, organizations have a lot of unstructured data, but what they crave is knowledge, which is the basis for human and machine decision-making. Can we turn a mountain of PDF files into usable knowledge? Yes, we can!

Figure 1. A neuro-semantic approach to turn raw data into well-founded knowledge graphs
By leveraging the power of Applied Ontology, Semantic Web Technologies, and Machine Learning techniques, including Large Language Models (LLMs), we can transform raw documents into Linked Data, as illustrated in Figure 1. This approach should not be mistaken for constructing a simple graph database (whether it follows a labeled-property graph model or an RDF model). In practical terms, we first study the domain through the lens of Applied Ontology tools, such as foundational ontologies (for example, the Unified Foundational Ontology) and reference domain ontologies (for example, theories of value, risk, and security). This gives us conceptual clarification and the general structure of the domain of interest, represented in a conceptual model that is independent of use cases. With this conceptual model in hand, we can implement a database of our choice that follows it.
At Y.digital, we bet on the RDF model to do this job, as it allows us to achieve the necessary level of precision and detail (data and metadata combined). Finally, we can populate the implemented ontology through Natural Language Processing techniques, including LLMs, by extracting entities from the raw text and adding them to our ontology. The result is a well-founded knowledge graph, which can be queried via SPARQL, facilitating the automation of multiple tasks.
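The populate-and-query step can be illustrated with a minimal Python sketch that uses only the standard library. Everything in it is an assumption made for illustration: the `https://example.org/security#` namespace, the class and property names (`ThreatAgent`, `Vulnerability`, `exploited`), the invented one-sentence report, and the regular expression that stands in for a real NLP or LLM extraction step. It is not our production pipeline, and the `find` helper only mimics a single SPARQL triple pattern; a real implementation would rely on an RDF store and a SPARQL engine.

```python
import re
from typing import Iterator, List, Optional, Set, Tuple

# Hypothetical namespace and vocabulary, used only for this illustration.
EX = "https://example.org/security#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

Triple = Tuple[str, str, str]  # (subject IRI, predicate IRI, object IRI)

# Stand-in for an NLP/LLM extraction step: a regex that recognizes
# phrases of the form "<Name> exploited <CVE identifier>".
PATTERN = re.compile(r"(?P<agent>[A-Z]\w+) exploited (?P<vuln>CVE-\d{4}-\d+)")


def extract_triples(text: str) -> Iterator[Triple]:
    """Turn each recognized phrase into typed entities plus a relation."""
    for m in PATTERN.finditer(text):
        agent = EX + m.group("agent")
        vuln = EX + m.group("vuln")
        yield (agent, RDF_TYPE, EX + "ThreatAgent")
        yield (vuln, RDF_TYPE, EX + "Vulnerability")
        yield (agent, EX + "exploited", vuln)


def find(graph: Set[Triple],
         s: Optional[str] = None,
         p: Optional[str] = None,
         o: Optional[str] = None) -> List[Triple]:
    """Match one triple pattern; None behaves like a SPARQL variable."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def to_ntriples(graph: Set[Triple]) -> str:
    """Serialize IRI-only triples in N-Triples syntax."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in sorted(graph))


# Invented report text; the CVE identifier is a placeholder, not a real one.
report = "During the incident, Mallory exploited CVE-2099-0001 on the mail server."

graph = set(extract_triples(report))

# Roughly the pattern { ?agent ex:exploited ?vuln } in SPARQL terms.
exploits = find(graph, p=EX + "exploited")

print(to_ntriples(graph))
```

The point of the sketch is the shape of the pipeline: extracted entities are typed against classes from the ontology, relations become triples, and questions are answered by matching patterns over the resulting graph.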
Contact us!
Our team at Y.digital has unique expertise in combining Applied Ontology tools, Semantic Web technologies, and Large Language Models in real-world projects in the public and private sectors. If you want to know more, feel free to contact us:
- Ítalo Oliveira, Senior Information Analyst - italo@y.digital
- Jan Voskuil, Managing Director, Ontologist and Linked Data Advisor - jan@y.digital
References
- Guizzardi, G., Botti Benevides, A., Fonseca, C.M., Porello, D., Almeida, J.P.A. and Prince Sales, T., 2022. UFO: Unified foundational ontology. Applied Ontology, 17(1), pp.167-210. https://doi.org/10.3233/AO-210256
- Oliveira, Í., Sales, T.P., Baratella, R., Fumagalli, M. and Guizzardi, G., 2022, October. An ontology of security from a risk treatment perspective. In International Conference on Conceptual Modeling (pp. 365-379). Cham: Springer International Publishing. https://doi.org/10.1007/978-3-031-17995-2_26
- Sales, T.P., Baião, F., Guizzardi, G., Almeida, J.P.A., Guarino, N. and Mylopoulos, J., 2018, September. The common ontology of value and risk. In International Conference on Conceptual Modeling (pp. 121-135). Cham: Springer International Publishing. https://doi.org/10.1007/978-3-030-00847-5_11
- Schreiber, G., Raimond, Y. et al., 2014. RDF 1.1 Primer. W3C Working Group Note, 24 June 2014. Available at: https://www.w3.org/TR/rdf11-primer/