
Ítalo Oliveira • November 14, 2025 • 5 min read

Many industries accumulate text documents in DOCX, PDF, TXT, and other formats that describe incidents and the organization's norms, routines, and procedures. For example, the healthcare sector produces an enormous number of medical reports and patient assessments every day. Similarly, the cybersecurity industry frequently generates incident reports (for example, following a data breach analysis) and playbooks (descriptions of organization-specific procedures for handling security events and incidents). These documents, sometimes available in a more structured format such as JSON or XML, are used to guide human and computer actions, such as incident response.


The problem is that the usefulness of these documents decreases as their number grows, because their shallow structure offers no easy way to access and manipulate the knowledge inside the texts. In other words, organizations have a lot of unstructured data, but what they need is knowledge, which is the basis for human and machine decision-making. Can we turn a mountain of PDF files into usable knowledge? Yes, we can!



Figure 1. A neuro-semantic approach to turn raw data into well-founded knowledge graphs


By combining Applied Ontology, Semantic Web Technologies, and Machine Learning techniques, including Large Language Models (LLMs), we can transform raw documents into Linked Data, as illustrated in Figure 1. This approach should not be mistaken for constructing a simple graph database (whether based on the labeled-property graph model or the RDF model). In practical terms, we first study the domain through the lens of Applied Ontology tools, such as foundational ontologies (for example, the Unified Foundational Ontology) and reference domain ontologies (for example, theories of value, risk, and security). This gives us conceptual clarification and the general structure of the domain of interest, represented in a conceptual model that is independent of use cases. Only then do we implement a database of our choice, following the conceptual model.
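
To make "following the conceptual model" a little more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the class and relation names (`Incident`, `Asset`, `Organization`, `affects`, `ownedBy`) are hypothetical and are not taken from UFO or from the reference ontologies mentioned above. The point it shows is that the model is written down first, and data is then checked against it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """A relation in the toy conceptual model."""
    name: str
    domain: str  # class allowed in the subject position
    range: str   # class allowed in the object position


# Hypothetical toy model; NOT taken from UFO or any cited ontology.
CLASSES = {"Incident", "Asset", "Organization"}
RELATIONS = {
    "affects": Relation("affects", "Incident", "Asset"),
    "ownedBy": Relation("ownedBy", "Asset", "Organization"),
}


def conforms(subject_type: str, relation: str, object_type: str) -> bool:
    """Return True if a statement respects the model's domain and range."""
    rel = RELATIONS.get(relation)
    if rel is None:
        # The relation is not part of the conceptual model at all.
        return False
    return rel.domain == subject_type and rel.range == object_type
```

With this in place, `conforms("Incident", "affects", "Asset")` holds, while a statement with the roles swapped, or one using a relation the model does not define, is rejected before it reaches the database.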


At Y.digital, we bet on the RDF model to do this job, as it allows us to achieve the necessary level of precision and detail (data and metadata combined). Finally, we can populate the implemented ontology through Natural Language Processing techniques, including LLMs, by extracting entities from the raw text and adding them to our ontology. The result is a well-founded knowledge graph, which can be queried via SPARQL, facilitating the automation of multiple tasks.
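
The populate-then-query step can be sketched with plain Python data structures. This is a simplified illustration, not our production pipeline: the `extracted` records stand in for the output of an NLP or LLM extraction step, the `https://example.org/kg/` namespace and the `affects` property are hypothetical, and the `match` function only imitates a single SPARQL triple pattern. A real system would use an RDF store and a SPARQL engine instead.

```python
from typing import Iterable, Optional

Triple = tuple[str, str, str]

EX = "https://example.org/kg/"  # hypothetical namespace for this sketch
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def to_triples(extracted: Iterable[dict]) -> set[Triple]:
    """Turn extracted entity records into (subject, predicate, object) triples."""
    triples: set[Triple] = set()
    for record in extracted:
        subject = EX + record["id"]
        triples.add((subject, RDF_TYPE, EX + record["type"]))
        for predicate, value in record.get("relations", {}).items():
            triples.add((subject, EX + predicate, EX + value))
    return triples


def match(graph: set[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> list[Triple]:
    """Return triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Assumed output of an entity-extraction step over an incident report.
extracted = [
    {"id": "incident-1", "type": "Incident", "relations": {"affects": "server-7"}},
    {"id": "server-7", "type": "Asset"},
]

graph = to_triples(extracted)

# Which resources does incident-1 affect?
affected = [o for _, _, o in match(graph, s=EX + "incident-1", p=EX + "affects")]
```

The final lookup corresponds roughly to the SPARQL triple pattern `ex:incident-1 ex:affects ?asset`, with `ex:` bound to the hypothetical namespace above.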


Contact us!

Our team at Y.digital has unique expertise in combining Applied Ontology tools, Semantic Web technologies, and Large Language Models in real-world projects in the public and private sectors. If you want to know more, feel free to contact us:

Ítalo Oliveira, Senior Information Analyst - italo@y.digital

Jan Voskuil, Managing Director, Ontologist and Linked Data Advisor - jan@y.digital


References

Guizzardi, G., Botti Benevides, A., Fonseca, C.M., Porello, D., Almeida, J.P.A. and Prince Sales, T., 2022. UFO: Unified Foundational Ontology. Applied Ontology, 17(1), pp.167-210. https://doi.org/10.3233/AO-210256

Oliveira, Í., Sales, T.P., Baratella, R., Fumagalli, M. and Guizzardi, G., 2022, October. An ontology of security from a risk treatment perspective. In International Conference on Conceptual Modeling (pp. 365-379). Cham: Springer International Publishing. https://doi.org/10.1007/978-3-031-17995-2_26

Sales, T.P., Baião, F., Guizzardi, G., Almeida, J.P.A., Guarino, N. and Mylopoulos, J., 2018, September. The common ontology of value and risk. In International Conference on Conceptual Modeling (pp. 121-135). Cham: Springer International Publishing. https://doi.org/10.1007/978-3-030-00847-5_11

Schreiber, G., Raimond, Y. et al., 2014. RDF 1.1 Primer. W3C Working Group Note, 24 June 2014. Available at: https://www.w3.org/TR/rdf11-primer/



Linked Data for Knowledge Infrastructure and Automation
