Making Council Session Minutes Searchable by Topic at the Historical Archives of the European Union
The Historical Archives of the European Union worked with Archipanion to extract agenda text from large, multilingual Council session minutes and turn it into searchable metadata for the archive’s existing catalogue system.

Summary
The Historical Archives of the European Union preserve and provide access to the archives of EU institutions, with many records available through an online catalogue. The archive supports research into the history of European cooperation and policymaking over time.
One part of the archive includes the minutes of sessions of the Council of the European Communities (the predecessor of today’s Council of the European Union), where government ministers met to negotiate and adopt policy.
These records are valuable primary sources for researchers studying how policy was discussed and agreed, but they also present a practical access problem. The minutes are available only as scanned PDFs, which means researchers can search by date and session title, but not by the topics or agenda items under discussion, because that information is not captured as searchable text.
To solve this, the Historical Archives of the European Union used Archipanion’s AI metadata extraction service to identify and extract agenda text from these large, multilingual Council session minutes. The extracted content was then returned to the archive’s existing catalogue as structured data, making topic-based search possible.
Project outcomes
- The initial pilot covered roughly 70 Council session minutes from three representative years.
- Following the successful pilot, Archipanion’s AI metadata extraction service was applied to around 430 additional Council session minutes.
- The extracted agenda text has now been added to the archive’s catalogue system and is publicly available through that interface, making Council session minutes searchable by topic as well as by date and session title.
The challenge
Council session minutes can run to many dozens or even hundreds of pages, in part because the same session material is presented in multiple languages within a single document. While the records themselves were already available in the archive, that did not make them easily searchable by topic. Researchers could retrieve specific records if they already knew which dates to look for, but topic search remained limited when the catalogue entry contained only minimal descriptive fields, such as the date and a generic session title.
For researchers, the agenda is often the most useful guide to what a meeting was about. It shows what was due to be discussed and offers a concise overview of the wider record, generally matching what was later recorded in the minutes apart from occasional minor additions.
For the archive, therefore, extracting the agenda offered the clearest and most practical way to improve access. The practical goal was not full transcription of every page, but reliable extraction of the agenda section so that Council sessions could be discovered by topic in the existing catalogue system.
A second challenge was structural. Within these large scanned files, the agenda does not appear in a fixed, predictable location. It could be buried deep within a document, so the task was not only text extraction, but also agenda detection: identifying the correct agenda section consistently across many long files.
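The source does not describe how the AI system locates the agenda. Purely to illustrate why this is a detection problem, the sketch below shows a naive keyword baseline that flags pages containing a standalone agenda heading. It assumes the per-page text is already available as a list of strings; the heading list and the function name `find_agenda_pages` are hypothetical and not part of Archipanion's service.

```python
import re

# Hypothetical heading list: the source does not say which languages or
# heading wordings appear in the minutes.
AGENDA_HEADINGS = [
    r"agenda",
    r"ordre du jour",
    r"tagesordnung",
    r"ordine del giorno",
    r"dagsorden",
]

# Match a line that consists only of one of the headings (optionally
# followed by a colon), ignoring case.
HEADING_RE = re.compile(
    r"^\s*(?:" + "|".join(AGENDA_HEADINGS) + r")\s*:?\s*$",
    re.IGNORECASE | re.MULTILINE,
)


def find_agenda_pages(pages):
    """Return 0-based indexes of pages containing a standalone agenda heading."""
    return [i for i, text in enumerate(pages) if HEADING_RE.search(text)]
```

A baseline like this would flag the same agenda once per language version and would miss headings damaged by OCR or worded differently, which suggests why consistent detection across long multilingual files is a task in its own right.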

The pilot phase
The collaboration began with a pilot phase designed to test two things: whether AI metadata extraction could accurately extract agenda text, and whether it could reliably identify the correct agenda section within Council session minutes.
To assess extraction accuracy, the Historical Archives of the European Union provided a set of previously manually transcribed agendas for the year 1992, covering around 30 sessions.
This gave Archipanion a direct reference point for comparing the AI-extracted output with agenda text that had already been transcribed by hand.
The extracted agenda text from 1992 proved highly accurate in practice. Because the minutes had been typed and scanned rather than handwritten, the extracted output was at least as good as manual transcription and, in some cases, better, particularly where manual transcription struggled with language-specific details.
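The source does not say how the comparison against the manual transcriptions was scored. As one possible approach, the sketch below computes a character-level similarity ratio using Python's standard `difflib` module after light normalization. The function names and the normalization steps are assumptions made for illustration.

```python
import difflib
import unicodedata


def normalize(text):
    """Apply Unicode NFC normalization, lowercase, and collapse whitespace."""
    text = unicodedata.normalize("NFC", text)
    return " ".join(text.lower().split())


def similarity(extracted, reference):
    """Return a ratio from 0.0 to 1.0 describing how alike two agenda texts are."""
    matcher = difflib.SequenceMatcher(
        None, normalize(extracted), normalize(reference), autojunk=False
    )
    return matcher.ratio()
```

A score below 1.0 does not by itself show which side is wrong. Where the manual transcription dropped language-specific details, the difference may count in favour of the extracted text, so low-scoring pairs would still need a human look.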
The wider pilot covered roughly 70 Council session minutes across three representative years and showed that the workflow could also reliably identify the correct agenda section within very long multilingual files. Even when the agenda did not appear in a fixed or predictable place, the AI system was able to locate the relevant section consistently for extraction. The success of the pilot led directly to a larger follow-on phase of work covering around 430 additional Council session minutes.
Deliverables
Across the pilot and follow-on work, Archipanion extracted agenda text from around 500 long, multilingual Council session minutes, some running to hundreds of pages. The delivered output was a spreadsheet designed for straightforward import into the archive’s existing catalogue search fields, so that agenda keywords could become searchable within the archive’s catalogue system. The format was intentionally simple and system-friendly: one row per document, with fields for the filename, extracted title, extracted date(s), and an abstract field containing the agenda text.
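The one-row-per-document layout described above could be sketched as follows. This is a minimal illustration that assumes a CSV file stands in for the spreadsheet; the column names are hypothetical, and the example row contains placeholder values rather than real archive data.

```python
import csv
import io

# Hypothetical column names mirroring the four fields described in the text.
FIELDS = ["filename", "title", "dates", "abstract"]

# Placeholder values only; not taken from any real Council session.
EXAMPLE_ROW = {
    "filename": "example_session.pdf",
    "title": "Placeholder session title",
    "dates": "1900-01-01; 1900-01-02",
    "abstract": "1. First placeholder agenda item\n2. Second placeholder agenda item",
}


def rows_to_csv(rows):
    """Serialize one dict per document into CSV text with a header row."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = rows_to_csv([EXAMPLE_ROW])
```

Keeping the whole agenda in a single abstract field, line breaks included, means each document maps to exactly one catalogue record, which fits the stated goal of a straightforward import.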
The extracted agenda text has now been integrated into the archive’s catalogue search system and is publicly available through that interface, meaning researchers can find relevant sessions by topic rather than relying on date-based searching alone.

Image: Extracted agenda text displayed in the Historical Archives of the European Union catalogue, where it supports topic-based search within Council session minutes.
What this shows for archives and libraries
This case study is a strong example of AI metadata extraction solving a real archival access problem in a targeted way. Large, complex digital files often contain a small amount of high-value descriptive content – such as agendas, tables of contents, indexes, or summaries – that can significantly improve searchability if extracted and fed back into catalogue fields.
In this case, extracting agenda text from very long Council session minutes created the possibility of topic-level access to material that previously required page-by-page reading or date-only searching. While a limited subset of agendas had already been transcribed manually, AI metadata extraction made it possible to extend topic-level access much more quickly across a far larger body of material.
In summary, this case shows that even a relatively small section of information within a collection, here the agendas, can be leveraged to open up an entire set of records. Archives and libraries with similar collections can apply the same approach to significantly increase discoverability.
To explore what this could look like for your own records, Archipanion welcomes the opportunity to look at your collections and assess their suitability for metadata extraction.