In an era when digital information is expanding at an unprecedented rate, library science faces a profound identity crisis. For a long time, the cornerstone of the profession has been the art of "aggregation": integrating a myriad of complex knowledge resources into an orderly, retrievable universe through standardized metadata. However, Philip Schreur, in his recent discourse, inadvertently reveals a more disruptive possibility through his analysis of linked data and cutting-edge practices in artificial intelligence. This is not merely a technical upgrade but a potential epistemological rupture: are we moving from an era that pursues universal aggregation toward a future dedicated to the activation of particularity? This article argues that a concept called "Data Disaggregation" lies at the core of this rupture, compelling us to re-examine the power, practices, and ethical foundations of libraries.
## The Power of Knowledge: Reflecting on the Genealogy of "Aggregation"
The tradition of aggregation in libraries stems from the Enlightenment's belief in universal classification systems. From the Dewey Decimal Classification to the complex rules of Resource Description and Access (RDA), the driving force has always been to construct a "meta-narrative" that can accommodate and organize all human knowledge. This model simplifies resources by creating standardized "bibliographic surrogates", such as catalog cards or MARC records, allowing them to be managed and discovered uniformly.
However, this seemingly neutral technical practice conceals profound power dynamics. As Foucault revealed, any knowledge organization system is a form of power that constructs reality through definition, classification, and naming, inevitably dividing the center from the periphery. Pioneers of Critical Librarianship have long pointed out that tools like standardized subject headings, while pursuing universality, often suppress, distort, or outright erase the experiences and voices of marginalized groups. In the grand project of aggregation, particularity is often seen as noise that needs to be "normalized."
"Data Disaggregation" is a direct challenge to this tradition. It advocates for releasing knowledge units from their bibliographic carriers, focusing on finer-grained "facts" and "voices," rather than merely the documents themselves. Stanford University's Black@Stanford project serves as a radical example. This project has achieved a thorough disaggregation of "voices" by training a chatbot specifically focused on a particular archival corpus. It no longer attempts to integrate this archive into a larger knowledge system dominated by mainstream narratives; instead, it chooses to let the archive "speak for itself" directly and independently. This marks a fundamental shift: from "cataloging for marginalized groups" to "allowing marginalized groups to speak."
## Reconstructing Practice: From Rule Guardians to Context Architects
This epistemological rupture will inevitably trigger a dramatic upheaval in library practices. If "activating particularity" becomes our new mission, then traditional role definitions and technical architectures will face reconstruction.
The role of metadata librarians may shift from "guardians of rules" to "architects of context." Their core value will no longer be the precise application of a complex set of cataloging rules, but a more creative and critical form of intellectual labor. They will need to examine resources and ask: What are the most important knowledge nodes within this resource? How do these nodes connect to external knowledge, revealing deep relationships obscured by traditional classifications? Stanford's other project, "Understanding Systemic Racism" (KSR), works this way: it constructs a knowledge graph that reveals systemic issues by linking data points from police manuals, legal texts, and news reports. The task of context architects is to design and build such knowledge networks, generating new insights by activating the connections between data.
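The cross-source linking described above can be illustrated with a toy graph. This is a hedged sketch of the general technique, not KSR's actual data model; every entity, predicate, and source name below is invented for illustration.

```python
# Illustrative sketch of the "context architect" move: statements drawn
# from different source types are merged into one graph, then traversed
# to surface cross-source connections. All names are hypothetical.

from collections import defaultdict

triples = [
    # (subject, predicate, object, source_type)
    ("use-of-force policy", "definedIn", "police manual section 3", "police_manual"),
    ("use-of-force policy", "citedBy", "court ruling A", "legal_text"),
    ("court ruling A", "reportedBy", "news article B", "news_report"),
]

graph = defaultdict(list)
for s, p, o, src in triples:
    graph[s].append((p, o, src))

def reachable(start: str) -> set[str]:
    """Entities reachable from `start`, regardless of which source asserted each link."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for _, obj, _ in graph.get(node, []):
            if obj not in seen:
                seen.add(obj)
                stack.append(obj)
    return seen

# Starting from the policy, the traversal connects manual, ruling, and
# news coverage: relationships no single catalog record would expose.
connected = reachable("use-of-force policy")
```

The point of the sketch is the traversal at the end: the insight ("this policy is connected to this ruling and this coverage") only exists because statements from three source types share one graph.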
Correspondingly, the technical systems supporting library work also need to be reimagined. Existing Integrated Library Systems (ILS/LSP) are essentially designed for managing the aggregation of "bibliographic surrogates." Future systems are more likely to be incubators for knowledge graphs and artificial intelligence models, empowering librarians to extract entities from digital collections, define relationships, and rapidly deploy intelligent services capable of engaging deeply with specific knowledge domains.
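One capability such reimagined systems would need is entity extraction from digital collections. As a deliberately naive stand-in for that step (production systems would use trained named-entity-recognition models, not a regex; the heuristic and sample text here are purely illustrative):

```python
import re

# Toy stand-in for the entity-extraction step a future library system
# would need. A capitalized-phrase regex is NOT how real pipelines work;
# it only illustrates the shape of the task.

def extract_candidate_entities(text: str) -> list[str]:
    """Return capitalized multi-word phrases as candidate entities."""
    return re.findall(r"\b(?:[A-Z][a-z]+ ){1,3}[A-Z][a-z]+\b", text)

sample = "Interviews mention the Black Student Union and the Bay Area."
entities = extract_candidate_entities(sample)
```

Entities found this way would then be linked into relationships, the raw material for the knowledge-graph services the paragraph above envisions.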
## Caution Ahead: Ethical Dilemmas of the New Paradigm
However, embracing the paradigm of "Data Disaggregation" also means confronting new and more subtle ethical challenges.
First, there is the risk of "decontextualization." Extracting data from its original documents, while allowing for flexible reorganization and connection, may also detach it from necessary context, leading to misinterpretation or even malicious manipulation. When we stitch together data points from different sources into a knowledge graph, how do we ensure that these connections are responsible, evidence-based, and not a misleading "digital collage fallacy"? This requires us to establish a new ethical framework for knowledge graph construction that balances flexibility with contextual integrity.
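One concrete guardrail such an ethical framework might include is refusing to admit any statement into the graph without its provenance and supporting context. The sketch below is one possible pattern under that assumption; the class and field names are illustrative, not an established standard.

```python
from dataclasses import dataclass

# Sketch of a guardrail against decontextualization: every extracted
# statement must carry its source and the passage that supports it,
# and the graph rejects statements that arrive without them.
# Field names are illustrative, not an established standard.

@dataclass(frozen=True)
class Assertion:
    subject: str
    predicate: str
    obj: str
    source: str   # where the statement was extracted from
    context: str  # the passage that supports it

def add_assertion(graph: list, a: Assertion) -> bool:
    """Accept only assertions whose provenance fields are populated."""
    if not a.source.strip() or not a.context.strip():
        return False  # no context, no entry: reject rather than guess
    graph.append(a)
    return True

assertions: list[Assertion] = []
accepted = add_assertion(assertions, Assertion(
    "policy X", "criticizedIn", "report Y",
    source="report Y, p. 12",
    context="The report argues that policy X failed its stated goals.",
))
rejected = add_assertion(assertions, Assertion("a", "b", "c", source="", context=""))
```

The design choice is the rejection path: a graph that silently accepts unprovenanced links is exactly the "digital collage" the paragraph warns about.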
Second, there is the illusion of "authenticity" and the rise of algorithmic mediation. AI chatbots create an immersive experience of direct dialogue with archives, but this is a "reality" mediated by algorithms. The object of user interaction is not the original archive itself, but the large language model's "understanding" and "rephrasing" of that archive. The inherent biases of algorithms and potential errors in models (such as "hallucinations") constitute a new, opaque power mediation. This raises a serious question: In the process of replacing traditional librarian mediation with algorithmic mediation, how can libraries ensure the transparency of algorithms and the traceability of information sources to maintain their core position as a cornerstone of social trust?
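One partial answer to the traceability question is to make the mediation layer return not just an answer but the exact archive passages it drew on, so users can check the model against the record. The sketch below assumes a naive keyword retriever (real systems would use embedding-based retrieval) and invented archive identifiers; it is a pattern, not the Black@Stanford design.

```python
# Illustrative pattern for inspectable algorithmic mediation: the answer
# object carries the archive passages behind it. Keyword matching is a
# placeholder for real retrieval; identifiers and texts are invented.

def answer_with_citations(question: str, passages: dict[str, str]) -> dict:
    """Naive keyword retrieval over passage texts; returns answer plus sources."""
    words = question.lower().split()
    hits = {pid: text for pid, text in passages.items()
            if any(w in text.lower() for w in words)}
    return {
        "answer": " ".join(hits.values()) if hits else "No supporting passage found.",
        "cited_passages": sorted(hits),  # traceable back to the archive
    }

archive = {
    "box1/letter3": "Students organized a sit-in during the spring term.",
    "box2/memo7": "The administration responded with a policy review.",
}
result = answer_with_citations("What did students organize?", archive)
```

Whatever the retrieval mechanism, the contract matters: an answer without its `cited_passages` list is an opaque mediation; an answer with it can be audited against the original archive.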
Finally, there is the potential to exacerbate a new "digital divide." Building detailed knowledge graphs and training specialized AI models requires significant technical and human investment. Does this mean that only resource-rich top institutions can "activate" their collections, while many small and medium-sized libraries will continue to slumber in the old paradigm, becoming even more marginalized in the knowledge network? Developing low-cost, scalable tools and methods to achieve the democratization of the new paradigm will be key to determining whether this transformation can truly promote information equity.
In conclusion, "Data Disaggregation" is not only a technical shift but also a mirror reflecting the deep philosophical assumptions and social responsibilities of the library science profession. It compels us to bid farewell to the old dream of trying to contain everything within a single framework and instead embark on a new journey that is more complex, challenging, and full of possibilities. On this path, our task is no longer to perfect a closed system, but to cautiously and consciously explore and establish the ethics and practices of knowledge organization that belong to the future within an open knowledge network.
## Reading Material

- Meaningful and Inclusive Access to Information: The Challenges Brought by the Brisbane Declaration to Standardized Metadata in the Context of Linked Data and AI