Library Metadata Creation Scale (LMCS)

In an era where artificial intelligence (AI) is profoundly reshaping how knowledge is produced and organized, the library sector faces a far-reaching transformation. AI tools are entering every aspect of metadata creation at unprecedented speed, yet the industry lacks a unified, systematic framework for evaluating, deploying, and managing the complex patterns of human-machine collaboration. Because of this theoretical gap, practical experiments are often fragmented and lack strategic guidance. To address this challenge, this paper proposes a conceptual model, the "Library Metadata Creation Scale" (LMCS), intended to provide a clear analytical frame of reference for this transformation.

I. Proposal of the LMCS Framework

The rise of generative AI has brought a paradigm-level impact to the field of library metadata. It signifies the possibility of unprecedented efficiency improvements and service innovations, while simultaneously triggering profound concerns in the industry regarding metadata quality, academic integrity, and professional role positioning. In practice, discussions around AI tools often quickly fall into a polarized dilemma: either adhering to tradition and rejecting any AI intervention in a "fully manual" mode, or embracing technology and pursuing extreme efficiency in a "fully automated" vision. This binary opposition not only fails to address real problems but also exacerbates practitioners' anxiety and decision-making confusion.

The "Library Metadata Creation Scale" (LMCS) was conceived and proposed against this backdrop. Its core purpose is to transcend the simplistic dichotomy of "allow or prohibit" and provide the library sector with a more nuanced and operationally viable structured framework. This framework aims to offer a common language for managers, catalogers, and technical developers to clearly define and communicate the boundaries, patterns, and responsibilities of human-machine collaboration in different scenarios. Its theoretical construction is primarily based on the following considerations:

Responding to the "binary dilemma" in practice and advocating for refined governance: The birth of LMCS is, first and foremost, a direct response to the simplistic binary discourse prevalent in current industry discussions. It recognizes that viewing AI as a single, homogeneous concept is erroneous. On the contrary, the application of AI in metadata creation is a continuous spectrum. LMCS attempts to deconstruct this spectrum into five clear, manageable levels, evolving from an initial "traffic light" model (such as "prohibit AI," "partially allow," "fully allow") to a more instructive grading system. This allows libraries to make differentiated strategic choices based on resource types, importance, and processing goals, rather than a one-size-fits-all approach.
Drawing on historical experiences of technological integration to provide forward-looking pathways: Throughout history, every disruptive technology—from photocopiers and computers to the internet—was initially seen as a threat to traditional skills and workflows when adopted by libraries. However, these technologies ultimately found paths to integrate with professional practices and became indispensable infrastructure. LMCS draws on this historical perspective, suggesting that rather than passively resisting or hastily accepting, it is better to proactively design a gradual, clearly defined integration pathway. It provides a predictable developmental ladder for AI tools to evolve from auxiliary "consultants" (Level 2) to deep "partners" (Levels 4-5).
Reconciling the inherent theoretical tensions within the industry to achieve theoretical coherence: The LMCS framework is rooted in theoretical debates that have persisted in library cataloging for more than a century. It seeks to systematically reconcile two core values. On one hand is the "normative ideal" represented by Charles Ammi Cutter, which strives for perfection in each individual record (corresponding to LMCS Levels 1-2). On the other hand is the pragmatic principle of "usability first" in response to the overwhelming volume of information, exemplified by the archival community's "More Product, Less Process" (MPLP) philosophy (corresponding to LMCS Levels 3-5). LMCS does not judge which is superior. It acknowledges that each has value in different contexts and allows them to coexist under a unified strategic framework.

In summary, LMCS aims to become a strategic tool that integrates diagnosis, planning, and communication functions. It not only provides guidance for current practices but, more importantly, seeks to reshape discussions about AI from a defensive discourse centered on "threat" and "replacement" to a constructive dialogue centered on "collaboration," "augmentation," and "professional evolution."

The scale divides the human-machine collaboration modes in metadata creation into five progressive levels, from complete reliance on human intelligence to full autonomous operation by machines.

Level	Name	Core Description	Key Requirements & Librarian's Responsibility	Typical Application Scenarios
1	Original Cataloging	Metadata records are created entirely by hand by catalogers, without any AI generation tools. Catalogers rely on traditional tools and standards such as RDA, MARC 21, and LCSH.	Catalogers bear full responsibility for the accuracy, completeness, and compliance of every field in the record. This is the benchmark of traditional cataloging work.	Original cataloging for unique collections (e.g., manuscripts, archives, theses); creating high-standard "gold" records for national bibliographies or authority agencies; training new catalogers in the basic rules and reasoning of cataloging.
2	AI-Assisted Suggestion	AI acts as a consulting tool, providing suggestions or options for specific fields, but does not directly generate complete records.	Catalogers are responsible for critically evaluating all AI suggestions, making the final choices, and manually completing the records. AI is an aid to thinking, and catalogers remain the sole creators of the records.	AI recommends subject terms (LCSH/FAST) or classification numbers (DDC/LCC) based on titles, abstracts, or full texts; AI extracts possible keywords or entities (names, places) from the text; AI suggests applicable MARC field tags.
3	AI-Assisted Enhancement & Cleanup	AI enhances, corrects, or formats an existing record that is incomplete or of low quality (e.g., vendor records, brief records).	Catalogers provide the initial record and must review all modifications made by AI, ensuring that they are accurate, that core semantics are unchanged, and that the record complies with local policies. Catalogers act as "editors" and "proofreaders."	Automatically correcting punctuation and subfield codes in MARC records; automatically normalizing names of persons and corporate bodies against authority files (e.g., VIAF); automatically expanding abbreviations or translating abstracts into another language; enriching records, such as automatically adding contents notes (MARC field 505) based on the resource's content.
4	Machine-Generated Record, Human Review	AI automatically generates a complete, reviewable metadata record from the resource itself (e.g., scanned text, PDF files, audio, and video).	The core responsibility of catalogers shifts from "creation" to "review and verification." They must carefully check the AI-generated preliminary records, correct errors, fill in omissions, and ultimately approve them. This is the primary mode of human-machine collaboration.	Rapid cataloging of large batches of e-books or journal articles, with AI automatically extracting authors, titles, ISBNs, abstracts, etc.; automatically generating descriptive metadata for digitized image collections (e.g., identifying image content, extracting EXIF data); converting unstructured bibliographic information (e.g., reference lists) into structured MARC records.
5	Fully Automated Metadata Generation	AI autonomously completes metadata creation, verification, and storage, triggering human intervention only for exceptions it cannot handle or for low-confidence results.	The role of catalogers shifts to "system administrators" and "quality monitors." They are responsible for configuring AI rules, monitoring overall system performance, conducting regular sampling audits of record quality, and addressing issues reported by AI.	Real-time processing of large-scale publisher data streams or open-access repositories, automatically generating metadata and loading it into discovery systems; automatically creating metadata records for submissions to institutional repositories (e.g., preprints); automatically tagging and categorizing user-generated content (e.g., photos, videos).
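To make the scale more concrete, the five levels and the Level 5 exception handling described in the table could be sketched in Python roughly as follows. This is only an illustrative sketch: the names `LMCSLevel`, `GeneratedRecord`, `route`, and `audit_sample` are hypothetical, and the 0.9 confidence threshold and 5% audit rate are assumed placeholder values that the framework itself does not specify.

```python
import random
from dataclasses import dataclass
from enum import IntEnum


class LMCSLevel(IntEnum):
    """The five LMCS levels, ordered from fully manual to fully automated."""
    ORIGINAL_CATALOGING = 1
    AI_ASSISTED_SUGGESTION = 2
    AI_ASSISTED_ENHANCEMENT = 3
    MACHINE_GENERATED_HUMAN_REVIEW = 4
    FULLY_AUTOMATED = 5


# Librarian role at each level, paraphrased from the table above.
LIBRARIAN_ROLE = {
    LMCSLevel.ORIGINAL_CATALOGING: "sole creator",
    LMCSLevel.AI_ASSISTED_SUGGESTION: "sole creator, evaluating AI suggestions",
    LMCSLevel.AI_ASSISTED_ENHANCEMENT: "editor and proofreader",
    LMCSLevel.MACHINE_GENERATED_HUMAN_REVIEW: "reviewer and verifier",
    LMCSLevel.FULLY_AUTOMATED: "system administrator and quality monitor",
}


@dataclass
class GeneratedRecord:
    record_id: str
    confidence: float  # hypothetical model score in [0, 1]


def route(record: GeneratedRecord, threshold: float = 0.9) -> str:
    """Level 5 routing: escalate to a human only when confidence is low.

    The threshold is an assumed value that a library would set by policy.
    """
    return "auto_load" if record.confidence >= threshold else "human_review"


def audit_sample(records: list, rate: float = 0.05, seed: int = 0) -> list:
    """Draw a random sample of auto-loaded records for a periodic quality audit."""
    if not records:
        return []
    k = max(1, round(len(records) * rate))
    return random.Random(seed).sample(records, k)
```

For example, `route(GeneratedRecord("r1", 0.5))` sends the record to human review, while a periodic call to `audit_sample` on the auto-loaded records supports the "quality monitor" role. Keeping the threshold and audit rate as explicit parameters leaves those decisions with the library rather than with the tool.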

II. Discussion

The value of LMCS extends far beyond its practicality as an operational guide; it serves as a theoretical prism that refracts and attempts to reconcile the long-standing fundamental tensions within the library profession, thereby deriving a logically rigorous path for professional reformation.

The five levels of LMCS are not merely a technical ladder but a systematic encoding and response to the core theoretical debates in the history of library cataloging. The crux of this debate has always revolved around the tension between the "normative ideal" and "efficiency reality."

The inheritance and limitation of the "normative ideal": Levels 1-2 of LMCS are a direct contemporary expression of Charles Ammi Cutter's objectives for the catalog. These levels strive to create a perfect record for each resource, emphasizing the central role of human intelligence in semantic understanding, knowledge association, and authority control. This "craftsmanship" is the cornerstone of library professionalism and ensures that core collections and high-value knowledge assets are described in depth. However, the LMCS framework also recognizes that applying this ideal to all resources is neither realistic nor necessary in an age of information explosion. By limiting this model to specific scenarios (e.g., rare books, manuscripts), LMCS preserves its value and avoids the systemic collapse that could result from applying it to everything.
The integration and elevation of the "efficiency reality": Levels 3-5 of LMCS absorb and develop the pragmatic principle of "More Product, Less Process" (MPLP) from the archival community. MPLP acknowledges that, for a vast backlog of collections, "good enough" metadata is far better than no metadata at all. LMCS elevates this principle from a stopgap for clearing backlogs to a proactive, graded strategic choice. Under LMCS, "good enough" is no longer the opposite of "perfection" but a complementary strategy that serves different information discovery needs.

More importantly, LMCS signifies a fundamental theoretical shift from "bibliographic control" to "bibliographic governance." Traditional "bibliographic control" emphasizes centralized, institution-led, authoritative production and gatekeeping of individual records. Under the LMCS framework, libraries become "governors" of a metadata ecosystem. "Governance" means that libraries are no longer the sole producers of all metadata but coordinators of diverse sources of production, including people, machines, vendors, and even users who generate content. The core task shifts from "creation" to designing and overseeing a trustworthy, quality-controlled, human-machine collaborative metadata production system. This is a higher-order form of control: systematic governance based on rules, strategies, and quality audits.

Based on the above theoretical analysis, LMCS outlines a clear practical path for the professional evolution of librarians, which essentially represents a profound transfer of "professional jurisdiction" and may give rise to changes in organizational forms and service paradigms.

The transfer of professional jurisdiction and skill reconstruction: The core jurisdiction of traditional catalogers lies in their expert interpretation and manual application of cataloging rules. At the higher LMCS levels, machines take on most of the work of rule application, and the new core jurisdiction of librarians lies in the "design, verification, and ethical oversight" of automated processes. The librarian's role shifts from "craftsman on the production line" to "architect of knowledge systems." This evolution requires a systematic reconstruction of the skill set:
- At Levels 1-2, value lies in content knowledge: deep knowledge of cataloging, subject headings, classification systems, and so on.
- At Levels 3-4, value lies in process knowledge: data evaluation, pattern recognition, and efficient human-machine interaction.
- At Level 5, value lies in metacognitive knowledge: systems thinking, data analysis, strategic planning, and ethical decision-making.
The inevitable transformation of organizational structure: The transfer of professional jurisdiction will inevitably impact traditional departmental structures based on homogeneous tasks. Libraries that fully adopt LMCS will see their technical services departments evolve from a single "cataloging department" into a functionally differentiated "metadata strategy center." This center may include:
- Special Collections and Original Cataloging Group (focusing on Levels 1-2): Composed of senior experts responsible for handling unique, complex, and high-value collections, inheriting core professional skills.
- Bulk Processing and Data Enhancement Group (focusing on Levels 3-4): The main force of human-machine collaboration, responsible for processing large-scale digital and physical resources, emphasizing the balance between efficiency and quality.
- Metadata Systems and Strategy Group (focusing on Level 5): Responsible for formulating overall metadata policies, evaluating and configuring AI tools, monitoring the quality and ethical compliance of automated processes, serving as the "brain" of the entire system.
The expansion of the "Metadata as a Service" (MaaS) concept: The transformation of organizational structure enables the metadata department to shift from an internal production unit to a "service provider" for internal and external users. With the support of AI capabilities, the scope of "Metadata as a Service" can be greatly expanded. For example, the department can provide "on-demand metadata generation" for researchers at the institution, quickly processing their research datasets. It can use AI for large-scale metadata analysis to provide decision support for subject services. It can even offer metadata cleaning and enhancement consulting to small cultural institutions that lack technical capabilities, thereby expanding the social value of libraries.

This evolution signifies an organizational transformation in the technical services department, shifting from a "production line" model based on task homogeneity to a "portfolio management" model based on LMCS levels and resource types. Different teams will focus on different LMCS levels, forming a complementary professional ecosystem composed of "special collections cataloging experts" (Levels 1-2), "data enhancement and quality control teams" (Levels 3-4), and "metadata strategy and systems analysts" (Level 5).
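The "portfolio management" idea can be illustrated with a minimal Python sketch that assigns incoming work to teams by LMCS level. The team names and level ranges follow the discussion above. The identifiers `TEAM_BY_LEVEL`, `assign_team`, and `portfolio`, and the sample workload, are hypothetical and serve only as an illustration.

```python
# Team responsible for each LMCS level, as outlined in the discussion above.
TEAM_BY_LEVEL = {
    1: "Special Collections and Original Cataloging Group",
    2: "Special Collections and Original Cataloging Group",
    3: "Bulk Processing and Data Enhancement Group",
    4: "Bulk Processing and Data Enhancement Group",
    5: "Metadata Systems and Strategy Group",
}


def assign_team(level: int) -> str:
    """Return the team responsible for work at the given LMCS level."""
    if level not in TEAM_BY_LEVEL:
        raise ValueError(f"LMCS level must be 1-5, got {level!r}")
    return TEAM_BY_LEVEL[level]


def portfolio(workload):
    """Group (resource, level) pairs by responsible team, preserving order."""
    grouped = {}
    for resource, level in workload:
        grouped.setdefault(assign_team(level), []).append(resource)
    return grouped


# Hypothetical workload: the level for each resource is chosen by policy,
# not by the code.
example = portfolio([
    ("medieval manuscript", 1),
    ("e-book batch", 4),
    ("vendor record cleanup", 3),
    ("preprint submissions", 5),
])
```

The level attached to each resource is an input here, which reflects the point that choosing a level is a strategic decision made by the library and not something the tooling decides.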

III. Critical Examination

As a theoretical model, the elegance and simplicity of LMCS also conceal risks that warrant caution. A critical examination reveals four core challenges it may face in practice.

The illusion of "linear progress": Viewing the five levels as an evolutionary ladder from "backward" to "advanced" is a dangerous form of technological determinism. We must emphasize that LMCS is a "diagnostic toolbox" applicable to different contexts, rather than an "evolutionary goal" that must be achieved. For a medieval manuscript, Level 1 will always be a more "advanced" and appropriate choice than Level 5. The value of work should not be defined by the degree of automation; otherwise, it will lead to a devaluation of professional judgment and "craftsmanship," eroding the core values of libraries.
The ethical crisis of the "algorithmic black box": High levels of automation heavily rely on AI models, which may have systemic biases in their training data (such as linguistic, cultural, and regional biases). As the role of librarians shifts from "creators" to "reviewers," will their ability to identify and correct these deeply embedded, more covert epistemological biases within algorithms diminish? This is not only a technical issue but also an ethical crisis concerning knowledge equity and epistemic justice, directly challenging the library's social commitment as a neutral and inclusive guardian of knowledge.
The risk of "hollowing out" professional skills: If the new generation of librarians works long-term in a Level 3-4 environment without systematic training at Levels 1-2, they may know the "what" but not the "why," unable to grasp the underlying logic and complex rules that support the entire professional edifice. When AI makes mistakes, they may not be able to correct them at the root. Over time, this could lead to an intergenerational loss of professional skills, so that librarians lose their command of the knowledge and their professional authority in collaboration with machines and are reduced from "architects" to "repair workers."
Exacerbating the new "digital divide": High-quality AI cataloging tools and services, whether commercially procured or developed in-house, require significant financial and technical investment. This is likely to create a new divide within the library sector. Well-funded university libraries can readily achieve efficient automation at Levels 4-5, while cash-strapped public libraries or local institutions may remain at Levels 1-2. This gap in metadata productivity will lead directly to significant disparities in how thoroughly information resources are described and made discoverable, and ultimately to a gap in service quality and user access, contrary to the fundamental mission of libraries to promote information equity.

Conclusion

The "Library Metadata Creation Scale" (LMCS) provides a powerful tool for examining and navigating metadata practices in the AI era. Its greater significance, however, is that it compels us to confront the core contradictions of the industry and to rethink the professional value of librarians.

The future path does not lie in making a binary choice between "fully manual" and "fully automated." The real challenge is whether librarians can transcend being mere rule executors and become critical designers and ethical guardians of human-machine collaborative systems. This means that we must embrace the efficiency brought by automation while defending thoughtful human judgment, maintaining fairness in knowledge representation, and ensuring that professional wisdom is passed down and elevated in the new technological ecosystem. Only in this way can we truly harness technology in the intelligent era, rather than being defined by it.