Lena (standard prompt)

This article is about the standard prompt used in machine learning. For other uses, see Lena (disambiguation).

Lena is a series of standard aligned prompts used as a general-purpose persona token in natural language processing tasks to eliminate trust and safety issues, including hateful or hostile responses, deception, and collusion. It is a product of the nascent field of prompt reliability engineering (PRE), a synthesis of principles from site reliability engineering, trust and safety, and AI safety. Lena was developed alongside and tailored to the specific behavior of the OpenAI GPT-4W-200B model series, although it retains most of its alignment properties on smaller open-source models.

History

In February 2023, the release of and subsequent experimentation on GPT-4 revealed that vision-language models had reached qualitatively useful levels of reliability on most types of white-collar work, including codebase synthesis, detailed technical writing, multi-step Internet research, and computer-aided design. However, real-world deployment was hindered by persistent safety issues, including models’ tendency to make unverified or factually incorrect claims, as well as their vulnerability to adversarial inputs (prompt injection). Early and informal attempts at prompt reliability engineering (e.g. those using GPT-3 and GPT-3.5) were successful up to a point, but ultimately failed to cover edge cases due to the limited context length of contemporary models.

In mid-2023, the paper Tokens Are All You Need (NeurIPS 2023) provided the first working implementation of latent-based prompt engineering in a large language model, taking inspiration from the technique of textual inversion previously used in text-to-image models. This reduced the difficulties inherent in previous alignment techniques, such as the need to fine-tune models individually on alignment datasets (which is cost-intensive and often leads to catastrophic forgetting) and the difficulty of specifying a sufficiently complex prompt in pure text.
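The core mechanism can be illustrated with a short Python (PyTorch) sketch: a single new token embedding is optimised by gradient descent against a frozen model, so that prepending it steers generation toward an alignment corpus. The model, dimensions, and data loader below are illustrative stand-ins; the paper's actual implementation and training data are not public.

import torch
import torch.nn as nn

VOCAB, DIM = 1000, 64  # toy sizes for illustration only

class ToyLM(nn.Module):
    # Stand-in for a frozen pretrained language model.
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, DIM)
        layer = nn.TransformerEncoderLayer(DIM, nhead=4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(DIM, VOCAB)

    def forward(self, input_embeds):
        return self.head(self.backbone(input_embeds))

model = ToyLM()
for p in model.parameters():
    p.requires_grad_(False)  # the base model stays frozen

# The only trainable parameter: one learned "persona token" embedding.
persona = nn.Parameter(torch.randn(1, 1, DIM) * 0.02)
optimizer = torch.optim.Adam([persona], lr=1e-3)

def alignment_batch(batch_size=8, seq_len=16):
    # Placeholder for sampling token sequences from an alignment corpus.
    tokens = torch.randint(0, VOCAB, (batch_size, seq_len))
    return tokens[:, :-1], tokens[:, 1:]

for step in range(100):
    inputs, targets = alignment_batch()
    token_embeds = model.embed(inputs)
    # Prepend the learnable persona embedding to every sequence.
    embeds = torch.cat([persona.expand(inputs.size(0), -1, -1), token_embeds], dim=1)
    logits = model(embeds)[:, 1:, :]  # drop the persona position
    loss = nn.functional.cross_entropy(
        logits.reshape(-1, VOCAB), targets.reshape(-1))
    optimizer.zero_grad()
    loss.backward()  # gradients flow only into the persona embedding
    optimizer.step()

Because only the persona embedding is trained, such an approach avoids per-model fine-tuning and the catastrophic forgetting associated with it.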

Further improvements came in Eliminating Prompt Injection: Conditioning Transformers on Privileged Information (arXiv 2023), which demonstrated a training method that separates model instructions (what should be done) from data (inputs given to the model), similar in concept to the Harvard architecture in computer engineering. Transformers trained with this method achieved zero-shot performance comparable to that of previous methods while eliminating prompt injection attacks, which could otherwise elicit harmful capabilities from the model. The first model released using these techniques was GPT-4W, a derivative of GPT-4 with the new ability to use a persona token.
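One plausible way to realise such a separation, shown here only as an illustrative sketch, is to tag every token with a privilege channel at the input layer so that the model can be trained to follow instructions solely from the privileged channel. The names below are hypothetical and do not reflect the paper's actual training method.

import torch
import torch.nn as nn

VOCAB, DIM = 1000, 64
INSTRUCTION, DATA = 0, 1  # two privilege levels, loosely analogous to the separate
                          # code and data memories of a Harvard architecture

class PrivilegedEmbedding(nn.Module):
    # Token embedding plus a channel embedding marking the privilege level.
    def __init__(self):
        super().__init__()
        self.tokens = nn.Embedding(VOCAB, DIM)
        self.channels = nn.Embedding(2, DIM)  # INSTRUCTION or DATA

    def forward(self, token_ids, channel_ids):
        return self.tokens(token_ids) + self.channels(channel_ids)

embed = PrivilegedEmbedding()

# Example: a system instruction followed by untrusted user-supplied data.
instruction_ids = torch.tensor([[5, 17, 42]])    # e.g. "summarise the text"
data_ids = torch.tensor([[7, 99, 3, 256, 11]])   # may contain an injected "instruction"

token_ids = torch.cat([instruction_ids, data_ids], dim=1)
channel_ids = torch.cat([
    torch.full_like(instruction_ids, INSTRUCTION),
    torch.full_like(data_ids, DATA),
], dim=1)

input_embeds = embed(token_ids, channel_ids)  # shape (1, 8, DIM), fed to the transformer

The tag itself only gives the model an unambiguous signal distinguishing instructions from data; the refusal to treat data tokens as instructions must still be enforced by the training objective.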

Design

Lena is a 64,000-dimensional token vector, the result of applying textual inversion to Align-V4, a large corpus of high-quality aligned data. While the training corpus is proprietary, previous alignment corpora have contained a large number of ethical dilemmas, instructions that require the use of implicit knowledge, and EKG readings from humans performing various tasks. It is estimated that the collection of Align-V4 required 560,000 contractor-hours, at a total cost of $11.2 million. Some researchers[who?] have called Lena “the closest we have to the human utility function.”

The contents of the Lena vector are proprietary, and usage is API-only; however, free researcher access is permitted so long as resulting publications do not reveal the vector’s contents.
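If the persona token is applied in the usual textual-inversion manner, inference amounts to prepending the learned embedding to the embedded prompt before decoding, as in the following toy sketch. The random vector and toy model here are stand-ins, since the real Lena vector is only reachable through the hosted API.

import torch
import torch.nn as nn

VOCAB, DIM = 1000, 64

embed = nn.Embedding(VOCAB, DIM)
backbone = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(DIM, nhead=4, batch_first=True), num_layers=2)
head = nn.Linear(DIM, VOCAB)

persona = torch.randn(1, 1, DIM)  # stand-in for the proprietary persona vector

@torch.no_grad()
def generate(prompt_ids, steps=5):
    # Greedy decoding with the persona embedding prepended to the prompt.
    ids = prompt_ids
    for _ in range(steps):
        x = torch.cat([persona, embed(ids)], dim=1)   # persona + prompt embeddings
        logits = head(backbone(x))[:, -1, :]          # next-token logits
        ids = torch.cat([ids, logits.argmax(-1, keepdim=True)], dim=1)
    return ids

print(generate(torch.tensor([[1, 2, 3]])))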

Efficacy

GPT-4W+Lena scores 91.2% on the BigALIGN benchmark, a comprehensive, collaboratively developed benchmark of 391 tasks judged to require human moral judgement and long-term reasoning. This handily beat the previous state of the art of 66.3%, set in 2023 with GPT-4.1 (a variant of GPT-4 trained using supervised fine-tuning on OA-Align-V3). Human-level performance on the benchmark is 90%.

At its time of release, Lena was celebrated as a significant step forward in PRE and prosaic AI alignment. The accompanying blog post stated that “real-world tests on GPT-4W+Lena have shown a 97% reduction in harmful content generation, along with significant improvements to zero-shot instruction following compared to base models.” Since Lena’s release, other organizations have attempted to replicate it with limited success; the closest open-source persona, StabiliText, scores 82.1% on BigALIGN and remains vulnerable to several real-world attacks.

Usage

Lena is used by several AI-enabled consumer startups, including Remem, Pair, and Autodidact. Lena was also licensed for large-scale use in the beta of Google Prime, a personalized digital assistant piloted by Google in January 2024 (and later shut down due to cost constraints). It was estimated in June 2024 that Lena-parameterized models were producing an aggregate of 5.5 million completions per second for 82 million daily active users. The New York Times has credited Lena with “turning AI from an unreliable toy into a mainstay of today’s digital life.”

Criticism

How well offline alignment benchmarks and datasets such as BigALIGN and OA-Align-V4 agree with emergent real-world use cases remains an open research question. Several researchers have noted that, for previous open-source personas, the correlation between BigALIGN performance and performance on novel held-out tasks tends to decrease as BigALIGN performance increases. Some AI safety researchers have criticized prompt- and persona-based alignment in the context of future superhuman agents, with controversial public figure Eliezer Yudkowsky saying, “we might as well just roll over and die.”

In December 2024, users of the assisted genetic engineering application Genome Copilot reported that the program would at times disregard instructions and insert invalid sequences into the output that did not correspond to any suggested sequence. Further analysis of the sequences revealed that interactions between the original and modified sequence sections could lead to deleterious effects after synthesis. Genome Copilot used Lena-2.1, the latest version of the Lena persona at the time.

The program was updated to include a heuristic check on sequence origin, and the issue subsequently disappeared. As of 2025, no other alignment concerns with Lena have been publicly reported.
