Navigating the Jagged Technological Frontier: Field Experimental Evidence of the Effects of AI on Knowledge Worker Productivity and Quality

The article is based on a research paper titled “Navigating the Jagged Technological Frontier: Field Experimental Evidence of the Effects of AI on Knowledge Worker Productivity and Quality.” The authors are Fabrizio Dell’Acqua, Edward McFowland III, Ethan Mollick, Hila Lifshitz-Assaf, Katherine C. Kellogg, Saran Rajendran, Lisa Krayer, François Candelon, and Karim R. Lakhani. The paper was produced with support from institutions including Harvard Business School, The Wharton School at the University of Pennsylvania, and Boston Consulting Group.

Abstract

The study investigates how access to Large Language Models (LLMs) affects the productivity and quality of knowledge workers across a range of tasks. In partnership with Boston Consulting Group, the research evaluated the impact of AI on realistic, intricate, and knowledge-intensive tasks. The experiment included 758 consultants, roughly 7% of the firm’s individual contributor-level consultants. After a performance baseline was established, participants were randomly allocated to one of three conditions: no AI access, GPT-4 AI access, or GPT-4 AI access with a prompt engineering overview. The findings suggest that while AI excels at certain tasks, other seemingly similar tasks remain challenging for it. On a set of 18 realistic consulting tasks within AI’s capability range, consultants using AI were significantly more productive and delivered higher-quality results: on average they completed 12.2% more tasks, worked 25.1% faster, and produced results of over 40% higher quality than the control group. However, on a task outside of AI’s current capability, consultants using AI were 19% less likely to reach correct solutions. Two distinct patterns of AI utilization emerged: “Centaurs,” who divided and delegated tasks between themselves and the AI, and “Cyborgs,” who fully integrated their workflow with the AI.

Introduction

The advent of OpenAI’s ChatGPT and other Large Language Models (LLMs) has revolutionized public access to powerful AI tools. As the capabilities of AI increasingly intersect with human skills, especially in knowledge-intensive sectors, integrating human work with AI presents both challenges and opportunities. This research studies the impact of AI on skilled professional workers using randomized controlled field experiments. The findings reveal a “jagged technological frontier” in AI’s capabilities. Within this frontier, AI can enhance or even replace human tasks. However, outside this boundary, AI’s results are often inaccurate and can hamper human performance. The challenge lies in professionals’ ability to discern the extent of this frontier, with those adept at navigating it reaping significant productivity benefits. The release of ChatGPT in November 2022 marked a shift in discussions about AI, as LLMs demonstrated unexpected proficiency in creative, analytical, and writing tasks. This new wave of automation notably affected roles traditionally reserved for the most educated and highest-paid workers.

The research also references several prior studies on AI’s impact, emphasizing the novelty of LLMs and their potential to influence worker performance, particularly in knowledge-intensive sectors. The study’s focus is on complex tasks typical of knowledge workers, aiming to shed light on AI’s evolving influence on such professionals.

The implications of LLMs for organizations, individuals, and broader societal structures have garnered immense attention from scholars, businesses, and governments. Previous AI iterations have sparked debates on the adoption of AI for knowledge work, its impact on organizations, and the potential benefits and risks for professionals. For instance, some research has indicated that AI can enhance the effectiveness and efficiency of professionals, while others have cautioned against the risks of relying on AI for critical tasks, particularly those AI systems that are not transparent (“black-boxed”). There have also been concerns about “algorithmic management” by AI, which can have negative personal effects on professionals and raise ethical and accountability issues.

However, most of these studies were conducted before the emergence of ChatGPT and primarily focused on AI forms designed to produce discrete predictions based on past data, which differ significantly from LLMs. LLMs possess three distinct characteristics:

  1. Unexpected Capabilities: LLMs demonstrate abilities they were not specifically designed for, and these abilities evolve rapidly as model size and quality improve.
  2. Specialist Knowledge: Despite being trained as general-purpose models, LLMs exhibit specialized knowledge and skills, often without explicit instruction.
  3. Rapid Advancements: The effective capabilities of AI, especially LLMs, are novel and versatile and have advanced rapidly in a short timeframe. For example, AI has shown high-level performance in professional contexts such as medicine and law, and has surpassed human performance on some measures of innovation.

Key Characteristics of LLMs:

  1. Performance Enhancement: LLMs can directly enhance worker performance without necessitating significant organizational or technological investment. Recent studies have shown improvements in writing, programming, ideation, and creative work due to LLMs. Consequently, the impact of AI is anticipated to be more profound on the most creative, well-compensated, and highly educated professionals.
  2. Opacity and Uncertainty: LLMs exhibit a degree of opacity, leading to unpredictable outcomes. They can produce plausible yet incorrect results and other kinds of errors, such as mistakes in mathematical calculations or fabricated citations. The most effective ways of using these AIs are often discovered through user experimentation and shared via online channels such as forums, hackathons, and social media.

The Jagged Technological Frontier: Considering the LLMs’ unexpected abilities, ease of use, and opacity, there exists a “jagged technological frontier.” Within this frontier, while some tasks (e.g., idea generation) are effortlessly handled by AI, others that seem straightforward (e.g., basic math) can challenge some LLMs. This frontier presents a scenario where tasks of seemingly similar difficulty may yield varying performance levels when humans use AI. The future of AI’s role in the workspace will be determined by how human interaction with AI evolves based on this frontier and how the frontier itself changes over time.

The primary objective of the research is to explore how humans navigate this jagged frontier and the consequent implications for performance.

The study was conducted in collaboration with a global management consulting firm, Boston Consulting Group (BCG).

Inside the Technological Frontier

Definition

Tasks “inside the frontier” are those that fall within the realm of AI’s capabilities, where AI can competently perform and even excel. These tasks typically involve aspects like creativity, analytical thinking, writing proficiency, and persuasiveness.

Characteristics

  • Proficiency in Execution: AI demonstrates substantial proficiency and accuracy in executing these tasks, often enhancing the quality and efficiency of outcomes.
  • Enhanced Performance: The integration of AI in these tasks leads to significant performance enhancements, with users completing more tasks, faster, and with higher quality.
  • Beneficial for Varied Skill Levels: Both above-median and below-median performers benefit from AI integration on these tasks, with below-median performers seeing the larger gains.
  • Homogenization of Outputs: While AI generates superior content in these tasks, it tends to reduce the variability of the ideas produced, leading to more homogenized outputs.

Outside the Technological Frontier

Definition

Tasks “outside the frontier” are those that exceed the capabilities of AI, where AI struggles to perform effectively. These tasks usually require nuanced understanding, integration of diverse data sources, and subtle insights that AI lacks.

Characteristics

  • Struggle in Execution: AI finds it challenging to execute these tasks effectively and requires extensive guidance to attempt them.
  • Decrease in Correctness: The integration of AI in these tasks leads to a significant decrease in the correctness of the responses provided.
  • Reduction in Time Spent: Despite struggles in task execution, AI integration results in a reduction in the time spent on tasks.

Methodology

  1. Collaboration and Structure: The research was conducted in partnership with Boston Consulting Group (BCG). It consisted of three phases: initial demographic and psychological profiling, the primary experimental phase involving task completions, and a concluding interview segment.
  2. Tasks for Testing: Two task types were tested — one set situated within AI’s capability frontier and one task designed to fall outside it. The objective was to understand how AI integration might affect the workflows of high-human-capital professionals.
  3. Results Overview: The current generation of LLMs, particularly GPT-4 (which was the most advanced model during the study in Spring 2023), has the potential to significantly enhance quality and productivity or even automate some tasks. However, the specific tasks that AI can handle effectively might not be immediately evident to individuals or LLM producers.
  4. Data Collection: Data was sourced from two randomized experiments to assess the causal impact of AI on professionals who traditionally worked without AI. The study was pre-registered, outlining its design, experimental conditions, variables, and main analytical approaches.
  5. Participation: BCG consultants globally were invited to participate in this 5-hour experiment assessing the AI’s impact on their tasks. Roughly 7% of BCG’s global individual contributor consultants participated.
  6. Experimental Phases:
    • In the initial phase, consultants filled out a survey capturing demographic, psychological profiles, and their role details within the company.
    • A few weeks post-enrollment, participants received a link to the main experimental phase.
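The random allocation described above can be sketched in a few lines. This is an illustrative sketch only — the paper does not describe its assignment procedure at this level of detail, and the condition labels below are placeholders mirroring the three arms (no AI, GPT-4, GPT-4 plus a prompt engineering overview):

```python
import random

# Hypothetical labels for the three experimental arms described in the study.
CONDITIONS = ["no_ai", "gpt4", "gpt4_plus_prompting_overview"]

def assign_conditions(participant_ids, seed=0):
    """Randomly assign each participant to one of the three arms.

    A fixed seed makes the assignment reproducible, a common practice
    in pre-registered experimental designs.
    """
    rng = random.Random(seed)
    return {pid: rng.choice(CONDITIONS) for pid in participant_ids}

# 758 consultants participated in the study.
assignments = assign_conditions(range(758))
```

In practice a field experiment of this kind would also stratify or block on baseline performance, but simple randomization is the core idea.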

Conclusion

  • For tasks inside the technological frontier, the use of AI increased the number of subtasks completed by an average of 12.2% and raised the quality of responses by an average of more than 40%.
  • However, for tasks outside the frontier, where AI struggled without extensive guidance, there was a noticeable dip in performance among the AI treatment groups when juxtaposed with the control group.
  • The AI treatments showed a significant negative impact on the correctness of tasks in the outside-the-frontier experiment, with subjects in the AI conditions being less likely to provide accurate recommendations compared to the control group.
  • Despite the decrease in correctness, AI treatments indicated a reduction in the time spent on tasks, and there was an increase in the quality of the recommendations given.

This concludes the detailed summary of the results from the tasks conducted both inside and outside the technological frontier in the study.
