AI in the Kill Chain: The Risk of AI-Enabled Warfare and What Investors Need to Know

Camille Bisconte De Saint Julien, LBP AM 

The rapid integration and use of artificial intelligence (AI) on the battlefield is profoundly altering the nature of warfare. AI-enabled technologies now sit across the full military decision chain, from intelligence gathering and surveillance to targeting and decision support. Their strategic value lies in their ability to process vast quantities of data and translate it into operational outputs in real time. In doing so, they do not merely assist decisions; they shape how decisions are produced.

This transformation is often framed through the promise of more precise and “cleaner” warfare. Faster data processing, pattern recognition, and predictive capabilities are presented as ways to reduce human error and limit civilian harm. Yet evidence from recent conflicts increasingly challenges this narrative. Rather than consistently strengthening compliance with international humanitarian law (IHL), these systems may contribute to undermining core IHL principles such as distinction, precaution, and proportionality. IHL is the set of rules that seeks to limit the effects of armed conflict by protecting individuals who are not participating in hostilities and restricting the means and methods of warfare. 

For investors, this evolution introduces a new category of risk. While debates have tended to focus on autonomous weapons, a broader shift is taking place: AI is progressively restructuring how military decisions are generated, accelerated, and justified. Understanding this shift is essential for engaging companies beyond high-level commitments and assessing how their technologies operate in practice. 

Risks Associated with AI-Enabled Military Systems 

The risks associated with military AI are not isolated failures, but systemic effects arising from the interaction between system architecture, operational environments, and institutional constraints. They affect how information is produced, how uncertainty is managed, and how responsibility can be exercised across the decision chain. This article outlines eight key risks and, for each, suggests questions that investors can use when engaging with providers of military AI.

  1. Traceability, Accountability, and Responsibility Chains 

A first concern lies in the traceability and interpretability of decision-making processes. AI-enabled systems rely on multi-layered pipelines in which data is progressively transformed into operational outputs. These outputs do not originate from a single identifiable step, but from the interaction of multiple models and datasets operating in sequence or in parallel. In operational conditions, this makes it difficult to reconstruct how a given output was produced. 

This has immediate implications for accountability. When an AI-supported assessment contributes to a targeting decision, it may be unclear which part of the system influenced the outcome, how inputs were weighted, or how alternative interpretations were excluded. In the event of civilian harm or alleged violations of IHL, this complicates the reconstruction of the decision chain and raises the question of who is accountable: the operator, the commanding authority, or the system developer? For armed forces, this weakens investigative and review processes. For companies designing and supplying these systems, it creates exposure to legal, ethical, and reputational risks that cannot be addressed through high-level commitments alone.

These challenges are reinforced by the limited interpretability of system outputs in real time. Operators typically receive the result of complex processing without access to a clear explanation of how conclusions were reached. As a result, decisions are increasingly based on outputs that cannot be fully scrutinized or independently verified. This constrains the ability to question system reasoning, assess data quality, or identify alternative readings of the operational environment. 

Investors should therefore examine whether companies can demonstrate end-to-end traceability across their systems, and whether decision pathways can be meaningfully reconstructed in the event of harm. If neither clients nor providers can explain how a system arrived at a given output, assigning responsibility becomes inherently more complex. 

  • How does the company ensure end-to-end traceability across the AI pipeline, from data inputs to operational outputs? 
  • Is it possible to reconstruct, in practice, the decision pathway behind a specific output used in an operational context? 
  • What level of granularity exists in logging and audit trails (e.g. data used, model version, decision thresholds)? 
  • How is responsibility mapped across the value chain (developer, integrator, operator) in the event of harm? 
  • What technical and organizational arrangements are in place to support incident investigation and IHL-related reviews? 
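
To make the question of log granularity more concrete, the following is a minimal sketch of what a tamper-evident audit record could look like. It is an illustration under stated assumptions, not a description of any vendor's system: every field name (`model_version`, `decision_threshold`, `operator_action`, and so on) and the hash-chaining approach are hypothetical choices made for this example.

```python
import hashlib
import json
from dataclasses import dataclass, asdict

GENESIS = "0" * 64  # placeholder hash used before the first record


@dataclass(frozen=True)
class AuditRecord:
    """One logged pipeline output. All field names are hypothetical."""
    output_id: str
    model_version: str
    input_digests: tuple       # digests of the data inputs that were used
    decision_threshold: float  # threshold in force when the output was produced
    confidence: float          # model score attached to the output
    operator_action: str       # e.g. "accepted", "rejected", "escalated"
    prev_hash: str             # hash of the previous record, linking the chain


def record_hash(record: AuditRecord) -> str:
    """Hash a record's contents deterministically."""
    payload = json.dumps(asdict(record), sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_record(log: list, **fields) -> AuditRecord:
    """Append a record whose prev_hash points at the current tail of the log."""
    prev = record_hash(log[-1]) if log else GENESIS
    record = AuditRecord(prev_hash=prev, **fields)
    log.append(record)
    return record


def chain_is_intact(log: list) -> bool:
    """Check that every record's prev_hash matches the hash of its predecessor."""
    expected = GENESIS
    for record in log:
        if record.prev_hash != expected:
            return False
        expected = record_hash(record)
    return True
```

In a sketch like this, altering or removing an earlier record breaks the link to the record that follows it, which is what allows a reviewer to trust the sequence when reconstructing a decision pathway. The most recent record is only protected if its hash is also stored somewhere outside the log.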

  2. Meaningful Human Control in Practice

Beyond its implications for traceability and accountability, opacity also directly affects the substance of human control. While a human may remain formally within the decision loop, their capacity to critically assess or challenge system outputs is reduced if the underlying process is not intelligible. In practical terms, this shifts human involvement from independent judgment to the validation of pre-structured outputs. For military actors, this raises questions as to whether control remains meaningful in the sense required for lawful use of force, particularly where IHL obligations depend on informed, context-sensitive decision-making. 

These constraints become more acute in fast-paced operational environments, where AI systems are specifically designed to manage large data volumes and accelerate decision cycles. By filtering and prioritizing information, they structure the operational picture presented to decision-makers. Operators engage primarily with system-curated representations rather than raw data, limiting their ability to cross-check or contextualize outputs within the available time. 

While companies frequently emphasize “human-in-the-loop” safeguards, the key issue is whether this control remains effective under operational conditions. Investors should therefore look beyond formal claims and consider how systems are actually used. 

  • How is “meaningful human control” translated into system design requirements and operational procedures? 
  • What evidence demonstrates that operators can interpret, question, and override system outputs in real-world conditions? 
  • How is operator understanding of system outputs assessed (e.g. testing, simulations, training validation)? 
  • Are there design features aimed at mitigating over-reliance on automated outputs (e.g. interface design, redundancy, forced pauses)? 
  • In high-speed environments, under what conditions can operators realistically intervene or suspend system outputs? 

  3. Interpretability and Management of Uncertainty

Under these operational conditions, the question of how uncertainty is represented and acted upon becomes central. AI outputs are often probabilistic, meaning they are expressed through confidence scores, classifications, or likelihoods, but they are typically presented in formats that emphasize clarity and actionability. In practice, this can compress uncertainty into simplified signals that are easier to use operationally but more difficult to critically interpret. Under time pressure, the distinction between a high-confidence estimate and a verified fact may become blurred. 

This dynamic creates a potential gap with IHL obligations, particularly those relating to distinction and precaution, which are not satisfied by acting on confidence alone. These obligations require decision-makers to recognize and appropriately account for uncertainty, especially where doubt exists as to the status of a target. Where uncertainty is downplayed or insufficiently visible in AI-generated outputs, there is a risk that it is not adequately integrated into the decision-making process. In such situations, actions may be taken on the basis of representations that appear sufficiently reliable, without meeting the threshold of certainty required under IHL.

Investors should ask how companies ensure that uncertainty is adequately communicated and accounted for. 

  • How are uncertainty and confidence levels represented to users in operational interfaces? 
  • Can operators access meaningful explanations of why a classification or recommendation was generated? 
  • How does the system distinguish between high-confidence outputs and genuinely verified information? 
  • Are there mechanisms that explicitly flag ambiguity, conflicting signals, or degraded data quality? 
  • What training or guidance is provided to ensure operators interpret probabilistic outputs appropriately in high-risk contexts? 
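
The distinction between a high-confidence estimate and a verified fact can be illustrated with a short sketch of how an interface might label outputs. Everything here is an assumption made for illustration: the `ModelOutput` fields, the `present` function, and the 0.9 review threshold are hypothetical and do not reflect any particular system.

```python
from dataclasses import dataclass


@dataclass
class ModelOutput:
    """A single classification. All field names are hypothetical."""
    label: str
    confidence: float                     # model score in [0, 1], not a verified fact
    independently_verified: bool = False  # confirmed by a separate source or process
    conflicting_signals: bool = False     # other inputs point to a different reading
    degraded_inputs: bool = False         # e.g. partial or low-quality sensor data


def present(output: ModelOutput, review_below: float = 0.9) -> str:
    """Render an output so that a score is never displayed as a confirmed fact."""
    status = "VERIFIED" if output.independently_verified else "UNVERIFIED ESTIMATE"
    flags = []
    if output.conflicting_signals:
        flags.append("conflicting signals")
    if output.degraded_inputs:
        flags.append("degraded input data")
    if output.confidence < review_below:
        flags.append("below review threshold")
    suffix = f" | flags: {', '.join(flags)}" if flags else ""
    return f"{output.label} | {status} | confidence {output.confidence:.0%}{suffix}"


print(present(ModelOutput("vehicle", 0.97)))
print(present(ModelOutput("vehicle", 0.62, degraded_inputs=True)))
```

The design point is that verification status is a separate field from the score: in this sketch a 97% score is still displayed as an unverified estimate until something other than the model confirms it.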

  4. Reliability Under Real-World and Adversarial Conditions

The limitations of AI-based interpretation are particularly exposed in complex and contested environments, where ambiguous and degraded signals heighten the risk that outputs fall short of the reliability required for lawful decision-making under IHL. Contemporary battlefields can be characterized by fluid interactions between civilians and combatants, adaptive adversaries and asymmetrical threats, political and security volatility, high levels of population density, and widespread dual-use infrastructure. These conditions generate patterns that do not align neatly with the categories embedded in machine-learning models. 

In such settings, errors are not exceptional but structural. AI systems apply patterns derived from training data to environments that may differ significantly in form and context. The outputs produced may appear coherent and definitive, yet rely on incomplete or distorted inputs. This creates an illusion of precision, where the apparent clarity of classifications masks underlying uncertainty. 

Operational conditions further amplify this risk. Sensor limitations, electronic interference, and incomplete data feeds can degrade inputs, while adversarial tactics, such as camouflage, spoofing, or behavioral adaptation, may intentionally exploit model weaknesses. The result is a situation in which outputs remain internally consistent but may diverge from reality. For operators, the difficulty lies in identifying when this is the case, particularly under time constraints. 

From a military perspective, these dynamics directly affect the application of IHL. The principles of distinction and precaution require decision-makers to recognize uncertainty and act accordingly. Where uncertainty is obscured within system outputs, the capacity to meet these obligations is reduced, increasing the risk of misidentification and unintended harm. 

Investors should therefore examine how companies validate system performance under such scenarios, and how they detect, manage, and communicate anomalies or failure modes once systems are deployed.

  • How are systems tested against degraded inputs (e.g. partial data, sensor failure, communication loss)? 
  • What validation is conducted for adversarial conditions (e.g. spoofing, camouflage, behavioral adaptation)? 
  • How does the system perform when confronted with data outside its training distribution? 
  • What safeguards exist to detect anomalous outputs and ‘silent failures’, i.e. errors, biases, or automation effects that are not immediately visible in operational settings?
  • How is performance monitored and updated once systems are deployed in real-world environments? 

  5. Scale, Acceleration, and Decision Safeguards

These issues are compounded by the combined effects of scale and acceleration. AI systems can generate large volumes of potential targets or alerts in real time, far exceeding the analytical capacity of human operators. At the same time, decision timelines are compressed, limiting the depth of review that can be applied to each case. 

In practice, this creates a structural imbalance: more outputs to assess, less time to assess them. Validation processes may therefore become selective, focusing on prioritized cases while others are processed with limited scrutiny. Over time, this can shift decision-making thresholds, as the ability to thoroughly evaluate each output diminishes. 

For military actors, this has clear implications for proportionality and precaution. Both require deliberation, the consideration of alternatives, and the assessment of potential civilian harm. Where operational tempo constrains these processes, the practical conditions for compliance are weakened. For companies developing these systems, the question is not only whether their tools perform as intended, but whether their integration into operational workflows alters the conditions under which legal obligations can realistically be met. 

Investors should consider whether companies assess the impact of this acceleration on human interpretation and validation capacity. 

  • How does the company assess the impact of system speed and output volume on human decision-making capacity? 
  • Are there mechanisms to regulate or limit the number of actionable outputs presented to operators? 
  • How is the risk of decision-compression (limited time for review) accounted for in system design? 
  • Has the company conducted analysis linking system acceleration to operator error or misclassification rates? 
  • Are there operational safeguards to ensure outputs are not acted upon faster than they can be meaningfully reviewed? 
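
The imbalance between output volume and review time can be expressed as simple arithmetic. The sketch below assumes a hypothetical policy in which each output must receive a minimum review time; the function names and all figures are illustrative only.

```python
def review_capacity(operators: int, window_seconds: float, min_review_seconds: float) -> int:
    """Number of outputs that can each receive the minimum review time in one window."""
    if min_review_seconds <= 0:
        raise ValueError("min_review_seconds must be positive")
    return int(operators * window_seconds // min_review_seconds)


def split_for_review(outputs: list, capacity: int) -> tuple:
    """Split outputs into those presented now and those held back for a later window."""
    return outputs[:capacity], outputs[capacity:]


# Illustrative figures: 2 operators, a 10-minute window, at least 3 minutes per output.
capacity = review_capacity(operators=2, window_seconds=600, min_review_seconds=180)
presented, held = split_for_review(list(range(10)), capacity)
print(capacity, len(presented), len(held))  # prints: 6 6 4
```

With these assumed figures, a system that generates ten outputs in the window exceeds what can be meaningfully reviewed by four. The relevant question for a provider is what happens to that excess: whether it is held back, as in this sketch, or passed through with reduced scrutiny.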

  6. Integration of International Humanitarian Law (IHL)

A further layer of risk lies in the interaction between AI systems and legal decision-making frameworks. AI tools are designed to classify, rank, and prioritize objects of interest. However, they do not incorporate the contextual reasoning required for legal assessment under IHL, which depends on evaluating intent, behavior, and situational factors. 

In operational use, AI outputs often define the starting point of decision-making by identifying and structuring potential targets. This may lead to a subtle but significant shift: legal reasoning becomes anchored in system-generated classifications rather than independently constructed assessments. For armed forces, this raises questions about how to maintain the primacy of human judgment. For corporate actors, it highlights the importance of how systems are designed, presented, and integrated into decision processes. 

For investors, this involves questioning whether systems incorporate contextual information relevant to distinction, proportionality, and precaution, and whether they are capable of representing uncertainty rather than obscuring it. It also raises the question of limits: do companies define clear boundaries on how their technologies can be used, particularly in targeting applications, and are these limits reflected in contractual arrangements? 

  • How are IHL principles (distinction, proportionality, and precaution) translated into system design or constraints? 
  • Does the system incorporate contextual information relevant to target assessment (e.g. civilian presence, environment, temporal factors)? 
  • How are uncertainty and doubt explicitly integrated into targeting-related outputs? 
  • How does the system handle ambiguous cases, including dual-use infrastructure or mixed civilian–military environments? 
  • Are there defined limits or exclusions regarding the use of AI in targeting, and how are these enforced in practice (e.g. contracts, system restrictions)? 

  7. Data Governance, Bias, and Model Integrity

These risks are also underpinned by issues related to data governance and model integrity. AI systems depend on datasets that are partial, context-specific, and inherently limited. In military applications, training data may not capture the variability of real-world environments, particularly across different theaters of operation. Biases in datasets, whether due to sampling gaps, historical patterns, or proxy variables, can influence classification outcomes in ways that are not always visible.

These limitations can lead to systematic misclassification, particularly in complex or heterogeneous environments. Moreover, feedback loops may reinforce these patterns over time, as system outputs shape subsequent interpretation and data collection. From a military perspective, this directly affects the reliability of targeting processes. Under IHL, access to accurate and contextually relevant information is a prerequisite for lawful decision-making. Where data limitations introduce persistent distortions, this requirement becomes more difficult to fulfil. 

Investors should therefore scrutinize how companies address bias, validate training data, and monitor model performance over time. In dynamic environments, ensuring that models remain aligned with real-world conditions is essential, not only for operational effectiveness, but for preventing and mitigating the risk of systematic error. 

  • How are training datasets validated for representativeness across different operational contexts? 
  • What processes are in place to identify and mitigate bias in both datasets and model outputs? 
  • How does the company test system performance across diverse and complex real-world scenarios? 
  • Are there safeguards to detect and correct feedback loops that may reinforce bias over time? 
  • How is model drift monitored and managed once systems are deployed in evolving environments? 
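
As one illustration of what monitoring model drift can mean in practice, the sketch below computes a Population Stability Index (PSI), a general-purpose statistic for comparing the distribution of a feature in reference data with the distribution seen after deployment. Its use here is an assumption made for this example, not a claim about how any military AI provider monitors drift, and the binning choices are illustrative.

```python
import math


def psi(reference: list, live: list, bins: int = 10) -> float:
    """Population Stability Index between a reference sample and a live sample."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all reference values are equal

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1  # clamp out-of-range values into edge bins
        # A small floor avoids log(0) for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    ref_p, live_p = proportions(reference), proportions(live)
    return sum((l - r) * math.log(l / r) for r, l in zip(ref_p, live_p))


reference = [i / 100 for i in range(100)]     # spread evenly across the reference range
shifted = [0.5 + i / 200 for i in range(100)]  # concentrated in the upper half
print(psi(reference, reference))  # identical samples give 0.0
print(psi(reference, shifted) > 0.25)
```

A value near zero indicates that live data resembles the reference data; larger values indicate a shift. A commonly cited rule of thumb treats values above roughly 0.25 as a significant shift, but any alert threshold would need to be set and justified for the specific system and context.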

  8. Governance, Escalation, and Corporate Responsibility

Finally, beyond technical and operational risks, the governance of military AI within companies raises a distinct set of concerns. As private actors play a growing role in shaping how these systems are designed and deployed, their internal decision-making frameworks become directly relevant to how risks materialize in practice. 

A key issue is how companies identify, assess, and manage high-risk applications. While many have adopted general principles on responsible AI, it is often unclear how these translate into concrete decisions when commercial incentives and ethical considerations intersect. This brings into focus the effectiveness of internal escalation processes: whether high-risk use cases are subject to meaningful review, and whether companies are prepared to modify, or refuse, certain applications where risks cannot be adequately mitigated. 

These questions are closely linked to expertise and oversight. Robust governance requires the involvement of legal, compliance, and, where relevant, IHL specialists in decision-making. Without such structures, governance risks remain largely procedural. 

Transparency is another critical dimension. Some risks, such as system opacity, uncertainty, or degraded performance in contested environments, cannot be fully eliminated. The extent to which these limitations are clearly communicated to clients is therefore essential, particularly where they may affect operational use. 

Corporate responsibility also extends beyond system design to how technologies are ultimately deployed. Even where companies do not control end use, they may retain influence through contractual terms, safeguards, or ongoing engagement with clients. The absence of such mechanisms raises fundamental questions about how companies respond when their systems are used in ways that may carry significant legal or ethical implications.

For investors, this area of governance is a critical point of leverage. It is at this level, where companies decide what to develop, how to deploy it, and under which conditions, that many of the structural risks identified above are either mitigated or reinforced. Engagement should, therefore, focus not only on stated principles, but on how governance functions in practice: how risks are escalated, how trade-offs are managed, and where companies draw boundaries. Understanding these internal processes is essential to assess whether companies are equipped to handle the legal and ethical implications of military AI, and whether their practices are consistent with responsible business conduct. 

  • How are high-risk use cases identified, assessed, and escalated within the company? 
  • What decision-making processes exist to modify, restrict, or refuse applications where risks cannot be mitigated? 
  • How are legal, ethical, and IHL considerations integrated into product development and deployment decisions? 
  • How are residual risks (e.g. opacity, uncertainty, system limitations) communicated to clients? 
  • What mechanisms are in place to respond if systems are used in ways that raise legal or human rights concerns? 

Conclusion 

AI is not simply another tool in the military domain. It is reshaping how decisions are produced, interpreted, and executed. The risks highlighted in this article are structural, and they interact in ways that are difficult to detect through high-level frameworks alone.

For armed forces, this raises operational and legal challenges in maintaining accountability and compliance with IHL. For companies developing and supplying these technologies, it raises questions about design choices, deployment conditions, and responsibility across the value chain. 

For investors engaging with military AI providers, the implication is clear: these risks cannot be assessed through high-level policies alone. They require a deeper understanding of how systems function in practice, and how corporate choices influence the way military decisions are ultimately shaped. By asking these questions, and insisting on clear, evidence-based answers, they have a key role to play in shaping how these technologies are developed and used.