You can find information about Azure OpenAI’s latest models and their costs, context windows, and supported input types in the Azure docs. For the full set of Microsoft integrations in LangChain (including tools like Azure AI Search, Azure Database for PostgreSQL, and the M365 suite), see the Microsoft provider page.
Azure OpenAI vs. OpenAI: Azure OpenAI refers to OpenAI models hosted on the Microsoft Azure platform. Models hosted on Azure come with added enterprise features, including support for keyless authentication with Microsoft Entra ID.
Use ChatOpenAI with the v1 API (recommended): Azure OpenAI's v1 API (generally available as of August 2025) lets you use ChatOpenAI directly with Azure endpoints. This removes the need for dated api-version parameters and provides native support for Microsoft Entra ID authentication with automatic token refresh. We continue to support AzureChatOpenAI, which now shares the same underlying implementation as ChatOpenAI (the class that interfaces with OpenAI services directly). This page serves as a quickstart for authenticating and connecting your Azure OpenAI chat models to LangChain.
API reference: For detailed documentation of all features and configuration options, head to the AzureChatOpenAI API reference. Visit the ChatOpenAI docs for details on available features.

Overview

Integration details

| Class | Package | Serializable | JS/TS support |
| --- | --- | --- | --- |
| AzureChatOpenAI | langchain-openai | beta | ✅ |

Model features

Supported features: tool calling, structured output, image input, audio input, video input, token-level streaming, native async, token usage, and logprobs.

Setup

To access Azure OpenAI models you’ll need to create an Azure account, create a deployment of an Azure OpenAI model, get the name and endpoint for your deployment, and install the langchain-openai integration package.

Installation

pip install -U langchain-openai

Credentials

Both ChatOpenAI and AzureChatOpenAI support authenticating to Azure OpenAI with either Microsoft Entra ID (recommended) or an API key.

Microsoft Entra ID

Microsoft Entra ID provides keyless authentication with automatic token refresh. Install the azure-identity package and create a token provider—the same provider works with both ChatOpenAI and AzureChatOpenAI:
pip install azure-identity
from azure.identity import DefaultAzureCredential, get_bearer_token_provider

token_provider = get_bearer_token_provider(
    DefaultAzureCredential(),
    "https://cognitiveservices.azure.com/.default",
)

API key

Head to the Azure docs to create your deployment and generate an API key. Set the AZURE_OPENAI_API_KEY and AZURE_OPENAI_ENDPOINT environment variables:
import getpass
import os

if "AZURE_OPENAI_API_KEY" not in os.environ:
    os.environ["AZURE_OPENAI_API_KEY"] = getpass.getpass(
        "Enter your Azure OpenAI API key: "
    )
os.environ["AZURE_OPENAI_ENDPOINT"] = "https://YOUR-RESOURCE-NAME.openai.azure.com/"
To enable automated tracing of your model calls, set your LangSmith API key:
os.environ["LANGSMITH_API_KEY"] = getpass.getpass("Enter your LangSmith API key: ")
os.environ["LANGSMITH_TRACING"] = "true"

Instantiation

ChatOpenAI with v1 API

Set base_url to your Azure endpoint with /openai/v1/ appended. With the v1 API you can call any model deployed in Microsoft Foundry (including OpenAI, Llama, DeepSeek, Mistral, and Phi) through a single interface by pointing model at your deployment name.
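A minimal sketch (the resource name and deployment name are placeholders; `token_provider` is the Entra ID provider created above, and a plain API key string works too):

```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="your-deployment-name",  # name of your Azure deployment
    base_url="https://YOUR-RESOURCE-NAME.openai.azure.com/openai/v1/",
    api_key=token_provider,  # or an API key string, e.g. os.environ["AZURE_OPENAI_API_KEY"]
)
```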

AzureChatOpenAI

Use AzureChatOpenAI when working with traditional Azure OpenAI API versions that require api_version.
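For example, a sketch with placeholder endpoint, deployment, and API version:

```python
from langchain_openai import AzureChatOpenAI

llm = AzureChatOpenAI(
    azure_endpoint="https://YOUR-RESOURCE-NAME.openai.azure.com/",
    azure_deployment="your-deployment-name",
    api_version="2025-04-01-preview",  # or your api version
    # Reads AZURE_OPENAI_API_KEY from the environment by default; for
    # Entra ID auth, pass azure_ad_token_provider=token_provider instead.
)
```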

Invocation

messages = [
    (
        "system",
        "You are a helpful assistant that translates English to French. Translate the user sentence.",
    ),
    ("human", "I love programming."),
]
ai_msg = llm.invoke(messages)
print(ai_msg.text)
J'adore la programmation.

Tool calling

Bind tools to the model using Pydantic classes, dict schemas, LangChain tools, or functions:
from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field


class GetWeather(BaseModel):
    """Get the current weather in a given location"""

    location: str = Field(description="The city and state, e.g. San Francisco, CA")


llm = ChatOpenAI(
    model="gpt-5.4-mini",
    base_url="https://YOUR-RESOURCE-NAME.openai.azure.com/openai/v1/",
    api_key="your-azure-api-key",
)

llm_with_tools = llm.bind_tools([GetWeather])

ai_msg = llm_with_tools.invoke("What is the weather like in San Francisco?")
ai_msg.tool_calls
[{'name': 'GetWeather',
  'args': {'location': 'San Francisco, CA'},
  'id': 'call_jUqhd8wzAIzInTJl72Rla8ht',
  'type': 'tool_call'}]
For more on binding tools and tool call outputs, head to the tool calling docs.

Build an agent

Use create_agent to build an agent with Azure OpenAI and tools:
from langchain.agents import create_agent
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="gpt-5.4-mini",
    base_url="https://YOUR-RESOURCE-NAME.openai.azure.com/openai/v1/",
    api_key="your-azure-api-key",
)


def get_weather(city: str) -> str:
    """Get weather for a given city."""
    return f"It's always sunny in {city}!"


agent = create_agent(
    model=llm,
    tools=[get_weather],
    system_prompt="You are a helpful assistant",
)

# Stream agent responses
for chunk in agent.stream(
    {"messages": [{"role": "user", "content": "What is the weather in SF?"}]},
    stream_mode="updates",
):
    for step, data in chunk.items():
        print(f"step: {step}")
        print(f"content: {data['messages'][-1].text}")

Streaming usage metadata

OpenAI’s Chat Completions API does not stream token usage statistics by default (see the OpenAI API reference for stream options). To recover token counts when streaming, set stream_usage=True as an initialization parameter or on invocation:
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="gpt-5.4-mini",
    base_url="https://YOUR-RESOURCE-NAME.openai.azure.com/openai/v1/",
    api_key="your-azure-api-key",
    stream_usage=True,  
)

Responses API

Azure OpenAI supports the Responses API, which provides stateful conversations, built-in server-side tools (code interpreter, image generation, file search, and remote MCP), and structured reasoning summaries. ChatOpenAI automatically routes to the Responses API when you set the reasoning parameter, or you can opt in explicitly with use_responses_api=True. For details on built-in tools and how to use them, see the Azure OpenAI Responses API docs.
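For example, a minimal sketch of opting in explicitly (endpoint and deployment are placeholders):

```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="your-deployment-name",
    base_url="https://YOUR-RESOURCE-NAME.openai.azure.com/openai/v1/",
    api_key="your-azure-api-key",
    use_responses_api=True,  # route requests through the Responses API
)
```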

Reasoning effort and summary

Azure OpenAI reasoning models (for example, o4-mini, gpt-5) spend extra tokens thinking through a request before producing their final answer. With ChatOpenAI on the v1 API, you can configure how much effort the model spends reasoning and optionally request a summary of its chain of thought.

Reasoning effort

Set reasoning_effort to "low", "medium", or "high". Higher settings let the model spend more tokens on reasoning, which typically improves quality for complex tasks at the cost of latency:
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="gpt-5.4-mini",  # your Azure reasoning model deployment
    base_url="https://YOUR-RESOURCE-NAME.openai.azure.com/openai/v1/",
    api_key=token_provider,
    reasoning_effort="medium",
)

response = llm.invoke("Tell me about the bitter lesson.")
print(response.text)
Reasoning models use tokens for internal reasoning (reasoning_tokens in completion_tokens_details). These tokens aren’t returned in the message content but count toward the output token limit. If you see empty responses, increase max_tokens or leave it unset so the model has room for both reasoning and output.
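To illustrate the accounting (the token counts below are invented; the output_token_details shape follows LangChain's usage_metadata):

```python
# Illustrative usage_metadata from a reasoning-model response.
usage_metadata = {
    "input_tokens": 12,
    "output_tokens": 500,  # includes hidden reasoning tokens
    "total_tokens": 512,
    "output_token_details": {"reasoning": 448},
}

# Tokens that actually appear in the message content: output tokens
# minus the reasoning tokens, which are billed but never returned.
visible_tokens = (
    usage_metadata["output_tokens"]
    - usage_metadata["output_token_details"]["reasoning"]
)
print(visible_tokens)  # 52
```

If `max_tokens` were set near 448 here, the model could exhaust its budget on reasoning and return an empty message, which is why the note above suggests raising or unsetting it.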

Reasoning summary

When using a reasoning model via the Responses API, you can request a summary of the model’s chain of thought by passing a reasoning dict. Setting reasoning automatically routes ChatOpenAI to the Responses API:
from langchain_openai import ChatOpenAI

reasoning = {
    "effort": "high",    # 'low', 'medium', or 'high'
    "summary": "auto",   # 'auto', 'concise', or 'detailed'
}

llm = ChatOpenAI(
    model="gpt-5.4-mini",  # your Azure reasoning model deployment
    base_url="https://YOUR-RESOURCE-NAME.openai.azure.com/openai/v1/",
    api_key=token_provider,
    reasoning=reasoning,
)

response = llm.invoke("What's the optimal strategy to win at poker?")

# Final answer
print(response.text)

# Reasoning summary blocks
for block in response.content_blocks:
    if block["type"] == "reasoning":
        print(block["reasoning"])
Attempting to extract raw reasoning tokens through methods other than the reasoning summary parameter isn’t supported and may violate Azure’s Acceptable Use Policy. Use the summary field to access model reasoning.
Even when enabled, reasoning summaries aren’t guaranteed for every step or request—this is expected behavior.

Specifying model version (legacy API)

This section applies only when using AzureChatOpenAI with traditional API versions. The v1 API does not require api_version parameters.
When using AzureChatOpenAI, Azure OpenAI responses include a model_name property in the response metadata. Unlike responses from OpenAI directly, this property does not contain the specific version of the model, which is set on the deployment in Azure. To distinguish between model versions, pass model_version:
from langchain_openai import AzureChatOpenAI

llm = AzureChatOpenAI(
    azure_deployment="gpt-5.4-mini",  # or your deployment
    api_version="2025-04-01-preview",  # or your api version
    model_version="0301",
)

API reference

For detailed documentation of all features and configuration options, head to the AzureChatOpenAI API reference.