
perf(core): eliminate redundant schema computation in tool infrastructure#37101

Open
Sydney Runkle (sydney-runkle) wants to merge 14 commits into master from perf/tool-schema-refactor

Conversation

@sydney-runkle
Collaborator

This PR is a stack of 7 focused commits targeting redundant schema computation in langchain-core's tool infrastructure. Each commit is independently reviewable. No public API signatures change; all improvements are internal.


Motivation

Every tool invocation in an agent loop triggers several redundant operations: repeated annotation walks over the same class hierarchy, repeated Pydantic model construction for the same schema, and full OpenAI-format schema conversion just to count characters for token estimation. At scale — hundreds of tool calls, dozens of tools — this is measurable overhead. These commits eliminate the redundancy without changing any public interfaces.


Commit-by-commit summary

1. refactor(core): remove dead _get_filtered_args

_get_filtered_args had zero call sites in the codebase. Removed.

2. perf(core): memoize get_all_basemodel_annotations with lru_cache

get_all_basemodel_annotations walks a class MRO on every call and is on every hot path (schema generation, injection detection, input parsing). Added @functools.lru_cache(maxsize=512). The function is pure given (cls, default_to_bound) — both are hashable. The returned dict is now shared; callers must not mutate it (none did).

Signature change: default_to_bound moves from keyword-only to positional (required for lru_cache compatibility below Python 3.12). Internal recursive call sites updated accordingly. This is a public function — the signature change is noted here for review.
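The memoization pattern can be sketched as follows. This is a minimal illustration, not the actual langchain-core implementation: `walk_annotations` is a hypothetical stand-in for `get_all_basemodel_annotations`, with `default_to_bound` positional as described above.

```python
import functools

@functools.lru_cache(maxsize=512)
def walk_annotations(cls: type, default_to_bound: bool = True) -> dict:
    """Collect annotations across a class MRO; pure given (cls, default_to_bound)."""
    merged: dict = {}
    for base in reversed(cls.__mro__):
        # __dict__.get avoids picking up inherited __annotations__ twice.
        merged.update(base.__dict__.get("__annotations__", {}))
    return merged

class Base:
    x: int

class Child(Base):
    y: str

first = walk_annotations(Child)
second = walk_annotations(Child)
assert first is second  # cache hit returns the shared dict; callers must not mutate it
```

The `is`-identity of repeated results is exactly why the "callers must not mutate" contract matters.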

3. perf(core): deduplicate annotation walk in _parse_input

_parse_input called get_all_basemodel_annotations once per branch of an if issubclass(…, BaseModel) / elif issubclass(…, BaseModelV1) block — the same call, duplicated in code. Hoisted above the branch. Also added a short-circuit: if _injected_args_keys is empty, the annotation walk is skipped entirely (tools with no injected args pay nothing).
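The hoist and short-circuit can be sketched like this; `get_annotations`, `parse_input`, and `injected_keys` are illustrative names, not the real API, and the call counter only exists to make the short-circuit observable.

```python
calls = {"count": 0}

def get_annotations(cls: type) -> dict:
    # Stand-in for the expensive annotation walk.
    calls["count"] += 1
    return dict(getattr(cls, "__annotations__", {}))

def parse_input(schema: type, tool_input: dict, injected_keys: frozenset) -> dict:
    # Short-circuit: tools with no injected args skip the walk entirely.
    if not injected_keys:
        return dict(tool_input)
    # Hoisted above the v1/v2 branch: one walk, shared by whichever branch runs.
    annotations = get_annotations(schema)
    valid = {k for k in annotations if k not in injected_keys}
    return {k: v for k, v in tool_input.items() if k in valid}

class Schema:
    a: int
    state: dict

no_injection = parse_input(Schema, {"a": 1}, frozenset())
assert calls["count"] == 0  # no injected args, no walk
filtered = parse_input(Schema, {"a": 1, "state": {}}, frozenset({"state"}))
assert calls["count"] == 1  # one walk, not one per branch
```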

4. perf(core): eliminate per-call annotation walk in _filter_injected_args

_filter_injected_args (called on every run() and arun()) was walking get_all_basemodel_annotations(self.args_schema) on every invocation, even though _injected_args_keys (a cached_property on StructuredTool) already holds the same set. Simplified to:

```python
filtered_keys = set(FILTERED_ARGS) | self._injected_args_keys
return {k: v for k, v in tool_input.items() if k not in filtered_keys}
```

Subclasses that define injected args via args_schema annotations (not via function signature) must override _injected_args_keys. StructuredTool does this automatically. Docstring updated to document this contract.
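A self-contained sketch of the contract, assuming a simplified `Tool` class (the real `cached_property` would walk `args_schema` annotations once rather than take a precomputed set):

```python
import functools

FILTERED_ARGS = ("run_manager", "callbacks")

class Tool:
    def __init__(self, injected: set):
        self._injected = injected

    @functools.cached_property
    def _injected_args_keys(self) -> frozenset:
        # Computed once per instance; real code derives this from args_schema.
        return frozenset(self._injected)

    def filter_injected_args(self, tool_input: dict) -> dict:
        filtered_keys = set(FILTERED_ARGS) | self._injected_args_keys
        return {k: v for k, v in tool_input.items() if k not in filtered_keys}

tool = Tool({"state"})
result = tool.filter_injected_args({"x": 1, "state": {}, "callbacks": None})
assert result == {"x": 1}
```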

5. perf(core): cache tool_call_schema, args, and inferred input schema with invalidation

tool_call_schema and args were uncached @property instances, rebuilding a Pydantic model and running model_json_schema() on every access. get_input_schema called create_schema_from_function on every access when args_schema was None.

All three converted to @functools.cached_property. A __setattr__ override clears the cache when name, description, or args_schema change:

```python
_SCHEMA_INVALIDATING_FIELDS = frozenset({"args_schema", "name", "description"})

def __setattr__(self, name, value):
    if name in self._SCHEMA_INVALIDATING_FIELDS:
        self.__dict__.pop("tool_call_schema", None)
        self.__dict__.pop("args", None)
        self.__dict__.pop("_inferred_input_schema", None)
        self.__dict__.pop("_approximate_schema_chars", None)
    super().__setattr__(name, value)
```

cached_property works on Pydantic v2 models because they have a writable __dict__ — confirmed by the existing _injected_args_keys cached_property on BaseTool.

This commit deserves careful review. The __setattr__ invalidation is the load-bearing mechanism. If a subclass sets name/description/args_schema outside of __init__ and bypasses __setattr__, caches will go stale. This is the correct behavior (mutation should go through __setattr__), but reviewers should verify edge cases.
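The invalidation mechanism can be exercised on a plain class, independent of Pydantic; this is a minimal sketch, not the `BaseTool` implementation:

```python
import functools

class Tool:
    _SCHEMA_INVALIDATING_FIELDS = frozenset({"args_schema", "name", "description"})

    def __init__(self, name: str):
        self.name = name  # also routed through __setattr__

    def __setattr__(self, name, value):
        if name in self._SCHEMA_INVALIDATING_FIELDS:
            self.__dict__.pop("tool_call_schema", None)  # drop the stale cache entry
        super().__setattr__(name, value)

    @functools.cached_property
    def tool_call_schema(self) -> dict:
        return {"title": self.name}

tool = Tool("search")
first = tool.tool_call_schema
assert tool.tool_call_schema is first        # cached on repeat access
tool.name = "lookup"                         # mutation clears the cache
assert tool.tool_call_schema == {"title": "lookup"}
```

The stale-cache risk mentioned above corresponds to writing `tool.__dict__["name"] = ...` directly, which skips `__setattr__` and leaves the old entry in place.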

6. perf(core): cache schema char-count on BaseTool for token estimation

count_tokens_approximately in messages/utils.py was calling convert_to_openai_tool(tool) — a full schema rebuild and OpenAI-format conversion — just to count characters for token estimation. Two changes:

On BaseTool: a new _approximate_schema_chars: int cached property that serializes the neutral tool payload (name + description + raw schema dict) to JSON once and caches the char count. Invalidated by the same __setattr__ hook from commit 5.

On count_tokens_approximately: a new tool_format: str = "openai" parameter. BaseTool instances now use _approximate_schema_chars + offset where offset comes from:

```python
_TOOL_FORMAT_OFFSETS = {
    "openai": 32,    # {"type":"function","function":{...}} envelope
    "anthropic": 0,  # flat form ≈ neutral
}
```

Default is "openai" to preserve existing numeric behavior. Chat models that use Anthropic's wire format can pass tool_format="anthropic".
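The estimation path reduces to a char count plus a per-format constant. A hedged sketch, where `approximate_schema_chars` and `approximate_tokens` are illustrative helpers and `chars_per_token=4.0` is a common heuristic rather than the library's actual constant:

```python
import json

_TOOL_FORMAT_OFFSETS = {"openai": 32, "anthropic": 0}

def approximate_schema_chars(name: str, description: str, schema: dict) -> int:
    # Serialize the neutral payload once; the real code caches this on BaseTool.
    payload = {"name": name, "description": description, "parameters": schema}
    return len(json.dumps(payload))

def approximate_tokens(chars: int, tool_format: str = "openai",
                       chars_per_token: float = 4.0) -> float:
    return (chars + _TOOL_FORMAT_OFFSETS[tool_format]) / chars_per_token

chars = approximate_schema_chars("search", "Find documents.", {"type": "object"})
assert approximate_tokens(chars, "openai") > approximate_tokens(chars, "anthropic")
```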

7. refactor(core): replace deprecated validate_arguments in create_schema_from_function

create_schema_from_function used pydantic.validate_arguments (deprecated in Pydantic v2) and pydantic.v1.validate_arguments as a proxy to build a Pydantic model from a function signature. Replaced with inspect.signature + pydantic.create_model directly.

Key behavioral changes:

  • _SchemaConfig, _function_annotations_are_pydantic_v1, _is_pydantic_annotation removed — no longer needed
  • validate_arguments and validate_arguments_v1 imports removed
  • Functions annotated with pydantic.v1.BaseModel parameter types still work, but v1 types are treated as Any in the generated schema. Dict-to-v1-model coercion is no longer supported — callers must pass v1 model instances directly. This is a known behavior change noted in tests.
  • Mixed v1/v2 annotated functions still raise NotImplementedError (detection preserved)
  • *args/**kwargs parameters still produce args/kwargs fields in the schema to preserve is_single_input behavior
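The signature walk behind the replacement can be sketched in stdlib terms; the real code feeds a fields dict like this into `pydantic.create_model`, and `schema_fields` is a hypothetical helper shown only to illustrate the `inspect.signature` half:

```python
import inspect
from typing import Any

def schema_fields(func) -> dict:
    """Map each parameter to a (annotation, default) pair, create_model-style."""
    fields = {}
    for name, param in inspect.signature(func).parameters.items():
        annotation = (param.annotation
                      if param.annotation is not inspect.Parameter.empty else Any)
        # Ellipsis marks a required field in pydantic.create_model.
        default = param.default if param.default is not inspect.Parameter.empty else ...
        fields[name] = (annotation, default)
    return fields

def search(query: str, limit: int = 10) -> list:
    ...

assert schema_fields(search) == {"query": (str, ...), "limit": (int, 10)}
```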

_convert_any_typed_dicts_to_pydantic in function_calling.py was intentionally not changed in this PR — switching that from v1 to v2 create_model changes the JSON schema output format (v2 adds title to nested models) in ways that affect OpenAI schema compatibility. See the future work section below.

This commit also deserves careful review — it is the most behavior-sensitive change in the stack.


What was explicitly not done

  • User-facing pydantic.v1 acceptance: BaseModelV1 isinstance checks in _parse_input, tool_call_schema, and output parsers are unchanged. Tools with pydantic.v1.BaseModel as args_schema continue to work.
  • _convert_any_typed_dicts_to_pydantic: Kept on the v1 path to avoid schema format drift. See future work.
  • ToolSchema abstraction: Deferred — see future work.

Future work

Two architectural improvements identified during this work are deferred to a follow-on branch:

ToolSchema dataclass — inspired by pydantic-ai's FunctionSchema, a central object that owns both schema generation and validation:

```python
@dataclass
class ToolSchema:
    name: str
    description: str
    validator: TypeAdapter   # validates tool call inputs
    json_schema: dict        # pre-computed, sent to LLMs
```

This would replace the split between args_schema (model class), tool_call_schema (schema for LLMs), and _parse_input (ad-hoc validation). Partners could consume ToolSchema directly.

TypeAdapter-based TypedDict conversion: _convert_any_typed_dicts_to_pydantic creates a throwaway pydantic.v1 model from a TypedDict just to call .schema(). The correct v2 approach is TypeAdapter(typed_dict).json_schema(schema_generator=GenerateToolJsonSchema), where GenerateToolJsonSchema strips title fields from properties (the same pattern pydantic-ai uses). This requires updating expected schemas in tests to match the v2 format.

These two items belong together: ToolSchema would own the TypeAdapter-based schema generation, making _convert_any_typed_dicts_to_pydantic a natural removal target.


AI-agent assisted contribution.

@github-actions github-actions Bot added core `langchain-core` package issues & PRs internal performance size: L 500-999 LOC labels Apr 30, 2026
@codspeed-hq

codspeed-hq Bot commented Apr 30, 2026

Merging this PR will not alter performance

✅ 13 untouched benchmarks
⏩ 2 skipped benchmarks [1]


Comparing perf/tool-schema-refactor (dc7a009) with master (cc5a537)

Open in CodSpeed

Footnotes

  1. 2 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

…dDict conversion with `TypeAdapter` (#37103)

Builds on #37101.

---

Two changes in one commit, both motivated by the same principle: a
single, clean owner for everything schema-related on a tool.

## `ToolSchema` — the root cache

Previously `BaseTool` had three independent `cached_property` slots
(`tool_call_schema`, `args`, `_approximate_schema_chars`) that all
computed overlapping data and each needed individual invalidation. This
PR replaces them with a single `ToolSchema` dataclass and one
`tool_schema` cached property that is the sole root:

```python
@dataclass
class ToolSchema:
    name: str
    description: str
    validator: TypeAdapter      # validates tool call inputs
    json_schema: dict           # sent to LLMs
    pydantic_schema: Any        # model class or dict (backward compat)
    args: dict                  # properties from json_schema
    approximate_chars: int      # precomputed for token estimation
```

`BaseTool.tool_call_schema`, `BaseTool.args`, and
`BaseTool._approximate_schema_chars` are now plain `@property` delegates
to `tool_schema`. `__setattr__` only needs to pop one key on mutation
instead of four. The `is`-identity caching tests still pass because all
delegates read from the same cached `ToolSchema` object.
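The single-root delegation pattern can be sketched on a plain class. This is a simplified illustration, not the actual `BaseTool` code: the `ToolSchema` fields are trimmed, and the schema contents are placeholders.

```python
import functools
from dataclasses import dataclass

@dataclass(frozen=True)
class ToolSchema:
    name: str
    args: dict
    approximate_chars: int

class Tool:
    def __init__(self, name: str):
        self._name = name

    @functools.cached_property
    def tool_schema(self) -> ToolSchema:
        # The single root cache; everything below is derived from it once.
        args = {"query": {"type": "string"}}
        return ToolSchema(self._name, args, len(self._name) + len(str(args)))

    @property
    def args(self) -> dict:
        return self.tool_schema.args

    @property
    def approximate_chars(self) -> int:
        return self.tool_schema.approximate_chars

tool = Tool("search")
assert tool.args is tool.tool_schema.args  # delegates share the one cached root
```

With one root, `__setattr__` invalidation only has to pop `tool_schema`, which is the simplification described above.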

`ToolSchema` is exported from `langchain_core.tools` and can be used
directly by integrations that want to consume both the validator and the
schema without going through `BaseTool`.

## `TypeAdapter`-based TypedDict conversion

`_convert_any_typed_dicts_to_pydantic` was a ~70-line recursive function
that converted TypedDicts to throwaway pydantic v1 model classes just to
call `.schema()`. Replaced with:

```python
adapter = TypeAdapter(typed_dict)
schema = adapter.json_schema()
```

Pydantic v2's `TypeAdapter` handles everything the old code did — nested
TypedDicts, generic containers, `Annotated` metadata — and also
correctly handles `NotRequired` and `Required` annotations, which the v1
path did not. A new test `test__convert_typed_dict_not_required`
verifies this:

```python
class Tool(TypedDict):
    required_field: str
    optional_field: NotRequired[int]

result = _convert_typed_dict_to_openai_function(Tool)
assert "required_field" in result["parameters"]["required"]
assert "optional_field" not in result["parameters"]["required"]
```

Field descriptions from Google-style docstrings and `Annotated[T, ...,
"description"]` metadata are preserved by post-processing the schema
after generation.

The old `test__convert_typed_dict_to_openai_function_fail` test expected
a `TypeError` for `MutableSet` because pydantic v1 didn't support it.
pydantic v2 does; the test is updated to verify successful conversion
instead.

## What stays unchanged

- All public `BaseTool` API signatures — `tool_call_schema`, `args`,
`get_input_schema()` all have the same signatures and return types as
before.
- `pydantic.v1` acceptance for `args_schema` — tools with v1 model
schemas continue to work.

> AI-agent assisted contribution.

---------

Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
@github-actions github-actions Bot added size: XL 1000+ LOC and removed size: L 500-999 LOC labels May 1, 2026
@@ -0,0 +1,54 @@
"""Schema dataclass for LangChain tool definitions."""
Collaborator


I think this is duplicated with the other PR

https://github.com/langchain-ai/langchain/pull/37103/changes#diff-34384c4646d4c37adbff1d96cb643065507ab1e2d91634537512e2905e081988

I left some comments on the actual contents of the dataclass

from pydantic import TypeAdapter


@dataclass
Collaborator


(slots=True, frozen=True?)
