perf(core): eliminate redundant schema computation in tool infrastructure#37101
Open
Sydney Runkle (sydney-runkle) wants to merge 14 commits into
Conversation
…dDict conversion with `TypeAdapter` (#37103)

Builds on #37101.

---

Two changes in one commit, both motivated by the same principle: a single, clean owner for everything schema-related on a tool.

## `ToolSchema` — the root cache

Previously `BaseTool` had three independent `cached_property` slots (`tool_call_schema`, `args`, `_approximate_schema_chars`) that all computed overlapping data and each needed individual invalidation. This PR replaces them with a single `ToolSchema` dataclass and one `tool_schema` cached property that is the sole root:

```python
@dataclass
class ToolSchema:
    name: str
    description: str
    validator: TypeAdapter  # validates tool call inputs
    json_schema: dict       # sent to LLMs
    pydantic_schema: Any    # model class or dict (backward compat)
    args: dict              # properties from json_schema
    approximate_chars: int  # precomputed for token estimation
```

`BaseTool.tool_call_schema`, `BaseTool.args`, and `BaseTool._approximate_schema_chars` are now plain `@property` delegates to `tool_schema`. `__setattr__` only needs to pop one key on mutation instead of four. The `is`-identity caching tests still pass because all delegates read from the same cached `ToolSchema` object.

`ToolSchema` is exported from `langchain_core.tools` and can be used directly by integrations that want to consume both the validator and the schema without going through `BaseTool`.

## `TypeAdapter`-based TypedDict conversion

`_convert_any_typed_dicts_to_pydantic` was a ~70-line recursive function that converted TypedDicts to throwaway pydantic v1 model classes just to call `.schema()`. Replaced with:

```python
adapter = TypeAdapter(typed_dict)
schema = adapter.json_schema()
```

Pydantic v2's `TypeAdapter` handles everything the old code did — nested TypedDicts, generic containers, `Annotated` metadata — and also correctly handles `NotRequired` and `Required` annotations, which the v1 path did not.
A new test `test__convert_typed_dict_not_required` verifies this:

```python
class Tool(TypedDict):
    required_field: str
    optional_field: NotRequired[int]

result = _convert_typed_dict_to_openai_function(Tool)
assert "required_field" in result["parameters"]["required"]
assert "optional_field" not in result["parameters"]["required"]
```

Field descriptions from Google-style docstrings and `Annotated[T, ..., "description"]` metadata are preserved by post-processing the schema after generation.

The old `test__convert_typed_dict_to_openai_function_fail` test expected a `TypeError` for `MutableSet` because pydantic v1 didn't support it. pydantic v2 does; the test is updated to verify successful conversion instead.

## What stays unchanged

- All public `BaseTool` API signatures — `tool_call_schema`, `args`, and `get_input_schema()` all have the same signatures and return types as before.
- `pydantic.v1` acceptance for `args_schema` — tools with v1 model schemas continue to work.

> AI-agent assisted contribution.

Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
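The `NotRequired` behavior can also be confirmed without any LangChain internals, directly against pydantic v2's `TypeAdapter` (the `ToolArgs` name below is illustrative):

```python
from typing_extensions import NotRequired, TypedDict

from pydantic import TypeAdapter

class ToolArgs(TypedDict):
    required_field: str
    optional_field: NotRequired[int]

# TypeAdapter reads NotRequired directly; no throwaway model class is needed.
schema = TypeAdapter(ToolArgs).json_schema()
assert "required_field" in schema["required"]
assert "optional_field" not in schema.get("required", [])
```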
```
@@ -0,0 +1,54 @@
"""Schema dataclass for LangChain tool definitions."""
```
Collaborator
I think this is duplicated with the other PR
I left some comments on the actual contents of the dataclass
```
from pydantic import TypeAdapter

@dataclass
```
Collaborator
(slots=True, frozen=True?)
This PR is a stack of 7 focused commits targeting redundant schema computation in langchain-core's tool infrastructure. Each commit is independently reviewable. No public API signatures change; all improvements are internal.

## Motivation
Every tool invocation in an agent loop triggers several redundant operations: repeated annotation walks over the same class hierarchy, repeated Pydantic model construction for the same schema, and full OpenAI-format schema conversion just to count characters for token estimation. At scale — hundreds of tool calls, dozens of tools — this is measurable overhead. These commits eliminate the redundancy without changing any public interfaces.
## Commit-by-commit summary
1. **refactor(core): remove dead `_get_filtered_args`**

   `_get_filtered_args` had zero call sites in the codebase. Removed.

2. **perf(core): memoize `get_all_basemodel_annotations` with `lru_cache`**

   `get_all_basemodel_annotations` walks a class MRO on every call and is on every hot path (schema generation, injection detection, input parsing). Added `@functools.lru_cache(maxsize=512)`. The function is pure given `(cls, default_to_bound)` — both are hashable. The returned dict is now shared; callers must not mutate it (none did).

   Signature change: `default_to_bound` moves from keyword-only to positional (required for `lru_cache` compatibility below Python 3.12). Internal recursive call sites updated accordingly. This is a public function — the signature change is noted here for review.
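The memoization pattern itself is easy to demonstrate with a stdlib-only sketch (the function below is an illustrative stand-in, not the real `get_all_basemodel_annotations`):

```python
import functools

@functools.lru_cache(maxsize=512)
def all_annotations(cls: type) -> dict:
    """Walk the MRO once and merge annotations; the result is cached per class."""
    merged: dict = {}
    for base in reversed(cls.__mro__):
        merged.update(getattr(base, "__annotations__", {}))
    return merged

class A:
    x: int

class B(A):
    y: str

first = all_annotations(B)
second = all_annotations(B)
# Cached: the very same dict object is returned, so callers must not mutate it.
assert first is second
assert first == {"x": int, "y": str}
```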
3. **perf(core): deduplicate annotation walk in `_parse_input`**

   `_parse_input` called `get_all_basemodel_annotations` once per branch of an `if issubclass(…, BaseModel) / elif issubclass(…, BaseModelV1)` block — the same call, duplicated in code. Hoisted above the branch. Also added a short-circuit: if `_injected_args_keys` is empty, the annotation walk is skipped entirely (tools with no injected args pay nothing).
4. **perf(core): eliminate per-call annotation walk in `_filter_injected_args`**

   `_filter_injected_args` (called on every `run()` and `arun()`) was walking `get_all_basemodel_annotations(self.args_schema)` on every invocation, even though `_injected_args_keys` (a `cached_property` on `StructuredTool`) already holds the same set. Simplified to filter against `self._injected_args_keys` directly.

   Subclasses that define injected args via `args_schema` annotations (not via function signature) must override `_injected_args_keys`. `StructuredTool` does this automatically. Docstring updated to document this contract.
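The cached-set approach can be sketched in plain Python (class and method names are simplified stand-ins, not the actual `BaseTool` code):

```python
from functools import cached_property

class StructuredToolSketch:
    """Illustrative: filter injected args against a precomputed cached set."""

    def __init__(self, injected: set):
        self._injected = injected

    @cached_property
    def _injected_args_keys(self) -> frozenset:
        # Computed once per instance instead of re-walking annotations per call.
        return frozenset(self._injected)

    def _filter_injected_args(self, tool_input: dict) -> dict:
        keys = self._injected_args_keys
        if not keys:  # tools with no injected args pay nothing
            return tool_input
        return {k: v for k, v in tool_input.items() if k not in keys}

tool = StructuredToolSketch({"state"})
assert tool._filter_injected_args({"x": 1, "state": "s"}) == {"x": 1}
```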
5. **perf(core): cache `tool_call_schema`, `args`, and inferred input schema with invalidation**

   `tool_call_schema` and `args` were uncached `@property` instances, rebuilding a Pydantic model and running `model_json_schema()` on every access. `get_input_schema` called `create_schema_from_function` on every access when `args_schema` was `None`.

   All three are converted to `@functools.cached_property`. A `__setattr__` override clears the cache when `name`, `description`, or `args_schema` change. `cached_property` works on Pydantic v2 models because they have a writable `__dict__` — confirmed by the existing `_injected_args_keys` `cached_property` on `BaseTool`.

   This commit deserves careful review. The `__setattr__` invalidation is the load-bearing mechanism. If a subclass sets `name`/`description`/`args_schema` outside of `__init__` and bypasses `__setattr__`, caches will go stale. This is the correct behavior (mutation should go through `__setattr__`), but reviewers should verify edge cases.
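The invalidation mechanism can be sketched with a plain class (a simplified stand-in; the real class is a Pydantic v2 model and caches several properties):

```python
from functools import cached_property

class CachedSchemaTool:
    _CACHED = ("tool_call_schema",)

    def __init__(self, name: str):
        self.name = name  # routed through __setattr__ like any other assignment

    @cached_property
    def tool_call_schema(self) -> dict:
        return {"title": self.name}  # stand-in for the real schema build

    def __setattr__(self, key, value):
        if key in ("name", "description", "args_schema"):
            for cached in self._CACHED:
                # cached_property stores its value in the instance __dict__;
                # popping it forces recomputation on next access.
                self.__dict__.pop(cached, None)
        super().__setattr__(key, value)

tool = CachedSchemaTool("add")
assert tool.tool_call_schema == {"title": "add"}
tool.name = "multiply"  # mutation invalidates the cache
assert tool.tool_call_schema == {"title": "multiply"}
```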
6. **perf(core): cache schema char-count on `BaseTool` for token estimation**

   `count_tokens_approximately` in `messages/utils.py` was calling `convert_to_openai_tool(tool)` — a full schema rebuild and OpenAI-format conversion — just to count characters for token estimation. Two changes:

   - On `BaseTool`: a new `_approximate_schema_chars: int` cached property that serializes the neutral tool payload (name + description + raw schema dict) to JSON once and caches the char count. Invalidated by the same `__setattr__` hook from commit 5.
   - On `count_tokens_approximately`: a new `tool_format: str = "openai"` parameter. `BaseTool` instances now use `_approximate_schema_chars + offset`, where `offset` depends on the requested `tool_format`.

   Default is `"openai"` to preserve existing numeric behavior. Chat models that use Anthropic's wire format can pass `tool_format="anthropic"`.
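The char-count idea is simple to sketch; the payload keys below are an assumption about the "neutral" serialization, not the exact production code:

```python
import json

def approximate_schema_chars(name: str, description: str, schema: dict) -> int:
    """Serialize the format-neutral tool payload once and count characters."""
    payload = {"name": name, "description": description, "parameters": schema}
    return len(json.dumps(payload))

chars = approximate_schema_chars(
    "add",
    "Add two integers.",
    {"type": "object", "properties": {"a": {"type": "integer"}}},
)
# A per-format offset (e.g. extra wrapper keys in a given wire format)
# would be added on top of this cached base count.
assert chars > 0
```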
7. **refactor(core): replace deprecated `validate_arguments` in `create_schema_from_function`**

   `create_schema_from_function` used `pydantic.validate_arguments` (deprecated in Pydantic v2) and `pydantic.v1.validate_arguments` as a proxy to build a Pydantic model from a function signature. Replaced with `inspect.signature` + `pydantic.create_model` directly.

   Key behavioral changes:

   - `_SchemaConfig`, `_function_annotations_are_pydantic_v1`, and `_is_pydantic_annotation` are removed — no longer needed.
   - The `validate_arguments` and `validate_arguments_v1` imports are removed.
   - `pydantic.v1.BaseModel` parameter types still work, but v1 types are treated as `Any` in the generated schema. Dict-to-v1-model coercion is no longer supported — callers must pass v1 model instances directly. This is a known behavior change noted in tests.
   - `NotImplementedError` detection is preserved.
   - `*args`/`**kwargs` parameters still produce `args`/`kwargs` fields in the schema to preserve `is_single_input` behavior.
   - `_convert_any_typed_dicts_to_pydantic` in `function_calling.py` was intentionally not changed in this PR — switching it from v1 to v2 `create_model` changes the JSON schema output format (v2 adds `title` to nested models) in ways that affect OpenAI schema compatibility. See the future work section below.

   This commit also deserves careful review — it is the most behavior-sensitive change in the stack.
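The `inspect.signature` + `create_model` approach can be sketched as follows; this is a simplified illustration (the helper name and the fallback-to-`str` choice are assumptions), not the actual `create_schema_from_function` implementation:

```python
import inspect

from pydantic import create_model

def schema_from_function(func):
    """Build a Pydantic model whose fields mirror func's parameters (simplified)."""
    fields = {}
    for name, param in inspect.signature(func).parameters.items():
        annotation = (
            param.annotation
            if param.annotation is not inspect.Parameter.empty
            else str  # assumed fallback for unannotated params
        )
        default = ... if param.default is inspect.Parameter.empty else param.default
        fields[name] = (annotation, default)
    return create_model(func.__name__, **fields)

def multiply(a: int, b: int = 2) -> int:
    return a * b

Model = schema_from_function(multiply)
schema = Model.model_json_schema()
assert schema["required"] == ["a"]  # b has a default, so it is optional
```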
## What was explicitly not done

- `pydantic.v1` acceptance: `BaseModelV1` isinstance checks in `_parse_input`, `tool_call_schema`, and the output parsers are unchanged. Tools with `pydantic.v1.BaseModel` as `args_schema` continue to work.
- `_convert_any_typed_dicts_to_pydantic`: kept on the v1 path to avoid schema format drift. See future work.
- `ToolSchema` abstraction: deferred — see future work.

## Future work
Two architectural improvements identified during this work are deferred to a follow-on branch:
- `ToolSchema` dataclass — inspired by pydantic-ai's `FunctionSchema`, a central object that owns both schema generation and validation. This would replace the split between `args_schema` (model class), `tool_call_schema` (schema for LLMs), and `_parse_input` (ad-hoc validation). Partners could consume `ToolSchema` directly.
- `TypeAdapter`-based TypedDict conversion — `_convert_any_typed_dicts_to_pydantic` creates a throwaway `pydantic.v1` model from a TypedDict just to call `.schema()`. The correct v2 approach is `TypeAdapter(typed_dict).json_schema(schema_generator=GenerateToolJsonSchema)`, where `GenerateToolJsonSchema` strips `title` fields from properties (the same pattern pydantic-ai uses). This requires updating expected schemas in tests to match the v2 format.

These two items belong together: `ToolSchema` would own the `TypeAdapter`-based schema generation, making `_convert_any_typed_dicts_to_pydantic` a natural removal target.
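A hedged sketch of the title-stripping generator mentioned above (the `GenerateToolJsonSchema` name follows the text; the `generate` override shown strips only top-level property titles and is one plausible implementation, not pydantic-ai's exact code):

```python
from typing_extensions import TypedDict

from pydantic import TypeAdapter
from pydantic.json_schema import GenerateJsonSchema

class GenerateToolJsonSchema(GenerateJsonSchema):
    """Drop the per-property `title` fields pydantic v2 adds by default."""

    def generate(self, schema, mode="validation"):
        json_schema = super().generate(schema, mode=mode)
        for prop in json_schema.get("properties", {}).values():
            if isinstance(prop, dict):
                prop.pop("title", None)  # top-level properties only, for brevity
        return json_schema

class Point(TypedDict):
    x: int
    y: int

schema = TypeAdapter(Point).json_schema(schema_generator=GenerateToolJsonSchema)
assert schema["properties"]["x"] == {"type": "integer"}
```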