Workaround: route NuGet RestoreTask to transient TaskHost in server or mt modes#13660
Workaround: route NuGet RestoreTask to transient TaskHost in server or mt modes#13660OvesN wants to merge 12 commits into
Conversation
Workaround for static singleton state issues in NuGet RestoreTask (e.g., PluginManager, EnvironmentWrapper) that persist across builds when running in sidecar TaskHost processes. When /mt mode or MSBuild server (MSBUILDUSESERVER=1) is active, RestoreTask is now forced to run in a transient (non-sidecar) TaskHost that terminates after execution, ensuring all static state is cleaned up. Changes: - TaskRouter: Add IsKnownProblematicTask() to identify tasks by full name - AssemblyTaskFactory: Force transient TaskHost for problematic tasks - Tests: Add unit and integration tests for the workaround Fixes dotnet#13315 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…alue of MSBUILDUSESERVER. Add logging for task host spawning details.
9793967 to
a1ce582
Compare
|
/review |
|
✅ Expert Code Review (command) completed successfully! |
There was a problem hiding this comment.
Pull request overview
Implements a workaround to isolate NuGet’s RestoreTask (which relies on process-wide static singletons) by forcing it onto a transient TaskHost when running under MSBuild Server mode or /mt, preventing static state from leaking across builds/invocations.
Changes:
- Add an internal “original server mode” env var (
_MSBUILDORIGINALUSESERVER) to preserve server-mode detection through environment snapshotting. - Route allow-listed “known problematic” tasks (currently
NuGet.Build.Tasks.RestoreTask) to a non-sidecar (transient) TaskHost in/mtor server mode. - Add TaskHost diagnostic logging and integration tests validating routing + per-invocation process isolation.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| src/Framework/Traits.cs | Adds _MSBUILDORIGINALUSESERVER env var name and trait for detecting server-mode launch context. |
| src/Build/BackEnd/Components/Communications/NodeLauncher.cs | Stashes/restores the original MSBUILDUSESERVER value into the new internal env var around child process creation. |
| src/Build/BackEnd/Node/OutOfProcServerNode.cs | Preserves the internal env var across SetEnvironment(...) so server node can still detect original server-mode intent. |
| src/Build/BackEnd/Components/RequestBuilder/TaskRouter.cs | Introduces allow-list logic to identify “known problematic” tasks by full type name. |
| src/Build/Instance/TaskFactories/AssemblyTaskFactory.cs | Forces problematic tasks to run in TaskHost and disables sidecar reuse in /mt or server mode. |
| src/Build/BackEnd/Components/Communications/NodeProviderOutOfProcTaskHost.cs | Extends host acquisition to report host PID / creation status for diagnostics. |
| src/Build/Instance/TaskFactories/TaskHostTask.cs | Logs per-invocation TaskHost details (PID, reuse, etc.) into the build log/binlog. |
| src/Build.UnitTests/BackEnd/TaskRouter_IntegrationTests.cs | Adds integration tests for problematic-task routing and “fresh process per invocation” behavior. |
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Expert Review — 24-Dimension Analysis
Summary
| # | Dimension | Verdict |
|---|---|---|
| 1 | Backward Compatibility | ✅ LGTM (test-only) |
| 2 | ChangeWave Discipline | ✅ LGTM (test-only) |
| 3 | Performance | ✅ LGTM (test-only) |
| 4 | Allocation Awareness | ✅ LGTM (test-only) |
| 5 | Test Coverage | ✅ LGTM — covers MT mode, server mode, fresh-process guarantee, and no-workaround fallback |
| 6 | Error Message Quality | ✅ LGTM (test-only) |
| 7 | Logging Fidelity | ✅ LGTM (test-only) |
| 8 | String Comparison | ✅ LGTM — regex matches ASCII digits only, ShouldContain is ordinal |
| 9 | API Surface | ✅ LGTM (test-only) |
| 10 | Target Authoring | ✅ LGTM (test-only) |
| 11 | Cross-Platform | ✅ LGTM — uses Path.Combine, Process.GetCurrentProcess().Id, standard APIs |
| 12 | Code Simplification | ✅ LGTM — boilerplate duplication is acceptable for test readability |
| 13 | Concurrency | ✅ LGTM — xunit.runner.json enforces parallelizeTestCollections: false / maxParallelThreads: 1 |
| 14 | Naming Precision | ✅ LGTM — names are descriptive and consistent with existing tests |
| 15 | SDK Integration | ✅ LGTM (test-only) |
| 16 | Evaluation Model | ✅ LGTM (test-only) |
| 17 | Correctness | ✅ LGTM — fake RestoreTask FullName matches TaskRouter.IsKnownProblematicTask check; Build() overload is self-contained |
| 18 | Documentation Accuracy | 📝 NIT — see inline comment |
| 19 | Dependency Management | ✅ LGTM (test-only) |
| 20 | Scope Discipline | ✅ LGTM — tests + comment cleanup in the same PR is reasonable |
| 21 | Security | ✅ LGTM (test-only) |
| 22 | Build Infrastructure | ✅ LGTM (test-only) |
| 23 | Binary Log Compatibility | ✅ LGTM (test-only) |
| 24 | Error Handling | ✅ LGTM (test-only) |
Findings: 1 NIT
One nit on the server-mode test's EnableNodeReuse comment — // Load-bearing: see the /mt counterpart. is a fragile cross-reference that could become a dead pointer. Consider duplicating the 1-line explanation inline. Details in the inline comment.
The comment cleanup is well-executed overall: removed XML docs restated what the method name already says, // Arrange/// Act/// Assert markers were noise for these straightforward tests, and the condensed // Load-bearing: comments on the MT test preserve the non-obvious reasoning. The tests themselves are correct and well-structured.
Note
🔒 Integrity filter blocked 2 items
The following items were blocked because they don't meet the GitHub integrity level.
- #13660
pull_request_read: has lower integrity than agent requires. The agent cannot read data with integrity below "approved". - #13660
pull_request_read: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".
To allow these resources, lower min-integrity in your GitHub frontmatter:
tools:
github:
min-integrity: approved # merged | approved | unapproved | noneGenerated by Expert Code Review (command) for issue #13660 · ● 9.7M
…OvesN/msbuild into workaround-restore-issues-in-mt
Fixes #13315
Context
NuGet's RestoreTask holds static singletons (PluginManager, EnvironmentVariableWrapper) that assume
one invocation per process. Two MSBuild modes break that assumption:
builds back-to-back.
In both cases NuGet's static state survives past its intended scope.
This PR routes RestoreTask to a transient TaskHost (nodeReuse=false) when either mode is active, so
the spawned MSBuild.exe exits after Execute() and statics die with it
Changes Made
Why the original attempt didn't work
Original commit.
The server-mode trigger read
Traits.Instance.UseMSBuildServer, which checksMSBUILDUSESERVER. That env var is0/nullin the worker process where tasks run,stripped by:
NodeLauncher.DisableMSBuildServerzeroes it before spawning the Server child(recursion guard).
OutOfProcServerNode.HandleServerNodeBuildCommandoverwrites the server's env fromthe client's snapshot, which doesn't include MSBuild internals.
Net effect: the workaround branch never fired.
What this PR does
(src/Build/BackEnd/Components/Host/IHostInfo.cs) with two impls:
TransientHostInfo (default, registered by BuildComponentFactoryCollection)
and LongLivedServerHostInfo (registered once at startup by
OutOfProcServerNode.Run).
TaskRouter.IsKnownProblematicTask — currently the single entry
NuGet.Build.Tasks.RestoreTask.
is on the allow-list AND (/mt mode OR the registered IHostInfo reports
IsLongLivedHost == true), force useTaskFactory = true and
forceTransientTaskHost = true. The host lookup is null- and
missing-component-safe (returns false for hosts that don't register
IHostInfo).
Diagnostic logging added
Per-invocation TaskHost diagnostics in
TaskHostTask.Execute(low importance, capturedin binlog) — complements the existing
ExecutingTaskInTaskHostmessage. RecordsProcessId,ParentProcessId,NewNodeContext,IsSidecar,NodeReuseEffective.Useful when investigating any TaskHost-routing question.
Testing
Unit tests in
src/Build.UnitTests/BackEnd/TaskRouter_IntegrationTests.cs:Manual end-to-end with
dotnet restore App.csproj /bl:r.binlog:With
DOTNET_CLI_USE_MSBUILD_SERVER=1(server mode) and again with/mt, opened thebinlog and confirmed two
TaskHost details for task "RestoreTask"lines withIsSidecar=False,NodeReuseEffective=False, and differentProcessIdbetweenmultiple dotnet restore invocations.
End-to-end repro — manual two-PAT scenario against the private
DevDiv
vssdkNuGet feed, exercising NuGet's staticEnvironmentWrapper/PluginManagerover a single long-lived MSBuild Server process. Onmainthe second restore reproduces the bug (
401 Unauthorized); with this PRit succeeds. Details and steps in the PR comment below.
Notes