Linus Torvalds
@torvalds
Joined on 03 September 2011
GitHub Stats
247294
Followers
9
Repositories
1
Organizations
1
Gists
85
Pull Requests
7
Issues
3074
Commits
0
Sponsors
1
Contributed To
207023
Star Earned
Most Used Languages
98.21%
C
0.70%
Assembly
0.39%
Shell
0.25%
Python
0.20%
Makefile
0.12%
Rust
0.09%
Perl
0.01%
Roff
Popular Projects
linux
Linux kernel source tree
C
202413
57784
0
1244
uemacs
Random version of microemacs with my private modificatons
C
1571
268
22
24
test-tlb
Stupid memory latency and TLB tester
C
817
211
10
13
GuitarPedal
No description
Unknown
512
15
3
2
pesconvert
Brother PES file converter
C
448
68
3
6
1590A
Random odd guitar pedal design in kicad
OpenSCAD
427
19
4
9
Top Contributions
Top contributions made by the user in the last year.
Charts
Follow Up
Activity Graph
Contributions Calendar
Contributions made by the user in the last 365 days.
Recent Activity
9/19/2025, 8:55:04 PM
- ASoC: rt712: avoid skipping the blind write Some devices might not use the DMIC function of the RT712VB. Therefore, this patch avoids skipping the blind write with RT712VB. Signed-off-by: Shuming Fan <shumingf@realtek.com> Link: https://patch.msgid.link/20250901085757.1287945-1-shumingf@realtek.com Signed-off-by: Mark Brown <broonie@kernel.org>
- ASoC: wm8940: Correct PLL rate rounding Using a single value of 22500000 for both 48000Hz and 44100Hz audio will sometimes result in returning wrong dividers due to rounding. Update the code to use the actual value for both. Fixes: 294833fc9eb4 ("ASoC: wm8940: Rewrite code to set proper clocks") Reported-by: Ankur Tyagi <ankur.tyagi85@gmail.com> Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com> Tested-by: Ankur Tyagi <ankur.tyagi85@gmail.com> Link: https://patch.msgid.link/20250821082639.1301453-2-ckeepax@opensource.cirrus.com Signed-off-by: Mark Brown <broonie@kernel.org>
- ASoC: wm8940: Correct typo in control name Fixes: 0b5e92c5e020 ("ASoC WM8940 Driver") Reported-by: Ankur Tyagi <ankur.tyagi85@gmail.com> Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com> Tested-by: Ankur Tyagi <ankur.tyagi85@gmail.com> Link: https://patch.msgid.link/20250821082639.1301453-3-ckeepax@opensource.cirrus.com Signed-off-by: Mark Brown <broonie@kernel.org>
- ASoC: wm8974: Correct PLL rate rounding Using a single value of 22500000 for both 48000Hz and 44100Hz audio will sometimes result in returning wrong dividers due to rounding. Update the code to use the actual value for both. Fixes: 51b2bb3f2568 ("ASoC: wm8974: configure pll and mclk divider automatically") Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com> Link: https://patch.msgid.link/20250821082639.1301453-4-ckeepax@opensource.cirrus.com Signed-off-by: Mark Brown <broonie@kernel.org>
- ASoC: codec: sma1307: Fix memory corruption in sma1307_setting_loaded() The sma1307->set.header_size is how many integers are in the header (there are 8 of them) but instead of allocating space of 8 integers we allocate 8 bytes. This leads to memory corruption when we copy data it on the next line: memcpy(sma1307->set.header, data, sma1307->set.header_size * sizeof(int)); Also since we're immediately copying over the memory in ->set.header, there is no need to zero it in the allocator. Use devm_kmalloc_array() to allocate the memory instead. Fixes: 576c57e6b4c1 ("ASoC: sma1307: Add driver for Iron Device SMA1307") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Link: https://patch.msgid.link/aLGjvjpueVstekXP@stanley.mountain Signed-off-by: Mark Brown <broonie@kernel.org>
- ASoC: amd: acp: Adjust pdm gain value Set pdm gain value by setting PDM_MISC_CTRL_MASK value. To avoid low pdm gain value. Signed-off-by: Venkata Prasad Potturu <venkataprasad.potturu@amd.com> Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Link: https://patch.msgid.link/20250821054606.1279178-1-venkataprasad.potturu@amd.com Signed-off-by: Mark Brown <broonie@kernel.org>
- ASoC: SDCA: Add quirk for incorrect function types for 3 systems Certain systems have CS42L43 DisCo that claims to conform to version 0.6.28 but uses the function types from the 1.0 spec. Add a quirk as a workaround. Closes: https://github.com/thesofproject/linux/issues/5515 Cc: stable@vger.kernel.org Signed-off-by: Maciej Strozek <mstrozek@opensource.cirrus.com> Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.dev> Link: https://patch.msgid.link/20250901151518.3197941-1-mstrozek@opensource.cirrus.com Signed-off-by: Mark Brown <broonie@kernel.org>
- ASoC: SOF: imx: Fix devm_ioremap_resource check devm_ioremap_resource does not return NULL on error but an error pointer so we need to use IS_ERR to check the return code. While at it also pass the error code to dev_err_probe to improve logging. Fixes: bc163baef570 ("ASoC: Use of_reserved_mem_region_to_resource() for "memory-region"") Signed-off-by: Daniel Baluta <daniel.baluta@nxp.com> Link: https://patch.msgid.link/20250902102101.378809-1-daniel.baluta@nxp.com Signed-off-by: Mark Brown <broonie@kernel.org>
- ASoC: SOF: Intel: hda-stream: Fix incorrect variable used in error message The dev_err message is reporting an error about capture streams however it is using the incorrect variable num_playback instead of num_capture. Fix this by using the correct variable num_capture. Fixes: a1d1e266b445 ("ASoC: SOF: Intel: Add Intel specific HDA stream operations") Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Acked-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com> Link: https://patch.msgid.link/20250902120639.2626861-1-colin.i.king@gmail.com Signed-off-by: Mark Brown <broonie@kernel.org>
- Minor bug fixes for some older Wolfson devices Merge series from Charles Keepax <ckeepax@opensource.cirrus.com>: Minor bug fixes for a couple of older devices reported by some users. Mostly this centers around the automatic PLL configuration getting the wrong values due to rounding.
- ALSA: docs: Remove 3rd person singular s in *to indicate* Fixes: 78811dd56def ("ALSA: docs: Add documents for recently changes in snd-usb-audio") Signed-off-by: Paul Menzel <pmenzel@molgen.mpg.de> Link: https://patch.msgid.link/20250903100842.267194-1-pmenzel@molgen.mpg.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
- ASoC: amd: amd_sdw: Add quirks for some new Dell laptops Add a quirk to include the codec amplifier function for Dell SKU's listed in quirk table. Note: In these SKU's, the RT722 codec amplifier is excluded, and an external amplifier is used instead. Signed-off-by: Syed Saba Kareem <syed.sabakareem@amd.com> Message-ID: <20250903171817.2549507-1-syed.sabakareem@amd.com> Signed-off-by: Mark Brown <broonie@kernel.org>
- ASoC: qcom: q6apm-lpass-dais: Fix NULL pointer dereference if source graph failed If earlier opening of source graph fails (e.g. ADSP rejects due to incorrect audioreach topology), the graph is closed and "dai_data->graph[dai->id]" is assigned NULL. Preparing the DAI for sink graph continues though and next call to q6apm_lpass_dai_prepare() receives dai_data->graph[dai->id]=NULL leading to NULL pointer exception: qcom-apm gprsvc:service:2:1: Error (1) Processing 0x01001002 cmd qcom-apm gprsvc:service:2:1: DSP returned error[1001002] 1 q6apm-lpass-dais 30000000.remoteproc:glink-edge:gpr:service@1:bedais: fail to start APM port 78 q6apm-lpass-dais 30000000.remoteproc:glink-edge:gpr:service@1:bedais: ASoC: error at snd_soc_pcm_dai_prepare on TX_CODEC_DMA_TX_3: -22 Unable to handle kernel NULL pointer dereference at virtual address 00000000000000a8 ... Call trace: q6apm_graph_media_format_pcm+0x48/0x120 (P) q6apm_lpass_dai_prepare+0x110/0x1b4 snd_soc_pcm_dai_prepare+0x74/0x108 __soc_pcm_prepare+0x44/0x160 dpcm_be_dai_prepare+0x124/0x1c0 Fixes: 30ad723b93ad ("ASoC: qdsp6: audioreach: add q6apm lpass dai support") Cc: stable@vger.kernel.org Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Reviewed-by: Srinivas Kandagatla <srinivas.kandagatla@oss.qualcomm.com> Message-ID: <20250904101849.121503-2-krzysztof.kozlowski@linaro.org> Signed-off-by: Mark Brown <broonie@kernel.org>
- ASoC: SDCA: Fix return value in sdca_regmap_mbq_size() The MBQ size function returns an integer representing the size of a Control. Currently if the Control is not found the function will return false which makes little sense. Correct this typo to return -EINVAL. Fixes: e3f7caf74b79 ("ASoC: SDCA: Add generic regmap SDCA helpers") Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com> Message-ID: <20250820163717.1095846-2-ckeepax@opensource.cirrus.com> Signed-off-by: Mark Brown <broonie@kernel.org>
- ASoC: SDCA: Fix return value in detected_mode_handler() The detected mode IRQ handler should return an irqreturn_t not a regular error code. Correct the return value in detected_mode_handler(). Fixes: b9ab3b618241 ("ASoC: SDCA: Add some initial IRQ handlers") Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com> Message-ID: <20250820163717.1095846-3-ckeepax@opensource.cirrus.com> Signed-off-by: Mark Brown <broonie@kernel.org>
- ASoC: SDCA: Reorder members of hide struct to remove holes Remove some padding holes in the sdca_entity_hide struct by reordering the members. Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com> Message-ID: <20250820163717.1095846-4-ckeepax@opensource.cirrus.com> Signed-off-by: Mark Brown <broonie@kernel.org>
- ASoC: codecs: lpass-rx-macro: Fix playback quality distortion Commit bb4a0f497bc1 ("ASoC: codecs: lpass: Drop unused AIF_INVALID first DAI identifier") removed first entry in enum with DAI identifiers, because it looked unused. Turns out that there is a relation between DAI ID and "RX_MACRO RX0 MUX"-like kcontrols which use "rx_macro_mux_text" array. That "rx_macro_mux_text" array used first three entries of DAI IDs enum, with value '0' being invalid. The value passed tp "RX_MACRO RX0 MUX"-like kcontrols was used as DAI ID and set to configure active channel count and mask, which are arrays indexed by DAI ID. After removal of first AIF_INVALID DAI identifier, this kcontrol was updating wrong entries in active channel count and mask arrays which was visible in reduced quality (distortions) during headset playback on the Qualcomm SM8750 MTP8750 board. It seems it also fixes recording silence (instead of actual sound) via headset, even though that's different macro codec. Reported-by: Alexey Klimov <alexey.klimov@linaro.org> Fixes: bb4a0f497bc1 ("ASoC: codecs: lpass: Drop unused AIF_INVALID first DAI identifier") Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Reviewed-by: Srinivas Kandagatla <srinivas.kandagatla@oss.qualcomm.com> Tested-by: Srinivas Kandagatla <srinivas.kandagatla@oss.qualcomm.com> Message-ID: <20250901074403.137263-2-krzysztof.kozlowski@linaro.org> Signed-off-by: Mark Brown <broonie@kernel.org>
- ASoC: codecs: lpass-wsa-macro: Fix speaker quality distortion Commit bb4a0f497bc1 ("ASoC: codecs: lpass: Drop unused AIF_INVALID first DAI identifier") removed first entry in enum with DAI identifiers, because it looked unused. Turns out that there is a relation between DAI ID and "WSA RX0 Mux"-like kcontrols (which use "rx_mux_text" array). That "rx_mux_text" array used first three entries of DAI IDs enum, with value '0' being invalid. The value passed tp "WSA RX0 Mux"-like kcontrols was used as DAI ID and set to configure active channel count and mask, which are arrays indexed by DAI ID. After removal of first AIF_INVALID DAI identifier, this kcontrol was updating wrong entries in active channel count and mask arrays which was visible in reduced quality (distortions) during speaker playback on several boards like Lenovo T14s laptop and Qualcomm SM8550-based boards. Reported-by: Alexey Klimov <alexey.klimov@linaro.org> Fixes: bb4a0f497bc1 ("ASoC: codecs: lpass: Drop unused AIF_INVALID first DAI identifier") Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Reviewed-by: Srinivas Kandagatla <srinivas.kandagatla@oss.qualcomm.com> Tested-by: Srinivas Kandagatla <srinivas.kandagatla@oss.qualcomm.com> Message-ID: <20250831151401.30897-2-krzysztof.kozlowski@linaro.org> Signed-off-by: Mark Brown <broonie@kernel.org>
- More minor SDCA bug fixes Merge series from Charles Keepax <ckeepax@opensource.cirrus.com>: Just some minor SDCA bug fixes and some minor structure reordering to improve the padding.
- ALSA: hda/tas2781: Fix the order of TAS2781 calibrated-data A bug reported by one of my customers that the order of TAS2781 calibrated-data is incorrect, the correct way is to move R0_Low and insert it between R0 and InvR0. Fixes: 4fe238513407 ("ALSA: hda/tas2781: Move and unified the calibrated-data getting function for SPI and I2C into the tas2781_hda lib") Signed-off-by: Shenghao Ding <shenghao-ding@ti.com> Link: https://patch.msgid.link/20250907222728.988-1-shenghao-ding@ti.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
9/19/2025, 8:24:15 AM
9/19/2025, 8:24:15 AM
- Add archive message Add archive message Signed-off-by: GNUfault <blumatrikz@gmail.com>
- Update very incomplete STM32 board I'm committing this partial work just to have the history, and because I might use this as a starting point if I take up the mixed-mode thing in the new GuitarPedal project. It's almost certainly not worth saving, but you never know... Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Merge branch 'archive-note' of https://github.com/GNUfault/1590A * 'archive-note' of https://github.com/GNUfault/1590A: Add archive message
9/19/2025, 5:01:41 AM
- rv: Support systems with time64-only syscalls Some systems (like 32-bit RISC-V) only have the 64-bit time_t versions of syscalls. So handle the 32-bit time_t version of those being undefined. Fixes: f74f8bb246cf ("rv: Add rtapp_sleep monitor") Closes: https://lore.kernel.org/oe-kbuild-all/202508160204.SsFyNfo6-lkp@intel.com Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com> Acked-by: Nam Cao <namcao@linutronix.de> Link: https://lore.kernel.org/r/20250804194518.97620-2-palmer@dabbelt.com Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
- rv: Fix wrong type cast in enabled_monitors_next() Argument 'p' of enabled_monitors_next() is not a pointer to struct rv_monitor, it is actually a pointer to the list_head inside struct rv_monitor. Therefore it is wrong to cast 'p' to struct rv_monitor *. This wrong type cast has been there since the beginning. But it still worked because the list_head was the first field in struct rv_monitor_def. This is no longer true since commit 24cbfe18d55a ("rv: Merge struct rv_monitor_def into struct rv_monitor") moved the list_head, and this wrong type cast became a functional problem. Properly use container_of() instead. Fixes: 24cbfe18d55a ("rv: Merge struct rv_monitor_def into struct rv_monitor") Signed-off-by: Nam Cao <namcao@linutronix.de> Reviewed-by: Gabriele Monaco <gmonaco@redhat.com> Link: https://lore.kernel.org/r/20250806120911.989365-1-namcao@linutronix.de Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
- include/linux/rv.h: remove redundant include file Remove redundant include <linux/types.h> to clean up the code. Move all unique include files inside CONFIG_RV as they are only needed when CONFIG_RV is enabled. Arrange include files alphabetically. Fixes: 24cbfe18d55a ("rv: Merge struct rv_monitor_def into struct rv_monitor") [1] Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/r/202507312017.oyD08TL5-lkp@intel.com/ Signed-off-by: Akhilesh Patil <akhilesh@ee.iitb.ac.in> Reviewed-by: Gabriele Monaco <gmonaco@redhat.com> Link: https://lore.kernel.org/r/aJneRbHGlNFg7lr9@bhairav-test.ee.iitb.ac.in Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
- rv: Fix missing mutex unlock in rv_register_monitor() If create_monitor_dir() fails, the function returns directly without releasing rv_interface_lock. This leaves the mutex locked and causes subsequent monitor registration attempts to deadlock. Fix it by making the error path jump to out_unlock, ensuring that the mutex is always released before returning. Fixes: 24cbfe18d55a ("rv: Merge struct rv_monitor_def into struct rv_monitor") Signed-off-by: Zhen Ni <zhen.ni@easystack.cn> Reviewed-by: Gabriele Monaco <gmonaco@redhat.com> Reviewed-by: Nam Cao <namcao@linutronix.de> Link: https://lore.kernel.org/r/20250903065112.1878330-1-zhen.ni@easystack.cn Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
- rv: Add Gabriele Monaco as maintainer for Runtime Verification Gabriele will start taking over managing the changes to the Runtime Verification. Make him officially one of the maintainers. Cc: Gabriele Monaco <gmonaco@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/20250911115744.66ccade3@gandalf.local.home Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
- Merge tag 'trace-rv-v6.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull runtime verifier fixes from Steven Rostedt: - Fix build in some RISC-V flavours Some system calls only are available for the 64bit RISC-V machines. #ifdef out the cases of clock_nanosleep and futex in the sleep monitor if they are not supported by the architecture. - Fix wrong cast, obsolete after refactoring Use container_of() to get to the rv_monitor structure from the enable_monitors_next() 'p' pointer. The assignment worked only because the list field used happened to be the first field of the structure. - Remove redundant include files Some include files were listed twice. Remove the extra ones and sort the includes. - Fix missing unlock on failure There was an error path that exited the rv_register_monitor() function without releasing a lock. Change that to goto the lock release. - Add Gabriele Monaco to be Runtime Verifier maintainer Gabriele is doing most of the work on RV as well as collecting patches. Add him to the maintainers file for Runtime Verification. * tag 'trace-rv-v6.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: rv: Add Gabriele Monaco as maintainer for Runtime Verification rv: Fix missing mutex unlock in rv_register_monitor() include/linux/rv.h: remove redundant include file rv: Fix wrong type cast in enabled_monitors_next() rv: Support systems with time64-only syscalls
9/19/2025, 4:37:21 AM
- Trial attempt at different analog rail filtering This does the +VA net a bit differently: by using a common collector series follower transistor I can use much bigger resistors in the lowpass filter because the current through the resistor is just the tiny base current. In fact, let's just make it the main filtering stage, and share the resulting +VA rail for both the input stage and the output bias. They both need on the order of one mA of current, so assuming a DC current gain (whether you call it h_{FE} or Beta) of roughly one hundred, we can easily use a couple of 2k2 resistors and make it a 2nd order filter. (The original used a single 220 Ohm resistor, so this increases that by a factor of twenty, but then the h_{FE} will more than make up for it) Of course, the small viltage drop over the resistors will now be almost entirely dwarfed by the V_{BE} voltage drop of the transistor. But if it gives us a nice stable analog voltage rail, it should be worth the small voltage drop. I use a MOSFET for reverse polarity protection because I don't like the voltage drop of a simple diode, but that's because the voltage drop there would be just a negative. Here there's a _reason_ for a voltage drop. Of course, I don't know how much it really helps with noise, but it looks good in simulation, and I'm willing to at least do a test board for this. I'm just using the BC846 BJT I already have around. It's a very average cheap general-purpose transistor, I have no idea if this is any good. Live and learn. That's still the point of the whole project, after all. While doing random trials, this also ends up DC-coupling the input side. I probably shouldn't do two independent changes like this, but I'm too lazy to do two separate boards. Talking about separate boards: I wish KiCad had some good way to share common circuits (some kind of "combination symbol" concept or something) so that you could do explicitly different boards without having to either do duplicate boards, or like this, just change an existing one. I guess I could use different git branches for trial runs like this, but with all the random noise that any small change makes, merging things when they work would probably be a nightmare. Anyway, just random musings when doing random testing with things that may or may not be a good idea. In another news: when laying out this change, the resulting board is just a single layer. The bottom copper layer is all ground. Not that it matters, but I like how clean it looks. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9/18/2025, 10:28:56 AM
- mm/gup: check ref_count instead of lru before migration Patch series "mm: better GUP pin lru_add_drain_all()", v2. Series of lru_add_drain_all()-related patches, arising from recent mm/gup migration report from Will Deacon. This patch (of 5): Will Deacon reports:- When taking a longterm GUP pin via pin_user_pages(), __gup_longterm_locked() tries to migrate target folios that should not be longterm pinned, for example because they reside in a CMA region or movable zone. This is done by first pinning all of the target folios anyway, collecting all of the longterm-unpinnable target folios into a list, dropping the pins that were just taken and finally handing the list off to migrate_pages() for the actual migration. It is critically important that no unexpected references are held on the folios being migrated, otherwise the migration will fail and pin_user_pages() will return -ENOMEM to its caller. Unfortunately, it is relatively easy to observe migration failures when running pKVM (which uses pin_user_pages() on crosvm's virtual address space to resolve stage-2 page faults from the guest) on a 6.15-based Pixel 6 device and this results in the VM terminating prematurely. In the failure case, 'crosvm' has called mlock(MLOCK_ONFAULT) on its mapping of guest memory prior to the pinning. Subsequently, when pin_user_pages() walks the page-table, the relevant 'pte' is not present and so the faulting logic allocates a new folio, mlocks it with mlock_folio() and maps it in the page-table. Since commit 2fbb0c10d1e8 ("mm/munlock: mlock_page() munlock_page() batch by pagevec"), mlock/munlock operations on a folio (formerly page), are deferred. For example, mlock_folio() takes an additional reference on the target folio before placing it into a per-cpu 'folio_batch' for later processing by mlock_folio_batch(), which drops the refcount once the operation is complete. Processing of the batches is coupled with the LRU batch logic and can be forcefully drained with lru_add_drain_all() but as long as a folio remains unprocessed on the batch, its refcount will be elevated. This deferred batching therefore interacts poorly with the pKVM pinning scenario as we can find ourselves in a situation where the migration code fails to migrate a folio due to the elevated refcount from the pending mlock operation. Hugh Dickins adds:- !folio_test_lru() has never been a very reliable way to tell if an lru_add_drain_all() is worth calling, to remove LRU cache references to make the folio migratable: the LRU flag may be set even while the folio is held with an extra reference in a per-CPU LRU cache. 5.18 commit 2fbb0c10d1e8 may have made it more unreliable. Then 6.11 commit 33dfe9204f29 ("mm/gup: clear the LRU flag of a page before adding to LRU batch") tried to make it reliable, by moving LRU flag clearing; but missed the mlock/munlock batches, so still unreliable as reported. And it turns out to be difficult to extend 33dfe9204f29's LRU flag clearing to the mlock/munlock batches: if they do benefit from batching, mlock/munlock cannot be so effective when easily suppressed while !LRU. Instead, switch to an expected ref_count check, which was more reliable all along: some more false positives (unhelpful drains) than before, and never a guarantee that the folio will prove migratable, but better. Note on PG_private_2: ceph and nfs are still using the deprecated PG_private_2 flag, with the aid of netfs and filemap support functions. Although it is consistently matched by an increment of folio ref_count, folio_expected_ref_count() intentionally does not recognize it, and ceph folio migration currently depends on that for PG_private_2 folios to be rejected. New references to the deprecated flag are discouraged, so do not add it into the collect_longterm_unpinnable_folios() calculation: but longterm pinning of transiently PG_private_2 ceph and nfs folios (an uncommon case) may invoke a redundant lru_add_drain_all(). And this makes easy the backport to earlier releases: up to and including 6.12, btrfs also used PG_private_2, but without a ref_count increment. Note for stable backports: requires 6.16 commit 86ebd50224c0 ("mm: add folio_expected_ref_count() for reference count calculation"). Link: https://lkml.kernel.org/r/41395944-b0e3-c3ac-d648-8ddd70451d28@google.com Link: https://lkml.kernel.org/r/bd1f314a-fca1-8f19-cac0-b936c9614557@google.com Fixes: 9a4e9f3b2d73 ("mm: update get_user_pages_longterm to migrate pages allocated from CMA region") Signed-off-by: Hugh Dickins <hughd@google.com> Reported-by: Will Deacon <will@kernel.org> Closes: https://lore.kernel.org/linux-mm/20250815101858.24352-1-will@kernel.org/ Acked-by: Kiryl Shutsemau <kas@kernel.org> Acked-by: David Hildenbrand <david@redhat.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@kernel.org> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Chris Li <chrisl@kernel.org> Cc: Christoph Hellwig <hch@infradead.org> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Keir Fraser <keirf@google.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Li Zhe <lizhe.67@bytedance.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Peter Xu <peterx@redhat.com> Cc: Rik van Riel <riel@surriel.com> Cc: Shivank Garg <shivankg@amd.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wei Xu <weixugc@google.com> Cc: yangge <yangge1116@126.com> Cc: Yuanchu Xie <yuanchu@google.com> Cc: Yu Zhao <yuzhao@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
- mm/gup: local lru_add_drain() to avoid lru_add_drain_all() In many cases, if collect_longterm_unpinnable_folios() does need to drain the LRU cache to release a reference, the cache in question is on this same CPU, and much more efficiently drained by a preliminary local lru_add_drain(), than the later cross-CPU lru_add_drain_all(). Marked for stable, to counter the increase in lru_add_drain_all()s from "mm/gup: check ref_count instead of lru before migration". Note for clean backports: can take 6.16 commit a03db236aebf ("gup: optimize longterm pin_user_pages() for large folio") first. Link: https://lkml.kernel.org/r/66f2751f-283e-816d-9530-765db7edc465@google.com Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@kernel.org> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Chris Li <chrisl@kernel.org> Cc: Christoph Hellwig <hch@infradead.org> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Keir Fraser <keirf@google.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Li Zhe <lizhe.67@bytedance.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Peter Xu <peterx@redhat.com> Cc: Rik van Riel <riel@surriel.com> Cc: Shivank Garg <shivankg@amd.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wei Xu <weixugc@google.com> Cc: Will Deacon <will@kernel.org> Cc: yangge <yangge1116@126.com> Cc: Yuanchu Xie <yuanchu@google.com> Cc: Yu Zhao <yuzhao@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
- mm: revert "mm/gup: clear the LRU flag of a page before adding to LRU batch" This reverts commit 33dfe9204f29: now that collect_longterm_unpinnable_folios() is checking ref_count instead of lru, and mlock/munlock do not participate in the revised LRU flag clearing, those changes are misleading, and enlarge the window during which mlock/munlock may miss an mlock_count update. It is possible (I'd hesitate to claim probable) that the greater likelihood of missed mlock_count updates would explain the "Realtime threads delayed due to kcompactd0" observed on 6.12 in the Link below. If that is the case, this reversion will help; but a complete solution needs also a further patch, beyond the scope of this series. Included some 80-column cleanup around folio_batch_add_and_move(). The role of folio_test_clear_lru() (before taking per-memcg lru_lock) is questionable since 6.13 removed mem_cgroup_move_account() etc; but perhaps there are still some races which need it - not examined here. Link: https://lore.kernel.org/linux-mm/DU0PR01MB10385345F7153F334100981888259A@DU0PR01MB10385.eurprd01.prod.exchangelabs.com/ Link: https://lkml.kernel.org/r/05905d7b-ed14-68b1-79d8-bdec30367eba@google.com Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@kernel.org> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Chris Li <chrisl@kernel.org> Cc: Christoph Hellwig <hch@infradead.org> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Keir Fraser <keirf@google.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Li Zhe <lizhe.67@bytedance.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Peter Xu <peterx@redhat.com> Cc: Rik van Riel <riel@surriel.com> Cc: Shivank Garg <shivankg@amd.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wei Xu <weixugc@google.com> Cc: Will Deacon <will@kernel.org> Cc: yangge <yangge1116@126.com> Cc: Yuanchu Xie <yuanchu@google.com> Cc: Yu Zhao <yuzhao@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
- mm: revert "mm: vmscan.c: fix OOM on swap stress test" This reverts commit 0885ef470560: that was a fix to the reverted 33dfe9204f29b415bbc0abb1a50642d1ba94f5e9. Link: https://lkml.kernel.org/r/aa0e9d67-fbcd-9d79-88a1-641dfbe1d9d1@google.com Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@kernel.org> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Chris Li <chrisl@kernel.org> Cc: Christoph Hellwig <hch@infradead.org> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Keir Fraser <keirf@google.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Li Zhe <lizhe.67@bytedance.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Peter Xu <peterx@redhat.com> Cc: Rik van Riel <riel@surriel.com> Cc: Shivank Garg <shivankg@amd.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wei Xu <weixugc@google.com> Cc: Will Deacon <will@kernel.org> Cc: yangge <yangge1116@126.com> Cc: Yuanchu Xie <yuanchu@google.com> Cc: Yu Zhao <yuzhao@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
- mm: folio_may_be_lru_cached() unless folio_test_large() mm/swap.c and mm/mlock.c agree to drain any per-CPU batch as soon as a large folio is added: so collect_longterm_unpinnable_folios() just wastes effort when calling lru_add_drain[_all]() on a large folio. But although there is good reason not to batch up PMD-sized folios, we might well benefit from batching a small number of low-order mTHPs (though unclear how that "small number" limitation will be implemented). So ask if folio_may_be_lru_cached() rather than !folio_test_large(), to insulate those particular checks from future change. Name preferred to "folio_is_batchable" because large folios can well be put on a batch: it's just the per-CPU LRU caches, drained much later, which need care. Marked for stable, to counter the increase in lru_add_drain_all()s from "mm/gup: check ref_count instead of lru before migration". Link: https://lkml.kernel.org/r/57d2eaf8-3607-f318-e0c5-be02dce61ad0@google.com Fixes: 9a4e9f3b2d73 ("mm: update get_user_pages_longterm to migrate pages allocated from CMA region") Signed-off-by: Hugh Dickins <hughd@google.com> Suggested-by: David Hildenbrand <david@redhat.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@kernel.org> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Chris Li <chrisl@kernel.org> Cc: Christoph Hellwig <hch@infradead.org> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Keir Fraser <keirf@google.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Li Zhe <lizhe.67@bytedance.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Peter Xu <peterx@redhat.com> Cc: Rik van Riel <riel@surriel.com> Cc: Shivank Garg <shivankg@amd.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wei Xu <weixugc@google.com> Cc: Will Deacon <will@kernel.org> Cc: yangge <yangge1116@126.com> Cc: Yuanchu Xie <yuanchu@google.com> Cc: Yu Zhao <yuzhao@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
- mm/damon/core: introduce damon_call_control->dealloc_on_cancel Patch series "mm/damon/sysfs: fix refresh_ms control overwriting on multi-kdamonds usages". Automatic esssential DAMON/DAMOS status update feature of DAMON sysfs interface (refresh_ms) is broken [1] for multiple DAMON contexts (kdamonds) use case, since it uses a global single damon_call_control object for all created DAMON contexts. The fields of the object, particularly the list field is over-written for the contexts and it makes unexpected results including user-space hangup and kernel crashes [2]. Fix it by extending damon_call_control for the use case and updating the usage on DAMON sysfs interface to use per-context dynamically allocated damon_call_control object. This patch (of 2): When damon_call_control->repeat is set, damon_call() is executed asynchronously, and is eventually canceled when kdamond finishes. If the damon_call_control object is dynamically allocated, finding the place to deallocate the object is difficult. Introduce a new damon_call_control field, namely dealloc_on_cancel, to ask the kdamond deallocates those dynamically allocated objects when those are canceled. Link: https://lkml.kernel.org/r/20250908201513.60802-3-sj@kernel.org Link: https://lkml.kernel.org/r/20250908201513.60802-2-sj@kernel.org Fixes: d809a7c64ba8 ("mm/damon/sysfs: implement refresh_ms file internal work") Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Yunjeong Mun <yunjeong.mun@sk.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
- mm/damon/sysfs: use dynamically allocated repeat mode damon_call_control DAMON sysfs interface is using a single global repeat mode damon_call_control variable for refresh_ms handling, for all DAMON contexts. As a result, when there are more than one context, the single global damon_call_control is unexpectedly over-written (corrupted). Particularly the ->link field is overwritten by the multiple contexts and this can cause a user hangup, and/or a kernel crash. Fix it by using dynamically allocated damon_call_control object per DAMON context. Link: https://lkml.kernel.org/r/20250908201513.60802-3-sj@kernel.org Link: https://lore.kernel.org/20250904011738.930-1-yunjeong.mun@sk.com [1] Link: https://lore.kernel.org/20250905035411.39501-1-sj@kernel.org [2] Fixes: d809a7c64ba8 ("mm/damon/sysfs: implement refresh_ms file internal work") Signed-off-by: SeongJae Park <sj@kernel.org> Reported-by: Yunjeong Mun <yunjeong.mun@sk.com> Closes: https://lore.kernel.org/20250904011738.930-1-yunjeong.mun@sk.com Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
- MAINTAINERS: add Jann Horn as rmap reviewer Jann has been an excellent contributor in all areas of memory management, and has demonstrated great expertise in the reverse mapping. It's therefore appropriate for him to become a reviewer. Link: https://lkml.kernel.org/r/20250908194959.820913-1-lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Harry Yoo <harry.yoo@oracle.com> Acked-by: SeongJae Park <sj@kernel.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Jann Horn <jannh@google.com> Cc: Rik van Riel <riel@surriel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
- MAINTAINERS: add Lance Yang as a THP reviewer I've been actively digging into the MM/THP subsystem for over a year now, and there's a real interest in contributing more and getting further involved. Well, missing out on any more cool THP things is really a pain ;) Link: https://lkml.kernel.org/r/20250908104857.35397-1-lance.yang@linux.dev Signed-off-by: Lance Yang <lance.yang@linux.dev> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: Zi Yan <ziy@nvidia.com> Acked-by: Barry Song <baohua@kernel.org> Acked-by: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Nico Pache <npache@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Dev Jain <dev.jain@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
- samples/damon/wsse: avoid starting DAMON before initialization Patch series "samples/damon: fix boot time enable handling fixup merge mistakes". First three patches of the patch series "mm/damon: fix misc bugs in DAMON modules" [1] were trying to fix boot time DAMON sample modules enabling issues. The issues are the modules can crash if those are enabled before DAMON is enabled, like using boot time parameter options. The three patches were fixing the issues by avoiding starting DAMON before the module initialization phase. However, probably by a mistake during a merge, only half of the change is merged, and the part for avoiding the starting of DAMON before the module initialized is missed. So the problem is not solved and thus the modules can still crash if enabled before DAMON is initialized. Fix those by applying the unmerged parts again. Note that the broken commits are merged into 6.17-rc1, but also backported to relevant stable kernels. So this series also needs to be merged into the stable kernels. Hence Cc-ing stable@. This patch (of 3): Commit 0ed1165c3727 ("samples/damon/wsse: fix boot time enable handling") is somehow incompletely applying the origin patch [2]. It is missing the part that avoids starting DAMON before module initialization. Probably a mistake during a merge has happened. Fix it by applying the missed part again. Link: https://lkml.kernel.org/r/20250909022238.2989-1-sj@kernel.org Link: https://lkml.kernel.org/r/20250909022238.2989-2-sj@kernel.org Link: https://lkml.kernel.org/r/20250706193207.39810-1-sj@kernel.org [1] Link: https://lore.kernel.org/20250706193207.39810-2-sj@kernel.org [2] Fixes: 0ed1165c3727 ("samples/damon/wsse: fix boot time enable handling") Signed-off-by: SeongJae Park <sj@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
- samples/damon/prcl: avoid starting DAMON before initialization Commit 2780505ec2b4 ("samples/damon/prcl: fix boot time enable crash") is somehow incompletely applying the origin patch [1]. It is missing the part that avoids starting DAMON before module initialization. Probably a mistake during a merge has happened. Fix it by applying the missed part again. Link: https://lkml.kernel.org/r/20250909022238.2989-3-sj@kernel.org Link: https://lore.kernel.org/20250706193207.39810-3-sj@kernel.org [1] Fixes: 2780505ec2b4 ("samples/damon/prcl: fix boot time enable crash") Signed-off-by: SeongJae Park <sj@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
- samples/damon/mtier: avoid starting DAMON before initialization Commit 964314344eab ("samples/damon/mtier: support boot time enable setup") is somehow incompletely applying the origin patch [1]. It is missing the part that avoids starting DAMON before module initialization. Probably a mistake during a merge has happened. Fix it by applying the missed part again. Link: https://lkml.kernel.org/r/20250909022238.2989-4-sj@kernel.org Link: https://lore.kernel.org/20250706193207.39810-4-sj@kernel.org [1] Fixes: 964314344eab ("samples/damon/mtier: support boot time enable setup") Signed-off-by: SeongJae Park <sj@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
- nilfs2: fix CFI failure when accessing /sys/fs/nilfs2/features/* When accessing one of the files under /sys/fs/nilfs2/features when CONFIG_CFI_CLANG is enabled, there is a CFI violation: CFI failure at kobj_attr_show+0x59/0x80 (target: nilfs_feature_revision_show+0x0/0x30; expected type: 0xfc392c4d) ... Call Trace: <TASK> sysfs_kf_seq_show+0x2a6/0x390 ? __cfi_kobj_attr_show+0x10/0x10 kernfs_seq_show+0x104/0x15b seq_read_iter+0x580/0xe2b ... When the kobject of the kset for /sys/fs/nilfs2 is initialized, its ktype is set to kset_ktype, which has a ->sysfs_ops of kobj_sysfs_ops. When nilfs_feature_attr_group is added to that kobject via sysfs_create_group(), the kernfs_ops of each files is sysfs_file_kfops_rw, which will call sysfs_kf_seq_show() when ->seq_show() is called. sysfs_kf_seq_show() in turn calls kobj_attr_show() through ->sysfs_ops->show(). kobj_attr_show() casts the provided attribute out to a 'struct kobj_attribute' via container_of() and calls ->show(), resulting in the CFI violation since neither nilfs_feature_revision_show() nor nilfs_feature_README_show() match the prototype of ->show() in 'struct kobj_attribute'. Resolve the CFI violation by adjusting the second parameter in nilfs_feature_{revision,README}_show() from 'struct attribute' to 'struct kobj_attribute' to match the expected prototype. Link: https://lkml.kernel.org/r/20250906144410.22511-1-konishi.ryusuke@gmail.com Fixes: aebe17f68444 ("nilfs2: add /sys/fs/nilfs2/features group") Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202509021646.bc78d9ef-lkp@intel.com/ Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
- zram: fix slot write race condition Parallel concurrent writes to the same zram index result in leaked zsmalloc handles. Schematically we can have something like this: CPU0 CPU1 zram_slot_lock() zs_free(handle) zram_slot_lock() zram_slot_lock() zs_free(handle) zram_slot_lock() compress compress handle = zs_malloc() handle = zs_malloc() zram_slot_lock zram_set_handle(handle) zram_slot_lock zram_slot_lock zram_set_handle(handle) zram_slot_lock Either CPU0 or CPU1 zsmalloc handle will leak because zs_free() is done too early. In fact, we need to reset zram entry right before we set its new handle, all under the same slot lock scope. Link: https://lkml.kernel.org/r/20250909045150.635345-1-senozhatsky@chromium.org Fixes: 71268035f5d7 ("zram: free slot memory early during write") Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Reported-by: Changhui Zhong <czhong@redhat.com> Closes: https://lore.kernel.org/all/CAGVVp+UtpGoW5WEdEU7uVTtsSCjPN=ksN6EcvyypAtFDOUf30A@mail.gmail.com/ Tested-by: Changhui Zhong <czhong@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Minchan Kim <minchan@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
- Merge tag 'mm-hotfixes-stable-2025-09-17-21-10' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "15 hotfixes. 11 are cc:stable and the remainder address post-6.16 issues or aren't considered necessary for -stable kernels. 13 of these fixes are for MM. The usual shower of singletons, plus - fixes from Hugh to address various misbehaviors in get_user_pages() - patches from SeongJae to address a quite severe issue in DAMON - another series also from SeongJae which completes some fixes for a DAMON startup issue" * tag 'mm-hotfixes-stable-2025-09-17-21-10' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: zram: fix slot write race condition nilfs2: fix CFI failure when accessing /sys/fs/nilfs2/features/* samples/damon/mtier: avoid starting DAMON before initialization samples/damon/prcl: avoid starting DAMON before initialization samples/damon/wsse: avoid starting DAMON before initialization MAINTAINERS: add Lance Yang as a THP reviewer MAINTAINERS: add Jann Horn as rmap reviewer mm/damon/sysfs: use dynamically allocated repeat mode damon_call_control mm/damon/core: introduce damon_call_control->dealloc_on_cancel mm: folio_may_be_lru_cached() unless folio_test_large() mm: revert "mm: vmscan.c: fix OOM on swap stress test" mm: revert "mm/gup: clear the LRU flag of a page before adding to LRU batch" mm/gup: local lru_add_drain() to avoid lru_add_drain_all() mm/gup: check ref_count instead of lru before migration
9/18/2025, 9:07:03 AM
- smb: server: let smb_direct_writev() respect SMB_DIRECT_MAX_SEND_SGES We should not use more sges for ib_post_send() than we told the rdma device in rdma_create_qp()! Otherwise ib_post_send() will return -EINVAL, so we disconnect the connection. Or with the current siw.ko we'll get 0 from ib_post_send(), but will never ever get a completion for the request. I've already sent a fix for siw.ko... So we need to make sure smb_direct_writev() limits the number of vectors we pass to individual smb_direct_post_send_data() calls, so that we don't go over the queue pair limits. Commit 621433b7e25d ("ksmbd: smbd: relax the count of sges required") was very strange and I guess only needed because SMB_DIRECT_MAX_SEND_SGES was 8 at that time. It basically removed the check that the rdma device is able to handle the number of sges we try to use. While the real problem was added by commit ddbdc861e37c ("ksmbd: smbd: introduce read/write credits for RDMA read/write") as it used the minumun of device->attrs.max_send_sge and device->attrs.max_sge_rd, with the problem that device->attrs.max_sge_rd is always 1 for iWarp. And that limitation should only apply to RDMA Read operations. For now we keep that limitation for RDMA Write operations too, fixing that is a task for another day as it's not really required a bug fix. Commit 2b4eeeaa9061 ("ksmbd: decrease the number of SMB3 smbdirect server SGEs") lowered SMB_DIRECT_MAX_SEND_SGES to 6, which is also used by our client code. And that client code enforces device->attrs.max_send_sge >= 6 since commit d2e81f92e5b7 ("Decrease the number of SMB3 smbdirect client SGEs") and (briefly looking) only the i40w driver provides only 3, see I40IW_MAX_WQ_FRAGMENT_COUNT. But currently we'd require 4 anyway, so that would not work anyway, but now it fails early. Cc: Steve French <smfrench@gmail.com> Cc: Tom Talpey <tom@talpey.com> Cc: Hyunchul Lee <hyc.lee@gmail.com> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Cc: linux-rdma@vger.kernel.org Fixes: 0626e6641f6b ("cifsd: add server handler for central processing and tranport layers") Fixes: ddbdc861e37c ("ksmbd: smbd: introduce read/write credits for RDMA read/write") Fixes: 621433b7e25d ("ksmbd: smbd: relax the count of sges required") Fixes: 2b4eeeaa9061 ("ksmbd: decrease the number of SMB3 smbdirect server SGEs") Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Steve French <stfrench@microsoft.com>
- ksmbd: smbdirect: validate data_offset and data_length field of smb_direct_data_transfer If data_offset and data_length of smb_direct_data_transfer struct are invalid, out of bounds issue could happen. This patch validate data_offset and data_length field in recv_done. Cc: stable@vger.kernel.org Fixes: 2ea086e35c3d ("ksmbd: add buffer validation for smb direct") Reviewed-by: Stefan Metzmacher <metze@samba.org> Reported-by: Luigino Camastra, Aisle Research <luigino.camastra@aisle.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
- ksmbd: smbdirect: verify remaining_data_length respects max_fragmented_recv_size This is inspired by the check for data_offset + data_length. Cc: Steve French <smfrench@gmail.com> Cc: Tom Talpey <tom@talpey.com> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Cc: stable@vger.kernel.org Fixes: 2ea086e35c3d ("ksmbd: add buffer validation for smb direct") Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Steve French <stfrench@microsoft.com>
- Merge tag '6.17-rc6-ksmbd-fixes' of git://git.samba.org/ksmbd Pull smb server fixes from Steve French: - Two fixes for remaining_data_length and offset checks in receive path - Don't go over max SGEs which caused smbdirect send to fail (and trigger disconnect) * tag '6.17-rc6-ksmbd-fixes' of git://git.samba.org/ksmbd: ksmbd: smbdirect: verify remaining_data_length respects max_fragmented_recv_size ksmbd: smbdirect: validate data_offset and data_length field of smb_direct_data_transfer smb: server: let smb_direct_writev() respect SMB_DIRECT_MAX_SEND_SGES
9/18/2025, 6:59:27 AM
- Update the README.md file a bit Fix some link issues, add some commentary on part choices. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9/18/2025, 6:57:01 AM
- Update the README.md file a bit Fix some link issues, add some commentary on part choices. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9/18/2025, 5:47:19 AM
- tracing: kprobe-event: Fix null-ptr-deref in trace_kprobe_create_internal() A crash was observed with the following output: Oops: general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] SMP KASAN PTI KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007] CPU: 1 UID: 0 PID: 2899 Comm: syz.2.399 Not tainted 6.17.0-rc5+ #5 PREEMPT(none) RIP: 0010:trace_kprobe_create_internal+0x3fc/0x1440 kernel/trace/trace_kprobe.c:911 Call Trace: <TASK> trace_kprobe_create_cb+0xa2/0xf0 kernel/trace/trace_kprobe.c:1089 trace_probe_create+0xf1/0x110 kernel/trace/trace_probe.c:2246 dyn_event_create+0x45/0x70 kernel/trace/trace_dynevent.c:128 create_or_delete_trace_kprobe+0x5e/0xc0 kernel/trace/trace_kprobe.c:1107 trace_parse_run_command+0x1a5/0x330 kernel/trace/trace.c:10785 vfs_write+0x2b6/0xd00 fs/read_write.c:684 ksys_write+0x129/0x240 fs/read_write.c:738 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0x5d/0x2d0 arch/x86/entry/syscall_64.c:94 </TASK> Function kmemdup() may return NULL in trace_kprobe_create_internal(), add check for it's return value. Link: https://lore.kernel.org/all/20250916075816.3181175-1-wangliang74@huawei.com/ Fixes: 33b4e38baa03 ("tracing: kprobe-event: Allocate string buffers from heap") Signed-off-by: Wang Liang <wangliang74@huawei.com> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
- Merge tag 'probes-fixes-v6.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull probe fix from Masami Hiramatsu: - kprobe-event: Fix null-ptr-deref in trace_kprobe_create_internal(), by handling NULL return of kmemdup() correctly * tag 'probes-fixes-v6.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing: kprobe-event: Fix null-ptr-deref in trace_kprobe_create_internal()
9/18/2025, 2:02:27 AM
- cgroup: split cgroup_destroy_wq into 3 workqueues A hung task can occur during [1] LTP cgroup testing when repeatedly mounting/unmounting perf_event and net_prio controllers with systemd.unified_cgroup_hierarchy=1. The hang manifests in cgroup_lock_and_drain_offline() during root destruction. Related case: cgroup_fj_function_perf_event cgroup_fj_function.sh perf_event cgroup_fj_function_net_prio cgroup_fj_function.sh net_prio Call Trace: cgroup_lock_and_drain_offline+0x14c/0x1e8 cgroup_destroy_root+0x3c/0x2c0 css_free_rwork_fn+0x248/0x338 process_one_work+0x16c/0x3b8 worker_thread+0x22c/0x3b0 kthread+0xec/0x100 ret_from_fork+0x10/0x20 Root Cause: CPU0 CPU1 mount perf_event umount net_prio cgroup1_get_tree cgroup_kill_sb rebind_subsystems // root destruction enqueues // cgroup_destroy_wq // kill all perf_event css // one perf_event css A is dying // css A offline enqueues cgroup_destroy_wq // root destruction will be executed first css_free_rwork_fn cgroup_destroy_root cgroup_lock_and_drain_offline // some perf descendants are dying // cgroup_destroy_wq max_active = 1 // waiting for css A to die Problem scenario: 1. CPU0 mounts perf_event (rebind_subsystems) 2. CPU1 unmounts net_prio (cgroup_kill_sb), queuing root destruction work 3. A dying perf_event CSS gets queued for offline after root destruction 4. Root destruction waits for offline completion, but offline work is blocked behind root destruction in cgroup_destroy_wq (max_active=1) Solution: Split cgroup_destroy_wq into three dedicated workqueues: cgroup_offline_wq – Handles CSS offline operations cgroup_release_wq – Manages resource release cgroup_free_wq – Performs final memory deallocation This separation eliminates blocking in the CSS free path while waiting for offline operations to complete. [1] https://github.com/linux-test-project/ltp/blob/master/runtest/controllers Fixes: 334c3679ec4b ("cgroup: reimplement rebind_subsystems() using cgroup_apply_control() and friends") Reported-by: Gao Yingjie <gaoyingjie@uniontech.com> Signed-off-by: Chen Ridong <chenridong@huawei.com> Suggested-by: Teju Heo <tj@kernel.org> Signed-off-by: Tejun Heo <tj@kernel.org>
- cgroup/psi: Set of->priv to NULL upon file release Setting of->priv to NULL when the file is released enables earlier bug detection. This allows potential bugs to manifest as NULL pointer dereferences rather than use-after-free errors[1], which are generally more difficult to diagnose. [1] https://lore.kernel.org/cgroups/38ef3ff9-b380-44f0-9315-8b3714b0948d@huaweicloud.com/T/#m8a3b3f88f0ff3da5925d342e90043394f8b2091b Signed-off-by: Chen Ridong <chenridong@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>
- Revert "sched_ext: Skip per-CPU tasks in scx_bpf_reenqueue_local()" scx_bpf_reenqueue_local() can be called from ops.cpu_release() when a CPU is taken by a higher scheduling class to give tasks queued to the CPU's local DSQ a chance to be migrated somewhere else, instead of waiting indefinitely for that CPU to become available again. In doing so, we decided to skip migration-disabled tasks, under the assumption that they cannot be migrated anyway. However, when a higher scheduling class preempts a CPU, the running task is always inserted at the head of the local DSQ as a migration-disabled task. This means it is always skipped by scx_bpf_reenqueue_local(), and ends up being confined to the same CPU even if that CPU is heavily contended by other higher scheduling class tasks. As an example, let's consider the following scenario: $ schedtool -a 0,1, -e yes > /dev/null $ sudo schedtool -F -p 99 -a 0, -e \ stress-ng -c 1 --cpu-load 99 --cpu-load-slice 1000 The first task (SCHED_EXT) can run on CPU0 or CPU1. The second task (SCHED_FIFO) is pinned to CPU0 and consumes ~99% of it. If the SCHED_EXT task initially runs on CPU0, it will remain there because it always sees CPU0 as "idle" in the short gaps left by the RT task, resulting in ~1% utilization while CPU1 stays idle: 0[||||||||||||||||||||||100.0%] 8[ 0.0%] 1[ 0.0%] 9[ 0.0%] 2[ 0.0%] 10[ 0.0%] 3[ 0.0%] 11[ 0.0%] 4[ 0.0%] 12[ 0.0%] 5[ 0.0%] 13[ 0.0%] 6[ 0.0%] 14[ 0.0%] 7[ 0.0%] 15[ 0.0%] PID USER PRI NI S CPU CPU%▽MEM% TIME+ Command 1067 root RT 0 R 0 99.0 0.2 0:31.16 stress-ng-cpu [run] 975 arighi 20 0 R 0 1.0 0.0 0:26.32 yes By allowing scx_bpf_reenqueue_local() to re-enqueue migration-disabled tasks, the scheduler can choose to migrate them to other CPUs (CPU1 in this case) via ops.enqueue(), leading to better CPU utilization: 0[||||||||||||||||||||||100.0%] 8[ 0.0%] 1[||||||||||||||||||||||100.0%] 9[ 0.0%] 2[ 0.0%] 10[ 0.0%] 3[ 0.0%] 11[ 0.0%] 4[ 0.0%] 12[ 0.0%] 5[ 0.0%] 13[ 0.0%] 6[ 0.0%] 14[ 0.0%] 7[ 0.0%] 15[ 0.0%] PID USER PRI NI S CPU CPU%▽MEM% TIME+ Command 577 root RT 0 R 0 100.0 0.2 0:23.17 stress-ng-cpu [run] 555 arighi 20 0 R 1 100.0 0.0 0:28.67 yes It's debatable whether per-CPU tasks should be re-enqueued as well, but doing so is probably safer: the scheduler can recognize re-enqueued tasks through the %SCX_ENQ_REENQ flag, reassess their placement, and either put them back at the head of the local DSQ or let another task attempt to take the CPU. This also prevents giving per-CPU tasks an implicit priority boost, which would otherwise make them more likely to reclaim CPUs preempted by higher scheduling classes. Fixes: 97e13ecb02668 ("sched_ext: Skip per-CPU tasks in scx_bpf_reenqueue_local()") Cc: stable@vger.kernel.org # v6.15+ Signed-off-by: Andrea Righi <arighi@nvidia.com> Acked-by: Changwoo Min <changwoo@igalia.com> Signed-off-by: Tejun Heo <tj@kernel.org>
- sched_ext, sched/core: Fix build failure when !FAIR_GROUP_SCHED && EXT_GROUP_SCHED While collecting SCX related fields in struct task_group into struct scx_task_group, 6e6558a6bc41 ("sched_ext, sched/core: Factor out struct scx_task_group") forgot update tg->scx_weight usage in tg_weight(), which leads to build failure when CONFIG_FAIR_GROUP_SCHED is disabled but CONFIG_EXT_GROUP_SCHED is enabled. Fix it. Fixes: 6e6558a6bc41 ("sched_ext, sched/core: Factor out struct scx_task_group") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202509170230.MwZsJSWa-lkp@intel.com/ Tested-by: Andrea Righi <arighi@nvidia.com> Signed-off-by: Tejun Heo <tj@kernel.org>
- Merge tag 'cgroup-for-6.17-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup fixes from Tejun Heo: "This contains two cgroup changes. Both are pretty low risk. - Fix deadlock in cgroup destruction when repeatedly mounting/unmounting perf_event and net_prio controllers. The issue occurs because cgroup_destroy_wq has max_active=1, causing root destruction to wait for CSS offline operations that are queued behind it. The fix splits cgroup_destroy_wq into three separate workqueues to eliminate the blocking. - Set of->priv to NULL upon file release to make potential bugs to manifest as NULL pointer dereferences rather than use-after-free errors" * tag 'cgroup-for-6.17-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup/psi: Set of->priv to NULL upon file release cgroup: split cgroup_destroy_wq into 3 workqueues
- Merge tag 'sched_ext-for-6.17-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext Pull sched_ext fixes from Tejun Heo: - Fix build failure when !FAIR_GROUP_SCHED && EXT_GROUP_SCHED - Revert "sched_ext: Skip per-CPU tasks in scx_bpf_reenqueue_local()" which was causing issues with per-CPU task scheduling and reenqueuing behavior * tag 'sched_ext-for-6.17-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext: sched_ext, sched/core: Fix build failure when !FAIR_GROUP_SCHED && EXT_GROUP_SCHED Revert "sched_ext: Skip per-CPU tasks in scx_bpf_reenqueue_local()"
9/17/2025, 6:48:18 AM
9/17/2025, 6:40:57 AM
9/17/2025, 6:32:42 AM