<raynold> ahh it's a wonderful day
<luke-jr> [ 2009.041563] WARNING: CPU: 3 PID: 1289 at fs/fs-writeback.c:2339 __writeback_inodes_sb_nr+0xbc/0xd0
<luke-jr> is there a solution to this yet? :/
<luke-jr> how dangerous is it?
<Mo> Zygo: "not broken enough to RMA" :)
<schmittlauch_> Well, is it a normal thing that btrfs check itself crashes while trying to repair an FS?
<Ke> it at least used to be normal, but obviously not a good thing
<opty> big fs?
<Ke> schmittlauch_: if you want to be a good lobbyist, post the log to the btrfs mailing list
<Ke> schmittlauch_: 1) get a recent enough btrfs-progs 2) get it with debug symbols 3) run it inside gdb 4) once it crashes, run backtrace full
<Ke> but I think it mostly crashes on assertions, in which case you just need to share the assertion and perhaps be prepared to share the metadata image
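Ke's four steps condense into a single non-interactive gdb invocation. This is only a sketch: /dev/sdX is a placeholder for the affected device, and the command is printed rather than executed here, since running btrfs check needs a real filesystem.

```shell
# Hedged sketch of the four steps above; /dev/sdX is a placeholder.
# --batch makes gdb non-interactive; the second -ex prints a full
# backtrace once 'run' stops (i.e. when btrfs check crashes).
cmd="gdb --batch -ex run -ex 'backtrace full' --args btrfs check /dev/sdX"
echo "$cmd"
```

The backtrace from `backtrace full` is what's worth attaching when posting to the mailing list.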
<opty> refining... big fs and 32-bit system? :)
<Ke> schmittlauch_: in general, things get better based on the number of complaints people post about them
<Ke> (and obviously I don't mean spamming, but valid complaints)
<schmittlauch_> Ke: the question is, how do I best do that if I currently can't use my system, because I'm talking about my rootfs here?
<Ke> that's nasty then
<schmittlauch_> (I guess qgroups are to blame for wrecking the FS)
<schmittlauch_> Ke: I can provide this wonderful, probably useless kernel trace: http://paste.opensuse.org/79318387
<Mo> Hi, sending snapshots to a UAS device runs at about 54.9 KiB/s :( How can I improve that? I already tried balancing with the btrfs-maintenance default values. defrag is no solution, as I would lose all the reflinks of the snapshots.
<Mo> The device has "only" 66 subvolumes, all of them snapshots.
<Mo> Mounted like this: rw,noatime,nodiratime,sync,compress-force=zlib,nossd,noacl,space_cache,autodefrag,subvolid=5,subvol=/
<Mo> sync because it's a "mobile" device. I also have hdparm -W0 set to disable the write cache. But toggling those two doesn't affect the performance much.
<Ke> sync is definitely expected to change the performance, as is a disabled write cache
<Mo> Ke: Ok, but switching those doesn't solve the performance issue.
<opty> small files?
<Ke> Mo: yup, noticed that, but that is very unexpected
<Ke> not sure how slow your zlib compression is
<Mo> opty: It's snapshots of a / root filesystem and a /home, nothing special.
<Mo> I remounted with -o async and hdparm -W1, no difference.
<Mo> I've chosen zlib compression because that USB device is already slow, and the CPU is a mobile Core i7, 2 cores, 4 with HT.
<Mo> And performance usually doesn't matter, as I only send snapshots to it. But at 54.9 KiB/s that takes hours..
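To put that figure in perspective, a quick back-of-the-envelope calculation; the 1 GiB snapshot size is an assumed example, not a number from the chat:

```shell
# Time to send 1 GiB at the reported 54.9 KiB/s.
# The snapshot size is a made-up example to illustrate the scale.
rate_kib=54.9
size_kib=$((1024 * 1024))   # 1 GiB expressed in KiB
awk -v s="$size_kib" -v r="$rate_kib" \
    'BEGIN { printf "%.1f hours\n", (s / r) / 3600 }'
```

So even a modest incremental send at that rate really does run for hours.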
<Mo> Quite sure the fragmentation is worse, but defragmenting takes me days or weeks, as I'd need to defrag all the snapshots, remove the ro flags, manually deduplicate everything, and set the ro flags back.
<Mo> I'd like to avoid that.
<Mo> btw. autodefrag, does that require a filesystem to be mounted for a while? As I usually umount immediately after the send. So are there any background services cleaning the device while idle?
<Ke> yeah, it might
<multicore> unmounted fs isn't idle
<opty> darkling: out of curiosity, what timestamps?
<Mo> Hi, looking at https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f51d2b59120ff364a5e612a594ed358767e1cd09 "btrfs: allow to set compression level for zlib": What is the default compression level for compress=zlib?
<Mo> Usually I preferred compress-force, as the heuristic was very simple, but there have been changes in 4.15 improving the compression heuristic. Should I switch back to compress= ?
<Mo> Are the new zlib compression levels backwards compatible, i.e. is compress=zlib9 still readable by an older btrfs with compress=zlib? I guess so.
<Mo> multicore: But btrfs cleaner processes don't touch a btrfs that isn't mounted, do they?
<multicore> Mo: an unmounted fs isn't idle, it's unmounted, so...
<Mo> multicore: The question was whether I need to keep the btrfs mounted for longer, and whether there are processes maintaining a mounted btrfs that get blocked if I umount directly after send/receive.
<Mo> I did not find many notes about autofs + btrfs.
<Ke> I use autofs, but I have transient "lockups"
<Mo> What does that mean?
<demfloro> Mo: if I'm reading the code right, the default level is 3 for zlib
<demfloro> +workspace->level = level > 0 ? level : 3;
<demfloro> in zlib_set_level()
<demfloro> i.e. compress=zlib, compress=zlib0 and compress=zlib3 are the same
<demfloro> old kernels don't accept the compress=zlibN parameter, where N is a number
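The kernel line demfloro quotes (any level <= 0 falls back to 3) can be mirrored in a couple of lines of shell; the function name just echoes the kernel's zlib_set_level() and is otherwise hypothetical:

```shell
# Mirrors the quoted kernel line: workspace->level = level > 0 ? level : 3;
# so compress=zlib (no level) and compress=zlib0 both end up as level 3.
zlib_set_level() {
    level=${1:-0}
    if [ "$level" -gt 0 ]; then
        echo "$level"
    else
        echo 3
    fi
}

zlib_set_level 0   # compress=zlib / compress=zlib0
zlib_set_level 3   # compress=zlib3
zlib_set_level 9   # compress=zlib9
```

Which is why the first three mount-option spellings are equivalent, as demfloro says.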
<UukGoblin> hello, `touch /var/lib/lxc/lxc1/rootfs/foo` unexpectedly results in "Device or resource busy". It's a subvolume on @ (which is mounted at /)
<UukGoblin> the lxc container is not yet started
<UukGoblin> (well, it was started, but I stopped it to debug this issue)
<UukGoblin> what can this EBUSY error indicate?
<UukGoblin> I had it earlier and thought it was because I moved the subvolume from /lxc1 to @/var/lib/lxc/eth1/rootfs (because the LXC config seems to work better that way). I somehow got rid of it after a reboot, but now, after another reboot, it's still returning that busy error.
<Knorrie> I'm going to retry converting DUP metadata to single with ssd_spread behaviour for the metadata extent allocator
<Knorrie> holy moly, first two done in 1 minute
<Knorrie> instead of 3 hours per chunk
<darkling> What are you breaking now, Knorrie?
<Knorrie> a time record
<darkling> The poor thing. :)
<Knorrie> I patched this 4.14 kernel to use tetris for data and spread for metadata
<Knorrie> and limited single metadata chunks to 512M so that convert does not replace every 512M DUP with a 1GB single
<darkling> And now you're playing sokoban with it?
<Knorrie> and while doing that, submitted the chunk allocator patch for DUP
<Knorrie> now, hopefully it keeps going this fast, but I'm afraid there are some monsters waiting in the low-vaddr metadata chunks
<darkling> Any heavily-shared extents?
<Knorrie> it's the 12 months, 4 weeks, 14 days backup schedule, and yesterday I already expired everything that would expire in the next 15 days, removing about 30-40% of all subvols
<Knorrie> so, not heavily shared
<darkling> So you should be free of the obvious O(n^3) problems in metadata balance.
<darkling> (Or is it O(n^4)?)
<Knorrie> when trying this the last time, I had some which only made the kernel go 100% cpu and not recover from it
<darkling> I've got some that take 4+ hours.
<darkling> It really is pathological.
<Knorrie> right now it keeps doing 2 per minute, but I have >1TiB metadata on this fs
<darkling> It'll chomp through most of them like that. It's only a small fraction that are slow, IME.
<Knorrie> well, the ssd_spread behaviour is really, really relevant here
<Knorrie> the only thing that's slowing it down now is random disk reads
<Knorrie> Metadata, DUP: total=804.00GiB, used=537.50GiB
<Knorrie> Metadata, single: total=85.00GiB, used=10.91GiB
<flox> quick question, does it make sense to defragment files on a btrfs raid1 over SMR drives?
<Knorrie> sounds like an XY question
<Knorrie> in that I suspect there's a different question behind that question, which is about the actual problem you have
<Knorrie> oh, it was actually this one that was relevant for what I'm doing now... https://www.spinics.net/lists/linux-btrfs/msg64771.html
<Knorrie> hm, there was a delayed refs patch that would split up stuff into different lists... what was it called
<Knorrie> so that it doesn't keep traversing lists with too many items
<flox> Knorrie: a colleague of mine wrote some code to move big files to an archive storage, and after each move he calls a defragment on the file. I question the wisdom behind that.
<Knorrie> if you just wrote them to that storage, it doesn't sound like it makes a lot of sense, no
<Knorrie> unless it's dedupe and not defrag, and there are very similar files on there already
<Knorrie> ah, "[PATCH v3] btrfs: imporve delayed refs iterations"
<Knorrie> ah, looks like it got in
<Knorrie> so maybe that one is also helping a bit
<Knorrie> amazing, it keeps going at 2 chunks per minute, only slowed down by a few MiB/s of random reads, 0% cpu usage and sometimes some writes
<darkling> Knorrie: But.. but... Man Cannot Live At Such Speeds!
<Knorrie> I think it's still writing back a lot of writes into the DUP chunks instead of the new single ones, but at least it's less of a problem now
<Knorrie> Metadata, DUP: total=761.50GiB, used=527.51GiB
<Knorrie> Metadata, single: total=170.00GiB, used=20.87GiB
<Knorrie> ssd_spread is a total waste of disk space, but it's a sensible sacrifice this way
<Zygo> you're testing on 4.9.x?
<Knorrie> 4.14.17 + some yolo patches
<Zygo> so you've already got the couple-orders-of-magnitude-less-CPU backref patches
<Knorrie> yeah, I think I was looking at that patch a few lines back here
<Knorrie> oh no, that's delayed refs
<Knorrie> which patches are those?
<Knorrie> if I had my storage (and if btrfs could do it) with metadata on ssd and data on hdd, it would totally go at ludicrous speed, with tens of TiB of gazillions and gazillions of small files in tens of thousands of subvols
<Zygo> "btrfs: track refs in a rb_tree instead of a list"
<Knorrie> I remember seeing that one
<Zygo> that's also the "make bees barf a kernel trace every 1.7 seconds" patch
<Knorrie> yep, I have that one
<Zygo> but I'll take it because... faster backrefs
<Zygo> or at least backrefs with less heat output and fan noise
<darkling> Does that bring it down to O(n^2 log n) or O(n (log n)^2)?
<Zygo> not sure they're actually faster
<Knorrie> darkling: 0e0adbcfdc908684317c99a9bf5e13383f03b7ec
<Knorrie> looks like it's mostly about not doing unnecessary things
<Knorrie> "This doesn't make any sense as we likely have refs for different roots, and so they cannot be merged."
<darkling> Things are bad. Something must be done. This is something. Therefore we will do it.
<Zygo> Duncan advised Knorrie to be careful when doing device-level btrfs clones. That's... cute. ;)
<Knorrie> lun clones are in another igroup on the netapp, so split at the level of the iscsi initiator
<darkling> To be fair, he's an excellent macro bot. :)
<Zygo> "UUID aliasing is so far down the list of massive data-eating problems I'm creating for myself--and that I'm also totally immune to, because test setup--that I didn't even think of it"
<Zygo> I'm pretty sure that there's a faster way to do find_parent_nodes and that it'll make balance faster... but that's all I'm sure of.
<Zygo> clearly there are parts of balance that are slow and not find_parent_nodes, too
<Zygo> I'd think that adding or removing dup copies would be much easier in the kernel, with all the transaction logic in place, than in userspace
<Zygo> but you'd probably have to avoid the parts of the kernel that try to avoid accidentally messing with the device/block group trees
<Zygo> there are probably already pieces of this in the kernel, e.g. we can add and delete chunks/block groups already
<Zygo> but it's probably not something like delete + add in one transaction, because what if the metadata tree is in the block group you're deleting?
<Zygo> but maybe you can change a block group if you're changing nothing else about it, then create a new block group out of the chunk you're dropping, then immediately delete that
<Zygo> and somehow do that without leaking a chunk if the system crashes halfway through
<Knorrie> converting dup to single?
<Zygo> dup or raid1 to single
<Zygo> (I guess raid10 to raid0 too, maybe)
<Zygo> going the other way requires copying the data, which is somewhat harder
<coderobe> How is this possible? https://tmp.codero.be/GentleYellowCaterpillar3.txt
<coderobe> wrong link, whoops
<coderobe> this: https://tmp.codero.be/FierceMaroonEland3.txt
<Knorrie> it's not just the block group, there's also the chunk item with stripes and the dev_extent tree
<Zygo> the block group is the hard one that ties the others together
<Knorrie> I remounted with nossd for fun, and now it's indeed taking a lot longer to do 1 metadata chunk
<Knorrie> remounting to ssd_spread and BAM! chunk done, next one
<zerocool> do any of you work on btrfs?
<viric> hello btrfs!
<viric> I have 590G in my btrfs, I scrubbed it, and it finished at 429GB.
<viric> Why does scrub report 429GiB when I have 590G occupied?
<viric> single disk.
<viric> Ah, compression, maybe?
<Zygo> does 'btrfs fi df' say 429G used?
<viric> Data, single: total=583.01GiB, used=582.81GiB
<viric> scrub started at Mon Feb 5 22:57:30 2018, running for 01:18:29
<viric> total bytes scrubbed: 438.42GiB with 0 errors
<Zygo> "running for"... i.e. not finished yet
<viric> OH. ok.
<viric> I missed it.
<Zygo> the magic words are "finished after" ;)
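The distinction Zygo points out can be checked mechanically. A sketch that matches against the sample status line viric pasted above (the "finished after" wording is assumed from Zygo's remark, not verified against every btrfs-progs version):

```shell
# Tell an in-progress scrub from a finished one by the status wording.
# The sample line below is the one viric pasted in the chat.
status='scrub started at Mon Feb 5 22:57:30 2018, running for 01:18:29'
case "$status" in
    *"finished after"*) echo "scrub finished" ;;
    *"running for"*)    echo "scrub still running" ;;
    *)                  echo "unknown status" ;;
esac
```

Running it on the sample line reports that the scrub is still in progress, matching Zygo's reading.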
<viric> it makes sense now that I know it.
<viric> scrub affects the whole device, right? even if I supply only a subvolume
<Knorrie> zerocool: work?
<Knorrie> some in here work on btrfs full time, and some occasionally. like, if you run into a problem, figure out why it happens yourself, and think of a solution, you can just send a patch to the btrfs mailing list and see what the "official" developers think of it
<Knorrie> I also see it as a fun way to learn a lot of things