raynold | ahh it's a wonderful day |
luke-jr | [ 2009.041563] WARNING: CPU: 3 PID: 1289 at fs/fs-writeback.c:2339 __writeback_inodes_sb_nr+0xbc/0xd0 |
luke-jr | is there a solution to this yet? :/ |
luke-jr | how dangerous is it? |
Mo | Zygo: "not broken enough to RMA" :) |
schmittlauch_ | Well, is it a normal thing that btrfs check itself crashes while trying to repair an FS? |
Ke | it at least used to be normal, but obviously not a good thing |
opty | big fs? |
Ke | schmittlauch_: if you want to be a good lobbyist, post the log to btrfs mailing list |
Ke | schmittlauch_: 1) get a recent enough btrfs-progs 2) get it with debug symbols 3) run inside gdb 4) once it crashes, run backtrace full |
Ke | but I think it mostly crashes on assertions, in which case you just need to share the assertion and perhaps be prepared to share the metadata image |
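A minimal console sketch of Ke's steps 1-4 (the device path is hypothetical):

    $ gdb --args btrfs check --repair /dev/sdX
    (gdb) run
    ... wait for the crash ...
    (gdb) backtrace full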
opty | refining... big fs and 32-bit system? :) |
Ke | schmittlauch_: in general things get better based on the number of complaints people post on them |
Ke | (and obviously I don't mean spamming but valid complaints) |
schmittlauch_ | Ke: question is, how do I best do that if I currently can't use my system because I'm talking about my rootfs here? |
Ke | that's nasty then |
schmittlauch_ | (I guess qgroups are to blame for wrecking the FS) |
schmittlauch_ | Ke: I can provide this wonderful, probably useless kernel trace: http://paste.opensuse.org/79318387 |
Mo | Hi, sending snapshots to a UAS device runs at about 54.9KiB/s :( How can I improve that? I already tried balancing with the btrfs-maintenance default values. defrag is no solution as I would lose all reflinks of the snapshots. |
Mo | The device has "only" 66 subvolumes being snapshots. |
Mo | Mounted like this: rw,noatime,nodiratime,sync,compress-force=zlib,nossd,noacl,space_cache,autodefrag,subvolid=5,subvol=/ |
Mo | sync because it's a "mobile" device. I also have hdparm -W0 set to disable the write cache. But toggling those two doesn't affect the performance much. |
Ke | sync is definitely expected to change the performance, also disabled write cache |
Mo | Ke: Ok, but switching those doesn't solve that performance issue. |
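For reference, a sketch of the two toggles Mo mentions (mountpoint and device are hypothetical):

    # remount without -o sync and re-enable the drive's write cache
    mount -o remount,async /mnt/backup
    hdparm -W1 /dev/sdX
    # and back to the original settings
    mount -o remount,sync /mnt/backup
    hdparm -W0 /dev/sdX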
opty | small files? |
Ke | Mo: yup noticed that, but that is very unexpected |
Ke | not sure how slow is your zlib compression |
Mo | opty: It's snapshots of a / root filesystem and a /home, nothing special. |
Mo | I remounted -o async and hdparm -W1, no difference. |
Mo | I've chosen zlib compression because that USB device is already slow, and the CPU is a corei7 mobile, 2 cores, 4 with HT. |
Mo | And performance usually doesn't matter as I only send snapshots to it. But at 54.9KiB/s that takes hours... |
Mo | Quite sure the fragmentation is worse, but defragmenting takes me days or weeks, as I need to defrag all the snapshots, remove the ro flags, manually deduplicate everything, and set the ro flags back. |
Mo | I like to avoid that. |
Mo | btw, autodefrag: does that require a filesystem to be mounted for a while? As I usually umount immediately after the send. So are there any background services cleaning the device while idle? |
Ke | yeah, it might |
multicore | unmounted fs isn't idle |
opty | darkling: out of curiosity, what timestamps? |
Mo | Hi, looking at https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f51d2b59120ff364a5e612a594ed358767e1cd09 "btrfs: allow to set compression level for zlib": What is the default compression level for compress=zlib? |
Mo | Usually I preferred compress-force as the heuristic was very simple, but there have been changes in 4.15 improving the compression heuristic. Should I switch back to compress= ? |
Mo | https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5cea7647e64657138138a3794ae172ee0fc175da |
Mo | Are new zlib compression levels backwards compatible, i.e. is data written with compress=zlib9 still readable by an older kernel that only knows compress=zlib? I guess so. |
Mo | multicore: But btrfs cleaner processes don't touch a btrfs not mounted, do they? |
multicore | Mo: an unmounted fs isn't idle, it's unmounted, so... |
Mo | multicore: The question was whether I need to keep the btrfs mounted for longer, and whether there are maintenance processes on a mounted btrfs that get blocked if I umount it directly after send/receive. |
Mo | I did not find many notes about autofs + btrfs. |
Ke | I use autofs, but I have transient "lockups" |
Mo | What does that mean? |
demfloro | Mo: if I'm reading code right default level is 3 for zlib |
demfloro | +workspace->level = level > 0 ? level : 3; |
demfloro | in zlib_set_level() |
demfloro | i.e. compress=zlib, compress=zlib0 and compress=zlib3 are the same |
demfloro | old kernels don't accept compress=zlibN parameter, where N is a number |
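Illustrating demfloro's reading, assuming the compress=zlibN mount syntax discussed here (mountpoint hypothetical):

    # all three select zlib level 3 on a kernel with the level patch
    mount -o remount,compress=zlib /mnt
    mount -o remount,compress=zlib0 /mnt
    mount -o remount,compress=zlib3 /mnt
    # explicit maximum level; older kernels reject the zlibN form at mount time
    mount -o remount,compress=zlib9 /mnt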
UukGoblin | hello, `touch /var/lib/lxc/lxc1/rootfs/foo` unexpectedly results in "Device or resource busy". It's a subvolume on @ (which is mounted at /) |
UukGoblin | the lxc container is not yet started |
UukGoblin | (well it was started but I stopped it to debug this issue) |
UukGoblin | what can this EBUSY error indicate? |
UukGoblin | I had it earlier and thought it was because I moved the subvolume from /lxc1 to @/var/lib/lxc/eth1/rootfs (the LXC config seems to work better that way). It somehow went away after a reboot, but now after another reboot it's returning that busy error again. |
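A few checks that can help narrow down an EBUSY like this (paths as in UukGoblin's description):

    # is anything still mounted at or below the container rootfs?
    findmnt -R /var/lib/lxc/lxc1/rootfs
    # is the subvolume flagged read-only?
    btrfs property get /var/lib/lxc/lxc1/rootfs ro
    # confirm where the subvolume actually lives
    btrfs subvolume list /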
Knorrie | \o/ |
UukGoblin | \m/ |
Kobaz | mm |
Knorrie | I'm going to retry converting DUP metadata to single with ssd_spread behaviour for the metadata extent allocator |
Knorrie | holy moly, first two done in 1 minute |
Knorrie | \:D/ |
Knorrie | instead of 3 hours per chunk |
darkling | What are you breaking now, Knorrie? |
Knorrie | a time record |
Knorrie | ;] |
darkling | The poor thing. :) |
Knorrie | I patched this 4.14 kernel to use tetris for data and spread for metadata |
Knorrie | and limited single metadata chunks to 512M so that convert does not replace every 512M DUP with a 1GB single |
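The conversion Knorrie describes boils down to a rebalance with a convert filter; a minimal sketch with a hypothetical mountpoint (reducing metadata redundancy requires -f):

    # convert DUP metadata chunks to single; -f acknowledges the redundancy loss
    btrfs balance start -f -mconvert=single /mnt
    # watch progress from another shell
    btrfs balance status /mnt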
darkling | And now you're playing sokoban with it? |
Knorrie | and while doing that submitted the chunk allocator patch for DUP |
Knorrie | now, hopefully it keeps going this fast, but I'm afraid there are some monsters waiting in the low-vaddr metadata chunks |
darkling | Any heavily-shared extents? |
Knorrie | it's the 12 months, 4 weeks, 14 days backup schedule, and yesterday I already expired everything that would expire in the next 15 days, removing about 30-40% of all subvols |
Knorrie | so, not heavily shared |
darkling | So you should be free of the obvious O(n^3) problems in metadata balance. |
darkling | (Or is it O(n^4)?) |
Knorrie | when trying this the last time, I had some which only made the kernel go 100% cpu and not recover from it |
Knorrie | https://www.spinics.net/lists/linux-btrfs/msg70624.html |
darkling | I've got some that take 4+ hours. |
darkling | It really is pathological. |
Knorrie | right now it keeps doing 2 per minute, but I have >1TiB metadata on this fs |
darkling | It'll chomp through most of them like that. It's only a small fraction that are slow, IME. |
Knorrie | well, the ssd_spread behaviour is really really relevant here |
Knorrie | only thing that's slowing it down now is random disk reads |
Knorrie | Metadata, DUP: total=804.00GiB, used=537.50GiB |
Knorrie | Metadata, single: total=85.00GiB, used=10.91GiB |
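(That per-profile breakdown is the metadata portion of the output of:)

    # per-profile allocation summary; /mnt is hypothetical
    btrfs filesystem df /mnt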
m4t | :o |
flox | quick question, does it make sense to defragment files on a btrfs raid1 over smr drives? |
Knorrie | sounds like an XY question |
flox | meaning? |
Knorrie | that I suspect there's a different question behind that question, which is about the actual problem you have |
Knorrie | oh it was actually this one that was relevant for what I'm doing now... https://www.spinics.net/lists/linux-btrfs/msg64771.html |
Knorrie | hm, there was a delayed refs patch that would split up stuff in different lists... what was it called |
Knorrie | so that it doesn't keep traversing lists with too many items |
flox | Knorrie: a colleague of mine wrote some code to move big files to an archive storage, and after each move he calls a defragment on the file. I question the wisdom behind that. |
Knorrie | if you just wrote them to that storage, it doesn't sound like it makes a lot of sense, yes |
Knorrie | unless it's dedupe and not defrag and there are very similar files on there already |
Knorrie | ah "[PATCH v3] btrfs: imporve delayed refs iterations" |
Knorrie | imporve... |
Knorrie | ah looks like it got in |
Knorrie | so maybe that one is also helping a bit |
Knorrie | amazing, it keeps going with 2 chunks per minute, only slowed down by a few MiB/s random reads, 0% cpu usage and sometimes some writes |
darkling | Knorrie: But.. but... Man Cannot Live At Such Speeds! |
Knorrie | I think it's still writing back a lot of writes into the DUP chunks instead of the new single, but at least it's less of a problem now |
Knorrie | Metadata, DUP: total=761.50GiB, used=527.51GiB |
Knorrie | Metadata, single: total=170.00GiB, used=20.87GiB |
Knorrie | ssd_spread is a total waste of disk space, but it's a sensible sacrifice this way |
Zygo | you're testing on 4.9.x? |
Knorrie | 4.14.17 + some yolo patches |
Zygo | ah |
Zygo | so you've already got the couple-orders-of-magnitude-less-CPU backref patches |
Knorrie | yeah I think I was looking at that patch a few lines back here |
Knorrie | oh no, that's delayed refs |
Knorrie | which patches are those? |
Knorrie | if I had my storage (and if btrfs could do it) with metadata on ssd and data on hdd, it would totally go at ludicrous speed with tens of TiB of gazillions and gazillions of small files in tens of thousands of subvols |
Zygo | "btrfs: track refs in a rb_tree instead of a list" |
Knorrie | aha |
Knorrie | I remember seeing that one |
Zygo | that's also the "make bees barf a kernel trace every 1.7 seconds" patch |
Knorrie | yep, I have that one |
Zygo | but I'll take it because...faster backrefs |
Zygo | or at least backrefs with less heat output and fan noise |
darkling | Does that bring it down to O(n^2 log n) or O(n (log n)^2) ? |
Zygo | not sure they're actually faster |
Knorrie | darkling: 0e0adbcfdc908684317c99a9bf5e13383f03b7ec |
Knorrie | looks like not doing unnecessary things |
Knorrie | "This doesn't make any sense as we likely have refs for different roots, and so they cannot be merged." |
darkling | Things are bad. Something must be done. This is something. Therefore we will do it. |
darkling | ;) |
Zygo | Duncan advised Knorrie to be careful when doing device-level btrfs clones. That's...cute. ;) |
Knorrie | lun clones are in another igroup on the netapp, so split on the level of iscsi initiator |
darkling | :) |
darkling | To be fair, he's an excellent macro bot. :) |
Zygo | "UUID aliasing is so far down the list of massive data-eating problems I'm creating for myself--and that I'm also totally immune to, because test setup--that I didn't even think of it" |
Zygo | I'm pretty sure that there's a faster way to do find_parent_nodes and that it'll make balance faster...but that's all I'm sure of. |
Zygo | clearly there are parts of balance that are slow and not find_parent_nodes, too |
Zygo | I'd think that adding or removing dup copies would be much easier in the kernel with all the transaction logic in place than in userspace |
Zygo | but you'd probably have to avoid parts of the kernel that try to avoid accidentally messing with the device/block group trees |
Zygo | there are probably already pieces of this in the kernel, e.g. we can add and delete chunks/block groups already |
Zygo | but it's probably not something like delete + add in one transaction, because what if the metadata tree is in the block group you're deleting? |
Zygo | but maybe you can change a block group if you're changing nothing else about it, then create a new block group out of the chunk you're dropping, then immediately delete that |
Zygo | and somehow do that without leaking a chunk if the system crashes half way through |
Knorrie | converting dup to single? |
Zygo | dup or raid1 to single |
Zygo | (I guess raid10 to raid0 too, maybe) |
Zygo | going the other way requires copying the data which is somewhat harder |
coderobe | How is this possible? https://tmp.codero.be/GentleYellowCaterpillar3.txt |
coderobe | wrong link whoops |
coderobe | this, https://tmp.codero.be/FierceMaroonEland3.txt |
Knorrie | it's not just the block group; there's also the chunk item with its stripes and the dev_extent tree |
Zygo | block group is the hard one that ties the others together |
Knorrie | I remounted to nossd for fun, and now it's indeed taking a lot longer to do 1 metadata chunk |
Knorrie | remounting to ssd_spread and BAM! chunk done, next one |
zerocool | do any of you work on btrfs? |
viric | hello btrfs! |
viric | I have 590G in my btrfs, I scrubbed it, and it finished at 429GiB. |
viric | Why does scrub report 429GiB and I have 590G occupied? |
viric | single disk. |
viric | Ah, compression, maybe? |
Zygo | does 'btrfs fi df' say 429G used? |
viric | Data, single: total=583.01GiB, used=582.81GiB |
viric | scrub started at Mon Feb 5 22:57:30 2018, running for 01:18:29 |
viric | total bytes scrubbed: 438.42GiB with 0 errors |
Zygo | "running for"...i.e. not finished yet |
viric | OH. ok. |
viric | I missed it. |
Zygo | the magic words are "finished after" ;) |
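So the thing to check in the scrub status header (illustrative session, mountpoint hypothetical):

    $ btrfs scrub status /mnt
    # while still in progress, the summary line reads "... running for HH:MM:SS"
    # once complete, it reads "... finished after HH:MM:SS"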
viric | it makes sense now that I know it. |
viric | scrub affects the whole device, right? even if I supply only a subvolume |
Knorrie | zerocool: work? |
zerocool | develop |
Knorrie | some in here are full-time btrfs workers, and some occasional. Like, if you run into a problem, find out why it happens yourself, and think of a solution, you can just send a patch to the btrfs mailing list and see what the "official" developers think of it |
Knorrie | I also see it as a fun way to learn a lot of things |
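A minimal sketch of Knorrie's suggestion; the list address is the real one, the rest is illustrative:

    # turn the top commit into a patch mail and send it to the btrfs list
    git format-patch -1
    git send-email --to=linux-btrfs@vger.kernel.org 0001-*.patch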