Knorriesometimes if the situation is really stuck, it can help to temporarily attach some small block device (like usb drive or even ram disk) to get balance going to compact the data a bit and free up empty raw space
NSAyeah that's what i'm doing right now
NSA32gb usb stick
Knorriebut just doing balance on a metadata chunk will try to "help you a bit" and convert it from single to raid1 in that case, which totally screws up the situation
NSAlol shotgut heatmaps
Knorrieso if you remount with nossd (which increases the chance existing empty space will be filled up) and point balance at a chunk that is the least filled, the chance will be the highest that you get some unallocated raw space back
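The sequence Knorrie describes can be sketched as a dry run that only prints the commands. The mount point /mnt, the device name /dev/sdX and the 5% usage threshold are assumptions for illustration, not values from the conversation above. Only data chunks are balanced, since balancing metadata while the extra device is attached is what triggers the unwanted single-to-raid1 conversion.

```shell
# Print (do not execute) the rescue sequence for a filesystem with no
# unallocated space left. Both arguments are hypothetical placeholders.
plan_rescue() {
    mnt=$1
    tmpdev=$2
    # nossd makes it more likely that existing empty space gets filled.
    echo "mount -o remount,nossd $mnt"
    # Temporarily attach a small device so balance has room to work.
    echo "btrfs device add $tmpdev $mnt"
    # Data chunks only: -dusage=5 selects data chunks at most 5% full.
    echo "btrfs balance start -dusage=5 $mnt"
    # Detach the temporary device once raw space has been freed.
    echo "btrfs device remove $tmpdev $mnt"
}

plan_rescue /mnt /dev/sdX
```

Review the printed commands before running any of them by hand; raising the usage threshold step by step is a common way to proceed if 5% frees nothing.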
pythonhello, i am very depressed as my btrfs partition stopped working, can anyone help. please
pythonroot@ubuntu:/home# mount -orecovery /dev/sda4 /mnt mount: wrong fs type, bad option, bad superblock on /dev/sda4, missing codepage or helper program, or other error In some cases useful info is found in syslog - try dmesg | tail or so.
python# btrfs check --repair /dev/sda4 enabling repair mode warning devid 2 not found already bytenr mismatch, want=169929834496, have=0 bytenr mismatch, want=169761996800, have=0 Couldn't open file system
NSApython: if you put that on a pastebin site it'll be more readable
NSAi can't help you with that, i'm just trying to help make it more readable for someone who can
pythonaa okay
NSAand it's pretty much always better if you paste text to a text-pastebin site than taking screenshots
NSAthat's much better, good luck with your problem
pythonis there anyone online who could help and save a life here lol
NSAgive it a bit, patience usually helps
pythonanyone online can help,
NSAyou could probably also check `dmesg | tail`
NSAi'm guessing EU people went to bed now
NSAif there's nobody who can help you in USTZ I'd recommend you trying at a more EU friendly time
pythonaaaa okay
NSAwon't hurt to stay in here though, someone might come back later and see your messages
pythonyeah ill stay here i cant sleep anyway thinking of a data loss
pythonhere is the updated gist,
pythonwith logs
nebul8is it correct, that the nodatacow mount option can _not_ be set per subvolume, but only per filesystem?
nebul8(wiki mentions both ways somehow)
pythonif anyones up and can help me with btrfs please let me know. thanks
xnxspython: is the drive ok? was this part of a multi device volume? what happened that caused the issue?
python@xnxs laptop's battery died while it was asleep
pythonafter the reboot home partition couldnt be mounted and it was using btrfs
pythontake a look at that it has some logs too
pythonthe drive is ok i guess
xnxsi'm assuming that there was stuff that still needed to be written to disk when the computer went to sleep...
pythoni have no clue, because i shut the lid of the laptop and then didnt use the laptop for 8-9 hours
pythonat that point battery died
pythonso when i opened the laptop again i put a charge and the issue was there
xnxsis there important stuff on the drive?
xnxsie. stuff that's not backed up/replaceable?
python@xnxs yes there is
pythoni couldnt sleep because of that
xnxsdid you at least make a dd copy of the partition to another disk before doing any of the stuff to the volume
pythonyes i did
python Just to verify, the command is dd if=/dev/sda of=/new/path
xnxshave you tried btrfs restore?
pythonjust did, got this
xnxshow big is sda4?
pythonsorry 88gb
xnxstry the btrfs-find-root?
xnxsor specify the second superblock to see if it's different
python"second superblock to see if it's different" --> what would be the command?
xnxswas this in a raid?
pythoni dont think so
pythonim not sure tbh
xnxsbtrfs restore -u 1 whateverljiohihil
pythonbtrfs restore -u 1 whateverljiohihil btrfs restore: too few arguments
xnxsthe whateveroijpovih should have been replaced with whatever fits your situation
pythonoh lol
pythonthe highest would be the first in "well block" right?
python# btrfs restore -u 1 95975227392 btrfs restore: too few arguments only have two at 64k and 64M
xnxsbtrfs restore -u 1 /dev/sda4 /mnt/whateveritwasthatyouwantedhere
pythonsorry im new to this btrfs so im slow in this
xnxsme too for the most just reading things
xnxsand you really can't fuck it up more than you already have since you have a bitwise copy of the partition
xnxsbtrfs restore /dev/sda /mnt/restore -i
xnxssee if ignoring errors will continue
pythonokie sec
pythonnop :(
pythonCould not open root, trying backup super thinks the device is 274G
pythonbasically im fucked ? :(
xnxsdid you by chance try mounting degraded yet?
pythonyes i tried didnt work
MoHi, I can't get a subvolume sent/received: ERROR: send ioctl failed with -25: Inappropriate ioctl for device
multicore"ERROR: not a subvolume: /mnt/btrfs/snapshots/home ERROR: Failed to get subvol info /mnt/btrfs/snapshots/home: 1" <- not a subvol
RakkinI ran btrfs check /dev/sda2 once and it reported "cache and super generation don't match, space cache will be invalidated", then I ran btrfs check again and it still reported unmatched cache
Rakkinwhen will cache be invalidated?
xnxsthere is a 'clear_cache' mount option
Rakkinok, thanks
cebeweehi everyone, my machine died yesterday by OOM after moving around 1TB of data from one btrfs filesystem to another one and deleting around a 100 snapshots. Kernel was Debian's 4.9.0-4 -- is this an issue worth reporting?
xnxsDepends. Do you have details devs would need to replicate and determine if the OS/FS is handling OOM conditions sanely?
RakkinBy the way, will it make sense to make a btrfs raid 1 over ssd and hdd? What I want from raid 1 is protection from "corrupt leaf, bad key order" I had such an issue yesterday and I don't want to experience it again.
cebeweeI have the kernel logs and I might be able to describe the kind of data I was moving around -- what kind of information do you need?
xnxsi wouldn't need anything because i don't do any of that
xnxsRakkin: i dont think ssd/hdd pairs would be good.
Momulticore: So I need to create the subvolume first on the target and use that as target path?
RakkinThat's a pity. :( I don't have two ssds.
Momulticore: I tried that as well, but still failing:
multicoreRakkin: "corrupt leaf, bad key order" raid doesn't help you with this error
Rakkintoo bad
multicoreRakkin: it's a memory problem 99% of the time
RakkinI ran memtest for 16 hours with no errors.
xnxsi use ECC
Rakkini wish the machine where i encountered the error supported ecc
multicoreRakkin: patched btrfs-progs could (possibly) fix bad key order issue
multicorenot sure if any of bitflip patches are merged to btrfs-progs
RakkinWell, it looks like mainline btrfs-progs fixed it for me this time. I haven't booted into the system yet, but "btrfs check --readonly" doesn't find that problem anymore.
multicoreit's possible
Rakkin - log of check --repair
multicoreMo: i've never used send/receive so can't help you on that one...
darklingMo: The problem is on the send side (which it tells you in the error: "send ioctl failed"), not the receiving side.
darklingAnd "inappropriate ioctl" means that the parameter to the send command isn't the right kind of thing. So, is $sub1 on a btrfs filesystem? Is it a subvolume? Is it a read-only snapshot?
darklingmulticore: The bitflip patches aren't merged yet.
Modarkling: There is a subvolume show in the paste, does that have all this information? Yes, it is a readonly snapshot.
Modarkling: I need to do that from a Live system for migrating my subvolumes. So I wondered if the btrfs-progs are compatible, but it's a recent kernel and progs:
Mo4.9.47-std510-amd64, btrfs-progs v4.9.1.
multicoredarkling: ok
Modarkling: What is wrong on the send parameters?
RakkinWould send/receive between ssd and hdd on the same machine save me from corrupt leaves? In other words, if the partition on the ssd dies one day, would I be able to simply copy the backup created an hour ago from the hdd which is on the same machine (wouldn't it be corrupt too)?
KeRakkin: send/receive should never replicate any corruption that would show up as IO errors from the kernel interface
Kehowever, if your kernel silently gives out invalid data the invalid data could be replicated
Kelatter should be less relevant, in my assumption, but I don't have actual data on that
Kein my technical opinion send/receive is slightly safer/more isolated than raid1 mode
RakkinThanks. I'll probably set up send/receive.
RakkinJust in case another leaf gets corrupted. Though, I've never had such problems in a couple of years since I started using btrfs.
ZygoRakkin: raid1 between hdd and ssd can't be good anyway (it'd be limited by the hdd performance, wasting the ssd), so send/receive between two drives configured as separate filesystems should be much better than raid1
Zygodo use dup metadata though, because bit flips suck
Rakkindup metadata? Is it a send/receive option?
Zygono, a mkfs or btrfs-balance-convert option
Rakkinok, I'll check that
Zygothe default for single detected hdd's is dup metadata, but for detected ssd's it's "single" which is one bit-flip away from destruction
Zygo'btrfs balance -mconvert=dup,soft /fs' will convert existing non-dup metadata to dup
Zygo'btrfs balance start -mconvert=dup,soft /fs' will even work without a snarky error message ;)
Zygofor multi-disk filesystems always use raid1 for metadata
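A small sketch of how one might check whether Zygo's conversion applies: it reads `btrfs filesystem df` output and prints the metadata profile. The sample text is hypothetical, written in the usual "Metadata, PROFILE: total=..., used=..." layout, and /fs is the placeholder path from the messages above.

```shell
# Read `btrfs filesystem df <path>` output on stdin and print the
# profile of the first Metadata line (for example single, DUP, RAID1).
metadata_profile() {
    sed -n 's/^Metadata, \([^:]*\):.*/\1/p' | head -n 1
}

# Hypothetical sample output, for illustration only.
sample='Data, single: total=8.00GiB, used=6.50GiB
System, single: total=32.00MiB, used=16.00KiB
Metadata, single: total=1.00GiB, used=512.00MiB'

profile=$(printf '%s\n' "$sample" | metadata_profile)
if [ "$profile" = "single" ]; then
    echo "metadata is single; consider: btrfs balance start -mconvert=dup,soft /fs"
else
    echo "metadata profile is $profile"
fi
```

On a real system the input would come from `btrfs filesystem df /fs` rather than the sample string.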
darklingNote that most "bad key order" errors really _are_ in RAM, though, and no amount of RAID is going to deal with that case.
darklingThe RAID robustness deals with the case where there's an error on the physical medium at rest, when the medium itself doesn't detect the problem.
darkling(Disks have checksums, too)
Zygotrue, if you're getting bad key orders you need to replace/upgrade your RAM subsystem
Zygobut disks also have RAM
darklingOr look very hard at your power regulation.
Zygodiverse redundancy ftw
darklingDi less redundnacy you have, di verse it gets?
Zygoit's the ISO version of 'don't put all your eggs in the same (brand of) basket'
RakkinI use coreboot, so my problem might be in wrong low-level settings of sata controller or even the pch. It might be not RAM itself.
btrfsoopsGot an Internal error Oops Bug for you devs:
Zygoif you get "bad key order" but not "wrong CRC/checksum" then it's not any part of the system outside of the CPU and RAM
darklingbtrfsoops: Probably best to take that to the mailing list.
RakkinI ran memtest for 16 hours with no errors. Perhaps I should stress it more.
darklingWe don't get many developers active in here.
ZygoI wonder how expensive it would be to just run a quick check on metadata pages and panic/remount-ro if a bad key order is detected?
darklingZygo: It effectively does that already.
darklingOr do you mean check all of them in one go?
Zygocheck them after the checksum is computed but before writing them
Zygoit happens often enough that we can spot it in IRC, so it might be worth coding for
darklingProbably pretty fast.
darklingThe data's already in RAM, it should already be sorted, and the data structure that needs verifying is a fixed unit size.
darklingSo you can just skip along in 25-byte units (or however big it is), and check that each one is larger than the previous one.
darklingIt's likely to be lost in the noise of the effort to actually write stuff to permanent storage.
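The check darkling outlines, walking the items and confirming each key is larger than the previous one, can be illustrated on a text list of keys. This is only a sketch under assumptions: keys are given one per line as decimal "objectid type offset" and compared in that order, and the sample values are made up. Real leaves are binary, and awk's floating-point numbers cannot represent every 64-bit value exactly, so this shows the logic rather than a usable verifier.

```shell
# Read one key per line ("objectid type offset", decimal) and report the
# first key that is not strictly greater than its predecessor.
# Exit status is 0 if all keys are in order, 1 otherwise.
check_key_order() {
    awk '
        {
            o = $1 + 0; t = $2 + 0; f = $3 + 0
            if (NR > 1 && (o < po || (o == po && (t < pt || (t == pt && f <= pf))))) {
                printf "bad key order at line %d\n", NR
                bad = 1
                exit
            }
            po = o; pt = t; pf = f
        }
        END { exit bad }
    '
}

# Hypothetical leaf with keys in order.
printf '256 1 0\n256 12 256\n257 1 0\n257 108 0\n' | check_key_order &&
    echo "keys in order"

# Same leaf with one bit flipped in the third objectid (257 -> 1281).
printf '256 1 0\n256 12 256\n1281 1 0\n257 108 0\n' | check_key_order ||
    echo "leaf failed the check"
```

The work is a single linear pass with a constant amount of state, which fits darkling's expectation that the cost would be small compared to the write itself.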
Zygohunh...I have a drive that has been running the "short" SMART self-test since October 11
darklingCould be worse.
darklingAt least it's not October '11
optybig one
Zygothe firmware is prioritizing host IO requests over self-test execution quite well, it seems
Zygoit's keeping up with its application workload (including btrfs scrubs), so...meh
Zygo"10% of test remaining." oh good we'll have the results by Christmas
darklingWestern calendar or Eastern? :)
Zygodoes it matter if I don't specify which year? ;)
Ketimes expressed as stages of the life cycle of the sun
Kesomewhere early in the red giant stage
darklingNo, you have to specify a year.
Zygoas long as the test completes before the drive crumbles into dust, it's all good
darklingJust not necessarily which calendar you're using.
Zygoand if it doesn't, well, "fail" is a result too
Zygobees has two developers who a) aren't me and b) are talking to each other
Zygowooo it's almost like a real open source project now
darklingWell done. :)
darklingACTION wonders what Zygo put in the coffee
Zygohoney! ;)
optywhat is correct, "top level subvolume" or "top-level subvolume"?
darklingProbably the latter, although I don't think it makes much difference these days.
urmet"subvolume with the ID number of five"
darklingOr, "Rodney", as we call him.
optyrodney mckay? :)
darklingI was thinking Trotter, not McKay.
pythonif anyones up and can help me with btrfs please let me know. thanks
bccanyone seen this before:
bccjust randomly happened on my desktop
bcci guess balance it
darklingbcc: Also, is the "ssd" option enabled on that FS? If so, you may want to turn it off (with "nossd") until 4.14 comes out.
pythonif anyones up and can help me with btrfs please let me know. thanks
darklingpython: Do you have a missing device in that filesystem? If so, try mounting with -o degraded.
darklingAlso, "bytenr mismatch, want=169929834496, have=0" doesn't look good, as it implies that you've got a bunch of zeroes somewhere important.
python@darkling i tried mounting with -o let me paste u the log
pythonit happened while laptop was asleep and battery died
pythonwhen i turned it on, it couldnt mount
darklingWhat kernel is this? Also, is it an SSD?
pythonnot sure on SSD
pythonubuntu 15.04
darklingThat's the OS version. I asked about the kernel. :)
pythoni think its 4.6
pythoni hope i can recover the data :(
darklingThat's pretty old, but not in the "dangerously antique" range.
darklingThe "bytenr mismatch" error, where it's read a load of zeroes off the disk where there should be some metadata, suggests a problem with trim on an SSD.
darklingDo you know if you had the "discard" mount option set?
pythonif it is by default
pythonthen it was
pythonotherwise i didnt set anything manual
pythoni setup'd the btrfs while installing ubuntu (my bad)
darklingI don't know what Ubuntu do for their defaults there.
pythonlet me google
pythonand exact kernel was this , (Ubuntu 4.2.0-42.49-generic 4.2.8-ckt12)
pythoni checked the log files
pythonto find that
pythonbut only /home folder was btrfs
darklingThen the mount options for it should be readable in /etc/fstab
darklingOK, so it's probably not a trim problem.
darklingProbably one of either the old kernel, or slightly buggy hardware. (Either could be the case -- I'd probably point at the hardware here, given the nature of the error messages).
pythonit was installed on a macbok pro
pythonmacbook* pro
pythonwould changing the hardware and cloning the disks wud work?
darklingIt might change the nature of the errors you get in the future.
darklingIt won't fix the damage that's been done to this FS.
pythonaaa ill change the laptop once i recover the filesystem
darklingI would recommend putting in a 4.13 kernel as well.
darklingKeep it reasonably up to date (6-9 months or so, probably)
pythonso i use the recovery option on boot, login as root and update the kernel ? apt-get update and upgrade ?
darklingNo need to use the recovery option -- you can do it on a normally-running system.
darklingYou need to reboot after the upgrade, though, and you may find that you'll need to enable one of the backports repos to get a recent kernel.
pythonnormally-running system -- problem is i can run the OS but then i can login as guest only cant login as user because /home dir doesnt exist
pythonright now i am on a live cd
darklingRecovering the FS is probably your first priority.
darklingThe only thing I can suggest at this point is btrfs restore.
darklingYou may be able to get further with btrfs-find-root to identify older versions of the metadata to use, but it's probably not likely to work.
pythondoes this mean all data is lost?
darklingWell, sort that by the "gen" value (the first one on each line, in brackets),
darklingand then the block number (first number on each line) is the important number to pass to btrfs restore.
darklingAnd... (one sec)
darkling... it's the -t option that you use to pass the block number in.
darklingYou want to start with the values corresponding to the largest gen value, and work backwards.
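darkling's procedure can be sketched as follows: pull the block number and generation out of each "Well block" line of btrfs-find-root output, sort newest generation first, and print one `btrfs restore -t` command per candidate. The first sample line is the one quoted in this conversation; the other two block numbers are made up for illustration, and /mnt/restore is an assumed target directory. The commands are printed, not executed.

```shell
# Read btrfs-find-root output on stdin and print "gen block" pairs,
# highest generation first.
roots_by_gen() {
    sed -n 's/^Well block \([0-9][0-9]*\)(gen: \([0-9][0-9]*\) level: [0-9][0-9]*).*/\2 \1/p' |
        sort -k1,1nr -k2,2nr
}

# Sample input: first line from the log, the rest hypothetical.
sample_output() {
    cat <<'EOF'
Well block 95970000896(gen: 7015341 level: 1) seems good, but generation/level doesn't match, want gen: 7015344 level: 1
Well block 95975227392(gen: 7015343 level: 1) seems good, but generation/level doesn't match, want gen: 7015344 level: 1
Well block 95972000768(gen: 7015342 level: 1) seems good, but generation/level doesn't match, want gen: 7015344 level: 1
EOF
}

# Print the restore attempts in the order they should be tried.
sample_output | roots_by_gen | while read -r gen block; do
    echo "btrfs restore -t $block /dev/sda4 /mnt/restore   # gen $gen"
done
```

Each candidate is tried in turn until one of them lets restore read the tree; older generations may be missing the most recent changes.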
pythonaha okay wait
darklingI need to go and find some food.
darklingI'll be back in a few hours.
pythonill try to do what u said
pythonbut do let me know when ur back :D
pythoni cant sleep cuz of data loss
pythonso from Well block 95975227392(gen: 7015343 level: 1) seems good, but generation/level doesn't match, want gen: 7015344 level: 1
pythoni need 7015343 right?
pythoni used this cmd, for i in `tac /root/000-btrfs-find-root.1 | grep 'Well block' | awk '{print $4}'`; do echo "--- Well block $i ---"; done;
pythonand i got a list of those numbers
bccso got out of space issue. rebooted pc and get open ctree failed. Now in rescue, can't mount with recovery without getting the same error. Running btrfs check comes back with inode issues... I'm a bit worried to run --repair..
bccsafe to run --repair?
multicorebcc: repair won't fix those transid faileds
bccmulticore: okay. I assume that happened when my pc went into read only mode
multicorebcc: try mounting with -ousebackuproot
bccmulticore: didn't work..
bccmulticore: maybe try mount as rw
bccmulticore: i guess if i can get it to mount then can run scrub
multicorebcc: scrub won't help (unless it's raid...)
bccmulticore: ack
bccmulticore: can't get it to mount
multicorebcc: you can use "btrfs restore" to restore files from the fs but note that it's possible you'll get corrupted files from it...
bccmulticore: that sucks.. so btrfs ran out of metadata space, sent my pc into read only and got itself in a mess
multicorebcc: kernel version?
bccmulticore: ran --repair and able to mount
bccmulticore: Metadata = total=10.01G, used=9.39G
multicore--repair fixed the transid verify faileds ?
multicorebcc: did you try running check --readonly after the --repair?
bccmulticore: --repair fixed transid verify
bccmulticore: i didnt run --readonly
bccmulticore: i mounted with usebackuproot
multicoreit's possible that --repair fixed the other issues and usebackuproot fixed the transid verify problem
bccmulticore: what should I run now? balance with dusage=5?
multicoreit's possible that those transid problems are still there
multicorebcc: remount with nossd
multicorebcc: then balance with dusage or dlimit so you'll have at least 1GB unallocated space, check with btrfs fi usage
bccmulticore: and usebackuproot?
multicorebcc: -oremount,nossd
bccmulticore: okay. it seems in order
bccmulticore: i ran dusage=5. free = 105G, metadata 10G total with 9.39G used.. which seems bad?
bccmetadata ratio is 1.00
multicorebcc: look at the unallocated space
multicorebcc: you'll need to have >=1GiB unallocated so metadata has space to grow
bccmulticore: ah ack
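multicore's 1GiB rule can be checked mechanically. The sketch below assumes the raw-bytes form of the usage report (`btrfs filesystem usage -b <path>`), where the overall section contains a "Device unallocated:" line followed by a byte count. The sample line is hypothetical.

```shell
# Extract the unallocated byte count from `btrfs filesystem usage -b`
# output read on stdin.
unallocated_bytes() {
    awk '/Device unallocated:/ { print $3; exit }'
}

# Succeeds if at least 1GiB (1073741824 bytes) is unallocated.
has_metadata_headroom() {
    bytes=$(unallocated_bytes)
    [ "${bytes:-0}" -ge 1073741824 ]
}

# Hypothetical sample line with 512MiB unallocated.
if printf '    Device unallocated:\t\t   536870912\n' | has_metadata_headroom; then
    echo "ok: at least 1GiB unallocated"
else
    echo "keep balancing: less than 1GiB unallocated"
fi
```

On a live filesystem the input would be piped from `btrfs filesystem usage -b /mnt`, after the `mount -o remount,nossd` step and between balance runs.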
python@darkling are u back?
gustav_ganzhi. how can i clean a btrfs filesystem which gave me an uncorrectable error 'verify=1' during a scrub? it's not about how to restore (potentially) destroyed data but how to get the filesystem back without a bad block..
Kegustav_ganz: btrfs insp log <logical number> <mount point>
Kethen remove that file, if any is printed
gustav_ganzKe: only prints: ERROR: logical ino ioctl: No such file or directory
Keit might mean that the error is in metadata, in which case reformat may be required
Knorrieor it's a part of an extent that is not referenced by any file?
gustav_ganzKe: looks like metadata is correct, as in dmesg i got the logical number from: checksum/header error at logical 170698358784 on dev /dev/mapper/data, sector 99255840: metadata leaf (level 0) in tree 257
Knorrieit says it's about a metadata leaf
KeKnorrie: yeah, could be that
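Ke's "btrfs insp log" is the abbreviated form of `btrfs inspect-internal logical-resolve`. A sketch of getting from the dmesg line to that command is shown below, using the error line quoted above; the mount point /mnt is an assumption. The command is printed rather than run. As seen in this conversation, a logical address that belongs to a metadata leaf does not resolve to a file.

```shell
# Extract the logical address from a scrub error line read on stdin.
logical_from_dmesg() {
    sed -n 's/.*error at logical \([0-9][0-9]*\) on dev .*/\1/p'
}

# Error line as quoted in the log.
line='checksum/header error at logical 170698358784 on dev /dev/mapper/data, sector 99255840: metadata leaf (level 0) in tree 257'

logical=$(printf '%s\n' "$line" | logical_from_dmesg)
echo "btrfs inspect-internal logical-resolve $logical /mnt"
```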
gustav_ganzokay. so i'll have to reformat the drive and restore the data from backup?
Kewell you can salvage whatever is readable anyway
Keif that node is used only by some old snapshot, it might not be a problem, until you want to access/remove that snapshot
Kebut definitely you should do something as soon as it's convenient
Keand immediately check that you actually have the backups and salvage everything, if you don't
gustav_ganzKe: thank you. copying all files to a spare drive, restore from backup and copy back the updates from the salvaged files (;
matt4d617474hi all; have a situation where a btrfs filesystem is unable to mount (think it was doing an auto balance when power cycled)
matt4d617474get a kernel stacktrace with btrfs_recover_relocation at the top
matt4d617474this is a sles12sp2 box (kernel v4.4)
matt4d617474"-o skip_balance" doesn't seem to help
matt4d6174744.4.19 to be more precise
darklingmatt4d617474: That might be dealt with using a later kernel. There are a couple of bugs in the area of btrfs_recover_relocation that are dealt with in more recent kernels, IIRC.
matt4d617474i'm going to try to build a rescue kernel then
matt4d617474hopefully if i'm able to boot it, then subsequent boots will work on prior kernels?
darklingMmm... not sure. Possibly not.
darklingIt may be a case of perfectly good metadata which isn't handled correctly by that kernel.
darklingI'd recommend running something a little newer than 4.4 anyway.
matt4d617474thanks for the info... will give it a spin tonight