MoUukGoblin: I'm not sure why duperemove wouldn't be able to deduplicate your .rar and .mkv files.
MoUukGoblin: In the past I did experiments with very tiny block sizes like duperemove -b 4096 also increasing fragmentation.
multicoreMo: it won't work
Momulticore: Why, doesn't duperemove find duplicates within files based on extents?
multicoreMo: because there's none
multicoreMo: file A and file B which is ~250 byte header and the rest is same as file A
Momulticore: You mean already reflinked to the same extent? I don't think so in UukGoblin 's use case, when file b is created as new file. Without deduplication the extents are full copies.
multicoreMo: file A is the newly created file
UukGoblinI can either create the .mkv from .rar, or find both .mkv and .rar on disk and trigger deduplication later
MoAnyway, at the end you have 2 full copies and need to deduplicate, no?
UukGoblinthey're not full copies, no
UukGoblinone of the file has a few bytes of junk in front of it
UukGoblinone of the files*
MoWho deduplicated it?
multicoreMo: ... every block will be off by ~250 bytes
UukGoblinand "few" * 4096 is not a natural number
UukGoblinMo, no-one yet
UukGoblinI mean "few" / 4096 is not a natural number ;-P
MoSorry, then I didn't get your issue. I thought you are trying to deduplicate?
multicoreMo: yes, but you can't
UukGoblinI am trying to de-duplicate but the data is offset
multicoreMo: because there's no duplicate blocks to dedupe
xnxs_basically, you'll need to add padding after the offset to make the offset land on a block boundary or not be able to dedup the data
UukGoblinyes, if adding padding was possible within a block, then that would work
xnxs_on your file, yes
MoUukGoblin: But then you have 2 full copies of the file, one with the header and shifted data.
UukGoblinno I don't mean 0-byte padding, I mean like some magic block-padding that doesn't get translated to zeroes when reading ;-)
xnxs_pretty much you're gonna be SOL
multicoreMo: try it, won't work
xnxs_basically, you have two files that contain for example "abcdefghi" and "123abcdefghi"....if they're say 4 char blocks....nothing will match up...
Momulticore: What to try, deduplicating? Yes I'm not sure about that. But I wondered about UukGoblin message "they're not full copies, no". They are but not "identical" or "copies", but different full copies.
Moxnxs_: I see.
xnxs_abcd and 123a don't match....efgh and bcde dont match
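A minimal sketch of xnxs_'s "abcdefghi" vs "123abcdefghi" example, assuming a deduplicator that only compares fixed-size, aligned blocks. The function names and the 4-byte block size are illustrative only and are not how duperemove is implemented.

```python
def blocks(data: bytes, size: int) -> list[bytes]:
    """Split data into fixed-size, aligned blocks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def shared_blocks(a: bytes, b: bytes, size: int) -> set[bytes]:
    """Return the full-size blocks that occur in both inputs."""
    full_a = {blk for blk in blocks(a, size) if len(blk) == size}
    full_b = {blk for blk in blocks(b, size) if len(blk) == size}
    return full_a & full_b


plain = b"abcdefghi"
shifted = b"123" + plain   # same payload behind a 3-byte header
padded = b"123_" + plain   # header padded out to a 4-byte boundary

# The shifted file shares no aligned block with the plain one.
print(shared_blocks(plain, shifted, 4))
# Padding the header to a block boundary makes the payload blocks line up,
# but it also changes the file contents, which is the objection raised above.
print(shared_blocks(plain, padded, 4))
```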
MoI have the same issue, like some data, some cache with that data and a lot of emails including parts of that data, and tried to deduplicate that. I can't even say if it worked, only by comparing the usage after deduplication.
xnxs_diffiehellman would probably catch it and compress them together if they use the same dictionary
MoI'm running duperemove jobs with cache on $HOME periodically as cronjob.
UukGoblinxnxs_, things like rdiff do it based on a rolling checksum, so the tool CAN figure out how the data is duplicated. The issue is being able to tell BTRFS how the blocks are organized, or where to add padding
xnxs_yeah...prolly not gonna happen
UukGoblinok, cool, thanks :-)
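A rough sketch of the rolling-checksum idea UukGoblin mentions, in the style of the rsync weak checksum. It is not rdiff's actual code; the names, the 250-byte header, and the 4096-byte block size are assumptions for illustration. It shows that a tool can locate the shifted data even though the offset is not block-aligned.

```python
def weak_sum(block: bytes) -> tuple[int, int]:
    """rsync-style weak checksum of a block, kept as two 16-bit halves."""
    n = len(block)
    a = sum(block) % 65536
    b = sum((n - i) * x for i, x in enumerate(block)) % 65536
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int, size: int) -> tuple[int, int]:
    """Slide the window one byte: drop out_byte, append in_byte."""
    a = (a - out_byte + in_byte) % 65536
    b = (b - size * out_byte + a) % 65536
    return a, b


def find_block(block: bytes, haystack: bytes) -> int:
    """Return the first byte offset of block in haystack, or -1."""
    size = len(block)
    if size == 0 or size > len(haystack):
        return -1
    target = weak_sum(block)
    a, b = weak_sum(haystack[:size])
    for start in range(len(haystack) - size + 1):
        # The weak sum can collide, so confirm with a byte comparison.
        if (a, b) == target and haystack[start:start + size] == block:
            return start
        if start + size < len(haystack):
            a, b = roll(a, b, haystack[start], haystack[start + size], size)
    return -1


payload = bytes(i % 251 for i in range(16384))   # stand-in for the .mkv data
container = b"H" * 250 + payload                 # stand-in for the .rar: header + same data
offset = find_block(payload[:4096], container)
print(offset, offset % 4096)                     # offset found, but not a multiple of 4096
```

Finding the offset is the easy part. As the discussion says, the filesystem still shares data only in whole aligned blocks, so knowing the offset does not by itself make the extents shareable.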
MoSending/Receiving snapshots by btrbk, today I get ERROR: unexpected header. Don't find that sentence in the btrbk sources so probably coming from btrfs.
MoSeveral things I did: I defragmented the last snapshots in the target because receive was so slow. I switched to 4.15.1 today. I switched from compress-force=zlib to compress-force=zlib:9.
py1hondoes btrfs have some sort of an "extended fiemap" ioctl that tells you what device(s) each extent is on?
MoIf I have btrfs on LUKS on a mobile disk, is it safe to unplug after umount? Or do I need to remove the LUKS mapper device first?
optyi'd do that
Keif there is a way to do things properly, I would always go with the proper solution
Keobviously should be safe to do right after umount
optyyup
Zygopy1hon: not in a single ioctl. You can get a virtual address from FIEMAP or TREE_SEARCH, then use TREE_SEARCH to look up the block group, then look at the device tree to see which device UUIDs it is stored on. Then you enumerate all the devices in /dev (or use another ioctl to get the kernel's mapping if the filesystem is mounted), find the devices with matching UUIDs, and then you know which devices the extent is on
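A hedged sketch of only the first step Zygo describes: asking FIEMAP for a file's extents. The struct layouts and the ioctl number are taken from the Linux `fiemap.h` header as I understand it; on btrfs the "physical" field is the filesystem's virtual (logical) address, not a device offset. The chunk-tree and device-tree lookups via TREE_SEARCH are not shown, and the helper names are made up for this example.

```python
import fcntl
import struct

FS_IOC_FIEMAP = 0xC020660B             # _IOWR('f', 11, struct fiemap)
HEADER = struct.Struct("=QQIIII")      # fm_start, fm_length, fm_flags, fm_mapped_extents, fm_extent_count, fm_reserved
EXTENT = struct.Struct("=QQQ2QI3I")    # fe_logical, fe_physical, fe_length, reserved64[2], fe_flags, reserved[3]


def parse_fiemap(buf) -> list[tuple[int, int, int, int]]:
    """Decode a filled-in fiemap buffer into (logical, physical, length, flags) tuples."""
    mapped = HEADER.unpack_from(buf, 0)[3]
    extents = []
    for i in range(mapped):
        fields = EXTENT.unpack_from(buf, HEADER.size + i * EXTENT.size)
        extents.append((fields[0], fields[1], fields[2], fields[5]))
    return extents


def fiemap(path: str, max_extents: int = 64) -> list[tuple[int, int, int, int]]:
    """Return up to max_extents extents of the file at path (Linux only)."""
    buf = bytearray(HEADER.size + max_extents * EXTENT.size)
    # Map the whole file: start 0, length "everything", no flags.
    HEADER.pack_into(buf, 0, 0, 0xFFFFFFFFFFFFFFFF, 0, 0, max_extents, 0)
    with open(path, "rb") as f:
        fcntl.ioctl(f.fileno(), FS_IOC_FIEMAP, buf)
    return parse_fiemap(buf)
```

Each "physical" value returned here would then have to be mapped to a block group and its devices, as described above.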
UukGoblinsomething very strange happened to me.. I had an active filesystem mounted from an LVM volume, /dev/vg0/root, mounted on /
UukGoblinI created an LVM snapshot with `lvcreate -L10G -n root_snap -s /dev/vg0/root`, mounted it to /root/b, rsynced it over, unmounted /root/b
UukGoblinnow I can't remove the snapshot logical volume and somehow magically /dev/vg0/root_snap is now mounted on my /
multicoreUukGoblin: https://btrfs.wiki.kernel.org/index.php/Gotchas#Block-level_copies_of_devices
UukGoblinuhm, thanks, reading
UukGoblinok, so I should expect data corruption now?
UukGoblinno way to fix this?
UukGoblingreat, just when I wanted to make a backup...
multicoreUukGoblin: ... unrecoverable corruption is expected
UukGoblinsuper-weird that, I was sure it stores the number of devices in a filesystem somewhere
UukGoblinshould be easy to detect that 2 are identical and therefore actually clones and not separate parts of a filesystem...
UukGoblincan't this be fixed in a future version? Seems really odd... should be trivial to detect such a case
viricIs there any command that will tell me the mount options for a mounted fs? "mount" lists all, but what about querying one specific?
viriclike... show me the mount information of "."
viricdevice, mount point, options, etc.
optytry /proc/mounts
UukGoblinyeah, that'd be nice to have, grepping for '.' doesn't work very well ;-)
viricright.
viricI remember like this existed... but I remember as if "mount ." would report that, which looks quite like bad memory. :)
optymount probably just uses /etc/mtab
viric$ stat -c '%m' .
opty/proc/mounts should show all
viricthat's a start.
optyyou can also use df but that probably requires some post-processing :)
UukGoblindoes partclone.btrfs change the UUID on the clone then?
UukGoblinwhat does btrfs even do with a cloned device? how does it know if it's meant to be raid0, raid1, jbod?
optydig in https://github.com/Thomas-Tsai/partclone
optyi didn't find anything (quickly)
UukGoblinhah, I wonder if the author is aware of this "gotcha"
Zygothe problem as I understand it is that when you do a btrfs device scan, any existing device UUID is silently overwritten by new devices with the same UUID. The API doesn't do something like "if this is a duplicate UUID, remove _both_ devices and raise an error" which would be much more data-preserving
Zygodoing it that way might break recovery on USB disconnect/reconnect cycles, though, which I gather is a thing people do use
Zygothe filesystem does have an idea what raid levels are (raid levels are actually per block group, not per device) and it knows devices by UUID, but it relies on userspace tools to tell it what the UUID-to-physical-device mapping is
UukGoblinah right, yeah, now I remember raid is indeed per-block
UukGoblinok, it appears it just overwrote the original with the snapshot device (`btrfs filesystem show /` only shows 1 device), in which case I should be OK~ish
UukGoblinsince it'll just write everything to the snapshot device from now on
UukGoblinso I should be OK until it gets full
ZygoUukGoblin: you probably want to get out of your current situation ASAP, but _not_ before you've made a backup
Zygoin the best possible case, btrfs simply started writing cleanly on the lvm snapshot, so if you merge it with its origin it will be fine
Zygoin the other cases, the filesystem will be dead as soon as you umount it
Zygoso...plan accordingly ;)
TriztACTION thinking of using bcache for btrfs, would it be smarter to use md for raid1 than btrfs own
zdzichuless smart
zdzichumd cannot self-heal
TriztACTION nods
Triztoh an unrelated question, on my current btrfs installation I have noticed it takes quite long time to list files when you start to have more than a handful in a directory, I have tried with scrub, balance, defrag, but nothing seems to improve things and io-wait is always quite high on heavy loads. Anything that could help more than a fresh setup?
multicoreTrizt: have you tried defrag without -r ?
TriztNo, I think that's the only thing I haven't tried.
Triztthought that everything was done when using -r
ZygoI think you need to run defrag -r on the top of a subvol, it's a no-op on any other directory
ZygoI think you need to run defrag *without* -r on the top of a subvol, it's a no-op on any other directory
Zygowhat is "more than a handful"? Six? Six thousand? Six million?
TriztZygo: more than 20
Triztthe defrag without -r seems to make things better, at least glances seems to start within 10 sec, before it took like a minute
Zygothat seems...slow
ZygoI get 6ms to list a directory with 40 files, cold cache
Zygoprobably includes the time to page in and dynamically link the 'ls' binary too
Zygosomething else is going on there
Triztmaybe, just that I don't find anything that would indicate on any issues
Zygomachine with 5900 rpm drives and a load average of 24.00, ls -l on 40 files takes 371ms
Zygoon peak write loads it takes longer but only during transaction commit
Zygobut transaction commit loads don't vary according to the number of files
UukGoblinZygo, yup, thanks!
TriztACTION nods
UukGoblinbacking the hell out of it up
UukGoblinzdzichu! hi man, what's up? :-)
Zygook, so in userspace, I can have my snapshot-delete script call 'balance pause', delete snapshots, wait for the snapshots to go away, then call 'balance resume'
Zygo...is there any reason why btrfs-cleaner can't do that?
ZygoI note that btrfs-transaction does pause btrfs scrub during transaction commit
darklingProbably not.
Zygoof course, to avoid my point of pain, it would have to work on device delete too
Zygo'btrfs device delete' -> disk fills up -> snapshot deletes are deferred until device delete ends -> device delete fails with ENOSPC
Zygothen snapshot deletes run and the device delete would have succeeded
Zygoor, repeat the above until the device is finally gone
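The userspace sequence Zygo describes (pause balance, delete snapshots, wait for cleanup, resume balance) could be sketched roughly like this. It assumes a balance is already running on the filesystem, that `btrfs subvolume sync` is what is used to wait for the deleted snapshots to go away, and the paths are placeholders. The default is a dry run that only prints the commands.

```python
import subprocess


def plan(mount_point: str, snapshots: list[str]) -> list[list[str]]:
    """Build the command sequence without running anything."""
    return [
        ["btrfs", "balance", "pause", mount_point],
        *[["btrfs", "subvolume", "delete", snap] for snap in snapshots],
        # Wait until the cleaner has actually removed the deleted subvolumes.
        ["btrfs", "subvolume", "sync", mount_point],
        ["btrfs", "balance", "resume", mount_point],
    ]


def run(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print the commands, or execute them, resuming balance even if a delete fails."""
    if dry_run:
        for cmd in commands:
            print(" ".join(cmd))
        return
    pause, *middle, resume = commands
    subprocess.run(pause, check=True)
    try:
        for cmd in middle:
            subprocess.run(cmd, check=True)
    finally:
        subprocess.run(resume, check=True)


# Hypothetical paths, dry run only.
run(plan("/mnt/pool", ["/mnt/pool/snaps/2018-02-01", "/mnt/pool/snaps/2018-02-02"]))
```

This only covers balance. As noted above, the same deferral problem applies to 'btrfs device delete', which this sketch does not address.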
darklingI'm increasingly of the opinion that there should be an ioctl where you pass it a block group ID, a RAID level, and a list of n-tuples of device IDs, and it moves the block group into (a) free space, (b) new block groups with chunks in the positions specified in the list of device tuples.
darklingYou can do device delete, balancing, conversion, and all kinds of combinations of those, under userspace control.
darklingTakes out a load of the things like "well, balance is almost like device delete, but not quite", and "you can't convert this FS in one go because the chunks aren't evenly spread around, and there's a corner case where it fills up before you're finished"
Zygosure, or have device delete check between block groups to see if any other device deletes were requested, and add those
darklingYou'd also be able to do some of the things like drop a device and convert from RAID-1 to single in one go.
Zygothere's one case where it's not solvable: converting a 2T single into 2x1T raid1 + the original 2T single. The allocator fills up the 2x1T disks, then has nowhere free to put the rest of the data
darklingI mean, loads of complicated algorithms to make it work, but they'd be in userspace...
Zygoyou have to do _two_ balances to make that case work