
timcuth Braves Fan Premium join:2000-09-18 Pelham, AL
| btrfs or zfs - Which way to the future?
Ok, now that Chris Mason is leaving Oracle:
»www.muktware.com/3678/btrfs-crea···s-oracle
which advanced filesystem will dominate on Linux in the mid-term future? Or is it still too soon to even consider?
Tim -- "Life is like this long line, except at the end there ain't no merry-go-round." - Arthur on The King of Queens ~ Project Hope ~
| |  Maxo Your tax dollars at work. Premium,VIP join:2002-11-04 Tallahassee, FL | Any project that relies on a single person is doomed in the long run. Since Oracle is relying on this for their Linux I'd find it hard to imagine they would let development on it slow down.
| |  rchandra Stargate Universe fan Premium join:2000-11-09 14225-2105 | ...such as Hans Reiser?
Is there something fundamentally wrong with ext4 then? I don't know the subject too deeply...why are the choices stated as only those two?
| |  DeHackEd Bill Ate Tux's Rocket join:2000-12-07 | Ext4 is kinda hackish. The ext4 driver is capable of running an ordinary ext2 or 3 filesystem.
btrfs is a completely new system with snapshot support, internal compression, and a boatload of other features. ZFS is similar but it's more mature as long as you're using Solaris or maybe BSD; it's available for Linux as a FUSE driver (somewhat poor performance) and a kernel driver (which still deadlocks itself under pressure).
I would imagine the devs of each filesystem couldn't turn down a feature request. It's good for features but means a hard development cycle. Neither are ready for primetime under Linux yet. That said I've been playing with ZFS anyway and do like what I've been seeing so far. That's more-or-less where I'd throw my money, mainly because it's already mature and has a seemingly higher level of portability with the FUSE driver, native BSD and all that.
I do hope btrfs surprises me though, but in the short term I don't see that happening. Then again ZFS is struggling for stability. -- That's odd...
| |  reub2000 Premium join:2001-12-28 Evanston, IL | reply to rchandra said by rchandra: "Is there something fundamentally wrong with ext4 then? I don't know the subject too deeply...why are the choices stated as only those two?"
ext4 lacks all of the bells and whistles of the newer filesystems. As computers get more powerful the overhead of zfs will matter less and ext4 will seem quaint. -- My pbase gallery
| |  timcuth Braves Fan Premium join:2000-09-18 Pelham, AL
| reply to rchandra said by rchandra: "...such as Hans Reiser?
Is there something fundamentally wrong with ext4 then? I don't know the subject too deeply...why are the choices stated as only those two?"
Even the top developer of ext4 stated that it is only a stopgap between the past and the much more capable filesystems of the future. Right now, on the *ix platforms, the main candidates are btrfs and zfs. Oracle develops both of them. Both are capable of handling many drives and huge amounts of storage, with maximums stated in exabytes. They natively contain all the functions of volume managers, filesystems, and RAID (but not yet RAID 5 or 6).
I am still on ext4 myself with all of my Linux instances, but I am wanting to put up a new testing instance just to learn one of the new filesystems. I have no real need except for a need to know.
From my reading yesterday, it seems that zfs is currently more ready for prime time. But btrfs is open source, and it may even be free (I need to check).
Tim -- "Life is like this long line, except at the end there ain't no merry-go-round." - Arthur on The King of Queens ~ Project Hope ~ | |  DeHackEdBill Ate Tux's Rocket join:2000-12-07 | ZFS is open source (depending on which version you're looking at) but not with a GPL-compatible license. It also supports RAID5, 6 and, umm, I'll call it RAID7 (tolerates 3 dead drives) along with the usual mirroring and striping.
btrfs is obviously GPL since it ships with the linux kernel right now (and as of several years ago).
I'm going to put my money on zfs for now, but look forward to being pleasantly surprised by btrfs. -- That's odd...
| |  rchandra Stargate Universe fan Premium join:2000-11-09 14225-2105 | reply to timcuth
I must say, I'm kind of surprised. The Unix philosophy is do (really only) one thing, and do it well, with simple (as in the opposite of convoluted) interfaces in and out to make integration (such as pipelining) easy. The one thing to do is store and retrieve data, and provide access controls, and to a certain extent, organize it. The method I'm used to has the RAID and LVM pieces as a more-or-less totally independent layer between filesystem and block devices. It allows plug-n-substitute-n-play, whereas integrating dm into a filesystem takes away that modularity.
Hmmmm.... -- English is a difficult enough language to interpret correctly when its rules are followed, let alone when a writer chooses not to follow those rules.
Jeopardy! replies and randomcaps REALLY suck!
| |  FF4m3 | reply to timcuth
The following were published within the last few years and may be helpful:
● Benchmarking ZFS On FreeBSD vs. EXT4 & Btrfs On Linux
● Benchmarks Of ZFS-FUSE On Linux Against EXT4, Btrfs
● Workload Dependent Performance Evaluation of the Btrfs and ZFS Filesystems [pdf]
● Benchmarking ZFS, XFS, ext4 and btrfs with PostgreSQL 9.0 and 9.1
● ReiserFS vs ext4 vs XFS vs ZFS vs Btrfs Linux filesystems compared
| |  KodiacZiller | reply to timcuth said by timcuth: "From my reading yesterday, it seems that zfs is currently more ready for prime time. But btrfs is open source, and it may even be free (I need to check)."
ZFS has been stable for years, but it was a product of Sun Microsystems and mainly developed for Solaris. Now, of course, Oracle owns it due to their buy-out of Sun. Later it was made available for BSD and is apparently stable there as well.
And yes, ZFS is open-source and has always been. However, it is not GPL compatible (it is licensed under Sun's CDDL license). The two are not compatible and that is the quandary Linux finds itself in. Sun refused to relicense it under GPL (they even had talks with Torvalds himself). Oracle is even less open to the GPL so it is doubtful it will ever be compatible with Linux (outside of FUSE, which sucks). -- Getting people to stop using windows is more or less the same as trying to get people to stop smoking tobacco products. They dont want to change; they are happy with slowly dying inside. -- munky99999
| |  timcuth Braves Fan Premium join:2000-09-18 Pelham, AL
| Thank you, KodiacZiller. That is very helpful info. Software freedom is quite important to me.
@FF4m3, I am not so much interested in current performance comparisons. I am sure both of these filesystems' performance will be improved. But I do appreciate your posting them.
My main interest is in getting a feel for which one will become more prevalent on Linux in, say, the next two years. As Yogi Berra once said, "If you come to a fork in the road, take it."
Tim -- "Life is like this long line, except at the end there ain't no merry-go-round." - Arthur on The King of Queens ~ Project Hope ~ | | |
|  rexbinary Mod King Premium join:2005-01-26 Plano, TX | reply to timcuth
Since ZFS is not GPL compatible, I believe that makes btrfs the future for Linux by default.
| |  DeHackEd Bill Ate Tux's Rocket join:2000-12-07 | Well, like Broadcom wireless or nVidia graphics drivers, it's completely allowed to load non-GPL drivers into the kernel at runtime as modules. ZFS would qualify and that's how the work is being done now.
People who want ZFS can and will get it. The question is, will it be worth the additional hassle? My personal opinion is that, if both were magically made stable right now, I'd be all over ZFS. -- That's odd...
| |  koitsu Premium,MVM join:2002-07-16 Mountain View, CA | reply to KodiacZiller said by KodiacZiller: "... Later it was made available for BSD and is apparently stable there as well."
No, it isn't. Don't let the crazies on the web try to make you think otherwise. Check out the freebsd-fs, freebsd-stable, and zfs-devel (web archive not available; for whatever reason this list is intentionally not archived) mailing lists (check multiple months) for all the issues that continue to surface about ZFS on FreeBSD. You can also check PRs if you want. They're all non-stop, and have been for over a year.
I realise that ZFS on FreeBSD, presently, does not print the "SUPPORT IS EXPERIMENTAL!" text any more, and that's mainly because there are people actively working on it at all times. I can name names if needed. The problem, like with everything on FreeBSD kernel-wise, is that there are only a couple people actually doing the work. FreeBSD has nowhere near the number of clueful eyes doing the work + analysing bits as Linux does -- and I'm speaking about everything here, not just ZFS.
Bottom line: if you want stable ZFS, run Illumos or OpenIndiana (for those who don't know what those are, they're spawns of OpenSolaris).
Overall, if you want my advice, it would be this: if you absolutely need/want ZFS, go with Illumos/OpenIndiana. If you need Linux, go with Btrfs. Performance on Btrfs is often better than that of ZFS (on FreeBSD anyway), and doesn't mandate massive amounts of RAM (result of ZFS's ARC) to get good performance. If you need Linux and need I/O speed, consider using md and ext3 or ext4, but remember: none of the ext* filesystems support checksumming, which is absolutely the #1 reason to run ZFS or Btrfs.
P.S. -- There's always HAMMER and HAMMER2 if you want to go the DragonflyBSD route... -- Making life hard for others since 1977. I speak for myself and not my employer/affiliates of my employer.
| |  DeHackEd Bill Ate Tux's Rocket join:2000-12-07 | ZFS doesn't eat memory so much as it takes a more active role by doing its own cache management. If you use ZFS on linux, you'll find the usual "cache" memory not being used. You've really only changed ownership of the filesystem cache. (Linux is currently my only experience with any significant weight to it)
As for IO speed, ZFS has a couple of interesting... uhh.. hacks (!) to speed it up - using an SSD as a secondary read cache and having a designated fsync() storage space for speed reasons whether you use the external ZIL feature or not. Then again raidz is notoriously bad for seek performance and makes software raid look good.
And I don't think checksumming is the number one reason to use either ZFS or btrfs. It's a high selling point but if you imagine the day they (or either one) become standard installs for Fedora, Ubuntu, etc. the typical workstation doesn't have several hard drives in their computer and the thought of using copies=2 (or whatever btrfs' equivalent is called) isn't going to appeal to anybody. Snapshot-assisted backups, in place compression and protection against any power outage are far more likely to be the killer features to the average user. Yes it's a different target than what enterprise has in mind but I think this model has to be taken into account as well. If ext4 really is just the crutch until btrfs is ready, this has to be the direction we're going and I'm not entirely sure it's the intended use case. -- That's odd...
| |  koitsu Premium,MVM join:2002-07-16 Mountain View, CA | Using some ZFS terminology here because I'm not up to speed on Btrfs:
Checksumming is important even in the case of systems with a single-disk ZFS pool: it allows you to detect underlying storage subsystem issues. Yes, with a single disk you can't auto-repair the problem (mirrors and raidzX solve that), but you can still detect the problem. Furthermore, with detection on ZFS, it's even kind enough to tell you the filename of what has been impacted (if available; it will report hexadecimal offsets for metadata corruption or files which have since been deleted).
With ext4, ext3, UFS, NTFS, FAT/FAT32, blah blah, you can't even detect the situation, which is horrible.
So I'll make it clear: it is very, very important that a filesystem be able to detect if the bits it submits to the controller to write to the disk didn't get written correctly. This is not the same thing as a write failure; what I'm describing is a successful write (meaning the controller and disk both said "yep wrote things fine") of, say, a value of 0xFA (%11111010) -- but what actually got written to the platters was 0xEA (%11101010).
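To make the 0xFA vs. 0xEA case concrete, here is a rough Python sketch of the idea. It is an illustration only and not how ZFS or Btrfs implement it: zlib's CRC32 stands in for the real checksum algorithms, and the function names (store_block, block_is_intact) and the dict standing in for the disk are made up for the example.

```python
import zlib


def store_block(data: bytes) -> dict:
    """Keep a checksum of what we *intended* to write next to the data."""
    return {"data": bytearray(data), "checksum": zlib.crc32(data)}


def block_is_intact(block: dict) -> bool:
    """Recompute the checksum on read and compare it with the stored one."""
    return zlib.crc32(bytes(block["data"])) == block["checksum"]


# The filesystem hands 0xFA to the controller, which reports success.
block = store_block(bytes([0xFA]))
clean_read_ok = block_is_intact(block)

# Simulate the platter actually holding 0xEA (a single flipped bit).
block["data"][0] = 0xEA
corrupt_read_ok = block_is_intact(block)

print(clean_read_ok, corrupt_read_ok)
```

The first check passes and the second fails, which is the whole point: the write "succeeded", yet the read-side comparison still notices the flipped bit. Without a stored checksum there is nothing to compare against.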
As for the other features you mention -- I can't talk about ZFS on Linux as I have never used it, but:
- Filesystem snapshots (including incrementals) -- yes, very useful. However at least on FreeBSD these are quite possibly the #1 source of crashes and filesystem problems, even as of this writing. I have avoided them since ZFS was originally ported to FreeBSD and still continue to do so. They work beautifully/reliably on Solaris (used them at my previous job constantly).
- Compression and data deduplication -- useful, but presently only plausible on Solaris. On FreeBSD, use of either of these features completely destroys system usability. (I'm Jeremy Chadwick ). I have tested both of these features on FreeBSD which is partially how this crap came to light.
- Protection against power outages -- journalling solves this, which is something reiserfs, ext3 and ext4 all provide. FreeBSD, on the other hand, is a complete clusterf*** when it comes to journalling. I'm not even sure where to begin. All of the below things are independent of one another:
First we have softupdates (SU), which should not be misunderstood as a journalling replacement -- they aren't. Journalling does a lot more than SU. SU has been enabled on UFS/UFS2 (except root filesystem) on FreeBSD since the 5.x days, if I remember right. But wait until I get to FreeBSD 9...
Secondly we have gjournal. This is a journalling layer provided by GEOM. This was introduced in FreeBSD 7.0, and isn't commonly used given some of its requirements and configuration complexities.
Finally we have softupdate journalling (SUJ or SU+J), which was introduced starting with FreeBSD 9.0 (9.0-RELEASE). The very bad, brash, and idiotic decision was made to enable this by default in the 9.0 installer because it was deemed stable. Very quickly people began reporting filesystem inconsistencies and silent corruption, and kernel panics especially when using snapshots. Disabling the journalling part of SUJ (meaning switching to just SU) solved the first two problems. The problems were admitted by one of the authors of the code, who couldn't fix the problem promptly because his cohort who wrote most of the code had disappeared or gone on hiatus. A fix was eventually committed, but there have been no 9.0-STABLE snapshots provided anywhere, and there are still reports of problems even after the fact (some involving UFS filesystem snapshots). The same author also confirms those problems.
So at least in the FreeBSD world, things are not in very good shape. ZFS has the most attention (understandably), which is good, but I still would not call it "stable" given the existing problems with it. It still requires kernel tuning to get stability as well (kmem map exhaustion is still a problem -- really!). And SUJ should be avoided like the plague (I'm left with the impression after this nobody is going to dare use it given what it has done).
Oh, I forgot to mention one of the ZFS features that drives a lot of system administrators to it (myself included): the integration of a LVM. I know rchandra pointed out the true UNIX philosophy (and I agree with it), but the LVM-esque integration in ZFS makes managing filesystems so incredibly easy. I know the true UNIX way would be to make many small things all work together, but sadly that is not easily do-able with the kernel; the more you push out to userland the worse performance you get (Linux went through this same issue back in the late 90s; am I the only one who remembers the huge NFS-in-kernel vs. NFS-in-userland debacle during Linux 1.2 and 1.3? ).
So if I was to switch to, say, graid, I would lose checksumming and the LVM-esque features, forcing me to make separate filesystems (rather than have multiple filesystems that share a single pool of disks). Folks may not know that FreeBSD does not have a decent volume manager; gvinum and vinum are neglected and nobody bothers to use them any more.
On the other hand, scrapping ZFS would allow me to get my performance back, I wouldn't need 8GB+ of RAM in systems or dedicated servers, and so on. Of course, on my current hosting server I use UFS2 (SU, not SUJ) exclusively on all filesystems and I "just deal with it". Thankfully it works, including on FreeBSD 9, as long as you remember to disable journalling during the install. Don't get me started on the other horrible decisions in bsdinstall though, like the default now being a single filesystem (/) which completely breaks booting into single-user to do installworld. Whoever decided that did not think it through -- still want to wring the neck of that person... -- Making life hard for others since 1977. I speak for myself and not my employer/affiliates of my employer.
| |  Derwood Wherever you go, there you are Premium join:2003-01-21 Dayton, OH
| said by koitsu: "So if I was to switch to, say, graid, I would lose checksumming and the LVM-esque features, forcing me to make separate filesystems (rather than have multiple filesystems that share a single pool of disks). Folks may not know that FreeBSD does not have a decent volume manager; gvinum and vinum are neglected and nobody bothers to use them any more.
On the other hand, scrapping ZFS would allow me to get my performance back, I wouldn't need 8GB+ of RAM in systems or dedicated servers, and so on. Of course, on my current hosting server I use UFS2 (SU, not SUJ) exclusively on all filesystems and I 'just deal with it'. Thankfully it works, including on FreeBSD 9, as long as you remember to disable journalling during the install. Don't get me started on the other horrible decisions in bsdinstall though, like the default now being a single filesystem (/) which completely breaks booting into single-user to do installworld. Whoever decided that did not think it through -- still want to wring the neck of that person..."
If you run ZFS on 64 bit, you shouldn't need to do any tuning of kernel kmem for ZFS. I have 2 raidz arrays and didn't have to do any tuning and they can saturate a gigabit network no problem.
I know it's not in the kernel tree, but there is another possibility, GRAID5. It's in the ports tree, uses little memory and is very fast. I ran it for a couple of years but switched to ZFS when it appeared that the development on GRAID5 had ended.
| |  koitsu Premium,MVM join:2002-07-16 Mountain View, CA | said by Derwood: "If you run ZFS on 64 bit, you shouldn't need to do any tuning of kernel kmem for ZFS."
That is a completely false statement. 32-bit vs. 64-bit on FreeBSD, with regards to ZFS ARC and the "kmem_map too small" panic, has no bearing -- it happens on either architecture.
Even in FreeBSD 9.x, you still have to tune vfs.zfs.arc_max to try and ensure that ARC usage won't exceed a certain value. I say "try" because there are known situations pertaining to in-memory metadata where ZFS memory usage can actually exceed that value.
Please be very careful here -- I am talking about vfs.zfs.arc_max and not vm.kmem_size or vm.kmem_size_max. You do not want to touch either of the latter two, unless you're running FreeBSD 7.x (32-bit or 64-bit). Alan Cox (of FreeBSD, not Linux) put a lot of time/effort into fixing the VM, so anything you read on the web about tuning the latter two settings is outdated or flat out wrong. Do not touch them. Furthermore, those settings have no bearing on the kmem_map (kmem != kmem_map).
Thus, memory pressure becomes an issue on FreeBSD when you have other memory-intensive applications using large amounts of memory and have ZFS in use. For example, a system with 8GB of RAM, running mysqld that's adjusted to take up ~4GBytes of memory + using ZFS without any ARC tuning will almost certainly panic after a few weeks of uptime. "A few weeks" is variable; it could be a month or two, it's hard to say, all depends on I/O load.
If you'd like me to dig up mailing list posts from kernel developers and users experiencing the problem even present-day (on FreeBSD 9.x), I can do so. Just let me know. Here's some quick validation.
The general advice I have given to people with a very high success rate is that you set vfs.zfs.arc_max to 50% of your system RAM, e.g. on an 8GByte system, vfs.zfs.arc_max=4096M. If things run stable for a few months, you can try increasing that a little bit (most of my 8GByte systems use a value of 5120M; on our system that runs a mysqld instance that's tuned to take up 1GB of RAM, I also use vfs.zfs.arc_max=5120M safely).
You have to remember that there is no good 100% reliable way to pick a value for this parameter. kmem_map is used by all sorts of things throughout the kernel, not just ZFS. So basically you're trying to find a "sweet spot" that ensures stability. You always need to give it a little room; e.g. don't go setting vfs.zfs.arc_max to the amount of system RAM you have.
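As a back-of-the-envelope helper for the 50% starting point, something like the following Python sketch would do. The function name and the hard refusal of anything at or above 100% of RAM are made up for illustration; the output simply mirrors the vfs.zfs.arc_max=4096M form used above, and the right value still has to be found by trial on the actual system.

```python
def suggested_arc_max(system_ram_mb: int, percent: int = 50) -> str:
    """Return a vfs.zfs.arc_max setting as a share of system RAM (in MB)."""
    if system_ram_mb <= 0:
        raise ValueError("system RAM must be positive")
    if not 0 < percent < 100:
        # Leave headroom: kmem_map is used by more than just ZFS.
        raise ValueError("percent must be between 1 and 99")
    arc_max_mb = system_ram_mb * percent // 100
    return f"vfs.zfs.arc_max={arc_max_mb}M"


# 8 GByte system, 50% starting point
print(suggested_arc_max(8192))
```

For an 8GByte box this gives the 4096M figure mentioned above. Raising the percentage later is the "try increasing that a little bit" step, and is only worth doing after a few stable months.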
said by Derwood: "I have 2 raidz arrays and didn't have to do any tuning and they can saturate a gigabit network no problem."
That's a benchmark comparison point, not a stability point. It's only a matter of time before your system kernel panics due to ARC exhaustion (showing up as "kmem_map too small").
The ARC exhaustion problem has been "dealt with" a few times since the days of 7.x -- there have been 5 or 6 commits to try and make ZFS comply better with vfs.zfs.arc_max. Originally some commits were done and "the issue was solved", only a few months later to find more cases of exhaustion, more commits, etc... It's still ongoing. However, it is very well-known and well-established at this point (by kernel developers) that there are still situations where ZFS usage can easily exceed that value.
said by Derwood: "I know it's not in the kernel tree, but there is another possibility, GRAID5. It's in the ports tree, uses little memory and is very fast. I ran it for a couple of years but switched to ZFS when it appeared that the development on GRAID5 had ended."
The jury is still out on geom_raid5. Back in 2007 folks asked if it could be included in HEAD/CURRENT, and there was massive backlash from key kernel developers against it. You can read all the posts about it (ones from Ulf tend to be useful). The biggest issue is documented here (and confirmed in a follow-up post by the author). Note that disabling drive write caching = not acceptable (the performance hit is very extreme). There's also this (see last post).
-- Making life hard for others since 1977. I speak for myself and not my employer/affiliates of my employer.
| |  Derwood Wherever you go, there you are Premium join:2003-01-21 Dayton, OH
| said by koitsu: "That is a completely false statement. 32-bit vs. 64-bit on FreeBSD, with regards to ZFS ARC and the 'kmem_map too small' panic, has no bearing -- it happens on either architecture."
It has not happened for me and I have done no tuning. I'm on 9-STABLE. I tried 32bit initially and even with 8 gig I did see performance issues and even some panics. 64 bit has been absolutely rock solid for me. I've never had a panic or any performance issues of any kind. I guess I'm lucky then.
said by koitsu: "The jury is still out on geom_raid5. Back in 2007 folks asked if it could be included in HEAD/CURRENT, and there was massive backlash from key kernel developers against it. You can read all the posts about it (ones from Ulf tend to be useful). The biggest issue is documented here (and confirmed in a follow-up post by the author). Note that disabling drive write caching = not acceptable (the performance hit is very extreme). There's also this (see last post)."
Yes. I had given Arne Woerner some space to work on an 8.2 server (9 was not ready yet) so he could get it working on the new kernel. He was able to get it to compile and work, but performance was nowhere near what it was on 7.x. Which is why I'm using ZFS now. When I saw it in the ports tree, I had hoped that someone else had taken up the task of fixing the performance issues.