
Data Deduplication

Solaris 11 New Features Tutorial Part 2

 

>> Mick:  Hello, and welcome back to this SkillBuilders tutorial looking at new features of ZFS in Solaris 11. In the earlier presentation, we looked at ZFS changes in terms of NFS and how you can share NFS file systems. This second part of the presentation looks at three major facilities.

 

First of all, a new deduplication facility to save on space. Secondly, the ability to perform what's called shadow migration, where you can move the contents of a file system to a ZFS dataset while using the new dataset as though it were a complete copy. Then lastly, splitting a zpool: taking away a mirror component and exporting and importing it as a fully operational new pool, which can be quite handy in a number of circumstances.

 

First of all, we're going to look at data deduplication. On the right-hand side here, we have a Solaris 11 system based on a logical domain with the necessary disc resources allocated. We are using virtual discs, so performance probably won't be quite as quick as with physical discs or SAN LUNs, that sort of thing.

 

[pause]

 

Now let's have a look at deduplication. This is a new facility in Solaris 11 that can potentially save an awful lot of disc space, and its performance does seem pretty good, although that's obviously for you to judge if you use it in a real practical situation. Let me show you how you use it.

 

There's a lot of theory about deduplication and lots of recommendations for different ways to go about it, but ZFS implements it at block level. If you look at the paragraph I'm circling now, you can read a statement by one of the ZFS developers explaining why they chose the block-level mechanism.

 

On the face of it, you might think this causes significant processing overhead, but the results you get when you try it are a little inconsistent, and that's not always true.

 

[pause]

 

Here is an example on this page of creating two different zpools. The first one does not have deduplication on and the second one does. There’s a simple little test using the time utility to compare how long it takes to copy the same files to each type of pool. Let’s have a look at how you do this. 

 

First of all, we'll create a pool called lakenodedup. There's no need for -f unless there was a previous file system on the device we're going to use, which is c2d3s0. Then I'm going to create a file system under there, a dataset called lakenodedup/data1, which of course will be mounted at /lakenodedup/data1. I've pre-prepared a few files.
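
For reference, the two commands being typed here would look roughly like this (pool, device, and dataset names are as shown in the demo):

    # Create the non-dedup pool; -f is only needed if the device
    # previously held another file system:
    zpool create lakenodedup c2d3s0

    # Create a dataset inside it; ZFS mounts it at /lakenodedup/data1 by default:
    zfs create lakenodedup/data1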

 

[pause]

 

The note says there are about 1.8 GB of them. Just to speed things up a bit, I've made them slightly smaller, so they're progressively bigger: the smallest is 72 MB, the next one double that, and the next one three times that. They do have similarities within them, but you can see they are different files. Let's have a go at copying them.

 

[pause]

 

We’re going to copy a* to /lakenodedup/data1 and let’s see how long that takes. It shouldn’t take too long. I would estimate around 12 or 13 seconds. 
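
Something like the following, using the time utility to measure the copy (the a* test-file names are as shown in the demo):

    # Copy the test files into the non-dedup dataset and time it:
    time cp a* /lakenodedup/data1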

 

[pause]

 

It's proved me wrong straight away: 19.8 seconds. What we'll do now is create another pool, called lakededup this time, which is going to be on c2d3s1. It's the same disc, just a different partition. Hopefully that will balance the performance a little and make it a more realistic comparison.

 

Now we're going to create a dataset within this with deduplication on. I'll also show you this first example here, which is setting dedup on in the newly created lakededup pool. I could actually have done this from the zpool create command line. Then I'm going to create another file system.

 

[pause]

 

Actually, I don't have to put -o dedup=on here because the dedup property is inherited. So if I left that off, the dedup property would be inherited all the way down through the datasets in the pool. But I'll do it anyway just to show you how it can be done. And in a pool that doesn't have deduplication set, you can still set deduplication on for individual datasets, so it's entirely flexible.
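
Put together, the sequence for the second pool would look roughly like this (the data1 dataset name is an assumption, mirroring the first pool):

    # Second pool, on a different slice of the same disc:
    zpool create lakededup c2d3s1

    # Turn dedup on at the top of the pool; child datasets inherit it.
    # (It could equally have been set at creation: zpool create -O dedup=on ...)
    zfs set dedup=on lakededup

    # Setting it again on the new dataset is redundant because of
    # inheritance, but it shows the per-dataset syntax:
    zfs create -o dedup=on lakededup/data1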

 

I’ve missed a little bit of syntax out there. 

 

[pause]

 

And I missed out the correct name. That's more like it. So if I do a zfs get -r dedup on lakededup, you can see the dedup property is on. I did specifically set it on with the create and with the set.
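
The check would look something like this; both entries show a local source because the property was set explicitly at each level:

    # Show the dedup property recursively down the pool's datasets:
    zfs get -r dedup lakededup
    #
    # NAME             PROPERTY  VALUE  SOURCE
    # lakededup        dedup     on     local
    # lakededup/data1  dedup     on     local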

 

Now let’s do the copy. 

 

[pause]

 

This time to lakededup. Let’s see how long that takes. 
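
The same timed copy, this time into the deduplicated dataset (again assuming the data1 dataset name):

    time cp a* /lakededup/data1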

 

[pause]

 

It may not take so long. There we go: it's actually quicker. I'm not sure how far we can trust the figures, given the way this is being demonstrated, but the other important thing, of course, is how much space has been taken up, because we've copied the same data to each pool. If we do a zpool list, lakededup is occupying 145 MB and lakenodedup 433 MB. That is quite a difference, and there you can see straight away the benefit you get from deduplication.
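
That comparison comes straight from the ALLOC column:

    # Compare space actually allocated in each pool:
    zpool list
    # ALLOC shows roughly 145M for lakededup versus 433M for
    # lakenodedup, for the same set of copied files.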

 

Quite stunning, really. I'm sure you wouldn't get that in every situation, but you're definitely going to get some benefit from this facility, and it is so simple to use. All you have to do is set the dedup property on. There's no further administration required.

 

[pause]

 

If you're copying larger amounts of data, ZFS has a bit of a lag: when you remove data, for example, you don't necessarily see the free space straight away, and the same applies when you're writing. If you see slightly odd values, wait a little while, run the commands again, and the figures will correct themselves.

 

[pause]

 

If I copy further within the deduplicated dataset… 

 

[pause]

 

Let’s copy that big file. 

 

[pause]

 

Why not create another copy? 

 

[pause]

 

And for good measure, let's see what du thinks we're using: getting on for a gigabyte. Let's see what zpool list thinks we're using: quite a lot less, so the storage actually used is down to about 22-23% of the apparent data size, which isn't bad. You do get some strange side effects if you start filling the thing up rapidly. You could, in fact, theoretically exceed the nominal physical capacity of the file system, and the system adjusts things.
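
The two views of the same data would be compared roughly like this:

    # Apparent size of the data as the file system reports it:
    du -sh /lakededup

    # Space actually allocated in the pool after deduplication:
    zpool list lakededup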

 

If we do a df -h | grep lake, you can see that, although the partitions I've used are exactly the same size, lakededup now actually appears to be bigger. And if I carry on copying, that size will keep increasing: that's how the system maintains the illusion of the space the data would use with no dedup. So it's an excellent facility, and so simple to use.
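
Spelled out as a command:

    # Both pools sit on identically sized partitions, yet the dedup
    # pool's reported size grows as duplicate data is added:
    df -h | grep lake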

 

[pause]

 

Another option that's available, although Oracle don't recommend you use it: instead of setting dedup to on, you can set it to verify, and then a further verification is carried out rather than relying on the underlying checksumming alone. That takes a little longer and reduces performance. Just one further illustration of how the sizes change as you copy more data, compared to what you saw previously. So that's deduplication.
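
Switching to the stricter mode would look like this (applied here to the demo's lakededup pool):

    # Verify duplicate blocks byte-for-byte instead of trusting the
    # block checksums alone; safer against collisions, but slower:
    zfs set dedup=verify lakededup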

 

 

Copyright SkillBuilders.com 2017
