
How to pull blobs out of Archive Storage

If you're building a modern application, you have a lot of options for storing data, whether that's a traditional database technology (SQL, MySQL, etc.), a NoSQL store (Mongo, Cosmos, etc.), or plain blob storage. Of these options, blob storage is by far the cheapest, providing a very low-cost way to store data long term.

The best way to get the most value out of blob storage is to use the different access tiers to your benefit. With a tiering strategy for your data, you can pay significantly less to store it for the long term. You can find the pricing for Azure Blob Storage here.

Most people are hesitant to use the archive tier because the idea of having to wait for data to be rehydrated tends to scare them off. But in my experience, most data used for business operations has a shelf life, and archiving it is a viable option, especially for data that is not accessed often. I would challenge anyone storing blobs to capture metrics on how often their older data is actually accessed. When you weigh the need to wait for retrieval against the cost savings of archive, the balance tends to come out strongly in favor of using archive for long-term data storage.

How do you move data to archive storage

When storing data in Azure Blob Storage, the process of uploading a blob is fairly straightforward, and all it takes to move data to the archive tier is setting the blob's access tier to "Archive".

The code below uploads a local file to blob storage and then moves it to the archive tier:

// Requires the Azure.Storage.Blobs package
// using Azure.Storage.Blobs;
// using Azure.Storage.Blobs.Models;

var accountClient = new BlobServiceClient(connectionString);
var containerClient = accountClient.GetBlobContainerClient(containerName);

// Get a reference to a blob
BlobClient blobClient = containerClient.GetBlobClient(blobName);

Console.WriteLine("Uploading to Blob storage as blob:\n\t {0}\n", blobClient.Uri);

// Open the file and upload its data
using (FileStream uploadFileStream = File.OpenRead(localFilePath))
{
    blobClient.Upload(uploadFileStream, overwrite: true);
}

Console.WriteLine("Setting Blob to Archive");

// Move the blob to the Archive tier
blobClient.SetAccessTier(AccessTier.Archive);

How to re-hydrate a blob in archive storage?

There are two ways of re-hydrating blobs:

  1. Copy the blob to another tier (Hot or Cool)
  2. Set the access tier to Hot or Cool

It really is that simple, and it can be done using the following code:

var accountClient = new BlobServiceClient(connectionString);
var containerClient = accountClient.GetBlobContainerClient(containerName);

// Get a reference to a blob
BlobClient blobClient = containerClient.GetBlobClient(blobName);

// Setting the tier back to Hot (or Cool) kicks off rehydration
blobClient.SetAccessTier(AccessTier.Hot);

After you do this, the service starts rehydrating the blob automatically. You then need to monitor the blob's properties to see when rehydration has finished.
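
If you just need a quick check, a minimal sketch of that monitoring could simply poll the blob's ArchiveStatus property in-process (this assumes the blobClient from the snippet above and an arbitrary five-minute polling interval; the queue-based pattern in the next section is a more robust version of the same idea):

// Minimal polling sketch: assumes blobClient from the snippet above.
// Requires System.Threading for Thread.Sleep.
BlobProperties properties = blobClient.GetProperties().Value;

while (properties.ArchiveStatus != null &&
       properties.ArchiveStatus.StartsWith("rehydrate-pending"))
{
    Console.WriteLine("Blob is still rehydrating, checking again in 5 minutes...");
    Thread.Sleep(TimeSpan.FromMinutes(5));
    properties = blobClient.GetProperties().Value;
}

Console.WriteLine($"Rehydration complete, current tier: {properties.AccessTier}");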

Monitoring the re-hydration of a blob

One easy pattern for monitoring blobs as they rehydrate is to use a queue and an Azure Function to check on the blob during the process. I did this by implementing the following.

For the message model, I used the following to track the hydration process:

public class BlobHydrateModel
{
    public string BlobName { get; set; }
    public string ContainerName { get; set; }
    public DateTime HydrateRequestDateTime { get; set; }
    public DateTime? HydratedFileDataTime { get; set; }
}

And then implemented the following code to handle the re-hydration process:

public class BlobRehydrationProvider
{
    // Requires Azure.Storage.Blobs, Azure.Storage.Queues, and Newtonsoft.Json
    private readonly string _cs;

    public BlobRehydrationProvider(string cs)
    {
        _cs = cs;
    }

    public void RehydrateBlob(string containerName, string blobName, string queueName)
    {
        var accountClient = new BlobServiceClient(_cs);
        var containerClient = accountClient.GetBlobContainerClient(containerName);

        // Get a reference to the blob and request rehydration by changing its tier
        BlobClient blobClient = containerClient.GetBlobClient(blobName);
        blobClient.SetAccessTier(AccessTier.Hot);

        // Track the request with a queue message so a function can poll for completion
        var model = new BlobHydrateModel()
        {
            BlobName = blobName,
            ContainerName = containerName,
            HydrateRequestDateTime = DateTime.Now
        };

        QueueClient queueClient = new QueueClient(_cs, queueName);
        var json = JsonConvert.SerializeObject(model);
        string requeueMessage = Convert.ToBase64String(Encoding.UTF8.GetBytes(json));
        queueClient.SendMessage(requeueMessage);
    }
}
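
For context, calling into this provider might look something like the following; the container and blob names here are just placeholder values for illustration, and the queue name matches the Azure Function trigger shown below:

// Hypothetical usage of BlobRehydrationProvider; container and blob names
// are placeholders, the queue name matches the function trigger below.
var connectionString = Environment.GetEnvironmentVariable("StorageConnectionString");
var provider = new BlobRehydrationProvider(connectionString);

// Requests rehydration and drops a tracking message on the queue.
provider.RehydrateBlob("archivecontainer", "reports/2019-sales.csv", "blobhydrationrequests");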

With the code above, setting the blob to Hot and queuing a message triggers an Azure Function, which then monitors the blob's properties using the following:

[FunctionName("CheckBlobStatus")]
public static void Run([QueueTrigger("blobhydrationrequests", Connection = "StorageConnectionString")] string msg, ILogger log)
{
    var model = JsonConvert.DeserializeObject<BlobHydrateModel>(msg);

    var connectionString = Environment.GetEnvironmentVariable("StorageConnectionString");

    var accountClient = new BlobServiceClient(connectionString);
    var containerClient = accountClient.GetBlobContainerClient(model.ContainerName);
    BlobClient blobClient = containerClient.GetBlobClient(model.BlobName);

    log.LogInformation($"Checking Status of Blob: {model.BlobName} - Requested : {model.HydrateRequestDateTime}");

    // ArchiveStatus is populated while the blob is still rehydrating
    var properties = blobClient.GetProperties();
    if (properties.Value.ArchiveStatus == "rehydrate-pending-to-hot")
    {
        // Not ready yet - requeue the message and check again in 5 minutes
        log.LogInformation($"File {model.BlobName} not hydrated yet, requeuing message");
        QueueClient queueClient = new QueueClient(connectionString, "blobhydrationrequests");
        string requeueMessage = Convert.ToBase64String(Encoding.UTF8.GetBytes(msg));
        queueClient.SendMessage(requeueMessage, visibilityTimeout: TimeSpan.FromMinutes(5));
    }
    else
    {
        log.LogInformation($"File {model.BlobName} hydrated successfully, sending response message.");
        //Trigger appropriate behavior
    }
}

By checking the ArchiveStatus, we can tell when the blob has been rehydrated and can then trigger the appropriate behavior to push that update back to your application.
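
As one possible implementation of that "trigger appropriate behavior" branch (just a sketch, and the "blobhydrationcompleted" queue name is hypothetical), you could stamp the completion time on the model and publish it to a second queue for the application to pick up:

// Hypothetical completion handling for the else branch above: record when
// the blob finished rehydrating and notify the application via a second
// queue ("blobhydrationcompleted" is a made-up name for this sketch).
model.HydratedFileDataTime = DateTime.Now;

QueueClient completedQueue = new QueueClient(connectionString, "blobhydrationcompleted");
string completedMessage = Convert.ToBase64String(
    Encoding.UTF8.GetBytes(JsonConvert.SerializeObject(model)));
completedQueue.SendMessage(completedMessage);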

Copying blobs between storage accounts / regions

A common question I get is about copying blobs. If you are working with Azure Blob Storage, it's nearly inevitable that at some point you will need to do a data copy, whether for a migration, a re-architecture, or any number of other reasons.

I've seen many different approaches to doing a data copy, and I'm going to talk through those options here and, ultimately, how best to execute a copy within Azure Blob Storage.

I want to start with the number one DO NOT DO option: building a utility that cycles through and copies blobs one by one. This is the least desirable way to move data, for a few reasons:

  • Speed – This is going to be a single-threaded, synchronous operation.
  • Complexity – This feels counter-intuitive, but ensuring data copies correctly, building fault handling, and so on is not easy, and not something you want to take on when you don't have to.
  • Chances of Failure – Long-running processes are always problematic. They can fail, and when they do they can be difficult to recover from, so you are opening yourself up to potential problems.
  • Cost – At the end of the day, you are creating a long-running process that needs compute running 24/7 for an extended period. Compute in the cloud costs money, so this is an additional cost.

So the question is, if I shouldn't build my own utility, how do we get this done? There are really two options that I've used successfully in the past:

  • AzCopy – This is the tried and true option. This utility provides an easy command-line interface for kicking off copy jobs that can run either synchronously or asynchronously. Even in its synchronous mode, you will see higher throughput for the copy. This removes some of the issues above, but not all of them.
  • Copy API – A newer option, the REST API's copy operation provides the best possible throughput and prevents you from having to create a VM, since the copy runs asynchronously inside Azure. The API is easy to use and documentation can be found here; a quick SDK-based sketch follows this list.
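
To give a sense of what this looks like in code, here is a minimal sketch using the .NET SDK's StartCopyFromUri, which calls the same server-side copy operation; the connection strings, container, and blob names are placeholders, and a cross-account copy would normally need a SAS token on the source URI:

// Sketch of a server-side copy with the .NET SDK (Azure.Storage.Blobs).
// The storage service performs the copy, so no VM has to stream the data.
// Names and connection strings are placeholders for illustration.
var sourceBlob = new BlobClient(sourceConnectionString, "sourcecontainer", "myfile.dat");
var destBlob = new BlobClient(destConnectionString, "destcontainer", "myfile.dat");

// Kick off the asynchronous, service-side copy.
// For a cross-account copy the source URI typically needs a SAS token.
CopyFromUriOperation copyOperation = destBlob.StartCopyFromUri(sourceBlob.Uri);

// Optionally block until the service reports the copy as finished.
copyOperation.WaitForCompletion();

Console.WriteLine($"Copy status: {destBlob.GetProperties().Value.CopyStatus}");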

Ultimately, there are lots of ways and scenarios in which you can leverage these tools to copy data. The other question that usually comes up is: if I'm migrating a large volume of data, how do I do it while minimizing downtime?

The way I've accomplished this is to break the data down into phases:

  • Sort the data by age oldest to newest.
  • Starting with the oldest blobs, break them down into the following chunks.
  • Move the first 50%
  • Move the next 30%
  • Move the next 10-15%
  • Take a downtime window to copy the last 5-10%

By doing this, you minimize your downtime window while maximizing what can be copied in the background. This process only works if your newer data is accessed more often than your older data, but when that's the case it's a good way to move your blobs while keeping downtime to a minimum.
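
As a rough sketch of how you might carve the data into those phases with the .NET SDK (assuming the source container can simply be enumerated, and treating the exact percentages as illustrative), you could sort by last-modified date and split the list:

// Rough sketch of the "sort by age and batch" approach; requires System.Linq
// and Azure.Storage.Blobs. Names and percentages are illustrative only.
var containerClient = new BlobContainerClient(sourceConnectionString, "sourcecontainer");

// Enumerate blobs and sort them oldest-first by last-modified date.
var blobs = containerClient.GetBlobs()
    .OrderBy(b => b.Properties.LastModified)
    .ToList();

// Cut points for the phases: 50%, then 30%, then 15%, with the final 5%
// left for the downtime window.
int total = blobs.Count;
int cut1 = total * 50 / 100;
int cut2 = total * 80 / 100;
int cut3 = total * 95 / 100;

var phase1 = blobs.Take(cut1).ToList();
var phase2 = blobs.Skip(cut1).Take(cut2 - cut1).ToList();
var phase3 = blobs.Skip(cut2).Take(cut3 - cut2).ToList();
var downtimePhase = blobs.Skip(cut3).ToList();

Console.WriteLine($"Phase 1: {phase1.Count}, Phase 2: {phase2.Count}, " +
                  $"Phase 3: {phase3.Count}, Downtime window: {downtimePhase.Count}");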

Weekly Links – 4/27

So I know this post got out a little late today, and for that I do apologize. Home schooling has been interesting at our house. It's definitely been a transition for everyone involved, but overall it's been going well. My kids have been real troopers, and my wife is the most amazing woman on the planet and has been the expert in all of this.

My daughter has set up a desk in my office, so it's nice to think that during COVID my home office has basically become a WeWork.


Down to business…

Fun Stuff:

So I've made no secret of my enjoyment of outdoor cooking, and over the past two weeks I've done a lot of work with my Weber smoker. This past week turned out great: I went from a small chicken to an 11 lb brisket, and 12 hours later we had an amazing meal.

Weekly links – 10/14

So this past week I spent every free moment working on a shed in my backyard, and like any construction project it's had a slew of delays. But we are powering through.


Down to business…

Development:

Cloud:

Audio / Video:

Fun Stuff:

So as always, I'm a big comics fan, and I've said before that I'm a fan of the CW Arrowverse. For as much as DC's movies are terrible, their TV shows are quite excellent, and the standout last year was Supergirl. It really tapped into what makes for the best Superman / Supergirl stories: the best stories are all built around problems they can't super-power their way out of. Last season tackled real topics like trust in the media, xenophobia, and racism. This season is already moving toward tackling technology and its ability to change the way we view reality and connect with each other.

Weekly Links – 9/23

So I know I’m a little late this week, but here they are. I was away at a conference in sunny Las Vegas for the week, and it was quite the week.


But anyway, down to business:

Development:

  • Cascadia Code is live: I normally don't care about a font, but this one is pretty cool because of its support for ligatures, which makes code much easier to read.
  • .NET Conf: A really cool virtual conference with more materials and announcements. Next week should bring a lot of new announcements.

Cloud:

Audio / Video:

Fun Stuff:

As always, to live up to our name, here's a nerd topic for the links. I'm a big Batman fan, always have been; I'm pretty sure my kids knew who Batman was long before they knew Big Bird. I've been enjoying the current comic run with Tom King as the writer, which is coming to an end, and they've announced the new writer, James Tynion IV, a great writer who wrote Batman Eternal, so I'm very excited. To read more, look here.
