List the files in a ZIP stored on Azure without downloading it

Recently, I was working on a task where we had to read the entry names out of ZIP files stored on Azure. We had terabytes of data to go through, so downloading everything was not really an option. At the end of the day we solved it in a totally different way, but I remained curious whether this was possible, and it sure is.

The aim is to list the entry names of ZIP files stored in an Azure Storage Account. Unfortunately, our beloved HttpClient can’t do this on its own (or at least, I didn’t research enough). Although HttpClient lets us read an HTTP response body as a Stream, that Stream isn’t seekable (CanSeek: false).

This is why we need the Azure.Storage.Blobs API: it gives us a seekable Stream over a blob stored in an Azure Storage Account. This means we can download only the part of the ZIP file where the names are stored (the central directory, which sits at the end of the archive) rather than the data itself. Here is a detailed diagram of how ZIP files are laid out, though you don’t need it, as the libraries do all the heavy lifting for us: The structure of a PKZip file (jmu.edu)
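For the curious, here is a minimal sketch of why the tail of the file is enough. A ZIP archive ends with an End of Central Directory record, which (among other things) stores the number of entries. The sketch builds a tiny ZIP in memory so it runs without a storage account, then reads only the last bytes of the stream. The names ZipTailDemo, BuildSampleZip and ReadEntryCountFromTail are made up for this illustration, and it assumes a plain archive (no ZIP64).

```csharp
using System;
using System.Buffers.Binary;
using System.IO;
using System.IO.Compression;

public static class ZipTailDemo
{
    // Little-endian signature of the End of Central Directory (EOCD) record.
    private const uint EocdSignature = 0x06054b50;
    // Fixed EOCD size; an optional comment of up to 65,535 bytes may follow it.
    private const int EocdMinSize = 22;

    // Hypothetical helper: builds a small ZIP in memory for the demo.
    public static MemoryStream BuildSampleZip(params string[] entryNames)
    {
        var buffer = new MemoryStream();
        using (var archive = new ZipArchive(buffer, ZipArchiveMode.Create, leaveOpen: true))
        {
            foreach (var name in entryNames)
            {
                using var writer = new StreamWriter(archive.CreateEntry(name).Open());
                writer.Write("contents of " + name);
            }
        }
        buffer.Position = 0;
        return buffer;
    }

    // Reads only the tail of a seekable stream and returns the entry count
    // recorded in the EOCD. Assumes a non-ZIP64 archive.
    public static int ReadEntryCountFromTail(Stream zip)
    {
        int tailLength = (int)Math.Min(zip.Length, EocdMinSize + 65535);
        var tail = new byte[tailLength];
        zip.Seek(-tailLength, SeekOrigin.End);

        int read = 0;
        while (read < tailLength)
        {
            int n = zip.Read(tail, read, tailLength - read);
            if (n == 0) throw new EndOfStreamException();
            read += n;
        }

        // Scan backwards so the last EOCD candidate in the file is found first.
        for (int i = tailLength - EocdMinSize; i >= 0; i--)
        {
            if (BinaryPrimitives.ReadUInt32LittleEndian(tail.AsSpan(i)) == EocdSignature)
            {
                // Offset 10 of the EOCD holds the total number of entries.
                return BinaryPrimitives.ReadUInt16LittleEndian(tail.AsSpan(i + 10));
            }
        }
        throw new InvalidDataException("No End of Central Directory record found.");
    }

    public static void Main()
    {
        using var zip = BuildSampleZip("a.txt", "docs/b.txt", "c.bin");
        Console.WriteLine(ReadEntryCountFromTail(zip));
    }
}
```

In practice you would not write this yourself; it only illustrates what the library does with a seekable stream.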

We will also be using the built-in ZipArchive class from System.IO.Compression, which can open a ZIP file from a Stream. It is smart enough that, when the stream is seekable, it seeks straight to the part where the file names are stored rather than reading the whole file.
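One way to see this behaviour locally is to wrap a stream in a small pass-through that counts the bytes read. The sketch below is my own illustration, not part of any library: CountingStream and SeekDemo are hypothetical names, and the archive is built in memory with three stored (uncompressed) entries of 1 MB each. Listing the entry names should touch only a small fraction of the archive.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;

// Hypothetical pass-through wrapper that counts the bytes actually read.
public sealed class CountingStream : Stream
{
    private readonly Stream _inner;
    public long BytesRead { get; private set; }

    public CountingStream(Stream inner) => _inner = inner;

    public override bool CanRead => _inner.CanRead;
    public override bool CanSeek => _inner.CanSeek;
    public override bool CanWrite => false;
    public override long Length => _inner.Length;
    public override long Position
    {
        get => _inner.Position;
        set => _inner.Position = value;
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        int n = _inner.Read(buffer, offset, count);
        BytesRead += n;
        return n;
    }

    public override long Seek(long offset, SeekOrigin origin) => _inner.Seek(offset, origin);
    public override void Flush() { }
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();
}

public static class SeekDemo
{
    // Builds a ~3 MB ZIP in memory, lists its entries through a CountingStream,
    // and reports how many bytes ZipArchive had to read to do so.
    public static (string[] Names, long BytesRead, long TotalBytes) ListEntries()
    {
        var buffer = new MemoryStream();
        var random = new Random(42);
        using (var archive = new ZipArchive(buffer, ZipArchiveMode.Create, leaveOpen: true))
        {
            foreach (var name in new[] { "a.bin", "b.bin", "c.bin" })
            {
                var payload = new byte[1024 * 1024];
                random.NextBytes(payload);
                using var entry = archive.CreateEntry(name, CompressionLevel.NoCompression).Open();
                entry.Write(payload, 0, payload.Length);
            }
        }
        buffer.Position = 0;

        var counting = new CountingStream(buffer);
        using var zip = new ZipArchive(counting, ZipArchiveMode.Read);
        var names = zip.Entries.Select(e => e.FullName).ToArray();
        return (names, counting.BytesRead, buffer.Length);
    }

    public static void Main()
    {
        var (names, bytesRead, totalBytes) = ListEntries();
        Console.WriteLine(string.Join(",", names));
        Console.WriteLine($"Read {bytesRead} of {totalBytes} bytes");
    }
}
```

The exact number of bytes read depends on the .NET version, so treat the figure as indicative only. Against a blob, each of those reads turns into a request for a range of the file, which is where the time saving comes from.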

Therefore, all we need to do is open a stream to the ZIP using Azure.Storage.Blobs, pass it to ZipArchive and read the entries out of it. The process is almost instant, even for large ZIP files.

using Azure.Storage;
using Azure.Storage.Blobs;
using System;
using System.IO.Compression;
using System.Linq;
using System.Threading.Tasks;

namespace GetZipFileNamesFromAzureZip
{
    class Program
    {
        // Placeholders: replace these with your own storage account details.
        private const string StorageAccountName = "xxxxxx";
        private const string StorageAccountKey = "xxxxxxxxxxxxxxx";
        private const string ContainerName = "xxxxxxxxxx";
        private const string FileName = "file.zip";
        private const string Url = "https://" + StorageAccountName + ".blob.core.windows.net";

        static async Task Main(string[] args)
        {
            var client = new BlobServiceClient(new Uri(Url), new StorageSharedKeyCredential(StorageAccountName, StorageAccountKey));
            var container = client.GetBlobContainerClient(ContainerName);
            var blobClient = container.GetBlobClient(FileName);

            // Open a seekable stream over the blob; nothing is downloaded up front.
            var stream = await blobClient.OpenReadAsync();

            // ZipArchive seeks to where the entry names are stored.
            // Disposing the archive also disposes the underlying stream.
            using var package = new ZipArchive(stream, ZipArchiveMode.Read);

            // Print the full name (path inside the ZIP) of every entry.
            Console.WriteLine(string.Join(",", package.Entries.Select(x => x.FullName)));
        }
    }
}

Until the next one!
