Saturday, May 8, 2010

Extract File Extension and Check Image Format

Intention

Once in a while, we all need to extract the file extension from a file name. When I worked in Java, I used String.lastIndexOf() to find the last period in the file name and then took the rest of the string. The same idea works in .NET with String.LastIndexOf(). However, .NET also provides a single static method call that makes this much easier, although its internal implementation does not use a regular expression. Yesterday, I was browsing through my old code and thought I would like to use a regular expression to capture the file extension, so I picked a portion of my file upload program to convert. I will start with the requirements for this file upload portion, then go into a detailed discussion, and finally present the complete code snippets. In this exercise, I will also show how to check the image format.

Requirements

  1. Use a regular expression to capture the file extension from the file name, with the leading period (.) trimmed off the result. For example, .txt becomes txt.
  2. A file extension cannot contain spaces.
  3. Save the uploaded file if and only if its extension is one of the supported extensions (see the list below) and the file is in one of the supported image formats.
  4. Supported file extensions are BMP, GIF, PNG, TIFF, JPG, and JPEG.
  5. Supported image formats are BMP, GIF, PNG, TIFF, and JPEG. They correspond to the file extensions above.

Discussion and Implementation

There are a few ways to extract a file extension from a file name. One of the simplest is to call the System.IO.Path.GetExtension() method, passing the file name as a parameter, e.g.,

   System.IO.Path.GetExtension("abc.txt")

In this example, .txt is the extracted file extension, and it always comes with a leading period (.). To meet the 1st requirement, we could simply remove the leading period from the result. However, the requirement binds us to use a regular expression. Thus, the regular expression should look like the following (non-named capture):

      \.([^.]+)$

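According to the 2nd requirement (the extension cannot contain spaces), we also exclude whitespace from the character class. Inside a character class, grouping and alternation are not needed (the characters (, | and ) would be treated as literals there), so we simply list \s next to the period. The revised regular expression becomes:

      \.([^\s.]+)$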

In C#, we can do the following to fulfill requirements #1 and #2:

      // requires: using System.Text.RegularExpressions;
      String pattern = @"\.([^\s.]+)$";
      String fileExt = null;

      Regex r = new Regex(pattern);
      Match m = r.Match(filename);
      if (m.Success) {
        fileExt = m.Groups[1].Value;
      }

As we can see, the captured text is always placed in group 1 (at index 1), not 0; group 0 holds the entire match. What if we have many capture groups? It could be hard to remember which one is which. Luckily, .NET supports named capture groups, which help greatly both in code and in back-references. In this example, the requirement is simple and we don't need a named capture. But to exercise the feature, we can revise the regular expression a bit and then reference the name in our code. Let's say the capture group is named ext. The revised code snippet becomes:

      String pattern = @"\.(?<ext>[^\s.]+)$";
      String fileExt = null;

      Regex r = new Regex(pattern);
      Match m = r.Match(filename);
      if (m.Success) {
        fileExt = m.Groups["ext"].Value; 
      }
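
To see how the extraction behaves on a few file names, here is a minimal, self-contained sketch. The class and method names (ExtensionDemo, ExtractWithRegex, ExtractWithPath) and the sample file names are made up for illustration. The sketch writes the character class as [^\s.], which excludes whitespace and periods, and it puts the Path.GetExtension() approach next to the regular expression for comparison.

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;

public static class ExtensionDemo
{
    // A period followed by one or more characters that are neither
    // whitespace nor a period, anchored at the end of the input.
    private static readonly Regex ExtPattern = new Regex(@"\.(?<ext>[^\s.]+)$");

    // Returns the extension without the leading period,
    // or null when the name has no valid extension.
    public static string ExtractWithRegex(string filename)
    {
        Match m = ExtPattern.Match(filename);
        return m.Success ? m.Groups["ext"].Value : null;
    }

    // Built-in alternative: Path.GetExtension keeps the period, so trim it off.
    public static string ExtractWithPath(string filename)
    {
        return Path.GetExtension(filename).TrimStart('.');
    }

    public static void Main()
    {
        // Hypothetical sample names, chosen to cover the requirements.
        string[] samples = { "photo.JPG", "archive.tar.gz", "README", "bad.ex t" };
        foreach (string name in samples)
        {
            Console.WriteLine("{0,-16} regex: {1,-8} path: {2}",
                name,
                ExtractWithRegex(name) ?? "(none)",
                ExtractWithPath(name));
        }
    }
}
```

Note that the two approaches differ on a name such as "bad.ex t": the regular expression rejects it because of the space, while Path.GetExtension() does not apply that rule.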

Part of the third requirement is to check the extracted file extension against our list of supported extensions. Regular expressions can help here too. Let's define our pattern for matching. In C#, we can write this:

      String SupportedImageExtPatterns = "^(BMP|GIF|PNG|TIFF|JPE?G)$";

To check the extracted file extension against the list, we can write this in C#:

      Match m1 = Regex.Match(fileExt, 
                             SupportedImageExtPatterns,
                             RegexOptions.IgnoreCase);
      if (m1.Success) {
        ...
      }

Or, using a single statement:

      if (Regex.Match(fileExt, 
                      SupportedImageExtPatterns,
                      RegexOptions.IgnoreCase).Success) {  
          ...
      }
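
As a quick way to try the pattern out, the check can be wrapped in a small helper. This is only a sketch: the class name ExtensionWhitelistDemo, the method name IsSupportedExtension, and the sample inputs are made up, and it uses Regex.IsMatch(), which returns the same answer as Regex.Match(...).Success when we only need a yes/no result.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ExtensionWhitelistDemo
{
    // Requirement #4: the supported file extensions.
    private const string SupportedImageExtPatterns = "^(BMP|GIF|PNG|TIFF|JPE?G)$";

    public static bool IsSupportedExtension(string fileExt)
    {
        // A missing extension (null) is never supported;
        // Regex.IsMatch would throw on a null input.
        return fileExt != null
            && Regex.IsMatch(fileExt, SupportedImageExtPatterns, RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        // Hypothetical inputs: some on the list, some not.
        foreach (string ext in new[] { "jpg", "JPEG", "Tiff", "tif", "png2", "" })
        {
            Console.WriteLine("{0,-5} -> {1}", ext, IsSupportedExtension(ext));
        }
    }
}
```

Because the pattern is anchored with ^ and $, near misses such as tif or png2 are rejected; only the exact extensions on the list pass, in any letter case.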

How do we check that the file is in one of the supported image formats? Since the file is uploaded via HTTP, it could be anything. So we first need to confirm that the file is an image by taking the file's Stream object and trying to load it as an image. If we get an ArgumentException, it is not an image. Let's say the file upload control is called FileUpload1. In C#, we can write the following to carry out the idea:

      System.Drawing.Image theImage = null;
      try {
        theImage = System.Drawing.Image.FromStream(FileUpload1.PostedFile.InputStream);
      }
      catch (System.ArgumentException) {  // not an image
        ...
      }

After we've confirmed that the file is an image, we can check the image format against our list with the help of the Equals() method of System.Drawing.Imaging.ImageFormat. For convenience, I group all the required System.Drawing.Imaging.ImageFormat objects in an array so that I can iterate through it for checking.

       System.Drawing.Imaging.ImageFormat[] SupportedImageFormats = { 
             System.Drawing.Imaging.ImageFormat.Bmp,      
             System.Drawing.Imaging.ImageFormat.Gif,      
             System.Drawing.Imaging.ImageFormat.Png,
             System.Drawing.Imaging.ImageFormat.Tiff,
             System.Drawing.Imaging.ImageFormat.Jpeg      
       };

The code snippet for checking image format in C# is illustrated below:
       System.Drawing.Imaging.ImageFormat[] SupportedImageFormats = { 
         System.Drawing.Imaging.ImageFormat.Bmp,      
         System.Drawing.Imaging.ImageFormat.Gif,      
         System.Drawing.Imaging.ImageFormat.Png,
         System.Drawing.Imaging.ImageFormat.Tiff,
         System.Drawing.Imaging.ImageFormat.Jpeg      
       };

       System.Drawing.Image theImage = null;
       try {
         theImage = System.Drawing.Image.FromStream(FileUpload1.PostedFile.InputStream);

         for (int i=0; i < SupportedImageFormats.Length; i++) {        
           if (theImage.RawFormat.Equals(SupportedImageFormats[i])) {
             ...
             break;
           }
         }
         ...

       }
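       catch (System.ArgumentException) {  // not an image
         ...
       }
       finally {
         // release the GDI+ resources held by the image, if one was loaded
         if (theImage != null) theImage.Dispose();
       }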

Once we have confirmed that the file is an image in one of our expected formats, we can simply call the FileUpload object's SaveAs() method, passing the absolute path name. Let's say the directory to save to is called Uploads. The statement may look something like this:

   FileUpload1.SaveAs(Server.MapPath("~/Uploads/") + FileUpload1.FileName);
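
The path handed to SaveAs() is built by string concatenation here. In case the client-supplied name ever carries directory parts, a slightly more defensive way to build the path could look like the sketch below. The class name UploadPathDemo, the method name BuildSavePath, and the sample directory are made up for illustration; uploadsDir stands in for the result of Server.MapPath("~/Uploads/").

```csharp
using System;
using System.IO;

public static class UploadPathDemo
{
    // uploadsDir is assumed to be the already-mapped physical Uploads directory.
    public static string BuildSavePath(string uploadsDir, string clientFileName)
    {
        // Keep only the final name component so the client cannot choose a directory.
        string safeName = Path.GetFileName(clientFileName);
        if (string.IsNullOrEmpty(safeName))
        {
            throw new ArgumentException("No file name was supplied.", "clientFileName");
        }
        return Path.Combine(uploadsDir, safeName);
    }

    public static void Main()
    {
        // Hypothetical directory and file names.
        Console.WriteLine(BuildSavePath("/srv/site/Uploads", "holiday.png"));
        Console.WriteLine(BuildSavePath("/srv/site/Uploads", "../../holiday.png"));
    }
}
```

Both calls in Main() end up with a path inside the Uploads directory, because only the last component of the supplied name is kept.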

At this point, our mission is accomplished: we have extracted the file extension, checked the image format, and saved the file. The complete code snippets are presented in the next section.

Final Code Snippets

The following two code snippets summarize the discussion above. The only difference between them is that one uses a non-named capture to extract the file extension while the other uses a named capture. Everything else is identical.

Using non-named capture to extract the file extension

Note that
  1. the file upload control object is called FileUpload1, and
  2. the save directory is called Uploads and is located in the site's root directory.

// declare the pattern for Requirements #1 and #2 to capture the file extension.
// the following pattern uses non-named capture.
String pattern = @"\.([^\s.]+)$";
String fileExt = null;

Regex r = new Regex(pattern);
Match m = r.Match(FileUpload1.PostedFile.FileName); 
if (m.Success) {
  // capture the file extension without the period
  // here we use non-named capture group
  fileExt = m.Groups[1].Value;

  // Requirement #4 - the supported file extensions
  String SupportedImageExtPatterns = "^(BMP|GIF|PNG|TIFF|JPE?G)$";

  if (Regex.Match(fileExt, 
                  SupportedImageExtPatterns,
                  RegexOptions.IgnoreCase).Success) {

    // Requirement #5 - the supported image formats 
    System.Drawing.Imaging.ImageFormat[] SupportedImageFormats = { 
          System.Drawing.Imaging.ImageFormat.Bmp,      
          System.Drawing.Imaging.ImageFormat.Gif,      
          System.Drawing.Imaging.ImageFormat.Png,
          System.Drawing.Imaging.ImageFormat.Tiff,
          System.Drawing.Imaging.ImageFormat.Jpeg      
    };

    System.Drawing.Image theImage = null;
    try {
      theImage = System.Drawing.Image.FromStream(FileUpload1.PostedFile.InputStream);
    
      for (int i=0; i < SupportedImageFormats.Length; i++) {        
        if (theImage.RawFormat.Equals(SupportedImageFormats[i])) {
           // Requirement #3 - save the file only if it is our expected image format.
           FileUpload1.SaveAs(Server.MapPath("~/Uploads/") + FileUpload1.FileName);
           break;
        }
      }      
    }
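    catch (System.ArgumentException) {  // not an image
      ...
    }
    finally {
      // release the GDI+ resources held by the image, if one was loaded
      if (theImage != null) theImage.Dispose();
    }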
  }
}

Using named capture to extract the file extension

Note that
  1. the file upload control object is called FileUpload1, and
  2. the save directory is called Uploads and is located in the site's root directory.

// declare the pattern for Requirements #1 and #2 to capture the file extension.
// the following pattern uses named capture.
String pattern = @"\.(?<ext>[^\s.]+)$";
String fileExt = null;

Regex r = new Regex(pattern);
Match m = r.Match(FileUpload1.PostedFile.FileName);
if (m.Success) {
  // capture the file extension without the period
  // here we use named capture group
  fileExt = m.Groups["ext"].Value;

  // Requirement #4 - the supported file extensions
  String SupportedImageExtPatterns = "^(BMP|GIF|PNG|TIFF|JPE?G)$";

  if (Regex.Match(fileExt, 
                  SupportedImageExtPatterns,
                  RegexOptions.IgnoreCase).Success) {

    // Requirement #5 - the supported image formats 
    System.Drawing.Imaging.ImageFormat[] SupportedImageFormats = { 
          System.Drawing.Imaging.ImageFormat.Bmp,      
          System.Drawing.Imaging.ImageFormat.Gif,      
          System.Drawing.Imaging.ImageFormat.Png,
          System.Drawing.Imaging.ImageFormat.Tiff,
          System.Drawing.Imaging.ImageFormat.Jpeg      
    };

    System.Drawing.Image theImage = null;
    try {
      theImage = System.Drawing.Image.FromStream(FileUpload1.PostedFile.InputStream);
    
      for (int i=0; i < SupportedImageFormats.Length; i++) {        
        if (theImage.RawFormat.Equals(SupportedImageFormats[i])) {
           // Requirement #3 - save the file only if it is our expected image format.
           FileUpload1.SaveAs(Server.MapPath("~/Uploads/") + FileUpload1.FileName);
           break;
        }
      }      
    }
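    catch (System.ArgumentException) {  // not an image
      ...
    }
    finally {
      // release the GDI+ resources held by the image, if one was loaded
      if (theImage != null) theImage.Dispose();
    }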
  }
}