Reading EXIF data in Python

In order to review the properties of various common-format image files, I wrote a command-prompt script in Python that will display EXIF data and other properties of JPG, PNG, TIF, CR2 and NEF files.

I am routinely looking at images, so I wanted a Python script that could display the EXIF (Exchangeable image file format) data from any JPEG, PNG, TIF or RAW (CR2 and NEF) file, as well as summarise some key properties such as:

Format
Height and Width
Data type
Bit-depth
Mode

Whilst there are many Python programmes available, most will only work with specific file types. This is a problem because JPEG and TIF files natively support EXIF data, whilst the information in PNG files (where present) is encapsulated differently. Then there are the two common proprietary RAW image formats (Canon’s CR2 and Nikon’s NEF) which store their metadata in yet another format.

What I wanted was a single piece of code that would work out what the file format is then extract the EXIF or metadata in a manner appropriate for that format.

It turns out that this is quite an ask.
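As an aside, one way to work out what a file is without trusting its extension is to look at its first few bytes. The sketch below is not what the script in this post does (that relies on Pillow's format attribute and the file extension). The names sniff_format and sniff_file and the returned labels are hypothetical, and only the handful of signatures shown are checked.

```python
def sniff_format(header: bytes) -> str:
    """Guess an image format from the first bytes of a file (hypothetical helper)."""
    if header.startswith(b"\xff\xd8\xff"):
        return "JPEG"
    if header.startswith(b"\x89PNG\r\n\x1a\n"):
        return "PNG"
    if header.startswith((b"GIF87a", b"GIF89a")):
        return "GIF"
    if header.startswith(b"BM"):
        return "BMP"
    if header.startswith((b"II*\x00", b"MM\x00*")):
        # CR2 and NEF are TIFF-based containers; CR2 carries "CR" at offset 8.
        if header[8:10] == b"CR":
            return "CR2"
        return "TIFF-like (TIF, NEF, ...)"
    return "unknown"


def sniff_file(path: str) -> str:
    """Read just enough of the file to apply sniff_format."""
    with open(path, "rb") as handle:
        return sniff_format(handle.read(16))


print(sniff_format(b"\x89PNG\r\n\x1a\n" + b"\x00" * 8))
```

A NEF file cannot be told apart from a plain TIF this way, which is one reason the extension is still useful.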

Extracting EXIF data from JPEG and TIF

JPEG (Joint Photographic Experts Group) and TIF (Tagged Image File Format) files natively support EXIF. In order to extract some of the properties of these files, I found that I had to use two Python modules:

Imageio to determine the data type
Pillow to extract some basic metadata

The following code can be used to display a useful range of properties of JPEG and TIF files:

import re
import imageio
from PIL import Image, ExifTags

image = "/path/to/image.ext"

# Read image into imageio for data type
pic = imageio.imread(image)

# Read image into PIL to extract basic metadata
type = Image.open(image)

# Calculations
megapixels = (type.size[0]*type.size[1]/1000000) # Megapixels
d = re.sub(r'[a-z]', '', str(pic.dtype)) # Dtype
t = len(Image.Image.getbands(type)) # Number of channels

print("--Summary--\n")
print("Filename: ",type.filename)
print("Format: ", type.format)
print("Data Type:", pic.dtype)
print("Bit Depth (per Channel):", d)
print("Bit Depth (per Pixel): ", int(d)*int(t))
print("Number of Channels: ", t)
print("Mode: ",type.mode)
print("Palette: ",type.palette)
print("Width: ", type.size[0])
print("Height: ", type.size[1])
print("Megapixels: ",megapixels)

Output should look something like this:

--Summary--

Filename:  /path/to/image.ext
Format:  JPEG
Data Type: uint8
Bit Depth (per Channel): 8
Bit Depth (per Pixel):  24
Number of Channels:  3
Mode:  RGB
Palette:  None
Width:  5616
Height:  3744
Megapixels:  21.026304
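The arithmetic behind the bit depth and megapixel lines can be sketched on its own, without opening an image. This is only an illustration: bits_per_channel and megapixels are hypothetical helper names, and the version here searches for the digits in the dtype name rather than stripping the letters, so that a name with no digits (such as bool) fails with a clear message.

```python
import re


def bits_per_channel(dtype_name: str) -> int:
    """Pull the number out of a NumPy-style dtype name such as 'uint8' or 'float32'."""
    match = re.search(r"\d+", dtype_name)
    if match is None:
        # A name like 'bool' carries no width; let the caller decide what to do.
        raise ValueError(f"no bit width in dtype name {dtype_name!r}")
    return int(match.group())


def megapixels(width: int, height: int) -> float:
    """Total pixel count expressed in millions."""
    return width * height / 1_000_000


channels = 3  # e.g. an RGB image
depth = bits_per_channel("uint8")
print("Bit Depth (per Channel):", depth)
print("Bit Depth (per Pixel):", depth * channels)
print("Megapixels:", megapixels(5616, 3744))
```

With the width and height from the sample output above, this reproduces the 8, 24 and 21.026304 figures.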

Next, the exifread package can be used to extract the EXIF data from these files:

import exifread

# Open the image in binary mode so exifread can collect the EXIF data
exif_tags = open(image, 'rb')
tags = exifread.process_file(exif_tags)

# Create an empty array
exif_array = []

An empty array is created in preparation to receive the extracted EXIF data.

From this point, things get tricky and we need to establish two pathways; one for PNG files (which we’ll deal with later) and one for JPEG and TIF files.

# For non-PNGs
if type.format != "PNG":
  # Compile array from tags dict
  for i in tags:
    compile = i, str(tags[i])
    exif_array.append(compile)
  # Print every tag except the embedded thumbnail
  for properties in exif_array:
    if properties[0] != 'JPEGThumbnail':
      print(': '.join(str(x) for x in properties))

In this code, we are pushing all of the EXIF data into our array (exif_array) and printing the resultant output, except for JPEGThumbnail (as this will generate a lot of gibberish). This should display a nice list of EXIF properties.
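The same collect-then-print idea can be tried without an image file. In the sketch below, sample_tags is a made-up stand-in for the mapping that exifread.process_file() returns (the values are invented), and format_tags is a hypothetical helper. Collecting first and printing afterwards means each tag appears exactly once.

```python
def format_tags(tags, skip=("JPEGThumbnail",)):
    """Return 'name: value' lines for a tag mapping, leaving out bulky entries."""
    lines = []
    for name in sorted(tags):
        if name in skip:
            continue
        lines.append(f"{name}: {tags[name]}")
    return lines


# Hypothetical stand-in for the mapping returned by exifread.process_file().
sample_tags = {
    "Image Model": "Example Camera",
    "EXIF FocalLength": "50",
    "JPEGThumbnail": b"\xff\xd8...",
}

for line in format_tags(sample_tags):
    print(line)
```

Sorting the names is a design choice of this sketch, not something the script above does.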

Extracting metadata from PNG

PNG images are a little trickier and some extra modules are required.

from PIL.ExifTags import TAGS
from PIL.PngImagePlugin import PngImageFile, PngInfo

if type.format == "PNG":
  image = PngImageFile(image) 
  metadata = PngInfo()

  # Compile array from tags dict
  for i in image.text:
    compile = i, str(image.text[i])
    exif_array.append(compile)

Again, we are filling exif_array with name/value pairs, but this time they come from the PNG’s text chunks (image.text) rather than from the exifread tags dict.
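To see why PNG needs its own path, it helps to look at how a PNG stores text: the file is a signature followed by chunks, and plain text metadata lives in tEXt chunks as a keyword, a zero byte and the value. The sketch below is a standard-library illustration only, not how Pillow is implemented. make_chunk, read_text_chunks and the sample stream are hypothetical, and the compressed (zTXt) and international (iTXt, where XMP is normally kept) variants are deliberately ignored.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_chunk(kind: bytes, payload: bytes) -> bytes:
    """Build one PNG chunk: length, type, payload, CRC-32 over type + payload."""
    crc = zlib.crc32(kind + payload) & 0xFFFFFFFF
    return struct.pack(">I", len(payload)) + kind + payload + struct.pack(">I", crc)


def read_text_chunks(data: bytes) -> dict:
    """Collect keyword/value pairs from uncompressed tEXt chunks."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG stream")
    found = {}
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        (length,) = struct.unpack_from(">I", data, pos)
        kind = data[pos + 4:pos + 8]
        payload = data[pos + 8:pos + 8 + length]
        if kind == b"tEXt":
            keyword, _, text = payload.partition(b"\x00")
            found[keyword.decode("latin-1")] = text.decode("latin-1")
        pos += 12 + length  # 4 length + 4 type + payload + 4 CRC
    return found


# Synthetic stream: signature, one tEXt chunk and the IEND terminator (no pixel data).
sample = (
    PNG_SIGNATURE
    + make_chunk(b"tEXt", b"Software\x00Example Editor")
    + make_chunk(b"IEND", b"")
)
print(read_text_chunks(sample))
```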

EXIF wasn’t supported in PNG until recently but the format can store other metadata. From here we have to branch again, because the metadata in the PNG file may be in XML form or it may not. First, a quick check to see if there is any metadata at all:

# If XML metadata, pull out data by identifying data type and gathering useful meta
if len(exif_array) > 0:
  header = exif_array[0][0]
else:
  header = ""
  print("No available metadata")

Next, I will gather all of the XML output into an array called xml_output.

import re

xml_output = []
if header.startswith("XML"):
  xml = exif_array[0][1]
  xml_output.extend(xml.splitlines())
  # Remove useless meta tags
  for line in xml.splitlines():
    if "<" not in line:
      if "xmlns" not in line:
        # Remove equal signs, quotation marks, /> characters and leading spaces
        xml_line = re.sub(r'[a-z]*:', '', line).replace('="', ': ')
        xml_line = xml_line.rstrip(' />')
        xml_line = xml_line.rstrip('\"')
        xml_line = xml_line.lstrip(' ')
        print(xml_line)

elif header.startswith("Software"):
  print("No available metadata")

# If no XML, print available metadata
else:
  for properties in exif_array:
    if properties[0] != 'JPEGThumbnail':
      print(': '.join(str(x) for x in properties))

I followed this up with a cleaning of the output as XML contains a lot of structure that I just didn’t need. Remember, all I wanted was a simple display of the basic metadata. To do this, I used the re package and some regular expressions along with the rstrip and lstrip functions in Python. The if/elif/else statement has been used to:

Check for XML output (identified by “XML” in the header) and print it up in a clean format.
Check whether the header starts with “Software”, as this is usually uninformative metadata about the version of Adobe Photoshop used to create the file, and report that no metadata is available.
Print out any other metadata that may be present if not in XML format.

Now the metadata from a PNG file can also be read.
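The clean-up of a single XML attribute line can be sketched in isolation. This is a slight variation on the approach described above rather than the exact code from the script: it removes only the leading namespace prefix, so colons inside a value (a timestamp, for example) are left alone. clean_xmp_line and the sample lines are hypothetical.

```python
import re


def clean_xmp_line(line: str) -> str:
    """Turn '   xmp:CreatorTool="Some Tool"/>' into 'CreatorTool: Some Tool'."""
    line = line.strip()
    # Drop only the leading namespace prefix (e.g. "xmp:").
    line = re.sub(r'^[A-Za-z]+:', '', line, count=1)
    # Turn the first '="' into a readable separator.
    line = line.replace('="', ': ', 1)
    # Trim a closing '/>' or '>' and then the closing quotation mark.
    return line.rstrip(' />').rstrip('"')


print(clean_xmp_line('   xmp:CreatorTool="Example Editor 1.0"/>'))
print(clean_xmp_line('   xmp:CreateDate="2019-01-01T10:20:30"'))
```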

Extracting metadata from GIF and BMP

Neither BMP (bitmap) nor GIF (Graphics Interchange Format) supports EXIF, but their summary properties can be determined. Hence I’ll include the following in my script:

if type.format == "GIF" or type.format == "BMP":
  print("No available metadata")

Extracting EXIF data from CR2 and NEF

RAW images are like “digital negatives”. They contain all of the information that a camera collects before an image is processed (usually) into a JPEG. Canon’s RAW format is called CR2 and Nikon’s is called NEF. Whilst proprietary, these can both be read into Python, but to do this we need the rawphoto module. (Note: rawphoto has been deprecated and replaced with rawkit, which is also deprecated, but it all still works for our purposes.)

import os
from rawphoto.cr2 import Cr2
from rawphoto.nef import Nef

metadata = {}
filepath = image
(filepath_no_ext, ext) = os.path.splitext(filepath)
filename_no_ext = os.path.basename(filepath_no_ext)
ext = ext.upper()
# Pick the reader that matches the extension
if ext == '.CR2':
  raw = Cr2(filename=filepath)
elif ext == '.NEF':
  raw = Nef(filename=filepath)
else:
  raise TypeError("Format not supported")
# Walk every IFD (and any numbered sub-IFDs) and print its entries
for i in range(len(raw.ifds)):
  ifd = raw.ifds[i]
  print("IFD #{}".format(i))
  raw_metadata(raw, ifd)
  for subifd in ifd.subifds:
    if isinstance(subifd, int):
      print("Subifd ", subifd)
      raw_metadata(raw, ifd.subifds[subifd], 1)
raw.close()

This calls a helper function, raw_metadata, which needs to be defined before the loop above runs:

def raw_metadata(raw, ifd, level=1):
  for name in ifd.entries:
    e = ifd.entries[name]
    if name in ifd.subifds or isinstance(name, tuple):
      if isinstance(name, tuple):
        for n in name:
          print(level * "\t" + n + ":")
          raw_metadata(raw, ifd.subifds[n], level + 1)
      else:
        print(level * "\t" + name + ":")
        raw_metadata(raw, ifd.subifds[name], level + 1)
    else:
      if isinstance(name, str):
        if e.tag_type_key == 0x07:
          print(level * "\t" + "{}: {}".format(
            name,
            "[Binary blob]"
            ))
        else:
          print(level * "\t" + "{}: {}".format(
            name,
            ifd.get_value(e)
            ))

The above code is an adaptation of Sam Whited’s GitHub Gist, and works really well for our purposes.
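The comparison against 0x07 makes more sense with the TIFF layout in mind. CR2 and NEF are TIFF-based, every IFD entry records a field type, and type 7 is UNDEFINED, meaning raw bytes that are not worth printing. The sketch below is a standard-library illustration only and says nothing about how rawphoto works internally. read_first_ifd, the TIFF_TYPES subset and the synthetic sample bytes are all made up for the example.

```python
import struct

# Field types from the TIFF 6.0 specification (subset).
TIFF_TYPES = {1: "BYTE", 2: "ASCII", 3: "SHORT", 4: "LONG", 5: "RATIONAL", 7: "UNDEFINED"}


def read_first_ifd(data):
    """List (tag, type name, count) for each entry in the first IFD of a TIFF stream."""
    order = {b"II": "<", b"MM": ">"}[bytes(data[:2])]
    magic, offset = struct.unpack_from(order + "HI", data, 2)
    if magic != 42:
        raise ValueError("not a TIFF stream")
    (count,) = struct.unpack_from(order + "H", data, offset)
    entries = []
    for index in range(count):
        # Each entry is 12 bytes: tag, type, count, then a 4-byte value or offset.
        tag, type_key, n, _value = struct.unpack_from(
            order + "HHI4s", data, offset + 2 + 12 * index
        )
        entries.append((tag, TIFF_TYPES.get(type_key, "OTHER"), n))
    return entries


# Synthetic little-endian stream with two entries; tags and values are examples only.
sample = (
    b"II" + struct.pack("<HI", 42, 8)                              # header, IFD at offset 8
    + struct.pack("<H", 2)                                         # two entries follow
    + struct.pack("<HHI4s", 0x010F, 2, 4, b"Cam\x00")              # ASCII field
    + struct.pack("<HHI4s", 0x927C, 7, 4, b"\x01\x02\x03\x04")     # UNDEFINED field
    + struct.pack("<I", 0)                                         # no further IFDs
)

for tag, type_name, n in read_first_ifd(sample):
    label = "[Binary blob]" if type_name == "UNDEFINED" else type_name
    print(hex(tag), label, n)
```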

Putting it all together

Finally, it’s time to add some command-line functionality with my favourite module argparse and wrap everything in a couple of functions.

To set up argparse, I’ll use the following code:

import argparse
import os

def options():
  parser = argparse.ArgumentParser(description="Read image metadata")
  parser.add_argument("-i", "--image", help="Input image file.", required=True)
  args = parser.parse_args()
  return args
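A small variation, shown here only as a sketch, makes the parser easier to try out from an interactive session: parse_args() accepts an explicit list of arguments, so options can take an optional argv parameter (an assumption of this example, not part of the script) and fall back to the real command line when it is None.

```python
import argparse


def options(argv=None):
    """Parse the command line, or an explicit argument list when one is given."""
    parser = argparse.ArgumentParser(description="Read image metadata")
    parser.add_argument("-i", "--image", help="Input image file.", required=True)
    return parser.parse_args(argv)


# Simulate running the script with "-i /path/to/image.ext".
args = options(["-i", "/path/to/image.ext"])
print(args.image)
```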

Then all of the above functionality (except the CR2/NEF processing) will go into gen_metadata(image). I want my code to nicely handle unsupported image formats, so I can do a quick check of the file extension with os.path and advise the user if there is an issue.

# Get options
args = options()
image = args.image

# Check the file extension (case-insensitively)
name, extension = os.path.splitext(image)
extension = extension.lower()
# List valid extensions
ext = [".png", ".jpg", ".jpeg", ".cr2", ".nef", ".tif", ".gif", ".bmp"]
if extension not in ext:
  print("File format ",extension," not supported.")
  exit()
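Since cameras often write upper-case extensions (IMG_0001.CR2, for instance), it is worth normalising the case before comparing. The sketch below shows one way to fold the supported-format check and the RAW/non-RAW decision into a single helper. choose_reader, the two sets and the returned labels are hypothetical names for illustration.

```python
import os

RAW_EXTENSIONS = {".cr2", ".nef"}
GENERAL_EXTENSIONS = {".png", ".jpg", ".jpeg", ".tif", ".bmp"}


def choose_reader(path):
    """Return 'raw', 'general' or None depending on the file extension."""
    extension = os.path.splitext(path)[1].lower()
    if extension in RAW_EXTENSIONS:
        return "raw"
    if extension in GENERAL_EXTENSIONS:
        return "general"
    return None


for candidate in ("IMG_0001.CR2", "holiday.JPG", "notes.txt"):
    print(candidate, "->", choose_reader(candidate))
```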

Now everything can be wrapped up into the final script:

#!/usr/bin/env python

import imageio
import exifread
from PIL import Image, ExifTags
from PIL.ExifTags import TAGS
from PIL.PngImagePlugin import PngImageFile, PngInfo
import re
import os
from rawphoto.cr2 import Cr2
from rawphoto.nef import Nef
import argparse


def options():
    parser = argparse.ArgumentParser(description="Read image metadata")
    parser.add_argument("-i", "--image", help="Input image file.", required=True)
    args = parser.parse_args()
    return args

# Via https://gist.github.com/SamWhited/af58edaed66414bded84
def raw_metadata(raw, ifd, level=1):
    for name in ifd.entries:
        e = ifd.entries[name]
        if name in ifd.subifds or isinstance(name, tuple):
            if isinstance(name, tuple):
                for n in name:
                    print(level * "\t" + n + ":")
                    raw_metadata(raw, ifd.subifds[n], level + 1)
            else:
                print(level * "\t" + name + ":")
                raw_metadata(raw, ifd.subifds[name], level + 1)
        else:
            if isinstance(name, str):
                if e.tag_type_key == 0x07:
                    print(level * "\t" + "{}: {}".format(
                        name,
                        "[Binary blob]"
                    ))
                else:
                    print(level * "\t" + "{}: {}".format(
                        name,
                        ifd.get_value(e)
                    ))

def gen_metadata(image):
    
    # Read image into imageio for data type
    pic = imageio.imread(image)

    # Read image into PIL to extract basic metadata
    type = Image.open(image)

    # Calculations
    megapixels = (type.size[0]*type.size[1]/1000000) # Megapixels
    d = re.sub(r'[a-z]', '', str(pic.dtype)) # Dtype
    t = len(Image.Image.getbands(type)) # Number of channels

    print("\n--Summary--\n")
    print("Filename: ",type.filename)
    print("Format: ", type.format)
    print("Data Type:", pic.dtype)
    print("Bit Depth (per Channel):", d)
    print("Bit Depth (per Pixel): ", int(d)*int(t))
    print("Number of Channels: ", t)
    print("Mode: ",type.mode)
    print("Palette: ",type.palette)
    print("Width: ", type.size[0])
    print("Height: ", type.size[1])
    print("Megapixels: ",megapixels)

    # Open image with ExifMode to collect EXIF data
    exif_tags = open(image, 'rb')
    tags = exifread.process_file(exif_tags)

    # Create an empty array
    exif_array = []

    # Print header
    print("\n--Metadata--\n")

    # For non-PNGs
    if type.format != "PNG":
        # Compile array from tags dict
        for i in tags:
            compile = i, str(tags[i])
            exif_array.append(compile)
        for properties in exif_array:
            if properties[0] != 'JPEGThumbnail':
                print(': '.join(str(x) for x in properties))

    if type.format == "PNG":
        image = PngImageFile(image) #via https://stackoverflow.com/a/58399815
        metadata = PngInfo()
        
        # Compile array from tags dict
        for i in image.text:
            compile = i, str(image.text[i])
            exif_array.append(compile)
        
        # If XML metadata, pull out data by identifying data type and gathering useful meta
        if len(exif_array) > 0:
            header = exif_array[0][0]
        else:
            header = ""
            print("No available metadata")
        
        xml_output = []
        if header.startswith("XML"):
            xml = exif_array[0][1]
            xml_output.extend(xml.splitlines()) # Use splitlines so that you have a list containing each line
            # Remove useless meta tags
            for line in xml.splitlines():
                if "<" not in line:
                    if "xmlns" not in line:
                        # Remove equal signs, quotation marks, /> characters and leading spaces
                        xml_line = re.sub(r'[a-z]*:', '', line).replace('="', ': ')
                        xml_line = xml_line.rstrip(' />')
                        xml_line = xml_line.rstrip('\"')
                        xml_line = xml_line.lstrip(' ')
                        print(xml_line)
        
        elif header.startswith("Software"):
            print("No available metadata")
        
        # If no XML, print available metadata
        else:
            for properties in exif_array:
                if properties[0] != 'JPEGThumbnail':
                    print(': '.join(str(x) for x in properties))


    # Explanation for GIF or BMP
    if type.format == "GIF" or type.format == "BMP":
        print("No available metadata")

def main():

    # Get options
    args = options()
    image = args.image

    # Check the file extension (case-insensitively)

    name, extension = os.path.splitext(image)
    extension = extension.lower()

    # List valid extensions
    ext = [".png", ".jpg", ".jpeg", ".cr2", ".nef", ".tif", ".gif", ".bmp"]
    if extension not in ext:
        print("File format ",extension," not supported.")
        exit()

    # RAW files (CR2 and NEF) are handled by rawphoto
    if extension in (".cr2", ".nef"):
        metadata = {}
        filepath = image
        (filepath_no_ext, ext) = os.path.splitext(filepath)
        filename_no_ext = os.path.basename(filepath_no_ext)
        ext = ext.upper()
        if ext == '.CR2':
            raw = Cr2(filename=filepath)
        elif ext == '.NEF':
            raw = Nef(filename=filepath)
        else:
            raise TypeError("Format not supported")
        for i in range(len(raw.ifds)):
            ifd = raw.ifds[i]
            print("IFD #{}".format(i))
            raw_metadata(raw, ifd)
            # Hax.
            for subifd in ifd.subifds:
                if isinstance(subifd, int):
                    print("Subifd ", subifd)
                    raw_metadata(raw, ifd.subifds[subifd], 1)
        raw.close()

    else:
        gen_metadata(image)

if __name__ == '__main__':
    main()

Running the script

The script can be executed with the following syntax on the command line:

/path/to/image_reporter.py -i /path/to/image.ext

The code has also been posted to GitHub Gist.

I ran this code on Python 3.7.4.

Further resources:

If you require some sample images with metadata in a wide variety of formats to use with this script for testing purposes, I recommend Drew Noakes’ metadata-extractor-images repository.

Some information about image bit depth is available from Adobe and PetaPixel.

Comments

2 responses to “Reading EXIF data in Python”

On 13 September 2021, Nick wrote:

Hi, thanks for the article! I’m trying to write a Python script that will scan through my pictures, extract what lens and focal length I used, and then present me with some nice statistics. Yet I’m having difficulties finding a Python module that can actually process raw images and is not deprecated. Would you have any ideas?

On 10 July 2022, Joel wrote:

In the first example for reading the image properties you forgot to add the line:
import regex as re
If you don’t add this you’ll get an error for the line:
d = re.sub(r'[a-z]', '', str(pic.dtype)) # Dtype