Friday, December 11, 2015

Evaluating NIST Biometric Image Software (NBIS) using FVC2004 Databases - Part 2

This series of articles aims at evaluating the NBIS biometric software by running it against the fingerprint image databases from the FVC2004 verification competition.



WARNING: If you have not yet read the first article, I strongly suggest you do so: Evaluating NIST Biometric Image Software (NBIS) using FVC2004 Databases - Part 1.

Assuming you have already followed the steps from the first article, which included installing NBIS and downloading the FVC2004 databases, open a terminal and type these commands:

source nbis-env.sh

cd images/

3. Lights, camera, action!


3.1. Converting TIFF images to WSQ format

The first step on our biometric journey is to convert fingerprint image files into a format called WSQ (Wavelet Scalar Quantization) [11]. The WSQ algorithm is based on wavelet theory and has become a standard for the exchange and storage of fingerprint images. It was developed by the FBI, the Los Alamos National Laboratory, and the National Institute of Standards and Technology (NIST).

NBIS provides a tool called CWSQ to compress grayscale fingerprint images using the WSQ algorithm. We'll use it in this section.

In the FVC2004 databases, fingerprint image files are provided in TIFF format. We'll select a single one to test the NBIS tools: "db1/101_1.tif". You can display this fingerprint image by issuing the following command (or using your preferred image viewer):

qiv db1/101_1.tif


As the image is in TIFF file format, we need to know its dimensions before invoking CWSQ. Therefore, type this:

identify db1/101_1.tif
db1/101_1.tif TIFF 640x480 640x480+0+0 8-bit Grayscale DirectClass 308KB 0.000u 0:00.009

The command output tells us that the image we chose is indeed in TIFF format, is 640 pixels wide by 480 pixels high, and is in 8-bit grayscale.

Finally, call the CWSQ program using these arguments (the .75 argument is the bit rate, which corresponds to roughly a 15:1 compression ratio):

cwsq .75 wsq db1/101_1.tif -r 640,480,8

If successful, it will produce a new file with the extension ".wsq" (11 kB) in the same directory as the ".tif" (301 kB) file:

ls -la db1/101_1.*

NBIS provides a WSQ viewing tool called DPYIMAGE. Try running it to see the resulting WSQ file:

dpyimage db1/101_1.wsq

Now that we have converted a single TIFF image into WSQ format, let's do the same for all four FVC2004 databases. As each database has distinct image dimensions, we need to specify them separately:

find db1/ -name "*.tif" -exec cwsq .75 wsq {} -r 640,480,8 \;
find db2/ -name "*.tif" -exec cwsq .75 wsq {} -r 328,364,8 \;
find db3/ -name "*.tif" -exec cwsq .75 wsq {} -r 300,480,8 \;
find db4/ -name "*.tif" -exec cwsq .75 wsq {} -r 288,384,8 \;
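
Alternatively, instead of hard-coding the dimensions of each database, we could let ImageMagick's identify (the same tool used above) discover them per file. This is just a minimal sketch of that idea, not part of the original workflow:

for tif in db?/*.tif
do
  dims=$(identify -format "%w,%h" "$tif")   # e.g. "640,480"
  cwsq .75 wsq "$tif" -r "$dims,8"
done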

You can verify whether all WSQ files were created by issuing this command:

find -name "*.wsq" | head
./db1/108_4.wsq
./db1/109_2.wsq
./db1/102_2.wsq
./db1/109_5.wsq
./db1/101_4.wsq
./db1/104_4.wsq
./db1/104_8.wsq
./db1/103_2.wsq
./db1/110_4.wsq
./db1/109_8.wsq

3.2. Checking quality of WSQ images through NFIQ

Another interesting NIST algorithm to test is NIST Fingerprint Image Quality (NFIQ) [12], which also comes with the NBIS bundle.

NFIQ, a fingerprint image quality algorithm, analyzes a fingerprint image and assigns it a quality value on a scale of 1 (highest quality) to 5 (lowest quality). Higher quality images produce significantly better performance with matching algorithms. That's exactly why NFIQ is so important to biometric applications: it can be employed to ensure fingerprint images are being collected with acceptable quality.

As we have already produced WSQ files, the syntax for NFIQ is quite simple. Just execute this:

nfiq db1/101_1.wsq

Given the output value ("2"), we can assume the biometric quality of this image is quite good.
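
This score could be used, for example, to gate acquisition: reject an image and ask the user for a new capture when the quality is poor. A minimal sketch (the cut-off of 3 is an arbitrary assumption, not an NBIS recommendation):

score=$(nfiq db1/101_1.wsq)
if [ "$score" -le 3 ]
then
  echo "quality OK ($score)"
else
  echo "poor quality ($score), please capture the fingerprint again"
fi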

If we wanted to check the quality of the entire dataset, we could execute the following shell script:

for i in `seq 1 4`
do
  for a in db$i/*.wsq
  do
    echo "$a"
    nfiq $a
  done
done
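
To get an overview of the whole dataset rather than one score per file, the same idea can be condensed into a rough quality histogram (assuming all the WSQ files from the previous step exist):

for a in db?/*.wsq
do
  nfiq $a
done | sort | uniq -c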

3.3. Extracting features from fingerprints through MINDTCT

In an Automated Fingerprint Identification System (AFIS), there are two major biometric algorithms that play an indispensable role: 1) the feature extractor (or simply "extractor") and 2) the template matcher (or simply "matcher"). We'll start with the extractor: in the case of NBIS, it is embedded in a program called MINDTCT [1].

The NBIS bundle provides MINDTCT, a minutiae detector, which automatically locates and records ridge endings and bifurcations in a fingerprint image [13, 15]. This system includes minutiae quality assessment based on local image conditions. MINDTCT is used by the FBI's Universal Latent Workstation and is the only known no-cost system of its kind [16].

The program MINDTCT takes a WSQ compressed image file, processes the fingerprint image and automatically detects minutiae [16].

Considering the file "db1/101_1.wsq" we generated previously, we could invoke MINDTCT in order to extract the features:

mindtct -b -m1 db1/101_1.wsq db1/101_1

The result of the last instruction is a bunch of other files with the same prefix and in the same directory. Check it out:

find db1/101_1.*
db1/101_1.brw
db1/101_1.dm
db1/101_1.hcm
db1/101_1.lcm
db1/101_1.lfm
db1/101_1.min
db1/101_1.qm
db1/101_1.tif
db1/101_1.wsq
db1/101_1.xyt

One of these files is of special interest to us: the one with the ".xyt" extension. Take a look at its contents:

head db1/101_1.xyt
85 102 0 13
86 142 174 12
88 90 0 13
104 79 0 35
111 92 90 36
120 31 17 82
129 54 5 39
135 176 90 78
136 143 84 74
140 30 95 42

In this ".xyt" file created by the MINDTCT program, the minutiae are written in "x y theta quality" format, one minutia per line. This is the file format that the NBIS matcher program, BOZORTH3, expects as input. We'll use it later in this article.
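
Since the format is plain text, standard tools are enough to inspect a template. As a small sketch (assuming the fourth column is the minutia quality, as the sample output above suggests), we can count the minutiae of a template and how many of them have quality of at least 50:

wc -l < db1/101_1.xyt
awk '$4 >= 50' db1/101_1.xyt | wc -l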

Likewise, we can extract the features of the entire dataset by executing this small shell script:

for i in `seq 1 4`
do
  for a in db$i/*.wsq
  do
    echo "$a"
    b="${a/.wsq/}"
    mindtct $a $b
  done
done

After a little while, you can verify whether all XYT files were created by issuing this command:

find -name "*.xyt" | head
./db1/103_3.xyt
./db1/109_4.xyt
./db1/106_1.xyt
./db1/108_1.xyt
./db1/102_5.xyt
./db1/104_5.xyt
./db1/102_1.xyt
./db1/105_4.xyt
./db1/105_3.xyt
./db1/102_4.xyt

3.4. Performing fingerprint matches through BOZORTH3

Now that we have extracted the minutiae from every fingerprint image in the dataset, we can finally use the matcher algorithm. In the case of NBIS, it is embedded in a program called BOZORTH3 [1].

The NBIS bundle provides BOZORTH3, a fingerprint matching system. It uses the minutiae detected by MINDTCT to determine whether two fingerprints come from the same finger of the same person [14]. It can compare two fingers at a time or run in batch mode, comparing a single finger (the probe) against a large database of fingerprints (the gallery).

The program BOZORTH3 computes match scores from fingerprint minutiae files. The files are expected to be in xyt-format, a simple text file format that is produced by the minutiae detector program MINDTCT, which is also part of the NBIS distribution.

Considering the specific file "db1/101_1.xyt" we generated previously, as well as all the other ".xyt" files in that same directory, we can invoke BOZORTH3 to compare the fingerprint "101_1.tif" against the remaining images in the same database (i.e., those residing in the "db1/" directory):

bozorth3 -m1 -A outfmt=spg -T 20 -p db1/101_1.xyt db1/*.xyt
271 db1/101_1.xyt db1/101_1.xyt
59 db1/101_1.xyt db1/101_2.xyt
25 db1/101_1.xyt db1/101_4.xyt

The output shows 3 candidate fingerprints that match the "101_1.tif" probe, where the first value on each line is the matching score. As a good practice, we set a match score threshold in the matcher (note the "-T 20" argument). Once the threshold is specified, only match scores meeting or exceeding that value are printed.

The first line represents the very same image as the probe ("101_1"), which explains the highest score. The remaining two are fingerprints that the matching algorithm identified as likely coming from the same finger, with "101_2" scoring higher than "101_4".
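
If you prefer to see the candidates ranked by score (highest first) and without the trivial self-match, a small pipeline does the trick (a convenience sketch, not required for the rest of the article):

bozorth3 -m1 -A outfmt=spg -T 20 -p db1/101_1.xyt db1/*.xyt \
  | grep -v "101_1.xyt db1/101_1.xyt" \
  | sort -rn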

But, as we know from the FVC2004 database, there are 8 impressions of each finger, which means that an ideal AFIS would list all of them:

find db1/101_?.xyt
db1/101_1.xyt
db1/101_2.xyt
db1/101_3.xyt
db1/101_4.xyt
db1/101_5.xyt
db1/101_6.xyt
db1/101_7.xyt
db1/101_8.xyt


Well, that is an issue concerning "Error Rates" in security systems, particularly "False Acceptance Rate (FAR)" and "False Rejection Rate (FRR)", but that is a whole new subject I'll keep for another article. :D

At least the other fingerprint images in the same dataset (e.g., "102_*", "103_*", etc.) did not receive high scores from the matching algorithm.

In other words, running the NBIS algorithms against the FVC2004 database, we are facing a high FRR (i.e., many genuine fingerprints fail to match) and a low FAR (i.e., it is hard for an impostor fingerprint to match).
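
To put rough numbers on that intuition for a single database, a sketch like the one below counts how many of the candidates returned for each probe in "db1/" are genuine (same finger number) and how many are impostors (different finger). It assumes the ".xyt" files generated above and uses the same threshold of 20:

THRESHOLD=20
genuine=0
impostor=0
for probe in db1/*.xyt
do
  finger=$(basename "$probe" | cut -d_ -f1)   # e.g. "101" from "101_3.xyt"
  while read -r score p g
  do
    [ "$p" = "$g" ] && continue               # skip the probe matched against itself
    if [ "$(basename "$g" | cut -d_ -f1)" = "$finger" ]
    then
      genuine=$((genuine + 1))
    else
      impostor=$((impostor + 1))
    fi
  done < <(bozorth3 -m1 -A outfmt=spg -T $THRESHOLD -p "$probe" db1/*.xyt)
done
echo "Genuine matches at threshold $THRESHOLD: $genuine"
echo "Impostor matches at threshold $THRESHOLD: $impostor"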

If we wanted to compute the matches over the entire dataset, taking each fingerprint as a probe in turn, we could execute the following shell script:

for i in `seq 1 4`
do
  for a in db$i/*.xyt
  do
    echo "[$a]"
    bozorth3 -m1 -A outfmt=spg -T 20 -p $a db$i/*.xyt
    echo
  done
done

For instance, note how good the matching scores are for the fingerprint named "104_4.tif" in the "db3" dataset:

[db3/104_4.xyt]
80 db3/104_4.xyt db3/104_1.xyt
59 db3/104_4.xyt db3/104_2.xyt
136 db3/104_4.xyt db3/104_3.xyt
948 db3/104_4.xyt db3/104_4.xyt
113 db3/104_4.xyt db3/104_5.xyt
72 db3/104_4.xyt db3/104_7.xyt
61 db3/104_4.xyt db3/104_8.xyt


4. Summing up


4.1. A complete Shell Script

In order to reproduce all the steps explained above, I suggest you create a single shell script named "02-run-nbis.sh" inside the "images/" directory, with the following content:

#!/bin/bash

if ! which bozorth3
then
  echo "NBIS programs not in shell path"
  exit 1
fi

echo "Converting TIFF images to WSQ format..."
find db1/ -name "*.tif" -exec cwsq .75 wsq {} -r 640,480,8 \;
find db2/ -name "*.tif" -exec cwsq .75 wsq {} -r 328,364,8 \;
find db3/ -name "*.tif" -exec cwsq .75 wsq {} -r 300,480,8 \;
find db4/ -name "*.tif" -exec cwsq .75 wsq {} -r 288,384,8 \;

echo "Checking quality of WSQ images through NFIQ..."
for i in `seq 1 4`
do
  for a in db$i/*.wsq
  do
    echo "$a"
    nfiq $a
  done
done

echo "Extracting features from fingerprints through MINDTCT..."
for i in `seq 1 4`
do
  for a in db$i/*.wsq
  do
    echo "$a"
    b="${a/.wsq/}"
    mindtct $a $b
  done
done

echo "Performing fingerprint matches through BOZORTH3..."
for i in `seq 1 4`
do
  for a in db$i/*.xyt
  do
    echo "[$a]"
    bozorth3 -m1 -A outfmt=spg -T 20 -p $a db$i/*.xyt
    echo
  done
done

exit 0

Then you should give the file proper permissions and run it using the following instructions:

source nbis-env.sh

cd images/

chmod +x 02-run-nbis.sh

./02-run-nbis.sh
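
Since the script produces a lot of output, it can be handy to keep a log of the run in a file while still watching it on the screen (just a suggestion):

./02-run-nbis.sh 2>&1 | tee nbis-run.log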

That's all for today, folks! We have reached the end of our article. I hope you enjoyed achieving the proposed tasks, as much as I enjoyed writing them. :D


References

Evaluating NIST Biometric Image Software (NBIS) using FVC2004 Databases - Part 1

This series of articles aims at evaluating the NBIS biometric software by running it against the fingerprint image databases from the FVC2004 verification competition.




1. Contextualization matters


1.1. First of all, what is NBIS?

NBIS, which is an acronym for NIST Biometric Image Software [1], is a biometric software distribution developed by the National Institute of Standards and Technology (NIST) for the Federal Bureau of Investigation (FBI) and the Department of Homeland Security (DHS).



The NBIS software bundle provides a collection of application programs, utilities, and source code libraries. It is organized in two categories: i) non-export controlled and ii) export controlled.

The non-export controlled NBIS software consists of five major packages:

  • PCASYS: a neural network based fingerprint pattern classification system;
  • MINDTCT: a fingerprint minutiae detector;
  • NFIQ: a neural network based fingerprint image quality algorithm;
  • AN2K7: a reference implementation of the ANSI/NIST-ITL 1-2000 "Data Format for the Interchange of Fingerprint, Facial, Scar Mark & Tattoo (SMT) Information" standard; and
  • IMGTOOLS: a collection of image utilities, including encoders and decoders for Baseline and Lossless JPEG and the FBI's WSQ specification.

The export controlled NBIS software is organized into two major packages:

  • NFSEG: a fingerprint segmentation system useful for segmenting four-finger plain impressions; and
  • BOZORTH3: a minutiae based fingerprint matching system.

The NBIS software is publicly available for download [2] and appears to be under continuous development to the present day. Its latest version, 5.0.0, was released in April 2015.

1.2. Second, what is FVC2004?

The Fingerprint Verification Competition (FVC) is an international competition focused on fingerprint verification software assessment [3]. A subset of fingerprint impressions acquired with various sensors was provided to registered participants, to allow them to adjust the parameters of their algorithms. Participants were requested to provide enroll and match executable files of their algorithms; the evaluation was conducted at the organizers' facilities using the submitted executable files on a sequestered database, acquired with the same sensors as the training set.

These events received great attention from both the academic and industrial biometric communities. They established a common benchmark, allowing developers to unambiguously compare their algorithms, and provided an overview of the state of the art in fingerprint recognition. Based on the response of the biometrics community, the past editions of FVC were undoubtedly successful initiatives.

FVC2004 [4], the Third International Fingerprint Verification Competition, was held in 2004. The contest involved four different databases (three real and one synthetic) [5] collected by using the following sensors and technologies:

  • DB1: optical sensor "V300" by CrossMatch [6];
  • DB2: optical sensor "U.are.U 4000" by Digital Persona [7];
  • DB3: thermal sweeping sensor "FingerChip FCD4B14CB" by Atmel [8]; and
  • DB4: synthetic fingerprint generation by SFinGe v3.0 [9].

At the end of the data collection, for each database a total of 120 fingers and 12 impressions per finger (1440 impressions) were gathered. As in previous editions, the size of each database to be used in the test was established as 110 fingers wide and 8 impressions per finger deep (880 fingerprints in all); collecting some additional data gave a margin in case of collection/labeling errors.

Fingers from 101 to 110 (set B) have been made available to the participants to allow parameter tuning before the submission of the algorithms; the benchmark is then constituted by fingers numbered from 1 to 100 (set A).

      Sensor Type              Image Size              Set A (w x d)  Set B (w x d)  Resolution
DB1   Optical Sensor           640x480 (307 Kpixels)   100x8          10x8           500 dpi
DB2   Optical Sensor           328x364 (119 Kpixels)   100x8          10x8           500 dpi
DB3   Thermal Sweeping Sensor  300x480 (144 Kpixels)   100x8          10x8           512 dpi
DB4   SFinGe v3.0              288x384 (108 Kpixels)   100x8          10x8           about 500 dpi

The following figure shows a sample image from each database:



2. Gentlemen, start your engines!


2.1. Compiling and setting up NBIS

Well, now that we know what NBIS and FVC are, it is time to put things to work! We'll be considering the Linux operating system from now on. :D

First of all, we need to download the NBIS source code, which is available on NIST's site [2]. At the time this post was written, the latest version was "Release 5.0.0".

Once the ZIP file is downloaded, we need to extract its files and rename the main directory by issuing these commands:

unzip nbis_v5_0_0.zip

mv Rel_5.0.0/ nbis-5.0.0/

cd nbis-5.0.0/

Then, following Linux good practice, let's use a proper directory to install the NBIS files: "/opt/nbis-5.0.0/".

mkdir /opt/nbis-5.0.0/

./setup.sh /opt/nbis-5.0.0/

You might face some dependency issues after running "setup.sh". This means you'll need to install some packages on your system first. Do it before proceeding to the next step! For instance, on CentOS 6.x these packages were needed: cmake, libpng-devel, and libX11-devel.
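
They could be installed like this (assuming root privileges or a configured sudo; package names may differ on other distributions):

sudo yum install -y cmake libpng-devel libX11-devel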

Once "setup.sh" has been executed successfully, you'll run a series of "make" instructions:

make config
make it
make install
make catalog

Very good! At this point you have 1) prepared the NBIS sources, 2) compiled them, 3) installed them under /opt, and 4) created the catalog of libraries and programs.

As a suggestion, create a simple shell script named "nbis-env.sh" to export the environment variables, with the following content:

#!/bin/bash

NBIS_HOME="/opt/nbis-5.0.0"

export PATH=$PATH:$NBIS_HOME/bin/
export MANPATH=$NBIS_HOME/man/

How does it work? Simple! Try executing these commands:

cwsq -version

man cwsq

Nothing but errors, right? Now try typing this:

source nbis-env.sh

From now on, NBIS programs and manuals are on the path.

Let's check the version of CWSQ program:

cwsq -version
Standard Version: ANSI/NIST-ITL 1-2007
NBIS Non-Export Control Software Version: Release 5.0.0

And now take a look at its manual:

man cwsq

Well done! This simple shell snippet can be used to check whether the NBIS programs are available in the path:

if ! which bozorth3; then echo "NBIS programs not in shell path"; fi

2.2. Let's fetch the database files

In order to facilitate retrieving and extracting the files from the FVC2004 site [10], I suggest you write a simple shell script named "01-get-images.sh", with the following content:

#!/bin/bash

# create directory for the ZIP files
if [ ! -d zips ]
then
  mkdir zips
fi

# download the files
for i in `seq 1 4`
do
  arq="DB${i}_B.zip"
  if [ ! -f zips/$arq ]
  then
    wget "http://bias.csr.unibo.it/fvc2004/Downloads/${arq}"
    mv $arq zips/
  fi
done

# remove directories
rm -rf images/

# extract files
for i in `seq 1 4`
do
  mkdir -p images/db$i
  unzip zips/DB${i}_B.zip -d images/db$i/
done

exit 0

Thus, simply give execution permission to the script and run it:

chmod +x 01-get-images.sh

./01-get-images.sh

You'll note the "images/" directory will be filled with several TIFF images from the four FVC2004 set B databases. Take a look at it:

find images/ | head
images/
images/db1
images/db1/104_3.tif
images/db1/105_6.tif
images/db1/107_8.tif
images/db1/105_3.tif
images/db1/106_4.tif
images/db1/109_1.tif
images/db1/110_4.tif
images/db1/102_5.tif


That's it! We have reached the end of the first article.

In the next article we'll start using NBIS binaries on FVC2004 fingerprint images: Evaluating NIST Biometric Image Software (NBIS) using FVC2004 Databases - Part 2.

References


Friday, September 25, 2015

pgAFIS: Biometric Elephant

Biometric technologies are increasingly used in civilian applications. One example: in the 2014 elections, 21.6 million Brazilians (15% of the country's voters) in 762 municipalities (including 15 state capitals) were expected to use biometrics in electronic voting machines in order to reduce the risk of errors, fraud and slowness.


In India, a pioneering government initiative promises to create the largest biometric database in the world. There are over half a billion people in the country who lack any kind of identification, making it impossible for them to receive government aid, open bank accounts, apply for loans, or get a driver's license, among other things. Registering the biometric data of more than one million people a day, the project expects to have the records of all 1.2 billion Indians in its database by the end of 2014.


While digging into the subject, I found out that the source code of the biometric routines from the FBI/NIST is public. Such routines are the heart of an AFIS (Automated Fingerprint Identification System), among which two algorithms stand out: the feature extractor and the minutiae matcher.


After studying this NIST code (written in C), I decided to add native support for such features to my favorite DBMS, PostgreSQL. And so pgAFIS (Automated Fingerprint Identification System support for PostgreSQL) was born: an extension capable of providing feature extraction and minutiae comparison through SQL statements within PostgreSQL.

But how does that work in practice?

Well, let's start with the data modeling. You'll need to create a table in the database in which the fingerprint information will be stored, as described below:

Table "public.fingerprints"
 Column |     Type     | Modifiers 
--------+--------------+-----------
 id     | character(5) | not null
 pgm    | bytea        | 
 wsq    | bytea        | 
 mdt    | bytea        | 
 xyt    | text         | 
Indexes:
    "fingerprints_pkey" PRIMARY KEY, btree (id)
  • "pgm" stores fingerprints raw images (PGM)
  • "wsq" stores fingerprints images in a compressed form (WSQ)
  • "mdt" stores fingerprints templates in  XYTQ  type in a binary format (MDT)
  • "xyt" stores fingerprints minutiae data in textual form
The data type used in the binary columns is "bytea" (an array of bytes), similar to the BLOB type in other DBMSs; the "xyt" column uses plain "text". The PGM and WSQ formats are open and well known in the market. On the other hand, MDT is a format I created. :D

Therefore, fingerprint images may be obtained in raw format and stored in the "pgm" column of that table. The PGM format is a kind of bitmap with no compression. Fingerprint readers usually store image files in this format.


Now pgAFIS comes into play! A dedicated function is able to convert the binary data of a PGM image into WSQ format (a compressed format created by the FBI/NIST, analogous to JPEG). This is done by running this SQL statement:

UPDATE fingerprints
SET wsq = cwsq(pgm, 2.25, 300, 300, 8, null);

The second step where pgAFIS acts is the extraction of the fingerprint's local characteristics (the minutiae) from the WSQ content. The result is the XYTQ information (i.e., horizontal and vertical coordinates, orientation angle and quality) of each minutia. Again, you only need to execute a single SQL statement:

UPDATE fingerprints
SET mdt = mindt(wsq, true);

To get an idea of the storage volume: a 300x300 pixel fingerprint image in PGM format takes about 90 kB; compressed into WSQ format it occupies about 28 kB of disk space. After extracting the minutiae and generating the content in the MDT format (specific to pgAFIS), that data occupies a measly 150 bytes! The textual version of MDT, XYT, takes a little more: about 300 bytes!

afis=>
SELECT id,
  length(pgm) AS pgm_bytes,
  length(wsq) AS wsq_bytes,
  length(mdt) AS mdt_bytes,
  length(xyt) AS xyt_chars
FROM fingerprints
LIMIT 5;

  id   | pgm_bytes | wsq_bytes | mdt_bytes | xyt_chars 
-------+-----------+-----------+-----------+-----------
 101_1 |     90015 |     27895 |       162 |       274
 101_2 |     90015 |     27602 |       186 |       312
 101_3 |     90015 |     27856 |       146 |       237
 101_4 |     90015 |     28784 |       154 |       262
 101_5 |     90015 |     27653 |       194 |       324
(5 rows)

These processes are part of the enrollment (acquisition and storage) of biometrics represented in the figure below:


Very cool, but what do we do with this pile of binary data? Biometric comparisons!

Biometric systems are able to perform two essential comparison operations: Verification and Identification.

In Verification, also called Authentication or [1:1] Search, a sample fingerprint is compared against a single fingerprint already stored in the database. This process can be seen in the figure below:


And that's where pgAFIS strikes again! With this extension, this operation can also be performed through a simple SQL statement:

SELECT (bz_match(a.mdt, b.mdt) >= 20) AS match
FROM fingerprints a, fingerprints b
WHERE a.id = '101_1' AND b.id = '101_6';

In this statement, the fingerprint identified as "101_1" is compared with "101_6". If the level of similarity between the two is at least 20, we can consider that they match, that is, that they refer to the same finger of the same person. It is a quick procedure, because the identity of the person is given up front and the system only needs to return a boolean value: "yes" or "no".
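
For completeness, here is a minimal sketch of how such a verification could be wrapped in a shell script calling psql (it assumes a database named "afis", as shown in the prompt above, with the pgAFIS extension installed and the "fingerprints" table populated):

#!/bin/bash
PROBE="${1:-101_1}"
CANDIDATE="${2:-101_6}"
psql -d afis -t -A -c "
  SELECT (bz_match(a.mdt, b.mdt) >= 20) AS match
  FROM fingerprints a, fingerprints b
  WHERE a.id = '$PROBE' AND b.id = '$CANDIDATE';"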

In contrast, Identification, or [1:N] Search, is a costly process for the system, since a sample fingerprint must be compared against all existing records in the biometric database. In addition, its result is a set of possible identifications found in the database, that is, the records considered most similar to the sample according to the comparison algorithm. This process can be seen in the figure below:


Since this process may adversely affect the server, pgAFIS provides a feature to mitigate the risk: limiting the number of possible suspects. In SQL, a search example would look like this:

SELECT a.id AS probe, b.id AS sample,
  bz_match(a.mdt, b.mdt) AS score
FROM fingerprints a, fingerprints b
WHERE a.id = '101_1' AND b.id != a.id
  AND bz_match(a.mdt, b.mdt) >= 23
ORDER BY score DESC
LIMIT 3;

In this example, the three fingerprints that most resemble the one labeled "101_1" are returned.

Did you enjoy it? Have fun with it! Contribute! The source code is available on GitHub at https://github.com/hjort/pgafis.