FACEngine ID Pose Estimation Accuracy
A process unique to Animetrics, our Avatar Generation produces a 3-dimensional model of a subject: highly complex mathematical algorithms analyze 2-dimensional images and, through a process of smooth deformations, generate realistic, accurate, fully structured and defined 3-dimensional avatars. The generated 3-dimensional model can be easily reoriented to vastly improve identification accuracy on off-pose data. Animetrics tests and validates the accuracy of this avatar generation and pose normalization using synthetic images generated by manually landmarking facial imagery, which is then rendered at a predetermined pose. We then analyze the projected feature coordinates of the synthetic image, feed those values back into the avatar generation module, and compare the known original results with the newly generated values.
Figure 1 below shows results comparing a subset of the anatomically defined fiducial marks superimposed over the manually defined features. Green points are taken from the single merged geometry, projected into the projective plane; red points show the manually defined features. We have studied the rigid motion estimation by generating simulated databases from the FRGC with known rigid motions of up to 30 degrees in the X, Y, and Z directions. The figure below shows the accuracy of the rigid motion estimation. We observe on the order of 10⁻² degrees of bias in Y and Z rotations throughout the full range of simulated rotations, and sub-degree bias in X rotations, with under 2 degrees of standard deviation in X and under 0.25 degrees of standard deviation in Y and Z.
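The bias and standard-deviation figures above come from comparing estimated rotations against the known simulated rotations. A minimal sketch of that comparison follows; the function name and data layout are illustrative only, not the actual FACEngine interface:

```python
import statistics

def rotation_error_stats(estimated, ground_truth):
    """Per-axis bias (mean signed error) and standard deviation of
    rotation-estimation errors, in degrees.

    `estimated` and `ground_truth` are parallel lists of (x, y, z)
    rotation tuples in degrees -- a hypothetical layout for
    illustration, not the FACEngine output format.
    """
    stats = {}
    for i, axis in enumerate(("X", "Y", "Z")):
        errors = [est[i] - gt[i] for est, gt in zip(estimated, ground_truth)]
        stats[axis] = {
            "bias": statistics.mean(errors),     # signed mean error
            "stdev": statistics.stdev(errors),   # sample standard deviation
        }
    return stats
```

Under this scheme, a well-behaved estimator shows bias near zero on each axis, matching the report's observation of roughly 10⁻²-degree bias in Y and Z.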
From an initial dataset of 1190 images, 1174 were successfully detected with 2DIFA. Of the 1174 images used for validation of the Fine Pose Estimation, only 5 produced a failed result (a rotation error of >22° on any axis), and an additional 28 produced degraded but acceptable results (a rotation error of >10° and ≤22° on any axis), for an overall success rate (fully successful results only) of 97.2%. The success rate against the entire dataset, including failures to properly localize the head, is 95.9%.
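The classification rule and headline rates above can be reproduced directly from the stated thresholds and counts. The sketch below uses hypothetical names; the thresholds are the ones the report gives:

```python
def classify(errors_deg, fail_thresh=22.0, degrade_thresh=10.0):
    """Classify one image's per-axis rotation errors (degrees).

    Following the report: >22 deg on any axis is a failure,
    >10 and <=22 deg on any axis is degraded, otherwise a success.
    """
    worst = max(abs(e) for e in errors_deg)
    if worst > fail_thresh:
        return "failed"
    if worst > degrade_thresh:
        return "degraded"
    return "success"

# Reproducing the headline rates from the report's stated counts:
total, detected, failed, degraded = 1190, 1174, 5, 28
successes = detected - failed - degraded   # 1141 fully successful images
rate_detected = successes / detected       # ~0.972 (vs. detected images)
rate_overall = successes / total           # ~0.959 (vs. entire dataset)
```

Note that the 97.2% figure counts only fully successful results; the degraded-but-acceptable images are excluded from the numerator.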
The controlled data is high-resolution, carefully collected data which, as expected, produced highly precise results. For this data, there were zero failures to detect the face, and the rotation estimate for every image on all three axes was within an acceptable range of ±10°.
Figure 3 below shows the distribution of errors in the rotation estimation of the controlled data. The percentages for each bin are shown below the bin labels for each axis. Bins to the right of 0 are positive rotations; bins to the left of 0 are negative rotations.
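The signed-error histograms in these figures can be sketched as below. The bin width is an assumption for illustration, since the report does not state the width used:

```python
import math

def signed_error_histogram(errors_deg, bin_width=5.0):
    """Bin signed rotation errors (degrees) for a single axis, as in
    the error-distribution figures: bins right of 0 hold positive
    rotations, bins left of 0 hold negative rotations.

    Returns a mapping from each bin's left edge to the percentage of
    images falling in that bin. The 5-degree default width is a guess.
    """
    hist = {}
    for e in errors_deg:
        edge = math.floor(e / bin_width) * bin_width  # left edge of bin
        hist[edge] = hist.get(edge, 0) + 1
    total = len(errors_deg)
    return {edge: 100.0 * n / total for edge, n in sorted(hist.items())}
```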
The uncontrolled data produced 16 failures to detect the face out of 508 total images. This is mitigated by the fact that 13 of the 16 failures were on heads that were within ±10% of the designed scale range of the Coarse Pose Estimator.
The fine pose estimation of the FRGC uncontrolled data produced positive results. Excluding the 16 head-detection failures, there was only one instance in which the automatic rigid motion estimation produced a result more than 22 degrees from the ground truth (in this case, on the Y axis); that image also produced threshold errors (>10°) on the X and Z axes. Of the 491 remaining images, there were 15 instances of variation from the ground truth greater than 10° (but <22°), each on a single axis (6 in X, 8 in Y, and 1 in Z). Excluding the failed result, the 491 successes and partial errors produced standard deviations in line with the FRGC controlled data.
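The per-axis tally above (6 in X, 8 in Y, 1 in Z) corresponds to counting, for each axis, the images whose absolute error falls between the two thresholds. A minimal sketch, with a hypothetical data layout:

```python
from collections import Counter

def tally_axis_exceedances(per_image_errors, lo=10.0, hi=22.0):
    """Count, per axis, images whose absolute rotation error falls in
    the partial-error band (lo, hi] degrees.

    `per_image_errors` is a list of (x, y, z) signed error tuples in
    degrees -- an assumed layout, not the actual FACEngine output.
    Errors above `hi` are failures and are not counted here.
    """
    counts = Counter()
    for errors in per_image_errors:
        for axis, err in zip(("X", "Y", "Z"), errors):
            if lo < abs(err) <= hi:
                counts[axis] += 1
    return counts
```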
Table 2: Uncontrolled Results for Rotation Estimation
Figure 5 below shows the distribution of errors in the rotation estimation of the Uncontrolled data. The percentages for each bin are shown below the bin labels for each axis. Bins to the right of 0 are positive rotations, and left of 0 are negative rotations. Error percentages listed above each histogram are for variances that fell between 10° and 22° from the ground truth result.
Mobile Camera Data
A mobile camera was used to capture additional data. The camera had a resolution of 640 × 480 and sent imagery as an MPEG-4 stream, which was later rendered out to individual JPEG images. This data produced a larger number of errors, including 4 instances in Y where the rotation error was greater than 22 degrees. The data (although all collected indoors) was taken without concern for ambient lighting conditions, open windows, etc., and with limited concern for head rotation, and is in that regard more "uncontrolled" than the FRGC Uncontrolled dataset. For example, Figure 6 below shows the increased pose variation present in the mobile data:
In spite of the additional failures, X and Z rotations remained stable. There was a fairly significant increase in rotation error on the Y axis, as shown in the table below. However, nearly 95% of the images analyzed in this set returned at least partially successful results.
Table 3: Mobile Camera Results for Rotation Estimation
The figure below shows the distribution of errors in the rotation estimation of the Mobile Camera data. The percentages for each bin are shown below the bin labels for each axis. Bins to the right of 0 are positive rotations; bins to the left of 0 are negative rotations. Error percentages listed above each histogram are for variances that fell between 10° and 22° from the ground truth result.