Notes on trying out the Face Detection API

Believe it or not, we now live in an age where face detection works right in the browser.

I tested with Chrome 70.0.3530.0 (Official Build) canary (64-bit).

The spec

Accelerated Shape Detection in Images

The spec is apparently called Accelerated Shape Detection in Images, and it defines two main APIs.

Face Detection API
Barcode Detection API

So it can detect faces and barcodes.

This post only covers the face detection side.

FaceDetector

The API is simple. You create a detector like this.

const faceDetector = new window.FaceDetector(options);

There are two options you can specify.

interface FaceDetectorOptions {
  fastMode: boolean;
  maxDetectedFaces: number;  
}

Set `fastMode` to `true` to prioritize speed, or `false` to prioritize accuracy.
In my quick local tests, execution time differed by roughly a factor of two!

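The spec reads as though `maxDetectedFaces` sets the maximum number of faces to detect, but setting it to `1` still detected plenty of faces, and conversely, even with a large value, at most 8 were ever detected.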
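Since `FaceDetector` is not available in every browser, it can help to check for it before constructing one. The following is a minimal sketch, not part of the API itself: `createFaceDetector` is a hypothetical helper name, and passing the global object in as `scope` is my own choice so the check can also run outside a browser.

```javascript
// Hypothetical helper (not part of the Face Detection API).
// Returns a FaceDetector built with the given options, or null when
// the given global object does not expose a FaceDetector constructor.
function createFaceDetector(scope, options = {}) {
  if (typeof scope.FaceDetector !== 'function') {
    return null;
  }
  return new scope.FaceDetector(options);
}

// In a browser this would be called as:
//   const faceDetector = createFaceDetector(window, { fastMode: true, maxDetectedFaces: 1 });
//   if (faceDetector === null) { /* fall back or show a message */ }
```

Returning `null` rather than throwing keeps the calling code to a single `if`, which seemed sufficient for a demo page.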

The only method FaceDetector has is `detect()`.
Pass an `ImageBitmapSource` to `detect()` and it returns a Promise that resolves with the detection results.

async function main() {
  const faceDetector = new window.FaceDetector();
  const $img = document.querySelector('img');

  let faces = [];
  try {
    faces = await faceDetector.detect($img);
  } catch (err) {
    // Leave `faces` empty on failure so the loop below stays safe.
    console.error(err);
  }
  for (const face of faces) {
    // face: DetectedFace
  }
}

DetectedFace

The results always come back as an array (even when there is only one), so use them however you like.

interface DetectedFace {
  boundingBox: DOMRectReadOnly;
  landmarks: Landmark[];
}

`boundingBox` holds rectangle information, the same shape of object you get from `getBoundingClientRect()`.
`x`, `y`, `top`, `left`, `width`, and so on.
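If the image is displayed at a different size than its natural size, the box has to be scaled before drawing it as an overlay. Here is a rough sketch of that arithmetic. It assumes `boundingBox` is expressed in the source image's natural pixel coordinates, which is my assumption rather than something verified here, and `scaleBox` is a hypothetical helper name.

```javascript
// Hypothetical helper: maps a box from the image's natural pixel space
// to the size at which the image is actually displayed.
// Assumption: the input box uses natural-size pixel coordinates.
function scaleBox(box, naturalSize, displaySize) {
  const sx = displaySize.width / naturalSize.width;
  const sy = displaySize.height / naturalSize.height;
  return {
    x: box.x * sx,
    y: box.y * sy,
    width: box.width * sx,
    height: box.height * sy,
  };
}

// In a browser, for an <img> element $img:
//   scaleBox(face.boundingBox,
//     { width: $img.naturalWidth, height: $img.naturalHeight },
//     { width: $img.width, height: $img.height });
```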

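`landmarks` tells you where the eyes, nose, and mouth were found, identified by each landmark's `type`.
For now each landmark's `locations` holds a single coordinate, so you only get the center point, but the spec also talks about coordinates that outline the feature, so this will probably change at some point.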
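Because `locations` is a list of points, code that wants one point per landmark could average them, which would work both for the single-point results seen today and for outline-style results later. A small sketch of that idea follows; `landmarkCenters` is a hypothetical helper, and averaging is just one possible way to reduce an outline to a center.

```javascript
// Hypothetical helper: averages each landmark's points into one center
// and groups the centers by landmark type ('eye', 'nose', 'mouth').
function landmarkCenters(landmarks) {
  const centers = {};
  for (const { type, locations } of landmarks) {
    if (locations.length === 0) continue; // nothing to average
    let sumX = 0;
    let sumY = 0;
    for (const { x, y } of locations) {
      sumX += x;
      sumY += y;
    }
    const center = { x: sumX / locations.length, y: sumY / locations.length };
    (centers[type] = centers[type] || []).push(center);
  }
  return centers;
}
```

The result is keyed by type with an array per key, since a face normally has two `eye` landmarks.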

I made some demos

Detecting faces in an arbitrary image

https://leader22.github.io/face-detection-api-example/image/index.html

You can run face detection either on an image you upload or on a dummy image (taken from the demo that accompanies the spec).

Detecting faces from the camera in real time

https://leader22.github.io/face-detection-api-example/stream/index.html

A demo that runs face detection on a stream obtained with `getUserMedia()`.

Here is an excerpt of just the main part of the code.

const options = getFaceDetectorOptions($radio, $number);
const faceDetector = new window.FaceDetector(options);
const imageCapture = new window.ImageCapture($video.srcObject.getVideoTracks()[0]);

detect();
async function detect() {
  requestAnimationFrame(detect);

  let img;
  try {
    img = await imageCapture.grabFrame();
  } catch {
    // Sometimes this throws with message `undefined`...
    return;
  }
  const faces = await faceDetector.detect(img).catch(console.error);
  drawImageToCanvas(img, $canvas);
  // `faces` is undefined when detect() rejected, so fall back to an empty list.
  drawFaceRectToCanvas(faces || [], $canvas);
}

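For cutting a frame out of a `video`, the ImageCapture API is the easiest route these days in environments where it is available.
That said, when I drove it with rAF, it threw a mysterious error at certain moments, and I never figured out why.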
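I have not confirmed the cause of that error, but one thing that might be worth ruling out is overlapping work: rAF fires on the next frame whether or not the previous `grabFrame()` and `detect()` have finished. The sketch below skips a tick while the previous one is still pending. `skipWhileBusy` is a hypothetical helper, and whether it makes the error go away is untested.

```javascript
// Hypothetical helper: wraps an async function so that calls made while
// a previous call is still pending are skipped instead of overlapping.
function skipWhileBusy(asyncFn) {
  let busy = false;
  return async function guarded(...args) {
    if (busy) return undefined; // previous call has not settled yet
    busy = true;
    try {
      return await asyncFn(...args);
    } finally {
      busy = false; // released on both success and failure
    }
  };
}

// Possible use in the loop (errors are handled inside the wrapped function):
//   const step = skipWhileBusy(async () => { /* grabFrame, detect, draw */ });
//   function loop() { requestAnimationFrame(loop); step(); }
```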

Wrapping up

For use cases that need to detect many faces accurately, it still feels underwhelming,
since for some reason it could only detect up to 8.

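For tracking just a single face, though, it felt more than usable.
It seems like it would be easy to build something like being a VTuber using only the browser and streaming it over WebRTC.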
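For that single-face use case, one simple approach is to keep only the largest box in each frame, which also sidesteps `maxDetectedFaces` not limiting the results. This is a sketch of that idea rather than something the demos do, and `pickLargestFace` is a hypothetical helper name.

```javascript
// Hypothetical helper: returns the detected face with the largest
// bounding box area, or null when nothing was detected.
function pickLargestFace(faces) {
  let best = null;
  let bestArea = -1;
  for (const face of faces) {
    const area = face.boundingBox.width * face.boundingBox.height;
    if (area > bestArea) {
      best = face;
      bestArea = area;
    }
  }
  return best;
}
```

Picking by area assumes the face closest to the camera is the one to track, which seems reasonable for a webcam but is only a heuristic.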