Using a Fujitsu ScanSnap S1500 with Kubernetes

Apr 14, 2024

I really love my Doxie scanner, a little portable device that can run off AA batteries and write straight JPEG files to an SD card without much hassle. It’s amazing for a truly portable scanner, but if you want to scan a page duplex, or even more than three pages, it becomes a bit of a chore. After the scanning is done you’ve then got to get those files into a usable format. Thankfully the software does a lot of the heavy lifting, but it is sluggish and makes a lot of weird assumptions about your system.

Digging around eBay a few days ago, I found that my favourite local e-waste recycler had a Fujitsu ScanSnap S1500 about to end bidding at £15 with no bids. It was untested of course, but I took a punt and won it for £15. After buying a basic 24 V, 1 A power adapter from Amazon it was up and working with only one minor issue: someone had tried to scan something that left a black smudge on the glass, which caused a black line down the middle of the scan. A two-second scrub with some isopropyl alcohol sorted it right out.

The dream of my workflow is to scan to Paperless without any real interaction bar the ‘Scan’ button on the device itself. It is possible, but not exactly a quick setup at the moment. I decided to set the scanner up via my Kubernetes cluster, just to save having another Raspberry Pi running somewhere with some hand-managed configuration. I came across ScanServJS, a Node.js UI for SANE that allows a decent level of automation without having to delve into a lot of scripting and fiddling.

With a basic configuration, I could have a simple UI to adjust scan settings, and a simple ‘Scan’ button on it, good enough for now. The project comes with some Docker installation instructions so adapting these to Kubernetes should be a quick job.

Detecting the Scanner

The first step is getting the Kubernetes cluster to detect the device, and making sure that pods can access it. By far the easiest way to do this is to use node-feature-discovery. Out of the box, this operator will tag nodes with labels you can use in your Pod definitions.

You can add additional configuration to the operator to give the device a custom label that is easier to spot, but by default a label is created for each plugged-in USB device in the classes the operator watches. The label includes the device class, vendor ID and device ID, so in the case of this Fujitsu ScanSnap S1500 (class ff, vendor 04c5, device 11a2) it’ll be feature.node.kubernetes.io/usb-ff_04c5_11a2.present

Using this label we can define a node affinity for our pod to ensure that it runs on the node the scanner is connected to.
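
The custom label mentioned above could be sketched as a NodeFeatureRule, the custom resource node-feature-discovery uses for user-defined rules. This is a minimal sketch, not a tested configuration: the rule name and the scanner-scansnap-s1500 label are made up for illustration, and it assumes an NFD version recent enough to ship the NodeFeatureRule CRD. The rest of this post sticks with the default label.

```yaml
---
apiVersion: nfd.k8s-sigs.io/v1alpha1
kind: NodeFeatureRule
metadata:
  # Hypothetical name, pick whatever suits your cluster
  name: scansnap-s1500
spec:
  rules:
    - name: "fujitsu scansnap s1500"
      labels:
        # Hypothetical label, applied only when the match below succeeds
        "feature.node.kubernetes.io/scanner-scansnap-s1500": "true"
      matchFeatures:
        - feature: usb.device
          matchExpressions:
            # Same vendor and device IDs that appear in the default label
            vendor: {op: In, value: ["04c5"]}
            device: {op: In, value: ["11a2"]}
```

With a rule like this in place, the affinity key would become the custom label instead of the generated usb-ff_04c5_11a2 one.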

Installing ScanServJS

I’m going to take the lazy route. I use Helm for most of my application deployments, and for these types of single-container applications I use a boilerplate Helm chart that allows for some customisation. All of the hard work was done by the Kubernetes At Home team; I’ve taken a copy and adjusted it to my own needs, but the major concepts are the same.

On top of this, I use FluxCD to install and manage my configuration. Flux is out of the scope of this post, but many guides are available online to cover the basics.

So for our first pass, we’ll define our HelmRelease, the object that manages a Helm chart installation:

---
apiVersion: helm.toolkit.fluxcd.io/v2beta2
kind: HelmRelease
metadata:
  name: scanservjs
  namespace: tools
spec:
  interval: 5m
  chart:
    spec:
      chart: common-chart
      version: 1.2.3
      sourceRef:
        kind: HelmRepository
        name: nikdoof
        namespace: flux-system
      interval: 5m
  values:
    global:
      nameOverride: scanservjs
    image:
      repository: sbs20/scanservjs
      tag: latest
    service:
      main:
        ports:
          http:
            port: 8080
    ingress:
      main:
        enabled: true
        hosts:
          - host: scanner.prod.doofnet.uk
            paths:
              - path: /
                pathType: Prefix
    securityContext:
      privileged: true
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
            - matchExpressions:
                - key: feature.node.kubernetes.io/usb-ff_04c5_11a2.present
                  operator: In
                  values:
                    - "true"
    persistence:
      usb-bus:
        enabled: true
        type: hostPath
        hostPath: /dev/bus/usb
        mountPath: /dev/bus/usb
      usb:
        enabled: true
        type: hostPath
        hostPath: /dev/usb
        mountPath: /dev/usb
      paperless-data:
        enabled: true
        type: pvc
        mountPath: /var/lib/scanservjs/output
        subPath: inbox
        existingClaim: paperless-data-pvc

Affinity

There are a few key parts of this definition to go over. The first is getting the pod to run on the node that has the scanner attached:

    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
            - matchExpressions:
                - key: feature.node.kubernetes.io/usb-ff_04c5_11a2.present
                  operator: In
                  values:
                    - "true"

Using our node labels, we define that we want this pod to start on the node with the label, but once it’s started we don’t care. This matters because the scanner disconnects from USB the second it goes to sleep, taking the label with it; if the label were required all the time, the pod would be killed whenever that happened. As the node we’re plugged into is relatively static we don’t care too much about this, hence the use of requiredDuringSchedulingIgnoredDuringExecution.
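
For a single label like this, a plain nodeSelector should behave the same way, since it too is only checked when the pod is scheduled. The fragment below is a sketch that assumes the chart passes a nodeSelector value straight through to the pod spec, as the Kubernetes At Home common library does; that key is an assumption about this particular chart.

```yaml
    # Assumed chart value: rendered as the pod's spec.nodeSelector
    nodeSelector:
      feature.node.kubernetes.io/usb-ff_04c5_11a2.present: "true"
```

The longer affinity form is still worth keeping if you expect to match on more than one label or want to use operators other than plain equality.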

USB Bus Access

Next, we want to ensure that the pod has access to the USB bus itself. I’ve decided to use the privileged security context, which allows the container to access resources on the node itself, such as the USB bus and other paths.

    securityContext:
      privileged: true

    persistence:
      usb-bus:
        enabled: true
        type: hostPath
        hostPath: /dev/bus/usb
        mountPath: /dev/bus/usb
      usb:
        enabled: true
        type: hostPath
        hostPath: /dev/usb
        mountPath: /dev/usb

We also specify that the USB bus paths, /dev/bus/usb and /dev/usb, are mounted in the container. This allows ScanServJS and SANE to scan and read the USB bus without any extra configuration or permission wrangling. Yes, you can do this without privileged mode, but you’ll need to consider udev rules on the node and other modifications. As this container is neither critical nor high-risk, it is not much of an issue for my home cluster.

Output

The last thing we need to do is define an output folder for the resulting scans. In my instance, I want to use the Paperless-ngx inbox folder so new scans can be picked up and organised into Paperless. As Paperless is also on this cluster I can just specify the PVC that Paperless already uses:

    persistence:
      paperless-data:
        enabled: true
        type: pvc
        mountPath: /var/lib/scanservjs/output
        subPath: inbox
        existingClaim: paperless-data-pvc

The output folder needs to be mounted as /var/lib/scanservjs/output, and in my case the PVC contains the entire Paperless data folders, so we specify the inbox subpath.

Configuring The Scanner

ScanServJS makes some educated guesses as to how your scanner should work. Using the SANE command line, it’ll query the scanner and set some values based on the responses. In the case of the S1500 it is SANE, not ScanServJS, that gets it wrong:

The paper size is set to 210mm x 279mm, rather than the 210mm x 297mm of A4.
The resolution is set to its lowest value.
The duplex scan option isn’t enabled.

ScanServJS thankfully allows you to override configuration settings by providing a JavaScript config file to the application. For us, using Kubernetes, we can map in this file from a ConfigMap:

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: scanservjs-conf
  namespace: tools
data:
  config.local.js: |
    /* eslint-disable no-unused-vars */
    const options = { paths: ['/usr/lib/scanservjs'] };
    const Process = require(require.resolve('./server/classes/process', options));
    const dayjs = require(require.resolve('dayjs', options));

    module.exports = {
      afterConfig(config) {
      },

      afterDevices(devices) {
        devices
        // Only touch the ScanSnap, leaving any other scanner at its defaults
        .filter(d => d.id.includes('fujitsu:ScanSnap S1500'))
        .forEach(device => {
          // Scan both sides from the feeder, in greyscale, at 300 DPI
          device.features['--source'].default = 'ADF Duplex';
          device.features['--mode'].default = 'Gray';
          device.features['--resolution'].default = 300;
          // Batch-scan from the document feeder and output a PDF
          device.settings.batchMode.default = 'auto';
          device.settings.pipeline.default = 'PDF (JPG | @:pipeline.high-quality)';

          // Default to A4 (210mm x 297mm) instead of the size SANE reports
          device.features['--page-height'].default = 297;
          device.features['--page-width'].default = 210;
          device.features['-l'].limits = [0, 215];
          device.features['-t'].limits = [0, 297];
          device.features['-x'].default = 210;
          device.features['-x'].limits = [0, 215];
          device.features['-y'].default = 297;
          device.features['-y'].limits = [0, 297];
        });
      },


      async afterScan(fileInfo) {
      },

      actions: [
      ]
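    };

Once the ConfigMap exists, it is mounted into the container through the HelmRelease values, along with an annotation on the controller:

    controller: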
      annotations:
        configmap.reloader.stakater.com/reload: "scanservjs-conf"
    persistence:
      config:
        enabled: true
        type: "configMap"
        name: "scanservjs-conf"
        mountPath: "/etc/scanservjs/config.local.js"
        subPath: "config.local.js"

The extra annotation, configmap.reloader.stakater.com/reload, is for another operator, Reloader, whose only job is to restart pods when an associated ConfigMap has been updated. This is useful because Kubernetes doesn’t do this for you by default, and a file mounted with subPath is never refreshed inside a running container.

ScanServJS has a useful recipes page that goes over the most common configuration items that people need. Thankfully most scanners just work, so you may not need it for yours.

The Final HelmRelease

With all that done, here is the final HelmRelease for installing the application and configuring it with sensible defaults. It is used alongside the ConfigMap shown above, which is unchanged:

---
apiVersion: helm.toolkit.fluxcd.io/v2beta2
kind: HelmRelease
metadata:
  name: scanservjs
  namespace: tools
spec:
  interval: 5m
  chart:
    spec:
      chart: common-chart
      version: 1.2.3
      sourceRef:
        kind: HelmRepository
        name: nikdoof
        namespace: flux-system
      interval: 5m
  values:
    global:
      nameOverride: scanservjs
    image:
      repository: sbs20/scanservjs
      tag: latest
    controller:
      annotations:
        configmap.reloader.stakater.com/reload: "scanservjs-conf"
    service:
      main:
        ports:
          http:
            port: 8080
    ingress:
      main:
        enabled: true
        hosts:
          - host: scanner.prod.doofnet.uk
            paths:
              - path: /
                pathType: Prefix
    securityContext:
      privileged: true
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
            - matchExpressions:
                - key: feature.node.kubernetes.io/usb-ff_04c5_11a2.present
                  operator: In
                  values:
                    - "true"
    persistence:
      usb-bus:
        enabled: true
        type: hostPath
        hostPath: /dev/bus/usb
        mountPath: /dev/bus/usb
      usb:
        enabled: true
        type: hostPath
        hostPath: /dev/usb
        mountPath: /dev/usb
      paperless-data:
        enabled: true
        type: pvc
        mountPath: /var/lib/scanservjs/output
        subPath: inbox
        existingClaim: paperless-data-pvc
      config:
        enabled: true
        type: "configMap"
        name: "scanservjs-conf"
        mountPath: "/etc/scanservjs/config.local.js"
        subPath: "config.local.js"

It has been a short but interesting adventure getting the scanner to work on my Kubernetes cluster. The next step is to make use of scanbd to trigger scans from the button on the device, but that is for another post.