Adding OAuth2 to Jupyter Notebooks on Kubernetes

TrueFoundry users can deploy Jupyter Notebooks in their own cloud accounts on AWS, Azure, or GCP. This lets them run machine learning experiments and training jobs on their own infrastructure with little effort. Notebooks deployed through TrueFoundry were initially secured with a username and password. In response to widespread client requests, we have since integrated Single Sign-On (SSO), so users can now access their notebooks with the same login they use for TrueFoundry. This post explains how we implemented this feature.

Launching a Jupyter Notebook on TrueFoundry

Notebooks on TrueFoundry

TrueFoundry internally uses a fork of the Kubeflow Notebook Controller to orchestrate notebook deployments. We rely on several of its features:

  1. Simplified notebook spec: The Kubeflow Notebook API is simple, and the controller handles creating the underlying Jupyter Notebook deployments.
  2. Automatic culling: The controller automatically shuts down a notebook after a set period of inactivity. This is especially useful for clients who run experiments on notebooks backed by GPU machines, where idle time is expensive.
  3. Persistent home directory: The controller creates a persistent volume that preserves the user's work on the notebook across sessions.
  4. Extensible base images: The controller supports a suite of Jupyter Notebook and VS Code base images maintained by TrueFoundry. Users can extend these images by adding a startup script or installing specific libraries.
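Automatic culling comes down to an idle-time check. The sketch below is a hypothetical illustration of that check, not the controller's code. The function name `should_cull`, the 60-minute default, and the assumption that each notebook reports a `last_activity` timestamp are ours, chosen for the example.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical default; the real idle threshold is a controller setting.
DEFAULT_IDLE_TIME = timedelta(minutes=60)


def should_cull(last_activity: datetime,
                now: Optional[datetime] = None,
                idle_time: timedelta = DEFAULT_IDLE_TIME) -> bool:
    """Return True once a notebook has been idle for at least `idle_time`.

    Assumes `last_activity` is a timezone-aware timestamp of the most recent
    activity reported by the notebook server.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last_activity >= idle_time


# Example: a notebook idle for 90 minutes is culled; one idle for 10 is not.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(should_cull(now - timedelta(minutes=90), now))  # True
print(should_cull(now - timedelta(minutes=10), now))  # False
```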

For context, here’s what a simple Kubeflow Notebook object looks like:

apiVersion: kubeflow.org/v1
kind: Notebook
metadata:
  name: my-notebook
spec:
  template:
    spec:
      containers:
        - name: my-notebook
          image: kubeflownotebookswg/jupyter:master
          args:
            [
              "start.sh",
              "lab",
              "--LabApp.token=''",
              "--LabApp.allow_remote_access='True'",
              "--LabApp.allow_root='True'",
              "--LabApp.ip='*'",
              "--LabApp.base_url=/test/my-notebook/",
              "--port=8888",
              "--no-browser",
            ]
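The following hypothetical Python helper builds the same object as a dictionary, which shows how the pieces of the manifest fit together. The helper name is ours. We also assume that the `/test/` segment of `base_url` is the notebook's namespace. The point it illustrates is that the URL prefix Jupyter serves under must match the path the notebook is exposed on, and that the container listens on port 8888, the port the security policies later in this post refer to.

```python
import json


def build_notebook_manifest(name: str, namespace: str,
                            image: str = "kubeflownotebookswg/jupyter:master",
                            port: int = 8888) -> dict:
    """Build a minimal Kubeflow Notebook object like the YAML above.

    Assumption (hypothetical): the notebook is served under /<namespace>/<name>/.
    """
    base_url = f"/{namespace}/{name}/"
    args = [
        "start.sh",
        "lab",
        "--LabApp.token=''",
        "--LabApp.allow_remote_access='True'",
        "--LabApp.allow_root='True'",
        "--LabApp.ip='*'",
        f"--LabApp.base_url={base_url}",  # must match the exposed path
        f"--port={port}",
        "--no-browser",
    ]
    return {
        "apiVersion": "kubeflow.org/v1",
        "kind": "Notebook",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {"template": {"spec": {"containers": [
            {"name": name, "image": image, "args": args},
        ]}}},
    }


print(json.dumps(build_notebook_manifest("my-notebook", "test"), indent=2))
```

Because `kubectl apply -f` accepts JSON as well as YAML, the printed output can be applied directly.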

Basic Auth for Notebooks

Before implementing OAuth2, TrueFoundry let users protect their publicly exposed notebooks with basic authentication. This ensured that only authorized people could reach the code and data inside them. We implemented this with a WebAssembly (Wasm) plugin running inside Envoy, the proxy that Istio uses.

Istio is an open-source service mesh that manages network communication between workloads. Every request to a workload passes through an Envoy proxy (the ingress gateway or a sidecar) that Istio manages. This lets us inject custom logic directly into the network layer and control who can reach the Jupyter Notebooks. The key to the basic auth implementation was WasmPlugin, an Istio resource that loads WebAssembly modules into the Envoy proxy.

The basic auth WasmPlugin is inserted into Envoy's chain of HTTP filters. Each filter in this chain can perform a task such as access control, transformation, data enrichment, or auditing before the request reaches the service. Here’s a simplified version of the spec that adds the basic auth filter to the Envoy filter chain:

apiVersion: extensions.istio.io/v1alpha1
kind: WasmPlugin
metadata:
  name: basic-auth
  namespace: istio-ingress
spec:
  phase: AUTHN
  pluginConfig:
    basic_auth_rules:
      - credentials:
          - user:pass
        hosts: www.example.com
        prefix: /secret/
  selector:
    matchLabels:
      istio: ingressgateway
  url: oci://ghcr.io/istio-ecosystem/wasm-extensions/basic_auth:1.12.0
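The rule above has three parts: credentials `user:pass`, host `www.example.com`, and path prefix `/secret/`. Together they amount to the check below. This is a simplified Python model of the decision, written for illustration. It is not the plugin's source, and the `Rule` type and `is_authorized` function are hypothetical.

```python
import base64
import binascii
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Rule:
    """One basic_auth_rules entry (hypothetical model of the plugin config)."""
    credentials: List[str]  # entries in "user:pass" form
    host: str
    prefix: str


def is_authorized(rules: List[Rule], host: str, path: str,
                  authorization: Optional[str]) -> bool:
    """Return True if the request may proceed under the given rules."""
    matching = [r for r in rules if r.host == host and path.startswith(r.prefix)]
    if not matching:
        return True  # assumption: paths no rule covers are left open
    if not authorization or not authorization.startswith("Basic "):
        return False
    try:
        # The header value is base64("user:pass").
        decoded = base64.b64decode(authorization[len("Basic "):],
                                   validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        return False
    return any(decoded in r.credentials for r in matching)


rules = [Rule(credentials=["user:pass"], host="www.example.com", prefix="/secret/")]
header = "Basic " + base64.b64encode(b"user:pass").decode()
print(is_authorized(rules, "www.example.com", "/secret/notes", header))  # True
print(is_authorized(rules, "www.example.com", "/secret/notes", None))    # False
```

In a real proxy, a rejected request is answered with HTTP 401 and a `WWW-Authenticate: Basic` header so the browser prompts for credentials.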

OAuth2 for Notebooks

To implement OAuth2 for our notebooks, we also used an Envoy filter, but the approach differed from basic authentication. With basic auth, we could simply insert a pre-built WasmPlugin into the filter chain. OAuth2 required a more tailored setup: we configured Envoy's built-in OAuth2 HTTP filter through an Istio EnvoyFilter resource. At TrueFoundry, our Single Sign-On system is built on FusionAuth, which serves as our OAuth provider.

Here’s what the EnvoyFilter spec looks like; see the comments in the file for details:

apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: truefoundry-notebook-tfy-oauth2  # Name of the EnvoyFilter
  namespace: auth-test  # Namespace where the EnvoyFilter is deployed
spec:
  workloadSelector:
    labels:
      truefoundry.com/application: truefoundry-notebook  # Selector targeting workloads with specific labels
  configPatches:
  - applyTo: CLUSTER
    match:
      context: SIDECAR_OUTBOUND
    patch:
      operation: ADD
      value:
        name: tfy-oauth2  # Name of the cluster for OAuth2 authentication service
        type: LOGICAL_DNS  # Type of service discovery (DNS)
        connect_timeout: 5s  # Timeout for establishing a connection
        lb_policy: ROUND_ROBIN  # Load balancing policy
        # other load balancing config
  - applyTo: HTTP_FILTER
    match:
      context: SIDECAR_INBOUND
      listener:
        filterChain:
          filter:
            name: "envoy.filters.network.http_connection_manager"
            subFilter:
              name: envoy.filters.http.jwt_authn
    patch:
      operation: INSERT_BEFORE  # Inserting this filter before the JWT auth filter
      value:
        name: envoy.filters.http.tfy-oauth  # Name of the OAuth filter
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.http.oauth2.v3.OAuth2
          config:
            use_refresh_token: false  # Whether to use a refresh token
            pass_through_matcher:
              - name: Authorization
                present_match: true  # Skip the OAuth flow if an Authorization header is present
            forward_bearer_token: true  # Forward the access token upstream as "Authorization: Bearer <token>"
            auth_type: BASIC_AUTH  # Send client credentials to the token endpoint via HTTP Basic auth
            token_endpoint:
              cluster: tfy-oauth2  # Cluster for token endpoint
              uri: <token-endpoint-uri-of-oauth-provider>
              timeout: 5s  # Timeout for token endpoint
            authorization_endpoint: <authorization-endpoint-uri-of-oauth-provider>
            redirect_uri: https://%REQ(:authority)%/truefoundry-notebook/_auth/callback  # Redirect URI for callback
            redirect_path_matcher:
              path:
                exact: /truefoundry-notebook/_auth/callback  # Path for redirect URI
            signout_path:
              path:
                exact: /truefoundry-notebook/_auth/signout  # Path for signout
            credentials:
              client_id: <client-id-for-oauth>
              token_secret:
                # configuration to fetch token secret
                # read more about how we fetch secrets here:
                # https://www.envoyproxy.io/docs/envoy/latest/configuration/security/secret
              hmac_secret:
                # configuration to fetch the HMAC secret used to sign the auth cookies
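The simplified Python model below shows how the OAuth2 filter configured above treats each incoming request. It is a sketch for illustration, not Envoy's implementation. The function names, the action strings, the cookie names, and the HMAC input format (host, expiry, and token joined with newlines) are all assumptions, and Envoy's real cookie layout may differ in detail. The idea carries over: `hmac_secret` signs the cookies, so the filter can detect tampering or expiry without contacting the OAuth provider on every request.

```python
import hashlib
import hmac
import time
from typing import Dict, Optional

CALLBACK_PATH = "/truefoundry-notebook/_auth/callback"  # redirect_path_matcher
SIGNOUT_PATH = "/truefoundry-notebook/_auth/signout"    # signout_path


def sign(secret: bytes, host: str, expires: str, token: str) -> str:
    """HMAC-SHA256 over host, expiry and token (assumed input format)."""
    message = "\n".join([host, expires, token]).encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def cookies_valid(secret: bytes, host: str, cookies: Dict[str, str],
                  now: Optional[float] = None) -> bool:
    """Token cookie must be unexpired and its signature must match."""
    try:
        token = cookies["BearerToken"]
        expires = cookies["OauthExpires"]
        mac = cookies["OauthHMAC"]
        expired = int(expires) <= (time.time() if now is None else now)
    except (KeyError, ValueError):
        return False
    expected = sign(secret, host, expires, token)
    return not expired and hmac.compare_digest(expected.encode(), mac.encode())


def route_request(secret: bytes, host: str, path: str,
                  headers: Dict[str, str], cookies: Dict[str, str],
                  now: Optional[float] = None) -> str:
    """Return the action the filter takes for one request (simplified)."""
    if "authorization" in headers:  # pass_through_matcher
        return "pass_through"
    if path == CALLBACK_PATH:       # provider redirected back with a code
        return "exchange_code_for_token"
    if path == SIGNOUT_PATH:
        return "clear_cookies"
    if cookies_valid(secret, host, cookies, now):
        # forward_bearer_token: upstream sees "Authorization: Bearer <token>"
        return "forward_with_bearer"
    return "redirect_to_authorization_endpoint"


secret = b"example-hmac-secret"  # hypothetical value
exp = "2000000000"
cookies = {"BearerToken": "abc", "OauthExpires": exp,
           "OauthHMAC": sign(secret, "nb.example.com", exp, "abc")}
print(route_request(secret, "nb.example.com", "/truefoundry-notebook/lab",
                    {}, cookies, now=1_700_000_000))  # forward_with_bearer
```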

When a user first accesses a notebook protected by the OAuth2 filter, the filter redirects them to the authorization_endpoint. This is the URL of our external OAuth provider, which in our implementation is the FusionAuth-based TrueFoundry login modal. There the user authenticates and grants the notebook the permissions it needs to identify them.
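This redirect is a standard OAuth2 authorization request (RFC 6749, section 4.1.1). The Python sketch below builds such a URL to show which parameters are involved. Envoy's exact query string may differ; for example, it generates its own `state` value. The endpoint, client ID, and scope are placeholders.

```python
import secrets
from urllib.parse import urlencode


def build_authorization_url(authorization_endpoint: str, client_id: str,
                            redirect_uri: str, state: str,
                            scope: str = "openid") -> str:
    """Build an OAuth2 authorization-code request URL (RFC 6749, 4.1.1)."""
    query = urlencode({
        "response_type": "code",       # ask the provider for an authorization code
        "client_id": client_id,
        "redirect_uri": redirect_uri,  # must be registered with the provider
        "scope": scope,                # placeholder scope
        "state": state,                # opaque value echoed back to the callback
    })
    return f"{authorization_endpoint}?{query}"


# Placeholder values for illustration only.
url = build_authorization_url(
    "https://auth.example.com/oauth2/authorize",
    "example-client-id",
    "https://notebooks.example.com/truefoundry-notebook/_auth/callback",
    state=secrets.token_urlsafe(16),
)
print(url)
```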

Once the login is complete, FusionAuth redirects the browser to the redirect_uri configured in the filter and appends a short-lived authorization code to the URL. The filter intercepts this request because its path matches redirect_path_matcher. It then calls the token_endpoint to exchange the code for a JWT access token. Finally, it sets cookies containing the token and an HMAC signature derived from hmac_secret, and redirects the user back to the notebook.
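With `auth_type: BASIC_AUTH`, the client credentials travel in an HTTP Basic `Authorization` header on the token request rather than in the form body. The minimal sketch below builds that request with the Python standard library, following RFC 6749 (sections 2.3.1 and 4.1.3), and does not send it. The endpoint and credentials are placeholders, and `client_secret` stands in for whatever `token_secret` resolves to.

```python
import base64
from urllib.parse import quote, urlencode
from urllib.request import Request


def build_token_request(token_endpoint: str, client_id: str, client_secret: str,
                        code: str, redirect_uri: str) -> Request:
    """Build the authorization-code-for-token exchange request (not sent)."""
    # RFC 6749 2.3.1: form-encode id and secret, then base64 "id:secret".
    raw = f"{quote(client_id, safe='')}:{quote(client_secret, safe='')}"
    basic = base64.b64encode(raw.encode()).decode()
    body = urlencode({
        "grant_type": "authorization_code",
        "code": code,                  # short-lived code from the callback URL
        "redirect_uri": redirect_uri,  # must equal the one sent to the authorize step
    }).encode()
    return Request(
        token_endpoint,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )


req = build_token_request(
    "https://auth.example.com/oauth2/token",  # placeholder endpoint
    "example-client-id", "example-secret",
    code="abc123",
    redirect_uri="https://notebooks.example.com/truefoundry-notebook/_auth/callback",
)
print(req.get_method(), req.full_url)
print(req.get_header("Authorization"))
```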

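On subsequent requests, the filter validates the cookies. Because forward_bearer_token is enabled, it then forwards the JWT to the notebook in the Authorization header. Requests that already carry an Authorization header, for example programmatic clients sending their own bearer token, skip the OAuth flow entirely (see pass_through_matcher in the spec). Either way, the request then reaches Istio's JWT authentication filter, which runs after the OAuth2 filter because of the INSERT_BEFORE patch. To check that the JWT is valid, we create a RequestAuthentication policy that verifies it against the signing keys (JWKS) published by the OAuth provider: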

apiVersion: security.istio.io/v1beta1
kind: RequestAuthentication
metadata:
  # ...
spec:
  selector:
    # ...
  jwtRules:
  - issuer: "truefoundry.com"
    fromHeaders:
    - name: Authorization
      prefix: "Bearer "
    audiences:
      - <client-id>
    jwksUri: <oauth-provider-jwks-uri>
    forwardOriginalToken: true
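RequestAuthentication verifies the token's signature with the keys published at `jwksUri` and then checks its claims. Signature verification needs a cryptography library, so the sketch below covers only the claim checks that follow it: issuer, audience, and expiry. It illustrates what the policy enforces and is not Istio's code. `check_claims` is a hypothetical name, and the caller is assumed to have already verified the signature.

```python
import base64
import json
import time
from typing import Iterable, Optional


def b64url_decode_json(segment: str) -> object:
    """Decode one base64url JWT segment; JWTs strip the '=' padding."""
    padded = segment + "=" * (-len(segment) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def check_claims(token: str, issuer: str, audiences: Iterable[str],
                 now: Optional[float] = None) -> bool:
    """Check iss, aud and exp of a JWT whose signature was already verified."""
    try:
        _header, payload, _signature = token.split(".")
        claims = b64url_decode_json(payload)
    except ValueError:  # wrong segment count, bad base64 or bad JSON
        return False
    if not isinstance(claims, dict) or claims.get("iss") != issuer:
        return False
    aud = claims.get("aud")  # a single string or a list of strings
    if isinstance(aud, str):
        token_auds = {aud}
    elif isinstance(aud, list):
        token_auds = {a for a in aud if isinstance(a, str)}
    else:
        token_auds = set()
    if not token_auds & set(audiences):  # any allowed audience is enough
        return False
    exp = claims.get("exp")
    now = time.time() if now is None else now
    return isinstance(exp, (int, float)) and now < exp


def b64url_encode_json(obj: dict) -> str:
    """Helper for the example: encode a JSON object as a JWT segment."""
    return base64.urlsafe_b64encode(json.dumps(obj).encode()).rstrip(b"=").decode()


# Example with a made-up token; the signature segment is only a stub.
token = ".".join([
    b64url_encode_json({"alg": "RS256"}),
    b64url_encode_json({"iss": "truefoundry.com", "aud": "example-client-id",
                        "exp": 2_000_000_000}),
    "signature",
])
print(check_claims(token, "truefoundry.com", ["example-client-id"], now=1_700_000_000))  # True
```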

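Finally, we add an AuthorizationPolicy. On its own, RequestAuthentication only rejects requests that carry an invalid token; requests with no token at all are still let through. The policy below denies every request on port 8888 that lacks a request principal, meaning every request without a validated JWT: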

apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: truefoundry-notebook-tfy-oauth2
  namespace: auth-test
spec:
  selector:
    matchLabels:
      truefoundry.com/application: truefoundry-notebook
  action: DENY
  rules:
  - from:
    - source:
        notRequestPrincipals: ["*"]
    to:
      - operation:
          ports:
            - "8888"
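Together, the RequestAuthentication and AuthorizationPolicy produce the outcomes modeled below for requests to the notebook. This Python is a simplified illustration, not Istio's code, and the `Jwt` status values and `decide` function are hypothetical. It mirrors Istio's documented behavior:

- A request with an invalid token is rejected by RequestAuthentication with HTTP 401.
- A request with no token has no request principal, so the DENY rule rejects it with HTTP 403.
- A request with a valid token gets a principal of the form `<iss>/<sub>` and is allowed.

```python
from enum import Enum
from typing import Optional, Tuple


class Jwt(Enum):
    MISSING = "missing"
    INVALID = "invalid"
    VALID = "valid"


def decide(port: int, jwt: Jwt, iss: str = "", sub: str = "") -> Tuple[int, Optional[str]]:
    """Return (HTTP status, request principal) for one request."""
    # Step 1: RequestAuthentication rejects tokens that fail validation.
    if jwt is Jwt.INVALID:
        return 401, None
    # A validated token yields the request principal "<iss>/<sub>".
    principal = f"{iss}/{sub}" if jwt is Jwt.VALID else None
    # Step 2: the DENY rule matches port 8888 requests without a principal
    # (notRequestPrincipals: ["*"]).
    if port == 8888 and principal is None:
        return 403, None
    return 200, principal  # 200 stands for "forwarded to the notebook"


print(decide(8888, Jwt.MISSING))                             # (403, None)
print(decide(8888, Jwt.INVALID))                             # (401, None)
print(decide(8888, Jwt.VALID, "truefoundry.com", "user-1"))  # (200, 'truefoundry.com/user-1')
```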