>> Page View Tracking

Since this blog is personal and small, I never thought page view tracking was needed or worth the cost. While free alternatives exist, I do not want data that may identify the user, such as IP address, browser or region, to be collected, so I did not consider tracking until recently. With Cloudflare's free KV storage offering, I could create a simple worker that increments a per-page view counter whenever a page is viewed. This requires adding a site script to invoke the worker; script blockers can circumvent it, which is fine since I only need a general idea of traffic. Per Cloudflare's privacy policy, the only caveat with this approach is that the endpoints may still log IP addresses for operational monitoring, so using a VPN service or Tor is highly encouraged. Privacy concerns aside, free workers only get 1000 KV writes per day, so this approach only works if the site receives fewer than 1000 daily views. Nonetheless, it works for my use case and is significantly better than maintaining and securing a database.

(Source code for the worker and the CLI tool can be found here and here respectively.)

>> Cloudflare Worker

After creating a Cloudflare account, playing around with the dashboard and checking out the ecosystem, I decided to build and deploy the worker using the wrangler CLI with the Rust WASM template; the worker-rs KV example in particular makes it straightforward. Getting up and running:

# Install wrangler via cargo
$ cargo install wrangler

# Copy the template
$ wrangler generate page-tracker-worker https://github.com/cloudflare/rustwasm-worker-template
$ cd page-tracker-worker

# Fetch credentials via browser login
$ wrangler login

# Create the KV namespace and capture the config entry from the last output line
$ KV_ENTRY=$(wrangler kv:namespace create PAGE_COUNTER --verbose | tail -n 1)
$ echo "$KV_ENTRY"
{ binding = "PAGE_COUNTER", id = "mykvid" }

# Copy the value to wrangler.toml under the kv_namespaces property
# Using sed to insert it into the empty 6th line
$ sed -i "6s/^/kv_namespaces = [ $KV_ENTRY ]/" wrangler.toml

# To check if the storage is created
$ wrangler kv:namespace list

The wrangler.toml should look like this:

# wrangler.toml
name = "page-tracker-worker"
type = "javascript"
workers_dev = true
compatibility_date = "2021-10-10"
compatibility_flags = [ "formdata_parser_supports_files" ]
kv_namespaces = [ { binding = "PAGE_COUNTER", id = "mykvid" } ] # NEW LINE HERE

# ... rest of template omitted ...

This is all the configuration needed; the next step is to implement the worker. The worker essentially takes an HTTP POST request, gets the request path and increments that path's counter. With some minor CORS handling and cues from the worker-rs docs, the main worker code is straightforward:

// src/main.rs
lazy_static! {
    // The site name for CORS
    static ref SITE_URI: &'static str = "https://fnlog.dev";
    // Make sure this matches with the one in `wrangler.toml`
    static ref KV_BINDING: &'static str = "PAGE_COUNTER";
}

// Template main entry point
#[event(fetch, respond_with_errors)]
pub async fn main(req: Request, env: Env) -> Result<Response> {
    // Better logging on error
    utils::set_panic_hook();

    // OPTIONS CORS handler
    if matches!(req.method(), Method::Options) {
        let mut cors_headers = Headers::new();

        cors_headers.set("Access-Control-Allow-Origin", &SITE_URI)?;
        cors_headers.set("Access-Control-Allow-Methods", "POST,OPTIONS")?;
        cors_headers.set("Access-Control-Max-Age", "86400")?;

        // Allow required headers from the request
        cors_headers.set(
            "Access-Control-Allow-Headers",
            &req.headers()
                .get("Access-Control-Request-Headers")?
                .unwrap_or_default(),
        )?;

        return Ok(Response::ok("")?.with_headers(cors_headers));
    }

    // POST handler
    // This is POST to make sure it is accessed via network call
    if matches!(req.method(), Method::Post) {
        // Get request path
        let path = req.path();

        // Initialize KV storage by name
        let kv = env.kv(&KV_BINDING)?;

        // Get string counter of the path
        let counter = kv
            .get(&path)
            .await?
            .map(|val| val.as_string())
            .and_then(|txt| txt.parse::<usize>().ok())
            .unwrap_or(0);

        // Increment counter
        let new_counter = (counter + 1).to_string();

        // Store new value
        // Minor gotcha: .execute() is needed to commit the change
        kv.put(&path, &new_counter)?.execute().await?;

        // Set CORS headers
        let mut cors_headers = Headers::new();
        cors_headers.set("Access-Control-Allow-Origin", &SITE_URI)?;

        // Return success with the path
        return Ok(Response::ok(path)?.with_headers(cors_headers));
    }

    // Fallthrough response
    Response::error("Bad Request", 400)
}

Testing the worker locally:

# Run the dev worker at 127.0.0.1:8787
$ wrangler dev

# Test the worker with curl
# Path: /
$ curl -XPOST http://127.0.0.1:8787/   # Value: 1
$ curl -XPOST http://127.0.0.1:8787/   # Value: 2
# Path: /dev/blog_name
$ curl -XPOST http://127.0.0.1:8787/dev/blog_name

# Check the KV keys
$ wrangler kv:key list --binding PAGE_COUNTER
[{"name": "/"},{"name": "/dev/blog_name"}]

# Fetch KV value by key
$ wrangler kv:key get '/' --binding PAGE_COUNTER
2
$ wrangler kv:key get '/dev/blog_name' --binding PAGE_COUNTER
1

With it working locally, deploying it is as simple as wrangler publish:

# Deploy worker to Cloudflare
$ wrangler publish
 Successfully published your script to
 https://page-tracker-worker.fnlog-dev.workers.dev

# Test if it is deployed
$ curl -XPOST https://page-tracker-worker.fnlog-dev.workers.dev/test/path
/test/path

The final step is to invoke this endpoint when a user views or loads a page. As described in the beginning, it can look like this:

function trackPage(success) {
    // URL of the worker
    const workerUri = "https://page-tracker-worker.fnlog-dev.workers.dev";

    // Get the path of the current URL
    const { pathname } = new URL(window.location);

    // Append the path to the worker URI for the complete URL
    let workerUrl = workerUri + pathname;

    // Send the network request
    var xhr = window.XMLHttpRequest ? new XMLHttpRequest() : new ActiveXObject('Microsoft.XMLHTTP');
    xhr.open('POST', workerUrl);

    xhr.onreadystatechange = function() {
        if (xhr.readyState>3 && xhr.status==200) success(xhr.responseText);
    };
    xhr.setRequestHeader('X-Requested-With', 'XMLHttpRequest');
    xhr.send(null);
}

trackPage(function(data){ /* NOOP */ });

The site now has DIY analytics; check the KV dashboard details after some time to see the view counts grow. If CORS is an issue, review the worker's response headers.
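
One quick way to review them is to replay the browser's preflight request with curl against the worker deployed above; the path and request headers below are only examples of what the tracking script would send:

# Replay a CORS preflight against the deployed worker
$ curl -i -X OPTIONS https://page-tracker-worker.fnlog-dev.workers.dev/about/ \
    -H "Origin: https://fnlog.dev" \
    -H "Access-Control-Request-Method: POST" \
    -H "Access-Control-Request-Headers: x-requested-with"

# Based on the worker code above, the response headers should include:
# Access-Control-Allow-Origin: https://fnlog.dev
# Access-Control-Allow-Methods: POST,OPTIONS
# Access-Control-Allow-Headers: x-requested-with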

>> Analytics

Although the analytics work, it is tedious to log in and check the KV dashboard for the page views, and there is no way to export the data. As a workaround, a scheduled daily script could use the Cloudflare API to fetch the entries and write them to a CSV file. In particular, the list keys and read key value endpoints are what is needed. After getting my API token, it should be easy to write a CLI script for this with structopt, reqwest and tokio.

To access the API, the API JWT (PT_JWT), account ID (PT_ACCOUNT_ID) and KV namespace ID (PT_KV_ID) are needed; these can be supplied as environment variables. To get the account and KV IDs, visit the KV dashboard details and the URL should match this pattern: https://dash.cloudflare.com/$PT_ACCOUNT_ID/workers/kv/namespaces/$PT_KV_ID. With structopt env args, the CLI can be started like so:

// src/main.rs
use anyhow::Result;
use std::path::PathBuf;
use structopt::StructOpt;

#[derive(Debug, StructOpt)]
#[structopt()]
enum Opt {
    // Download subcommand
    // Subcommands allow more commands to be added easily
    Download {
        #[structopt(long, env = "PT_JWT", hide_env_values = true)]
        jwt: String,
        #[structopt(long, env = "PT_ACCOUNT_ID", hide_env_values = true)]
        account_id: String,
        #[structopt(long, env = "PT_KV_ID", hide_env_values = true)]
        kv_id: String,

        // Optional arg to specify the download folder
        #[structopt(long, default_value = ".")]
        output_dir: PathBuf,
    },
}

// Setup async main with tokio and quick error handling with anyhow
#[tokio::main]
async fn main() -> Result<()> {
    match Opt::from_args() {
        Opt::Download { jwt, account_id, kv_id, output_dir } => {
            todo!();
        }
    }

    Ok(())
}
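
Before filling in the Download branch, the raw endpoints can be sanity checked with curl. This is only a quick sketch that assumes the three environment variables above are already exported; the JSON output is abbreviated:

# List the keys in the namespace
$ curl -H "Authorization: Bearer $PT_JWT" \
    "https://api.cloudflare.com/client/v4/accounts/$PT_ACCOUNT_ID/storage/kv/namespaces/$PT_KV_ID/keys"
{"result":[{"name":"/about/"}],"success":true, ...}

# Read a single value (keys must be percent encoded)
$ curl -H "Authorization: Bearer $PT_JWT" \
    "https://api.cloudflare.com/client/v4/accounts/$PT_ACCOUNT_ID/storage/kv/namespaces/$PT_KV_ID/values/%2Fabout%2F"
2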

With the credentials available, the API endpoints can be wrapped in async functions with some help from serde:

// src/main.rs
use reqwest::Client;
use percent_encoding::{utf8_percent_encode, NON_ALPHANUMERIC};
use serde::Deserialize;

// Define credential as 3-tuple string for type convenience
type Credential = (String, String, String);

// List Keys sample output:
// {
//   "result": [{ "name": "/about/" }],
//   "success": true,
//   "errors": [],
//   "messages": [],
//   "result_info": {
//     "count": 1,
//     "cursor": ""
//   }
// }

#[derive(Debug, Deserialize)]
struct ListKeysPayload {
    result: Vec<ListKey>,
}

#[derive(Debug, Deserialize)]
struct ListKey {
    name: String,
}

// Given a reqwest client and credentials, fetch all KV keys.
async fn list_keys(client: Client, cred: &Credential) -> Result<Vec<String>> {
    let (jwt, account_id, kv_id) = cred;
    let url = format!(
        "https://api.cloudflare.com/client/v4/accounts/{}/storage/kv/namespaces/{}/keys",
        account_id, kv_id
    );

    // JWT is an authorization bearer header
    let resp = client.get(url).bearer_auth(jwt).send().await?;
    let payload = resp.json::<ListKeysPayload>().await?;

    // The only thing needed is the collected `.result[].name`
    Ok(payload
        .result
        .into_iter()
        .map(|key| key.name)
        .collect::<Vec<_>>())
}

// Given a reqwest client, credentials and KV key, fetch the key's value
async fn get_key_value(client: Client, cred: &Credential, key: &str) -> Result<usize> {
    let (jwt, account_id, kv_id) = cred;

    let url = format!(
        "https://api.cloudflare.com/client/v4/accounts/{}/storage/kv/namespaces/{}/values/{}",
        account_id,
        kv_id,
        // The key has to be percent encoded
        utf8_percent_encode(key, NON_ALPHANUMERIC)
    );

    let resp = client.get(url).bearer_auth(jwt).send().await?;
    // No need for custom deserializer since the value is returned directly
    let value = resp.json::<usize>().await?;

    Ok(value)
}

The main fetching code can be implemented using these functions and some futures-rs combinators:

// src/main.rs
// In the download command
// Opt::Download { jwt, account_id, kv_id, output_dir } => {
use futures::stream::{FuturesUnordered, StreamExt};

// Initialize client and credentials
let client = Client::new();
let credentials = (jwt, account_id, kv_id);

// Fetch all keys
let keys = list_keys(client.clone(), &credentials).await?;

// For each key, fetch its value
// To run them concurrently, `FuturesUnordered` is used
let view_futures = FuturesUnordered::new();

for key in keys {
    view_futures.push({
        let client = client.clone();
        let credentials = credentials.clone();

        async move {
            let view: usize = get_key_value(client, &credentials, &key).await?;

            Ok((key, view))
        }
    });
}

// Run all the futures and collect them into a list
let mut data = view_futures.collect::<Vec<Result<_>>>().await;

// Sort the data by path to easily read the data.
data.sort_by_key(|res| {
    if let Ok((path, _)) = res {
        Some(path.clone())
    } else {
        None
    }
});

The last step is to write this data to a CSV file, which the csv crate makes easy:

// src/main.rs
use csv::Writer;
use chrono::{DateTime, Utc};
use serde::Serialize;
use std::io::Write;

#[derive(Debug, Serialize)]
struct CsvRecord {
    path: String,
    views: usize,
}

// Generate the output path using the current date and time
// Sample path: ./my-output-dir/2021-10-12T04:00:50Z.csv
let now: DateTime<Utc> = Utc::now();
let output_path = output_dir.join(now.format("%FT%TZ.csv").to_string());

// Open the path for writes
let mut wtr = Writer::from_path(&output_path)?;

// Write each successful entry
for view_res in data {
    let (path, views) = view_res?;

    wtr.serialize(CsvRecord { path, views })?;
}

// Finalize the CSV
wtr.flush()?;

After all that code, the CLI can now be built and the data exported:

# Export required env variables
$ export PT_JWT="1234" PT_ACCOUNT_ID="account_id" PT_KV_ID="kv_id"

# Build the CLI
$ cargo build --release

# Download the data
$ ./target/release/page-tracker download --output-dir "/mnt/analytics_data"

# Sample data
$ cat /mnt/analytics_data/2021-10-12T04:00:50Z.csv
path,views
/about/,2
/dev/browsing-w3m-anonymously-with-tor/,1

>>> Systemd Timer

To run the CLI as a scheduled job with user systemd timers, make sure the CLI is in the PATH and create these two files:

// ~/.config/systemd/user/fnlog-tracker-download.service
[Unit]
Description=Download page tracker data
Wants=fnlog-tracker-download.timer

[Service]
Type=oneshot
ExecStart=page-tracker download --output-dir "/mnt/analytics_data"
Environment=PT_JWT=jwt
Environment=PT_ACCOUNT_ID=account_id
Environment=PT_KV_ID=kv_id

[Install]
WantedBy=default.target

// ~/.config/systemd/user/fnlog-tracker-download.timer
[Unit]
Description=Trigger daily analytics download
Requires=fnlog-tracker-download.service

[Timer]
Unit=fnlog-tracker-download.service
OnCalendar=*-*-* 12:00:00

[Install]
WantedBy=timers.target

Start and enable them afterwards:

# Test if the service works
$ systemctl --user start fnlog-tracker-download.service
$ systemctl --user status fnlog-tracker-download.service

# Start and enable the timer
$ systemctl --user start fnlog-tracker-download.timer
$ systemctl --user enable fnlog-tracker-download.timer
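
To confirm the schedule is active, the user timers can be listed; the exact output will vary:

# Verify the timer is scheduled
$ systemctl --user list-timers fnlog-tracker-download.timer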

>> References

  1. Tommy Ku's Blog - Website analytics for lazy people again, with Cloudflare Worker
  2. Cloudflare Docs - Use Workers KV directly from Rust