spark-wasm browser demo

A real, runnable single-page app that runs Spark-dialect SQL entirely in the browser via the spark-wasm crate -- no server, no backend, data never leaves the tab. This is the spark-rust analogue of the DuckDB-WASM SQL console.

wasm sandbox, Arrow-backed, 391 registered scalar functions.

What it shows

The demo uses the friendly JSON path run_sql_json(sql, rowsJson): you hand it rows as a JSON array of objects (registered as a table named data) and get a JSON array back -- no apache-arrow JS library needed. The zero-copy Arrow-IPC export run_sql(sql, Uint8Array) -> Uint8Array is also available for callers that already hold Arrow buffers.
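A rough illustration of calling the JSON path from JavaScript. This is a sketch under assumptions, not the demo's own code: the README does not say whether run_sql_json returns JSON text or an already-parsed value, or whether it is synchronous, so the wrapper tolerates both. The names queryRows, decodeResult and stubEngine are hypothetical; in the real page the first argument would be the run_sql_json export from pkg/spark_wasm.js.

```javascript
// Hypothetical helper: not part of the demo's app.js.
// `runSqlJson` is whatever the wasm module exports as run_sql_json.
function decodeResult(out) {
  // Assumption: the result may arrive as JSON text or as a parsed array.
  return typeof out === "string" ? JSON.parse(out) : out;
}

function queryRows(runSqlJson, sql, rows) {
  // The rows are serialized and become the table `data` inside the engine.
  const out = runSqlJson(sql, JSON.stringify(rows));
  // Assumption: the export may be synchronous or return a Promise.
  return out && typeof out.then === "function"
    ? out.then(decodeResult)
    : decodeResult(out);
}

// Stand-in engine for trying the wrapper without the wasm build:
// it ignores the SQL and echoes the rows back as JSON text.
const stubEngine = (sql, rowsJson) => rowsJson;

const result = queryRows(stubEngine, "SELECT * FROM data", [{ id: 1, name: "a" }]);
console.log(result);
```

Passing the engine function in as an argument keeps the wrapper testable without a wasm build and avoids hard-coding an import path.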

Run it

# one-time prereqs
rustup target add wasm32-unknown-unknown
cargo install wasm-bindgen-cli --version 0.2.122   # must match Cargo.lock

# build the wasm + JS bindings into ./pkg, then serve
bash build.sh
bash serve.sh        # -> http://localhost:8000/

Open http://localhost:8000/ and query away. Edit the SQL or the JSON rows and press Run (or Ctrl/Cmd+Enter).
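The Ctrl/Cmd+Enter shortcut can be detected with a small predicate over the keydown event. The sketch below is an assumption about how such a handler might look, not a copy of app.js; isRunShortcut, editor and run are hypothetical names.

```javascript
// Hypothetical helper; the demo's actual handler may differ.
function isRunShortcut(event) {
  // Ctrl+Enter on Windows/Linux, Cmd+Enter on macOS (reported as metaKey).
  return event.key === "Enter" && (event.ctrlKey || event.metaKey);
}

// In the page this would be wired roughly as:
//   editor.addEventListener("keydown", (e) => {
//     if (isRunShortcut(e)) { e.preventDefault(); run(); }
//   });
// where `editor` and `run` are placeholders for the SQL input and the Run action.

console.log(isRunShortcut({ key: "Enter", ctrlKey: true, metaKey: false }));  // true
console.log(isRunShortcut({ key: "Enter", ctrlKey: false, metaKey: false })); // false
```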

build.sh sets PROTOC and passes the getrandom_backend="wasm_js" cfg through RUSTFLAGS for you. The optional wasm-opt -Oz size pass runs only if wasm-opt is on PATH; without it the demo still runs, the binary is just larger.

Measured artifact sizes (this engine, element_at fix included): the debug wasm is ~1.2 GB, the release wasm is 85 MB, and wasm-bindgen --target web emits pkg/spark_wasm_bg.wasm at 80 MB plus pkg/spark_wasm.js at ~18 KB. A production bundle should always run wasm-opt -Oz on top (for comparison, DuckDB-WASM's shipped artifact is ~30 MB). The size reflects the full 391-function Spark surface compiled in -- tree-shaking to the functions a given app uses is the documented follow-up.

How it fits together

  index.html --loads--> app.js --import--> pkg/spark_wasm.js  (wasm-bindgen glue)
                                              |
                                              v
                                     pkg/spark_wasm_bg.wasm   (the engine)
                                              |
   run_sql_json(sql, rowsJson) ---------------+
     = preprocess_spark_sql            (Spark dialect -> ANSI)
     + register rows as MemTable `data`
     + DataFusion parse / plan / execute
     + datafusion_spark + spark_udfs::register_all   (391 scalar fns)
     -> JSON rows out
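The load sequence in the diagram can be sketched as below. This is an illustration under assumptions rather than the contents of app.js: it relies on the wasm-bindgen --target web convention that the glue module's default export is an async initializer, and it assumes udf_count takes no arguments and preprocess_sql takes the SQL string. bootEngine and stubModule are hypothetical names.

```javascript
// Hypothetical loader; only the module export names come from the README.
async function bootEngine(loadModule) {
  const mod = await loadModule();
  // wasm-bindgen's --target web output exposes an async default export that
  // fetches and instantiates the .wasm file next to the glue script.
  await mod.default();
  return {
    udfCount: mod.udf_count(), // assumed zero-argument
    preprocessSql: (sql) => mod.preprocess_sql(sql),
    runSqlJson: (sql, rowsJson) => mod.run_sql_json(sql, rowsJson),
  };
}

// In the browser: bootEngine(() => import("./pkg/spark_wasm.js"))
// Here a stub module stands in so the flow can be exercised without a build.
const stubModule = {
  default: async () => {},
  udf_count: () => 391,
  preprocess_sql: (sql) => sql,
  run_sql_json: (sql, rowsJson) => rowsJson,
};

const booted = bootEngine(async () => stubModule);
booted.then((engine) => console.log(engine.udfCount));
```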

The network is touched once, to fetch spark_wasm_bg.wasm from this origin. Everything after that -- every byte of your data -- stays inside the wasm sandbox in the tab. That sandbox is the privacy guarantee: the module has no filesystem and no socket capability handed to it.
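One way to spot-check the same-origin claim is to inspect the browser's Resource Timing entries after running a few queries. The helper below is a sketch, not part of the demo, and crossOriginRequests is a hypothetical name. It only covers requests that Resource Timing records, so treat an empty result as a sanity check rather than proof.

```javascript
// Hypothetical audit helper, not part of the demo.
// `entries` mimics performance.getEntriesByType("resource"):
// each entry carries the requested URL in its `name` field.
function crossOriginRequests(entries, pageOrigin) {
  return entries
    .map((entry) => entry.name)
    .filter((url) => new URL(url).origin !== pageOrigin);
}

// In the browser console:
//   crossOriginRequests(performance.getEntriesByType("resource"), location.origin)

const sample = [
  { name: "http://localhost:8000/pkg/spark_wasm.js" },
  { name: "http://localhost:8000/pkg/spark_wasm_bg.wasm" },
];
console.log(crossOriginRequests(sample, "http://localhost:8000")); // []
```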

Files

File        Role
index.html  UI shell + styles.
app.js      Loads the wasm module and calls init / udf_count / preprocess_sql / run_sql_json.
build.sh    cargo build --target wasm32 + wasm-bindgen --target web to pkg/.
serve.sh    Static file server (python3 -m http.server).
pkg/        Generated bindings (git-ignored; produced by build.sh).

Notes / limits
