Capture one or more .har files with Playwright.
Convert the HAR files to CSV.
Use the generated CSV files in Excel, Power Query, Power BI, or another analysis tool.

Tools

`capture_search_har`

Captures search result pages as HAR files.

Supported search engines:

Google
DuckDuckGo
Bing
Brave Search

Supported browsers:

Firefox
Chromium

`har_entries_to_csv`

Converts HAR files to two CSV files:

har_entries.csv: one row per HAR log.entries[] item.
har_summary.csv: one row per HAR file, with aggregated measurements.
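
The "one row per log.entries[] item" relationship can be checked by hand. The following is a minimal sketch, not the converter's actual code: it assumes only the standard HAR layout (a top-level log object with an entries array), and the sample document is hand-written for illustration.

```python
import json


def count_entries(har_text: str) -> int:
    """Return the number of items in log.entries[] of a HAR document."""
    har = json.loads(har_text)
    return len(har.get("log", {}).get("entries", []))


# Tiny hand-written HAR document, for illustration only.
sample_har = json.dumps({
    "log": {
        "version": "1.2",
        "entries": [
            {"request": {"method": "GET", "url": "https://example.com/"}},
            {"request": {"method": "GET", "url": "https://example.com/app.js"}},
        ],
    }
})

print(count_entries(sample_har))  # prints 2
```

For a real capture, the number returned for a file should match the number of rows that file contributes to har_entries.csv, while har_summary.csv gets a single row for it.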

Requirements

Activate the project environment before running the tools:

source .noroff-env/bin/activate

Playwright must be installed in the active environment:

pip install playwright
playwright install firefox chromium

Check that the commands are available:

which capture_search_har
which har_entries_to_csv

Basic Capture

Run a search against all supported search engines using Firefox. With no --engines or --browser flag, every engine is queried and Firefox is used:

capture_search_har \
  --query "weather oslo"

Run a search with Chromium:

capture_search_har \
  --query "weather oslo" \
  --browser chromium

Show the browser window during capture:

capture_search_har \
  --query "weather oslo" \
  --browser chromium \
  --headed

Set an explicit wait condition and a timeout in milliseconds (60000 ms = 60 seconds):

capture_search_har \
  --query "weather oslo" \
  --browser chromium \
  --wait-until load \
  --timeout-ms 60000 \
  --headed

Capture Selected Search Engines

Only Google:

capture_search_har \
  --query "weather oslo" \
  --engines google

Google and DuckDuckGo:

capture_search_har \
  --query "weather oslo" \
  --engines google duckduckgo

Output Directory

Use --output-dir to choose where HAR files are written:

capture_search_har \
  --query "weather oslo" \
  --browser chromium \
  --output-dir normal_chromium

Generated HAR filenames include a timestamp, the search engine, and the query, for example:

20260508_144327_google_weather_oslo.har
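
If the filename parts are needed in a script, they can be split back out. This sketch is based only on the single example filename above: it assumes the layout is date, time, engine, then the query with spaces replaced by underscores, and that the engine is a single token. How a multi-word engine such as Brave Search is written in filenames is not documented here, so check a real capture before relying on it. The function name parse_har_filename is hypothetical.

```python
from datetime import datetime
from pathlib import Path


def parse_har_filename(name: str) -> dict:
    """Split a HAR filename into capture time, engine, and query text.

    Assumes: YYYYMMDD_HHMMSS_<engine>_<query words joined by "_">.har
    """
    stem = Path(name).stem  # drop the ".har" suffix
    date_part, time_part, engine, *query_parts = stem.split("_")
    return {
        "captured_at": datetime.strptime(date_part + time_part, "%Y%m%d%H%M%S"),
        "search_engine": engine,
        "query_text": " ".join(query_parts),
    }


info = parse_har_filename("20260508_144327_google_weather_oslo.har")
print(info["captured_at"].isoformat(), info["search_engine"], info["query_text"])
# prints: 2026-05-08T14:43:27 google weather oslo
```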

Tor / Proxy Capture

If Tor is available as a SOCKS proxy on 127.0.0.1:9050, route the capture through it with --proxy:

capture_search_har \
  --query "weather oslo" \
  --browser chromium \
  --wait-until load \
  --timeout-ms 60000 \
  --headed \
  --proxy socks5://127.0.0.1:9050 \
  --output-dir tor_chromium

Verify the Tor connection before capturing:

curl --socks5-hostname 127.0.0.1:9050 https://check.torproject.org/api/ip

The response should contain:

"IsTor": true
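
In a script, it is safer to parse the response as JSON than to search for the text. A minimal sketch, assuming the response body is a JSON object with a boolean IsTor field as shown above; the helper name is_tor_response is hypothetical and the IP values are placeholder documentation addresses.

```python
import json


def is_tor_response(body: str) -> bool:
    """Return True only if the body is a JSON object with IsTor set to true."""
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and data.get("IsTor") is True


# Placeholder response bodies, for illustration only.
print(is_tor_response('{"IsTor": true, "IP": "203.0.113.7"}'))   # prints True
print(is_tor_response('{"IsTor": false, "IP": "203.0.113.7"}'))  # prints False
print(is_tor_response("not json"))                               # prints False
```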

Convert HAR to CSV

Run from a folder that contains a data/ directory with HAR files:

har_entries_to_csv

This reads:

data/*.har

and writes:

har_entries.csv
har_summary.csv

Use a custom input folder:

har_entries_to_csv \
  --input-dir normal_chromium

Use custom output filenames:

har_entries_to_csv \
  --input-dir normal_chromium \
  --entries-output entries_normal_chromium.csv \
  --summary-output summary_normal_chromium.csv

Recommended Folder Structure

One practical setup is to keep each test condition in its own folder:

work/
  normal_chromium/
    data/
      *.har
    har_entries.csv
    har_summary.csv

  normal_firefox/
    data/
      *.har
    har_entries.csv
    har_summary.csv

  tor_chromium/
    data/
      *.har
    har_entries.csv
    har_summary.csv

  tor_firefox/
    data/
      *.har
    har_entries.csv
    har_summary.csv

Alternatively, write HAR files directly into each condition folder with --output-dir, then pass that folder to har_entries_to_csv with --input-dir.
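
With several condition folders, the conversion commands can be generated instead of typed. The sketch below only builds the argument lists for the layout shown above. It uses the documented --input-dir, --entries-output, and --summary-output flags, but it assumes the two output flags accept a path with a directory part, which this document does not confirm. The function name build_convert_command is hypothetical.

```python
from pathlib import Path

# Condition folder names taken from the layout above.
CONDITIONS = ["normal_chromium", "normal_firefox", "tor_chromium", "tor_firefox"]


def build_convert_command(work_dir: str, condition: str) -> list[str]:
    """Build the har_entries_to_csv argument list for one condition folder."""
    folder = Path(work_dir) / condition
    return [
        "har_entries_to_csv",
        "--input-dir", str(folder / "data"),
        # Assumption: the output flags accept paths, not only bare filenames.
        "--entries-output", str(folder / "har_entries.csv"),
        "--summary-output", str(folder / "har_summary.csv"),
    ]


for condition in CONDITIONS:
    print(" ".join(build_convert_command("work", condition)))
```

Each list can be passed to subprocess.run(command, check=True) once the printed commands look right.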

CSV Outputs

`har_entries.csv`

One row per network request in the HAR file.

Useful for inspecting details such as:

request URL
domain
HTTP method
status code
request cookies
response cookies
query parameters
POST data presence
approximate transfer size
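
To see where such details come from, the sketch below derives comparable values from a single HAR entry. It is not the converter's implementation. The field names follow the HAR 1.2 format (request.url, request.queryString, response.status, and so on), while the output keys and the size calculation are assumptions made for illustration. The real column names in har_entries.csv may differ.

```python
from urllib.parse import urlsplit


def describe_entry(entry: dict) -> dict:
    """Derive a few per-request details from one HAR log.entries[] item."""
    request = entry.get("request", {})
    response = entry.get("response", {})
    url = request.get("url", "")
    # HAR uses -1 for unknown sizes, so negative values are counted as 0.
    approx_bytes = max(response.get("headersSize", 0), 0) + max(response.get("bodySize", 0), 0)
    return {
        "url": url,
        "domain": urlsplit(url).hostname or "",
        "method": request.get("method", ""),
        "status": response.get("status"),
        "request_cookie_count": len(request.get("cookies", [])),
        "response_cookie_count": len(response.get("cookies", [])),
        "query_param_count": len(request.get("queryString", [])),
        "has_post_data": "postData" in request,
        "approx_bytes": approx_bytes,
    }


# Hand-written entry, for illustration only.
sample_entry = {
    "request": {
        "method": "GET",
        "url": "https://www.example.com/search?q=weather+oslo",
        "cookies": [{"name": "sid", "value": "abc"}],
        "queryString": [{"name": "q", "value": "weather oslo"}],
    },
    "response": {"status": 200, "cookies": [], "headersSize": 300, "bodySize": 1200},
}

details = describe_entry(sample_entry)
print(details["domain"], details["method"], details["status"], details["approx_bytes"])
# prints: www.example.com GET 200 1500
```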

`har_summary.csv`

One row per HAR file.

Useful for analysis and visualisation. Important columns include:

har_filename
search_engine
query_text
requests_total
unique_domains
third_party_requests
request_cookies_total
response_cookies_total
query_params_total
post_requests_total
tracking_hint_requests
transferred_kb_approx
page_load_ms
status_2xx
status_3xx
status_4xx
status_5xx
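
As one possible analysis, the share of third-party requests per search engine can be computed from the summary columns with the standard library. The sketch below assumes that requests_total and third_party_requests hold integer counts. The sample rows are made up for illustration and are not real measurements.

```python
import csv
import io
from collections import defaultdict


def third_party_share_by_engine(csv_file) -> dict:
    """Return third_party_requests / requests_total, aggregated per search_engine."""
    totals = defaultdict(lambda: [0, 0])  # engine -> [third_party, total]
    for row in csv.DictReader(csv_file):
        engine = row["search_engine"]
        totals[engine][0] += int(row["third_party_requests"])
        totals[engine][1] += int(row["requests_total"])
    return {
        engine: (third_party / total if total else 0.0)
        for engine, (third_party, total) in totals.items()
    }


# Made-up rows using a subset of the summary columns.
sample = io.StringIO(
    "har_filename,search_engine,requests_total,third_party_requests\n"
    "a.har,google,40,10\n"
    "b.har,google,60,10\n"
    "c.har,duckduckgo,20,0\n"
)
shares = third_party_share_by_engine(sample)
print(shares)  # prints {'google': 0.2, 'duckduckgo': 0.0}
```

For a real file, pass an open file object instead of the sample, for example open("har_summary.csv", newline="", encoding="utf-8").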

Notes on Interpretation

The HAR and CSV files show only the network activity that is observable from the browser. They do not show what a search engine stores internally on the server side.

tracking_hint_requests is based on keyword matching. It is useful for filtering and inspection, but a match is not, by itself, proof of tracking.
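
The sketch below shows why a keyword match is only a hint. The keyword list is hypothetical, since the tool's actual list is not given in this document. The second URL is flagged only because "soundtrack" contains "track".

```python
# Hypothetical keyword list, for illustration only; not the tool's actual list.
TRACKING_KEYWORDS = ("track", "pixel", "beacon", "analytics", "collect")


def has_tracking_hint(url: str, keywords=TRACKING_KEYWORDS) -> bool:
    """Return True if any keyword occurs anywhere in the lowercased URL."""
    lowered = url.lower()
    return any(keyword in lowered for keyword in keywords)


print(has_tracking_hint("https://example.com/collect?v=1"))                # prints True
print(has_tracking_hint("https://example.com/articles/soundtrack-review"))  # prints True (false positive)
print(has_tracking_hint("https://example.com/static/logo.png"))            # prints False
```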

har_entries.csv may contain sensitive data such as full URLs, cookie values, headers, and query parameters. Treat it as raw, sensitive data and review it before sharing.
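
Before sharing excerpts, query parameter values can be masked while keeping the parameter names. This is a minimal sketch using Python's urllib.parse; the function name redact_query_values and the placeholder text are hypothetical, and it covers URLs only, not cookies or headers.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def redact_query_values(url: str, placeholder: str = "REDACTED") -> str:
    """Replace every query parameter value in the URL with a placeholder."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    redacted_query = urlencode([(name, placeholder) for name, _ in pairs])
    return urlunsplit((parts.scheme, parts.netloc, parts.path, redacted_query, parts.fragment))


print(redact_query_values("https://example.com/search?q=weather+oslo&sid=abc123"))
# prints: https://example.com/search?q=REDACTED&sid=REDACTED
```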

For charts and reporting, har_summary.csv is usually the better starting point.