Search HAR Capture and CSV Tools
This project contains small tools for collecting browser network data from search engines and converting the resulting HAR files into CSV datasets.
The intended workflow is:
- Capture one or more .har files with Playwright.
- Convert the HAR files to CSV.
- Use the generated CSV files in Excel, Power Query, Power BI, or another analysis tool.
Tools
capture_search_har
Captures search result pages as HAR files.
Supported search engines:
- Google
- DuckDuckGo
- Bing
- Brave Search
Supported browsers:
- Firefox
- Chromium
har_entries_to_csv
Converts HAR files to two CSV files:
- har_entries.csv: one row per HAR log.entries[] item.
- har_summary.csv: one row per HAR file, with aggregated measurements.
Requirements
Activate the project environment before running the tools:
source .noroff-env/bin/activate
Playwright must be installed in the active environment:
pip install playwright
playwright install firefox chromium
Check that the commands are available:
which capture_search_har
which har_entries_to_csv
Basic Capture
Run a search with all supported search engines using Firefox, the default browser:
capture_search_har \
--query "weather oslo"
Run a search with Chromium:
capture_search_har \
--query "weather oslo" \
--browser chromium
Show the browser window during capture:
capture_search_har \
--query "weather oslo" \
--browser chromium \
--headed
Use a fixed wait condition and timeout:
capture_search_har \
--query "weather oslo" \
--browser chromium \
--wait-until load \
--timeout-ms 60000 \
--headed
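For orientation, the options above correspond closely to parameters in Playwright's Python API. The following is a minimal sketch of how a single capture could work, not the actual implementation of capture_search_har. The function names `search_url` and `capture_one`, the engine keys, and the URL templates are assumptions made for illustration.

```python
from typing import Optional
from urllib.parse import quote_plus

# Assumed engine keys and URL templates; the real tool may differ.
SEARCH_URLS = {
    "duckduckgo": "https://duckduckgo.com/?q={q}",
    "bing": "https://www.bing.com/search?q={q}",
    "brave": "https://search.brave.com/search?q={q}",
}


def search_url(engine: str, query: str) -> str:
    """Build a search URL for a known engine (hypothetical helper)."""
    return SEARCH_URLS[engine].format(q=quote_plus(query))


def capture_one(engine: str, query: str, har_path: str,
                browser_name: str = "firefox", headed: bool = False,
                wait_until: str = "load", timeout_ms: int = 60000,
                proxy: Optional[str] = None) -> None:
    """Load one search page and record its network traffic to har_path."""
    # Imported here so search_url() works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        launch_args = {"headless": not headed}
        if proxy:
            launch_args["proxy"] = {"server": proxy}
        # p.firefox or p.chromium, selected by name.
        browser = getattr(p, browser_name).launch(**launch_args)
        context = browser.new_context(record_har_path=har_path)
        page = context.new_page()
        page.goto(search_url(engine, query),
                  wait_until=wait_until, timeout=timeout_ms)
        # Playwright writes the HAR file when the context is closed.
        context.close()
        browser.close()
```

In this sketch, `wait_until` and `timeout_ms` are passed straight to `page.goto`, which is why a slow page can make the capture fail with a timeout rather than produce a partial HAR file.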
Capture Selected Search Engines
Only Google:
capture_search_har \
--query "weather oslo" \
--engines google
Google and DuckDuckGo:
capture_search_har \
--query "weather oslo" \
--engines google duckduckgo
Output Directory
Use --output-dir to choose where HAR files are written:
capture_search_har \
--query "weather oslo" \
--browser chromium \
--output-dir normal_chromium
The generated HAR filenames include timestamp, search engine, and query:
20260508_144327_google_weather_oslo.har
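If the filename parts are needed later, for example to label rows in a spreadsheet, they can be recovered from the name. The sketch below assumes the pattern date, time, engine, then query words, all joined by underscores, and that the engine name itself contains no underscore. `parse_har_filename` is a hypothetical helper, not part of the tools.

```python
from datetime import datetime
from pathlib import Path


def parse_har_filename(path: str) -> dict:
    """Split '<YYYYMMDD>_<HHMMSS>_<engine>_<query words>.har' into parts."""
    stem = Path(path).stem
    # Split at most three times so the query keeps its own underscores.
    date_part, time_part, engine, query_slug = stem.split("_", 3)
    return {
        "captured_at": datetime.strptime(date_part + time_part, "%Y%m%d%H%M%S"),
        "search_engine": engine,
        # Assumes spaces in the query were replaced by underscores.
        "query_text": query_slug.replace("_", " "),
    }


info = parse_har_filename("20260508_144327_google_weather_oslo.har")
print(info["search_engine"], "|", info["query_text"])
```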
Tor / Proxy Capture
If Tor is available as a SOCKS proxy on 127.0.0.1:9050:
capture_search_har \
--query "weather oslo" \
--browser chromium \
--wait-until load \
--timeout-ms 60000 \
--headed \
--proxy socks5://127.0.0.1:9050 \
--output-dir tor_chromium
Test Tor before capture:
curl --socks5-hostname 127.0.0.1:9050 https://check.torproject.org/api/ip
The response should contain:
"IsTor": true
Convert HAR to CSV
Run from a folder that contains a data/ directory with HAR files:
har_entries_to_csv
This reads:
data/*.har
and writes:
har_entries.csv
har_summary.csv
Use a custom input folder:
har_entries_to_csv \
--input-dir normal_chromium
Use custom output filenames:
har_entries_to_csv \
--input-dir normal_chromium \
--entries-output entries_normal_chromium.csv \
--summary-output summary_normal_chromium.csv
Recommended Folder Structure
One practical setup is to keep each test condition in its own folder:
work/
  normal_chromium/
    data/
      *.har
    har_entries.csv
    har_summary.csv
  normal_firefox/
    data/
      *.har
    har_entries.csv
    har_summary.csv
  tor_chromium/
    data/
      *.har
    har_entries.csv
    har_summary.csv
  tor_firefox/
    data/
      *.har
    har_entries.csv
    har_summary.csv
Alternatively, write HAR files directly into condition folders and pass those
folders to har_entries_to_csv with --input-dir.
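With several condition folders, the conversion step can be scripted. The following is a sketch that builds one har_entries_to_csv command per folder, using only the options documented above. The helper names and the `entries_<folder>.csv` / `summary_<folder>.csv` naming scheme are assumptions.

```python
import subprocess
from pathlib import Path


def build_convert_command(condition_dir: Path) -> list:
    """Return the har_entries_to_csv command for one condition folder."""
    name = condition_dir.name
    return [
        "har_entries_to_csv",
        "--input-dir", str(condition_dir),
        "--entries-output", f"entries_{name}.csv",
        "--summary-output", f"summary_{name}.csv",
    ]


def convert_all(work_dir: str, dry_run: bool = True) -> list:
    """Build (and optionally run) one command per subfolder of work_dir."""
    commands = [build_convert_command(d)
                for d in sorted(Path(work_dir).iterdir()) if d.is_dir()]
    for cmd in commands:
        if dry_run:
            # Only show what would be executed.
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
    return commands
```

Calling `convert_all("work")` prints the commands without running them; pass `dry_run=False` to execute them in the active environment.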
CSV Outputs
har_entries.csv
One row per network request in the HAR file.
Useful for inspecting details such as:
- request URL
- domain
- HTTP method
- status code
- request cookies
- response cookies
- query parameters
- POST data presence
- approximate transfer size
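To show where such details come from, here is a minimal sketch that flattens HAR 1.2 entries into rows. It is not the code used by har_entries_to_csv, and the row keys are hypothetical rather than the tool's real column names. The sample HAR content is made up.

```python
import json
from urllib.parse import urlsplit


def entry_to_row(entry: dict) -> dict:
    """Flatten one HAR log.entries[] item into a flat dictionary."""
    request = entry.get("request", {})
    response = entry.get("response", {})
    # HAR uses -1 for unknown sizes, so negative values count as 0.
    size = sum(max(response.get(key, 0) or 0, 0)
               for key in ("headersSize", "bodySize"))
    url = request.get("url", "")
    return {
        "url": url,
        "domain": urlsplit(url).hostname or "",
        "method": request.get("method", ""),
        "status": response.get("status", 0),
        "request_cookie_count": len(request.get("cookies", [])),
        "response_cookie_count": len(response.get("cookies", [])),
        "query_param_count": len(request.get("queryString", [])),
        "has_post_data": "postData" in request,
        "transfer_bytes_approx": size,
    }


def har_to_rows(har_text: str) -> list:
    """Return one row per entry in a HAR document given as JSON text."""
    return [entry_to_row(e) for e in json.loads(har_text)["log"]["entries"]]


# Made-up single-entry HAR document for illustration.
sample = json.dumps({"log": {"entries": [{
    "request": {"method": "GET", "url": "https://example.com/search?q=weather",
                "cookies": [], "queryString": [{"name": "q", "value": "weather"}]},
    "response": {"status": 200, "cookies": [{"name": "sid", "value": "x"}],
                 "headersSize": 300, "bodySize": -1},
}]}})
rows = har_to_rows(sample)
```

The size is approximate because HAR writers may report -1 for unknown header or body sizes, and those values are simply ignored here.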
har_summary.csv
One row per HAR file.
Useful for analysis and visualisation. Important columns include:
- har_filename
- search_engine
- query_text
- requests_total
- unique_domains
- third_party_requests
- request_cookies_total
- response_cookies_total
- query_params_total
- post_requests_total
- tracking_hint_requests
- transferred_kb_approx
- page_load_ms
- status_2xx
- status_3xx
- status_4xx
- status_5xx
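As a starting point for analysis outside Excel or Power BI, the summary file can be read with Python's standard csv module. The sketch below averages one numeric column per search engine. `mean_by_engine` is a hypothetical helper, and the sample rows are invented for illustration.

```python
import csv
import io
from collections import defaultdict


def mean_by_engine(csv_text: str, column: str) -> dict:
    """Average a numeric summary column for each search_engine value."""
    values = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        values[row["search_engine"]].append(float(row[column]))
    return {engine: sum(v) / len(v) for engine, v in values.items()}


# Invented sample data using a subset of the summary columns.
sample = """har_filename,search_engine,requests_total,third_party_requests
a.har,bing,40,10
b.har,bing,60,20
c.har,duckduckgo,12,1
"""
print(mean_by_engine(sample, "requests_total"))
```

For a real file, pass the text of har_summary.csv instead of `sample`, for example `Path("har_summary.csv").read_text(encoding="utf-8")`.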
Notes on Interpretation
The HAR files and CSV files show observable browser-side network activity only. They do not show what a search engine stores internally on the server side.
tracking_hint_requests is a keyword-based flag. It is useful for filtering and
inspection, but it is not proof of tracking by itself.
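A small sketch makes the limitation concrete. The keyword list below is hypothetical; the list actually used by har_entries_to_csv is not documented here, and `has_tracking_hint` is an illustrative name.

```python
# Hypothetical keyword list, chosen only for this illustration.
TRACKING_HINTS = ("analytics", "beacon", "pixel", "telemetry", "track")


def has_tracking_hint(url: str, hints=TRACKING_HINTS) -> bool:
    """Return True if any keyword occurs anywhere in the URL."""
    lowered = url.lower()
    return any(hint in lowered for hint in hints)


# A plausible match.
print(has_tracking_hint("https://example.com/pixel.gif?id=1"))
# A false positive: "racetrack" contains "track".
print(has_tracking_hint("https://example.com/racetrack-results"))
# A possible false negative: no keyword, yet it could still be tracking.
print(has_tracking_hint("https://example.com/collect?e=1"))
```

Substring matching of this kind flags unrelated URLs and misses requests that use neutral names, which is why flagged rows need manual inspection.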
har_entries.csv may contain sensitive data such as full URLs, cookie values,
headers, and query parameters. Treat it as raw data: review or redact it before
sharing or publishing.
For charts and reporting, har_summary.csv is usually the better starting point.