Crawling Historical Cryptocurrency Data with Rust
In this era of seemingly endless growth in stocks and cryptocurrency returns, it appears that everyone, regardless of age or background, has dabbled in either stocks or cryptocurrencies. With over 90% of NASDAQ stocks trading above their 200-day moving averages and Bitcoin showing a 100% growth rate over three months, the market’s bullish nature is undeniable. However, even during such remarkable bull markets, those with limited initial capital either see minimal returns or risk substantial losses by overleveraging their positions during minor market downturns. This situation has led to a common discussion among acquaintances:
“What if we just treat it like a lottery and invest $10 weekly in any random cryptocurrency? Surely one of them might yield significant returns?”
While investing in Bitcoin would undoubtedly show the highest average returns, our goal here is to seek ‘substantial gains.’ Therefore, approaching this like a lottery ticket, with even a small probability of hitting the jackpot, might be worth considering.
To calculate the probability of this ‘crypto lottery’ generating significant returns, we need to collect historical trading data from various cryptocurrencies. This post details the process of collecting historical data from Investing.com using HTTP requests.
Rust reqwest
While Python is commonly used for web crawling, given my current fascination with Rust, I opted to use it for this project as well. In Rust, the reqwest crate provides HTTP request functionality. Historical cryptocurrency price information can be found at “https://kr.investing.com/crypto/{cryptoName}/historical-data”. Rather than looking up each coin manually, I first crawled the “/crypto/currencies” page to obtain a list of cryptocurrencies, then iterated through that list.
Following the process step by step, we first retrieve the site’s HTML. Note that we must set a User-Agent header to avoid a ‘403 Forbidden’ error:
use reqwest::header::USER_AGENT;
let client = reqwest::blocking::Client::new();
let mut res = client.get("https://kr.investing.com/crypto/currencies")
.header(USER_AGENT, "Agent name")
.send().unwrap();
We then convert the response to a string:
use std::io::Read; // trait that provides read_to_string on the response
let mut body = String::new();
res.read_to_string(&mut body).unwrap();
To analyze this string effectively, we can parse it into a document with the select crate:
use select::document::Document;
use select::predicate::{Attr, Class, Name, Predicate};
let document = Document::from(body.as_str());
The cryptocurrency list we need is located in the tbody of the element with the “js-all-crypto-table” class. Each coin is wrapped in a tr element:
let coin_table = document.find(Class("js-all-crypto-table")).next().unwrap()
.find(Name("tbody")).next().unwrap();
for coin in coin_table.find(Name("tr")){
// Coin iteration loop
}
Inside the loop, we can then extract each coin’s rank, name, relative path, and absolute path:
let rank = coin.find(Class("rank")).next().unwrap().text();
let name = coin.find(Class("cryptoName")).next().unwrap().attr("title").unwrap();
let rel_link = coin.find(Name("a")).next().unwrap().attr("href").unwrap();
// Drop any leading slash from the href so the URL does not contain "//".
let link = format!("https://kr.investing.com/{}/historical-data", rel_link.trim_start_matches('/'));
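Once the raw strings are extracted, it can help to gather them into a small struct before crawling each coin. The sketch below uses only the standard library; the Coin struct, its field names, and the coin_from_row function are hypothetical names introduced for illustration. It assumes the rank cell contains a plain integer and that the href may or may not start with a slash.

```rust
/// One row of the coin table; the field names are illustrative.
#[derive(Debug, PartialEq)]
struct Coin {
    rank: u32,
    name: String,
    history_url: String,
}

/// Builds a Coin from the raw strings scraped out of a table row.
/// Returns None when the rank text is not a number, so that an
/// unexpected row can be skipped instead of panicking.
fn coin_from_row(rank_text: &str, name: &str, rel_link: &str) -> Option<Coin> {
    let rank = rank_text.trim().parse::<u32>().ok()?;
    // Strip slashes at both ends so the pieces join with exactly one slash.
    let path = rel_link.trim_matches('/');
    Some(Coin {
        rank,
        name: name.trim().to_string(),
        history_url: format!("https://kr.investing.com/{}/historical-data", path),
    })
}
```

Within the loop, coin_from_row(&rank, name, rel_link) could be used together with `if let Some(coin) = ...` to collect the coins into a Vec for the next crawling step.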