* Remove Request::build

* All mutations on Request follow builder pattern

  The previous `build()` on request was necessary because mutating
  functions did not follow a proper builder pattern (taking `&mut self`
  instead of `mut self`). With a proper builder pattern, the need for
  `.build()` goes away.

* All Request body and call methods consume self

  Anything which "executes" the request will now consume the `Request`
  to produce a `Result<Response>`.

* Move all config from request to agent builder

  Timeouts, redirect config, proxy settings and TLS config are now on
  `AgentBuilder`.

* Rename max_pool_connections -> max_idle_connections

* Rename max_pool_connections_per_host -> max_idle_connections_per_host

  Consistent internal and external naming.

* Introduce new AgentConfig for static config created by builder.

  `Agent` can be seen as having two parts: static config and a mutable
  shared state between all states. The static config goes into
  `AgentConfig` and the mutable shared state into `AgentState`.

* Replace all use of `Default` with `new`.

  Deriving or implementing `Default` makes for a secondary instantiation
  API. It is useful in some cases, but gets very confusing when there is
  both `new` _and_ a `Default`. It's especially devious for derived
  values where a reasonable default is not `0`, `false` or `None`.

* Remove feature native_tls, we want only native rustls.

  This feature made for very clunky handling throughout the code. From a
  security point of view, it's better to stick with one single TLS API.
  Rustls recently got an official audit (very positive).

  https://github.com/ctz/rustls/tree/master/audit

  Rustls deliberately omits support for older, insecure TLS such as
  TLS 1.1 or RC4. This might be a problem for a user of ureq, but on
  balance it is not considered important enough to keep native_tls.

* Remove auth and support for basic auth.

  The API just wasn't enough. A future reintroduction should at least
  also provide a `Bearer` mechanism and possibly more.

* Rename jar -> cookie_store

* Rename jar -> cookie_tin

  Just make some field names sync up with the type.

* Drop "cookies" as default feature

  The need for handling cookies is probably rare, let's not enable it by
  default.

* Change all feature checks for "cookie" to "cookies"

  The outward facing feature is "cookies" and I think it's better form
  that the code uses the official feature name instead of the optional
  library "cookies".

* Keep `set` on Agent level as well as AgentBuilder.

  The idea is that an auth exchange might result in a header that needs
  to be set _after_ the agent has been built.
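To make the builder-pattern point concrete, here is a minimal sketch of the consuming style described above. The type `SketchRequest` and its fields are hypothetical and exist only for illustration; this is not ureq's actual implementation, and `call` only renders a summary string instead of performing any I/O.

```rust
/// Hypothetical request type (an assumption for this sketch, not part of ureq).
#[derive(Debug, Clone, PartialEq)]
struct SketchRequest {
    method: String,
    url: String,
    headers: Vec<(String, String)>,
}

impl SketchRequest {
    /// Plain constructor; the sketch deliberately offers `new` and no `Default`.
    fn new(method: &str, url: &str) -> Self {
        SketchRequest {
            method: method.to_string(),
            url: url.to_string(),
            headers: Vec::new(),
        }
    }

    /// Takes `mut self` and returns `Self`, so calls chain on a temporary
    /// value and no final `.build()` is required.
    fn set(mut self, name: &str, value: &str) -> Self {
        self.headers.push((name.to_string(), value.to_string()));
        self
    }

    /// "Executing" the request consumes it. A real client would do network
    /// I/O here; this sketch just returns a one-line summary.
    fn call(self) -> Result<String, String> {
        if self.url.is_empty() {
            return Err("empty url".to_string());
        }
        Ok(format!(
            "{} {} ({} headers)",
            self.method,
            self.url,
            self.headers.len()
        ))
    }
}

fn main() {
    // The whole chain is one expression; `call` takes ownership of the request.
    let summary = SketchRequest::new("GET", "http://example.com/")
        .set("Accept", "text/plain")
        .set("X-Sketch", "1")
        .call();
    println!("{:?}", summary);
}
```

Because `set` and `call` take `self` by value, a request cannot be reused after it has been executed, which matches the "consume the `Request`" wording in the commit message.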
104 lines
2.8 KiB
Rust
use chrono::Local;
use rayon::prelude::*;

use std::io::{self, BufRead, BufReader, Read};
use std::time::Duration;
use std::{env, error, fmt, result};

/// Catch-all error type: every failure is flattened to its message.
#[derive(Debug)]
struct Oops(String);

impl From<io::Error> for Oops {
    fn from(e: io::Error) -> Oops {
        Oops(e.to_string())
    }
}

impl From<ureq::Error> for Oops {
    fn from(e: ureq::Error) -> Oops {
        Oops(e.to_string())
    }
}

impl From<rayon_core::ThreadPoolBuildError> for Oops {
    fn from(e: rayon_core::ThreadPoolBuildError) -> Oops {
        Oops(e.to_string())
    }
}

impl error::Error for Oops {}

impl fmt::Display for Oops {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        self.0.fmt(f)
    }
}

type Result<T> = result::Result<T, Oops>;

/// Fetch `url` with the shared agent and return the whole body as bytes.
fn get(agent: &ureq::Agent, url: &str) -> Result<Vec<u8>> {
    let response = agent.get(url).call()?;
    let mut reader = response.into_reader();
    let mut bytes = vec![];
    reader.read_to_end(&mut bytes)?;
    Ok(bytes)
}

/// Fetch `url` and log the outcome. Fetch errors are printed, not propagated.
fn get_and_write(agent: &ureq::Agent, url: &str) -> Result<()> {
    println!("🕷️ {} {}", Local::now(), url);
    match get(agent, url) {
        Ok(_) => println!("✔️ {} {}", Local::now(), url),
        Err(e) => println!("⚠️ {} {} {}", Local::now(), url, e),
    }
    Ok(())
}

/// Fetch all `urls` on a dedicated thread pool of `simultaneous_fetches` threads.
fn get_many(urls: Vec<String>, simultaneous_fetches: usize) -> Result<()> {
    // Timeouts are configured once on the agent builder, not per request.
    let agent = ureq::builder()
        .timeout_connect(Duration::from_secs(5))
        .timeout(Duration::from_secs(20))
        .build();
    let pool = rayon::ThreadPoolBuilder::new()
        .num_threads(simultaneous_fetches)
        .build()?;
    pool.scope(|_| {
        // `count()` only drives the parallel iterator; the results are discarded.
        urls.par_iter().map(|u| get_and_write(&agent, u)).count();
    });
    Ok(())
}

fn main() -> Result<()> {
    let mut args = env::args();
    if args.len() == 1 {
        println!(
            r##"Usage: {:#?} top-1m.csv

Where top-1m.csv is a simple, unquoted CSV containing two fields, a rank and
a domain name. For instance you can get such a list from https://tranco-list.eu/.

For each domain, this program will attempt to GET four URLs: the domain
name with HTTP and HTTPS, and with and without a www prefix. It will fetch
using 50 threads concurrently.
"##,
            env::current_exe()?
        );
        return Ok(());
    }
    env_logger::init();
    // The first real argument (index 1) is the path to the CSV file.
    let file = std::fs::File::open(args.nth(1).unwrap())?;
    let bufreader = BufReader::new(file);
    let mut urls = vec![];
    for line in bufreader.lines() {
        // The domain is the last comma-separated field on the line.
        let domain = line?.rsplit(',').next().unwrap().to_string();
        urls.push(format!("http://{}/", domain));
        urls.push(format!("https://{}/", domain));
        urls.push(format!("http://www.{}/", domain));
        urls.push(format!("https://www.{}/", domain));
    }
    get_many(urls, 50)?;
    Ok(())
}
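The line-to-URL expansion done in `main` can also be isolated as a pure function, which makes it testable without touching the network. The sketch below is one possible way to do that; the helper name `urls_for_line` is hypothetical, and the trimming and empty-domain handling are assumptions that the original loop does not make.

```rust
/// Hypothetical helper: expand one "rank,domain" CSV line into the four
/// URLs the example fetches (HTTP/HTTPS, with and without a www prefix).
fn urls_for_line(line: &str) -> Vec<String> {
    // The domain is the text after the last comma. Trimming whitespace is an
    // assumption of this sketch; the original code uses the field verbatim.
    let domain = line.rsplit(',').next().unwrap_or("").trim();
    if domain.is_empty() {
        // Assumption: skip lines without a domain instead of building
        // URLs such as "http:///".
        return Vec::new();
    }
    vec![
        format!("http://{}/", domain),
        format!("https://{}/", domain),
        format!("http://www.{}/", domain),
        format!("https://www.{}/", domain),
    ]
}

fn main() {
    for url in urls_for_line("1,example.com") {
        println!("{}", url);
    }
}
```

With a helper like this, the loop body in `main` would reduce to `urls.extend(urls_for_line(&line?))`.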