This removes the cookie header on redirect, otherwise if there are
multiple redirects, a new cookie header gets appended each time, while
the existing one remains. This could result in the cookies being leaked
to a third party.
Previously, `ureq` used `HTTP CONNECT` for *all* requests, including
plain-text HTTP requests. However, some proxy servers like Squid
only allow tunneling via the `HTTP CONNECT` method on port 443 - since
it is usually only used to proxy HTTPS requests. As a result,
it was not possible to use `ureq` with a Squid server in its default configuration.
With the changes from this commit, ureq will interact with HTTP proxies
in a more standard-conforming way, where `CONNECT` is only used for
HTTPS requests. HTTP request paths are transformed in such a way that they
comply with RFC 7230 [1].
Tested against Squid 4.13 in standard configuration, with and without basic authentication,
for HTTP and HTTPS requests.
[1] https://www.rfc-editor.org/rfc/rfc7230#section-5.3.2
This allows removing the hack where we create a Response with an empty `reader`,
then immediately mutate it to set the real reader. It also happens to allow us
to get rid of 3 fields of Response that were only used to pass information to
`stream_to_reader`.
I've tried to keep the structure and logic of the body calculation as close to
the same as possible, for ease of review and to avoid introducing bugs. I think
there are some followup fixes we can make to the logic, which will be made
easier by having it in a self contained function.
If the user reads exactly the number of bytes in the response, then
drops the Read object, streams will never get returned to the pool since
the user never triggered a read past the end of the LimitedRead.
This fixes that by making PoolReturnRead aware of the level below it, so
it can ask if a stream is "done" without actually doing a read.
Also, a refactorign:
Previously, Response contained an Option<Box<Unit>> because the testing
method `from_str()` would construct a Response with no associated Unit.
However, this increased code complexity with no corresponding test
benefit. Instead, construct a fake Unit in from_str().
Also, instead of taking an `Option<Box<Unit>>`, PoolReturnRead now takes
a URL (to figure out host and port for the PoolKey) and an &Agent where
it will return the stream. This cuts interconnectedness somewhat:
PoolReturnRead doesn't need to know about Unit anymore.
The auth header stripping was in the wrong place (when serializing the request),
rather than in the construction of the Unit, where it ought to be.
This also makes redirect header retention testable.
Contrary to smoke-test, this takes full URLs on the commandline and
prints their contents to stdout. This makes it easier to test behavior
with specific URLs. I hope to later add flags for various behaviors like
printing headers, following redirects, enabling / disabling cookies,
and verbose output.
Also add a useful debug line when receiving a cookie header.
- Don't panic on the mutex in all tests if a single test fails
- Give a more helpful message if a test handler wasn't registered
- Enable env_logger for tests
This allows Error to report both the URL that caused an error, and the
original URL that was requested.
Change unit::connect to use the Response history for tracking number of
redirects, instead of passing the count as a separate parameter.
Incidentally, move handling of the `stream` fully inside `Response`.
Instead of `do_from_read` + `set_stream`, we now have `do_from_stream`,
which takes ownership of the stream and keeps it. We also have
`do_from_request`, which does all of `do_from_stream`, but also sets the
`previous` field.
Change "Bad" to "Invalid" in error names, mimicking io::Error::ErrorKind.
Change InvalidProxyCreds to ProxyUnauthorized.
Change DnsFailed to just Dns (the fact that there was a failure is implicit
in the fact that this was an error).
Now that Responses with non-2xx statuses get turned into `Error`,
there is less need for these. Also, surveying the set of public crates
that depend on ureq, none of them use these methods. It seems that
users tend to prefer checking the status code directly.
Here is my thinking on each of these individually:
.ok() -- With the new Result API, any Request you get back will be
.ok(). Also, I think the name .ok() is a little confusing with
Result::ok().
.error() - with the new Result API, this is an exact overlap with
anything that would return Error. People will just check for whether a
Result is Err(...) rather than call .error().
.client_error() - most of the time, if someone wants to specially handle
a 4xx error, they want to handle specific ones, because the response to
them is different. For instance a specialized response to a 404 would be
"delete this from the list of URLs to check in the future," where a
specialized response to a 401 would be "try and load updated
credentials." For instance:
4200edb9ed/healthchecks/src/manage.rs (L70-L84)75d4b363b6/src/lib.rs (L59-L63)1d7daea38b/src/netlify.rs (L101-L112)
.server_error() - I don't have as much objection to this one, since it's
reasonable to want to treat all server errors (500, 502, 503) more or
less the same. Although even at that, 501 Not Implemented seems like
people would want to handle it differently. I guess that doesn't come up
much in practice - I've never seen a 501 in the wild.
.redirect() - Usually redirects are handled under the hood, unless
someone disables automatic redirect handling. I'm not terribly opposed
to this one, but given that no-one's using it and it's just as easy to
do 300..399.contains(resp.status()), I'm mildly inclined towards
deletion.
Also, remove Response::{ok, error, client_error, server_error,
redirect}. The idea is that you would access these through the
Error object instead.
I fetched all the reverse dependencies of ureq on crates.io and looked
for uses of the methods being removed. I found none.
I'm also considering removing the error_on_non_2xx method entirely. If
it's easy to get the underlying response for errors, it would be nice to
make that the single way to do things rather than support two separate
ways of handling HTTP errors.
It turns out Headers is actually an internal-only API. None of the
user-facing types use it.
Unfortunately, making it unexported also required deleting the doctests,
since doctests can only run against a public interface.
This adds a source field to keep track of upstream errors and allow
backtraces, plus a URL field to indicate what URL an error was
associated with.
The enum variants we used to use for Error are now part of a new
ErrorKind type. For convenience within ureq, ErrorKinds can be turned
into an Error with `.new()` or `.msg("some additional information")`.
Error acts as a builder, so additional information can be added after
initial construction. For instance, we return a DnsFailed error when
name resolution fails. When that error bubbles up to Request's
`do_call`, Request adds the URL.
Fixes#232.
Instead, rely on Url's built-in query parameter handling. A Request now
accumulates a list of query param pairs, and joins them with a parsed
URL at the time do_call is called.
In the process, remove some getters that rely on parsing the URL.
Adapting these getters was going to be awkward, and they mostly
duplicate things people can readily get by parsing the URL.
* Remove Request::build
* All mutations on Request follow builder pattern
The previous `build()` on request was necessary because mutating
functions did not follow a proper builder pattern (taking `&mut self`
instead of `mut self`). With a proper builder pattern, the need for
`.build()` goes away.
* All Request body and call methods consume self
Anything which "executes" the request will now consume the `Request`
to produce a `Result<Response>`.
* Move all config from request to agent builder
Timeouts, redirect config, proxy settings and TLS config are now on
`AgentBuilder`.
* Rename max_pool_connections -> max_idle_connections
* Rename max_pool_connections_per_host -> max_idle_connections_per_host
Consistent internal and external naming.
* Introduce new AgentConfig for static config created by builder.
`Agent` can be seen as having two parts. Static config and a mutable
shared state between all states. The static config goes into
`AgentConfig` and the mutable shared state into `AgentState`.
* Replace all use of `Default` for `new`.
Deriving or implementing `Default` makes for a secondary instantiation
API. It is useful in some cases, but gets very confusing when there
is both `new` _and_ a `Default`. It's especially devious for derived
values where a reasonable default is not `0`, `false` or `None`.
* Remove feature native_tls, we want only native rustls.
This feature made for very clunky handling throughout the code. From a
security point of view, it's better to stick with one single TLS API.
Rustls recently got an official audit (very positive).
https://github.com/ctz/rustls/tree/master/audit
Rustls deliberately omits support for older, insecure TLS such as TLS
1.1 or RC4. This might be a problem for a user of ureq, but on balance
not considered important enough to keep native_tls.
* Remove auth and support for basic auth.
The API just wasn't enough. A future reintroduction should at least
also provide a `Bearer` mechanism and possibly more.
* Rename jar -> cookie_store
* Rename jar -> cookie_tin
Just make some field names sync up with the type.
* Drop "cookies" as default feature
The need for handling cookies is probably rare, let's not enable it by
default.
* Change all feature checks for "cookie" to "cookies"
The outward facing feature is "cookies" and I think it's better form
that the code uses the official feature name instead of the optional
library "cookies".
* Keep `set` on Agent level as well as AgentBuilder.
The idea is that an auth exchange might result in a header that need
to be set _after_ the agent has been built.