108 Commits

Author SHA1 Message Date
Martin Algesten
7b2f28bbc2 Tidy up Response::url initialization 2021-12-22 07:58:45 +01:00
Martin Algesten
f3857eed00 Ensure we provide a Transport::message() when we can 2021-12-19 21:17:26 +01:00
Martin Algesten
0f0dec5f32 Fixes after feedback 2021-12-19 11:00:39 +01:00
Martin Algesten
2b0eca9827 Move auth header on redirect to unit construction
The auth header stripping was in the wrong place (when serializing the request),
rather than in the construction of the Unit, where it ought to be.

This also makes redirect header retention testable.
2021-12-19 11:00:39 +01:00
Martin Algesten
27533cf31b Use match expression against RedirectAuthHeaders to ensure exhaustiveness 2021-12-19 11:00:39 +01:00
Martin Algesten
c59632bd97 Use Url (instead of String) in internal history var 2021-12-19 11:00:39 +01:00
llde
26a3715f62 Revert move of variable 2021-12-19 11:00:39 +01:00
llde
653f791638 Create new configuration option for redirect preserving authorization header in Agent. Handle new option in Unit 2021-12-19 11:00:39 +01:00
llde
38ad90307d Preserve Authorization in same host redirects, when scheme and port are equal 2021-12-19 11:00:39 +01:00
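The redirect commits above keep the Authorization header only when scheme, host, and port are unchanged, and use a `match` so a new policy variant cannot be forgotten. A minimal std-only sketch of that decision follows. `Origin`, `keep_authorization`, and the variant names of this local `RedirectAuthHeaders` enum are assumptions made for illustration, not the crate's actual code.

```rust
/// Illustrative stand-in for the redirect option named in the log; the
/// variant names are assumptions made for this sketch.
#[derive(Clone, Copy)]
pub enum RedirectAuthHeaders {
    Never,
    SameHost,
}

/// The parts of a URL compared before and after a redirect (hypothetical type).
#[derive(PartialEq, Eq)]
pub struct Origin<'a> {
    pub scheme: &'a str,
    pub host: &'a str,
    pub port: u16,
}

/// Returns true when the Authorization header may follow the redirect.
pub fn keep_authorization(policy: RedirectAuthHeaders, from: &Origin, to: &Origin) -> bool {
    // No wildcard arm: adding a variant later turns this into a compile error.
    match policy {
        RedirectAuthHeaders::Never => false,
        RedirectAuthHeaders::SameHost => from == to,
    }
}
```

Doing this check where the follow-up request is constructed, rather than when it is serialized, is what makes the behaviour testable without a socket.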
Andrew Hickman
eb78813df5 Redact sensitive headers when logging prelude (#415)
Closes #414
2021-10-05 08:57:16 -07:00
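Redacting headers before logging can be sketched roughly as below. The function name `loggable_header` and the list of sensitive header names are assumptions for this example; the commit does not spell out which headers it treats as sensitive.

```rust
/// Formats a header line for logging, hiding the value of sensitive headers.
/// The list of names is an assumption for this sketch.
pub fn loggable_header(name: &str, value: &str) -> String {
    const SENSITIVE: [&str; 2] = ["authorization", "cookie"];
    // Header names are case-insensitive, so compare without regard to case.
    if SENSITIVE.iter().any(|s| s.eq_ignore_ascii_case(name)) {
        format!("{}: ***", name)
    } else {
        format!("{}: {}", name, value)
    }
}
```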
Martin Algesten
526eb7b9e0 Fix clippy lints 2021-08-23 20:45:33 +02:00
Jelle Besseling
6797258e9d Use debug logs when making requests 2021-08-14 14:36:25 +02:00
Niketh Murali
4665b0aa5a Fix clippy warnings
Fix linter warning from clippy about unnecessary borrows - "This expression borrows a reference ... that is immediately dereferenced by the compiler"
2021-08-13 09:26:04 +02:00
Gus Power
37e1e91e22 Clear Content-Length header from redirected requests (#394)
This PR removes the Content-Length header from subsequent redirect requests if set.
A test verifies the new behaviour.
2021-06-18 16:12:31 -07:00
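The effect of clearing Content-Length on a redirect can be pictured with a small filter over header pairs. This is only a sketch: `headers_for_redirect` and the `(String, String)` representation are hypothetical.

```rust
/// Returns the headers to send on the follow-up request after a redirect,
/// dropping any Content-Length left over from the original request.
pub fn headers_for_redirect(headers: Vec<(String, String)>) -> Vec<(String, String)> {
    headers
        .into_iter()
        // Header names are case-insensitive.
        .filter(|(name, _)| !name.eq_ignore_ascii_case("content-length"))
        .collect()
}
```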
Martin Algesten
40e156e2a3 Url access functions for Request (simpler) 2021-03-24 20:58:47 +01:00
Martin Algesten
b42e9afd71 Fix clippy warnings 2021-03-24 20:29:43 +01:00
Martin Algesten
ea53c7cedd Use is_tchar for cookie name check 2021-03-24 20:09:27 +01:00
Martin Algesten
5c9b1b9a0c Enforce cookie RFC name/value rules 2021-03-24 20:09:27 +01:00
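For reference, the cookie name and value rules can be checked byte by byte: a name is an HTTP token (one or more `tchar`), and a value is built from RFC 6265 `cookie-octet`s. The sketch below follows those grammar rules; the function names other than `is_tchar` are hypothetical, and the bodies are not taken from the crate.

```rust
/// `tchar` from the HTTP token grammar: ALPHA, DIGIT, or one of the listed symbols.
pub fn is_tchar(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b"!#$%&'*+-.^_`|~".contains(&b)
}

/// A cookie name must be a non-empty token.
pub fn is_valid_cookie_name(name: &str) -> bool {
    !name.is_empty() && name.bytes().all(is_tchar)
}

/// `cookie-octet` from RFC 6265 section 4.1.1: printable ASCII except
/// double quote, comma, semicolon, and backslash (space is also excluded).
pub fn is_cookie_octet(b: u8) -> bool {
    matches!(b, 0x21 | 0x23..=0x2B | 0x2D..=0x3A | 0x3C..=0x5B | 0x5D..=0x7E)
}
```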
Martin Algesten
c5c40cf138 Stop percent encoding cookies 2021-03-24 20:09:27 +01:00
Martin Algesten
026cf75690 Handle non-utf8 status and headers
Non-utf8 headers are ignored and reading the value for them will
yield `None`.
2021-03-14 23:14:43 +01:00
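The "non-utf8 yields `None`" behaviour maps naturally onto `std::str::from_utf8`. A one-function sketch, with `header_value` as a hypothetical name:

```rust
/// Returns the header value as text, or `None` when the bytes are not valid UTF-8.
pub fn header_value(raw: &[u8]) -> Option<&str> {
    std::str::from_utf8(raw).ok()
}
```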
Jacob Hoffman-Andrews
6f86ee7f93 Add example "cureq". (#330)
Contrary to smoke-test, this takes full URLs on the commandline and
prints their contents to stdout. This makes it easier to test behavior
with specific URLs. I hope to later add flags for various behaviors like
printing headers, following redirects, enabling / disabling cookies,
and verbose output.

Also add a useful debug line when receiving a cookie header.
2021-02-21 14:26:12 -08:00
Jacob Hoffman-Andrews
b246f0a9d2 Apply deadline across redirects. (#313)
Previously, each redirect could take timeout time, so a series of slow
redirects could run for longer than expected, or indefinitely.
2021-02-07 12:29:35 -08:00
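The idea of one deadline shared by the whole redirect chain can be sketched with `std::time::Instant`. The example below simulates hops with a fake clock instead of doing I/O; `time_left` and `hops_within_deadline` are hypothetical helpers written for this illustration.

```rust
use std::time::{Duration, Instant};

/// Time left before `deadline`, or `None` once it has been reached.
pub fn time_left(deadline: Instant, now: Instant) -> Option<Duration> {
    deadline.checked_duration_since(now).filter(|d| !d.is_zero())
}

/// Follows up to `hops` simulated redirects, each taking `hop_cost`, under one
/// overall `timeout`. Returns how many hops completed before time ran out.
pub fn hops_within_deadline(timeout: Duration, hop_cost: Duration, hops: u32) -> u32 {
    let start = Instant::now();
    let deadline = start + timeout; // computed once, not once per redirect
    let mut now = start;
    let mut done = 0;
    for _ in 0..hops {
        match time_left(deadline, now) {
            Some(left) if left >= hop_cost => {
                now += hop_cost; // simulated clock; a real client would do I/O here
                done += 1;
            }
            _ => break,
        }
    }
    done
}
```

With a per-redirect timeout, every hop in this simulation would get the full `timeout` again, which is the unbounded behaviour the commit describes.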
Michael Diamond
0c467fee13 Add a user_agent() method to AgentBuilder to configure the default User-Agent header. (#311) 2021-01-29 17:23:30 -08:00
Martin Algesten
06abdff4bf cargo fmt (#303) 2021-01-17 11:07:52 -08:00
Joshua Nelson
d0bd2d5ea9 Use iteration instead of recursion for connect (#291)
This allows handling larger redirect chains.

Fixes #290
2021-01-05 13:55:26 -08:00
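Why iteration handles longer chains than recursion: a loop reuses one stack frame, while each recursive call adds one. A std-only sketch over a lookup table of redirects, where `resolve` and the table representation are assumptions for illustration:

```rust
use std::collections::HashMap;

/// Follows redirects in `table` (URL -> redirect target) with a loop instead of
/// recursion, so a long chain does not grow the call stack.
/// Returns the final URL, or `None` if more than `max_redirects` are needed.
pub fn resolve<'a>(
    table: &HashMap<&'a str, &'a str>,
    start: &'a str,
    max_redirects: usize,
) -> Option<&'a str> {
    let mut current = start;
    let mut followed = 0;
    while let Some(next) = table.get(current) {
        if followed == max_redirects {
            return None; // chain is longer than allowed
        }
        current = *next;
        followed += 1;
    }
    Some(current)
}
```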
Joshua Nelson
f0245aad23 Fix some clippy lints (#292)
This commit can be replicated with `cargo +nightly clippy --fix -Z unstable-options`,
plus an edit to fix another `return` missed by clippy.
2021-01-03 20:10:43 -08:00
Joshua Nelson
aeeff40c95 Fix 307/308 redirects (again)
- Add unit test
- Use the original method instead of hard-coding GET
2021-01-03 20:31:00 -05:00
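Taken together with the earlier 307/308 commits further down the log (follow them only for methods without a body), the method choice for a follow-up request might look like the sketch below. `redirect_method` is hypothetical, and both the list of body-less methods and the "301/302/303 become GET" rule are assumptions of this example rather than something the commit messages state.

```rust
/// Picks the method for the follow-up request, or `None` when the redirect
/// should not be followed automatically.
pub fn redirect_method(status: u16, method: &str) -> Option<&str> {
    // Assumption for this sketch: these methods never carry a request body.
    const BODYLESS: [&str; 4] = ["GET", "HEAD", "OPTIONS", "TRACE"];
    match status {
        // Assumption: these statuses are retried as GET.
        301 | 302 | 303 => Some("GET"),
        // 307/308 keep the original method instead of hard-coding GET.
        307 | 308 if BODYLESS.iter().any(|m| *m == method) => Some(method),
        _ => None,
    }
}
```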
Joshua Nelson
498b7137c2 Make tests much easier to debug
- Don't panic on the mutex in all tests if a single test fails
- Give a more helpful message if a test handler wasn't registered
- Enable env_logger for tests
2021-01-03 20:26:17 -05:00
Jacob Hoffman-Andrews
8cb4f401e3 Add history to response objects (#275)
This allows Error to report both the URL that caused an error, and the
original URL that was requested.

Change unit::connect to use the Response history for tracking number of
redirects, instead of passing the count as a separate parameter.

Incidentally, move handling of the `stream` fully inside `Response`.
Instead of `do_from_read` + `set_stream`, we now have `do_from_stream`,
which takes ownership of the stream and keeps it. We also have
`do_from_request`, which does all of `do_from_stream`, but also sets the
`previous` field.
2020-12-13 11:59:11 -08:00
Jacob Hoffman-Andrews
6c9378ce37 De-redundantize Error kinds. (#259)
Change "Bad" to "Invalid" in error names, mimicking io::Error::ErrorKind.

Change InvalidProxyCreds to ProxyUnauthorized.

Change DnsFailed to just Dns (the fact that there was a failure is implicit
in the fact that this was an error).
2020-12-05 12:05:29 -08:00
Jacob Hoffman-Andrews
c3a6f50dbe Remove status methods on Response. (#258)
Now that Responses with non-2xx statuses get turned into `Error`,
there is less need for these. Also, surveying the set of public crates
that depend on ureq, none of them use these methods. It seems that
users tend to prefer checking the status code directly.

Here is my thinking on each of these individually:

.ok() -- With the new Result API, any Response you get back will be
.ok(). Also, I think the name .ok() is a little confusing with
Result::ok().

.error() - with the new Result API, this is an exact overlap with
anything that would return Error. People will just check for whether a
Result is Err(...) rather than call .error().

.client_error() - most of the time, if someone wants to specially handle
a 4xx error, they want to handle specific ones, because the response to
them is different. For instance a specialized response to a 404 would be
"delete this from the list of URLs to check in the future," where a
specialized response to a 401 would be "try and load updated
credentials." For instance:

4200edb9ed/healthchecks/src/manage.rs (L70-L84)

75d4b363b6/src/lib.rs (L59-L63)

1d7daea38b/src/netlify.rs (L101-L112)

.server_error() - I don't have as much objection to this one, since it's
reasonable to want to treat all server errors (500, 502, 503) more or
less the same. Although even at that, 501 Not Implemented seems like
people would want to handle it differently. I guess that doesn't come up
much in practice - I've never seen a 501 in the wild.

.redirect() - Usually redirects are handled under the hood, unless
someone disables automatic redirect handling. I'm not terribly opposed
to this one, but given that no-one's using it and it's just as easy to
do (300..=399).contains(&resp.status()), I'm mildly inclined towards
deletion.
2020-12-05 11:32:25 -08:00
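The argument above is that callers usually branch on specific status codes rather than on broad classes. A sketch of what that looks like with a plain `match` on a `u16`; the `Action` enum and `action_for` are hypothetical names invented for this example.

```rust
/// What a caller might decide to do with a response (hypothetical).
#[derive(Debug, PartialEq)]
pub enum Action {
    Proceed,
    FollowRedirect,
    ForgetUrl,
    RefreshCredentials,
    RetryLater,
    GiveUp,
}

/// Maps a status code to an action, handling 404 and 401 individually.
pub fn action_for(status: u16) -> Action {
    match status {
        200..=299 => Action::Proceed,
        300..=399 => Action::FollowRedirect,
        404 => Action::ForgetUrl,
        401 => Action::RefreshCredentials,
        500..=599 => Action::RetryLater,
        _ => Action::GiveUp,
    }
}
```

Note that an inclusive range (`300..=399`) is needed to cover the whole class; a half-open `300..399` would miss 399.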
Jacob Hoffman-Andrews
18a9b08973 Revert deletions of client_error and friends. 2020-12-05 15:29:11 +01:00
Jacob Hoffman-Andrews
4c3b93d86d Add Error::{kind, status, into_response}.
Also, remove Response::{ok, error, client_error, server_error,
redirect}. The idea is that you would access these through the
Error object instead.

I fetched all the reverse dependencies of ureq on crates.io and looked
for uses of the methods being removed. I found none.

I'm also considering removing the error_on_non_2xx method entirely. If
it's easy to get the underlying response for errors, it would be nice to
make that the single way to do things rather than support two separate
ways of handling HTTP errors.
2020-12-05 15:29:11 +01:00
Jacob Hoffman-Andrews
35c03521b9 Add debug logs for stream pooling. 2020-12-05 15:05:20 +01:00
Joshua Nelson
6bab430d29 Only follow 307/308 redirects for methods without a body 2020-12-05 15:03:47 +01:00
Joshua Nelson
8d21052c7e Follow 307/308 redirects 2020-12-05 15:03:47 +01:00
Jacob Hoffman-Andrews
6a7b064f2a Remove Headers from the public API. (#224)
It turns out Headers is actually an internal-only API. None of the
user-facing types use it.

Unfortunately, making it unexported also required deleting the doctests,
since doctests can only run against a public interface.
2020-11-22 00:15:13 -08:00
Jacob Hoffman-Andrews
fade03b54e Rewrite the Error type. (#234)
This adds a source field to keep track of upstream errors and allow
backtraces, plus a URL field to indicate what URL an error was
associated with.

The enum variants we used to use for Error are now part of a new
ErrorKind type. For convenience within ureq, ErrorKinds can be turned
into an Error with `.new()` or `.msg("some additional information")`.

Error acts as a builder, so additional information can be added after
initial construction. For instance, we return a DnsFailed error when
name resolution fails. When that error bubbles up to Request's
`do_call`, Request adds the URL.

Fixes #232.
2020-11-21 16:14:44 -08:00
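The kind-plus-builder shape described in this commit can be sketched with the standard library alone. The field layout, the `url` and `kind` methods, and the `Display` format below are assumptions for illustration; the `source` field mentioned in the message is left out to keep the example short.

```rust
use std::fmt;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ErrorKind {
    DnsFailed,
    BadUrl,
}

#[derive(Debug)]
pub struct Error {
    kind: ErrorKind,
    message: Option<String>,
    url: Option<String>,
}

impl ErrorKind {
    /// Turns a kind into an error with no extra information.
    pub fn new(self) -> Error {
        Error { kind: self, message: None, url: None }
    }

    /// Turns a kind into an error carrying an additional message.
    pub fn msg(self, message: &str) -> Error {
        Error { kind: self, message: Some(message.to_string()), url: None }
    }
}

impl Error {
    /// Builder step: attach the URL once it is known further up the call chain.
    pub fn url(mut self, url: &str) -> Self {
        self.url = Some(url.to_string());
        self
    }

    pub fn kind(&self) -> ErrorKind {
        self.kind
    }
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        if let Some(url) = &self.url {
            write!(f, "{}: ", url)?;
        }
        write!(f, "{:?}", self.kind)?;
        if let Some(message) = &self.message {
            write!(f, ": {}", message)?;
        }
        Ok(())
    }
}

impl std::error::Error for Error {}
```

A resolver could return `ErrorKind::DnsFailed.msg("no such host")`, and the caller that knows the URL would add it with `.url(...)` before returning the error.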
Jacob Hoffman-Andrews
ec8dace1af Turn Unit into a built Request (#223)
This involved removing the Request reference from Unit, and adding an
Agent, a method, and headers.

Also, move is_retryable to Unit.
2020-11-14 01:12:01 -08:00
Jacob Hoffman-Andrews
a0b901f35b Remove qstring dependency. (#221)
Instead, rely on Url's built-in query parameter handling. A Request now
accumulates a list of query param pairs, and joins them with a parsed
URL at the time do_call is called.

In the process, remove some getters that rely on parsing the URL.
Adapting these getters was going to be awkward, and they mostly
duplicate things people can readily get by parsing the URL.
2020-11-13 00:02:52 -08:00
Martin Algesten
1369c32351 API changes for 2.0
* Remove Request::build
* All mutations on Request follow builder pattern

The previous `build()` on request was necessary because mutating
functions did not follow a proper builder pattern (taking `&mut self`
instead of `mut self`). With a proper builder pattern, the need for
`.build()` goes away.

* All Request body and call methods consume self

Anything which "executes" the request will now consume the `Request`
to produce a `Result<Response>`.

* Move all config from request to agent builder

Timeouts, redirect config, proxy settings and TLS config are now on
`AgentBuilder`.

* Rename max_pool_connections -> max_idle_connections
* Rename max_pool_connections_per_host ->  max_idle_connections_per_host

Consistent internal and external naming.

* Introduce new AgentConfig for static config created by builder.

`Agent` can be seen as having two parts. Static config and a mutable
shared state between all states. The static config goes into
`AgentConfig` and the mutable shared state into `AgentState`.

* Replace all use of `Default` for `new`.

Deriving or implementing `Default` makes for a secondary instantiation
API.  It is useful in some cases, but gets very confusing when there
is both `new` _and_ a `Default`. It's especially devious for derived
values where a reasonable default is not `0`, `false` or `None`.

* Remove feature native_tls, we want only native rustls.

This feature made for very clunky handling throughout the code. From a
security point of view, it's better to stick with one single TLS API.
Rustls recently got an official audit (very positive).

https://github.com/ctz/rustls/tree/master/audit

Rustls deliberately omits support for older, insecure TLS such as TLS
1.1 or RC4. This might be a problem for a user of ureq, but on balance
not considered important enough to keep native_tls.

* Remove auth and support for basic auth.

The API just wasn't enough. A future reintroduction should at least
also provide a `Bearer` mechanism and possibly more.

* Rename jar -> cookie_store
* Rename jar -> cookie_tin

Just make some field names sync up with the type.

* Drop "cookies" as default feature

The need for handling cookies is probably rare, let's not enable it by
default.

* Change all feature checks for "cookie" to "cookies"

The outward facing feature is "cookies" and I think it's better form
that the code uses the official feature name instead of the optional
library "cookies".

* Keep `set` on Agent level as well as AgentBuilder.

The idea is that an auth exchange might result in a header that needs
to be set _after_ the agent has been built.
2020-10-25 11:47:38 +01:00
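The builder-pattern point in this commit (mutators take `mut self` and return `Self`, and executing the request consumes it) can be shown with a toy type. This `Request` is a hypothetical stand-in that performs no I/O; `call` merely renders what would be sent.

```rust
pub struct Request {
    method: String,
    url: String,
    headers: Vec<(String, String)>,
}

impl Request {
    pub fn new(method: &str, url: &str) -> Self {
        Request { method: method.to_string(), url: url.to_string(), headers: Vec::new() }
    }

    /// Takes `mut self` and returns `Self`, so calls chain without a `.build()` step.
    pub fn set(mut self, name: &str, value: &str) -> Self {
        self.headers.push((name.to_string(), value.to_string()));
        self
    }

    /// Consumes the request. This sketch does no I/O; it only renders the
    /// request line and headers.
    pub fn call(self) -> Result<String, String> {
        if self.url.is_empty() {
            return Err("empty URL".to_string());
        }
        let mut out = format!("{} {}\r\n", self.method, self.url);
        for (name, value) in &self.headers {
            out.push_str(&format!("{}: {}\r\n", name, value));
        }
        Ok(out)
    }
}
```

With `&mut self` mutators, a chain such as `Request::new(..).set(..)` would end in a `&mut Request` borrowed from a temporary, which is why the earlier API needed `.build()` to get an owned value back.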
Jacob Hoffman-Andrews
703ca41960 Push mutexes down into pool and cookie store. (#193)
Previously, Agent stored most of its state in one big
Arc<Mutex<AgentState>>. This separates the Arc from the Mutexes.
Now, Agent is a thin wrapper around an Arc<AgentState>. The individual
components that need locking, ConnectionPool and CookieStore, now are
responsible for their own locking.

There were a couple of reasons for this. Internal components that needed
an Agent were often instead carrying around an Arc<Mutex<AgentState>>.
This felt like the components were too intertwined: those other
components shouldn't have to care quite so much about how Agent is
implemented. Also, this led to compromises of convenience: the Proxy on
Agent wound up stored inside the `Arc<Mutex<AgentState>>` even though it
didn't need locking. It was more convenient that way because that was
what Request and Unit had access to.

The other reason to push things down like this is that it can reduce
lock contention. Mutations to the cookie store don't need to lock the
connection pool, and vice versa. This was a secondary concern, since I
haven't actually profiled these things and found them to be a problem,
but it's a happy result of the refactoring.

Now all the components outside of Agent take an Agent instead of
AgentState.

In the process I removed `Agent.cookie()`. Its API was hard to use
correctly, since it didn't distinguish between cookies on different
hosts. And it would have required updates as part of this refactoring.
I'm open to reinstating some similar functionality with a refreshed API.

I kept `Agent.set_cookie`, but updated its method signature to take a
URL as well as a cookie.

Many of ConnectionPool's methods went from `&mut self` to `&self`,
because ConnectionPool is now using interior mutability.
2020-10-20 00:03:45 -07:00
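The layout after this refactoring, an `Arc` around state whose components lock themselves, can be sketched as follows. The field types are simplified placeholders (strings instead of real connections and cookies), and the method names are assumptions for this example.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Owns its own lock, so methods can take `&self`.
pub struct ConnectionPool {
    idle: Mutex<Vec<String>>,
}

impl ConnectionPool {
    pub fn add(&self, host: &str) {
        self.idle.lock().unwrap().push(host.to_string());
    }

    pub fn len(&self) -> usize {
        self.idle.lock().unwrap().len()
    }
}

/// Locked independently of the pool, so cookie updates do not block it.
pub struct CookieStore {
    cookies: Mutex<HashMap<String, String>>,
}

impl CookieStore {
    pub fn insert(&self, name: &str, value: &str) {
        self.cookies.lock().unwrap().insert(name.to_string(), value.to_string());
    }

    pub fn get(&self, name: &str) -> Option<String> {
        self.cookies.lock().unwrap().get(name).cloned()
    }
}

pub struct AgentState {
    pool: ConnectionPool,
    cookies: CookieStore,
    proxy: Option<String>, // read-only after construction, so no lock is needed
}

/// A thin, cheaply clonable handle around the shared state.
#[derive(Clone)]
pub struct Agent {
    state: Arc<AgentState>,
}

impl Agent {
    pub fn new(proxy: Option<String>) -> Self {
        Agent {
            state: Arc::new(AgentState {
                pool: ConnectionPool { idle: Mutex::new(Vec::new()) },
                cookies: CookieStore { cookies: Mutex::new(HashMap::new()) },
                proxy,
            }),
        }
    }

    pub fn pool(&self) -> &ConnectionPool {
        &self.state.pool
    }

    pub fn cookies(&self) -> &CookieStore {
        &self.state.cookies
    }

    pub fn proxy(&self) -> Option<&str> {
        self.state.proxy.as_deref()
    }
}
```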
Jacob Hoffman-Andrews
e36c1c2aa1 Switch to Result-based API. (#132)
Gets rid of synthetic_error, and makes the various send_* methods return `Result<Response, Error>`.
Introduces a new error type "HTTP", which represents an error due to status codes 4xx or 5xx.
The HTTP error type contains a boxed Response, so users can read the actual response if they want.
Adds an `error_for_status` setting to disable the functionality of treating 4xx and 5xx as errors.
Adds .unwrap() to a lot of tests.

Fixes #128.
2020-10-17 00:40:48 -07:00
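The core of the Result-based API is a single conversion: a 4xx or 5xx response becomes an `Err` that still carries the response. A std-only sketch, where `Response`, `Error`, and `into_result` are simplified stand-ins and only the `error_for_status` name comes from the commit message:

```rust
#[derive(Debug)]
pub struct Response {
    pub status: u16,
    pub body: String,
}

#[derive(Debug)]
pub enum Error {
    /// Status 4xx or 5xx; the boxed response stays readable.
    Http(Box<Response>),
    /// Placeholder for every non-HTTP failure in this sketch.
    Other(String),
}

/// Converts a received response into the value handed to the caller.
pub fn into_result(resp: Response, error_for_status: bool) -> Result<Response, Error> {
    if error_for_status && (400..=599).contains(&resp.status) {
        Err(Error::Http(Box::new(resp)))
    } else {
        Ok(resp)
    }
}
```

Boxing the response keeps the `Error` type small, so the `Ok` path of every `Result` does not pay for the size of a full response in the `Err` variant.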
Jacob Hoffman-Andrews
5b75deccef Use correct host on redirect. (#180) 2020-10-06 00:10:56 -07:00
Jacob Hoffman-Andrews
2d4b42e298 Use cookie_store crate instead of cookie::CookieJar (#169)
CookieJar doesn't support the path-match and domain-match algorithms from [RFC 6265](https://tools.ietf.org/html/rfc6265#section-5.1.3), while cookie_store does.

This fixes some issues with the cookie matching algorithm currently in ureq. For instance,
the domain-match uses substring matching rather than the RFC 6265 algorithm.

This deletes two tests:

 - match_cookies_returns_nothing_when_no_cookies didn't test much.
 - agent_cookies was failing because cookie_store rejects cookies on the `test:` scheme.
  The way around this is to set up a testserver - but it turns out cookies_on_redirect already
  does that, and covers the same cases and more.

This changes some cookie-related behavior:

 - Cookies could previously be sent to a wrong domain - e.g. a cookie set on `example.com`
  could go to `example.com.evil.com` or `evilexample.com`. Probably no one was relying on
  this, since it's quite broken.
 - A cookie with a path of `/foo` could be sent on a request to `/foobar`, but now it can't.
 - Cookies could previously be set on IP addresses, but now they can't.
 - Cookies could previously be set for domains other than the one on the request (or its
  parents), but now they can't.
 - When a cookie had no domain attribute, it would previously get the domain from the
  request, and subsequently be sent to that domain and all subdomains. Now, it will only
  be sent to that exact domain (host-only).

That last one is probably the most likely to break people, since someone could depend
on it without realizing it was broken behavior.
2020-10-04 10:21:09 -07:00
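The difference between substring matching and the RFC 6265 algorithms is easy to see in code. The two functions below are a sketch of the domain-match (section 5.1.3) and path-match (section 5.1.4) rules written for this illustration; they are not the cookie_store crate's implementation, and both inputs are assumed to be lowercased already.

```rust
use std::net::IpAddr;

/// RFC 6265 section 5.1.3: does request host `host` domain-match `domain`?
pub fn domain_match(host: &str, domain: &str) -> bool {
    if host == domain {
        return true;
    }
    // Suffix matching never applies to IP addresses.
    if host.parse::<IpAddr>().is_ok() {
        return false;
    }
    // `domain` must be a suffix, and the character just before it must be a dot.
    match host.strip_suffix(domain) {
        Some(rest) => rest.ends_with('.'),
        None => false,
    }
}

/// RFC 6265 section 5.1.4: does `request_path` path-match `cookie_path`?
pub fn path_match(request_path: &str, cookie_path: &str) -> bool {
    match request_path.strip_prefix(cookie_path) {
        Some(rest) => rest.is_empty() || cookie_path.ends_with('/') || rest.starts_with('/'),
        None => false,
    }
}
```

Under these rules `evilexample.com` and `example.com.evil.com` no longer match `example.com`, and `/foobar` no longer matches a cookie path of `/foo`.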
Jacob Hoffman-Andrews
d4dfe4096f Feature-gate AgentState 2020-10-01 14:27:59 -07:00
Martin Algesten
0346794e87 Fix bug in force-unwrapping when resetting timers
When running tests locally, this error can surface.

```
---- test::agent_test::custom_resolver stdout ----
thread 'test::agent_test::custom_resolver' panicked at 'called `Result::unwrap()` on an `Err` value: Os { code: 22, kind: InvalidInput, message: "Invalid argument" }', src/stream.rs:60:13
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
```

The problem is that setting the timeouts might fail, and this is done
in a From trait where there is no possibility to "bubble" the
io::Error.

```
socket.set_read_timeout(None).unwrap();
socket.set_write_timeout(None).unwrap();
```

This commit moves the resetting of timers to an explicit `Stream::reset()` fn
that must be called every time we're unwrapping the inner stream.
2020-09-29 11:10:16 +02:00
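A `From` implementation cannot return an error, which is why the `unwrap()` calls could only panic. Moving the work into a method that returns `io::Result` lets the failure propagate with `?`. The sketch below shows the idea with a hypothetical wrapper type; the real `Stream::reset()` signature is not given in the commit message and may differ.

```rust
use std::io;
use std::net::TcpStream;

/// Minimal wrapper around a socket (hypothetical).
pub struct Stream {
    inner: TcpStream,
}

impl Stream {
    pub fn new(inner: TcpStream) -> Self {
        Stream { inner }
    }

    /// Clears both timeouts and hands back the socket, reporting a failure
    /// to the caller instead of panicking.
    pub fn reset(self) -> io::Result<TcpStream> {
        self.inner.set_read_timeout(None)?;
        self.inner.set_write_timeout(None)?;
        Ok(self.inner)
    }
}
```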
Jacob Hoffman-Andrews
17d7e147eb Handle ConnectionReset+ConnectionAbort at any time (#168)
Previously we had a special case for BadStatusRead that would happen
only when we got a ConnectionAborted error reading the status line.
However, sometimes we get ConnectionReset instead. Also the HTTP
spec says that idempotent requests may be retried anytime a connection
is closed prematurely.

The change treats as retryable any ConnectionAborted OR ConnectionReset
error while reading the status line and headers. It removes the special
case BadStatusRead error.

Fixes #165 (I think).
2020-09-29 01:55:34 -07:00
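The retry rule described here combines two conditions: the request method is idempotent, and the connection was closed early. A sketch using `std::io::ErrorKind`, where `is_retryable` is written for this example and the list of idempotent methods is an assumption:

```rust
use std::io::ErrorKind;

/// Returns true when a failed read of the status line or headers may be retried.
pub fn is_retryable(method: &str, kind: ErrorKind) -> bool {
    // Assumption: this list of idempotent methods is illustrative.
    let idempotent = matches!(method, "GET" | "HEAD" | "PUT" | "DELETE" | "OPTIONS" | "TRACE");
    // Both kinds mean the peer closed the connection prematurely.
    let closed_early = matches!(kind, ErrorKind::ConnectionAborted | ErrorKind::ConnectionReset);
    idempotent && closed_early
}
```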
Jacob Hoffman-Andrews
06d6435374 Merge branch 'master' of https://github.com/algesten/ureq into cookie_store 2020-09-29 01:45:52 -07:00
Jacob Hoffman-Andrews
065b560dfb Add log dependency. (#170)
Also add log statements to unit. Each request gets one info line;
retries, redirects, and responses get logged at debug level.
2020-09-29 01:37:39 -07:00