Ticket metadata
The Basics
Id: 3200
Status: resolved
Priority: 0/
Queue: OpenSSL-Bugs

Custom Fields
Milestone: (no value)
Subsystem: (no value)
Severity: (no value)
Broken in: (no value)

People
Owner: Nobody in particular
Requestors: Ron Barber
Cc:
AdminCc:

More about the requestors

Ron Barber

Comments about this user: No comment entered about this user
Groups this user belongs to
  • Everyone
  • Unprivileged

Dates
Created: Sat Dec 14 08:41:53 2013
Starts: Not set
Started: Sat Dec 14 14:38:11 2013
Last Contact: Wed Dec 18 23:42:08 2013
Due: Not set
Closed: Fri Jan 10 22:02:01 2014
Updated: Fri Jan 10 22:02:01 2014 by Stephen Henson



Subject: Crash in OpenSSL 1.0.1e w/TLS 1.2 (under load)
Date: Fri, 13 Dec 2013 16:13:03 +0000
To: "rt@openssl.org" <rt@openssl.org>
From: Ron Barber <rbarber@yahoo-inc.com>

Message body is not shown because it is too large.

On Sat Dec 14 08:41:53 2013, rbarber@yahoo-inc.com wrote:
> We are seeing a segfault when TLS 1.2 is enabled with OpenSSL 1.0.1e (also
> with 1.0.1a). We are running Apache Traffic Server on RHEL6, and we started
> seeing this issue when we upgraded OpenSSL from 1.0.0 to 1.0.1. I was able
> to narrow the issue down to TLS 1.2 by disabling TLS 1.2. The crash
> consistently happens within an hour under production load (~1000 requests
> per second, of which approx. 15-20% are HTTPS). More details are available
> in the Traffic Server bug report
> (https://issues.apache.org/jira/browse/TS-2355). I don't know anything
> about OpenSSL, but I did some poking around in the core dump (maybe this
> will help):
>

Hmm... that's a weird one. The debug info tells me it is a TLS v1.0 connection and that it is attempting to use MD5 when calculating the handshake hash. The handshake records are cached and then digested in the function ssl3_digest_cached_records(), using pretty much the same logic that fails later on. That function wouldn't be called if the handshake buffer was never initialised, but the buffer should be initialised when the connection is accepted.
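The caching described above can be pictured with a small self-contained model: handshake messages are appended to a buffer until the server knows which digests it needs, and the buffer is then replayed into one context per selected slot. Everything in this sketch is hypothetical (the hs_* names, the one-bit-per-slot masks and the toy_update hash stand in for OpenSSL's real structures and EVP digests); it only mirrors the general shape of ssl3_digest_cached_records(), including the point that it cannot run if records were never being cached.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model, not OpenSSL code. Six slots, like the
 * s->s3->handshake_dgst array printed later in this ticket. */
#define HS_NUM_SLOTS 6

typedef struct {
    int in_use;
    unsigned long state;            /* toy running hash */
} toy_md_ctx;

typedef struct {
    int buffering;                  /* 1 while records are only cached */
    unsigned char buf[256];         /* cached handshake records */
    size_t buf_len;
    toy_md_ctx storage[HS_NUM_SLOTS];
    toy_md_ctx *dgst[HS_NUM_SLOTS]; /* NULL means "slot not initialised" */
} hs_state;

/* Toy hash standing in for a real digest update. */
static unsigned long toy_update(unsigned long h, const unsigned char *p, size_t n)
{
    while (n--)
        h = h * 31u + *p++;
    return h;
}

/* Called when the connection is accepted: start caching records. */
static void hs_init(hs_state *s)
{
    memset(s, 0, sizeof *s);
    s->buffering = 1;
}

/* Cache a record, or feed it to every initialised slot once digesting. */
static int hs_add_record(hs_state *s, const unsigned char *p, size_t n)
{
    assert(s != NULL);
    if (s->buffering) {
        if (s->buf_len + n > sizeof s->buf)
            return 0;
        memcpy(s->buf + s->buf_len, p, n);
        s->buf_len += n;
        return 1;
    }
    for (int i = 0; i < HS_NUM_SLOTS; i++)
        if (s->dgst[i] != NULL)
            s->dgst[i]->state = toy_update(s->dgst[i]->state, p, n);
    return 1;
}

/* Replay the cached records into the slots selected by mask (bit i = slot i).
 * Fails if records were not being cached. */
static int hs_digest_cached_records(hs_state *s, unsigned mask)
{
    assert(s != NULL);
    if (!s->buffering)
        return 0;
    for (int i = 0; i < HS_NUM_SLOTS; i++) {
        if (mask & (1u << i)) {
            s->storage[i].in_use = 1;
            s->storage[i].state = toy_update(0, s->buf, s->buf_len);
            s->dgst[i] = &s->storage[i];
        } else {
            s->dgst[i] = NULL;
        }
    }
    s->buffering = 0;
    s->buf_len = 0;
    return 1;
}
```

In this model, selecting only slot 4 leaves slot 0 NULL, which is the same pattern as the handshake_dgst array printed later in the ticket.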

So it looks like a "this can't happen" error...

There is a way of stopping the crash at that point by checking to see if EVP_MD_CTX_copy returns an error (which is sensible anyway) but that's fixing a symptom rather than the underlying cause.
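The difference between the unchecked and checked patterns can be shown with a self-contained model rather than real OpenSSL calls. The types and functions below (toy_md, toy_md_ctx, toy_ctx_copy, toy_final, finish_mac_slot) are hypothetical stand-ins; in 1.0.1 the corresponding calls are EVP_MD_CTX_copy_ex() and EVP_DigestFinal_ex(). The sketch assumes, as the later dump suggests, that the copy source is missing, so the destination context is left with no digest and finalising it would dereference a null pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for EVP_MD and EVP_MD_CTX; not the OpenSSL API. */
typedef struct { size_t md_size; } toy_md;
typedef struct { const toy_md *digest; unsigned long state; } toy_md_ctx;

/* Like the EVP copy functions, returns 0 on failure and 1 on success. */
static int toy_ctx_copy(toy_md_ctx *out, const toy_md_ctx *in)
{
    if (in == NULL || in->digest == NULL)
        return 0;
    *out = *in;
    return 1;
}

/* Dereferences ctx->digest unconditionally: calling this on a context
 * whose digest is NULL crashes, like frame #0 of the later backtrace. */
static int toy_final(const toy_md_ctx *ctx, unsigned char *md, unsigned int *len)
{
    size_t n = ctx->digest->md_size;
    memset(md, (int)(ctx->state & 0xffu), n);
    *len = (unsigned int)n;
    return 1;
}

/* Checked pattern: only finalise if the copy succeeded, otherwise report
 * an error to the caller. md must hold at least the digest size. */
static int finish_mac_slot(const toy_md_ctx *slot, unsigned char *md, unsigned int *len)
{
    toy_md_ctx tmp;

    assert(md != NULL && len != NULL);
    memset(&tmp, 0, sizeof tmp);
    if (!toy_ctx_copy(&tmp, slot))
        return 0;
    return toy_final(&tmp, md, len);
}
```

As the message says, a check like this stops the crash but only treats the symptom: the missing context still has to be explained.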

Steve.
-- 
Dr Stephen N. Henson. OpenSSL project core developer.
Commercial tech support now available see: http://www.openssl.org

CC: "openssl-dev@openssl.org" <openssl-dev@openssl.org>
Subject: Re: [openssl.org #3200] Crash in OpenSSL 1.0.1e w/TLS 1.2 (under load)
Date: Mon, 16 Dec 2013 21:09:39 +0000
To: "rt@openssl.org" <rt@openssl.org>
From: Ron Barber <rbarber@yahoo-inc.com>
On 12/14/13 7:38 AM, "Stephen Henson via RT" <rt@openssl.org> wrote:
>Hmm... that's a weird one. The debug info tells me it is a TLS v1.0
>connection and that it is attempting to use MD5 when calculating the
>handshake hash. It caches handshake records in the function
>ssl3_digest_cached_records() using pretty much the same logic that fails
>later on. That function wouldn't be called if the handshake buffer was
>never initialised but it should be initialised when the connection is
>accepted.
>
>So it looks like it's a "this can't happen error",,,
>
>There is a way of stopping the crash at that point by checking to see if
>EVP_MD_CTX_copy returns an error (which is sensible anyway) but that's
>fixing a symptom rather than the underlying cause.
>
>Steve.
>

Thank you Steve. I'm not sure how to proceed from here; is there more
information from the core dumps that would be useful?

I suppose this could be an integration issue between Traffic Server and
OpenSSL, but I don't see how, since we don't have any crashes when
SSL_OP_NO_TLSv1_2 is set in the call to SSL_CTX_set_options for the server
ctx. Keep in mind that we could be dealing with a badly behaved or
ill-intentioned client.

I don't know anything about SSL, but could the connection have originally
negotiated TLS v1.2 and then crashed when it attempted to switch to TLS v1.0?

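This question is easier to reason about with a simplified model of how a server picks the protocol version from the client's offer: it takes the highest version that it has not disabled and that does not exceed what the client offered. This is a sketch under that assumption only; the NO_* flags are hypothetical stand-ins for options such as SSL_OP_NO_TLSv1_2, and real OpenSSL 1.0.1 handles further details that are omitted here. In this model the version is chosen once from the client's offer and nothing switches it afterwards.

```c
#include <assert.h>
#include <stddef.h>

/* Wire values of the protocol version field (major << 8 | minor). */
enum { V_SSL3 = 0x0300, V_TLS1 = 0x0301, V_TLS1_1 = 0x0302, V_TLS1_2 = 0x0303 };

/* Hypothetical disable flags, modelled loosely on the SSL_OP_NO_* options. */
enum { NO_SSL3 = 1 << 0, NO_TLS1 = 1 << 1, NO_TLS1_1 = 1 << 2, NO_TLS1_2 = 1 << 3 };

/* Returns the chosen version, or 0 if client and server share none. */
static int choose_version(int client_version, unsigned options)
{
    static const struct { int version; unsigned off_flag; } table[] = {
        { V_TLS1_2, NO_TLS1_2 }, { V_TLS1_1, NO_TLS1_1 },
        { V_TLS1, NO_TLS1 },     { V_SSL3, NO_SSL3 },
    };

    assert(client_version >= 0);
    /* Walk from newest to oldest and stop at the first acceptable one. */
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (table[i].version <= client_version && !(options & table[i].off_flag))
            return table[i].version;
    return 0;
}
```

With NO_TLS1_2 set, a TLS 1.2 client gets TLS 1.1 in this model rather than TLS 1.2.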
On Mon Dec 16 22:20:47 2013, rbarber@yahoo-inc.com wrote:
>
> Thank you Steve. Not sure how to proceed from here, is there more
> information from the core dumps which would be useful?
>

Yes, please print out the entire s->s3->handshake_dgst array instead of just the first element. That is:

s->s3->handshake_dgst[0]
s->s3->handshake_dgst[1]
... up to ...
s->s3->handshake_dgst[5]

> I suppose this could be an integration issue between traffic server and
> openssl, but I don't see how since we don't have any crash issues when
> SSL_OP_NO_TLSv1_2 is set in the call to SSL_CTX_set_options for the server
> ctx. Keep in mind that we could be dealing with a not-well-behaved or
> well intentioned client.
>

Of course, OpenSSL should not crash when presented with a broken or malicious client.

Well, if you have SSL_OP_NO_TLSv1_2 set, then only the MD5 and SHA1 digests in that array are set. If you use TLS v1.2, however, others can be used too. So it's possible that a TLS v1.2 client is somehow confusing the initialisation of that array, but I'm not sure of the mechanism.
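The dependency described above (which slots of handshake_dgst are set depends on the protocol version) can be modelled as a check of the slots a version needs against the slots that were initialised. This is a hypothetical sketch: the slot order, the masks and the assumption that TLS 1.2 needs only a SHA-256 slot (its default PRF hash, not true for every cipher suite) are simplifications, not OpenSSL's actual table. It shows one way an array initialised under TLS 1.2 rules could lack the MD5 slot that a TLS 1.0 Finished calculation needs, without claiming that this is the real mechanism.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical slot layout for this sketch only. */
enum { SLOT_MD5, SLOT_SHA1, SLOT_SHA256, SLOT_SHA384, NUM_SLOTS };

#define SLOT_BIT(i) (1u << (i))

/* Digests needed for the Finished MAC, simplified: TLS 1.0/1.1 use MD5 and
 * SHA-1 together; TLS 1.2 is assumed to use SHA-256 (its default PRF hash). */
static unsigned slots_needed(int version)
{
    if (version >= 0x0303)
        return SLOT_BIT(SLOT_SHA256);
    return SLOT_BIT(SLOT_MD5) | SLOT_BIT(SLOT_SHA1);
}

typedef struct { const void *ctx[NUM_SLOTS]; } dgst_array;

/* Initialise only the slots that the given version needs. */
static void init_slots(dgst_array *a, int version, const void *live_ctx)
{
    unsigned need = slots_needed(version);
    for (int i = 0; i < NUM_SLOTS; i++)
        a->ctx[i] = (need & SLOT_BIT(i)) ? live_ctx : NULL;
}

/* Returns 1 if every slot the version needs is present, else 0 (the point
 * where checked code would report an internal error instead of crashing). */
static int can_finish(const dgst_array *a, int version)
{
    unsigned need = slots_needed(version);
    assert(a != NULL);
    for (int i = 0; i < NUM_SLOTS; i++)
        if ((need & SLOT_BIT(i)) && a->ctx[i] == NULL)
            return 0;
    return 1;
}
```

Initialising for 0x0303 and then finishing as 0x0301 fails the check, because the MD5 and SHA-1 slots were never set.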

Steve.

CC: "openssl-dev@openssl.org" <openssl-dev@openssl.org>
Subject: Re: [openssl.org #3200] Crash in OpenSSL 1.0.1e w/TLS 1.2 (under load)
Date: Tue, 17 Dec 2013 21:28:22 +0000
To: "rt@openssl.org" <rt@openssl.org>
From: Ron Barber <rbarber@yahoo-inc.com>


On 12/16/13, 6:40 PM, "Stephen Henson via RT" <rt@openssl.org> wrote:
>
>Yes, please print out the entire s->s3->handshake_dgst array instead of
>just the first element. That is:
>
>s->s3->handshake_dgst[0]
>s->s3->handshake_dgst[1]
>.. up to ...
>s->s3->handshake_dgst[5]


I had to set this back up, so this is a new core dump (with a similar stack trace):
Program terminated with signal 11, Segmentation fault.
#0 0x00002ae454d896b1 in EVP_DigestFinal_ex (ctx=0x2ae4652107d0, md=0x2ae465210750 "", size=0x2ae465210804) at digest.c:271
271 digest.c: No such file or directory.
in digest.c
Missing separate debuginfos, use: debuginfo-install
expat-2.0.1-11.el6_2.x86_64 glibc-2.12-1.107.el6.x86_64
libattr-2.4.44-7.el6.x86_64 libcap-2.16-5.5.el6.x86_64
libevent-1.4.13-4.el6.x86_64 libgcc-4.4.7-3.el6.x86_64
libstdc++-4.4.7-3.el6.x86_64 libxml2-2.7.6-12.el6_4.1.x86_64
nss-softokn-freebl-3.12.9-11.el6.x86_64 openssl-1.0.1e-11.el6.x86_64
pcre-7.8-6.el6.x86_64 tcl-8.5.7-6.el6.x86_64
xz-libs-4.999.9-0.3.beta.20091007git.el6.x86_64 zlib-1.2.3-29.el6.x86_64
(gdb) where
#0 0x00002ae454d896b1 in EVP_DigestFinal_ex (ctx=0x2ae4652107d0, md=0x2ae465210750 "", size=0x2ae465210804) at digest.c:271
#1 0x00002ae454a37cb3 in tls1_final_finish_mac (s=0x2ae5644b0760, str=0x2ae454a5e949 "client finished", slen=15, out=0x2ae564472114 "") at t1_enc.c:926
#2 0x00002ae454a2b1e4 in ssl3_do_change_cipher_spec (s=0x2ae5644b0760) at s3_pkt.c:1462
#3 0x00002ae454a2ad00 in ssl3_read_bytes (s=0x2ae5644b0760, type=22, buf=0x2ae5643521f0 "\020", len=4, peek=0) at s3_pkt.c:1306
#4 0x00002ae454a2c110 in ssl3_get_message (s=0x2ae5644b0760, st1=8608, stn=8609, mt=-1, max=516, ok=0x2ae465210a9c) at s3_both.c:451
#5 0x00002ae454a1af11 in ssl3_get_cert_verify (s=0x2ae5644b0760) at s3_srvr.c:2924
#6 0x00002ae454a1625c in ssl3_accept (s=0x2ae5644b0760) at s3_srvr.c:677
#7 0x00002ae454a483c4 in SSL_accept (s=0x2ae5644b0760) at ssl_lib.c:940
#8 0x00000000006711ba in SSLNetVConnection::sslServerHandShakeEvent (this=0x2ae570291c60, err=@0x2ae465210d1c) at SSLNetVConnection.cc:488
#9 0x0000000000672a77 in SSLNetVConnection::sslStartHandShake (this=0x2ae570291c60, event=<value optimized out>, err=@0x2ae465210d1c) at SSLNetVConnection.cc:470
#10 0x0000000000671cd2 in SSLNetVConnection::net_read_io (this=0x2ae570291c60, nh=0x2ae45f844bf0, lthread=0x2ae45f841010) at SSLNetVConnection.cc:217
#11 0x000000000067b7b2 in NetHandler::mainNetEvent (this=0x2ae45f844bf0, event=<value optimized out>, e=<value optimized out>) at UnixNet.cc:386
#12 0x00000000006a334f in handleEvent (this=0x2ae45f841010, e=0x199dd70, calling_code=5) at I_Continuation.h:146
#13 EThread::process_event (this=0x2ae45f841010, e=0x199dd70, calling_code=5) at UnixEThread.cc:141
#14 0x00000000006a3d33 in EThread::execute (this=0x2ae45f841010) at UnixEThread.cc:265
#15 0x00000000006a21ea in spawn_thread_internal (a=0x1baf330) at Thread.cc:88
#16 0x000000324f407851 in start_thread () from /lib64/libpthread.so.0
#17 0x000000324f0e890d in clone () from /lib64/libc.so.6
(gdb) info locals
ret = 10980

(gdb) f 1
#1 0x00002ae454a37cb3 in tls1_final_finish_mac (s=0x2ae5644b0760, str=0x2ae454a5e949 "client finished", slen=15, out=0x2ae564472114 "") at t1_enc.c:926
926 t1_enc.c: No such file or directory.
in t1_enc.c
(gdb) print *s
$11 = {version = 769, type = 8192, method = 0x2ae454c6dee0, rbio =
0x2ae5640e4930, wbio = 0x2ae56404da50, bbio = 0x2ae56404da50, rwstate = 1,
in_handshake = 1, handshake_func = 0x2ae454a1541e <ssl3_accept>, server =
1, new_session = 0, quiet_shutdown = 1,
shutdown = 0, state = 8608, rstate = 240, init_buf = 0x2ae5640fc530,
init_msg = 0x2ae5643521f4, init_num = 0, init_off = 0, packet =
0x2ae528bcae83 "\024\003\001", packet_length = 0, s2 = 0x0, s3 =
0x2ae564471e00, d1 = 0x0, read_ahead = 0, msg_callback = 0,
msg_callback_arg = 0x0, hit = 0, param = 0x2ae56412f710, cipher_list =
0x0, cipher_list_by_id = 0x0, mac_flags = 0, enc_read_ctx =
0x2ae5640424c0, read_hash = 0x2ae564567630, expand = 0x0, enc_write_ctx =
0x0, write_hash = 0x0, compress = 0x0,
cert = 0x2ae56432d1f0, sid_ctx_length = 0, sid_ctx = '\000' <repeats 31 times>,
session = 0x2ae564a545b0, generate_session_id = 0, verify_mode =
0, verify_callback = 0, info_callback = 0, error = 0, error_code = 0,
psk_client_callback = 0, psk_server_callback = 0,
ctx = 0x1ba6e50, debug = 0, verify_result = 0, ex_data = {sk =
0x2ae5644203b0, dummy = 0}, client_CA = 0x0, references = 1, options =
21102596, mode = 0, max_cert_list = 102400, first_packet = 0,
client_version = 771, max_send_fragment = 16384,
tlsext_debug_cb = 0, tlsext_debug_arg = 0x0, tlsext_hostname = 0x0,
servername_done = 1, tlsext_status_type = -1, tlsext_status_expected = 0,
tlsext_ocsp_ids = 0x0, tlsext_ocsp_exts = 0x0, tlsext_ocsp_resp = 0x0,
tlsext_ocsp_resplen = -1,
tlsext_ticket_expected = 1, tlsext_ecpointformatlist_length = 0,
tlsext_ecpointformatlist = 0x0, tlsext_ellipticcurvelist_length = 0,
tlsext_ellipticcurvelist = 0x0, tlsext_opaque_prf_input = 0x0,
tlsext_opaque_prf_input_len = 0, tlsext_session_ticket = 0x0,
tls_session_ticket_ext_cb = 0, tls_session_ticket_ext_cb_arg = 0x0,
tls_session_secret_cb = 0, tls_session_secret_cb_arg = 0x0, initial_ctx =
0x1ba6e50, next_proto_negotiated = 0x0, next_proto_negotiated_len = 0
'\000', srtp_profiles = 0x0, srtp_profile = 0x0,
tlsext_heartbeat = 0, tlsext_hb_pending = 0, tlsext_hb_seq = 0,
renegotiate = 2, srp_ctx = {SRP_cb_arg = 0x0,
TLS_ext_srp_username_callback = 0, SRP_verify_param_callback = 0,
SRP_give_srp_client_pwd_callback = 0, login = 0x0, N = 0x0, g = 0x0, s =
0x0, B = 0x0,
A = 0x0, a = 0x0, b = 0x0, v = 0x0, info = 0x0, strength = 1024, srp_Mask = 0}}
(gdb) print s->s3->handshake_dgst[0]
$1 = (EVP_MD_CTX *) 0x0
(gdb) print s->s3->handshake_dgst[1]
$2 = (EVP_MD_CTX *) 0x0
(gdb) print s->s3->handshake_dgst[2]
$3 = (EVP_MD_CTX *) 0x0
(gdb) print s->s3->handshake_dgst[3]
$4 = (EVP_MD_CTX *) 0x0
(gdb) print s->s3->handshake_dgst[4]
$5 = (EVP_MD_CTX *) 0x2ae5648a5a90
(gdb) print s->s3->handshake_dgst[5]
$6 = (EVP_MD_CTX *) 0x0
(gdb) print *s->s3->handshake_dgst[4]
$8 = {digest = 0x2ae45507de20, engine = 0x0, flags = 0, md_data =
0x2ae56421b9b0, pctx = 0x0, update = 0x2ae454d92290 <update256>}
(gdb) info locals
hashsize = 16
i = 72
ctx = {digest = 0x0, engine = 0x0, flags = 0, md_data = 0x0, pctx = 0x0,
update = 0}
buf = '\000' <repeats 48 times>"\320,
\a!eH\000\000\000\340\267Od\345*\000\000\030\036Gd\345*\000\000\322\351\245
T\344*\000\000\340\267Od\345*\000\000\b\270Od\345*\000\000(\270Od\345*\000\
000\016\261\315T\001\000\000\000\300$\004d\345*\000\000\200a\aU\344*\000"
q = 0x2ae465210750 ""
buf2 = '\000' <repeats 11 times>
idx = 0
mask = 16
err = 0
md = 0x2ae45507dc20

(gdb) f 0
#0 0x00002ae454d896b1 in EVP_DigestFinal_ex (ctx=0x2ae4652107d0, md=0x2ae465210750 "", size=0x2ae465210804) at digest.c:271
271 digest.c: No such file or directory.
in digest.c
(gdb) print *ctx
$12 = {digest = 0x0, engine = 0x0, flags = 0, md_data = 0x0, pctx = 0x0,
update = 0}
(gdb)
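Two fields in the print *s output above are worth decoding: version = 769 and client_version = 771. A protocol version is two bytes, major then minor, so 769 is 0x0301 (TLS 1.0) and 771 is 0x0303 (TLS 1.2). That matches the earlier reading that this is a TLS v1.0 connection, while client_version, which on the server side holds the version the client asked for, says TLS 1.2. The packet field likewise starts with \024\003\001, i.e. record type 20 (change_cipher_spec) with version 3.1, which fits frame #2, ssl3_do_change_cipher_spec(). The helpers below are hypothetical conveniences for doing this decoding in code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: name a two-byte protocol version (major << 8 | minor). */
static const char *version_name(int v)
{
    switch (v) {
    case 0x0300: return "SSLv3";
    case 0x0301: return "TLSv1.0";
    case 0x0302: return "TLSv1.1";
    case 0x0303: return "TLSv1.2";
    default:     return "unknown";
    }
}

/* Split the same value into its major and minor bytes. */
static void version_bytes(int v, int *major, int *minor)
{
    assert(major != NULL && minor != NULL);
    *major = (v >> 8) & 0xff;
    *minor = v & 0xff;
}
```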

I've added some error and sanity checking to the relevant piece of code. If that happens now, OpenSSL *should* report an internal error instead of crashing. If you end up seeing lots of those, it may need further investigation.

The new code is here:

http://git.openssl.org/gitweb/?p=openssl.git;a=commitdiff;h=0294b2be5f4c11

Steve.

CC: "openssl-dev@openssl.org" <openssl-dev@openssl.org>
Subject: Re: [openssl.org #3200] Crash in OpenSSL 1.0.1e w/TLS 1.2 (under load)
Date: Wed, 18 Dec 2013 19:43:43 +0000
To: "rt@openssl.org" <rt@openssl.org>
From: Ron Barber <rbarber@yahoo-inc.com>
On 12/18/13, 7:40 AM, "Stephen Henson via RT" <rt@openssl.org> wrote:

>I've added some error and sanity checking to the relevant piece of code.
>OpenSSL *should* just end up reporting an internal error now if that
>happens instead of crashing. If you end up with lots of those then it may
>need further investigation.
>
>The new code is here:
>
>http://git.openssl.org/gitweb/?p=openssl.git;a=commitdiff;h=0294b2be5f4c11
>
>Steve.
>

Thanks Steve. After applying the patch and letting it run in production
for approx. 5 hours, I did not see any crashes. The only suspicious-looking
error messages (i.e. a change in behavior from before) were these two:
[Dec 18 15:27:51.789] Server {0x2ab820908700} ERROR: SSL::27:error:1408F10B:SSL routines:SSL3_GET_RECORD:wrong version number:s3_pkt.c:337:
[Dec 18 17:15:41.125] Server {0x2ab820605700} ERROR: SSL::24:error:1408F10B:SSL routines:SSL3_GET_RECORD:wrong version number:s3_pkt.c:337:

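The hex code 1408F10B in these log lines packs three fields. In the OpenSSL 1.0.x layout the top 8 bits are the library, the next 12 bits the function and the low 12 bits the reason, which the ERR_GET_LIB, ERR_GET_FUNC and ERR_GET_REASON macros extract. The self-contained helper below (its name and struct are hypothetical) repeats that arithmetic so the split can be checked: library 0x14 (20), function 0x08F (143) and reason 0x10B (267), which the log renders as "SSL routines", "SSL3_GET_RECORD" and "wrong version number".

```c
#include <assert.h>

/* Fields of a packed error code in the OpenSSL 1.0.x layout. */
typedef struct {
    unsigned lib;     /* bits 24-31 */
    unsigned func;    /* bits 12-23 */
    unsigned reason;  /* bits 0-11  */
} err_fields;

/* Hypothetical helper repeating the ERR_GET_LIB / ERR_GET_FUNC /
 * ERR_GET_REASON arithmetic of the 1.0.x series. */
static err_fields unpack_err(unsigned long e)
{
    err_fields f;
    f.lib = (unsigned)((e >> 24) & 0xffUL);
    f.func = (unsigned)((e >> 12) & 0xfffUL);
    f.reason = (unsigned)(e & 0xfffUL);
    return f;
}
```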
Subject: Re: [openssl.org #3200] Crash in OpenSSL 1.0.1e w/TLS 1.2 (under load)
Date: Wed, 18 Dec 2013 23:42:02 +0100
To: Ron Barber via RT <rt@openssl.org>
From: "Dr. Stephen Henson" <steve@openssl.org>
On Wed, Dec 18, 2013, Ron Barber via RT wrote:

> Thanks Steve. After applying the patch and letting it run in production
> for approx. 5 hours I did not see any crashes. The only suspicious (i.e.
> Change in behavior from previous) looking error message was two of these:
> [Dec 18 15:27:51.789] Server {0x2ab820908700} ERROR:
> SSL::27:error:1408F10B:SSL routines:SSL3_GET_RECORD:wrong version
> number:s3_pkt.c:337:
> [Dec 18 17:15:41.125] Server {0x2ab820605700} ERROR:
> SSL::24:error:1408F10B:SSL routines:SSL3_GET_RECORD:wrong version
> number:s3_pkt.c:337:
>

Many thanks for that info. With that clue, I think I've now traced the cause.
It might have security implications (DoS only, though), so I'll keep any
further details off the public mailing lists.

Steve.

Subject: Re: [openssl.org #3200] Crash in OpenSSL 1.0.1e w/TLS 1.2 (under load)
Date: Fri, 10 Jan 2014 13:51:41 +0100
To: rt@openssl.org
From: Tomas Hoger <thoger@redhat.com>
On Wed, 18 Dec 2013 23:42:08 +0100 Stephen Henson via RT wrote:

> Many thanks for that info. I think I've traced the cause of the thing
> now with that clue. It might have security implications (DoS only
> though) so I'll keep any further details off the public mailing lists.

This is now covered by CVE-2013-6449 and fixed in 1.0.1f:

http://www.openssl.org/news/vulnerabilities.html#2013-6449
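To tell from code whether a 1.0.1 build predates 1.0.1f, one option is to compare the packed version number (OPENSSL_VERSION_NUMBER at compile time, or SSLeay() at run time in the 1.0.x series). The helper below is a hypothetical sketch that only covers the 1.0.1 series named in this ticket; it does not encode the full list of affected versions, for which the advisory above is the reference. Distribution packages such as the openssl-1.0.1e-*.el6 build in the dump may also carry backported fixes without changing the number.

```c
#include <assert.h>

/* OPENSSL_VERSION_NUMBER is laid out as 0xMNNFFPPS: major, minor, fix,
 * patch letter (a = 1, b = 2, ...) and status (f = release). So 1.0.1e is
 * 0x1000105f and 1.0.1f is 0x1000106f. */
#define V_1_0_1F 0x1000106fUL

/* Hypothetical helper: is this a 1.0.1-series version older than 1.0.1f? */
static int is_101_before_f(unsigned long v)
{
    return (v & 0xfffff000UL) == 0x10001000UL && v < V_1_0_1F;
}
```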

I assume this RT can be closed now.

th.