[Bug libstdc++/61582] New: C11 regex memory corruption
max at cert dot cx
2014-06-23 00:05:26 UTC
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61582

Bug ID: 61582
Summary: C11 regex memory corruption
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: major
Priority: P3
Component: libstdc++
Assignee: unassigned at gcc dot gnu.org
Reporter: max at cert dot cx

Hi,

Tested on GCC 4.8.1

----------
#include <regex>

using namespace std;

int main (int argc, char *argv[])
{
regex r(argv[1]);
return 0;
}
----------

# g++ c11RE.cpp -o c11RE -std=c++11 -Wall
# ./c11RE '.*'
# ./c11RE '(|'
Segmentation fault (core dumped)
# ./c11RE '((x|'
*** Error in `./c11RE': malloc(): memory corruption: 0x00007fffa0cb8670 ***

Expected (regex_error):
# ./c11RE '(x}'
terminate called after throwing an instance of 'std::regex_error'
what(): regex_error
Aborted (core dumped)

------------
(gdb) r '(|'
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /home/cx/c11RE '(|'

Program received signal SIGSEGV, Segmentation fault.
0x00000000004022cc in
std::__detail::_StateSeq::_StateSeq(std::__detail::_StateSeq const&) ()
(gdb) bt
#0 0x00000000004022cc in
std::__detail::_StateSeq::_StateSeq(std::__detail::_StateSeq const&) ()
#1 0x0000000000404a05 in std::__detail::_Compiler<char const*,
std::regex_traits<char> >::_M_disjunction() ()
#2 0x0000000000407901 in std::__detail::_Compiler<char const*,
std::regex_traits<char> >::_M_atom() ()
#3 0x00000000004069cb in std::__detail::_Compiler<char const*,
std::regex_traits<char> >::_M_term() ()
#4 0x000000000040567e in std::__detail::_Compiler<char const*,
std::regex_traits<char> >::_M_alternative() ()
#5 0x00000000004049c8 in std::__detail::_Compiler<char const*,
std::regex_traits<char> >::_M_disjunction() ()
#6 0x0000000000403ef2 in std::__detail::_Compiler<char const*,
std::regex_traits<char> >::_Compiler(char const* const&, char const* const&,
std::regex_traits<char>&, unsigned int) ()
#7 0x0000000000403297 in std::shared_ptr<std::__detail::_Automaton>
std::__detail::__compile<char const*, std::regex_traits<char> >(char const*
const&, char const* const&, std::regex_traits<char>&, unsigned int) ()
#8 0x0000000000402abb in std::basic_regex<char, std::regex_traits<char> >
::basic_regex(char const*, unsigned int) ()
#9 0x0000000000401767 in main ()
(gdb) x/i $rip
=> 0x4022cc <_ZNSt8__detail9_StateSeqC2ERKS0_+16>: mov (%rax),%rdx
(gdb) x/x $rax
0xffffffffffffffe8: Cannot access memory at address 0xffffffffffffffe8
(gdb) x/x $rdx
0xffffffffffffffe8: Cannot access memory at address 0xffffffffffffffe8
------------

BR,
Maksymilian
http://cxsecurity.com/
redi at gcc dot gnu.org
2014-06-23 08:13:44 UTC

Jonathan Wakely <redi at gcc dot gnu.org> changed:

What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |RESOLVED
Resolution|--- |INVALID
Severity|major |normal

--- Comment #1 from Jonathan Wakely <redi at gcc dot gnu.org> ---
*sigh* <regex> is not implemented prior to GCC 4.9.0. I thought the whole world
was aware of that by now.
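Since the header compiles on older releases but does not work, a build-time check can make the failure explicit. A minimal sketch, assuming the code is built with GCC, whose __GNUC__ and __GNUC_MINOR__ macros report the compiler version; clang also defines __GNUC__ (usually as 4.2), so it is excluded from the check. The helper is_a_plus_b is a hypothetical name that only exercises the header once the guard passes.

```cpp
// Hypothetical guard: refuse to build on GCC < 4.9, whose <regex> compiles
// but is not implemented. Clang also defines __GNUC__, so skip it here.
#if defined(__GNUC__) && !defined(__clang__)
#  if __GNUC__ < 4 || (__GNUC__ == 4 && __GNUC_MINOR__ < 9)
#    error "std::regex needs GCC 4.9.0 or later"
#  endif
#endif

#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: with a working <regex>, this matches "a+b" exactly.
bool is_a_plus_b(const std::string& s)
{
    static const std::regex re("a+b");
    return std::regex_match(s, re);
}
```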
max at cert dot cx
2014-06-24 19:37:40 UTC

--- Comment #2 from Maksymilian A <max at cert dot cx> ---
Sorry for the mistake.
Could you check this again?

***@cx:~/REstd11/kozak5$ ~/gcc49/bin/g++ -v
Using built-in specs.
COLLECT_GCC=/home/cx/gcc49/bin/g++
COLLECT_LTO_WRAPPER=/home/cx/gcc49/libexec/gcc/x86_64-unknown-linux-gnu/4.9.0/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: /home/cx/gcc49/source/gcc-4.9.0/configure --disable-multilib
--prefix=/home/cx/gcc49
Thread model: posix
gcc version 4.9.0 (GCC)
***@cx:~/REstd11/kozak5$ cat c11re.c
#include <iostream>
#include <string>
#include <regex>

using namespace std;

int main (int argc, char *argv[])
{
if (std::regex_match ("GNUj", std::regex(argv[1]) ))
std::cout << "ELO\n";
return 0;
}
***@cx:~/REstd11/kozak5$ ~/gcc49/bin/g++ -o c11re c11re.c -std=c++11
***@cx:~/REstd11/kozak5$ ./c11re '((x|'
terminate called after throwing an instance of 'std::regex_error'
what(): regex_error
Aborted (core dumped)
***@cx:~/REstd11/kozak5$ ./c11re '((.*)()?*{100})'
Segmentation fault (core dumped)
***@cx:~/REstd11/kozak5$

(gdb) r '((.*)()?*{100})'
Starting program: /home/cx/REstd11/kozak5/./c11re '((.*)()?*{100})'

Program received signal SIGSEGV, Segmentation fault.
0x0000000000402f15 in std::_Bit_reference::operator bool() const
()
(gdb) x/i $rip
=> 0x402f15 <_ZNKSt14_Bit_referencecvbEv+15>:
mov (%rax),%rdx
(gdb) i r
rax 0x200000000063a128 2305843009220223272
rbx 0xffffffffffffffff -1
rcx 0x200000000063a128 2305843009220223272
rdx 0x8000000000000000 -9223372036854775808
rsi 0x200000000063a128 2305843009220223272
rdi 0x7fffffffd350 140737488343888
rbp 0x7fffffffd310 0x7fffffffd310
rsp 0x7fffffffd310 0x7fffffffd310
r8 0x2 2
r9 0x20 32
r10 0x3 3
r11 0x7ffff75b5798 140737343346584
r12 0x402880 4204672
r13 0x7fffffffe260 140737488347744
r14 0x0 0
r15 0x0 0
=> 0x402f15 <_ZNKSt14_Bit_referencecvbEv+15>:
rip 0x402f15 0x402f15 <std::_Bit_reference::operator bool()
const+15>

...

#0 0x0000000000402f15 in std::_Bit_reference::operator bool() const ()
#1 0x000000000040a1bc in void std::__detail::_Executor<char const*,
std::allocator<std::sub_match<char const*> >, std::regex_traits<char>,
false>::_M_dfs<true>(long) ()
#2 0x000000000040a275 in void std::__detail::_Executor<char const*,
std::allocator<std::sub_match<char const*> >, std::regex_traits<char>,
false>::_M_dfs<true>(long) ()
#3 0x000000000040a493 in void std::__detail::_Executor<char const*,
std::allocator<std::sub_match<char const*> >, std::regex_traits<char>,
false>::_M_dfs<true>(long) ()
#4 0x000000000040a28f in void std::__detail::_Executor<char const*,
std::allocator<std::sub_match<char const*> >, std::regex_traits<char>,
false>::_M_dfs<true>(long) ()
#5 0x000000000040a3a5 in void std::__detail::_Executor<char const*,
std::allocator<std::sub_match<char const*> >, std::regex_traits<char>,
false>::_M_dfs<true>(long) ()
#6 0x000000000040a3a5 in void std::__detail::_Executor<char const*,
std::allocator<std::sub_match<char const*> >, std::regex_traits<char>,
false>::_M_dfs<true>(long) ()
#7 0x000000000040a3a5 in void std::__detail::_Executor<char const*,
std::allocator<std::sub_match<char const*> >, std::regex_traits<char>,
false>::_M_dfs<true>(long) ()
#8 0x0000000000407ee0 in bool std::__detail::_Executor<char const*,
std::allocator<std::sub_match<char const*> >, std::regex_traits<char>,
false>::_M_main<true>() ()
#9 0x0000000000406172 in std::__detail::_Executor<char const*,
std::allocator<std::sub_match<char const*> >, std::regex_traits<char>,
false>::_M_match() ()
#10 0x0000000000404cf5 in bool std::__detail::__regex_algo_impl<char const*,
std::allocator<std::sub_match<char const*> >, char, std::regex_traits<char>,
(std::__detail::_RegexExecutorPolicy)0, true>(char const*, char const*,
std::match_results<char const*, std::allocator<std::sub_match<char const*> > >&,
std::basic_regex<char, std::regex_traits<char> > const&,
std::regex_constants::match_flag_type) ()
#11 0x000000000040449e in bool std::regex_match<char const*,
std::allocator<std::sub_match<char const*> >, char, std::regex_traits<char> >(char
const*, char const*, std::match_results<char const*,
std::allocator<std::sub_match<char const*> > >&, std::basic_regex<char,
std::regex_traits<char> > const&, std::regex_constants::match_flag_type) ()
#12 0x000000000040405c in bool std::regex_match<char const*, char,
std::regex_traits<char> >(char const*, char const*, std::basic_regex<char,
std::regex_traits<char> > const&, std::regex_constants::match_flag_type) ()
#13 0x0000000000403d4c in bool std::regex_match<char, std::regex_traits<char> >
(char const*, std::basic_regex<char, std::regex_traits<char> > const&,
std::regex_constants::match_flag_type) ()
#14 0x0000000000402a5f in main ()
redi at gcc dot gnu.org
2014-06-25 09:54:19 UTC

Jonathan Wakely <redi at gcc dot gnu.org> changed:

What |Removed |Added
----------------------------------------------------------------------------
Status|RESOLVED |NEW
Last reconfirmed| |2014-06-25
Resolution|INVALID |---
Summary|C11 regex memory corruption |C++11 regex memory
| |corruption
Ever confirmed|0 |1

--- Comment #3 from Jonathan Wakely <redi at gcc dot gnu.org> ---
(In reply to Maksymilian A from comment #2)
Post by max at cert dot cx
terminate called after throwing an instance of 'std::regex_error'
what(): regex_error
Aborted (core dumped)
I think this is by design.
Post by max at cert dot cx
Segmentation fault (core dumped)
That's a bug.

(It would be helpful if you didn't put C11 in the subject; this has nothing to
do with C.)
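The abort on '((x|' is the by-design case: std::regex's constructor throws std::regex_error for a malformed pattern, and the test program never catches it, so std::terminate is called. A minimal sketch of compiling an untrusted pattern without aborting; try_compile is a hypothetical helper name, and e.code() reports a std::regex_constants::error_type value describing the failure.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: compile `pattern` into `out` without letting a bad
// pattern terminate the program. On failure, returns false and stores the
// std::regex_constants::error_type from the exception in `code`; on success,
// returns true and leaves `code` untouched.
bool try_compile(const std::string& pattern, std::regex& out,
                 std::regex_constants::error_type& code)
{
    try {
        out = std::regex(pattern);
        return true;
    } catch (const std::regex_error& e) {
        code = e.code();
        return false;
    }
}
```

Note that this only covers errors the library reports by throwing; the segfaults discussed in this thread happen inside the library and cannot be caught this way.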
redi at gcc dot gnu.org
2014-06-25 18:01:15 UTC

Jonathan Wakely <redi at gcc dot gnu.org> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |timshen at gcc dot gnu.org

--- Comment #4 from Jonathan Wakely <redi at gcc dot gnu.org> ---
That segfault is already fixed on trunk, although it is possibly just latent.
max at cert dot cx
2014-06-25 19:15:39 UTC

--- Comment #5 from Maksymilian Arciemowicz <max at cert dot cx> ---
Thanks for the feedback. I'm going to verify this on trunk.
max at cert dot cx
2014-06-25 23:31:34 UTC

--- Comment #6 from Maksymilian Arciemowicz <max at cert dot cx> ---
@Jonathan: true, but check this case:

***@cx:~/REtrunk/kozak5$ ~/gccTRUNK/bin/g++ -v
Using built-in specs.
COLLECT_GCC=/home/cx/gccTRUNK/bin/g++
COLLECT_LTO_WRAPPER=/home/cx/gccTRUNK/libexec/gcc/x86_64-unknown-linux-gnu/4.10.0/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: ../trunk/configure --prefix=/home/cx/gccTRUNK/
--disable-multilib
Thread model: posix
gcc version 4.10.0 20140625 (experimental) (GCC)
***@cx:~/REtrunk/kozak5$ ~/gccTRUNK/bin/g++ c11re.c -o c11re -std=c++11
***@cx:~/REtrunk/kozak5$ ./c11re '(.*{100}{100}{100})'
Segmentation fault (core dumped)

Program received signal SIGSEGV, Segmentation fault.
0x000000000041014e in std::__detail::_Executor<char const*,
std::allocator<std::sub_match<char const*> >, std::regex_traits<char>,
true>::_State_info<std::integral_constant<bool, true>,
std::vector<std::sub_match<char const*>, std::allocator<std::sub_match<char
const*> > > >::_M_visited(long) const ()

BR,
Maksymilian
http://cxsecurity.com/
timshen at gcc dot gnu.org
2014-06-26 06:14:59 UTC

--- Comment #7 from Tim Shen <timshen at gcc dot gnu.org> ---
"(.*{100}{100}{100})" seems to be a stack overflow. It's because regex executor
uses recursion. It could be fixed (turning the segfault into memory exhaustion) by
using a std::stack to simulate recursion; IMHO, however, directly throwing
regex_error::error_space is the right thing to do here.
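The std::stack idea can be illustrated in isolation. This sketch is not libstdc++'s _Executor: it assumes a hypothetical NFA stored as adjacency lists (the Nfa struct and reaches_accept are illustrative names) and only checks reachability, without the backtracking and capture bookkeeping a real matcher needs. The point is that pending work lives in a heap-allocated container, so a very long chain of states exhausts memory rather than overflowing the call stack.

```cpp
#include <cassert>
#include <cstddef>
#include <stack>
#include <vector>

// Hypothetical NFA: next[s] lists the successor states of state s.
struct Nfa {
    std::vector<std::vector<std::size_t>> next;
    std::size_t accept;  // index of the accepting state
};

// Depth-first search with an explicit stack instead of recursion.
// Returns true if nfa.accept is reachable from `start`.
bool reaches_accept(const Nfa& nfa, std::size_t start)
{
    assert(start < nfa.next.size());
    std::vector<bool> visited(nfa.next.size(), false);
    std::stack<std::size_t> todo;  // replaces the call stack
    todo.push(start);
    while (!todo.empty()) {
        std::size_t s = todo.top();
        todo.pop();
        if (visited[s])
            continue;
        visited[s] = true;
        if (s == nfa.accept)
            return true;
        for (std::size_t t : nfa.next[s])
            if (!visited[t])
                todo.push(t);
    }
    return false;
}
```

On a straight chain of one million states, a recursive version would likely overflow a typical default thread stack, while this loop only grows the heap-allocated std::stack.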
max at cert dot cx
2014-06-26 07:11:19 UTC

--- Comment #8 from Maksymilian Arciemowicz <max at cert dot cx> ---
(In reply to Tim Shen from comment #7)
Post by timshen at gcc dot gnu.org
"(.*{100}{100}{100})" seems to be a stack overflow. It's because regex
executor uses recursion. It could be fixed (turning the segfault into memory
exhaustion) by using a std::stack to simulate recursion; IMHO, however,
directly throwing regex_error::error_space is the right thing to do here.
Yep, it's a stack overflow. Why regex_error::error_space? Wouldn't
regex_error::error_stack be better?
timshen at gcc dot gnu.org
2014-06-26 07:16:59 UTC

--- Comment #9 from Tim Shen <timshen at gcc dot gnu.org> ---
(In reply to Maksymilian Arciemowicz from comment #8)
Post by max at cert dot cx
(In reply to Tim Shen from comment #7)
Post by timshen at gcc dot gnu.org
"(.*{100}{100}{100})" seems to be a stack overflow. It's because regex
executor uses recursion. It could be fixed (turning the segfault into memory
exhaustion) by using a std::stack to simulate recursion; IMHO, however,
directly throwing regex_error::error_space is the right thing to do here.
Yep, it's a stack overflow. Why regex_error::error_space? Wouldn't
regex_error::error_stack be better?
Sorry for not clarifying that: I prefer throwing error_space when constructing
(complaining about too many states) rather than throwing error_stack when
matching. To solve the latter problem, as I said, we can use a std::stack or
something similar to avoid a stack overflow.
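Counting roughly, each {100} clones the preceding fragment 100 times, so the nesting in (.*{100}{100}{100}) multiplies: 100 * 100 * 100 = 1,000,000 copies of .*. A minimal sketch of rejecting that at construction time with error_space, using a hypothetical NfaBuilder whose insert_state plays the role a state-inserting routine would; the limit kMaxStates = 100000 and the assumption that .* needs two states are illustrative, not libstdc++'s actual numbers.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>

// Hypothetical NFA builder that refuses to grow past a fixed state limit,
// throwing std::regex_error(error_space) while the pattern is compiled.
class NfaBuilder {
public:
    static constexpr std::size_t kMaxStates = 100000;  // assumed limit

    // Adds one state and returns its index, or throws if the limit is hit.
    std::size_t insert_state()
    {
        if (count_ >= kMaxStates)
            throw std::regex_error(std::regex_constants::error_space);
        return count_++;
    }

    // Models `fragment{n}`: inserts n copies of a fragment that has
    // `fragment_states` states and returns the size of the repeated result.
    std::size_t repeat(std::size_t fragment_states, std::size_t n)
    {
        for (std::size_t i = 0; i < fragment_states * n; ++i)
            insert_state();
        return fragment_states * n;
    }

private:
    std::size_t count_ = 0;  // states inserted so far
};
```

Under these assumptions, .*{100} yields 200 states and wrapping it in another {100} yields 20,000; the third {100} would need 2,000,000 and throws error_space before any matching starts.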
max at cert dot cx
2014-06-26 07:59:32 UTC

--- Comment #10 from Maksymilian Arciemowicz <max at cert dot cx> ---
There is also another alternative, like this one:

http://cvsweb.netbsd.org/bsdweb.cgi/src/lib/libc/regex/regcomp.c.diff?r1=1.29&r2=1.30&f=h
timshen at gcc dot gnu.org
2014-07-01 03:06:18 UTC

--- Comment #11 from Tim Shen <timshen at gcc dot gnu.org> ---
Author: timshen
Date: Tue Jul 1 03:05:45 2014
New Revision: 212185

URL: https://gcc.gnu.org/viewcvs?rev=212185&root=gcc&view=rev
Log:
PR libstdc++/61061
PR libstdc++/61582
* include/bits/regex_automaton.h (_NFA<>::_M_insert_state): Add
an NFA state limit. If it's exceeded, regex_constants::error_space
will be thrown.
* include/bits/regex_automaton.tcc (_StateSeq<>::_M_clone): Use
map (which is sparse) instead of vector. This reduces the cost of n
clones from O(n^2) to O(n).
* include/std/regex: Add map dependency.
* testsuite/28_regex/algorithms/regex_match/ecma/char/61601.cc: New
testcase.


Added:

trunk/libstdc++-v3/testsuite/28_regex/algorithms/regex_match/ecma/char/61601.cc
Modified:
trunk/libstdc++-v3/ChangeLog
trunk/libstdc++-v3/include/bits/regex_automaton.h
trunk/libstdc++-v3/include/bits/regex_automaton.tcc
trunk/libstdc++-v3/include/std/regex
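One way to read the _M_clone entry above: a vector indexed by state id must be as large as the whole NFA, which itself grows with every clone, so n clones cost O(n^2) in total; a map holds only the states of the fragment being copied (ignoring std::map's logarithmic factor). The sketch below illustrates that idea on a hypothetical adjacency-list Graph; clone_fragment is an illustrative name, not the libstdc++ implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stack>
#include <vector>

// Hypothetical NFA: g[s] lists the successor states of state s.
using Graph = std::vector<std::vector<std::size_t>>;

// Appends a copy of the fragment reachable from `start` to `g` and returns
// the index of the copied start state. The old->new map only holds the
// fragment's states, not one slot per state of the whole graph.
std::size_t clone_fragment(Graph& g, std::size_t start)
{
    assert(start < g.size());
    const std::size_t base = g.size();
    std::map<std::size_t, std::size_t> old_to_new;
    std::stack<std::size_t> todo;
    todo.push(start);
    // First pass: give every reachable state a new index after `base`.
    while (!todo.empty()) {
        std::size_t s = todo.top();
        todo.pop();
        if (old_to_new.count(s) != 0)
            continue;
        const std::size_t new_id = base + old_to_new.size();
        old_to_new.emplace(s, new_id);
        for (std::size_t t : g[s])
            todo.push(t);
    }
    // Second pass: create the copies with their edges remapped.
    g.resize(base + old_to_new.size());
    for (const auto& kv : old_to_new)
        for (std::size_t t : g[kv.first])
            g[kv.second].push_back(old_to_new.at(t));
    return old_to_new.at(start);
}
```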
max at cert dot cx
2014-07-01 18:54:41 UTC

--- Comment #12 from Maksymilian Arciemowicz <max at cert dot cx> ---
Oops. Check this: (.*{100}{300})

gcc version 4.10.0 20140701 (experimental) (GCC)
--------
Starting program: /home/cx/REtrunk/kozak5/t3 '(.*{100}{300})'

Program received signal SIGSEGV, Segmentation fault.
0x000000000040c22a in std::__detail::_Executor<char const*,
std::allocator<std::sub_match<char const*> >, std::regex_traits<char>,
true>::_M_dfs(std::__detail::_Executor<char const*,
std::allocator<std::sub_match<char const*> >, std::regex_traits<char>,
true>::_Match_mode, long) ()
--------
max at cert dot cx
2014-07-04 10:25:22 UTC

--- Comment #13 from Maksymilian Arciemowicz <max at cert dot cx> ---
@Tim: do you need help?
timshen at gcc dot gnu.org
2014-07-04 18:00:15 UTC

--- Comment #14 from Tim Shen <timshen at gcc dot gnu.org> ---
(In reply to Maksymilian Arciemowicz from comment #13)
Post by max at cert dot cx
@Tim: do you need help?
This is what I'm going to do:
https://gcc.gnu.org/ml/libstdc++/2014-07/msg00008.html

Please send to libstdc++ ml if you have any ideas.
