zturner created this revision.
zturner added reviewers: clayborg, jingham, labath.
Herald added subscribers: JDevlieghere, aprantl.

When we evaluate a variable name as part of an expression, we run a regex 
against it to break it apart.  The intent seems to be to extract the 
main variable name (i.e. the part of the user input that is both necessary and 
sufficient to find the record in debug info) and leave the rest of the 
expression alone (for example, the variable could be an instance of a class, and 
you have written `variable.member`).

But I believe the current regex to be too restrictive.  For example, it 
disallows variable templates: if the user writes `variable<int>`, 
we strip off the `<int>`, even though it is absolutely necessary to find the 
proper record in the debug info.  It also doesn't allow things like 
`'anonymous namespace'::variable`, which is a valid name under the Microsoft 
ABI.  Nor does it permit spaces, so we couldn't have something like 
`foo<long double>` (assuming we first fixed the template issue).

Rather than try to accurately construct a regex for the set of all possible 
things that constitute a variable name, it seems easier to construct a regex to 
match the things that **do not** constitute a variable name, specifically 
an occurrence of the `.` operator or `->` operator, since that's what ultimately 
marks the beginning of a sub-expression.

So this changes the regex accordingly.
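
To make the difference concrete, here is a minimal sketch that applies both patterns to the same input.  It uses `std::regex` as a stand-in for LLDB's `RegularExpression`, and the helper `SplitVariableExpression` and the two constant names are hypothetical, introduced only for illustration; the pattern strings themselves are copied from the diff.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <utility>

// Pattern strings copied from the diff; the constant names are hypothetical.
const char *const kOldPattern = "^([A-Za-z_:][A-Za-z_0-9:]*)(.*)";
const char *const kNewPattern = "^([^.-]*)(.*)";

// Hypothetical helper (not part of LLDB): splits `expr` into the leading
// variable name (capture 1) and the remaining sub-expression (capture 2).
// Returns an empty name and the whole input if the pattern does not match.
std::pair<std::string, std::string>
SplitVariableExpression(const std::string &expr, const char *pattern) {
  const std::regex re(pattern);
  std::smatch m;
  if (!std::regex_match(expr, m, re))
    return {"", expr};
  // Both patterns have exactly two capture groups plus the full match.
  assert(m.size() == 3);
  return {m[1].str(), m[2].str()};
}
```

Under these assumptions, `SplitVariableExpression("variable<int>.member", kOldPattern)` yields the name `variable` and the remainder `<int>.member`, while the same call with `kNewPattern` yields `variable<int>` and `.member`.  Note that the character class `[^.-]` excludes every `-`, so the first capture also ends at a lone `-`, not only at `->`.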


https://reviews.llvm.org/D54454

Files:
  lldb/source/Symbol/Variable.cpp


Index: lldb/source/Symbol/Variable.cpp
===================================================================
--- lldb/source/Symbol/Variable.cpp
+++ lldb/source/Symbol/Variable.cpp
@@ -383,8 +383,12 @@
   } break;
 
   default: {
-    static RegularExpression g_regex(
-        llvm::StringRef("^([A-Za-z_:][A-Za-z_0-9:]*)(.*)"));
+    // A variable name can be something like foo, foo::bar, foo<int>::bar,
+    // ::foo, foo<long double>::bar, and more.  Rather than trying to construct
+    // a perfect regex, which is almost certainly going to lead to some edge
+    // cases that we don't handle, let's just take everything until the first
+    // . operator or -> operator.
+    static RegularExpression g_regex("^([^.-]*)(.*)");
     RegularExpression::Match regex_match(1);
     std::string variable_name;
     variable_list.Clear();


_______________________________________________
lldb-commits mailing list
lldb-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits
