clayborg added a comment.
I am trying to switch to using llvm::sys::path::remove_dots() and we will see
where we end up. By switching to this it means:
"./foo.c" --> "foo.c"
"./foo/bar.c" --> "foo/bar.c"
This makes is easier to see if a relative path matches another FileSpec since
we don't have to remove the "./" from the beginning of the path.
================
Comment at: source/Breakpoint/BreakpointResolverFileLine.cpp:131
+ while (relative_path.consume_front("."))
+ /* Do nothing. */;
+ }
----------------
I will be removing this after I switch to using llvm::sys::path::remove_dots()
instead of the Normalize() I converted.
================
Comment at: source/Utility/FileSpec.cpp:107
+ for (auto i = path.find('/'); i != llvm::StringRef::npos;
+ i = path.find('/', i + 1)) {
+ const auto nextChar = safeCharAtIndex(path, i+1);
----------------
labath wrote:
> aprantl wrote:
> > clayborg wrote:
> > > clayborg wrote:
> > > > aprantl wrote:
> > > > > clayborg wrote:
> > > > > > I am fine switching to using the llvm functions for removing, this
> > > > > > is only detecting if any normalization needs to happen. If there is
> > > > > > an equivalent LLVM function that will only run through the string
> > > > > > one time to detect all needs for normalization (no multiple passes
> > > > > > looking for "..", then for "." etc like we used to have), I would
> > > > > > be happy to use it, but there doesn't seem to be. We are trying to
> > > > > > avoid having to create a "SmallVectorImpl< char >" for all paths if
> > > > > > they don't need fixing here. This function is just iterating
> > > > > > through an llvm::StringRef just to see if normalization needs to
> > > > > > happen. If it does, then we make a SmallVectorImpl< char > and we
> > > > > > fix the path. We can easily use llvm functions for the fixing part.
> > > > > I see, you want to avoid copying the string when it isn't necessary
> > > > > to do so. IMHO the best way to do this is to add a `bool
> > > > > hasDots(const Twine &)` function to llvm::sys::path and add a `bool
> > > > > remove_dot_dot` parameter to `removeDots()`. Since you have already
> > > > > written the unittests, adding the function to LLVM shouldn't be any
> > > > > extra work (I'll help reviewing), and we'll have 100 LoC less to
> > > > > maintain.
> > > > >
> > > > > This way we can avoid having two functions that perform path
> > > > > normalization that superficially look like they might do the same
> > > > > thing, but a year from now nobody can really tell whether they behave
> > > > > exactly the same, and whether a bug found in one implementation
> > > > > should also be fixed in the other one, etc... .
> > > > I want to know if there are redundant slashes as well. I want something
> > > > like "bool llvm::sys::has(const Twine &, bool dots, bool dot_dots, bool
> > > > redundant_slashes)". But I don't see that getting accepted? I just want
> > > > a "bool llvm:sys::path::contains_redundant_stuff(const Twine &)". Not
> > > > sure on the name though. needs_normalization? can_be_shortened?
> > > > Everything in LLVM right now assumes that dots, dot dots, slashes and
> > > > realpath stuff happen in the same function. Not sure how to break that
> > > > up without ruining the performance of the one and only loop that should
> > > > be happening. I like the current approach since it doesn't require
> > > > chopping up the string into an array and iterating through the array.
> > > Also not sure if the LLVM stuff will try to substitute in the current
> > > working directory for "."?
> > I just read the source code of remove_dots()
> > (http://llvm.org/doxygen/Path_8cpp_source.html#l00699) and if I'm not
> > mistaken, it actually also removes double-separators as a side-effect,
> > since it iterates over the path components of the string as it constructs
> > the copy. It also seems to avoid the copy operation if it isn't necessary.
> >
> > Could you take another look? Perhaps I'm missing something, (or perhaps we
> > can just turn this into a small addition to remove_dots).
> >
> > > Everything in LLVM right now assumes that dots, dot dots, slashes and
> > > realpath stuff happen in the same function.
> > I'm not sure I understand.
> FWIW, this could be implemented in a much simpler way, while still making
> sure we run through the string only once. I'm thinking of something like:
> ```
> bool first = true;
> for(it = llvm::sys::path::begin(path, style), end =
> llvm::sys::path::end(path); it != end; ++it) {
> if (!first && (*it == "." || *it == "..")) return true;
> first = false;
> }
> return false;
> ```
This will all go away, I am just going to call llvm::sys::path::remove_dots()...
================
Comment at: source/Utility/FileSpec.cpp:187
+ continue; // Skip '.' unless it is the first thing
+ }
+ else if (component != "..") {
----------------
Yes we would need to remove this. Again, I will switch to using
llvm::sys::path::remove_dots and we can deal with the fallout there,.
================
Comment at: source/Utility/FileSpec.cpp:406
+ m_directory.Clear();
+ }
}
----------------
> they resolve to the same file in a virtual filesystem, where all referenced
> path components exist and are directories
Normalizing happens after resolving and if we don't resolve a path, we have no
way to know what is a directory and what isn't. We will be setting breakpoints
for remote targets quite a bit in LLDB and ww can't assume or stat anything in
a path. So I would say FileSpec's are equivalent if the relative paths match
all components.
> Now the (c) part would imply that foo/.. should normalize to ./. and not .,
> which is a bit odd, but it is consistent with our stated intention of
> preserving directory information. If we do not want to have ./. here, then we
> need to come up with a different definition of what it means to be
> "normalized".
I guess we could make the m_directory contain "." and m_filename contain
nothing for the "./." case. It doesn 't make sense to have "." in both places.
================
Comment at: source/Utility/FileSpec.cpp:857
+ !IsPathSeparator(components[i].back(), syntax) &&
+ (result.empty() || !IsPathSeparator(result.back(), syntax)))
result += GetPreferredPathSeparator(syntax);
----------------
labath wrote:
> Why is this necessary? `result` can't be empty because we have just appended
> something to it (and we've checked that `components[i]` is non-empty).
Because if you don't check this joining {"/', "foo.txt"} will result in
"//foo.txt" which is wrong,.
https://reviews.llvm.org/D45977
_______________________________________________
lldb-commits mailing list
[email protected]
http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits