On Sun, Jan 14, 2018 at 4:33 PM, Tong Sun <suntong...@gmail.com> wrote:
> Not being able to do that, I have to save all the Token() info to different
> variables, then pass all those variables to my function separately, instead
> of passing merely a single tokenizer.

Instead of using different variables, I'd just pass the Token itself around. It should already contain everything you need.

For example, if t is a variable of type html.Token, and t.Type is html.StartTagToken, html.EndTagToken or html.SelfClosingTagToken, then t.Data is the tag name, such as "script". It's a string-typed field, not a method that returns a string, so there are no restrictions like those on calling a Tokenizer method multiple times.

As an optional, advanced-level comment, t.DataAtom will also be a hashed uint32 value of that string, for well-known strings. For example, the uint32 constant atom.Script (from the golang.org/x/net/html/atom package) corresponds to a "script" tag. Comparing uint32 values is noticeably faster than comparing string values, if you're doing a *lot* of tag name comparisons. For example, I can't remember the exact number, but IIRC, the x/net/html parser (which builds a DOM tree from the token stream) got a 10% or 30% speed boost by comparing atoms instead of strings.

On Sun, Jan 14, 2018 at 4:53 PM, Tong Sun <suntong...@gmail.com> wrote:
> Actually, found out I only called Token() once:
>
> https://play.golang.org/p/HtevQ3RbQsi
>
> reader := strings.NewReader("<div class=\"hello\">SomeText</div>")
> tokenizer := html.NewTokenizer(reader)
> tokenizer.Next()
> fmt.Println(tokenizer.TagName())
> fmt.Println(tokenizer.Token())

Again, from the "EBNF" in the package documentation:

----
In EBNF notation, the valid call sequence per token is:

Next {Raw} [ Token | Text | TagName {TagAttr} ]
----

If you're not familiar with EBNF (https://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_form), this means that you can call the Token method (once), or call the TagName method (once), but not call both. The snippet above calls TagName and then Token for the same token, which that grammar does not allow.

> I.e., what I used in the loop the standard way was `TagName()`:
>
> case html.StartTagToken, html.EndTagToken:
>     tn, _ := z.TagName()
>     tag := strings.ToLower(string(tn))

The strings.ToLower call should be unnecessary.
As https://godoc.org/golang.org/x/net/html#Tokenizer.TagName says, "TagName returns the *lower-cased* name of a tag token" (emphasis added).

> But by the time I need to call Token() within my function (printElmt) to get
> the full token info, it's already impossible.

As I said earlier in this message, just call Token (the method) once, at the top of the loop, switch on the Type field of the Token it returns, and pass that Token (the type) around to whatever functions need it.