On Sun, Jan 14, 2018 at 4:33 PM, Tong Sun <suntong...@gmail.com> wrote:
> Not being able to do that, I have to save all the Token() info to different
> variables, then pass all those variables to my function separately, instead
> of passing merely a single tokenizer.

Instead of using different variables, I'd just pass the Token itself
around. It should already contain everything you need. For example, if
t is a variable of type html.Token, and t.Type is html.StartTagToken,
html.EndTagToken or html.SelfClosingTagToken, then t.Data is the tag
name, such as "script". It's a string-typed field, not a method that
returns a string, so there are no restrictions like those on calling
Tokenizer methods multiple times.

As an optional, advanced-level comment, t.DataAtom will also be a
hashed uint32 value of that string, for well-known strings. For
example, the uint32 constant atom.Script (from the
golang.org/x/net/html/atom package) corresponds to a "script" tag.
Comparing uint32 values is noticeably faster than comparing string
values, if you're doing a *lot* of tag name comparisons. For example,
I can't remember the exact number, but IIRC, the x/net/html parser
(which builds a DOM tree from the token stream) got a 10% or 30% speed
boost by comparing atoms instead of strings.

On Sun, Jan 14, 2018 at 4:53 PM, Tong Sun <suntong...@gmail.com> wrote:
>  Actually, found out I only called Token() once:
> https://play.golang.org/p/HtevQ3RbQsi
>  reader := strings.NewReader("<div class=\"hello\">SomeText</div>")
>  tokenizer := html.NewTokenizer(reader)
>  tokenizer.Next()
>  fmt.Println(tokenizer.TagName())
>  fmt.Println(tokenizer.Token())

Again, from the "EBNF" in the package documentation:

In EBNF notation, the valid call sequence per token is:

Next {Raw} [ Token | Text | TagName {TagAttr} ]

If you're not familiar with EBNF, this means that after each call to
Next you can call the Token method (once), or call the TagName method
(once, optionally followed by TagAttr calls), but not both.

> I.e., what I used in the loop the standard way was `TagName()`:
>  case html.StartTagToken, html.EndTagToken:
>  tn, _ := z.TagName()
>  tag := strings.ToLower(string(tn))

The strings.ToLower call should be unnecessary. As
https://godoc.org/golang.org/x/net/html#Tokenizer.TagName says,
"TagName returns the *lower-cased* name of a tag token" (emphasis
added).

> But by the time I need to call Token() within my function (printElmt) to get
> the full token info, it's already impossible.

As I said earlier in this message, just call Token (the method) once,
at the top of the loop, and switch on its Type field, pass the Token
(the type) around, etc.
