Re: [racket-users] xml library clarification - """ symbol parsing

Kira Fri, 22 Nov 2019 19:15:55 -0800

I just cannot work out how to solve XML-related problems with this 
library.
The documentation seems to lack examples and does not describe what each 
function is for.
From the bare signatures alone I fail to imagine their practical use.


For example, why does the (source) struct exist, and how can I use it?

Why are there two functions, (read-xml/document [in]) and 
(read-xml/element [in]), and how can I use them?
As I mentioned earlier, my guess was that I could process XML sequentially 
by using them in tandem, but it seems this guess was wrong.

What is the pattern for navigating inside an (element) structure? For 
example, getting to ROOT->tagA->tagB?
And I cannot go back from tagB to tagA, right?
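
To make the navigation question concrete, here is a minimal sketch of walking ROOT->A->B1 through the struct accessors. The helper children-named is a hypothetical name of mine, not part of the xml library, and the input is a whitespace-free version of the sample document. As far as I can tell, the element struct has no parent field, so going back up from B to A would mean keeping the parent around yourself.

```racket
#lang racket/base
(require xml)

;; Hypothetical helper (not part of the xml library): the direct child
;; elements of `parent` whose tag name is `name`.
(define (children-named parent name)
  (filter (lambda (c) (and (element? c) (eq? (element-name c) name)))
          (element-content parent)))

(define doc
  (read-xml
   (open-input-string
    "<ROOT><A><B1>test</B1></A><A><B1>test2</B1></A></ROOT>")))

;; document-element returns the root element struct of the document.
(define root (document-element doc))

;; Walk ROOT -> A -> B1 and collect the text of every B1.
(define b1-texts
  (for*/list ([a (in-list (children-named root 'A))]
              [b (in-list (children-named a 'B1))])
    (apply string-append
           (map pcdata-string (filter pcdata? (element-content b))))))
```

Here b1-texts holds one string per B1 element, in document order.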

Let's assume this XML:
<ROOT>
  <A>
    <B1>test</B1>
    <B2>&quot;test qoute&quot;</B2>
  </A>
  <A>
    <B1>test2</B1>
    <B2>&quot;test &quot;qoute2&quot;&quot;</B2>
  </A>
</ROOT>




I want to transform this into a list of structs (data b1 b2).

And perhaps do this sequentially, in case I need to parse 10 million 
A tags.

I tried to use (se-path*/list), like this:
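(require xml xml/path)
;; The input is one logical line; it is split here only to fit the mail width.
(define rawxml
  (string-append
   "<ROOT><A><B1>test</B1><B2>&quot;test qoute&quot;</B2></A>"
   "<A><B1>test2</B1><B2>&quot;test &quot;qoute2&quot;&quot;</B2></A></ROOT>"))
(define xexpr (string->xexpr rawxml))
(se-path*/list '(A) xexpr)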




But this gives me a flat list:
'((B1 () "test") (B2 () "\"" "test qoute" "\"") (B1 () "test2") (B2 () "\"" 
"test " "\"" "qoute2" "\"" "\""))


Nothing indicates which A tag each piece of content belongs to, 
so I cannot reason about it.

And (se-path*/list '(A B2) xexpr) gives me:
'("\"" "test qoute" "\"" "\"" "test " "\"" "qoute2" "\"" "\"")


So this is even worse.
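
One way to keep the grouping, sketched under the assumption that the X-expression comes from string->xexpr (so every element has an explicit attribute list in second position): pick out the A children of ROOT first, then run se-path*/list on each A separately. The names a-elements and records are only illustrative.

```racket
#lang racket/base
(require xml xml/path)

(define xexpr
  (string->xexpr
   (string-append
    "<ROOT><A><B1>test</B1><B2>&quot;test qoute&quot;</B2></A>"
    "<A><B1>test2</B1><B2>&quot;test &quot;qoute2&quot;&quot;</B2></A></ROOT>")))

;; An element here has the shape (tag attrs child ...), so cddr yields the
;; children of ROOT. Assumption: the attribute list is always present.
(define a-elements
  (filter (lambda (c) (and (pair? c) (eq? (car c) 'A)))
          (cddr xexpr)))

;; One entry per A: query each A on its own, then join the text fragments
;; that the parser produced around the &quot; entities.
(define records
  (for/list ([a (in-list a-elements)])
    (list (apply string-append (se-path*/list '(B1) a))
          (apply string-append (se-path*/list '(B2) a)))))
```

Each entry of records is then a two-element list holding the B1 text and the rejoined B2 text of one A tag.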

It would be great if I could get a single list from the beginning, because 
I have 20 million of these records.

So now I have moved to a (match) solution.
For example:
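(match xexpr
  ;; __1 after a sub-pattern means "one or more repetitions" of it,
  ;; so b1 and b2 are each bound to a list with one entry per A
  [(list 'ROOT '()
         (list 'A '()
               (list 'B1 '() b1)
               (list 'B2 '() b2 __1)) __1) (list b1 b2)]
  [_ 'empty])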




Now I am getting somewhere, but I get two separate lists again, and I am 
not sure how memory-efficient match is (I assume it is efficient).
Can I get one list of structs in this scenario, without manually looping 
over the A tags? And how efficient are nested (match) forms?
I am also beginning to feel that parsing the raw XML "by hand" would not be 
much harder than the solution I am arriving at now. Perhaps this is due to 
the intrinsic nature of XML itself?
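
For the "one list of structs" question, a possible sketch: let match bind the per-A lists with ellipsis patterns and zip them into structs in the clause body. The struct data and the names b2-parts and records are illustrative, and the pattern assumes every A holds exactly one B1 with a single text child followed by one B2. It still requires the whole document to be parsed into memory first, so it does not address the sequential-processing concern.

```racket
#lang racket/base
(require racket/match xml)

;; #:transparent so that instances print readably and compare with equal?
(struct data (b1 b2) #:transparent)

(define xexpr
  (string->xexpr
   (string-append
    "<ROOT><A><B1>test</B1><B2>&quot;test qoute&quot;</B2></A>"
    "<A><B1>test2</B1><B2>&quot;test &quot;qoute2&quot;&quot;</B2></A></ROOT>")))

(define records
  (match xexpr
    ;; The outer ... repeats over the A elements, so b1 is bound to a list
    ;; of strings and b2-parts to a list of lists of text fragments.
    [(list 'ROOT '()
           (list 'A '()
                 (list 'B1 '() b1)
                 (list 'B2 '() b2-parts ...))
           ...)
     (for/list ([x (in-list b1)]
                [parts (in-list b2-parts)])
       (data x (apply string-append parts)))]
    ;; Anything with a different shape yields no records.
    [_ '()]))
```

With the sample input, records is a list of two data instances, one per A tag.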



On Friday, 22 November 2019 at 17:10:19 UTC+2, Sage Gerard wrote:
>
> I'm interested in this quote:
>
>  [...] creating a huge problems with even simple XML parsing. (I am 
> basically battling XML lib all day already to do most simple tasks)
>
>
> I think that when you asked about why the xml collection behaves the way 
> it does, the conversation turned away from your experience.
>
> Could you elaborate on the specifics of what you are trying to do in one 
> of your projects so that we can see your pain points in context?
>
> *~slg*
>

