Re: Obtain the query interface url of BCS server.

DFS Tue, 13 Sep 2022 10:07:40 -0700

On 9/13/2022 3:46 AM, [email protected] wrote:

On Tuesday, September 13, 2022 at 4:20:12 AM UTC+8, DFS wrote:

On 9/12/2022 5:00 AM, [email protected] wrote:

I want to run the query from within a script, based on the interface here [1]. For this
purpose, I need the underlying URL each button posts to, say, the URL corresponding to the
"ITA Settings" button, so that I can build the corresponding query URL and issue
the query from the script.


However, I could not find the rules that map these buttons to their
corresponding URLs. Any hints for achieving this?

[1] 
https://www.cryst.ehu.es/cgi-bin/cryst/programs/nph-getgen?list=new&what=gen&gnum=10

Regards,
Zhao

You didn't say what you want to query. Are you trying to download
entire sections of the Bilbao Crystallographic Server?


I am engaged in some related research and need some specific data used by the BCS
server.


What specific data?

Is it available elsewhere?

Maybe the admins will give you access to the data.


I don't think they will provide such convenience to researchers who have no 
cooperative relationship with them.

You can try. Tell the admins what data you want, and ask them for the easiest way to get it.

* this link: https://www.cryst.ehu.es/cgi-bin/cryst/programs/nph-getgen
brings up the table of space group symbols.

* choose say #7: Pc

* now click ITA Settings, then choose the last entry "P c 1 1" and it
loads:

https://www.cryst.ehu.es/cgi-bin/cryst/programs//nph-trgen?gnum=007&what=gp&trmat=b,-a-c,c&unconv=P%20c%201%201&from=ita
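The query string of that URL already names the search fields. A minimal sketch using only the standard library's urllib.parse to split it into field/value pairs (note that %20 decodes back to a space):

```python
from urllib.parse import parse_qs, urlsplit

# URL copied from the ITA Settings example above
url = ("https://www.cryst.ehu.es/cgi-bin/cryst/programs//nph-trgen"
       "?gnum=007&what=gp&trmat=b,-a-c,c&unconv=P%20c%201%201&from=ita")

parts = urlsplit(url)
# parse_qs maps each field name to a list of its decoded values;
# every field appears once here, so keep only the first value
fields = {name: values[0] for name, values in parse_qs(parts.query).items()}

print(parts.path)
for name, value in fields.items():
    print(f"{name:7} = {value}")
```

The printed names (gnum, what, trmat, unconv, from) are the fields a script would have to vary.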


Not only that, but I want to obtain all such URLs programmatically!

You might be able to fool around with that URL and substitute values and
get back the data you want (in HTML) via Python. Do you really want
HTML results?

Hit Ctrl+U to see the source HTML of a webpage.

Right-click or hit Ctrl+Shift+C to inspect the individual elements
of the page.
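If a button is an ordinary HTML form, the page source names its target in the form's action attribute and the submitted fields in its input elements, which is the button-to-URL mapping being asked about. A minimal standard-library sketch that collects them; the HTML string is a made-up stand-in, not the real BCS page, and buttons built by JavaScript will not show up this way:

```python
from html.parser import HTMLParser

class FormFinder(HTMLParser):
    """Collect each <form>'s action, method and named <input> fields."""

    def __init__(self):
        super().__init__()
        self.forms = []
        self._in_form = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "form":
            self._in_form = True
            self.forms.append({"action": a.get("action"),
                               "method": (a.get("method") or "get").lower(),
                               "fields": {}})
        elif tag == "input" and self._in_form and a.get("name"):
            # unnamed inputs (e.g. a plain submit button) send nothing, so skip them
            self.forms[-1]["fields"][a["name"]] = a.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self._in_form = False

# hypothetical page source, for illustration only
sample = """
<form action="nph-trgen" method="POST">
  <input type="hidden" name="gnum" value="007">
  <input type="hidden" name="what" value="gp">
  <input type="submit" value="ITA Settings">
</form>
"""

finder = FormFinder()
finder.feed(sample)
print(finder.forms)
```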


For batch operations, all these manual methods are inefficient.

Yes, but I don't think you'll be able to retrieve the URLs programmatically. The JavaScript code doesn't put them in the HTML result, except for that one I showed you, which seems like a mistake on their part.

So you'll have to figure out the search fields, and your Python program will have to cycle through the search values:


Sample from above
gnum   = 007
what   = gp
trmat  = b,-a-c,c
unconv = P c 1 1
from   = ita

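from urllib.parse import quote

# values from the sample above
findGnum, findWhat, findTrmat = "007", "gp", "b,-a-c,c"
findUnconv, findFrom = "P c 1 1", "ita"

wBase   = "https://www.cryst.ehu.es/cgi-bin/cryst/programs//nph-trgen"
wGnum   = "?gnum="   + findGnum
wWhat   = "&what="   + findWhat
wTrmat  = "&trmat="  + findTrmat
wUnconv = "&unconv=" + quote(findUnconv)   # spaces must be sent as %20
wFrom   = "&from="   + findFrom
webpage = wBase + wGnum + wWhat + wTrmat + wUnconv + wFrom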
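Cycling through the search values can be done with a generator that yields one URL per value. A sketch that walks the 230 space-group numbers through the nph-getgen URL from the original question, keeping its other fields fixed; getgen_urls is a hypothetical name, and whether every number returns useful data is untested:

```python
from typing import Iterator
from urllib.parse import urlencode

GETGEN = "https://www.cryst.ehu.es/cgi-bin/cryst/programs/nph-getgen"

def getgen_urls(first: int = 1, last: int = 230) -> Iterator[str]:
    """Yield one nph-getgen URL per space-group number (there are 230)."""
    for gnum in range(first, last + 1):
        # same fields and order as the URL in the original question
        query = urlencode({"list": "new", "what": "gen", "gnum": gnum})
        yield f"{GETGEN}?{query}"

urls = list(getgen_urls())
print(len(urls))   # 230
print(urls[9])     # the gnum=10 URL from the question
```

When the URLs are actually fetched, a short pause between requests keeps the load on the server reasonable.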

Then if that returns a hit, you'll have to parse the resulting HTML and extract the exact data you want.
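How to extract the data depends entirely on the page layout. As one hedged example, assuming the wanted values sit in an HTML table, this standard-library parser collects the text of each row's cells; the sample table is invented for illustration:

```python
from html.parser import HTMLParser

class TableRows(HTMLParser):
    """Collect the text of <td>/<th> cells, one list per <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._cell = None   # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th") and self.rows:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # collapse internal whitespace/newlines to single spaces
            self.rows[-1].append(" ".join("".join(self._cell).split()))
            self._cell = None

# invented sample table, for illustration only
sample = """
<table>
  <tr><th>No.</th><th>Coordinates</th></tr>
  <tr><td>1</td><td>x, y, z</td></tr>
  <tr><td>2</td><td>x, -y, z+1/2</td></tr>
</table>
"""

parser = TableRows()
parser.feed(sample)
print(parser.rows)
```

This sketch assumes every cell has a closing tag; real pages that omit them would need an extra flush when the next cell or row starts.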




I did something similar a while back using the requests and lxml libraries:
----------------------------------------------------------------
import sys

import requests
from lxml import html

# hypothetical placeholder search values; the full script sets these elsewhere
keyw, cityzip, state, miles = "plumber", "10001", "NY", 10
passNbr = 1          # which pass over the results pages this is

#build url
wBase    = "http://www.usdirectory.com"
wForm    = "/ypr.aspx?fromform=qsearch"
wKeyw    = "&qhqn=" + keyw
wCityZip = "&qc="   + cityzip
wState   = "&qs="   + state
wDist    = "&rg="   + str(miles)
wSort    = "&sb=a2z"  #sort alpha
wPage    = "&ap="   #used with the results page number
webpage  = wBase + wForm + wKeyw + wCityZip + wState + wDist

#open URL
page     = requests.get(webpage)
tree     = html.fromstring(page.content)

#no matches
matches = tree.xpath('//strong/text()')
if passNbr == 1 and ("No results were found" in str(matches)):
    print("No results found for that search")
    sys.exit(0)
----------------------------------------------------------------
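The wPage field above hints at walking through several result pages. A sketch of that loop, assuming pages are numbered from 1 and that a page past the last one contains the same "No results were found" text; result_pages, fetch and fake_get are hypothetical names, and the fetcher is passed in so the loop can be tried offline:

```python
from typing import Callable, Iterator, Tuple
from urllib.request import urlopen

NO_RESULTS = "No results were found"   # marker text checked in the script above

def fetch(url: str) -> str:
    """Download a page as text (not called in the offline demo below)."""
    with urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")

def result_pages(search_url: str, get: Callable[[str], str] = fetch,
                 max_pages: int = 100) -> Iterator[Tuple[int, str]]:
    """Yield (page number, HTML) for successive '&ap=' pages until none are left."""
    for page in range(1, max_pages + 1):
        body = get(search_url + "&ap=" + str(page))
        if NO_RESULTS in body:
            break
        yield page, body

# offline demo: a fake fetcher that has two pages of results
def fake_get(url: str) -> str:
    page = int(url.rsplit("&ap=", 1)[1])
    return "<p>listing</p>" if page <= 2 else "<strong>No results were found</strong>"

print([n for n, _ in result_pages("http://example.invalid/ypr.aspx?x=1", fake_get)])
# [1, 2]
```

The max_pages cap keeps the loop from running forever if the end-of-results marker never appears.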



2.x code file: https://file.io/VdptORSKh5CN

Best Regards,
Zhao


--
https://mail.python.org/mailman/listinfo/python-list
