Eis-D-Z commented on code in PR #1715:
URL: https://github.com/apache/libcloud/pull/1715#discussion_r912810371


##########
contrib/scrape-ec2-prices.py:
##########
@@ -161,82 +48,155 @@
     "extra-large",
 ]
 
-RE_NUMERIC_OTHER = re.compile(r"(?:([0-9]+)|([-A-Z_a-z]+)|([^-0-9A-Z_a-z]+))")
-
-BASE_PATH = os.path.dirname(os.path.abspath(__file__))
-PRICING_FILE_PATH = os.path.join(BASE_PATH, "../libcloud/data/pricing.json")
-PRICING_FILE_PATH = os.path.abspath(PRICING_FILE_PATH)
-
 
+def download_json():
+    response = requests.get(URL, stream=True)
+    try:
+        return open(TEMPFILE, "r")
+    except IOError:
+        with open(TEMPFILE, "wb") as fo:
+            for chunk in response.iter_content(chunk_size=2**20):
+                if chunk:
+                    fo.write(chunk)
+    return open(TEMPFILE, "r")
+
+
+def get_json():
+    try:
+        return open(TEMPFILE, "r")
+    except IOError:
+        return download_json()
+
+
+# Prices and sizes are in different dicts and categorized by sku
+def get_all_prices():
+    # return variable
+    # prices = {sku : {price: int, unit: string}}
+    prices = {}
+    current_sku = ""
+    current_rate_code = ""
+    amazonEC2_offer_code = "JRTCKXETXF"

Review Comment:
   The file has two top-level keys, "products" and "terms". "products" holds the
technical details of each sku, and "terms" holds the pricing data. A simplified
sample entry looks like this:
   { "terms": { "OnDemand": { sku: { sku: { "JRTCKXETXF": { "priceDimensions": {
rateCode: { "unit": "$", "value": "0.17" } } } } } } } }
   
   Fortunately the ijson parser was already in use, because it makes the
"rateCode" key easy to find: ijson walks the file token by token and reports
whether each item it encounters is a key or a value. Without it this would be
harder. I guess Amazon has a way of deriving the rateCode from the sku or some
other data, but I haven't found the relation.
   
   The "JRTCKXETXF" code is the same for all EC2 items, so I named it
amazonEC2_offer_code.
   
   So the code parses the "terms" dict line by line. When it encounters a dict
key under terms["OnDemand"], it sets that key as the current_sku. It then
searches the following lines for the item's rateCode, and finally reads the
price and the unit for that sku. Those two values are saved under the sku in
the dict to be returned: {sku1: {"price": "xx", "unit": "$"}, sku2: {...}}
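
   The steps above can be sketched roughly as follows. This is not the code from
the PR: it uses only the standard library, and the helper iter_events is a
hypothetical stand-in that produces (prefix, event, value) tuples loosely
modelled on what ijson emits. The SAMPLE data follows the simplified entry
shown earlier rather than the real pricing file, and the sku and rate code
names in it are made up.

```python
# Simplified, made-up sample shaped like the entry described in this comment.
SAMPLE = {
    "terms": {
        "OnDemand": {
            "SKU1": {"SKU1": {"JRTCKXETXF": {"priceDimensions": {
                "RATE1": {"unit": "$", "value": "0.17"}}}}},
            "SKU2": {"SKU2": {"JRTCKXETXF": {"priceDimensions": {
                "RATE2": {"unit": "$", "value": "0.34"}}}}},
        }
    }
}

OFFER_CODE = "JRTCKXETXF"  # assumed to be shared by all EC2 entries


def iter_events(node, prefix=""):
    """Hypothetical stand-in for a streaming parser: yields
    (prefix, event, value) tuples for nested dicts and string leaves."""
    if isinstance(node, dict):
        yield prefix, "start_map", None
        for key, child in node.items():
            yield prefix, "map_key", key
            child_prefix = f"{prefix}.{key}" if prefix else key
            yield from iter_events(child, child_prefix)
        yield prefix, "end_map", None
    else:
        yield prefix, "string", node


def collect_prices(events):
    """Build {sku: {"price": ..., "unit": ...}} from the event stream."""
    prices = {}
    current_sku = ""
    current_rate_code = ""
    for prefix, event, value in events:
        if event == "map_key" and prefix == "terms.OnDemand":
            # A key directly under terms["OnDemand"] starts a new sku.
            current_sku = value
            current_rate_code = ""
            prices[current_sku] = {}
        elif (event == "map_key" and current_sku
              and prefix.endswith(OFFER_CODE + ".priceDimensions")):
            # A key directly under priceDimensions is the rate code.
            current_rate_code = value
        elif event == "string" and current_rate_code:
            # Leaves below the rate code carry the unit and the price.
            leaf = prefix.rsplit(".", 1)[-1]
            if leaf == "unit":
                prices[current_sku]["unit"] = value
            elif leaf == "value":
                prices[current_sku]["price"] = value
    return prices


prices = collect_prices(iter_events(SAMPLE))
```

   With the sample above, prices maps each sku to its "price" and "unit"
strings. A real streaming parser would supply the events directly from the
file instead of from an in-memory dict.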



