Hello all,
you've probably seen the emails about the NASA CPF server changing to HTTPS. I have now adapted our Python script to download the CPFs from the new server. I used the requests package rather than curl, as suggested by the CDDIS examples. It's a bit tricky, since they decided to use a login system which is not very well suited to automated downloads. Anyway, it works now. I post the code here for your convenience.
A few notes on the code. It needs Python 3 with the requests package installed. You need to obtain Earthdata Login credentials first and then insert them into the code (the username and password variables in download_CPFs_ssl). The code downloads the V1 CPFs; if you want the new ones, you have to change the target folder in the URL. Please note that the script will delete all existing files in the local target folder. If you use the function in a graphical interface, your callback function can update a progress bar etc.; if it returns False, the remaining downloads will be cancelled. The function returns the number of downloaded CPFs.
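By the way, if you'd rather not keep the credentials in the script itself, one possible alternative (just a sketch using the standard library, not part of the script below) is to read them from a ~/.netrc file:

import netrc

# Sketch: read the Earthdata credentials from ~/.netrc instead of hard-coding
# them. Assumes ~/.netrc exists and contains a line like:
#   machine urs.earthdata.nasa.gov login MYUSER password MYPASS
auth = netrc.netrc().authenticators("urs.earthdata.nasa.gov")
if auth is not None:
    username, password = auth[0], auth[2]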
The script is not entirely by me; it also uses code I found on the Earthdata site.
Comments and cheers welcome, as always.
Daniel
import os
from glob import glob
import logging

import requests


class SessionWithHeaderRedirection(requests.Session):
    AUTH_HOST = 'urs.earthdata.nasa.gov'

    def __init__(self, username, password):
        super().__init__()
        self.auth = (username, password)

    # Overrides from the library to keep headers when redirected to or from
    # the NASA auth host.
    def rebuild_auth(self, prepared_request, response):
        headers = prepared_request.headers
        url = prepared_request.url
        if 'Authorization' in headers:
            original_parsed = requests.utils.urlparse(response.request.url)
            redirect_parsed = requests.utils.urlparse(url)
            if (original_parsed.hostname != redirect_parsed.hostname) and \
                    redirect_parsed.hostname != self.AUTH_HOST and \
                    original_parsed.hostname != self.AUTH_HOST:
                del headers['Authorization']
        return


def download_CPFs_ssl(local_dir, update_callback):
    # remove old files
    if not os.path.isdir(local_dir):
        os.mkdir(local_dir)
    filelist = glob(os.path.join(local_dir, "*"))
    for f in filelist:
        os.remove(f)

    # define URL and credentials
    url = "https://cddis.nasa.gov/archive/slr/cpf_predicts/current/"
    username = ""
    password = ""

    # request the directory listing; the "*?list" query makes the archive
    # return a plain-text listing with one "<filename> <size>" pair per line
    session = SessionWithHeaderRedirection(username, password)
    response = session.get(url + "*?list")

    # check if the response is okay
    if response.status_code != requests.codes.ok:
        logging.error("Could not connect to CPF server. HTTP status: %d",
                      response.status_code)
        return False

    # parse the response and make a list of filenames
    filenames = []
    for line in response.text.split('\n'):
        if line.startswith("#"):   # comment lines
            continue
        if line.strip() == "":     # empty lines
            continue
        filename, size = line.split()
        filenames.append(filename)

    # download each file and save it
    excl_list = ["MD5SUMS", "SHA512SUMS", "index.html"]
    n_downloaded = 0
    for i, filename in enumerate(filenames):
        if filename in excl_list:
            continue
        response = session.get(url + filename)
        with open(os.path.join(local_dir, filename), "wb") as f_out:
            f_out.write(response.content)
        n_downloaded += 1
        # the callback gets the progress in percent and can cancel the
        # remaining downloads by returning False
        keep_running = update_callback(100. * (i + 1) / len(filenames))
        if not keep_running:
            break
    return n_downloaded


if __name__ == "__main__":
    def print_progress(p):
        print(p)
        return True

    download_CPFs_ssl("./CPF/", print_progress)