Author Topic: CPF Download from new https server (Read 25357 times)

danielhampf · « **on:** January 16, 2020, 09:28:24 AM »

Hello all,
you've probably seen the emails about the NASA CPF server changing to https. I have now adapted our Python script to download the CPFs from the new server. I used the requests package rather than curl as suggested by the CDDIS examples. It's a bit tricky since they decided to use a login system that is not well suited to automated downloads. Anyway, it works now. I am posting the code here for your convenience.
A few notes on the code. It is written for Python 3 and needs the requests package installed. You need to obtain Earthdata login credentials first and then insert them in the code (the username and password variables in download_CPFs_ssl). The code will download the V1 CPFs; if you want the new ones, you have to change the target folder. Please note that the script will delete all existing files in the local target folder. If you use the function in a graphical interface, your callback function can update a progress bar etc. If the callback returns False, the remaining downloads will be cancelled. The function will return the number of downloaded CPFs.
The script is not entirely mine; it also uses code I found on the Earthdata site.
Comments and cheers welcome, as always.
Daniel

import logging
import os
from glob import glob

import requests

log = logging.getLogger(__name__)


class SessionWithHeaderRedirection(requests.Session):
    AUTH_HOST = 'urs.earthdata.nasa.gov'

    def __init__(self, username, password):
        super().__init__()
        self.auth = (username, password)

    # Overrides from the library to keep headers when redirected to or from
    # the NASA auth host.
    def rebuild_auth(self, prepared_request, response):
        headers = prepared_request.headers
        url = prepared_request.url

        if 'Authorization' in headers:
            original_parsed = requests.utils.urlparse(response.request.url)
            redirect_parsed = requests.utils.urlparse(url)

            if (original_parsed.hostname != redirect_parsed.hostname) and \
                    redirect_parsed.hostname != self.AUTH_HOST and \
                    original_parsed.hostname != self.AUTH_HOST:
                del headers['Authorization']
        return


def download_CPFs_ssl(local_dir, update_callback):
    # remove old files (everything in local_dir is deleted!)
    if not os.path.isdir(local_dir):
        os.mkdir(local_dir)
    filelist = glob(os.path.join(local_dir, "*"))
    for f in filelist:
        os.remove(f)

    # define urls and credentials
    url = "https://cddis.nasa.gov/archive/slr/cpf_predicts/current/"
    username = ""  # insert your Earthdata login here
    password = ""

    # make the request to the web site to get filenames
    session = SessionWithHeaderRedirection(username, password)
    response = session.get(url + "*?list")

    # check if response is okay (compare by value, not by identity)
    if response.status_code != requests.codes.ok:
        log.error("Could not connect to CPF server. HTTP status code: %d",
                  response.status_code)
        return False

    # parse the response and make list of filenames,
    # skipping checksum and index files
    excl_list = ["MD5SUMS", "SHA512SUMS", "index.html"]
    filenames = []
    for line in response.text.splitlines():
        line = line.strip()
        if line == "" or line.startswith("#"):  # empty and comment lines
            continue
        filename = line.split()[0]  # each line is "<filename> <size>"
        if filename not in excl_list:
            filenames.append(filename)

    # download each file and save it
    n_downloaded = 0
    for i, filename in enumerate(filenames):
        response = session.get(url + filename)
        if response.status_code != requests.codes.ok:
            log.warning("Could not download %s. HTTP status code: %d",
                        filename, response.status_code)
            continue
        with open(os.path.join(local_dir, filename), "wb") as f_out:
            f_out.write(response.content)
        n_downloaded += 1
        # report progress in percent; the callback returns False to cancel
        keep_running = update_callback(100. * (i + 1) / len(filenames))
        if not keep_running:
            break
    return n_downloaded


if __name__ == "__main__":
    def print_progress(p):
        print(p)
        return True

    download_CPFs_ssl("./CPF/", print_progress)
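For use in a graphical interface, the callback contract described in the notes (receive a percentage, return False to cancel) could be wrapped roughly as follows. This is only a sketch: make_progress_callback, cancel_event and report are hypothetical names that are not part of the posted script, and a real GUI would replace print with its own progress bar update.

```python
import threading


def make_progress_callback(cancel_event, report=print):
    """Build a callback for download_CPFs_ssl (hypothetical helper).

    cancel_event: a threading.Event that another thread (for example a
                  "Cancel" button handler) sets to stop the download.
    report:       any callable taking one string, e.g. a status bar update.
    """
    def callback(percent):
        # Show the progress value passed in by the downloader.
        report("CPF download: %.0f %%" % percent)
        # Returning False tells the downloader to stop after this file.
        return not cancel_event.is_set()

    return callback
```

With this helper, the download would be started as download_CPFs_ssl("./CPF/", make_progress_callback(cancel)) in a worker thread, and the Cancel button would simply call cancel.set().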

Toshimichi Otsubo · « **Reply #1 on:** February 06, 2020, 11:08:14 AM »

Hi Daniel and all,

I tried a different way - their "secondary" option ftp-ssl seemed easier in our environment.

Here is how I do it using 'lftp' (a command-line FTP client):

$ lftp -f cddis-cpf.lftp

where cddis-cpf.lftp (or any file name) is a text file like:
---

open -d -u anonymous,<YOUR EMAIL ADDRESS> -e 'set ftp:ssl-force true' gdc.cddis.eosdis.nasa.gov
set xfer:clobber true
cd /pub/slr/cpf_predicts/current
lcd <YOUR LOCAL DIRECTORY>
mirror -e -x MD5SUMS -x SHA512SUMS
exit

---

(Replace the two "< >" parts.)

Then, the local directory will contain the same files as the "/pub/slr/cpf_predicts/current" directory of CDDIS. The lftp mirror command downloads only newer files, so subsequent runs will be quicker.

Toshi
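
A similar "only what changed" behaviour could be approximated in the Python script by comparing the server listing with the local directory before downloading. The sketch below is not what lftp does internally; it only compares file sizes, which is a weaker check. It assumes that each line of the "*?list" response is "filename size", as the parsing in Daniel's script implies. parse_listing, plan_sync and EXCLUDED are hypothetical names.

```python
import os

# Files in the server directory that are not CPFs.
EXCLUDED = {"MD5SUMS", "SHA512SUMS", "index.html"}


def parse_listing(text):
    """Parse a '*?list' response into {filename: size_in_bytes}.

    Assumes "<filename> <size>" per line; comment lines, empty lines
    and lines that do not fit this pattern are skipped.
    """
    sizes = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue
        if len(fields) < 2 or not fields[1].isdigit():
            continue
        if fields[0] in EXCLUDED:
            continue
        sizes[fields[0]] = int(fields[1])
    return sizes


def plan_sync(remote_sizes, local_dir):
    """Return (to_download, to_delete) as sorted lists of filenames.

    A file is downloaded if it is missing locally or its size differs;
    a local file is deleted if the server no longer lists it.
    """
    local = {}
    if os.path.isdir(local_dir):
        for name in os.listdir(local_dir):
            path = os.path.join(local_dir, name)
            if os.path.isfile(path):
                local[name] = os.path.getsize(path)
    to_download = sorted(n for n, s in remote_sizes.items()
                         if local.get(n) != s)
    to_delete = sorted(n for n in local if n not in remote_sizes)
    return to_download, to_delete
```

The script would then fetch only the names in to_download and remove those in to_delete, instead of clearing the local folder on every run.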
