Networks and Engineering Standing Committee Forum

Title: CPF Download from new https server
Post by: danielhampf on January 16, 2020, 09:28:24 AM
Hello all,
you've probably seen the emails about the NASA CPF server changing to https. I have now adapted our Python script to download the CPFs from the new server. I used the requests package rather than curl as suggested by the CDDIS examples. It's a bit tricky since they decided to use a login system that is not very suitable for automated downloads. Anyway, it works now. I'm posting the code here for your convenience.
A few notes on the code. It is written for python3 and needs the python requests package installed. You need to obtain Earth Data login credentials first and then insert them in the code (the empty username and password strings in download_CPFs_ssl). The code downloads the V1 CPFs; if you want the new ones, you have to change the server folder in the url. Please note that the script deletes all existing files in the local target folder. If you use the function in a graphical interface, your callback function can update a progress bar etc. It is called with the progress in percent after each downloaded file; if it returns False, the remaining downloads are cancelled. The function returns the number of downloaded CPFs.
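To illustrate the callback contract (it receives a percentage and returns True to continue or False to cancel), here is a minimal sketch. The names make_progress_callback and cancel_after are made up for this example and are not part of the script; the loop at the end only simulates the calls the download function would make.

```python
def make_progress_callback(cancel_after=None):
    """Build a callback that prints progress and optionally cancels.

    cancel_after is a hypothetical parameter: once the reported
    percentage reaches it, the callback returns False.
    """
    def callback(percent):
        print("CPF download: %5.1f %%" % percent)
        if cancel_after is not None and percent >= cancel_after:
            return False    # tells the download loop to stop
        return True         # tells the download loop to continue
    return callback


# Simulate the download loop calling the callback after each file.
callback = make_progress_callback(cancel_after=50.0)
calls = []
for percent in (0.0, 25.0, 50.0, 75.0):
    keep_running = callback(percent)
    calls.append((percent, keep_running))
    if not keep_running:
        break
```

In a GUI, the print call would be replaced by the progress bar update, and the cancel condition by a check of a "Cancel" button state.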
The script is not entirely by me, it also uses code I found on the Earth Data site.
Comments and cheers welcome, as always.
Daniel




import logging
import os
from glob import glob
import requests

log = logging.getLogger(__name__)   # used for error reporting below

class SessionWithHeaderRedirection(requests.Session):
    AUTH_HOST = 'urs.earthdata.nasa.gov'
    def __init__(self, username, password):
        super().__init__()
        self.auth = (username, password)

    # Overrides from the library to keep headers when redirected to or from
    # the NASA auth host.
    def rebuild_auth(self, prepared_request, response):
        headers = prepared_request.headers
        url = prepared_request.url

        if 'Authorization' in headers:
            original_parsed = requests.utils.urlparse(response.request.url)
            redirect_parsed = requests.utils.urlparse(url)

            if (original_parsed.hostname != redirect_parsed.hostname) and \
                    redirect_parsed.hostname != self.AUTH_HOST and \
                    original_parsed.hostname != self.AUTH_HOST:
                del headers['Authorization']
        return
         
def download_CPFs_ssl(local_dir, update_callback):
    # remove old files
    if not os.path.isdir(local_dir):
        os.mkdir(local_dir)
    filelist = glob(os.path.join(local_dir, "*"))
    for f in filelist:
        os.remove(f)

    # define urls and credentials
    url = "https://cddis.nasa.gov/archive/slr/cpf_predicts/current/"
    username = ""
    password = ""

    # make the request to the web site to get filenames
    session = SessionWithHeaderRedirection(username, password)
    response = session.get(url + "*?list")

    # check if response is okay (compare by value, not by identity)
    if response.status_code != requests.codes.ok:
        log.error("Could not connect to CPF server. HTTP status code: %d" % (response.status_code))
        return False
   
    # parse the response and make list of filenames
    lines = response.text.split('\n')
    filenames = []
    for line in lines:
        if line.startswith("#"):    # comment lines
            continue
        if line.strip() == "":      # empty lines
            continue
        filename, size = line.split()
        filenames.append(filename)       

    # download each file and save it
    excl_list = ["MD5SUMS", "SHA512SUMS", "index.html"]
    n_downloaded = 0
    for i, filename in enumerate(filenames):
        if filename in excl_list:
            continue
        filepath = url + filename
        response = session.get(filepath)
        if response.status_code != requests.codes.ok:
            continue    # do not save an error page as a CPF file
        with open(os.path.join(local_dir, filename), "wb") as f_out:
            f_out.write(response.content)
        n_downloaded += 1
        keep_running = update_callback(100. * i / len(filenames))
        if not keep_running:
            break
    return n_downloaded
       
       
       
if __name__ == "__main__":           
    def print_progress(p):
        print(p)
        return True
               
    download_CPFs_ssl("./CPF/", print_progress)
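
The parsing step in the script assumes that every line of the "*?list" response has exactly two fields (file name and size). A slightly more defensive variant could look like the sketch below. The function name parse_listing and the sample text are made up for illustration; the sample is not real server output.

```python
def parse_listing(text, exclude=("MD5SUMS", "SHA512SUMS", "index.html")):
    """Return the file names found in a '*?list' style response."""
    filenames = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                # skip empty and comment lines
        fields = line.split()
        if len(fields) != 2:
            continue                # not a "<name> <size>" line, ignore it
        name = fields[0]
        if name not in exclude:
            filenames.append(name)
    return filenames


# Made-up sample input, not real server output.
sample = """# example listing
lageos1_cpf_200116_5161.hts 12345
MD5SUMS 678

unexpected line with many fields
"""
print(parse_listing(sample))
```

Filtering the excluded names here, rather than in the download loop, also means the progress percentage is computed over the files that are actually fetched.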
           
Title: Re: CPF Download from new https server
Post by: Toshimichi Otsubo on February 06, 2020, 11:08:14 AM
Hi Daniel and all,

I tried a different way - their "secondary" option ftp-ssl seemed easier in our environment.

Here is how I do it using 'lftp' (a command-line ftp client):

$ lftp -f cddis-cpf.lftp

where cddis-cpf.lftp (or any file name) is a text file like:
---
open -d -u anonymous,<YOUR EMAIL ADDRESS> -e 'set ftp:ssl-force true' gdc.cddis.eosdis.nasa.gov
set xfer:clobber true
cd /pub/slr/cpf_predicts/current
lcd <YOUR LOCAL DIRECTORY>
mirror -e -x MD5SUMS -x SHA512SUMS
exit
---

(Replace the two "< >" parts.)

Then, the local directory will contain the same files as the "/pub/slr/cpf_predicts/current" directory of CDDIS.  The lftp mirror command downloads only newer files, so subsequent runs will be quicker.
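
For those who prefer to stay in Python, roughly the same can be sketched with the standard ftplib module. This is only a sketch under assumptions, not a replacement for lftp: the function names plan_mirror and mirror_cpfs are made up, files are compared by name only (lftp also looks at size and date), the server is assumed to return bare file names from NLST, and the local directory is assumed to contain only files.

```python
import os
from ftplib import FTP_TLS


def plan_mirror(remote_names, local_names, exclude=("MD5SUMS", "SHA512SUMS")):
    """Decide what to fetch and what to delete, by file name only."""
    remote = set(remote_names) - set(exclude)
    local = set(local_names)
    to_fetch = sorted(remote - local)     # on the server, not yet local
    to_delete = sorted(local - remote)    # local leftovers, like 'mirror -e'
    return to_fetch, to_delete


def mirror_cpfs(local_dir, email,
                host="gdc.cddis.eosdis.nasa.gov",
                remote_dir="/pub/slr/cpf_predicts/current"):
    """Mirror the remote CPF directory into local_dir over FTP with TLS."""
    os.makedirs(local_dir, exist_ok=True)
    ftps = FTP_TLS(host)
    try:
        ftps.login(user="anonymous", passwd=email)
        ftps.prot_p()                     # encrypt the data connection too
        ftps.cwd(remote_dir)
        to_fetch, to_delete = plan_mirror(ftps.nlst(), os.listdir(local_dir))
        for name in to_delete:
            os.remove(os.path.join(local_dir, name))
        for name in to_fetch:
            with open(os.path.join(local_dir, name), "wb") as f_out:
                ftps.retrbinary("RETR " + name, f_out.write)
    finally:
        ftps.close()
    return len(to_fetch)
```

A call such as mirror_cpfs("./CPF/", "<YOUR EMAIL ADDRESS>") would then play the role of the lftp script. Note that a download interrupted halfway leaves a partial file that this name-only comparison will not fetch again.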

Toshi