API Docs
Define the base abstractions for BDC-Collectors and Data Collections.
- class bdc_collectors.base.BaseCollection(scene_id: str, collection=None)
Define the collection signature of a Provider.
A collection essentially represents an Item briefing, in other words, a path to the scene identifier. Each collection item has a custom path resolver, which is resolved by path(). You may override this method in your implementation class to use a custom directory location.
- compressed_file(collection, prefix=None, path_include_month=False, **kwargs) Path
Retrieve the path to the compressed file L1.
Deprecated since version 0.6.2: This function will be deprecated in the next release. Use path() instead, with your own extension method.
- get_assets(collection, path=None, prefix=None, **kwargs) Dict[str, str]
Get a list of extra assets contained in collection path.
- Parameters:
collection – An instance of bdc_catalog.models.Collection context.
path (Path) – Path to search for the files. Default is None.
prefix (str) – Extra prefix. By default, the Brazil Data Cube cluster prefix is used.
- Returns:
- Dict[str, str]
Map of asset_name and the absolute asset path on disk.
- get_files(collection, path=None, prefix=None, **kwargs) Dict[str, Path]
List all files in the collection.
- Returns:
- Dict[str,Path]
Map of found files in resolved path location.
- path(collection, prefix=None, path_include_month=False, **kwargs) Path
Retrieve the relative path to the Collection on Brazil Data Cube cluster.
Note
When prefix is not set, this function tries to read the value from the environment variable DATA_DIR.
- Parameters:
collection – Instance of BDC Collection model
prefix (str) – Path prefix
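The prefix fallback described in the note can be sketched as follows. This is only an illustration, not the library's implementation: the helper name resolve_collection_path and the empty-string fallback used when DATA_DIR is unset are assumptions.

```python
import os
from pathlib import Path
from typing import Optional


def resolve_collection_path(relative: str, prefix: Optional[str] = None) -> Path:
    """Join a collection-relative path onto a prefix (hypothetical helper)."""
    if prefix is None:
        # Assumption: fall back to the DATA_DIR environment variable,
        # and to a relative path when the variable is not set.
        prefix = os.environ.get("DATA_DIR", "")
    return Path(prefix) / relative
```

An explicit prefix always wins over the environment variable in this sketch.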
- class bdc_collectors.base.BaseProvider
Define the signature of a Data Collector Provider.
- collections_supported()
Retrieve the collections supported by the Provider instance.
- Returns:
- Dict[str, Type[BaseCollection]]
List of well-known collections in the Provider.
- disconnect()
Disconnect from Data Provider.
- download(scene_id: str, *args, **kwargs) str
Download the scene from remote provider.
- download_all(scenes: List[SceneResult], output: str, **kwargs) Tuple[List[str], List[str], List[str]]
Bulk download scenes from remote provider.
- get_collector(collection: str) Type[BaseCollection]
Retrieve the data type of the given data collection.
- search(query, *args, **kwargs) List[SceneResult]
Search for data set in Provider.
- Parameters:
query – Data set reference name.
*args – Optional positional parameters, in order, for the given provider.
**kwargs – Optional keywords for the given provider, like start_date, end_date and so on.
- bdc_collectors.base.BulkDownloadResult
Type to identify Bulk download result, which represents Success, scheduled (offline) and failure.
alias of Tuple[List[str], List[str], List[str]]
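A short sketch of how a caller might unpack such a result. The alias is redeclared locally so the snippet runs on its own, and the summarize helper is hypothetical.

```python
from typing import List, Tuple

# Local redeclaration of the documented alias, for illustration only.
BulkDownloadResult = Tuple[List[str], List[str], List[str]]


def summarize(result: BulkDownloadResult) -> str:
    """Describe a bulk download result (hypothetical helper)."""
    # Order follows the description above: success, scheduled (offline), failure.
    success, scheduled, failed = result
    return f"{len(success)} downloaded, {len(scheduled)} scheduled, {len(failed)} failed"
```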
- class bdc_collectors.base.SceneParser(scene_id: str)
Define the base parser of Scene identifiers.
- level() str
Retrieve the collection level.
- processing_date() datetime
Retrieve the scene processing date.
- satellite() str
Retrieve the scene satellite origin.
- sensing_date() datetime
Retrieve the scene sensing date.
- source() str
Define meta information for scene_id.
- tile_id() str
Retrieve the tile identifier from scene_id.
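To illustrate what a concrete parser might look like, here is a minimal sketch for Sentinel-2 product names following the public ESA naming convention. The class Sentinel2SceneSketch is hypothetical and does not subclass the real SceneParser, so the values returned by the library's own parsers may differ.

```python
from datetime import datetime


class Sentinel2SceneSketch:
    """Parse identifiers such as S2A_MSIL1C_20200101T133221_N0208_R081_T22LGH_20200101T150000."""

    def __init__(self, scene_id: str):
        self.scene_id = scene_id
        self.fragments = scene_id.split("_")
        if len(self.fragments) != 7:
            raise ValueError(f"Unexpected Sentinel-2 scene id: {scene_id}")

    def satellite(self) -> str:
        # First fragment is the mission identifier, e.g. "S2A".
        return self.fragments[0]

    def level(self) -> str:
        # "MSIL1C" -> "L1C"
        return self.fragments[1][3:]

    def sensing_date(self) -> datetime:
        return datetime.strptime(self.fragments[2], "%Y%m%dT%H%M%S")

    def tile_id(self) -> str:
        # "T22LGH" -> "22LGH"
        return self.fragments[5][1:]
```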
- class bdc_collectors.base.SceneResult(scene_id, cloud_cover, **kwargs)
Class structure for Query Scene results.
- property cloud_cover: float
Retrieve the cloud cover metadata.
- property link: str
Retrieve the link of scene id.
Note
It usually points to download url.
- property scene_id: str
Retrieve the scene identifier.
Extension
Define the BDC-Collector flask extension.
- class bdc_collectors.ext.CollectorExtension(app: Flask, **kwargs)
Define the flask extension of BDC-Collectors.
You can initialize this extension as follows:
    from flask import Flask
    from bdc_collectors.ext import CollectorExtension

    app = Flask(__name__)
    ext = CollectorExtension(app)
This extension uses the Python Entry points specification to load data providers dynamically. By default, we use the entry point bdc_collectors.providers as defined in setup.py:
    entry_points={
        'bdc_collectors.providers': [
            'google = bdc_collectors.google',
            'usgs = bdc_collectors.usgs',
            'onda = bdc_collectors.onda',
            'scihub = bdc_collectors.scihub'
        ],
    },
Each provider is held in the property state and may be accessed using:
    from flask import current_app

    ext = current_app.extensions['bdc_collector']
    ext.get_provider('providerName')
Note
Make sure to initialize the CollectorExtension beforehand.
We also provide the command line tool bdc-collector, which offers a way to use those providers in the terminal:
bdc-collector --help
- get_provider(provider: str) Type[BaseProvider]
Retrieve a provider class loaded in module.
- init_app(app: Flask, **kwargs)
Initialize the BDC-Collector extension, loading supported providers dynamically.
- init_providers(entry_point: str = 'bdc_collectors.providers', **kwargs)
Load the supported providers from setup.py entry_point.
- list_providers() List[str]
Retrieve a list of supported providers.
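As a rough illustration of entry-point discovery, the sketch below lists the entry points registered under a group using the standard library's importlib.metadata. The function name find_provider_entry_points is hypothetical, and whether the extension itself relies on importlib.metadata or another mechanism is not stated in this document.

```python
from importlib.metadata import entry_points
from typing import Dict


def find_provider_entry_points(group: str = "bdc_collectors.providers") -> Dict[str, str]:
    """Map entry point names to their module targets for the given group."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+: selectable entry points.
        selected = eps.select(group=group)
    else:
        # Python 3.8/3.9: a plain dict keyed by group name.
        selected = eps.get(group, [])
    return {ep.name: ep.value for ep in selected}
```

With the setup.py shown earlier installed, the result would be expected to contain keys such as google, usgs, onda and scihub. In an environment without the package it is simply empty.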
- class bdc_collectors.ext.CollectorState
Class for holding Collector state of the extension.
- add_provider(provider_name: str, provider: Type[BaseProvider])
Add a new provider to the supported providers.
- get_provider(provider: str) Type[BaseProvider]
Try to retrieve the data provider type.
Providers
SciHub Copernicus
Define the implementation of Sentinel Provider.
- class bdc_collectors.scihub.SciHub(*users, **kwargs)
Define a simple implementation of Sentinel api.
This module uses sentinel-sat to search and to download files from Copernicus.
TODO: Document how to download multiple files using multiple accounts.
- download(scene_id: str, output: str, **kwargs) str
Try to download data from Copernicus.
- Raises:
DownloadError – when the scene is not found.
DataOfflineError – when the scene is not available/offline.
- download_all(scenes: List[SceneResult], output: str, **kwargs)
Download multiple scenes from Sentinel-Sat API.
- Parameters:
scenes – List of scenes found by the search method.
output – Output directory.
**kwargs – Other parameters to be passed to sentinel-sat.
- search(query, **kwargs)
Search for products on Sentinel provider.
- Parameters:
query – Product name.
**kwargs – Optional parameters.
- bdc_collectors.scihub.init_provider()
Register sentinel provider.
ONDA Catalogue
Defines the structures for ONDA Catalogue.
- class bdc_collectors.onda.ONDA(**kwargs)
ONDA Catalog provider.
This provider consumes the ONDA Open Catalogue.
Note
This provider requires a username and a password. You can create an account at ONDA Registration.
- download(scene_id: str, output: str, **kwargs)
Download scene from ONDA catalogue API.
- Raises:
DataOfflineError – when the scene is not available/offline.
- download_all(scenes: List[SceneResult], output: str, **kwargs)
Bulk download from ONDA provider.
- Parameters:
scenes – List of SceneResult to download.
output – Directory to save.
**kwargs – Optional parameters. You can also set max_workers, which is 2 by default.
- search(query, **kwargs)
Search for data set in ONDA Provider.
This is not supported yet.
- bdc_collectors.onda.init_provider()
Register the ONDA provider.
CREODIAS
Defines the structures for CREODIAS API.
The CREODIAS API is an alternative way to download Earth Observation datasets related to the Copernicus program. It serves Sentinel-2 and other datasets on its platform.
Basic usage:
from flask import current_app
ext = current_app.extensions['bdc_collector']
provider_klass = ext.get_provider("CREODIAS")
provider = provider_klass(username="user", password="passwd")
provider.search(...)
- class bdc_collectors.creodias.CREODIAS(**kwargs)
CREODIAS Catalog provider.
This provider consumes the CREODIAS API.
Note
This provider requires a username and a password. You can create an account at CREODIAS Registration.
CREODIAS enforces a rate limit on its API services: 60 requests per minute, per source IP address. Make sure not to exceed this limit.
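One way to stay under such a limit on the client side is a sliding-window counter. The sketch below is an assumption about how a caller could throttle requests, not a feature of the provider. The class SlidingWindowLimiter is hypothetical, and the clock is injectable so the logic can be checked without sleeping.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Track call timestamps and report how long to wait before the next call."""

    def __init__(self, max_calls: int = 60, period: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self._calls: Deque[float] = deque()

    def wait_time(self) -> float:
        """Return the seconds to wait before another call is allowed (0 if free)."""
        now = self.clock()
        # Drop calls that have left the window.
        while self._calls and now - self._calls[0] >= self.period:
            self._calls.popleft()
        if len(self._calls) < self.max_calls:
            return 0.0
        return self.period - (now - self._calls[0])

    def record(self) -> None:
        """Register that a request was just sent."""
        self._calls.append(self.clock())
```

A caller would sleep for wait_time() seconds, send the request, and then call record().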
- download(scene_id: str, output: str, **kwargs)
Download scene from CREODIAS API.
Note
When a download is interrupted, the file is not removed. The temporary file is identified by the .tmp suffix at the end of the filename. Whenever a download is triggered and a temp file already exists, the module will try to resume the download.
- Parameters:
scene_id – The Scene Identifier to download
output – The base output directory
- Keyword Arguments:
force (bool) – Flag to re-download the file (do not use cache). Defaults to False.
- Raises:
DataOfflineError – when scene is not available/offline.
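The resume behaviour described in the note is commonly built on the HTTP Range header. The sketch below shows that idea only. The helper resume_headers is hypothetical, and treating force as "ignore any partial file" is an assumption about the flag's effect.

```python
from pathlib import Path
from typing import Dict


def resume_headers(target: Path, force: bool = False) -> Dict[str, str]:
    """Build request headers that continue a partial download, if one exists."""
    # The partial file sits next to the target, with ".tmp" appended.
    partial = target.with_name(target.name + ".tmp")
    if force or not partial.exists():
        return {}
    offset = partial.stat().st_size
    # Ask the server for the remaining bytes only.
    return {"Range": f"bytes={offset}-"} if offset > 0 else {}
```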
- download_all(scenes: List[SceneResult], output: str, **kwargs)
Bulk download from CREODIAS provider in parallel.
- Parameters:
scenes – List of SceneResult to download.
output – Directory to save.
- Keyword Arguments:
max_workers (int) – Number of parallel downloads. Defaults to 2.
collection (str) – The CREODIAS Collection name.
- Returns:
- Tuple[List[SceneResult], List[str], List[Exception]]
Returns the lists of successfully downloaded scenes, scheduled files and download errors, respectively.
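A parallel bulk download with this three-way result can be sketched with a thread pool. This is an illustration under assumptions: download_in_parallel, the fetch callback and the local OfflineError class (standing in for DataOfflineError) are all hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, List, Tuple


class OfflineError(Exception):
    """Local stand-in for DataOfflineError, for illustration only."""


def download_in_parallel(
    scene_ids: List[str],
    fetch: Callable[[str], str],
    max_workers: int = 2,
) -> Tuple[List[str], List[str], List[Exception]]:
    """Run fetch for each scene and split outcomes into success, scheduled and errors."""
    success: List[str] = []
    scheduled: List[str] = []
    errors: List[Exception] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, scene_id): scene_id for scene_id in scene_ids}
        for future in as_completed(futures):
            scene_id = futures[future]
            try:
                success.append(future.result())
            except OfflineError:
                # Offline scenes are kept aside to be retried later.
                scheduled.append(scene_id)
            except Exception as exc:
                errors.append(exc)
    return success, scheduled, errors
```

Results arrive in completion order, so callers should not rely on the ordering of the returned lists.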
- search(query, **kwargs)
Search for data set in CREODIAS Provider.
Based on the CREODIAS EO-Data-Finder API, the following products are available in the catalog:
- Sentinel1
- Sentinel2
- Sentinel3
- Sentinel5P
- Landsat8
- Landsat7
- Landsat5
- Envisat
You can also specify the processing level processingLevel to filter which data sets should be retrieved. For Sentinel2, use LEVEL1C for L1 data, LEVEL2A for L2, etc.
- Parameters:
query – The collection name.
- Keyword Arguments:
start_date (str|datetime) – Start date time filter
end_date (str|datetime) – End date time filter
geom (str) – Region of Interest (WKT)
bbox (Tuple[float,float,float,float]) – The bounding box values ordered as west, south, east, north.
status (str) – CREODIAS API Status for data sets. Defaults to all.
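Since the region of interest can be given either as WKT (geom) or as a bounding box (bbox), a small conversion helper can be handy. The sketch below assumes the west, south, east, north ordering described above. The function bbox_to_wkt is hypothetical and does not handle boxes that cross the antimeridian.

```python
from typing import Tuple


def bbox_to_wkt(bbox: Tuple[float, float, float, float]) -> str:
    """Convert (west, south, east, north) into a closed WKT polygon."""
    west, south, east, north = bbox
    if west >= east or south >= north:
        raise ValueError("Expected bbox ordered as west, south, east, north")
    # The ring is closed by repeating the first vertex.
    return (
        f"POLYGON(({west} {south}, {east} {south}, "
        f"{east} {north}, {west} {north}, {west} {south}))"
    )
```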
- bdc_collectors.creodias.init_provider()
Register the CREODIAS provider.
Note
Called once by CollectorExtension.
Google Public Data Sets
Defines the structures for Google Provider access.
- class bdc_collectors.google.Google(**kwargs)
Google provider definition.
This provider consumes the Google Public Data Sets.
Currently, we support both Sentinel-2 and Landsat products.
Note
This provider requires GOOGLE_APPLICATION_CREDENTIALS to work properly. Make sure to set it in the terminal or pass it as a variable in the constructor.
- download(scene_id: str, *args, **kwargs)
Download scene from Google buckets.
- search(query, *args, **kwargs)
Search for data set in Google Provider.
It is not supported yet, since it requires downloading a large .csv file to check.
TODO: Implement a way to download the .csv file and keep it up to date.
- bdc_collectors.google.guess_scene_parser(scene_id)
Try to identify a parser for Scene Id.
- Raises:
RuntimeError – when the scene_id cannot be parsed.
- Parameters:
scene_id – Scene id product.
- Returns:
A Google Data Set
- bdc_collectors.google.init_provider()
Register the provider Google.
USGS
Define the structures for USGS Earth Explorer Provider access.
- class bdc_collectors.usgs.USGS(**kwargs)
Define the USGS provider.
This provider consumes the USGS EarthExplorer catalog.
This module follows the new experimental API 1.5.
Note
Remember to call .disconnect() to avoid having your IP blocked by the USGS server.
- disconnect()
Disconnect from the USGS server.
- download(scene_id: str, *args, **kwargs)
Download Landsat product from USGS.
- get_collector(collection: str) Type[BaseCollection]
Retrieve a data definition supported by USGS provider.
- search(query, *args, **kwargs) List[SceneResult]
Search for data set in USGS catalog.
- bdc_collectors.usgs.init_provider()
Register the USGS provider.
MODIS
Define the module for dealing with NASA MODIS products.
- class bdc_collectors.modis.ModisAPI(username, password, **kwargs)
Represent an abstraction of how to interact with the NASA MODIS API.
- download(scene_id: str, *args, **kwargs) str
Download the MODIS product from NASA.
- Parameters:
scene_id – The MODIS Scene identifier.
*args – List of optional parameters.
**kwargs – Extra parameters used to download.
- get_collector(collection: str) Type[BaseCollection]
Represent the structure to deal with Provider API.
- search(query, *args, **kwargs) List[SceneResult]
Search for MODIS product on NASA Catalog.
- class bdc_collectors.modis.ModisScene(scene_id: str)
Define the parser of MODIS Scene identifiers.
- level() str
Retrieve the collection level.
- processing_date()
Retrieve the scene processing date.
- satellite()
Retrieve the satellite name.
- sensing_date()
Retrieve the scene sensing date.
- source()
Retrieve the scene source name.
- tile_id()
Retrieve the Vertical Horizontal tile value.
- version()
Retrieve the Collection Version.
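As an illustration of the fields listed above, the sketch below splits a MODIS identifier following the standard NASA file naming convention (product, acquisition date, tile, collection version, production date). The function parse_modis_scene_id is hypothetical, and the exact values returned by ModisScene may be formatted differently.

```python
from datetime import datetime
from typing import Any, Dict


def parse_modis_scene_id(scene_id: str) -> Dict[str, Any]:
    """Split an identifier such as MOD13Q1.A2020001.h13v10.006.2020018001022."""
    product, acquired, tile, version, produced = scene_id.split(".")[:5]
    return {
        "source": product,
        # "A2020001" -> year 2020, day of year 001
        "sensing_date": datetime.strptime(acquired[1:], "%Y%j"),
        "tile_id": tile,
        "version": version,
        # Year, day of year, then hour, minute and second.
        "processing_date": datetime.strptime(produced, "%Y%j%H%M%S"),
    }
```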
- bdc_collectors.modis.init_provider()
Register the NASA Modis provider.
Copernicus Dataspace EcoSystem
Define the implementation of Copernicus Dataspace Program.
This module is a new version of bdc_collectors.scihub.SciHub (deprecated) to consume and download Sentinel products.
- class bdc_collectors.dataspace.DataspaceProvider(username: str, password: str, strategy: BaseProvider | None = None, **kwargs)
Represent the Driver for Copernicus Dataspace program.
This module supports the following API providers using strategies:
- ODATA
- STAC
By default, the ODATAStrategy is used to search for Sentinel data. For Authorization and Token Authentication, as defined in Access Token, an access_token is required to download data. By default, this module stores these tokens in bdc_collectors.dataspace._token.TokenManager. Whenever a download is initiated by bdc_collectors.dataspace.DataspaceProvider.download(), bdc-collectors creates two (2) access tokens in memory and then uses them to download as many scenes as it can. When a token expires, it is automatically refreshed.
Examples
The following example is a minimal download of scenes from the Dataspace program using the ODATA API:
>>> from bdc_collectors.dataspace import DataspaceProvider
>>> provider = DataspaceProvider(username='user@email.com', password='passwd')
>>> entries = provider.search('SENTINEL-2', bbox=(-54, -12, -50, -10), product="S2MSI2A")
>>> for entry in entries:
...     provider.download(entry, output='sentinel-2')
You may change the API backend as follows:
>>> from bdc_collectors.dataspace.stac import StacStrategy
>>> stac = StacStrategy()
>>> provider = DataspaceProvider(username='user@email.com', password='passwd', strategy=stac)
>>> # or change directly in API
>>> provider.strategy = stac
- download(query: SceneResult | str, output: str, *args, **kwargs) str
Download the specified item from API provider.
- download_all(scenes: List[SceneResult], output: str, **kwargs) Tuple[List[str], List[str], List[str]]
Download multiple scenes from remote Copernicus Dataspace program in bulk-mode.
- search(query, *args, **kwargs) List[SceneResult]
Search for data products in Copernicus Dataspace program.
- bdc_collectors.dataspace.init_provider()
Register the Copernicus Dataspace provider.
Note
Called once by CollectorExtension.
- bdc_collectors.dataspace.is_valid_zip(filepath: str) bool
Check the consistency of Zip file.
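A consistency check of this kind can be written with the standard zipfile module. The sketch below is one possible approach and not necessarily what is_valid_zip does internally. The name zip_is_consistent is hypothetical.

```python
import zipfile


def zip_is_consistent(filepath: str) -> bool:
    """Return True when the file is a zip archive whose members pass the CRC check."""
    if not zipfile.is_zipfile(filepath):
        return False
    try:
        with zipfile.ZipFile(filepath) as archive:
            # testzip() returns the name of the first corrupt member, or None.
            return archive.testzip() is None
    except zipfile.BadZipFile:
        return False
```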
Dataspace API Implementations
Define the implementation of ODATA for provider Copernicus Dataspace Program.
- class bdc_collectors.dataspace.odata.ODATAStrategy(odata_api_url: str = 'https://catalogue.dataspace.copernicus.eu/odata', odata_api_max_records: int = 12000, odata_api_limit: int = 500, **kwargs)
Represent the implementation of Copernicus Dataspace program API using ODATA (Open Data Protocol).
- search(query, *args, **kwargs) List[SceneResult]
Search for data products in Copernicus Dataspace program.
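The constructor defaults suggest a paged search: a per-request limit (odata_api_limit) and an overall cap (odata_api_max_records). Assuming that reading is right, the windows passed to the OData $skip and $top query options could be generated as sketched below. The function page_windows is hypothetical.

```python
from typing import Iterator, Tuple


def page_windows(max_records: int = 12000, limit: int = 500) -> Iterator[Tuple[int, int]]:
    """Yield (skip, top) pairs covering at most max_records entries."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    skip = 0
    while skip < max_records:
        # The last page may be shorter than the limit.
        top = min(limit, max_records - skip)
        yield skip, top
        skip += top
```

A real client would also stop early as soon as a page comes back with fewer entries than requested.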
Define the implementation of STAC for provider Copernicus Dataspace Program.
- class bdc_collectors.dataspace.stac.StacStrategy(stac_url: str = 'https://catalogue.dataspace.copernicus.eu/stac', **kwargs)
Represent the implementation of Copernicus Dataspace program API using STAC (SpatioTemporal Asset Catalog).
Note
According to the Copernicus Dataspace docs, this method is not fully supported by STAC clients. We recommend using bdc_collectors.dataspace.odata.ODATAStrategy instead. See more in STAC Product Catalog.
- collections_supported()
Retrieve the list of supported collections by STAC.
- search(query, *args, **kwargs) List[SceneResult]
Search for data products in Copernicus Dataspace program.
Describe Abstraction for Sentinel Data Space EcoSystem on Copernicus.
- class bdc_collectors.dataspace._token.TokenManager(username: str, password: str, token_lock_name='dataspace-tokens', token_cache: str | Cache | None = None, token_limit: int = 2, **kwargs)
Global user client for Sentinel Accounts.
This class stores the access tokens in memory using a cache method: Redis or Python Dict. Whenever a token is about to expire, this class automatically asks for a new token in DataSpace authorization server.
Examples
Use the TokenManager as follows to generate a new token:
>>> from bdc_collectors.dataspace._token import TokenManager
>>> manager = TokenManager("username", "password")
>>> token = manager.get_token()
>>> # Use the token to download anything during the next 10 minutes
>>> another = manager.get_token()
You can also use the Redis backend for token management. (Make sure you have the library ‘redis’ installed and the server up and running.)
>>> from bdc_collectors.dataspace._cache import RedisStrategy
>>> from bdc_collectors.dataspace._token import TokenManager
>>> manager = TokenManager("username", "password", token_cache=RedisStrategy())
>>> token = manager.get_token()
- get_token()
Try to get available user to download.
- property tokens: List[AccessToken]
Retrieve all users from disk.
- use()
Try to lock an atomic user.
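The refresh-before-expiry behaviour described for this class can be sketched independently of any authorization server. In the snippet below, ExpiringToken, the fetch callback and the 30-second safety margin are assumptions made for illustration. The 600-second lifetime in the usage mirrors the "next 10 minutes" remark in the example above.

```python
import time
from typing import Callable, Optional, Tuple


class ExpiringToken:
    """Cache a token and fetch a new one shortly before it expires."""

    def __init__(self, fetch: Callable[[], Tuple[str, float]],
                 clock: Callable[[], float] = time.monotonic,
                 margin: float = 30.0):
        self._fetch = fetch      # returns (token, lifetime in seconds)
        self._clock = clock
        self._margin = margin    # refresh this many seconds before expiry
        self._value: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        now = self._clock()
        if self._value is None or now >= self._expires_at - self._margin:
            self._value, lifetime = self._fetch()
            self._expires_at = now + lifetime
        return self._value
```

Usage, with a fake fetch function and a controllable clock:

```python
issued = []
now = [0.0]

def fake_fetch():
    issued.append(f"token-{len(issued) + 1}")
    return issued[-1], 600.0

token = ExpiringToken(fake_fetch, clock=lambda: now[0])
first = token.get()    # fetches a token
now[0] = 580.0
second = token.get()   # inside the safety margin, so a new token is fetched
```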
Define a minimal cache strategy for Dataspace metadata.
This file contains the following strategies:
- bdc_collectors.dataspace._cache.RedisStrategy
- bdc_collectors.dataspace._cache.RawDictStrategy
- class bdc_collectors.dataspace._cache.Cache
Simple abstraction of Cache handler.
- get(keys)
Retrieve the cache information.
- Parameters:
keys (str)
- Returns:
(str) Cache values
- lock(key: str, **kwargs)
Retrieve a lock for dealing with cache.
- store(key, value, **properties)
Store the value into cache.
- Parameters:
key (str) – Cache key
value (str) – Cache value
**properties – Extra properties to cache handler
- class bdc_collectors.dataspace._cache.CacheService(strategy)
Base cache service.
Handle the cache implementations to isolate the cache abstraction from the underlying libraries.
- add(key, value, duration=None)
Store the value into cache handler.
- Parameters:
key (str) – Cache key
value (str) – Cache value
duration (int) – Time expiration (ms)
- exists(key: str) bool
Check if the key is stored in cache.
- get(key)
Retrieve the cache information.
- Parameters:
key (str) – Cache key
- Returns:
str Cache information value
- lock(key: str, **kwargs)
Try to get a lock from the cache system.
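To make the strategy idea concrete, here is a minimal dictionary-backed cache with optional expiration. It is a sketch only: DictCacheSketch is a hypothetical class and not the library's RawDictStrategy, and the duration_ms keyword is an assumption modelled on the millisecond duration documented for add().

```python
import time
from typing import Callable, Dict, Optional, Tuple


class DictCacheSketch:
    """Keep values in a dictionary, with an optional expiration per key."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._data: Dict[str, Tuple[str, Optional[float]]] = {}

    def store(self, key: str, value: str, duration_ms: Optional[int] = None) -> None:
        expires = None if duration_ms is None else self._clock() + duration_ms / 1000.0
        self._data[key] = (value, expires)

    def exists(self, key: str) -> bool:
        entry = self._data.get(key)
        if entry is None:
            return False
        _, expires = entry
        if expires is not None and self._clock() >= expires:
            # Expired entries are dropped lazily, on access.
            del self._data[key]
            return False
        return True

    def get(self, key: str) -> Optional[str]:
        return self._data[key][0] if self.exists(key) else None
```

A service such as the one above could delegate add(), exists() and get() to any object exposing this interface, which is what keeps the backend swappable.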
- class bdc_collectors.dataspace._cache.RawDictStrategy(**kwargs)
Simple implementation of cache as strategy using Python dictionaries.
- exists(key)
Check if a key is cached.
- get(key)
Retrieve the cache information from the dictionary.
- lock(key: str, **kwargs)
Retrieve a lock for dealing with the cache.
- store(key, value, **properties)
Store the cache values into the dictionary.
- class bdc_collectors.dataspace._cache.RedisStrategy(redis_url=None)
Simple implementation of Redis cache as strategy.
- exists(key)
Check if a key is cached.
- get(key)
Retrieve the cache information from Redis handler.
- lock(key: str, **kwargs)
Retrieve a Redis Lock.
- store(key, value, **properties)
Store the cache values into Redis handler.
Exceptions
Define the common exceptions for Data Download.
- exception bdc_collectors.exceptions.DataOfflineError(scene_id)
Indicate that the scene_id is not available (Offline).
Frequently used by the Sentinel providers SciHub or DataspaceProvider.
- exception bdc_collectors.exceptions.DownloadError(message)
Generic error for Download.