scitacean.File
- class scitacean.File(local_path, remote_path, remote_gid, remote_perm, remote_uid, checksum_algorithm=None, _remote_size=None, _remote_creation_time=None, _remote_checksum=None, _checksum_cache=None)
Store local and remote paths and metadata for a file.
There are two central properties:
remote_path
: Path to the remote file relative to the dataset’s source_folder. This is always set, even if the file does not exist on the remote filesystem.
local_path
: Path to the file on the local filesystem. Is None if the file does not exist locally.
Files can be in one of three states, and the state can be changed as shown below. The state can be queried using File.is_on_local() and File.is_on_remote().
  local                         remote
    │                              │
    │ uploaded          downloaded │
    │                              │
    └───────> local+remote <───────┘
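The following is a minimal model of the three states in plain Python, not scitacean itself. FileState and its fields are invented for illustration. Only one idea comes from the documentation above: uploaded() and downloaded() return new objects in the local+remote state. The documented methods simply assume the right starting state, whereas this sketch raises ValueError when that precondition is not met.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from pathlib import Path


@dataclass(frozen=True)
class FileState:
    """Hypothetical stand-in for the location state of a scitacean.File."""

    local_path: Path | None  # None when the file only exists on the remote
    on_remote: bool

    @property
    def is_on_local(self) -> bool:
        return self.local_path is not None

    @property
    def is_on_remote(self) -> bool:
        return self.on_remote

    def uploaded(self) -> FileState:
        # Only a local file can be uploaded; the result is local+remote.
        if not self.is_on_local:
            raise ValueError("file is not on local")
        return replace(self, on_remote=True)

    def downloaded(self, *, local_path: Path) -> FileState:
        # Only a remote file can be downloaded; the result is local+remote.
        if not self.is_on_remote:
            raise ValueError("file is not on remote")
        return replace(self, local_path=local_path)
```

For example, FileState(local_path=Path("data.nxs"), on_remote=False).uploaded() yields an object for which both is_on_local and is_on_remote are True. The original object is left unchanged.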
Constructors
from_local(path, *[, remote_path, ...]) – Construct a File object for a file on the local filesystem.
from_remote(remote_path, size, creation_time) – Construct a new file object for a remote file.
from_download_model(model, *[, ...]) – Construct a new file object from a SciCat download model.
Methods
checksum() – Return the checksum of the file.
downloaded(*, local_path) – Return new file metadata after a download.
local_is_up_to_date() – Check if the file on local is up-to-date.
make_model(*[, for_archive]) – Build a pydantic model for this file.
remote_access_path(source_folder) – Full path to the file on the remote if it exists.
uploaded(*[, remote_path, remote_uid, ...]) – Return new file metadata after an upload.
validate_after_download() – Check that the file on disk matches the metadata.
Attributes
checksum_algorithm – Algorithm to use for checksums.
creation_time – The logical creation time of the SciCat file.
is_on_local – True if the file is on local.
is_on_remote – True if the file is on remote.
size – The size in bytes of the file.
local_path – Path to the file on the local filesystem.
remote_path – Path to the file on the remote filesystem.
remote_gid – Unix group ID on remote.
remote_perm – Unix file mode on remote.
remote_uid – Unix user ID on remote.
- checksum()
Return the checksum of the file.
This can take a long time to compute for large files.
If the file exists on local, return the current checksum of the local file. Otherwise, return the stored checksum in the catalogue.
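The rule above can be sketched with hashlib: use the current local checksum first and fall back to the catalogue value. This is a hypothetical helper, not scitacean's implementation. The function name checksum_of, the chunk size, and the default algorithm "blake2b" are all assumptions. Reading in chunks keeps memory use bounded, but every byte must still be hashed. That is why the computation is slow for large files.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def checksum_of(
    local_path: Path | None,
    stored_checksum: str | None,
    algorithm: str = "blake2b",  # assumed default, not confirmed by the docs
    chunk_size: int = 1 << 20,
) -> str | None:
    """Hypothetical helper mirroring the documented rule for File.checksum()."""
    if local_path is None:
        # Not on local: fall back to the checksum stored in the catalogue.
        return stored_checksum
    hasher = hashlib.new(algorithm)
    with open(local_path, "rb") as f:
        # Read fixed-size chunks so a large file never sits fully in memory;
        # the running time still grows with the file size.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()
```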
- property creation_time: datetime
The logical creation time of the SciCat file.
If the file exists on local, return the time the local file was last modified. Otherwise, return the stored time in the catalogue.
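A similar fallback applies to creation_time. The sketch below is hypothetical, and the helper name logical_creation_time is invented. It assumes the modification time is reported as a timezone-aware datetime in UTC; the actual property may handle time zones differently.

```python
from __future__ import annotations

from datetime import datetime, timezone
from pathlib import Path


def logical_creation_time(
    local_path: Path | None, stored_time: datetime | None
) -> datetime | None:
    """Hypothetical helper mirroring the documented rule for File.creation_time."""
    if local_path is None:
        # Not on local: use the time recorded in the catalogue.
        return stored_time
    # On local: the last-modification time (st_mtime, seconds since the epoch)
    # is converted to a timezone-aware UTC datetime (an assumption).
    return datetime.fromtimestamp(local_path.stat().st_mtime, tz=timezone.utc)
```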
- downloaded(*, local_path)
Return new file metadata after a download.
Assumes that the input file exists on remote. The returned object is on both local and remote.
- classmethod from_download_model(model, *, checksum_algorithm=None, local_path=None)
Construct a new file object from a SciCat download model.
- classmethod from_local(path, *, remote_path=None, remote_uid=None, remote_gid=None, remote_perm=None)
Construct a File object for a file on the local filesystem.
The returned object references a file that exists locally but not on the remote. However, it does contain a remote_path. This is taken from the provided remote_path or, if none is given, constructed from the local path.
- Parameters:
remote_path (str | RemotePath | None, default: None) – Path on the remote, relative to the source_folder of a dataset. By default, it is constructed as path.name.
remote_uid (str | None, default: None) – User ID on the remote. Will be determined automatically on upload.
remote_gid (str | None, default: None) – Group ID on the remote. Will be determined automatically on upload.
remote_perm (str | None, default: None) – File permissions on the remote. Will be determined automatically on upload.
- Returns:
File – A new file object.
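The default for remote_path can be illustrated with a small hypothetical helper. Here PurePosixPath stands in for scitacean's RemotePath, and default_remote_path is not part of the library.

```python
from __future__ import annotations

from pathlib import Path, PurePosixPath


def default_remote_path(
    path: str | Path, remote_path: str | PurePosixPath | None = None
) -> PurePosixPath:
    """Hypothetical helper for the remote_path rule documented for from_local."""
    if remote_path is not None:
        # An explicit remote path takes precedence.
        return PurePosixPath(remote_path)
    # Otherwise only the file name is kept (path.name), so the file lands
    # directly in the dataset's source_folder.
    return PurePosixPath(Path(path).name)
```

Under these assumptions, default_remote_path("/data/run1/scan.nxs") gives PurePosixPath("scan.nxs"). The directory structure of the local path is therefore not reproduced on the remote unless a remote_path is passed.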
- classmethod from_remote(remote_path, size, creation_time, checksum=None, checksum_algorithm=None, remote_uid=None, remote_gid=None, remote_perm=None)
Construct a new file object for a remote file.
The local path of the returned File is None.
- Parameters:
remote_path (str | RemotePath) – Path to the remote file, relative to the dataset’s source folder.
size (int) – Size in bytes on the remote filesystem.
creation_time (datetime | str) – Date and time the file was created on the remote filesystem. If a str, it is parsed using dateutil.parser.parse.
checksum (str | None, default: None) – Checksum of the file.
checksum_algorithm (str | None, default: None) – Algorithm used to compute the given checksum. Must be passed when checksum is not None.
remote_uid (str | None, default: None) – User ID on the remote.
remote_gid (str | None, default: None) – Group ID on the remote.
remote_perm (str | None, default: None) – File permissions on the remote.
- Returns:
File – A new file object.
Added in version 23.10.0.
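Two of the argument rules for from_remote lend themselves to a short sketch. First, a checksum must come with its algorithm. Second, a string creation_time is parsed into a datetime. The helper normalize_remote_args is hypothetical. It uses datetime.fromisoformat as a standard-library stand-in for dateutil.parser.parse, so unlike the documented method it only accepts ISO 8601 strings.

```python
from __future__ import annotations

from datetime import datetime


def normalize_remote_args(
    creation_time: datetime | str,
    checksum: str | None = None,
    checksum_algorithm: str | None = None,
) -> tuple[datetime, str | None, str | None]:
    """Hypothetical sketch of the argument rules documented for from_remote."""
    if checksum is not None and checksum_algorithm is None:
        # A checksum cannot be verified without the algorithm that produced it.
        raise ValueError("checksum_algorithm must be given together with checksum")
    if isinstance(creation_time, str):
        # Stand-in for dateutil.parser.parse; ISO 8601 input only.
        creation_time = datetime.fromisoformat(creation_time)
    return creation_time, checksum, checksum_algorithm
```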
- local_is_up_to_date()
Check if the file on local is up-to-date.
- Returns:
bool – True if the file exists on local and its checksum matches the stored checksum for the remote file.
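The comparison performed by local_is_up_to_date() can be sketched as a hypothetical stand-alone function. Returning False when no checksum is stored is an assumption; the real method may treat that case differently. For brevity, the sketch reads the whole file at once.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def local_is_up_to_date(
    local_path: Path | None, remote_checksum: str | None, algorithm: str
) -> bool:
    """Hypothetical sketch of the documented check for File.local_is_up_to_date."""
    if local_path is None or not local_path.exists():
        return False  # nothing on local to compare
    if remote_checksum is None:
        return False  # assumption: without a stored checksum, we cannot confirm
    # Hash the local file and compare with the checksum stored for the remote.
    digest = hashlib.new(algorithm, local_path.read_bytes()).hexdigest()
    return digest == remote_checksum
```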
- make_model(*, for_archive=False)
Build a pydantic model for this file.
- Parameters:
for_archive (bool, default: False) – Select whether the file is stored in an archive or on regular disk, that is, whether it belongs to a Datablock or an OrigDatablock.
- Returns:
UploadDataFile – A new pydantic model.
- remote_access_path(source_folder)
Full path to the file on the remote if it exists.
- Return type:
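The documented behavior of remote_access_path() suggests joining the dataset's source_folder with the file's remote_path. The sketch below assumes exactly that and returns None for files that are not on the remote. It uses PurePosixPath in place of RemotePath. Its explicit is_on_remote argument is a hypothetical simplification of the File state.

```python
from __future__ import annotations

from pathlib import PurePosixPath


def remote_access_path(
    source_folder: str | PurePosixPath,
    remote_path: str | PurePosixPath,
    is_on_remote: bool,
) -> PurePosixPath | None:
    """Hypothetical sketch: join source_folder and remote_path for a remote file."""
    if not is_on_remote:
        # The file has not been uploaded yet, so there is nothing to access.
        return None
    return PurePosixPath(source_folder) / remote_path
```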
- remote_path: RemotePath
Path to the file on the remote filesystem.
- property size: int
The size in bytes of the file.
If the file exists on local, return the current size of the local file. Otherwise, return the stored size in the catalogue.
- uploaded(*, remote_path=None, remote_uid=None, remote_gid=None, remote_perm=None, remote_creation_time=None, remote_size=None)
Return new file metadata after an upload.
Assumes that the input file exists on local. The returned object is on both local and remote.
- Parameters:
remote_path (str | RemotePath | None, default: None) – New remote path.
remote_uid (str | None, default: None) – New user ID on remote, overwrites any current value.
remote_gid (str | None, default: None) – New group ID on remote, overwrites any current value.
remote_perm (str | None, default: None) – New Unix permissions on remote, overwrites any current value.
remote_creation_time (datetime | None, default: None) – Time the file became available on remote. Defaults to the current time in UTC.
remote_size (int | None, default: None) – File size on remote.
- Returns:
File – A new file object.
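The override semantics of uploaded() can be sketched with dataclasses.replace. RemoteMeta and this uploaded function are hypothetical. The rule that arguments left as None keep the current value is an assumption inferred from the "overwrites any current value" wording, not something the source states. The remote creation time defaults to the current UTC time, as documented.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from datetime import datetime, timezone


@dataclass(frozen=True)
class RemoteMeta:
    """Hypothetical subset of the remote metadata held by a File."""

    remote_path: str
    remote_uid: str | None = None
    remote_gid: str | None = None
    remote_perm: str | None = None
    remote_creation_time: datetime | None = None


def uploaded(meta: RemoteMeta, **overrides: object) -> RemoteMeta:
    """Return a new object; arguments left as None keep the current value (assumed)."""
    changes = {key: value for key, value in overrides.items() if value is not None}
    # As documented, the creation time defaults to the current time in UTC.
    changes.setdefault("remote_creation_time", datetime.now(tz=timezone.utc))
    return replace(meta, **changes)
```

Because replace builds a new frozen object, the input metadata is never mutated. This matches the documented pattern of returning new file metadata rather than modifying the existing object.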
- validate_after_download()
Check that the file on disk matches the metadata.
Compares file size and, if possible, its checksum. Raises on failure. If the function returns without exception, the file is valid.
- Raises:
IntegrityError – If a check fails.
- Return type: