scitacean.File#

class scitacean.File(local_path, remote_path, remote_gid, remote_perm, remote_uid, checksum_algorithm=None, _remote_size=None, _remote_creation_time=None, _remote_checksum=None, _checksum_cache=None)[source]#

Store local and remote paths and metadata for a file.

There are two central properties:

  • remote_path: Path to the remote file relative to the dataset’s source_folder. This is always set, even if the file does not exist on the remote filesystem.

  • local_path: Path to the file on the local filesystem. Is None if the file does not exist locally.

A file is in one of three states: local, remote, or local+remote. The state changes as shown below and can be queried using the properties File.is_on_local and File.is_on_remote.

local                                  remote
  │                                      │
  │ uploaded                  downloaded │
  │                                      │
  └───────────> local+remote <───────────┘
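
The transitions above can be modelled as an immutable record whose transition methods return new objects. The following is a minimal stand-alone sketch, not scitacean's implementation: the class `FileState` and its `on_remote` flag are hypothetical names chosen for illustration, and only the property and method names mirror this page.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from pathlib import Path


@dataclass(frozen=True)
class FileState:
    """Hypothetical stand-in for the local/remote state of a file."""

    remote_path: str  # always set, even before the file exists on remote
    local_path: Path | None = None  # None if the file is not on local
    on_remote: bool = False  # illustrative flag, not a scitacean attribute

    @property
    def is_on_local(self) -> bool:
        return self.local_path is not None

    @property
    def is_on_remote(self) -> bool:
        return self.on_remote

    def uploaded(self) -> FileState:
        # local -> local+remote; returns a new object, self is unchanged
        return replace(self, on_remote=True)

    def downloaded(self, *, local_path: str | Path) -> FileState:
        # remote -> local+remote; returns a new object, self is unchanged
        return replace(self, local_path=Path(local_path))


local_only = FileState(remote_path="data.csv", local_path=Path("/tmp/data.csv"))
both = local_only.uploaded()

remote_only = FileState(remote_path="data.csv", on_remote=True)
also_both = remote_only.downloaded(local_path="/tmp/download/data.csv")
```

Because each transition returns a new object, the original `local_only` and `remote_only` records still describe the state before the transfer.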

Constructors

from_local(path, *[, remote_path, ...])

Construct a File object for a file on the local filesystem.

from_remote(remote_path, size, creation_time)

Construct a new file object for a remote file.

from_download_model(model, *[, ...])

Construct a new file object from a SciCat download model.

Methods

checksum()

Return the checksum of the file.

downloaded(*, local_path)

Return new file metadata after a download.

local_is_up_to_date()

Check if the file on local is up-to-date.

make_model(*[, for_archive])

Build a pydantic model for this file.

remote_access_path(source_folder)

Full path to the file on the remote if it exists.

uploaded(*[, remote_path, remote_uid, ...])

Return new file metadata after an upload.

validate_after_download()

Check that the file on disk matches the metadata.

Attributes

checksum_algorithm

Algorithm to use for checksums.

creation_time

The logical creation time of the SciCat file.

is_on_local

True if the file is on local.

is_on_remote

True if the file is on remote.

size

The size in bytes of the file.

local_path

Path to the file on the local filesystem.

remote_path

Path to the file on the remote filesystem.

remote_gid

Unix group ID on remote.

remote_perm

Unix file mode on remote.

remote_uid

Unix user ID on remote.

checksum()[source]#

Return the checksum of the file.

This can take a long time to compute for large files.

If the file exists on local, return the current checksum of the local file. Otherwise, return the stored checksum in the catalogue.

Returns:

str | None – The checksum of the file.
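
To illustrate why checksums of large files are slow, here is a sketch of hashing a file in fixed-size chunks with the standard library. It is not taken from scitacean: the helper name `chunked_checksum`, the chunk size, and the choice of `"sha256"` are assumptions made for the example.

```python
import hashlib
import tempfile
from pathlib import Path


def chunked_checksum(path: Path, algorithm: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file chunk by chunk so memory use stays bounded.

    Every byte of the file is still read, so run time grows with file size.
    """
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            hasher.update(chunk)
    return hasher.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    example = Path(tmp) / "example.bin"
    example.write_bytes(b"scicat" * 1000)
    # A small chunk size forces several read iterations in this demo.
    digest = chunked_checksum(example, "sha256", chunk_size=256)
    expected = hashlib.sha256(b"scicat" * 1000).hexdigest()
```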

checksum_algorithm: str | None = None#

Algorithm to use for checksums.

property creation_time: datetime#

The logical creation time of the SciCat file.

If the file exists on local, return the time the local file was last modified. Otherwise, return the stored time in the catalogue.
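
The local-first fallback described above can be sketched as follows. The function `logical_creation_time` is a hypothetical helper, and converting the modification time to UTC is an assumption of this example rather than documented behaviour.

```python
import os
import tempfile
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def logical_creation_time(local_path: Optional[Path], stored_time: datetime) -> datetime:
    """Prefer the local modification time, else fall back to the stored time."""
    if local_path is not None and local_path.exists():
        # Assumption: report the modification time as an aware UTC datetime.
        return datetime.fromtimestamp(local_path.stat().st_mtime, tz=timezone.utc)
    return stored_time


stored = datetime(2023, 10, 1, 12, 0, tzinfo=timezone.utc)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "data.txt"
    path.write_text("abc")
    os.utime(path, (1_700_000_000, 1_700_000_000))  # set atime and mtime explicitly
    from_local = logical_creation_time(path, stored)

from_catalogue = logical_creation_time(None, stored)
```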

downloaded(*, local_path)[source]#

Return new file metadata after a download.

Assumes that the input file exists on remote. The returned object is on both local and remote.

Parameters:

local_path (str | Path) – New local path.

Returns:

File – A new file object.

classmethod from_download_model(model, *, checksum_algorithm=None, local_path=None)[source]#

Construct a new file object from a SciCat download model.

Parameters:
  • model (DownloadDataFile) – Pydantic model for the file.

  • checksum_algorithm (str | None, default: None) – Algorithm to use to compute the checksum of the file.

  • local_path (str | Path | None, default: None) – Value for the local path.

Returns:

File – A new file object.

classmethod from_local(path, *, remote_path=None, remote_uid=None, remote_gid=None, remote_perm=None)[source]#

Construct a File object for a file on the local filesystem.

The returned object references a file that exists locally but not on the remote. However, it does contain a remote_path which is constructed from the provided local path or from the provided remote_path.

Parameters:
  • path (str | Path) – Full path of the local file.

  • remote_path (str | RemotePath | None, default: None) – Path on the remote, relative to the source_folder of a dataset. By default, it is constructed as path.name.

  • remote_uid (str | None, default: None) – User ID on the remote. Will be determined automatically on upload.

  • remote_gid (str | None, default: None) – Group ID on the remote. Will be determined automatically on upload.

  • remote_perm (str | None, default: None) – File permissions on the remote. Will be determined automatically on upload.

Returns:

File – A new file object.
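
As a small illustration of the default described for remote_path, the sketch below derives a remote path from the final component of the local path unless one is given explicitly. `default_remote_path` is a hypothetical helper, not part of scitacean, and it uses plain strings instead of RemotePath.

```python
from pathlib import PurePosixPath
from typing import Optional


def default_remote_path(local_path: str, remote_path: Optional[str] = None) -> str:
    """Return the explicit remote path if given, else the local file name."""
    if remote_path is not None:
        return remote_path
    # Mirrors "constructed as path.name": keep only the last path component.
    return PurePosixPath(local_path).name


implicit = default_remote_path("/home/user/run42/events.h5")
explicit = default_remote_path("/home/user/run42/events.h5", "raw/events.h5")
```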

classmethod from_remote(remote_path, size, creation_time, checksum=None, checksum_algorithm=None, remote_uid=None, remote_gid=None, remote_perm=None)[source]#

Construct a new file object for a remote file.

The local path of the returned File is None.

Parameters:
  • remote_path (str | RemotePath) – Path to the remote file relative to the dataset’s source folder.

  • size (int) – Size in bytes on the remote filesystem.

  • creation_time (datetime | str) – Date and time the file was created on the remote filesystem. If a str, it is parsed using dateutil.parser.parse.

  • checksum (str | None, default: None) – Checksum of the file.

  • checksum_algorithm (str | None, default: None) – Algorithm used to compute the given checksum. Must be passed when checksum is not None.

  • remote_uid (str | None, default: None) – User ID on the remote.

  • remote_gid (str | None, default: None) – Group ID on the remote.

  • remote_perm (str | None, default: None) – File permissions on the remote.

Returns:

File – A new file object.

Added in version 23.10.0.

property is_on_local: bool#

True if the file is on local.

property is_on_remote: bool#

True if the file is on remote.

local_is_up_to_date()[source]#

Check if the file on local is up-to-date.

Returns:

bool – True if the file exists on local and its checksum matches the stored checksum for the remote file.
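
The rule in the return description can be sketched with the standard library. This is an illustration under stated assumptions, not scitacean's code: `is_up_to_date` is a hypothetical helper, `"sha256"` is an example algorithm, and the sketch does not cover the case where no checksum is stored.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Optional


def is_up_to_date(local_path: Optional[Path], stored_checksum: str, algorithm: str) -> bool:
    """True only if the local file exists and its checksum matches the stored one."""
    if local_path is None or not local_path.is_file():
        return False
    # Reads the whole file at once; acceptable for this small demo.
    local_checksum = hashlib.new(algorithm, local_path.read_bytes()).hexdigest()
    return local_checksum == stored_checksum


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "data.txt"
    path.write_bytes(b"version 1")
    stored = hashlib.sha256(b"version 1").hexdigest()
    fresh = is_up_to_date(path, stored, "sha256")

    path.write_bytes(b"version 2")  # local file now differs from the stored checksum
    stale = is_up_to_date(path, stored, "sha256")

missing = is_up_to_date(None, stored, "sha256")
```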

local_path: Path | None#

Path to the file on the local filesystem.

make_model(*, for_archive=False)[source]#

Build a pydantic model for this file.

Parameters:

for_archive (bool, default: False) – Select whether the file is stored in an archive or on regular disk, that is, whether it belongs to a Datablock or an OrigDatablock.

Returns:

UploadDataFile – A new pydantic model.

remote_access_path(source_folder)[source]#

Full path to the file on the remote if it exists.

Return type:

RemotePath | None

remote_gid: str | None#

Unix group ID on remote.

remote_path: RemotePath#

Path to the file on the remote filesystem.

remote_perm: str | None#

Unix file mode on remote.

remote_uid: str | None#

Unix user ID on remote.

property size: int#

The size in bytes of the file.

If the file exists on local, return the current size of the local file. Otherwise, return the stored size in the catalogue.

uploaded(*, remote_path=None, remote_uid=None, remote_gid=None, remote_perm=None, remote_creation_time=None, remote_size=None)[source]#

Return new file metadata after an upload.

Assumes that the input file exists on local. The returned object is on both local and remote.

Parameters:
  • remote_path (str | RemotePath | None, default: None) – New remote path.

  • remote_uid (str | None, default: None) – New user ID on remote, overwrites any current value.

  • remote_gid (str | None, default: None) – New group ID on remote, overwrites any current value.

  • remote_perm (str | None, default: None) – New unix permissions on remote, overwrites any current value.

  • remote_creation_time (datetime | None, default: None) – Time the file became available on remote. Defaults to the current time in UTC.

  • remote_size (int | None, default: None) – File size on remote.

Returns:

File – A new file object.

validate_after_download()[source]#

Check that the file on disk matches the metadata.

Compares file size and, if possible, its checksum. Raises on failure. If the function returns without exception, the file is valid.

Raises:

IntegrityError – If a check fails.

Return type:

None
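
The check described above (compare the size first, then the checksum when one is available, and raise on any mismatch) can be sketched as below. This is a stand-alone illustration, not the library's implementation: `validate_download` is a hypothetical helper, and the `IntegrityError` class defined here is a local stand-in for the exception named on this page.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Optional


class IntegrityError(Exception):
    """Local stand-in for the exception named in the documentation."""


def validate_download(
    path: Path,
    expected_size: int,
    expected_checksum: Optional[str] = None,
    algorithm: Optional[str] = None,
) -> None:
    """Raise IntegrityError if the file on disk does not match the metadata."""
    actual_size = path.stat().st_size
    if actual_size != expected_size:
        raise IntegrityError(f"size mismatch: expected {expected_size}, got {actual_size}")
    # The checksum is only compared when both a value and an algorithm are known.
    if expected_checksum is not None and algorithm is not None:
        actual = hashlib.new(algorithm, path.read_bytes()).hexdigest()
        if actual != expected_checksum:
            raise IntegrityError("checksum mismatch")


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "data.bin"
    path.write_bytes(b"payload")
    good = hashlib.sha256(b"payload").hexdigest()

    validate_download(path, 7, good, "sha256")  # returns without raising

    try:
        validate_download(path, 7, "0" * 64, "sha256")
        checksum_failure_detected = False
    except IntegrityError:
        checksum_failure_detected = True

    try:
        validate_download(path, 8)
        size_failure_detected = False
    except IntegrityError:
        size_failure_detected = True
```

Checking the size first is a cheap way to reject truncated downloads before paying for a full read of the file.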