# Testing code with Scitacean

Testing programs that use Scitacean can be tricky because the tests may require access to a SciCat server or fileserver.
Scitacean provides two ways to help with this: tools for deploying servers on the local machine, and fakes for running tests without any actual servers. This guide describes both methods.

Firstly, faking is implemented by [FakeClient](../generated/modules/scitacean.testing.client.FakeClient.rst) and [FakeFileTransfer](../generated/modules/scitacean.testing.transfer.FakeFileTransfer.rst).
These two classes follow the same separation of concerns as the real classes.
That is, `FakeClient` handles metadata and `FakeFileTransfer` handles files.
They can be mixed and matched freely with the real client and file transfers.
However, it is generally recommended to use the two fakes together.

Secondly, SciCat servers and fileservers are managed by the [scicat_backend](../generated/modules/scitacean.testing.backend.fixtures.scicat_backend.rst) and [sftp_fileserver](../generated/modules/scitacean.testing.sftp.fixtures.sftp_fileserver.rst) pytest fixtures.

First, create a test dataset and file.

In [None]:
from scitacean import Dataset

dataset = Dataset(
    type="raw",
    name="Important data",
    owner_group="faculty",
    owner="ridcully",
    principal_investigator="Ridcully",
    contact_email="ridcully@uu.am",
    data_format="spellbook-9000",
    source_folder="/upload/abcd",
    creation_location="UnseenUniversity",
)

In [None]:
from pathlib import Path

path = Path("test-data/spellbook.txt")
path.parent.mkdir(parents=True, exist_ok=True)
with path.open("w") as f:
    f.write("fireball power=1000 mana=123")

In [None]:
dataset.add_local_files("test-data/spellbook.txt")

## FakeClient

[scitacean.testing.client.FakeClient](../generated/modules/scitacean.testing.client.FakeClient.rst) has the same interface as the regular [Client](../generated/classes/scitacean.Client.rst) but never connects to any SciCat server.
Instead, it maintains an internal record of datasets and datablocks.
It is easiest to explain with an example.
First, create a `FakeClient`.
The URL is arbitrary and only needs to be passed for parity with the real client.

In [None]:
from scitacean.testing.client import FakeClient
from scitacean.testing.transfer import FakeFileTransfer

client = FakeClient.without_login(
    url="https://fake.scicat",
    file_transfer=FakeFileTransfer(),
)

### Upload

And now we can upload our test dataset as usual:

In [None]:
finalized = client.upload_new_dataset_now(dataset)
str(finalized)

However, this did not talk to a SciCat server.
We can check if the fake upload was successful by inspecting the `client`.
`client.datasets` is a `dict` that contains all datasets known to the fake server keyed by PID:

In [None]:
client.datasets.keys()

In [None]:
pid = list(client.datasets.keys())[0]
client.datasets[pid]

The client has recorded the upload from earlier.
However, it stored the dataset as a [model](../generated/modules/scitacean.model.rst), not as a regular `Dataset` object.
In addition, since the dataset has a file, an original datablock was uploaded as well.
(Datablocks store the metadata and paths of files in SciCat.)

In [None]:
client.orig_datablocks.keys()

In [None]:
# use the pid of the dataset
client.orig_datablocks[pid]

When writing tests, those recorded dataset and datablock models can be used to check if an upload worked.

### Download

`FakeClient` can also download datasets that are stored in its `datasets` dictionary:

In [None]:
downloaded = client.get_dataset(pid)
str(downloaded)

This is now an actual `Dataset` object like you would get from a real client.

If we want to test downloads independently of uploads, we can populate `client.datasets` and `client.orig_datablocks` manually.
Keep in mind that those store *models*; see the [model reference](../generated/modules/scitacean.model.rst) for an overview.
Also note that `orig_datablocks` stores a list of models for each dataset because there can be multiple datablocks per dataset.
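
When filling these dictionaries by hand, an easy mistake is to store a single datablock model where a list is expected. The following is a minimal sketch of a sanity check that relies only on the shapes described above: both attributes are dictionaries keyed by PID, and `orig_datablocks` maps each PID to a list. The helper name `check_fake_records` is hypothetical and not part of Scitacean.

```python
from collections.abc import Mapping
from typing import Any


def check_fake_records(
    datasets: Mapping[Any, Any],
    orig_datablocks: Mapping[Any, Any],
) -> list[str]:
    """Return human-readable problems found in hand-populated records.

    Hypothetical helper, not part of Scitacean. It never inspects the
    models themselves, only how they are arranged in the two mappings.
    """
    problems = []
    for pid, blocks in orig_datablocks.items():
        # Each dataset can have several datablocks, so a list is expected.
        if not isinstance(blocks, list):
            problems.append(f"{pid}: datablocks must be a list")
        # Datablocks are keyed by the PID of the dataset they belong to.
        if pid not in datasets:
            problems.append(f"{pid}: no dataset with this PID")
    return problems
```

In a test, `assert check_fake_records(client.datasets, client.orig_datablocks) == []` could run right after populating the fake client.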

### Fidelity

Although `FakeClient` is sufficient for many tests, it does not behave exactly the same way as a real client.
For example, it does not perform any validation of datasets or handle credentials.
In addition, it does not modify uploaded datasets like a real server would.
This can be seen from both the `finalized` dataset returned by `client.upload_new_dataset_now(dataset)` above and `downloaded`.

If a test requires these properties, consider using a locally deployed SciCat server.
See in particular the [developer documentation on testing](../developer/testing.rst).

## FakeFileTransfer

The `FakeClient` used above only fakes a SciCat server, i.e., the handling of metadata.
If we also want to test file uploads and downloads, we can use [scitacean.testing.transfer.FakeFileTransfer](../generated/modules/scitacean.testing.transfer.FakeFileTransfer.rst).

Starting from a clean slate, create a fake client with a fake file transfer as above:

In [None]:
from scitacean.testing.client import FakeClient
from scitacean.testing.transfer import FakeFileTransfer

client = FakeClient.without_login(
    url="https://fake.scicat",
    file_transfer=FakeFileTransfer(),
)

And upload a dataset:

In [None]:
finalized = client.upload_new_dataset_now(dataset)

The file transfer has recorded the upload of the file without actually uploading it anywhere.
We can inspect all files on the fake fileserver using:

In [None]:
client.file_transfer.files

This is a dictionary that maps the [remote_access_path](../generated/classes/scitacean.File.rst#scitacean.File.remote_access_path) of each file to the content of that file.

We can also download the file.

In [None]:
downloaded = client.get_dataset(finalized.pid)
with_downloaded_file = client.download_files(downloaded, target="test-data/download")

In [None]:
file = list(with_downloaded_file.files)[0]
file

In [None]:
with file.local_path.open() as f:
    print(f.read())
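
In a test, the downloaded content would usually be compared with the original file instead of being printed. A minimal sketch using only the standard library; `same_content` is a hypothetical helper, not part of Scitacean:

```python
from pathlib import Path
from typing import Union


def same_content(original: Union[str, Path], downloaded: Union[str, Path]) -> bool:
    """Return True if both files hold exactly the same bytes."""
    return Path(original).read_bytes() == Path(downloaded).read_bytes()
```

With the names from the cells above, the check would read `assert same_content("test-data/spellbook.txt", file.local_path)`.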

If we want to test downloads independently of uploads, we can populate `client.file_transfer.files` manually.

## Local SciCat server

[scitacean.testing.backend](../generated/modules/scitacean.testing.backend.rst) provides tools to set up a SciCat backend and API in a Docker container on the local machine.
It is primarily intended to be used via the [pytest](https://docs.pytest.org/) fixtures in [scitacean.testing.backend.fixtures](../generated/modules/scitacean.testing.backend.fixtures.rst).

The fixtures can configure, spin up, and seed a SciCat server and database in Docker containers.
They can furthermore provide easy access to the server by building clients.
And they clean up after the test session by stopping the Docker containers.

Note the caveats in [scitacean.testing.backend](../generated/modules/scitacean.testing.backend.rst) about cleanup and the use of `pytest-xdist`.

### Set up

First, ensure that [Docker](https://www.docker.com/) is installed and running on your machine.
Then, configure pytest by

- registering the fixtures and
- adding a command line option to enable backend tests.

To this end, add the following in your `conftest.py`:

In [None]:
import pytest
from scitacean.testing.backend import add_pytest_options as add_backend_options


pytest_plugins = (
    "scitacean.testing.backend.fixtures",
)

def pytest_addoption(parser: pytest.Parser) -> None:
    add_backend_options(parser)

The backend will only be launched when the corresponding command line option is given.
By default, this is `--backend-tests` but it can be changed via the `option` argument of `add_pytest_options`.

### Use SciCat in tests

Tests that require the server can now request it as a fixture:

In [None]:
def test_something_with_scicat(require_scicat_backend):
    # test something
    ...

The `require_scicat_backend` fixture will ensure that the backend is running during the test.
If backend tests have not been enabled by the command line option, the test will be skipped.

The simplest way to connect to the server is to request the `client` or `real_client` fixture:

In [None]:
def test_something_with_scicat_client(client):
    # test something
    ...

The `client` fixture provides both a client connected to the SciCat server and a fake client.
(Both come without a file transfer.)
If backend tests are enabled, the test runs twice, once with each client.
If they are disabled, the test only runs with the fake client.

If your test does not work with a fake client, you can request `real_client` instead of `client` to only get the real client.
In this case, make sure to also request `require_scicat_backend` so that the test is skipped when backend tests are disabled.
Alternatively, skip the test explicitly:

In [None]:
import pytest


def test_something_with_real_client(real_client):
    if real_client is None:
        pytest.skip("Backend tests disabled")
        # or do something else

    # do the actual tests
    ...

### Seed data

The database used by the local SciCat server is seeded with a number of datasets from [scitacean.testing.backend.seed](../generated/modules/scitacean.testing.backend.seed.rst).
These datasets are accessible via both real and fake clients.

To access the seed, use for example:

In [None]:
from scitacean.testing.backend import seed

def test_download_raw(client):
    dset = seed.INITIAL_DATASETS["raw"]
    downloaded = client.get_dataset(dset.pid)
    assert downloaded.owner == dset.owner

Both clients, including the fake client, require that the database has been seeded, even when backend tests are disabled.
You can ensure this by requesting either `scicat_backend` or `require_scicat_backend` alongside `fake_client` in your test.
To write a test that uses only a fake client but with seed data, use:

In [None]:
def test_seeded_fake(fake_client, scicat_backend):
    dset = seed.INITIAL_DATASETS["raw"]
    downloaded = fake_client.get_dataset(dset.pid)
    assert downloaded.owner == dset.owner

This test runs regardless of whether backend tests are enabled.
When they are disabled, the server is never launched and `fake_client` is seeded in a different way.
This different way of seeding corresponds to how [scitacean.testing.client.FakeClient](../generated/modules/scitacean.testing.client.FakeClient.rst) processes uploaded files.
So the result may not be entirely the same as with a real backend.
See in particular the [Fidelity](#Fidelity) section.

## Local SFTP fileserver

[scitacean.testing.sftp](../generated/modules/scitacean.testing.sftp.rst) provides tools to set up an SFTP server in a Docker container on the local machine.
It is primarily intended to be used via the [pytest](https://docs.pytest.org/) fixtures in [scitacean.testing.sftp.fixtures](../generated/modules/scitacean.testing.sftp.fixtures.rst).

The fixtures can configure, spin up, and seed an SFTP server in a Docker container.
They also clean up after the test session by stopping the Docker container.
(Strictly speaking, the server is an SSH server but all users except root are restricted to SFTP.)

Note the caveats in [scitacean.testing.sftp](../generated/modules/scitacean.testing.sftp.rst) about cleanup and the use of `pytest-xdist`.

### Set up

First, ensure that [Docker](https://www.docker.com/) is installed and running on your machine.
Then, configure pytest by

- registering the fixtures and
- adding a command line option to enable SFTP tests.

To this end, add the following in your `conftest.py` (or merge it into the setup for backend tests from above):

In [None]:
import pytest
from scitacean.testing.sftp import add_pytest_option as add_sftp_option


pytest_plugins = (
    "scitacean.testing.sftp.fixtures",
)

def pytest_addoption(parser: pytest.Parser) -> None:
    add_sftp_option(parser)

The SFTP server will only be launched when the corresponding command line option is given.
By default, this is `--sftp-tests` but it can be changed via the `option` argument of `add_pytest_option`.
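
Python keeps only the last definition of `pytest_plugins` and `pytest_addoption` in a file, so the backend and SFTP snippets cannot simply be pasted one after the other into the same `conftest.py`. A sketch of a merged version, assuming both sets of fixtures are wanted in the same test suite and built only from the names used in the two snippets above:

```python
import pytest
from scitacean.testing.backend import add_pytest_options as add_backend_options
from scitacean.testing.sftp import add_pytest_option as add_sftp_option

# Register the fixtures for both the SciCat backend and the SFTP server.
pytest_plugins = (
    "scitacean.testing.backend.fixtures",
    "scitacean.testing.sftp.fixtures",
)


def pytest_addoption(parser: pytest.Parser) -> None:
    # Each call adds its own command line option with its default name.
    add_backend_options(parser)
    add_sftp_option(parser)
```

With this setup, each kind of test stays disabled until its own command line option is passed to pytest.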

### Use SFTP in tests

Tests that require the server can now request it as a fixture:

In [None]:
def test_something_with_sftp(require_sftp_fileserver):
    # test something
    ...

The `require_sftp_fileserver` fixture will ensure that the SFTP server is running during the test.
If SFTP tests have not been enabled by the command line option, the test will be skipped.

Connecting to the server is not as straightforward as for the SciCat backend.
It requires passing a special `connect` function to the file transfer.
This can be done by requesting `sftp_connect_with_username_password`.
For example, the following opens a connection to the server to upload a file:

In [None]:
from scitacean import Dataset
from scitacean.transfer.sftp import SFTPFileTransfer

def test_sftp_upload(
    sftp_access,
    sftp_connect_with_username_password,
    require_sftp_fileserver,
    sftp_data_dir,
):
    sftp = SFTPFileTransfer(host=sftp_access.host,
                            port=sftp_access.port,
                            connect=sftp_connect_with_username_password)
    ds = Dataset(...)
    with sftp.connect_for_upload(dataset=ds) as connection:
        # do upload
        ...
    # assert that the file has been copied to sftp_data_dir
    ...

Uploaded files are readable on the host.
So the test can read from `sftp_data_dir` to check if the upload succeeded.
This directory is mounted as `/data` on the server.
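
Because of this mount, a path on the server can be translated into the matching path on the host. A minimal sketch using only the standard library; `host_path` is a hypothetical helper, not part of Scitacean, and the default mount point `/data` is taken from the paragraph above:

```python
from pathlib import Path, PurePosixPath


def host_path(sftp_data_dir: Path, remote_path: str, mount_point: str = "/data") -> Path:
    """Translate a path on the SFTP server into the matching host path.

    Raises ValueError if remote_path is not located under mount_point.
    """
    # Server paths are POSIX paths regardless of the host's operating system.
    relative = PurePosixPath(remote_path).relative_to(mount_point)
    return Path(sftp_data_dir).joinpath(*relative.parts)
```

A test could then assert `host_path(sftp_data_dir, "/data/upload/spellbook.txt").exists()` after the upload. The remote path here is only an illustration; the actual location depends on the dataset being uploaded.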

Using an SFTP file transfer with `Client` requires some extra steps.
An example is given by `test_client_with_sftp` in https://github.com/SciCatProject/scitacean/blob/main/tests/transfer/sftp_test.py.
It uses a subclass of `SFTPFileTransfer` to pass `sftp_connect_with_username_password` to the connection as `Client` cannot do this itself.

### Seed data

The server's filesystem gets seeded with some files from https://github.com/SciCatProject/scitacean/tree/main/src/scitacean/testing/sftp/sftp_server_seed.
Those files are copied to the `seed` subdirectory of `sftp_data_dir` on the host, which is mounted to `/data/seed` on the server.
