How to work properly

Docker

What is a container?

A container is a process on your machine that is isolated from all other processes. This isolation relies on kernel namespaces, a feature that has been in Linux for a long time; Docker simply makes it easy to use. A container:

  • Is a runnable instance of an image
  • Can be run on local machines, virtual machines, or deployed in the cloud
  • Is portable (can be run on any OS)
  • Is isolated from other containers and runs its own software, binaries, and configurations

What is a container image?

When a container runs, it uses an isolated filesystem. This custom filesystem is provided by a container image, which contains everything needed to run an application: dependencies, configurations, scripts, binaries, environment variables, the default command to run, and metadata.

How to containerize an application?

We need to build the container image of an application using a Dockerfile. A Dockerfile is simply a text-based file with no file extension, containing a script of instructions that Docker uses to create a container image.

  1. In the app directory, create a file named Dockerfile
  2. Edit the Dockerfile, using instructions such as:
    1. FROM
    2. WORKDIR
    3. COPY
    4. RUN
    5. CMD
    6. EXPOSE
  3. Build the container image with the following command:

     docker build -t getting-started .
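
A minimal Dockerfile sketch putting those instructions together (the base image, paths, and port are illustrative, assuming a Node.js app like the one in Docker's getting-started tutorial):

   # Build from a small Node.js base image
   FROM node:18-alpine
   # Work in /app inside the image
   WORKDIR /app
   # Copy the application source into the image
   COPY . .
   # Install the dependencies
   RUN yarn install --production
   # Default command when the container starts
   CMD ["node", "src/index.js"]
   # Document the port the app listens on
   EXPOSE 3000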

Start an app container

Now that we have an image, we can run the application in a container using the docker run command.

docker run -dp 3000:3000 getting-started

-d runs the container in detached mode (in the background); -p 3000:3000 maps port 3000 on the host to port 3000 in the container.

In Docker Desktop, go to the Containers tab to see a list of your containers.

One major issue that we have now is that the data is not persistent: when our container is stopped or deleted, the data disappears. To solve this, we need to create a Docker volume and mount it when we launch the application.
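
A sketch of the volume workflow (the volume name is illustrative, and /etc/todos is the path the getting-started app writes to; adjust the target path for your own app):

   docker volume create todo-db
   docker run -dp 3000:3000 --mount type=volume,src=todo-db,target=/etc/todos getting-started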

Note: Docker Desktop is a paid product for larger organizations; Docker Engine on Linux is free.

How to work properly

Use of a Virtual environment

  • Create a folder to work in
  • Create a Python virtual environment with the command shown below
  • Activate the environment with the activation script, also shown below
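
The corresponding commands, as a sketch assuming Windows PowerShell (to match the Activate.ps1 step later in these notes); on Linux/macOS, activate with source venv/bin/activate instead:

   python -m venv venv
   .\venv\Scripts\Activate.ps1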

     

Use Git and GitLab or Github

  • Git init your app
  • Make your first commit
  • I would recommend using the Git extension in Visual Studio Code
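
For example, from the app directory:

   git init
   git add .
   git commit -m "Initial commit"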

Use of Pyenv

  • pyenv makes it easy to manage multiple versions of Python
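
For example, install a version and pin it for the current folder (the version number is illustrative):

   pyenv install 3.11.4
   pyenv local 3.11.4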

Structure of projects

App:

Make the most of Visual Studio Code

  • Explorer : folder and file structures
  • Search
  • Source Control: Git management; make sure to display the changes in the form of a tree, and get ready to make your first commit
  • Install plugins :
    • Dataiku DSS
    • Docker
    • Python
    • Pylance
    • PowerShell
  • Testing: define your testing strategies
  • PowerShell: quick access to Windows PowerShell
  • Dataiku: connect to your DSS project via an API key
  • Docker: manage images and containers
  • Connect Git to a remote location (GitLab, GitHub, etc.)

Technical Stack choices

  • Databases: Neo4j (graph), MySQL, Amazon RDS
  • Frontend: React.js + Vue.js (TypeScript)
  • Backend: Flask API, Django (Python)
  • Versioning + CI/CD: GitLab, GitHub
  • Process manager: Supervisor or Circus (Python)
  • Data updates: Airflow
  • Cloud server: AWS EC2, S3, Docker
  • API keys, users, and passwords as environment variables on the server side (you can use a .env or .secrets file; see the sketch after this list)
  • Authentication: Auth0
  • Payment system: Stripe (publishable key on the front end, secret key on the Flask back end)
    • npm install @stripe/react-stripe-js @stripe/stripe-js
    • pip install stripe
    • Create a payments database in MySQL and write information to it using the Flask API
  • Analytics: Google Analytics
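
A sketch of such a .env file (the variable names and values are illustrative; keep this file out of version control):

   # .env (add this file to .gitignore)
   STRIPE_SECRET_KEY=sk_test_xxx
   DB_USER=app
   DB_PASSWORD=changeme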


Amazon Neptune

Graph traversal: going through a graph to retrieve the data relevant to a specific query.

Amazon Neptune is a fully managed graph database service. The core is a graph database engine.

Neptune supports Apache TinkerPop Gremlin and openCypher (the open query language derived from Neo4j's Cypher).

  1. Create a Neptune DB cluster
  2. Connect to the Neptune graph: note that Snowflake and Neptune live in two different VPCs
  3. Use curl and an SSL certificate to communicate with the Neptune endpoint (see the sketch after this list)
  4. Use the Neptune Bulk Loader to ingest large amounts of data
  5. Use a graph query language to query the data
  6. Use visualization tools: Graph-explorer, Tom Sawyer, Cambridge Intelligence, Graphistry, … or a custom app
  7. Export data using curl
  8. Use AWS Lambda to retrieve data or query the graph on a regular basis
  9. Neptune ML setup and SageMaker: Neptune supports both transductive inference, which returns predictions that were pre-computed at training time based on your graph data at that time, and inductive inference, which applies data processing and model evaluation in real time, based on current data. See the difference between inductive and transductive inference.
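
A sketch of step 3, querying the cluster's Gremlin HTTP endpoint with curl (the endpoint hostname is a placeholder; Neptune listens on port 8182):

   curl -X POST https://your-neptune-endpoint:8182/gremlin \
        -d '{"gremlin": "g.V().limit(1)"}'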

 

Neptune ML can train machine learning models to support five different categories of inference:

  • Node classification
  • Node regression
  • Edge classification
  • Edge regression
  • Link prediction

Max size: 768 GB

Neo4j

A Neo4j graph can be pulled as a Docker image. Loading data into the graph is quite easy and only requires following a naming convention for the node and relationship CSV files.

To start a Neo4j project from scratch we need:

docker-compose.yml


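A minimal docker-compose.yml sketch (the image tag, container name, password, and mounted folders are illustrative; the ports match the Neo4j defaults mentioned later in these notes):

   services:
     neo4j:
       image: neo4j:5
       container_name: neo4j
       ports:
         - "7474:7474"   # HTTP browser UI
         - "7687:7687"   # Bolt driver
       environment:
         - NEO4J_AUTH=neo4j/changeme
       volumes:
         - ./neo4j/data:/data      # database files
         - ./neo4j/import:/import  # CSV files to import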

neo4j

Makefile
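
A Makefile sketch with a few convenience targets (the target names and container name are illustrative):

   # Start the stack in the background
   up:
   	docker compose up -d

   # Stop and remove the stack
   down:
   	docker compose down

   # Open a shell inside the neo4j container
   shell:
   	docker exec -it neo4j bash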

 

Import the data once you are inside the container (use the docker exec -it command):


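A sketch of the import, assuming Neo4j 5 and CSV files in the mounted /import folder (the database name and file names are illustrative; the CSVs must follow the neo4j-admin header conventions):

   docker exec -it neo4j bash
   # inside the container; the target database must not be running during a full import
   neo4j-admin database import full neo4j \
       --nodes=/import/nodes.csv \
       --relationships=/import/relationships.csv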

Run a Docker image
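
For example (the image name is a placeholder):

   docker run <image>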

Enter a Docker container
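
For example (the container name is a placeholder):

   docker exec -it <container> bash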

Stop a Docker container / delete a container or an image
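
The corresponding commands (names are placeholders):

   docker stop <container>
   docker rm <container>
   docker rmi <image>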

Pull an existing image
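
With the image name as a placeholder:

   docker pull <image>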

Display all active containers
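
The standard command:

   docker ps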

Display all existing images
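
The standard command:

   docker images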

System cleaning
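
One common approach (removes stopped containers, unused networks, dangling images, and build cache; it prompts before deleting):

   docker system prune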

Typical project workflow

  • Open the folder in Visual Studio Code
  • Initialize Git
  • Create a virtual environment with venv
  • Activate the environment by running Activate.ps1 in the Scripts folder
  • Create a requirements.txt
  • Add venv to .gitignore
  • Start creating your files
  • Code and debug
  • Containerize (if relevant)
  • Deliver to your server (CI/CD)

Flask runs on port 5000

React runs on port 3000

Neo4j runs on port 7687 (Bolt) and is accessible on port 7474 via the browser

Git management

  1. git clone (or git pull later on): (git pull origin master)
  2. git checkout -b (create a branch)
  3. git checkout "branch name": switch branches
  4. Code on the branch
  5. git add . (stage the files for the commit)
  6. git status
  7. git commit -m "update function"
  8. git push -u origin "branch name" (origin is the name of the remote)
  9. Merge proposal: "create merge request"
  10. git pull

For testing, it is better to fetch the branch before merging it.

Edit the .gitignore file

.gitignore: keys, libraries, data
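
A sketch of a .gitignore along those lines (the entries are illustrative):

   # secrets and keys
   .env
   *.key
   # libraries / virtual environment
   venv/
   __pycache__/
   # data
   data/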